feat(eda): scatters de pares más correlacionados + tipo de relación en capítulo CORRELACION

Añade al capítulo `correlacion` del AutomaticEDA la visualización con scatters de los pares numérico-numérico más correlacionados (positiva y negativamente) y, para cada uno, la clasificación del tipo de relación: lineal, polinómica (grado 2/3), monótona no-lineal o débil/sin forma. Funciones nuevas del registry (dominio datascience, grupo eda): - classify_relationship_type_py_datascience (pura): dadas dos listas numéricas pareadas, cruza Pearson r (lineal), Spearman ρ (monótona) y ajustes polinómicos de grado 2 y 3 (numpy.polyfit + R² manual) para etiquetar la forma. Reusa pearson y spearman_corr del registry. Umbrales calibrados para datos reales discretos/ruidosos (orden: débil → monótona → polinómica → lineal). Devuelve los coeficientes del mejor modelo para pintar la curva. No-throw. - relationship_scatter_figure_py_datascience (impure): construye la Figure matplotlib del scatter de un par con su recta/curva de ajuste y una anotación del tipo + métricas (r, ρ, R²lin, R²poly). Backend Agg sin pyplot global, downsample determinista de los puntos dibujados, tendencia ordenada (binned / por valor) para el caso monótona sin polinomio. Defensiva ante vacío. Capítulo correlacion.py (1.0.0 → 1.1.0): nueva sección "Relaciones más fuertes (scatter)" tras la matriz + tablas top. Toma los top-K pares num↔num por |valor| de profile['correlations']['pairs'], obtiene los datos crudos de cada par desde ctx['raw_numeric'] y emite, por par, un Figure dentro de un Group keep-together junto a una nota de texto con el tipo de relación (extraíble por pdftotext). Solo num↔num: los pares cat↔cat (Cramér's V) y num↔cat (razón de correlación) no llevan scatter. Cuando no hay raw_numeric (perfil lite/agregado o ctx None) los scatters se omiten sin lanzar; la matriz + tablas siguen. Verificado: golden EDA de titanic (run_models) — el capítulo Correlación del PDF y PPTX incluye los scatters (pclass↔fare → monótona no-lineal, sibsp↔parch → lineal, …) con su ajuste y etiqueta de tipo en texto. Tests de clasificación sintética (lineal, y=x² → polinómica, y=exp(x) → monótona, ruido → débil) + tests del capítulo (golden con raw_numeric, edge sin raw, par sin columna). Suite automatic_eda + pipeline render_automatic_eda verde (141 passed). fn index sin error. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 20:37:01 +02:00
29 changed files with 1225 additions and 2625 deletions
@@ -31,7 +31,7 @@ import math

 from .. import model

-CHAPTER_VERSION = "1.0.0"
+CHAPTER_VERSION = "1.1.0"
 CHAPTER_ID = "correlacion"
 CHAPTER_TITLE = "Correlación"

@@ -47,6 +47,13 @@ _MAX_MATRIX_LABELS = 16
 # How many pairs to show in each of the top-positive / top-negative tables.
 _TOP_N = 10

+# How many of the strongest numeric-numeric pairs to draw as scatter plots on
+# each sign (positive / negative). A scatter per pair carries a fitted line/curve
+# and a relationship-type label; keeping the count small keeps the chapter
+# readable on a phone / a slide. Only signed (Pearson/Spearman) pairs qualify —
+# Cramér's V / correlation ratio pairs are not numeric-numeric, so no scatter.
+_SCATTER_TOP_N = 3
+
 # Glossary terms this chapter explains. Each is registered in the shared
 # collector (ctx['glossary']) and marked clickable on its first appearance in the
 # body — the canonical two-step pattern (see ``cat_distr`` for the reference
@@ -314,6 +321,139 @@ def _fdr_text(corr: dict, mark_term: bool = False) -> str | None:
    return " ".join(parts)


+def _is_seq(values) -> bool:
+    """True for a non-empty list/tuple of values (a raw numeric column)."""
+    return isinstance(values, (list, tuple)) and len(values) > 0
+
+
+def _select_scatter_pairs(pairs: list, top_n: int = _SCATTER_TOP_N):
+    """Pick the strongest numeric-numeric pairs to draw as scatters.
+
+    Only signed (Pearson/Spearman) pairs are numeric-numeric and thus eligible
+    for a scatter with a fitted curve. Returns up to ``top_n`` of the strongest
+    positive pairs followed by up to ``top_n`` of the strongest negative ones,
+    each ranked by magnitude. Mixed-type metrics (Cramér's V, correlation ratio,
+    mutual information) are excluded — they have no x/y scatter interpretation.
+    """
+    positive = []
+    negative = []
+    for pair in pairs:
+        if not isinstance(pair, dict) or not _is_signed(pair):
+            continue
+        value = pair.get("value")
+        if not _is_num(value):
+            continue
+        if value > 0:
+            positive.append(pair)
+        elif value < 0:
+            negative.append(pair)
+    positive.sort(key=lambda p: abs(float(p.get("value", 0.0))), reverse=True)
+    negative.sort(key=lambda p: abs(float(p.get("value", 0.0))), reverse=True)
+    return positive[:top_n] + negative[:top_n]
+
+
+def _classification_note(a: str, b: str, cls: dict) -> str:
+    """Human-readable sentence describing the relationship of a pair.
+
+    Plain text (not baked into the figure image) so the type label is selectable
+    in the PDF / extractable by pdftotext, and sits right next to its scatter
+    inside the keep-together Group.
+    """
+    tipo = model._safe_str(cls.get("tipo")) or "sin forma clara"
+    bits = []
+    pearson = cls.get("pearson")
+    spearman = cls.get("spearman")
+    r2_lin = cls.get("r2_linear")
+    r2_poly = None
+    for key in ("r2_poly2", "r2_poly3"):
+        v = cls.get(key)
+        if _is_num(v) and (r2_poly is None or float(v) > r2_poly):
+            r2_poly = float(v)
+    if _is_num(pearson):
+        bits.append(f"Pearson r={float(pearson):+.2f}")
+    if _is_num(spearman):
+        bits.append(f"Spearman ρ={float(spearman):+.2f}")
+    if _is_num(r2_lin):
+        bits.append(f"R² lineal={float(r2_lin):.2f}")
+    if r2_poly is not None:
+        bits.append(f"R² polinómico={r2_poly:.2f}")
+    metrics = "; ".join(bits)
+    text = (f"Relación **{tipo}** entre «{a}» y «{b}»."
+            + (f" {metrics}." if metrics else ""))
+    return text
+
+
+def _scatter_blocks(pairs: list, raw_numeric):
+    """Build keep-together scatter Groups for the strongest num-num pairs.
+
+    Returns a list of blocks (a Heading plus one Group per pair), or an empty
+    list when there is no raw numeric data (e.g. the lite profile drops
+    ``ctx['raw_numeric']`` to skip live recomputation) or the relationship
+    helpers are unavailable. Never raises: any failure degrades to no scatters,
+    leaving the matrix + tables intact.
+    """
+    if not isinstance(raw_numeric, dict) or not raw_numeric:
+        return []
+    selected = _select_scatter_pairs(pairs)
+    if not selected:
+        return []
+
+    # The relationship helpers live in the datascience package. Import lazily so
+    # the chapter still builds (matrix + tables) when they are absent.
+    try:
+        from datascience.classify_relationship_type import (
+            classify_relationship_type,
+        )
+        from datascience.relationship_scatter_figure import (
+            relationship_scatter_figure,
+        )
+    except Exception:  # noqa: BLE001 — degrade, never break the chapter.
+        return []
+
+    groups = []
+    for pair in selected:
+        a = pair.get("a")
+        b = pair.get("b")
+        xs = raw_numeric.get(a)
+        ys = raw_numeric.get(b)
+        # Edge: a selected pair has no raw column (aggregated profile, renamed
+        # column, …) — skip just that pair, keep the rest.
+        if not _is_seq(xs) or not _is_seq(ys):
+            continue
+        try:
+            cls = classify_relationship_type(list(xs), list(ys)) or {}
+        except Exception:  # noqa: BLE001
+            continue
+        a_lbl = model._safe_str(a)
+        b_lbl = model._safe_str(b)
+
+        def _make(xs=xs, ys=ys, a_lbl=a_lbl, b_lbl=b_lbl, cls=cls):
+            return relationship_scatter_figure(
+                list(xs), list(ys), x_label=a_lbl, y_label=b_lbl,
+                classification=cls)
+
+        groups.append(model.Group(blocks=[
+            model.Heading(text=f"{a_lbl} ↔ {b_lbl}", level=2),
+            model.Figure(
+                make=_make,
+                caption=(f"Dispersión de «{a_lbl}» frente a «{b_lbl}» con la "
+                         "curva de ajuste del mejor modelo.")),
+            model.Markdown(text=_classification_note(a_lbl, b_lbl, cls)),
+        ]))
+
+    if not groups:
+        return []
+    intro = model.Markdown(text=(
+        "Para los pares numéricos más fuertes (positivos y negativos) se dibuja "
+        "la nube de puntos con su ajuste y se clasifica el **tipo de relación**: "
+        "**lineal** (una recta basta), **polinómica** (curva de grado 2/3 que "
+        "mejora claramente el ajuste lineal), **monótona no-lineal** (crece o "
+        "decrece siempre pero no en línea recta; Spearman ≫ Pearson) o "
+        "**débil/sin forma**."))
+    return [model.Heading(text="Relaciones más fuertes (scatter)", level=2),
+            intro] + groups
+
+
 def build_correlacion(profile: dict, ctx: dict):
    """Build the Correlation Chapter, or None if there are no pairs to show.

@@ -392,6 +532,18 @@ def build_correlacion(profile: dict, ctx: dict):
            "No se han hallado correlaciones negativas significativas entre "
            "columnas numéricas.")))

+    # 2.5) Scatter plots of the strongest numeric-numeric pairs, each with its
+    # fitted curve and a relationship-type label (lineal / polinómica / monótona
+    # / débil). Needs the raw numeric sample (ctx['raw_numeric'], row-aligned);
+    # when it is absent (aggregated/lite profile) the scatters are simply omitted
+    # and the matrix + tables above stand on their own.
+    raw_numeric = None
+    if isinstance(ctx, dict):
+        raw_numeric = ctx.get("raw_numeric") or profile.get("raw_numeric")
+    else:
+        raw_numeric = profile.get("raw_numeric")
+    blocks.extend(_scatter_blocks(pairs, raw_numeric))
+
    # 3) Spuriousness caveat for level-based correlations (Granger–Newbold).
    caveat = corr.get("levels_caveat")
    if isinstance(caveat, str) and caveat.strip():
@@ -175,6 +175,105 @@ def test_anticorte_matriz_ancha_y_etiquetas_largas_no_se_cortan():
        assert "azufre" in _pdf_text(pdf)


+def _raw_numeric_for_profile(n: int = 80) -> dict:
+    """Row-aligned raw numeric sample matching the signed pairs of _profile().
+
+    Builds columns with a clear, deterministic shape so the relationship-type
+    classifier has something unambiguous to label:
+      - density vs alcohol: strong negative linear (the top-negative pair).
+      - alcohol vs quality: positive linear.
+      - ph, fixed_acidity, sulphates: filler columns for the remaining pairs.
+    """
+    import math as _m
+
+    alcohol = [8.0 + 0.05 * i for i in range(n)]
+    density = [1.0 - 0.002 * a for a in alcohol]           # neg linear vs alcohol
+    quality = [3.0 + 0.4 * a + (0.1 if i % 2 else -0.1)    # pos linear vs alcohol
+               for i, a in enumerate(alcohol)]
+    ph = [3.0 + 0.3 * _m.sin(i / 5.0) for i in range(n)]
+    fixed_acidity = [7.0 - 0.5 * p for p in ph]            # neg linear vs ph
+    sulphates = [0.5 + 0.01 * (i % 7) for i in range(n)]
+    return {
+        "alcohol": alcohol, "density": density, "quality": quality,
+        "ph": ph, "fixed_acidity": fixed_acidity, "sulphates": sulphates,
+    }
+
+
+def test_golden_scatters_de_pares_num_num_con_tipo_de_relacion():
+    """Con ctx['raw_numeric'], el capítulo añade scatters (Figure dentro de Group)
+    de los pares num-num más fuertes, cada uno con su etiqueta de tipo en texto."""
+    from datascience.automatic_eda.model import Group
+
+    ctx = {"raw_numeric": _raw_numeric_for_profile()}
+    ch = build_correlacion(_profile(), ctx)
+    assert ch is not None
+    groups = [b for b in ch.blocks if isinstance(b, Group)]
+    assert groups, "debe emitir al menos un Group con scatter"
+    # Cada Group lleva su figura (lazy) y una nota de texto con el tipo.
+    for g in groups:
+        gkinds = [b.kind for b in g.blocks]
+        assert "figure" in gkinds and "markdown" in gkinds
+    # La sección y la etiqueta de tipo aparecen como texto plano (extraíble).
+    headings = " ".join(b.text for b in ch.blocks if b.kind == "heading")
+    assert "Relaciones más fuertes" in headings
+    body = " ".join(b.text for g in groups for b in g.blocks
+                    if b.kind == "markdown")
+    assert any(t in body for t in
+               ("lineal", "polinómica", "monótona", "sin forma"))
+    # El par num-num más fuerte (density ↔ alcohol) tiene scatter; el par cat-cat
+    # (region ↔ type) NO — no es numérico.
+    assert "density" in body or "alcohol" in body
+    assert "region" not in body and "type" not in body
+
+
+def test_golden_pdf_muestra_scatters_con_etiqueta_de_tipo():
+    """En el PDF, el capítulo Correlación incluye los scatters y su etiqueta de
+    tipo en texto seleccionable (pdftotext la encuentra)."""
+    prof = _profile()
+    ctx = {"raw_numeric": _raw_numeric_for_profile()}
+    with tempfile.TemporaryDirectory() as d:
+        pdf = os.path.join(d, "corr_scatter.pdf")
+        rp = render_automatic_eda_pdf(prof, pdf, {"title": "EDA — wine",
+                                                  "ctx": ctx})
+        assert rp["path"] == pdf and rp["n_pages"] >= 1
+        txt = _pdf_text(pdf)
+        assert "Relaciones" in txt and "scatter" in txt.lower()
+        # Alguna etiqueta de tipo de relación, en texto.
+        assert any(t in txt for t in
+                   ("lineal", "polin", "monóton", "monoton", "sin forma"))
+
+
+def test_edge_sin_raw_numeric_omite_scatters_sin_lanzar():
+    """profile lite / ctx None: sin raw_numeric el capítulo omite los scatters
+    pero sigue emitiendo matriz + tablas (no lanza)."""
+    from datascience.automatic_eda.model import Group
+
+    for ctx in (None, {}, {"raw_numeric": None}, {"raw_numeric": {}}):
+        ch = build_correlacion(_profile(), ctx)
+        assert ch is not None
+        assert not [b for b in ch.blocks if isinstance(b, Group)]
+        # La matriz y al menos una tabla top siguen presentes.
+        assert any(b.kind == "figure" for b in ch.blocks)
+        assert any(b.kind == "data_table" for b in ch.blocks)
+
+
+def test_edge_par_sin_columna_cruda_se_omite_sin_lanzar():
+    """Si un par seleccionado no tiene su columna en raw_numeric, se omite ese
+    par (no lanza); los demás scatters se construyen igual."""
+    from datascience.automatic_eda.model import Group
+
+    raw = _raw_numeric_for_profile()
+    raw.pop("density", None)   # rompe el par density ↔ alcohol
+    ch = build_correlacion(_profile(), {"raw_numeric": raw})
+    assert ch is not None
+    groups = [b for b in ch.blocks if isinstance(b, Group)]
+    body = " ".join(b.text for g in groups for b in g.blocks
+                    if b.kind == "markdown")
+    # density desaparece de los scatters; otros pares (p.ej. ph↔fixed_acidity,
+    # alcohol↔quality) pueden seguir presentes sin error.
+    assert "density" not in body
+
+
 def test_glosario_engancha_metodos_y_fdr():
    """Mejora 4b: los métodos de correlación (Pearson, Spearman, Cramér's V,
    razón de correlación) y la corrección por comparaciones múltiples (FDR) se
@@ -1,594 +0,0 @@
-"""Missingness chapter (MISSINGNESS) — patterns of missing data.
-
-Complements the CALIDAD chapter: where CALIDAD reports *how much* is missing per
-column (the null percentage that lowers the completeness score), this chapter
-reports the **pattern** of the missing data — whether columns tend to be missing
-*together* (co-occurrence of absences) or independently. That distinction is what
-separates data that is missing completely at random ([[term:mcar]]MCAR[[/term]])
-from data missing as a function of another variable ([[term:mar]]MAR[[/term]]),
-which is the key question to settle before imputing or modelling.
-
-The chapter activates only when the table actually has missing data (at least one
-column with a null in the aggregated profile); otherwise it returns ``None`` and
-disappears from the document.
-
-Sections, in order:
-
-1. **Resumen global** — % of missing cells in the dataset, number of columns with
-   nulls, and complete rows (no missing) vs incomplete rows (≥1 missing).
-2. **Ranking por columna** — columns sorted by their null percentage, with a
-   horizontal bar figure.
-3. **Co-ocurrencia de ausencias** — the correlation of the binary is-null masks
-   between columns (which columns tend to be missing together): a heatmap plus a
-   table of the top column pairs that co-miss.
-4. **Patrones de fila** — the most frequent "which columns are missing together"
-   row patterns, in the style of missingno's pattern matrix.
-5. **Lectura MCAR/MAR** — an interpretive, *exploratory* note (not a confirmatory
-   test such as Little's) reading the absence correlations as a hint of MCAR
-   (independent absences) vs MAR (co-occurring absences).
-
-The aggregate per-column null counts come from the ``eda`` group ``TableProfile``
-(``columns[i]['null_count'] / 'null_pct'`` and the table-level ``null_cell_pct``).
-The per-row is-null mask needed for co-occurrence is built from raw data: a single
-DuckDB push-down over ``ctx['db_path'] / ctx['table']`` (same pattern as the
-AGREGACION chapter) covering ALL columns, with a fallback to the numeric-only
-``ctx['raw_numeric']`` when no database is reachable. All the heavy lifting is
-delegated to pure registry functions (``missingness_overview``,
-``missingness_correlation``, ``missingness_row_patterns``) and two figure helpers
-(``missingness_rank_bar_figure``, ``missingness_corr_heatmap_figure``); every one
-is imported lazily and degrades to an honest note so this chapter never raises.
-
-Contract: build_<id>(profile, ctx) -> Chapter | None ; CHAPTER_VERSION = "x.y.z".
-"""
-
-from __future__ import annotations
-
-from .. import model
-
-CHAPTER_VERSION = "1.0.0"
-CHAPTER_ID = "missingness"
-CHAPTER_TITLE = "Datos faltantes"
-
-# Sample cap for the per-row is-null mask push-down. Co-occurrence and row
-# patterns are computed on this sample; the global % of missing cells and the
-# per-column ranking come from the (exact) aggregated profile instead.
-MASK_SAMPLE = 5000
-# Thresholds for the MCAR/MAR heuristic note. A pair counts as a *strong*
-# co-occurrence when the absence correlation alone is high; as a *partial*
-# co-occurrence when the absences overlap materially (high Jaccard) even if the
-# Pearson correlation is modest — the usual case when one column is missing far
-# more often than the other (e.g. Cabin 77% vs Age 20% in Titanic), which dilutes
-# the correlation while the rows still co-miss in absolute terms.
-_CORR_STRONG = 0.30
-_JACCARD_NOTABLE = 0.20
-# Rows shown in the top-pairs and row-patterns tables (bounded, never silently
-# truncated: the table note reports the full count).
-_TOP_PAIRS = 12
-_TOP_PATTERNS = 12
-# Truncate long column names in tables (the renderer also wraps).
-_LABEL_MAX = 28
-
-# Glossary terms this chapter explains (contract §11.1). Registered in the shared
-# collector and marked clickable on their first appearance.
-_TERMS = {
-    "missingness": (
-        "Patrón de datos faltantes (missingness)",
-        "El patrón con el que faltan los datos: cuánto falta, en qué columnas y "
-        "si las ausencias de unas columnas coinciden (co-ocurren) con las de "
-        "otras. Analizarlo —no solo contar nulos— distingue datos que faltan al "
-        "azar (MCAR) de los que faltan en función de otra variable (MAR), lo que "
-        "decide cómo imputar o si descartar filas sin sesgar el análisis.",
-    ),
-    "mcar": (
-        "MCAR (Missing Completely At Random)",
-        "Los valores faltan de forma independiente de cualquier dato, observado o "
-        "no: las ausencias de unas columnas no se relacionan entre sí ni con los "
-        "valores. Es el caso más benigno —descartar filas o imputar la media no "
-        "introduce sesgo—, pero rara vez se cumple del todo en datos reales.",
-    ),
-    "mar": (
-        "MAR (Missing At Random)",
-        "La probabilidad de que un valor falte depende de OTRAS variables "
-        "observadas (p. ej. una medición que falta más en cierto grupo). Las "
-        "ausencias co-ocurren entre columnas o se relacionan con los valores de "
-        "otras; imputar exige condicionar en esas variables para no sesgar. La "
-        "co-ocurrencia fuerte de ausencias es un indicio (exploratorio) de MAR.",
-    ),
-}
-
-
-# --------------------------------------------------------------------------- #
-# Small defensive formatters (own copy: the chapter never imports siblings).
-# --------------------------------------------------------------------------- #
-def _fmt_int(value) -> str:
-    if value is None:
-        return "—"
-    try:
-        return f"{int(round(float(value))):,}".replace(",", ".")
-    except (TypeError, ValueError):
-        return model._safe_str(value)
-
-
-def _fmt_pct(value, decimals: int = 1) -> str:
-    """Format an already-0-100 value as a percentage. None -> placeholder."""
-    if value is None:
-        return "—"
-    try:
-        return f"{float(value):.{decimals}f}%"
-    except (TypeError, ValueError):
-        return model._safe_str(value)
-
-
-def _fmt_num(value, decimals: int = 3) -> str:
-    if value is None:
-        return "—"
-    try:
-        f = float(value)
-    except (TypeError, ValueError):
-        return model._safe_str(value)
-    if f != f:  # NaN
-        return "—"
-    text = f"{f:.{decimals}f}".rstrip("0").rstrip(".")
-    return text if text else "0"
-
-
-def _truncate(text, limit: int = _LABEL_MAX) -> str:
-    s = model._safe_str(text)
-    if len(s) <= limit:
-        return s
-    return s[: max(1, limit - 1)].rstrip() + "…"
-
-
-def _term(key: str, label: str, mark: bool) -> str:
-    if mark:
-        return f"[[term:{key}]]**{label}**[[/term]]"
-    return f"**{label}**"
-
-
-# --------------------------------------------------------------------------- #
-# Profile reads (exact, all rows).
-# --------------------------------------------------------------------------- #
-def _null_count_of(col: dict):
-    """Best-effort null count of a column: ``null_count`` or null_pct*n_rows."""
-    nc = col.get("null_count")
-    if isinstance(nc, (int, float)) and not isinstance(nc, bool):
-        return int(nc)
-    np_ = col.get("null_pct")
-    nr = col.get("n_rows")
-    if isinstance(np_, (int, float)) and isinstance(nr, (int, float)):
-        return int(round(float(np_) * float(nr)))
-    return 0
-
-
-def _columns_with_nulls(profile: dict):
-    """Return ``[(name, null_count, null_pct_0_100)]`` for columns with nulls,
-    sorted by null percentage descending. Reads the aggregated profile (exact)."""
-    cols = profile.get("columns") or []
-    out = []
-    for c in cols:
-        if not isinstance(c, dict):
-            continue
-        nc = _null_count_of(c)
-        if nc <= 0:
-            continue
-        np_ = c.get("null_pct")
-        nr = c.get("n_rows") or profile.get("n_rows")
-        if isinstance(np_, (int, float)) and not isinstance(np_, bool):
-            pct = float(np_) * 100.0 if np_ <= 1.0 else float(np_)
-        elif nr:
-            pct = nc / float(nr) * 100.0
-        else:
-            pct = None
-        out.append((c.get("name") or "(col)", nc, pct))
-    out.sort(key=lambda t: (t[2] if t[2] is not None else -1.0), reverse=True)
-    return out
-
-
-def _global_missing_pct(profile: dict):
-    """Table-level % of missing cells (0-100), exact, from the profile."""
-    v = profile.get("null_cell_pct")
-    if isinstance(v, (int, float)) and not isinstance(v, bool):
-        return float(v) * 100.0 if v <= 1.0 else float(v)
-    return None
-
-
-# --------------------------------------------------------------------------- #
-# Per-row is-null mask (sample): DuckDB push-down, fallback to raw_numeric.
-# --------------------------------------------------------------------------- #
-def _build_query_fn(ctx: dict):
-    """Return ``(query_fn, table)`` for a DuckDB-backed ctx, or ``(None, None)``.
-
-    Mirrors build_eda_render_ctx: a read-only closure over the registry wrapper.
-    Only DuckDB is supported here; any other backend degrades to raw_numeric."""
-    db_path = ctx.get("db_path")
-    table = ctx.get("table")
-    if not db_path or not table:
-        return None, None
-    try:
-        from infra import duckdb_query_readonly
-    except Exception:  # noqa: BLE001 — wrapper unavailable -> degrade.
-        return None, None
-
-    def query_fn(sql):
-        return duckdb_query_readonly(db_path, sql)
-
-    return query_fn, table
-
-
-def _null_mask(profile: dict, ctx: dict):
-    """Build the per-row is-null mask ``{col: [0/1, ...]}``.
-
-    Tries a single DuckDB push-down over ALL columns first (so categorical
-    columns like Cabin are covered, not only numeric ones); falls back to the
-    numeric-only ``ctx['raw_numeric']`` (None -> missing); returns ``(None, 0,
-    None)`` when neither is reachable. Never raises.
-    Returns ``(mask, n_sampled, source)`` with source in {"db","raw_numeric"}.
-    """
-    cols = profile.get("columns") or []
-    names = [c.get("name") for c in cols
-             if isinstance(c, dict) and c.get("name")]
-    # 1) DuckDB push-down over every column (covers categoricals too).
-    query_fn, table = _build_query_fn(ctx)
-    if query_fn is not None and names:
-        try:
-            from datascience.extract_null_mask import extract_null_mask
-
-            res = extract_null_mask(query_fn, table, names, max_rows=MASK_SAMPLE)
-            if isinstance(res, dict) and res.get("status") == "ok":
-                mask = res.get("mask") or {}
-                if mask:
-                    return mask, int(res.get("n") or 0), "db"
-        except Exception:  # noqa: BLE001 — degrade to raw_numeric.
-            pass
-    # 2) Fallback: numeric-only mask derived from raw_numeric (None -> missing).
-    rn = ctx.get("raw_numeric")
-    if isinstance(rn, dict) and rn:
-        mask = {}
-        for col, vals in rn.items():
-            if isinstance(vals, (list, tuple)):
-                mask[col] = [1 if v is None else 0 for v in vals]
-        if mask:
-            n = max((len(v) for v in mask.values()), default=0)
-            return mask, n, "raw_numeric"
-    return None, 0, None
-
-
-# --------------------------------------------------------------------------- #
-# Lazy registry delegations (each degrades to None on any failure).
-# --------------------------------------------------------------------------- #
-def _overview(mask: dict):
-    try:
-        from datascience.missingness_overview import missingness_overview
-
-        out = missingness_overview(mask)
-        return out if isinstance(out, dict) else None
-    except Exception:  # noqa: BLE001
-        return None
-
-
-def _correlation(mask: dict, top_k: int):
-    try:
-        from datascience.missingness_correlation import missingness_correlation
-
-        out = missingness_correlation(mask, top_k=top_k)
-        return out if isinstance(out, dict) else None
-    except Exception:  # noqa: BLE001
-        return None
-
-
-def _row_patterns(mask: dict, top_n: int):
-    try:
-        from datascience.missingness_row_patterns import missingness_row_patterns
-
-        out = missingness_row_patterns(mask, top_n=top_n)
-        return out if isinstance(out, dict) else None
-    except Exception:  # noqa: BLE001
-        return None
-
-
-def _rank_bar_make(names, pcts, title):
-    def make():
-        try:
-            from datascience.missingness_rank_bar_figure import (
-                missingness_rank_bar_figure,
-            )
-
-            return missingness_rank_bar_figure(names, pcts, title=title)
-        except Exception:  # noqa: BLE001 — minimal fallback figure.
-            return _fallback_fig("ranking de nulos no disponible")
-
-    return make
-
-
-def _heatmap_make(matrix, labels, title):
-    def make():
-        try:
-            from datascience.missingness_corr_heatmap_figure import (
-                missingness_corr_heatmap_figure,
-            )
-
-            return missingness_corr_heatmap_figure(matrix, labels, title=title)
-        except Exception:  # noqa: BLE001 — minimal fallback figure.
-            return _fallback_fig("heatmap de co-ocurrencia no disponible")
-
-    return make
-
-
-def _fallback_fig(message: str):
-    import matplotlib
-
-    matplotlib.use("Agg")
-    from matplotlib.figure import Figure
-
-    fig = Figure(figsize=(5.0, 2.2))
-    ax = fig.add_subplot(111)
-    ax.text(0.5, 0.5, message, ha="center", va="center")
-    ax.axis("off")
-    return fig
-
-
-# --------------------------------------------------------------------------- #
-# Block builders.
-# --------------------------------------------------------------------------- #
-def _summary_block(profile: dict, with_nulls: list, overview, sampled, n_total):
-    rows = []
-    gpct = _global_missing_pct(profile)
-    rows.append(("Celdas faltantes (global)", _fmt_pct(gpct)))
-    rows.append(("Columnas con faltantes", str(len(with_nulls))))
-    all_null = profile.get("all_null_cols")
-    if isinstance(all_null, (list, tuple)) and all_null:
-        rows.append(("Columnas 100% faltantes", str(len(all_null))))
-    if isinstance(overview, dict):
-        cr = overview.get("complete_rows")
-        ir = overview.get("incomplete_rows")
-        suffix = ""
-        if (isinstance(sampled, int) and isinstance(n_total, (int, float))
-                and sampled and n_total and sampled < n_total):
-            suffix = f" (sobre muestra de {_fmt_int(sampled)} filas)"
-        if cr is not None:
-            rows.append(("Filas completas (sin faltantes)",
-                         f"{_fmt_int(cr)} ({_fmt_pct(overview.get('complete_pct'))})"
-                         + suffix))
-        if ir is not None:
-            rows.append(("Filas con ≥1 faltante",
-                         f"{_fmt_int(ir)} "
-                         f"({_fmt_pct(overview.get('incomplete_pct'))})" + suffix))
-    return model.KVTable(rows=rows, title="Resumen de datos faltantes")
-
-
-def _ranking_block(with_nulls: list):
-    header = ["Columna", "Faltantes", "% faltante"]
-    rows = [[_truncate(n), _fmt_int(c), _fmt_pct(p)] for (n, c, p) in with_nulls]
-    if not rows:
-        return None
-    return model.DataTable(
-        header=header, rows=rows, title="Faltantes por columna",
-        note="ordenado de más a menos faltante")
-
-
-def _ranking_figure(with_nulls: list):
-    names = [n for (n, _, p) in with_nulls if p is not None]
-    pcts = [p for (_, _, p) in with_nulls if p is not None]
-    if not names:
-        return None
-    return model.Figure(
-        make=_rank_bar_make(names, pcts, "% de valores faltantes por columna"),
-        caption="Porcentaje de valores faltantes por columna (barras).")
-
-
-def _pairs_block(corr: dict):
-    """Top column pairs whose absences co-occur, as a table, or None."""
-    pairs = (corr or {}).get("pairs") or []
-    header = ["Columna A", "Columna B", "Corr. ausencia", "Co-faltan", "Jaccard"]
-    rows = []
-    for p in pairs[:_TOP_PAIRS]:
-        if not isinstance(p, dict):
-            continue
-        rows.append([
-            _truncate(p.get("a")),
-            _truncate(p.get("b")),
-            _fmt_num(p.get("corr")),
-            _fmt_int(p.get("co_missing")),
-            _fmt_num(p.get("jaccard")),
-        ])
-    if not rows:
-        return None
-    shown = len(rows)
-    total = len(pairs)
-    note = ("correlación de las máscaras is-null entre columnas; "
-            "«Co-faltan» = nº de filas en que ambas faltan a la vez")
-    if total > shown:
-        note += f" — top {shown} de {total} pares"
-    return model.DataTable(header=header, rows=rows,
-                           title="Pares de columnas que co-faltan", note=note)
-
-
-def _heatmap_block(corr: dict):
-    cols = (corr or {}).get("columns") or []
-    matrix = (corr or {}).get("matrix") or []
-    if len(cols) < 2 or not matrix:
-        return None
-    labels = [_truncate(c, 16) for c in cols]
-    return model.Figure(
-        make=_heatmap_make(matrix, labels, "Co-ocurrencia de ausencias"),
-        caption=("Correlación de las ausencias entre columnas (azul = faltan "
-                 "juntas; rojo = cuando una falta la otra tiende a estar)."))
-
-
-def _patterns_block(patterns_res: dict):
-    patterns = (patterns_res or {}).get("patterns") or []
-    header = ["Columnas que faltan juntas", "Filas", "%"]
-    rows = []
-    for p in patterns[:_TOP_PATTERNS]:
-        if not isinstance(p, dict):
-            continue
-        cols = p.get("missing_cols") or []
-        if cols:
-            label = ", ".join(_truncate(c, 18) for c in cols)
-        else:
-            label = "(fila completa — sin faltantes)"
-        rows.append([label, _fmt_int(p.get("n_rows")), _fmt_pct(p.get("pct"))])
-    if not rows:
-        return None
-    total = (patterns_res or {}).get("n_patterns")
-    shown = len(rows)
-    note = "cada fila es un patrón de «qué columnas faltan juntas»"
-    if isinstance(total, int) and total > shown:
-        note += f" — top {shown} de {total} patrones distintos"
-    return model.DataTable(header=header, rows=rows,
-                           title="Patrones de fila más comunes", note=note)
-
-
-def _mcar_mar_note(corr: dict, mark: bool):
-    """Interpretive, exploratory MCAR/MAR note from the absence correlations.
-
-    Reads the absence correlations at two levels so the verdict never contradicts
-    the visible evidence: a *strong* correlation flags a clear non-random (MAR)
-    pattern; a *partial* overlap (many rows co-miss — high Jaccard — even if the
-    correlation is diluted by one column being missing far more often) flags a
-    localized possible-MAR and cites the concrete co-missing pair; only when
-    neither holds does it read the absences as compatible with MCAR."""
-
-    def _pairs_with(attr_ok):
-        out = []
-        for p in (corr or {}).get("pairs") or []:
-            if isinstance(p, dict) and attr_ok(p):
-                out.append(p)
-        return out
-
-    def _cf(v):
-        try:
-            return float(v)
-        except (TypeError, ValueError):
-            return 0.0
-
-    strong = _pairs_with(lambda p: abs(_cf(p.get("corr"))) >= _CORR_STRONG)
-    partial = _pairs_with(
-        lambda p: _cf(p.get("corr")) > 0 and _cf(p.get("jaccard")) >= _JACCARD_NOTABLE)
-    mcar = _term("mcar", "MCAR", mark)
-    mar = _term("mar", "MAR", mark)
-    head = (
-        "**Lectura exploratoria MCAR/MAR.** Esta es una heurística basada en la "
-        "correlación de las ausencias entre columnas, NO un test confirmatorio "
-        "(como el de Little); orienta, no demuestra. ")
-    if strong:
-        top = strong[0]
-        ev = (f"«{model._safe_str(top.get('a'))}» y "
-              f"«{model._safe_str(top.get('b'))}» "
-              f"(corr {_fmt_num(top.get('corr'))})")
-        body = (
-            f"Hay ausencias que co-ocurren con fuerza —{ev}—: las columnas no "
-            f"faltan de forma independiente, lo que es un indicio de un patrón no "
-            f"aleatorio ({mar}). Antes de imputar o descartar filas conviene "
-            f"comprobar si la ausencia depende de otra variable observada; en ese "
-            f"caso la imputación debería condicionar en ella para no sesgar.")
-    elif partial:
-        top = max(partial, key=lambda p: _cf(p.get("jaccard")))
-        ev = (f"«{model._safe_str(top.get('a'))}» y "
-              f"«{model._safe_str(top.get('b'))}» faltan a la vez en "
-              f"{_fmt_int(top.get('co_missing'))} filas "
-              f"(Jaccard {_fmt_num(top.get('jaccard'))})")
-        body = (
-            f"Hay co-ocurrencia parcial de ausencias —{ev}—: algunas columnas "
-            f"tienden a faltar juntas aunque la correlación global sea modesta "
-            f"(habitual cuando una columna falta mucho más que la otra). Es un "
-            f"indicio de un posible patrón localizado no aleatorio ({mar}); "
-            f"conviene revisar si esa ausencia depende de otra variable observada "
-            f"antes de imputar, en lugar de asumir que faltan al azar.")
-    else:
-        body = (
-            f"Las ausencias entre columnas no muestran correlación ni solape "
-            f"relevante: parecen independientes, lo que es compatible con que "
-            f"falten al azar ({mcar}). Aun así, la ausencia podría depender de "
-            f"variables no observadas (la heurística no lo descarta).")
-    return model.Markdown(text=head + body)
-
-
-def _intro_block(mark: bool, source):
-    missingness = _term("missingness", "missingness", mark)
-    text = (
-        f"Este capítulo analiza el {missingness} de la tabla: no solo cuánto "
-        "falta (eso lo cubre la calidad), sino DÓNDE falta y si las columnas "
-        "faltan juntas. La co-ocurrencia de ausencias se calcula sobre la matriz "
-        "binaria «is-null» por fila.")
-    if source == "raw_numeric":
-        text += (" Nota: no se pudo leer la tabla cruda completa, así que la "
-                 "co-ocurrencia se limita a las columnas numéricas disponibles.")
-    return model.Markdown(text=text)
-
-
-# --------------------------------------------------------------------------- #
-# Entry point.
-# --------------------------------------------------------------------------- #
-def build_missingness(profile: dict, ctx: dict):
-    """Build the missingness Chapter, or None if the table has no missing data."""
-    if not isinstance(profile, dict):
-        profile = {}
-    ctx = ctx or {}
-
-    with_nulls = _columns_with_nulls(profile)
-    if not with_nulls:
-        return None  # no missing data anywhere -> chapter does not apply.
-
-    # Register glossary terms (if a collector is present) and mark them clickable.
-    glossary = ctx.get("glossary")
-    mark = False
-    if isinstance(glossary, model.GlossaryCollector):
-        for key, (label, definition) in _TERMS.items():
-            glossary.add(key, label, definition)
-        mark = True
-
-    # Per-row is-null mask (sample) for co-occurrence and row patterns.
-    mask, sampled, source = _null_mask(profile, ctx)
-    overview = _overview(mask) if mask else None
-    n_total = profile.get("n_rows")
-
-    blocks = [
-        model.Heading(text="Cuánto y dónde faltan datos", level=2),
-        _intro_block(mark, source),
-        _summary_block(profile, with_nulls, overview, sampled, n_total),
-        model.Heading(text="Faltantes por columna", level=2),
-    ]
-    ranking = _ranking_block(with_nulls)
-    if ranking is not None:
-        blocks.append(ranking)
-    rank_fig = _ranking_figure(with_nulls)
-    if rank_fig is not None:
-        blocks.append(rank_fig)
-
-    # Co-occurrence + row patterns need the per-row mask. Without it, say so.
-    if not mask:
-        blocks.append(model.Note(
-            "No se pudo construir la matriz «is-null» por fila (sin acceso a los "
-            "datos crudos), así que no se analiza la co-ocurrencia de ausencias "
-            "ni los patrones de fila en este informe."))
-        return model.Chapter(id=CHAPTER_ID, title=CHAPTER_TITLE,
-                             version=CHAPTER_VERSION, blocks=blocks)
-
-    corr = _correlation(mask, _TOP_PAIRS) or {}
-    co_blocks = [model.Heading(text="Co-ocurrencia de ausencias", level=2)]
-    heatmap = _heatmap_block(corr)
-    if heatmap is not None:
-        co_blocks.append(heatmap)
-    pairs = _pairs_block(corr)
-    if pairs is not None:
-        co_blocks.append(pairs)
-    if heatmap is None and pairs is None:
-        co_blocks.append(model.Note(
-            "Ninguna pareja de columnas comparte ausencias con variación "
-            "suficiente para correlacionarlas (p. ej. una sola columna con "
-            "faltantes), así que no hay co-ocurrencia que mostrar."))
-    # Keep the co-occurrence heading next to its heatmap and table.
-    blocks.append(model.Group(blocks=co_blocks))
-
-    patterns_res = _row_patterns(mask, _TOP_PATTERNS) or {}
-    patterns = _patterns_block(patterns_res)
-    if patterns is not None:
-        blocks.append(model.Heading(text="Patrones de fila", level=2))
-        blocks.append(patterns)
-
-    blocks.append(model.Heading(text="Lectura MCAR / MAR", level=2))
-    blocks.append(_mcar_mar_note(corr, mark))
-
-    return model.Chapter(id=CHAPTER_ID, title=CHAPTER_TITLE,
-                         version=CHAPTER_VERSION, blocks=blocks)
@@ -1,162 +0,0 @@
-"""Tests for the MISSINGNESS chapter.
-
-Covers the Definition of Done for this chapter:
-  * Activates (non-None Chapter with the expected sections) when the profile has
-    missing data, building the co-occurrence from the per-row is-null mask.
-  * Returns None when the table has no missing data at all (edge case).
-  * Registers the MCAR/MAR/missingness glossary terms.
-  * The DuckDB push-down path covers categorical columns (not only numeric),
-    so a categorical column that co-misses with a numeric one is detected.
-"""
-
-import os
-import sys
-
-_HERE = os.path.dirname(os.path.abspath(__file__))
-_FUNCTIONS = os.path.abspath(os.path.join(_HERE, "..", "..", ".."))  # python/functions
-if _FUNCTIONS not in sys.path:
-    sys.path.insert(0, _FUNCTIONS)
-
-from datascience.automatic_eda import model  # noqa: E402
-from datascience.automatic_eda.chapters.missingness import (  # noqa: E402
-    build_missingness,
-)
-
-
-def _titles(chapter):
-    """Collect heading texts and table/figure titles for assertions."""
-    out = []
-    for b in chapter.blocks:
-        kind = getattr(b, "kind", None)
-        if kind == "heading":
-            out.append(("heading", getattr(b, "text", "")))
-        elif kind in ("data_table", "kv_table"):
-            out.append((kind, getattr(b, "title", "")))
-        elif kind == "group":
-            for inner in getattr(b, "blocks", []):
-                ik = getattr(inner, "kind", None)
-                if ik == "heading":
-                    out.append(("heading", getattr(inner, "text", "")))
-                elif ik in ("data_table", "kv_table"):
-                    out.append((ik, getattr(inner, "title", "")))
-                elif ik == "figure":
-                    out.append(("figure", getattr(inner, "caption", "")))
-        elif kind == "figure":
-            out.append(("figure", getattr(b, "caption", "")))
-    return out
-
-
-def _all_text(chapter):
-    parts = []
-    def walk(blocks):
-        for b in blocks:
-            for attr in ("text", "title", "note", "caption"):
-                v = getattr(b, attr, None)
-                if v:
-                    parts.append(str(v))
-            if getattr(b, "kind", None) == "group":
-                walk(getattr(b, "blocks", []))
-    walk(chapter.blocks)
-    return "\n".join(parts)
-
-
-def test_returns_none_when_no_missing_data():
-    profile = {
-        "n_rows": 4,
-        "null_cell_pct": 0.0,
-        "columns": [
-            {"name": "a", "null_count": 0, "null_pct": 0.0, "n_rows": 4},
-            {"name": "b", "null_count": 0, "null_pct": 0.0, "n_rows": 4},
-        ],
-    }
-    assert build_missingness(profile, {}) is None
-
-
-def test_activates_with_cooccurrence_via_raw_numeric():
-    # a and b are missing in EXACTLY the same rows (0,1,2) -> perfect absence
-    # correlation. c has no nulls. No db_path -> the chapter falls back to the
-    # numeric raw_numeric mask.
-    profile = {
-        "n_rows": 6,
-        "null_cell_pct": (0.5 + 0.5 + 0.0) / 3.0,
-        "columns": [
-            {"name": "a", "null_count": 3, "null_pct": 0.5, "n_rows": 6},
-            {"name": "b", "null_count": 3, "null_pct": 0.5, "n_rows": 6},
-            {"name": "c", "null_count": 0, "null_pct": 0.0, "n_rows": 6},
-        ],
-    }
-    glossary = model.GlossaryCollector()
-    ctx = {
-        "raw_numeric": {
-            "a": [None, None, None, 1.0, 2.0, 3.0],
-            "b": [None, None, None, 4.0, 5.0, 6.0],
-        },
-        "glossary": glossary,
-    }
-    ch = build_missingness(profile, ctx)
-    assert ch is not None
-    assert ch.id == "missingness"
-    assert ch.blocks
-
-    titles = _titles(ch)
-    headings = {t for (k, t) in titles if k == "heading"}
-    # Core sections present.
-    assert any("Cuánto y dónde" in h for h in headings)
-    assert any("Faltantes por columna" in h for h in headings)
-    assert any("Co-ocurrencia" in h for h in headings)
-    assert any("MCAR" in h for h in headings)
-    # A summary KVTable, a ranking DataTable, a co-occurrence figure and the
-    # pairs table all exist.
-    kinds = {k for (k, _) in titles}
-    assert "kv_table" in kinds
-    assert "data_table" in kinds
-    assert "figure" in kinds
-
-    # Glossary terms registered.
-    keys = {t["key"] for t in glossary.terms()}
-    assert {"missingness", "mcar", "mar"} <= keys
-
-    # The MCAR/MAR note reads the co-occurrence; with a perfect overlap it must
-    # flag the non-random (MAR) reading.
-    text = _all_text(ch)
-    assert "MAR" in text
-
-
-def test_db_pushdown_covers_categorical_column(tmp_path):
-    """The is-null mask push-down must cover a categorical column, so a
-    categorical that co-misses with a numeric one shows up in the pairs."""
-    import duckdb
-
-    db = str(tmp_path / "miss.duckdb")
-    con = duckdb.connect(db)
-    con.execute("CREATE TABLE t (num1 DOUBLE, num2 DOUBLE, cat VARCHAR)")
-    # num1 and cat are NULL together in the first 4 of 10 rows; num2 never null.
-    rows = []
-    for i in range(10):
-        if i < 4:
-            rows.append((None, float(i), None))
-        else:
-            rows.append((float(i), float(i), f"c{i}"))
-    con.executemany("INSERT INTO t VALUES (?,?,?)", rows)
-    con.close()
-
-    profile = {
-        "n_rows": 10,
-        "null_cell_pct": (0.4 + 0.0 + 0.4) / 3.0,
-        "columns": [
-            {"name": "num1", "null_count": 4, "null_pct": 0.4, "n_rows": 10},
-            {"name": "num2", "null_count": 0, "null_pct": 0.0, "n_rows": 10},
-            {"name": "cat", "null_count": 4, "null_pct": 0.4, "n_rows": 10},
-        ],
-    }
-    ctx = {"db_path": db, "table": "t", "glossary": model.GlossaryCollector()}
-    ch = build_missingness(profile, ctx)
-    assert ch is not None
-
-    # The pairs table must mention both num1 and cat (they co-miss perfectly),
-    # which is only possible if the mask covered the categorical column.
-    text = _all_text(ch)
-    assert "num1" in text and "cat" in text
-    # Co-occurrence section + a pairs data table exist.
-    titles = _titles(ch)
-    assert any("co-faltan" in (t or "").lower() for (k, t) in titles)
@@ -32,7 +32,6 @@ CHAPTER_ORDER = [
    "num_distr",     # numeric distributions
    "cat_distr",     # categorical distributions
    "calidad",       # data quality
-    "missingness",   # missing-data patterns (co-occurrence of absences; MCAR/MAR)
    "correlacion",   # correlations / associations
    "relaciones",    # key relations: declared/candidate PK + FK (inter/intra-table)
    "modelos",       # cheap models (PCA/KMeans/outliers)
@@ -0,0 +1,68 @@
+---
+name: classify_relationship_type
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: pure
+signature: "def classify_relationship_type(xs: list, ys: list) -> dict"
+description: "Clasifica el TIPO de relacion entre dos variables numericas pareadas por indice para el EDA automatico del grupo eda. Limpia los pares de forma defensiva (descarta None/bool/NaN/inf), reusa pearson y spearman_corr del registry y ajusta polinomios de grado 2 y 3 con numpy.polyfit (R^2 manual), y a partir de esas senales etiqueta la forma: 'lineal', 'polinomica (grado 2/3)', 'monotona no-lineal' o 'debil/sin forma'. Orden de decision: debil -> monotona -> polinomica -> lineal (la primera que matchea gana), con umbrales calibrados para datos reales discretos/ruidosos. Devuelve ademas los coeficientes del mejor modelo en orden de numpy.polyval para pintar la curva de ajuste sobre el scatter. Funcion pura no-throw: ante datos insuficientes (menos de 5 pares validos o varianza ~0) o cualquier fallo devuelve el dict canonico con tipo='debil/sin forma' y el resto a None."
+tags: [eda, correlation, relationship, classification, polyfit, datascience, pure]
+params:
+  - name: xs
+    desc: "Lista (o tupla) de valores numericos de la primera variable, pareada por indice con ys. Cada par xs[i],ys[i] se descarta si cualquiera de los dos es None, bool, NaN o inf. Lectura defensiva."
+  - name: ys
+    desc: "Lista (o tupla) de valores numericos de la segunda variable, pareada por indice con xs. Mismas reglas de limpieza que xs."
+output: "Dict con SIEMPRE las mismas 8 claves: tipo (str: 'lineal' | 'polinómica (grado 2)' | 'polinómica (grado 3)' | 'monótona no-lineal' | 'débil/sin forma'); pearson (float|None: coeficiente de Pearson r); r2_linear (float|None: r**2 del ajuste lineal); spearman (float|None: rho de Spearman); r2_poly2 (float|None: R^2 del ajuste polinomico de grado 2); r2_poly3 (float|None: R^2 del ajuste de grado 3); best_degree (int|None: grado del modelo elegido — 1 lineal, 2/3 polinomico, None si monotona/debil); coeffs (list|None: coeficientes del mejor modelo en orden de numpy.polyval para pintar la curva, o None). Ante datos insuficientes o error: tipo='débil/sin forma' y el resto de claves a None."
+uses_functions: [pearson_py_datascience, spearman_corr_py_datascience]
+uses_types: []
+returns: []
+returns_optional: false
+error_type: ""
+imports: [numpy]
+tested: true
+tests: ["test_lineal", "test_polinomica_cuadratica", "test_monotona_no_lineal", "test_monotona_exponencial", "test_debil_sin_forma", "test_lista_vacia_no_lanza", "test_longitudes_distintas_no_lanza", "test_todos_none_no_lanza", "test_entradas_none_no_lanza", "test_constante_no_lanza", "test_filtra_nan_inf_bool"]
+test_file_path: "python/functions/datascience/classify_relationship_type_test.py"
+file_path: "python/functions/datascience/classify_relationship_type.py"
+---
+
+## Ejemplo
+
+```python
+import sys, os
+sys.path.insert(0, os.path.join("python", "functions"))
+from datascience.classify_relationship_type import classify_relationship_type
+import numpy as np
+
+# Relacion claramente cuadratica (forma de parabola) sobre dominio simetrico.
+x = list(np.linspace(-10, 10, 60))
+y = [v * v for v in x]
+
+res = classify_relationship_type(x, y)
+print(res["tipo"])         # 'polinómica (grado 2)'
+print(res["best_degree"])  # 2
+print(res["r2_linear"])    # 0.0   -> el Pearson lineal no ve la parabola
+print(res["r2_poly2"])     # 1.0
+print(res["coeffs"])       # [1.0, -0.0, -0.0]  -> numpy.polyval(coeffs, x) ~ x**2
+
+# El capitulo pinta la curva de ajuste cuando coeffs no es None:
+#   if res["coeffs"] is not None:
+#       xs_fit = np.linspace(min(x), max(x), 200)
+#       ys_fit = np.polyval(res["coeffs"], xs_fit)
+#       ax.plot(xs_fit, ys_fit)   # curva sobre el ax.scatter(x, y)
+```
+
+## Cuando usarla
+
+- Usala en el capitulo de relaciones/correlaciones del EDA automatico, despues de detectar dos columnas numericas con alguna asociacion, para decidir QUE curva de ajuste pintar sobre el scatter (recta, parabola, cubica o ninguna) y poner una etiqueta legible al tipo de relacion.
+- Cuando un Pearson bajo no signifique "sin relacion": esta funcion cruza Pearson con Spearman y con ajustes polinomicos para distinguir una relacion lineal debil de una monotona no-lineal (que el rango si capta) o de una curva polinomica.
+- Cuando necesites un punto de entrada determinista y no-throw que, con los mismos datos, devuelva siempre el mismo `tipo` y los mismos `coeffs` listos para `numpy.polyval` sin tener que ajustar modelos a mano en el capitulo.
+
+## Gotchas
+
+- Funcion pura, deterministica y no-throw: ante menos de 5 pares validos, varianza ~0 (xs o ys constante) o cualquier excepcion interna devuelve el dict canonico `tipo="débil/sin forma"` con el resto de claves a `None`. El dict SIEMPRE trae las 8 claves: nunca compruebes existencia, comprueba `None`.
+- El orden de decision importa: `débil -> monótona -> polinómica -> lineal` (la primera que matchee gana). La monotonia se evalua ANTES que el ajuste polinomico, asi que una curva monotona suave (exp, log, potencias) sale `monótona no-lineal` aunque un cubico tambien la ajuste — la dominancia del rango (Spearman >> Pearson) es la senal mas interpretable. Solo cae en `polinómica` una forma curva NO monotona (p.ej. una parabola, Spearman ~0 pero R^2 polinomico alto).
+- Umbrales fijos (calibrados para EDA con datos discretos/ruidosos, no para inferencia formal): `débil/sin forma` si las tres senales son bajas a la vez (`abs(pearson) < 0.3` y `abs(spearman) < 0.3` y `mejor_poly < 0.3`); `monótona no-lineal` si `abs(spearman) - abs(pearson) >= 0.1` y `abs(spearman) >= 0.4`; `polinómica (grado N)` si el mejor polinomico mejora `>= 0.1` sobre el lineal y su R^2 `>= 0.3`; en cualquier otro caso con senal (no debil) `lineal`. El suelo de 0.3 evita llamar "debil" a relaciones reales pero discretas (conteos, escalas ordinales) con R^2 bajo pero direccion clara.
+- `coeffs` va en orden de `numpy.polyval` (grado descendente). Para `lineal` es `[pendiente, intercepto]` (grado 1); para `polinómica` los del grado elegido; para `monótona no-lineal` y `débil/sin forma` es `None` (el scatter pintara una curva suavizada o nada — lo decide el capitulo, no esta funcion).
+- `best_degree` prefiere el grado 2 sobre el 3 cuando empatan dentro de 0.02 de R^2 (parsimonia): no esperes grado 3 salvo que mejore claramente.
+- Los pares con `None`, `bool`, `NaN` o `inf` se descartan por indice en silencio; `bool` cuenta como no-numerico (un `True` no es `1`). El dominio de los datos afecta al resultado: una parabola sobre un dominio simetrico da Pearson ~0 (sale `polinómica`), pero sobre un dominio asimetrico el Pearson sube y puede salir `lineal`.
@@ -0,0 +1,187 @@
+"""Clasifica el TIPO de relacion entre dos variables numericas pareadas.
+
+Funcion pura del grupo eda. Dadas dos listas numericas pareadas por indice,
+limpia los pares de forma defensiva, calcula correlaciones lineal (Pearson) y de
+rangos (Spearman) y ajustes polinomicos de grado 2 y 3, y a partir de esas
+senales etiqueta la forma de la relacion para el EDA automatico:
+
+    "lineal" | "polinómica (grado 2)" | "polinómica (grado 3)" |
+    "monótona no-lineal" | "débil/sin forma"
+
+Ademas devuelve los coeficientes del mejor modelo (en orden de numpy.polyval)
+para que el capitulo pinte la curva de ajuste sobre el scatter. Reusa las
+funciones del registry `pearson` y `spearman_corr` en vez de reimplementarlas.
+
+NUNCA lanza: ante cualquier fallo o dato insuficiente devuelve el dict canonico
+con tipo="débil/sin forma" y el resto de claves a None.
+"""
+
+import math
+import warnings
+
+import numpy as np
+
+from datascience.datascience import pearson
+from datascience.spearman_corr import spearman_corr
+
+# Forma canonica de la respuesta cuando no se puede clasificar (datos
+# insuficientes, varianza nula o error interno). Siempre las mismas claves.
+_WEAK = {
+    "tipo": "débil/sin forma",
+    "pearson": None,
+    "r2_linear": None,
+    "spearman": None,
+    "r2_poly2": None,
+    "r2_poly3": None,
+    "best_degree": None,
+    "coeffs": None,
+}
+
+
+def _is_num(v) -> bool:
+    """True si v es un numero real finito (int/float, no bool, no NaN, no inf)."""
+    return (
+        isinstance(v, (int, float))
+        and not isinstance(v, bool)
+        and not (isinstance(v, float) and (math.isnan(v) or math.isinf(v)))
+    )
+
+
+def _poly_r2(coeffs, x_arr, y_arr, ss_tot: float) -> float:
+    """R^2 de un ajuste polinomico: 1 - SS_res/SS_tot. 0 si SS_tot==0."""
+    if ss_tot == 0.0:
+        return 0.0
+    pred = np.polyval(coeffs, x_arr)
+    ss_res = float(np.sum((y_arr - pred) ** 2))
+    return 1.0 - ss_res / ss_tot
+
+
+def classify_relationship_type(xs: list, ys: list) -> dict:
+    """Clasifica el tipo de relacion entre dos variables numericas pareadas.
+
+    Empareja xs[i],ys[i] por indice y descarta el par si cualquiera de los dos
+    es None, bool, NaN o inf. Sobre los pares limpios calcula Pearson r
+    (r2_linear = r**2), Spearman rho y los R^2 de ajustes polinomicos de grado 2
+    y 3 (con numpy.polyfit + R^2 manual). Con esas senales decide la etiqueta.
+
+    Orden de evaluacion de la etiqueta (la primera que matchee gana). Los
+    umbrales estan calibrados para datos reales, a menudo discretos y ruidosos
+    (conteos, escalas ordinales): una relacion con |r| >= 0.3, |rho| >= 0.3 o un
+    polinomio con R^2 >= 0.3 ya tiene FORMA y no debe etiquetarse como "debil".
+        1. "débil/sin forma" — todas las senales bajas a la vez:
+           abs(pearson) < 0.3 y abs(spearman) < 0.3 y mejor_poly < 0.3.
+        2. "monótona no-lineal" — el rango (Spearman) capta una monotonia que el
+           Pearson lineal no: abs(spearman) - abs(pearson) >= 0.1 y
+           abs(spearman) >= 0.4. No se fuerza un polinomio (coeffs/best_degree =
+           None); el capitulo dibuja la tendencia ordenada sobre el scatter.
+        3. "polinómica (grado N)" — el mejor polinomico mejora claramente sobre
+           el lineal (mejor_poly - r2_linear >= 0.1) y mejor_poly >= 0.3. N es el
+           grado (2 o 3) con mejor R^2, prefiriendo el 2 si empatan dentro de 0.02
+           (parsimonia).
+        4. "lineal" — el resto: hay senal (no es debil) y la forma que existe es
+           esencialmente lineal. best_degree=1, coeffs del ajuste de grado 1.
+
+    Si hay menos de 5 pares validos, o la varianza de xs o de ys es ~0
+    (constante), devuelve directamente "débil/sin forma".
+
+    Args:
+        xs: lista (o tupla) de valores numericos de la primera variable,
+            pareada por indice con ys. Pares con None/bool/NaN/inf se descartan.
+        ys: lista (o tupla) de valores numericos de la segunda variable,
+            pareada por indice con xs.
+
+    Returns:
+        dict con SIEMPRE las mismas claves:
+            tipo (str), pearson (float|None), r2_linear (float|None),
+            spearman (float|None), r2_poly2 (float|None), r2_poly3 (float|None),
+            best_degree (int|None: 1, 2, 3 o None),
+            coeffs (list|None: coeficientes en orden de numpy.polyval, o None).
+        Nunca lanza: ante fallo o datos insuficientes devuelve el dict debil.
+    """
+    try:
+        if xs is None or ys is None:
+            return dict(_WEAK)
+
+        pairs = [
+            (float(x), float(y))
+            for x, y in zip(xs, ys)
+            if _is_num(x) and _is_num(y)
+        ]
+
+        # Datos insuficientes para hablar de forma de la relacion.
+        if len(pairs) < 5:
+            return dict(_WEAK)
+
+        clean_x = [p[0] for p in pairs]
+        clean_y = [p[1] for p in pairs]
+
+        # Varianza ~0 en cualquiera de las series => relacion indefinida.
+        if len(set(clean_x)) < 2 or len(set(clean_y)) < 2:
+            return dict(_WEAK)
+        x_arr = np.asarray(clean_x, dtype=float)
+        y_arr = np.asarray(clean_y, dtype=float)
+        if float(np.var(x_arr)) < 1e-15 or float(np.var(y_arr)) < 1e-15:
+            return dict(_WEAK)
+
+        # Correlaciones reutilizando las funciones del registry.
+        r = pearson(clean_x, clean_y)
+        spearman = spearman_corr(clean_x, clean_y)
+        r2_linear = r ** 2
+
+        # Ajustes polinomicos grado 2 y 3 con R^2 manual.
+        ss_tot = float(np.sum((y_arr - float(np.mean(y_arr))) ** 2))
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            c1 = np.polyfit(x_arr, y_arr, 1)
+            c2 = np.polyfit(x_arr, y_arr, 2)
+            c3 = np.polyfit(x_arr, y_arr, 3)
+        r2_poly2 = _poly_r2(c2, x_arr, y_arr, ss_tot)
+        r2_poly3 = _poly_r2(c3, x_arr, y_arr, ss_tot)
+
+        mejor_poly = max(r2_poly2, r2_poly3)
+        # Grado del mejor polinomico, con preferencia por la parsimonia: solo se
+        # elige el grado 3 si supera al grado 2 por mas de 0.02.
+        best_poly_degree = 3 if (r2_poly3 - r2_poly2) > 0.02 else 2
+
+        abs_s = abs(spearman)
+        abs_p = abs(r)
+
+        # Decision en orden: debil-temprano -> monotona -> polinomica -> lineal.
+        if abs_p < 0.3 and abs_s < 0.3 and mejor_poly < 0.3:
+            # Ninguna senal supera el suelo de forma: relacion debil/sin forma.
+            tipo = "débil/sin forma"
+            best_degree = None
+            coeffs = None
+        elif (abs_s - abs_p) >= 0.1 and abs_s >= 0.4:
+            # Spearman (rango) capta una monotonia que el Pearson lineal no:
+            # relacion monotona no-lineal. No se fuerza un polinomio que tal vez
+            # no ajusta bien; el capitulo dibuja la tendencia ordenada.
+            tipo = "monótona no-lineal"
+            best_degree = None
+            coeffs = None
+        elif (mejor_poly - r2_linear) >= 0.1 and mejor_poly >= 0.3:
+            tipo = "polinómica (grado {})".format(best_poly_degree)
+            best_degree = best_poly_degree
+            best_coeffs = c2 if best_poly_degree == 2 else c3
+            coeffs = [float(c) for c in best_coeffs]
+        else:
+            # Hay senal (no es debil) y no es ni monotona-pura ni polinomica:
+            # la correlacion que existe es esencialmente lineal.
+            tipo = "lineal"
+            best_degree = 1
+            coeffs = [float(c) for c in c1]
+
+        return {
+            "tipo": tipo,
+            "pearson": round(float(r), 6),
+            "r2_linear": round(float(r2_linear), 6),
+            "spearman": round(float(spearman), 6),
+            "r2_poly2": round(float(r2_poly2), 6),
+            "r2_poly3": round(float(r2_poly3), 6),
+            "best_degree": best_degree,
+            "coeffs": (
+                [round(c, 8) for c in coeffs] if coeffs is not None else None
+            ),
+        }
+    except Exception:
+        return dict(_WEAK)
@@ -0,0 +1,174 @@
+"""Tests para classify_relationship_type."""
+
+import os
+import sys
+
+import numpy as np
+
+sys.path.insert(0, os.path.dirname(__file__))
+
+from classify_relationship_type import classify_relationship_type
+
+# Claves que el dict de salida debe contener SIEMPRE.
+_EXPECTED_KEYS = {
+    "tipo", "pearson", "r2_linear", "spearman",
+    "r2_poly2", "r2_poly3", "best_degree", "coeffs",
+}
+
+
+def _assert_shape(r):
+    """Toda salida tiene exactamente las 8 claves canonicas."""
+    assert isinstance(r, dict)
+    assert set(r.keys()) == _EXPECTED_KEYS
+
+
+def test_lineal():
+    """Golden: y = 2x + 1 con ruido pequeno -> 'lineal', best_degree=1."""
+    rng = np.random.default_rng(42)
+    x = np.linspace(0.0, 10.0, 50)
+    y = 2.0 * x + 1.0 + rng.normal(0.0, 0.3, 50)
+
+    r = classify_relationship_type(list(x), list(y))
+    _assert_shape(r)
+
+    assert r["tipo"] == "lineal"
+    assert r["best_degree"] == 1
+    assert r["r2_linear"] >= 0.5
+    # coeffs ~ [pendiente, intercepto] del ajuste de grado 1.
+    assert r["coeffs"] is not None and len(r["coeffs"]) == 2
+    assert abs(r["coeffs"][0] - 2.0) < 0.1   # pendiente ~2
+    assert abs(r["coeffs"][1] - 1.0) < 0.3   # intercepto ~1
+
+
+def test_polinomica_cuadratica():
+    """Golden: y = x**2 sobre [-10, 10] -> 'polinómica', best_degree in (2, 3)."""
+    x = np.linspace(-10.0, 10.0, 60)
+    y = x ** 2
+
+    r = classify_relationship_type(list(x), list(y))
+    _assert_shape(r)
+
+    assert r["tipo"].startswith("polinómica")
+    assert r["best_degree"] in (2, 3)
+    # Una parabola perfecta queda capturada por el grado 2 (parsimonia).
+    assert r["best_degree"] == 2
+    assert r["r2_poly2"] > 0.99
+    assert r["coeffs"] is not None and len(r["coeffs"]) == r["best_degree"] + 1
+
+
+def test_monotona_no_lineal():
+    """Golden: monotona convexa de cola pesada -> 'monótona no-lineal'.
+
+    y = 1/(N+1-i)**2 es estrictamente creciente (Spearman ~ 1) pero su cola
+    explosiva hace que ni la recta ni un polinomio de grado 2/3 la ajusten
+    (R^2 polinomico < 0.5), de modo que el Pearson lineal NO capta la relacion
+    que el rango (Spearman) si ve. Construccion deterministica (sin azar).
+    """
+    n = 200
+    i = np.arange(n, dtype=float)
+    y = 1.0 / (n + 1 - i) ** 2
+
+    r = classify_relationship_type(list(i), list(y))
+    _assert_shape(r)
+
+    assert r["tipo"] == "monótona no-lineal"
+    assert r["best_degree"] is None
+    assert r["coeffs"] is None
+    # Spearman fuerte y claramente por encima del Pearson.
+    assert abs(r["spearman"]) >= 0.5
+    assert abs(r["spearman"]) - abs(r["pearson"]) >= 0.15
+
+
+def test_monotona_exponencial():
+    """DoD literal: y = exp(x) (monotona no-lineal) -> 'monótona no-lineal'.
+
+    exp es estrictamente creciente (Spearman = 1) pero el Pearson lineal queda
+    claramente por debajo (~0.86), así que la dominancia del rango la marca como
+    monótona no-lineal en vez de lineal o polinómica.
+    """
+    x = np.linspace(0.0, 5.0, 80)
+    y = np.exp(x)
+
+    r = classify_relationship_type(list(x), list(y))
+    _assert_shape(r)
+
+    assert r["tipo"] == "monótona no-lineal"
+    assert r["best_degree"] is None and r["coeffs"] is None
+    assert abs(r["spearman"]) >= 0.9
+    assert abs(r["spearman"]) - abs(r["pearson"]) >= 0.1
+
+
+def test_debil_sin_forma():
+    """Golden: x e y independientes (semilla fija) -> 'débil/sin forma'."""
+    rng = np.random.default_rng(0)
+    x = rng.normal(0.0, 1.0, 200)
+    y = rng.normal(0.0, 1.0, 200)
+
+    r = classify_relationship_type(list(x), list(y))
+    _assert_shape(r)
+
+    assert r["tipo"] == "débil/sin forma"
+    assert r["best_degree"] is None
+    assert r["coeffs"] is None
+    # Todas las senales son bajas.
+    assert abs(r["pearson"]) < 0.3
+    assert r["r2_linear"] < 0.1
+
+
+def test_lista_vacia_no_lanza():
+    """Edge: listas vacias -> dict debil canonico, sin lanzar."""
+    r = classify_relationship_type([], [])
+    _assert_shape(r)
+    assert r["tipo"] == "débil/sin forma"
+    assert r["pearson"] is None
+    assert r["r2_linear"] is None
+    assert r["spearman"] is None
+    assert r["r2_poly2"] is None
+    assert r["r2_poly3"] is None
+    assert r["best_degree"] is None
+    assert r["coeffs"] is None
+
+
+def test_longitudes_distintas_no_lanza():
+    """Edge: listas de distinta longitud -> empareja por indice, no lanza."""
+    # zip trunca a la longitud minima: solo 3 pares (< 5) -> debil.
+    r = classify_relationship_type([1, 2, 3, 4, 5, 6, 7, 8], [1.0, 2.0, 3.0])
+    _assert_shape(r)
+    assert r["tipo"] == "débil/sin forma"
+    assert r["best_degree"] is None
+
+
+def test_todos_none_no_lanza():
+    """Edge: todos los valores None -> ningun par valido -> debil, no lanza."""
+    r = classify_relationship_type([None, None, None, None, None, None],
+                                   [None, None, None, None, None, None])
+    _assert_shape(r)
+    assert r["tipo"] == "débil/sin forma"
+    assert r["coeffs"] is None
+
+
+def test_entradas_none_no_lanza():
+    """Edge: xs/ys None directamente -> debil, no lanza."""
+    assert classify_relationship_type(None, None)["tipo"] == "débil/sin forma"
+    assert classify_relationship_type([1.0, 2.0], None)["tipo"] == "débil/sin forma"
+
+
+def test_constante_no_lanza():
+    """Edge: ys constante (varianza ~0) -> debil, no lanza."""
+    r = classify_relationship_type([1, 2, 3, 4, 5, 6, 7], [5, 5, 5, 5, 5, 5, 5])
+    _assert_shape(r)
+    assert r["tipo"] == "débil/sin forma"
+
+
+def test_filtra_nan_inf_bool():
+    """Edge: pares con NaN/inf/bool/None se descartan por indice."""
+    nan = float("nan")
+    inf = float("inf")
+    # Solo i=0,1,2,3,4 quedan validos (5 pares) y forman una recta perfecta.
+    xs = [0.0, 1.0, 2.0, 3.0, 4.0, nan, inf, True, None]
+    ys = [1.0, 3.0, 5.0, 7.0, 9.0, 1.0, 2.0, 3.0, 4.0]
+    r = classify_relationship_type(xs, ys)
+    _assert_shape(r)
+    # Los 5 pares validos son y = 2x + 1 exacto -> lineal.
+    assert r["tipo"] == "lineal"
+    assert r["best_degree"] == 1
@@ -1,97 +0,0 @@
---
-name: extract_null_mask
-kind: function
-lang: py
-domain: datascience
-version: "1.0.0"
-purity: impure
-signature: "def extract_null_mask(query_fn, table: str, columns: list, max_rows: int = 5000) -> dict"
-description: "Extrae la mascara de nulos (1=falta / 0=presente) de una muestra de filas de una tabla, una lista 0/1 por columna alineada por fila, para alimentar el capitulo de calidad / patron de nulos de AutomaticEDA sin que el capitulo toque la base de datos. Recibe un lector read-only inyectado `query_fn(sql) -> dict` (mismo contrato que duckdb_query_readonly / pg_query / el `_q` de profile_table) y NO abre ninguna conexion por su cuenta. Construye UNA sola query que proyecta por cada columna `CASE WHEN \"col\" IS NULL THEN 1 ELSE 0 END` con identificadores escapados y LIMIT. Devuelve dict dict-no-throw: columns (efectivamente leidas, en orden), mask (lista int 0/1 por columna, misma longitud todas) y n. Una celda None se cuenta defensivamente como 1 (falta)."
-tags: [eda, nulls, missing, datascience, automatic-eda, extraction, read-only, duckdb, postgres, python]
-uses_functions: []
-uses_types: []
-returns: []
-returns_optional: false
-error_type: "error_go_core"
-imports: []
-params:
-  - name: query_fn
-    desc: "callable lector read-only del backend activo. Recibe un string SQL y devuelve un dict {'status':'ok','rows':[{col:val,...},...]} (mismo contrato que duckdb_query_readonly o el `_q` de profile_table). NO se abre ninguna conexion dentro de la funcion: toda la lectura pasa por query_fn. Si es None -> error."
-  - name: table
-    desc: "nombre de la tabla de la que muestrear la mascara de nulos. Se escapa con comillas dobles en la query. Vacio o None -> status error."
-  - name: columns
-    desc: "lista de nombres de columna a evaluar. Cada una produce una entrada en `mask` con una lista 0/1 paralela por fila (1=IS NULL, 0=presente). Cada nombre se escapa con comillas dobles. Vacia o None -> status error."
-  - name: max_rows
-    desc: "limite de filas a muestrear (clausula LIMIT). Default 5000. Protege frente a tablas enormes; con LIMIT obtienes el primer tramo, no un muestreo uniforme."
-output: "dict (nunca lanza). En exito: {'status':'ok','table':str,'columns':[str,...] (en orden),'mask':{col:[int 0/1,...],...} (1=falta/IS NULL, 0=presente; todas las listas con misma longitud = n),'n':int}. En error (sin lanzar): {'status':'error','error':str,'table':str,'columns':[],'mask':{},'n':0}. Errores: query_fn None, table vacia, columns vacia, o query_fn devuelve status!='ok' (se propaga su error)."
-tested: true
-tests: ["test_golden_mask_alineada", "test_celda_none_cuenta_como_falta", "test_columns_vacia_status_error", "test_query_fn_status_error_propaga", "test_query_fn_none_da_error_sin_reventar", "test_sql_contiene_case_y_limit"]
-test_file_path: "python/functions/datascience/extract_null_mask_test.py"
-file_path: "python/functions/datascience/extract_null_mask.py"
---
-
-## Ejemplo
-
-```python
-import sys, os
-sys.path.insert(0, os.path.join("python", "functions"))
-from datascience.extract_null_mask import extract_null_mask
-from infra import duckdb_query_readonly
-
-# El lector read-only se inyecta como closure (igual que el `_q` de profile_table).
-db = "data/clientes.duckdb"
-def _q(sql):
-    return duckdb_query_readonly(db, sql)
-
-res = extract_null_mask(_q, "clientes", ["email", "telefono", "edad"])
-# res == {
-#   "status": "ok",
-#   "table": "clientes",
-#   "columns": ["email", "telefono", "edad"],
-#   "mask": {
-#     "email":    [0, 0, 1, 0, ...],   # fila 2 sin email
-#     "telefono": [1, 0, 1, 0, ...],
-#     "edad":     [0, 0, 0, 1, ...],
-#   },
-#   "n": 5000,
-# }
-
-# % de nulos por columna a partir de la muestra:
-pct = {c: 100 * sum(bits) / max(res["n"], 1) for c, bits in res["mask"].items()}
-
-# Se entrega al capitulo de calidad sin que este toque la BD:
-ctx = {"null_mask": res}
-```
-
-## Cuando usarla
-
-Cuando el capitulo de calidad / patron de nulos de AutomaticEDA necesita saber
-DONDE faltan los valores (no solo cuantos) y NO debe abrir la base de datos por
-su cuenta: extraes aqui la mascara 0/1 por columna alineada por fila y se la pasas
-en `ctx['null_mask']`. Usala siempre que quieras detectar co-ocurrencia de nulos
-(filas que fallan en varias columnas a la vez), calcular el % de nulos sobre una
-muestra, o pintar un heatmap de missingness reutilizando un unico lector read-only
-inyectado, en vez de hacer N `COUNT(*) WHERE col IS NULL` por separado.
-
-## Gotchas
-
- **Impura**: lee de la base de datos a traves de `query_fn`. No abre conexiones
-  por su cuenta — depende por completo del lector inyectado. Sigue el estilo
-  dict-no-throw del grupo `eda`: nunca lanza; ante cualquier fallo devuelve
-  `{"status":"error","error":...}` con `columns=[]`, `mask={}`, `n=0`.
- **`error_type` en el frontmatter es `error_go_core` por convencion del registry**
-  (toda funcion impura debe declararlo y el indexer lo exige), pero el codigo
-  NO lanza esa excepcion: degrada al dict de error. Es metadata, no comportamiento.
- **Muestra, no censo**: con `LIMIT max_rows` obtienes el primer tramo de filas que
-  devuelva el backend, no un muestreo uniforme ni la tabla entera. El % de nulos
-  derivado es una estimacion sobre esa muestra; para el conteo exacto usa un
-  agregado `COUNT(*)`/`COUNT(col)` aparte.
- **Alineacion por fila**: `mask[col][i]` corresponde a la misma fila `i` que
-  `mask[otra_col][i]`. Todas las listas tienen longitud `n`, asi que puedes cruzar
-  columnas por indice (co-ocurrencia de nulos) sin re-alinear.
- **Defensa None -> 1**: el SQL ya devuelve 0/1, pero si una celda llega como `None`
-  (CASE no aplicado, columna ausente en la fila, backend que nulifica) se cuenta
-  como 1 (falta). Un valor inesperado no convertible a int se trata como presente (0).
- **No loguear los datos crudos**: aunque `mask` es solo 0/1, los nombres de columna
-  pueden revelar el esquema. En trazas usa `n` y el numero de columnas, no el dict
-  completo.
@@ -1,101 +0,0 @@
-"""extract_null_mask — extrae la mascara de nulos (1=falta / 0=presente) de una tabla.
-
-Lector read-only inyectado: recibe `query_fn(sql) -> dict` con el mismo contrato
-que duckdb_query_readonly / pg_query (y que el `_q` de profile_table):
-`{"status": "ok", "rows": [{col: val, ...}, ...]}`. Esta funcion NO abre ninguna
-conexion por su cuenta — solo usa `query_fn`. Construye UNA sola query que, por
-cada columna pedida, evalua `CASE WHEN "col" IS NULL THEN 1 ELSE 0 END` y devuelve
-una muestra de filas con esos bits. El resultado es un dict `mask` con una lista
-0/1 por columna, alineada por fila (1 = el valor falta / IS NULL, 0 = presente),
-listo para alimentar el capitulo de calidad / patron de nulos de AutomaticEDA sin
-que el capitulo toque la base de datos.
-
-Estilo dict-no-throw del grupo `eda`: nunca lanza; captura cualquier excepcion y
-degrada a `{"status": "error", "error": str, ...}`.
-"""
-
-
-def _to_bit(value):
-    """Coacciona el valor 0/1 del CASE a int de forma defensiva.
-
-    El SQL ya devuelve 0 (presente) o 1 (falta). Por si una celda llega como None
-    (el CASE no se aplico o el backend la nulifico), se cuenta como 1 (falta). El
-    resto se reduce a int: un entero distinto de 0 cuenta como 1 (falta), 0 como
-    presente. Un valor no convertible se trata como presente (0) — nunca lanza.
-    """
-    if value is None:
-        return 1
-    try:
-        return 1 if int(value) != 0 else 0
-    except (TypeError, ValueError):
-        return 0
-
-
-def extract_null_mask(query_fn, table, columns, max_rows=5000):
-    """Extrae la mascara de nulos (1=falta / 0=presente) de una muestra de la tabla.
-
-    Args:
-        query_fn: callable lector read-only del backend activo. Recibe un string
-            SQL y devuelve un dict {"status": "ok", "rows": [{col: val, ...}]}
-            (mismo contrato que duckdb_query_readonly / el `_q` de profile_table).
-            No se abre ninguna conexion aqui: toda la lectura pasa por query_fn.
-        table: nombre de la tabla. Se escapa con comillas dobles en la query.
-        columns: lista de nombres de columna a evaluar. Cada una produce una
-            entrada en `mask` con una lista 0/1 paralela por fila. Vacia o None ->
-            status error.
-        max_rows: limite de filas a muestrear (clausula LIMIT). Default 5000.
-
-    Returns:
-        dict (nunca lanza):
-            {
-              "status": "ok" | "error",
-              "error": str,                 # solo si status == "error"
-              "table": str,
-              "columns": [str, ...],        # columnas efectivamente leidas, en orden
-              "mask": {col: [int 0/1, ...], ...},  # alineada por fila, 1=falta, 0=presente
-              "n": int                      # nº de filas muestreadas
-            }
-        Todas las listas de `mask` tienen la misma longitud (= n).
-    """
-    base = {"status": "ok", "table": table, "columns": [], "mask": {}, "n": 0}
-    try:
-        if query_fn is None:
-            return {**base, "status": "error", "error": "query_fn es None"}
-        if not table:
-            return {**base, "status": "error", "error": "table es obligatorio"}
-        if not columns:
-            return {**base, "status": "error", "error": "columns vacío"}
-
-        # Identificadores escapados con comillas dobles (como hace profile_table)
-        # para tolerar nombres con mayusculas/espacios/palabras reservadas. Cada
-        # columna se proyecta como su propio bit IS NULL conservando el alias.
-        select_sql = ", ".join(
-            f'(CASE WHEN "{c}" IS NULL THEN 1 ELSE 0 END) AS "{c}"' for c in columns
-        )
-        sql = f'SELECT {select_sql} FROM "{table}" LIMIT {int(max_rows)}'
-
-        q = query_fn(sql)
-        if not isinstance(q, dict) or q.get("status") != "ok":
-            err = (
-                q.get("error", "query_fn fallo")
-                if isinstance(q, dict)
-                else "query_fn no devolvio un dict"
-            )
-            return {**base, "status": "error", "error": err}
-
-        rows = q.get("rows", []) or []
-        mask = {c: [] for c in columns}
-        for row in rows:
-            for c in columns:
-                # row.get tolera filas que no traigan la columna (None -> falta).
-                mask[c].append(_to_bit(row.get(c) if isinstance(row, dict) else None))
-
-        return {
-            "status": "ok",
-            "table": table,
-            "columns": list(columns),
-            "mask": mask,
-            "n": len(rows),
-        }
-    except Exception as e:  # noqa: BLE001 - dict-no-throw: degradar, nunca lanzar
-        return {**base, "status": "error", "error": str(e)}
@@ -1,116 +0,0 @@
-"""Tests para extract_null_mask.
-
-No usa DuckDB real: inyecta un query_fn FAKE (closure) que devuelve filas
-predefinidas (simulando el SELECT de bits 0/1) y, opcionalmente, captura el SQL
-recibido para verificar la query generada (CASE WHEN ... IS NULL + LIMIT). Asi el
-test es autocontenido y no depende de ningun backend.
-"""
-
-import os
-import sys
-
-sys.path.insert(0, os.path.dirname(__file__))
-
-from extract_null_mask import extract_null_mask
-
-
-def _fake_query(rows, captured=None, status="ok", error=None):
-    """Crea un query_fn FAKE.
-
-    `captured` (lista opcional) recibe el SQL ejecutado para poder inspeccionarlo.
-    `status`/`error` permiten simular un fallo del backend.
-    """
-
-    def _q(sql):
-        if captured is not None:
-            captured.append(sql)
-        if status != "ok":
-            return {"status": "error", "error": error or "boom"}
-        return {"status": "ok", "rows": rows}
-
-    return _q
-
-
-def test_golden_mask_alineada():
-    """Golden: mask 0/1 por columna alineada por fila, n correcto, status ok."""
-    # Cada fila simula el SELECT (CASE WHEN col IS NULL THEN 1 ELSE 0 END) AS col.
-    rows = [
-        {"email": 0, "telefono": 1, "edad": 0},
-        {"email": 0, "telefono": 0, "edad": 1},
-        {"email": 1, "telefono": 1, "edad": 0},
-    ]
-    res = extract_null_mask(_fake_query(rows), "clientes", ["email", "telefono", "edad"])
-    assert res["status"] == "ok"
-    assert res["table"] == "clientes"
-    assert res["columns"] == ["email", "telefono", "edad"]
-    assert res["n"] == 3
-    assert res["mask"]["email"] == [0, 0, 1]
-    assert res["mask"]["telefono"] == [1, 0, 1]
-    assert res["mask"]["edad"] == [0, 1, 0]
-    # Todas las listas con la misma longitud.
-    assert all(len(v) == res["n"] for v in res["mask"].values())
-
-
-def test_celda_none_cuenta_como_falta():
-    """Una celda None se cuenta defensivamente como 1 (falta)."""
-    rows = [
-        {"email": 0, "telefono": None},
-        {"email": None, "telefono": 1},
-        {"email": 1, "telefono": 0},
-    ]
-    res = extract_null_mask(_fake_query(rows), "clientes", ["email", "telefono"])
-    assert res["status"] == "ok"
-    assert res["mask"]["email"] == [0, 1, 1]
-    assert res["mask"]["telefono"] == [1, 1, 0]
-    assert res["n"] == 3
-
-
-def test_columns_vacia_status_error():
-    """columns vacia -> status error con columns/mask/n vacios."""
-    res = extract_null_mask(_fake_query([]), "clientes", [])
-    assert res["status"] == "error"
-    assert "columns" in res["error"]
-    assert res["table"] == "clientes"
-    assert res["columns"] == []
-    assert res["mask"] == {}
-    assert res["n"] == 0
-
-
-def test_query_fn_status_error_propaga():
-    """query_fn que devuelve status != ok -> se propaga como error, mask {}."""
-    res = extract_null_mask(
-        _fake_query([], status="error", error="db locked"),
-        "clientes",
-        ["email"],
-    )
-    assert res["status"] == "error"
-    assert "db locked" in res["error"]
-    assert res["mask"] == {}
-    assert res["n"] == 0
-
-
-def test_query_fn_none_da_error_sin_reventar():
-    """query_fn None -> error degradado, sin excepcion."""
-    res = extract_null_mask(None, "clientes", ["email"])
-    assert res["status"] == "error"
-    assert res["columns"] == []
-    assert res["mask"] == {}
-    assert res["n"] == 0
-
-
-def test_sql_contiene_case_y_limit():
-    """La query genera un CASE WHEN IS NULL por columna escapada + LIMIT sobre la tabla."""
-    captured = []
-    rows = [{"email": 0}]
-    extract_null_mask(
-        _fake_query(rows, captured),
-        "clientes_tbl",
-        ["email"],
-        max_rows=123,
-    )
-    assert len(captured) == 1
-    sql = captured[0]
-    assert 'CASE WHEN "email" IS NULL THEN 1 ELSE 0 END' in sql
-    assert 'AS "email"' in sql
-    assert 'FROM "clientes_tbl"' in sql
-    assert "LIMIT 123" in sql
@@ -1,103 +0,0 @@
---
-id: missingness_corr_heatmap_figure_py_datascience
-name: missingness_corr_heatmap_figure
-kind: function
-lang: py
-domain: datascience
-version: "1.0.0"
-purity: impure
-signature: "def missingness_corr_heatmap_figure(matrix, labels, title=\"Co-ocurrencia de ausencias\") -> \"matplotlib.figure.Figure\""
-description: "Construye una figura matplotlib (heatmap) de la matriz NxN de correlación de ausencias entre columnas: +1 = dos columnas suelen ser nulas a la vez, -1 = cuando una falta la otra está presente, 0 = ausencias independientes. Usa ax.imshow con coolwarm fijado a [-1,1], ticks con los labels truncados (X rotados 45º), colorbar y anota el valor de cada celda si N<=12. Devuelve un matplotlib.figure.Figure listo para rasterizar por el renderer del informe EDA (capítulo de datos faltantes). Backend Agg sin pyplot global; defensivo ante matrix/labels vacíos o celdas no numéricas (nunca lanza)."
-tags: [eda, missing, missingness, correlation, heatmap, matplotlib, figure, visualization, datascience, impure]
-uses_functions: []
-uses_types: []
-returns: []
-returns_optional: false
-error_type: "error_go_core"
-imports: [matplotlib]
-example: |
-  from datascience.missingness_corr_heatmap_figure import missingness_corr_heatmap_figure
-  matrix = [
-      [1.0, 0.82, -0.10],
-      [0.82, 1.0, 0.05],
-      [-0.10, 0.05, 1.0],
-  ]
-  labels = ["telefono", "movil", "email"]
-  fig = missingness_corr_heatmap_figure(matrix, labels, title="Co-ocurrencia de ausencias")
-tested: true
-tests:
-  - "test_returns_figure_with_axes"
-  - "test_empty_matrix_does_not_raise_and_returns_figure"
-  - "test_empty_labels_returns_message_figure"
-  - "test_large_matrix_omits_annotations"
-  - "test_ragged_and_non_numeric_cells_are_handled"
-test_file_path: "python/functions/datascience/missingness_corr_heatmap_figure_test.py"
-file_path: "python/functions/datascience/missingness_corr_heatmap_figure.py"
-params:
-  - name: matrix
-    desc: "Lista de listas (NxN) de floats en [-1,1]: la correlación de ausencias por pares de columnas. Puede venir vacía. Filas de longitud desigual se toleran (se rellenan/recortan a N); celdas None, NaN o no numéricas se coercen a 0.0. No se muta el original."
-  - name: labels
-    desc: "Lista de N nombres de columna, paralela a matrix. Puede venir vacía (devuelve figura \"sin columnas con ausencia variable\"). Se truncan a ~14 chars con elipsis para los ticks; los originales no se mutan."
-  - name: title
-    desc: "Título de la figura. Se trunca a ~60 chars con elipsis si es muy largo. Default \"Co-ocurrencia de ausencias\"."
-output: "Un matplotlib.figure.Figure (figsize 6.4x5.2, dpi 150) con un Axes heatmap (imshow vmin=-1, vmax=1, cmap coolwarm) más una colorbar etiquetada \"correlación de ausencias\". Ticks en ambos ejes con los labels truncados (X rotados 45º). Si N<=12 cada celda lleva su valor numérico anotado (texto blanco sobre celdas saturadas, oscuro sobre pálidas); con N grande se omiten las anotaciones para no saturar. Si matrix o labels vienen vacíos devuelve una Figure con texto centrado \"sin columnas con ausencia variable\"; cualquier error inesperado se captura y devuelve una Figure con el mensaje de error (nunca lanza). El caller rasteriza/cierra la figura; la función no la muestra ni la guarda."
---
-
-## Ejemplo
-
-```python
-from datascience.missingness_corr_heatmap_figure import missingness_corr_heatmap_figure
-
-# Correlación de ausencias entre 3 columnas de contacto:
-# telefono y movil tienden a faltar juntos (0.82); email es casi independiente.
-matrix = [
-    [1.00, 0.82, -0.10],
-    [0.82, 1.00,  0.05],
-    [-0.10, 0.05, 1.00],
-]
-labels = ["telefono", "movil", "email"]
-
-fig = missingness_corr_heatmap_figure(
-    matrix,
-    labels,
-    title="Co-ocurrencia de ausencias",
-)
-
-# El renderer del informe lo rasteriza; aquí solo persistimos para inspección.
-fig.savefig("/tmp/missingness_heatmap.png")
-```
-
-## Cuando usarla
-
-Úsala en el capítulo de datos faltantes de un informe EDA cuando quieras ver de
-un vistazo qué columnas faltan juntas (mismo formulario sin rellenar, mismo
-proceso roto) frente a columnas cuyas ausencias son independientes. Pásale la
-matriz de correlación de ausencias (calculada sobre la máscara de nulos, p. ej.
-`df.isnull().corr()`) restringida a las columnas que de verdad tienen ausencia
-variable, junto con sus nombres. Es la pareja "estructura" del ranking de % de
-nulos: las barras dicen *cuánto* falta cada columna, este heatmap dice *si las
-ausencias están relacionadas* entre columnas.
-
-## Gotchas
-
- **Impura por matplotlib.** Toca la maquinaria de render. Usa el backend `Agg`
-  y la API orientada a objetos `Figure`/`add_subplot` — NUNCA `pyplot.*` aquí,
-  para no tocar el estado global ni filtrar figuras entre llamadas. `pyplot` NO
-  es thread-safe; esta función evita ese riesgo construyendo el `Figure`
-  directamente, así que es segura de llamar en bucle desde el renderer.
- **El caller cierra la figura.** Devuelve el `Figure` pero no lo muestra ni lo
-  guarda. Quien la consume debe rasterizarla y luego liberarla
-  (`matplotlib.pyplot.close(fig)`) para no acumular memoria en lotes grandes.
- **Escala de color fija en [-1, 1].** `vmin=-1`, `vmax=1` están fijados a
-  propósito para que el color sea comparable entre informes y entre columnas. No
-  se autoescala al rango real de la matriz; valores fuera de `[-1, 1]` se
-  saturan al extremo del colormap.
- **Anotaciones solo con N<=12.** Por encima de 12 columnas el grid de números
-  se vuelve ilegible y se omite; queda solo el color + la colorbar. Filtra a las
-  columnas con ausencia variable antes de llamar para no llegar a matrices
-  enormes.
- **Defensiva, nunca lanza.** `matrix=[]`, `labels=[]`, filas cortas, celdas
-  `None`/`NaN`/no numéricas o cualquier error inesperado se manejan sin propagar:
-  en el peor caso devuelve una `Figure` con "sin columnas con ausencia variable"
-  o con el texto del error. No envuelvas la llamada en try/except por miedo a un
-  raise — no lo hay.
@@ -1,158 +0,0 @@
-"""Impure EDA helper: heatmap of missingness co-occurrence (`eda` group).
-
-Builds a matplotlib heatmap of the pairwise missingness correlation matrix of a
-dataset: a value near ``+1`` means two columns tend to be null together, near
-``-1`` means when one is null the other tends to be present, and ``0`` means
-their absences are independent. Returns a ready-to-rasterize
-``matplotlib.figure.Figure``; it never shows nor saves it.
-
-Impure because it touches matplotlib's rendering machinery. It uses the headless
-Agg backend and the object-oriented ``Figure`` API (no ``pyplot``) so it leaks no
-global state and is safe to call repeatedly from a report renderer.
-"""
-
-import matplotlib
-
-matplotlib.use("Agg")
-
-from matplotlib.figure import Figure  # noqa: E402
-
-# Muted gray for secondary text (no-data / fallback messages).
-_MUTED_TEXT = "#5f6b7a"
-# Soft red for the error fallback message (kept readable, not alarming).
-_ERROR_TEXT = "#b00020"
-
-
-def _truncate(text, width: int = 14) -> str:
-    """Truncate ``text`` to ``width`` chars, appending an ellipsis if cut."""
-    s = "" if text is None else str(text)
-    if len(s) <= width:
-        return s
-    if width <= 1:
-        return s[:width]
-    return s[: width - 1] + "…"
-
-
-def _message_figure(message: str, color: str = _MUTED_TEXT) -> "Figure":
-    """Return a fallback ``Figure`` carrying a single centered message."""
-    fig = Figure(figsize=(6.4, 4.0), dpi=150)
-    ax = fig.add_subplot(111)
-    ax.axis("off")
-    ax.text(
-        0.5,
-        0.5,
-        message,
-        ha="center",
-        va="center",
-        fontsize=12,
-        color=color,
-        wrap=True,
-        transform=ax.transAxes,
-    )
-    fig.tight_layout()
-    return fig
-
-
-def missingness_corr_heatmap_figure(
-    matrix,
-    labels,
-    title: str = "Co-ocurrencia de ausencias",
-) -> "matplotlib.figure.Figure":
-    """Build a heatmap figure of a missingness correlation matrix.
-
-    Renders an ``NxN`` matrix of missingness correlations in ``[-1, 1]`` with a
-    diverging ``coolwarm`` colormap (fixed ``vmin=-1``, ``vmax=1`` so the color
-    scale is comparable across reports). Both axes are tick-labelled with the
-    column names (truncated to ~14 chars; the X labels rotated 45°). A colorbar
-    is attached. When the matrix is small (``N <= 12``) each cell is annotated
-    with its numeric value; for larger matrices the annotations are omitted to
-    avoid an unreadable grid.
-
-    The function is fully defensive: empty/ragged/non-numeric input never raises.
-    When there is nothing valid to draw it returns a ``Figure`` carrying a
-    centered "sin columnas con ausencia variable" message, and any unexpected
-    error is caught and turned into a fallback ``Figure`` carrying the error text.
-
-    Args:
-        matrix: List of lists (``NxN``) of floats in ``[-1, 1]`` — the pairwise
-            missingness correlation. May be empty; rows of unequal length are
-            tolerated by treating the matrix as invalid only when it is empty or
-            its label count does not match. Non-numeric/``None`` cells are
-            coerced to ``0.0``.
-        labels: List of ``N`` column names, parallel to ``matrix``. May be empty.
-            Truncated for display; the originals are not mutated.
-        title: Figure title. Default "Co-ocurrencia de ausencias".
-
-    Returns:
-        A ``matplotlib.figure.Figure`` with a single heatmap Axes plus a
-        colorbar. The caller is responsible for rasterizing/closing it.
-    """
-    try:
-        # --- Validate shape: need a non-empty square-ish matrix with labels.
-        if (
-            not isinstance(matrix, (list, tuple))
-            or not isinstance(labels, (list, tuple))
-            or len(matrix) == 0
-            or len(labels) == 0
-        ):
-            return _message_figure("sin columnas con ausencia variable")
-
-        n = len(labels)
-        # Build a clean NxN grid: coerce each cell to float, default 0.0, pad/clip
-        # rows so a ragged input never crashes imshow.
-        grid = []
-        for i in range(n):
-            row_src = matrix[i] if i < len(matrix) else []
-            if not isinstance(row_src, (list, tuple)):
-                row_src = []
-            row = []
-            for j in range(n):
-                cell = row_src[j] if j < len(row_src) else 0.0
-                try:
-                    val = float(cell)
-                except (TypeError, ValueError):
-                    val = 0.0
-                if val != val:  # NaN guard.
-                    val = 0.0
-                row.append(val)
-            grid.append(row)
-
-        fig = Figure(figsize=(6.4, 5.2), dpi=150)
-        ax = fig.add_subplot(111)
-
-        im = ax.imshow(grid, vmin=-1, vmax=1, cmap="coolwarm", aspect="equal")
-
-        short = [_truncate(lab, 14) for lab in labels]
-        ax.set_xticks(range(n))
-        ax.set_yticks(range(n))
-        ax.set_xticklabels(short, rotation=45, ha="right", fontsize=8)
-        ax.set_yticklabels(short, fontsize=8)
-
-        # Annotate each cell only when the grid is small enough to stay legible.
-        if n <= 12:
-            for i in range(n):
-                for j in range(n):
-                    val = grid[i][j]
-                    # White text over saturated (dark) cells, dark over pale.
-                    txt_color = "white" if abs(val) >= 0.55 else "#202020"
-                    ax.text(
-                        j,
-                        i,
-                        f"{val:.2f}",
-                        ha="center",
-                        va="center",
-                        fontsize=7,
-                        color=txt_color,
-                    )
-
-        cbar = fig.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
-        cbar.ax.tick_params(labelsize=8)
-        cbar.set_label("correlación de ausencias", fontsize=8)
-
-        if title:
-            ax.set_title(_truncate(title, 60), fontsize=12, loc="center", pad=10)
-
-        fig.tight_layout()
-        return fig
-    except Exception as exc:  # noqa: BLE001 — never raise from a figure builder.
-        return _message_figure(f"error al dibujar heatmap: {exc}", color=_ERROR_TEXT)
@@ -1,62 +0,0 @@
-"""Tests para missingness_corr_heatmap_figure (heatmap de ausencias, grupo eda).
-
-Usa el backend Agg sin pyplot; no muestra ni guarda figuras. Cada test cierra
-explícitamente la Figure construida (matplotlib.pyplot.close) para no acumular
-estado entre tests.
-"""
-
-import matplotlib
-
-matplotlib.use("Agg")
-
-import matplotlib.pyplot as plt  # noqa: E402
-from matplotlib.figure import Figure  # noqa: E402
-
-from missingness_corr_heatmap_figure import missingness_corr_heatmap_figure
-
-
-def _identity_matrix(n):
-    """Matriz NxN con diagonal 1.0 y resto 0.0 (correlación de ausencias)."""
-    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
-
-
-def test_returns_figure_with_axes():
-    matrix = [[1.0, 0.3, -0.2], [0.3, 1.0, 0.5], [-0.2, 0.5, 1.0]]
-    labels = ["edad", "ingresos", "ciudad"]
-    fig = missingness_corr_heatmap_figure(matrix, labels, title="ausencias")
-    assert isinstance(fig, Figure)
-    # Heatmap (>=1 axes) + colorbar añade su propio Axes -> al menos 1.
-    assert len(fig.axes) >= 1
-    plt.close(fig)
-
-
-def test_empty_matrix_does_not_raise_and_returns_figure():
-    fig = missingness_corr_heatmap_figure([], [], title="vacía")
-    assert isinstance(fig, Figure)
-    assert len(fig.axes) >= 1
-    plt.close(fig)
-
-
-def test_empty_labels_returns_message_figure():
-    fig = missingness_corr_heatmap_figure([[1.0]], [], title="sin labels")
-    assert isinstance(fig, Figure)
-    plt.close(fig)
-
-
-def test_large_matrix_omits_annotations():
-    n = 16
-    fig = missingness_corr_heatmap_figure(
-        _identity_matrix(n), [f"col_{i}" for i in range(n)]
-    )
-    assert isinstance(fig, Figure)
-    assert len(fig.axes) >= 1
-    plt.close(fig)
-
-
-def test_ragged_and_non_numeric_cells_are_handled():
-    # Fila corta + celda None + celda string -> se rellenan/coercen sin lanzar.
-    matrix = [[1.0, None], ["x", 1.0, 0.5]]
-    labels = ["a", "b"]
-    fig = missingness_corr_heatmap_figure(matrix, labels)
-    assert isinstance(fig, Figure)
-    plt.close(fig)
@@ -1,68 +0,0 @@
---
-name: missingness_correlation
-kind: function
-lang: py
-domain: datascience
-version: "1.0.0"
-purity: pure
-signature: "def missingness_correlation(null_mask: dict, top_k: int = 20) -> dict"
-description: "Co-ocurrencia de ausencias: nucleo del capitulo de missingness del grupo eda. Recibe la mascara binaria de nulos de una tabla (1 = falta, 0 = presente, alineada por fila) y mide hasta que punto las columnas faltan juntas. Calcula la matriz de correlacion de Pearson entre los vectores binarios de ausencia de las columnas con varianza (al menos un 1 y un 0), mas las cifras de solapamiento de conjuntos por par (co-missing, either-missing, Jaccard). Excluye las columnas constantes en su ausencia (correlacion indefinida) y reporta cuantas. Compone la funcion atomica pearson del registry; no la reimplementa. Lectura defensiva; NUNCA lanza."
-tags: [eda, missingness, correlation, pearson, co-occurrence, jaccard, datascience]
-params:
-  - name: null_mask
-    desc: "dict {col: [int 0/1, ...]} con la mascara de ausencias de la tabla, alineada por fila: 1 = el valor falta en esa fila, 0 = presente. Todas las listas se asumen de la misma longitud (numero de filas). Valores truthy distintos de 0 se tratan como ausencia; entradas no-lista se ignoran sin romper."
-  - name: top_k
-    desc: "Numero maximo de pares a devolver en `pairs`, ordenados por valor absoluto de correlacion descendente. Default 20. Solo limita la lista de pares; la matriz cubre siempre todas las columnas con varianza."
-output: "dict con: columns (columnas con varianza en la ausencia, en orden de entrada); matrix (len(columns) x len(columns) de correlacion de Pearson entre las mascaras binarias, diagonal 1.0); pairs (hasta top_k pares i<j ordenados por |corr| desc, cada uno {a, b, corr, co_missing, either_missing, jaccard} donde co_missing = filas en que ambas faltan, either_missing = filas en que al menos una falta, jaccard = co_missing/either_missing o 0.0 si either_missing=0); n_excluded (nº de columnas con algun nulo pero sin varianza, constantes en la ausencia); excluded_cols (esas columnas en orden de entrada). Si hay <2 columnas con varianza, columns/matrix/pairs van vacios pero n_excluded/excluded_cols se rellenan. NUNCA lanza."
-uses_functions: [pearson_py_datascience]
-uses_types: []
-returns: []
-returns_optional: false
-error_type: ""
-imports: []
-tested: true
-tests: ["test_co_ocurrencia_fuerte_corr_uno_jaccard_uno", "test_ausencias_disjuntas_corr_negativa_jaccard_cero", "test_columna_sin_varianza_se_excluye", "test_menos_de_dos_columnas_con_varianza_vacio_pero_cuenta_excluidas", "test_mask_vacio_todo_vacio", "test_top_k_limita_pares", "test_no_lanza_con_entradas_raras"]
-test_file_path: "python/functions/datascience/missingness_correlation_test.py"
-file_path: "python/functions/datascience/missingness_correlation.py"
---
-
-## Ejemplo
-
-```python
-import sys, os
-sys.path.insert(0, os.path.join("python", "functions"))
-from datascience.missingness_correlation import missingness_correlation
-
-# Mascara de ausencias de 6 filas. 1 = falta, 0 = presente.
-mask = {
-    "ingresos":  [1, 0, 1, 0, 1, 0],   # falta junto a "deducciones"
-    "deducciones": [1, 0, 1, 0, 1, 0], # mismas filas que "ingresos"
-    "telefono":  [0, 0, 0, 1, 0, 0],   # casi siempre presente
-    "verificado": [1, 1, 1, 1, 1, 1],  # siempre ausente -> constante, excluida
-}
-out = missingness_correlation(mask, top_k=10)
-
-print(out["columns"])        # ['ingresos', 'deducciones', 'telefono']
-print(out["n_excluded"])     # 1
-print(out["excluded_cols"])  # ['verificado']
-
-# El par mas fuerte: ingresos y deducciones faltan siempre juntas.
-top = out["pairs"][0]
-print(top["a"], top["b"], round(top["corr"], 3))  # ingresos deducciones 1.0
-print(top["co_missing"], top["either_missing"], top["jaccard"])  # 3 3 1.0
-```
-
-## Cuando usarla
-
- Usala en el capitulo de **missingness** de `AutomaticEDA` cuando ya tengas la mascara binaria de nulos por columna y quieras detectar **patrones de ausencia conjunta**: que columnas faltan siempre juntas (posible misma fuente/proceso roto) y cuales faltan de forma independiente.
- Cuando necesites ordenar los pares de columnas por fuerza de co-ocurrencia (|corr|) para priorizar que bloques de ausencia investigar o imputar juntos.
- Cuando quieras la cifra de solapamiento de conjuntos (Jaccard, co-missing) ademas de la correlacion lineal, para distinguir "faltan juntas" de "estan presentes juntas".
- Antes de elegir una estrategia de imputacion: dos columnas con corr de ausencia ~1.0 no aportan informacion independiente sobre por que falta la otra.
-
-## Gotchas
-
- Funcion pura, sin I/O y determinista. Lectura defensiva: entradas no-dict, columnas no-lista o vacias se ignoran sin lanzar.
- Solo entran al calculo las columnas con **varianza en la ausencia** (al menos un 1 y al menos un 0). Una columna siempre-presente (todo 0) no aporta ausencia y **no** se cuenta como excluida; una columna siempre-ausente o constante con nulos (todo 1) tiene correlacion indefinida y se excluye, sumando a `n_excluded` / `excluded_cols`.
- Con menos de 2 columnas con varianza, `columns`/`matrix`/`pairs` quedan vacios pero `n_excluded`/`excluded_cols` se rellenan igual — el caller debe contemplar el caso "sin pares".
- La correlacion es la de Pearson sobre vectores binarios (equivale al coeficiente phi). El signo importa: corr negativa = las ausencias tienden a ser **complementarias** (cuando una falta, la otra suele estar presente).
- Asume todas las listas alineadas por fila y de la misma longitud. Si vienen de longitudes distintas, `pearson` opera sobre el solapamiento que permita `zip` y degrada a 0.0 cuando no hay varianza efectiva; alinea la mascara antes de llamar.
@@ -1,120 +0,0 @@
-"""Co-ocurrencia de ausencias: matriz de correlacion de Pearson entre mascaras de nulos.
-
-Funcion pura del grupo eda, nucleo del capitulo de missingness. Recibe la mascara
-binaria de ausencias de una tabla (1 = falta, 0 = presente, alineada por fila) y
-mide hasta que punto las columnas faltan juntas. Para cada par de columnas con
-varianza en su ausencia calcula la correlacion de Pearson entre los vectores
-binarios, mas las cifras de solapamiento de conjuntos (co-missing, either-missing,
-Jaccard). Compone la funcion atomica `pearson` del registry; no reimplementa la
-correlacion. Lectura defensiva; NUNCA lanza.
-"""
-
-from datascience import pearson
-
-
-def missingness_correlation(null_mask, top_k=20) -> dict:
-    """Correlacion de co-ocurrencia de ausencias entre columnas.
-
-    Args:
-        null_mask: dict {col: [int 0/1, ...]} alineado por fila (1 = el valor
-            falta en esa fila). Todas las listas se asumen de la misma longitud.
-        top_k: numero maximo de pares a devolver, ordenados por |corr| desc.
-
-    Returns:
-        dict con:
-          - columns: columnas con varianza en la ausencia (al menos un 1 y al
-            menos un 0), en orden de entrada.
-          - matrix: matriz len(columns) x len(columns) de correlacion de Pearson
-            entre las mascaras binarias, diagonal 1.0.
-          - pairs: lista de hasta top_k pares (i<j) ordenados por |corr| desc.
-            Cada par: {a, b, corr, co_missing, either_missing, jaccard}.
-          - n_excluded: numero de columnas con algun nulo pero sin varianza
-            (constantes en la ausencia: siempre presentes o siempre ausentes).
-          - excluded_cols: lista de esas columnas (en orden de entrada).
-
-        Si hay menos de 2 columnas con varianza, columns/matrix/pairs van vacios
-        pero n_excluded/excluded_cols se rellenan igualmente. NUNCA lanza.
-    """
-    # Salida base, defensiva ante entradas no-dict.
-    result = {
-        "columns": [],
-        "matrix": [],
-        "pairs": [],
-        "n_excluded": 0,
-        "excluded_cols": [],
-    }
-
-    if not isinstance(null_mask, dict) or not null_mask:
-        return result
-
-    varying = []          # columnas con varianza en la ausencia
-    varying_vecs = []     # sus vectores binarios saneados (floats 0.0/1.0)
-    excluded_cols = []    # columnas con nulos pero sin varianza (constantes)
-
-    for col, raw in null_mask.items():
-        if not isinstance(raw, (list, tuple)):
-            continue
-        # Sanea a 0/1: cualquier valor truthy distinto de 0 cuenta como ausencia.
-        vec = [1 if bool(v) else 0 for v in raw]
-        if not vec:
-            continue
-        ones = sum(vec)
-        zeros = len(vec) - ones
-        if ones > 0 and zeros > 0:
-            varying.append(col)
-            varying_vecs.append([float(v) for v in vec])
-        elif ones > 0:
-            # Tiene nulos pero todos (constante en la ausencia): sin varianza.
-            excluded_cols.append(col)
-        # ones == 0 -> columna siempre presente, sin nulos: no se cuenta como
-        # excluida (no aporta ausencia al analisis de co-ocurrencia).
-
-    result["n_excluded"] = len(excluded_cols)
-    result["excluded_cols"] = excluded_cols
-
-    n = len(varying)
-    if n < 2:
-        return result
-
-    result["columns"] = list(varying)
-
-    # Matriz de correlacion de Pearson, diagonal 1.0.
-    matrix = [[0.0] * n for _ in range(n)]
-    for i in range(n):
-        matrix[i][i] = 1.0
-    for i in range(n):
-        for j in range(i + 1, n):
-            r = pearson(varying_vecs[i], varying_vecs[j])
-            matrix[i][j] = r
-            matrix[j][i] = r
-    result["matrix"] = matrix
-
-    # Pares con cifras de solapamiento de conjuntos.
-    pairs = []
-    for i in range(n):
-        vi = varying_vecs[i]
-        for j in range(i + 1, n):
-            vj = varying_vecs[j]
-            co_missing = 0
-            either_missing = 0
-            for a, b in zip(vi, vj):
-                a_miss = a != 0.0
-                b_miss = b != 0.0
-                if a_miss and b_miss:
-                    co_missing += 1
-                if a_miss or b_miss:
-                    either_missing += 1
-            jaccard = co_missing / either_missing if either_missing > 0 else 0.0
-            pairs.append({
-                "a": varying[i],
-                "b": varying[j],
-                "corr": matrix[i][j],
-                "co_missing": co_missing,
-                "either_missing": either_missing,
-                "jaccard": jaccard,
-            })
-
-    pairs.sort(key=lambda p: abs(p["corr"]), reverse=True)
-    result["pairs"] = pairs[:top_k] if top_k is not None and top_k >= 0 else pairs
-
-    return result
@@ -1,115 +0,0 @@
-"""Tests para missingness_correlation."""
-
-from datascience.missingness_correlation import missingness_correlation
-
-
-def test_co_ocurrencia_fuerte_corr_uno_jaccard_uno():
-    # a y b faltan EXACTAMENTE en las mismas filas -> corr 1.0, jaccard 1.0.
-    mask = {
-        "a": [1, 0, 1, 0, 1, 0],
-        "b": [1, 0, 1, 0, 1, 0],
-    }
-    out = missingness_correlation(mask)
-    assert out["columns"] == ["a", "b"]
-    assert out["n_excluded"] == 0
-    # Diagonal 1.0, off-diagonal ~1.0.
-    assert out["matrix"][0][0] == 1.0
-    assert out["matrix"][1][1] == 1.0
-    assert abs(out["matrix"][0][1] - 1.0) < 1e-9
-    assert len(out["pairs"]) == 1
-    pair = out["pairs"][0]
-    assert {pair["a"], pair["b"]} == {"a", "b"}
-    assert abs(pair["corr"] - 1.0) < 1e-9
-    assert pair["co_missing"] == 3       # filas 0,2,4
-    assert pair["either_missing"] == 3   # mismas filas
-    assert abs(pair["jaccard"] - 1.0) < 1e-9
-
-
-def test_ausencias_disjuntas_corr_negativa_jaccard_cero():
-    # a y b nunca faltan en la misma fila -> co_missing 0, jaccard 0, corr <= 0.
-    mask = {
-        "a": [1, 1, 0, 0],
-        "b": [0, 0, 1, 1],
-    }
-    out = missingness_correlation(mask)
-    assert out["columns"] == ["a", "b"]
-    pair = out["pairs"][0]
-    assert pair["co_missing"] == 0
-    assert pair["either_missing"] == 4
-    assert pair["jaccard"] == 0.0
-    # Solapamiento nulo + ausencias complementarias -> correlacion negativa.
-    assert pair["corr"] < 0.0
-    assert abs(pair["corr"] - out["matrix"][0][1]) < 1e-12
-
-
-def test_columna_sin_varianza_se_excluye():
-    # c esta siempre presente (todo 0): no aporta ausencia -> no entra ni como
-    # excluida. d esta siempre ausente (todo 1): tiene nulos pero sin varianza
-    # -> excluida y n_excluded incrementa. a y b tienen varianza.
-    mask = {
-        "a": [1, 0, 1, 0],
-        "b": [1, 0, 0, 0],
-        "c": [0, 0, 0, 0],   # siempre presente
-        "d": [1, 1, 1, 1],   # siempre ausente, constante
-    }
-    out = missingness_correlation(mask)
-    assert out["columns"] == ["a", "b"]
-    assert "d" in out["excluded_cols"]
-    assert "c" not in out["excluded_cols"]
-    assert out["n_excluded"] == 1
-    # Matriz solo de las columnas con varianza.
-    assert len(out["matrix"]) == 2
-    assert len(out["matrix"][0]) == 2
-
-
-def test_menos_de_dos_columnas_con_varianza_vacio_pero_cuenta_excluidas():
-    # Solo una columna con varianza (a) + una constante-ausente (d).
-    mask = {
-        "a": [1, 0, 1, 0],
-        "d": [1, 1, 1, 1],
-    }
-    out = missingness_correlation(mask)
-    assert out["columns"] == []
-    assert out["matrix"] == []
-    assert out["pairs"] == []
-    assert out["n_excluded"] == 1
-    assert out["excluded_cols"] == ["d"]
-
-
-def test_mask_vacio_todo_vacio():
-    out = missingness_correlation({})
-    assert out == {
-        "columns": [],
-        "matrix": [],
-        "pairs": [],
-        "n_excluded": 0,
-        "excluded_cols": [],
-    }
-
-
-def test_top_k_limita_pares():
-    # 4 columnas con varianza -> 6 pares; top_k=2 deja 2.
-    mask = {
-        "a": [1, 0, 1, 0, 0],
-        "b": [1, 0, 0, 1, 0],
-        "c": [0, 1, 1, 0, 1],
-        "d": [1, 1, 0, 0, 1],
-    }
-    out = missingness_correlation(mask, top_k=2)
-    assert len(out["columns"]) == 4
-    assert len(out["pairs"]) == 2
-    # Ordenados por |corr| desc.
-    assert abs(out["pairs"][0]["corr"]) >= abs(out["pairs"][1]["corr"])
-
-
-def test_no_lanza_con_entradas_raras():
-    # Valores no-lista y no-dict no deben romper.
-    assert missingness_correlation(None)["columns"] == []
-    mask = {
-        "a": [1, 0, 1, 0],
-        "b": [1, 0, 1, 0],
-        "bad": "not a list",
-        "empty": [],
-    }
-    out = missingness_correlation(mask)
-    assert out["columns"] == ["a", "b"]
@@ -1,99 +0,0 @@
---
-id: missingness_overview_py_datascience
-name: missingness_overview
-kind: function
-lang: py
-domain: datascience
-version: "1.0.0"
-purity: pure
-signature: "def missingness_overview(null_mask) -> dict"
-description: "Resumen de ausencias a nivel de dataset a partir de una máscara de nulos 0/1 por columna ({col: [1=falta, 0=presente]} alineada por fila). Calcula celdas y porcentaje de datos faltantes, cuántas columnas tienen algún nulo y cuántas filas son completas vs. incompletas. Estilo dict-no-throw del grupo eda: nunca lanza. Lectura defensiva — no-dict o dict vacío devuelve todo a 0; columnas no-lista se tratan como vacías; listas de longitud distinta se alinean a la longitud máxima rellenando la cola corta como presente (0); valores None/no-int cuentan como presente; sin ZeroDivisionError."
-tags: [eda, missing, missingness, nulls, profiling, datascience, pure]
-uses_functions: []
-uses_types: []
-returns: []
-returns_optional: false
-error_type: ""
-imports: []
-example: |
-  from datascience.missingness_overview import missingness_overview
-  mask = {
-      "a": [1, 0, 0, 0, 1],
-      "b": [1, 0, 1, 0, 0],
-      "c": [0, 0, 0, 0, 1],
-  }
-  missingness_overview(mask)
-  # n_missing_cells=5, missing_cell_pct≈33.33, complete_rows=2, incomplete_rows=3
-tested: true
-tests:
-  - "test_cooccurrence_three_cols_exact"
-  - "test_empty_dict_all_zero"
-  - "test_output_keys_contract"
-  - "test_not_a_dict_returns_zero"
-  - "test_no_nulls_all_complete"
-  - "test_none_values_treated_as_present"
-  - "test_unequal_lengths_pad_with_max"
-  - "test_columns_present_but_no_rows"
-  - "test_never_raises_on_garbage"
-test_file_path: "python/functions/datascience/missingness_overview_test.py"
-file_path: "python/functions/datascience/missingness_overview.py"
-params:
-  - name: null_mask
-    desc: "Dict {col_name: [int 0/1, ...]} con la máscara de nulos por columna, alineada por fila (1 = el valor falta, 0 = el valor está presente). Normalmente todas las listas tienen la misma longitud = nº de filas. Lectura defensiva: si no es dict o está vacío se devuelve todo a 0; columnas cuyo valor no es lista/tupla se tratan como vacías; listas de longitud distinta se alinean a la longitud máxima (las posiciones inexistentes de las columnas más cortas cuentan como presentes, 0); valores None o no enteros cuentan como presentes."
-output: "Dict con exactamente 9 claves, todas siempre presentes (la función nunca lanza): n_rows (longitud de fila = longitud máxima entre columnas, 0 si vacío), n_cols (nº de columnas), n_cols_with_null (columnas con >=1 falta), n_missing_cells (suma total de 1s), missing_cell_pct (0-100 = n_missing_cells / (n_rows*n_cols) * 100), complete_rows (filas sin ninguna falta), incomplete_rows (filas con >=1 falta), complete_pct (0-100), incomplete_pct (0-100). Los porcentajes son 0.0 cuando el denominador es 0 (sin ZeroDivisionError)."
---
-
-## Ejemplo
-
-```python
-from datascience.missingness_overview import missingness_overview
-
-# Máscara de nulos por columna: 1 = falta, 0 = presente, alineada por fila.
-mask = {
-    "a": [1, 0, 0, 0, 1],
-    "b": [1, 0, 1, 0, 0],
-    "c": [0, 0, 0, 0, 1],
-}
-
-missingness_overview(mask)
-# {
-#   "n_rows": 5,
-#   "n_cols": 3,
-#   "n_cols_with_null": 3,      # a, b y c tienen al menos una falta
-#   "n_missing_cells": 5,       # 2 (a) + 2 (b) + 1 (c)
-#   "missing_cell_pct": 33.33,  # 5 / (5*3) * 100
-#   "complete_rows": 2,         # filas 1 y 3 sin ninguna falta
-#   "incomplete_rows": 3,       # filas 0 (a&b), 2 (b), 4 (a&c)
-#   "complete_pct": 40.0,       # 2 / 5 * 100
-#   "incomplete_pct": 60.0,     # 3 / 5 * 100
-# }
-
-missingness_overview({})
-# Todo a 0: {"n_rows": 0, "n_cols": 0, "n_cols_with_null": 0,
-#            "n_missing_cells": 0, "missing_cell_pct": 0.0,
-#            "complete_rows": 0, "incomplete_rows": 0,
-#            "complete_pct": 0.0, "incomplete_pct": 0.0}
-```
-
-## Cuando usarla
-
-Úsala al perfilar un dataset cuando ya tienes una máscara de nulos 0/1 por
-columna (p. ej. derivada del paso de carga/perfilado del EDA) y quieres la foto
-global de ausencias en una llamada: cuánta proporción de celdas falta, cuántas
-columnas están afectadas y, sobre todo, cuántas filas quedan completas vs.
-incompletas. Es el bloque resumen del capítulo de calidad/missingness de un EDA,
-y la base para decidir estrategias de imputación o de borrado de filas. Como es
-pura y dict-no-throw, puedes alimentarla con la máscara tal cual sin validarla
-antes: entradas malformadas degradan a ceros en vez de romper el pipeline.
-
-## Gotchas
-
- **`n_rows` es la longitud máxima entre columnas.** Con listas de longitud
-  desigual, las posiciones que faltan en las columnas más cortas se cuentan como
-  presentes (`0`); no se descartan filas. En el caso normal (todas las listas de
-  igual longitud) `n_rows` es simplemente esa longitud.
- **Solo el valor exacto `1` cuenta como falta.** `None`, `0`, cadenas y
-  cualquier otro valor se tratan como presentes. `True` (== 1) también cuenta
-  como falta por la igualdad.
- **Porcentajes en escala 0-100**, no fracciones. División por cero protegida:
-  con `n_rows*n_cols == 0` los porcentajes salen `0.0`.
@@ -1,116 +0,0 @@
-"""Pure EDA helper: dataset-level missingness overview from a 0/1 null mask.
-
-Part of the `eda` capability group. Consumes a per-column null mask
-(``{col_name: [int 0/1, ...]}`` aligned by row, ``1`` = value is missing,
-``0`` = value is present) and derives dataset-wide missingness metrics: cell
-count and percentage of missing data, how many columns carry any null, and how
-many rows are complete vs. incomplete.
-
-Dict-no-throw style of the `eda` group: it NEVER raises. A non-dict, an empty
-dict, malformed columns, ragged lists or non-int cell values all degrade
-gracefully to the zero/contract output. Stdlib only.
-
-Ragged-length policy: columns are allowed to have different lengths. ``n_rows``
-is the **maximum** column length; positions that don't exist in a shorter
-column are treated as present (``0``). This keeps the ``n_rows * n_cols`` cell
-grid well defined without dropping rows.
-"""
-
-
-def _is_missing(value) -> int:
-    """Return ``1`` iff ``value`` denotes a missing cell, else ``0``.
-
-    Only an exact equality to ``1`` (covers ``int`` ``1`` and ``float`` ``1.0``)
-    counts as missing. ``None``, ``0``, strings and any other value are treated
-    as present. The comparison cannot raise for standard inputs.
-    """
-    try:
-        return 1 if value == 1 else 0
-    except Exception:
-        return 0
-
-
-def missingness_overview(null_mask) -> dict:
-    """Summarize dataset-level missingness from a 0/1 null mask.
-
-    Args:
-        null_mask: Dict ``{col_name: [int 0/1, ...]}`` where each list is aligned
-            by row (``1`` = missing, ``0`` = present). Lists are normally all the
-            same length (= number of rows). Defensive: a non-dict or empty dict
-            returns the all-zero contract; non-list columns are treated as empty;
-            ragged lists are aligned to the maximum length, padding the missing
-            tail of shorter columns as present (``0``); ``None`` / non-int cells
-            count as present.
-
-    Returns:
-        Dict with exactly these keys, all always present (the function never
-        raises): ``n_rows``, ``n_cols``, ``n_cols_with_null``,
-        ``n_missing_cells``, ``missing_cell_pct`` (0-100), ``complete_rows``,
-        ``incomplete_rows``, ``complete_pct`` (0-100), ``incomplete_pct``
-        (0-100). Percentages are ``0.0`` when the denominator is zero (no
-        ``ZeroDivisionError``).
-    """
-    zero = {
-        "n_rows": 0,
-        "n_cols": 0,
-        "n_cols_with_null": 0,
-        "n_missing_cells": 0,
-        "missing_cell_pct": 0.0,
-        "complete_rows": 0,
-        "incomplete_rows": 0,
-        "complete_pct": 0.0,
-        "incomplete_pct": 0.0,
-    }
-
-    if not isinstance(null_mask, dict) or not null_mask:
-        return dict(zero)
-
-    # Normalize every column to a list; non-list columns become empty.
-    cols = {}
-    for name, seq in null_mask.items():
-        cols[name] = seq if isinstance(seq, (list, tuple)) else []
-
-    n_cols = len(cols)
-    lengths = [len(seq) for seq in cols.values()]
-    n_rows = max(lengths) if lengths else 0
-
-    if n_rows == 0:
-        # Columns exist but carry no rows: everything zero except n_cols.
-        out = dict(zero)
-        out["n_cols"] = n_cols
-        return out
-
-    n_missing_cells = 0
-    n_cols_with_null = 0
-    row_has_missing = [False] * n_rows
-
-    for seq in cols.values():
-        col_len = len(seq)
-        col_has_null = False
-        for r in range(n_rows):
-            if r < col_len and _is_missing(seq[r]):
-                n_missing_cells += 1
-                row_has_missing[r] = True
-                col_has_null = True
-        if col_has_null:
-            n_cols_with_null += 1
-
-    incomplete_rows = sum(1 for flag in row_has_missing if flag)
-    complete_rows = n_rows - incomplete_rows
-
-    total_cells = n_rows * n_cols
-    missing_cell_pct = (n_missing_cells / total_cells * 100.0) if total_cells else 0.0
-    complete_pct = complete_rows / n_rows * 100.0
-    incomplete_pct = incomplete_rows / n_rows * 100.0
-
-    return {
-        "n_rows": n_rows,
-        "n_cols": n_cols,
-        "n_cols_with_null": n_cols_with_null,
-        "n_missing_cells": n_missing_cells,
-        "missing_cell_pct": missing_cell_pct,
-        "complete_rows": complete_rows,
-        "incomplete_rows": incomplete_rows,
-        "complete_pct": complete_pct,
-        "incomplete_pct": incomplete_pct,
-    }
@@ -1,146 +0,0 @@
-"""Tests para missingness_overview."""
-
-import sys
-import os
-
-import pytest
-
-sys.path.insert(0, os.path.dirname(__file__))
-
-from missingness_overview import missingness_overview
-
-
-# Output contract: every call returns exactly these 9 keys.
-EXPECTED_KEYS = {
-    "n_rows",
-    "n_cols",
-    "n_cols_with_null",
-    "n_missing_cells",
-    "missing_cell_pct",
-    "complete_rows",
-    "incomplete_rows",
-    "complete_pct",
-    "incomplete_pct",
-}
-
-
-def test_cooccurrence_three_cols_exact():
-    # 3 columns, 5 rows. Hand-computed expectations:
-    #   col a missing at rows 0, 4      -> 2
-    #   col b missing at rows 0, 2      -> 2
-    #   col c missing at row  4         -> 1
-    #   n_missing_cells = 5, total_cells = 5*3 = 15 -> 33.333...%
-    #   row 0 (a&b co-occur)  -> incomplete
-    #   row 1 (all present)   -> complete
-    #   row 2 (b only)        -> incomplete
-    #   row 3 (all present)   -> complete
-    #   row 4 (a&c co-occur)  -> incomplete
-    mask = {
-        "a": [1, 0, 0, 0, 1],
-        "b": [1, 0, 1, 0, 0],
-        "c": [0, 0, 0, 0, 1],
-    }
-    out = missingness_overview(mask)
-    assert out["n_rows"] == 5
-    assert out["n_cols"] == 3
-    assert out["n_cols_with_null"] == 3
-    assert out["n_missing_cells"] == 5
-    assert out["missing_cell_pct"] == pytest.approx(33.33333333, abs=1e-6)
-    assert out["complete_rows"] == 2
-    assert out["incomplete_rows"] == 3
-    assert out["complete_pct"] == pytest.approx(40.0)
-    assert out["incomplete_pct"] == pytest.approx(60.0)
-
-
-def test_empty_dict_all_zero():
-    out = missingness_overview({})
-    assert out == {
-        "n_rows": 0,
-        "n_cols": 0,
-        "n_cols_with_null": 0,
-        "n_missing_cells": 0,
-        "missing_cell_pct": 0.0,
-        "complete_rows": 0,
-        "incomplete_rows": 0,
-        "complete_pct": 0.0,
-        "incomplete_pct": 0.0,
-    }
-
-
-def test_output_keys_contract():
-    # The 9-key contract holds even for the garbage/zero path.
-    assert set(missingness_overview({}).keys()) == EXPECTED_KEYS
-    assert set(missingness_overview({"a": [1, 0]}).keys()) == EXPECTED_KEYS
-
-
-def test_not_a_dict_returns_zero():
-    for bad in (None, [1, 0, 1], 42, "nope", 3.14):
-        out = missingness_overview(bad)
-        assert out["n_rows"] == 0
-        assert out["n_cols"] == 0
-        assert out["n_missing_cells"] == 0
-        assert out["missing_cell_pct"] == 0.0
-
-
-def test_no_nulls_all_complete():
-    mask = {"a": [0, 0, 0], "b": [0, 0, 0]}
-    out = missingness_overview(mask)
-    assert out["n_rows"] == 3
-    assert out["n_cols"] == 2
-    assert out["n_cols_with_null"] == 0
-    assert out["n_missing_cells"] == 0
-    assert out["missing_cell_pct"] == 0.0
-    assert out["complete_rows"] == 3
-    assert out["incomplete_rows"] == 0
-    assert out["complete_pct"] == pytest.approx(100.0)
-    assert out["incomplete_pct"] == pytest.approx(0.0)
-
-
-def test_none_values_treated_as_present():
-    # None and other non-1 values count as present (0).
-    mask = {"a": [None, 1, None, "x", 0]}
-    out = missingness_overview(mask)
-    assert out["n_rows"] == 5
-    assert out["n_cols"] == 1
-    assert out["n_missing_cells"] == 1  # only the explicit 1 at row 1
-    assert out["n_cols_with_null"] == 1
-    assert out["complete_rows"] == 4
-    assert out["incomplete_rows"] == 1
-
-
-def test_unequal_lengths_pad_with_max():
-    # Ragged lists: n_rows = max length; shorter column padded as present.
-    #   a = [1, 1] -> missing at rows 0, 1
-    #   b = [0]    -> row 1 padded to present
-    #   n_rows = 2, n_cols = 2, total_cells = 4, n_missing_cells = 2 -> 50%
-    mask = {"a": [1, 1], "b": [0]}
-    out = missingness_overview(mask)
-    assert out["n_rows"] == 2
-    assert out["n_cols"] == 2
-    assert out["n_cols_with_null"] == 1
-    assert out["n_missing_cells"] == 2
-    assert out["missing_cell_pct"] == pytest.approx(50.0)
-    assert out["complete_rows"] == 0
-    assert out["incomplete_rows"] == 2
-    assert out["incomplete_pct"] == pytest.approx(100.0)
-
-
-def test_columns_present_but_no_rows():
-    # Columns exist but all empty -> zero metrics, n_cols preserved.
-    out = missingness_overview({"a": [], "b": []})
-    assert out["n_rows"] == 0
-    assert out["n_cols"] == 2
-    assert out["n_missing_cells"] == 0
-    assert out["missing_cell_pct"] == 0.0
-    assert out["complete_pct"] == 0.0
-
-
-def test_never_raises_on_garbage():
-    # Non-list column values, mixed junk -> must not raise.
-    mask = {"a": "not a list", "b": 123, "c": [1, 0, 1]}
-    out = missingness_overview(mask)
-    assert set(out.keys()) == EXPECTED_KEYS
-    assert out["n_rows"] == 3
-    assert out["n_cols"] == 3
-    assert out["n_missing_cells"] == 2  # only col c contributes
-    assert out["n_cols_with_null"] == 1
@@ -1,93 +0,0 @@
---
-id: missingness_rank_bar_figure_py_datascience
-name: missingness_rank_bar_figure
-kind: function
-lang: py
-domain: datascience
-version: "1.0.0"
-purity: impure
-signature: "def missingness_rank_bar_figure(names, pcts, title=\"% de valores faltantes por columna\") -> \"matplotlib.figure.Figure\""
-description: "Construye una figura matplotlib de barras horizontales que ordena las columnas de un dataset por su porcentaje de valores faltantes (0-100), la mayor arriba, etiquetando cada barra con su NN.N% al final. Usa ax.barh, eje X fijo 0-100 y labels truncados a ~22 chars. Devuelve un matplotlib.figure.Figure listo para rasterizar por el renderer del informe EDA (capítulo de datos faltantes). Backend Agg sin pyplot global; defensivo ante listas vacías, longitudes desiguales o valores no numéricos (nunca lanza)."
-tags: [eda, missing, missingness, ranking, bar, barh, matplotlib, figure, visualization, datascience, impure]
-uses_functions: []
-uses_types: []
-returns: []
-returns_optional: false
-error_type: "error_go_core"
-imports: [matplotlib]
-example: |
-  from datascience.missingness_rank_bar_figure import missingness_rank_bar_figure
-  names = ["edad", "ingresos", "ciudad", "email"]
-  pcts = [12.5, 40.0, 3.2, 0.0]
-  fig = missingness_rank_bar_figure(names, pcts, title="% de valores faltantes por columna")
-tested: true
-tests:
-  - "test_returns_figure_with_axes"
-  - "test_sorted_descending_largest_on_top"
-  - "test_empty_lists_do_not_raise_and_returns_figure"
-  - "test_xlim_is_zero_to_hundred"
-  - "test_length_mismatch_and_non_numeric_are_handled"
-test_file_path: "python/functions/datascience/missingness_rank_bar_figure_test.py"
-file_path: "python/functions/datascience/missingness_rank_bar_figure.py"
-params:
-  - name: names
-    desc: "Lista de nombres de columna. Puede venir vacía (devuelve figura \"sin datos faltantes\"). Los items se convierten a str y se truncan a ~22 chars con elipsis para las etiquetas del eje Y; los originales no se mutan."
-  - name: pcts
-    desc: "Lista paralela a names con el % de nulos en [0,100]. Valores None, NaN o no numéricos se coercen a 0.0 y los negativos se recortan a 0. Si len(names) != len(pcts) se recorta al menor de ambos para no romper."
-  - name: title
-    desc: "Título de la figura. Se trunca a ~60 chars con elipsis si es muy largo. Default \"% de valores faltantes por columna\"."
-output: "Un matplotlib.figure.Figure (figsize 6.4 x alto adaptativo según nº de barras, dpi 150) con un Axes de barras horizontales (ax.barh) ordenadas por % descendente, la mayor arriba. Eje X fijado a [0,100] con label \"% faltante\", etiquetas del eje Y truncadas a ~22 chars, y cada barra anotada con su NN.N% al final. Si names o pcts vienen vacíos devuelve una Figure con texto centrado \"sin datos faltantes\"; cualquier error inesperado se captura y devuelve una Figure con el mensaje de error (nunca lanza). El caller rasteriza/cierra la figura; la función no la muestra ni la guarda."
---
-
-## Ejemplo
-
-```python
-from datascience.missingness_rank_bar_figure import missingness_rank_bar_figure
-
-# % de nulos por columna (p. ej. (df.isnull().mean() * 100).
-names = ["edad", "ingresos", "ciudad", "email"]
-pcts = [12.5, 40.0, 3.2, 0.0]
-
-fig = missingness_rank_bar_figure(
-    names,
-    pcts,
-    title="% de valores faltantes por columna",
-)
-
-# ingresos (40.0%) queda arriba; email (0.0%) abajo.
-# El renderer del informe lo rasteriza; aquí solo persistimos para inspección.
-fig.savefig("/tmp/missingness_rank.png")
-```
-
-## Cuando usarla
-
-Úsala al abrir el capítulo de datos faltantes de un informe EDA para responder
-"¿qué columnas están más incompletas?" de un vistazo. Pásale los nombres de
-columna y el % de nulos de cada una (`(df.isnull().mean() * 100).round(1)`); la
-función se encarga de ordenar de mayor a menor y poner la peor arriba. Es la
-pareja "magnitud" del heatmap de co-ocurrencia: las barras dicen *cuánto* falta
-en cada columna, el heatmap dice *si esas ausencias están relacionadas* entre
-columnas.
-
-## Gotchas
-
- **Impura por matplotlib.** Toca la maquinaria de render. Usa el backend `Agg`
-  y la API orientada a objetos `Figure`/`add_subplot` — NUNCA `pyplot.*` aquí,
-  para no tocar el estado global ni filtrar figuras entre llamadas. `pyplot` NO
-  es thread-safe; esta función evita ese riesgo construyendo el `Figure`
-  directamente, así que es segura de llamar en bucle desde el renderer.
- **El caller cierra la figura.** Devuelve el `Figure` pero no lo muestra ni lo
-  guarda. Quien la consume debe rasterizarla y luego liberarla
-  (`matplotlib.pyplot.close(fig)`) para no acumular memoria en lotes grandes.
- **Espera porcentajes 0-100, no fracciones 0-1.** El eje X está fijado a
-  `[0, 100]`. Si pasas fracciones (`0.4` en vez de `40.0`) las barras saldrán
-  pegadas al origen. Multiplica por 100 antes de llamar.
- **Alto adaptativo.** La altura de la figura crece con el número de barras
-  (hasta un tope) para que reports con muchas columnas sigan legibles; aun así,
-  conviene filtrar a las columnas con algún nulo antes de llamar para no listar
-  decenas de barras a 0%.
- **Defensiva, nunca lanza.** Listas vacías, longitudes desiguales, valores
-  `None`/`NaN`/no numéricos o cualquier error inesperado se manejan sin propagar:
-  en el peor caso devuelve una `Figure` con "sin datos faltantes" o con el texto
-  del error. No envuelvas la llamada en try/except por miedo a un raise — no lo
-  hay.
@@ -1,150 +0,0 @@
-"""Impure EDA helper: ranked bar figure of missing-value share (`eda` group).
-
-Builds a horizontal bar chart ranking the columns of a dataset by their
-percentage of missing values (0-100), largest at the top, each bar labelled with
-its ``NN.N%`` at the end. Returns a ready-to-rasterize
-``matplotlib.figure.Figure``; it never shows nor saves it.
-
-Impure because it touches matplotlib's rendering machinery. It uses the headless
-Agg backend and the object-oriented ``Figure`` API (no ``pyplot``) so it leaks no
-global state and is safe to call repeatedly from a report renderer.
-"""
-
-import matplotlib
-
-matplotlib.use("Agg")
-
-from matplotlib.figure import Figure  # noqa: E402
-
-# Muted gray for secondary text (no-data / fallback messages).
-_MUTED_TEXT = "#5f6b7a"
-# Soft red for the error fallback message.
-_ERROR_TEXT = "#b00020"
-# Bar fill — a calm blue that reads well on white at report size.
-_BAR_COLOR = "#4C72B0"
-
-
-def _truncate(text, width: int = 22) -> str:
-    """Truncate ``text`` to ``width`` chars, appending an ellipsis if cut."""
-    s = "" if text is None else str(text)
-    if len(s) <= width:
-        return s
-    if width <= 1:
-        return s[:width]
-    return s[: width - 1] + "…"
-
-
-def _message_figure(message: str, color: str = _MUTED_TEXT) -> "Figure":
-    """Return a fallback ``Figure`` carrying a single centered message."""
-    fig = Figure(figsize=(6.4, 4.0), dpi=150)
-    ax = fig.add_subplot(111)
-    ax.axis("off")
-    ax.text(
-        0.5,
-        0.5,
-        message,
-        ha="center",
-        va="center",
-        fontsize=12,
-        color=color,
-        wrap=True,
-        transform=ax.transAxes,
-    )
-    fig.tight_layout()
-    return fig
-
-
-def missingness_rank_bar_figure(
-    names,
-    pcts,
-    title: str = "% de valores faltantes por columna",
-) -> "matplotlib.figure.Figure":
-    """Build a horizontal ranked bar figure of missing-value share per column.
-
-    Pairs each column name with its missing percentage, sorts by percentage
-    descending and draws horizontal bars with the largest at the top. The X axis
-    is pinned to ``[0, 100]`` so bars are comparable across reports, each bar is
-    annotated with its ``NN.N%`` at the end, and the Y tick labels are truncated
-    to ~22 chars.
-
-    The function is fully defensive: empty/mismatched/non-numeric input never
-    raises. When there is nothing valid to draw it returns a ``Figure`` carrying
-    a centered "sin datos faltantes" message, and any unexpected error is caught
-    and turned into a fallback ``Figure`` carrying the error text.
-
-    Args:
-        names: List of column names. May be empty. Items are stringified and
-            truncated for display; the originals are not mutated.
-        pcts: List parallel to ``names`` of missing-value percentages in
-            ``[0, 100]``. Non-numeric/``None`` values are coerced to ``0.0`` and
-            negatives are clamped to ``0``. The list is truncated to
-            ``min(len(names), len(pcts))`` so a length mismatch never crashes.
-        title: Figure title. Default "% de valores faltantes por columna".
-
-    Returns:
-        A ``matplotlib.figure.Figure`` with a single horizontal-bar Axes. The
-        caller is responsible for rasterizing/closing it.
-    """
-    try:
-        if (
-            not isinstance(names, (list, tuple))
-            or not isinstance(pcts, (list, tuple))
-            or len(names) == 0
-            or len(pcts) == 0
-        ):
-            return _message_figure("sin datos faltantes")
-
-        # --- Pair names with coerced percentages, tolerating length mismatch.
-        pairs = []
-        for name, pct in zip(names, pcts):
-            try:
-                val = float(pct)
-            except (TypeError, ValueError):
-                val = 0.0
-            if val != val:  # NaN guard.
-                val = 0.0
-            val = max(0.0, val)
-            pairs.append((name, val))
-
-        if not pairs:
-            return _message_figure("sin datos faltantes")
-
-        # Sort by percentage descending; barh draws bottom-up, so the largest
-        # ends at the top when we reverse the order before plotting.
-        pairs.sort(key=lambda p: p[1], reverse=True)
-        ordered = list(reversed(pairs))  # smallest first -> largest on top.
-
-        labels = [_truncate(name, 22) for name, _ in ordered]
-        values = [val for _, val in ordered]
-        y_pos = range(len(ordered))
-
-        # Height scales with the number of bars so dense reports stay readable.
-        height = max(2.4, min(0.4 * len(ordered) + 1.2, 14.0))
-        fig = Figure(figsize=(6.4, height), dpi=150)
-        ax = fig.add_subplot(111)
-
-        ax.barh(list(y_pos), values, color=_BAR_COLOR, edgecolor="white")
-        ax.set_yticks(list(y_pos))
-        ax.set_yticklabels(labels, fontsize=8)
-        ax.set_xlim(0, 100)
-        ax.set_xlabel("% faltante", fontsize=9)
-
-        # Annotate each bar with its percentage at the end of the bar.
-        for y, val in zip(y_pos, values):
-            ax.text(
-                min(val + 1.5, 99.0),
-                y,
-                f"{val:.1f}%",
-                va="center",
-                ha="left" if val < 90 else "right",
-                fontsize=7,
-                color="#202020",
-            )
-
-        if title:
-            ax.set_title(_truncate(title, 60), fontsize=12, loc="left", pad=10)
-
-        fig.tight_layout()
-        return fig
-    except Exception as exc:  # noqa: BLE001 — never raise from a figure builder.
-        return _message_figure(f"error al dibujar barras: {exc}", color=_ERROR_TEXT)
@@ -1,64 +0,0 @@
-"""Tests para missingness_rank_bar_figure (barras de % faltante, grupo eda).
-
-Usa el backend Agg sin pyplot; no muestra ni guarda figuras. Cada test cierra
-explícitamente la Figure construida (matplotlib.pyplot.close) para no acumular
-estado entre tests.
-"""
-
-import matplotlib
-
-matplotlib.use("Agg")
-
-import matplotlib.pyplot as plt  # noqa: E402
-from matplotlib.figure import Figure  # noqa: E402
-
-from missingness_rank_bar_figure import missingness_rank_bar_figure
-
-
-def test_returns_figure_with_axes():
-    names = ["edad", "ingresos", "ciudad"]
-    pcts = [12.5, 40.0, 3.2]
-    fig = missingness_rank_bar_figure(names, pcts, title="faltantes")
-    assert isinstance(fig, Figure)
-    assert len(fig.axes) >= 1
-    plt.close(fig)
-
-
-def test_sorted_descending_largest_on_top():
-    names = ["a", "b", "c"]
-    pcts = [10.0, 50.0, 25.0]
-    fig = missingness_rank_bar_figure(names, pcts)
-    ax = fig.axes[0]
-    # barh dibuja de abajo arriba; la mayor (50, "b") debe quedar arriba (mayor y).
-    bars = ax.patches
-    # El último parche (mayor índice y) corresponde a la barra superior.
-    widths = [b.get_width() for b in bars]
-    assert max(widths) == 50.0
-    # La barra con la mayor anchura es la de mayor coordenada y (arriba).
-    top_bar = max(bars, key=lambda b: b.get_y())
-    assert top_bar.get_width() == 50.0
-    plt.close(fig)
-
-
-def test_empty_lists_do_not_raise_and_returns_figure():
-    fig = missingness_rank_bar_figure([], [], title="vacía")
-    assert isinstance(fig, Figure)
-    assert len(fig.axes) >= 1
-    plt.close(fig)
-
-
-def test_xlim_is_zero_to_hundred():
-    fig = missingness_rank_bar_figure(["a"], [42.0])
-    ax = fig.axes[0]
-    assert ax.get_xlim() == (0.0, 100.0)
-    plt.close(fig)
-
-
-def test_length_mismatch_and_non_numeric_are_handled():
-    # Más names que pcts + un pct None -> zip recorta y None se coacciona a 0.
-    names = ["a", "b", "c"]
-    pcts = [None, 30.0]
-    fig = missingness_rank_bar_figure(names, pcts)
-    assert isinstance(fig, Figure)
-    assert len(fig.axes) >= 1
-    plt.close(fig)
@@ -1,65 +0,0 @@
---
-name: missingness_row_patterns
-kind: function
-lang: py
-domain: datascience
-version: "1.0.0"
-purity: pure
-signature: "def missingness_row_patterns(null_mask, top_n=10) -> dict"
-description: "Agrupa las filas de un dataset por su patron de ausencias (estilo matriz de missingno): para cada fila, el patron es la tupla ORDENADA de columnas que faltan en esa fila (las que tienen 1 en el null_mask). Cuenta la frecuencia de cada patron distinto, incluido el patron vacio (fila completa). Devuelve el top_n por frecuencia con su pct sobre el total. Pura, lectura defensiva, NUNCA lanza; {} -> n_rows 0."
-tags: [eda, missingness, missingno, patterns, profiling, datascience, data-quality]
-params:
-  - name: null_mask
-    desc: "Dict {col: [0/1, ...]} alineado por fila, donde 1 = la celda falta en esa fila y 0 = presente. Todas las columnas deberian tener la misma longitud (una entrada por fila); si difieren, n_rows es la lista mas larga y las celdas fuera de rango cuentan como presentes. Las claves se ordenan por str(col) para canonizar el patron. {} (o no-dict) -> n_rows 0."
-  - name: top_n
-    desc: "Maximo de patrones devueltos en `patterns`, rankeados por n_rows desc (desempate: menos columnas primero, luego nombres de columna). El recuento total de patrones distintos siempre se reporta en `n_patterns`, no se trunca. Default 10. Valores negativos -> 0; no-int -> 10."
-output: "Dict {n_rows: int (filas totales), n_patterns: int (patrones distintos, incluye el patron vacio = fila completa), complete_rows: int (filas con patron vacio, nada falta), patterns: lista del top_n ordenada por n_rows desc con [{missing_cols: [col,...] (vacio = fila completa), n_rows: int, pct: float 0-100 sobre n_rows total, redondeado a 2 decimales}]}. Para {} devuelve n_rows 0 y patterns []. NUNCA lanza."
-uses_functions: []
-uses_types: []
-returns: []
-returns_optional: false
-error_type: ""
-imports: []
-tested: true
-tests: ["test_patron_dominante_completas_singleton", "test_mask_vacio", "test_top_n_trunca_pero_cuenta_todos"]
-test_file_path: "python/functions/datascience/missingness_row_patterns_test.py"
-file_path: "python/functions/datascience/missingness_row_patterns.py"
---
-
-## Ejemplo
-
-```python
-import sys, os
-sys.path.insert(0, os.path.join("python", "functions"))
-from datascience.missingness_row_patterns import missingness_row_patterns
-
-# null_mask alineado por fila: 1 = la celda falta en esa fila.
-null_mask = {
-    "A": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
-    "B": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
-    "C": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
-}
-out = missingness_row_patterns(null_mask, top_n=10)
-print(out["n_rows"], out["n_patterns"], out["complete_rows"])  # 10 3 5
-for p in out["patterns"]:
-    label = p["missing_cols"] or "(fila completa)"
-    print(label, p["n_rows"], p["pct"])
-# (fila completa) 5 50.0
-# ['A', 'B'] 4 40.0
-# ['C'] 1 10.0
-```
-
-## Cuando usarla
-
- Usala en el capitulo de calidad/ausencias de `AutomaticEDA` para mostrar la "matriz de patrones de missingno": en vez de pintar celda a celda, resume que combinaciones de columnas se quedan en blanco juntas y con que frecuencia.
- Cuando ya tengas el null_mask por columna (1=falta) y quieras detectar co-ausencia estructural ("A y B siempre faltan juntas") antes de decidir una imputacion o un drop conjunto de columnas.
- Cuando necesites una tabla compacta "patron -> nº filas -> pct" para un report o un grafico de barras de los patrones de ausencia mas comunes, separando ademas cuantas filas estan completas (`complete_rows`).
-
-## Gotchas
-
- Funcion pura, sin I/O y determinista. Lectura defensiva: `{}` o un no-dict devuelven `n_rows` 0 con `patterns` []. NUNCA lanza.
- El patron vacio (fila completa, `missing_cols=[]`) SI cuenta como patron: aparece en `n_patterns` y puede aparecer en `patterns`. El consumidor lo etiqueta como "(fila completa)".
- `pct` es sobre `n_rows` total (0-100), redondeado a 2 decimales. La suma de los `pct` de TODOS los patrones es 100; si `top_n` trunca, los `pct` mostrados sumaran menos.
- Las columnas se ordenan por `str(col)` para canonizar cada patron, asi `{A,B}` y `{B,A}` colapsan al mismo patron `["A", "B"]`.
- Una celda cuenta como ausente solo si vale 1 (`int(cell) == 1`); 0, None y valores no numericos se tratan como presentes.
- Si las listas de columnas tienen longitudes distintas, `n_rows` es la mas larga y las posiciones fuera de rango de una columna corta cuentan como presentes (0).
@@ -1,107 +0,0 @@
-"""missingness_row_patterns — distinct per-row missingness patterns (missingno matrix style).
-
-Pure function: no I/O, deterministic, NEVER raises. Given a per-column null mask
-aligned by row ({col: [0/1, ...]}, 1 = missing), it groups rows by their missing
-"pattern" — the sorted tuple of column names that are missing in that row — and
-counts how often each distinct pattern occurs.
-
-This mirrors the missingno matrix idea: instead of plotting per-cell nullity, it
-collapses each row to the SET of columns it lacks, surfacing co-missing structure
-(e.g. "A and B always go missing together"). The empty pattern (a fully complete
-row) is a first-class pattern and may appear in the result with missing_cols=[];
-the caller labels it "(fila completa)".
-"""
-
-
-def _is_missing(cell) -> bool:
-    """A cell counts as missing when it equals 1 (truthy 0/1 mask).
-
-    None / 0 / non-numeric are treated as present. Defensive: never raises.
-    """
-    try:
-        return int(cell) == 1
-    except (TypeError, ValueError):
-        return bool(cell)
-
-
-def missingness_row_patterns(null_mask, top_n=10) -> dict:
-    """Count distinct per-row missingness patterns from a column null mask.
-
-    For each row, its pattern is the sorted tuple of column names missing in that
-    row (the columns whose value is 1). The frequency of each distinct pattern is
-    counted, including the empty pattern (a complete row with nothing missing).
-
-    Args:
-        null_mask: Dict {col: [0/1, ...]} aligned by row, where 1 means the cell
-            is missing in that row. Read defensively; columns with differing
-            lengths are tolerated (n_rows is the longest list; out-of-range cells
-            count as present). Empty dict -> n_rows 0.
-        top_n: Maximum number of patterns returned in `patterns`, ranked by
-            n_rows desc (tiebreak: fewer columns first, then column names). The
-            full count of distinct patterns is always reported in `n_patterns`.
-
-    Returns:
-        Dict:
-        {
-          "n_rows": int,            # total rows
-          "n_patterns": int,        # distinct patterns (incl. the empty pattern)
-          "complete_rows": int,     # rows with the empty pattern (nothing missing)
-          "patterns": [             # top_n patterns, n_rows desc
-             {"missing_cols": [col, ...], "n_rows": int, "pct": float}  # [] = complete row
-          ],
-        }
-        For {} (or a non-dict) returns n_rows 0 and patterns []. NEVER raises.
-    """
-    empty = {"n_rows": 0, "n_patterns": 0, "complete_rows": 0, "patterns": []}
-    if not isinstance(null_mask, dict) or not null_mask:
-        return empty
-
-    # Stable, canonical column order so each row's pattern tuple is sorted.
-    items = sorted(null_mask.items(), key=lambda kv: str(kv[0]))
-    names = [str(k) for k, _ in items]
-    lists = [v if isinstance(v, (list, tuple)) else [] for _, v in items]
-
-    n_rows = max((len(lst) for lst in lists), default=0)
-    if n_rows == 0:
-        return empty
-
-    # Defensive parsing of top_n.
-    try:
-        limit = int(top_n)
-    except (TypeError, ValueError):
-        limit = 10
-    if limit < 0:
-        limit = 0
-
-    counts: dict = {}
-    n_cols = len(names)
-    for r in range(n_rows):
-        # names is sorted, so iterating in order yields an already-sorted tuple.
-        pattern = tuple(
-            names[c]
-            for c in range(n_cols)
-            if r < len(lists[c]) and _is_missing(lists[c][r])
-        )
-        counts[pattern] = counts.get(pattern, 0) + 1
-
-    complete_rows = counts.get((), 0)
-    n_patterns = len(counts)
-
-    # Rank: n_rows desc, then fewer columns first, then column names (deterministic).
-    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], len(kv[0]), kv[0]))
-
-    patterns = [
-        {
-            "missing_cols": list(pat),
-            "n_rows": cnt,
-            "pct": round(100.0 * cnt / n_rows, 2),
-        }
-        for pat, cnt in ordered[:limit]
-    ]
-
-    return {
-        "n_rows": n_rows,
-        "n_patterns": n_patterns,
-        "complete_rows": complete_rows,
-        "patterns": patterns,
-    }
@@ -1,87 +0,0 @@
-"""Tests para missingness_row_patterns."""
-
-import os
-import sys
-
-sys.path.insert(0, os.path.dirname(__file__))
-
-from missingness_row_patterns import missingness_row_patterns
-
-_EXPECTED_KEYS = {"n_rows", "n_patterns", "complete_rows", "patterns"}
-
-
-def test_patron_dominante_completas_singleton():
-    """Golden: {A,B} co-faltan en 4 filas + 5 filas completas + 1 singleton {C}."""
-    # 10 filas. A y B faltan juntas en las filas 0-3; filas 4-8 completas;
-    # la fila 9 solo le falta C.
-    null_mask = {
-        "A": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
-        "B": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
-        "C": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
-    }
-    out = missingness_row_patterns(null_mask)
-
-    assert set(out.keys()) == _EXPECTED_KEYS
-    assert out["n_rows"] == 10
-    # 3 patrones distintos: (A,B), () y (C,).
-    assert out["n_patterns"] == 3
-    # 5 filas completas (filas 4-8).
-    assert out["complete_rows"] == 5
-
-    # Orden: n_rows desc; desempate menos columnas primero.
-    # () tiene 5 filas, (A,B) 4, (C,) 1.
-    pats = out["patterns"]
-    assert len(pats) == 3
-
-    assert pats[0]["missing_cols"] == []
-    assert pats[0]["n_rows"] == 5
-    assert pats[0]["pct"] == 50.0
-
-    assert pats[1]["missing_cols"] == ["A", "B"]
-    assert pats[1]["n_rows"] == 4
-    assert pats[1]["pct"] == 40.0
-
-    assert pats[2]["missing_cols"] == ["C"]
-    assert pats[2]["n_rows"] == 1
-    assert pats[2]["pct"] == 10.0
-
-    # Tipos de salida.
-    assert isinstance(out["n_rows"], int)
-    assert isinstance(pats[0]["pct"], float)
-
-
-def test_mask_vacio():
-    """{} -> n_rows 0, sin patrones, nunca lanza."""
-    out = missingness_row_patterns({})
-    assert out == {
-        "n_rows": 0,
-        "n_patterns": 0,
-        "complete_rows": 0,
-        "patterns": [],
-    }
-    # No dict / None tambien degradan a vacio sin lanzar.
-    assert missingness_row_patterns(None)["n_rows"] == 0
-    # Columnas presentes pero listas vacias -> n_rows 0.
-    assert missingness_row_patterns({"A": [], "B": []})["patterns"] == []
-
-
-def test_top_n_trunca_pero_cuenta_todos():
-    """top_n limita `patterns`, pero n_patterns reporta TODOS los distintos."""
-    null_mask = {
-        "A": [0, 1, 1, 0, 1],
-        "B": [0, 0, 0, 1, 1],
-        "C": [0, 0, 0, 0, 1],
-    }
-    # Filas: ()  (A,)  (A,)  (B,)  (A,B,C)
-    out = missingness_row_patterns(null_mask, top_n=2)
-
-    assert out["n_rows"] == 5
-    assert out["n_patterns"] == 4          # (), (A,), (B,), (A,B,C)
-    assert out["complete_rows"] == 1
-    # Solo 2 patrones devueltos pese a haber 4.
-    assert len(out["patterns"]) == 2
-    # (A,) domina con 2 filas; desempate del 2o entre los de 1 fila -> () (0 cols).
-    assert out["patterns"][0]["missing_cols"] == ["A"]
-    assert out["patterns"][0]["n_rows"] == 2
-    assert out["patterns"][1]["missing_cols"] == []
-    assert out["patterns"][1]["n_rows"] == 1
@@ -0,0 +1,122 @@
+---
+id: relationship_scatter_figure_py_datascience
+name: relationship_scatter_figure
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def relationship_scatter_figure(xs: list, ys: list, x_label: str = \"\", y_label: str = \"\", classification: dict = None, max_points: int = 2000) -> \"matplotlib.figure.Figure\""
+description: "Construye una figura matplotlib scatter de un par de variables numéricas con su curva/recta de ajuste y una anotación del tipo de relación (lineal, polinómica grado 2/3, monótona no-lineal, etc.) más sus métricas (r, ρ, R²lin, R²poly). Consume el dict de classify_relationship_type; si es None lo calcula internamente reusando esa función. Devuelve un matplotlib.figure.Figure listo para rasterizar por el renderer del informe EDA (PDF/PPTX). Backend Agg sin pyplot global; downsample determinista de los puntos dibujados; defensivo ante vacío/None."
+tags: [eda, correlation, scatter, relationship, matplotlib, figure, visualization, datascience, impure]
+uses_functions: [classify_relationship_type_py_datascience]
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: [matplotlib, numpy]
+example: |
+  from relationship_scatter_figure import relationship_scatter_figure
+  xs = [float(i) for i in range(100)]
+  ys = [0.5 * x * x - x + 3 for x in xs]
+  classification = {
+      "tipo": "polinómica (grado 2)", "pearson": 0.97, "spearman": 0.99,
+      "r2_linear": 0.92, "r2_poly2": 0.999, "r2_poly3": 0.999,
+      "best_degree": 2, "coeffs": [0.5, -1.0, 3.0],
+  }
+  fig = relationship_scatter_figure(xs, ys, x_label="dosis", y_label="efecto", classification=classification)
+tested: true
+tests:
+  - "test_returns_figure"
+  - "test_downsample_determinista"
+  - "test_empty_no_lanza"
+  - "test_classification_none"
+test_file_path: "python/functions/datascience/relationship_scatter_figure_test.py"
+file_path: "python/functions/datascience/relationship_scatter_figure.py"
+params:
+  - name: xs
+    desc: "Lista (o tupla) de valores x. Se emparejan por índice con ys. Valores None, bool, NaN o inf descartan ese par (lectura defensiva)."
+  - name: ys
+    desc: "Lista (o tupla) de valores y, paralela a xs. Mismas reglas defensivas que xs."
+  - name: x_label
+    desc: "Etiqueta del eje/título para la variable x. Default \"\" (en el título cae a \"x\")."
+  - name: y_label
+    desc: "Etiqueta del eje/título para la variable y. Default \"\" (en el título cae a \"y\")."
+  - name: classification
+    desc: "Opcional. Dict de classify_relationship_type con claves tipo, pearson, r2_linear, spearman, r2_poly2, r2_poly3, best_degree, coeffs. Si es None se calcula internamente importando y llamando a classify_relationship_type sobre los pares limpios (self-contained). Si el módulo hermano no está disponible, se dibuja el scatter sin curva de ajuste ni anotación. Default None."
+  - name: max_points
+    desc: "Tope del nº de puntos DIBUJADOS. Si los pares limpios superan el tope, la nube se submuestrea por paso fijo ceil(n/max_points) tomando pairs[::step] — DETERMINISTA, no aleatorio, reproducible. La clasificación/ajuste usa SIEMPRE todos los pares limpios; el downsample solo adelgaza el dibujo. Valor no-positivo o no-int desactiva el downsample. Default 2000."
+output: "Un matplotlib.figure.Figure (figsize 6.4x4.0, dpi 150) con un Axes scatter (puntos semitransparentes alpha 0.5, color #4C72B0), la curva/recta de ajuste (numpy.polyval sobre coeffs, color #C44E52) cuando hay un ajuste polinómico disponible, título \"{x_label} ↔ {y_label}\", labels de ejes y una caja de anotación en la esquina superior izquierda con el tipo de relación y las métricas disponibles (r, ρ, R²lin, R²poly; se omiten las None). Si tras la limpieza hay menos de 2 pares válidos, devuelve igualmente una Figure con un texto centrado \"Sin datos suficientes para el scatter\" (nunca lanza). El caller rasteriza/cierra la figura; la función no la muestra ni la guarda."
+---
+
+## Ejemplo
+
+```python
+from relationship_scatter_figure import relationship_scatter_figure
+
+# Par numérico con relación cuadrática y su clasificación (de
+# classify_relationship_type). Pasándola explícita evitas recomputarla.
+xs = [float(i) for i in range(100)]
+ys = [0.5 * x * x - x + 3 for x in xs]
+classification = {
+    "tipo": "polinómica (grado 2)",
+    "pearson": 0.97,
+    "spearman": 0.99,
+    "r2_linear": 0.92,
+    "r2_poly2": 0.999,
+    "r2_poly3": 0.999,
+    "best_degree": 2,
+    "coeffs": [0.5, -1.0, 3.0],
+}
+
+fig = relationship_scatter_figure(
+    xs, ys, x_label="dosis", y_label="efecto", classification=classification
+)
+
+# El renderer del informe lo rasteriza; aquí solo persistimos para inspección.
+fig.savefig("/tmp/scatter_dosis_efecto.png")
+
+# Con classification=None la función la calcula internamente (self-contained):
+fig2 = relationship_scatter_figure(xs, ys, x_label="dosis", y_label="efecto")
+```
+
+## Cuando usarla
+
+Úsala dentro del informe EDA automático cuando quieras visualizar de un vistazo
+la relación entre dos variables numéricas: la nube de puntos, la curva que mejor
+la ajusta y una etiqueta legible del tipo de relación con sus métricas. Es la
+pareja "vista humana" de `classify_relationship_type`: esa función decide el
+tipo y los coeficientes; esta los pinta en una `Figure` que el renderer del
+informe rasteriza a PDF/PPTX. Pásale el dict de clasificación si ya lo tienes
+calculado (evitas recomputar el ajuste); si no, déjalo en `None` y la función lo
+resuelve sola sobre los pares limpios. Pensada para móvil: anotación pequeña
+(fontsize 8) y nube adelgazada por `max_points` para que el PDF no pese.
+
+## Gotchas
+
+- **Impura por matplotlib.** Toca la maquinaria de render. Usa el backend `Agg`
+  y la API orientada a objetos `Figure`/`add_subplot` — NUNCA `pyplot.*` aquí,
+  para no tocar el estado global ni filtrar figuras entre llamadas. `pyplot` NO
+  es thread-safe; esta función lo evita construyendo el `Figure` directamente,
+  así que es segura de llamar en bucle desde el renderer.
+- **El caller cierra la figura.** Devuelve el `Figure` pero no lo muestra ni lo
+  guarda. Quien la consume debe rasterizarla y luego liberarla
+  (`matplotlib.pyplot.close(fig)`) para no acumular memoria en lotes grandes de
+  pares de columnas.
+- **Downsample determinista, solo del dibujo.** Cuando los pares limpios superan
+  `max_points`, la nube DIBUJADA se adelgaza por paso fijo `pairs[::step]`
+  (reproducible, no aleatorio). La clasificación y el ajuste usan SIEMPRE todos
+  los pares limpios; el downsample no altera las métricas ni la curva.
+- **`classification=None` ⇒ se calcula sola.** Importa y llama a
+  `classify_relationship_type` sobre los pares limpios. Si ese módulo hermano no
+  está disponible (entorno incompleto), NO lanza: dibuja el scatter sin curva de
+  ajuste ni anotación. Pasar la clasificación explícita es más barato (no
+  recomputa el ajuste).
+- **Sin curva para `monótona no-lineal`.** Cuando `coeffs` es `None` o
+  `best_degree` es `None` (p.ej. tipo "monótona no-lineal"), no se pinta recta
+  polinómica — solo la nube y la anotación. Tampoco se dibuja la curva si el
+  rango de x es nulo (todos los x iguales). Nunca falla por esto.
+- **Defensiva, nunca lanza.** `xs=[]`, `ys=[]`, menos de 2 pares válidos, ends
+  `None`/`bool`/`NaN`/`inf` o `coeffs` malformado se manejan sin error: en el
+  peor caso devuelve una `Figure` con "Sin datos suficientes para el scatter".
+  No envuelvas la llamada en try/except por miedo a un raise — no lo hay.
@@ -0,0 +1,322 @@
+"""Impure EDA helper: scatter figure of a numeric pair with its fit (`eda` group).
+
+Builds a matplotlib scatter of two numeric variables, overlays the fitted
+curve/line implied by the relationship classification (linear, polynomial of
+degree 2/3, etc.) and annotates the relationship type with its available
+metrics. Returns a ready-to-rasterize ``matplotlib.figure.Figure``; it never
+shows nor saves it.
+
+Impure because it touches matplotlib's rendering machinery. It uses the headless
+Agg backend and the object-oriented ``Figure`` API (no ``pyplot``) so it leaks no
+global state and is safe to call repeatedly from a report renderer.
+
+To keep the rendered PDF/PPTX light on phones, when the number of valid pairs
+exceeds ``max_points`` the *plotted* points are down-sampled DETERMINISTICALLY by
+a fixed step (``pairs[::step]``), never randomly, so the output is reproducible.
+The classification/fit always uses every clean pair; the down-sample only thins
+the drawn cloud.
+"""
+
+import math
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+import numpy as np  # noqa: E402
+from matplotlib.figure import Figure  # noqa: E402
+
+# Sober blue for the scatter cloud and red for the fitted curve (Tufte: the
+# data points are the primary ink, the fit is the secondary highlight).
+_POINT_COLOR = "#4C72B0"
+_FIT_COLOR = "#C44E52"
+# Muted gray for the no-data fallback message.
+_MUTED_TEXT = "#5f6b7a"
+
+
+def _finite(value):
+    """Coerce ``value`` to a finite float, or return None when not usable.
+
+    bool is a subclass of int, but a real numeric measurement is never a bool,
+    so True/False are treated as missing instead of coercing to 1.0/0.0. NaN and
+    +/-infinity are never valid either.
+    """
+    if value is None or isinstance(value, bool):
+        return None
+    try:
+        f = float(value)
+    except (TypeError, ValueError):
+        return None
+    if math.isnan(f) or math.isinf(f):
+        return None
+    return f
+
+
+def _clean_pairs(xs, ys):
+    """Pair ``xs[i], ys[i]`` by index, dropping any pair with a non-finite end."""
+    pairs = []
+    if isinstance(xs, (list, tuple)) and isinstance(ys, (list, tuple)):
+        n = min(len(xs), len(ys))
+        for i in range(n):
+            x = _finite(xs[i])
+            y = _finite(ys[i])
+            if x is None or y is None:
+                continue
+            pairs.append((x, y))
+    return pairs
+
+
+def _ordered_trend(xs_clean, ys_clean, n_bins: int = 12):
+    """Return (x_trend, y_trend): the ordered trend of y over x for a monotonic
+    relationship that has no polynomial fit.
+
+    When x has few distinct values (an ordinal/discrete scale) the trend is the
+    mean of y per distinct x value. Otherwise x is split into ``n_bins`` ordered
+    quantile bins and each point is (mean x, mean y) of the bin. Returns
+    ``(None, None)`` when there is nothing meaningful to draw.
+    """
+    x_arr = np.asarray(xs_clean, dtype=float)
+    y_arr = np.asarray(ys_clean, dtype=float)
+    if x_arr.size < 2:
+        return None, None
+    uniq = np.unique(x_arr)
+    if uniq.size <= max(2, n_bins):
+        # Discrete x: one trend point per distinct value (mean y).
+        xt = uniq
+        yt = np.array([float(np.mean(y_arr[x_arr == ux])) for ux in uniq])
+        return xt, yt
+    # Continuous x: ordered quantile bins, (mean x, mean y) per bin.
+    order = np.argsort(x_arr, kind="stable")
+    x_sorted = x_arr[order]
+    y_sorted = y_arr[order]
+    chunks_x = np.array_split(x_sorted, n_bins)
+    chunks_y = np.array_split(y_sorted, n_bins)
+    xt = np.array([float(np.mean(cx)) for cx in chunks_x if cx.size])
+    yt = np.array([float(np.mean(cy)) for cy in chunks_y if cy.size])
+    return xt, yt
+
+
+def _no_data_figure(message: str) -> "matplotlib.figure.Figure":
+    """A bare Figure carrying a centered muted message (defensive fallback)."""
+    fig = Figure(figsize=(6.4, 4.0), dpi=150)
+    ax = fig.add_subplot(111)
+    ax.axis("off")
+    ax.text(
+        0.5,
+        0.5,
+        message,
+        ha="center",
+        va="center",
+        fontsize=12,
+        color=_MUTED_TEXT,
+        transform=ax.transAxes,
+    )
+    fig.tight_layout()
+    return fig
+
+
+def _metrics_caption(classification: dict) -> str:
+    """Format the available metrics of a classification dict into one line.
+
+    Omits the metrics that are None. Keys consumed (any may be absent/None):
+    ``pearson`` (r), ``spearman`` (rho), ``r2_linear`` (R²lin) and the best
+    polynomial R² (``r2_poly3`` if a cubic was the best fit, else ``r2_poly2``).
+    """
+    parts = []
+    r = _finite(classification.get("pearson"))
+    if r is not None:
+        parts.append(f"r={r:.2f}")
+    rho = _finite(classification.get("spearman"))
+    if rho is not None:
+        parts.append(f"ρ={rho:.2f}")
+    r2_lin = _finite(classification.get("r2_linear"))
+    if r2_lin is not None:
+        parts.append(f"R²lin={r2_lin:.2f}")
+    # Prefer the R² of the best polynomial degree when it is a poly fit.
+    best_degree = classification.get("best_degree")
+    r2_poly = None
+    if best_degree == 3:
+        r2_poly = _finite(classification.get("r2_poly3"))
+    elif best_degree == 2:
+        r2_poly = _finite(classification.get("r2_poly2"))
+    if r2_poly is None:
+        # Fall back to whichever poly R² is present (cubic first).
+        r2_poly = _finite(classification.get("r2_poly3"))
+        if r2_poly is None:
+            r2_poly = _finite(classification.get("r2_poly2"))
+    if r2_poly is not None:
+        parts.append(f"R²poly={r2_poly:.2f}")
+    return "  ".join(parts)
+
+
+def relationship_scatter_figure(
+    xs: list,
+    ys: list,
+    x_label: str = "",
+    y_label: str = "",
+    classification: dict = None,
+    max_points: int = 2000,
+) -> "matplotlib.figure.Figure":
+    """Build a scatter figure of a numeric pair with its fit and a type label.
+
+    Cleans the pairs defensively (drops any pair with a None/bool/NaN/inf end),
+    plots a semi-transparent scatter cloud (down-sampled deterministically when
+    it exceeds ``max_points``), overlays the polynomial fit implied by
+    ``classification`` and annotates the relationship type plus its available
+    metrics in a corner box.
+
+    The fit and classification always use every clean pair; only the drawn cloud
+    is thinned by the down-sample. When ``classification`` is None it is computed
+    internally by reusing ``classify_relationship_type`` over the clean pairs, so
+    the function is self-contained.
+
+    The function is fully defensive: empty input, fewer than 2 clean pairs, a
+    missing/None ``coeffs`` or a missing sibling classifier never raise. When
+    there is nothing valid to draw it still returns a ``Figure`` carrying a
+    centered "Sin datos suficientes para el scatter" message.
+
+    Args:
+        xs: List (or tuple) of x values. Paired by index with ``ys``. Values that
+            are None, bool, NaN or infinite discard that pair. Read defensively.
+        ys: List (or tuple) of y values, parallel to ``xs``. Same defensive rules.
+        x_label: Axis/title label for the x variable. Default "" (falls back to
+            "x" in the title).
+        y_label: Axis/title label for the y variable. Default "" (falls back to
+            "y" in the title).
+        classification: Optional dict from ``classify_relationship_type`` with
+            keys ``tipo, pearson, r2_linear, spearman, r2_poly2, r2_poly3,
+            best_degree, coeffs``. When None, it is computed internally by
+            importing and calling ``classify_relationship_type`` over the clean
+            pairs. When that sibling module is unavailable, the scatter is still
+            drawn (no fit curve, no annotation).
+        max_points: Cap on the number of *plotted* points. When the number of
+            clean pairs exceeds this cap, the drawn cloud is down-sampled by a
+            fixed step ``ceil(n/max_points)`` taking ``pairs[::step]`` —
+            DETERMINISTIC, not random, so the figure is reproducible. A
+            non-positive or non-int value disables down-sampling. Default 2000.
+
+    Returns:
+        A ``matplotlib.figure.Figure`` (figsize 6.4x4.0, dpi 150) with a single
+        scatter Axes, the fitted curve (when a polynomial fit is available) and a
+        corner annotation with the relationship type and metrics. When there are
+        fewer than 2 clean pairs it returns a Figure with a centered "Sin datos
+        suficientes para el scatter" message. The caller rasterizes/closes it.
+    """
+    pairs = _clean_pairs(xs, ys)
+    if len(pairs) < 2:
+        return _no_data_figure("Sin datos suficientes para el scatter")
+
+    # Full clean coordinates feed the classification/fit; the plotted cloud is
+    # what gets thinned.
+    xs_clean = [p[0] for p in pairs]
+    ys_clean = [p[1] for p in pairs]
+
+    # Resolve the classification. If not provided, reuse the sibling classifier
+    # over ALL clean pairs (self-contained). Missing module => no fit/annotation.
+    cls = classification
+    if cls is None:
+        try:
+            from classify_relationship_type import classify_relationship_type
+
+            cls = classify_relationship_type(xs_clean, ys_clean)
+        except Exception:
+            cls = None
+    if not isinstance(cls, dict):
+        cls = {}
+
+    # --- Deterministic down-sampling of the DRAWN points only.
+    n_total = len(pairs)
+    if (
+        isinstance(max_points, int)
+        and not isinstance(max_points, bool)
+        and max_points > 0
+        and n_total > max_points
+    ):
+        step = math.ceil(n_total / max_points)
+        sampled = pairs[::step]
+    else:
+        sampled = pairs
+
+    x_plot = [p[0] for p in sampled]
+    y_plot = [p[1] for p in sampled]
+
+    fig = Figure(figsize=(6.4, 4.0), dpi=150)
+    ax = fig.add_subplot(111)
+
+    ax.scatter(
+        x_plot,
+        y_plot,
+        s=12,
+        alpha=0.5,
+        color=_POINT_COLOR,
+        edgecolors="none",
+        rasterized=True,
+    )
+
+    # --- Fitted curve/line over the full clean x range.
+    coeffs = cls.get("coeffs")
+    best_degree = cls.get("best_degree")
+    tipo = cls.get("tipo")
+    x_min, x_max = min(xs_clean), max(xs_clean)
+    drew_fit = False
+    if coeffs is not None and best_degree is not None and x_max > x_min:
+        try:
+            coeff_arr = np.asarray(coeffs, dtype=float)
+            if coeff_arr.ndim == 1 and coeff_arr.size > 0 and np.all(np.isfinite(coeff_arr)):
+                x_line = np.linspace(x_min, x_max, 200)
+                y_line = np.polyval(coeff_arr, x_line)
+                if np.all(np.isfinite(y_line)):
+                    ax.plot(x_line, y_line, color=_FIT_COLOR, linewidth=2)
+                    drew_fit = True
+        except Exception:
+            # Never fail the figure because of a malformed coeffs array.
+            pass
+
+    # A monotonic non-linear relationship has no fitted polynomial (coeffs is
+    # None by design — a low-degree polynomial would mislead). Draw instead the
+    # ordered trend of y over x so the reader still sees the shape: y averaged
+    # within ordered x-bins (or per distinct x value when x is discrete with few
+    # levels, e.g. an ordinal scale). Defensive: any failure leaves the cloud.
+    if (not drew_fit and isinstance(tipo, str) and "monóton" in tipo.lower()
+            and x_max > x_min):
+        try:
+            xt, yt = _ordered_trend(xs_clean, ys_clean)
+            if xt is not None and len(xt) >= 2:
+                ax.plot(xt, yt, color=_FIT_COLOR, linewidth=2, marker="o",
+                        markersize=3)
+        except Exception:
+            pass
+
+    # --- Labels and title.
+    tx = x_label if x_label else "x"
+    ty = y_label if y_label else "y"
+    ax.set_title(f"{tx} ↔ {ty}", fontsize=12, loc="left", pad=8)
+    ax.set_xlabel(x_label)
+    ax.set_ylabel(y_label)
+
+    # --- Corner annotation: relationship type + available metrics.
+    caption_lines = []
+    if tipo:
+        caption_lines.append(str(tipo))
+    metrics_line = _metrics_caption(cls)
+    if metrics_line:
+        caption_lines.append(metrics_line)
+    if caption_lines:
+        ax.text(
+            0.03,
+            0.97,
+            "\n".join(caption_lines),
+            transform=ax.transAxes,
+            ha="left",
+            va="top",
+            fontsize=8,
+            bbox=dict(
+                boxstyle="round,pad=0.35",
+                facecolor="white",
+                edgecolor="#cccccc",
+                alpha=0.85,
+            ),
+        )
+
+    fig.tight_layout()
+    return fig
@@ -0,0 +1,100 @@
+"""Tests para relationship_scatter_figure (scatter de un par numérico, grupo eda).
+
+Usa el backend Agg sin pyplot global; no muestra ni guarda figuras. Cada test
+cierra explícitamente la Figure construida (matplotlib.pyplot.close) para no
+acumular estado entre tests.
+"""
+
+import os
+import sys
+
+sys.path.insert(0, os.path.dirname(__file__))
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+import matplotlib.pyplot as plt  # noqa: E402
+from matplotlib.collections import PathCollection  # noqa: E402
+from matplotlib.figure import Figure  # noqa: E402
+
+from relationship_scatter_figure import relationship_scatter_figure
+
+
+def _scatter_offsets(fig):
+    """Return the plotted points of the first PathCollection (scatter) found."""
+    for ax in fig.axes:
+        for coll in ax.collections:
+            if isinstance(coll, PathCollection):
+                return coll.get_offsets()
+    return None
+
+
+def test_returns_figure():
+    xs = [float(i) for i in range(20)]
+    ys = [2.0 * x + 1.0 for x in xs]  # y = 2x + 1
+    classification = {
+        "tipo": "lineal",
+        "pearson": 1.0,
+        "r2_linear": 1.0,
+        "spearman": 1.0,
+        "r2_poly2": 1.0,
+        "r2_poly3": 1.0,
+        "best_degree": 1,
+        "coeffs": [2.0, 1.0],
+    }
+    fig = relationship_scatter_figure(
+        xs, ys, x_label="a", y_label="b", classification=classification
+    )
+    assert hasattr(fig, "savefig")
+    assert len(fig.axes) >= 1
+    plt.close(fig)
+
+
+def test_downsample_determinista():
+    n = 5000
+    xs = [float(i) for i in range(n)]
+    ys = [0.5 * x for x in xs]
+    classification = {
+        "tipo": "lineal",
+        "pearson": 1.0,
+        "r2_linear": 1.0,
+        "spearman": 1.0,
+        "r2_poly2": 1.0,
+        "r2_poly3": 1.0,
+        "best_degree": 1,
+        "coeffs": [0.5, 0.0],
+    }
+    fig = relationship_scatter_figure(
+        xs, ys, x_label="x", y_label="y", classification=classification, max_points=1000
+    )
+    assert isinstance(fig, Figure)
+    offsets = _scatter_offsets(fig)
+    assert offsets is not None
+    # El nº de puntos dibujados no debe exceder el cap.
+    assert len(offsets) <= 1000
+    plt.close(fig)
+
+
+def test_empty_no_lanza():
+    fig = relationship_scatter_figure([], [], x_label="x", y_label="y")
+    assert isinstance(fig, Figure)
+    plt.close(fig)
+
+
+def test_classification_none():
+    # Solo se ejecuta si el módulo hermano classify_relationship_type existe.
+    try:
+        import classify_relationship_type  # noqa: F401
+    except Exception:
+        import pytest
+
+        pytest.skip("classify_relationship_type aún no disponible")
+    xs = [float(i) for i in range(30)]
+    ys = [3.0 * x - 2.0 for x in xs]
+    fig = relationship_scatter_figure(
+        xs, ys, x_label="a", y_label="b", classification=None
+    )
+    assert isinstance(fig, Figure)
+    assert len(fig.axes) >= 1
+    plt.close(fig)