feat(eda): scatter plots of the most correlated pairs + relationship type in the CORRELACION chapter

Adds to the `correlacion` chapter of AutomaticEDA scatter plots of the most strongly correlated numeric-numeric pairs (positive and negative) and, for each pair, a classification of the relationship type: linear (`lineal`), polynomial of degree 2/3 (`polinómica`), non-linear monotonic (`monótona no-lineal`) or weak/shapeless (`débil/sin forma`).

New registry functions (domain datascience, group eda):

- classify_relationship_type_py_datascience (pure): given two paired numeric lists, combines Pearson r (linear), Spearman ρ (monotonic) and polynomial fits of degree 2 and 3 (numpy.polyfit + manual R²) to label the shape. Reuses pearson and spearman_corr from the registry. Thresholds are calibrated for real, discrete/noisy data (decision order: weak → monotonic → polynomial → linear). Returns the coefficients of the best model so the curve can be drawn. No-throw.
- relationship_scatter_figure_py_datascience (impure): builds the matplotlib Figure for one pair's scatter, with its fitted line/curve and an annotation of the type plus metrics (r, ρ, R²lin, R²poly). Uses the Agg backend without global pyplot, deterministic downsampling of the plotted points, and an ordered trend (binned / by value) for the monotonic case, which has no polynomial. Defensive against empty input.

Chapter correlacion.py (1.0.0 → 1.1.0): new section "Relaciones más fuertes (scatter)" after the matrix + top tables. It takes the top-K num↔num pairs by |value| from profile['correlations']['pairs'], reads each pair's raw data from ctx['raw_numeric'], and emits, per pair, a Figure inside a keep-together Group next to a text note with the relationship type (extractable by pdftotext). Only num↔num pairs qualify: cat↔cat pairs (Cramér's V) and num↔cat pairs (correlation ratio) get no scatter. When there is no raw_numeric (lite/aggregated profile or ctx None) the scatters are omitted without raising; the matrix + tables are still emitted.
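As a rough illustration of the decision order described above, the following minimal sketch takes the four signals as already-computed inputs and returns the label. The name `label_relationship` is hypothetical and not part of the registry; the thresholds (0.3, 0.1, 0.4, 0.02) and the Spanish labels are the ones used by classify_relationship_type in this commit.

```python
def label_relationship(pearson: float, spearman: float,
                       r2_poly2: float, r2_poly3: float) -> str:
    """Hypothetical helper: apply the weak -> monotonic -> polynomial -> linear
    decision order to signals that were computed elsewhere."""
    r2_linear = pearson ** 2
    best_poly = max(r2_poly2, r2_poly3)
    # Parsimony: degree 3 only when it beats degree 2 by more than 0.02.
    degree = 3 if (r2_poly3 - r2_poly2) > 0.02 else 2
    abs_p, abs_s = abs(pearson), abs(spearman)

    if abs_p < 0.3 and abs_s < 0.3 and best_poly < 0.3:
        return "débil/sin forma"
    if (abs_s - abs_p) >= 0.1 and abs_s >= 0.4:
        return "monótona no-lineal"
    if (best_poly - r2_linear) >= 0.1 and best_poly >= 0.3:
        return f"polinómica (grado {degree})"
    return "lineal"


# A parabola on a symmetric domain: both correlations ~0, degree-2 fit perfect.
print(label_relationship(0.0, 0.0, 1.0, 1.0))
# Spearman clearly above Pearson: the monotonic branch wins before the
# polynomial check, so the (assumed) polynomial R^2 values do not matter here.
print(label_relationship(0.86, 1.0, 0.97, 0.999))
```

Because the monotonic check runs before the polynomial one, a curve that a cubic also fits well is still labelled monotonic whenever Spearman exceeds Pearson by at least 0.1.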
Verified: titanic golden EDA (run_models). The Correlación chapter of the PDF and PPTX includes the scatters (pclass↔fare → non-linear monotonic, sibsp↔parch → linear, …) with their fit and the type label as text. Synthetic classification tests (linear, y=x² → polynomial, y=exp(x) → monotonic, noise → weak) + chapter tests (golden with raw_numeric, edge without raw, pair missing a column). The automatic_eda suite + render_automatic_eda pipeline are green (141 passed). fn index reports no errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 20:37:01 +02:00
19 changed files with 1225 additions and 1423 deletions
@@ -72,10 +72,8 @@ from .profile_datetime import profile_datetime
 from .resample_timeseries import resample_timeseries
 from .add_pdf_internal_links import add_pdf_internal_links
 from .suggest_intratable_fk_candidates import suggest_intratable_fk_candidates
-from .draw_join_graph_figure import draw_join_graph_figure

 __all__ = [
-    "draw_join_graph_figure",
    "suggest_intratable_fk_candidates",
    "detect_time_column",
    "extract_timeseries_raw",
@@ -31,7 +31,7 @@ import math

 from .. import model

-CHAPTER_VERSION = "1.0.0"
+CHAPTER_VERSION = "1.1.0"
 CHAPTER_ID = "correlacion"
 CHAPTER_TITLE = "Correlación"

@@ -47,6 +47,13 @@ _MAX_MATRIX_LABELS = 16
 # How many pairs to show in each of the top-positive / top-negative tables.
 _TOP_N = 10

+# How many of the strongest numeric-numeric pairs to draw as scatter plots on
+# each sign (positive / negative). A scatter per pair carries a fitted line/curve
+# and a relationship-type label; keeping the count small keeps the chapter
+# readable on a phone / a slide. Only signed (Pearson/Spearman) pairs qualify —
+# Cramér's V / correlation ratio pairs are not numeric-numeric, so no scatter.
+_SCATTER_TOP_N = 3
+
 # Glossary terms this chapter explains. Each is registered in the shared
 # collector (ctx['glossary']) and marked clickable on its first appearance in the
 # body — the canonical two-step pattern (see ``cat_distr`` for the reference
@@ -314,6 +321,139 @@ def _fdr_text(corr: dict, mark_term: bool = False) -> str | None:
    return " ".join(parts)


+def _is_seq(values) -> bool:
+    """True for a non-empty list/tuple of values (a raw numeric column)."""
+    return isinstance(values, (list, tuple)) and len(values) > 0
+
+
+def _select_scatter_pairs(pairs: list, top_n: int = _SCATTER_TOP_N):
+    """Pick the strongest numeric-numeric pairs to draw as scatters.
+
+    Only signed (Pearson/Spearman) pairs are numeric-numeric and thus eligible
+    for a scatter with a fitted curve. Returns up to ``top_n`` of the strongest
+    positive pairs followed by up to ``top_n`` of the strongest negative ones,
+    each ranked by magnitude. Unsigned metrics (Cramér's V, correlation ratio,
+    mutual information) are excluded: they have no x/y scatter interpretation.
+    """
+    positive = []
+    negative = []
+    for pair in pairs:
+        if not isinstance(pair, dict) or not _is_signed(pair):
+            continue
+        value = pair.get("value")
+        if not _is_num(value):
+            continue
+        if value > 0:
+            positive.append(pair)
+        elif value < 0:
+            negative.append(pair)
+    positive.sort(key=lambda p: abs(float(p.get("value", 0.0))), reverse=True)
+    negative.sort(key=lambda p: abs(float(p.get("value", 0.0))), reverse=True)
+    return positive[:top_n] + negative[:top_n]
+
+
+def _classification_note(a: str, b: str, cls: dict) -> str:
+    """Human-readable sentence describing the relationship of a pair.
+
+    Plain text (not baked into the figure image) so the type label is selectable
+    in the PDF / extractable by pdftotext, and sits right next to its scatter
+    inside the keep-together Group.
+    """
+    tipo = model._safe_str(cls.get("tipo")) or "sin forma clara"
+    bits = []
+    pearson = cls.get("pearson")
+    spearman = cls.get("spearman")
+    r2_lin = cls.get("r2_linear")
+    r2_poly = None
+    for key in ("r2_poly2", "r2_poly3"):
+        v = cls.get(key)
+        if _is_num(v) and (r2_poly is None or float(v) > r2_poly):
+            r2_poly = float(v)
+    if _is_num(pearson):
+        bits.append(f"Pearson r={float(pearson):+.2f}")
+    if _is_num(spearman):
+        bits.append(f"Spearman ρ={float(spearman):+.2f}")
+    if _is_num(r2_lin):
+        bits.append(f"R² lineal={float(r2_lin):.2f}")
+    if r2_poly is not None:
+        bits.append(f"R² polinómico={r2_poly:.2f}")
+    metrics = "; ".join(bits)
+    text = (f"Relación **{tipo}** entre «{a}» y «{b}»."
+            + (f" {metrics}." if metrics else ""))
+    return text
+
+
+def _scatter_blocks(pairs: list, raw_numeric):
+    """Build keep-together scatter Groups for the strongest num-num pairs.
+
+    Returns a list of blocks (a Heading plus one Group per pair), or an empty
+    list when there is no raw numeric data (e.g. the lite profile drops
+    ``ctx['raw_numeric']`` to skip live recomputation) or the relationship
+    helpers are unavailable. Never raises: any failure degrades to no scatters,
+    leaving the matrix + tables intact.
+    """
+    if not isinstance(raw_numeric, dict) or not raw_numeric:
+        return []
+    selected = _select_scatter_pairs(pairs)
+    if not selected:
+        return []
+
+    # The relationship helpers live in the datascience package. Import lazily so
+    # the chapter still builds (matrix + tables) when they are absent.
+    try:
+        from datascience.classify_relationship_type import (
+            classify_relationship_type,
+        )
+        from datascience.relationship_scatter_figure import (
+            relationship_scatter_figure,
+        )
+    except Exception:  # noqa: BLE001 — degrade, never break the chapter.
+        return []
+
+    groups = []
+    for pair in selected:
+        a = pair.get("a")
+        b = pair.get("b")
+        xs = raw_numeric.get(a)
+        ys = raw_numeric.get(b)
+        # Edge: a selected pair has no raw column (aggregated profile, renamed
+        # column, …) — skip just that pair, keep the rest.
+        if not _is_seq(xs) or not _is_seq(ys):
+            continue
+        try:
+            cls = classify_relationship_type(list(xs), list(ys)) or {}
+        except Exception:  # noqa: BLE001
+            continue
+        a_lbl = model._safe_str(a)
+        b_lbl = model._safe_str(b)
+
+        def _make(xs=xs, ys=ys, a_lbl=a_lbl, b_lbl=b_lbl, cls=cls):
+            return relationship_scatter_figure(
+                list(xs), list(ys), x_label=a_lbl, y_label=b_lbl,
+                classification=cls)
+
+        groups.append(model.Group(blocks=[
+            model.Heading(text=f"{a_lbl} ↔ {b_lbl}", level=2),
+            model.Figure(
+                make=_make,
+                caption=(f"Dispersión de «{a_lbl}» frente a «{b_lbl}» con la "
+                         "curva de ajuste del mejor modelo.")),
+            model.Markdown(text=_classification_note(a_lbl, b_lbl, cls)),
+        ]))
+
+    if not groups:
+        return []
+    intro = model.Markdown(text=(
+        "Para los pares numéricos más fuertes (positivos y negativos) se dibuja "
+        "la nube de puntos con su ajuste y se clasifica el **tipo de relación**: "
+        "**lineal** (una recta basta), **polinómica** (curva de grado 2/3 que "
+        "mejora claramente el ajuste lineal), **monótona no-lineal** (crece o "
+        "decrece siempre pero no en línea recta; Spearman ≫ Pearson) o "
+        "**débil/sin forma**."))
+    return [model.Heading(text="Relaciones más fuertes (scatter)", level=2),
+            intro] + groups
+
+
 def build_correlacion(profile: dict, ctx: dict):
    """Build the Correlation Chapter, or None if there are no pairs to show.

@@ -392,6 +532,18 @@ def build_correlacion(profile: dict, ctx: dict):
            "No se han hallado correlaciones negativas significativas entre "
            "columnas numéricas.")))

+    # 2.5) Scatter plots of the strongest numeric-numeric pairs, each with its
+    # fitted curve and a relationship-type label (lineal / polinómica / monótona
+    # / débil). Needs the raw numeric sample (ctx['raw_numeric'], row-aligned);
+    # when it is absent (aggregated/lite profile) the scatters are simply omitted
+    # and the matrix + tables above stand on their own.
+    raw_numeric = None
+    if isinstance(ctx, dict):
+        raw_numeric = ctx.get("raw_numeric") or profile.get("raw_numeric")
+    else:
+        raw_numeric = profile.get("raw_numeric")
+    blocks.extend(_scatter_blocks(pairs, raw_numeric))
+
    # 3) Spuriousness caveat for level-based correlations (Granger–Newbold).
    caveat = corr.get("levels_caveat")
    if isinstance(caveat, str) and caveat.strip():
@@ -175,6 +175,105 @@ def test_anticorte_matriz_ancha_y_etiquetas_largas_no_se_cortan():
        assert "azufre" in _pdf_text(pdf)


+def _raw_numeric_for_profile(n: int = 80) -> dict:
+    """Row-aligned raw numeric sample matching the signed pairs of _profile().
+
+    Builds columns with a clear, deterministic shape so the relationship-type
+    classifier has something unambiguous to label:
+      - density vs alcohol: strong negative linear (the top-negative pair).
+      - alcohol vs quality: positive linear.
+      - ph, fixed_acidity, sulphates: filler columns for the remaining pairs.
+    """
+    import math as _m
+
+    alcohol = [8.0 + 0.05 * i for i in range(n)]
+    density = [1.0 - 0.002 * a for a in alcohol]           # neg linear vs alcohol
+    quality = [3.0 + 0.4 * a + (0.1 if i % 2 else -0.1)    # pos linear vs alcohol
+               for i, a in enumerate(alcohol)]
+    ph = [3.0 + 0.3 * _m.sin(i / 5.0) for i in range(n)]
+    fixed_acidity = [7.0 - 0.5 * p for p in ph]            # neg linear vs ph
+    sulphates = [0.5 + 0.01 * (i % 7) for i in range(n)]
+    return {
+        "alcohol": alcohol, "density": density, "quality": quality,
+        "ph": ph, "fixed_acidity": fixed_acidity, "sulphates": sulphates,
+    }
+
+
+def test_golden_scatters_de_pares_num_num_con_tipo_de_relacion():
+    """With ctx['raw_numeric'], the chapter adds scatters (a Figure inside a Group)
+    for the strongest num-num pairs, each with its type label as text."""
+    from datascience.automatic_eda.model import Group
+
+    ctx = {"raw_numeric": _raw_numeric_for_profile()}
+    ch = build_correlacion(_profile(), ctx)
+    assert ch is not None
+    groups = [b for b in ch.blocks if isinstance(b, Group)]
+    assert groups, "must emit at least one Group with a scatter"
+    # Each Group carries its (lazy) figure and a text note with the type.
+    for g in groups:
+        gkinds = [b.kind for b in g.blocks]
+        assert "figure" in gkinds and "markdown" in gkinds
+    # The section and the type label appear as plain (extractable) text.
+    headings = " ".join(b.text for b in ch.blocks if b.kind == "heading")
+    assert "Relaciones más fuertes" in headings
+    body = " ".join(b.text for g in groups for b in g.blocks
+                    if b.kind == "markdown")
+    assert any(t in body for t in
+               ("lineal", "polinómica", "monótona", "sin forma"))
+    # The strongest num-num pair (density ↔ alcohol) gets a scatter; the cat-cat
+    # pair (region ↔ type) does NOT, since it is not numeric.
+    assert "density" in body or "alcohol" in body
+    assert "region" not in body and "type" not in body
+
+
+def test_golden_pdf_muestra_scatters_con_etiqueta_de_tipo():
+    """In the PDF, the Correlación chapter includes the scatters and their type
+    label as selectable text (pdftotext finds it)."""
+    prof = _profile()
+    ctx = {"raw_numeric": _raw_numeric_for_profile()}
+    with tempfile.TemporaryDirectory() as d:
+        pdf = os.path.join(d, "corr_scatter.pdf")
+        rp = render_automatic_eda_pdf(prof, pdf, {"title": "EDA — wine",
+                                                  "ctx": ctx})
+        assert rp["path"] == pdf and rp["n_pages"] >= 1
+        txt = _pdf_text(pdf)
+        assert "Relaciones" in txt and "scatter" in txt.lower()
+        # Some relationship-type label, as text.
+        assert any(t in txt for t in
+                   ("lineal", "polin", "monóton", "monoton", "sin forma"))
+
+
+def test_edge_sin_raw_numeric_omite_scatters_sin_lanzar():
+    """Lite profile / ctx None: without raw_numeric the chapter omits the scatters
+    but still emits the matrix + tables (does not raise)."""
+    from datascience.automatic_eda.model import Group
+
+    for ctx in (None, {}, {"raw_numeric": None}, {"raw_numeric": {}}):
+        ch = build_correlacion(_profile(), ctx)
+        assert ch is not None
+        assert not [b for b in ch.blocks if isinstance(b, Group)]
+        # The matrix and at least one top table are still present.
+        assert any(b.kind == "figure" for b in ch.blocks)
+        assert any(b.kind == "data_table" for b in ch.blocks)
+
+
+def test_edge_par_sin_columna_cruda_se_omite_sin_lanzar():
+    """If a selected pair is missing its column in raw_numeric, that pair is
+    skipped (does not raise); the other scatters are still built."""
+    from datascience.automatic_eda.model import Group
+
+    raw = _raw_numeric_for_profile()
+    raw.pop("density", None)   # breaks the density ↔ alcohol pair
+    ch = build_correlacion(_profile(), {"raw_numeric": raw})
+    assert ch is not None
+    groups = [b for b in ch.blocks if isinstance(b, Group)]
+    body = " ".join(b.text for g in groups for b in g.blocks
+                    if b.kind == "markdown")
+    # density disappears from the scatters; other pairs (e.g. ph↔fixed_acidity,
+    # alcohol↔quality) may still be present without error.
+    assert "density" not in body
+
+
 def test_glosario_engancha_metodos_y_fdr():
    """Mejora 4b: los métodos de correlación (Pearson, Spearman, Cramér's V,
    razón de correlación) y la corrección por comparaciones múltiples (FDR) se
@@ -0,0 +1,68 @@
+---
+name: classify_relationship_type
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: pure
+signature: "def classify_relationship_type(xs: list, ys: list) -> dict"
+description: "Classifies the TYPE of relationship between two index-paired numeric variables for the automatic EDA (eda group). Defensively cleans the pairs (drops None/bool/NaN/inf), reuses pearson and spearman_corr from the registry, fits polynomials of degree 2 and 3 with numpy.polyfit (manual R^2), and from those signals labels the shape: 'lineal', 'polinómica (grado 2/3)', 'monótona no-lineal' or 'débil/sin forma'. Decision order: débil -> monótona -> polinómica -> lineal (first match wins), with thresholds calibrated for real discrete/noisy data. Also returns the coefficients of the best model in numpy.polyval order so the fitted curve can be drawn over the scatter. Pure no-throw function: on insufficient data (fewer than 5 valid pairs or variance ~0) or any failure it returns the canonical dict with tipo='débil/sin forma' and every other key set to None."
+tags: [eda, correlation, relationship, classification, polyfit, datascience, pure]
+params:
+  - name: xs
+    desc: "List (or tuple) of numeric values of the first variable, paired by index with ys. Each pair xs[i],ys[i] is dropped if either value is None, bool, NaN or inf. Defensive read."
+  - name: ys
+    desc: "List (or tuple) of numeric values of the second variable, paired by index with xs. Same cleaning rules as xs."
+output: "Dict with ALWAYS the same 8 keys: tipo (str: 'lineal' | 'polinómica (grado 2)' | 'polinómica (grado 3)' | 'monótona no-lineal' | 'débil/sin forma'); pearson (float|None: Pearson coefficient r); r2_linear (float|None: r**2 of the linear fit); spearman (float|None: Spearman rho); r2_poly2 (float|None: R^2 of the degree-2 polynomial fit); r2_poly3 (float|None: R^2 of the degree-3 fit); best_degree (int|None: degree of the chosen model: 1 linear, 2/3 polynomial, None if monótona/débil); coeffs (list|None: coefficients of the best model in numpy.polyval order for drawing the curve, or None). On insufficient data or error: tipo='débil/sin forma' and every other key set to None."
+uses_functions: [pearson_py_datascience, spearman_corr_py_datascience]
+uses_types: []
+returns: []
+returns_optional: false
+error_type: ""
+imports: [numpy]
+tested: true
+tests: ["test_lineal", "test_polinomica_cuadratica", "test_monotona_no_lineal", "test_monotona_exponencial", "test_debil_sin_forma", "test_lista_vacia_no_lanza", "test_longitudes_distintas_no_lanza", "test_todos_none_no_lanza", "test_entradas_none_no_lanza", "test_constante_no_lanza", "test_filtra_nan_inf_bool"]
+test_file_path: "python/functions/datascience/classify_relationship_type_test.py"
+file_path: "python/functions/datascience/classify_relationship_type.py"
+---
+
+## Example
+
+```python
+import sys, os
+sys.path.insert(0, os.path.join("python", "functions"))
+from datascience.classify_relationship_type import classify_relationship_type
+import numpy as np
+
+# Clearly quadratic relationship (a parabola) over a symmetric domain.
+x = list(np.linspace(-10, 10, 60))
+y = [v * v for v in x]
+
+res = classify_relationship_type(x, y)
+print(res["tipo"])         # 'polinómica (grado 2)'
+print(res["best_degree"])  # 2
+print(res["r2_linear"])    # ~0.0  -> linear Pearson does not see the parabola
+print(res["r2_poly2"])     # ~1.0
+print(res["coeffs"])       # ~[1.0, 0.0, 0.0]  -> numpy.polyval(coeffs, x) ~ x**2
+
+# The chapter draws the fitted curve when coeffs is not None:
+#   if res["coeffs"] is not None:
+#       xs_fit = np.linspace(min(x), max(x), 200)
+#       ys_fit = np.polyval(res["coeffs"], xs_fit)
+#       ax.plot(xs_fit, ys_fit)   # curve over the ax.scatter(x, y)
+```
+
+## When to use it
+
+- Use it in the relationships/correlations chapter of the automatic EDA, after detecting two numeric columns with some association, to decide WHICH fitted curve to draw over the scatter (line, parabola, cubic or none) and to attach a readable label to the relationship type.
+- When a low Pearson should not be read as "no relationship": this function combines Pearson with Spearman and with polynomial fits to tell a weak linear relationship apart from a non-linear monotonic one (which the rank does capture) or from a polynomial curve.
+- When you need a deterministic, no-throw entry point that, for the same data, always returns the same `tipo` and the same `coeffs` ready for `numpy.polyval`, without fitting models by hand in the chapter.
+
+## Gotchas
+
+- Pure, deterministic, no-throw function: with fewer than 5 valid pairs, variance ~0 (constant xs or ys) or any internal exception it returns the canonical dict `tipo="débil/sin forma"` with every other key set to `None`. The dict ALWAYS carries the 8 keys: never check for key existence, check for `None`.
+- The decision order matters: `débil -> monótona -> polinómica -> lineal` (first match wins). Monotonicity is evaluated BEFORE the polynomial fit, so a smooth monotonic curve (exp, log, powers) whose Spearman exceeds its Pearson by at least 0.1 comes out as `monótona no-lineal` even if a cubic also fits it; rank dominance (Spearman >> Pearson) is the more interpretable signal. Only a NON-monotonic curved shape lands in `polinómica` (e.g. a parabola: Spearman ~0 but high polynomial R^2).
+- Fixed thresholds (calibrated for EDA on discrete/noisy data, not for formal inference): `débil/sin forma` if all three signals are low at once (`abs(pearson) < 0.3` and `abs(spearman) < 0.3` and `mejor_poly < 0.3`); `monótona no-lineal` if `abs(spearman) - abs(pearson) >= 0.1` and `abs(spearman) >= 0.4`; `polinómica (grado N)` if the best polynomial improves on the linear fit by `>= 0.1` and its R^2 is `>= 0.3`; any other case with signal (not weak) is `lineal`. The 0.3 floor avoids calling real but discrete relationships (counts, ordinal scales) "weak" when their R^2 is low but the direction is clear.
+- `coeffs` is in `numpy.polyval` order (descending degree). For `lineal` it is `[slope, intercept]` (degree 1); for `polinómica`, the coefficients of the chosen degree; for `monótona no-lineal` and `débil/sin forma` it is `None` (the scatter will draw a smoothed curve or nothing, which the chapter decides, not this function).
+- `best_degree` prefers degree 2 over degree 3 when they tie within 0.02 of R^2 (parsimony): do not expect degree 3 unless it clearly improves the fit.
+- Pairs with `None`, `bool`, `NaN` or `inf` are silently dropped by index; `bool` counts as non-numeric (a `True` is not a `1`). The domain of the data affects the result: a parabola over a symmetric domain gives Pearson ~0 (labelled `polinómica`), but over an asymmetric domain Pearson rises and the label may be `lineal`.
@@ -0,0 +1,187 @@
+"""Classify the TYPE of relationship between two paired numeric variables.
+
+Pure function of the eda group. Given two numeric lists paired by index, it
+defensively cleans the pairs, computes the linear (Pearson) and rank
+(Spearman) correlations and polynomial fits of degree 2 and 3, and from those
+signals labels the shape of the relationship for the automatic EDA:
+
+    "lineal" | "polinómica (grado 2)" | "polinómica (grado 3)" |
+    "monótona no-lineal" | "débil/sin forma"
+
+It also returns the coefficients of the best model (in numpy.polyval order)
+so the chapter can draw the fitted curve over the scatter. It reuses the
+registry functions `pearson` and `spearman_corr` instead of reimplementing them.
+
+NEVER raises: on any failure or insufficient data it returns the canonical dict
+with tipo="débil/sin forma" and every other key set to None.
+"""
+
+import math
+import warnings
+
+import numpy as np
+
+from datascience.datascience import pearson
+from datascience.spearman_corr import spearman_corr
+
+# Canonical shape of the response when classification is not possible
+# (insufficient data, zero variance or internal error). Always the same keys.
+_WEAK = {
+    "tipo": "débil/sin forma",
+    "pearson": None,
+    "r2_linear": None,
+    "spearman": None,
+    "r2_poly2": None,
+    "r2_poly3": None,
+    "best_degree": None,
+    "coeffs": None,
+}
+
+
+def _is_num(v) -> bool:
+    """True if v is a finite real number (int/float, not bool, not NaN, not inf)."""
+    return (
+        isinstance(v, (int, float))
+        and not isinstance(v, bool)
+        and not (isinstance(v, float) and (math.isnan(v) or math.isinf(v)))
+    )
+
+
+def _poly_r2(coeffs, x_arr, y_arr, ss_tot: float) -> float:
+    """R^2 of a polynomial fit: 1 - SS_res/SS_tot. 0 if SS_tot==0."""
+    if ss_tot == 0.0:
+        return 0.0
+    pred = np.polyval(coeffs, x_arr)
+    ss_res = float(np.sum((y_arr - pred) ** 2))
+    return 1.0 - ss_res / ss_tot
+
+
+def classify_relationship_type(xs: list, ys: list) -> dict:
+    """Classify the relationship type between two paired numeric variables.
+
+    Pairs xs[i],ys[i] by index and drops the pair if either value is None,
+    bool, NaN or inf. On the clean pairs it computes Pearson r
+    (r2_linear = r**2), Spearman rho and the R^2 of polynomial fits of degree 2
+    and 3 (numpy.polyfit + manual R^2). Those signals decide the label.
+
+    Label evaluation order (first match wins). The thresholds are calibrated
+    for real data, often discrete and noisy (counts, ordinal scales): a
+    relationship with |r| >= 0.3, |rho| >= 0.3 or a polynomial with
+    R^2 >= 0.3 already has SHAPE and must not be labelled "débil".
+        1. "débil/sin forma": all signals are low at once:
+           abs(pearson) < 0.3 and abs(spearman) < 0.3 and mejor_poly < 0.3.
+        2. "monótona no-lineal": the rank (Spearman) captures a monotonic trend
+           that linear Pearson misses: abs(spearman) - abs(pearson) >= 0.1 and
+           abs(spearman) >= 0.4. No polynomial is forced (coeffs/best_degree =
+           None); the chapter draws the ordered trend over the scatter.
+        3. "polinómica (grado N)": the best polynomial clearly improves on the
+           linear fit (mejor_poly - r2_linear >= 0.1) and mejor_poly >= 0.3. N is
+           the degree (2 or 3) with the best R^2, preferring 2 when they tie
+           within 0.02 (parsimony).
+        4. "lineal": everything else. There is signal (not weak) and the shape
+           that exists is essentially linear. best_degree=1, degree-1 coeffs.
+
+    With fewer than 5 valid pairs, or when the variance of xs or ys is ~0
+    (constant), it returns "débil/sin forma" directly.
+
+    Args:
+        xs: list (or tuple) of numeric values of the first variable, paired by
+            index with ys. Pairs with None/bool/NaN/inf are dropped.
+        ys: list (or tuple) of numeric values of the second variable, paired by
+            index with xs.
+
+    Returns:
+        dict with ALWAYS the same keys:
+            tipo (str), pearson (float|None), r2_linear (float|None),
+            spearman (float|None), r2_poly2 (float|None), r2_poly3 (float|None),
+            best_degree (int|None: 1, 2, 3 or None),
+            coeffs (list|None: coefficients in numpy.polyval order, or None).
+        Never raises: on failure or insufficient data it returns the weak dict.
+    """
+    try:
+        if xs is None or ys is None:
+            return dict(_WEAK)
+
+        pairs = [
+            (float(x), float(y))
+            for x, y in zip(xs, ys)
+            if _is_num(x) and _is_num(y)
+        ]
+
+        # Not enough data to talk about the shape of the relationship.
+        if len(pairs) < 5:
+            return dict(_WEAK)
+
+        clean_x = [p[0] for p in pairs]
+        clean_y = [p[1] for p in pairs]
+
+        # Variance ~0 in either series => the relationship is undefined.
+        if len(set(clean_x)) < 2 or len(set(clean_y)) < 2:
+            return dict(_WEAK)
+        x_arr = np.asarray(clean_x, dtype=float)
+        y_arr = np.asarray(clean_y, dtype=float)
+        if float(np.var(x_arr)) < 1e-15 or float(np.var(y_arr)) < 1e-15:
+            return dict(_WEAK)
+
+        # Correlations, reusing the registry functions.
+        r = pearson(clean_x, clean_y)
+        spearman = spearman_corr(clean_x, clean_y)
+        r2_linear = r ** 2
+
+        # Polynomial fits of degree 2 and 3 with manual R^2.
+        ss_tot = float(np.sum((y_arr - float(np.mean(y_arr))) ** 2))
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            c1 = np.polyfit(x_arr, y_arr, 1)
+            c2 = np.polyfit(x_arr, y_arr, 2)
+            c3 = np.polyfit(x_arr, y_arr, 3)
+        r2_poly2 = _poly_r2(c2, x_arr, y_arr, ss_tot)
+        r2_poly3 = _poly_r2(c3, x_arr, y_arr, ss_tot)
+
+        mejor_poly = max(r2_poly2, r2_poly3)
+        # Degree of the best polynomial, with a preference for parsimony:
+        # degree 3 is chosen only if it beats degree 2 by more than 0.02.
+        best_poly_degree = 3 if (r2_poly3 - r2_poly2) > 0.02 else 2
+
+        abs_s = abs(spearman)
+        abs_p = abs(r)
+
+        # Decision in order: weak first -> monotonic -> polynomial -> linear.
+        if abs_p < 0.3 and abs_s < 0.3 and mejor_poly < 0.3:
+            # No signal clears the shape floor: weak/shapeless relationship.
+            tipo = "débil/sin forma"
+            best_degree = None
+            coeffs = None
+        elif (abs_s - abs_p) >= 0.1 and abs_s >= 0.4:
+            # Spearman (rank) captures a monotonic trend that linear Pearson
+            # misses: non-linear monotonic relationship. No polynomial is forced,
+            # since it may not fit well; the chapter draws the ordered trend.
+            tipo = "monótona no-lineal"
+            best_degree = None
+            coeffs = None
+        elif (mejor_poly - r2_linear) >= 0.1 and mejor_poly >= 0.3:
+            tipo = "polinómica (grado {})".format(best_poly_degree)
+            best_degree = best_poly_degree
+            best_coeffs = c2 if best_poly_degree == 2 else c3
+            coeffs = [float(c) for c in best_coeffs]
+        else:
+            # There is signal (not weak) and it is neither purely monotonic nor
+            # polynomial: the correlation that exists is essentially linear.
+            tipo = "lineal"
+            best_degree = 1
+            coeffs = [float(c) for c in c1]
+
+        return {
+            "tipo": tipo,
+            "pearson": round(float(r), 6),
+            "r2_linear": round(float(r2_linear), 6),
+            "spearman": round(float(spearman), 6),
+            "r2_poly2": round(float(r2_poly2), 6),
+            "r2_poly3": round(float(r2_poly3), 6),
+            "best_degree": best_degree,
+            "coeffs": (
+                [round(c, 8) for c in coeffs] if coeffs is not None else None
+            ),
+        }
+    except Exception:
+        return dict(_WEAK)
@@ -0,0 +1,174 @@
+"""Tests for classify_relationship_type."""
+
+import os
+import sys
+
+import numpy as np
+
+sys.path.insert(0, os.path.dirname(__file__))
+
+from classify_relationship_type import classify_relationship_type
+
+# Keys the output dict must ALWAYS contain.
+_EXPECTED_KEYS = {
+    "tipo", "pearson", "r2_linear", "spearman",
+    "r2_poly2", "r2_poly3", "best_degree", "coeffs",
+}
+
+
+def _assert_shape(r):
+    """Every output has exactly the 8 canonical keys."""
+    assert isinstance(r, dict)
+    assert set(r.keys()) == _EXPECTED_KEYS
+
+
+def test_lineal():
+    """Golden: y = 2x + 1 with small noise -> 'lineal', best_degree=1."""
+    rng = np.random.default_rng(42)
+    x = np.linspace(0.0, 10.0, 50)
+    y = 2.0 * x + 1.0 + rng.normal(0.0, 0.3, 50)
+
+    r = classify_relationship_type(list(x), list(y))
+    _assert_shape(r)
+
+    assert r["tipo"] == "lineal"
+    assert r["best_degree"] == 1
+    assert r["r2_linear"] >= 0.5
+    # coeffs ~ [slope, intercept] of the degree-1 fit.
+    assert r["coeffs"] is not None and len(r["coeffs"]) == 2
+    assert abs(r["coeffs"][0] - 2.0) < 0.1   # slope ~2
+    assert abs(r["coeffs"][1] - 1.0) < 0.3   # intercept ~1
+
+
+def test_polinomica_cuadratica():
+    """Golden: y = x**2 on [-10, 10] -> 'polinómica', best_degree in (2, 3)."""
+    x = np.linspace(-10.0, 10.0, 60)
+    y = x ** 2
+
+    r = classify_relationship_type(list(x), list(y))
+    _assert_shape(r)
+
+    assert r["tipo"].startswith("polinómica")
+    assert r["best_degree"] in (2, 3)
+    # A perfect parabola is already captured by degree 2 (parsimony).
+    assert r["best_degree"] == 2
+    assert r["r2_poly2"] > 0.99
+    assert r["coeffs"] is not None and len(r["coeffs"]) == r["best_degree"] + 1
+
+
+def test_monotona_no_lineal():
+    """Golden: heavy-tailed convex monotonic curve -> 'monótona no-lineal'.
+
+    y = 1/(n+1-i)**2 is strictly increasing (Spearman ~ 1), but its explosive
+    tail means neither a line nor a degree-2/3 polynomial fits it (polynomial
+    R^2 < 0.5), so the linear Pearson does NOT capture the relationship that
+    the rank-based Spearman sees. Deterministic construction (no randomness).
+    """
+    n = 200
+    i = np.arange(n, dtype=float)
+    y = 1.0 / (n + 1 - i) ** 2
+
+    r = classify_relationship_type(list(i), list(y))
+    _assert_shape(r)
+
+    assert r["tipo"] == "monótona no-lineal"
+    assert r["best_degree"] is None
+    assert r["coeffs"] is None
+    # Strong Spearman, clearly above Pearson.
+    assert abs(r["spearman"]) >= 0.5
+    assert abs(r["spearman"]) - abs(r["pearson"]) >= 0.15
+
+
+def test_monotona_exponencial():
+    """Literal DoD: y = exp(x) (non-linear monotonic) -> 'monótona no-lineal'.
+
+    exp is strictly increasing (Spearman = 1) but the linear Pearson stays
+    clearly below it (~0.86), so rank dominance labels it 'monótona no-lineal'
+    rather than 'lineal' or 'polinómica'.
+    """
+    x = np.linspace(0.0, 5.0, 80)
+    y = np.exp(x)
+
+    r = classify_relationship_type(list(x), list(y))
+    _assert_shape(r)
+
+    assert r["tipo"] == "monótona no-lineal"
+    assert r["best_degree"] is None and r["coeffs"] is None
+    assert abs(r["spearman"]) >= 0.9
+    assert abs(r["spearman"]) - abs(r["pearson"]) >= 0.1
+
+
+def test_debil_sin_forma():
+    """Golden: independent x and y (fixed seed) -> 'débil/sin forma'."""
+    rng = np.random.default_rng(0)
+    x = rng.normal(0.0, 1.0, 200)
+    y = rng.normal(0.0, 1.0, 200)
+
+    r = classify_relationship_type(list(x), list(y))
+    _assert_shape(r)
+
+    assert r["tipo"] == "débil/sin forma"
+    assert r["best_degree"] is None
+    assert r["coeffs"] is None
+    # All signals are low.
+    assert abs(r["pearson"]) < 0.3
+    assert r["r2_linear"] < 0.1
+
+
+def test_lista_vacia_no_lanza():
+    """Edge: empty lists -> canonical weak dict, no exception."""
+    r = classify_relationship_type([], [])
+    _assert_shape(r)
+    assert r["tipo"] == "débil/sin forma"
+    assert r["pearson"] is None
+    assert r["r2_linear"] is None
+    assert r["spearman"] is None
+    assert r["r2_poly2"] is None
+    assert r["r2_poly3"] is None
+    assert r["best_degree"] is None
+    assert r["coeffs"] is None
+
+
+def test_longitudes_distintas_no_lanza():
+    """Edge: lists of different length -> paired by index, no exception."""
+    # Pairing truncates to the shorter length: only 3 pairs (< 5) -> weak.
+    r = classify_relationship_type([1, 2, 3, 4, 5, 6, 7, 8], [1.0, 2.0, 3.0])
+    _assert_shape(r)
+    assert r["tipo"] == "débil/sin forma"
+    assert r["best_degree"] is None
+
+
+def test_todos_none_no_lanza():
+    """Edge: every value is None -> no valid pair -> weak, no exception."""
+    r = classify_relationship_type([None, None, None, None, None, None],
+                                   [None, None, None, None, None, None])
+    _assert_shape(r)
+    assert r["tipo"] == "débil/sin forma"
+    assert r["coeffs"] is None
+
+
+def test_entradas_none_no_lanza():
+    """Edge: xs/ys passed as None directly -> weak, no exception."""
+    assert classify_relationship_type(None, None)["tipo"] == "débil/sin forma"
+    assert classify_relationship_type([1.0, 2.0], None)["tipo"] == "débil/sin forma"
+
+
+def test_constante_no_lanza():
+    """Edge: constant ys (variance ~0) -> weak, no exception."""
+    r = classify_relationship_type([1, 2, 3, 4, 5, 6, 7], [5, 5, 5, 5, 5, 5, 5])
+    _assert_shape(r)
+    assert r["tipo"] == "débil/sin forma"
+
+
+def test_filtra_nan_inf_bool():
+    """Edge: pairs containing NaN/inf/bool/None are dropped by index."""
+    nan = float("nan")
+    inf = float("inf")
+    # Only i=0,1,2,3,4 remain valid (5 pairs) and they form a perfect line.
+    xs = [0.0, 1.0, 2.0, 3.0, 4.0, nan, inf, True, None]
+    ys = [1.0, 3.0, 5.0, 7.0, 9.0, 1.0, 2.0, 3.0, 4.0]
+    r = classify_relationship_type(xs, ys)
+    _assert_shape(r)
+    # The 5 valid pairs are exactly y = 2x + 1 -> lineal.
+    assert r["tipo"] == "lineal"
+    assert r["best_degree"] == 1
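The exponential test above leans on the gap between Pearson (about 0.86) and Spearman (1.0) for `y = exp(x)`. The standard-library sketch below reproduces that gap on the same grid. The functions `pearson`, `ranks` and `spearman` are illustrative stand-ins, not the registry's `pearson` / `spearman_corr`, and the ranking assumes there are no ties.

```python
import math


def pearson(xs, ys):
    """Plain Pearson r; returns None when either side has zero variance."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0.0 or syy == 0.0:
        return None
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; assumes no ties (true for strictly monotonic data)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = float(rank)
    return out


def spearman(xs, ys):
    """Spearman rho computed as the Pearson r of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Same grid as np.linspace(0.0, 5.0, 80) in test_monotona_exponencial.
xs = [5.0 * i / 79 for i in range(80)]
ys = [math.exp(x) for x in xs]
r = pearson(xs, ys)
rho = spearman(xs, ys)
```

Because the ranks of a strictly increasing series are identical on both axes, `rho` is 1 up to floating-point error, while `r` is pulled down by the convex tail.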
@@ -1,103 +0,0 @@
----
-id: draw_join_graph_figure_py_datascience
-name: draw_join_graph_figure
-kind: function
-lang: py
-domain: datascience
-version: "1.0.0"
-purity: impure
-signature: "def draw_join_graph_figure(join_graph: dict, title: str = None) -> \"matplotlib.figure.Figure\""
-description: "Rasteriza el join graph de una base (relaciones FK inter-tabla, salida de build_join_graph) a un matplotlib.figure.Figure: nodos circulares con el nombre de cada tabla (hubs en color de acento cálido, el resto neutro) y aristas dirigidas etiquetadas from_col→to_col (más la cardinalidad si viene). Es la contrapartida dibujada del string Mermaid para que el capítulo de relaciones del informe AutomaticEDA muestre un diagrama real. Layout networkx spring_layout determinista (seed=42), backend Agg sin abrir ventanas; defensivo: nunca lanza y nunca hace I/O."
-tags: [eda, plot, relations, graph, matplotlib, figure, networkx, datascience, impure]
-uses_functions: []
-uses_types: []
-returns: []
-returns_optional: false
-error_type: "error_go_core"
-imports: [matplotlib, networkx]
-example: |
-  from draw_join_graph_figure import draw_join_graph_figure
-  join_graph = {
-      "nodes": [
-          {"table": "customers", "out_degree": 0, "in_degree": 1, "role": "dimension"},
-          {"table": "orders", "out_degree": 1, "in_degree": 0, "role": "fact"},
-      ],
-      "edges": [
-          {"from_table": "orders", "from_col": "customer_id",
-           "to_table": "customers", "to_col": "id", "cardinality": "N:1"},
-      ],
-      "hubs": ["orders"],
-  }
-  fig = draw_join_graph_figure(join_graph, title="Relaciones FK")
-  fig.savefig("/tmp/join_graph.png")
-tested: true
-tests:
-  - "test_returns_figure_with_axis"
-  - "test_savefig_produces_nonempty_png"
-  - "test_empty_dict_does_not_raise_and_savefig_png"
-  - "test_none_does_not_raise_and_savefig_png"
-test_file_path: "python/functions/datascience/draw_join_graph_figure_test.py"
-file_path: "python/functions/datascience/draw_join_graph_figure.py"
-params:
-  - name: join_graph
-    desc: "Dict producido por build_join_graph. Claves: `nodes` (list[dict] con table, out_degree, in_degree, role), `edges` (list[dict] con from_table, from_col, to_table, to_col y opcional cardinality/inclusion) y `hubs` (list[str] de tablas hub a destacar en color cálido). Claves ausentes, items no-dict, None o {} se toleran (devuelve Figure con texto, sin lanzar). Los nombres de nodo se derivan también de las aristas, así que un grafo con edges pero sin nodes explícitos igual se dibuja."
-  - name: title
-    desc: "Título dibujado sobre el diagrama. Si se omite (None) se usa \"Join graph\". Default None."
-output: "Un matplotlib.figure.Figure (figsize 7x5) con un único Axes que contiene el diagrama node-link dirigido: tablas como nodos circulares etiquetados (hubs en acento cálido #DD8452, resto en azul neutro #4C72B0) y FKs como flechas dirigidas con etiqueta from_col→to_col (+ cardinalidad). Si join_graph no tiene nodos ni aristas (o es None/{}), devuelve igualmente una Figure con el texto centrado \"Sin relaciones FK detectadas.\"; ante cualquier fallo interno devuelve una Figure con un mensaje genérico (nunca lanza). El caller rasteriza/cierra la figura; la función no la muestra ni la guarda."
----
-
-## Ejemplo
-
-```python
-from draw_join_graph_figure import draw_join_graph_figure
-
-# `join_graph` es la salida de build_join_graph (nodes + edges + hubs).
-join_graph = {
-    "nodes": [
-        {"table": "customers", "out_degree": 0, "in_degree": 1, "role": "dimension"},
-        {"table": "orders", "out_degree": 2, "in_degree": 0, "role": "fact"},
-        {"table": "products", "out_degree": 0, "in_degree": 1, "role": "dimension"},
-    ],
-    "edges": [
-        {"from_table": "orders", "from_col": "customer_id",
-         "to_table": "customers", "to_col": "id", "cardinality": "N:1"},
-        {"from_table": "orders", "from_col": "product_id",
-         "to_table": "products", "to_col": "id", "cardinality": "N:1"},
-    ],
-    "hubs": ["orders"],  # `orders` se pinta en color de acento (tabla de hechos)
-}
-
-fig = draw_join_graph_figure(join_graph, title="Relaciones FK")
-
-# El renderer del informe lo rasteriza; aquí solo persistimos para inspección.
-fig.savefig("/tmp/join_graph.png")
-```
-
-## Cuando usarla
-
-Úsala en el capítulo de relaciones de un informe AutomaticEDA cuando quieras un
-diagrama **dibujado** del esquema relacional, no solo el bloque Mermaid pegable.
-Pásale directamente la salida de `build_join_graph` (`nodes` + `edges` + `hubs`)
-y obtienes una `matplotlib.figure.Figure` lista para que el renderer perezoso la
-rasterice. Es la pareja visual del string Mermaid: Mermaid sirve para pegar en
-Markdown/docs que lo soporten; esta función produce la imagen real (PNG/PDF) que
-va embebida en informes que no renderizan Mermaid.
-
-## Gotchas
-
-- **Impure because of matplotlib.** It pins the `Agg` backend at import time:
-  it opens no windows and needs no display. Safe to call in batches from the
-  renderer.
-- **Deterministic layout (`seed=42`).** It uses `nx.spring_layout(G, seed=42)`,
-  so the same input produces the same diagram (reproducible test). For graphs
-  with 0/1 nodes it uses a fixed centered position instead of the spring layout.
-- **No I/O.** It does not call `plt.show()` or write to disk; it only returns
-  the `Figure`. The consumer rasterizes it and releases it (`plt.close(fig)`)
-  so memory does not pile up in reports with many tables.
-- **Returns a Figure, NOT a dict.** Unlike `build_join_graph` (which returns
-  the graph dict), this function returns the figure object, already
-  drawn.
-- **Defensive, never raises.** `None`, `{}`, missing keys or malformed items
-  are handled without error: at worst it returns a `Figure` with
-  "Sin relaciones FK detectadas." (empty input) or a generic message (internal
-  failure). Do not wrap it in try/except for fear of a raise: there is none.
@@ -1,214 +0,0 @@
-"""Impure EDA helper: rasterize a join graph to a matplotlib Figure (`eda` group).
-
-Takes the join graph produced by ``build_join_graph`` (inter-table FK relations)
-and draws it as a directed node-link diagram on a ready-to-rasterize
-``matplotlib.figure.Figure``. Hub tables (the ones with the highest out-degree,
-candidate fact tables of a star schema) are highlighted in a warm accent colour;
-the rest use a neutral colour. Directed edges carry a ``from_col→to_col`` label
-(plus the cardinality when present).
-
-This is the *drawn* counterpart of the Mermaid string that ``build_join_graph``
-also emits: the relations chapter of an AutomaticEDA report can show a real
-picture instead of only the pasteable Mermaid block.
-
-Impure because it touches matplotlib's rendering machinery. It pins the headless
-Agg backend and a deterministic ``spring_layout`` seed so the output is
-reproducible. It never raises: on any internal failure (or empty input) it
-returns a ``Figure`` carrying a centered message, so the lazy render of the
-document is never broken.
-"""
-
-import matplotlib
-
-matplotlib.use("Agg")
-
-import matplotlib.pyplot as plt  # noqa: E402
-import networkx as nx  # noqa: E402
-
-# Warm accent reserved for hub tables (candidate fact tables / star-schema cores).
-_HUB_COLOR = "#DD8452"
-# Neutral blue for every other table.
-_NODE_COLOR = "#4C72B0"
-# Muted gray for the empty/error message text.
-_MUTED_TEXT = "#5f6b7a"
-# Edge colour and label colour.
-_EDGE_COLOR = "#7a7a7a"
-_EDGE_LABEL_COLOR = "#34495e"
-# Constant node size; shared with the edge drawing so arrowheads stop at the
-# node boundary instead of being hidden under the marker.
-_NODE_SIZE = 2200
-
-
-def _text_figure(message: str) -> "matplotlib.figure.Figure":
-    """Return a blank Figure carrying a single centered message.
-
-    Used both for the "no relations" case and as the never-raise fallback.
-    """
-    fig, ax = plt.subplots(figsize=(7, 5))
-    ax.axis("off")
-    ax.text(
-        0.5,
-        0.5,
-        message,
-        ha="center",
-        va="center",
-        fontsize=12,
-        color=_MUTED_TEXT,
-        transform=ax.transAxes,
-    )
-    fig.tight_layout()
-    return fig
-
-
-def _edge_label(edge: dict) -> str:
-    """Build the ``from_col→to_col`` label of an edge, appending cardinality."""
-    fc = edge.get("from_col")
-    tc = edge.get("to_col")
-    if fc is not None and tc is not None:
-        label = f"{fc}→{tc}"
-    elif fc is not None:
-        label = str(fc)
-    elif tc is not None:
-        label = str(tc)
-    else:
-        label = ""
-    card = edge.get("cardinality")
-    if card:
-        label = f"{label} ({card})" if label else str(card)
-    return label
-
-
-def draw_join_graph_figure(join_graph: dict, title: str = None):
-    """Rasterize a join graph to a matplotlib Figure.
-
-    Builds a ``networkx.DiGraph`` from the graph's nodes and edges, lays it out
-    with a deterministic ``spring_layout`` (``seed=42``) and draws it on a
-    ``matplotlib.figure.Figure``: tables as labelled circular nodes (hubs in a
-    warm accent, the rest neutral) and FK relations as directed arrows labelled
-    ``from_col→to_col`` (plus cardinality when available).
-
-    The function never raises. On empty/``None`` input it returns a Figure with
-    a centered "Sin relaciones FK detectadas." message; on any internal failure
-    it returns a Figure with a generic centered message. It never shows the
-    figure nor writes it to disk — the document renderer rasterizes it.
-
-    Args:
-        join_graph: Dict produced by ``build_join_graph`` with keys ``nodes``
-            (list of ``{table, out_degree, in_degree, role}``), ``edges`` (list
-            of ``{from_table, from_col, to_table, to_col, cardinality?,
-            inclusion?}``) and ``hubs`` (list of hub table names to highlight).
-            Missing keys, non-dict items, ``None`` or ``{}`` are all tolerated.
-        title: Optional title drawn above the diagram. When omitted, the title
-            defaults to "Join graph".
-
-    Returns:
-        A ``matplotlib.figure.Figure`` (figsize 7x5) with a single Axes holding
-        the node-link diagram. The caller rasterizes/closes it.
-    """
-    try:
-        jg = join_graph if isinstance(join_graph, dict) else {}
-        nodes = jg.get("nodes") or []
-        edges = jg.get("edges") or []
-        hubs = {h for h in (jg.get("hubs") or []) if h is not None}
-
-        # Collect node names from the declared nodes and, defensively, from the
-        # edges (so a graph with edges but no explicit nodes still draws).
-        node_names: list = []
-        seen: set = set()
-
-        def _register(name) -> None:
-            if name is not None and name not in seen:
-                seen.add(name)
-                node_names.append(name)
-
-        for n in nodes:
-            if isinstance(n, dict):
-                _register(n.get("table"))
-        for e in edges:
-            if isinstance(e, dict):
-                _register(e.get("from_table"))
-                _register(e.get("to_table"))
-
-        if not node_names:
-            return _text_figure("Sin relaciones FK detectadas.")
-
-        graph = nx.DiGraph()
-        for name in node_names:
-            graph.add_node(name)
-
-        edge_labels: dict = {}
-        for e in edges:
-            if not isinstance(e, dict):
-                continue
-            ft = e.get("from_table")
-            tt = e.get("to_table")
-            if ft is None or tt is None:
-                continue
-            graph.add_edge(ft, tt)
-            edge_labels[(ft, tt)] = _edge_label(e)
-
-        fig, ax = plt.subplots(figsize=(7, 5))
-
-        # Deterministic layout. Fixed positions for trivial graphs so a single
-        # node sits centered instead of at an arbitrary spring-layout point.
-        if graph.number_of_nodes() <= 1:
-            pos = {name: (0.5, 0.5) for name in graph.nodes()}
-        else:
-            pos = nx.spring_layout(graph, seed=42)
-
-        node_colors = [
-            _HUB_COLOR if name in hubs else _NODE_COLOR for name in graph.nodes()
-        ]
-        nx.draw_networkx_nodes(
-            graph,
-            pos,
-            ax=ax,
-            node_color=node_colors,
-            node_size=_NODE_SIZE,
-            node_shape="o",
-            edgecolors="white",
-            linewidths=1.5,
-        )
-        nx.draw_networkx_labels(
-            graph,
-            pos,
-            ax=ax,
-            font_size=9,
-            font_color="white",
-            font_weight="bold",
-        )
-        nx.draw_networkx_edges(
-            graph,
-            pos,
-            ax=ax,
-            arrows=True,
-            arrowstyle="-|>",
-            arrowsize=18,
-            edge_color=_EDGE_COLOR,
-            width=1.4,
-            connectionstyle="arc3,rad=0.06",
-            node_size=_NODE_SIZE,
-        )
-        if any(lbl for lbl in edge_labels.values()):
-            nx.draw_networkx_edge_labels(
-                graph,
-                pos,
-                edge_labels=edge_labels,
-                ax=ax,
-                font_size=7,
-                font_color=_EDGE_LABEL_COLOR,
-                bbox={
-                    "boxstyle": "round,pad=0.2",
-                    "fc": "white",
-                    "ec": "none",
-                    "alpha": 0.7,
-                },
-            )
-
-        ax.set_title(title if title else "Join graph", fontsize=13)
-        ax.axis("off")
-        fig.tight_layout()
-        return fig
-    except Exception:
-        # Never raise — the document render is lazy and must not be broken.
-        return _text_figure("No se pudo dibujar el join graph.")
@@ -1,84 +0,0 @@
-"""Tests para draw_join_graph_figure (rasteriza el join graph, grupo eda).
-
-Usa el backend Agg sin abrir ventanas; cada test cierra la Figure construida
-(matplotlib.pyplot.close) para no acumular estado entre tests. Las aserciones de
-guardado escriben a tmp_path (fixture de pytest) y comprueban que el PNG no está
-vacío.
-"""
-
-import matplotlib
-
-matplotlib.use("Agg")
-
-import matplotlib.pyplot as plt  # noqa: E402
-from matplotlib.figure import Figure  # noqa: E402
-
-from draw_join_graph_figure import draw_join_graph_figure
-
-
-def _make_join_graph():
-    """Join graph mínimo: 3 nodos (customers/orders/products) y 2 aristas.
-
-    orders -> customers y orders -> products. `orders` es el hub (out_degree 2).
-    """
-    return {
-        "nodes": [
-            {"table": "customers", "out_degree": 0, "in_degree": 1, "role": "dimension"},
-            {"table": "orders", "out_degree": 2, "in_degree": 0, "role": "fact"},
-            {"table": "products", "out_degree": 0, "in_degree": 1, "role": "dimension"},
-        ],
-        "edges": [
-            {
-                "from_table": "orders",
-                "from_col": "customer_id",
-                "to_table": "customers",
-                "to_col": "id",
-                "cardinality": "N:1",
-                "inclusion": 1.0,
-            },
-            {
-                "from_table": "orders",
-                "from_col": "product_id",
-                "to_table": "products",
-                "to_col": "id",
-                "cardinality": "N:1",
-                "inclusion": 0.98,
-            },
-        ],
-        "hubs": ["orders"],
-    }
-
-
-def test_returns_figure_with_axis():
-    fig = draw_join_graph_figure(_make_join_graph(), title="Relaciones FK")
-    assert isinstance(fig, Figure)
-    # Al menos un eje con el diagrama.
-    assert len(fig.axes) >= 1
-    plt.close(fig)
-
-
-def test_savefig_produces_nonempty_png(tmp_path):
-    fig = draw_join_graph_figure(_make_join_graph())
-    out = tmp_path / "g.png"
-    fig.savefig(out)
-    assert out.exists()
-    assert out.stat().st_size > 0
-    plt.close(fig)
-
-
-def test_empty_dict_does_not_raise_and_savefig_png(tmp_path):
-    fig = draw_join_graph_figure({})
-    assert isinstance(fig, Figure)
-    out = tmp_path / "empty.png"
-    fig.savefig(out)
-    assert out.stat().st_size > 0
-    plt.close(fig)
-
-
-def test_none_does_not_raise_and_savefig_png(tmp_path):
-    fig = draw_join_graph_figure(None)
-    assert isinstance(fig, Figure)
-    out = tmp_path / "none.png"
-    fig.savefig(out)
-    assert out.stat().st_size > 0
-    plt.close(fig)
@@ -0,0 +1,122 @@
+---
+id: relationship_scatter_figure_py_datascience
+name: relationship_scatter_figure
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def relationship_scatter_figure(xs: list, ys: list, x_label: str = \"\", y_label: str = \"\", classification: dict = None, max_points: int = 2000) -> \"matplotlib.figure.Figure\""
+description: "Builds a matplotlib scatter figure for a pair of numeric variables, with its fitted curve/line and an annotation of the relationship type (linear, polynomial of degree 2/3, non-linear monotonic, etc.) plus its metrics (r, ρ, R²lin, R²poly). Consumes the dict from classify_relationship_type; if it is None, computes it internally by reusing that function. Returns a matplotlib.figure.Figure ready to be rasterized by the EDA report renderer (PDF/PPTX). Agg backend with no global pyplot; deterministic downsample of the plotted points; defensive against empty/None input."
+tags: [eda, correlation, scatter, relationship, matplotlib, figure, visualization, datascience, impure]
+uses_functions: [classify_relationship_type_py_datascience]
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: [matplotlib, numpy]
+example: |
+  from relationship_scatter_figure import relationship_scatter_figure
+  xs = [float(i) for i in range(100)]
+  ys = [0.5 * x * x - x + 3 for x in xs]
+  classification = {
+      "tipo": "polinómica (grado 2)", "pearson": 0.97, "spearman": 0.99,
+      "r2_linear": 0.94, "r2_poly2": 0.999, "r2_poly3": 0.999,
+      "best_degree": 2, "coeffs": [0.5, -1.0, 3.0],
+  }
+  fig = relationship_scatter_figure(xs, ys, x_label="dosis", y_label="efecto", classification=classification)
+tested: true
+tests:
+  - "test_returns_figure"
+  - "test_downsample_determinista"
+  - "test_empty_no_lanza"
+  - "test_classification_none"
+test_file_path: "python/functions/datascience/relationship_scatter_figure_test.py"
+file_path: "python/functions/datascience/relationship_scatter_figure.py"
+params:
+  - name: xs
+    desc: "List (or tuple) of x values, paired by index with ys. A None, bool, NaN or inf value discards that pair (defensive read)."
+  - name: ys
+    desc: "List (or tuple) of y values, parallel to xs. Same defensive rules as xs."
+  - name: x_label
+    desc: "Axis/title label for the x variable. Default \"\" (falls back to \"x\" in the title)."
+  - name: y_label
+    desc: "Axis/title label for the y variable. Default \"\" (falls back to \"y\" in the title)."
+  - name: classification
+    desc: "Optional. Dict from classify_relationship_type with keys tipo, pearson, r2_linear, spearman, r2_poly2, r2_poly3, best_degree, coeffs. If None, it is computed internally by importing and calling classify_relationship_type on the clean pairs (self-contained). If the sibling module is unavailable, the scatter is drawn without a fitted curve or annotation. Default None."
+  - name: max_points
+    desc: "Cap on the number of points DRAWN. If the clean pairs exceed the cap, the cloud is subsampled with a fixed step ceil(n/max_points) by taking pairs[::step]: DETERMINISTIC, not random, reproducible. The classification/fit ALWAYS uses every clean pair; the downsample only thins the drawing. A non-positive or non-int value disables the downsample. Default 2000."
+output: "A matplotlib.figure.Figure (figsize 6.4x4.0, dpi 150) with a scatter Axes (semi-transparent points, alpha 0.5, color #4C72B0), the fitted curve/line (numpy.polyval over coeffs, color #C44E52) when a polynomial fit is available, the title \"{x_label} ↔ {y_label}\", axis labels and an annotation box in the top-left corner with the relationship type and the available metrics (r, ρ, R²lin, R²poly; None values are omitted). If fewer than 2 valid pairs remain after cleaning, it still returns a Figure with the centered text \"Sin datos suficientes para el scatter\" (never raises). The caller rasterizes/closes the figure; the function neither shows nor saves it."
+---
+
+## Example
+
+```python
+from relationship_scatter_figure import relationship_scatter_figure
+
+# Numeric pair with a quadratic relationship and its classification (from
+# classify_relationship_type). Passing it explicitly avoids recomputing it.
+xs = [float(i) for i in range(100)]
+ys = [0.5 * x * x - x + 3 for x in xs]
+classification = {
+    "tipo": "polinómica (grado 2)",
+    "pearson": 0.97,
+    "spearman": 0.99,
+    "r2_linear": 0.94,
+    "r2_poly2": 0.999,
+    "r2_poly3": 0.999,
+    "best_degree": 2,
+    "coeffs": [0.5, -1.0, 3.0],
+}
+
+fig = relationship_scatter_figure(
+    xs, ys, x_label="dosis", y_label="efecto", classification=classification
+)
+
+# The report renderer rasterizes it; here we only persist it for inspection.
+fig.savefig("/tmp/scatter_dosis_efecto.png")
+
+# With classification=None the function computes it internally (self-contained):
+fig2 = relationship_scatter_figure(xs, ys, x_label="dosis", y_label="efecto")
+```
+
+## When to use it
+
+Use it inside the automatic EDA report when you want an at-a-glance view of the
+relationship between two numeric variables: the point cloud, the best-fitting
+curve and a readable label of the relationship type with its metrics. It is the
+"human view" counterpart of `classify_relationship_type`: that function decides
+the type and coefficients; this one paints them on a `Figure` that the report
+renderer rasterizes to PDF/PPTX. Pass the classification dict if you already
+have it (the fit is not recomputed); otherwise leave it as `None` and the
+function computes it over the clean pairs. Designed for phones: small annotation
+(fontsize 8) and a cloud thinned by `max_points` so the PDF stays light.
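The renderer that rasterizes the returned `Figure` is not part of this diff. As a rough sketch of what a pyplot-free rasterization step could look like, the snippet below attaches an Agg canvas to a `Figure` and writes PNG bytes to memory. `figure_to_png_bytes` is a hypothetical helper name, and the toy data are illustrative only.

```python
import io

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def figure_to_png_bytes(fig, dpi=150):
    """Rasterize a pyplot-free Figure to PNG bytes (hypothetical helper)."""
    FigureCanvasAgg(fig)                     # attach a headless Agg canvas
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi)  # no file on disk, no window
    return buf.getvalue()


# Stand-in for the Figure returned by relationship_scatter_figure.
fig = Figure(figsize=(6.4, 4.0), dpi=150)
ax = fig.add_subplot(111)
ax.scatter([0, 1, 2, 3], [0, 1, 4, 9], alpha=0.5, color="#4C72B0")
png = figure_to_png_bytes(fig)
```

Nothing here imports `matplotlib.pyplot`, so no global figure registry is involved and the figure is freed once the last reference to it is dropped.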
+
+## Gotchas
+
+- **Impure because of matplotlib.** It touches the rendering machinery. It uses
+  the `Agg` backend and the object-oriented `Figure`/`add_subplot` API, NEVER
+  `pyplot.*`, so it neither touches pyplot's global state nor leaks figures
+  between calls. `pyplot` is NOT thread-safe; this function avoids it by building
+  the `Figure` directly, so it is safe to call in a loop from the renderer.
+- **The caller releases the figure.** It returns the `Figure` but neither shows
+  nor saves it. Built outside `pyplot`, the figure is not registered in pyplot's
+  figure manager, so `matplotlib.pyplot.close(fig)` has nothing to unregister:
+  rasterize it, then drop the reference so large batches do not pile up memory.
+- **Deterministic downsample, drawing only.** When the clean pairs exceed
+  `max_points`, the DRAWN cloud is thinned with a fixed step `pairs[::step]`
+  (reproducible, not random). The classification and the fit ALWAYS use every
+  clean pair; the downsample changes neither the metrics nor the curve.
+- **`classification=None` ⇒ it is computed internally.** The function imports and
+  calls `classify_relationship_type` on the clean pairs. If that sibling module
+  is unavailable (incomplete environment), it does NOT raise: it draws the
+  scatter without a fitted curve or annotation. Passing the classification
+  explicitly is cheaper (the fit is not recomputed).
+- **No polynomial curve for `monótona no-lineal`.** When `coeffs` or
+  `best_degree` is `None`, no polynomial line is drawn; for the monotonic case
+  the `_ordered_trend` helper supplies an ordered trend instead (mean y per
+  distinct x value, or per ordered bin). No curve if all x are equal. Never fails.
+- **Defensive, never raises.** `xs=[]`, `ys=[]`, fewer than 2 valid pairs,
+  `None`/`bool`/`NaN`/`inf` values or a malformed `coeffs` are handled without
+  error: at worst it returns a `Figure` with "Sin datos suficientes para el
+  scatter". Do not wrap the call in try/except for fear of a raise: there is none.
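The deterministic downsample described above (step `ceil(n/max_points)`, then `pairs[::step]`) can be sketched in a few lines of standard-library Python. `downsample_pairs` is a hypothetical name used here for illustration; the function in this commit may structure the same rule differently.

```python
import math


def downsample_pairs(pairs, max_points=2000):
    """Thin `pairs` with a fixed stride so at most `max_points` remain.

    Hypothetical helper mirroring the documented rule: a non-int or
    non-positive cap disables the downsample.
    """
    if not isinstance(max_points, int) or max_points <= 0:
        return list(pairs)
    if len(pairs) <= max_points:
        return list(pairs)
    step = math.ceil(len(pairs) / max_points)   # fixed stride, no randomness
    return list(pairs[::step])


pairs = [(float(i), float(2 * i)) for i in range(5001)]
drawn = downsample_pairs(pairs, max_points=2000)
```

With 5001 pairs and a cap of 2000 the stride is 3, so every third pair is kept. Calling the helper twice on the same input yields the same subset, which keeps the rendered figure reproducible across runs.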
@@ -0,0 +1,322 @@
+"""Impure EDA helper: scatter figure of a numeric pair with its fit (`eda` group).
+
+Builds a matplotlib scatter of two numeric variables, overlays the fitted
+curve/line implied by the relationship classification (linear, polynomial of
+degree 2/3, etc.) and annotates the relationship type with its available
+metrics. Returns a ready-to-rasterize ``matplotlib.figure.Figure``; it never
+shows nor saves it.
+
+Impure because it touches matplotlib's rendering machinery. It uses the headless
+Agg backend and the object-oriented ``Figure`` API (no ``pyplot``) so it leaks no
+global state and is safe to call repeatedly from a report renderer.
+
+To keep the rendered PDF/PPTX light on phones, when the number of valid pairs
+exceeds ``max_points`` the *plotted* points are down-sampled DETERMINISTICALLY by
+a fixed step (``pairs[::step]``), never randomly, so the output is reproducible.
+The classification/fit always uses every clean pair; the down-sample only thins
+the drawn cloud.
+"""
+
+import math
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+import numpy as np  # noqa: E402
+from matplotlib.figure import Figure  # noqa: E402
+
+# Sober blue for the scatter cloud and red for the fitted curve (Tufte: the
+# data points are the primary ink, the fit is the secondary highlight).
+_POINT_COLOR = "#4C72B0"
+_FIT_COLOR = "#C44E52"
+# Muted gray for the no-data fallback message.
+_MUTED_TEXT = "#5f6b7a"
+
+
+def _finite(value):
+    """Coerce ``value`` to a finite float, or return None when not usable.
+
+    bool is a subclass of int, but a real numeric measurement is never a bool,
+    so True/False are treated as missing instead of coercing to 1.0/0.0. NaN and
+    +/-infinity are never valid either.
+    """
+    if value is None or isinstance(value, bool):
+        return None
+    try:
+        f = float(value)
+    except (TypeError, ValueError):
+        return None
+    if math.isnan(f) or math.isinf(f):
+        return None
+    return f
+
+
+def _clean_pairs(xs, ys):
+    """Pair ``xs[i], ys[i]`` by index, dropping any pair with a non-finite end."""
+    pairs = []
+    if isinstance(xs, (list, tuple)) and isinstance(ys, (list, tuple)):
+        n = min(len(xs), len(ys))
+        for i in range(n):
+            x = _finite(xs[i])
+            y = _finite(ys[i])
+            if x is None or y is None:
+                continue
+            pairs.append((x, y))
+    return pairs
+
+
+def _ordered_trend(xs_clean, ys_clean, n_bins: int = 12):
+    """Return (x_trend, y_trend): the ordered trend of y over x for a monotonic
+    relationship that has no polynomial fit.
+
+    When x has few distinct values (an ordinal/discrete scale) the trend is the
+    mean of y per distinct x value. Otherwise x is split into ``n_bins`` ordered
+    quantile bins and each point is (mean x, mean y) of the bin. Returns
+    ``(None, None)`` when there is nothing meaningful to draw.
+    """
+    x_arr = np.asarray(xs_clean, dtype=float)
+    y_arr = np.asarray(ys_clean, dtype=float)
+    if x_arr.size < 2:
+        return None, None
+    uniq = np.unique(x_arr)
+    if uniq.size <= max(2, n_bins):
+        # Discrete x: one trend point per distinct value (mean y).
+        xt = uniq
+        yt = np.array([float(np.mean(y_arr[x_arr == ux])) for ux in uniq])
+        return xt, yt
+    # Continuous x: ordered quantile bins, (mean x, mean y) per bin.
+    order = np.argsort(x_arr, kind="stable")
+    x_sorted = x_arr[order]
+    y_sorted = y_arr[order]
+    chunks_x = np.array_split(x_sorted, n_bins)
+    chunks_y = np.array_split(y_sorted, n_bins)
+    xt = np.array([float(np.mean(cx)) for cx in chunks_x if cx.size])
+    yt = np.array([float(np.mean(cy)) for cy in chunks_y if cy.size])
+    return xt, yt
+
+
+def _no_data_figure(message: str) -> "matplotlib.figure.Figure":
+    """A bare Figure carrying a centered muted message (defensive fallback)."""
+    fig = Figure(figsize=(6.4, 4.0), dpi=150)
+    ax = fig.add_subplot(111)
+    ax.axis("off")
+    ax.text(
+        0.5,
+        0.5,
+        message,
+        ha="center",
+        va="center",
+        fontsize=12,
+        color=_MUTED_TEXT,
+        transform=ax.transAxes,
+    )
+    fig.tight_layout()
+    return fig
+
+
+def _metrics_caption(classification: dict) -> str:
+    """Format the available metrics of a classification dict into one line.
+
+    Omits the metrics that are None. Keys consumed (any may be absent/None):
+    ``pearson`` (r), ``spearman`` (rho), ``r2_linear`` (R²lin) and the best
+    polynomial R² (``r2_poly3`` if a cubic was the best fit, else ``r2_poly2``).
+    """
+    parts = []
+    r = _finite(classification.get("pearson"))
+    if r is not None:
+        parts.append(f"r={r:.2f}")
+    rho = _finite(classification.get("spearman"))
+    if rho is not None:
+        parts.append(f"ρ={rho:.2f}")
+    r2_lin = _finite(classification.get("r2_linear"))
+    if r2_lin is not None:
+        parts.append(f"R²lin={r2_lin:.2f}")
+    # Prefer the R² of the best polynomial degree when it is a poly fit.
+    best_degree = classification.get("best_degree")
+    r2_poly = None
+    if best_degree == 3:
+        r2_poly = _finite(classification.get("r2_poly3"))
+    elif best_degree == 2:
+        r2_poly = _finite(classification.get("r2_poly2"))
+    if r2_poly is None:
+        # Fall back to whichever poly R² is present (cubic first).
+        r2_poly = _finite(classification.get("r2_poly3"))
+        if r2_poly is None:
+            r2_poly = _finite(classification.get("r2_poly2"))
+    if r2_poly is not None:
+        parts.append(f"R²poly={r2_poly:.2f}")
+    return "  ".join(parts)
+
+
+def relationship_scatter_figure(
+    xs: list,
+    ys: list,
+    x_label: str = "",
+    y_label: str = "",
+    classification: dict = None,
+    max_points: int = 2000,
+) -> "matplotlib.figure.Figure":
+    """Build a scatter figure of a numeric pair with its fit and a type label.
+
+    Cleans the pairs defensively (drops any pair with a None/bool/NaN/inf end),
+    plots a semi-transparent scatter cloud (down-sampled deterministically when
+    it exceeds ``max_points``), overlays the polynomial fit implied by
+    ``classification`` and annotates the relationship type plus its available
+    metrics in a corner box.
+
+    The fit and classification always use every clean pair; only the drawn cloud
+    is thinned by the down-sample. When ``classification`` is None it is computed
+    internally by reusing ``classify_relationship_type`` over the clean pairs, so
+    the function is self-contained.
+
+    The function is fully defensive: empty input, fewer than 2 clean pairs, a
+    missing/None ``coeffs`` or a missing sibling classifier never raise. When
+    there is nothing valid to draw it still returns a ``Figure`` carrying a
+    centered "Sin datos suficientes para el scatter" message.
+
+    Args:
+        xs: List (or tuple) of x values. Paired by index with ``ys``. Values that
+            are None, bool, NaN or infinite discard that pair. Read defensively.
+        ys: List (or tuple) of y values, parallel to ``xs``. Same defensive rules.
+        x_label: Axis/title label for the x variable. Default "" (falls back to
+            "x" in the title).
+        y_label: Axis/title label for the y variable. Default "" (falls back to
+            "y" in the title).
+        classification: Optional dict from ``classify_relationship_type`` with
+            keys ``tipo, pearson, r2_linear, spearman, r2_poly2, r2_poly3,
+            best_degree, coeffs``. When None, it is computed internally by
+            importing and calling ``classify_relationship_type`` over the clean
+            pairs. When that sibling module is unavailable, the scatter is still
+            drawn (no fit curve, no annotation).
+        max_points: Cap on the number of *plotted* points. When the number of
+            clean pairs exceeds this cap, the drawn cloud is down-sampled by a
+            fixed step ``ceil(n/max_points)`` taking ``pairs[::step]`` —
+            DETERMINISTIC, not random, so the figure is reproducible. A
+            non-positive or non-int value disables down-sampling. Default 2000.
+
+    Returns:
+        A ``matplotlib.figure.Figure`` (figsize 6.4x4.0, dpi 150) with a single
+        scatter Axes, the fitted curve (when a polynomial fit is available) and a
+        corner annotation with the relationship type and metrics. When there are
+        fewer than 2 clean pairs it returns a Figure with a centered "Sin datos
+        suficientes para el scatter" message. The caller rasterizes/closes it.
+    """
+    pairs = _clean_pairs(xs, ys)
+    if len(pairs) < 2:
+        return _no_data_figure("Sin datos suficientes para el scatter")
+
+    # Full clean coordinates feed the classification/fit; the plotted cloud is
+    # what gets thinned.
+    xs_clean = [p[0] for p in pairs]
+    ys_clean = [p[1] for p in pairs]
+
+    # Resolve the classification. If not provided, reuse the sibling classifier
+    # over ALL clean pairs (self-contained). Missing module => no fit/annotation.
+    cls = classification
+    if cls is None:
+        try:
+            from classify_relationship_type import classify_relationship_type
+
+            cls = classify_relationship_type(xs_clean, ys_clean)
+        except Exception:
+            cls = None
+    if not isinstance(cls, dict):
+        cls = {}
+
+    # --- Deterministic down-sampling of the DRAWN points only.
+    n_total = len(pairs)
+    if (
+        isinstance(max_points, int)
+        and not isinstance(max_points, bool)
+        and max_points > 0
+        and n_total > max_points
+    ):
+        step = math.ceil(n_total / max_points)
+        sampled = pairs[::step]
+    else:
+        sampled = pairs
+
+    x_plot = [p[0] for p in sampled]
+    y_plot = [p[1] for p in sampled]
+
+    fig = Figure(figsize=(6.4, 4.0), dpi=150)
+    ax = fig.add_subplot(111)
+
+    ax.scatter(
+        x_plot,
+        y_plot,
+        s=12,
+        alpha=0.5,
+        color=_POINT_COLOR,
+        edgecolors="none",
+        rasterized=True,
+    )
+
+    # --- Fitted curve/line over the full clean x range.
+    coeffs = cls.get("coeffs")
+    best_degree = cls.get("best_degree")
+    tipo = cls.get("tipo")
+    x_min, x_max = min(xs_clean), max(xs_clean)
+    drew_fit = False
+    if coeffs is not None and best_degree is not None and x_max > x_min:
+        try:
+            coeff_arr = np.asarray(coeffs, dtype=float)
+            if coeff_arr.ndim == 1 and coeff_arr.size > 0 and np.all(np.isfinite(coeff_arr)):
+                x_line = np.linspace(x_min, x_max, 200)
+                y_line = np.polyval(coeff_arr, x_line)
+                if np.all(np.isfinite(y_line)):
+                    ax.plot(x_line, y_line, color=_FIT_COLOR, linewidth=2)
+                    drew_fit = True
+        except Exception:
+            # Never fail the figure because of a malformed coeffs array.
+            pass
+
+    # A monotonic non-linear relationship has no fitted polynomial (coeffs is
+    # None by design — a low-degree polynomial would mislead). Draw instead the
+    # ordered trend of y over x so the reader still sees the shape: y averaged
+    # within ordered x-bins (or per distinct x value when x is discrete with few
+    # levels, e.g. an ordinal scale). Defensive: any failure leaves the cloud.
+    if (not drew_fit and isinstance(tipo, str) and "monóton" in tipo.lower()
+            and x_max > x_min):
+        try:
+            xt, yt = _ordered_trend(xs_clean, ys_clean)
+            if xt is not None and len(xt) >= 2:
+                ax.plot(xt, yt, color=_FIT_COLOR, linewidth=2, marker="o",
+                        markersize=3)
+        except Exception:
+            pass
+
+    # --- Labels and title.
+    tx = x_label if x_label else "x"
+    ty = y_label if y_label else "y"
+    ax.set_title(f"{tx} ↔ {ty}", fontsize=12, loc="left", pad=8)
+    ax.set_xlabel(x_label)
+    ax.set_ylabel(y_label)
+
+    # --- Corner annotation: relationship type + available metrics.
+    caption_lines = []
+    if tipo:
+        caption_lines.append(str(tipo))
+    metrics_line = _metrics_caption(cls)
+    if metrics_line:
+        caption_lines.append(metrics_line)
+    if caption_lines:
+        ax.text(
+            0.03,
+            0.97,
+            "\n".join(caption_lines),
+            transform=ax.transAxes,
+            ha="left",
+            va="top",
+            fontsize=8,
+            bbox=dict(
+                boxstyle="round,pad=0.35",
+                facecolor="white",
+                edgecolor="#cccccc",
+                alpha=0.85,
+            ),
+        )
+
+    fig.tight_layout()
+    return fig
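The ordered-trend fallback drawn for monotonic relationships can be illustrated without numpy. The sketch below is a stdlib-only approximation, not the module's code: `ordered_trend_sketch` is a hypothetical name, and the near-equal chunking is assumed to mirror what `np.array_split` does on the sorted data (the first `n % n_bins` chunks get one extra element).

```python
from statistics import mean


def ordered_trend_sketch(xs, ys, n_bins=12):
    """Stdlib-only sketch of the ordered-trend idea (hypothetical helper)."""
    if n_bins < 1 or len(xs) < 2 or len(xs) != len(ys):
        return None, None
    levels = sorted(set(xs))
    if len(levels) <= max(2, n_bins):
        # Discrete x: one trend point per distinct value (mean of its y values).
        groups = {level: [] for level in levels}
        for x, y in zip(xs, ys):
            groups[x].append(y)
        return levels, [mean(groups[level]) for level in levels]
    # Continuous x: sort by x, cut into n_bins near-equal chunks, average each.
    ordered = sorted(zip(xs, ys), key=lambda p: p[0])
    base, extra = divmod(len(ordered), n_bins)
    xt, yt, start = [], [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)
        chunk = ordered[start:start + size]
        start += size
        if chunk:
            xt.append(mean(p[0] for p in chunk))
            yt.append(mean(p[1] for p in chunk))
    return xt, yt


# Discrete x with three levels: the trend is the per-level mean of y.
xt, yt = ordered_trend_sketch([1, 1, 2, 2, 3, 3], [10, 12, 20, 22, 40, 44])
print(xt, yt)
```

With an ordinal x such as `pclass`, the discrete branch applies and the trend line passes through one point per level, which is why the figure draws markers on that line.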
@@ -0,0 +1,100 @@
+"""Tests for relationship_scatter_figure (scatter of one numeric pair, eda group).
+
+Forces the Agg backend; no figure is shown or saved. The function under test
+builds its Figure without pyplot's global state, and each test still closes the
+returned Figure explicitly (matplotlib.pyplot.close) so no state accumulates.
+"""
+
+import os
+import sys
+
+sys.path.insert(0, os.path.dirname(__file__))
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+import matplotlib.pyplot as plt  # noqa: E402
+from matplotlib.collections import PathCollection  # noqa: E402
+from matplotlib.figure import Figure  # noqa: E402
+
+from relationship_scatter_figure import relationship_scatter_figure
+
+
+def _scatter_offsets(fig):
+    """Return the plotted points of the first PathCollection (scatter) found."""
+    for ax in fig.axes:
+        for coll in ax.collections:
+            if isinstance(coll, PathCollection):
+                return coll.get_offsets()
+    return None
+
+
+def test_returns_figure():
+    xs = [float(i) for i in range(20)]
+    ys = [2.0 * x + 1.0 for x in xs]  # y = 2x + 1
+    classification = {
+        "tipo": "lineal",
+        "pearson": 1.0,
+        "r2_linear": 1.0,
+        "spearman": 1.0,
+        "r2_poly2": 1.0,
+        "r2_poly3": 1.0,
+        "best_degree": 1,
+        "coeffs": [2.0, 1.0],
+    }
+    fig = relationship_scatter_figure(
+        xs, ys, x_label="a", y_label="b", classification=classification
+    )
+    assert hasattr(fig, "savefig")
+    assert len(fig.axes) >= 1
+    plt.close(fig)
+
+
+def test_downsample_determinista():
+    n = 5000
+    xs = [float(i) for i in range(n)]
+    ys = [0.5 * x for x in xs]
+    classification = {
+        "tipo": "lineal",
+        "pearson": 1.0,
+        "r2_linear": 1.0,
+        "spearman": 1.0,
+        "r2_poly2": 1.0,
+        "r2_poly3": 1.0,
+        "best_degree": 1,
+        "coeffs": [0.5, 0.0],
+    }
+    fig = relationship_scatter_figure(
+        xs, ys, x_label="x", y_label="y", classification=classification, max_points=1000
+    )
+    assert isinstance(fig, Figure)
+    offsets = _scatter_offsets(fig)
+    assert offsets is not None
+    # The number of plotted points must not exceed the cap.
+    assert len(offsets) <= 1000
+    plt.close(fig)
+
+
+def test_empty_no_lanza():
+    fig = relationship_scatter_figure([], [], x_label="x", y_label="y")
+    assert isinstance(fig, Figure)
+    plt.close(fig)
+
+
+def test_classification_none():
+    # Runs only when the sibling module classify_relationship_type exists.
+    try:
+        import classify_relationship_type  # noqa: F401
+    except Exception:
+        import pytest
+
+        pytest.skip("classify_relationship_type aún no disponible")
+    xs = [float(i) for i in range(30)]
+    ys = [3.0 * x - 2.0 for x in xs]
+    fig = relationship_scatter_figure(
+        xs, ys, x_label="a", y_label="b", classification=None
+    )
+    assert isinstance(fig, Figure)
+    assert len(fig.axes) >= 1
+    plt.close(fig)
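`test_downsample_determinista` asserts only the cap. A minimal sketch of how reproducibility itself could be checked, assuming the stride rule from the docstring (`pairs[::ceil(n/max_points)]`) is lifted into a standalone helper; `stride_downsample` is a hypothetical name and is not part of the module:

```python
import math


def stride_downsample(pairs, max_points):
    """Hypothetical helper mirroring the rule pairs[::ceil(n / max_points)]."""
    n = len(pairs)
    usable_cap = (
        isinstance(max_points, int)
        and not isinstance(max_points, bool)
        and max_points > 0
    )
    if not usable_cap or n <= max_points:
        # A non-positive or non-int cap disables down-sampling.
        return list(pairs)
    step = math.ceil(n / max_points)
    return list(pairs[::step])


pairs = [(float(i), 0.5 * i) for i in range(5000)]
first = stride_downsample(pairs, 1000)
second = stride_downsample(pairs, 1000)
print(len(first), first == second)  # 1000 True
```

Because the step is at least `n / max_points`, the slice can never hold more than `max_points` items, and two calls on the same input always select the same indices.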
@@ -34,7 +34,6 @@ from .upsert_xlsx_sheet import upsert_xlsx_sheet
 from .duckdb_query_readonly import duckdb_query_readonly
 from .duckdb_execute import duckdb_execute
 from .duckdb_upsert import duckdb_upsert
-from .load_folder_to_duckdb import load_folder_to_duckdb
 from .imap_connect import imap_connect
 from .imap_list_mailboxes import imap_list_mailboxes
 from .imap_search import imap_search
@@ -51,7 +50,6 @@ __all__ = [
    "upsert_xlsx_sheet",
    "duckdb_query_readonly",
    "duckdb_execute",
-    "load_folder_to_duckdb",
    "duckdb_upsert",
    "pg_insert_rows",
    "pg_apply_sql",
@@ -1,100 +0,0 @@
----
-name: load_folder_to_duckdb
-kind: function
-lang: py
-domain: infra
-version: "1.0.0"
-purity: impure
-signature: "def load_folder_to_duckdb(folder: str, db_path: str = None, pattern: str = '*.csv,*.parquet,*.json') -> dict"
-description: "Scans the first level of a FOLDER for tabular files (CSV/TSV/TXT, Parquet, JSON/NDJSON) and loads them as tables into a DuckDB database using the native readers read_csv_auto/read_parquet/read_json_auto. It is the entry point of folder-level EDA (eda group). For each file it creates a table whose name is derived from the basename, sanitized to [0-9a-zA-Z_] in lowercase (t_ prefix if it starts with a digit, _2/_3 suffixes on collisions, tabla_<i> if it ends up empty). The path is escaped (single quote '->'') before being interpolated because the DuckDB readers do not accept the path as a positional parameter. NON-recursive glob: one glob.glob(os.path.join(folder, g)) per pattern in the comma-separated list, deduplicated and sorted. db_path=None generates a temporary DuckDB (mkstemp; the empty placeholder is deleted because DuckDB rejects a 0-byte file) and returns its path. A failure loading one file does not abort the rest: it is recorded in errors and loading continues. Always returns a dict without raising (duckdb group style): {status:'ok', db_path, tables, errors} on success (including a folder with no tabular files, tables=[]) and {status:'error', error} when the folder does not exist or something global fails. Depends on the duckdb package (1.5.2)."
-tags: [eda, duckdb, ingest, etl, folder]
-uses_functions: []
-uses_types: []
-returns: []
-returns_optional: false
-error_type: "error_py_core"
-imports: [glob, os, re, tempfile, duckdb]
-params:
-  - name: folder
-    desc: "path to a directory. Only its first level is scanned (NOT recursive). If it does not exist or is not a directory, returns {status:'error'} without raising."
-  - name: db_path
-    desc: "path of the target DuckDB file, opened in read-write mode (created if it does not exist). None (default) generates a unique temporary DuckDB with tempfile.mkstemp and returns its path in the db_path field of the result. DuckDB is single-writer: if another process has it open for writing, connect fails with a lock error returned in the dict."
-  - name: pattern
-    desc: "comma-separated list of globs (default '*.csv,*.parquet,*.json'). Each glob is applied with glob.glob(os.path.join(folder, g)) on the first level of folder; the results of all globs are deduplicated and sorted. Globs with ** do NOT descend recursively (glob.glob without recursive=True)."
-output: "dict. On success: {status:'ok', db_path:str (DuckDB path used), tables:[{name:str, source_file:str, n_rows:int}], errors:[{name?:str, source_file:str, error:str}]}. A folder with no tabular files is a success with tables=[] and errors=[]. On error (without raising): {status:'error', error:str}."
-tested: true
-tests:
-  - "test_carga_dos_csv_como_tablas"
-  - "test_db_path_none_crea_temporal"
-  - "test_carpeta_vacia_es_ok_sin_tablas"
-  - "test_carpeta_inexistente_devuelve_status_error"
-test_file_path: "python/functions/infra/load_folder_to_duckdb_test.py"
-file_path: "python/functions/infra/load_folder_to_duckdb.py"
----
-
-## Example
-
-```python
-import sys
-sys.path.insert(0, "python/functions")
-from infra.load_folder_to_duckdb import load_folder_to_duckdb
-
-# Prepare a demo folder with two CSVs.
-import os
-os.makedirs("/tmp/eda_folder_demo", exist_ok=True)
-with open("/tmp/eda_folder_demo/ventas.csv", "w") as f:
-    f.write("id,total\n1,10.5\n2,20.0\n3,5.25\n")
-with open("/tmp/eda_folder_demo/clientes.csv", "w") as f:
-    f.write("id,nombre\n1,ana\n2,luis\n")
-
-# Load every tabular file in the folder into a temporary DuckDB.
-res = load_folder_to_duckdb("/tmp/eda_folder_demo")
-print(res["status"])    # ok
-print(res["db_path"])   # /tmp/tmpXXXXXXXX.duckdb (temporary)
-for t in res["tables"]:
-    print(t["name"], t["n_rows"])   # clientes 2  /  ventas 3 (files load in sorted order)
-
-# Persist to a specific DuckDB and restrict to CSV.
-res2 = load_folder_to_duckdb(
-    "/tmp/eda_folder_demo",
-    db_path="/tmp/eda_folder_demo/folder.duckdb",
-    pattern="*.csv",
-)
-print(res2["tables"])   # [{'name': 'clientes', ...}, {'name': 'ventas', ...}]
-```
-
-## When to use it
-
-When you have a folder of loose data files (a dump, an export, several downloaded
-CSV/Parquet files) and want to analyze them together with SQL without wiring up
-the ingestion by hand, file by file. It is the first link of folder-level EDA
-(`eda` group): it leaves a DuckDB with one table per file, ready to profile with
-`duckdb_table_schema_py_infra`, query with `duckdb_query_readonly_py_infra`, or
-correlate downstream. Use it before any profiling step when the unit of work is
-"every file in this directory".
-
-## Gotchas
-
-- **NON-recursive glob**: only the first level of `folder` is scanned. Files in
-  subdirectories are ignored (even with `**` in the pattern, because
-  `glob.glob` is called without `recursive=True`). If you need recursion, flatten
-  the folder first or extend the function.
-- **Table-name sanitization**: the basename is reduced to `[0-9a-zA-Z_]` in
-  lowercase. `Ventas 2024.csv` -> table `ventas_2024`. Two different files can
-  sanitize to the same name (`a-b.csv` and `a_b.csv`); the second is disambiguated
-  with a `_2`, `_3`, ... suffix. The actual file->table mapping is in
-  `tables[].name` / `tables[].source_file`; do not assume it.
-- **`read_json_auto` requires tabular JSON** (an array of objects or homogeneous
-  NDJSON objects). Nested or irregular JSON can fail to load THAT table; the
-  error is recorded in `errors` and the remaining files keep loading.
-- **Unknown extension = skipped**, not a failure: it is noted in `errors` with
-  `unsupported extension`. Reader mapping: `.csv/.tsv/.txt`->`read_csv_auto`,
-  `.parquet/.pq`->`read_parquet`, `.json/.ndjson`->`read_json_auto`.
-- **Real disk writes (impure)**. DuckDB is single-writer: if another process
-  has `db_path` open for writing, `connect` fails with a lock error returned in
-  the dict. A `db_path` whose parent directory does not exist also fails.
-- **`db_path=None` creates a temporary file that is NOT deleted automatically**:
-  the path is returned in `db_path` so the caller can use it and clean it up.
-- **Types inferred by the `_auto` readers**: DuckDB infers the column types.
-  Check the schema with `duckdb_table_schema_py_infra` if typing matters
-  downstream.
@@ -1,175 +0,0 @@
-"""Load a folder of tabular files (CSV/Parquet/JSON) as DuckDB tables.
-
-Impure function: scans the first level of a directory for files matching one
-or more globs, and for each file creates a table in a DuckDB database using
-the native readers (`read_csv_auto`, `read_parquet`, `read_json_auto`). It is
-the entry point of folder-level EDA (`eda` group): it leaves a DuckDB with
-one table per file, ready to profile and
-correlate downstream.
-
-Always returns a dict without raising, following the style of the registry's
-duckdb group: {status:'ok', db_path, tables, errors} on success (including a
-folder with no tabular files, which is a success with tables=[]) and
-{status:'error', error:str} when the folder is missing or something global fails.
-
-Each table name is derived from the file's basename, sanitized to
-`[0-9a-zA-Z_]` in lowercase, prefixed with `t_` if it starts with a digit, and
-disambiguated with `_2`, `_3`, ... suffixes on collisions. The file path is
-escaped (single quote, `'`->`''`) before being interpolated into the reader's
-SQL, because the DuckDB readers do not accept the path as a positional parameter.
-A failure loading one file does NOT abort the rest: it is recorded in `errors`
-and loading continues with the next files.
-"""
-
-import glob
-import os
-import re
-import tempfile
-
-
-def _sanitize_table_name(basename_no_ext: str, index: int) -> str:
-    """Derive a valid table identifier from a file's basename.
-
-    Replaces everything outside ``[0-9a-zA-Z_]`` with ``_`` and lowercases it.
-    If the result is empty, uses ``tabla_<index>``. If it starts with a digit,
-    prefixes ``t_`` so it is a valid SQL identifier.
-    """
-    name = re.sub(r"[^0-9a-zA-Z_]", "_", basename_no_ext).lower()
-    if not name:
-        name = f"tabla_{index}"
-    if name[0].isdigit():
-        name = "t_" + name
-    return name
-
-
-def _reader_for_extension(ext: str, quoted_path: str):
-    """Return the DuckDB reader expression for an extension, or None.
-
-    ``quoted_path`` arrives already escaped; this function wraps it in single
-    quotes. Unknown extensions return None so the caller skips the file.
-    """
-    ext = ext.lower()
-    if ext in (".csv", ".tsv", ".txt"):
-        return f"read_csv_auto('{quoted_path}')"
-    if ext in (".parquet", ".pq"):
-        return f"read_parquet('{quoted_path}')"
-    if ext in (".json", ".ndjson"):
-        return f"read_json_auto('{quoted_path}')"
-    return None
-
-
-def load_folder_to_duckdb(
-    folder: str,
-    db_path: str = None,
-    pattern: str = "*.csv,*.parquet,*.json",
-) -> dict:
-    """Load the tabular files of a folder as tables in a DuckDB.
-
-    Args:
-        folder: path to a directory. If it does not exist or is not a directory,
-            returns {status:'error', ...} without raising.
-        db_path: path of the target DuckDB (read-write, created if missing). If
-            None, a temporary database is generated with tempfile.mkstemp and its
-            path is returned in the result (`db_path`).
-        pattern: comma-separated list of globs (default
-            "*.csv,*.parquet,*.json"). Each glob is applied with
-            glob.glob(os.path.join(folder, g)) on the first level (NOT recursive);
-            the results are deduplicated and sorted.
-
-    Returns:
-        dict. On success: {status:'ok', db_path:str, tables:[{name, source_file,
-        n_rows}], errors:[{name?, source_file, error}]}. A folder with no tabular
-        files is a success with tables=[] and errors=[]. On error (without
-        raising): {status:'error', error:str}.
-    """
-    if not isinstance(folder, str) or not os.path.isdir(folder):
-        return {
-            "status": "error",
-            "error": f"folder does not exist or is not a directory: {folder!r}",
-        }
-
-    conn = None
-    try:
-        # Resolve the target DuckDB path. If none is given, reserve a unique
-        # temporary name and delete the empty file that mkstemp creates: DuckDB
-        # 1.5.2 refuses to open a 0-byte file ("not a valid DuckDB database
-        # file"), so it must create the file itself from scratch.
-        if db_path is None:
-            fd, tmp_name = tempfile.mkstemp(suffix=".duckdb")
-            os.close(fd)
-            os.remove(tmp_name)
-            db_path = tmp_name
-
-        # Resolve the files: one glob per pattern, dedup + stable order.
-        globs = [g.strip() for g in pattern.split(",") if g.strip()]
-        found = set()
-        for g in globs:
-            for path in glob.glob(os.path.join(folder, g)):
-                if os.path.isfile(path):
-                    found.add(path)
-        files = sorted(found)
-
-        conn = __import__("duckdb").connect(db_path)
-
-        tables = []
-        errors = []
-        used_names = set()
-
-        for i, path in enumerate(files):
-            base = os.path.basename(path)
-            stem, ext = os.path.splitext(base)
-            quoted_path = path.replace("'", "''")
-            reader = _reader_for_extension(ext, quoted_path)
-            if reader is None:
-                errors.append(
-                    {
-                        "source_file": path,
-                        "error": f"unsupported extension: {ext!r}",
-                    }
-                )
-                continue
-
-            name = _sanitize_table_name(stem, i)
-            # Disambiguate collisions with _2, _3, ... suffixes.
-            if name in used_names:
-                suffix = 2
-                while f"{name}_{suffix}" in used_names:
-                    suffix += 1
-                name = f"{name}_{suffix}"
-
-            quoted_ident = '"' + name.replace('"', '""') + '"'
-            try:
-                conn.execute(
-                    f"CREATE TABLE {quoted_ident} AS SELECT * FROM {reader}"
-                )
-                n_rows = conn.execute(
-                    f"SELECT count(*) FROM {quoted_ident}"
-                ).fetchone()[0]
-                used_names.add(name)
-                tables.append(
-                    {
-                        "name": name,
-                        "source_file": path,
-                        "n_rows": int(n_rows),
-                    }
-                )
-            except Exception as e:  # noqa: BLE001
-                errors.append(
-                    {
-                        "name": name,
-                        "source_file": path,
-                        "error": str(e),
-                    }
-                )
-
-        return {
-            "status": "ok",
-            "db_path": db_path,
-            "tables": tables,
-            "errors": errors,
-        }
-    except Exception as e:  # noqa: BLE001
-        return {"status": "error", "error": str(e)}
-    finally:
-        if conn is not None:
-            conn.close()
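The table-naming rules described in the module docstring (sanitize, `t_` prefix, `tabla_<index>` fallback, `_2`/`_3` suffixes) can be exercised in isolation. The sketch below is a simplified, standalone illustration: `assign_table_names` is a hypothetical helper, and unlike the loader it reserves every name immediately instead of only after a successful `CREATE TABLE`.

```python
import re


def sanitize_table_name(stem, index):
    """Reduce a basename to [0-9a-z_], with the documented fallbacks."""
    name = re.sub(r"[^0-9a-zA-Z_]", "_", stem).lower()
    if not name:
        name = f"tabla_{index}"
    if name[0].isdigit():
        name = "t_" + name
    return name


def assign_table_names(stems):
    """Hypothetical helper: sanitize each stem and suffix collisions."""
    used = set()
    names = []
    for i, stem in enumerate(stems):
        name = sanitize_table_name(stem, i)
        if name in used:
            # Find the first free suffix: _2, _3, ...
            suffix = 2
            while f"{name}_{suffix}" in used:
                suffix += 1
            name = f"{name}_{suffix}"
        used.add(name)
        names.append(name)
    return names


print(assign_table_names(["Ventas 2024", "a-b", "a_b", "2024", ""]))
# ['ventas_2024', 'a_b', 'a_b_2', 't_2024', 'tabla_4']
```

The result depends on processing order, which is why the loader sorts the file list first and reports the real mapping in `tables[].name` / `tables[].source_file`.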
@@ -1,73 +0,0 @@
-"""Tests for load_folder_to_duckdb."""
-
-import os
-import sys
-
-sys.path.insert(0, os.path.dirname(__file__))
-
-import duckdb  # noqa: E402
-
-from load_folder_to_duckdb import load_folder_to_duckdb  # noqa: E402
-
-
-def _write_csv(path: str, header: str, rows: list[str]) -> None:
-    with open(path, "w", encoding="utf-8") as f:
-        f.write(header + "\n")
-        for r in rows:
-            f.write(r + "\n")
-
-
-def test_carga_dos_csv_como_tablas(tmp_path):
-    _write_csv(
-        str(tmp_path / "ventas.csv"),
-        "id,total",
-        ["1,10.5", "2,20.0", "3,5.25"],
-    )
-    _write_csv(
-        str(tmp_path / "clientes.csv"),
-        "id,nombre",
-        ["1,ana", "2,luis"],
-    )
-    db = tmp_path / "out.duckdb"
-    res = load_folder_to_duckdb(str(tmp_path), str(db))
-
-    assert res["status"] == "ok", res
-    assert res["errors"] == []
-    assert len(res["tables"]) == 2
-    assert res["db_path"] == str(db)
-    assert os.path.exists(str(db))
-
-    by_name = {t["name"]: t for t in res["tables"]}
-    assert by_name["ventas"]["n_rows"] == 3
-    assert by_name["clientes"]["n_rows"] == 2
-
-    # Verify that the tables actually exist in the database.
-    con = duckdb.connect(str(db), read_only=True)
-    assert con.execute("SELECT count(*) FROM ventas").fetchone()[0] == 3
-    assert con.execute("SELECT count(*) FROM clientes").fetchone()[0] == 2
-    con.close()
-
-
-def test_db_path_none_crea_temporal(tmp_path):
-    _write_csv(str(tmp_path / "datos.csv"), "x", ["1", "2"])
-    res = load_folder_to_duckdb(str(tmp_path))
-    assert res["status"] == "ok", res
-    assert res["db_path"]
-    assert os.path.exists(res["db_path"])
-    assert len(res["tables"]) == 1
-    assert res["tables"][0]["n_rows"] == 2
-    os.remove(res["db_path"])
-
-
-def test_carpeta_vacia_es_ok_sin_tablas(tmp_path):
-    db = tmp_path / "out.duckdb"
-    res = load_folder_to_duckdb(str(tmp_path), str(db))
-    assert res["status"] == "ok", res
-    assert res["tables"] == []
-    assert res["errors"] == []
-
-
-def test_carpeta_inexistente_devuelve_status_error(tmp_path):
-    res = load_folder_to_duckdb(str(tmp_path / "no_existe"))
-    assert res["status"] == "error"
-    assert "folder" in res["error"]
@@ -1,115 +0,0 @@
----
-name: render_automatic_eda_folder
-kind: pipeline
-lang: py
-domain: pipelines
-purity: impure
-version: "1.0.0"
-signature: "def render_automatic_eda_folder(path: str, out_dir: str = \"reports\", basename: str = None, profile_level: str = \"standard\", emit_pdf: bool = True, emit_pptx: bool = True, emit_md: bool = True, per_table_eda: bool = False, min_inclusion: float = 0.9, ctx_extra: dict = None) -> dict"
-description: "One-shot database-level AutomaticEDA report for a FOLDER of tabular files (CSV/Parquet/JSON) or an existing DuckDB. Loads the folder into a temporary DuckDB with load_folder_to_duckdb (or uses the given DuckDB directly), profiles the WHOLE database with profile_database (summary of each table + candidate FKs by containment + join graph with a Mermaid diagram), ASSEMBLES a chapter-based database document (database cover with name/table count/totals/date/source, a table summary with one row per table, and inter-table relations with the candidate-FK table + a REAL matplotlib Figure of the join graph drawn with draw_join_graph_figure plus the Mermaid text) and renders it with the AutomaticEDA engine to PDF (mobile A5), PPTX (16:9) and self-contained Markdown at once. With per_table_eda=True it appends each table's mini-EDA chapters (build_document per table). It is the database-level sibling of render_automatic_eda (which profiles ONE table): here the report covers the database and its relations. Returns the PDF/PPTX/MD paths, the manifest and the DatabaseProfile."
-tags: [eda, duckdb, database, profiling, relations, pipeline, dataops, report, pdf, pptx, launcher]
-uses_functions:
-  - load_folder_to_duckdb_py_infra
-  - profile_database_py_pipelines
-  - render_automatic_eda_pdf_py_datascience
-  - render_automatic_eda_pptx_py_datascience
-  - render_automatic_eda_markdown_py_datascience
-  - draw_join_graph_figure_py_datascience
-uses_types: []
-returns: []
-returns_optional: false
-error_type: error_go_core
-imports: []
-tested: true
-tests:
-  - "golden: folder with 3 related CSVs (customers/orders/products) emits PDF+PPTX+MD of the database document with 3 tables and the FK orders.customer_id->customers.id"
-  - "edge: empty folder -> status ok with a minimal document, without raising"
-  - "edge: a single table -> works without relations (the relations chapter says 'sin FK')"
-test_file_path: "python/functions/pipelines/render_automatic_eda_folder_test.py"
-file_path: "python/functions/pipelines/render_automatic_eda_folder.py"
-params:
-  - name: path
-    desc: "DIRECTORY with tabular files (CSV/Parquet/JSON) that are loaded into a temporary DuckDB, or an existing DuckDB (.duckdb/.ddb/.db) that is profiled directly."
-  - name: out_dir
-    desc: "Output directory for the reports (created if it does not exist). Default 'reports'."
-  - name: basename
-    desc: "Base name of the files without extension. Default 'aeda_base_<nombre>_<timestamp>'."
-  - name: profile_level
-    desc: "Cost preset of the per-table profile ('lite'/'standard'/'full'); adjusts the sample that profile_database passes to each table (lite=2000, standard/full=5000)."
-  - name: emit_pdf
-    desc: "Emits the mobile A5 PDF of the database document. Default True."
-  - name: emit_pptx
-    desc: "Emits the 16:9 PPTX of the database document. Default True."
-  - name: emit_md
-    desc: "Emits the self-contained Markdown of the database document. Default True."
-  - name: per_table_eda
-    desc: "If True, appends each table's mini-EDA chapters to the database document (Heading 'Tabla: <n>' + build_document per table). Default False (database document only: cover + summary + relations)."
-  - name: min_inclusion
-    desc: "Inclusion threshold (0-1) for emitting a candidate FK (passed to profile_database). Default 0.9."
-  - name: ctx_extra
-    desc: "Optional dict of presentation keys (e.g. dataset_name, description) merged into the database cover context."
-output: "No-throw dict. On success: {status:'ok', pdf_path, pptx_path, md_path, manifest_path, n_tables, n_pages, n_slides, md_chars, db_path, db_profile}. On error: {status:'error', error:str}."
----
-
-# render_automatic_eda_folder
-
-EDA of a **folder / multi-table database** → chapter-based AutomaticEDA report
-in PDF (mobile A5) + PPTX (16:9) + Markdown, in a single call. It is the
-**database-level** sibling of `render_automatic_eda` (which profiles a single
-table): here the document summarizes **all** the tables and, above all, their
-inter-table **relations** (candidate FKs by containment + join graph with a Mermaid diagram).
-
-It composes, without reimplementing their logic: `load_folder_to_duckdb` (loads the
-folder), `profile_database` (profiles the database + infers FKs + join graph) and the
-three renderers of the AutomaticEDA engine (`render_automatic_eda_pdf`/`_pptx`/`_markdown`),
-which directly accept the list of chapters of the base document that this
-pipeline assembles. The single-table pipeline (`render_automatic_eda`) is left
-untouched: this is purely additive.
-
-## Example
-
-```bash
-# Folder with several related CSV/Parquet/JSON files:
-./fn run render_automatic_eda_folder /tmp/eda_folder_demo
-
-# An existing DuckDB (direct branch):
-./fn run render_automatic_eda_folder temp/bigdata/taxi.duckdb
-```
-
-```python
-import sys, os
-sys.path.insert(0, os.path.join("python", "functions"))
-from pipelines.render_automatic_eda_folder import render_automatic_eda_folder
-
-r = render_automatic_eda_folder("/tmp/eda_folder_demo", out_dir="reports")
-# r["status"] == "ok"; r["pdf_path"], r["pptx_path"], r["md_path"]
-# r["n_tables"] == 3; r["db_profile"]["fk_candidates"] includes
-#   orders.customer_id -> customers.id
-```
-
-## When to use it
-
-When you want an EDA of a **whole database** (a folder of exports or a
-DuckDB with several tables) rather than a single table: to see at a glance which
-tables exist, their size and quality, and how they relate (candidate FKs + diagram),
-in the same rich chapter-based format (mobile PDF + PPTX + MD) as the table-level
-EDA. Use `per_table_eda=True` when you also want each table's mini-EDA
-appended.
-
-## Gotchas
-
-- Impure: reads files from disk and writes PDF/PPTX/MD to `out_dir`. In the
-  "folder" branch it creates a **temporary DuckDB** (its path is returned in `db_path`); it is not
-  deleted automatically (it is kept for later inspection).
-- `path` is interpreted as follows: directory → the folder is loaded; file with a
-  `.duckdb`/`.ddb`/`.db` extension → used directly; any other file or a
-  nonexistent path → `{status:'error'}` (does not raise).
-- The folder scan is **non-recursive** (first level only) and by
-  default covers `*.csv,*.parquet,*.json` (see `load_folder_to_duckdb`).
-- The join graph is rasterized to a **real matplotlib Figure** (via
-  `draw_join_graph_figure`) that is drawn in the PDF/PPTX (nodes = tables,
-  arrows = FKs). In addition, the graph's **Mermaid text** is included as a code
-  block (in the Markdown it is a renderable diagram and is handy for pasting into
-  an LLM).
-- Empty folder or a single table: works the same; the relations chapter
-  reports that no candidate FKs were detected. No-throw dict on every path.
@@ -1,366 +0,0 @@
-"""render_automatic_eda_folder: one-shot EDA of a FOLDER / multi-table database.
-
-Impure pipeline of the `eda` capability group, at DATABASE level. Given a FOLDER
-of tabular files (CSV/Parquet/JSON) or an existing DuckDB, it produces the
-AutomaticEDA report of the DATABASE in its three formats at once (mobile A5 PDF +
-16:9 PPTX + self-contained Markdown), with the chapters POPULATED, in a single
-call. It is the database-level sibling of ``render_automatic_eda`` (which profiles
-ONE table): here the chapter-based document summarizes ALL the tables and, above
-all, their inter-table RELATIONS (candidate FKs + join graph).
-
-It composes registry functions WITHOUT reimplementing their logic:
-
-  - load_folder_to_duckdb : loads a folder of files into a temporary DuckDB
-                            ("folder" branch). Skipped in the "already a DuckDB" branch.
-  - profile_database      : profiles the WHOLE database (summary of each table,
-                            full TableProfiles, candidate FKs by
-                            containment and join graph with Mermaid diagram).
-  - render_automatic_eda_pdf      : renders the chapter-based base document to PDF.
-  - render_automatic_eda_pptx     : renders the same base document to PPTX.
-  - render_automatic_eda_markdown : serializes the same base document to
-                                    self-contained Markdown (text + markdown tables).
-  - build_document        : (only with per_table_eda=True) assembles the canonical
-                            chapters of EACH table to append them to the document.
-
-This pipeline's own layer is ASSEMBLING THE BASE DOCUMENT of chapters from
-the ``DatabaseProfile`` returned by ``profile_database`` and wiring up the
-three renderers of the AutomaticEDA engine. The minimal base document has three
-chapters: base cover (name/table count/totals/date/source), table
-summary (one row per table) and inter-table relations (candidate FKs + join-graph
-Figure drawn with draw_join_graph_figure + Mermaid text). With ``per_table_eda=True`` it appends, for each table, its
-mini-EDA chapters.
-
-No-throw dict style of the `eda` group: it never raises; it catches any error and
-degrades to ``{"status": "error", "error": str}``.
-"""
-
-import os
-from datetime import datetime, timezone
-
-from datascience import (
-    draw_join_graph_figure,
-    render_automatic_eda_markdown,
-    render_automatic_eda_pdf,
-    render_automatic_eda_pptx,
-)
-from datascience.automatic_eda import build_document
-from infra import load_folder_to_duckdb
-from pipelines.profile_database import profile_database
-
-# Map profile_level -> per-column sample size of each table's profile.
-# At database level the cost is dominated by the number of tables; the preset only
-# sets the sample that profile_database passes to profile_table.
-_SAMPLE_BY_LEVEL = {"lite": 2000, "standard": 5000, "full": 5000}
-
-# Extensions treated as "an existing DuckDB" in the direct branch.
-_DUCKDB_EXTS = (".duckdb", ".ddb", ".db")
-
-
-def _fmt_num(v) -> str:
-    """Format a number as an integer with '.' thousands separators; returns a dash placeholder if not a number."""
-    if isinstance(v, bool) or not isinstance(v, (int, float)):
-        return "—"
-    try:
-        return f"{int(v):,}".replace(",", ".")
-    except Exception:  # noqa: BLE001
-        return str(v)
-
-
-def _portada_chapter(db_profile: dict, source_path: str, db_path: str,
-                     meta_ctx: dict) -> dict:
-    """Database-level cover chapter (does NOT reuse chapters/portada.py, which is
-    for a single table): database name, table count, totals and provenance."""
-    tables = db_profile.get("tables", []) or []
-    total_rows = sum(
-        (t.get("n_rows") or 0) for t in tables if isinstance(t.get("n_rows"), (int, float))
-    )
-    total_cols = sum(
-        (t.get("n_cols") or 0) for t in tables if isinstance(t.get("n_cols"), (int, float))
-    )
-    base_name = (meta_ctx or {}).get("dataset_name") or os.path.basename(
-        os.path.normpath(source_path)
-    ) or source_path
-
-    rows = [
-        ("Base", base_name),
-        ("Tablas", _fmt_num(db_profile.get("n_tables"))),
-        ("Filas totales", _fmt_num(total_rows)),
-        ("Columnas totales", _fmt_num(total_cols)),
-        ("Relaciones FK", _fmt_num(len(db_profile.get("fk_candidates", []) or []))),
-        ("Fuente", source_path),
-        ("DuckDB", db_path),
-        ("Generado", datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")),
-    ]
-    blocks = [
-        {"kind": "heading", "text": f"EDA de la base — {base_name}", "level": 1},
-        {"kind": "kv_table", "rows": rows, "title": "Resumen de la base"},
-    ]
-    errs = db_profile.get("errors", []) or []
-    if errs:
-        blocks.append({
-            "kind": "note",
-            "text": f"{len(errs)} aviso(s) durante el perfilado (ver detalle).",
-        })
-    return {"id": "portada_base", "title": "Portada", "version": "1.0.0",
-            "blocks": blocks}
-
-
-def _resumen_chapter(db_profile: dict) -> dict:
-    """Chapter with one row per table: rows, columns, quality, key_candidates."""
-    header = ["Tabla", "Filas", "Columnas", "Calidad", "key_candidates"]
-    rows = []
-    for t in db_profile.get("tables", []) or []:
-        keys = ", ".join(t.get("key_candidates") or []) or "—"
-        rows.append([
-            t.get("table"),
-            _fmt_num(t.get("n_rows")),
-            _fmt_num(t.get("n_cols")),
-            t.get("quality_score"),
-            keys,
-        ])
-    if rows:
-        blocks = [{
-            "kind": "data_table", "header": header, "rows": rows,
-            "title": "Tablas de la base",
-            "note": "Una fila por tabla. Calidad = score agregado del TableProfile.",
-        }]
-    else:
-        blocks = [{"kind": "note",
-                   "text": "La base no contiene tablas perfilables."}]
-    return {"id": "resumen_tablas", "title": "Resumen de tablas",
-            "version": "1.0.0", "blocks": blocks}
-
-
-def _relaciones_chapter(db_profile: dict) -> dict:
-    """Inter-table relations chapter: candidate-FK table + join-graph Figure and
-    Mermaid diagram (the Mermaid text is emitted as a code block)."""
-    fks = db_profile.get("fk_candidates", []) or []
-    blocks = [{
-        "kind": "heading", "text": "Relaciones inter-tabla", "level": 2,
-    }]
-    if fks:
-        header = ["From", "To", "Inclusión", "Cardinalidad"]
-        rows = []
-        for fk in fks:
-            frm = f"{fk.get('from_table')}.{fk.get('from_col')}"
-            to = f"{fk.get('to_table')}.{fk.get('to_col')}"
-            inc = fk.get("inclusion")
-            inc_s = f"{inc:.3f}" if isinstance(inc, (int, float)) else str(inc)
-            rows.append([frm, to, inc_s, fk.get("cardinality")])
-        blocks.append({
-            "kind": "data_table", "header": header, "rows": rows,
-            "title": "FK candidatas (por containment de valores)",
-            "note": "Inclusión = fracción de valores de From contenidos en To.",
-        })
-    else:
-        blocks.append({
-            "kind": "note",
-            "text": "Sin relaciones FK candidatas detectadas entre las tablas.",
-        })
-
-    join_graph = db_profile.get("join_graph") or {}
-    has_edges = bool(join_graph.get("edges"))
-    if has_edges:
-        blocks.append({"kind": "heading", "text": "Diagrama (join graph)",
-                       "level": 3})
-        # REAL matplotlib Figure of the relations graph (nodes = tables,
-        # edges = FKs). Lazy via `make`: the renderer builds it only when
-        # paginating, and it is rasterized in PDF/PPTX. draw_join_graph_figure never
-        # raises (it returns an error Figure if something fails).
-        blocks.append({
-            "kind": "figure",
-            "make": (lambda jg=join_graph: draw_join_graph_figure(
-                jg, title="Join graph (relaciones inter-tabla)")),
-            "caption": "Grafo de relaciones: nodos = tablas, flechas = FK "
-                       "candidatas (etiqueta from_col→to_col).",
-            "height_in": 4.5,
-        })
-        # In addition, the Mermaid as text: in the Markdown it is a renderable
-        # diagram and it is handy for pasting into an LLM.
-        mermaid = (join_graph.get("mermaid", "") or "").strip()
-        if mermaid:
-            blocks.append({"kind": "markdown",
-                           "text": "```mermaid\n" + mermaid + "\n```"})
-    return {"id": "relaciones", "title": "Relaciones inter-tabla",
-            "version": "1.0.0", "blocks": blocks}
-
-
-def _build_db_document(db_profile: dict, source_path: str, db_path: str,
-                       meta_ctx: dict, per_table_eda: bool) -> list:
-    """Assembles the chapter-based base document from the DatabaseProfile.
-
-    Minimum: base cover + table summary + relations. With per_table_eda
-    True it appends, for each table, a separator chapter + the canonical chapters
-    of its mini-EDA (reusing build_document on each TableProfile)."""
-    chapters = [
-        _portada_chapter(db_profile, source_path, db_path, meta_ctx),
-        _resumen_chapter(db_profile),
-        _relaciones_chapter(db_profile),
-    ]
-    if per_table_eda:
-        for prof in db_profile.get("table_profiles", []) or []:
-            tname = prof.get("table") or "tabla"
-            chapters.append({
-                "id": f"tabla_{tname}", "title": f"Tabla: {tname}",
-                "version": "1.0.0",
-                "blocks": [{"kind": "heading", "text": f"Tabla: {tname}",
-                            "level": 1}],
-            })
-            try:
-                # build_document returns the table's canonical chapters.
-                # ctx None -> chapters that need raw data degrade, but the
-                # portada/overview/distrib/calidad ones come out complete.
-                chapters.extend(build_document(prof, None) or [])
-            except Exception:  # noqa: BLE001 - one bad table does not break the doc.
-                chapters.append({
-                    "id": f"tabla_{tname}_err", "title": f"Tabla: {tname}",
-                    "version": "1.0.0",
-                    "blocks": [{"kind": "note",
-                                "text": "No se pudo ensamblar el mini-EDA de "
-                                        "esta tabla."}],
-                })
-    return chapters
-
-
-def _resolve_db_path(path: str) -> dict:
-    """Resolves the DuckDB to profile from ``path``.
-
-    - Directory  -> loads the folder with load_folder_to_duckdb (temp DuckDB).
-    - .duckdb/.ddb/.db file -> used directly ("already a DuckDB" branch).
-    - Other file / nonexistent -> error.
-
-    On success returns {status, db_path, loaded, n_tables, load_errors}; on error {status, error}.
-    """
-    if os.path.isdir(path):
-        lr = load_folder_to_duckdb(path)
-        if lr.get("status") != "ok":
-            return {"status": "error",
-                    "error": f"load_folder_to_duckdb falló: {lr.get('error')}"}
-        return {
-            "status": "ok",
-            "db_path": lr.get("db_path"),
-            "loaded": True,
-            "n_tables": len(lr.get("tables", []) or []),
-            "load_errors": lr.get("errors", []) or [],
-        }
-    if os.path.isfile(path):
-        if path.lower().endswith(_DUCKDB_EXTS):
-            return {"status": "ok", "db_path": path, "loaded": False,
-                    "n_tables": None, "load_errors": []}
-        return {"status": "error",
-                "error": f"'{path}' no es un directorio ni una DuckDB "
-                         f"(extensiones {_DUCKDB_EXTS})."}
-    return {"status": "error", "error": f"path no existe: {path}"}
-
-
-def render_automatic_eda_folder(
-    path: str,
-    out_dir: str = "reports",
-    basename: str = None,
-    profile_level: str = "standard",
-    emit_pdf: bool = True,
-    emit_pptx: bool = True,
-    emit_md: bool = True,
-    per_table_eda: bool = False,
-    min_inclusion: float = 0.9,
-    ctx_extra: dict = None,
-) -> dict:
-    """Profiles a FOLDER (or a DuckDB) and emits the AutomaticEDA report of the database.
-
-    Args:
-        path: either a DIRECTORY with tabular files (CSV/Parquet/JSON) that
-            are loaded into a temporary DuckDB, or an existing DuckDB
-            (``.duckdb``/``.ddb``/``.db``) that is profiled directly.
-        out_dir: output directory (created if it does not exist). Default "reports".
-        basename: base name of the files, without extension. Default
-            "aeda_base_<name>_<timestamp>".
-        profile_level: cost preset for the per-table profile ("lite"/"standard"/
-            "full"); sets the ``sample`` that profile_database passes to each table.
-        emit_pdf / emit_pptx / emit_md: which formats to emit. Default all three.
-        per_table_eda: if True, appends each table's mini-EDA chapters to the
-            base document (a Heading "Tabla: <n>" + build_document per table).
-            Default False (base document only: cover + summary + relations).
-        min_inclusion: inclusion threshold for emitting a candidate FK (0-1).
-        ctx_extra: optional dict of presentation keys (e.g. dataset_name,
-            description) merged into the cover context.
-
-    Returns:
-        dict (never raises). On success::
-
-            {"status": "ok", "pdf_path": str|None, "pptx_path": str|None,
-             "md_path": str|None, "manifest_path": str|None,
-             "n_tables": int, "n_pages": int|None, "n_slides": int|None,
-             "md_chars": int|None, "db_path": str, "db_profile": <DatabaseProfile>}
-
-        On error: {"status": "error", "error": str}.
-    """
-    try:
-        # 1) Resolve the DuckDB to profile (load the folder or use the given one).
-        rdb = _resolve_db_path(path)
-        if rdb.get("status") != "ok":
-            return {"status": "error", "error": rdb.get("error")}
-        db_path = rdb.get("db_path")
-
-        # 2) Profile the whole database (summary + FKs + join graph). No report of
-        # its own (write_report/emit_pdf False): this pipeline emits its own.
-        sample = _SAMPLE_BY_LEVEL.get(profile_level, 5000)
-        pres = profile_database(
-            db_path, sample=sample, write_report=False,
-            min_inclusion=min_inclusion, emit_pdf=False,
-        )
-        if pres.get("status") != "ok":
-            return {"status": "error",
-                    "error": f"profile_database falló: {pres.get('error')}"}
-        db_profile = pres.get("db_profile") or {}
-
-        # 3) Assemble the chapter-based base document.
-        meta_ctx = dict(ctx_extra or {})
-        chapters = _build_db_document(
-            db_profile, path, db_path, meta_ctx, per_table_eda
-        )
-
-        # 4) Render the three formats from the SAME chapter-based document.
-        os.makedirs(out_dir, exist_ok=True)
-        ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
-        nm = (meta_ctx.get("dataset_name")
-              or os.path.basename(os.path.normpath(path)) or "base")
-        nm = "".join(c if c.isalnum() else "_" for c in str(nm)).strip("_") or "base"
-        base = basename or f"aeda_base_{nm}_{ts}"
-        title = f"EDA base — {meta_ctx.get('dataset_name') or nm}"
-        meta = {"title": title}
-
-        pdf_path = pptx_path = md_path = manifest_path = None
-        n_pages = n_slides = md_chars = None
-
-        if emit_pdf:
-            target = os.path.join(out_dir, base + ".pdf")
-            rpdf = render_automatic_eda_pdf(chapters, target, meta) or {}
-            pdf_path = rpdf.get("path")
-            n_pages = rpdf.get("n_pages")
-            manifest_path = rpdf.get("manifest_path")
-        if emit_pptx:
-            target = os.path.join(out_dir, base + ".pptx")
-            rpptx = render_automatic_eda_pptx(chapters, target, meta) or {}
-            pptx_path = rpptx.get("path")
-            n_slides = rpptx.get("n_slides")
-        if emit_md:
-            target = os.path.join(out_dir, base + ".md")
-            rmd = render_automatic_eda_markdown(chapters, target, meta) or {}
-            md_path = rmd.get("path")
-            md_chars = rmd.get("n_chars")
-
-        return {
-            "status": "ok",
-            "pdf_path": pdf_path,
-            "pptx_path": pptx_path,
-            "md_path": md_path,
-            "manifest_path": manifest_path,
-            "n_tables": db_profile.get("n_tables"),
-            "n_pages": n_pages,
-            "n_slides": n_slides,
-            "md_chars": md_chars,
-            "db_path": db_path,
-            "db_profile": db_profile,
-        }
-    except Exception as e:  # noqa: BLE001 - no-throw dict: degrade, never raise.
-        return {"status": "error", "error": str(e)}
@@ -1,188 +0,0 @@
-"""Tests for render_automatic_eda_folder: EDA of a folder / multi-table database.
-
-Golden: a folder with 3 related CSVs (customers/orders/products) produces the
-base document in PDF + PPTX + MD, with the 3 tables in the summary and the FK
-orders.customer_id -> customers.id in the relations chapter. Edges: empty
-folder (minimal document, without raising), a single table (no relations) and the
-"already a DuckDB" branch on an existing .duckdb file.
-"""
-
-import os
-import sys
-
-import duckdb
-
-sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
-
-from pipelines.render_automatic_eda_folder import (
-    _relaciones_chapter,
-    render_automatic_eda_folder,
-)
-
-
-def _write_demo_folder(folder: str) -> None:
-    """3 related CSVs: orders.customer_id -> customers.id (detectable FK)."""
-    with open(os.path.join(folder, "customers.csv"), "w", encoding="utf-8") as fh:
-        fh.write("id,name,city\n")
-        fh.write("1,Alice,Madrid\n2,Bob,Barcelona\n3,Carol,Valencia\n"
-                 "4,Dave,Sevilla\n5,Eve,Madrid\n")
-    with open(os.path.join(folder, "orders.csv"), "w", encoding="utf-8") as fh:
-        fh.write("order_id,customer_id,product_id,total\n")
-        fh.write("100,1,10,49.90\n101,1,11,12.50\n102,2,10,49.90\n"
-                 "103,3,12,8.00\n104,3,11,12.50\n105,5,10,49.90\n"
-                 "106,2,12,8.00\n")
-    with open(os.path.join(folder, "products.csv"), "w", encoding="utf-8") as fh:
-        fh.write("product_id,product_name,price\n")
-        fh.write("10,Widget,49.90\n11,Gadget,12.50\n12,Gizmo,8.00\n")
-
-
-def _has_fk(db_profile: dict, from_t: str, from_c: str, to_t: str) -> bool:
-    for fk in db_profile.get("fk_candidates", []) or []:
-        if (fk.get("from_table") == from_t and fk.get("from_col") == from_c
-                and fk.get("to_table") == to_t):
-            return True
-    return False
-
-
-def test_golden_folder_three_csv(tmp_path):
-    """Folder with 3 related CSVs -> PDF+PPTX+MD, 3 tables, FK detected."""
-    folder = tmp_path / "demo"
-    folder.mkdir()
-    _write_demo_folder(str(folder))
-    out = tmp_path / "out"
-
-    r = render_automatic_eda_folder(str(folder), out_dir=str(out))
-
-    assert r["status"] == "ok", r
-    assert r["n_tables"] == 3
-    # All three formats were emitted and exist on disk.
-    assert r["pdf_path"] and os.path.exists(r["pdf_path"])
-    assert r["pptx_path"] and os.path.exists(r["pptx_path"])
-    assert r["md_path"] and os.path.exists(r["md_path"])
-    assert (r["n_pages"] or 0) >= 1
-    assert (r["n_slides"] or 0) >= 1
-    # The FK orders.customer_id -> customers.id is detected by containment.
-    assert _has_fk(r["db_profile"], "orders", "customer_id", "customers"), \
-        r["db_profile"].get("fk_candidates")
-    # The Markdown mentions the 3 tables and the relation.
-    md = open(r["md_path"], encoding="utf-8").read()
-    for t in ("customers", "orders", "products"):
-        assert t in md
-    assert "customer_id" in md
-
-
-def test_edge_empty_folder(tmp_path):
-    """Empty folder -> status ok with a minimal document, without raising."""
-    folder = tmp_path / "empty"
-    folder.mkdir()
-    out = tmp_path / "out"
-
-    r = render_automatic_eda_folder(str(folder), out_dir=str(out))
-
-    assert r["status"] == "ok", r
-    assert r["n_tables"] == 0
-    # Even with no tables, it emits the minimal base document (cover + empty
-    # summary + relations note saying no candidate FKs were detected).
-    assert r["pdf_path"] and os.path.exists(r["pdf_path"])
-    assert r["md_path"] and os.path.exists(r["md_path"])
-
-
-def test_edge_single_table_no_relations(tmp_path):
-    """Folder with a single table -> works without relations (no-FK note in the chapter)."""
-    folder = tmp_path / "single"
-    folder.mkdir()
-    with open(folder / "lonely.csv", "w", encoding="utf-8") as fh:
-        fh.write("a,b\n1,x\n2,y\n3,z\n")
-    out = tmp_path / "out"
-
-    r = render_automatic_eda_folder(str(folder), out_dir=str(out))
-
-    assert r["status"] == "ok", r
-    assert r["n_tables"] == 1
-    assert not (r["db_profile"].get("fk_candidates") or [])
-    md = open(r["md_path"], encoding="utf-8").read()
-    assert "Sin relaciones FK" in md or "sin fk" in md.lower()
-
-
-def test_accepts_existing_duckdb(tmp_path):
-    """'Already a DuckDB' branch: an existing .duckdb file is profiled directly."""
-    db = tmp_path / "base.duckdb"
-    conn = duckdb.connect(str(db))
-    try:
-        conn.execute("CREATE TABLE customers (id INTEGER, name VARCHAR)")
-        conn.execute("INSERT INTO customers VALUES (1,'Ana'),(2,'Luis'),(3,'Eva')")
-        conn.execute("CREATE TABLE orders (oid INTEGER, customer_id INTEGER)")
-        conn.execute("INSERT INTO orders VALUES (10,1),(11,2),(12,1),(13,3)")
-    finally:
-        conn.close()
-    out = tmp_path / "out"
-
-    r = render_automatic_eda_folder(str(db), out_dir=str(out))
-
-    assert r["status"] == "ok", r
-    assert r["n_tables"] == 2
-    assert r["db_path"] == str(db)
-    assert r["pdf_path"] and os.path.exists(r["pdf_path"])
-
-
-def test_emit_flags_select_formats(tmp_path):
-    """emit_pdf/pptx/md control which formats are emitted."""
-    folder = tmp_path / "demo"
-    folder.mkdir()
-    _write_demo_folder(str(folder))
-    out = tmp_path / "out"
-
-    r = render_automatic_eda_folder(
-        str(folder), out_dir=str(out),
-        emit_pdf=True, emit_pptx=False, emit_md=False,
-    )
-    assert r["status"] == "ok", r
-    assert r["pdf_path"] and os.path.exists(r["pdf_path"])
-    assert r["pptx_path"] is None
-    assert r["md_path"] is None
-
-
-def test_path_does_not_exist(tmp_path):
-    """Nonexistent path -> status error, without raising."""
-    r = render_automatic_eda_folder(str(tmp_path / "nope"))
-    assert r["status"] == "error"
-    assert "no existe" in r["error"].lower()
-
-
-def test_relaciones_chapter_has_real_figure_when_edges():
-    """With edges, the relations chapter includes a REAL matplotlib Figure
-    block (not just the Mermaid text): its make() returns a Figure."""
-    db_profile = {
-        "join_graph": {
-            "nodes": [
-                {"table": "orders", "out_degree": 1, "in_degree": 0, "role": "fact"},
-                {"table": "customers", "out_degree": 0, "in_degree": 1, "role": "dim"},
-            ],
-            "edges": [{"from_table": "orders", "from_col": "customer_id",
-                       "to_table": "customers", "to_col": "id",
-                       "cardinality": "N:1"}],
-            "mermaid": "graph LR orders --> customers",
-            "hubs": ["orders"],
-        },
-        "fk_candidates": [{"from_table": "orders", "from_col": "customer_id",
-                           "to_table": "customers", "to_col": "id",
-                           "inclusion": 1.0, "cardinality": "N:1"}],
-    }
-    ch = _relaciones_chapter(db_profile)
-    figs = [b for b in ch["blocks"] if b.get("kind") == "figure"]
-    assert len(figs) == 1, ch["blocks"]
-    # The lazy make() produces a real matplotlib Figure.
-    import matplotlib
-    matplotlib.use("Agg")
-    fig = figs[0]["make"]()
-    from matplotlib.figure import Figure
-    assert isinstance(fig, Figure)
-    assert fig.get_axes(), "the join graph Figure must have at least one axes"
-
-
-def test_relaciones_chapter_no_figure_when_no_edges():
-    """Without edges, no Figure block is added (the chapter notes that no FKs were detected)."""
-    db_profile = {"join_graph": {"nodes": [], "edges": [], "mermaid": "",
-                                 "hubs": []}, "fk_candidates": []}
-    ch = _relaciones_chapter(db_profile)
-    assert not [b for b in ch["blocks"] if b.get("kind") == "figure"]