feat(eda): rasterize the join graph to a real matplotlib Figure in the relationships chapter

draw_join_graph_figure (datascience, eda group): draws the database's join graph as a real matplotlib Figure (networkx spring_layout seed=42, nodes = tables, hubs highlighted, directed arrows labeled from_col->to_col + cardinality). Never raises: returns an error Figure if anything fails; empty input -> Figure 'Sin relaciones FK detectadas'. render_automatic_eda_folder now inserts that Figure (a lazy Figure block via make) into the relationships chapter when there are edges, in addition to the Mermaid text (useful for the MD/LLM output). Previously only the graph text was dumped; now the PDF/PPTX show the drawn diagram. New tests: the real Figure is built when there are edges and omitted when there are none.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
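The contract described in this message (empty input yields a placeholder, any failure yields an error result instead of an exception, hubs are highlighted, edges are labeled from_col->to_col plus cardinality) can be sketched without any drawing dependency. The sketch below is not the code of draw_join_graph_figure: the function name `join_graph_spec`, the edge keys (`from_table`, `to_table`, `from_col`, `to_col`, `cardinality`) and the `hub_min_degree` threshold are all assumptions made for illustration, and it returns a plain dict where the real function returns a matplotlib Figure.

```python
from collections import Counter

EMPTY_NOTE = "Sin relaciones FK detectadas"  # placeholder text from the commit message


def join_graph_spec(edges, hub_min_degree=3):
    """Prepare what a join-graph drawing needs; never raises.

    Hypothetical helper: edge dict keys and the hub threshold are assumptions,
    not the real signature of draw_join_graph_figure.
    """
    empty = {"nodes": [], "hubs": [], "labels": [], "note": EMPTY_NOTE}
    try:
        if not edges:
            return empty
        degree = Counter()
        labels = []
        for e in edges:
            src, dst = str(e["from_table"]), str(e["to_table"])
            degree[src] += 1
            degree[dst] += 1
            # Edge label: from_col->to_col, plus cardinality when known.
            label = f"{e['from_col']}->{e['to_col']}"
            if e.get("cardinality"):
                label += f" ({e['cardinality']})"
            labels.append((src, dst, label))
        nodes = sorted(degree)
        # A "hub" here is any table touched by at least hub_min_degree edges.
        hubs = [n for n in nodes if degree[n] >= hub_min_degree]
        return {"nodes": nodes, "hubs": hubs, "labels": labels, "note": None}
    except Exception as exc:  # degrade to an error result instead of raising
        return {"nodes": [], "hubs": [], "labels": [], "note": f"error: {exc}"}


edges = [
    {"from_table": "orders", "to_table": "customers",
     "from_col": "customer_id", "to_col": "id", "cardinality": "N:1"},
    {"from_table": "orders", "to_table": "products",
     "from_col": "product_id", "to_col": "id", "cardinality": "N:1"},
    {"from_table": "orders", "to_table": "stores",
     "from_col": "store_id", "to_col": "id"},
]
spec = join_graph_spec(edges)
```

A drawing layer could then place `spec["nodes"]` with a seeded layout and style `spec["hubs"]` differently, while callers only ever check `note` to distinguish the empty and error cases.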
feat(eda): EDA of a folder/multi-table database -> AutomaticEDA by chapters (PDF+PPTX+MD)
2026-06-30 20:57:52 +02:00 · 2026-06-30 20:34:10 +02:00 · 2026-06-30 19:53:57 +02:00 · 2026-06-30 19:45:09 +02:00 · 2026-06-30 19:26:33 +02:00 · 2026-06-30 19:24:22 +02:00
22 changed files with 2713 additions and 103 deletions
@@ -64,6 +64,7 @@ from .exploratory_caveats import exploratory_caveats
 from .render_eda_pdf import render_eda_pdf, render_eda_pdf_relational
 from .render_automatic_eda_pdf import render_automatic_eda_pdf
 from .render_automatic_eda_pptx import render_automatic_eda_pptx
+from .render_automatic_eda_markdown import render_automatic_eda_markdown
 from .detect_time_column import detect_time_column
 from .extract_timeseries_raw import extract_timeseries_raw
 from .build_eda_render_ctx import build_eda_render_ctx
@@ -71,8 +72,10 @@ from .profile_datetime import profile_datetime
 from .resample_timeseries import resample_timeseries
 from .add_pdf_internal_links import add_pdf_internal_links
 from .suggest_intratable_fk_candidates import suggest_intratable_fk_candidates
+from .draw_join_graph_figure import draw_join_graph_figure

 __all__ = [
+    "draw_join_graph_figure",
    "suggest_intratable_fk_candidates",
    "detect_time_column",
    "extract_timeseries_raw",
@@ -82,6 +85,7 @@ __all__ = [
    "resample_timeseries",
    "render_automatic_eda_pdf",
    "render_automatic_eda_pptx",
+    "render_automatic_eda_markdown",
    "decode_qr_image",
    "adf_kpss_stationarity",
    "acf_pacf",
@@ -36,6 +36,7 @@ from .model import (  # noqa: F401
 from .chapters_registry import CHAPTER_ORDER, build_chapter, build_document  # noqa: F401
 from .render_pdf_impl import render_pdf  # noqa: F401
 from .render_pptx_impl import render_pptx  # noqa: F401
+from .render_md_impl import render_md  # noqa: F401

 __all__ = [
    "ENGINE_NAME",
@@ -60,4 +61,5 @@ __all__ = [
    "build_document",
    "render_pdf",
    "render_pptx",
+    "render_md",
 ]
@@ -1,19 +1,25 @@
 """Categorical distributions chapter (CAT DISTR).

-Third reference chapter for AutomaticEDA. For every categorical column it shows,
-fulfilling the user's request:
+Third reference chapter for AutomaticEDA. Each categorical column gets **its own
+page (PDF) / slide (PPTX)**: every column is wrapped in a keep-together
+``model.Group`` with ``page_break_before=True`` (except the first, which may share
+the intro's page), so its chart sits next to its tables and no column is split.

-1. A short opening explanation of **Shannon entropy** (what it measures, its 0
-   and log2(k) bounds, the normalized 0–1 version) and the dataset row total used
-   as a comparison baseline.
-2. Per column, a cardinality key/value table: distinct values, ``% distinct``
-   (distinct / total rows), total dataset rows, singleton values (frequency 1),
-   entropy with its theoretical maximum and the normalized ratio, mode, imbalance
-   and string-length stats.
-3. A short note flagging problematic cardinality (id-like ≈100% distinct, or a
+A short intro names the clickable **[[term:entropia]]entropía[[/term]]** term —
+the full definition lives in the GLOSARIO chapter, so it is NOT repeated inline
+here (one click jumps to the glossary entry). The intro also carries the dataset
+row total used as a comparison baseline.
+
+Per column the Group contains, in order:
+
+1. A cardinality key/value table: distinct values, ``% distinct`` (distinct /
+   total rows), total dataset rows, singleton values (frequency 1), entropy with
+   its theoretical maximum and the normalized ratio, mode, imbalance and
+   string-length stats.
+2. A short note flagging problematic cardinality (id-like ≈100% distinct, or a
   single dominating category).
-4. A ``top-k`` table (value / count / %).
-5. A **donut pie chart** of the most common categories (top-k + an "Otros"
+3. A ``top-k`` table (value / count / %).
+4. A **donut pie chart** of the most common categories (top-k + an "Otros"
   bucket), drawn lazily so the renderers scale it to fit entirely.

 Data comes from the ``eda`` group: each ``columns[i]['categorical']`` is the
@@ -33,7 +39,7 @@ import math

 from .. import model

-CHAPTER_VERSION = "1.1.0"
+CHAPTER_VERSION = "1.2.0"
 CHAPTER_ID = "cat_distr"
 CHAPTER_TITLE = "Distribuciones categóricas"

@@ -53,11 +59,17 @@ _TERM_ENTROPIA_DEF = (
 # Cap the number of categorical columns rendered to keep the document bounded;
 # the rest are summarized in a closing note (no silent truncation).
 MAX_COLS = 40
-# Rows shown in each top-k table and explicit slices in the pie.
-TOP_TABLE_ROWS = 15
+# Rows shown in each top-k table and explicit slices in the pie. Kept moderate so
+# the whole column — cardinality table + top-k table + donut — fits on ONE
+# page/slide with the chart next to its tables; the table note still reports
+# "top N of M" so nothing is silently hidden. For id-like columns (≈100%
+# distinct) the top-k table is dropped entirely (it would be a list of unique
+# values — pure noise), which also frees the room the donut needs (see build).
+TOP_TABLE_ROWS = 8
 PIE_TOP_K = 6
-# Truncate very long category labels in tables (the renderer also wraps).
-LABEL_MAX = 48
+# Truncate very long category labels in tables (the renderer also wraps). Kept
+# tight so a column with long id-like values (names, tickets) still fits its page.
+LABEL_MAX = 28


 def _fmt_int(value) -> str:
@@ -267,45 +279,55 @@ def _normalize_card(card: dict) -> dict:


 def _cardinality_block(card: dict):
-    """KVTable with the cardinality / entropy metrics for one column."""
+    """KVTable with the cardinality / entropy metrics for one column.
+
+    Related metrics are grouped onto a single row each (distinct/%/unique;
+    entropy bits/max/normalized; length min/mean/max) so the whole column —
+    table + chart — fits one page/slide without dropping any datum; the short
+    16:9 PPTX slide does not fit one metric per row plus a chart otherwise."""
    n_singletons = card.get("n_singletons")
    if n_singletons is not None and card.get("n_singletons_partial"):
-        singletons = f"≥{_fmt_int(n_singletons)} (en top mostrado)"
+        singletons = f"≥{_fmt_int(n_singletons)}"
    elif n_singletons is not None:
        singletons = _fmt_int(n_singletons)
    else:
        singletons = "—"

-    entropy_ref = _fmt_num(card.get("entropy"))
-    emax = card.get("entropy_max")
-    if emax is not None:
-        entropy_ref = f"{entropy_ref} (máx {_fmt_num(emax)})"
+    # Distinct count · % distinct · unique (frequency 1) on one row.
+    distinct_combo = (f"{_fmt_int(card.get('n_distinct'))} · "
+                      f"{_fmt_pct_value(card.get('pct_distinct'))} · "
+                      f"{singletons} únicos")
+
+    # Entropy bits · theoretical max · normalized 0–1 on one row.
+    entropy_combo = (f"{_fmt_num(card.get('entropy'))} bits · "
+                     f"máx {_fmt_num(card.get('entropy_max'))} · "
+                     f"norm {_fmt_num(card.get('entropy_norm'))}")

    mode = card.get("mode")
    mode_pct = card.get("mode_pct")
-    mode_str = "—" if mode is None else model._safe_str(mode)
+    mode_str = "—" if mode is None else _truncate(mode, 32)
    if mode is not None and mode_pct is not None:
        mode_str = f"{mode_str} ({_fmt_pct_value(mode_pct)})"

    rows = [
-        ("Valores distintos", _fmt_int(card.get("n_distinct"))),
-        ("% distintos", _fmt_pct_value(card.get("pct_distinct"))),
+        ("Distintos · % · únicos", distinct_combo),
        ("Total filas (dataset)", _fmt_int(card.get("n_rows"))),
-        ("Valores únicos (frecuencia 1)", singletons),
-        ("Entropía (bits)", entropy_ref),
-        ("Entropía normalizada (0–1)", _fmt_num(card.get("entropy_norm"))),
+        ("Entropía (bits · máx · norm)", entropy_combo),
        ("Moda", mode_str),
    ]
    imbalance = card.get("imbalance")
-    if imbalance is not None:
-        rows.append(("Desbalance", _fmt_num(imbalance)))
    lm = card.get("len_min")
    lmean = card.get("len_mean")
    lmax = card.get("len_max")
+    # Imbalance and string length (both secondary) share one closing row.
+    extras = []
+    if imbalance is not None:
+        extras.append(f"desbalance {_fmt_num(imbalance)}")
    if any(v is not None for v in (lm, lmean, lmax)):
-        rows.append((
-            "Longitud (mín/media/máx)",
-            f"{_fmt_num(lm)} / {_fmt_num(lmean)} / {_fmt_num(lmax)}"))
+        extras.append(
+            f"long. {_fmt_num(lm)}/{_fmt_num(lmean)}/{_fmt_num(lmax)}")
+    if extras:
+        rows.append(("Desbalance · longitud", " · ".join(extras)))
    return model.KVTable(rows=rows, title="Cardinalidad")


@@ -315,7 +337,8 @@ def _flag_note(card: dict):
        return model.Note(
            "Casi todos los valores son distintos (≈100% distintos): la columna "
            "se comporta como un identificador y aporta poco para agrupar o "
-            "comparar categorías.")
+            "comparar categorías. No se lista el top de categorías (serían "
+            "valores casi todos únicos).")
    if card.get("dominated"):
        mp = card.get("mode_pct")
        mp_str = _fmt_pct_value(mp) if mp is not None else "muy alta"
@@ -335,7 +358,7 @@ def _topk_table(cat: dict):
        if not isinstance(t, dict):
            continue
        rows.append([
-            model._safe_str(t.get("value")),
+            _truncate(t.get("value")),
            _fmt_int(t.get("count")),
            _pct_from_maybe_fraction(t.get("pct")),
        ])
@@ -353,20 +376,16 @@ def _topk_table(cat: dict):
 def _intro_blocks(n_rows, mark_term: bool = False):
    total = _fmt_int(n_rows)
    # Mark the first appearance of the term as a clickable glossary jump when the
-    # term was registered (mark_term). The visible text is identical either way.
-    entropia = ("[[term:entropia]]**entropía de Shannon**[[/term]]" if mark_term
-                else "**entropía de Shannon**")
+    # term was registered (mark_term). The full definition of entropy lives in the
+    # GLOSARIO chapter, so the intro only names the clickable term here instead of
+    # repeating the long explanation (avoids the redundancy with the glossary).
+    entropia = ("[[term:entropia]]entropía[[/term]]" if mark_term
+                else "entropía")
    text = (
-        f"La {entropia} mide cómo de repartidos están los valores de "
-        "una columna categórica, en bits. Vale 0 cuando una sola categoría "
-        "concentra todas las filas (máxima previsibilidad) y alcanza su máximo, "
-        "log2(k) para k categorías distintas, cuando todas aparecen por igual "
-        "(máxima diversidad). La **entropía normalizada** (entropía dividida por "
-        "su máximo) la lleva al rango 0–1 para comparar columnas con distinto "
-        "número de categorías. Para cada columna se muestran los valores "
-        "distintos, el porcentaje que representan sobre el total de filas, los "
-        "valores únicos (que aparecen una sola vez), la tabla de las categorías "
-        "más frecuentes y un gráfico de tarta (donut) de las más comunes."
+        f"Cada columna categórica ocupa su propia página: sus métricas de "
+        f"cardinalidad —incluida la {entropia}—, una nota que señala cardinalidad "
+        "problemática, la tabla de las categorías más frecuentes y un gráfico de "
+        "tarta (donut) de las más comunes, todo junto."
    )
    if n_rows is not None:
        text += f" El dataset tiene {total} filas en total como referencia."
@@ -398,24 +417,37 @@ def build_cat_distr(profile: dict, ctx: dict):
    blocks = list(_intro_blocks(n_rows, mark_term=mark_term))

    rendered = cat_cols[:MAX_COLS]
-    for col in rendered:
+    for idx, col in enumerate(rendered):
        name = col.get("name") or "(columna)"
        cat = col.get("categorical") or {}
        card = _normalize_card(_cardinality(cat, n_rows))

-        blocks.append(model.Heading(text=str(name), level=2))
-        blocks.append(_cardinality_block(card))
+        # One Group per categorical column: heading + cardinality table + flag
+        # note + top-k table + donut figure are kept together and the renderer
+        # starts each on a fresh page/slide (page_break_before) so every column
+        # gets its own page with its chart next to its tables. The first column
+        # may share the intro's page (no forced break) to avoid a near-empty page.
+        col_blocks = [
+            model.Heading(text=str(name), level=2),
+            _cardinality_block(card),
+        ]
        note = _flag_note(card)
        if note is not None:
-            blocks.append(note)
-        topk = _topk_table(cat)
-        if topk is not None:
-            blocks.append(topk)
-        blocks.append(model.Figure(
+            col_blocks.append(note)
+        # For id-like columns (≈100% distinct) the top-k is a list of unique
+        # values — pure noise; skip it (the flag note already explains why) and
+        # let the donut take that room so the whole column fits one page/slide.
+        if not card.get("id_like"):
+            topk = _topk_table(cat)
+            if topk is not None:
+                col_blocks.append(topk)
+        col_blocks.append(model.Figure(
            make=_pie_make(cat.get("top") or [], card.get("n_distinct"),
                           str(name), n_rows),
            caption=(f"Categorías más comunes de «{_truncate(name, 32)}» "
                     "(donut: top-k + «Otros»)")))
+        blocks.append(model.Group(blocks=col_blocks,
+                                  page_break_before=(idx > 0)))

    if len(cat_cols) > len(rendered):
        omitted = len(cat_cols) - len(rendered)
@@ -2,11 +2,14 @@

 Self-contained: builds synthetic TableProfiles (no DuckDB) so the suite is fast
 and deterministic. Verifies that ``build_cat_distr`` emits the blocks the user
-asked for (entropy intro, distinct/total/%-distinct/unique metrics, top-k table
-and a donut figure), that the chapter renders inside the full document to both
-PDF and PPTX showing that content, that a profile with no categorical columns
-yields ``None`` without raising, and that long labels / many columns are never
-cut in either output.
+asked for (distinct/total/%-distinct/unique metrics, top-k table and a donut
+figure), that EACH categorical column is wrapped in its own keep-together
+``Group`` that starts on a fresh page/slide (one column per page, chart next to
+its tables), that the long entropy explanation is NOT repeated inline (it lives
+in the glossary — only the clickable term is kept), that the chapter renders
+inside the full document to both PDF and PPTX showing that content, that a
+profile with no categorical columns yields ``None`` without raising, and that
+long labels / many columns are never cut in either output.
 """

 import os
@@ -17,7 +20,8 @@ from pypdf import PdfReader
 from pptx import Presentation

 from datascience.automatic_eda.model import (
-    DataTable, Figure, Heading, KVTable, Note,
+    DataTable, Figure, GlossaryCollector, Group, Heading, KVTable, Markdown,
+    Note,
 )
 from datascience.automatic_eda.chapters.cat_distr import (
    CHAPTER_ID, CHAPTER_VERSION, build_cat_distr,
@@ -81,8 +85,20 @@ def _pptx_text(path: str) -> str:
    return re.sub(r"\s+", " ", " ".join(parts))


-def _kinds(chapter):
-    return [b.kind for b in chapter.blocks]
+def _flatten(blocks):
+    """Expand keep-together Groups so the per-column heading/table/figure are
+    inspectable as a flat block list (the chapter wraps each column in a Group)."""
+    out = []
+    for b in blocks:
+        if getattr(b, "kind", "") == "group":
+            out.extend(_flatten(getattr(b, "blocks", []) or []))
+        else:
+            out.append(b)
+    return out
+
+
+def _column_groups(chapter):
+    return [b for b in chapter.blocks if isinstance(b, Group)]


 def test_golden_build_cat_distr_emite_bloques_pedidos():
@@ -90,36 +106,101 @@ def test_golden_build_cat_distr_emite_bloques_pedidos():
    assert ch is not None
    assert ch.id == CHAPTER_ID
    assert ch.version == CHAPTER_VERSION
-    kinds = _kinds(ch)
-    # Entropy intro present.
+
+    # Entropy intro present, but the long explanation is gone (it lives in the
+    # glossary now): only the term is named, no log2/normalizada walkthrough.
    headings = [b.text for b in ch.blocks if isinstance(b, Heading)]
    assert any("Entrop" in h for h in headings)
-    md = next(b for b in ch.blocks if b.kind == "markdown")
-    assert "entropía" in md.text.lower() and "log2" in md.text
-    # Cardinality metrics: distinct, total rows, %-distinct, unique values.
-    kv = next(b for b in ch.blocks if isinstance(b, KVTable))
+    md = next(b for b in ch.blocks if isinstance(b, Markdown))
+    assert "entropía" in md.text.lower()
+    assert "log2" not in md.text          # redundant explanation removed.
+    assert "máxima diversidad" not in md.text
+
+    # Per-column blocks are wrapped in keep-together Groups: flatten to inspect.
+    flat = _flatten(ch.blocks)
+    kv = next(b for b in flat if isinstance(b, KVTable))
    labels = [r[0] for r in kv.rows]
-    assert "Valores distintos" in labels
-    assert "% distintos" in labels
+    values = " ".join(str(r[1]) for r in kv.rows)
+    # Cardinality metrics: distinct count, %-distinct, unique values and total
+    # rows are present (grouped onto compact rows so the chart fits the page).
+    assert "Distintos · % · únicos" in labels
    assert "Total filas (dataset)" in labels
-    assert "Valores únicos (frecuencia 1)" in labels
    assert any("Entropía" in lbl for lbl in labels)
+    assert "únicos" in values and "%" in values
+    assert "bits" in values and "norm" in values   # entropy + max + normalized.
    # Top-k table + pie figure.
-    dt = next(b for b in ch.blocks if isinstance(b, DataTable))
+    dt = next(b for b in flat if isinstance(b, DataTable))
    assert dt.header == ["Valor", "Conteo", "%"]
    assert any("neumaticos" in str(cell) for row in dt.rows for cell in row)
-    assert any(isinstance(b, Figure) for b in ch.blocks)
-    # id-like column flagged with a Note.
-    assert any(isinstance(b, Note) and "identificador" in b.text
-               for b in ch.blocks)
+    assert any(isinstance(b, Figure) for b in flat)
+    # id-like column flagged with a Note that also explains the top-k is dropped.
+    idnote = next((b for b in flat
+                   if isinstance(b, Note) and "identificador" in b.text), None)
+    assert idnote is not None
+    assert "No se lista el top" in idnote.text


-def test_golden_render_pdf_muestra_categoricas():
+def test_golden_idlike_omite_topk_y_conserva_donut():
+    # The id-like column (uuid, 100% distinct) must NOT carry a top-k DataTable
+    # (it would be a list of unique values), but must still keep its donut Figure
+    # and its cardinality table so it stays a full per-column page.
+    ch = build_cat_distr(_profile(), {})
+    groups = _column_groups(ch)
+    uuid_group = next(g for g in groups
+                      if any(getattr(b, "text", "") == "uuid" for b in g.blocks))
+    kinds = [b.kind for b in uuid_group.blocks]
+    assert "data_table" not in kinds      # top-k of unique values dropped.
+    assert "kv_table" in kinds            # cardinality kept.
+    assert "figure" in kinds              # donut kept (chart per column).
+    # A non-id-like column keeps its top-k table.
+    cat_group = next(g for g in groups
+                     if any(getattr(b, "text", "") == "categoria"
+                            for b in g.blocks))
+    assert "data_table" in [b.kind for b in cat_group.blocks]
+
+
+def test_golden_una_pagina_por_columna_groups():
+    ch = build_cat_distr(_profile(), {})
+    groups = _column_groups(ch)
+    # Two categorical columns -> two column Groups (numeric column excluded).
+    assert len(groups) == 2
+    # Each Group carries one column: a heading + its cardinality table + figure.
+    for g in groups:
+        kinds = [b.kind for b in g.blocks]
+        assert kinds[0] == "heading"
+        assert "kv_table" in kinds
+        assert "figure" in kinds
+    # The first column may share the intro page (no forced break); every later
+    # column starts on a fresh page/slide so each column gets its own page.
+    assert groups[0].page_break_before is False
+    assert all(g.page_break_before is True for g in groups[1:])
+
+
+def test_golden_entropia_clicable_y_definicion_en_glosario():
+    # With a glossary collector the intro marks the clickable term and the FULL
+    # definition (the long explanation removed from the intro) lands in the
+    # glossary, not inline — no data lost, just relocated.
+    gc = GlossaryCollector()
+    ch = build_cat_distr(_profile(), {"glossary": gc})
+    md = next(b for b in ch.blocks if isinstance(b, Markdown))
+    assert "[[term:entropia]]entropía[[/term]]" in md.text
+    assert gc.has("entropia")
+    entry = gc.get("entropia")
+    assert entry is not None
+    # The definition kept in the glossary still carries the detail removed inline.
+    assert "log2" in entry["definition"]
+    assert "normalizada" in entry["definition"].lower()
+
+
+def test_golden_render_pdf_una_pagina_por_columna():
    with tempfile.TemporaryDirectory() as d:
        out = os.path.join(d, "eda.pdf")
        res = render_automatic_eda_pdf(_profile(), out, {"title": "EDA"})
        assert res["path"] == out and os.path.exists(out)
-        assert CHAPTER_ID in [c["id"] for c in res["chapters"]]
+        cat_meta = next(c for c in res["chapters"] if c["id"] == CHAPTER_ID)
+        # Two categorical columns, each on its own page -> >= 2 pages for the
+        # chapter (intro shares the first column's page).
+        assert cat_meta["n_pages"] >= 2
        txt = _pdf_text(out)
        assert "Entrop" in txt
        assert "distintos" in txt
@@ -133,13 +214,91 @@ def test_golden_render_pptx_muestra_categoricas():
        out = os.path.join(d, "eda.pptx")
        res = render_automatic_eda_pptx(_profile(), out, {"title": "EDA"})
        assert res["path"] == out and os.path.exists(out)
-        assert CHAPTER_ID in [c["id"] for c in res["chapters"]]
+        cat_meta = next(c for c in res["chapters"] if c["id"] == CHAPTER_ID)
+        assert cat_meta["n_slides"] >= 2  # one slide per categorical column.
        txt = _pptx_text(out)
        assert "Entrop" in txt
        assert "categoria" in txt and "neumaticos" in txt
        assert "distintos" in txt


+def _profile_high_card() -> dict:
+    """Profile with a high-cardinality NON-id-like categorical column whose top-k
+    of long values would split from its donut on a short 16:9 slide unless the
+    renderer trims the table — the exact case the adversarial check flagged
+    (Ticket / Cabin)."""
+    long_vals = [f"Valor largo de categoria numero {i:02d} con texto extra"
+                 for i in range(40)]
+    top = [{"value": v, "count": 60 - i, "pct": (60 - i) / 5000.0}
+           for i, v in enumerate(long_vals)]
+    return {
+        "table": "t", "source": "t.csv", "n_rows": 5000, "n_cols": 3,
+        "quality_score": 80.0,
+        "columns": [
+            {"name": "precio", "inferred_type": "numeric", "null_pct": 0.0,
+             "numeric": {"mean": 1.0, "median": 1.0, "min": 0.0, "max": 2.0,
+                         "std": 0.5}},
+            # 40 distinct over 5000 rows = 0.8% distinct -> NOT id-like, keeps
+            # its (long) top-k table; the tall table must not push the donut off.
+            {"name": "alta_card_col", "inferred_type": "categorical",
+             "null_pct": 0.0, "distinct_count": 40,
+             "categorical": {"top": top, "mode": long_vals[0], "n_distinct": 40,
+                             "entropy": 5.2, "imbalance": 1.2, "len_min": 40,
+                             "len_mean": 45, "len_max": 50}},
+            {"name": "baja_card_col", "inferred_type": "categorical",
+             "null_pct": 0.0, "distinct_count": 4,
+             "categorical": {
+                 "top": [{"value": "norte", "count": 2000, "pct": 0.4},
+                         {"value": "sur", "count": 1500, "pct": 0.3},
+                         {"value": "este", "count": 1000, "pct": 0.2},
+                         {"value": "oeste", "count": 500, "pct": 0.1}],
+                 "mode": "norte", "n_distinct": 4, "entropy": 1.8}},
+        ],
+    }
+
+
+def test_golden_pptx_una_slide_por_columna_con_su_grafico():
+    """Each categorical column occupies EXACTLY ONE cat_distr slide that carries
+    BOTH its cardinality table and its donut figure (picture) — i.e. the chart is
+    never separated from its table, even for a high-cardinality column."""
+    from pptx.enum.shapes import MSO_SHAPE_TYPE
+
+    prof = _profile_high_card()
+    cat_names = ["alta_card_col", "baja_card_col"]
+    with tempfile.TemporaryDirectory() as d:
+        out = os.path.join(d, "eda.pptx")
+        res = render_automatic_eda_pptx(prof, out, {"title": "EDA"})
+        assert res["path"] == out and os.path.exists(out)
+        prs = Presentation(out)
+
+        # Per column: the cat_distr slides whose text mentions it, and whether the
+        # owning slide also has the donut caption + an actual picture shape.
+        slides_with_col = {n: [] for n in cat_names}
+        owner_has_chart = {n: False for n in cat_names}
+        for i, sl in enumerate(prs.slides):
+            texts, has_pic = [], False
+            for sh in sl.shapes:
+                if sh.has_text_frame:
+                    texts.append(sh.text_frame.text)
+                if sh.shape_type == MSO_SHAPE_TYPE.PICTURE:
+                    has_pic = True
+            txt = re.sub(r"\s+", " ", " ".join(texts))
+            if "Distribuciones categ" not in txt:   # footer stamp of the chapter.
+                continue
+            for n in cat_names:
+                if n in txt:
+                    slides_with_col[n].append(i)
+                    has_table = "Cardinalidad" in txt or "distintos" in txt
+                    if has_pic and "donut" in txt and has_table:
+                        owner_has_chart[n] = True
+
+        for n in cat_names:
+            # Exactly one slide carries the column (not split across slides).
+            assert len(slides_with_col[n]) == 1, (n, slides_with_col[n])
+            # That single slide also holds its table AND its donut picture.
+            assert owner_has_chart[n], (n, "tabla y donut no están en el mismo slide")
+
+
 def test_edge_sin_categoricas_devuelve_none():
    only_numeric = {
        "n_rows": 10, "columns": [
@@ -170,11 +329,15 @@ def test_anti_corte_label_largo_y_muchas_columnas():

    ch = build_cat_distr(profile, {})
    assert ch is not None
+    # One Group per column, each forcing its own page (except the first).
+    groups = _column_groups(ch)
+    assert len(groups) == 30
+    assert sum(1 for g in groups if g.page_break_before) == 29
    with tempfile.TemporaryDirectory() as d:
        pdf = os.path.join(d, "anti.pdf")
        res = render_automatic_eda_pdf(profile, pdf, {"write_manifest": False})
        assert res["path"] == pdf
-        assert res["n_pages"] > 1       # many columns spilled across pages, OK.
+        assert res["n_pages"] > 1       # one page per column, OK.
        txt = _pdf_text(pdf)
        # Long label wrapped (not truncated): every word survives.
        for word in ("Lorem", "incididunt", "reprehenderit", "voluptate"):
@@ -139,10 +139,17 @@ class Group:
    it starts on a fresh page and flows (honest degradation, never cut). Use it to
    bind ``Heading`` + ``Markdown`` + ``Figure`` of one idea together (see the
    DISTR NUM / AGREGACION chapters).
+
+    When ``page_break_before`` is True the renderer additionally forces the group
+    to *start* on a fresh page/slide (unless the current one is already empty), so
+    a chapter can give each unit its own page — e.g. one categorical column per
+    page (see CAT DISTR). It is purely additive: the default False keeps the plain
+    keep-together behaviour for every existing chapter.
    """

    blocks: list = field(default_factory=list)
    title: Optional[str] = None
+    page_break_before: bool = False
    kind: str = field(default="group", init=False)


@@ -228,7 +235,9 @@ def as_block(obj: Any):
                return Note(text=_safe_str(obj.get("text")))
            if cls is Group:
                return Group(blocks=as_blocks(obj.get("blocks")),
-                             title=obj.get("title"))
+                             title=obj.get("title"),
+                             page_break_before=bool(
+                                 obj.get("page_break_before", False)))
            if cls is GlossaryEntry:
                return GlossaryEntry(key=_safe_str(obj.get("key")),
                                     label=_safe_str(obj.get("label")),
@@ -0,0 +1,458 @@
+"""AutomaticEDA Markdown serializer — one self-contained file to paste to an LLM.
+
+Same document model as the PDF/PPTX renderers (an ordered list of
+:class:`Chapter`, each a list of format-independent blocks) but emitted as plain
+**Markdown** instead of a binary. The goal is different from the other two
+renderers: a Markdown EDA is meant to be *pasted into an LLM*, so it prioritises
+TEXT and DATA over visuals. Tables become Markdown tables (every row dumped, no
+pagination — nothing is cut because there are no pages); a ``Figure`` becomes its
+caption plus, when possible, the underlying bar/histogram data as a Markdown
+table (an LLM cannot see the image); glossary term markers are stripped while
+``**bold**`` is kept (it is valid Markdown).
+
+dict-no-throw (the ``eda`` group style): :func:`render_md` never raises. On a
+fatal error it returns ``{path: None, ...}`` with a ``note`` explaining why; a
+malformed block degrades to a readable note rather than crashing the document.
+"""
+
+from __future__ import annotations
+
+import os
+import re
+
+from . import model
+
+# Glossary span markers (kept text, dropped markers). We intentionally do NOT use
+# ``text_layout.strip_inline_md`` for Markdown blocks because that also removes
+# ``**bold**`` — valid Markdown we want to preserve when pasting to an LLM.
+_TERM_OPEN_RE = re.compile(r"\[\[term:[A-Za-z0-9_]+\]\]")
+_MAX_BAR_ROWS = 100
+
+
+# --------------------------------------------------------------------------- #
+# Small helpers.
+# --------------------------------------------------------------------------- #
+def _clean_terms(s) -> str:
+    """Drop glossary term markers, keeping the visible text (and any **bold**)."""
+    s = model._safe_str(s)
+    s = _TERM_OPEN_RE.sub("", s)
+    return s.replace("[[/term]]", "")
+
+
+def _cell(v) -> str:
+    """Render a value as a safe Markdown table cell.
+
+    Escapes pipes (``|`` -> ``\\|``) so they do not break the column layout and
+    folds newlines to ``<br>`` so a multi-line value stays inside one cell. None
+    becomes an empty string.
+    """
+    s = model._safe_str(v)
+    s = s.replace("|", "\\|")
+    s = s.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "<br>")
+    return s
+
+
+def _slug(text: str) -> str:
+    """GitHub-like anchor: lowercase, spaces->'-', drop symbols, collapse '--'."""
+    s = model._safe_str(text).strip().lower()
+    out = []
+    for ch in s:
+        if ch.isalnum():
+            out.append(ch)
+        elif ch in " -":
+            out.append("-")
+        # any other symbol is dropped.
+    slug = "".join(out)
+    while "--" in slug:
+        slug = slug.replace("--", "-")
+    return slug.strip("-")
+
+
+def _fmt_num(v) -> str:
+    """Compact number for the figure data tables (ints as ints, else 4 sig figs)."""
+    try:
+        f = float(v)
+    except Exception:  # noqa: BLE001
+        return model._safe_str(v)
+    if f != f:  # NaN
+        return "NaN"
+    if abs(f) < 1e15 and f == int(f):  # magnitude check first: int(inf) raises.
+        return str(int(f))
+    return f"{f:.4g}"
+
+
+def _fmt_int(v) -> str:
+    try:
+        return str(int(v))
+    except Exception:  # noqa: BLE001
+        return model._safe_str(v)
+
+
+def _now_iso() -> str:
+    from datetime import datetime, timezone
+    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
+
+
+# --------------------------------------------------------------------------- #
+# Document header (title + metadata blockquote + numbered index).
+# --------------------------------------------------------------------------- #
+def _meta_block(meta: dict) -> list:
+    """Build the metadata lines for the header blockquote (omitting absentees)."""
+    ctx = meta.get("ctx") if isinstance(meta.get("ctx"), dict) else {}
+    lines: list = []
+
+    def add(label, value) -> None:
+        if value is None:
+            return
+        s = model._safe_str(value).strip()
+        if s and s.lower() != "none":
+            lines.append(f"**{label}:** {s}")
+
+    add("Dataset", ctx.get("dataset_name") or meta.get("dataset_name"))
+    add("Fuente", ctx.get("source_origin") or meta.get("source_origin"))
+    add("Almacenamiento", ctx.get("storage") or meta.get("storage"))
+    n_rows = ctx.get("n_rows", meta.get("n_rows"))
+    n_cols = ctx.get("n_cols", meta.get("n_cols"))
+    if n_rows is not None and n_cols is not None:
+        lines.append(
+            f"**Dimensiones:** {_fmt_int(n_rows)} filas × {_fmt_int(n_cols)} columnas")
+    add("Generado", meta.get("generated_at") or _now_iso())
+    lines.append(f"**Motor:** {model.ENGINE_NAME} v{model.ENGINE_VERSION}")
+    return lines
+
+
+# --------------------------------------------------------------------------- #
+# Per-block serializers. Each returns a Markdown string (no surrounding blanks;
+# the caller separates blocks with a blank line).
+# --------------------------------------------------------------------------- #
+def _md_heading(block) -> str:
+    level = int(getattr(block, "level", 1) or 1)
+    hashes = "#" * min(level + 2, 6)  # level1 -> ###; '#'/'##' reserved for doc/chapter.
+    text = _clean_terms(getattr(block, "text", "")).strip()
+    return f"{hashes} {text}"
+
+
+def _md_markdown(block) -> str:
+    # Keep the text verbatim, dropping only glossary markers (keep **bold**).
+    return _clean_terms(getattr(block, "text", "")).rstrip("\n")
+
+
+def _md_kv_table(block) -> str:
+    lines: list = []
+    title = getattr(block, "title", None)
+    if title:
+        lines.append(f"**{_clean_terms(title).strip()}**")
+        lines.append("")
+    lines.append("| Campo | Valor |")
+    lines.append("| --- | --- |")
+    for row in (getattr(block, "rows", []) or []):
+        try:
+            label, value = row[0], row[1]
+        except Exception:  # noqa: BLE001
+            label, value = row, ""
+        lines.append(f"| {_cell(label)} | {_cell(value)} |")
+    return "\n".join(lines)
+
+
+def _md_data_table(block) -> str:
+    lines: list = []
+    title = getattr(block, "title", None)
+    if title:
+        lines.append(f"**{_clean_terms(title).strip()}**")
+        lines.append("")
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    if not header:
+        ncol = max((len(r) for r in rows), default=1)
+        header = [f"col{i + 1}" for i in range(ncol)]
+    ncol = len(header)
+    lines.append("| " + " | ".join(_cell(h) for h in header) + " |")
+    lines.append("| " + " | ".join(["---"] * ncol) + " |")
+    for r in rows:  # dump every row — no pagination, nothing cut.
+        cells = [_cell(r[c]) if c < len(r) else "" for c in range(ncol)]
+        lines.append("| " + " | ".join(cells) + " |")
+    note = getattr(block, "note", None)
+    if note:
+        lines.append("")
+        lines.append(f"*{_clean_terms(note).strip()}*")
+    return "\n".join(lines)
+
+
+def _bars_table(bars: list) -> str:
+    """Render extracted bar/histogram data as a Markdown table (Desde/Hasta/Frec)."""
+    lines = ["| Desde | Hasta | Frecuencia |", "| --- | --- | --- |"]
+    shown = bars[:_MAX_BAR_ROWS]
+    for x0, x1, h in shown:
+        lines.append(f"| {_fmt_num(x0)} | {_fmt_num(x1)} | {_fmt_num(h)} |")
+    out = "\n".join(lines)
+    extra = len(bars) - len(shown)
+    if extra > 0:
+        out += f"\n\n*… ({extra} filas más)*"
+    return out
+
+
+def _extract_bars(fig) -> list:
+    """Collect (x_from, x_to, height) of the rectangular bars of a matplotlib fig.
+
+    Histogram / bar-chart bars are ``matplotlib.patches.Rectangle`` with positive
+    width and height; spines, legends and zero-area artists are skipped. Never
+    raises — returns ``[]`` on any problem.
+    """
+    bars: list = []
+    try:
+        for ax in fig.get_axes():
+            # Collect this axes' positive-area rectangles, then keep only the ones
+            # that look like actual histogram/bar bins. Reference shapes that
+            # matplotlib also stores in ``ax.patches`` — most notably the ``±1σ``
+            # band drawn by ``axvspan`` (a single rectangle far wider than a bin)
+            # and a lone Tukey boxplot box — would otherwise show up as fake
+            # "bins". A histogram axes has several near-equal-width bars, so we
+            # drop any rectangle whose width is more than twice the median width
+            # of that axes' rectangles (the σ-band spans many bins; uniform bins
+            # all sit at the median width and stay).
+            ax_bars: list = []
+            for patch in list(getattr(ax, "patches", []) or []):
+                try:
+                    w = patch.get_width()
+                    h = patch.get_height()
+                    x = patch.get_x()
+                except Exception:  # noqa: BLE001 — not a Rectangle-like patch.
+                    continue
+                if w and w > 0 and h and h > 0:
+                    ax_bars.append((x, x + w, h))
+            if len(ax_bars) >= 3:
+                widths = sorted(b[1] - b[0] for b in ax_bars)
+                median_w = widths[len(widths) // 2]
+                if median_w > 0:
+                    ax_bars = [b for b in ax_bars
+                               if (b[1] - b[0]) <= 2.0 * median_w]
+            bars.extend(ax_bars)
+    except Exception:  # noqa: BLE001
+        return []
+    return bars
+
+
+def _md_figure(block, meta: dict, out_path: str, counter: list) -> str:
+    """Serialize a Figure prioritising TEXT + DATA (an LLM cannot see the image).
+
+    Emits the caption, then — if the matplotlib figure has bars — a Markdown table
+    of the underlying (Desde, Hasta, Frecuencia) values. Optionally (when
+    ``meta['embed_figures']`` is True) also exports a PNG beside the .md and adds
+    an image link; off by default so the Markdown stays self-contained.
+    """
+    caption = model._safe_str(getattr(block, "caption", "")).strip()
+    parts = [f"*Figura: {caption}*" if caption else "*Figura*"]
+    fig = None
+    try:
+        import matplotlib
+        matplotlib.use("Agg")  # defensive: headless rasterization backend.
+        fig = getattr(block, "fig", None)
+        make = getattr(block, "make", None)
+        if fig is None and callable(make):
+            fig = make()
+        if fig is not None:
+            bars = _extract_bars(fig)
+            if bars:
+                parts.append(_bars_table(bars))
+            if meta.get("embed_figures"):
+                png = _embed_png(fig, out_path, counter)
+                if png:
+                    parts.append(f"![{caption}]({png})")
+    except Exception:  # noqa: BLE001 — a bad figure degrades to just its caption.
+        pass
+    finally:
+        if fig is not None:
+            try:
+                import matplotlib.pyplot as plt
+                plt.close(fig)
+            except Exception:  # noqa: BLE001
+                pass
+    return "\n\n".join(parts)
+
+
+def _embed_png(fig, out_path: str, counter: list) -> str:
+    """Export the figure to ``<basename>_figN.png`` beside the .md; return its name."""
+    try:
+        counter[0] += 1
+        base = os.path.splitext(os.path.basename(out_path))[0] or "figura"
+        name = f"{base}_fig{counter[0]}.png"
+        path = os.path.join(os.path.dirname(os.path.abspath(out_path)), name)
+        fig.savefig(path, format="png", dpi=120, bbox_inches="tight")
+        return name
+    except Exception:  # noqa: BLE001
+        return ""
+
+
+def _md_image(block) -> str:
+    path = model._safe_str(getattr(block, "path", ""))
+    caption = model._safe_str(getattr(block, "caption", "")).strip()
+    out = f"![{caption}]({path})"
+    if caption:
+        out += f"\n\n*{caption}*"
+    return out
+
+
+def _md_caption(block) -> str:
+    return f"*{_clean_terms(getattr(block, 'text', '')).strip()}*"
+
+
+def _md_note(block) -> str:
+    text = _clean_terms(getattr(block, "text", "")).strip()
+    lines = text.split("\n")
+    return "\n".join((f"> {ln}" if ln.strip() else ">") for ln in lines)
+
+
+def _md_group(block, meta: dict, out_path: str, counter: list) -> str:
+    parts: list = []
+    title = getattr(block, "title", None)
+    if title:
+        parts.append(f"### {_clean_terms(title).strip()}")
+    for b in (getattr(block, "blocks", []) or []):
+        try:
+            seg = _serialize_block(b, meta, out_path, counter)
+        except Exception:  # noqa: BLE001
+            seg = ""
+        if seg:
+            parts.append(seg)
+    return "\n\n".join(parts)
+
+
+def _md_glossary_entry(block) -> str:
+    label = (model._safe_str(getattr(block, "label", "")).strip()
+             or model._safe_str(getattr(block, "key", "")).strip())
+    definition = _clean_terms(getattr(block, "definition", "")).strip()
+    out = f"### {label}"
+    if definition:
+        out += f"\n\n{definition}"
+    return out
+
+
+def _serialize_block(block, meta: dict, out_path: str, counter: list) -> str:
+    """Dispatch a single block to its Markdown serializer. Unknown -> note."""
+    kind = getattr(block, "kind", "")
+    if kind == "heading":
+        return _md_heading(block)
+    if kind == "markdown":
+        return _md_markdown(block)
+    if kind == "kv_table":
+        return _md_kv_table(block)
+    if kind == "data_table":
+        return _md_data_table(block)
+    if kind == "figure":
+        return _md_figure(block, meta, out_path, counter)
+    if kind == "image":
+        return _md_image(block)
+    if kind == "caption":
+        return _md_caption(block)
+    if kind == "note":
+        return _md_note(block)
+    if kind == "group":
+        return _md_group(block, meta, out_path, counter)
+    if kind == "glossary_entry":
+        return _md_glossary_entry(block)
+    # Unknown content -> readable note (mirrors the model's defensive coercion).
+    return _md_note(model.Note(text=model._safe_str(block)))
+
+
+# --------------------------------------------------------------------------- #
+# Entry point.
+# --------------------------------------------------------------------------- #
+def render_md(chapters: list, out_path: str, meta: dict | None = None) -> dict:
+    """Serialize a list of Chapters into a single self-contained Markdown file.
+
+    The output leads with ``# <title>``, a metadata blockquote and a numbered
+    ``## Índice`` linking each chapter, then one ``## N. <title>`` section per
+    chapter with its blocks. Tables become Markdown tables (every row dumped),
+    figures become caption + underlying data table, glossary markers are stripped
+    while ``**bold**`` is kept. Designed to be pasted into an LLM.
+
+    Args:
+        chapters: a list of ``Chapter`` (dataclasses or dicts); normalized
+            defensively with ``model.as_chapters``.
+        out_path: filesystem path for the ``.md`` (parent dirs are created).
+        meta: optional dict. Recognised keys: ``title``, ``ctx`` (dict with
+            ``dataset_name``/``source_origin``/``storage``/``n_rows``/``n_cols``),
+            ``generated_at``, ``embed_figures`` (export PNGs beside the .md,
+            default False).
+
+    Returns:
+        dict (never raises): ``{path: str|None, n_chars: int,
+        chapters: list[{id, version}], note: str}``. On a fatal error ``path`` is
+        None and ``note`` explains why.
+    """
+    meta = meta or {}
+    chapters = model.as_chapters(chapters)
+    title = model._safe_str(meta.get("title")) or model.ENGINE_NAME
+
+    # Edge: nothing to render -> a minimal but valid Markdown document.
+    if not chapters:
+        content = (f"# {title}\n\n"
+                   "*(documento vacío — sin capítulos aplicables)*\n")
+        return _write(out_path, content, [], "documento vacío")
+
+    counter = [0]  # document-wide figure counter for unique PNG names.
+    notes: list = []
+    segments: list = [f"# {title}"]
+
+    meta_lines = _meta_block(meta)
+    if meta_lines:
+        segments.append("\n".join(f"> {ln}" for ln in meta_lines))
+
+    # Numbered index. The anchor matches the chapter heading emitted below
+    # (``## N. <title>``) in GitHub slug style.
+    chap_heads = []
+    idx_lines = ["## Índice"]
+    for i, ch in enumerate(chapters, 1):
+        head_text = f"{i}. {model._safe_str(ch.title)}"
+        anchor = _slug(head_text)
+        chap_heads.append((head_text, anchor))
+        idx_lines.append(f"{i}. [{model._safe_str(ch.title)}](#{anchor})")
+    segments.append("\n".join(idx_lines))
+
+    chapters_meta = []
+    for i, ch in enumerate(chapters, 1):
+        segments.append("---")
+        head_text, _anchor = chap_heads[i - 1]
+        segments.append(f"## {head_text}")
+
+        blocks = list(ch.blocks or [])
+        # Omit a leading level-1 Heading that just repeats the chapter title.
+        if blocks:
+            b0 = blocks[0]
+            if (getattr(b0, "kind", "") == "heading"
+                    and int(getattr(b0, "level", 1) or 1) == 1
+                    and _clean_terms(getattr(b0, "text", "")).strip()
+                    == model._safe_str(ch.title).strip()):
+                blocks = blocks[1:]
+
+        for block in blocks:
+            try:
+                seg = _serialize_block(block, meta, out_path, counter)
+            except Exception as e:  # noqa: BLE001
+                seg = _md_note(model.Note(text=model._safe_str(block)))
+                notes.append(
+                    f"bloque '{getattr(block, 'kind', '?')}' del capítulo "
+                    f"'{ch.id}' degradado: {e}")
+            if seg:
+                segments.append(seg)
+        chapters_meta.append({"id": ch.id, "version": ch.version})
+
+    content = "\n\n".join(segments) + "\n"
+    note = f"{len(content)} caracteres"
+    if notes:
+        note += " · " + "; ".join(notes)
+    return _write(out_path, content, chapters_meta, note)
+
+
+def _write(out_path: str, content: str, chapters_meta: list, note: str) -> dict:
+    """Write the Markdown to disk (creating parents). dict-no-throw."""
+    try:
+        parent = os.path.dirname(os.path.abspath(out_path))
+        os.makedirs(parent, exist_ok=True)
+        with open(out_path, "w", encoding="utf-8") as fh:
+            fh.write(content)
+    except Exception as e:  # noqa: BLE001 — never raise from the writer.
+        return {"path": None, "n_chars": 0, "chapters": [],
+                "note": f"no se pudo escribir el Markdown: {e}"}
+    return {"path": out_path, "n_chars": len(content),
+            "chapters": chapters_meta, "note": note}
@@ -675,6 +675,61 @@ def _measure_figure_like(block) -> float:
    return target_h + 0.04 + cap_h + _GAP


+def _measure_kv_table(block) -> float:
+    """Faithful height of a KVTable — matches ``_place_kv_table``.
+
+    Counts the optional title heading and, per row, the wrapped VALUE column
+    (the label column never wraps in the placer). The previous estimate assumed
+    one line per row and ignored the title, so a column's keep-together Group
+    under-budgeted the figure and the chart spilled to the next page. Keep this in
+    sync with ``_place_kv_table``."""
+    h = 0.0
+    title = getattr(block, "title", None)
+    if title:
+        h += _measure_heading_text(title, 2)
+    rows = getattr(block, "rows", []) or []
+    key_w = 1.9
+    val_chars = tl.chars_per_line(_USABLE_W - key_w - 0.1, _FS_BODY)
+    lh = tl.line_height_in(_FS_BODY)
+    for row in rows:
+        try:
+            value = row[1]
+        except Exception:  # noqa: BLE001
+            value = ""
+        v_lines = tl.wrap(model._safe_str(value), val_chars)
+        h += lh * len(v_lines) + _ROW_VPAD
+    return h + _GAP
+
+
+def _measure_data_table(block) -> float:
+    """Faithful height of a DataTable — matches ``_place_data_table``.
+
+    Counts the optional title heading, the wrapped header row, every wrapped data
+    row (per-column wrap via the same ``_col_widths``/``_wrap_row`` the placer
+    uses) and the optional note. Keep this in sync with ``_place_data_table``."""
+    h = 0.0
+    title = getattr(block, "title", None)
+    if title:
+        h += _measure_heading_text(title, 2)
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    fs = _FS_CELL
+    widths = _col_widths(header, rows, fs)
+    lh = tl.line_height_in(fs)
+    if header:
+        header_lines = _wrap_row(header, widths, fs)
+        h += lh * max((len(c) for c in header_lines), default=1) + _ROW_VPAD * 2
+    for r in rows:
+        cells_lines = _wrap_row(r, widths, fs)
+        h += lh * max((len(c) for c in cells_lines), default=1) + _ROW_VPAD * 2
+    note = getattr(block, "note", None)
+    if note:
+        nlines = tl.wrap(model._safe_str(note),
+                         tl.chars_per_line(_USABLE_W, _FS_NOTE))
+        h += tl.line_height_in(_FS_NOTE) * len(nlines)
+    return h + _GAP
+
+
 def _measure_block(st: _PdfState, block) -> float:
    kind = getattr(block, "kind", "")
    try:
@@ -690,13 +745,9 @@ def _measure_block(st: _PdfState, block) -> float:
                            tl.chars_per_line(_USABLE_W, _FS_NOTE))
            return tl.line_height_in(_FS_NOTE) * len(lines) + _GAP
        if kind == "kv_table":
-            rows = getattr(block, "rows", []) or []
-            return (tl.line_height_in(_FS_BODY) + _ROW_VPAD) * (len(rows) + 1) \
-                + _GAP
+            return _measure_kv_table(block)
        if kind == "data_table":
-            rows = getattr(block, "rows", []) or []
-            return (tl.line_height_in(_FS_CELL) + _ROW_VPAD * 2) \
-                * (len(rows) + 1) + _GAP
+            return _measure_data_table(block)
        if kind == "group":
            return sum(_measure_block(st, b)
                       for b in (getattr(block, "blocks", []) or []))
@@ -735,6 +786,10 @@ def _place_group(st: _PdfState, block) -> None:
    blocks = getattr(block, "blocks", []) or []
    if not blocks:
        return
+    # Opt-in page break: start this group on a fresh page unless the current one
+    # is still empty (so a chapter can give each unit its own page).
+    if getattr(block, "page_break_before", False) and st.y > _CONTENT_TOP + 1e-6:
+        _new_page(st)
    avail_full = _CONTENT_BOTTOM - _CONTENT_TOP
    _shrink_group_figures(st, blocks, avail_full)
    total = sum(_measure_block(st, b) for b in blocks)
@@ -625,6 +625,55 @@ def _measure_figure_like(block) -> float:
    return target_h + 0.05 + cap_h + _GAP


+def _measure_kv_table(block) -> float:
+    """Faithful KVTable height — matches ``_place_kv_table`` (rendered as a
+    Campo/Valor data table with wrapped cells). The previous estimate assumed one
+    line per row and ignored the title, so a keep-together Group under-budgeted
+    the figure and the chart spilled to the next slide. Keep in sync."""
+    h = 0.0
+    title = getattr(block, "title", None)
+    if title:
+        h += _measure_heading_text(title, 2)
+    rows = getattr(block, "rows", []) or []
+    data_rows = []
+    for row in rows:
+        try:
+            label, value = row[0], row[1]
+        except Exception:  # noqa: BLE001
+            label, value = str(row), ""
+        data_rows.append([model._safe_str(label), model._safe_str(value)])
+    header = ["Campo", "Valor"]
+    widths = _col_widths(header, data_rows)
+    fs = _FS_CELL
+    h += _row_height_in(header, widths, fs)
+    for r in data_rows:
+        h += _row_height_in(r, widths, fs)
+    return h + _GAP
+
+
+def _measure_data_table(block) -> float:
+    """Faithful DataTable height — matches ``_place_data_table`` (title heading +
+    wrapped header + every wrapped row + optional note). Keep in sync."""
+    h = 0.0
+    title = getattr(block, "title", None)
+    if title:
+        h += _measure_heading_text(title, 2)
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    fs = _FS_CELL
+    widths = _col_widths(header, rows)
+    if header:
+        h += _row_height_in(header, widths, fs)
+    for r in rows:
+        h += _row_height_in(r, widths, fs)
+    note = getattr(block, "note", None)
+    if note:
+        nlines = tl.wrap(model._safe_str(note),
+                         tl.chars_per_line(_USABLE_W, _FS_NOTE))
+        h += tl.line_height_in(_FS_NOTE) * len(nlines) + 0.05
+    return h + _GAP
+
+
 def _measure_block(st: _PptxState, block) -> float:
    kind = getattr(block, "kind", "")
    try:
@@ -639,9 +688,10 @@ def _measure_block(st: _PptxState, block) -> float:
            lines = tl.wrap(getattr(block, "text", ""),
                            tl.chars_per_line(_USABLE_W, _FS_NOTE))
            return tl.line_height_in(_FS_NOTE) * len(lines) + 0.05 + _GAP
-        if kind in ("kv_table", "data_table"):
-            rows = getattr(block, "rows", []) or []
-            return (tl.line_height_in(_FS_CELL) + 0.10) * (len(rows) + 1) + _GAP
+        if kind == "kv_table":
+            return _measure_kv_table(block)
+        if kind == "data_table":
+            return _measure_data_table(block)
        if kind == "group":
            return sum(_measure_block(st, b)
                       for b in (getattr(block, "blocks", []) or []))
@@ -664,10 +714,14 @@ def _shrink_group_figures(st: _PptxState, blocks: list, avail_full: float) -> No
                   if getattr(b, "kind", "") not in ("figure", "image"))
    fig_overhead = tl.line_height_in(_FS_NOTE) + 0.05 + 0.05 + _GAP
    budget = avail_full - nonfig_h - 0.10 * len(fig_blocks)
-    if budget <= 1.0:
+    # Low thresholds: a 16:9 slide is short, so a content-heavy column (cardinality
+    # table + top-k + chart) only fits if the chart is allowed to shrink small.
+    # Prefer a small-but-present chart on the SAME slide over splitting the column
+    # across slides (matches the PDF renderer's keep-together philosophy).
+    if budget <= 0.6:
        return  # not enough room to keep together; let it flow (degrade).
    per = budget / len(fig_blocks) - fig_overhead
-    if per <= 0.8:
+    if per <= 0.35:
        return
    for fb in fig_blocks:
        cur = getattr(fb, "height_in", None)
@@ -675,12 +729,90 @@ def _shrink_group_figures(st: _PptxState, blocks: list, avail_full: float) -> No
                        if isinstance(cur, (int, float)) and cur > 0 else per)


+# Minimum height (inches) reserved for a figure inside a keep-together group on
+# the short 16:9 slide. When a high-cardinality column's table(s) would otherwise
+# leave no room, the data table is trimmed (with an honest note) so the chart
+# stays on the SAME slide next to its table instead of spilling to the next one.
+_GROUP_MIN_FIG_H = 1.3
+
+
+def _trim_data_table_to_budget(block, budget: float):
+    """Return a copy of a DataTable whose rows fit within ``budget`` inches.
+
+    Keeps the title, header, as many leading rows as fit (at least one) and an
+    honest note reporting how many of the original rows are shown. NEVER mutates
+    the original block — the same Chapter blocks are rendered by the PDF renderer,
+    which keeps the full table (an A5 page fits it)."""
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    title = getattr(block, "title", None)
+    fs = _FS_CELL
+    widths = _col_widths(header, rows)
+    fixed = 0.0
+    if title:
+        fixed += _measure_heading_text(title, 2)
+    if header:
+        fixed += _row_height_in(header, widths, fs)
+    note_h = tl.line_height_in(_FS_NOTE) + 0.05
+    avail_rows = budget - fixed - note_h - _GAP
+    kept = []
+    used = 0.0
+    for r in rows:
+        rh = _row_height_in(r, widths, fs)
+        if used + rh > avail_rows and kept:
+            break
+        kept.append(r)
+        used += rh
+    if len(kept) >= len(rows):
+        return block  # already fits; keep the original (with its own note).
+    note = (f"top {len(kept)} de {len(rows)} categorías mostradas "
+            "(recortado para caber en el slide; el PDF muestra más)")
+    return model.DataTable(header=header, rows=kept, title=title, note=note)
+
+
+def _fit_group_blocks(st: _PptxState, blocks: list, avail_full: float) -> list:
+    """Return a slide-fitting copy of a keep-together group's blocks.
+
+    On the short 16:9 slide a high-cardinality column's top-k table plus its
+    chart can overflow. Reserve ``_GROUP_MIN_FIG_H`` for the (later shrunk) figure
+    and trim the data table(s) to what is left, so every column keeps its chart
+    next to its table on ONE slide. No-op when the group has no figure+table pair
+    (e.g. id-like columns already drop the top-k upstream, or it already fits)."""
+    has_fig = any(getattr(b, "kind", "") in ("figure", "image") for b in blocks)
+    tbls = [b for b in blocks if getattr(b, "kind", "") == "data_table"]
+    if not (has_fig and tbls):
+        return blocks
+    fixed_h = sum(_measure_block(st, b) for b in blocks
+                  if getattr(b, "kind", "") not in ("figure", "image",
+                                                    "data_table"))
+    tables_h = sum(_measure_block(st, b) for b in tbls)
+    budget_tables = avail_full - fixed_h - _GROUP_MIN_FIG_H
+    if tables_h <= budget_tables:
+        return blocks  # already fits next to a min-height figure; leave intact.
+    out = []
+    for b in blocks:
+        if getattr(b, "kind", "") != "data_table":
+            out.append(b)
+            continue
+        trimmed = _trim_data_table_to_budget(b, max(budget_tables, 0.8))
+        out.append(trimmed)
+        budget_tables -= _measure_data_table(trimmed)
+    return out
+
+
 def _place_group(st: _PptxState, block) -> None:
    """Render a keep-together Group: move it whole to the next slide if needed."""
    blocks = getattr(block, "blocks", []) or []
    if not blocks:
        return
+    # Opt-in slide break: start this group on a fresh slide unless the current one
+    # is still empty (so a chapter can give each unit its own slide).
+    if getattr(block, "page_break_before", False) and st.y > _CONTENT_TOP + 1e-6:
+        _new_slide(st, cont=True)
    avail_full = _CONTENT_BOTTOM - _CONTENT_TOP
+    # Trim oversized tables first (keeps the chart on the same slide), then shrink
+    # the figure to share the remaining room.
+    blocks = _fit_group_blocks(st, blocks, avail_full)
    _shrink_group_figures(st, blocks, avail_full)
    total = sum(_measure_block(st, b) for b in blocks)
    if total <= avail_full:
@@ -0,0 +1,103 @@
+---
+id: draw_join_graph_figure_py_datascience
+name: draw_join_graph_figure
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def draw_join_graph_figure(join_graph: dict, title: str = None) -> \"matplotlib.figure.Figure\""
+description: "Rasterizes a database's join graph (inter-table FK relationships, the output of build_join_graph) into a matplotlib.figure.Figure: circular nodes labeled with each table name (hubs in a warm accent color, the rest neutral) and directed edges labeled from_col→to_col (plus the cardinality when present). It is the drawn counterpart of the Mermaid string, so the relationships chapter of the AutomaticEDA report shows a real diagram. Deterministic networkx spring_layout (seed=42), Agg backend with no windows opened; defensive: never raises and never performs I/O."
+tags: [eda, plot, relations, graph, matplotlib, figure, networkx, datascience, impure]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: [matplotlib, networkx]
+example: |
+  from draw_join_graph_figure import draw_join_graph_figure
+  join_graph = {
+      "nodes": [
+          {"table": "customers", "out_degree": 0, "in_degree": 1, "role": "dimension"},
+          {"table": "orders", "out_degree": 1, "in_degree": 0, "role": "fact"},
+      ],
+      "edges": [
+          {"from_table": "orders", "from_col": "customer_id",
+           "to_table": "customers", "to_col": "id", "cardinality": "N:1"},
+      ],
+      "hubs": ["orders"],
+  }
+  fig = draw_join_graph_figure(join_graph, title="Relaciones FK")
+  fig.savefig("/tmp/join_graph.png")
+tested: true
+tests:
+  - "test_returns_figure_with_axis"
+  - "test_savefig_produces_nonempty_png"
+  - "test_empty_dict_does_not_raise_and_savefig_png"
+  - "test_none_does_not_raise_and_savefig_png"
+test_file_path: "python/functions/datascience/draw_join_graph_figure_test.py"
+file_path: "python/functions/datascience/draw_join_graph_figure.py"
+params:
+  - name: join_graph
+    desc: "Dict produced by build_join_graph. Keys: `nodes` (list[dict] with table, out_degree, in_degree, role), `edges` (list[dict] with from_table, from_col, to_table, to_col and optional cardinality/inclusion) and `hubs` (list[str] of hub tables to highlight in the warm color). Missing keys, non-dict items, None or {} are tolerated (a Figure with text is returned, nothing is raised). Node names are also derived from the edges, so a graph with edges but no explicit nodes is still drawn."
+  - name: title
+    desc: "Title drawn above the diagram. When omitted (None), \"Join graph\" is used. Default None."
+output: "A matplotlib.figure.Figure (figsize 7x5) with a single Axes holding the directed node-link diagram: tables as labeled circular nodes (hubs in the warm accent #DD8452, the rest in neutral blue #4C72B0) and FKs as directed arrows labeled from_col→to_col (+ cardinality). If join_graph has neither nodes nor edges (or is None/{}), it still returns a Figure with the centered text \"Sin relaciones FK detectadas.\"; on any internal failure it returns a Figure with a generic message (never raises). The caller rasterizes/closes the figure; the function neither shows nor saves it."
+---
+
+## Example
+
+```python
+from draw_join_graph_figure import draw_join_graph_figure
+
+# `join_graph` is the output of build_join_graph (nodes + edges + hubs).
+join_graph = {
+    "nodes": [
+        {"table": "customers", "out_degree": 0, "in_degree": 1, "role": "dimension"},
+        {"table": "orders", "out_degree": 2, "in_degree": 0, "role": "fact"},
+        {"table": "products", "out_degree": 0, "in_degree": 1, "role": "dimension"},
+    ],
+    "edges": [
+        {"from_table": "orders", "from_col": "customer_id",
+         "to_table": "customers", "to_col": "id", "cardinality": "N:1"},
+        {"from_table": "orders", "from_col": "product_id",
+         "to_table": "products", "to_col": "id", "cardinality": "N:1"},
+    ],
+    "hubs": ["orders"],  # `orders` is drawn in the accent colour (fact table)
+}
+
+fig = draw_join_graph_figure(join_graph, title="Relaciones FK")
+
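+# The report renderer rasterizes it; here we only save it for inspection.
+fig.savefig("/tmp/join_graph.png")
+
+# Release the figure once it has been rasterized (see Gotchas).
+import matplotlib.pyplot as plt
+plt.close(fig)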
+```
+
+## When to use it
+
+Use it in the relations chapter of an AutomaticEDA report when you want a
+**drawn** diagram of the relational schema, not just the pasteable Mermaid block.
+Pass it the output of `build_join_graph` (`nodes` + `edges` + `hubs`) directly
+and you get a `matplotlib.figure.Figure` ready for the lazy renderer to
+rasterize. It is the visual counterpart of the Mermaid string: Mermaid is for
+pasting into Markdown/docs that support it; this function produces the actual
+image (PNG/PDF) embedded in reports that do not render Mermaid.
+
+## Gotchas
+
+- **Impure because of matplotlib.** It pins the `Agg` backend at import time (a
+  process-wide setting), so it opens no windows and needs no display. Safe to
+  call in batches from the renderer.
+- **Deterministic layout (`seed=42`).** It uses `nx.spring_layout(G, seed=42)`,
+  so the same input produces the same diagram (reproducible tests). A
+  single-node graph gets a fixed centered position instead of the spring layout;
+  an empty graph returns the message Figure before any layout is computed.
+- **No I/O.** It does not call `plt.show()` or write to disk; it only returns
+  the `Figure`. The consumer rasterizes it and releases it (`plt.close(fig)`) so
+  memory does not accumulate in reports with many tables.
+- **Returns a Figure, NOT a dict.** Unlike `build_join_graph` (which returns the
+  graph dict), this function returns the already-drawn figure object.
+- **Defensive, never raises.** `None`, `{}`, missing keys or malformed items are
+  handled without error: in the worst case it returns a `Figure` with
+  "Sin relaciones FK detectadas." (empty input) or a generic message (internal
+  failure). Do not wrap it in try/except for fear of a raise; there is none.
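Note (illustrative, not part of the commit): the "No I/O" gotcha leaves rasterizing and releasing the Figure to the consumer. A minimal sketch of that caller side follows, assuming the consumer wants PNG bytes. The helper name `rasterize_and_close` is hypothetical, and a hand-built Figure stands in for the result of `draw_join_graph_figure(join_graph)` so the snippet runs on its own.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, the same one the function pins

import matplotlib.pyplot as plt  # noqa: E402


def rasterize_and_close(fig, dpi=150):
    """Render a Figure to PNG bytes and always release it afterwards.

    Hypothetical helper: the real report renderer may rasterize differently.
    """
    buf = io.BytesIO()
    try:
        fig.savefig(buf, format="png", dpi=dpi)
    finally:
        # pyplot keeps a reference to every figure it creates; closing it
        # here avoids accumulating figures across a long report.
        plt.close(fig)
    return buf.getvalue()


# Stand-in for `fig = draw_join_graph_figure(join_graph)`.
fig, ax = plt.subplots(figsize=(7, 5))
ax.axis("off")
ax.text(0.5, 0.5, "Sin relaciones FK detectadas.", ha="center", va="center")

png_bytes = rasterize_and_close(fig)
```

Closing inside `finally` means the figure is released even if `savefig` fails, which matters when one report builds many figures.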
@@ -0,0 +1,214 @@
+"""Impure EDA helper: rasterize a join graph to a matplotlib Figure (`eda` group).
+
+Takes the join graph produced by ``build_join_graph`` (inter-table FK relations)
+and draws it as a directed node-link diagram on a ready-to-rasterize
+``matplotlib.figure.Figure``. Hub tables (the ones with the highest out-degree,
+candidate fact tables of a star schema) are highlighted in a warm accent colour;
+the rest use a neutral colour. Directed edges carry a ``from_col→to_col`` label
+(plus the cardinality when present).
+
+This is the *drawn* counterpart of the Mermaid string that ``build_join_graph``
+also emits: the relations chapter of an AutomaticEDA report can show a real
+picture instead of only the pasteable Mermaid block.
+
+Impure because it touches matplotlib's rendering machinery. It pins the headless
+Agg backend and a deterministic ``spring_layout`` seed so the output is
+reproducible. It never raises: on any internal failure (or empty input) it
+returns a ``Figure`` carrying a centered message, so the lazy render of the
+document is never broken.
+"""
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+import matplotlib.pyplot as plt  # noqa: E402
+import networkx as nx  # noqa: E402
+
+# Warm accent reserved for hub tables (candidate fact tables / star-schema cores).
+_HUB_COLOR = "#DD8452"
+# Neutral blue for every other table.
+_NODE_COLOR = "#4C72B0"
+# Muted gray for the empty/error message text.
+_MUTED_TEXT = "#5f6b7a"
+# Edge colour and label colour.
+_EDGE_COLOR = "#7a7a7a"
+_EDGE_LABEL_COLOR = "#34495e"
+# Constant node size; shared with the edge drawing so arrowheads stop at the
+# node boundary instead of being hidden under the marker.
+_NODE_SIZE = 2200
+
+
+def _text_figure(message: str) -> "matplotlib.figure.Figure":
+    """Return a blank Figure carrying a single centered message.
+
+    Used both for the "no relations" case and as the never-raise fallback.
+    """
+    fig, ax = plt.subplots(figsize=(7, 5))
+    ax.axis("off")
+    ax.text(
+        0.5,
+        0.5,
+        message,
+        ha="center",
+        va="center",
+        fontsize=12,
+        color=_MUTED_TEXT,
+        transform=ax.transAxes,
+    )
+    fig.tight_layout()
+    return fig
+
+
+def _edge_label(edge: dict) -> str:
+    """Build the ``from_col→to_col`` label of an edge, appending cardinality."""
+    fc = edge.get("from_col")
+    tc = edge.get("to_col")
+    if fc is not None and tc is not None:
+        label = f"{fc}→{tc}"
+    elif fc is not None:
+        label = str(fc)
+    elif tc is not None:
+        label = str(tc)
+    else:
+        label = ""
+    card = edge.get("cardinality")
+    if card:
+        label = f"{label} ({card})" if label else str(card)
+    return label
+
+
+def draw_join_graph_figure(join_graph: dict, title: str = None):
+    """Rasterize a join graph to a matplotlib Figure.
+
+    Builds a ``networkx.DiGraph`` from the graph's nodes and edges, lays it out
+    with a deterministic ``spring_layout`` (``seed=42``) and draws it on a
+    ``matplotlib.figure.Figure``: tables as labelled circular nodes (hubs in a
+    warm accent, the rest neutral) and FK relations as directed arrows labelled
+    ``from_col→to_col`` (plus cardinality when available).
+
+    The function never raises. On empty/``None`` input it returns a Figure with
+    a centered "Sin relaciones FK detectadas." message; on any internal failure
+    it returns a Figure with a generic centered message. It never shows the
+    figure nor writes it to disk — the document renderer rasterizes it.
+
+    Args:
+        join_graph: Dict produced by ``build_join_graph`` with keys ``nodes``
+            (list of ``{table, out_degree, in_degree, role}``), ``edges`` (list
+            of ``{from_table, from_col, to_table, to_col, cardinality?,
+            inclusion?}``) and ``hubs`` (list of hub table names to highlight).
+            Missing keys, non-dict items, ``None`` or ``{}`` are all tolerated.
+        title: Optional title drawn above the diagram. When omitted, the title
+            defaults to "Join graph".
+
+    Returns:
+        A ``matplotlib.figure.Figure`` (figsize 7x5) with a single Axes holding
+        the node-link diagram. The caller rasterizes/closes it.
+    """
+    try:
+        jg = join_graph if isinstance(join_graph, dict) else {}
+        nodes = jg.get("nodes") or []
+        edges = jg.get("edges") or []
+        hubs = {h for h in (jg.get("hubs") or []) if h is not None}
+
+        # Collect node names from the declared nodes and, defensively, from the
+        # edges (so a graph with edges but no explicit nodes still draws).
+        node_names: list = []
+        seen: set = set()
+
+        def _register(name) -> None:
+            if name is not None and name not in seen:
+                seen.add(name)
+                node_names.append(name)
+
+        for n in nodes:
+            if isinstance(n, dict):
+                _register(n.get("table"))
+        for e in edges:
+            if isinstance(e, dict):
+                _register(e.get("from_table"))
+                _register(e.get("to_table"))
+
+        if not node_names:
+            return _text_figure("Sin relaciones FK detectadas.")
+
+        graph = nx.DiGraph()
+        for name in node_names:
+            graph.add_node(name)
+
+        edge_labels: dict = {}
+        for e in edges:
+            if not isinstance(e, dict):
+                continue
+            ft = e.get("from_table")
+            tt = e.get("to_table")
+            if ft is None or tt is None:
+                continue
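+            graph.add_edge(ft, tt)
+            # A DiGraph keeps one edge per (ft, tt) pair, so stack the labels of
+            # parallel FKs instead of letting the last one overwrite the others.
+            parts = [edge_labels.get((ft, tt)), _edge_label(e)]
+            edge_labels[(ft, tt)] = "\n".join(p for p in parts if p)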
+
+        fig, ax = plt.subplots(figsize=(7, 5))
+
+        # Deterministic layout. Fixed positions for trivial graphs so a single
+        # node sits centered instead of at an arbitrary spring-layout point.
+        if graph.number_of_nodes() <= 1:
+            pos = {name: (0.5, 0.5) for name in graph.nodes()}
+        else:
+            pos = nx.spring_layout(graph, seed=42)
+
+        node_colors = [
+            _HUB_COLOR if name in hubs else _NODE_COLOR for name in graph.nodes()
+        ]
+        nx.draw_networkx_nodes(
+            graph,
+            pos,
+            ax=ax,
+            node_color=node_colors,
+            node_size=_NODE_SIZE,
+            node_shape="o",
+            edgecolors="white",
+            linewidths=1.5,
+        )
+        nx.draw_networkx_labels(
+            graph,
+            pos,
+            ax=ax,
+            font_size=9,
+            font_color="white",
+            font_weight="bold",
+        )
+        nx.draw_networkx_edges(
+            graph,
+            pos,
+            ax=ax,
+            arrows=True,
+            arrowstyle="-|>",
+            arrowsize=18,
+            edge_color=_EDGE_COLOR,
+            width=1.4,
+            connectionstyle="arc3,rad=0.06",
+            node_size=_NODE_SIZE,
+        )
+        if any(lbl for lbl in edge_labels.values()):
+            nx.draw_networkx_edge_labels(
+                graph,
+                pos,
+                edge_labels=edge_labels,
+                ax=ax,
+                font_size=7,
+                font_color=_EDGE_LABEL_COLOR,
+                bbox={
+                    "boxstyle": "round,pad=0.2",
+                    "fc": "white",
+                    "ec": "none",
+                    "alpha": 0.7,
+                },
+            )
+
+        ax.set_title(title if title else "Join graph", fontsize=13)
+        ax.axis("off")
+        fig.tight_layout()
+        return fig
+    except Exception:
+        # Never raise — the document render is lazy and must not be broken.
+        return _text_figure("No se pudo dibujar el join graph.")
@@ -0,0 +1,84 @@
+"""Tests for draw_join_graph_figure (rasterizes the join graph, eda group).
+
+Uses the Agg backend without opening windows; each test closes the Figure it
+builds (matplotlib.pyplot.close) so no state accumulates between tests. The save
+assertions write to tmp_path (pytest fixture) and check that the PNG is not
+empty.
+"""
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+import matplotlib.pyplot as plt  # noqa: E402
+from matplotlib.figure import Figure  # noqa: E402
+
+from draw_join_graph_figure import draw_join_graph_figure  # noqa: E402
+
+
+def _make_join_graph():
+    """Minimal join graph: 3 nodes (customers/orders/products) and 2 edges.
+
+    orders -> customers and orders -> products. `orders` is the hub (out_degree 2).
+    """
+    return {
+        "nodes": [
+            {"table": "customers", "out_degree": 0, "in_degree": 1, "role": "dimension"},
+            {"table": "orders", "out_degree": 2, "in_degree": 0, "role": "fact"},
+            {"table": "products", "out_degree": 0, "in_degree": 1, "role": "dimension"},
+        ],
+        "edges": [
+            {
+                "from_table": "orders",
+                "from_col": "customer_id",
+                "to_table": "customers",
+                "to_col": "id",
+                "cardinality": "N:1",
+                "inclusion": 1.0,
+            },
+            {
+                "from_table": "orders",
+                "from_col": "product_id",
+                "to_table": "products",
+                "to_col": "id",
+                "cardinality": "N:1",
+                "inclusion": 0.98,
+            },
+        ],
+        "hubs": ["orders"],
+    }
+
+
+def test_returns_figure_with_axis():
+    fig = draw_join_graph_figure(_make_join_graph(), title="Relaciones FK")
+    assert isinstance(fig, Figure)
+    # At least one axis holding the diagram.
+    assert len(fig.axes) >= 1
+    plt.close(fig)
+
+
+def test_savefig_produces_nonempty_png(tmp_path):
+    fig = draw_join_graph_figure(_make_join_graph())
+    out = tmp_path / "g.png"
+    fig.savefig(out)
+    assert out.exists()
+    assert out.stat().st_size > 0
+    plt.close(fig)
+
+
+def test_empty_dict_does_not_raise_and_savefig_png(tmp_path):
+    fig = draw_join_graph_figure({})
+    assert isinstance(fig, Figure)
+    out = tmp_path / "empty.png"
+    fig.savefig(out)
+    assert out.stat().st_size > 0
+    plt.close(fig)
+
+
+def test_none_does_not_raise_and_savefig_png(tmp_path):
+    fig = draw_join_graph_figure(None)
+    assert isinstance(fig, Figure)
+    out = tmp_path / "none.png"
+    fig.savefig(out)
+    assert out.stat().st_size > 0
+    plt.close(fig)
@@ -0,0 +1,89 @@
+---
+name: render_automatic_eda_markdown
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def render_automatic_eda_markdown(chapters_or_profile, out_path: str, meta: dict = None) -> dict"
+description: "Renders a chapter-based AutomaticEDA document (format-independent block model) into a single self-contained MARKDOWN file meant to be PASTED INTO AN LLM. Accepts a list of model chapters or an eda-group TableProfile directly (it builds the canonical chapters with build_document). It prioritizes TEXT + DATA over visuals: tables are dumped as Markdown tables with ALL rows (no pagination, since there are no pages to cut), a matplotlib figure is reduced to its caption plus the underlying data table (Desde/Hasta/Frecuencia for the histogram bars) because an LLM cannot see the image, and glossary markers are removed while **bold** is kept. It carries a header (# title), a metadata block in a blockquote and a numbered index with GitHub-style anchors. It mirrors render_automatic_eda_pdf/render_automatic_eda_pptx but WITHOUT a manifest (KISS, the Markdown is a single text artefact). dict-no-throw: never raises, returns {path, n_chars, chapters, note}; on a fatal error path is None and note explains the cause. The optional flag meta['embed_figures'] exports PNGs next to the .md (off by default)."
+tags: [eda, markdown, render, report, llm, automatic-eda, chapters, versioned, no-cut, text, datascience, python]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_py_core"
+imports: [os, re, matplotlib, "datascience.automatic_eda"]
+params:
+  - name: chapters_or_profile
+    desc: "a list of AutomaticEDA model chapters (Chapter dataclasses or dicts {id,title,version,blocks}) OR an eda-group TableProfile dict. A single chapter dict (it has blocks and no columns) is also accepted. If it is a TableProfile, the canonical chapters are built with build_document(profile, meta['ctx']). Supported blocks: heading, markdown, kv_table, data_table, figure, image, caption, note, group, glossary_entry. Defensive reading: anything unrecognised degrades to a Note, never raises."
+  - name: out_path
+    desc: "path of the output .md file. Parent directories are created if missing. Non-writable directory → {path:None, note:<cause>} without raising."
+  - name: meta
+    desc: "optional dict. Keys: title (document title), ctx (dict with dataset_name→Dataset, source_origin→Fuente, storage→Almacenamiento, n_rows/n_cols→Dimensiones; also consumed by the chapter builders when a profile is given), generated_at (timestamp; if missing, an ISO UTC one is generated), embed_figures (True to export PNGs <basename>_figN.png next to the .md; default False, which keeps the Markdown self-contained)."
+output: "dict (never raises): {path: str|None, n_chars: int, chapters: list[{id,version}], note: str}. On a fatal error (e.g. non-writable directory) path is None and note explains the cause. A document with no applicable chapters produces a valid minimal Markdown containing 'documento vacío' and chapters=[]."
+tested: true
+tests: ["test_golden_bloques_sinteticos_serializa_todo_a_markdown", "test_figura_descarta_axvspan_de_la_tabla_de_bins", "test_edge_documento_vacio_no_revienta", "test_profile_path_construye_capitulos_y_escribe"]
+test_file_path: "python/functions/datascience/render_automatic_eda_markdown_test.py"
+file_path: "python/functions/datascience/render_automatic_eda_markdown.py"
+---
+
+## Example
+
+```python
+from datascience import render_automatic_eda_markdown
+
+# From an eda-group TableProfile (same model as the PDF/PPTX renderers).
+profile = {
+    "table": "ventas", "source": "/data/ventas.csv",
+    "n_rows": 1000, "n_cols": 2, "quality_score": 92.5,
+    "columns": [
+        {"name": "precio", "inferred_type": "numeric", "null_pct": 0.01,
+         "numeric": {"mean": 42.5, "median": 40.0, "min": 1.0, "max": 100.0,
+                     "std": 12.3}},
+        {"name": "categoria", "inferred_type": "categorical", "null_pct": 0.0,
+         "categorical": {"top": [{"value": "neumaticos", "count": 500}]}},
+    ],
+}
+res = render_automatic_eda_markdown(
+    profile, "reports/ventas_aeda.md",
+    {"title": "EDA — ventas",
+     "ctx": {"dataset_name": "Ventas", "source_origin": "ERP export",
+             "n_rows": 1000, "n_cols": 2}})
+print(res["path"], res["n_chars"], res["chapters"])
+# -> reports/ventas_aeda.md 4123 [{'id':'portada','version':'1.0.0'}, ...]
+```
+
+## When to use it
+
+When you want to **paste the EDA into an LLM** (ChatGPT, Claude, ...) or keep it as
+versionable plain text: the same chapter-based document as the PDF/PPTX, but serialized
+to Markdown without binaries. Use it as a third output alongside `render_automatic_eda_pdf`
+(mobile) and `render_automatic_eda_pptx` (sharing) from the SAME chapter model.
+Unlike those two, there are no pages or slides: every row of every table is dumped
+(nothing is cut) and each figure is reduced to its caption + the underlying data
+table, which is what an LLM can read. To add chapters to the document, see
+`docs/capabilities/automatic_eda.md`.
+
+## Gotchas
+
+- **Impure**: writes the `.md` to `out_path` (creating parent directories). With
+  `meta['embed_figures']=True` it also exports one PNG `<basename>_figN.png` per figure
+  next to the `.md`; by default it exports NOTHING and the Markdown stays self-contained.
+- **Never raises** (dict-no-throw): a block that fails degrades to a note and is recorded
+  in `note`; the document is still written. An empty profile/list produces a valid minimal
+  Markdown with `*(documento vacío …)*` and `chapters=[]`.
+- **Figures = data, not image**: a `figure` block is serialized as `*Figura: caption*`
+  plus, if the matplotlib figure has bars (histogram / bar chart), a
+  `| Desde | Hasta | Frecuencia |` table extracted from the `Rectangle` patches (max 100
+  rows; the rest is truncated with `*… (N filas más)*`). A wide stray rectangle such as
+  an `axvspan` band is not reported as a bin (see
+  `test_figura_descarta_axvspan_de_la_tabla_de_bins`). If there are no bars or anything
+  fails, only the caption is emitted. The figure is closed (`plt.close`) after reading it.
+- **Glossary vs bold**: ONLY the glossary markers `[[term:key]]visible[[/term]]` are
+  removed (leaving `visible`); Markdown `**bold**` IS KEPT (it is valid). `strip_inline_md`
+  is not used here because it also strips bold.
+- **Index anchors**: the `## Índice` section links each chapter through a GitHub-style
+  anchor of its `## N. Título` heading (lowercase, spaces→`-`, punctuation removed). If two
+  chapters share the exact same title their anchors collide (rare; the canonical chapters
+  have unique titles).
+- **Tables**: cells escape `|` (→ `\|`) and fold line breaks into `<br>` so the column
+  does not break. There is no width-based splitting; an LLM does not need it.
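Note (illustrative, not part of the commit): the three text transformations described in the Gotchas above can be sketched with the standard library alone. This is only an approximation under stated assumptions: the helper names `strip_glossary_markers`, `escape_cell` and `github_anchor` are hypothetical, and the actual `render_md` implementation is not shown in this diff and may differ in details such as how punctuation is classified.

```python
import re

# Matches [[term:key]]visible[[/term]] and captures the visible text only.
_TERM_RE = re.compile(r"\[\[term:[^\]]*\]\](.*?)\[\[/term\]\]", re.DOTALL)


def strip_glossary_markers(text: str) -> str:
    """Drop glossary markers but leave Markdown bold untouched."""
    return _TERM_RE.sub(r"\1", text)


def escape_cell(value) -> str:
    """Make a value safe inside a Markdown table cell."""
    text = str(value).replace("|", "\\|")          # a bare pipe would end the cell
    return re.sub(r"\r\n|\r|\n", "<br>", text)     # a newline would end the row


def github_anchor(heading: str) -> str:
    """Approximate a GitHub-style anchor for a heading's text."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)           # drop punctuation such as "."
    return slug.replace(" ", "-")


cleaned = strip_glossary_markers(
    "Texto con [[term:ent]]entropia[[/term]] y **bold** aqui."
)
cell = escape_cell("a|b\nc")
anchor = github_anchor("1. Resumen general")
```

Since the sketch derives the anchor from the heading text alone, two identical headings produce the same slug, which is the collision the "Index anchors" item warns about.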
@@ -0,0 +1,55 @@
+"""render_automatic_eda_markdown — chapter-based EDA report as one Markdown file.
+
+Public ``eda``-group entry point that serializes an AutomaticEDA document (a list
+of chapters, or an ``eda`` TableProfile from which the canonical chapters are
+built) into a single self-contained Markdown file optimised to be **pasted into
+an LLM**: plain text, Markdown tables (every row dumped — there are no pages to
+cut), figures reduced to caption + underlying data, no binaries. It mirrors
+``render_automatic_eda_pdf`` / ``render_automatic_eda_pptx`` but for text output;
+unlike those it writes no manifest (KISS — Markdown is a single text artefact).
+
+dict-no-throw: never raises. Returns ``{path, n_chars, chapters, note}``; on a
+fatal error ``path`` is None and ``note`` explains why.
+"""
+
+from __future__ import annotations
+
+from datascience.automatic_eda import build_document, render_md
+from datascience.automatic_eda.model import as_chapter, as_chapters
+
+
+def _coerce_chapters(chapters_or_profile, meta: dict) -> list:
+    """Accept chapters OR an eda profile and return a list of Chapter."""
+    arg = chapters_or_profile
+    if isinstance(arg, (list, tuple)):
+        return as_chapters(list(arg))
+    if isinstance(arg, dict):
+        if "blocks" in arg and "columns" not in arg:
+            ch = as_chapter(arg)
+            return [ch] if ch is not None else []
+        return build_document(arg, (meta or {}).get("ctx"))
+    return []
+
+
+def render_automatic_eda_markdown(chapters_or_profile, out_path: str,
+                                  meta: dict = None) -> dict:
+    """Render an AutomaticEDA document into a single self-contained Markdown file.
+
+    Args:
+        chapters_or_profile: a list of chapters (``Chapter`` dataclasses or
+            dicts) or an ``eda`` TableProfile dict (chapters built via
+            ``build_document(profile, meta['ctx'])``).
+        out_path: filesystem path for the ``.md`` (parent dirs are created).
+        meta: optional dict. Recognised keys: ``title``, ``ctx`` (dict with
+            ``dataset_name``/``source_origin``/``storage``/``n_rows``/``n_cols``),
+            ``generated_at``, ``embed_figures`` (export PNGs beside the .md,
+            default False — off keeps the Markdown self-contained).
+
+    Returns:
+        dict (never raises): ``{path: str|None, n_chars: int,
+        chapters: list[{id, version}], note: str}``. On a fatal error ``path`` is
+        None and ``note`` explains the cause.
+    """
+    meta = dict(meta or {})
+    chapters = _coerce_chapters(chapters_or_profile, meta)
+    return render_md(chapters, out_path, meta)
@@ -0,0 +1,168 @@
+"""Tests for render_automatic_eda_markdown — DoD: golden + edge + profile path.
+
+Self-contained synthetic blocks (no DuckDB). Verifies every block kind serializes
+to Markdown (heading, markdown with glossary+bold, kv/data tables, a figure whose
+histogram bars become a data table, caption, note, group, glossary entry), that a
+leading level-1 heading equal to the chapter title is omitted, that an empty
+document degrades to a valid minimal Markdown without raising, and that passing a
+minimal TableProfile builds chapters and writes the file.
+"""
+
+import os
+import tempfile
+
+from datascience.render_automatic_eda_markdown import render_automatic_eda_markdown
+from datascience.automatic_eda.model import (
+    Caption, Chapter, DataTable, Figure, GlossaryEntry, Group, Heading, KVTable,
+    Markdown, Note,
+)
+
+
+def _hist_fig():
+    import matplotlib
+    matplotlib.use("Agg")
+    import matplotlib.pyplot as plt
+    fig, ax = plt.subplots()
+    ax.hist([1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5, 5], bins=5)
+    return fig
+
+
+def _chapters() -> list:
+    blocks = [
+        Heading("Demo", 1),                       # == chapter title -> omitted.
+        Heading("Seccion dos", 2),                # -> ####
+        Markdown("Texto con [[term:ent]]entropia[[/term]] y **bold** aqui."),
+        KVTable(rows=[("Filas", 1000), ("Columnas", 5)], title="Resumen"),
+        DataTable(header=["col", "valor"],
+                  rows=[["alpha", "111"], ["beta", "222"], ["gamma", "333"]],
+                  title="Datos", note="nota inferior"),
+        Figure(make=_hist_fig, caption="Histograma demo"),
+        Caption("pie de figura"),
+        Note("una nota aparte"),
+        Group(title="Grupo X", blocks=[Markdown("dentro del grupo")]),
+        GlossaryEntry(key="ent", label="Entropia",
+                      definition="Medida de incertidumbre."),
+    ]
+    return [Chapter(id="demo", title="Demo", version="1.0.0", blocks=blocks)]
+
+
+def _read(path: str) -> str:
+    with open(path, "r", encoding="utf-8") as fh:
+        return fh.read()
+
+
+def test_golden_bloques_sinteticos_serializa_todo_a_markdown():
+    with tempfile.TemporaryDirectory() as d:
+        out = os.path.join(d, "demo.md")
+        res = render_automatic_eda_markdown(
+            _chapters(), out,
+            {"title": "EDA Demo",
+             "ctx": {"dataset_name": "Demo", "n_rows": 12, "n_cols": 2}})
+        assert res["path"] == out
+        assert os.path.exists(out)
+        assert res["n_chars"] > 0
+        assert res["chapters"] == [{"id": "demo", "version": "1.0.0"}]
+
+        content = _read(out)
+        # Document structure.
+        assert content.startswith("# ")
+        assert "## Índice" in content
+        # A Markdown table is present (header + separator row).
+        assert "| " in content and "| --- " in content
+        # DataTable values are all dumped.
+        for v in ("alpha", "111", "beta", "222", "gamma", "333"):
+            assert v in content
+        # Glossary markers stripped, bold kept.
+        assert "[[term" not in content
+        assert "[[/term]]" not in content
+        assert "**bold**" in content
+        assert "entropia" in content  # visible glossary text preserved.
+        # Figure histogram bars became a data table.
+        assert "| Desde | Hasta | Frecuencia |" in content
+        # Glossary entry rendered as a level-3 heading.
+        assert "### Entropia" in content
+        # Level-2 heading -> ####.
+        assert "#### Seccion dos" in content
+        # Leading level-1 heading equal to the title was omitted.
+        assert "### Demo" not in content
+        # Group title rendered.
+        assert "### Grupo X" in content
+
+
+def _hist_fig_with_span():
+    """Histogram with a wide ``axvspan`` (±1σ band) over it.
+
+    Reproduces the num_distr figure shape: matplotlib keeps the span as a lone
+    Rectangle in ``ax.patches`` alongside the bin bars; it must NOT leak into the
+    extracted bins table as a fake bin (it is ~5x wider than a bin)."""
+    import matplotlib
+    matplotlib.use("Agg")
+    import matplotlib.pyplot as plt
+    fig, ax = plt.subplots()
+    data = [1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5, 5]
+    ax.hist(data, bins=5)
+    ax.axvspan(2.0, 4.0, alpha=0.2)   # mean±σ band — a wide stray rectangle.
+    return fig
+
+
+def test_figura_descarta_axvspan_de_la_tabla_de_bins():
+    """The ±1σ band rectangle must not appear as a row in the bins table."""
+    blocks = [Figure(make=_hist_fig_with_span, caption="Hist con banda")]
+    chapters = [Chapter(id="f", title="Fig", version="1.0.0", blocks=blocks)]
+    with tempfile.TemporaryDirectory() as d:
+        out = os.path.join(d, "fig.md")
+        render_automatic_eda_markdown(chapters, out, {"title": "T"})
+        content = _read(out)
+        assert "| Desde | Hasta | Frecuencia |" in content
+        # Extract the rows of the bins table: lines between the header/separator
+        # and the next blank line.
+        lines = content.splitlines()
+        hi = next(i for i, ln in enumerate(lines)
+                  if ln.startswith("| Desde | Hasta | Frecuencia |"))
+        rows = []
+        for ln in lines[hi + 2:]:           # skip header + separator
+            if not ln.startswith("|"):
+                break
+            rows.append(ln)
+        # 5 histogram bins, no extra wide span row.
+        assert len(rows) == 5, rows
+        # No row spans a width of ~2.0 (the axvspan from x=2 to x=4).
+        for ln in rows:
+            cells = [c.strip() for c in ln.strip("|").split("|")]
+            lo, hi_v = float(cells[0]), float(cells[1])
+            assert (hi_v - lo) < 1.5, f"wide span leaked: {ln}"
+
+
+def test_edge_documento_vacio_no_revienta():
+    with tempfile.TemporaryDirectory() as d:
+        out = os.path.join(d, "empty.md")
+        res = render_automatic_eda_markdown([], out, {})
+        assert res["path"] == out
+        assert os.path.exists(out)
+        assert res["chapters"] == []
+        content = _read(out)
+        assert "documento vacío" in content
+        assert content.startswith("# ")
+
+
+def test_profile_path_construye_capitulos_y_escribe():
+    profile = {
+        "table": "mini",
+        "source": "/data/mini.csv",
+        "n_rows": 10,
+        "n_cols": 1,
+        "quality_score": 88.0,
+        "columns": [
+            {"name": "x", "inferred_type": "numeric", "null_pct": 0.0,
+             "null_count": 0,
+             "numeric": {"mean": 1.0, "median": 1.0, "min": 0.0, "max": 2.0,
+                         "std": 0.5}},
+        ],
+    }
+    with tempfile.TemporaryDirectory() as d:
+        out = os.path.join(d, "mini.md")
+        res = render_automatic_eda_markdown(
+            profile, out, {"title": "Mini", "ctx": {"dataset_name": "Mini"}})
+        assert res["path"] == out  # not None — no exception, file written.
+        assert os.path.exists(out)
+        assert res["n_chars"] > 0
@@ -34,6 +34,7 @@ from .upsert_xlsx_sheet import upsert_xlsx_sheet
 from .duckdb_query_readonly import duckdb_query_readonly
 from .duckdb_execute import duckdb_execute
 from .duckdb_upsert import duckdb_upsert
+from .load_folder_to_duckdb import load_folder_to_duckdb
 from .imap_connect import imap_connect
 from .imap_list_mailboxes import imap_list_mailboxes
 from .imap_search import imap_search
@@ -50,6 +51,7 @@ __all__ = [
    "upsert_xlsx_sheet",
    "duckdb_query_readonly",
    "duckdb_execute",
+    "load_folder_to_duckdb",
    "duckdb_upsert",
    "pg_insert_rows",
    "pg_apply_sql",
@@ -0,0 +1,100 @@
+---
+name: load_folder_to_duckdb
+kind: function
+lang: py
+domain: infra
+version: "1.0.0"
+purity: impure
+signature: "def load_folder_to_duckdb(folder: str, db_path: str = None, pattern: str = '*.csv,*.parquet,*.json') -> dict"
+description: "Scans the first level of a FOLDER for tabular files (CSV/TSV/TXT, Parquet, JSON/NDJSON) and loads them as tables into a DuckDB database using the native readers read_csv_auto/read_parquet/read_json_auto. It is the entry point of folder-level EDA (eda group). For each file it creates a table whose name is derived from the basename sanitized to [0-9a-zA-Z_] in lowercase (prefix t_ if it starts with a digit, suffixes _2/_3 on collisions, tabla_<i> if it ends up empty). The path is escaped (single quote '->'') before being interpolated because the DuckDB readers do not accept the path as a positional parameter. NON-recursive glob: one glob.glob(os.path.join(folder, g)) per pattern in the comma-separated list, deduplicated and sorted. db_path=None creates a temporary DuckDB (mkstemp; the empty placeholder is deleted because DuckDB rejects a 0-byte file) and returns its path. A failure loading one file does not abort the rest: it is recorded in errors and loading continues. It always returns a dict without raising (duckdb-group style): {status:'ok', db_path, tables, errors} on success (including a folder with no tabular files, tables=[]) and {status:'error', error} when the folder does not exist or something global fails. Depends on the duckdb package (1.5.2)."
+tags: [eda, duckdb, ingest, etl, folder]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_py_core"
+imports: [glob, os, re, tempfile, duckdb]
+params:
+  - name: folder
+    desc: "path to a directory. Only its first level is scanned (NOT recursive). If it does not exist or is not a directory, returns {status:'error'} without raising."
+  - name: db_path
+    desc: "path of the target DuckDB file, opened in read-write mode (created if it does not exist). None (default) creates a unique temporary DuckDB with tempfile.mkstemp and returns its path in the db_path field of the result. DuckDB is single-writer: if another process has it open for writing, connect fails with a lock error that is returned in the dict."
+  - name: pattern
+    desc: "comma-separated list of globs (default '*.csv,*.parquet,*.json'). Each glob is applied with glob.glob(os.path.join(folder, g)) on the first level of folder; the results of all globs are deduplicated and sorted. Globs with ** do NOT descend recursively (glob.glob without recursive=True)."
+output: "dict. On success: {status:'ok', db_path:str (DuckDB path used), tables:[{name:str, source_file:str, n_rows:int}], errors:[{name?:str, source_file:str, error:str}]}. A folder with no tabular files is a success with tables=[] and errors=[]. On error (without raising): {status:'error', error:str}."
+tested: true
+tests:
+  - "test_carga_dos_csv_como_tablas"
+  - "test_db_path_none_crea_temporal"
+  - "test_carpeta_vacia_es_ok_sin_tablas"
+  - "test_carpeta_inexistente_devuelve_status_error"
+test_file_path: "python/functions/infra/load_folder_to_duckdb_test.py"
+file_path: "python/functions/infra/load_folder_to_duckdb.py"
+---
+
+## Example
+
+```python
+import sys
+sys.path.insert(0, "python/functions")
+from infra.load_folder_to_duckdb import load_folder_to_duckdb
+
+# Prepare a demo folder with two CSV files.
+import os
+os.makedirs("/tmp/eda_folder_demo", exist_ok=True)
+with open("/tmp/eda_folder_demo/ventas.csv", "w") as f:
+    f.write("id,total\n1,10.5\n2,20.0\n3,5.25\n")
+with open("/tmp/eda_folder_demo/clientes.csv", "w") as f:
+    f.write("id,nombre\n1,ana\n2,luis\n")
+
+# Load every tabular file in the folder into a temporary DuckDB.
+res = load_folder_to_duckdb("/tmp/eda_folder_demo")
+print(res["status"])    # ok
+print(res["db_path"])   # /tmp/tmpXXXXXXXX.duckdb (temporary)
+for t in res["tables"]:
+    print(t["name"], t["n_rows"])   # clientes 2  /  ventas 3 (files are sorted by path)
+
+# Persist to a specific DuckDB file and restrict to CSV.
+res2 = load_folder_to_duckdb(
+    "/tmp/eda_folder_demo",
+    db_path="/tmp/eda_folder_demo/folder.duckdb",
+    pattern="*.csv",
+)
+print(res2["tables"])   # [{'name': 'clientes', ...}, {'name': 'ventas', ...}]
+```
+
+## When to use it
+
+When you have a folder of loose data files (a dump, an export, several downloaded
+CSV/Parquet files) and want to analyze them together with SQL without hand-writing
+the ingestion file by file. It is the first link of folder-level EDA (`eda` group):
+it leaves a DuckDB with one table per file, ready to profile with
+`duckdb_table_schema_py_infra`, query with `duckdb_query_readonly_py_infra`, or
+correlate downstream. Use it before any profiling step when the unit of work is
+"every file in this directory".
+
+## Gotchas
+
+- **Non-recursive glob**: with the default patterns only the first level of
+  `folder` is scanned. `glob.glob` is called without `recursive=True`, so `**`
+  behaves like `*`: `**/*.csv` reaches files exactly one directory down, no
+  deeper. If you need recursion, flatten the folder first or extend the function.
+- **Table-name sanitization**: the basename is reduced to `[0-9a-zA-Z_]` and
+  lowercased. `Ventas 2024.csv` -> table `ventas_2024`. Two different files can
+  sanitize to the same name (`a-b.csv` and `a_b.csv`); the second one is
+  disambiguated with a `_2`, `_3`, ... suffix. The actual file->table mapping is
+  in `tables[].name` / `tables[].source_file`; do not assume it.
+- **`read_json_auto` requires tabular JSON** (an array of objects, or homogeneous
+  NDJSON objects). Nested or irregular JSON may fail to load THAT table; the
+  error is recorded in `errors` and the remaining files keep loading.
+- **Unknown extension = skipped**, not a failure: it is noted in `errors` as
+  `unsupported extension`. Reader mapping: `.csv/.tsv/.txt`->`read_csv_auto`,
+  `.parquet/.pq`->`read_parquet`, `.json/.ndjson`->`read_json_auto`.
+- **Real writes to disk (impure)**. DuckDB is single-writer: if another process
+  has `db_path` open for writing, `connect` fails with a lock error returned in
+  the dict. A `db_path` whose parent directory does not exist also fails.
+- **`db_path=None` creates a temporary file that is NOT deleted automatically**:
+  the path is returned in `db_path` so the caller can use it and clean it up.
+- **Types inferred by the `_auto` readers**: column types are inferred by DuckDB.
+  Check the schema with `duckdb_table_schema_py_infra` if typing matters
+  downstream.
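
The table-name rules in the gotchas above can be sketched in isolation. This is a minimal illustration, not the module itself: `sanitize_table_name` and `assign_table_names` are illustrative names, and unlike the real loader (which only reserves a name after the table loads successfully) this sketch reserves every name it hands out.

```python
import re


def sanitize_table_name(stem: str, index: int) -> str:
    """Reduce a file stem to [0-9a-z_], following the rules described above."""
    name = re.sub(r"[^0-9a-zA-Z_]", "_", stem).lower()
    if not name:
        # Empty stem: fall back to a positional name.
        name = f"tabla_{index}"
    if name[0].isdigit():
        # SQL identifiers should not start with a digit.
        name = "t_" + name
    return name


def assign_table_names(stems: list[str]) -> list[str]:
    """Give each stem a unique table name, adding _2, _3, ... on collisions."""
    used: set[str] = set()
    assigned = []
    for i, stem in enumerate(stems):
        name = sanitize_table_name(stem, i)
        if name in used:
            suffix = 2
            while f"{name}_{suffix}" in used:
                suffix += 1
            name = f"{name}_{suffix}"
        used.add(name)  # assumption: always reserved, even if a load would fail
        assigned.append(name)
    return assigned


names = assign_table_names(["Ventas 2024", "a-b", "a_b", "2024", ""])
print(names)
```

Because the suffix depends on processing order (files are sorted by path), the safe way to find a file's table is still the `tables[].source_file` field of the result.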
@@ -0,0 +1,175 @@
+"""Load a folder of tabular files (CSV/Parquet/JSON) as DuckDB tables.
+
+Impure function: scans the first level of a directory for files matching one
+or more globs and, for each file, creates a table in a DuckDB database using
+the native readers (`read_csv_auto`, `read_parquet`, `read_json_auto`). It is
+the entry point of folder-level EDA (`eda` group): it leaves a DuckDB with one
+table per file, ready to be profiled and correlated downstream by the rest of
+the group.
+
+Always returns a dict without raising, following the style of the registry's
+duckdb group: {status:'ok', db_path, tables, errors} on success (including a
+folder with no tabular files, which is a success with tables=[]) and
+{status:'error', error:str} when the folder does not exist or something global fails.
+
+Each table name is derived from the file's basename, sanitized to
+`[0-9a-zA-Z_]` and lowercased, prefixed with `t_` if it starts with a digit, and
+disambiguated with `_2`, `_3`, ... suffixes on collisions. The file path is
+escaped (single quote, `'`->`''`) before being interpolated into the reader SQL,
+because the DuckDB readers do not take the path as a positional parameter. A
+failure loading one file does NOT abort the rest: it is recorded in `errors` and
+loading continues with the next files.
+"""
+
+import glob
+import os
+import re
+import tempfile
+
+
+def _sanitize_table_name(basename_no_ext: str, index: int) -> str:
+    """Derive a valid table identifier from a file's basename.
+
+    Replaces everything outside ``[0-9a-zA-Z_]`` with ``_`` and lowercases it.
+    If the result is empty, falls back to ``tabla_<index>``. If it starts with
+    a digit, prefixes ``t_`` so it is a valid SQL identifier.
+    """
+    name = re.sub(r"[^0-9a-zA-Z_]", "_", basename_no_ext).lower()
+    if not name:
+        name = f"tabla_{index}"
+    if name[0].isdigit():
+        name = "t_" + name
+    return name
+
+
+def _reader_for_extension(ext: str, quoted_path: str):
+    """Return the DuckDB reader expression for an extension, or None.
+
+    ``quoted_path`` arrives already escaped (single quotes doubled); this function
+    wraps it in quotes. Unknown extensions return None so the caller skips the file.
+    """
+    ext = ext.lower()
+    if ext in (".csv", ".tsv", ".txt"):
+        return f"read_csv_auto('{quoted_path}')"
+    if ext in (".parquet", ".pq"):
+        return f"read_parquet('{quoted_path}')"
+    if ext in (".json", ".ndjson"):
+        return f"read_json_auto('{quoted_path}')"
+    return None
+
+
+def load_folder_to_duckdb(
+    folder: str,
+    db_path: str = None,
+    pattern: str = "*.csv,*.parquet,*.json",
+) -> dict:
+    """Load the tabular files of a folder as tables in a DuckDB.
+
+    Args:
+        folder: path to a directory. If it does not exist or is not a directory,
+            returns {status:'error', ...} without raising.
+        db_path: path of the destination DuckDB (read-write, created if missing).
+            If None, a temporary database is created with tempfile.mkstemp and
+            its path is returned in the result (`db_path`).
+        pattern: comma-separated list of globs (default
+            "*.csv,*.parquet,*.json"). Each glob is applied with
+            glob.glob(os.path.join(folder, g)) without recursive=True (NOT
+            recursive); the results are deduplicated and sorted.
+
+    Returns:
+        dict. On success: {status:'ok', db_path:str, tables:[{name, source_file,
+        n_rows}], errors:[{name?, source_file, error}]}. A folder with no tabular
+        files is a success with tables=[] and errors=[]. On error (without
+        raising): {status:'error', error:str}.
+    """
+    if not isinstance(folder, str) or not os.path.isdir(folder):
+        return {
+            "status": "error",
+            "error": f"folder does not exist or is not a directory: {folder!r}",
+        }
+
+    conn = None
+    try:
+        # Resolve the destination DuckDB path. If none is given, reserve a unique
+        # temporary name and delete the empty file mkstemp creates: DuckDB 1.5.2
+        # refuses to open a 0-byte file ("not a valid DuckDB database file"), so
+        # it must create the file itself from scratch.
+        if db_path is None:
+            fd, tmp_name = tempfile.mkstemp(suffix=".duckdb")
+            os.close(fd)
+            os.remove(tmp_name)
+            db_path = tmp_name
+
+        # Resolve the files: one glob per pattern, dedup + stable order.
+        globs = [g.strip() for g in pattern.split(",") if g.strip()]
+        found = set()
+        for g in globs:
+            for path in glob.glob(os.path.join(folder, g)):
+                if os.path.isfile(path):
+                    found.add(path)
+        files = sorted(found)
+
+        import duckdb  # lazy import: a missing package becomes {status:'error'}
+        conn = duckdb.connect(db_path)
+        tables = []
+        errors = []
+        used_names = set()
+
+        for i, path in enumerate(files):
+            base = os.path.basename(path)
+            stem, ext = os.path.splitext(base)
+            quoted_path = path.replace("'", "''")
+            reader = _reader_for_extension(ext, quoted_path)
+            if reader is None:
+                errors.append(
+                    {
+                        "source_file": path,
+                        "error": f"unsupported extension: {ext!r}",
+                    }
+                )
+                continue
+
+            name = _sanitize_table_name(stem, i)
+            # Disambiguate collisions with _2, _3, ... suffixes.
+            if name in used_names:
+                suffix = 2
+                while f"{name}_{suffix}" in used_names:
+                    suffix += 1
+                name = f"{name}_{suffix}"
+
+            quoted_ident = '"' + name.replace('"', '""') + '"'
+            try:
+                conn.execute(
+                    f"CREATE TABLE {quoted_ident} AS SELECT * FROM {reader}"
+                )
+                n_rows = conn.execute(
+                    f"SELECT count(*) FROM {quoted_ident}"
+                ).fetchone()[0]
+                used_names.add(name)
+                tables.append(
+                    {
+                        "name": name,
+                        "source_file": path,
+                        "n_rows": int(n_rows),
+                    }
+                )
+            except Exception as e:  # noqa: BLE001
+                errors.append(
+                    {
+                        "name": name,
+                        "source_file": path,
+                        "error": str(e),
+                    }
+                )
+
+        return {
+            "status": "ok",
+            "db_path": db_path,
+            "tables": tables,
+            "errors": errors,
+        }
+    except Exception as e:  # noqa: BLE001
+        return {"status": "error", "error": str(e)}
+    finally:
+        if conn is not None:
+            conn.close()
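
The file-resolution step above (one glob per pattern, deduplicated and sorted) can be exercised on its own. The sketch below is a standalone illustration; `resolve_files` is an illustrative name that mirrors the loop in the function, and the directory layout is made up for the demo. It also shows what `**` does when `glob.glob` is called without `recursive=True`: it acts like `*`, so it reaches one directory down and no further.

```python
import glob
import os
import tempfile


def resolve_files(folder: str, pattern: str) -> list[str]:
    """Apply each comma-separated glob under folder; dedup and sort the hits."""
    globs = [g.strip() for g in pattern.split(",") if g.strip()]
    found = set()
    for g in globs:
        # No recursive=True here, matching the loader's behavior.
        for path in glob.glob(os.path.join(folder, g)):
            if os.path.isfile(path):
                found.add(path)
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    # Demo layout: a.csv, sub/b.csv, sub/deep/c.csv
    os.makedirs(os.path.join(root, "sub", "deep"))
    for rel in ("a.csv",
                os.path.join("sub", "b.csv"),
                os.path.join("sub", "deep", "c.csv")):
        with open(os.path.join(root, rel), "w", encoding="utf-8") as f:
            f.write("x\n1\n")

    # A repeated pattern does not produce duplicate entries.
    top = [os.path.relpath(p, root) for p in resolve_files(root, "*.csv, *.csv")]
    # Without recursive=True, ** matches exactly one path component.
    one_down = [os.path.relpath(p, root) for p in resolve_files(root, "**/*.csv")]

print(top, one_down)
```

If deeper files must be loaded, flattening the folder first keeps the loader unchanged.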
@@ -0,0 +1,73 @@
+"""Tests for load_folder_to_duckdb."""
+
+import os
+import sys
+
+sys.path.insert(0, os.path.dirname(__file__))
+
+import duckdb  # noqa: E402
+
+from load_folder_to_duckdb import load_folder_to_duckdb  # noqa: E402
+
+
+def _write_csv(path: str, header: str, rows: list[str]) -> None:
+    with open(path, "w", encoding="utf-8") as f:
+        f.write(header + "\n")
+        for r in rows:
+            f.write(r + "\n")
+
+
+def test_carga_dos_csv_como_tablas(tmp_path):
+    _write_csv(
+        str(tmp_path / "ventas.csv"),
+        "id,total",
+        ["1,10.5", "2,20.0", "3,5.25"],
+    )
+    _write_csv(
+        str(tmp_path / "clientes.csv"),
+        "id,nombre",
+        ["1,ana", "2,luis"],
+    )
+    db = tmp_path / "out.duckdb"
+    res = load_folder_to_duckdb(str(tmp_path), str(db))
+
+    assert res["status"] == "ok", res
+    assert res["errors"] == []
+    assert len(res["tables"]) == 2
+    assert res["db_path"] == str(db)
+    assert os.path.exists(str(db))
+
+    by_name = {t["name"]: t for t in res["tables"]}
+    assert by_name["ventas"]["n_rows"] == 3
+    assert by_name["clientes"]["n_rows"] == 2
+
+    # Verify that the tables really exist in the database.
+    con = duckdb.connect(str(db), read_only=True)
+    assert con.execute("SELECT count(*) FROM ventas").fetchone()[0] == 3
+    assert con.execute("SELECT count(*) FROM clientes").fetchone()[0] == 2
+    con.close()
+
+
+def test_db_path_none_crea_temporal(tmp_path):
+    _write_csv(str(tmp_path / "datos.csv"), "x", ["1", "2"])
+    res = load_folder_to_duckdb(str(tmp_path))
+    assert res["status"] == "ok", res
+    assert res["db_path"]
+    assert os.path.exists(res["db_path"])
+    assert len(res["tables"]) == 1
+    assert res["tables"][0]["n_rows"] == 2
+    os.remove(res["db_path"])
+
+
+def test_carpeta_vacia_es_ok_sin_tablas(tmp_path):
+    db = tmp_path / "out.duckdb"
+    res = load_folder_to_duckdb(str(tmp_path), str(db))
+    assert res["status"] == "ok", res
+    assert res["tables"] == []
+    assert res["errors"] == []
+
+
+def test_carpeta_inexistente_devuelve_status_error(tmp_path):
+    res = load_folder_to_duckdb(str(tmp_path / "no_existe"))
+    assert res["status"] == "error"
+    assert "folder" in res["error"]
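
The tests above do not exercise the extension mapping or the single-quote escaping of the file path. A minimal sketch of that logic, independent of DuckDB, could look like the following; `reader_expr` and `READERS` are illustrative names that fold the escaping and `_reader_for_extension` into one helper, so this is not the module's actual structure.

```python
import os

# Extension -> DuckDB reader function, as listed in the documentation above.
READERS = {
    ".csv": "read_csv_auto", ".tsv": "read_csv_auto", ".txt": "read_csv_auto",
    ".parquet": "read_parquet", ".pq": "read_parquet",
    ".json": "read_json_auto", ".ndjson": "read_json_auto",
}


def reader_expr(path: str):
    """Build the reader call for a path, or None for an unknown extension."""
    ext = os.path.splitext(path)[1].lower()
    fn = READERS.get(ext)
    if fn is None:
        return None
    # Double every single quote so the path is a valid SQL string literal.
    escaped = path.replace("'", "''")
    return f"{fn}('{escaped}')"


print(reader_expr("/data/o'brien.csv"))
print(reader_expr("/data/report.xlsx"))
```

A test along these lines would only check the SQL text that gets built; whether DuckDB then loads a file with a quote in its name would still need an integration test against a real file.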
@@ -1,9 +1,10 @@
-"""render_automatic_eda — EDA completo one-shot: perfil → ctx → PDF + PPTX.
+"""render_automatic_eda — EDA completo one-shot: perfil → ctx → PDF + PPTX + MD.

 Pipeline impuro del grupo de capacidad `eda`. Dada UNA tabla DuckDB (o
-PostgreSQL), produce el informe AutomaticEDA COMPLETO en sus dos formatos a la
-vez (PDF móvil A5 + PPTX 16:9) con los 11 capítulos POBLADOS, en una sola
-llamada. Compone, sin reimplementar su lógica, cuatro funciones del registry:
+PostgreSQL), produce el informe AutomaticEDA COMPLETO en sus tres formatos a la
+vez (PDF móvil A5 + PPTX 16:9 + Markdown autocontenido para pegar a un LLM) con
+los capítulos POBLADOS, en una sola llamada. Compone, sin reimplementar su
+lógica, varias funciones del registry:

  - profile_table          : perfila la tabla end-to-end (TableProfile agregado),
                             opcionalmente con modelos baratos y análisis de serie.
@@ -12,8 +13,11 @@ llamada. Compone, sin reimplementar su lógica, cuatro funciones del registry:
                             modelos/geo, timeseries_raw para series, geo_points
                             para el mapa, db_path/table para la agregación
                             push-down). Sin él, esos capítulos degradan.
-  - render_automatic_eda_pdf  : renderiza el documento por capítulos a PDF.
-  - render_automatic_eda_pptx : renderiza el mismo documento a PPTX.
+  - render_automatic_eda_pdf      : renderiza el documento por capítulos a PDF.
+  - render_automatic_eda_pptx     : renderiza el mismo documento a PPTX.
+  - render_automatic_eda_markdown : serializa el mismo documento a Markdown
+                                    autocontenido (texto + tablas markdown, sin
+                                    binarios) para incorporar a un LLM.

 El TableProfile agregado basta para portada/overview/distribuciones/calidad/
 correlación, pero los capítulos `modelos`, `timeseries`, `geospatial` y
@@ -32,6 +36,7 @@ from datetime import datetime, timezone

 from datascience import (
    build_eda_render_ctx,
+    render_automatic_eda_markdown,
    render_automatic_eda_pdf,
    render_automatic_eda_pptx,
    run_eda_models,
@@ -93,6 +98,7 @@ def render_automatic_eda(
    out_dir: str = "reports",
    basename: str = None,
    ctx_extra: dict = None,
+    emit_md: bool = True,
 ) -> dict:
    """Perfila una tabla y emite el informe AutomaticEDA completo (PDF + PPTX).

@@ -140,13 +146,19 @@ def render_automatic_eda(
        ctx_extra: dict opcional con claves de presentación/contexto extra que se
            mezclan en el ctx (p.ej. dataset_name, description, source_origin).
            No pisan las claves de datos calculadas por build_eda_render_ctx.
+        emit_md: además del PDF y el PPTX, emite un Markdown autocontenido del
+            MISMO documento por capítulos (texto plano + tablas markdown, sin
+            binarios), pensado para pegar a un LLM. Default True. La ruta sale en
+            la clave de retorno ``aeda_md_path``. No altera las demás salidas.

    Returns:
        dict (nunca lanza). En éxito::

            {"status": "ok", "pdf_path": str, "pptx_path": str,
-             "manifest_path": str|None, "n_pages": int, "n_slides": int,
-             "pdf_note": str, "pptx_note": str, "profile": <TableProfile>}
+             "aeda_md_path": str|None, "manifest_path": str|None,
+             "n_pages": int, "n_slides": int, "md_chars": int|None,
+             "pdf_note": str, "pptx_note": str, "md_note": str|None,
+             "profile": <TableProfile>}

        En error: {"status": "error", "error": str}.
    """
@@ -243,15 +255,26 @@ def render_automatic_eda(
        rpdf = render_automatic_eda_pdf(prof, pdf_path, meta) or {}
        rpptx = render_automatic_eda_pptx(prof, pptx_path, meta) or {}

+        # Salida Markdown autocontenida (mismo documento por capítulos) para
+        # pegar a un LLM. Aditiva: no afecta a PDF/PPTX/manifest. dict-no-throw.
+        rmd = {}
+        md_path = None
+        if emit_md:
+            md_path = os.path.join(out_dir, base + ".md")
+            rmd = render_automatic_eda_markdown(prof, md_path, meta) or {}
+
        return {
            "status": "ok",
            "pdf_path": rpdf.get("path"),
            "pptx_path": rpptx.get("path"),
+            "aeda_md_path": rmd.get("path"),
            "manifest_path": rpdf.get("manifest_path"),
            "n_pages": rpdf.get("n_pages"),
            "n_slides": rpptx.get("n_slides"),
+            "md_chars": rmd.get("n_chars"),
            "pdf_note": rpdf.get("note"),
            "pptx_note": rpptx.get("note"),
+            "md_note": rmd.get("note"),
            "profile": prof,
        }
    except Exception as e:  # noqa: BLE001 — dict-no-throw: degradar, nunca lanzar.
@@ -0,0 +1,115 @@
+---
+name: render_automatic_eda_folder
+kind: pipeline
+lang: py
+domain: pipelines
+purity: impure
+version: "1.0.0"
+signature: "def render_automatic_eda_folder(path: str, out_dir: str = \"reports\", basename: str = None, profile_level: str = \"standard\", emit_pdf: bool = True, emit_pptx: bool = True, emit_md: bool = True, per_table_eda: bool = False, min_inclusion: float = 0.9, ctx_extra: dict = None) -> dict"
+description: "One-shot database-level AutomaticEDA report for a FOLDER of tabular files (CSV/Parquet/JSON) or an existing DuckDB. Loads the folder into a temporary DuckDB with load_folder_to_duckdb (or uses the given DuckDB directly), profiles the WHOLE database with profile_database (summary of each table + FK candidates by containment + join graph with Mermaid diagram), ASSEMBLES a chapter-based database document (database cover with name/number of tables/totals/date/source, table summary with one row per table, and inter-table relationships with the FK candidates table + a REAL matplotlib Figure of the join graph drawn with draw_join_graph_figure plus the Mermaid text) and renders it with the AutomaticEDA engine to PDF (mobile A5), PPTX (16:9) and self-contained Markdown at once. With per_table_eda=True it appends the mini-EDA chapters of each table (build_document per table). It is the database-level sibling of render_automatic_eda (which profiles ONE table): here the report covers the database and its relationships. Returns the PDF/PPTX/MD paths, the manifest and the DatabaseProfile."
+tags: [eda, duckdb, database, profiling, relations, pipeline, dataops, report, pdf, pptx, launcher]
+uses_functions:
+  - load_folder_to_duckdb_py_infra
+  - profile_database_py_pipelines
+  - render_automatic_eda_pdf_py_datascience
+  - render_automatic_eda_pptx_py_datascience
+  - render_automatic_eda_markdown_py_datascience
+  - draw_join_graph_figure_py_datascience
+uses_types: []
+returns: []
+returns_optional: false
+error_type: error_py_core
+imports: []
+tested: true
+tests:
+  - "golden: folder with 3 related CSVs (customers/orders/products) emits PDF+PPTX+MD of the database document with 3 tables and the FK orders.customer_id->customers.id"
+  - "edge: empty folder -> status ok with a minimal document, without raising"
+  - "edge: a single table -> works without relationships (the relationships chapter reports no FK candidates)"
+test_file_path: "python/functions/pipelines/render_automatic_eda_folder_test.py"
+file_path: "python/functions/pipelines/render_automatic_eda_folder.py"
+params:
+  - name: path
+    desc: "DIRECTORY with tabular files (CSV/Parquet/JSON) that are loaded into a temporary DuckDB, or an existing DuckDB (.duckdb/.ddb/.db) that is profiled directly."
+  - name: out_dir
+    desc: "Output directory for the reports (created if missing). Default 'reports'."
+  - name: basename
+    desc: "Base name of the output files, without extension. Default 'aeda_base_<nombre>_<timestamp>'."
+  - name: profile_level
+    desc: "Cost preset of the per-table profile ('lite'/'standard'/'full'); it sets the sample size that profile_database passes to each table (lite=2000, standard/full=5000)."
+  - name: emit_pdf
+    desc: "Emits the mobile A5 PDF of the database document. Default True."
+  - name: emit_pptx
+    desc: "Emits the 16:9 PPTX of the database document. Default True."
+  - name: emit_md
+    desc: "Emits the self-contained Markdown of the database document. Default True."
+  - name: per_table_eda
+    desc: "If True, appends the mini-EDA chapters of each table to the database document (Heading 'Tabla: <n>' + build_document per table). Default False (database document only: cover + summary + relationships)."
+  - name: min_inclusion
+    desc: "Inclusion threshold (0-1) for emitting an FK candidate (passed to profile_database). Default 0.9."
+  - name: ctx_extra
+    desc: "Optional dict of presentation keys (e.g. dataset_name, description) that are merged into the context of the database cover."
+output: "dict-no-throw dict. On success: {status:'ok', pdf_path, pptx_path, md_path, manifest_path, n_tables, n_pages, n_slides, md_chars, db_path, db_profile}. On error: {status:'error', error:str}."
+---
+
+# render_automatic_eda_folder
+
+EDA of a **folder / multi-table database** → chapter-based AutomaticEDA report
+in PDF (mobile A5) + PPTX (16:9) + Markdown, in a single call. It is the
+**database**-level sibling of `render_automatic_eda` (which profiles a single
+table): here the document summarizes **all** tables and, above all, their
+inter-table **relationships** (FK candidates by containment + join graph).
+
+It composes, without reimplementing their logic: `load_folder_to_duckdb` (loads
+the folder), `profile_database` (profiles the database + infers FKs + join graph),
+`draw_join_graph_figure` (draws the join graph) and the three AutomaticEDA
+renderers (`render_automatic_eda_pdf`/`_pptx`/`_markdown`), which directly accept
+the chapter list of the database document this pipeline assembles. The
+single-table pipeline (`render_automatic_eda`) is untouched: this is additive.
+
+## Example
+
+```bash
+# Folder with several related CSV/Parquet/JSON files:
+./fn run render_automatic_eda_folder /tmp/eda_folder_demo
+
+# An existing DuckDB (direct branch):
+./fn run render_automatic_eda_folder temp/bigdata/taxi.duckdb
+```
+
+```python
+import sys, os
+sys.path.insert(0, os.path.join("python", "functions"))
+from pipelines.render_automatic_eda_folder import render_automatic_eda_folder
+
+r = render_automatic_eda_folder("/tmp/eda_folder_demo", out_dir="reports")
+# r["status"] == "ok"; r["pdf_path"], r["pptx_path"], r["md_path"]
+# r["n_tables"] == 3; r["db_profile"]["fk_candidates"] includes
+#   orders.customer_id -> customers.id
+```
+
+## When to use it
+
+When you want an EDA of a **whole database** (a folder of exports or a DuckDB
+with several tables) rather than a single table: to see at a glance which
+tables exist, their size and quality, and how they relate (FK candidates +
+diagram), in the same rich chapter-based format (mobile PDF + PPTX + MD) as the
+table-level EDA. Use `per_table_eda=True` when you also want each table's
+mini-EDA appended.
+
+## Gotchas
+
+- Impure: reads files from disk and writes PDF/PPTX/MD into `out_dir`. In the
+  "folder" branch it creates a **temporary DuckDB** (its path is returned in
+  `db_path`); it is not deleted automatically (it is kept for re-inspection).
+- `path` is interpreted as follows: directory → the folder is loaded; file with
+  extension `.duckdb`/`.ddb`/`.db` → used directly; any other file or a
+  nonexistent path → `{status:'error'}` (does not raise).
+- The folder scan is **non-recursive** (first level only) and by default
+  covers `*.csv,*.parquet,*.json` (see `load_folder_to_duckdb`).
+- The join graph is rasterized to a **real matplotlib Figure** (via
+  `draw_join_graph_figure`) that is drawn in the PDF/PPTX (nodes = tables,
+  arrows = FKs). In addition, the graph's **Mermaid text** is included as a code
+  block (in the Markdown it stays a renderable diagram and is handy to paste
+  into an LLM).
+- Empty folder or a single table: works the same; the relationships chapter
+  reports that no FK candidates were detected. dict-no-throw on every path.
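
The `path` dispatch described in the gotchas above can be sketched as a small classifier. This is an assumption-laden illustration: `classify_source` is a hypothetical helper (the pipeline's real branching code is not shown in this section), and lowercasing the extension before comparing is a choice made here, not something the source confirms.

```python
import os
import tempfile

# Extensions treated as an existing DuckDB, as listed in the documentation.
DUCKDB_EXTS = (".duckdb", ".ddb", ".db")


def classify_source(path: str) -> dict:
    """Hypothetical helper: report which branch a path would take, never raising."""
    if not isinstance(path, str) or not os.path.exists(path):
        return {"status": "error", "error": f"path does not exist: {path!r}"}
    if os.path.isdir(path):
        return {"status": "ok", "branch": "folder"}
    ext = os.path.splitext(path)[1].lower()  # assumption: case-insensitive match
    if ext in DUCKDB_EXTS:
        return {"status": "ok", "branch": "duckdb"}
    return {"status": "error", "error": f"unsupported file: {path!r}"}


with tempfile.TemporaryDirectory() as root:
    db_file = os.path.join(root, "taxi.duckdb")
    txt_file = os.path.join(root, "notes.txt")
    for p in (db_file, txt_file):
        open(p, "w").close()  # empty placeholder files, only for classification
    results = {
        "folder": classify_source(root),
        "duckdb": classify_source(db_file),
        "other": classify_source(txt_file),
        "missing": classify_source(os.path.join(root, "nope")),
    }

print(results["folder"], results["duckdb"])
```

Note that the classification looks only at the extension, so a `.db` file that is not a DuckDB database would pass this check and fail later when it is opened.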
@@ -0,0 +1,366 @@
+"""render_automatic_eda_folder: one-shot EDA of a FOLDER / multi-table database.
+
+Impure pipeline of the `eda` capability group, at DATABASE level. Given a FOLDER
+of tabular files (CSV/Parquet/JSON) or an existing DuckDB, it produces the
+AutomaticEDA report of the DATABASE in its three formats at once (mobile A5 PDF +
+16:9 PPTX + self-contained Markdown), with the chapters POPULATED, in a single
+call. It is the database-level sibling of ``render_automatic_eda`` (which profiles
+ONE table): here the chapter-based document summarizes ALL tables and, above
+all, their inter-table RELATIONSHIPS (FK candidates + join graph).
+
+It composes registry functions WITHOUT reimplementing their logic:
+
+  - load_folder_to_duckdb : loads a folder of files into a temporary DuckDB
+                            ("folder" branch). Skipped in the "already a DuckDB" branch.
+  - profile_database      : profiles the WHOLE database (summary of each table,
+                            full TableProfiles, FK candidates by
+                            containment and join graph with Mermaid diagram).
+  - render_automatic_eda_pdf      : renders the chapter-based database document to PDF.
+  - render_automatic_eda_pptx     : renders the same database document to PPTX.
+  - render_automatic_eda_markdown : serializes the same database document to
+                                    self-contained Markdown (text + markdown tables).
+  - build_document        : (only with per_table_eda=True) assembles the canonical
+                            chapters of EACH table to append them to the document.
+
+This pipeline's own layer is ASSEMBLING THE DATABASE DOCUMENT of chapters from
+the ``DatabaseProfile`` returned by ``profile_database`` and wiring the three
+AutomaticEDA renderers. The minimal database document has three chapters:
+database cover (name/number of tables/totals/date/source), table summary (one
+row per table) and inter-table relationships (FK candidates + join-graph figure
+drawn by draw_join_graph_figure + Mermaid text). With ``per_table_eda=True`` it
+appends, for each table, its mini-EDA chapters.
+
+dict-no-throw style of the `eda` group: never raises; it catches any error and
+degrades to ``{"status": "error", "error": str}``.
+"""
+
+import os
+from datetime import datetime, timezone
+
+from datascience import (
+    draw_join_graph_figure,
+    render_automatic_eda_markdown,
+    render_automatic_eda_pdf,
+    render_automatic_eda_pptx,
+)
+from datascience.automatic_eda import build_document
+from infra import load_folder_to_duckdb
+from pipelines.profile_database import profile_database
+
+# Map profile_level -> per-column sample size used when profiling each table.
+# At database level the cost is dominated by the number of tables; the preset
+# only adjusts the sample that profile_database passes to profile_table.
+_SAMPLE_BY_LEVEL = {"lite": 2000, "standard": 5000, "full": 5000}
+
+# Extensions treated as "an already-built DuckDB" in the direct branch.
+_DUCKDB_EXTS = (".duckdb", ".ddb", ".db")
+
+
+def _fmt_num(v) -> str:
+    """Format an integer with '.' as thousands separator; a dash placeholder if not a number."""
+    if isinstance(v, bool) or not isinstance(v, (int, float)):
+        return "—"
+    try:
+        return f"{int(v):,}".replace(",", ".")
+    except Exception:  # noqa: BLE001
+        return str(v)
+
+
+def _portada_chapter(db_profile: dict, source_path: str, db_path: str,
+                     meta_ctx: dict) -> dict:
+    """Database-level cover chapter (does NOT reuse chapters/portada.py, which is
+    single-table): database name, number of tables, totals and provenance."""
+    tables = db_profile.get("tables", []) or []
+    total_rows = sum(
+        (t.get("n_rows") or 0) for t in tables if isinstance(t.get("n_rows"), (int, float))
+    )
+    total_cols = sum(
+        (t.get("n_cols") or 0) for t in tables if isinstance(t.get("n_cols"), (int, float))
+    )
+    base_name = (meta_ctx or {}).get("dataset_name") or os.path.basename(
+        os.path.normpath(source_path)
+    ) or source_path
+
+    rows = [
+        ("Base", base_name),
+        ("Tablas", _fmt_num(db_profile.get("n_tables"))),
+        ("Filas totales", _fmt_num(total_rows)),
+        ("Columnas totales", _fmt_num(total_cols)),
+        ("Relaciones FK", _fmt_num(len(db_profile.get("fk_candidates", []) or []))),
+        ("Fuente", source_path),
+        ("DuckDB", db_path),
+        ("Generado", datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")),
+    ]
+    blocks = [
+        {"kind": "heading", "text": f"EDA de la base — {base_name}", "level": 1},
+        {"kind": "kv_table", "rows": rows, "title": "Resumen de la base"},
+    ]
+    errs = db_profile.get("errors", []) or []
+    if errs:
+        blocks.append({
+            "kind": "note",
+            "text": f"{len(errs)} aviso(s) durante el perfilado (ver detalle).",
+        })
+    return {"id": "portada_base", "title": "Portada", "version": "1.0.0",
+            "blocks": blocks}
+
+
+def _resumen_chapter(db_profile: dict) -> dict:
+    """Chapter with one row per table: rows, columns, quality, key_candidates."""
+    header = ["Tabla", "Filas", "Columnas", "Calidad", "key_candidates"]
+    rows = []
+    for t in db_profile.get("tables", []) or []:
+        keys = ", ".join(t.get("key_candidates") or []) or "—"
+        rows.append([
+            t.get("table"),
+            _fmt_num(t.get("n_rows")),
+            _fmt_num(t.get("n_cols")),
+            t.get("quality_score"),
+            keys,
+        ])
+    if rows:
+        blocks = [{
+            "kind": "data_table", "header": header, "rows": rows,
+            "title": "Tablas de la base",
+            "note": "Una fila por tabla. Calidad = score agregado del TableProfile.",
+        }]
+    else:
+        blocks = [{"kind": "note",
+                   "text": "La base no contiene tablas perfilables."}]
+    return {"id": "resumen_tablas", "title": "Resumen de tablas",
+            "version": "1.0.0", "blocks": blocks}
+
+
+def _relaciones_chapter(db_profile: dict) -> dict:
+    """Inter-table relationships chapter: FK candidates table plus, when the join
+    graph has edges, a lazy matplotlib Figure and the Mermaid text of the graph."""
+    fks = db_profile.get("fk_candidates", []) or []
+    blocks = [{
+        "kind": "heading", "text": "Relaciones inter-tabla", "level": 2,
+    }]
+    if fks:
+        header = ["From", "To", "Inclusión", "Cardinalidad"]
+        rows = []
+        for fk in fks:
+            frm = f"{fk.get('from_table')}.{fk.get('from_col')}"
+            to = f"{fk.get('to_table')}.{fk.get('to_col')}"
+            inc = fk.get("inclusion")
+            inc_s = f"{inc:.3f}" if isinstance(inc, (int, float)) else str(inc)
+            rows.append([frm, to, inc_s, fk.get("cardinality")])
+        blocks.append({
+            "kind": "data_table", "header": header, "rows": rows,
+            "title": "FK candidatas (por containment de valores)",
+            "note": "Inclusión = fracción de valores de From contenidos en To.",
+        })
+    else:
+        blocks.append({
+            "kind": "note",
+            "text": "Sin relaciones FK candidatas detectadas entre las tablas.",
+        })
+
+    join_graph = db_profile.get("join_graph") or {}
+    has_edges = bool(join_graph.get("edges"))
+    if has_edges:
+        blocks.append({"kind": "heading", "text": "Diagrama (join graph)",
+                       "level": 3})
+        # REAL matplotlib Figure of the relations graph (nodes = tables,
+        # edges = FKs). Lazy via `make`: the renderer builds it only when
+        # paginating, and it is rasterized in PDF/PPTX. draw_join_graph_figure
+        # never raises (it returns an error Figure if anything fails).
+        blocks.append({
+            "kind": "figure",
+            "make": (lambda jg=join_graph: draw_join_graph_figure(
+                jg, title="Join graph (relaciones inter-tabla)")),
+            "caption": "Grafo de relaciones: nodos = tablas, flechas = FK "
+                       "candidatas (etiqueta from_col→to_col).",
+            "height_in": 4.5,
+        })
+        # Also emit the Mermaid text: in the Markdown output it remains a
+        # renderable diagram and is handy for pasting into an LLM.
+        mermaid = (join_graph.get("mermaid", "") or "").strip()
+        if mermaid:
+            blocks.append({"kind": "markdown",
+                           "text": "```mermaid\n" + mermaid + "\n```"})
+    return {"id": "relaciones", "title": "Relaciones inter-tabla",
+            "version": "1.0.0", "blocks": blocks}
+
+
+def _build_db_document(db_profile: dict, source_path: str, db_path: str,
+                       meta_ctx: dict, per_table_eda: bool) -> list:
+    """Assembles the chapter-based base document from the DatabaseProfile.
+
+    Minimum: base cover + table summary + relations. With per_table_eda True
+    it appends, for each table, a separator chapter + the canonical chapters
+    of its mini-EDA (reusing build_document on each TableProfile)."""
+    chapters = [
+        _portada_chapter(db_profile, source_path, db_path, meta_ctx),
+        _resumen_chapter(db_profile),
+        _relaciones_chapter(db_profile),
+    ]
+    if per_table_eda:
+        for prof in db_profile.get("table_profiles", []) or []:
+            tname = prof.get("table") or "tabla"
+            chapters.append({
+                "id": f"tabla_{tname}", "title": f"Tabla: {tname}",
+                "version": "1.0.0",
+                "blocks": [{"kind": "heading", "text": f"Tabla: {tname}",
+                            "level": 1}],
+            })
+            try:
+                # build_document returns the table's canonical chapters.
+                # ctx None -> chapters that need raw data degrade, but the
+                # cover/overview/distrib/quality ones come out complete.
+                chapters.extend(build_document(prof, None) or [])
+            except Exception:  # noqa: BLE001  # one bad table must not break the doc.
+                chapters.append({
+                    "id": f"tabla_{tname}_err", "title": f"Tabla: {tname}",
+                    "version": "1.0.0",
+                    "blocks": [{"kind": "note",
+                                "text": "No se pudo ensamblar el mini-EDA de "
+                                        "esta tabla."}],
+                })
+    return chapters
+
+
+def _resolve_db_path(path: str) -> dict:
+    """Resolves the DuckDB to profile from ``path``.
+
+    - Directory  -> loads the folder with load_folder_to_duckdb (temp DuckDB).
+    - .duckdb/.ddb/.db file -> used directly ("already a DuckDB" branch).
+    - Other file / nonexistent -> error.
+
+    Returns {status, db_path, loaded, n_tables, load_errors} or {status, error}.
+    """
+    if os.path.isdir(path):
+        lr = load_folder_to_duckdb(path)
+        if lr.get("status") != "ok":
+            return {"status": "error",
+                    "error": f"load_folder_to_duckdb falló: {lr.get('error')}"}
+        return {
+            "status": "ok",
+            "db_path": lr.get("db_path"),
+            "loaded": True,
+            "n_tables": len(lr.get("tables", []) or []),
+            "load_errors": lr.get("errors", []) or [],
+        }
+    if os.path.isfile(path):
+        if path.lower().endswith(_DUCKDB_EXTS):
+            return {"status": "ok", "db_path": path, "loaded": False,
+                    "n_tables": None, "load_errors": []}
+        return {"status": "error",
+                "error": f"'{path}' no es un directorio ni una DuckDB "
+                         f"(extensiones {_DUCKDB_EXTS})."}
+    return {"status": "error", "error": f"path no existe: {path}"}
+
+
+def render_automatic_eda_folder(
+    path: str,
+    out_dir: str = "reports",
+    basename: str = None,
+    profile_level: str = "standard",
+    emit_pdf: bool = True,
+    emit_pptx: bool = True,
+    emit_md: bool = True,
+    per_table_eda: bool = False,
+    min_inclusion: float = 0.9,
+    ctx_extra: dict = None,
+) -> dict:
+    """Profiles a FOLDER (or a DuckDB) and emits the database's AutomaticEDA report.
+
+    Args:
+        path: either a DIRECTORY of tabular files (CSV/Parquet/JSON) that are
+            loaded into a temporary DuckDB, or an existing DuckDB
+            (``.duckdb``/``.ddb``/``.db``) that is profiled directly.
+        out_dir: output directory (created if missing). Default "reports".
+        basename: base name of the output files, without extension. Default
+            "aeda_base_<name>_<timestamp>".
+        profile_level: cost preset for the per-table profile ("lite"/"standard"/
+            "full"); sets the ``sample`` that profile_database passes to each table.
+        emit_pdf / emit_pptx / emit_md: which formats to emit. Default: all three.
+        per_table_eda: if True, appends each table's mini-EDA chapters to the base
+            document (a "Tabla: <n>" Heading + build_document per table).
+            Default False (base document only: cover + summary + relations).
+        min_inclusion: inclusion threshold for emitting an FK candidate (0-1).
+        ctx_extra: optional dict of presentation keys (e.g. dataset_name,
+            description) merged into the cover context.
+
+    Returns:
+        dict (never raises). On success::
+
+            {"status": "ok", "pdf_path": str|None, "pptx_path": str|None,
+             "md_path": str|None, "manifest_path": str|None,
+             "n_tables": int, "n_pages": int|None, "n_slides": int|None,
+             "md_chars": int|None, "db_path": str, "db_profile": <DatabaseProfile>}
+
+        On error: {"status": "error", "error": str}.
+    """
+    try:
+        # 1) Resolve the DuckDB to profile (load folder or use the given one).
+        rdb = _resolve_db_path(path)
+        if rdb.get("status") != "ok":
+            return {"status": "error", "error": rdb.get("error")}
+        db_path = rdb.get("db_path")
+
+        # 2) Profile the whole database (summary + FKs + join graph). No report
+        # of its own (write_report/emit_pdf False): this pipeline emits its own.
+        sample = _SAMPLE_BY_LEVEL.get(profile_level, 5000)
+        pres = profile_database(
+            db_path, sample=sample, write_report=False,
+            min_inclusion=min_inclusion, emit_pdf=False,
+        )
+        if pres.get("status") != "ok":
+            return {"status": "error",
+                    "error": f"profile_database falló: {pres.get('error')}"}
+        db_profile = pres.get("db_profile") or {}
+
+        # 3) Assemble the chapter-based base document.
+        meta_ctx = dict(ctx_extra or {})
+        chapters = _build_db_document(
+            db_profile, path, db_path, meta_ctx, per_table_eda
+        )
+
+        # 4) Render all three formats from the SAME chapter-based document.
+        os.makedirs(out_dir, exist_ok=True)
+        ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
+        nm = (meta_ctx.get("dataset_name")
+              or os.path.basename(os.path.normpath(path)) or "base")
+        nm = "".join(c if c.isalnum() else "_" for c in str(nm)).strip("_") or "base"
+        base = basename or f"aeda_base_{nm}_{ts}"
+        title = f"EDA base — {meta_ctx.get('dataset_name') or nm}"
+        meta = {"title": title}
+
+        pdf_path = pptx_path = md_path = manifest_path = None
+        n_pages = n_slides = md_chars = None
+
+        if emit_pdf:
+            target = os.path.join(out_dir, base + ".pdf")
+            rpdf = render_automatic_eda_pdf(chapters, target, meta) or {}
+            pdf_path = rpdf.get("path")
+            n_pages = rpdf.get("n_pages")
+            manifest_path = rpdf.get("manifest_path")
+        if emit_pptx:
+            target = os.path.join(out_dir, base + ".pptx")
+            rpptx = render_automatic_eda_pptx(chapters, target, meta) or {}
+            pptx_path = rpptx.get("path")
+            n_slides = rpptx.get("n_slides")
+        if emit_md:
+            target = os.path.join(out_dir, base + ".md")
+            rmd = render_automatic_eda_markdown(chapters, target, meta) or {}
+            md_path = rmd.get("path")
+            md_chars = rmd.get("n_chars")
+
+        return {
+            "status": "ok",
+            "pdf_path": pdf_path,
+            "pptx_path": pptx_path,
+            "md_path": md_path,
+            "manifest_path": manifest_path,
+            "n_tables": db_profile.get("n_tables"),
+            "n_pages": n_pages,
+            "n_slides": n_slides,
+            "md_chars": md_chars,
+            "db_path": db_path,
+            "db_profile": db_profile,
+        }
+    except Exception as e:  # noqa: BLE001  # dict-no-throw: degrade, never raise.
+        return {"status": "error", "error": str(e)}
@@ -0,0 +1,188 @@
+"""Tests for render_automatic_eda_folder: EDA of a folder / multi-table database.
+
+Golden: a folder with 3 related CSVs (customers/orders/products) produces the
+base document as PDF + PPTX + MD, with all 3 tables in the summary and the FK
+orders.customer_id -> customers.id in the relations chapter. Edge cases: empty
+folder (minimal document, no exception), a single table (no relations) and the
+"already a DuckDB" branch on an existing .duckdb file.
+"""
+
+import os
+import sys
+
+import duckdb
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
+
+from pipelines.render_automatic_eda_folder import (
+    _relaciones_chapter,
+    render_automatic_eda_folder,
+)
+
+
+def _write_demo_folder(folder: str) -> None:
+    """3 related CSVs: orders.customer_id -> customers.id (detectable FK)."""
+    with open(os.path.join(folder, "customers.csv"), "w", encoding="utf-8") as fh:
+        fh.write("id,name,city\n")
+        fh.write("1,Alice,Madrid\n2,Bob,Barcelona\n3,Carol,Valencia\n"
+                 "4,Dave,Sevilla\n5,Eve,Madrid\n")
+    with open(os.path.join(folder, "orders.csv"), "w", encoding="utf-8") as fh:
+        fh.write("order_id,customer_id,product_id,total\n")
+        fh.write("100,1,10,49.90\n101,1,11,12.50\n102,2,10,49.90\n"
+                 "103,3,12,8.00\n104,3,11,12.50\n105,5,10,49.90\n"
+                 "106,2,12,8.00\n")
+    with open(os.path.join(folder, "products.csv"), "w", encoding="utf-8") as fh:
+        fh.write("product_id,product_name,price\n")
+        fh.write("10,Widget,49.90\n11,Gadget,12.50\n12,Gizmo,8.00\n")
+
+
+def _has_fk(db_profile: dict, from_t: str, from_c: str, to_t: str) -> bool:
+    for fk in db_profile.get("fk_candidates", []) or []:
+        if (fk.get("from_table") == from_t and fk.get("from_col") == from_c
+                and fk.get("to_table") == to_t):
+            return True
+    return False
+
+
+def test_golden_folder_three_csv(tmp_path):
+    """Folder with 3 related CSVs -> PDF+PPTX+MD, 3 tables, FK detected."""
+    folder = tmp_path / "demo"
+    folder.mkdir()
+    _write_demo_folder(str(folder))
+    out = tmp_path / "out"
+
+    r = render_automatic_eda_folder(str(folder), out_dir=str(out))
+
+    assert r["status"] == "ok", r
+    assert r["n_tables"] == 3
+    # All three formats were emitted and exist on disk.
+    assert r["pdf_path"] and os.path.exists(r["pdf_path"])
+    assert r["pptx_path"] and os.path.exists(r["pptx_path"])
+    assert r["md_path"] and os.path.exists(r["md_path"])
+    assert (r["n_pages"] or 0) >= 1
+    assert (r["n_slides"] or 0) >= 1
+    # The FK orders.customer_id -> customers.id is detected by containment.
+    assert _has_fk(r["db_profile"], "orders", "customer_id", "customers"), \
+        r["db_profile"].get("fk_candidates")
+    # The Markdown mentions all 3 tables and the relation.
+    md = open(r["md_path"], encoding="utf-8").read()
+    for t in ("customers", "orders", "products"):
+        assert t in md
+    assert "customer_id" in md
+
+
+def test_edge_empty_folder(tmp_path):
+    """Empty folder -> status ok with a minimal document, no exception."""
+    folder = tmp_path / "empty"
+    folder.mkdir()
+    out = tmp_path / "out"
+
+    r = render_automatic_eda_folder(str(folder), out_dir=str(out))
+
+    assert r["status"] == "ok", r
+    assert r["n_tables"] == 0
+    # Even without tables, the minimal base document is emitted (cover + empty
+    # summary + "no FK" relations).
+    assert r["pdf_path"] and os.path.exists(r["pdf_path"])
+    assert r["md_path"] and os.path.exists(r["md_path"])
+
+
+def test_edge_single_table_no_relations(tmp_path):
+    """Folder with a single table -> works without relations ('no FK' chapter)."""
+    folder = tmp_path / "single"
+    folder.mkdir()
+    with open(folder / "lonely.csv", "w", encoding="utf-8") as fh:
+        fh.write("a,b\n1,x\n2,y\n3,z\n")
+    out = tmp_path / "out"
+
+    r = render_automatic_eda_folder(str(folder), out_dir=str(out))
+
+    assert r["status"] == "ok", r
+    assert r["n_tables"] == 1
+    assert not (r["db_profile"].get("fk_candidates") or [])
+    md = open(r["md_path"], encoding="utf-8").read()
+    assert "Sin relaciones FK" in md or "sin fk" in md.lower()
+
+
+def test_accepts_existing_duckdb(tmp_path):
+    """'Already a DuckDB' branch: an existing .duckdb file is profiled directly."""
+    db = tmp_path / "base.duckdb"
+    conn = duckdb.connect(str(db))
+    try:
+        conn.execute("CREATE TABLE customers (id INTEGER, name VARCHAR)")
+        conn.execute("INSERT INTO customers VALUES (1,'Ana'),(2,'Luis'),(3,'Eva')")
+        conn.execute("CREATE TABLE orders (oid INTEGER, customer_id INTEGER)")
+        conn.execute("INSERT INTO orders VALUES (10,1),(11,2),(12,1),(13,3)")
+    finally:
+        conn.close()
+    out = tmp_path / "out"
+
+    r = render_automatic_eda_folder(str(db), out_dir=str(out))
+
+    assert r["status"] == "ok", r
+    assert r["n_tables"] == 2
+    assert r["db_path"] == str(db)
+    assert r["pdf_path"] and os.path.exists(r["pdf_path"])
+
+
+def test_emit_flags_select_formats(tmp_path):
+    """emit_pdf/pptx/md control which formats are emitted."""
+    folder = tmp_path / "demo"
+    folder.mkdir()
+    _write_demo_folder(str(folder))
+    out = tmp_path / "out"
+
+    r = render_automatic_eda_folder(
+        str(folder), out_dir=str(out),
+        emit_pdf=True, emit_pptx=False, emit_md=False,
+    )
+    assert r["status"] == "ok", r
+    assert r["pdf_path"] and os.path.exists(r["pdf_path"])
+    assert r["pptx_path"] is None
+    assert r["md_path"] is None
+
+
+def test_path_does_not_exist(tmp_path):
+    """Nonexistent path -> status error, no exception."""
+    r = render_automatic_eda_folder(str(tmp_path / "nope"))
+    assert r["status"] == "error"
+    assert "no existe" in r["error"].lower()
+
+
+def test_relaciones_chapter_has_real_figure_when_edges():
+    """With edges, the relations chapter includes a REAL matplotlib Figure
+    block (not just the Mermaid text): its make() returns a Figure."""
+    db_profile = {
+        "join_graph": {
+            "nodes": [
+                {"table": "orders", "out_degree": 1, "in_degree": 0, "role": "fact"},
+                {"table": "customers", "out_degree": 0, "in_degree": 1, "role": "dim"},
+            ],
+            "edges": [{"from_table": "orders", "from_col": "customer_id",
+                       "to_table": "customers", "to_col": "id",
+                       "cardinality": "N:1"}],
+            "mermaid": "graph LR orders --> customers",
+            "hubs": ["orders"],
+        },
+        "fk_candidates": [{"from_table": "orders", "from_col": "customer_id",
+                           "to_table": "customers", "to_col": "id",
+                           "inclusion": 1.0, "cardinality": "N:1"}],
+    }
+    ch = _relaciones_chapter(db_profile)
+    figs = [b for b in ch["blocks"] if b.get("kind") == "figure"]
+    assert len(figs) == 1, ch["blocks"]
+    # The lazy make() produces a real matplotlib Figure.
+    import matplotlib
+    matplotlib.use("Agg")
+    fig = figs[0]["make"]()
+    from matplotlib.figure import Figure
+    assert isinstance(fig, Figure)
+    assert fig.get_axes(), "the join graph Figure must have at least one axes"
+
+
+def test_relaciones_chapter_no_figure_when_no_edges():
+    """Without edges, no Figure block is added (chapter says 'no FK')."""
+    db_profile = {"join_graph": {"nodes": [], "edges": [], "mermaid": "",
+                                 "hubs": []}, "fk_candidates": []}
+    ch = _relaciones_chapter(db_profile)
+    assert not [b for b in ch["blocks"] if b.get("kind") == "figure"]
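
The lazy `make` pattern exercised by the two tests above can be sketched in isolation. This is a minimal illustration, not the repository's code: `draw_stub` is a hypothetical stand-in for `draw_join_graph_figure`, and `lazy_figure_blocks` is a hypothetical helper. It shows that nothing is drawn until the renderer calls `make()`, and why binding the graph as a default argument (`lambda jg=jg: ...`) freezes the value at definition time. In the diff the lambda is not created inside a loop, so the default argument is mostly defensive there.

```python
calls = []  # records which graphs were actually drawn, in order


def draw_stub(join_graph):
    """Hypothetical stand-in for draw_join_graph_figure: records the call."""
    calls.append(join_graph["name"])
    return f"figure:{join_graph['name']}"


def lazy_figure_blocks(join_graphs):
    """Build one figure block per graph; drawing is deferred to make()."""
    blocks = []
    for jg in join_graphs:
        # jg=jg binds the current graph now; a bare `lambda: draw_stub(jg)`
        # would look jg up only when make() is finally called.
        blocks.append({"kind": "figure", "make": (lambda jg=jg: draw_stub(jg))})
    return blocks


graphs = [{"name": "a"}, {"name": "b"}]
blocks = lazy_figure_blocks(graphs)
drawn_before = list(calls)                 # still empty: nothing drawn yet
results = [b["make"]() for b in blocks]    # the renderer triggers the drawing

# Contrast: late-binding closures all see the last value of the loop variable.
late = [lambda: jg["name"] for jg in graphs]
late_names = [f() for f in late]
```

Deferring construction this way lets a renderer skip figures it never paginates, at the cost of surfacing drawing errors only at render time, which is why the source relies on a drawing function that never raises.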
Author	SHA1	Message	Date
egutierrez	9886e2905d	feat(eda): rasterize the join graph into a real matplotlib Figure in the relations chapter draw_join_graph_figure (datascience, eda group): draws the database's join graph as a real matplotlib Figure (networkx spring_layout seed=42, nodes = tables, hubs highlighted, directed arrows labeled from_col->to_col + cardinality). Never raises: returns an error Figure if anything fails; empty input -> Figure 'Sin relaciones FK detectadas'. render_automatic_eda_folder now inserts that Figure (lazy Figure block via make) into the relations chapter when there are edges, in addition to the Mermaid text (useful for the MD/LLM). Previously only the graph text was dumped; now the PDF/PPTX show the drawn diagram. New tests: the real Figure is built with edges and omitted without edges. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 20:57:52 +02:00
egutierrez	6a1520f458	feat(eda): folder/multi-table database EDA -> chapter-based AutomaticEDA (PDF+PPTX+MD) Pipeline render_automatic_eda_folder: points AutomaticEDA at a FOLDER of tabular files (CSV/Parquet/JSON) or at an existing DuckDB and emits the DATABASE report by chapters as PDF (mobile A5) + PPTX (16:9) + Markdown. Base document with base cover, summary of all tables and inter-table relations (FK candidates by containment + Mermaid diagram of the join graph). The per_table_eda flag appends each table's mini-EDA. Additive: render_automatic_eda (single table) untouched. New function load_folder_to_duckdb (infra, eda+duckdb group): loads a folder into a DuckDB (temp if no path is given), CREATE TABLE per file with read_csv_auto/ read_parquet/read_json_auto. dict-no-throw. Composes profile_database + the 3 renderers of the AutomaticEDA engine + build_document (per table), without reimplementing their logic. Tests: golden 3 related CSVs (FK orders.customer_id->customers.id detected) + edge cases (empty folder, 1 table, existing DuckDB, nonexistent path). fn index with no errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 20:34:10 +02:00
egutierrez	a1e2e3567c	merge: 4c cat_distr one sheet per column (PDF+PPTX 1:1) + no redundant entropy description + engine page_break (verified met)	2026-06-30 19:53:57 +02:00
egutierrez	833597c831	fix(eda): cat_distr PPTX: high-cardinality columns fit on ONE slide with their chart Adversarial verification found that, in PPTX (16:9 slide, short), NON-id-like HIGH-cardinality categorical columns (Ticket, Cabin) took 3 slides each with the donut SEPARATED from its table: the top-k of 8 long rows did not fit next to the donut and keep-together split the column. (The PDF, in A5, was already 1:1 correct.) Fix ONLY in render_pptx_impl.py: - `_fit_group_blocks` (new): for a Group with figure + DataTable that does not fit on the slide, reserves a minimum height for the donut (`_GROUP_MIN_FIG_H`) and trims the DataTable rows to what remains, so the chart stays on the SAME slide, next to its table. No-op when it already fits or there is no figure+table pair (e.g. id-like columns, which already omit the top-k). - `_trim_data_table_to_budget` (new): returns a COPY of the DataTable with the rows that fit (at least one) + an honest note "top N de M categorías mostradas (recortado para caber en el slide; el PDF muestra más)". NEVER mutates the original block, which is shared with the PDF renderer (the PDF still shows the full table in A5). - `_place_group`: applies `_fit_group_blocks` before `_shrink_group_figures`. Strengthened cat_distr_test.py: - `test_golden_pptx_una_slide_por_columna_con_su_grafico`: profile with a non-id-like high-cardinality categorical column (40 long values over 5000 rows, 0.8% distinct) that reproduces the Ticket/Cabin case. Asserts that EACH categorical column appears in EXACTLY ONE slide of the chapter and that the same slide carries its table (Cardinalidad/distintos) AND its donut (caption + Picture shape); the chart is never separated from its table. Replaces the lax `n_slides >= 2`.
Verified with titanic_train.csv (render_automatic_eda run_models=True): 5 categorical columns (Name, Sex, Ticket, Cabin, Embarked); PDF 6 pages and PPTX 6 slides for the chapter (intro + 1 per column), each column with its donut next to its table on a single page/slide. Ticket and Cabin went from 3 slides to 1. Suite green (122 passed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 19:45:09 +02:00
egutierrez	7158be8142	feat(eda): cat_distr one sheet per column (chart included) + no description redundant with the glossary Each categorical column of the CAT DISTR chapter now occupies its own page (PDF) / slide (PPTX) with its chart next to its table, and the long explanation of entropy that duplicated the GLOSARIO chapter is removed. Changes: - model.Group: new additive field `page_break_before` (default False). When True the renderer forces the group to start on a new page/slide (unless the current one is empty). Behavior of all existing chapters untouched. Also supported in the dict-defensive normalizer `as_block`. - render_pdf_impl / render_pptx_impl `_place_group`: honor `page_break_before`. - render_pdf_impl / render_pptx_impl `_measure_block`: faithful measurement of KVTable and DataTable (mirrors `_place_*`: title heading, per-column wrapping of value/cells, note). The previous estimate assumed one line per row and ignored the title, so keep-together under-budgeted the figure and the chart overflowed onto the next page. Helpers `_measure_kv_table`/`_measure_data_table`. - render_pptx_impl `_shrink_group_figures`: lower thresholds (budget>0.6, per>0.35) so that on the short 16:9 slide the figure shrinks and coexists with the table instead of splitting the column (same keep-together philosophy as the PDF). - cat_distr.py: - build wraps each column in a `Group(page_break_before=idx>0)`: one column per page/slide, with its cardinality table, its top-k and its donut together. The first one shares a page with the intro so as not to leave a nearly empty one. - trimmed intro: the paragraph explaining what entropy is has been removed (it lives in the GLOSARIO chapter, where the term `[[term:entropia]]` links); the clickable term and the reference row total are kept. 
- `_cardinality_block`: related metrics grouped per row (distinct·%·unique; entropy bits·max·norm; imbalance·length) without losing any data, so that table + chart fit on the 16:9 slide. - id-like columns (≈100% distinct): the top-k is omitted (it would be a list of unique values; the note explains it) and the donut takes that space. - CHAPTER_VERSION 1.1.0 -> 1.2.0. Verified with titanic (render_automatic_eda run_models=True): PDF 5 pages and PPTX 5 slides for the chapter (intro + 1 per column: Name, Sex, Ticket, Embarked), each column with its chart next to its table, with no splits. Suite green (121 passed): pytest automatic_eda/ + render_automatic_eda_test.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 19:26:33 +02:00
egutierrez	9be84a48ea	merge: 4c remove definitions redundant with the glossary in calidad/correlacion/modelos/agregacion/relaciones (links intact, verified met)	2026-06-30 19:24:22 +02:00
egutierrez	4099d88eaf	merge: 4b Markdown output of AutomaticEDA (render_md, render_automatic_eda emits aeda_md_path, verified met)	2026-06-30 18:59:33 +02:00
egutierrez	48de3ce3da	feat(eda): Markdown output of AutomaticEDA for pasting into an LLM Adds a third output format to AutomaticEDA, alongside PDF and PPTX: a self-contained Markdown of the SAME chapter-based document (chapters_registry.build_document), optimized for feeding to an LLM (plain text + real markdown tables, no embedded binaries). - render_md_impl.render_md(chapters, out_path, meta): serializes the model's blocks (Heading/Markdown/KVTable/DataTable/Figure/Image/Caption/Note/ Group/GlossaryEntry) to Markdown. Header with metadata + navigable index with GitHub anchors; tables dumped in full (the MD does not paginate); glossary markers removed while keeping the bold; glossary at the end. - Figures: an LLM cannot see the image, so text + data are prioritized. The caption is emitted and, when the figure has bars (histogram), the bin table (Desde/Hasta/Frecuencia) is extracted from the matplotlib artists. The ±1σ band (axvspan) is discarded by width so it does not show up as a false bin. Optional PNG via meta['embed_figures'] (off by default → no binaries). - render_automatic_eda_markdown: public registry function (eda tag), mirror of render_automatic_eda_pdf/pptx, accepts a list of chapters or a TableProfile (build_document). dict-no-throw. - render_automatic_eda (pipeline): also emits the .md (emit_md=True by default, return key aeda_md_path). Additive change: PDF/PPTX/manifest still come out the same. Tests: golden for all kinds + regression for the ±1σ band filter + empty-document edge case + profile path. Package and pipeline suites green (122 passed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 18:52:08 +02:00
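
The relations chapter in the diff above describes inclusion as the fraction of From values contained in To, and `min_inclusion` (default 0.9) as the threshold for emitting an FK candidate. A minimal sketch of that metric follows, under stated assumptions: the source does not show how `profile_database` computes it, so the choice to count distinct non-null values is an assumption, and `inclusion` and `is_fk_candidate` are hypothetical helper names.

```python
def inclusion(from_values, to_values):
    """Fraction of distinct non-null From values that also appear in To.

    Assumption: distinct values are compared and nulls are ignored; the real
    profiler may count rows instead.
    """
    src = {v for v in from_values if v is not None}
    if not src:
        return 0.0  # no usable values: treat as no evidence of containment
    return len(src & set(to_values)) / len(src)


def is_fk_candidate(from_values, to_values, min_inclusion=0.9):
    """Hypothetical threshold check mirroring the min_inclusion parameter."""
    return inclusion(from_values, to_values) >= min_inclusion


# Values taken from the demo CSVs written by _write_demo_folder.
customer_ids = [1, 2, 3, 4, 5]                # customers.id
order_customer_ids = [1, 1, 2, 3, 3, 5, 2]    # orders.customer_id
product_ids = [10, 11, 12]                    # products.product_id

orders_to_customers = inclusion(order_customer_ids, customer_ids)
products_to_customers = inclusion(product_ids, customer_ids)
```

With the demo data, every customer_id in orders exists in customers, so that pair clears the default threshold, while product_id shares no values with customers.id and is rejected.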