feat(eda): AutomaticEDA Markdown output for pasting into an LLM

Adds a third output format to AutomaticEDA, alongside PDF and PPTX: a
self-contained Markdown rendering of the SAME chapter-based document
(chapters_registry.build_document), optimized for feeding to an LLM (plain
text + real Markdown tables, no embedded binaries).

- render_md_impl.render_md(chapters, out_path, meta): serializes the model
  blocks (Heading/Markdown/KVTable/DataTable/Figure/Image/Caption/Note/
  Group/GlossaryEntry) to Markdown. Header with metadata + navigable index
  with GitHub anchors; tables dumped in full (Markdown does not paginate);
  glossary markers removed while keeping the bold; glossary at the end.
- Figures: an LLM cannot see the image, so text + data take priority. The
  caption is emitted and, when the figure has bars (histogram), the bin table
  (Desde/Hasta/Frecuencia, i.e. from/to/frequency) is extracted from the
  matplotlib artists. The ±1σ band (axvspan) is discarded by width so it does
  not show up as a false bin. Optional PNG via meta['embed_figures'] (off by
  default → no binaries).
- render_automatic_eda_markdown: public registry function (tag eda),
  mirroring render_automatic_eda_pdf/pptx; accepts a list of chapters or a
  TableProfile (build_document). dict-no-throw.
- render_automatic_eda (pipeline): also emits the .md (emit_md=True by
  default, return key aeda_md_path). Additive change: PDF/PPTX/manifest are
  still produced as before.

Tests: golden for all kinds + regression for the ±1σ band filter +
empty-document edge case + profile path. Package and pipeline suites green
(122 passed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
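
Illustrative sketch of the width-based band filter described for figures. It
is not the code in render_md_impl: plain (left, width, height) tuples stand in
for the geometry of matplotlib bar patches, and the function names and the
1.5x-median threshold are assumptions made for this example only.

```python
from statistics import median


def bins_from_bars(bars, width_factor=1.5):
    """Turn bar geometry into (from, to, frequency) rows.

    `bars` is an iterable of (left, width, height) tuples. Any rectangle much
    wider than the typical bar (assumed threshold: width_factor x median
    width) is treated as an overlay such as a shaded band and dropped.
    """
    clean = [(float(l), float(w), float(h)) for l, w, h in bars if w > 0]
    if not clean:
        return []
    typical = median(w for _, w, _ in clean)
    kept = [b for b in clean if b[1] <= width_factor * typical]
    kept.sort(key=lambda b: b[0])  # left edge order
    return [(l, l + w, h) for l, w, h in kept]


def to_markdown_table(rows):
    """Render the rows as a Markdown table with the commit's column names."""
    lines = ["| Desde | Hasta | Frecuencia |", "| --- | --- | --- |"]
    for lo, hi, freq in rows:
        lines.append(f"| {lo:g} | {hi:g} | {freq:g} |")
    return "\n".join(lines)


# Four unit-width histogram bars plus one wide rectangle standing in for the
# shaded band; only the four bars should survive.
bars = [(0.0, 1.0, 3), (1.0, 1.0, 7), (2.0, 1.0, 5), (3.0, 1.0, 2),
        (0.8, 2.4, 1)]
rows = bins_from_bars(bars)
print(to_markdown_table(rows))
```

A median-based threshold only works when the real bars outnumber the overlay;
with one or two bars the wide rectangle can pull the median up and slip
through, so a real filter would likely need a different reference width.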
2026-06-30 18:52:08 +02:00
12 changed files with 900 additions and 494 deletions
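
For reference, the categorical chapter touched by this change describes
Shannon entropy as 0 bits when one category holds every row, log2(k) bits when
k categories are equally frequent, and a normalized 0-1 ratio obtained by
dividing by that maximum. A minimal sketch of those three quantities follows;
shannon_entropy is a hypothetical helper written for this note, not the
function the eda group uses to fill columns[i]['categorical'].

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Return (entropy_bits, max_bits, normalized) for a categorical column."""
    counts = Counter(values)
    n = sum(counts.values())
    if n == 0:
        return 0.0, 0.0, 0.0
    # H = sum p * log2(1/p), written with n/c so a single category gives 0.0.
    h = sum((c / n) * math.log2(n / c) for c in counts.values())
    k = len(counts)
    h_max = math.log2(k) if k > 1 else 0.0  # log2(k) for k distinct values
    h_norm = h / h_max if h_max > 0 else 0.0
    return h, h_max, h_norm


print(shannon_entropy(["a", "a", "a", "a"]))   # one category: no uncertainty
print(shannon_entropy(["a", "b", "c", "d"]))   # four equally likely values
```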
@@ -64,6 +64,7 @@ from .exploratory_caveats import exploratory_caveats
 from .render_eda_pdf import render_eda_pdf, render_eda_pdf_relational
 from .render_automatic_eda_pdf import render_automatic_eda_pdf
 from .render_automatic_eda_pptx import render_automatic_eda_pptx
+from .render_automatic_eda_markdown import render_automatic_eda_markdown
 from .detect_time_column import detect_time_column
 from .extract_timeseries_raw import extract_timeseries_raw
 from .build_eda_render_ctx import build_eda_render_ctx
@@ -82,6 +83,7 @@ __all__ = [
    "resample_timeseries",
    "render_automatic_eda_pdf",
    "render_automatic_eda_pptx",
+    "render_automatic_eda_markdown",
    "decode_qr_image",
    "adf_kpss_stationarity",
    "acf_pacf",
@@ -36,6 +36,7 @@ from .model import (  # noqa: F401
 from .chapters_registry import CHAPTER_ORDER, build_chapter, build_document  # noqa: F401
 from .render_pdf_impl import render_pdf  # noqa: F401
 from .render_pptx_impl import render_pptx  # noqa: F401
+from .render_md_impl import render_md  # noqa: F401

 __all__ = [
    "ENGINE_NAME",
@@ -60,4 +61,5 @@ __all__ = [
    "build_document",
    "render_pdf",
    "render_pptx",
+    "render_md",
 ]
@@ -1,25 +1,19 @@
 """Categorical distributions chapter (CAT DISTR).

-Third reference chapter for AutomaticEDA. Each categorical column gets **its own
-page (PDF) / slide (PPTX)**: every column is wrapped in a keep-together
-``model.Group`` with ``page_break_before=True`` (except the first, which may share
-the intro's page), so its chart sits next to its tables and no column is split.
+Third reference chapter for AutomaticEDA. For every categorical column it shows,
+fulfilling the user's request:

-A short intro names the clickable **[[term:entropia]]entropía[[/term]]** term —
-the full definition lives in the GLOSARIO chapter, so it is NOT repeated inline
-here (one click jumps to the glossary entry). The intro also carries the dataset
-row total used as a comparison baseline.
-
-Per column the Group contains, in order:
-
-1. A cardinality key/value table: distinct values, ``% distinct`` (distinct /
-   total rows), total dataset rows, singleton values (frequency 1), entropy with
-   its theoretical maximum and the normalized ratio, mode, imbalance and
-   string-length stats.
-2. A short note flagging problematic cardinality (id-like ≈100% distinct, or a
+1. A short opening explanation of **Shannon entropy** (what it measures, its 0
+   and log2(k) bounds, the normalized 0–1 version) and the dataset row total used
+   as a comparison baseline.
+2. Per column, a cardinality key/value table: distinct values, ``% distinct``
+   (distinct / total rows), total dataset rows, singleton values (frequency 1),
+   entropy with its theoretical maximum and the normalized ratio, mode, imbalance
+   and string-length stats.
+3. A short note flagging problematic cardinality (id-like ≈100% distinct, or a
   single dominating category).
-3. A ``top-k`` table (value / count / %).
-4. A **donut pie chart** of the most common categories (top-k + an "Otros"
+4. A ``top-k`` table (value / count / %).
+5. A **donut pie chart** of the most common categories (top-k + an "Otros"
   bucket), drawn lazily so the renderers scale it to fit entirely.

 Data comes from the ``eda`` group: each ``columns[i]['categorical']`` is the
@@ -39,7 +33,7 @@ import math

 from .. import model

-CHAPTER_VERSION = "1.2.0"
+CHAPTER_VERSION = "1.1.0"
 CHAPTER_ID = "cat_distr"
 CHAPTER_TITLE = "Distribuciones categóricas"

@@ -59,17 +53,11 @@ _TERM_ENTROPIA_DEF = (
 # Cap the number of categorical columns rendered to keep the document bounded;
 # the rest are summarized in a closing note (no silent truncation).
 MAX_COLS = 40
-# Rows shown in each top-k table and explicit slices in the pie. Kept moderate so
-# the whole column — cardinality table + top-k table + donut — fits on ONE
-# page/slide with the chart next to its tables; the table note still reports
-# "top N of M" so nothing is silently hidden. For id-like columns (≈100%
-# distinct) the top-k table is dropped entirely (it would be a list of unique
-# values — pure noise), which also frees the room the donut needs (see build).
-TOP_TABLE_ROWS = 8
+# Rows shown in each top-k table and explicit slices in the pie.
+TOP_TABLE_ROWS = 15
 PIE_TOP_K = 6
-# Truncate very long category labels in tables (the renderer also wraps). Kept
-# tight so a column with long id-like values (names, tickets) still fits its page.
-LABEL_MAX = 28
+# Truncate very long category labels in tables (the renderer also wraps).
+LABEL_MAX = 48


 def _fmt_int(value) -> str:
@@ -279,55 +267,45 @@ def _normalize_card(card: dict) -> dict:


 def _cardinality_block(card: dict):
-    """KVTable with the cardinality / entropy metrics for one column.
-
-    Related metrics are grouped onto a single row each (distinct/%/unique;
-    entropy bits/max/normalized; length min/mean/max) so the whole column —
-    table + chart — fits one page/slide without dropping any datum; the short
-    16:9 PPTX slide does not fit one metric per row plus a chart otherwise."""
+    """KVTable with the cardinality / entropy metrics for one column."""
    n_singletons = card.get("n_singletons")
    if n_singletons is not None and card.get("n_singletons_partial"):
-        singletons = f"≥{_fmt_int(n_singletons)}"
+        singletons = f"≥{_fmt_int(n_singletons)} (en top mostrado)"
    elif n_singletons is not None:
        singletons = _fmt_int(n_singletons)
    else:
        singletons = "—"

-    # Distinct count · % distinct · unique (frequency 1) on one row.
-    distinct_combo = (f"{_fmt_int(card.get('n_distinct'))} · "
-                      f"{_fmt_pct_value(card.get('pct_distinct'))} · "
-                      f"{singletons} únicos")
-
-    # Entropy bits · theoretical max · normalized 0–1 on one row.
-    entropy_combo = (f"{_fmt_num(card.get('entropy'))} bits · "
-                     f"máx {_fmt_num(card.get('entropy_max'))} · "
-                     f"norm {_fmt_num(card.get('entropy_norm'))}")
+    entropy_ref = _fmt_num(card.get("entropy"))
+    emax = card.get("entropy_max")
+    if emax is not None:
+        entropy_ref = f"{entropy_ref} (máx {_fmt_num(emax)})"

    mode = card.get("mode")
    mode_pct = card.get("mode_pct")
-    mode_str = "—" if mode is None else _truncate(mode, 32)
+    mode_str = "—" if mode is None else model._safe_str(mode)
    if mode is not None and mode_pct is not None:
        mode_str = f"{mode_str} ({_fmt_pct_value(mode_pct)})"

    rows = [
-        ("Distintos · % · únicos", distinct_combo),
+        ("Valores distintos", _fmt_int(card.get("n_distinct"))),
+        ("% distintos", _fmt_pct_value(card.get("pct_distinct"))),
        ("Total filas (dataset)", _fmt_int(card.get("n_rows"))),
-        ("Entropía (bits · máx · norm)", entropy_combo),
+        ("Valores únicos (frecuencia 1)", singletons),
+        ("Entropía (bits)", entropy_ref),
+        ("Entropía normalizada (0–1)", _fmt_num(card.get("entropy_norm"))),
        ("Moda", mode_str),
    ]
    imbalance = card.get("imbalance")
+    if imbalance is not None:
+        rows.append(("Desbalance", _fmt_num(imbalance)))
    lm = card.get("len_min")
    lmean = card.get("len_mean")
    lmax = card.get("len_max")
-    # Imbalance and string length (both secondary) share one closing row.
-    extras = []
-    if imbalance is not None:
-        extras.append(f"desbalance {_fmt_num(imbalance)}")
    if any(v is not None for v in (lm, lmean, lmax)):
-        extras.append(
-            f"long. {_fmt_num(lm)}/{_fmt_num(lmean)}/{_fmt_num(lmax)}")
-    if extras:
-        rows.append(("Desbalance · longitud", " · ".join(extras)))
+        rows.append((
+            "Longitud (mín/media/máx)",
+            f"{_fmt_num(lm)} / {_fmt_num(lmean)} / {_fmt_num(lmax)}"))
    return model.KVTable(rows=rows, title="Cardinalidad")


@@ -337,8 +315,7 @@ def _flag_note(card: dict):
        return model.Note(
            "Casi todos los valores son distintos (≈100% distintos): la columna "
            "se comporta como un identificador y aporta poco para agrupar o "
-            "comparar categorías. No se lista el top de categorías (serían "
-            "valores casi todos únicos).")
+            "comparar categorías.")
    if card.get("dominated"):
        mp = card.get("mode_pct")
        mp_str = _fmt_pct_value(mp) if mp is not None else "muy alta"
@@ -358,7 +335,7 @@ def _topk_table(cat: dict):
        if not isinstance(t, dict):
            continue
        rows.append([
-            _truncate(t.get("value")),
+            model._safe_str(t.get("value")),
            _fmt_int(t.get("count")),
            _pct_from_maybe_fraction(t.get("pct")),
        ])
@@ -376,16 +353,20 @@ def _topk_table(cat: dict):
 def _intro_blocks(n_rows, mark_term: bool = False):
    total = _fmt_int(n_rows)
    # Mark the first appearance of the term as a clickable glossary jump when the
-    # term was registered (mark_term). The full definition of entropy lives in the
-    # GLOSARIO chapter, so the intro only names the clickable term here instead of
-    # repeating the long explanation (avoids the redundancy with the glossary).
-    entropia = ("[[term:entropia]]entropía[[/term]]" if mark_term
-                else "entropía")
+    # term was registered (mark_term). The visible text is identical either way.
+    entropia = ("[[term:entropia]]**entropía de Shannon**[[/term]]" if mark_term
+                else "**entropía de Shannon**")
    text = (
-        f"Cada columna categórica ocupa su propia página: sus métricas de "
-        f"cardinalidad —incluida la {entropia}—, una nota que señala cardinalidad "
-        "problemática, la tabla de las categorías más frecuentes y un gráfico de "
-        "tarta (donut) de las más comunes, todo junto."
+        f"La {entropia} mide cómo de repartidos están los valores de "
+        "una columna categórica, en bits. Vale 0 cuando una sola categoría "
+        "concentra todas las filas (máxima previsibilidad) y alcanza su máximo, "
+        "log2(k) para k categorías distintas, cuando todas aparecen por igual "
+        "(máxima diversidad). La **entropía normalizada** (entropía dividida por "
+        "su máximo) la lleva al rango 0–1 para comparar columnas con distinto "
+        "número de categorías. Para cada columna se muestran los valores "
+        "distintos, el porcentaje que representan sobre el total de filas, los "
+        "valores únicos (que aparecen una sola vez), la tabla de las categorías "
+        "más frecuentes y un gráfico de tarta (donut) de las más comunes."
    )
    if n_rows is not None:
        text += f" El dataset tiene {total} filas en total como referencia."
@@ -417,37 +398,24 @@ def build_cat_distr(profile: dict, ctx: dict):
    blocks = list(_intro_blocks(n_rows, mark_term=mark_term))

    rendered = cat_cols[:MAX_COLS]
-    for idx, col in enumerate(rendered):
+    for col in rendered:
        name = col.get("name") or "(columna)"
        cat = col.get("categorical") or {}
        card = _normalize_card(_cardinality(cat, n_rows))

-        # One Group per categorical column: heading + cardinality table + flag
-        # note + top-k table + donut figure are kept together and the renderer
-        # starts each on a fresh page/slide (page_break_before) so every column
-        # gets its own page with its chart next to its tables. The first column
-        # may share the intro's page (no forced break) to avoid a near-empty page.
-        col_blocks = [
-            model.Heading(text=str(name), level=2),
-            _cardinality_block(card),
-        ]
+        blocks.append(model.Heading(text=str(name), level=2))
+        blocks.append(_cardinality_block(card))
        note = _flag_note(card)
        if note is not None:
-            col_blocks.append(note)
-        # For id-like columns (≈100% distinct) the top-k is a list of unique
-        # values — pure noise; skip it (the flag note already explains why) and
-        # let the donut take that room so the whole column fits one page/slide.
-        if not card.get("id_like"):
-            topk = _topk_table(cat)
-            if topk is not None:
-                col_blocks.append(topk)
-        col_blocks.append(model.Figure(
+            blocks.append(note)
+        topk = _topk_table(cat)
+        if topk is not None:
+            blocks.append(topk)
+        blocks.append(model.Figure(
            make=_pie_make(cat.get("top") or [], card.get("n_distinct"),
                           str(name), n_rows),
            caption=(f"Categorías más comunes de «{_truncate(name, 32)}» "
                     "(donut: top-k + «Otros»)")))
-        blocks.append(model.Group(blocks=col_blocks,
-                                  page_break_before=(idx > 0)))

    if len(cat_cols) > len(rendered):
        omitted = len(cat_cols) - len(rendered)
@@ -2,14 +2,11 @@

 Self-contained: builds synthetic TableProfiles (no DuckDB) so the suite is fast
 and deterministic. Verifies that ``build_cat_distr`` emits the blocks the user
-asked for (distinct/total/%-distinct/unique metrics, top-k table and a donut
-figure), that EACH categorical column is wrapped in its own keep-together
-``Group`` that starts on a fresh page/slide (one column per page, chart next to
-its tables), that the long entropy explanation is NOT repeated inline (it lives
-in the glossary — only the clickable term is kept), that the chapter renders
-inside the full document to both PDF and PPTX showing that content, that a
-profile with no categorical columns yields ``None`` without raising, and that
-long labels / many columns are never cut in either output.
+asked for (entropy intro, distinct/total/%-distinct/unique metrics, top-k table
+and a donut figure), that the chapter renders inside the full document to both
+PDF and PPTX showing that content, that a profile with no categorical columns
+yields ``None`` without raising, and that long labels / many columns are never
+cut in either output.
 """

 import os
@@ -20,8 +17,7 @@ from pypdf import PdfReader
 from pptx import Presentation

 from datascience.automatic_eda.model import (
-    DataTable, Figure, GlossaryCollector, Group, Heading, KVTable, Markdown,
-    Note,
+    DataTable, Figure, Heading, KVTable, Note,
 )
 from datascience.automatic_eda.chapters.cat_distr import (
    CHAPTER_ID, CHAPTER_VERSION, build_cat_distr,
@@ -85,20 +81,8 @@ def _pptx_text(path: str) -> str:
    return re.sub(r"\s+", " ", " ".join(parts))


-def _flatten(blocks):
-    """Expand keep-together Groups so the per-column heading/table/figure are
-    inspectable as a flat block list (the chapter wraps each column in a Group)."""
-    out = []
-    for b in blocks:
-        if getattr(b, "kind", "") == "group":
-            out.extend(_flatten(getattr(b, "blocks", []) or []))
-        else:
-            out.append(b)
-    return out
-
-
-def _column_groups(chapter):
-    return [b for b in chapter.blocks if isinstance(b, Group)]
+def _kinds(chapter):
+    return [b.kind for b in chapter.blocks]


 def test_golden_build_cat_distr_emite_bloques_pedidos():
@@ -106,101 +90,36 @@ def test_golden_build_cat_distr_emite_bloques_pedidos():
    assert ch is not None
    assert ch.id == CHAPTER_ID
    assert ch.version == CHAPTER_VERSION
-
-    # Entropy intro present, but the long explanation is gone (it lives in the
-    # glossary now): only the term is named, no log2/normalizada walkthrough.
+    kinds = _kinds(ch)
+    # Entropy intro present.
    headings = [b.text for b in ch.blocks if isinstance(b, Heading)]
    assert any("Entrop" in h for h in headings)
-    md = next(b for b in ch.blocks if isinstance(b, Markdown))
-    assert "entropía" in md.text.lower()
-    assert "log2" not in md.text          # redundant explanation removed.
-    assert "máxima diversidad" not in md.text
-
-    # Per-column blocks are wrapped in keep-together Groups: flatten to inspect.
-    flat = _flatten(ch.blocks)
-    kv = next(b for b in flat if isinstance(b, KVTable))
+    md = next(b for b in ch.blocks if b.kind == "markdown")
+    assert "entropía" in md.text.lower() and "log2" in md.text
+    # Cardinality metrics: distinct, total rows, %-distinct, unique values.
+    kv = next(b for b in ch.blocks if isinstance(b, KVTable))
    labels = [r[0] for r in kv.rows]
-    values = " ".join(str(r[1]) for r in kv.rows)
-    # Cardinality metrics: distinct count, %-distinct, unique values and total
-    # rows are present (grouped onto compact rows so the chart fits the page).
-    assert "Distintos · % · únicos" in labels
+    assert "Valores distintos" in labels
+    assert "% distintos" in labels
    assert "Total filas (dataset)" in labels
+    assert "Valores únicos (frecuencia 1)" in labels
    assert any("Entropía" in lbl for lbl in labels)
-    assert "únicos" in values and "%" in values
-    assert "bits" in values and "norm" in values   # entropy + max + normalized.
    # Top-k table + pie figure.
-    dt = next(b for b in flat if isinstance(b, DataTable))
+    dt = next(b for b in ch.blocks if isinstance(b, DataTable))
    assert dt.header == ["Valor", "Conteo", "%"]
    assert any("neumaticos" in str(cell) for row in dt.rows for cell in row)
-    assert any(isinstance(b, Figure) for b in flat)
-    # id-like column flagged with a Note that also explains the top-k is dropped.
-    idnote = next((b for b in flat
-                   if isinstance(b, Note) and "identificador" in b.text), None)
-    assert idnote is not None
-    assert "No se lista el top" in idnote.text
+    assert any(isinstance(b, Figure) for b in ch.blocks)
+    # id-like column flagged with a Note.
+    assert any(isinstance(b, Note) and "identificador" in b.text
+               for b in ch.blocks)


-def test_golden_idlike_omite_topk_y_conserva_donut():
-    # The id-like column (uuid, 100% distinct) must NOT carry a top-k DataTable
-    # (it would be a list of unique values), but must still keep its donut Figure
-    # and its cardinality table so it stays a full per-column page.
-    ch = build_cat_distr(_profile(), {})
-    groups = _column_groups(ch)
-    uuid_group = next(g for g in groups
-                      if any(getattr(b, "text", "") == "uuid" for b in g.blocks))
-    kinds = [b.kind for b in uuid_group.blocks]
-    assert "data_table" not in kinds      # top-k of unique values dropped.
-    assert "kv_table" in kinds            # cardinality kept.
-    assert "figure" in kinds              # donut kept (chart per column).
-    # A non-id-like column keeps its top-k table.
-    cat_group = next(g for g in groups
-                     if any(getattr(b, "text", "") == "categoria"
-                            for b in g.blocks))
-    assert "data_table" in [b.kind for b in cat_group.blocks]
-
-
-def test_golden_una_pagina_por_columna_groups():
-    ch = build_cat_distr(_profile(), {})
-    groups = _column_groups(ch)
-    # Two categorical columns -> two column Groups (numeric column excluded).
-    assert len(groups) == 2
-    # Each Group carries one column: a heading + its cardinality table + figure.
-    for g in groups:
-        kinds = [b.kind for b in g.blocks]
-        assert kinds[0] == "heading"
-        assert "kv_table" in kinds
-        assert "figure" in kinds
-    # The first column may share the intro page (no forced break); every later
-    # column starts on a fresh page/slide so each column gets its own page.
-    assert groups[0].page_break_before is False
-    assert all(g.page_break_before is True for g in groups[1:])
-
-
-def test_golden_entropia_clicable_y_definicion_en_glosario():
-    # With a glossary collector the intro marks the clickable term and the FULL
-    # definition (the long explanation removed from the intro) lands in the
-    # glossary, not inline — no data lost, just relocated.
-    gc = GlossaryCollector()
-    ch = build_cat_distr(_profile(), {"glossary": gc})
-    md = next(b for b in ch.blocks if isinstance(b, Markdown))
-    assert "[[term:entropia]]entropía[[/term]]" in md.text
-    assert gc.has("entropia")
-    entry = gc.get("entropia")
-    assert entry is not None
-    # The definition kept in the glossary still carries the detail removed inline.
-    assert "log2" in entry["definition"]
-    assert "normalizada" in entry["definition"].lower()
-
-
-def test_golden_render_pdf_una_pagina_por_columna():
+def test_golden_render_pdf_muestra_categoricas():
    with tempfile.TemporaryDirectory() as d:
        out = os.path.join(d, "eda.pdf")
        res = render_automatic_eda_pdf(_profile(), out, {"title": "EDA"})
        assert res["path"] == out and os.path.exists(out)
-        cat_meta = next(c for c in res["chapters"] if c["id"] == CHAPTER_ID)
-        # Two categorical columns, each on its own page -> >= 2 pages for the
-        # chapter (intro shares the first column's page).
-        assert cat_meta["n_pages"] >= 2
+        assert CHAPTER_ID in [c["id"] for c in res["chapters"]]
        txt = _pdf_text(out)
        assert "Entrop" in txt
        assert "distintos" in txt
@@ -214,91 +133,13 @@ def test_golden_render_pptx_muestra_categoricas():
        out = os.path.join(d, "eda.pptx")
        res = render_automatic_eda_pptx(_profile(), out, {"title": "EDA"})
        assert res["path"] == out and os.path.exists(out)
-        cat_meta = next(c for c in res["chapters"] if c["id"] == CHAPTER_ID)
-        assert cat_meta["n_slides"] >= 2  # one slide per categorical column.
+        assert CHAPTER_ID in [c["id"] for c in res["chapters"]]
        txt = _pptx_text(out)
        assert "Entrop" in txt
        assert "categoria" in txt and "neumaticos" in txt
        assert "distintos" in txt


-def _profile_high_card() -> dict:
-    """Profile with a high-cardinality NON-id-like categorical column whose top-k
-    of long values would split from its donut on a short 16:9 slide unless the
-    renderer trims the table — the exact case the adversarial check flagged
-    (Ticket / Cabin)."""
-    long_vals = [f"Valor largo de categoria numero {i:02d} con texto extra"
-                 for i in range(40)]
-    top = [{"value": v, "count": 60 - i, "pct": (60 - i) / 5000.0}
-           for i, v in enumerate(long_vals)]
-    return {
-        "table": "t", "source": "t.csv", "n_rows": 5000, "n_cols": 3,
-        "quality_score": 80.0,
-        "columns": [
-            {"name": "precio", "inferred_type": "numeric", "null_pct": 0.0,
-             "numeric": {"mean": 1.0, "median": 1.0, "min": 0.0, "max": 2.0,
-                         "std": 0.5}},
-            # 40 distinct over 5000 rows = 0.8% distinct -> NOT id-like, keeps
-            # its (long) top-k table; the tall table must not push the donut off.
-            {"name": "alta_card_col", "inferred_type": "categorical",
-             "null_pct": 0.0, "distinct_count": 40,
-             "categorical": {"top": top, "mode": long_vals[0], "n_distinct": 40,
-                             "entropy": 5.2, "imbalance": 1.2, "len_min": 40,
-                             "len_mean": 45, "len_max": 50}},
-            {"name": "baja_card_col", "inferred_type": "categorical",
-             "null_pct": 0.0, "distinct_count": 4,
-             "categorical": {
-                 "top": [{"value": "norte", "count": 2000, "pct": 0.4},
-                         {"value": "sur", "count": 1500, "pct": 0.3},
-                         {"value": "este", "count": 1000, "pct": 0.2},
-                         {"value": "oeste", "count": 500, "pct": 0.1}],
-                 "mode": "norte", "n_distinct": 4, "entropy": 1.8}},
-        ],
-    }
-
-
-def test_golden_pptx_una_slide_por_columna_con_su_grafico():
-    """Each categorical column occupies EXACTLY ONE cat_distr slide that carries
-    BOTH its cardinality table and its donut figure (picture) — i.e. the chart is
-    never separated from its table, even for a high-cardinality column."""
-    from pptx.enum.shapes import MSO_SHAPE_TYPE
-
-    prof = _profile_high_card()
-    cat_names = ["alta_card_col", "baja_card_col"]
-    with tempfile.TemporaryDirectory() as d:
-        out = os.path.join(d, "eda.pptx")
-        res = render_automatic_eda_pptx(prof, out, {"title": "EDA"})
-        assert res["path"] == out and os.path.exists(out)
-        prs = Presentation(out)
-
-        # Per column: the cat_distr slides whose text mentions it, and whether the
-        # owning slide also has the donut caption + an actual picture shape.
-        slides_with_col = {n: [] for n in cat_names}
-        owner_has_chart = {n: False for n in cat_names}
-        for i, sl in enumerate(prs.slides):
-            texts, has_pic = [], False
-            for sh in sl.shapes:
-                if sh.has_text_frame:
-                    texts.append(sh.text_frame.text)
-                if sh.shape_type == MSO_SHAPE_TYPE.PICTURE:
-                    has_pic = True
-            txt = re.sub(r"\s+", " ", " ".join(texts))
-            if "Distribuciones categ" not in txt:   # footer stamp of the chapter.
-                continue
-            for n in cat_names:
-                if n in txt:
-                    slides_with_col[n].append(i)
-                    has_table = "Cardinalidad" in txt or "distintos" in txt
-                    if has_pic and "donut" in txt and has_table:
-                        owner_has_chart[n] = True
-
-        for n in cat_names:
-            # Exactly one slide carries the column (not split across slides).
-            assert len(slides_with_col[n]) == 1, (n, slides_with_col[n])
-            # That single slide also holds its table AND its donut picture.
-            assert owner_has_chart[n], (n, "tabla y donut no están en el mismo slide")
-
-
 def test_edge_sin_categoricas_devuelve_none():
    only_numeric = {
        "n_rows": 10, "columns": [
@@ -329,15 +170,11 @@ def test_anti_corte_label_largo_y_muchas_columnas():

    ch = build_cat_distr(profile, {})
    assert ch is not None
-    # One Group per column, each forcing its own page (except the first).
-    groups = _column_groups(ch)
-    assert len(groups) == 30
-    assert sum(1 for g in groups if g.page_break_before) == 29
    with tempfile.TemporaryDirectory() as d:
        pdf = os.path.join(d, "anti.pdf")
        res = render_automatic_eda_pdf(profile, pdf, {"write_manifest": False})
        assert res["path"] == pdf
-        assert res["n_pages"] > 1       # one page per column, OK.
+        assert res["n_pages"] > 1       # many columns spilled across pages, OK.
        txt = _pdf_text(pdf)
        # Long label wrapped (not truncated): every word survives.
        for word in ("Lorem", "incididunt", "reprehenderit", "voluptate"):
@@ -139,17 +139,10 @@ class Group:
    it starts on a fresh page and flows (honest degradation, never cut). Use it to
    bind ``Heading`` + ``Markdown`` + ``Figure`` of one idea together (see the
    DISTR NUM / AGREGACION chapters).
-
-    When ``page_break_before`` is True the renderer additionally forces the group
-    to *start* on a fresh page/slide (unless the current one is already empty), so
-    a chapter can give each unit its own page — e.g. one categorical column per
-    page (see CAT DISTR). It is purely additive: the default False keeps the plain
-    keep-together behaviour for every existing chapter.
    """

    blocks: list = field(default_factory=list)
    title: Optional[str] = None
-    page_break_before: bool = False
    kind: str = field(default="group", init=False)


@@ -235,9 +228,7 @@ def as_block(obj: Any):
                return Note(text=_safe_str(obj.get("text")))
            if cls is Group:
                return Group(blocks=as_blocks(obj.get("blocks")),
-                             title=obj.get("title"),
-                             page_break_before=bool(
-                                 obj.get("page_break_before", False)))
+                             title=obj.get("title"))
            if cls is GlossaryEntry:
                return GlossaryEntry(key=_safe_str(obj.get("key")),
                                     label=_safe_str(obj.get("label")),
@@ -0,0 +1,458 @@
+"""AutomaticEDA Markdown serializer — one self-contained file to paste to an LLM.
+
+Same document model as the PDF/PPTX renderers (an ordered list of
+:class:`Chapter`, each a list of format-independent blocks) but emitted as plain
+**Markdown** instead of a binary. The goal is different from the other two
+renderers: a Markdown EDA is meant to be *pasted into an LLM*, so it prioritises
+TEXT and DATA over visuals. Tables become Markdown tables (every row dumped, no
+pagination — nothing is cut because there are no pages); a ``Figure`` becomes its
+caption plus, when possible, the underlying bar/histogram data as a Markdown
+table (an LLM cannot see the image); glossary term markers are stripped while
+``**bold**`` is kept (it is valid Markdown).
+
+dict-no-throw (the ``eda`` group style): :func:`render_md` never raises. On a
+fatal error it returns ``{path: None, ...}`` with a ``note`` explaining why; a
+malformed block degrades to a readable note rather than crashing the document.
+"""
+
+from __future__ import annotations
+
+import os
+import re
+
+from . import model
+
+# Glossary span markers (kept text, dropped markers). We intentionally do NOT use
+# ``text_layout.strip_inline_md`` for Markdown blocks because that also removes
+# ``**bold**`` — valid Markdown we want to preserve when pasting to an LLM.
+_TERM_OPEN_RE = re.compile(r"\[\[term:[A-Za-z0-9_]+\]\]")
+_MAX_BAR_ROWS = 100
+
+
+# --------------------------------------------------------------------------- #
+# Small helpers.
+# --------------------------------------------------------------------------- #
+def _clean_terms(s) -> str:
+    """Drop glossary term markers, keeping the visible text (and any **bold**)."""
+    s = model._safe_str(s)
+    s = _TERM_OPEN_RE.sub("", s)
+    return s.replace("[[/term]]", "")
+
+
+def _cell(v) -> str:
+    """Render a value as a safe Markdown table cell.
+
+    Escapes pipes (``|`` -> ``\\|``) so they do not break the column layout and
+    folds newlines to ``<br>`` so a multi-line value stays inside one cell. None
+    becomes an empty string.
+    """
+    s = model._safe_str(v)
+    s = s.replace("|", "\\|")
+    s = s.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "<br>")
+    return s
+
+
+def _slug(text: str) -> str:
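+    """Approximate GitHub-style heading anchor: lowercase, spaces->'-', drop symbols.
+
+    Unlike GitHub, runs of hyphens are collapsed and underscores are dropped, so
+    anchors may differ for titles containing "_" or " - ".
+    """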
+    s = model._safe_str(text).strip().lower()
+    out = []
+    for ch in s:
+        if ch.isalnum():
+            out.append(ch)
+        elif ch in " -":
+            out.append("-")
+        # any other symbol is dropped.
+    slug = "".join(out)
+    while "--" in slug:
+        slug = slug.replace("--", "-")
+    return slug.strip("-")
+
+
+def _fmt_num(v) -> str:
+    """Compact number for the figure data tables (ints as ints, else 4 sig figs)."""
+    try:
+        f = float(v)
+    except Exception:  # noqa: BLE001
+        return model._safe_str(v)
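+    if f != f:  # NaN
+        return "NaN"
+    if f in (float("inf"), float("-inf")):  # int(inf) would raise OverflowError.
+        return "inf" if f > 0 else "-inf"
+    if f == int(f) and abs(f) < 1e15:
+        return str(int(f))
+    return f"{f:.4g}"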
+
+
+def _fmt_int(v) -> str:
+    try:
+        return str(int(v))
+    except Exception:  # noqa: BLE001
+        return model._safe_str(v)
+
+
+def _now_iso() -> str:
+    from datetime import datetime, timezone
+    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
+
+
+# --------------------------------------------------------------------------- #
+# Document header (title + metadata blockquote + numbered index).
+# --------------------------------------------------------------------------- #
+def _meta_block(meta: dict) -> list:
+    """Build the metadata lines for the header blockquote (skipping missing fields)."""
+    ctx = meta.get("ctx") if isinstance(meta.get("ctx"), dict) else {}
+    lines: list = []
+
+    def add(label, value) -> None:
+        if value is None:
+            return
+        s = model._safe_str(value).strip()
+        if s and s.lower() != "none":
+            lines.append(f"**{label}:** {s}")
+
+    add("Dataset", ctx.get("dataset_name") or meta.get("dataset_name"))
+    add("Fuente", ctx.get("source_origin") or meta.get("source_origin"))
+    add("Almacenamiento", ctx.get("storage") or meta.get("storage"))
+    n_rows = ctx.get("n_rows", meta.get("n_rows"))
+    n_cols = ctx.get("n_cols", meta.get("n_cols"))
+    if n_rows is not None and n_cols is not None:
+        lines.append(
+            f"**Dimensiones:** {_fmt_int(n_rows)} filas × {_fmt_int(n_cols)} columnas")
+    add("Generado", meta.get("generated_at") or _now_iso())
+    lines.append(f"**Motor:** {model.ENGINE_NAME} v{model.ENGINE_VERSION}")
+    return lines
+
+
+# --------------------------------------------------------------------------- #
+# Per-block serializers. Each returns a Markdown string (no surrounding blanks;
+# the caller separates blocks with a blank line).
+# --------------------------------------------------------------------------- #
+def _md_heading(block) -> str:
+    level = int(getattr(block, "level", 1) or 1)
+    hashes = "#" * min(level + 2, 6)  # level1 -> ###; '#'/'##' reserved for doc/chapter.
+    text = _clean_terms(getattr(block, "text", "")).strip()
+    return f"{hashes} {text}"
+
+
+def _md_markdown(block) -> str:
+    # Keep the text verbatim, dropping only glossary markers (keep **bold**).
+    return _clean_terms(getattr(block, "text", "")).rstrip("\n")
+
+
+def _md_kv_table(block) -> str:
+    lines: list = []
+    title = getattr(block, "title", None)
+    if title:
+        lines.append(f"**{_clean_terms(title).strip()}**")
+        lines.append("")
+    lines.append("| Campo | Valor |")
+    lines.append("| --- | --- |")
+    for row in (getattr(block, "rows", []) or []):
+        try:
+            label, value = row[0], row[1]
+        except Exception:  # noqa: BLE001
+            label, value = row, ""
+        lines.append(f"| {_cell(label)} | {_cell(value)} |")
+    return "\n".join(lines)
+
+
+def _md_data_table(block) -> str:
+    lines: list = []
+    title = getattr(block, "title", None)
+    if title:
+        lines.append(f"**{_clean_terms(title).strip()}**")
+        lines.append("")
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    if not header:
+        ncol = max((len(r) for r in rows), default=1)
+        header = [f"col{i + 1}" for i in range(ncol)]
+    ncol = len(header)
+    lines.append("| " + " | ".join(_cell(h) for h in header) + " |")
+    lines.append("| " + " | ".join(["---"] * ncol) + " |")
+    for r in rows:  # dump every row — no pagination, nothing cut.
+        cells = [_cell(r[c]) if c < len(r) else "" for c in range(ncol)]
+        lines.append("| " + " | ".join(cells) + " |")
+    note = getattr(block, "note", None)
+    if note:
+        lines.append("")
+        lines.append(f"*{_clean_terms(note).strip()}*")
+    return "\n".join(lines)
+
+
+def _bars_table(bars: list) -> str:
+    """Render extracted bar/histogram data as a Markdown table (Desde/Hasta/Frec)."""
+    lines = ["| Desde | Hasta | Frecuencia |", "| --- | --- | --- |"]
+    shown = bars[:_MAX_BAR_ROWS]
+    for x0, x1, h in shown:
+        lines.append(f"| {_fmt_num(x0)} | {_fmt_num(x1)} | {_fmt_num(h)} |")
+    out = "\n".join(lines)
+    extra = len(bars) - len(shown)
+    if extra > 0:
+        out += f"\n\n*… ({extra} filas más)*"
+    return out
+
+
+def _extract_bars(fig) -> list:
+    """Collect (x_from, x_to, height) of the rectangular bars of a matplotlib fig.
+
+    Histogram / bar-chart bars are ``matplotlib.patches.Rectangle`` with positive
+    width and height; spines, legends and zero-area artists are skipped. Never
+    raises — returns ``[]`` on any problem.
+    """
+    bars: list = []
+    try:
+        for ax in fig.get_axes():
+            # Collect this axes' positive-area rectangles, then keep only the ones
+            # that look like actual histogram/bar bins. Reference shapes that
+            # matplotlib also stores in ``ax.patches`` — most notably the ``±1σ``
+            # band drawn by ``axvspan`` (a single rectangle far wider than a bin)
+            # and a lone Tukey boxplot box — would otherwise show up as fake
+            # "bins". A histogram axes has several near-equal-width bars, so we
+            # drop any rectangle whose width is more than twice the median width
+            # of that axes' rectangles (the σ-band spans many bins; uniform bins
+            # all sit at the median width and stay).
+            ax_bars: list = []
+            for patch in list(getattr(ax, "patches", []) or []):
+                try:
+                    w = patch.get_width()
+                    h = patch.get_height()
+                    x = patch.get_x()
+                except Exception:  # noqa: BLE001 — not a Rectangle-like patch.
+                    continue
+                if w and w > 0 and h and h > 0:
+                    ax_bars.append((x, x + w, h))
+            if len(ax_bars) >= 3:
+                widths = sorted(b[1] - b[0] for b in ax_bars)
+                median_w = widths[len(widths) // 2]
+                if median_w > 0:
+                    ax_bars = [b for b in ax_bars
+                               if (b[1] - b[0]) <= 2.0 * median_w]
+            bars.extend(ax_bars)
+    except Exception:  # noqa: BLE001
+        return []
+    return bars
+
+
+def _md_figure(block, meta: dict, out_path: str, counter: list) -> str:
+    """Serialize a Figure prioritising TEXT + DATA (an LLM cannot see the image).
+
+    Emits the caption, then — if the matplotlib figure has bars — a Markdown table
+    of the underlying (Desde, Hasta, Frecuencia) values. Optionally (when
+    ``meta['embed_figures']`` is True) also exports a PNG beside the .md and adds
+    an image link; off by default so the Markdown stays self-contained.
+    """
+    caption = _clean_terms(getattr(block, "caption", "")).strip()
+    parts = [f"*Figura: {caption}*" if caption else "*Figura*"]
+    fig = None
+    try:
+        import matplotlib
+        matplotlib.use("Agg")  # defensive: headless rasterization backend.
+        fig = getattr(block, "fig", None)
+        make = getattr(block, "make", None)
+        if fig is None and callable(make):
+            fig = make()
+        if fig is not None:
+            bars = _extract_bars(fig)
+            if bars:
+                parts.append(_bars_table(bars))
+            if meta.get("embed_figures"):
+                png = _embed_png(fig, out_path, counter)
+                if png:
+                    parts.append(f"![{caption}]({png})")
+    except Exception:  # noqa: BLE001 — a bad figure degrades to just its caption.
+        pass
+    finally:
+        if fig is not None:
+            try:
+                import matplotlib.pyplot as plt
+                plt.close(fig)
+            except Exception:  # noqa: BLE001
+                pass
+    return "\n\n".join(parts)
+
+
+def _embed_png(fig, out_path: str, counter: list) -> str:
+    """Export the figure to ``<basename>_figN.png`` beside the .md; return its name."""
+    try:
+        counter[0] += 1
+        base = os.path.splitext(os.path.basename(out_path))[0] or "figura"
+        name = f"{base}_fig{counter[0]}.png"
+        path = os.path.join(os.path.dirname(os.path.abspath(out_path)), name)
+        fig.savefig(path, format="png", dpi=120, bbox_inches="tight")
+        return name
+    except Exception:  # noqa: BLE001
+        return ""
+
+
+def _md_image(block) -> str:
+    path = model._safe_str(getattr(block, "path", ""))
+    caption = _clean_terms(getattr(block, "caption", "")).strip()
+    out = f"![{caption}]({path})"
+    if caption:
+        out += f"\n\n*{caption}*"
+    return out
+
+
+def _md_caption(block) -> str:
+    return f"*{_clean_terms(getattr(block, 'text', '')).strip()}*"
+
+
+def _md_note(block) -> str:
+    text = _clean_terms(getattr(block, "text", "")).strip()
+    lines = text.split("\n")
+    return "\n".join((f"> {ln}" if ln.strip() else ">") for ln in lines)
+
+
+def _md_group(block, meta: dict, out_path: str, counter: list) -> str:
+    parts: list = []
+    title = getattr(block, "title", None)
+    if title:
+        parts.append(f"### {_clean_terms(title).strip()}")
+    for b in (getattr(block, "blocks", []) or []):
+        try:
+            seg = _serialize_block(b, meta, out_path, counter)
+        except Exception:  # noqa: BLE001
+            seg = ""
+        if seg:
+            parts.append(seg)
+    return "\n\n".join(parts)
+
+
+def _md_glossary_entry(block) -> str:
+    label = (model._safe_str(getattr(block, "label", "")).strip()
+             or model._safe_str(getattr(block, "key", "")).strip())
+    definition = _clean_terms(getattr(block, "definition", "")).strip()
+    out = f"### {label}"
+    if definition:
+        out += f"\n\n{definition}"
+    return out
+
+
+def _serialize_block(block, meta: dict, out_path: str, counter: list) -> str:
+    """Dispatch a single block to its Markdown serializer. Unknown -> note."""
+    kind = getattr(block, "kind", "")
+    if kind == "heading":
+        return _md_heading(block)
+    if kind == "markdown":
+        return _md_markdown(block)
+    if kind == "kv_table":
+        return _md_kv_table(block)
+    if kind == "data_table":
+        return _md_data_table(block)
+    if kind == "figure":
+        return _md_figure(block, meta, out_path, counter)
+    if kind == "image":
+        return _md_image(block)
+    if kind == "caption":
+        return _md_caption(block)
+    if kind == "note":
+        return _md_note(block)
+    if kind == "group":
+        return _md_group(block, meta, out_path, counter)
+    if kind == "glossary_entry":
+        return _md_glossary_entry(block)
+    # Unknown content -> readable note (mirrors the model's defensive coercion).
+    return _md_note(model.Note(text=model._safe_str(block)))
+
+
+# --------------------------------------------------------------------------- #
+# Entry point.
+# --------------------------------------------------------------------------- #
+def render_md(chapters: list, out_path: str, meta: dict = None) -> dict:
+    """Serialize a list of Chapters into a single self-contained Markdown file.
+
+    The output leads with ``# <title>``, a metadata blockquote and a numbered
+    ``## Índice`` linking each chapter, then one ``## N. <title>`` section per
+    chapter with its blocks. Tables become Markdown tables (every row dumped),
+    figures become caption + underlying data table, glossary markers are stripped
+    while ``**bold**`` is kept. Designed to be pasted into an LLM.
+
+    Args:
+        chapters: a list of ``Chapter`` (dataclasses or dicts); normalized
+            defensively with ``model.as_chapters``.
+        out_path: filesystem path for the ``.md`` (parent dirs are created).
+        meta: optional dict. Recognised keys: ``title``, ``ctx`` (dict with
+            ``dataset_name``/``source_origin``/``storage``/``n_rows``/``n_cols``),
+            ``generated_at``, ``embed_figures`` (export PNGs beside the .md,
+            default False).
+
+    Returns:
+        dict (never raises): ``{path: str|None, n_chars: int,
+        chapters: list[{id, version}], note: str}``. On a fatal error ``path`` is
+        None and ``note`` explains why.
+    """
+    meta = meta if isinstance(meta, dict) else {}  # a non-dict meta must not raise.
+    chapters = model.as_chapters(chapters)
+    title = model._safe_str(meta.get("title")) or model.ENGINE_NAME
+
+    # Edge: nothing to render -> a minimal but valid Markdown document.
+    if not chapters:
+        content = (f"# {title}\n\n"
+                   "*(documento vacío — sin capítulos aplicables)*\n")
+        return _write(out_path, content, [], "documento vacío")
+
+    counter = [0]  # document-wide figure counter for unique PNG names.
+    notes: list = []
+    segments: list = [f"# {title}"]
+
+    meta_lines = _meta_block(meta)
+    if meta_lines:
+        segments.append("\n".join(f"> {ln}" for ln in meta_lines))
+
+    # Numbered index. The anchor matches the chapter heading emitted below
+    # (``## N. <title>``) in GitHub slug style.
+    chap_heads = []
+    idx_lines = ["## Índice"]
+    for i, ch in enumerate(chapters, 1):
+        head_text = f"{i}. {model._safe_str(ch.title)}"
+        anchor = _slug(head_text)
+        chap_heads.append((head_text, anchor))
+        idx_lines.append(f"{i}. [{model._safe_str(ch.title)}](#{anchor})")
+    segments.append("\n".join(idx_lines))
+
+    chapters_meta = []
+    for i, ch in enumerate(chapters, 1):
+        segments.append("---")
+        head_text, _anchor = chap_heads[i - 1]
+        segments.append(f"## {head_text}")
+
+        blocks = list(ch.blocks or [])
+        # Omit a leading level-1 Heading that just repeats the chapter title.
+        if blocks:
+            b0 = blocks[0]
+            if (getattr(b0, "kind", "") == "heading"
+                    and int(getattr(b0, "level", 1) or 1) == 1
+                    and _clean_terms(getattr(b0, "text", "")).strip()
+                    == model._safe_str(ch.title).strip()):
+                blocks = blocks[1:]
+
+        for block in blocks:
+            try:
+                seg = _serialize_block(block, meta, out_path, counter)
+            except Exception as e:  # noqa: BLE001
+                seg = _md_note(model.Note(text=model._safe_str(block)))
+                notes.append(
+                    f"bloque '{getattr(block, 'kind', '?')}' del capítulo "
+                    f"'{ch.id}' degradado: {e}")
+            if seg:
+                segments.append(seg)
+        chapters_meta.append({"id": ch.id, "version": ch.version})
+
+    content = "\n\n".join(segments) + "\n"
+    note = f"{len(content)} caracteres"
+    if notes:
+        note += " · " + "; ".join(notes)
+    return _write(out_path, content, chapters_meta, note)
+
+
+def _write(out_path: str, content: str, chapters_meta: list, note: str) -> dict:
+    """Write the Markdown to disk (creating parents). dict-no-throw."""
+    try:
+        parent = os.path.dirname(os.path.abspath(out_path))
+        os.makedirs(parent, exist_ok=True)
+        with open(out_path, "w", encoding="utf-8") as fh:
+            fh.write(content)
+    except Exception as e:  # noqa: BLE001 — never raise from the writer.
+        return {"path": None, "n_chars": 0, "chapters": [],
+                "note": f"no se pudo escribir el Markdown: {e}"}
+    return {"path": out_path, "n_chars": len(content),
+            "chapters": chapters_meta, "note": note}
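
The width filter in `_extract_bars` can be illustrated in isolation. The following is a minimal sketch, not the module's code: `drop_wide_rectangles` is a hypothetical helper name, and the bars are plain `(x_from, x_to, height)` tuples rather than matplotlib patches. It keeps rectangles whose width is at most twice the (upper) median width, which is how a wide band such as the one drawn by `axvspan` would be dropped.

```python
def drop_wide_rectangles(bars, factor=2.0):
    """Keep (x_from, x_to, height) tuples whose width is at most `factor` times
    the (upper) median width; groups with fewer than 3 bars are returned as-is."""
    if len(bars) < 3:
        return list(bars)
    widths = sorted(x1 - x0 for x0, x1, _ in bars)
    median_w = widths[len(widths) // 2]
    if median_w <= 0:
        return list(bars)
    return [b for b in bars if (b[1] - b[0]) <= factor * median_w]


# Four unit-width histogram bins plus one wide band spanning most of them
# (a stand-in for the rectangle that axvspan adds to ax.patches).
bins = [(0.0, 1.0, 5), (1.0, 2.0, 9), (2.0, 3.0, 7), (3.0, 4.0, 2)]
band = (0.5, 3.5, 1.0)
kept = drop_wide_rectangles(bins + [band])
print(kept == bins)
```

One consequence of this design: a histogram with strongly non-uniform bins could lose its widest legitimate bins, so the factor of 2 is a heuristic rather than a guarantee.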
@@ -675,61 +675,6 @@ def _measure_figure_like(block) -> float:
    return target_h + 0.04 + cap_h + _GAP


-def _measure_kv_table(block) -> float:
-    """Faithful height of a KVTable — matches ``_place_kv_table``.
-
-    Counts the optional title heading and, per row, the wrapped VALUE column
-    (the label column never wraps in the placer). The previous estimate assumed
-    one line per row and ignored the title, so a column's keep-together Group
-    under-budgeted the figure and the chart spilled to the next page. Keep this in
-    sync with ``_place_kv_table``."""
-    h = 0.0
-    title = getattr(block, "title", None)
-    if title:
-        h += _measure_heading_text(title, 2)
-    rows = getattr(block, "rows", []) or []
-    key_w = 1.9
-    val_chars = tl.chars_per_line(_USABLE_W - key_w - 0.1, _FS_BODY)
-    lh = tl.line_height_in(_FS_BODY)
-    for row in rows:
-        try:
-            value = row[1]
-        except Exception:  # noqa: BLE001
-            value = ""
-        v_lines = tl.wrap(model._safe_str(value), val_chars)
-        h += lh * len(v_lines) + _ROW_VPAD
-    return h + _GAP
-
-
-def _measure_data_table(block) -> float:
-    """Faithful height of a DataTable — matches ``_place_data_table``.
-
-    Counts the optional title heading, the wrapped header row, every wrapped data
-    row (per-column wrap via the same ``_col_widths``/``_wrap_row`` the placer
-    uses) and the optional note. Keep this in sync with ``_place_data_table``."""
-    h = 0.0
-    title = getattr(block, "title", None)
-    if title:
-        h += _measure_heading_text(title, 2)
-    header = list(getattr(block, "header", []) or [])
-    rows = list(getattr(block, "rows", []) or [])
-    fs = _FS_CELL
-    widths = _col_widths(header, rows, fs)
-    lh = tl.line_height_in(fs)
-    if header:
-        header_lines = _wrap_row(header, widths, fs)
-        h += lh * max((len(c) for c in header_lines), default=1) + _ROW_VPAD * 2
-    for r in rows:
-        cells_lines = _wrap_row(r, widths, fs)
-        h += lh * max((len(c) for c in cells_lines), default=1) + _ROW_VPAD * 2
-    note = getattr(block, "note", None)
-    if note:
-        nlines = tl.wrap(model._safe_str(note),
-                         tl.chars_per_line(_USABLE_W, _FS_NOTE))
-        h += tl.line_height_in(_FS_NOTE) * len(nlines)
-    return h + _GAP
-
-
 def _measure_block(st: _PdfState, block) -> float:
    kind = getattr(block, "kind", "")
    try:
@@ -745,9 +690,13 @@ def _measure_block(st: _PdfState, block) -> float:
                            tl.chars_per_line(_USABLE_W, _FS_NOTE))
            return tl.line_height_in(_FS_NOTE) * len(lines) + _GAP
        if kind == "kv_table":
-            return _measure_kv_table(block)
+            rows = getattr(block, "rows", []) or []
+            return (tl.line_height_in(_FS_BODY) + _ROW_VPAD) * (len(rows) + 1) \
+                + _GAP
        if kind == "data_table":
-            return _measure_data_table(block)
+            rows = getattr(block, "rows", []) or []
+            return (tl.line_height_in(_FS_CELL) + _ROW_VPAD * 2) \
+                * (len(rows) + 1) + _GAP
        if kind == "group":
            return sum(_measure_block(st, b)
                       for b in (getattr(block, "blocks", []) or []))
@@ -786,10 +735,6 @@ def _place_group(st: _PdfState, block) -> None:
    blocks = getattr(block, "blocks", []) or []
    if not blocks:
        return
-    # Opt-in page break: start this group on a fresh page unless the current one
-    # is still empty (so a chapter can give each unit its own page).
-    if getattr(block, "page_break_before", False) and st.y > _CONTENT_TOP + 1e-6:
-        _new_page(st)
    avail_full = _CONTENT_BOTTOM - _CONTENT_TOP
    _shrink_group_figures(st, blocks, avail_full)
    total = sum(_measure_block(st, b) for b in blocks)
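
The PDF hunks above swap the wrap-aware table measurement for a one-line-per-row estimate. The sketch below contrasts the two ideas under stated assumptions: the constants `LINE_H`, `ROW_VPAD` and `VAL_CHARS` are hypothetical values, `textwrap` stands in for the renderer's own `tl.wrap`, and the trailing gap is omitted. It only shows why a table with long values can be under-budgeted by the simple estimate.

```python
import textwrap

LINE_H = 0.18     # hypothetical line height in inches
ROW_VPAD = 0.04   # hypothetical vertical padding per row
VAL_CHARS = 40    # hypothetical characters per line in the value column


def simple_height(rows):
    """One line per row plus one extra row (the estimate the hunk switches to)."""
    return (LINE_H + ROW_VPAD) * (len(rows) + 1)


def wrapped_height(rows):
    """Count the wrapped lines of each value (the idea behind the removed helper)."""
    total = 0.0
    for _label, value in rows:
        n_lines = max(len(textwrap.wrap(str(value), VAL_CHARS)), 1)
        total += LINE_H * n_lines + ROW_VPAD
    return total


rows = [("source", "x" * 10), ("notes", "word " * 60)]
print(simple_height(rows) < wrapped_height(rows))
```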
@@ -625,55 +625,6 @@ def _measure_figure_like(block) -> float:
    return target_h + 0.05 + cap_h + _GAP


-def _measure_kv_table(block) -> float:
-    """Faithful KVTable height — matches ``_place_kv_table`` (rendered as a
-    Campo/Valor data table with wrapped cells). The previous estimate assumed one
-    line per row and ignored the title, so a keep-together Group under-budgeted
-    the figure and the chart spilled to the next slide. Keep in sync."""
-    h = 0.0
-    title = getattr(block, "title", None)
-    if title:
-        h += _measure_heading_text(title, 2)
-    rows = getattr(block, "rows", []) or []
-    data_rows = []
-    for row in rows:
-        try:
-            label, value = row[0], row[1]
-        except Exception:  # noqa: BLE001
-            label, value = str(row), ""
-        data_rows.append([model._safe_str(label), model._safe_str(value)])
-    header = ["Campo", "Valor"]
-    widths = _col_widths(header, data_rows)
-    fs = _FS_CELL
-    h += _row_height_in(header, widths, fs)
-    for r in data_rows:
-        h += _row_height_in(r, widths, fs)
-    return h + _GAP
-
-
-def _measure_data_table(block) -> float:
-    """Faithful DataTable height — matches ``_place_data_table`` (title heading +
-    wrapped header + every wrapped row + optional note). Keep in sync."""
-    h = 0.0
-    title = getattr(block, "title", None)
-    if title:
-        h += _measure_heading_text(title, 2)
-    header = list(getattr(block, "header", []) or [])
-    rows = list(getattr(block, "rows", []) or [])
-    fs = _FS_CELL
-    widths = _col_widths(header, rows)
-    if header:
-        h += _row_height_in(header, widths, fs)
-    for r in rows:
-        h += _row_height_in(r, widths, fs)
-    note = getattr(block, "note", None)
-    if note:
-        nlines = tl.wrap(model._safe_str(note),
-                         tl.chars_per_line(_USABLE_W, _FS_NOTE))
-        h += tl.line_height_in(_FS_NOTE) * len(nlines) + 0.05
-    return h + _GAP
-
-
 def _measure_block(st: _PptxState, block) -> float:
    kind = getattr(block, "kind", "")
    try:
@@ -688,10 +639,9 @@ def _measure_block(st: _PptxState, block) -> float:
            lines = tl.wrap(getattr(block, "text", ""),
                            tl.chars_per_line(_USABLE_W, _FS_NOTE))
            return tl.line_height_in(_FS_NOTE) * len(lines) + 0.05 + _GAP
-        if kind == "kv_table":
-            return _measure_kv_table(block)
-        if kind == "data_table":
-            return _measure_data_table(block)
+        if kind in ("kv_table", "data_table"):
+            rows = getattr(block, "rows", []) or []
+            return (tl.line_height_in(_FS_CELL) + 0.10) * (len(rows) + 1) + _GAP
        if kind == "group":
            return sum(_measure_block(st, b)
                       for b in (getattr(block, "blocks", []) or []))
@@ -714,14 +664,10 @@ def _shrink_group_figures(st: _PptxState, blocks: list, avail_full: float) -> No
                   if getattr(b, "kind", "") not in ("figure", "image"))
    fig_overhead = tl.line_height_in(_FS_NOTE) + 0.05 + 0.05 + _GAP
    budget = avail_full - nonfig_h - 0.10 * len(fig_blocks)
-    # Low thresholds: a 16:9 slide is short, so a content-heavy column (cardinality
-    # table + top-k + chart) only fits if the chart is allowed to shrink small.
-    # Prefer a small-but-present chart on the SAME slide over splitting the column
-    # across slides (matches the PDF renderer's keep-together philosophy).
-    if budget <= 0.6:
+    if budget <= 1.0:
        return  # not enough room to keep together; let it flow (degrade).
    per = budget / len(fig_blocks) - fig_overhead
-    if per <= 0.35:
+    if per <= 0.8:
        return
    for fb in fig_blocks:
        cur = getattr(fb, "height_in", None)
@@ -729,90 +675,12 @@ def _shrink_group_figures(st: _PptxState, blocks: list, avail_full: float) -> No
                        if isinstance(cur, (int, float)) and cur > 0 else per)


-# Minimum height (inches) reserved for a figure inside a keep-together group on
-# the short 16:9 slide. When a high-cardinality column's table(s) would otherwise
-# leave no room, the data table is trimmed (with an honest note) so the chart
-# stays on the SAME slide next to its table instead of spilling to the next one.
-_GROUP_MIN_FIG_H = 1.3
-
-
-def _trim_data_table_to_budget(block, budget: float):
-    """Return a copy of a DataTable whose rows fit within ``budget`` inches.
-
-    Keeps the title, header, as many leading rows as fit (at least one) and an
-    honest note reporting how many of the original rows are shown. NEVER mutates
-    the original block — the same Chapter blocks are rendered by the PDF renderer,
-    which keeps the full table (an A5 page fits it)."""
-    header = list(getattr(block, "header", []) or [])
-    rows = list(getattr(block, "rows", []) or [])
-    title = getattr(block, "title", None)
-    fs = _FS_CELL
-    widths = _col_widths(header, rows)
-    fixed = 0.0
-    if title:
-        fixed += _measure_heading_text(title, 2)
-    if header:
-        fixed += _row_height_in(header, widths, fs)
-    note_h = tl.line_height_in(_FS_NOTE) + 0.05
-    avail_rows = budget - fixed - note_h - _GAP
-    kept = []
-    used = 0.0
-    for r in rows:
-        rh = _row_height_in(r, widths, fs)
-        if used + rh > avail_rows and kept:
-            break
-        kept.append(r)
-        used += rh
-    if len(kept) >= len(rows):
-        return block  # already fits; keep the original (with its own note).
-    note = (f"top {len(kept)} de {len(rows)} categorías mostradas "
-            "(recortado para caber en el slide; el PDF muestra más)")
-    return model.DataTable(header=header, rows=kept, title=title, note=note)
-
-
-def _fit_group_blocks(st: _PptxState, blocks: list, avail_full: float) -> list:
-    """Return a slide-fitting copy of a keep-together group's blocks.
-
-    On the short 16:9 slide a high-cardinality column's top-k table plus its
-    chart can overflow. Reserve ``_GROUP_MIN_FIG_H`` for the (later shrunk) figure
-    and trim the data table(s) to what is left, so every column keeps its chart
-    next to its table on ONE slide. No-op when the group has no figure+table pair
-    (e.g. id-like columns already drop the top-k upstream, or it already fits)."""
-    has_fig = any(getattr(b, "kind", "") in ("figure", "image") for b in blocks)
-    tbls = [b for b in blocks if getattr(b, "kind", "") == "data_table"]
-    if not (has_fig and tbls):
-        return blocks
-    fixed_h = sum(_measure_block(st, b) for b in blocks
-                  if getattr(b, "kind", "") not in ("figure", "image",
-                                                    "data_table"))
-    tables_h = sum(_measure_block(st, b) for b in tbls)
-    budget_tables = avail_full - fixed_h - _GROUP_MIN_FIG_H
-    if tables_h <= budget_tables:
-        return blocks  # already fits next to a min-height figure; leave intact.
-    out = []
-    for b in blocks:
-        if getattr(b, "kind", "") != "data_table":
-            out.append(b)
-            continue
-        trimmed = _trim_data_table_to_budget(b, max(budget_tables, 0.8))
-        out.append(trimmed)
-        budget_tables -= _measure_data_table(trimmed)
-    return out
-
-
 def _place_group(st: _PptxState, block) -> None:
    """Render a keep-together Group: move it whole to the next slide if needed."""
    blocks = getattr(block, "blocks", []) or []
    if not blocks:
        return
-    # Opt-in slide break: start this group on a fresh slide unless the current one
-    # is still empty (so a chapter can give each unit its own slide).
-    if getattr(block, "page_break_before", False) and st.y > _CONTENT_TOP + 1e-6:
-        _new_slide(st, cont=True)
    avail_full = _CONTENT_BOTTOM - _CONTENT_TOP
-    # Trim oversized tables first (keeps the chart on the same slide), then shrink
-    # the figure to share the remaining room.
-    blocks = _fit_group_blocks(st, blocks, avail_full)
    _shrink_group_figures(st, blocks, avail_full)
    total = sum(_measure_block(st, b) for b in blocks)
    if total <= avail_full:
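
The PPTX hunks above remove the step that trimmed an oversized data table so its chart could stay on the same slide. The greedy idea behind that step can be sketched as follows; `trim_rows_to_budget` is a hypothetical name, and row heights are passed in directly instead of being measured from wrapped cells.

```python
def trim_rows_to_budget(row_heights, budget):
    """Return how many leading rows fit within `budget`, keeping at least one."""
    kept, used = 0, 0.0
    for h in row_heights:
        if used + h > budget and kept:
            break  # the next row would overflow and we already keep one row.
        kept += 1
        used += h
    return kept


# Ten rows of 0.3 in each against a 1.0 in budget.
n_kept = trim_rows_to_budget([0.3] * 10, 1.0)
print(n_kept)
```

Keeping at least one row even when the budget is too small mirrors the removed helper's behavior of never emitting an empty table.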
@@ -0,0 +1,89 @@
+---
+name: render_automatic_eda_markdown
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def render_automatic_eda_markdown(chapters_or_profile, out_path: str, meta: dict = None) -> dict"
+description: "Renders a chapter-based AutomaticEDA document (format-independent block model) into a single self-contained MARKDOWN file meant to be PASTED INTO AN LLM. Accepts a list of model chapters or an eda-group TableProfile directly (it builds the canonical chapters with build_document). Prioritises TEXT + DATA over visuals: tables are dumped as markdown tables with ALL rows (no pagination, since there are no pages to cut), a matplotlib figure is reduced to its caption plus the underlying data table (Desde/Hasta/Frecuencia of the histogram bars) because an LLM cannot see the image, and glossary markers are stripped while **bold** is kept. Includes a header (# title), a metadata blockquote and a numbered index with GitHub anchors. Mirrors render_automatic_eda_pdf/render_automatic_eda_pptx but WITHOUT a manifest (KISS, the markdown is a single text artifact). dict-no-throw: never raises, returns {path, n_chars, chapters, note}; on a fatal error path is None and note explains the cause. The optional flag meta['embed_figures'] exports PNGs beside the .md (off by default)."
+tags: [eda, markdown, render, report, llm, automatic-eda, chapters, versioned, no-cut, text, datascience, python]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: [os, re, matplotlib, "datascience.automatic_eda"]
+params:
+  - name: chapters_or_profile
+    desc: "a list of AutomaticEDA model chapters (Chapter dataclasses or dicts {id,title,version,blocks}) OR an eda-group TableProfile dict. If it is a TableProfile, the canonical chapters are built with build_document(profile, meta['ctx']). Supported blocks: heading, markdown, kv_table, data_table, figure, image, caption, note, group, glossary_entry. Defensive reading: anything unrecognised degrades to a Note, never raises."
+  - name: out_path
+    desc: "path of the output .md file. Parent directories are created if missing. Non-writable directory → {path:None, note:<cause>} without raising."
+  - name: meta
+    desc: "optional dict. Keys: title (document title), ctx (dict with dataset_name→Dataset, source_origin→Fuente, storage→Almacenamiento, n_rows/n_cols→Dimensiones; also consumed by the chapter builders when a profile is given), generated_at (timestamp; if missing, a UTC timestamp is generated), embed_figures (True to export PNGs <basename>_figN.png beside the .md; default False, which keeps the markdown self-contained)."
+output: "dict (never raises): {path: str|None, n_chars: int, chapters: list[{id,version}], note: str}. On a fatal error (e.g. non-writable directory) path is None and note explains the cause. A document with no applicable chapters produces a minimal valid markdown with 'documento vacío' and chapters=[]."
+tested: true
+tests: ["test_golden_bloques_sinteticos_serializa_todo_a_markdown", "test_edge_documento_vacio_no_revienta", "test_profile_path_construye_capitulos_y_escribe"]
+test_file_path: "python/functions/datascience/render_automatic_eda_markdown_test.py"
+file_path: "python/functions/datascience/render_automatic_eda_markdown.py"
+---
+
+## Example
+
+```python
+from datascience import render_automatic_eda_markdown
+
+# From a TableProfile of the eda group (same model as the PDF/PPTX renderers).
+profile = {
+    "table": "ventas", "source": "/data/ventas.csv",
+    "n_rows": 1000, "n_cols": 2, "quality_score": 92.5,
+    "columns": [
+        {"name": "precio", "inferred_type": "numeric", "null_pct": 0.01,
+         "numeric": {"mean": 42.5, "median": 40.0, "min": 1.0, "max": 100.0,
+                     "std": 12.3}},
+        {"name": "categoria", "inferred_type": "categorical", "null_pct": 0.0,
+         "categorical": {"top": [{"value": "neumaticos", "count": 500}]}},
+    ],
+}
+res = render_automatic_eda_markdown(
+    profile, "reports/ventas_aeda.md",
+    {"title": "EDA — ventas",
+     "ctx": {"dataset_name": "Ventas", "source_origin": "ERP export",
+             "n_rows": 1000, "n_cols": 2}})
+print(res["path"], res["n_chars"], res["chapters"])
+# -> reports/ventas_aeda.md 4123 [{'id':'portada','version':'1.0.0'}, ...]
+```
+
+## When to use it
+
+When you want to **paste the EDA into an LLM** (ChatGPT, Claude, ...) or keep it as
+versionable plain text: the same chapter-based document as the PDF/PPTX, serialised to
+Markdown with no binaries. Use it as a third output next to `render_automatic_eda_pdf`
+(mobile) and `render_automatic_eda_pptx` (sharing), built from the SAME chapter model.
+Unlike those two, there are no pages or slides: every row of every table is dumped
+(nothing is cut) and each figure is reduced to its caption plus, when it has bars, the
+underlying data table, which is what an LLM can read. To add chapters to the document,
+see `docs/capabilities/automatic_eda.md`.
+
+## Gotchas
+
+- **Impure**: writes the `.md` to `out_path` (creating parent directories). With
+  `meta['embed_figures']=True` it also exports one PNG `<basename>_figN.png` per figure
+  next to the `.md`; by default it exports NOTHING and the markdown stays self-contained.
+- **Never raises** (dict-no-throw): a block that fails degrades to a note and is recorded
+  in `note`; the document is still written. An empty profile/list produces a valid
+  minimal markdown with `*(documento vacío …)*` and `chapters=[]`.
+- **Figures = data, not image**: a `figure` block is serialised as `*Figura: caption*`
+  plus, if the matplotlib figure has bars (histogram / bar chart), a
+  `| Desde | Hasta | Frecuencia |` table extracted from the `Rectangle` patches (max 100
+  rows; the rest is truncated with `*… (N filas más)*`). With no bars, or on any failure,
+  only the caption is emitted. The figure is closed (`plt.close`) after it is read.
+- **Glossary vs bold**: ONLY the glossary markers `[[term:key]]visible[[/term]]` are
+  removed (leaving `visible`); Markdown `**bold**` IS KEPT (it is valid). `strip_inline_md`
+  is not used here because it would also remove the bold.
+- **Index anchors**: `## Índice` links each chapter through a GitHub-style anchor of the
+  `## N. Título` heading (lowercase, spaces→`-`, punctuation dropped). Two chapters with
+  exactly the same title get colliding anchors (rare; the canonical chapters have
+  unique titles).
+- **Tables**: cells escape `|` (→ `\|`) and fold line breaks into `<br>` so the column
+  does not break. There is no width-based splitting, since an LLM does not need it.
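The text-level rules in the Gotchas above (glossary markers, table cells, index anchors) can be sketched with the standard library alone. This is an illustrative sketch, not the code in `render_md_impl`: the helper names `strip_glossary_markers`, `escape_cell` and `github_anchor` are hypothetical, and the anchor rule only approximates GitHub's behaviour.

```python
import re

# Hypothetical helper names; only the behaviour comes from the notes above.
_TERM_RE = re.compile(r"\[\[term:[^\]]*\]\](.*?)\[\[/term\]\]", re.DOTALL)


def strip_glossary_markers(text: str) -> str:
    """Drop [[term:key]]...[[/term]] wrappers; keep the visible text and any bold."""
    return _TERM_RE.sub(r"\1", text)


def escape_cell(value) -> str:
    """Make a value safe inside a Markdown table cell."""
    text = str(value).replace("|", "\\|")          # a bare pipe would split the column
    return text.replace("\r\n", "<br>").replace("\n", "<br>")


def github_anchor(heading: str) -> str:
    """Approximate a GitHub heading anchor: lowercase, no punctuation, spaces to '-'."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)           # \w is Unicode-aware, accents survive
    return re.sub(r"\s", "-", slug)


print(strip_glossary_markers("Texto con [[term:ent]]entropia[[/term]] y **bold** aqui."))
print(escape_cell("a|b\nc"))
print(github_anchor("1. Visión general"))
```

Because the anchor is derived only from the heading text, two identical headings would yield the same slug, which is the collision case the note mentions.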
@@ -0,0 +1,55 @@
+"""render_automatic_eda_markdown — chapter-based EDA report as one Markdown file.
+
+Public ``eda``-group entry point that serializes an AutomaticEDA document (a list
+of chapters, or an ``eda`` TableProfile from which the canonical chapters are
+built) into a single self-contained Markdown file optimised to be **pasted into
+an LLM**: plain text, Markdown tables (every row dumped — there are no pages to
+cut), figures reduced to caption + underlying data, no binaries. It mirrors
+``render_automatic_eda_pdf`` / ``render_automatic_eda_pptx`` but for text output;
+unlike those it writes no manifest (KISS — Markdown is a single text artefact).
+
+dict-no-throw: never raises. Returns ``{path, n_chars, chapters, note}``; on a
+fatal error ``path`` is None and ``note`` explains why.
+"""
+
+from __future__ import annotations
+
+from datascience.automatic_eda import build_document, render_md
+from datascience.automatic_eda.model import as_chapter, as_chapters
+
+
+def _coerce_chapters(chapters_or_profile, meta: dict) -> list:
+    """Accept chapters OR an eda profile and return a list of Chapter."""
+    arg = chapters_or_profile
+    if isinstance(arg, (list, tuple)):
+        return as_chapters(list(arg))
+    if isinstance(arg, dict):
+        if "blocks" in arg and "columns" not in arg:
+            ch = as_chapter(arg)
+            return [ch] if ch is not None else []
+        return build_document(arg, (meta or {}).get("ctx"))
+    return []
+
+
+def render_automatic_eda_markdown(chapters_or_profile, out_path: str,
+                                  meta: dict | None = None) -> dict:
+    """Render an AutomaticEDA document into a single self-contained Markdown file.
+
+    Args:
+        chapters_or_profile: a list of chapters (``Chapter`` dataclasses or
+            dicts) or an ``eda`` TableProfile dict (chapters built via
+            ``build_document(profile, meta['ctx'])``).
+        out_path: filesystem path for the ``.md`` (parent dirs are created).
+        meta: optional dict. Recognised keys: ``title``, ``ctx`` (dict with
+            ``dataset_name``/``source_origin``/``storage``/``n_rows``/``n_cols``),
+            ``generated_at``, ``embed_figures`` (export PNGs beside the .md,
+            default False — off keeps the Markdown self-contained).
+
+    Returns:
+        dict (never raises): ``{path: str|None, n_chars: int,
+        chapters: list[{id, version}], note: str}``. On a fatal error ``path`` is
+        None and ``note`` explains the cause.
+    """
+    meta = dict(meta or {})
+    chapters = _coerce_chapters(chapters_or_profile, meta)
+    return render_md(chapters, out_path, meta)
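The dict-no-throw contract described in the docstring above (parent directories created, failures reported through `path=None` and `note`) could look roughly like the following. It is a minimal sketch under assumptions: `write_markdown_safely` is a hypothetical name, and the real `render_md` is not shown in this diff and may handle more cases.

```python
import os


def write_markdown_safely(text: str, out_path: str) -> dict:
    """Hypothetical dict-no-throw writer: failures end up in the result, not in a raise."""
    result = {"path": None, "n_chars": 0, "note": ""}
    try:
        parent = os.path.dirname(os.path.abspath(out_path))
        os.makedirs(parent, exist_ok=True)         # create missing parent directories
        with open(out_path, "w", encoding="utf-8") as fh:
            fh.write(text)
        result["path"] = out_path
        result["n_chars"] = len(text)
    except OSError as exc:                         # e.g. a non-writable directory
        result["note"] = f"could not write {out_path}: {exc}"
    return result
```

A caller then checks `result["path"] is None` instead of wrapping the call in `try/except`, which is what lets the pipeline forward `note` as `md_note` without extra error handling.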
@@ -0,0 +1,168 @@
+"""Tests for render_automatic_eda_markdown — DoD: golden + edge + profile path.
+
+Self-contained synthetic blocks (no DuckDB). Verifies every block kind serializes
+to Markdown (heading, markdown with glossary+bold, kv/data tables, a figure whose
+histogram bars become a data table, caption, note, group, glossary entry), that a
+leading level-1 heading equal to the chapter title is omitted, that an empty
+document degrades to a valid minimal Markdown without raising, and that passing a
+minimal TableProfile builds chapters and writes the file.
+"""
+
+import os
+import tempfile
+
+from datascience.render_automatic_eda_markdown import render_automatic_eda_markdown
+from datascience.automatic_eda.model import (
+    Caption, Chapter, DataTable, Figure, GlossaryEntry, Group, Heading, KVTable,
+    Markdown, Note,
+)
+
+
+def _hist_fig():
+    import matplotlib
+    matplotlib.use("Agg")
+    import matplotlib.pyplot as plt
+    fig, ax = plt.subplots()
+    ax.hist([1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5, 5], bins=5)
+    return fig
+
+
+def _chapters() -> list:
+    blocks = [
+        Heading("Demo", 1),                       # == chapter title -> omitted.
+        Heading("Seccion dos", 2),                # -> ####
+        Markdown("Texto con [[term:ent]]entropia[[/term]] y **bold** aqui."),
+        KVTable(rows=[("Filas", 1000), ("Columnas", 5)], title="Resumen"),
+        DataTable(header=["col", "valor"],
+                  rows=[["alpha", "111"], ["beta", "222"], ["gamma", "333"]],
+                  title="Datos", note="nota inferior"),
+        Figure(make=_hist_fig, caption="Histograma demo"),
+        Caption("pie de figura"),
+        Note("una nota aparte"),
+        Group(title="Grupo X", blocks=[Markdown("dentro del grupo")]),
+        GlossaryEntry(key="ent", label="Entropia",
+                      definition="Medida de incertidumbre."),
+    ]
+    return [Chapter(id="demo", title="Demo", version="1.0.0", blocks=blocks)]
+
+
+def _read(path: str) -> str:
+    with open(path, "r", encoding="utf-8") as fh:
+        return fh.read()
+
+
+def test_golden_bloques_sinteticos_serializa_todo_a_markdown():
+    with tempfile.TemporaryDirectory() as d:
+        out = os.path.join(d, "demo.md")
+        res = render_automatic_eda_markdown(
+            _chapters(), out,
+            {"title": "EDA Demo",
+             "ctx": {"dataset_name": "Demo", "n_rows": 12, "n_cols": 2}})
+        assert res["path"] == out
+        assert os.path.exists(out)
+        assert res["n_chars"] > 0
+        assert res["chapters"] == [{"id": "demo", "version": "1.0.0"}]
+
+        content = _read(out)
+        # Document structure.
+        assert content.startswith("# ")
+        assert "## Índice" in content
+        # A Markdown table is present (header + separator row).
+        assert "| " in content and "| --- " in content
+        # DataTable values are all dumped.
+        for v in ("alpha", "111", "beta", "222", "gamma", "333"):
+            assert v in content
+        # Glossary markers stripped, bold kept.
+        assert "[[term" not in content
+        assert "[[/term]]" not in content
+        assert "**bold**" in content
+        assert "entropia" in content  # visible glossary text preserved.
+        # Figure histogram bars became a data table.
+        assert "| Desde | Hasta | Frecuencia |" in content
+        # Glossary entry rendered as a level-3 heading.
+        assert "### Entropia" in content
+        # Level-2 heading -> ####.
+        assert "#### Seccion dos" in content
+        # Leading level-1 heading equal to the title was omitted.
+        assert "### Demo" not in content
+        # Group title rendered.
+        assert "### Grupo X" in content
+
+
+def _hist_fig_with_span():
+    """Histogram with a wide ``axvspan`` (±1σ band) over it.
+
+    Reproduces the num_distr figure shape: matplotlib keeps the span as a lone
+    Rectangle in ``ax.patches`` alongside the bin bars; it must NOT leak into the
+    extracted bins table as a fake bin (here it is 2.5x wider than a bin)."""
+    import matplotlib
+    matplotlib.use("Agg")
+    import matplotlib.pyplot as plt
+    fig, ax = plt.subplots()
+    data = [1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5, 5]
+    ax.hist(data, bins=5)
+    ax.axvspan(2.0, 4.0, alpha=0.2)   # mean±σ band — a wide stray rectangle.
+    return fig
+
+
+def test_figura_descarta_axvspan_de_la_tabla_de_bins():
+    """The ±1σ band rectangle must not appear as a row in the bins table."""
+    blocks = [Figure(make=_hist_fig_with_span, caption="Hist con banda")]
+    chapters = [Chapter(id="f", title="Fig", version="1.0.0", blocks=blocks)]
+    with tempfile.TemporaryDirectory() as d:
+        out = os.path.join(d, "fig.md")
+        render_automatic_eda_markdown(chapters, out, {"title": "T"})
+        content = _read(out)
+        assert "| Desde | Hasta | Frecuencia |" in content
+        # Extract the rows of the bins table: lines between the header/separator
+        # and the next blank line.
+        lines = content.splitlines()
+        hi = next(i for i, ln in enumerate(lines)
+                  if ln.startswith("| Desde | Hasta | Frecuencia |"))
+        rows = []
+        for ln in lines[hi + 2:]:           # skip header + separator
+            if not ln.startswith("|"):
+                break
+            rows.append(ln)
+        # 5 histogram bins, no extra wide span row.
+        assert len(rows) == 5, rows
+        # No row spans a width of ~2.0 (the axvspan from x=2 to x=4).
+        for ln in rows:
+            cells = [c.strip() for c in ln.strip("|").split("|")]
+            lo, hi_v = float(cells[0]), float(cells[1])
+            assert (hi_v - lo) < 1.5, f"wide span leaked: {ln}"
+
+
+def test_edge_documento_vacio_no_revienta():
+    with tempfile.TemporaryDirectory() as d:
+        out = os.path.join(d, "empty.md")
+        res = render_automatic_eda_markdown([], out, {})
+        assert res["path"] == out
+        assert os.path.exists(out)
+        assert res["chapters"] == []
+        content = _read(out)
+        assert "documento vacío" in content
+        assert content.startswith("# ")
+
+
+def test_profile_path_construye_capitulos_y_escribe():
+    profile = {
+        "table": "mini",
+        "source": "/data/mini.csv",
+        "n_rows": 10,
+        "n_cols": 1,
+        "quality_score": 88.0,
+        "columns": [
+            {"name": "x", "inferred_type": "numeric", "null_pct": 0.0,
+             "null_count": 0,
+             "numeric": {"mean": 1.0, "median": 1.0, "min": 0.0, "max": 2.0,
+                         "std": 0.5}},
+        ],
+    }
+    with tempfile.TemporaryDirectory() as d:
+        out = os.path.join(d, "mini.md")
+        res = render_automatic_eda_markdown(
+            profile, out, {"title": "Mini", "ctx": {"dataset_name": "Mini"}})
+        assert res["path"] == out  # not None — no exception, file written.
+        assert os.path.exists(out)
+        assert res["n_chars"] > 0
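The width-based filter that the axvspan regression test guards can be illustrated without matplotlib by working on plain `(x0, width, height)` tuples, such as one might read from `ax.patches`. This is a sketch under assumptions: `bins_from_rectangles` and the 1.5 ratio are hypothetical, and the actual rule in `render_md_impl` is not shown in this diff.

```python
from statistics import median


def bins_from_rectangles(rects, max_width_ratio=1.5):
    """Turn (x0, width, height) rectangles into (lo, hi, count) rows.

    Rectangles much wider than the median width (such as an axvspan band) are
    dropped. The 1.5 ratio is an illustrative threshold, not the library's.
    """
    rects = [r for r in rects if r[1] > 0]         # ignore degenerate rectangles
    if not rects:
        return []
    typical = median(w for _, w, _ in rects)
    keep = [r for r in rects if r[1] <= typical * max_width_ratio]
    return [(x0, x0 + w, h) for x0, w, h in sorted(keep)]


# Five bins of width 0.8 (the test histogram) plus a band from x=2 to x=4.
bars = [(1.0, 0.8, 2), (1.8, 0.8, 3), (2.6, 0.8, 1), (3.4, 0.8, 2), (4.2, 0.8, 4)]
rows = bins_from_rectangles(bars + [(2.0, 2.0, 1.0)])
print(len(rows))  # 5
```

Using the median rather than the mean keeps the reference width stable even though the wide band is part of the input.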
@@ -1,9 +1,10 @@
-"""render_automatic_eda: full one-shot EDA, profile → ctx → PDF + PPTX.
+"""render_automatic_eda: full one-shot EDA, profile → ctx → PDF + PPTX + MD.

 Impure pipeline of the `eda` capability group. Given ONE DuckDB (or
-PostgreSQL) table, it produces the FULL AutomaticEDA report in both formats at
-once (mobile A5 PDF + 16:9 PPTX) with all 11 chapters POPULATED, in a single
-call. It composes, without reimplementing their logic, four registry functions:
+PostgreSQL) table, it produces the FULL AutomaticEDA report in all three formats
+at once (mobile A5 PDF + 16:9 PPTX + self-contained Markdown to paste into an
+LLM) with the chapters POPULATED, in a single call. It composes, without
+reimplementing their logic, several registry functions:

  - profile_table          : profiles the table end to end (aggregated TableProfile),
                             optionally with cheap models and time-series analysis.
@@ -12,8 +13,11 @@ call. It composes, without reimplementing their logic, four registry functions:
                             models/geo, timeseries_raw for series, geo_points
                             for the map, db_path/table for push-down
                             aggregation). Without it, those chapters degrade.
-  - render_automatic_eda_pdf  : renders the chapter-based document to PDF.
-  - render_automatic_eda_pptx : renders the same document to PPTX.
+  - render_automatic_eda_pdf      : renders the chapter-based document to PDF.
+  - render_automatic_eda_pptx     : renders the same document to PPTX.
+  - render_automatic_eda_markdown : serialises the same document to
+                                    self-contained Markdown (text + markdown
+                                    tables, no binaries) to feed to an LLM.

 The aggregated TableProfile suffices for cover/overview/distributions/quality/
 correlation, but the `modelos`, `timeseries`, `geospatial` and
@@ -32,6 +36,7 @@ from datetime import datetime, timezone

 from datascience import (
    build_eda_render_ctx,
+    render_automatic_eda_markdown,
    render_automatic_eda_pdf,
    render_automatic_eda_pptx,
    run_eda_models,
@@ -93,6 +98,7 @@ def render_automatic_eda(
    out_dir: str = "reports",
    basename: str = None,
    ctx_extra: dict = None,
+    emit_md: bool = True,
 ) -> dict:
    """Profiles a table and emits the full AutomaticEDA report (PDF + PPTX).

@@ -140,13 +146,19 @@ def render_automatic_eda(
        ctx_extra: optional dict with extra presentation/context keys that are
            merged into the ctx (e.g. dataset_name, description, source_origin).
            They do not override the data keys computed by build_eda_render_ctx.
+        emit_md: in addition to the PDF and the PPTX, emit a self-contained
+            Markdown of the SAME chapter-based document (plain text + markdown
+            tables, no binaries), meant to be pasted into an LLM. Default True. The
+            path is returned under the ``aeda_md_path`` key. Other outputs are unchanged.

    Returns:
        dict (never raises). On success::

            {"status": "ok", "pdf_path": str, "pptx_path": str,
-             "manifest_path": str|None, "n_pages": int, "n_slides": int,
-             "pdf_note": str, "pptx_note": str, "profile": <TableProfile>}
+             "aeda_md_path": str|None, "manifest_path": str|None,
+             "n_pages": int, "n_slides": int, "md_chars": int|None,
+             "pdf_note": str, "pptx_note": str, "md_note": str|None,
+             "profile": <TableProfile>}

        On error: {"status": "error", "error": str}.
    """
@@ -243,15 +255,26 @@ def render_automatic_eda(
        rpdf = render_automatic_eda_pdf(prof, pdf_path, meta) or {}
        rpptx = render_automatic_eda_pptx(prof, pptx_path, meta) or {}

+        # Self-contained Markdown output (same chapter-based document) to paste
+        # into an LLM. Additive: PDF/PPTX/manifest are unaffected. dict-no-throw.
+        rmd = {}
+        md_path = None
+        if emit_md:
+            md_path = os.path.join(out_dir, base + ".md")
+            rmd = render_automatic_eda_markdown(prof, md_path, meta) or {}
+
        return {
            "status": "ok",
            "pdf_path": rpdf.get("path"),
            "pptx_path": rpptx.get("path"),
+            "aeda_md_path": rmd.get("path"),
            "manifest_path": rpdf.get("manifest_path"),
            "n_pages": rpdf.get("n_pages"),
            "n_slides": rpptx.get("n_slides"),
+            "md_chars": rmd.get("n_chars"),
            "pdf_note": rpdf.get("note"),
            "pptx_note": rpptx.get("note"),
+            "md_note": rmd.get("note"),
            "profile": prof,
        }
    except Exception as e:  # noqa: BLE001 - dict-no-throw: degrade, never raise.