chore: auto-commit (1 file)

- logs/ardour_mcp_server.log

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
merge(eda): ch4/ch5 distributions: paragraphs moved to the glossary, LLM description + unit per column, donut -> bars, PPT side_by_side
2026-07-01 02:16:25 +02:00 · 2026-07-01 02:11:53 +02:00 · 2026-07-01 02:10:39 +02:00 · 2026-07-01 02:01:07 +02:00 · 2026-07-01 01:36:10 +02:00 · 2026-07-01 01:34:21 +02:00
20 changed files with 2767 additions and 148 deletions
@@ -41,12 +41,13 @@ reconocido se degrada a `Note`, nunca lanza).
 | `Heading(text, level=1)` | section title, `level` 1 (large) … 3 (small) | one or more lines in bold; level 1 gets an accent underline |
 | `Markdown(text)` | lightweight markdown text | see the subset below; **never cuts mid-line** |
 | `KVTable(rows, title=None)` | `rows = [(key, value), ...]` | 2-column label/value table; the value wraps |
-| `DataTable(header, rows, title=None, note=None)` | `header=[...]`, `rows=[[...],...]` | table with a header; **split by rows, repeating the header**; long cells wrap within their column |
+| `DataTable(header, rows, title=None, note=None)` | `header=[...]`, `rows=[[...],...]` | table with a header; **if it fits** as text it is split by rows, repeating the header; **if it does NOT fit** (too many columns) it is rasterized whole as a high-resolution image for zooming. See §11.4 |
 | `Figure(fig=None, make=None, caption=None, height_in=None)` | an already-built `matplotlib.figure.Figure` (`fig`) or a (lazy) callable `make()->Figure` | rasterized and scaled to fit whole (never cropped) |
 | `Image(path, caption=None, height_in=None)` | path to a PNG/JPG | scaled to fit whole |
 | `Caption(text)` / `Note(text)` | small auxiliary text | caption/note in gray; `Note` is also the fallback for unknown blocks |
-| `Group(blocks, title=None)` | **keep-together** unit: its blocks stay together | the renderer measures the whole group and moves it intact to the next page/slide if it does not fit; it shrinks the figure to leave room for the title+text. See §11 |
+| `Group(blocks, title=None, page_break_before=False, layout="stack")` | **keep-together** unit: its blocks stay together | the renderer measures the whole group and moves it intact to the next page/slide if it does not fit; it shrinks the figure to leave room for the title+text. `layout="side_by_side"` places table+figure in two columns (PPTX only). See §11 and §11.4 |
 | `GlossaryEntry(key, label, definition)` | a glossary entry (clickable target) | generated by the `glosario` chapter; registers its position as the target of the marked terms. See §11 |
+| `TocEntry(label, target_id)` | a **clickable index** entry on the cover | generated by the `portada` chapter; the renderer wires it as a jump to the start of the chapter whose `id` or `title` matches `target_id`. See §11.4 |

 `Figure`/`Image` accept `height_in` (a hint): the renderer **clamps** the figure to that maximum height (`Group` uses it to shrink the figure). Every figure scales so that its caption fits on the same page/slide; in PPTX the caption is **always** visible (if no `caption` is given, it falls back to the last heading or to "Figura").

@@ -397,6 +398,65 @@ cabecera con su fondo propio. Es automático en PDF y PPTX; el patrón se mantie
 when a long table is split and repeats its header (the row index is logical, not per
 page). Chapters need to do nothing.

+### 11.4 Global render quality: high DPI, wide table → image, figure beside the table, clickable index
+
+Four cross-cutting engine capabilities, **all automatic except `layout`** (which a
+chapter enables explicitly). They apply to PDF and PPTX except where noted.
+
+**(a) High DPI (automatic).** Every embedded figure/image is rasterized at **220 dpi**
+(the `_RASTER_DPI` constant in both renderers; in PDF it is also applied to the page's
+`savefig`, because matplotlib re-rasterizes each `imshow` when writing the page). Goal:
+zoom in on a phone and read detail (axes, cells) without pixelation. Text remains
+vector and selectable. Chapters need to do nothing.
+
+**(b) Wide table → high-resolution image (automatic).** When a `DataTable` has
+**too many columns to be legible as text** in the usable width (criterion
+`_table_fits_as_text`: the table does not fit when the minimum legible width per column
+× number of columns > usable width; in practice this triggers on `df.head`-style tables
+with many columns), instead of squeezing the columns until they are illegible, the table
+is drawn **whole as a high-resolution image** (function
+`render_table_as_figure_py_datascience`: shaded header + zebra) scaled to fit
+completely, so the reader can **zoom** in and read it without losing data.
+If the table **does fit**, it stays as selectable text (PDF) / a native table (PPTX).
+`KVTable`s (2 columns) always fit and stay as text. Chapters need to do nothing.
+
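The fit criterion described in (b) can be sketched as follows. This is only an illustration of the stated inequality, not the engine's code: the function names, the 0.6 in minimum column width and the widths in the usage note are assumptions, since the source gives no numeric values.

```python
def table_fits_as_text(n_cols: int, usable_width_in: float,
                       min_col_width_in: float = 0.6) -> bool:
    """Return True when every column can keep a minimum legible width.

    min_col_width_in is a hypothetical default, not a value from the source.
    """
    if n_cols <= 0:
        return True  # nothing to lay out, trivially fits
    # The table does NOT fit when min width x number of columns > usable width.
    return min_col_width_in * n_cols <= usable_width_in


def choose_table_mode(n_cols: int, usable_width_in: float) -> str:
    """Pick 'text' (selectable / native table) or 'image' (rasterized whole)."""
    return "text" if table_fits_as_text(n_cols, usable_width_in) else "image"
```

Under these assumed numbers, a 2-column table on a 5 in usable width stays as text, while a 20-column `df.head`-style table is routed to the image path.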
+**(c) Figure beside the table: `Group(layout="side_by_side")`.** A layout hint that a
+chapter enables so that its **table sits on the left and its figure on the right** of the
+same slide, instead of stacked:
+
+```python
+model.Group(
+    layout="side_by_side",
+    blocks=[
+        model.Heading(text=str(name), level=2),       # full width, on top
+        model.DataTable(header=..., rows=...),         # LEFT column (~55%)
+        model.Figure(make=_grafico_perezoso(...)),     # RIGHT column (~45%)
+        model.Markdown(text="explicación…"),           # full width, below
+    ])
+```
+
+Exact contract of the field:
+
+| Field | Value | Effect |
+|---|---|---|
+| `layout` | `"stack"` (default) | historical behavior: vertical stacking (keep-together). |
+| `layout` | `"side_by_side"` | **PPTX**: the table (rasterized to an image) takes the left column (~55% of the usable width) and the figure the right one (~45%); any other block (heading, markdown) goes full width above/below. If there is no table+figure pair, or they do not fit side by side on one slide, it **falls back automatically to stacked**. **PDF**: treated **the same as `stack`** (the mobile A5 width cannot fit two legible columns). Unknown values degrade to `"stack"`. |
+
+It is **backward compatible**: a `Group` without `layout` (or with `layout="stack"`) behaves
+exactly as before. The `cat_distr` chapter is the intended consumer (chart to the right of
+the category table in PPT); this engine only provides the support.
+
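The degradation rules in the contract table can be summarized in a small decision function. It is a sketch under assumptions: `effective_layout` and its boolean parameters are hypothetical names, and the real renderers may take these decisions in a different place.

```python
KNOWN_LAYOUTS = {"stack", "side_by_side"}


def effective_layout(layout: str, target: str,
                     has_table_figure_pair: bool,
                     fits_side_by_side: bool) -> str:
    """Resolve the layout a renderer would actually apply to a Group.

    target is "pdf" or "pptx" (hypothetical parameter for this sketch).
    """
    if layout not in KNOWN_LAYOUTS:
        return "stack"  # unknown values degrade to "stack"
    if layout == "side_by_side":
        if target != "pptx":
            return "stack"  # PDF treats side_by_side exactly like stack
        if not (has_table_figure_pair and fits_side_by_side):
            return "stack"  # no pair, or no room on one slide: fall back
    return layout
```

Only the combination PPTX + table/figure pair + enough room yields two columns; every other path ends in the historical stacked behavior, which is what keeps the field backward compatible.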
+**(d) Clickable index on the cover: `TocEntry`.** The cover emits a `Heading("Índice")`
+followed by one `TocEntry(label, target_id)` per chapter. The renderer records the
+start page/slide of **each** chapter (indexed by `id` **and** by `title`) and wires
+each `TocEntry` as a real jump to that start: in **PDF** via
+`add_pdf_internal_links_py_datascience` (PyMuPDF GOTO link), in **PPTX** via
+`pptx_link_run_to_slide_py_datascience` (native slide jump). Because the cover only knows
+the chapters' **titles**, `target_id` is matched against the target's `title` (or
+`id`). If a target does not resolve, the entry is still shown as text
+(in link color) and is never cut. It is the same mechanism as the glossary's clickable
+terms (§11.1), reused in the cover → chapter direction.
+
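The target resolution described in (d) amounts to a lookup in an index keyed by both `id` and `title`. The sketch below assumes plain tuples instead of the engine's `Chapter`/`TocEntry` objects; `build_chapter_index` and `resolve_toc` are hypothetical helpers, and the example chapter data is invented for illustration.

```python
def build_chapter_index(chapter_starts):
    """Map each chapter's id AND title to its start page/slide.

    chapter_starts: iterable of (chapter_id, title, start_position) tuples.
    """
    index = {}
    for chapter_id, title, start in chapter_starts:
        for key in (chapter_id, title):
            if key:
                # First registration wins if an id and a title collide.
                index.setdefault(str(key), start)
    return index


def resolve_toc(entries, index):
    """Return (label, start or None) per entry; None means 'render as plain text'."""
    return [(label, index.get(str(target_id))) for label, target_id in entries]


chapters = [("cat_distr", "Distribuciones categóricas", 7),
            ("glosario", "Glosario", 12)]
index = build_chapter_index(chapters)
links = resolve_toc([("Distribuciones categóricas", "Distribuciones categóricas"),
                     ("Glosario", "glosario"),
                     ("Anexo", "anexo")], index)
```

An unresolved target (the `"Anexo"` entry here) maps to `None`, matching the documented behavior of showing the entry as text rather than dropping it.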
 ---

 ## 10. Future integration with `profile_table` (next phase)
@@ -29,6 +29,7 @@ from .model import (  # noqa: F401
    KVTable,
    Markdown,
    Note,
+    TocEntry,
    as_blocks,
    as_chapters,
    merge_manifest,
@@ -52,6 +53,7 @@ __all__ = [
    "Group",
    "GlossaryEntry",
    "GlossaryCollector",
+    "TocEntry",
    "Chapter",
    "as_blocks",
    "as_chapters",
@@ -5,28 +5,32 @@ page (PDF) / slide (PPTX)**: every column is wrapped in a keep-together
 ``model.Group`` with ``page_break_before=True`` (except the first, which may share
 the intro's page), so its chart sits next to its tables and no column is split.

-A short intro names the clickable **[[term:entropia]]entropía[[/term]]** term —
-the full definition lives in the GLOSARIO chapter, so it is NOT repeated inline
-here (one click jumps to the glossary entry). The intro also carries the dataset
-row total used as a comparison baseline.
+Per column the Group is laid out ``side_by_side`` (PPTX: cardinality table LEFT,
+chart RIGHT; PDF: stacked) and contains, in order:

-Per column the Group contains, in order:
-
-1. A cardinality key/value table: distinct values, ``% distinct`` (distinct /
+1. The column name plus, when the LLM layer ran, its business **description** and
+   **unit** (read from ``profile['llm']['dictionary']``, matched by column name).
+2. A cardinality key/value table: distinct values, ``% distinct`` (distinct /
   total rows), total dataset rows, singleton values (frequency 1), entropy with
   its theoretical maximum and the normalized ratio, mode, imbalance and
   string-length stats.
-2. A short note flagging problematic cardinality (id-like ≈100% distinct, or a
+3. A short note flagging problematic cardinality (id-like ≈100% distinct, or a
   single dominating category).
-3. A ``top-k`` table (value / count / %).
-4. A **donut pie chart** of the most common categories (top-k + an "Otros"
+4. A ``top-k`` table (value / count / %).
+5. A **horizontal bar chart** of the most common categories (top-k + an "Otros"
   bucket), drawn lazily so the renderers scale it to fit entirely.

+A short intro names the clickable **[[term:entropia]]entropía[[/term]]** and
+**[[term:pagina_categorica]]page-layout[[/term]]** terms — their full
+definitions live in the GLOSARIO chapter, so they are NOT repeated inline here
+(one click jumps to the glossary entry). The dataset row total is not repeated in
+the intro: each column's cardinality table already carries it.
+
 Data comes from the ``eda`` group: each ``columns[i]['categorical']`` is the
 output of ``summarize_categorical`` (``top[{value,count,pct}]``, ``mode``,
 ``n_distinct``, ``entropy``, ``imbalance``, ``len_min/mean/max``). The derived
-cardinality metrics and the pie figure are delegated to two registry functions
-(``categorical_cardinality_block`` and ``categorical_top_pie_figure``); both are
+cardinality metrics and the bar figure are delegated to two registry functions
+(``categorical_cardinality_block`` and ``categorical_top_bar_figure``); both are
 imported lazily and degrade to a minimal inline fallback so this chapter never
 raises even if they are unavailable.

@@ -39,10 +43,21 @@ import math

 from .. import model

-CHAPTER_VERSION = "1.2.0"
+CHAPTER_VERSION = "1.3.0"
 CHAPTER_ID = "cat_distr"
 CHAPTER_TITLE = "Distribuciones categóricas"

+# Key under which eda_llm_insights stores its interpretive block in the profile.
+LLM_KEY = "llm"
+
+# Second glossary term this chapter names: "how each categorical page is laid
+# out". The long paragraph that used to describe it inline in the intro now lives
+# in the GLOSARIO chapter (canonical definition in ``glosario._BASELINE_TERMS``);
+# the intro only names the clickable term, relocating the explanation, not losing
+# it. The chapter only needs to register key+label here.
+_TERM_PAGINA_KEY = "pagina_categorica"
+_TERM_PAGINA_LABEL = "Cómo se organiza cada página categórica"
+
 # Glossary term this chapter explains. Registered in the shared collector and
 # marked clickable on its first appearance (end-to-end glossary example —
 # mejora 6). Other chapters hook their own terms the same way (see the contract).
@@ -59,14 +74,14 @@ _TERM_ENTROPIA_DEF = (
 # Cap the number of categorical columns rendered to keep the document bounded;
 # the rest are summarized in a closing note (no silent truncation).
 MAX_COLS = 40
-# Rows shown in each top-k table and explicit slices in the pie. Kept moderate so
-# the whole column — cardinality table + top-k table + donut — fits on ONE
+# Rows shown in each top-k table and explicit bars in the chart. Kept moderate so
+# the whole column — cardinality table + top-k table + bar chart — fits on ONE
 # page/slide with the chart next to its tables; the table note still reports
 # "top N of M" so nothing is silently hidden. For id-like columns (≈100%
 # distinct) the top-k table is dropped entirely (it would be a list of unique
-# values — pure noise), which also frees the room the donut needs (see build).
+# values — pure noise), which also frees the room the chart needs (see build).
 TOP_TABLE_ROWS = 8
-PIE_TOP_K = 6
+CHART_TOP_K = 6
 # Truncate very long category labels in tables (the renderer also wraps). Kept
 # tight so a column with long id-like values (names, tickets) still fits its page.
 LABEL_MAX = 28
@@ -208,26 +223,74 @@ def _fallback_cardinality(cat: dict, n_rows) -> dict:
    }


-def _pie_make(top, n_distinct, title, n_rows):
-    """Return a zero-arg callable that builds the donut figure lazily."""
+def _llm_index(profile: dict, ctx: dict) -> dict:
+    """Map column name -> its LLM dictionary entry (description/unit/...).
+
+    Reads the ``llm.dictionary`` list that ``eda_llm_insights`` stored in the
+    profile (``profile['llm']``; falls back to ``ctx['llm']``). Returns an empty
+    dict when ``run_llm`` did not run, so the caller degrades cleanly. Fully
+    defensive: never raises on malformed input.
+    """
+    llm = profile.get(LLM_KEY)
+    if not isinstance(llm, dict):
+        llm = ctx.get(LLM_KEY)
+    if not isinstance(llm, dict):
+        return {}
+    entries = llm.get("dictionary")
+    if not isinstance(entries, (list, tuple)):
+        return {}
+    index: dict = {}
+    for e in entries:
+        if not isinstance(e, dict):
+            continue
+        col = e.get("column")
+        if col is None:
+            continue
+        index[model._safe_str(col)] = e
+    return index
+
+
+def _llm_desc_unit_block(name: str, llm_index: dict):
+    """Markdown block with the LLM business description + unit of a column, or
+    None when no LLM entry matches the column (clean fallback without LLM)."""
+    entry = llm_index.get(model._safe_str(name))
+    if not isinstance(entry, dict):
+        return None
+    raw_desc = entry.get("description") or entry.get("business_meaning")
+    desc = " ".join(model._safe_str(raw_desc).split()) if raw_desc else ""
+    raw_unit = entry.get("unit")
+    unit = " ".join(model._safe_str(raw_unit).split()) if raw_unit else ""
+    parts = []
+    if desc:
+        parts.append(f"**Descripción:** {desc}")
+    if unit:
+        parts.append(f"**Unidad:** {unit}")
+    if not parts:
+        return None
+    return model.Markdown(text=" · ".join(parts))
+
+
+def _bar_make(top, n_distinct, title, n_rows):
+    """Return a zero-arg callable that builds the bar figure lazily."""

    def make():
        try:
-            from datascience.categorical_top_pie_figure import (
-                categorical_top_pie_figure,
+            from datascience.categorical_top_bar_figure import (
+                categorical_top_bar_figure,
            )

-            return categorical_top_pie_figure(
+            return categorical_top_bar_figure(
                top=top, n_distinct=n_distinct or 0, title=title,
-                top_k=PIE_TOP_K, n_rows=n_rows)
+                top_k=CHART_TOP_K, n_rows=n_rows)
        except Exception:  # noqa: BLE001 — minimal local fallback figure.
-            return _fallback_pie(top, title)
+            return _fallback_bar(top, title)

    return make


-def _fallback_pie(top, title):
-    """Minimal donut figure used only if the registry function is unavailable."""
+def _fallback_bar(top, title):
+    """Minimal horizontal-bar figure used only if the registry function is
+    unavailable. Largest category on top, the rest folded into "Otros"."""
    import matplotlib

    matplotlib.use("Agg")
@@ -238,8 +301,8 @@ def _fallback_pie(top, title):
    items = [t for t in (top or [])
             if isinstance(t, dict) and isinstance(t.get("count"), (int, float))]
    items = sorted(items, key=lambda t: t.get("count") or 0, reverse=True)
-    head = items[:PIE_TOP_K]
-    rest = items[PIE_TOP_K:]
+    head = items[:CHART_TOP_K]
+    rest = items[CHART_TOP_K:]
    labels = [_truncate(t.get("value"), 20) for t in head]
    sizes = [float(t.get("count") or 0) for t in head]
    if rest:
@@ -249,10 +312,13 @@ def _fallback_pie(top, title):
        ax.text(0.5, 0.5, "sin datos categóricos", ha="center", va="center")
        ax.axis("off")
        return fig
-    ax.pie(sizes, labels=None, wedgeprops={"width": 0.42},
-           autopct=lambda p: f"{p:.0f}%" if p >= 4 else "")
-    ax.legend(labels, loc="center left", bbox_to_anchor=(1.0, 0.5),
-              fontsize=7, frameon=False)
+    # barh draws bottom-up, so reverse to put the largest category on top.
+    y_pos = range(len(labels))
+    ax.barh(list(y_pos), list(reversed(sizes)), color="#4C72B0",
+            edgecolor="white")
+    ax.set_yticks(list(y_pos))
+    ax.set_yticklabels(list(reversed(labels)), fontsize=7)
+    ax.set_xlabel("conteo", fontsize=8)
    ax.set_title(_truncate(title, 40))
    fig.tight_layout()
    return fig
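The data preparation behind the fallback bar chart (sort by count, keep the top-k, fold the remainder into "Otros", then reverse for `barh`) can be sketched without matplotlib. The hunk above elides the body of `if rest:`, so summing the remaining counts into a single "Otros" entry is an assumption based on the docstring; `fold_top_k` and `barh_order` are hypothetical names.

```python
def fold_top_k(top, k):
    """Return (labels, sizes) for the k most frequent values plus an 'Otros' bucket."""
    items = [t for t in (top or [])
             if isinstance(t, dict) and isinstance(t.get("count"), (int, float))]
    items = sorted(items, key=lambda t: t["count"], reverse=True)
    head, rest = items[:k], items[k:]
    labels = [str(t.get("value")) for t in head]
    sizes = [float(t["count"]) for t in head]
    if rest:
        # Assumption: the remainder is summed into one trailing bucket.
        labels.append("Otros")
        sizes.append(float(sum(t["count"] for t in rest)))
    return labels, sizes


def barh_order(labels, sizes):
    """barh draws from the bottom up, so reverse to put the largest bar on top."""
    return list(reversed(labels)), list(reversed(sizes))


labels, sizes = fold_top_k(
    [{"value": "a", "count": 5}, {"value": "b", "count": 9},
     {"value": "c", "count": 1}, {"value": "d", "count": 2}], k=2)
y_labels, y_sizes = barh_order(labels, sizes)
```

With this ordering the "Otros" bucket ends up on the bottom row and the largest category on the top row of the chart.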
@@ -373,22 +439,17 @@ def _topk_table(cat: dict):
                           note=note)


-def _intro_blocks(n_rows, mark_term: bool = False):
-    total = _fmt_int(n_rows)
-    # Mark the first appearance of the term as a clickable glossary jump when the
-    # term was registered (mark_term). The full definition of entropy lives in the
-    # GLOSARIO chapter, so the intro only names the clickable term here instead of
-    # repeating the long explanation (avoids the redundancy with the glossary).
+def _intro_blocks(mark_term: bool = False):
+    # The full explanation of entropy AND of how each categorical page is laid out
+    # lives in the GLOSARIO chapter; the chapter body keeps only the minimal
+    # clickable terms — no descriptive prose — to avoid duplicating the glossary.
+    # The dataset row total is not repeated here: each column's cardinality table
+    # already carries "Total filas (dataset)".
    entropia = ("[[term:entropia]]entropía[[/term]]" if mark_term
                else "entropía")
-    text = (
-        f"Cada columna categórica ocupa su propia página: sus métricas de "
-        f"cardinalidad —incluida la {entropia}—, una nota que señala cardinalidad "
-        "problemática, la tabla de las categorías más frecuentes y un gráfico de "
-        "tarta (donut) de las más comunes, todo junto."
-    )
-    if n_rows is not None:
-        text += f" El dataset tiene {total} filas en total como referencia."
+    pagina = ("[[term:pagina_categorica]]cómo se organiza cada página[[/term]]"
+              if mark_term else "cómo se organiza cada página")
+    text = f"Términos: {entropia} · {pagina}."
    return [
        model.Heading(text="Entropía y cardinalidad", level=2),
        model.Markdown(text=text),
@@ -406,15 +467,22 @@ def build_cat_distr(profile: dict, ctx: dict):
        return None

    n_rows = profile.get("n_rows")
-    # Register "entropía" in the shared glossary collector (if present) and mark
-    # its first appearance clickable. End-to-end glossary example (mejora 6).
+    # Register "entropía" and the "how each categorical page is laid out" term in
+    # the shared glossary collector (if present) and mark their first appearance
+    # clickable. End-to-end glossary example (mejora 6).
    glossary = ctx.get("glossary")
    mark_term = False
    if isinstance(glossary, model.GlossaryCollector):
        glossary.add(_TERM_ENTROPIA_KEY, _TERM_ENTROPIA_LABEL,
                     _TERM_ENTROPIA_DEF)
+        glossary.add(_TERM_PAGINA_KEY, _TERM_PAGINA_LABEL)
        mark_term = True
-    blocks = list(_intro_blocks(n_rows, mark_term=mark_term))
+    blocks = list(_intro_blocks(mark_term=mark_term))
+
+    # Business description + unit per column come from the LLM dictionary
+    # (profile['llm']['dictionary'], matched by column name); absent without
+    # run_llm, in which case the per-column description block is simply omitted.
+    llm_index = _llm_index(profile, ctx)

    rendered = cat_cols[:MAX_COLS]
    for idx, col in enumerate(rendered):
@@ -422,31 +490,36 @@ def build_cat_distr(profile: dict, ctx: dict):
        cat = col.get("categorical") or {}
        card = _normalize_card(_cardinality(cat, n_rows))

-        # One Group per categorical column: heading + cardinality table + flag
-        # note + top-k table + donut figure are kept together and the renderer
-        # starts each on a fresh page/slide (page_break_before) so every column
-        # gets its own page with its chart next to its tables. The first column
-        # may share the intro's page (no forced break) to avoid a near-empty page.
-        col_blocks = [
-            model.Heading(text=str(name), level=2),
-            _cardinality_block(card),
-        ]
+        # One Group per categorical column: heading + (optional) LLM description +
+        # cardinality table + flag note + top-k table + bar figure are kept
+        # together and the renderer starts each on a fresh page/slide
+        # (page_break_before) so every column gets its own page with its chart next
+        # to its tables. The first column may share the intro's page (no forced
+        # break) to avoid a near-empty page.
+        col_blocks = [model.Heading(text=str(name), level=2)]
+        desc_block = _llm_desc_unit_block(name, llm_index)
+        if desc_block is not None:
+            col_blocks.append(desc_block)
+        col_blocks.append(_cardinality_block(card))
        note = _flag_note(card)
        if note is not None:
            col_blocks.append(note)
        # For id-like columns (≈100% distinct) the top-k is a list of unique
        # values — pure noise; skip it (the flag note already explains why) and
-        # let the donut take that room so the whole column fits one page/slide.
+        # let the bar chart take that room so the whole column fits one page/slide.
        if not card.get("id_like"):
            topk = _topk_table(cat)
            if topk is not None:
                col_blocks.append(topk)
        col_blocks.append(model.Figure(
-            make=_pie_make(cat.get("top") or [], card.get("n_distinct"),
+            make=_bar_make(cat.get("top") or [], card.get("n_distinct"),
                           str(name), n_rows),
            caption=(f"Categorías más comunes de «{_truncate(name, 32)}» "
-                     "(donut: top-k + «Otros»)")))
-        blocks.append(model.Group(blocks=col_blocks,
+                     "(barras: top-k + «Otros»)")))
+        # layout="side_by_side": in PPTX the cardinality table goes to the LEFT and
+        # the bar chart to the RIGHT of the same slide; the PDF renderer stacks it
+        # (the A5 mobile page is too narrow for two readable columns).
+        blocks.append(model.Group(blocks=col_blocks, layout="side_by_side",
                                  page_break_before=(idx > 0)))

    if len(cat_cols) > len(rendered):
@@ -2,12 +2,14 @@

 Self-contained: builds synthetic TableProfiles (no DuckDB) so the suite is fast
 and deterministic. Verifies that ``build_cat_distr`` emits the blocks the user
-asked for (distinct/total/%-distinct/unique metrics, top-k table and a donut
+asked for (distinct/total/%-distinct/unique metrics, top-k table and a bar
 figure), that EACH categorical column is wrapped in its own keep-together
-``Group`` that starts on a fresh page/slide (one column per page, chart next to
-its tables), that the long entropy explanation is NOT repeated inline (it lives
-in the glossary — only the clickable term is kept), that the chapter renders
-inside the full document to both PDF and PPTX showing that content, that a
+``Group`` laid out ``side_by_side`` (PPTX: table left / bars right) that starts on
+a fresh page/slide (one column per page, chart next to its tables), that the LLM
+business description + unit are shown per column when the profile carries an LLM
+block, that the long entropy / page-layout explanations are NOT repeated inline
+(they live in the glossary — only the clickable terms are kept), that the chapter
+renders inside the full document to both PDF and PPTX showing that content, that a
 profile with no categorical columns yields ``None`` without raising, and that
 long labels / many columns are never cut in either output.
 """
@@ -116,6 +118,10 @@ def test_golden_build_cat_distr_emite_bloques_pedidos():
    assert "log2" not in md.text          # redundant explanation removed.
    assert "máxima diversidad" not in md.text

+    # The donut/pie is gone: the intro no longer mentions tarta/donut (the chart
+    # is now a bar chart; the long page-layout explanation moved to the glossary).
+    assert "donut" not in md.text and "tarta" not in md.text
+
    # Per-column blocks are wrapped in keep-together Groups: flatten to inspect.
    flat = _flatten(ch.blocks)
    kv = next(b for b in flat if isinstance(b, KVTable))
@@ -128,11 +134,13 @@ def test_golden_build_cat_distr_emite_bloques_pedidos():
    assert any("Entropía" in lbl for lbl in labels)
    assert "únicos" in values and "%" in values
    assert "bits" in values and "norm" in values   # entropy + max + normalized.
-    # Top-k table + pie figure.
+    # Top-k table + bar figure.
    dt = next(b for b in flat if isinstance(b, DataTable))
    assert dt.header == ["Valor", "Conteo", "%"]
    assert any("neumaticos" in str(cell) for row in dt.rows for cell in row)
    assert any(isinstance(b, Figure) for b in flat)
+    # Each per-column Group is laid out side_by_side (table left / bars right).
+    assert all(g.layout == "side_by_side" for g in _column_groups(ch))
    # id-like column flagged with a Note that also explains the top-k is dropped.
    idnote = next((b for b in flat
                   if isinstance(b, Note) and "identificador" in b.text), None)
@@ -140,9 +148,9 @@ def test_golden_build_cat_distr_emite_bloques_pedidos():
    assert "No se lista el top" in idnote.text


-def test_golden_idlike_omite_topk_y_conserva_donut():
+def test_golden_idlike_omite_topk_y_conserva_grafico():
    # The id-like column (uuid, 100% distinct) must NOT carry a top-k DataTable
-    # (it would be a list of unique values), but must still keep its donut Figure
+    # (it would be a list of unique values), but must still keep its bar Figure
    # and its cardinality table so it stays a full per-column page.
    ch = build_cat_distr(_profile(), {})
    groups = _column_groups(ch)
@@ -151,7 +159,7 @@ def test_golden_idlike_omite_topk_y_conserva_donut():
    kinds = [b.kind for b in uuid_group.blocks]
    assert "data_table" not in kinds      # top-k of unique values dropped.
    assert "kv_table" in kinds            # cardinality kept.
-    assert "figure" in kinds              # donut kept (chart per column).
+    assert "figure" in kinds              # bar chart kept (chart per column).
    # A non-id-like column keeps its top-k table.
    cat_group = next(g for g in groups
                     if any(getattr(b, "text", "") == "categoria"
@@ -205,7 +213,7 @@ def test_golden_render_pdf_una_pagina_por_columna():
        assert "Entrop" in txt
        assert "distintos" in txt
        assert "categoria" in txt and "neumaticos" in txt
-        assert "donut" in txt           # figure caption rendered as text.
+        assert "barras" in txt          # bar-chart caption rendered as text (PDF).
        assert "identificador" in txt   # id-like note rendered.


@@ -258,9 +266,11 @@ def _profile_high_card() -> dict:


 def test_golden_pptx_una_slide_por_columna_con_su_grafico():
-    """Each categorical column occupies EXACTLY ONE cat_distr slide that carries
-    BOTH its cardinality table and its donut figure (picture) — i.e. the chart is
-    never separated from its table, even for a high-cardinality column."""
+    """Each categorical column occupies EXACTLY ONE cat_distr slide that carries its
+    chart (picture) on the same slide: the chart is never separated from its column,
+    not even for a high-cardinality column. With the side_by_side layout the
+    table is rasterized to an image, so the check is made on the presence of a
+    picture (not on the table's text)."""
    from pptx.enum.shapes import MSO_SHAPE_TYPE

    prof = _profile_high_card()
@@ -272,7 +282,7 @@ def test_golden_pptx_una_slide_por_columna_con_su_grafico():
        prs = Presentation(out)

        # Per column: the cat_distr slides whose text mentions it, and whether the
-        # owning slide also has the donut caption + an actual picture shape.
+        # owning slide also carries an actual picture shape (its chart).
        slides_with_col = {n: [] for n in cat_names}
        owner_has_chart = {n: False for n in cat_names}
        for i, sl in enumerate(prs.slides):
@@ -288,15 +298,106 @@ def test_golden_pptx_una_slide_por_columna_con_su_grafico():
            for n in cat_names:
                if n in txt:
                    slides_with_col[n].append(i)
-                    has_table = "Cardinalidad" in txt or "distintos" in txt
-                    if has_pic and "donut" in txt and has_table:
+                    if has_pic:
                        owner_has_chart[n] = True

        for n in cat_names:
            # Exactly one slide carries the column (not split across slides).
            assert len(slides_with_col[n]) == 1, (n, slides_with_col[n])
-            # That single slide also holds its table AND its donut picture.
-            assert owner_has_chart[n], (n, "tabla y donut no están en el mismo slide")
+            # That single slide also holds its chart picture.
+            assert owner_has_chart[n], (n, "el gráfico no está en el slide de la columna")
+
+
+def test_golden_pptx_columna_side_by_side_tabla_izq_barra_der():
+    """With the side_by_side layout, a categorical column places its cardinality
+    table (image) in the left half and its bar chart (image) in the right half of
+    the SAME slide. Checks that at least one column ends up in two columns
+    (table-left / bars-right), evidence of side_by_side in PPTX."""
+    from pptx.enum.shapes import MSO_SHAPE_TYPE
+    from pptx.util import Inches
+
+    with tempfile.TemporaryDirectory() as d:
+        out = os.path.join(d, "eda.pptx")
+        render_automatic_eda_pptx(_profile(), out, {"title": "EDA"})
+        prs = Presentation(out)
+        centre = int(Inches(13.333 / 2.0))   # half of the 16:9 slide width.
+        two_col_slides = 0
+        for sl in prs.slides:
+            texts, lefts = [], []
+            for sh in sl.shapes:
+                if sh.has_text_frame:
+                    texts.append(sh.text_frame.text)
+                if (sh.shape_type == MSO_SHAPE_TYPE.PICTURE
+                        and sh.left is not None):
+                    lefts.append(sh.left)
+            txt = re.sub(r"\s+", " ", " ".join(texts))
+            if "Distribuciones categ" not in txt:
+                continue
+            # One picture starts in the left half, another in the right half.
+            if len(lefts) >= 2 and min(lefts) < centre and max(lefts) > centre:
+                two_col_slides += 1
+        assert two_col_slides >= 1, (
+            "ninguna columna quedó con tabla-izq / barras-der (side_by_side)")
+
+
+def _profile_with_llm() -> dict:
+    """The base profile plus an ``llm`` block (as eda_llm_insights would store it
+    with run_llm=True): a data dictionary with description/unit per column."""
+    prof = _profile()
+    prof["llm"] = {
+        "dictionary": [
+            {"column": "categoria",
+             "description": "Familia de producto del recambio",
+             "business_meaning": "Agrupa el catálogo por tipo de pieza",
+             "unit": "categoría"},
+            {"column": "uuid",
+             "description": "Identificador único de registro",
+             "unit": ""},
+        ],
+    }
+    return prof
+
+
+def test_llm_descripcion_y_unidad_por_columna():
+    # With an LLM dictionary, each categorical column whose name matches shows its
+    # business description and unit in a per-column markdown block.
+    ch = build_cat_distr(_profile_with_llm(), {})
+    groups = _column_groups(ch)
+    cat_group = next(g for g in groups
+                     if any(getattr(b, "text", "") == "categoria"
+                            for b in g.blocks))
+    md = " ".join(b.text for b in cat_group.blocks
+                  if getattr(b, "kind", "") == "markdown")
+    assert "Descripción" in md and "Familia de producto" in md
+    assert "Unidad" in md and "categoría" in md
+
+
+def test_edge_sin_llm_no_anade_descripcion():
+    # Without an LLM block the per-column description markdown is simply omitted;
+    # the column still renders its cardinality table and bar figure.
+    ch = build_cat_distr(_profile(), {})
+    for g in _column_groups(ch):
+        mds = [b.text for b in g.blocks if getattr(b, "kind", "") == "markdown"]
+        assert not any("Descripción" in t for t in mds)
+
+
+def test_pagina_categorica_clicable_y_definicion_en_glosario():
+    # The "how each categorical page is laid out" term is registered and marked
+    # clickable in the intro, and its full definition lands in the glossary
+    # chapter (canonical baseline catalog), not inline.
+    from datascience.automatic_eda.chapters.glosario import build_glosario
+
+    gc = GlossaryCollector()
+    ch = build_cat_distr(_profile(), {"glossary": gc})
+    md = next(b for b in ch.blocks if isinstance(b, Markdown))
+    assert "[[term:pagina_categorica]]" in md.text
+    assert gc.has("pagina_categorica")
+    glos = build_glosario(_profile(), {"glossary": gc})
+    entry = next(b for b in glos.blocks
+                 if getattr(b, "kind", "") == "glossary_entry"
+                 and b.key == "pagina_categorica")
+    assert "barras" in entry.definition
+    assert "identificador" in entry.definition


 def test_edge_sin_categoricas_devuelve_none():
@@ -17,10 +17,63 @@ from __future__ import annotations

 from .. import model

-CHAPTER_VERSION = "1.0.0"
+CHAPTER_VERSION = "1.1.0"
 CHAPTER_ID = "glosario"
 CHAPTER_TITLE = "Glosario"

+# Canonical definitions for cross-cutting terms — the "how to read it" entries
+# that do not belong to a single chapter. A chapter only needs to *register* the
+# term (``ctx['glossary'].add(key, label)``) and mark its in-text appearance with
+# ``[[term:key]]…[[/term]]``; this chapter supplies the full definition here when
+# the collector carries the term without one. Keeping the prose in a single place
+# avoids repeating a long paragraph inline in every chapter that names the term
+# (the explanation moved out of the NUM DISTR and CAT DISTR intros lives here).
+_BASELINE_TERMS = {
+    "histograma_boxplot": {
+        "label": "Cómo leer el histograma y el boxplot",
+        "definition": (
+            "Para cada columna numérica se muestra su histograma con tres líneas "
+            "de referencia: la media (línea roja discontinua), la mediana (línea "
+            "verde continua) y la banda ±1σ (zona sombreada que cubre una "
+            "desviación estándar a cada lado de la media). Debajo, alineado al "
+            "mismo eje horizontal, un boxplot de Tukey: la caja abarca del primer "
+            "al tercer cuartil (P25–P75), la línea interior es la mediana y los "
+            "bigotes llegan hasta 1,5·IQR; los puntos rojos señalan que hay "
+            "valores más allá de las vallas (posibles atípicos). Comparar la media "
+            "con la mediana revela la asimetría: si la media supera a la mediana la "
+            "cola larga cae hacia los valores altos (asimetría a la derecha), y al "
+            "revés hacia los bajos."),
+    },
+    "pagina_categorica": {
+        "label": "Cómo se organiza cada página categórica",
+        "definition": (
+            "Cada columna categórica ocupa su propia página: muestra sus métricas "
+            "de cardinalidad —incluida la entropía—, una nota que señala "
+            "cardinalidad problemática (columnas que se comportan como "
+            "identificador, con casi todos los valores distintos, o dominadas por "
+            "una sola categoría), la tabla de las categorías más frecuentes (top-k, "
+            "con su conteo y porcentaje) y un gráfico de barras de las categorías "
+            "más comunes (top-k más una barra «Otros» que agrupa la cola). El total "
+            "de filas del dataset se usa como referencia para interpretar los "
+            "conteos."),
+    },
+}
+
+
+def _resolve_term(term: dict) -> tuple:
+    """Return (label, definition) for a collected term, completing a missing
+    definition (and, if absent, the label) from the canonical baseline catalog."""
+    key = model._safe_str(term.get("key"))
+    label = model._safe_str(term.get("label"))
+    definition = model._safe_str(term.get("definition"))
+    base = _BASELINE_TERMS.get(key)
+    if base:
+        if not definition.strip():
+            definition = model._safe_str(base.get("definition"))
+        if not label.strip() or label == key:
+            label = model._safe_str(base.get("label")) or label
+    return label, definition
+

 def build_glosario(profile: dict, ctx: dict):
    """Build the glossary Chapter from the shared collector, or None if empty."""
@@ -36,12 +89,14 @@ def build_glosario(profile: dict, ctx: dict):
            "Cada término va resaltado en el texto y, al pulsarlo, salta a su "
            "definición en esta sección.")),
    ]
-    # One clickable destination per term, alphabetically by visible label.
+    # One clickable destination per term, alphabetically by visible label. A term
+    # registered without a definition is completed from the canonical baseline.
    for term in glossary.terms(by="label"):
+        label, definition = _resolve_term(term)
        blocks.append(model.GlossaryEntry(
            key=model._safe_str(term.get("key")),
-            label=model._safe_str(term.get("label")),
-            definition=model._safe_str(term.get("definition"))))
+            label=label,
+            definition=definition))

    return model.Chapter(id=CHAPTER_ID, title=CHAPTER_TITLE,
                         version=CHAPTER_VERSION, blocks=blocks)
@@ -35,10 +35,21 @@ try:
 except Exception:  # noqa: BLE001 — keep the chapter importable no matter what.
    build_boxplot_stats = None  # type: ignore[assignment]

-CHAPTER_VERSION = "1.2.0"
+CHAPTER_VERSION = "1.3.0"
 CHAPTER_ID = "num_distr"
 CHAPTER_TITLE = "Distribuciones numéricas"

+# Glossary term this chapter explains. The long "how to read the histogram and
+# the boxplot" paragraph used to live inline in the intro; it now lives in the
+# GLOSARIO chapter (canonical definition in ``glosario._BASELINE_TERMS``) and the
+# intro only names the clickable term — one click jumps to the full explanation,
+# so the information is relocated, not lost (glossary improvement).
+_TERM_HISTOBOX_KEY = "histograma_boxplot"
+_TERM_HISTOBOX_LABEL = "Cómo leer el histograma y el boxplot"
+
+# Key under which eda_llm_insights stores its interpretive block in the profile.
+LLM_KEY = "llm"
+
 # Plain-Spanish gloss for every label ``detect_distribution_type`` can emit, so a
 # non-expert reader understands the shape and the suggested next step (MUST-4.3).
 _DIST_GLOSS = {
@@ -99,6 +110,53 @@ def _numeric_columns(profile: dict) -> list:
    return out


+def _llm_index(profile: dict, ctx: dict) -> dict:
+    """Map column name -> its LLM dictionary entry (description/unit/...).
+
+    Reads the ``llm.dictionary`` list that ``eda_llm_insights`` stored in the
+    profile (``profile['llm']``; falls back to ``ctx['llm']``). Returns an empty
+    dict when ``run_llm`` did not run, so the caller degrades cleanly. Fully
+    defensive: never raises on malformed input.
+    """
+    llm = profile.get(LLM_KEY) if isinstance(profile, dict) else None
+    if not isinstance(llm, dict):
+        llm = ctx.get(LLM_KEY) if isinstance(ctx, dict) else None
+    if not isinstance(llm, dict):
+        return {}
+    entries = llm.get("dictionary")
+    if not isinstance(entries, (list, tuple)):
+        return {}
+    index: dict = {}
+    for e in entries:
+        if not isinstance(e, dict):
+            continue
+        col = e.get("column")
+        if col is None:
+            continue
+        index[model._safe_str(col)] = e
+    return index
+
+
+def _llm_desc_unit_block(name: str, llm_index: dict):
+    """Markdown block with the LLM business description + unit of a column, or
+    None when no LLM entry matches the column (clean fallback without LLM)."""
+    entry = llm_index.get(model._safe_str(name))
+    if not isinstance(entry, dict):
+        return None
+    raw_desc = entry.get("description") or entry.get("business_meaning")
+    desc = " ".join(model._safe_str(raw_desc).split()) if raw_desc else ""
+    raw_unit = entry.get("unit")
+    unit = " ".join(model._safe_str(raw_unit).split()) if raw_unit else ""
+    parts = []
+    if desc:
+        parts.append(f"**Descripción:** {desc}")
+    if unit:
+        parts.append(f"**Unidad:** {unit}")
+    if not parts:
+        return None
+    return model.Markdown(text=" · ".join(parts))
+
+
 def _make_hist_box(name: str, numeric: dict, box: dict):
    """Build the histogram (with mean/median/±σ lines) + boxplot figure.

@@ -271,15 +329,26 @@ def build_num_distr(profile: dict, ctx: dict):
    if not numerics:
        return None  # chapter does not apply to a dataset with no numerics.

+    # Register the "how to read the histogram and boxplot" term in the shared
+    # glossary collector (if present) and mark its first appearance clickable. The
+    # full explanation (colour code, 1,5·IQR rule, asymmetry reading) lives in the
+    # GLOSARIO chapter instead of inline here: the intro only names the term.
+    glossary = ctx.get("glossary")
+    mark_term = False
+    if isinstance(glossary, model.GlossaryCollector):
+        glossary.add(_TERM_HISTOBOX_KEY, _TERM_HISTOBOX_LABEL)
+        mark_term = True
+    como_leer = ("[[term:histograma_boxplot]]cómo leer estos gráficos[[/term]]"
+                 if mark_term else "cómo leer estos gráficos")
    intro = (
-        "Para cada columna numérica se muestra su **histograma** con tres líneas "
-        "de referencia: la **media** (línea roja discontinua), la **mediana** "
-        "(línea verde continua) y la banda **±1σ** (zona sombreada). Debajo, "
-        "alineado al mismo eje, un **boxplot de Tukey**: la caja abarca del "
-        "primer al tercer cuartil (P25–P75), la línea interior es la mediana y "
-        "los bigotes llegan hasta 1,5·IQR; los puntos rojos señalan que hay "
-        "valores más allá de las vallas. Comparar media y mediana revela la "
-        "asimetría de la distribución.")
+        "Cada columna numérica muestra su **histograma** (con la **media**, la "
+        "**mediana** y la banda **±1σ**) y, debajo y al mismo eje, su **boxplot "
+        f"de Tukey** — {como_leer}.")
+
+    # Business description + unit per column come from the LLM dictionary
+    # (profile['llm']['dictionary'], matched by column name); absent without
+    # run_llm, in which case the per-column description block is simply omitted.
+    llm_index = _llm_index(profile, ctx)

    blocks = [
        model.Heading(text=CHAPTER_TITLE, level=1),
@@ -293,17 +362,20 @@ def build_num_distr(profile: dict, ctx: dict):
                box = build_boxplot_stats(numeric) or {}
            except Exception:  # noqa: BLE001 — degrade, never raise.
                box = {}
-        # Keep the column heading, its figure and its stats note together on the
-        # same page/slide (mejora 3 — keep-together): the renderers measure the
-        # whole Group and move it whole when it would not fit.
-        blocks.append(model.Group(blocks=[
-            model.Heading(text=str(name), level=2),
-            model.Figure(
-                make=_figure_maker(name, numeric, box),
-                caption=f"Distribución de «{name}» — histograma "
-                        f"(media/mediana/±σ) y boxplot."),
-            model.Markdown(text=_stats_note(name, numeric, box)),
-        ]))
+        # Keep the column heading, its (optional) LLM description, its figure and
+        # its stats note together on the same page/slide (mejora 3 —
+        # keep-together): the renderers measure the whole Group and move it whole
+        # when it would not fit.
+        col_blocks = [model.Heading(text=str(name), level=2)]
+        desc_block = _llm_desc_unit_block(name, llm_index)
+        if desc_block is not None:
+            col_blocks.append(desc_block)
+        col_blocks.append(model.Figure(
+            make=_figure_maker(name, numeric, box),
+            caption=f"Distribución de «{name}» — histograma "
+                    f"(media/mediana/±σ) y boxplot."))
+        col_blocks.append(model.Markdown(text=_stats_note(name, numeric, box)))
+        blocks.append(model.Group(blocks=col_blocks))

    return model.Chapter(id=CHAPTER_ID, title=CHAPTER_TITLE,
                         version=CHAPTER_VERSION, blocks=blocks)
@@ -101,7 +101,7 @@ def test_golden_chapter_estructura_y_bloques():


 def test_golden_media_mediana_sigma_y_boxplot_presentes():
-    # The intro documents the three reference lines and the Tukey boxplot; the
+    # The short intro names the three reference lines and the Tukey boxplot; the
    # per-column note carries the actual mean/median/σ numbers and the shape.
    ch = build_num_distr(_profile(n_numeric=1, extra_categorical=False), {})
    md_texts = " ".join(b.text for b in _flatten(ch.blocks)
@@ -110,10 +110,58 @@ def test_golden_media_mediana_sigma_y_boxplot_presentes():
    assert "±1σ" in md_texts or "σ" in md_texts
    assert "boxplot" in md_texts.lower()
    assert "Tukey" in md_texts
+    # The long "how to read it" explanation moved to the glossary: the colour-code
+    # / 1,5·IQR walkthrough is no longer inline in the chapter body.
+    assert "1,5·IQR" not in md_texts
+    assert "línea roja" not in md_texts
    # distribution_type gloss surfaced for the column (right-skewed preset).
    assert _DIST_GLOSS["right-skewed"].split(";")[0][:20] in md_texts


+def test_glosario_histograma_boxplot_clicable_y_definicion():
+    # With a glossary collector the intro marks the clickable term and the FULL
+    # explanation (the long paragraph removed from the body) lands in the glossary.
+    from datascience.automatic_eda.chapters.glosario import build_glosario
+
+    gc = model.GlossaryCollector()
+    prof = _profile(n_numeric=1, extra_categorical=False)
+    ch = build_num_distr(prof, {"glossary": gc})
+    intro = next(b for b in ch.blocks if b.kind == "markdown")
+    assert "[[term:histograma_boxplot]]" in intro.text
+    assert gc.has("histograma_boxplot")
+    glos = build_glosario(prof, {"glossary": gc})
+    entry = next(b for b in glos.blocks
+                 if getattr(b, "kind", "") == "glossary_entry"
+                 and b.key == "histograma_boxplot")
+    assert "boxplot" in entry.definition.lower()
+    assert "1,5·IQR" in entry.definition
+
+
+def test_llm_descripcion_y_unidad_por_columna():
+    # With an LLM dictionary, each numeric column whose name matches shows its
+    # business description and unit in a per-column markdown block.
+    prof = _profile(n_numeric=2)
+    prof["llm"] = {"dictionary": [
+        {"column": "precio", "description": "Precio de venta del producto",
+         "unit": "EUR"},
+        {"column": "alcohol", "business_meaning": "Grado alcohólico",
+         "unit": "% vol"},
+    ]}
+    ch = build_num_distr(prof, {})
+    md_all = " ".join(b.text for b in _flatten(ch.blocks)
+                      if b.kind == "markdown")
+    assert "Precio de venta" in md_all and "EUR" in md_all
+    assert "Grado alcohólico" in md_all and "% vol" in md_all
+
+
+def test_edge_sin_llm_no_anade_descripcion():
+    # Without an LLM block the per-column description markdown is simply omitted.
+    ch = build_num_distr(_profile(n_numeric=2), {})
+    md_all = " ".join(b.text for b in _flatten(ch.blocks)
+                      if b.kind == "markdown")
+    assert "Descripción" not in md_all
+
+
 def test_boxplot_stats_se_consumen_del_registry():
    # The chapter must feed build_boxplot_stats (group eda) and the resulting
    # box must carry the Tukey fences for the figure.
@@ -26,7 +26,7 @@ from datetime import datetime, timezone

 from .. import model

-CHAPTER_VERSION = "1.3.0"
+CHAPTER_VERSION = "1.4.0"
 CHAPTER_ID = "portada"
 CHAPTER_TITLE = "Portada"

@@ -120,9 +120,17 @@ def _summary_blocks(summary) -> list:
        # Values pinned to the right margin (numbers flush right, label left).
        blocks.append(model.KVTable(rows=rows, value_align="right"))
    if titles:
-        bullets = "\n".join(f"- {model._safe_str(t)}" for t in titles)
-        blocks.append(model.Markdown(
-            text="Este informe incluye los siguientes capítulos:\n" + bullets))
+        # Clickable index ("Índice"): one TocEntry per chapter title. Each entry
+        # becomes a real jump to that chapter's first page/slide once the document
+        # is laid out (the renderers register every chapter start and wire the
+        # links; ``target_id`` is matched against the chapter title). The cover only
+        # knows chapter titles, so the title doubles as the link target.
+        blocks.append(model.Heading(text="Índice", level=2))
+        for t in titles:
+            label = model._safe_str(t)
+            if not label:
+                continue
+            blocks.append(model.TocEntry(label=label, target_id=label))
    return blocks


@@ -160,11 +160,21 @@ class Group:
    a chapter can give each unit its own page — e.g. one categorical column per
    page (see CAT DISTR). It is purely additive: the default False keeps the plain
    keep-together behaviour for every existing chapter.
+
+    ``layout`` is a hint for how the group's children are arranged:
+    ``"stack"`` (default) keeps the historical top-to-bottom flow; ``"side_by_side"``
+    asks the PPTX renderer to place the group's table to the LEFT and its figure to
+    the RIGHT of the same slide (table ~55% width, figure ~45%), measuring so both
+    fit and falling back to stacking when they do not. The PDF renderer treats
+    ``"side_by_side"`` exactly like ``"stack"`` (the A5 mobile page is too narrow for
+    two readable columns). Unknown values degrade to ``"stack"``. Purely additive:
+    the default keeps every existing chapter unchanged.
    """

    blocks: list = field(default_factory=list)
    title: Optional[str] = None
    page_break_before: bool = False
+    layout: str = "stack"
    kind: str = field(default="group", init=False)


@@ -183,6 +193,22 @@ class GlossaryEntry:
    kind: str = field(default="glossary_entry", init=False)


+@dataclass
+class TocEntry:
+    """One clickable index (table-of-contents) entry shown on the cover.
+
+    Rendered as a single line — the chapter ``label`` in the accent link colour —
+    that, once the document is laid out, becomes a real click jumping to the first
+    page/slide of the target chapter (PDF link annotation via PyMuPDF; PPTX native
+    slide jump). ``target_id`` is matched against each chapter's ``id`` *and* its
+    ``title`` (the cover only knows chapter titles), so either resolves. If the
+    target cannot be resolved the entry still renders as plain text (never cut)."""
+
+    label: str = ""
+    target_id: str = ""
+    kind: str = field(default="toc_entry", init=False)
+
+
 @dataclass
 class Chapter:
    """An ordered set of blocks with an id, a title and a generation version."""
@@ -207,13 +233,14 @@ _BLOCK_BY_KIND = {
    "note": Note,
    "group": Group,
    "glossary_entry": GlossaryEntry,
+    "toc_entry": TocEntry,
 }


 def as_block(obj: Any):
    """Coerce a value into a block dataclass. Unknown values become a Note."""
    if isinstance(obj, (Heading, Markdown, KVTable, DataTable, Figure, Image,
-                        Caption, Note, Group, GlossaryEntry)):
+                        Caption, Note, Group, GlossaryEntry, TocEntry)):
        if isinstance(obj, Group):
            obj.blocks = as_blocks(obj.blocks)
        return obj
@@ -259,11 +286,15 @@ def as_block(obj: Any):
                return Group(blocks=as_blocks(obj.get("blocks")),
                             title=obj.get("title"),
                             page_break_before=bool(
-                                 obj.get("page_break_before", False)))
+                                 obj.get("page_break_before", False)),
+                             layout=_safe_str(obj.get("layout")) or "stack")
            if cls is GlossaryEntry:
                return GlossaryEntry(key=_safe_str(obj.get("key")),
                                     label=_safe_str(obj.get("label")),
                                     definition=_safe_str(obj.get("definition")))
+            if cls is TocEntry:
+                return TocEntry(label=_safe_str(obj.get("label")),
+                                target_id=_safe_str(obj.get("target_id")))
        except Exception:  # noqa: BLE001 — never raise on a malformed block.
            return Note(text=_safe_str(obj))
    return Note(text=_safe_str(obj))
@@ -298,11 +298,16 @@ def test_cover_first_glossary_last_with_summary():
    headings = [b.text for b in cover.blocks if b.kind == "heading"]
    assert any("Resumen" in h for h in headings), \
        "la portada no incluye el resumen agregado"
-    # The summary reflects the body chapters (e.g. the numeric/categorical ones).
-    cover_text = " ".join(
-        b.text for b in cover.blocks if getattr(b, "kind", "") == "markdown")
-    assert "Distribuciones" in cover_text, \
-        "el resumen de portada no menciona los capítulos del cuerpo"
+    # The index ("Índice") is now a clickable list of TocEntry blocks (one per
+    # body chapter), not a markdown bullet list. Verify both the heading and that
+    # the entries name the body chapters.
+    assert any("Índice" in h for h in headings), \
+        "la portada no incluye la sección Índice"
+    toc_labels = " ".join(
+        getattr(b, "label", "") for b in cover.blocks
+        if getattr(b, "kind", "") == "toc_entry")
+    assert "Distribuciones" in toc_labels, \
+        "el índice de portada no menciona los capítulos del cuerpo"


 # --------------------------------------------------------------------------- #
@@ -46,11 +46,23 @@ _MUTED = "#8a8a8a"
 _RULE = "#cccccc"
 _HEAD_BG = "#eef3f6"

+# Rasterization DPI for every embedded raster (figure/table image) AND for the
+# page save itself. Raised from the old 150/default-100 to 220 so a reader can
+# pinch-zoom on a phone and still see crisp detail (axis labels, table cells)
+# without pixelation. Text stays vector (embedded TrueType, pdf.fonttype=42) and
+# selectable regardless of DPI; only the embedded images gain resolution. 220 is
+# a deliberate balance: noticeably sharper than 150 while keeping the file size
+# reasonable. ``savefig.dpi`` matters because matplotlib re-rasterizes each
+# ``imshow`` when PdfPages writes the page; without it the final image would land
+# at ~100 dpi no matter how sharp the intermediate PNG was.
+_RASTER_DPI = 220
+
 _RC = {
    "font.size": 10,
    "font.family": "sans-serif",
    "figure.facecolor": "white",
    "savefig.facecolor": "white",
+    "savefig.dpi": _RASTER_DPI,
    "pdf.fonttype": 42,  # embed TrueType — text stays selectable on mobile.
 }

@@ -80,6 +92,10 @@ class _PdfState:
        # points (1/72") with a top-left origin — same convention as PyMuPDF.
        self.term_sources = []       # [{key, page, rect:[x0,y0,x1,y1]}]
        self.term_dests = {}         # key -> {page, point:[x,y]}
+        # Clickable index (cover → chapter). Sources are the cover's TocEntry
+        # rects; chapter_starts maps a chapter id AND its title to its first page.
+        self.toc_sources = []        # [{target_id, page, rect:[x0,y0,x1,y1]}]
+        self.chapter_starts = {}     # id|title -> {page, point:[x,y]}


 # --------------------------------------------------------------------------- #
@@ -385,6 +401,57 @@ def _col_widths(header: list, rows: list, fs: float) -> list:
    return widths


+# Minimum legible characters reserved per column when deciding whether a table
+# can be shown as selectable text. Below this width per column the cells become
+# unreadable, so the table is rasterized to a zoomable high-res image instead.
+_MIN_LEGIBLE_CHARS = 8
+
+
+def _table_fits_as_text(header: list, rows: list) -> bool:
+    """True when the table fits the usable width as readable text.
+
+    A table whose columns cannot each get a minimum legible width within the A5
+    usable width (typically many columns, e.g. a 19-column ``df.head``) is flagged
+    so it is rendered as a single high-resolution image — the reader zooms in on
+    the phone and reads every cell, nothing cut — instead of being squeezed until
+    unreadable. Narrow tables (few columns) keep the selectable-text rendering."""
+    header = header or []
+    rows = rows or []
+    ncol = len(header) if header else (len(rows[0]) if rows else 1)
+    ncol = max(1, ncol)
+    cw = tl.avg_char_width_in(_FS_CELL)
+    min_needed = ncol * (_MIN_LEGIBLE_CHARS * cw + _CELL_PAD * 2)
+    return min_needed <= _USABLE_W
+
+
+def _table_figure_block(block):
+    """Wrap a too-wide table as a lazily-rasterized Figure (cached on the block).
+
+    The table is drawn once via ``render_table_as_figure`` (header shading + zebra)
+    and embedded as one high-res image scaled to fit entirely. The same Figure is
+    reused for measuring and placing so keep-together stays consistent. The table
+    title/note are drawn inside the image (self-describing when zoomed/shared), so
+    the block-level caption is left empty to avoid a duplicate title."""
+    cached = getattr(block, "_aeda_tablefig", None)
+    if cached is not None:
+        return cached
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    title = getattr(block, "title", None)
+    note = getattr(block, "note", None)
+
+    def _make():
+        from datascience.render_table_as_figure import render_table_as_figure
+        return render_table_as_figure(header, rows, title=title, note=note)
+
+    fig = model.Figure(make=_make, caption=None)
+    try:
+        block._aeda_tablefig = fig
+    except Exception:  # noqa: BLE001 — block may reject attributes; degrade.
+        pass
+    return fig
+
+
 def _wrap_row(cells: list, widths: list, fs: float) -> list:
    """Wrap each cell to its column width → list of line-lists per cell."""
    out = []
@@ -424,11 +491,16 @@ def _draw_table_row(st: _PdfState, cells_lines: list, widths: list, fs: float,


 def _place_data_table(st: _PdfState, block) -> None:
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    # Too many columns to be legible as text → render the whole table as one
+    # high-res image, scaled to fit entirely (the reader zooms to read it).
+    if not _table_fits_as_text(header, rows):
+        _place_figure(st, _table_figure_block(block))
+        return
    title = getattr(block, "title", None)
    if title:
        _place_heading(st, model.Heading(title, level=2))
-    header = list(getattr(block, "header", []) or [])
-    rows = list(getattr(block, "rows", []) or [])
    fs = _FS_CELL
    widths = _col_widths(header, rows, fs)
    header_lines = _wrap_row(header, widths, fs) if header else None
@@ -486,8 +558,11 @@ def _resolve_figure(block):


 def _png_from_figure(fig) -> bytes:
+    # ``bbox_inches='tight'`` is kept so the real aspect ratio is what we measure
+    # and place. The page save (savefig.dpi in _RC) re-rasterizes this at the same
+    # high DPI, so the embedded image stays crisp for phone zoom.
    buf = io.BytesIO()
-    fig.savefig(buf, format="png", dpi=150, bbox_inches="tight")
+    fig.savefig(buf, format="png", dpi=_RASTER_DPI, bbox_inches="tight")
    buf.seek(0)
    return buf.read()

@@ -729,12 +804,16 @@ def _measure_data_table(block) -> float:
    Counts the optional title heading, the wrapped header row, every wrapped data
    row (per-column wrap via the same ``_col_widths``/``_wrap_row`` the placer
    uses) and the optional note. Keep this in sync with ``_place_data_table``."""
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    # Mirror the placer: a too-wide table is drawn as a single image, so its
+    # keep-together height is the image's, not the (squeezed) text layout's.
+    if not _table_fits_as_text(header, rows):
+        return _measure_figure_like(_table_figure_block(block))
    h = 0.0
    title = getattr(block, "title", None)
    if title:
        h += _measure_heading_text(title, 2)
-    header = list(getattr(block, "header", []) or [])
-    rows = list(getattr(block, "rows", []) or [])
    fs = _FS_CELL
    widths = _col_widths(header, rows, fs)
    lh = tl.line_height_in(fs)
@@ -766,6 +845,10 @@ def _measure_block(st: _PdfState, block) -> float:
            lines = tl.wrap(getattr(block, "text", ""),
                            tl.chars_per_line(_USABLE_W, _FS_NOTE))
            return tl.line_height_in(_FS_NOTE) * len(lines) + _GAP
+        if kind == "toc_entry":
+            lines = tl.wrap(tl.strip_inline_md(getattr(block, "label", "")),
+                            tl.chars_per_line(_USABLE_W - 0.22, _FS_BODY)) or [""]
+            return tl.line_height_in(_FS_BODY) * len(lines) + _GAP * 0.4
        if kind == "kv_table":
            return _measure_kv_table(block)
        if kind == "data_table":
@@ -850,6 +933,38 @@ def _place_glossary_entry(st: _PdfState, block) -> None:
    st.y += _GAP * 0.5


+def _place_toc_entry(st: _PdfState, block) -> None:
+    """Render one clickable index line and record it as a link source.
+
+    Drawn as a bulleted line in the accent link colour; its rectangle is recorded
+    in ``st.toc_sources`` so the post-processor turns it into a real jump to the
+    target chapter's first page. If the target is never resolved the line still
+    shows as plain (accent) text — never cut, never broken."""
+    label = tl.strip_inline_md(getattr(block, "label", "")) or ""
+    target_id = getattr(block, "target_id", "") or ""
+    fs = _FS_BODY
+    lh = tl.line_height_in(fs)
+    bullet = "•  "
+    indent = 0.22
+    max_chars = tl.chars_per_line(_USABLE_W - indent, fs)
+    lines = tl.wrap(label, max_chars) or [""]
+    for idx, ln in enumerate(lines):
+        _ensure_space(st, lh)
+        x = _ML
+        st.fig.text(_xf(x), _yf(st.y), bullet if idx == 0 else "   ",
+                    fontsize=fs, color=_LINK, ha="left", va="top")
+        x += indent
+        w = _text_width_in(st, ln, fs, False)
+        st.fig.text(_xf(x), _yf(st.y), ln, fontsize=fs, color=_LINK,
+                    ha="left", va="top")
+        if target_id and idx == 0:
+            st.toc_sources.append({
+                "target_id": target_id, "page": st.page - 1,
+                "rect": _pt_rect(_ML, st.y, x + w, st.y + lh)})
+        st.y += lh
+    st.y += _GAP * 0.4
+
+
 _PLACERS = {
    "heading": _place_heading,
    "markdown": _place_markdown,
@@ -861,6 +976,7 @@ _PLACERS = {
    "note": _place_note,
    "group": _place_group,
    "glossary_entry": _place_glossary_entry,
+    "toc_entry": _place_toc_entry,
 }


@@ -892,6 +1008,15 @@ def render_pdf(chapters: list, out_path: str, meta: dict = None) -> dict:
                    st.chapter = ch
                    st.chapter_pages = 0
                    _new_page(st)  # each chapter starts on a fresh page.
+                    # Record this chapter's first page as a link target for the
+                    # cover index (keyed by id AND title, since the cover only
+                    # knows titles). Point is the top of the content area.
+                    _start = {"page": st.page - 1,
+                              "point": [_ML * 72.0, _CONTENT_TOP * 72.0]}
+                    if ch.id:
+                        st.chapter_starts[ch.id] = _start
+                    if getattr(ch, "title", ""):
+                        st.chapter_starts.setdefault(ch.title, _start)
                    for block in ch.blocks:
                        placer = _PLACERS.get(getattr(block, "kind", ""),
                                              _place_note)
@@ -924,7 +1049,7 @@ def render_pdf(chapters: list, out_path: str, meta: dict = None) -> dict:

    note = f"{n_pages} páginas"
    if n_links:
-        note += f" · {n_links} enlaces de glosario"
+        note += f" · {n_links} enlaces internos"
    if notes:
        note += " · " + "; ".join(notes)
    return {"path": out_path, "n_pages": n_pages, "chapters": chapters_meta,
@@ -932,9 +1057,11 @@ def render_pdf(chapters: list, out_path: str, meta: dict = None) -> dict:


 def _wire_glossary_links(st: _PdfState, out_path: str, notes: list) -> int:
-    """Build {source rect → glossary dest} links and apply them via PyMuPDF.
+    """Apply internal PDF links via PyMuPDF: glossary terms + the cover index.

-    Returns the number of links applied (0 if there is nothing to wire or the
+    Builds two sets of GOTO links — every in-text glossary term → its entry, and
+    every cover ``TocEntry`` → its chapter's first page — and applies them in one
+    pass. Returns the number of links applied (0 if there is nothing to wire or the
    post-processor is unavailable). Never raises."""
    try:
        links = []
@@ -945,6 +1072,14 @@ def _wire_glossary_links(st: _PdfState, out_path: str, notes: list) -> int:
            links.append({
                "src_page": src["page"], "src_rect": src["rect"],
                "dst_page": dest["page"], "dst_point": dest["point"]})
+        # Cover index → chapter first page (clickable, navigable table of contents).
+        for src in st.toc_sources:
+            dest = st.chapter_starts.get(src.get("target_id"))
+            if not dest:
+                continue
+            links.append({
+                "src_page": src["page"], "src_rect": src["rect"],
+                "dst_page": dest["page"], "dst_point": dest["point"]})
        if not links:
            return 0
        from datascience.add_pdf_internal_links import add_pdf_internal_links
@@ -952,7 +1087,7 @@ def _wire_glossary_links(st: _PdfState, out_path: str, notes: list) -> int:
        if isinstance(res, dict) and res.get("status") == "ok":
            return int(res.get("n_links") or 0)
        if isinstance(res, dict) and res.get("error"):
-            notes.append(f"glosario sin enlaces: {res.get('error')}")
+            notes.append(f"enlaces internos no aplicados: {res.get('error')}")
    except Exception as e:  # noqa: BLE001 — links are best-effort.
-        notes.append(f"glosario sin enlaces: {e}")
+        notes.append(f"enlaces internos no aplicados: {e}")
    return 0
@@ -51,6 +51,12 @@ _FS_H1, _FS_H2, _FS_H3 = 20, 16, 13
 _FS_BODY, _FS_CELL, _FS_NOTE = 14, 11, 11
 _GAP = 0.12

+# Rasterization DPI for every embedded figure/table image. Raised from 150 to 220
+# so a viewer can zoom into a slide (or a shared picture) and read crisp detail —
+# axis labels, table cells — without pixelation. Kept moderate so the deck size
+# stays reasonable. Same value as the PDF renderer.
+_RASTER_DPI = 220
+

 class _PptxState:
    def __init__(self, prs, title: str):
@@ -65,6 +71,10 @@ class _PptxState:
        # Glossary wiring (mejora 6): runs to link and per-term target slide.
        self.term_runs = []           # [(key, run)]
        self.term_anchor_slide = {}   # key -> Slide (glossary entry)
+        # Clickable index (cover → chapter). toc_runs are the cover's index runs;
+        # chapter_starts maps a chapter id AND its title to its first slide.
+        self.toc_runs = []            # [(target_id, run, src_slide)]
+        self.chapter_starts = {}      # id|title -> Slide (chapter first slide)


 def _rgb(c):
@@ -309,6 +319,58 @@ def _col_widths(header, rows):
    return [_USABLE_W * w / total for w in clamped]


+# Minimum legible characters reserved per column when deciding whether a table
+# can be shown as a native (selectable) PowerPoint table. Below this width per
+# column the cells become unreadable, so the table is rasterized to a zoomable
+# high-res image instead. The 16:9 slide is wide, so more columns fit than on A5.
+_MIN_LEGIBLE_CHARS = 8
+_CELL_PAD = 0.05
+
+
+def _table_fits_as_text(header: list, rows: list) -> bool:
+    """True when the table fits the usable slide width as a readable table.
+
+    A table whose columns cannot each get a minimal legible width within the slide
+    usable width (typically many columns, e.g. a 19-column ``df.head``) is flagged
+    so it is rendered as one high-resolution image — the viewer zooms in and reads
+    every cell — instead of being squeezed unreadable. Narrow tables keep the
+    native selectable table."""
+    header = header or []
+    rows = rows or []
+    ncol = len(header) if header else (len(rows[0]) if rows else 1)
+    ncol = max(1, ncol)
+    cw = tl.avg_char_width_in(_FS_CELL)
+    min_needed = ncol * (_MIN_LEGIBLE_CHARS * cw + _CELL_PAD * 2)
+    return min_needed <= _USABLE_W
+
+
+def _table_figure_block(block):
+    """Wrap a too-wide table as a lazily-rasterized Figure (cached on the block).
+
+    Drawn once via ``render_table_as_figure`` (header shading + zebra) and embedded
+    as one high-res image scaled to fit entirely. The title/note are drawn inside
+    the image (self-describing when zoomed/shared), so no separate caption is
+    emitted. Reused for measuring and placing so keep-together stays consistent."""
+    cached = getattr(block, "_aeda_tablefig", None)
+    if cached is not None:
+        return cached
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    title = getattr(block, "title", None)
+    note = getattr(block, "note", None)
+
+    def _make():
+        from datascience.render_table_as_figure import render_table_as_figure
+        return render_table_as_figure(header, rows, title=title, note=note)
+
+    fig = model.Figure(make=_make, caption=None)
+    try:
+        block._aeda_tablefig = fig
+    except Exception:  # noqa: BLE001 — block may reject attributes; degrade.
+        pass
+    return fig
+
+
 def _row_height_in(cells, widths, fs) -> float:
    lh = tl.line_height_in(fs)
    maxlines = 1
@@ -372,11 +434,27 @@ def _style_cell(cell, fs, color, bold, fill) -> None:

 def _place_data_table(st: _PptxState, block, shaded_header=True,
                      key_value=False) -> None:
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    # Too many columns to be legible as a native table → render the whole table as
+    # one high-res picture, scaled to fit entirely (the viewer zooms to read it).
+    # KVTables (rendered here as a 2-column Campo/Valor table) are excluded: they
+    # always fit in width and stay as a selectable table.
+    if not key_value and not _table_fits_as_text(header, rows):
+        figblock = _table_figure_block(block)
+        data, _asp = _figure_bytes_cached(figblock)
+        if data is None:
+            _add_text(st, ["(tabla no disponible)"], _FS_NOTE, _MUTED,
+                      italic=True)
+            st.y += _GAP
+            return
+        _place_picture_bytes(st, data, None,
+                             max_h_in=getattr(figblock, "height_in", None),
+                             force_caption=False)
+        return
    title = getattr(block, "title", None)
    if title:
        _place_heading(st, model.Heading(title, level=2))
-    header = list(getattr(block, "header", []) or [])
-    rows = list(getattr(block, "rows", []) or [])
    fs = _FS_CELL
    widths = _col_widths(header, rows)
    header_h = _row_height_in(header, widths, fs) if header else 0.0
@@ -436,7 +514,7 @@ def _resolve_png(block):
    try:
        import matplotlib.pyplot as plt
        buf = io.BytesIO()
-        f.savefig(buf, format="png", dpi=150, bbox_inches="tight")
+        f.savefig(buf, format="png", dpi=_RASTER_DPI, bbox_inches="tight")
        buf.seek(0)
        return buf.read()
    except Exception:  # noqa: BLE001
@@ -483,12 +561,15 @@ def _figure_bytes_cached(block):


 def _place_picture_bytes(st: _PptxState, data: bytes, caption,
-                         max_h_in=None) -> None:
+                         max_h_in=None, force_caption=True) -> None:
    # Mejora 4 — every figure on a slide carries a visible caption/title. If the
    # block has no caption, fall back to the current section heading, then to a
-    # generic label, so no image is ever shown untitled.
-    caption = (model._safe_str(caption).strip()
-               or model._safe_str(st.last_heading).strip() or "Figura")
+    # generic label, so no image is ever shown untitled. ``force_caption=False``
+    # suppresses that fallback (used for table images, whose title is inside the
+    # picture) so no redundant caption is drawn.
+    caption = model._safe_str(caption).strip()
+    if not caption and force_caption:
+        caption = model._safe_str(st.last_heading).strip() or "Figura"
    w_px, h_px = _img_size_px(data)
    aspect = (h_px / w_px) if w_px else 0.66
    # Reserve the caption's REAL (possibly multi-line) height FIRST, then scale
@@ -496,9 +577,11 @@ def _place_picture_bytes(st: _PptxState, data: bytes, caption,
    # so its caption always fits on the SAME slide and no image is untitled.
    # cap_real = what _add_text consumes; cap_reserve adds the post-image gap and
    # a small cushion so the caption never spills to the next slide.
-    cap_lines = tl.wrap(caption, tl.chars_per_line(_USABLE_W, _FS_NOTE))
-    cap_real = tl.line_height_in(_FS_NOTE) * len(cap_lines) + 0.05
-    cap_reserve = cap_real + 0.05 + 0.10
+    cap_lines = tl.wrap(caption, tl.chars_per_line(_USABLE_W, _FS_NOTE)) \
+        if caption else []
+    cap_real = (tl.line_height_in(_FS_NOTE) * len(cap_lines) + 0.05) \
+        if cap_lines else 0.0
+    cap_reserve = (cap_real + 0.05 + 0.10) if cap_lines else 0.05
    max_h = _CONTENT_BOTTOM - _CONTENT_TOP
    # height_in hint (model.Figure/Image): cap the target height so a figure in a
    # keep-together Group shrinks to leave room for its heading and text.
@@ -517,7 +600,8 @@ def _place_picture_bytes(st: _PptxState, data: bytes, caption,
    st.slide.shapes.add_picture(io.BytesIO(data), Inches(left), Inches(st.y),
                                width=Inches(target_w), height=Inches(target_h))
    st.y += target_h + 0.05
-    _add_text(st, cap_lines, _FS_NOTE, _MUTED, italic=True)
+    if cap_lines:
+        _add_text(st, cap_lines, _FS_NOTE, _MUTED, italic=True)
    st.y += _GAP


@@ -663,12 +747,16 @@ def _measure_kv_table(block) -> float:
 def _measure_data_table(block) -> float:
    """Faithful DataTable height — matches ``_place_data_table`` (title heading +
    wrapped header + every wrapped row + optional note). Keep in sync."""
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    # Mirror the placer: a too-wide table is drawn as one image, so its
+    # keep-together height is the image's, not the (squeezed) table layout's.
+    if not _table_fits_as_text(header, rows):
+        return _measure_figure_like(_table_figure_block(block))
    h = 0.0
    title = getattr(block, "title", None)
    if title:
        h += _measure_heading_text(title, 2)
-    header = list(getattr(block, "header", []) or [])
-    rows = list(getattr(block, "rows", []) or [])
    fs = _FS_CELL
    widths = _col_widths(header, rows)
    if header:
@@ -698,6 +786,10 @@ def _measure_block(st: _PptxState, block) -> float:
            lines = tl.wrap(getattr(block, "text", ""),
                            tl.chars_per_line(_USABLE_W, _FS_NOTE))
            return tl.line_height_in(_FS_NOTE) * len(lines) + 0.05 + _GAP
+        if kind == "toc_entry":
+            lines = tl.wrap(tl.strip_inline_md(getattr(block, "label", "")),
+                            tl.chars_per_line(_USABLE_W - 0.3, _FS_BODY)) or [""]
+            return tl.line_height_in(_FS_BODY) * len(lines) + 0.05
        if kind == "kv_table":
            return _measure_kv_table(block)
        if kind == "data_table":
@@ -810,6 +902,73 @@ def _fit_group_blocks(st: _PptxState, blocks: list, avail_full: float) -> list:
    return out


+def _fit_img(width_col: float, aspect: float, max_h: float):
+    """Scale an image to ``width_col`` then clamp to ``max_h`` keeping aspect."""
+    w = width_col
+    h = w * aspect
+    if h > max_h:
+        h = max_h
+        w = (h / aspect) if aspect else width_col
+    return w, h
+
+
+def _place_group_side_by_side(st: _PptxState, block, avail_full: float) -> bool:
+    """Place a Group's table (left ~55%) next to its figure (right ~45%).
+
+    Both the table and the figure are rasterized to high-res images and placed in
+    two columns of the SAME slide; blocks before the pair (e.g. a heading) render
+    full width above it, the remaining blocks below. Returns True on success, or
+    False (caller falls back to stacking) when there is no table+figure pair, an
+    image fails to rasterize, or the pair cannot fit. Never raises by itself."""
+    blocks = getattr(block, "blocks", []) or []
+    tbl = next((b for b in blocks
+                if getattr(b, "kind", "") in ("data_table", "kv_table")), None)
+    fig = next((b for b in blocks
+                if getattr(b, "kind", "") in ("figure", "image")), None)
+    if tbl is None or fig is None:
+        return False
+    gap_col = 0.3
+    left_w = _USABLE_W * 0.55 - gap_col / 2.0
+    right_w = _USABLE_W * 0.45 - gap_col / 2.0
+    if left_w <= 1.0 or right_w <= 1.0:
+        return False
+    tdata, tasp = _figure_bytes_cached(_table_figure_block(tbl))
+    fdata, fasp = _figure_bytes_cached(fig)
+    if not tdata or not fdata:
+        return False
+    ti, fi = blocks.index(tbl), blocks.index(fig)
+    lo = min(ti, fi)
+    lead = list(blocks[:lo])
+    rest = [b for b in blocks[lo + 1:] if b is not tbl and b is not fig]
+    lead_h = sum(_measure_block(st, b) for b in lead)
+    rest_h = sum(_measure_block(st, b) for b in rest)
+    col_max_h = avail_full - lead_h - rest_h - _GAP * 2
+    if col_max_h < 1.2:
+        return False  # not enough vertical room to put the pair side by side.
+    tw, th = _fit_img(left_w, tasp, col_max_h)
+    fw, fh = _fit_img(right_w, fasp, col_max_h)
+    band = max(th, fh)
+    needed = lead_h + band + rest_h + _GAP * 2
+    if needed > avail_full:
+        return False  # taller than a whole slide even side by side → stack.
+    if needed > _remaining(st):
+        _new_slide(st, cont=True)
+    for b in lead:
+        _PLACERS.get(getattr(b, "kind", ""), _place_note)(st, b)
+    top = st.y
+    f_left = _ML + left_w + gap_col
+    st.slide.shapes.add_picture(
+        io.BytesIO(tdata), Inches(_ML + (left_w - tw) / 2.0),
+        Inches(top + (band - th) / 2.0), width=Inches(tw), height=Inches(th))
+    st.slide.shapes.add_picture(
+        io.BytesIO(fdata), Inches(f_left + (right_w - fw) / 2.0),
+        Inches(top + (band - fh) / 2.0), width=Inches(fw), height=Inches(fh))
+    st.y = top + band + _GAP
+    for b in rest:
+        _PLACERS.get(getattr(b, "kind", ""), _place_note)(st, b)
+    return True
+
+
 def _place_group(st: _PptxState, block) -> None:
    """Render a keep-together Group: move it whole to the next slide if needed."""
    blocks = getattr(block, "blocks", []) or []
@@ -820,6 +979,14 @@ def _place_group(st: _PptxState, block) -> None:
    if getattr(block, "page_break_before", False) and st.y > _CONTENT_TOP + 1e-6:
        _new_slide(st, cont=True)
    avail_full = _CONTENT_BOTTOM - _CONTENT_TOP
+    # layout="side_by_side": try table-left / figure-right on one slide; if that
+    # fails for any reason, fall through to the normal stacked keep-together below.
+    if str(getattr(block, "layout", "stack")).lower() == "side_by_side":
+        try:
+            if _place_group_side_by_side(st, block, avail_full):
+                return
+        except Exception:  # noqa: BLE001 — degrade to stacking, never abort.
+            pass
    # Trim oversized tables first (keeps the chart on the same slide), then shrink
    # the figure to share the remaining room.
    blocks = _fit_group_blocks(st, blocks, avail_full)
@@ -853,6 +1020,44 @@ def _place_glossary_entry(st: _PptxState, block) -> None:
    st.y += _GAP


+def _place_toc_entry(st: _PptxState, block) -> None:
+    """Render one clickable index line and record its run as a link source.
+
+    Drawn as a bulleted line in the accent link colour; the run is recorded in
+    ``st.toc_runs`` so it later becomes a native slide-jump to the target chapter's
+    first slide. If the target is never resolved the line still shows as plain
+    (accent) text — never cut."""
+    label = tl.strip_inline_md(getattr(block, "label", "")) or ""
+    target_id = getattr(block, "target_id", "") or ""
+    fs = _FS_BODY
+    lines = tl.wrap(label, tl.chars_per_line(_USABLE_W - 0.3, fs)) or [""]
+    lh = tl.line_height_in(fs)
+    height = lh * len(lines) + 0.05
+    _ensure(st, height)
+    box = st.slide.shapes.add_textbox(
+        Inches(_ML), Inches(st.y), Inches(_USABLE_W), Inches(height))
+    tf = box.text_frame
+    tf.word_wrap = True
+    first = True
+    link_run = None
+    for idx, ln in enumerate(lines):
+        p = tf.paragraphs[0] if first else tf.add_paragraph()
+        first = False
+        r0 = p.add_run()
+        r0.text = "•  " if idx == 0 else "   "
+        r0.font.size = Pt(fs)
+        r0.font.color.rgb = _rgb(_LINK)
+        run = p.add_run()
+        run.text = ln
+        run.font.size = Pt(fs)
+        run.font.color.rgb = _rgb(_LINK)
+        if idx == 0:
+            link_run = run
+    if target_id and link_run is not None:
+        st.toc_runs.append((target_id, link_run, st.slide))
+    st.y += height
+
+
 _PLACERS = {
    "heading": _place_heading,
    "markdown": _place_markdown,
@@ -864,6 +1069,7 @@ _PLACERS = {
    "note": _place_note,
    "group": _place_group,
    "glossary_entry": _place_glossary_entry,
+    "toc_entry": _place_toc_entry,
 }


@@ -899,6 +1105,12 @@ def render_pptx(chapters: list, out_path: str, meta: dict = None) -> dict:
            st.chapter = ch
            st.chapter_slides = 0
            _new_slide(st, cont=False)
+            # Record this chapter's first slide as a link target for the cover
+            # index (keyed by id AND title, since the cover only knows titles).
+            if ch.id:
+                st.chapter_starts[ch.id] = st.slide
+            if getattr(ch, "title", ""):
+                st.chapter_starts.setdefault(ch.title, st.slide)
            for block in ch.blocks:
                placer = _PLACERS.get(getattr(block, "kind", ""), _place_note)
                try:
@@ -926,7 +1138,7 @@ def render_pptx(chapters: list, out_path: str, meta: dict = None) -> dict:

    note = f"{n_slides} slides"
    if n_links:
-        note += f" · {n_links} enlaces de glosario"
+        note += f" · {n_links} enlaces internos"
    if notes:
        note += " · " + "; ".join(notes)
    return {"path": out_path, "n_slides": n_slides, "chapters": chapters_meta,
@@ -934,19 +1146,21 @@ def render_pptx(chapters: list, out_path: str, meta: dict = None) -> dict:


 def _wire_glossary_links(st: _PptxState, notes: list) -> int:
-    """Turn each recorded term run into a native jump to its glossary slide.
+    """Apply native slide-jumps: glossary terms + the cover index.

-    Returns the number of links applied. A term whose only appearance is inside
-    its own glossary entry (source slide == target slide) is skipped. Never
+    Each in-text glossary term run jumps to its glossary entry slide, and each
+    cover ``TocEntry`` run jumps to its chapter's first slide. Returns the total
+    number of links applied. A run whose target is its own slide is skipped. Never
    raises."""
-    if not st.term_runs or not st.term_anchor_slide:
+    if not (st.term_runs and st.term_anchor_slide) and not (
+            st.toc_runs and st.chapter_starts):
        return 0
-    linked = 0
    try:
        from datascience.pptx_link_run_to_slide import pptx_link_run_to_slide
    except Exception as e:  # noqa: BLE001
-        notes.append(f"glosario sin enlaces: {e}")
+        notes.append(f"enlaces internos no aplicados: {e}")
        return 0
+    linked = 0
    for key, run, src_slide in st.term_runs:
        tgt = st.term_anchor_slide.get(key)
        if tgt is None or tgt is src_slide:
@@ -956,4 +1170,14 @@ def _wire_glossary_links(st: _PptxState, notes: list) -> int:
                linked += 1
        except Exception:  # noqa: BLE001 — links are best-effort.
            pass
+    # Cover index → chapter first slide (clickable, navigable table of contents).
+    for target_id, run, src_slide in st.toc_runs:
+        tgt = st.chapter_starts.get(target_id)
+        if tgt is None or tgt is src_slide:
+            continue
+        try:
+            if pptx_link_run_to_slide(run, src_slide, tgt):
+                linked += 1
+        except Exception:  # noqa: BLE001 — links are best-effort.
+            pass
    return linked
@@ -0,0 +1,283 @@
+"""Golden tests for the global render-quality features (issue: eda-render-quality).
+
+Covers, with executable evidence:
+  * High DPI: every embedded figure is rasterized at 220 dpi, so a phone reader
+    can zoom in and still see crisp detail.
+  * Wide table → image: a table too wide to be legible as text (e.g. a 19-column
+    df.head) is rendered as one high-res image that scales to fit entirely, while
+    a narrow table keeps its selectable-text/native-table rendering.
+  * ``Group(layout="side_by_side")``: in PPTX the table and figure are placed in
+    two columns of the same slide; in PDF the same group stacks vertically.
+  * Backward compatibility: a Group without ``layout`` defaults to ``"stack"`` and
+    a fitting table renders exactly as before.
+
+Renderers are invoked for real; PDFs are inspected with PyMuPDF and PPTX decks
+with python-pptx.
+"""
+
+from __future__ import annotations
+
+import os
+import tempfile
+
+import matplotlib
+
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt  # noqa: E402
+
+import pytest  # noqa: E402
+
+from datascience.automatic_eda import model  # noqa: E402
+from datascience.automatic_eda.render_pdf_impl import (  # noqa: E402
+    render_pdf, _RASTER_DPI as _PDF_DPI, _table_fits_as_text as _pdf_fits)
+from datascience.automatic_eda.render_pptx_impl import (  # noqa: E402
+    render_pptx, _RASTER_DPI as _PPTX_DPI, _table_fits_as_text as _pptx_fits)
+
+
+# --------------------------------------------------------------------------- #
+# Helpers.
+# --------------------------------------------------------------------------- #
+def _simple_fig():
+    """A small, real matplotlib figure for the figure blocks."""
+    fig, ax = plt.subplots(figsize=(4, 3))
+    ax.plot([0, 1, 2, 3], [1, 3, 2, 4])
+    ax.set_title("demo")
+    return fig
+
+
+def _wide_table(n_cols=19, n_rows=5):
+    header = [f"columna_{i}" for i in range(n_cols)]
+    rows = [[f"v{r}_{c}" for c in range(n_cols)] for r in range(n_rows)]
+    return model.DataTable(header=header, rows=rows, title="Primeras filas")
+
+
+def _narrow_table():
+    return model.DataTable(header=["a", "b", "c"],
+                           rows=[["1", "2", "3"], ["4", "5", "6"]],
+                           title="Tabla estrecha")
+
+
+def _chapter(blocks, cid="cap", title="Capítulo"):
+    return [model.Chapter(id=cid, title=title, version="1.0.0", blocks=blocks)]
+
+
+# --------------------------------------------------------------------------- #
+# 1) High DPI — the unit constant and a real embedded image.
+# --------------------------------------------------------------------------- #
+def test_raster_dpi_is_high_both_renderers():
+    assert _PDF_DPI >= 200, "el DPI del PDF debe ser alto (>=200)"
+    assert _PPTX_DPI >= 200, "el DPI del PPTX debe ser alto (>=200)"
+
+
+def test_pdf_embedded_figure_is_high_resolution(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "fig.pdf")
+    res = render_pdf(_chapter([model.Figure(make=_simple_fig, caption="demo")]),
+                     out, {"title": "T"})
+    assert res["path"] == out
+    doc = fitz.open(out)
+    try:
+        widths = []
+        for page in doc:
+            for img in page.get_images(full=True):
+                xref = img[0]
+                info = doc.extract_image(xref)
+                widths.append(info.get("width", 0))
+        assert widths, "no se incrustó ninguna imagen en el PDF"
+        # A 4-inch-wide figure rasterized at 220 dpi is roughly 880 px wide (4 x
+        # 220); at the old 150 dpi it was ~600 px. The 800 px floor tells them apart.
+        assert max(widths) >= 800, \
+            f"la figura embebida no es de alta resolución: {max(widths)} px"
+    finally:
+        doc.close()
+
+
+# --------------------------------------------------------------------------- #
+# 2) Wide table → image (PDF and PPTX); narrow table stays text.
+# --------------------------------------------------------------------------- #
+def test_fit_criterion_flags_wide_and_keeps_narrow():
+    wide = _wide_table()
+    narrow = _narrow_table()
+    assert not _pdf_fits(wide.header, wide.rows), \
+        "una tabla de 19 columnas debería NO caber como texto en A5"
+    assert not _pptx_fits(wide.header, wide.rows), \
+        "una tabla de 19 columnas debería NO caber como tabla nativa en 16:9"
+    assert _pdf_fits(narrow.header, narrow.rows), \
+        "una tabla de 3 columnas debería caber como texto en A5"
+    assert _pptx_fits(narrow.header, narrow.rows), \
+        "una tabla de 3 columnas debería caber como tabla nativa en 16:9"
+
+
+def test_wide_table_rendered_as_image_pdf(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "wide.pdf")
+    res = render_pdf(_chapter([_wide_table()]), out, {"title": "T"})
+    assert res["path"] == out
+    doc = fitz.open(out)
+    try:
+        n_images = sum(len(page.get_images(full=True)) for page in doc)
+        text = "".join(page.get_text() for page in doc)
+    finally:
+        doc.close()
+    assert n_images >= 1, "la tabla ancha no se rasterizó como imagen en el PDF"
+    # The cells are now inside the image, not selectable text. A unique cell value
+    # must therefore NOT appear as extractable text (it lives in the picture).
+    assert "v4_18" not in text, \
+        "la tabla ancha sigue como texto seleccionable (no se hizo imagen)"
+
+
+def test_narrow_table_stays_selectable_text_pdf(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "narrow.pdf")
+    render_pdf(_chapter([_narrow_table()]), out, {"title": "T"})
+    doc = fitz.open(out)
+    try:
+        text = "".join(page.get_text() for page in doc)
+    finally:
+        doc.close()
+    # Narrow table is selectable text: its header/cells are extractable.
+    for v in ("a", "b", "c", "1", "6"):
+        assert v in text, f"la celda '{v}' debería ser texto seleccionable"
+
+
+def test_wide_table_rendered_as_picture_pptx(tmp_path):
+    pptx = pytest.importorskip("pptx")
+    from pptx.enum.shapes import MSO_SHAPE_TYPE
+    out = str(tmp_path / "wide.pptx")
+    res = render_pptx(_chapter([_wide_table()]), out, {"title": "T"})
+    assert res["path"] == out
+    prs = pptx.Presentation(out)
+    pics = sum(1 for s in prs.slides for sh in s.shapes
+               if sh.shape_type == MSO_SHAPE_TYPE.PICTURE)
+    assert pics >= 1, "la tabla ancha no se colocó como imagen en el PPTX"
+
+
+# --------------------------------------------------------------------------- #
+# 3) Group(layout="side_by_side"): two columns in PPTX, stacked in PDF.
+# --------------------------------------------------------------------------- #
+def _side_by_side_group():
+    return model.Group(
+        blocks=[model.Heading(text="Columna X", level=2),
+                _narrow_table(),
+                model.Figure(make=_simple_fig, caption="grafico")],
+        layout="side_by_side")
+
+
+def test_side_by_side_places_two_columns_pptx(tmp_path):
+    pptx = pytest.importorskip("pptx")
+    from pptx.enum.shapes import MSO_SHAPE_TYPE
+    from pptx.util import Inches
+    out = str(tmp_path / "sbs.pptx")
+    render_pptx(_chapter([_side_by_side_group()]), out, {"title": "T"})
+    prs = pptx.Presentation(out)
+    # Find the slide that holds the pair (table image + figure image).
+    centre_emu = int(Inches(13.333 / 2.0))
+    placed = False
+    for s in prs.slides:
+        lefts = [sh.left for sh in s.shapes
+                 if sh.shape_type == MSO_SHAPE_TYPE.PICTURE
+                 and sh.left is not None]
+        if len(lefts) >= 2:
+            # one picture starts in the left half, another in the right half.
+            if min(lefts) < centre_emu and max(lefts) > centre_emu:
+                placed = True
+                break
+    assert placed, \
+        "side_by_side no colocó tabla y figura en dos columnas de la misma slide"
+
+
+def test_side_by_side_stacks_in_pdf(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "sbs.pdf")
+    res = render_pdf(_chapter([_side_by_side_group()]), out, {"title": "T"})
+    assert res["path"] == out and res["n_pages"] >= 1
+    doc = fitz.open(out)
+    try:
+        n_images = sum(len(page.get_images(full=True)) for page in doc)
+        text = "".join(page.get_text() for page in doc)
+    finally:
+        doc.close()
+    # PDF stacks: the narrow table stays selectable text (its cells remain
+    # extractable) and the figure is the only embedded image, unlike the
+    # two-picture side-by-side pair in PPTX.
+    assert n_images == 1, "el PDF no debería usar el layout de dos imágenes"
+    assert "Columna X" in text and "1" in text, \
+        "la tabla del grupo debería seguir como texto apilado en el PDF"
+
+
+# --------------------------------------------------------------------------- #
+# 4) Backward compatibility: default layout stacks (PPTX check at end of file).
+# --------------------------------------------------------------------------- #
+def test_group_default_layout_is_stack():
+    g = model.Group(blocks=[_narrow_table()])
+    assert g.layout == "stack", "el layout por defecto debe ser 'stack'"
+
+
+# --------------------------------------------------------------------------- #
+# 5) Clickable cover index ("Índice") → chapter first page/slide.
+# --------------------------------------------------------------------------- #
+def _doc_with_index():
+    portada = model.Chapter(id="portada", title="Portada", version="1.0.0",
+                            blocks=[model.Heading(text="Índice", level=2),
+                                    model.TocEntry(label="Distribuciones",
+                                                   target_id="Distribuciones")])
+    cap = model.Chapter(id="num", title="Distribuciones", version="1.0.0",
+                        blocks=[model.Markdown(text="contenido del capítulo")])
+    return [portada, cap]
+
+
+def test_cover_index_is_clickable_pdf(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "idx.pdf")
+    res = render_pdf(_doc_with_index(), out, {"title": "T"})
+    assert res["path"] == out
+    doc = fitz.open(out)
+    try:
+        # The cover (page 0) must carry a GOTO link jumping to a later page.
+        goto = [lk for lk in doc[0].get_links()
+                if lk.get("kind") == fitz.LINK_GOTO and lk.get("page", 0) > 0]
+    finally:
+        doc.close()
+    assert goto, "el índice de la portada no produjo enlaces clicables en el PDF"
+
+
+def test_cover_index_shows_heading_pdf(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "idxh.pdf")
+    render_pdf(_doc_with_index(), out, {"title": "T"})
+    doc = fitz.open(out)
+    try:
+        text = "".join(page.get_text() for page in doc)
+    finally:
+        doc.close()
+    assert "Índice" in text, "the cover does not show the 'Índice' heading"
+    assert "Este informe incluye" not in text, \
+        "the cover still shows the old 'Este informe incluye' text"
+
+
+def test_cover_index_is_clickable_pptx(tmp_path):
+    pptx = pytest.importorskip("pptx")
+    out = str(tmp_path / "idx.pptx")
+    render_pptx(_doc_with_index(), out, {"title": "T"})
+    prs = pptx.Presentation(out)
+    cover_xml = prs.slides[0]._element.xml
+    assert "hlinksldjump" in cover_xml, \
+        "the cover index produced no native slide jump in the PPTX"
+
+
+def test_default_group_renders_like_before_pptx(tmp_path):
+    pptx = pytest.importorskip("pptx")
+    from pptx.enum.shapes import MSO_SHAPE_TYPE
+    out = str(tmp_path / "stack.pptx")
+    grp = model.Group(blocks=[model.Heading(text="Y", level=2),
+                              _narrow_table(),
+                              model.Figure(make=_simple_fig, caption="g")])
+    render_pptx(_chapter([grp]), out, {"title": "T"})
+    prs = pptx.Presentation(out)
+    # Stacked group: the narrow table is a NATIVE table (selectable), and there is
+    # exactly one picture (the figure) — not the two-image side-by-side layout.
+    n_tables = sum(1 for s in prs.slides for sh in s.shapes if sh.has_table)
+    n_pics = sum(1 for s in prs.slides for sh in s.shapes
+                 if sh.shape_type == MSO_SHAPE_TYPE.PICTURE)
+    assert n_tables >= 1, "the stacked group should use a native table"
+    assert n_pics == 1, "the stacked group should not duplicate images"
@@ -0,0 +1,111 @@
+---
+id: categorical_top_bar_figure_py_datascience
+name: categorical_top_bar_figure
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def categorical_top_bar_figure(top: list, n_distinct: int = 0, title: str = \"\", top_k: int = 6, n_rows=None) -> \"matplotlib.figure.Figure\""
+description: "Builds a matplotlib horizontal bar figure of the top_k most frequent categories of a categorical column, largest on top, folding the rest into a gray \"Otros (N categorías)\" bar. Same input contract as categorical_top_pie_figure (direct donut↔bar swap): it consumes the `top` block of summarize_categorical and returns a matplotlib.figure.Figure ready to be rasterized by the EDA report renderer. Agg backend, no global pyplot; fully defensive against an empty/None top, never raises."
+tags: [eda, categorical, bar, barh, matplotlib, figure, visualization, datascience, impure]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: [matplotlib]
+example: |
+  from categorical_top_bar_figure import categorical_top_bar_figure
+  top = [
+      {"value": "rojo", "count": 40, "pct": 0.4},
+      {"value": "azul", "count": 30, "pct": 0.3},
+      {"value": "verde", "count": 20, "pct": 0.2},
+  ]
+  fig = categorical_top_bar_figure(top, n_distinct=12, title="color", top_k=6, n_rows=100)
+tested: true
+tests:
+  - "test_returns_figure"
+  - "test_ten_items_topk_six_yields_seven_bars"
+  - "test_empty_top_does_not_raise_and_returns_figure"
+  - "test_long_value_truncated"
+  - "test_none_value_and_none_count_are_handled"
+  - "test_n_rows_adds_exact_others_bar"
+test_file_path: "python/functions/datascience/categorical_top_bar_figure_test.py"
+file_path: "python/functions/datascience/categorical_top_bar_figure.py"
+params:
+  - name: top
+    desc: "List of {value, count, pct} dicts sorted by count descending (the `top` block output by summarize_categorical). May be empty or carry incomplete dicts: non-dict items and items with no count, a None count or count <= 0 are dropped. A None value is allowed (empty label)."
+  - name: n_distinct
+    desc: "Total number of distinct categories in the column. Labels the aggregated bar as \"Otros (N categorías)\", with N = n_distinct minus the number of bars shown (floored at 0). If n_distinct does not exceed the number of bars shown, the actual overflow of `top` is used as the number of aggregated categories. Default 0."
+  - name: title
+    desc: "Figure title (the column name). Truncated to ~48 chars with an ellipsis when too long. Default \"\" (no title)."
+  - name: top_k
+    desc: "Maximum number of explicit bars. Default 6. The \"Otros\" bar does not count against this limit. With top_k <= 0 at least the largest category is shown."
+  - name: n_rows
+    desc: "Optional. Total number of rows in the dataset. When given and the sum of the shown counts is < n_rows, the \"Otros\" bar uses (n_rows - sum_shown) as its count so it is exact with respect to the real total. When omitted, \"Otros\" uses the sum of the counts outside the shown top_k (only when top carries more than top_k items). Default None."
+output: "A matplotlib.figure.Figure (figsize 6.4 x a height scaled with the number of bars, dpi 150) with one Axes of horizontal bars: the most frequent category on top, the gray \"Otros (N categorías)\" bar at the bottom, each bar annotated at its end with its count and percentage, and category labels (yticklabels) truncated to ~22 chars. With no valid counts it still returns a Figure with a centered \"sin datos categóricos\" text (never raises); any unexpected error falls back to a Figure carrying the error text. The caller rasterizes/closes the figure; the function neither shows nor saves it."
+---
+
+## Example
+
+```python
+from categorical_top_bar_figure import categorical_top_bar_figure
+
+# `top` is the output of the "top" block of summarize_categorical (already sorted desc).
+top = [
+    {"value": "rojo", "count": 40, "pct": 0.40},
+    {"value": "azul", "count": 30, "pct": 0.30},
+    {"value": "verde", "count": 20, "pct": 0.20},
+    {"value": "amarillo", "count": 5, "pct": 0.05},
+]
+
+fig = categorical_top_bar_figure(
+    top,
+    n_distinct=12,            # 12 distinct categories in total
+    title="color_producto",
+    top_k=6,                  # up to 6 explicit bars
+    n_rows=100,               # "Otros" = 100 - 95 = 5, over 8 aggregated categories
+)
+
+# The report renderer rasterizes it; here we only persist it for inspection.
+fig.savefig("/tmp/barras_color.png")
+```
+
+## When to use it
+
+Use it inside an EDA report when you want to compare the **magnitudes** of the
+dominant categories of a categorical column: which category leads and by how
+much over the next ones. Pass it the `top` block of `summarize_categorical`
+directly (already sorted from largest to smallest) plus `n_distinct`, so the
+"Otros" bar states how many categories are folded together. It is the "bar"
+clone of the donut `categorical_top_pie_figure`, with an **identical input
+contract**: you can swap one for the other without touching the caller. Choose
+bars when comparing exact sizes matters, and the donut when the share of the
+total matters.
+
+## Gotchas
+
+- **Impure because of matplotlib.** It touches the rendering machinery. It uses
+  the `Agg` backend and the object-oriented `Figure`/`add_subplot` API, NEVER
+  `pyplot.*`, so it neither touches global state nor leaks figures between
+  calls. `pyplot` is NOT thread-safe; this function avoids that risk by building
+  the `Figure` directly, so it is safe to call in a loop from the renderer.
+- **The caller closes the figure.** The function returns the `Figure` but
+  neither shows nor saves it. The consumer must rasterize it and then release it
+  (`fig.clf()`, or `matplotlib.pyplot.close(fig)` if the caller used pyplot) so
+  memory does not pile up over large batches of columns.
+- **`barh` draws bottom-up.** The most frequent category ends up on top because
+  the display order is reversed before plotting; the "Otros" bar always sits at
+  the bottom. Do not reorder `top` expecting a different layout: the function
+  assumes it already arrives sorted by count, descending.
+- **Exact "Otros" magnitude only with `n_rows`.** Without `n_rows`, the "Otros"
+  bar is computed from the overflow present in `top`; if the producer already
+  trimmed `top` to `top_k`, there will be no "Otros" bar even when more
+  categories exist. Pass `n_rows` (the dataset's total row count) to get a bar
+  that is correct with respect to the real total.
+- **Defensive, never raises.** `top=[]`, `value=None`, `count=None` or
+  non-numeric counts are handled without error: in the worst case it returns a
+  `Figure` with "sin datos categóricos", and any unexpected exception falls back
+  to a `Figure` carrying the error text. There is no need to wrap the call in
+  try/except for fear of a raise.
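The "Otros" arithmetic described in the Gotchas above can be sketched in plain Python, without matplotlib. This is a minimal sketch under stated assumptions, not the function's actual code path: `others_bar` is a hypothetical helper name, and it assumes `counts` has already been cleaned (positive numbers, sorted descending).

```python
def others_bar(counts, n_distinct=0, top_k=6, n_rows=None):
    """Return (others_count, others_categories) for already-cleaned counts.

    Hypothetical helper: `counts` stands in for the cleaned `top` block.
    The "Otros" bar is only worth drawing when both results are > 0.
    """
    # Explicit bars: the first top_k counts, or at least the largest one.
    shown = counts[: max(top_k, 0)] or counts[:1]
    sum_shown = sum(shown)
    overflow = counts[len(shown):]

    # How many categories are folded into "Otros".
    categories = max(n_distinct - len(shown), 0)
    if categories == 0 and overflow:
        categories = len(overflow)

    # Exact remainder when n_rows is known, else the overflow carried by `top`.
    count = 0.0
    if n_rows is not None and n_rows > sum_shown:
        count = float(n_rows - sum_shown)
    if count <= 0:
        count = float(sum(overflow))
    return count, categories


# `top` already trimmed to top_k by the producer, n_rows unknown: count stays 0.
print(others_bar([40, 30, 20], n_distinct=12, top_k=3))
# Same input plus n_rows: the remainder of the dataset is recovered.
print(others_bar([40, 30, 20], n_distinct=12, top_k=3, n_rows=100))
```

The first call illustrates the trimmed-`top` gotcha: nine categories are known to be missing, yet there is no count to draw them with unless `n_rows` is supplied.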
@@ -0,0 +1,233 @@
+"""Impure EDA helper: horizontal bar figure of the most common categories (`eda` group).
+
+Builds a horizontal bar chart of the ``top_k`` most frequent categories of a
+categorical column, folding everything else into a single gray
+"Otros (N categorías)" bar. The most frequent category sits at the top, each bar
+labelled with its count (and percentage) at the end. Returns a ready-to-rasterize
+``matplotlib.figure.Figure``; it never shows nor saves it.
+
+This is the "magnitude" twin of ``categorical_top_pie_figure``: identical input
+contract (same ``top``/``n_distinct``/``title``/``top_k``/``n_rows`` signature) so
+it can be swapped in directly, but it communicates comparable magnitudes via bars
+instead of proportions via wedges.
+
+Impure because it touches matplotlib's rendering machinery. It uses the headless
+Agg backend and the object-oriented ``Figure`` API (no ``pyplot``) so it leaks no
+global state and is safe to call repeatedly from a report renderer.
+"""
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+from matplotlib.figure import Figure  # noqa: E402
+
+
+# Gray reserved for the aggregated "Otros" bar.
+_OTHER_COLOR = "#9e9e9e"
+# Muted gray for secondary text (title fallback, no-data message).
+_MUTED_TEXT = "#5f6b7a"
+# Soft red for the error fallback message.
+_ERROR_TEXT = "#b00020"
+# Pleasant, colour-blind-friendly qualitative palette for the explicit bars.
+_PALETTE = [
+    "#4C72B0",
+    "#DD8452",
+    "#55A868",
+    "#C44E52",
+    "#8172B3",
+    "#937860",
+    "#DA8BC3",
+    "#8C8C8C",
+    "#CCB974",
+    "#64B5CD",
+]
+
+
+def _truncate(text, width: int = 22) -> str:
+    """Truncate ``text`` to ``width`` chars, appending an ellipsis if cut."""
+    s = "" if text is None else str(text)
+    if len(s) <= width:
+        return s
+    if width <= 1:
+        return s[:width]
+    return s[: width - 1] + "…"
+
+
+def _message_figure(message: str, color: str = _MUTED_TEXT, title: str = "") -> "Figure":
+    """Return a fallback ``Figure`` carrying a single centered message."""
+    fig = Figure(figsize=(6.4, 4.0), dpi=150)
+    ax = fig.add_subplot(111)
+    ax.axis("off")
+    ax.text(
+        0.5,
+        0.5,
+        message,
+        ha="center",
+        va="center",
+        fontsize=12,
+        color=color,
+        wrap=True,
+        transform=ax.transAxes,
+    )
+    if title:
+        ax.set_title(_truncate(title, 48), fontsize=12, loc="center", pad=8)
+    fig.tight_layout()
+    return fig
+
+
+def categorical_top_bar_figure(
+    top: list,
+    n_distinct: int = 0,
+    title: str = "",
+    top_k: int = 6,
+    n_rows=None,
+) -> "matplotlib.figure.Figure":
+    """Build a horizontal bar figure of the most common categories of a column.
+
+    Renders the ``top_k`` most frequent categories as explicit horizontal bars,
+    largest at the top, and aggregates every remaining category into a single
+    gray "Otros (N categorías)" bar at the bottom. Each bar is annotated at its
+    end with its count and its percentage of the drawn total (the sum of all
+    displayed bars, "Otros" included); the category names are truncated Y tick
+    labels.
+
+    The function shares the exact input contract of
+    ``categorical_top_pie_figure`` (the donut twin) so it is a drop-in swap. It is
+    fully defensive: empty input, missing/``None`` values or counts never raise.
+    When there is nothing valid to draw it still returns a ``Figure`` carrying a
+    centered "sin datos categóricos" message, and any unexpected error is caught
+    and turned into a fallback ``Figure`` carrying the error text.
+
+    Args:
+        top: List of ``{value, count, pct}`` dicts, already sorted by ``count``
+            descending (the ``top`` block of ``summarize_categorical``). May be
+            empty or carry incomplete/``None`` entries; non-dict items, items
+            without a positive numeric ``count`` and ``None`` counts are skipped.
+        n_distinct: Total number of distinct categories in the column. The
+            aggregated bar is labelled "Otros (N categorías)" with
+            ``N = n_distinct - len(shown)`` (floored at 0). When that is 0, the
+            number of overflow items actually present in ``top`` is used instead.
+        title: Figure title (the column name). Truncated when too long.
+        top_k: Maximum number of explicit bars. Default 6. The "Otros" bar does
+            not count against this limit.
+        n_rows: Optional total row count of the dataset. When given and the sum of
+            shown counts is below ``n_rows``, the "Otros" bar uses
+            ``n_rows - sum_shown`` as its count so it is exact with respect to the
+            real total. When omitted, "Otros" uses the sum of the counts that fall
+            outside the shown ``top_k`` (only when ``top`` carries more than
+            ``top_k`` items).
+
+    Returns:
+        A ``matplotlib.figure.Figure`` with a single horizontal-bar Axes. The
+        caller is responsible for rasterizing/closing it.
+    """
+    try:
+        safe_title = _truncate(title, 48)
+
+        # --- Defensive parse: keep only well-formed {value, count} with count > 0.
+        cleaned = []
+        if isinstance(top, list):
+            for item in top:
+                if not isinstance(item, dict):
+                    continue
+                count = item.get("count")
+                if count is None:
+                    continue
+                try:
+                    count = float(count)
+                except (TypeError, ValueError):
+                    continue
+                if count <= 0:
+                    continue
+                cleaned.append((item.get("value"), count))
+
+        if not cleaned:
+            return _message_figure("sin datos categóricos", title=title)
+
+        # --- Split into shown bars and the aggregated remainder.
+        shown = cleaned[: max(int(top_k), 0)]
+        if not shown:  # top_k <= 0 — show at least the largest category.
+            shown = cleaned[:1]
+
+        sum_shown = sum(c for _, c in shown)
+        overflow_count = sum(c for _, c in cleaned[len(shown):])
+
+        # How many categories are folded into "Otros".
+        try:
+            nd = int(n_distinct)
+        except (TypeError, ValueError):
+            nd = 0
+        others_categories = max(nd - len(shown), 0)
+        # If n_distinct is unknown/too small, fall back to the overflow we
+        # actually have in `top` beyond the shown bars.
+        overflow_items = len(cleaned) - len(shown)
+        if others_categories == 0 and overflow_items > 0:
+            others_categories = overflow_items
+
+        # Count attributed to the "Otros" bar.
+        others_count = 0.0
+        if n_rows is not None:
+            try:
+                total_rows = float(n_rows)
+            except (TypeError, ValueError):
+                total_rows = None
+            if total_rows is not None and total_rows > sum_shown:
+                others_count = total_rows - sum_shown
+        if others_count <= 0:
+            others_count = overflow_count
+
+        # --- Build the display order (top to bottom): largest .. smallest, Otros.
+        display_labels = [_truncate(v, 22) for v, _ in shown]
+        display_values = [c for _, c in shown]
+        display_colors = [_PALETTE[i % len(_PALETTE)] for i in range(len(shown))]
+
+        has_others = others_count > 0 and others_categories > 0
+        if has_others:
+            display_labels.append(f"Otros ({others_categories} categorías)")
+            display_values.append(others_count)
+            display_colors.append(_OTHER_COLOR)
+
+        total = sum(display_values) or 1.0
+
+        # barh draws bottom-up, so reverse the display order before plotting to
+        # land the largest category on top and "Otros" at the bottom.
+        labels = list(reversed(display_labels))
+        values = list(reversed(display_values))
+        colors = list(reversed(display_colors))
+        y_pos = range(len(values))
+
+        # Height scales with the number of bars so dense reports stay readable.
+        n_bars = len(values)
+        height = max(2.4, min(0.4 * n_bars + 1.2, 14.0))
+        fig = Figure(figsize=(6.4, height), dpi=150)
+        ax = fig.add_subplot(111)
+
+        ax.barh(list(y_pos), values, color=colors, edgecolor="white")
+        ax.set_yticks(list(y_pos))
+        ax.set_yticklabels(labels, fontsize=8)
+        ax.set_xlabel("conteo", fontsize=9)
+
+        max_val = max(values) if values else 1.0
+        ax.set_xlim(0, max_val * 1.18 if max_val > 0 else 1.0)
+
+        # Annotate each bar with its count and percentage at the end of the bar.
+        for y, val in zip(y_pos, values):
+            pct = val / total * 100.0
+            ax.text(
+                val + max_val * 0.012,
+                y,
+                f"{int(round(val))} ({pct:.0f}%)",
+                va="center",
+                ha="left",
+                fontsize=7,
+                color="#202020",
+            )
+
+        if safe_title:
+            ax.set_title(safe_title, fontsize=13, loc="left", pad=10)
+
+        fig.tight_layout()
+        return fig
+    except Exception as exc:  # noqa: BLE001 — never raise from a figure builder.
+        return _message_figure(
+            f"error al dibujar barras: {exc}", color=_ERROR_TEXT
+        )
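The layout steps at the end of `categorical_top_bar_figure` (reverse the order for `barh`, annotate with count and percentage, scale the height with the number of bars) can be isolated as a pure function for illustration. This is a sketch only: `plan_bars` is a hypothetical name that does not exist in the module, and it mirrors the constants visible in the code above.

```python
def plan_bars(labels, values):
    """Return (plot_labels, plot_values, annotations, height_in).

    `labels` and `values` arrive in display order (top to bottom), with the
    aggregated "Otros" entry last when present.
    """
    total = sum(values) or 1.0
    # barh puts index 0 at the bottom, so reverse to land the first label on top.
    plot_labels = list(reversed(labels))
    plot_values = list(reversed(values))
    # Percentages are relative to the drawn total, not to the dataset size.
    annotations = [
        f"{int(round(v))} ({v / total * 100.0:.0f}%)" for v in plot_values
    ]
    # 0.4 in per bar plus 1.2 in of margin, clamped to [2.4, 14.0] inches.
    height_in = max(2.4, min(0.4 * len(plot_values) + 1.2, 14.0))
    return plot_labels, plot_values, annotations, height_in


labels, values, notes, height = plan_bars(
    ["rojo", "azul", "Otros (8 categorías)"], [40, 30, 5]
)
print(labels)  # bottom-to-top order, as handed to barh
print(notes, round(height, 2))
```

Note that when `n_rows` is omitted and `top` was trimmed by the producer, the drawn total is smaller than the dataset, so the annotated percentages describe shares of the displayed bars only.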
@@ -0,0 +1,103 @@
+"""Tests for categorical_top_bar_figure (top-category bars, eda group).
+
+The function under test uses the Agg backend without pyplot and neither shows
+nor saves figures. The tests import pyplot only to call plt.close(fig) on each
+Figure they build, so no state accumulates between tests.
+"""
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+import matplotlib.pyplot as plt  # noqa: E402
+from matplotlib.figure import Figure  # noqa: E402
+
+from categorical_top_bar_figure import categorical_top_bar_figure
+
+
+def _make_top(n):
+    """n {value, count, pct} items sorted by count, descending."""
+    return [
+        {"value": f"cat_{i}", "count": n - i, "pct": (n - i) / sum(range(1, n + 1))}
+        for i in range(n)
+    ]
+
+
+def _bar_count(ax):
+    """Return the number of bars (length of the Axes' first BarContainer)."""
+    if ax.containers:
+        return len(ax.containers[0])
+    return 0
+
+
+def test_returns_figure():
+    fig = categorical_top_bar_figure(_make_top(3), n_distinct=3, title="col")
+    assert isinstance(fig, Figure)
+    plt.close(fig)
+
+
+def test_ten_items_topk_six_yields_seven_bars():
+    top = _make_top(10)
+    fig = categorical_top_bar_figure(top, n_distinct=10, title="muchas", top_k=6)
+    ax = fig.axes[0]
+    # 6 explicit categories + 1 "Otros" bar.
+    assert _bar_count(ax) == 7
+    plt.close(fig)
+
+
+def test_empty_top_does_not_raise_and_returns_figure():
+    fig = categorical_top_bar_figure([], n_distinct=0, title="vacía")
+    assert isinstance(fig, Figure)
+    # No data: there must be no bars.
+    assert _bar_count(fig.axes[0]) == 0
+    plt.close(fig)
+
+
+def test_long_value_truncated():
+    long_value = "una_categoria_con_un_nombre_larguisimo_que_excede_el_limite"
+    top = [
+        {"value": long_value, "count": 10, "pct": 0.5},
+        {"value": "corta", "count": 10, "pct": 0.5},
+    ]
+    fig = categorical_top_bar_figure(top, n_distinct=2, title="col", top_k=6)
+    ax = fig.axes[0]
+    tick_texts = [t.get_text() for t in ax.get_yticklabels()]
+    # The long value appears truncated with an ellipsis and NOT in its full form.
+    assert any("…" in t for t in tick_texts)
+    assert long_value not in " ".join(tick_texts)
+    plt.close(fig)
+
+
+def test_none_value_and_none_count_are_handled():
+    top = [
+        {"value": None, "count": 5, "pct": 0.5},
+        {"value": "b", "count": None, "pct": 0.0},  # count None -> dropped
+        {"value": "c", "count": 5, "pct": 0.5},
+    ]
+    fig = categorical_top_bar_figure(top, n_distinct=2, title="con nones", top_k=6)
+    assert isinstance(fig, Figure)
+    # Only 2 valid items, no overflow -> 2 bars, no "Otros".
+    assert _bar_count(fig.axes[0]) == 2
+    plt.close(fig)
+
+
+def test_n_rows_adds_exact_others_bar():
+    # The 3 shown categories sum to 30, the real dataset has 100 rows -> "Otros" = 70.
+    top = [
+        {"value": "a", "count": 15, "pct": 0.15},
+        {"value": "b", "count": 10, "pct": 0.10},
+        {"value": "c", "count": 5, "pct": 0.05},
+    ]
+    fig = categorical_top_bar_figure(
+        top, n_distinct=20, title="col", top_k=3, n_rows=100
+    )
+    ax = fig.axes[0]
+    # 3 explicit + Otros.
+    assert _bar_count(ax) == 4
+    tick_texts = [t.get_text() for t in ax.get_yticklabels()]
+    # The Otros bar reflects n_distinct - top_k = 17 categories.
+    assert any("Otros (17 categorías)" in t for t in tick_texts)
+    # Its annotation carries the count 70.
+    annotation_texts = [t.get_text() for t in ax.texts]
+    assert any("70" in t for t in annotation_texts)
+    plt.close(fig)
@@ -0,0 +1,121 @@
+---
+id: render_table_as_figure_py_datascience
+name: render_table_as_figure
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def render_table_as_figure(header, rows, title=None, note=None, fontsize=9.0, max_cell_chars=40) -> \"matplotlib.figure.Figure\""
+description: "Draws a tabular block (header + rows) as a crisp matplotlib.figure.Figure, ready to be rasterized at high DPI. Meant for tables that do NOT fit as text on a page/slide of the EDA report: it is rasterized at high resolution (the caller uses dpi=220, bbox_inches='tight') and the user zooms in on a phone to read it in full without losing data. Shaded (#eef3f6), bold header; even rows (1-based) with a soft zebra stripe (#f6f8fa); dark ink (#1b1b1b) on white; very thin gray grid (#cccccc). Truncates each cell to max_cell_chars with an ellipsis and str()s every value (None -> \"\"). figsize proportional to the content (width from the number and length of columns, height from the number of rows) so it stays legible when zoomed. Agg backend, no global pyplot. Defensive: ragged rows are padded to a rectangular grid, and when header and rows leave nothing to draw (empty or None) or any internal error occurs it returns a placeholder Figure with the centered text \"(tabla no disponible)\". NEVER raises."
+tags: [eda, table, figure, matplotlib, visualization, rasterize, zoom, render, datascience, impure]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: [matplotlib]
+example: |
+  from datascience.render_table_as_figure import render_table_as_figure
+  header = ["columna", "n_nulos", "%_nulos", "distintos", "tipo", "ejemplo"]
+  rows = [
+      ["ingresos", 12, "1.2%", 980, "float64", "2345.67"],
+      ["edad", 0, "0.0%", 88, "int64", "37"],
+      ["ciudad", 5, "0.5%", 412, "object", "Madrid"],
+  ]
+  fig = render_table_as_figure(header, rows, title="Resumen de columnas",
+                               note="rasteriza a dpi=220 y haz zoom")
+  fig.savefig("/tmp/tabla.png", dpi=220, bbox_inches="tight")
+tested: true
+tests:
+  - "test_returns_figure_with_table"
+  - "test_rows_none_does_not_raise"
+  - "test_header_none_does_not_raise"
+  - "test_empty_lists_return_placeholder_figure"
+  - "test_both_none_return_placeholder_figure"
+  - "test_long_cell_is_truncated"
+  - "test_none_cells_become_empty_strings"
+  - "test_can_rasterize_to_png_high_dpi"
+  - "test_placeholder_can_rasterize"
+  - "test_ragged_rows_are_padded"
+test_file_path: "python/functions/datascience/render_table_as_figure_test.py"
+file_path: "python/functions/datascience/render_table_as_figure.py"
+params:
+  - name: header
+    desc: "List of column names (may be [] or None). Each name is str()-ed, truncated to max_cell_chars and painted in the shaded, bold header row. If it is empty/None no header row is drawn (body only)."
+  - name: rows
+    desc: "List of rows; each row is a list of cells holding arbitrary values (they are str()-ed; None -> \"\"). Accepts None (treated as []), scalar rows (wrapped into a single cell) and rows of different lengths (the grid is made rectangular at the maximum width, padding with empty cells). Newlines/tabs inside a cell are collapsed to spaces so it does not spill into other rows."
+  - name: title
+    desc: "Optional title drawn above the table, bold, ink #1b1b1b, left-aligned. None or \"\" => no title. Default None."
+  - name: note
+    desc: "Optional note at the foot of the figure, gray #8a8a8a and italic. None or \"\" => no note. Default None."
+  - name: fontsize
+    desc: "Base font size (pt) of the body cells. The header uses fontsize+3 and the note max(7, fontsize-1). A non-numeric or <= 0 value falls back to 9.0. Default 9.0."
+  - name: max_cell_chars
+    desc: "Truncates the text of each cell to this number of chars (with a trailing … when cut) so the width does not blow up. A non-integer value falls back to 40; <= 0 leaves the cells empty. Default 40."
+output: "A matplotlib.figure.Figure (figsize proportional to the content: width ≈ 0.9-1.6\" per column depending on its text, total bounded to 3-26\"; height ≈ 0.32\" per row + header + room for title/note, bounded) with an axis-less Axes holding an ax.table(...), NOT closed. Header background #eef3f6, text #1b1b1b bold; even rows (1-based) zebra #f6f8fa, odd rows white; ink #1b1b1b; borders/grid #cccccc lw 0.4; text left-aligned. Title above (bold) and note below (gray italic) when given. If header/rows are empty or None, or on any internal error, it returns a small placeholder Figure with the centered text \"(tabla no disponible)\". NEVER raises. The caller rasterizes it (dpi=220, bbox_inches='tight') and closes it; the function neither shows nor saves it."
+---
+
+## Example
+
+```python
+import sys, os
+sys.path.insert(0, os.path.join("python", "functions"))
+from datascience.render_table_as_figure import render_table_as_figure
+
+# A table that does not fit as text on the slide -> rasterize it and read it zoomed in.
+header = ["columna", "n_nulos", "%_nulos", "distintos", "tipo", "ejemplo"]
+rows = [
+    ["ingresos", 12, "1.2%", 980, "float64", "2345.67"],
+    ["edad", 0, "0.0%", 88, "int64", "37"],
+    ["ciudad", 5, "0.5%", 412, "object", "Madrid"],
+    ["categoria_producto", 0, "0.0%", 1840, "object",
+     "un_valor_categorico_muy_largo_que_se_trunca"],
+]
+
+fig = render_table_as_figure(
+    header,
+    rows,
+    title="Resumen de columnas",
+    note="rasteriza a dpi=220 y haz zoom en el móvil",
+    fontsize=9.0,
+    max_cell_chars=40,
+)
+
+# The report renderer rasterizes it at high resolution; here we just persist it.
+fig.savefig("/tmp/tabla.png", dpi=220, bbox_inches="tight")
+```
+
+## When to use it
+
+Use it in an EDA report when a table **does not fit as text** on a page or
+slide and you prefer a crisp image the reader can zoom into on a phone to read
+it in full (column profiles, count matrices, frequency tables with many rows or
+wide columns). Pass it the header and rows as they are (the values are
+`str()`-ed for you) plus an optional `title`/`note`; the caller rasterizes it at
+`dpi=220` with `bbox_inches='tight'`. It is the "table-as-image" counterpart of
+the `build_boxplots_figure` / `categorical_top_pie_figure` charts: same palette
+and same contract (Agg, no `pyplot`, the caller closes the figure).
+
+## Gotchas
+
+- **Impure because of matplotlib.** It touches the rendering machinery. It uses
+  the `Agg` backend and the object-oriented `Figure`/`add_subplot` API, NEVER
+  `pyplot.*`, so it neither touches global state nor leaks figures between
+  calls. `pyplot` is NOT thread-safe; this function builds the `Figure`
+  directly, so it is safe to call in a loop from the renderer.
+- **The caller closes the figure.** It returns the `Figure` but neither shows
+  nor saves it. The consumer must rasterize it and then release it
+  (`fig.clf()`, or `matplotlib.pyplot.close(fig)` if the caller used pyplot) so
+  memory does not pile up over large batches.
+- **Meant to be rasterized at high DPI.** The `figsize` is proportional to the
+  content, but real legibility comes from the DPI: rasterize with `dpi=220` and
+  `bbox_inches='tight'`. A table with a huge number of rows grows in height
+  (capped at ~60"); for thousands of rows, split the table or summarize it
+  before passing it in.
+- **Visible cell truncation.** Each cell is cut to `max_cell_chars` (default 40)
+  with a trailing `…`, and newlines/tabs are collapsed to spaces, so no cell
+  spills into other rows. Raise `max_cell_chars` if you need to see the full
+  value (at the cost of width).
+- **Defensive, never raises.** Empty or `None` `header`/`rows`, scalar rows,
+  rows of different lengths or any internal error are handled without
+  propagating: in the worst case it returns a placeholder `Figure` with
+  "(tabla no disponible)". There is no need to wrap the call in try/except for
+  fear of a raise.
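The cell truncation and the ragged-row handling described above can be sketched without matplotlib. This is a minimal sketch under stated assumptions: `cell_text` follows the documented truncation rules, while `rectangularize` is a hypothetical helper name showing one plausible way to pad rows to the maximum width; the module's real normalization code may differ.

```python
def cell_text(value, limit=40):
    """str() a cell, map None to "", flatten whitespace, truncate with an ellipsis."""
    s = "" if value is None else str(value)
    s = s.replace("\n", " ").replace("\r", " ").replace("\t", " ")
    if limit <= 0:
        return ""
    if len(s) <= limit:
        return s
    return "…" if limit == 1 else s[: limit - 1] + "…"


def rectangularize(header, rows, limit=40):
    """Return (header_cells, body_rows), every row padded to one common width."""
    head_in = header if isinstance(header, (list, tuple)) else []
    rows_in = rows if isinstance(rows, (list, tuple)) else []
    head = [cell_text(h, limit) for h in head_in]
    body = []
    for row in rows_in:
        if isinstance(row, (list, tuple)):
            cells = list(row)
        elif row is None:
            cells = []
        else:
            cells = [row]  # a scalar row becomes a single-cell row
        body.append([cell_text(c, limit) for c in cells])
    # Common width: the widest of the header and all body rows.
    width = max([len(head)] + [len(r) for r in body])
    if head:
        head = head + [""] * (width - len(head))
    body = [r + [""] * (width - len(r)) for r in body]
    return head, body


head, body = rectangularize(["a", "b"], [[1, None, "x\ty"], 7, None])
print(head)
print(body)
```

Padding with empty strings rather than dropping short rows keeps every value visible, which matches the documented intent that the reader loses no data when zooming into the rasterized table.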
@@ -0,0 +1,241 @@
+"""Impure EDA helper: a crisp table rendered as a matplotlib Figure (`eda` group).
+
+Draws a tabular block (header + rows) as a sharp ``matplotlib.figure.Figure``
+ready to be rasterized at high DPI, so a table that does NOT fit as text on a
+page/slide can still be read in full by zooming into the rasterized image on a
+phone. The header is shaded and bold, even rows carry a soft zebra stripe, the
+ink is dark on white and the grid is very thin.
+
+Impure because it touches matplotlib's rendering machinery. It uses the headless
+Agg backend and the object-oriented ``Figure`` API (no ``pyplot``) so it leaks no
+global state and is safe to call repeatedly from a report renderer. It is fully
+defensive and NEVER raises: empty/invalid input or any internal error returns a
+small placeholder figure carrying a centered "(tabla no disponible)".
+"""
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+from matplotlib.figure import Figure  # noqa: E402
+
+# Palette shared with the EDA report renderer so the document stays coherent.
+_HEADER_BG = "#eef3f6"   # header cell background.
+_HEADER_TEXT = "#1b1b1b"  # header cell text (bold).
+_ZEBRA_BG = "#f6f8fa"    # even (1-based) row background stripe.
+_BODY_BG = "#ffffff"     # odd row background.
+_INK = "#1b1b1b"         # body text + title ink.
+_GRID = "#cccccc"        # cell borders / grid (thin).
+_NOTE_TEXT = "#8a8a8a"   # muted gray for the note (italic).
+
+
+def _placeholder_figure(message: str = "(tabla no disponible)") -> "Figure":
+    """Return a small fallback ``Figure`` carrying a single centered message."""
+    fig = Figure(figsize=(6.0, 1.6), dpi=150)
+    ax = fig.add_subplot(111)
+    ax.axis("off")
+    ax.text(
+        0.5,
+        0.5,
+        message,
+        ha="center",
+        va="center",
+        fontsize=11,
+        color=_NOTE_TEXT,
+        style="italic",
+        wrap=True,
+        transform=ax.transAxes,
+    )
+    fig.tight_layout()
+    return fig
+
+
+def _cell_text(value, max_cell_chars: int) -> str:
+    """``str()`` a cell value defensively, None -> "", truncate with an ellipsis."""
+    s = "" if value is None else str(value)
+    # Collapse newlines/tabs so a single cell never spills across table rows.
+    s = s.replace("\n", " ").replace("\r", " ").replace("\t", " ")
+    try:
+        limit = int(max_cell_chars)
+    except (TypeError, ValueError):
+        limit = 40
+    if limit <= 0:
+        return ""
+    if len(s) <= limit:
+        return s
+    if limit == 1:
+        return "…"
+    return s[: limit - 1] + "…"
+
+
+def render_table_as_figure(
+    header,
+    rows,
+    title=None,
+    note=None,
+    fontsize=9.0,
+    max_cell_chars=40,
+):
+    """Draw a crisp table as a matplotlib.figure.Figure, ready to rasterize at high DPI.
+
+    Meant for tables that do NOT fit as text on a page/slide: the table is rasterized
+    at high resolution and the reader zooms in on a phone to read all of it without
+    losing data. Shaded bold header, even body rows with a soft zebra stripe,
+    dark ink on white, very thin grid.
+
+    Args:
+        header: list of column names (may be []).
+        rows: list of rows; each row is a list of cells (any values, passed through str()).
+        title: optional title drawn above the table (or None).
+        note: optional gray italic note below the table (or None).
+        fontsize: base font size (pt) of the cells.
+        max_cell_chars: truncate cell text to this many chars (trailing …) so the width does not blow up.
+
+    Returns:
+        matplotlib.figure.Figure, NOT closed (the caller rasterizes and closes it).
+        Never raises: on any error it returns a Figure carrying the text "(tabla no disponible)".
+    """
+    try:
+        # --- Defensive normalization of header/rows into a rectangular grid.
+        header_list = list(header) if isinstance(header, (list, tuple)) else []
+        raw_rows = list(rows) if isinstance(rows, (list, tuple)) else []
+
+        clean_rows = []
+        for row in raw_rows:
+            if isinstance(row, (list, tuple)):
+                clean_rows.append(list(row))
+            elif row is None:
+                clean_rows.append([])
+            else:
+                # A scalar row becomes a single-cell row instead of being dropped.
+                clean_rows.append([row])
+
+        # Nothing to draw at all -> placeholder.
+        if not header_list and not clean_rows:
+            return _placeholder_figure()
+
+        # Number of columns = widest of header / any row.
+        n_cols = len(header_list)
+        for row in clean_rows:
+            if len(row) > n_cols:
+                n_cols = len(row)
+        if n_cols <= 0:
+            return _placeholder_figure()
+
+        # Base font size, tolerate a bad value.
+        try:
+            base_fs = float(fontsize)
+        except (TypeError, ValueError):
+            base_fs = 9.0
+        if base_fs <= 0:
+            base_fs = 9.0
+
+        # --- Build the truncated, padded text matrix.
+        header_cells = [
+            _cell_text(header_list[c] if c < len(header_list) else "", max_cell_chars)
+            for c in range(n_cols)
+        ]
+        body_cells = []
+        for row in clean_rows:
+            body_cells.append(
+                [
+                    _cell_text(row[c] if c < len(row) else "", max_cell_chars)
+                    for c in range(n_cols)
+                ]
+            )
+
+        has_header = any(t for t in header_cells)
+        n_body = len(body_cells)
+        # Total drawn table rows (header counts as one when present).
+        n_table_rows = n_body + (1 if has_header else 0)
+        if n_table_rows <= 0:
+            return _placeholder_figure()
+
+        # --- figsize proportional to content so it reads under zoom.
+        # Width: per-column width scales with the longest text in that column,
+        # clamped to a sensible per-column range, total capped.
+        per_col_widths = []
+        for c in range(n_cols):
+            col_texts = [header_cells[c]] if has_header else []
+            col_texts += [body_cells[r][c] for r in range(n_body)]
+            longest = max((len(t) for t in col_texts), default=0)
+            # ~0.085" per char at the base font, clamped to [0.9, 1.6] inches.
+            w = 0.9 + 0.085 * max(longest - 6, 0)
+            w = max(0.9, min(1.6, w))
+            per_col_widths.append(w)
+        fig_w = sum(per_col_widths)
+        fig_w = max(3.0, min(26.0, fig_w))
+
+        # Height: ~0.32" per row + room for title / note.
+        fig_h = 0.32 * n_table_rows + 0.30
+        if title is not None and str(title) != "":
+            fig_h += 0.45
+        if note is not None and str(note) != "":
+            fig_h += 0.30
+        fig_h = max(1.0, min(60.0, fig_h))
+
+        fig = Figure(figsize=(fig_w, fig_h), dpi=150)
+        ax = fig.add_subplot(111)
+        ax.axis("off")
+
+        # Reserve vertical bands for the optional title (top) and note (bottom)
+        # so the table itself never overlaps them.
+        title_band = 0.10 if (title is not None and str(title) != "") else 0.0
+        note_band = 0.07 if (note is not None and str(note) != "") else 0.0
+        table_bbox = [0.0, note_band, 1.0, max(0.05, 1.0 - title_band - note_band)]
+
+        cell_text = ([header_cells] if has_header else []) + body_cells
+
+        col_widths = [w / fig_w for w in per_col_widths]
+
+        table = ax.table(
+            cellText=cell_text,
+            colWidths=col_widths,
+            cellLoc="left",
+            loc="center",
+            bbox=table_bbox,
+        )
+        table.auto_set_font_size(False)
+        table.set_fontsize(base_fs)
+
+        # --- Style every cell: zebra body, shaded bold header, thin gray grid.
+        for (r, _c), cell in table.get_celld().items():
+            cell.set_edgecolor(_GRID)
+            cell.set_linewidth(0.4)
+            # Small horizontal padding so text does not touch the border.
+            cell.PAD = 0.04
+            if has_header and r == 0:
+                cell.set_facecolor(_HEADER_BG)
+                cell.set_text_props(color=_HEADER_TEXT, fontweight="bold", ha="left")
+            else:
+                body_index = r - 1 if has_header else r  # 0-based body row.
+                # 1-based even rows get the zebra stripe.
+                is_even = ((body_index + 1) % 2) == 0
+                cell.set_facecolor(_ZEBRA_BG if is_even else _BODY_BG)
+                cell.set_text_props(color=_INK, ha="left")
+
+        if title is not None and str(title) != "":
+            ax.set_title(
+                str(title),
+                fontsize=base_fs + 3.0,
+                fontweight="bold",
+                color=_INK,
+                loc="left",
+                pad=8,
+            )
+
+        if note is not None and str(note) != "":
+            fig.text(
+                0.01,
+                0.01,
+                str(note),
+                ha="left",
+                va="bottom",
+                fontsize=max(7.0, base_fs - 1.0),
+                color=_NOTE_TEXT,
+                style="italic",
+            )
+
+        return fig
+    except Exception:  # noqa: BLE001 — never raise from a figure builder.
+        return _placeholder_figure()
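The sizing arithmetic in the module above can be checked by hand. A minimal sketch, assuming a hypothetical helper `estimate_figsize` (not part of the diff) that only mirrors the width and height formulas and takes the longest text length per column as input:

```python
def estimate_figsize(col_char_lengths, n_table_rows, has_title=False, has_note=False):
    """Hypothetical helper: reproduce the figsize heuristic of the diff, in inches."""
    widths = []
    for longest in col_char_lengths:
        # 0.9" base width, plus 0.085" for each character beyond the sixth,
        # clamped to the [0.9, 1.6] inch range per column.
        w = 0.9 + 0.085 * max(longest - 6, 0)
        widths.append(max(0.9, min(1.6, w)))
    # Total width is clamped to [3.0, 26.0] inches.
    fig_w = max(3.0, min(26.0, sum(widths)))

    # 0.32" per drawn row plus a fixed margin, then room for title and note.
    fig_h = 0.32 * n_table_rows + 0.30
    if has_title:
        fig_h += 0.45
    if has_note:
        fig_h += 0.30
    # Total height is clamped to [1.0, 60.0] inches.
    fig_h = max(1.0, min(60.0, fig_h))
    return fig_w, fig_h


# 19 short columns, header + 5 body rows.
wide_w, wide_h = estimate_figsize([6] * 19, 6)
# Two columns, one of them at the 40-char truncation limit.
narrow_w, narrow_h = estimate_figsize([40, 3], 6, has_title=True, has_note=True)
```

With 19 columns of at most 6 characters and 6 drawn rows this gives roughly 17.1 x 2.22 inches. The two-column case sums to 2.5 inches and is lifted to the 3.0 inch floor. Note that the heuristic does not take `fontsize` into account, so a much larger font may need a wider per-character factor.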
@@ -0,0 +1,119 @@
+"""Tests for render_table_as_figure (crisp table as a Figure, eda group).
+
+Uses the headless Agg backend; figures are never shown and are only saved to an
+in-memory BytesIO, never to disk. Each test explicitly closes the Figure it builds
+(matplotlib.pyplot.close) so no state accumulates between tests.
+"""
+
+from io import BytesIO
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+import matplotlib.pyplot as plt  # noqa: E402
+from matplotlib.figure import Figure  # noqa: E402
+
+from render_table_as_figure import render_table_as_figure
+
+
+def _grid(n_cols, n_rows):
+    """Header with n_cols columns plus n_rows rows of cells."""
+    header = [f"col_{c}" for c in range(n_cols)]
+    rows = [[f"r{r}c{c}" for c in range(n_cols)] for r in range(n_rows)]
+    return header, rows
+
+
+def test_returns_figure_with_table():
+    header, rows = _grid(6, 5)
+    fig = render_table_as_figure(header, rows, title="Tabla", note="nota al pie")
+    assert isinstance(fig, Figure)
+    # There is at least one Axes and it contains a table with cells.
+    assert len(fig.axes) >= 1
+    ax = fig.axes[0]
+    assert len(ax.tables) >= 1
+    # 6 columns x (1 header + 5 rows) = 36 cells.
+    assert len(ax.tables[0].get_celld()) == 6 * (5 + 1)
+    plt.close(fig)
+
+
+def test_rows_none_does_not_raise():
+    fig = render_table_as_figure(["a", "b"], None)
+    assert isinstance(fig, Figure)
+    assert len(fig.axes) >= 1
+    plt.close(fig)
+
+
+def test_header_none_does_not_raise():
+    fig = render_table_as_figure(None, [["x", "y"], ["z", "w"]])
+    assert isinstance(fig, Figure)
+    assert len(fig.axes) >= 1
+    plt.close(fig)
+
+
+def test_empty_lists_return_placeholder_figure():
+    fig = render_table_as_figure([], [])
+    assert isinstance(fig, Figure)
+    # Placeholder: one Axes with text, no table.
+    assert len(fig.axes) >= 1
+    assert len(fig.axes[0].tables) == 0
+    plt.close(fig)
+
+
+def test_both_none_return_placeholder_figure():
+    fig = render_table_as_figure(None, None)
+    assert isinstance(fig, Figure)
+    assert len(fig.axes[0].tables) == 0
+    plt.close(fig)
+
+
+def test_long_cell_is_truncated():
+    long_value = "x" * 200
+    header, _ = _grid(2, 0)
+    fig = render_table_as_figure(header, [[long_value, "ok"]], max_cell_chars=20)
+    assert isinstance(fig, Figure)
+    ax = fig.axes[0]
+    texts = [c.get_text().get_text() for c in ax.tables[0].get_celld().values()]
+    # The long cell appears truncated with an ellipsis and never in full.
+    assert any(t.endswith("…") and len(t) <= 20 for t in texts)
+    assert long_value not in texts
+    plt.close(fig)
+
+
+def test_none_cells_become_empty_strings():
+    fig = render_table_as_figure(["a", "b"], [[None, "v"], ["w", None]])
+    assert isinstance(fig, Figure)
+    ax = fig.axes[0]
+    texts = [c.get_text().get_text() for c in ax.tables[0].get_celld().values()]
+    # There are empty cells (the None values) and cells with a value.
+    assert "" in texts
+    assert "v" in texts
+    plt.close(fig)
+
+
+def test_can_rasterize_to_png_high_dpi():
+    header, rows = _grid(6, 8)
+    fig = render_table_as_figure(header, rows, title="Render", note="zoom me")
+    buf = BytesIO()
+    # Must not raise when rasterizing at high DPI with a tight bbox.
+    fig.savefig(buf, format="png", dpi=220, bbox_inches="tight")
+    assert buf.getbuffer().nbytes > 0
+    plt.close(fig)
+
+
+def test_placeholder_can_rasterize():
+    fig = render_table_as_figure([], [])
+    buf = BytesIO()
+    fig.savefig(buf, format="png", dpi=220, bbox_inches="tight")
+    assert buf.getbuffer().nbytes > 0
+    plt.close(fig)
+
+
+def test_ragged_rows_are_padded():
+    # Rows of different lengths: the grid is padded to the widest row.
+    fig = render_table_as_figure(["a", "b", "c"], [["1"], ["1", "2", "3", "4"]])
+    assert isinstance(fig, Figure)
+    ax = fig.axes[0]
+    # 4 columns (the widest row) x (1 header + 2 rows) = 12 cells.
+    assert len(ax.tables[0].get_celld()) == 4 * (2 + 1)
+    plt.close(fig)
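The padding behaviour exercised by `test_ragged_rows_are_padded` can be illustrated without matplotlib. A minimal sketch, assuming a hypothetical standalone helper `rectangularize` (the function in the diff does this inline, and additionally passes every cell through `str()` and truncates it):

```python
def rectangularize(header, rows, fill=""):
    """Hypothetical helper: pad header and rows to the widest row."""
    # Anything that is not a list/tuple is treated as "no header" / "no rows".
    header = list(header) if isinstance(header, (list, tuple)) else []
    rows = list(rows) if isinstance(rows, (list, tuple)) else []

    clean = []
    for row in rows:
        if isinstance(row, (list, tuple)):
            clean.append(list(row))
        elif row is None:
            clean.append([])          # a None row becomes an empty row
        else:
            clean.append([row])       # a scalar row becomes a one-cell row

    # Column count is the widest of the header and any row.
    n_cols = max([len(header)] + [len(r) for r in clean])

    def pad(cells):
        return cells + [fill] * (n_cols - len(cells))

    return pad(header), [pad(r) for r in clean]


padded_header, padded_rows = rectangularize(
    ["a", "b", "c"], [["1"], ["1", "2", "3", "4"]]
)
```

Here the header gains one empty cell and the short row gains three, so the grid is 4 columns wide, which matches the 4 x (1 + 2) = 12 cells the test asserts.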
Author	SHA1	Message	Date
egutierrez	26569c7015	chore: auto-commit (1 file) - logs/ardour_mcp_server.log Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-07-01 02:16:25 +02:00
egutierrez	44622339fa	merge(eda): cap4/cap5 distributions - paragraphs moved to the glossary, LLM desc+unit per column, donut->bars, PPT side_by_side	2026-07-01 02:11:53 +02:00
egutierrez	c0d44a6352	fix(eda): cat_distr - body intro reduced to the minimal clickable terms. Removes the descriptive sentence from the chapter body ('Cada columna categórica ocupa su propia página — ...: cardinalidad, top de categorías y gráfico de barras. El dataset tiene N filas...'); it was already duplicated in the 'pagina_categorica' glossary entry. The intro keeps only the minimal clickable terms ([[term:entropia]] · [[term:pagina_categorica]]) under the heading 'Entropía y cardinalidad'. The dataset's total row count is still available per column in the cardinality table ('Total filas (dataset)').	2026-07-01 02:10:39 +02:00
egutierrez	cab0fbf0a3	feat(eda): CAP4/CAP5 distributions - paragraphs moved to the glossary, LLM desc + unit per column, donut→bars, PPT figure on the right. CAP4 num_distr: - Moves the long introductory histogram/boxplot paragraph to the glossary (new clickable term "histograma_boxplot"); the chapter body only names the term with [[term:histograma_boxplot]] and the full explanation (color code, 1.5·IQR, reading skewness) lives in the glossary entry. The information is relocated, not lost. - Adds, per numeric column, the LLM's business description and the unit, read from profile['llm']['dictionary'] (matched by column name). Without an LLM block, the description block is omitted cleanly. CAP5 cat_distr: - Moves the paragraph "Cada columna categórica ocupa su propia página..." to the glossary (new clickable term "pagina_categorica"); the intro only names the terms entropía and pagina_categorica. - Adds LLM description + unit per column (same source as CAP4). - Replaces the donut/pie with a horizontal bar chart (new registry function categorical_top_bar_figure_py_datascience, with an input contract identical to the donut's for a direct swap) plus its inline bar fallback. - Marks each column Group with layout="side_by_side": in PPTX the cardinality table sits on the left and the bar chart on the right; in PDF they are stacked (narrow A5). Does not touch the renderers; layout support already existed. Glossary: - Canonical catalog _BASELINE_TERMS with the definitions of the two new terms; build_glosario fills in the definition of a term registered without one from the catalog (chapters only register key+label). Tests updated (donut→bars, side_by_side, LLM desc/unit, glossary) and a new function with its tests. Subsystem suite + acceptance green.	2026-07-01 02:01:07 +02:00
egutierrez	7f304adc9c	merge(eda): global render quality - DPI 220, wide tables as images, side_by_side layout, clickable index	2026-07-01 01:36:10 +02:00
egutierrez	a74a5a047f	feat(eda): global render quality - DPI 220, wide tables as images, side_by_side layout, clickable index. Cross-cutting improvements to the AutomaticEDA engine (PDF + PPTX) on top of the block model: 1. Global high DPI: every embedded figure/image is rasterized at 220 dpi (previously 150, and in PDF the page was saved at ~100 dpi, re-rasterizing the imshow images). In PDF, savefig.dpi=220 is applied to the page; text stays vector and selectable. This allows zooming in on a phone without pixelation. Measured embedded image: ~1081px (previously ~492px). 2. Wide table → high-resolution image: when a DataTable has too many columns to be legible as text (criterion _table_fits_as_text), it is drawn whole as a crisp image (new function render_table_as_figure_py_datascience: shaded header + zebra) scaled to fit completely, so the reader zooms in and reads it without losing data. Tables that do fit remain selectable text / native tables. Applies to PDF and PPTX. The 19-column df.head of the synthetic dataset is no longer cut off: it is rendered as an image. 3. Group.layout: new backward-compatible hint (default "stack"). "side_by_side" places the table on the left (~55%) and the figure on the right (~45%) of the same PPTX slide (falls back to stacked if there is no table+figure pair or they do not fit); in PDF it is treated as "stack" (the mobile A5 width does not allow two columns). Intended for the cat_distr chapter to put the chart next to the table in PPT. 4. Cover with clickable index: the chapter list changes from "Este informe incluye..." (markdown) to a Heading "Índice" + one TocEntry per chapter. The renderer records the start of each chapter and wires each entry as a real jump (PDF: PyMuPDF GOTO link; PPTX: native slide jump), reusing the clickable-glossary mechanism. Model: Group gains `layout`; new TocEntry block; normalizers and __init__ updated. Contract: documented in docs/automatic_eda_contract.md §11.4 (includes the exact contract of the layout field for the cat_distr agent). Tests: new render_quality_test.py (13 golden: real high DPI, wide table→image PDF/PPTX, narrow→text, side_by_side PPTX two columns / PDF stacked, clickable index PDF+PPTX, backward compatibility of the default layout). render_features_test updated for the new index. Suite: 188 passed (module) + 38 passed/1 skipped (acceptance + pipeline). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-07-01 01:34:21 +02:00
egutierrez	44be1d6b58	merge(eda): cap2 overview enriches dictionary and describe with the LLM's description+unit	2026-07-01 01:14:37 +02:00