feat(eda): global render quality: 220 DPI, wide tables as images, side_by_side layout, clickable index

Cross-cutting improvements to the AutomaticEDA engine (PDF + PPTX) built on the block model:

1. Global high DPI: every embedded figure/image is rasterized at 220 dpi (previously 150, and in the PDF the page itself was saved at ~100 dpi, re-rasterizing every imshow). In the PDF, savefig.dpi=220 is applied to the page; text stays vector and selectable. Readers can zoom in on a phone without pixelation. Measured embedded image: ~1081 px (previously ~492 px).
2. Wide table → high-resolution image: when a DataTable has too many columns to be legible as text (criterion _table_fits_as_text), it is drawn whole as a sharp image (new function render_table_as_figure_py_datascience: shaded header + zebra striping), scaled to fit completely, so the reader zooms in and reads it without losing data. Tables that do fit stay as selectable text / a native table. Applies to PDF and PPTX. The 19-column df.head of the synthetic dataset is no longer cut off: it is rendered as an image.
3. Group.layout: new backward-compatible hint (default "stack"). "side_by_side" places the table on the left (~55%) and the figure on the right (~45%) of the same PPTX slide (falling back to stacking when there is no table+figure pair or they do not fit); in PDF it is treated as "stack" (the mobile A5 width cannot hold two columns). Intended to let the cat_distr chapter put the chart next to the table in PPT.
4. Cover with a clickable index: the chapter list changes from "Este informe incluye..." (markdown) to a Heading "Índice" plus one TocEntry per chapter. The renderer records where each chapter starts and wires each entry as a real jump (PDF: PyMuPDF GOTO link; PPTX: native slide jump), reusing the clickable-glossary mechanism.

Model: Group gains `layout`; new TocEntry block; normalizers and __init__ updated. Contract: documented in docs/automatic_eda_contract.md §11.4 (including the exact contract of the layout field for the cat_distr agent).
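The `_table_fits_as_text` criterion in item 2 reduces to one comparison. The number of columns times a minimum legible column width must not exceed the usable width.

The sketch below is a standalone illustration, not the renderer's code:

- Only the 8-character minimum mirrors the renderer's `_MIN_LEGIBLE_CHARS`.
- The average character width, the cell padding and the ~4.9 in usable width are assumed values. The renderer takes them from its own layout helpers.
- `table_fits_as_text` is a hypothetical name.

```python
# Hypothetical standalone version of the width check. Only MIN_LEGIBLE_CHARS
# mirrors the renderer; the other constants are assumptions for illustration.
MIN_LEGIBLE_CHARS = 8
AVG_CHAR_WIDTH_IN = 0.055  # assumed average glyph width at the cell font size
CELL_PAD_IN = 0.04         # assumed horizontal padding on each side of a cell
USABLE_WIDTH_IN = 4.9      # assumed usable width of the A5 mobile page


def table_fits_as_text(header, rows,
                       usable_w=USABLE_WIDTH_IN,
                       char_w=AVG_CHAR_WIDTH_IN,
                       pad=CELL_PAD_IN):
    """Return True when every column can get a minimal legible width."""
    header = header or []
    rows = rows or []
    # Column count comes from the header, else from the first data row.
    ncol = len(header) if header else (len(rows[0]) if rows else 1)
    ncol = max(1, ncol)
    per_column = MIN_LEGIBLE_CHARS * char_w + 2 * pad
    return ncol * per_column <= usable_w


narrow = table_fits_as_text(["col", "n", "%"], [["a", 1, 50.0]])
wide = table_fits_as_text([f"c{i}" for i in range(19)], [])
print(narrow, wide)  # True False
```

With these assumed constants, up to 9 columns stay as text, so a 19-column `df.head` is rasterized.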
Tests: new render_quality_test.py (13 golden tests: real high DPI, wide table → image in PDF/PPTX, narrow → text, side_by_side two columns in PPTX / stacked in PDF, clickable index in PDF+PPTX, backward compatibility of the default layout). render_features_test updated for the new index. Suite: 188 passed (module) + 38 passed / 1 skipped (acceptance + pipeline).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
merge(eda): cover v2 (no quality-criteria row, LLM description, right-aligned summary) + global PDF zebra striping + large/underlined dataset name in PPTX
2026-07-01 01:34:21 +02:00 · 2026-06-30 22:53:46 +02:00 · 2026-06-30 22:44:33 +02:00 · 2026-06-30 22:07:21 +02:00 · 2026-06-30 22:07:15 +02:00 · 2026-06-30 21:37:16 +02:00
21 changed files with 2686 additions and 68 deletions
@@ -41,12 +41,13 @@ recognized degrades to `Note`, never raises).
 | `Heading(text, level=1)` | section title, `level` 1 (large) … 3 (small) | one or more bold lines; level 1 gets an accent underline |
 | `Markdown(text)` | lightweight markdown text | see the subset below; **never breaks mid-line** |
 | `KVTable(rows, title=None)` | `rows = [(key, value), ...]` | 2-column label/value table; the value wraps |
-| `DataTable(header, rows, title=None, note=None)` | `header=[...]`, `rows=[[...],...]` | table with a header; **split by rows, repeating the header**; long cells wrap within their column |
+| `DataTable(header, rows, title=None, note=None)` | `header=[...]`, `rows=[[...],...]` | table with a header; **if it fits** as text, it is split by rows repeating the header; **if it does NOT fit** (too many columns), it is rasterized whole as a high-resolution image the reader can zoom into. See §11.4 |
 | `Figure(fig=None, make=None, caption=None, height_in=None)` | an already-built `matplotlib.figure.Figure` (`fig`) or a callable `make()->Figure` (lazy) | rasterized and scaled to fit whole (never cropped) |
 | `Image(path, caption=None, height_in=None)` | path to a PNG/JPG | scaled to fit whole |
 | `Caption(text)` / `Note(text)` | small auxiliary text | grey caption/note; `Note` is also the fallback for anything unknown |
-| `Group(blocks, title=None)` | **keep-together** unit: its blocks stay together | the renderer measures the whole group and moves it intact to the next page/slide if it does not fit; it shrinks the figure to leave room for the title+text. See §11 |
+| `Group(blocks, title=None, page_break_before=False, layout="stack")` | **keep-together** unit: its blocks stay together | the renderer measures the whole group and moves it intact to the next page/slide if it does not fit; it shrinks the figure to leave room for the title+text. `layout="side_by_side"` places table+figure in two columns (PPTX only). See §11 and §11.4 |
 | `GlossaryEntry(key, label, definition)` | one glossary entry (clickable target) | generated by the `glosario` chapter; registers its position as the target of the marked terms. See §11 |
+| `TocEntry(label, target_id)` | one **clickable index** entry on the cover | generated by the `portada` chapter; the renderer wires it as a jump to the start of the chapter whose `id` or `title` matches `target_id`. See §11.4 |

 `Figure`/`Image` accept `height_in` (a hint): the renderer **clamps** the figure to that maximum height (`Group` uses it to shrink the figure). Every figure is scaled so that its caption fits on the same page/slide; in PPTX the caption is **always** visible (if no `caption` is given, it falls back to the last heading or to "Figura").

@@ -397,6 +398,65 @@ header with its own background. It is automatic in PDF and PPTX; the pattern is kept
 when a long table is split and repeats the header (the row index is logical, not per
 page). Nothing needs to be done in the chapters.

+### 11.4 Global render quality: high DPI, wide table → image, figure beside table, clickable index
+
+Four cross-cutting engine capabilities, **all automatic except `layout`** (which a
+chapter enables explicitly). They apply to both PDF and PPTX unless stated otherwise.
+
+**(a) High DPI (automatic).** Every embedded figure/image is rasterized at **220 dpi**
+(constant `_RASTER_DPI` in both renderers; in PDF it is also applied to the page
+`savefig`, because matplotlib re-rasterizes every `imshow` when writing the page). Goal:
+readers can zoom in on a phone and read detail (axes, cells) without pixelation. Text
+stays vector and selectable. Nothing needs to be done in the chapters.
+
+**(b) Wide table → high-resolution image (automatic).** When a `DataTable` has
+**too many columns to be legible as text** within the usable width (criterion
+`_table_fits_as_text`: minimum legible width per column × number of columns > usable width;
+in practice it triggers on `df.head`-style tables with many columns), the columns are not
+squeezed until unreadable; instead the table is drawn **whole as a high-resolution
+image** (function `render_table_as_figure_py_datascience`: shaded header + zebra striping)
+scaled to fit completely, so the reader can **zoom** in and read it without losing data.
+If the table **does fit**, it stays as selectable text (PDF) / a native table (PPTX).
+`KVTable`s (2 columns) always fit and stay as text. Nothing needs to be done in
+the chapters.
+
+**(c) Figure beside the table: `Group(layout="side_by_side")`.** A layout hint that a
+chapter enables so that its **table sits on the left and its figure on the right** of the
+same slide, instead of being stacked:
+
+```python
+model.Group(
+    layout="side_by_side",
+    blocks=[
+        model.Heading(text=str(name), level=2),       # full width, on top
+        model.DataTable(header=..., rows=...),         # LEFT column (~55%)
+        model.Figure(make=_grafico_perezoso(...)),     # RIGHT column (~45%)
+        model.Markdown(text="explanation…"),           # full width, below
+    ])
+```
+
+Exact contract of the field:
+
+| Field | Value | Effect |
+|---|---|---|
+| `layout` | `"stack"` (default) | historical behaviour: vertical stacking (keep-together). |
+| `layout` | `"side_by_side"` | **PPTX**: the table (rasterized to an image) takes the left column (~55% of the usable width) and the figure the right one (~45%); any other block (heading, markdown) spans the full width above/below. If there is no table+figure pair, or they do not fit side by side on one slide, it **automatically falls back to stacking**. **PDF**: treated **exactly like `stack`** (the mobile A5 width cannot hold two legible columns). Unknown values degrade to `"stack"`. |
+
+It is **backward compatible**: a `Group` without `layout` (or with `layout="stack"`) behaves
+exactly as before. The `cat_distr` chapter is the intended consumer (chart to the right of
+the categories table in PPT); this engine only provides the support.
+
+**(d) Clickable index on the cover: `TocEntry`.** The cover emits a `Heading("Índice")`
+followed by one `TocEntry(label, target_id)` per chapter. The renderer records the
+start page/slide of **every** chapter (indexed by `id` **and** by `title`) and wires
+each `TocEntry` as a real jump to that start: in **PDF** via
+`add_pdf_internal_links_py_datascience` (PyMuPDF GOTO link), in **PPTX** via
+`pptx_link_run_to_slide_py_datascience` (native slide jump). Since the cover only knows
+the chapter **titles**, `target_id` is matched against the target's `title` (or its
+`id`). If a target does not resolve, the entry is still shown as text
+(in the link colour) and is never cut. It is the same mechanism as the clickable glossary
+terms (§11.1), reused in the cover → chapter direction.
+
 ---

 ## 10. Future integration with `profile_table` (next phase)
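Section 11.4(d) indexes each chapter start by both `id` and `title` and resolves every `TocEntry` against that index. Below is a minimal sketch of the resolution step, built on these assumptions:

- a hypothetical `ChapterStart` record;
- 0-based page numbers;
- link sources shaped like the renderer's `toc_sources` entries (`target_id`, `page`, `rect`).

The real link writing happens in `add_pdf_internal_links_py_datascience`, which this diff does not show. How the renderer handles two chapters with the same title is also not shown; this sketch simply keeps the first one.

```python
from dataclasses import dataclass


@dataclass
class ChapterStart:
    """Hypothetical record of where a chapter begins (0-based page index)."""
    chapter_id: str
    title: str
    page: int


def build_start_index(starts):
    """Map every chapter id AND title to the page where that chapter starts."""
    index = {}
    for start in starts:
        for key in (start.chapter_id, start.title):
            # This sketch keeps the first chapter registered under a key.
            if key and key not in index:
                index[key] = start.page
    return index


def resolve_toc(toc_sources, index):
    """Return (source_page, rect, target_page) for each resolvable TOC entry.

    Unresolved targets produce no link; the entry was already drawn as text,
    so it simply stays non-clickable.
    """
    links = []
    for src in toc_sources:
        target_page = index.get(src["target_id"])
        if target_page is not None:
            links.append((src["page"], tuple(src["rect"]), target_page))
    return links


starts = [ChapterStart("portada", "Portada", 0),
          ChapterStart("dist", "Distribuciones", 2)]  # hypothetical chapters
sources = [
    {"target_id": "Distribuciones", "page": 0, "rect": [40, 300, 200, 314]},
    {"target_id": "Missing chapter", "page": 0, "rect": [40, 316, 200, 330]},
]
print(resolve_toc(sources, build_start_index(starts)))
# [(0, (40, 300, 200, 314), 2)]
```

In PyMuPDF, each resolved tuple could become a `Page.insert_link` call with `kind=fitz.LINK_GOTO`. Whether the project's helper does exactly that is an assumption.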
@@ -77,8 +77,12 @@ from .add_pdf_internal_links import add_pdf_internal_links
 from .suggest_intratable_fk_candidates import suggest_intratable_fk_candidates
 from .render_paper_pdf import render_paper_pdf
 from .draw_join_graph_figure import draw_join_graph_figure
+from .generate_synthetic_eda_table import generate_synthetic_eda_table
+from .generate_synthetic_eda_folder import generate_synthetic_eda_folder

 __all__ = [
+    "generate_synthetic_eda_table",
+    "generate_synthetic_eda_folder",
    "render_paper_pdf",
    "draw_join_graph_figure",
    "suggest_intratable_fk_candidates",
@@ -29,6 +29,7 @@ from .model import (  # noqa: F401
    KVTable,
    Markdown,
    Note,
+    TocEntry,
    as_blocks,
    as_chapters,
    merge_manifest,
@@ -52,6 +53,7 @@ __all__ = [
    "Group",
    "GlossaryEntry",
    "GlossaryCollector",
+    "TocEntry",
    "Chapter",
    "as_blocks",
    "as_chapters",
@@ -26,7 +26,7 @@ from datetime import datetime, timezone

 from .. import model

-CHAPTER_VERSION = "1.2.0"
+CHAPTER_VERSION = "1.4.0"
 CHAPTER_ID = "portada"
 CHAPTER_TITLE = "Portada"

@@ -35,12 +35,9 @@ CHAPTER_TITLE = "Portada"
 # row represents) from it when the LLM layer ran (``run_llm``).
 _LLM_KEY = "llm"

-# Default human description of what the table quality score measures. Chapters
-# can override it via ctx["quality_criteria"].
-_DEFAULT_QUALITY_CRITERIA = (
-    "media de los scores por columna (0–100): completitud (sin nulos/vacíos), "
-    "validez (tipo y rango coherentes) y consistencia (sin duplicados/constantes)."
-)
+# Font size (pt) for the dataset name on the PPTX cover slide — notably larger
+# than the default H1 so the dataset name stands out (shown underlined too).
+_PPTX_TITLE_PT = 44.0


 def _storage_from_source(source: str) -> str:
@@ -120,11 +117,20 @@ def _summary_blocks(summary) -> list:

    blocks = [model.Heading(text="Resumen del análisis", level=2)]
    if rows:
-        blocks.append(model.KVTable(rows=rows))
+        # Values pinned to the right margin (numbers flush right, label left).
+        blocks.append(model.KVTable(rows=rows, value_align="right"))
    if titles:
-        bullets = "\n".join(f"- {model._safe_str(t)}" for t in titles)
-        blocks.append(model.Markdown(
-            text="Este informe incluye los siguientes capítulos:\n" + bullets))
+        # Clickable index ("Índice"): one TocEntry per chapter title. Each entry
+        # becomes a real jump to that chapter's first page/slide once the document
+        # is laid out (the renderers register every chapter start and wire the
+        # links; ``target_id`` is matched against the chapter title). The cover only
+        # knows chapter titles, so the title doubles as the link target.
+        blocks.append(model.Heading(text="Índice", level=2))
+        for t in titles:
+            label = model._safe_str(t)
+            if not label:
+                continue
+            blocks.append(model.TocEntry(label=label, target_id=label))
    return blocks


@@ -213,9 +219,7 @@ def _derive_description(profile: dict, ctx: dict) -> str:
    score = profile.get("quality_score")
    if score is not None:
        parts.append(f"Calidad media estimada: {score}/100.")
-    parts.append(
-        "Resumen derivado del perfil; active la interpretación LLM (`run_llm`) "
-        "para una descripción de negocio más rica.")
+    parts.append("Resumen derivado del perfil.")
    return " ".join(parts)


@@ -259,7 +263,6 @@ def build_portada(profile: dict, ctx: dict):
    shape = f"{_fmt_int(n_rows)} filas × {_fmt_int(n_cols)} columnas"

    score = profile.get("quality_score")
-    quality_criteria = ctx.get("quality_criteria") or _DEFAULT_QUALITY_CRITERIA
    quality_value = "—" if score is None else f"{score} / 100"

    llm = _llm_block(profile, ctx)
@@ -282,8 +285,11 @@ def build_portada(profile: dict, ctx: dict):

    # Title + dataset size shown together and BIG (Heading) at the top, kept on
    # the same page (Group). The size is no longer buried in the metadata table.
+    # The dataset name is shown big and underlined on the PPTX cover slide
+    # (size_pt/underline are honoured by the PPTX renderer; the PDF ignores them).
    cover = [
-        model.Heading(text=str(dataset_name), level=1),
+        model.Heading(text=str(dataset_name), level=1, underline=True,
+                      size_pt=_PPTX_TITLE_PT),
        model.Markdown(text="**Automatic-EDA** · informe exploratorio automático"),
        model.Heading(text=shape, level=2),
    ]
@@ -295,7 +301,6 @@ def build_portada(profile: dict, ctx: dict):
            ("Almacenamiento", storage),
            ("Generado", when),
            ("Calidad", quality_value),
-            ("Criterios de calidad", quality_criteria),
        ]),
        model.Heading(text="Descripción", level=2),
        model.Markdown(text=str(description)),
@@ -38,10 +38,18 @@ ENGINE_NAME = "AutomaticEDA"
 # --------------------------------------------------------------------------- #
@dataclass
 class Heading:
-    """A section heading. ``level`` 1 (largest) .. 3 (smallest)."""
+    """A section heading. ``level`` 1 (largest) .. 3 (smallest).
+
+    ``underline`` and ``size_pt`` are optional emphasis hints honoured by the
+    PPTX renderer (the cover uses them to show the dataset name big and
+    underlined). ``size_pt`` overrides the per-level font size when set; the PDF
+    renderer ignores both so its layout is unchanged.
+    """

    text: str = ""
    level: int = 1
+    underline: bool = False
+    size_pt: Optional[float] = None
    kind: str = field(default="heading", init=False)


@@ -62,10 +70,17 @@ class Markdown:

@dataclass
 class KVTable:
-    """A two-column key/value table. ``rows`` is a list of ``(label, value)``."""
+    """A two-column key/value table. ``rows`` is a list of ``(label, value)``.
+
+    ``value_align`` controls the horizontal alignment of the value column in the
+    PDF renderer: ``"left"`` (default) keeps values next to the label column;
+    ``"right"`` pins them to the right margin (used by the cover's analysis
+    summary so the numbers line up flush right).
+    """

    rows: list = field(default_factory=list)
    title: Optional[str] = None
+    value_align: str = "left"
    kind: str = field(default="kv_table", init=False)


@@ -145,11 +160,21 @@ class Group:
    a chapter can give each unit its own page — e.g. one categorical column per
    page (see CAT DISTR). It is purely additive: the default False keeps the plain
    keep-together behaviour for every existing chapter.
+
+    ``layout`` is a hint for how the group's children are arranged:
+    ``"stack"`` (default) keeps the historical top-to-bottom flow; ``"side_by_side"``
+    asks the PPTX renderer to place the group's table to the LEFT and its figure to
+    the RIGHT of the same slide (table ~55% width, figure ~45%), measuring so both
+    fit and falling back to stacking when they do not. The PDF renderer treats
+    ``"side_by_side"`` exactly like ``"stack"`` (the A5 mobile page is too narrow for
+    two readable columns). Unknown values degrade to ``"stack"``. Purely additive:
+    the default keeps every existing chapter unchanged.
    """

    blocks: list = field(default_factory=list)
    title: Optional[str] = None
    page_break_before: bool = False
+    layout: str = "stack"
    kind: str = field(default="group", init=False)


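The `layout` docstring above describes a decision with two fallbacks. Below is a minimal sketch of that decision, built on these assumptions:

- a hypothetical `plan_group_layout` helper;
- block kinds named `"data_table"` and `"figure"`;
- a 0.2 in gutter between the columns;
- heights measured beforehand.

The PPTX renderer's actual measuring code is not part of this diff.

```python
from dataclasses import dataclass
from typing import Optional

LEFT_SHARE = 0.55  # table column share (~55% per the docs); figure gets the rest
GUTTER_IN = 0.2    # assumed gap between the two columns, in inches


@dataclass
class Placement:
    mode: str                        # "stack" or "side_by_side"
    table_w: Optional[float] = None  # column widths, only set for side_by_side
    figure_w: Optional[float] = None


def plan_group_layout(layout, kinds, usable_w, table_h, figure_h, avail_h):
    """Decide how a Group's table and figure share a slide (hypothetical).

    ``kinds`` lists the child block kinds; ``table_h``/``figure_h`` are the
    heights each needs at its column width; ``avail_h`` is the slide height
    left after the full-width blocks (headings, markdown).
    """
    if layout != "side_by_side":
        return Placement("stack")  # "stack" and unknown values
    if "data_table" not in kinds or "figure" not in kinds:
        return Placement("stack")  # no table+figure pair
    if max(table_h, figure_h) > avail_h:
        return Placement("stack")  # the pair would not fit side by side
    table_w = (usable_w - GUTTER_IN) * LEFT_SHARE
    figure_w = usable_w - GUTTER_IN - table_w
    return Placement("side_by_side", table_w, figure_w)


print(plan_group_layout("side_by_side",
                        ["heading", "data_table", "figure", "markdown"],
                        usable_w=9.0, table_h=3.0, figure_h=2.5, avail_h=4.0))
```

The sketch checks every fallback before computing widths. That way, unknown `layout` values and incomplete groups stay on the historical stacked path.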
@@ -168,6 +193,22 @@ class GlossaryEntry:
    kind: str = field(default="glossary_entry", init=False)


+@dataclass
+class TocEntry:
+    """One clickable index (table-of-contents) entry shown on the cover.
+
+    Rendered as a single line — the chapter ``label`` in the accent link colour —
+    that, once the document is laid out, becomes a real click jumping to the first
+    page/slide of the target chapter (PDF link annotation via PyMuPDF; PPTX native
+    slide jump). ``target_id`` is matched against each chapter's ``id`` *and* its
+    ``title`` (the cover only knows chapter titles), so either resolves. If the
+    target cannot be resolved the entry still renders as plain text (never cut)."""
+
+    label: str = ""
+    target_id: str = ""
+    kind: str = field(default="toc_entry", init=False)
+
+
@dataclass
 class Chapter:
    """An ordered set of blocks with an id, a title and a generation version."""
@@ -192,13 +233,14 @@ _BLOCK_BY_KIND = {
    "note": Note,
    "group": Group,
    "glossary_entry": GlossaryEntry,
+    "toc_entry": TocEntry,
 }


 def as_block(obj: Any):
    """Coerce a value into a block dataclass. Unknown values become a Note."""
    if isinstance(obj, (Heading, Markdown, KVTable, DataTable, Figure, Image,
-                        Caption, Note, Group, GlossaryEntry)):
+                        Caption, Note, Group, GlossaryEntry, TocEntry)):
        if isinstance(obj, Group):
            obj.blocks = as_blocks(obj.blocks)
        return obj
@@ -210,13 +252,20 @@ def as_block(obj: Any):
        # Build only with fields the dataclass accepts (ignore extras).
        try:
            if cls is Heading:
+                size_pt = obj.get("size_pt")
                return Heading(text=_safe_str(obj.get("text")),
-                               level=int(obj.get("level", 1) or 1))
+                               level=int(obj.get("level", 1) or 1),
+                               underline=bool(obj.get("underline", False)),
+                               size_pt=(float(size_pt)
+                                        if isinstance(size_pt, (int, float))
+                                        else None))
            if cls is Markdown:
                return Markdown(text=_safe_str(obj.get("text")))
            if cls is KVTable:
                return KVTable(rows=list(obj.get("rows") or []),
-                               title=obj.get("title"))
+                               title=obj.get("title"),
+                               value_align=_safe_str(
+                                   obj.get("value_align")) or "left")
            if cls is DataTable:
                return DataTable(header=list(obj.get("header") or []),
                                 rows=list(obj.get("rows") or []),
@@ -237,11 +286,15 @@ def as_block(obj: Any):
                return Group(blocks=as_blocks(obj.get("blocks")),
                             title=obj.get("title"),
                             page_break_before=bool(
-                                 obj.get("page_break_before", False)))
+                                 obj.get("page_break_before", False)),
+                             layout=_safe_str(obj.get("layout")) or "stack")
            if cls is GlossaryEntry:
                return GlossaryEntry(key=_safe_str(obj.get("key")),
                                     label=_safe_str(obj.get("label")),
                                     definition=_safe_str(obj.get("definition")))
+            if cls is TocEntry:
+                return TocEntry(label=_safe_str(obj.get("label")),
+                                target_id=_safe_str(obj.get("target_id")))
        except Exception:  # noqa: BLE001 — never raise on a malformed block.
            return Note(text=_safe_str(obj))
    return Note(text=_safe_str(obj))
@@ -298,11 +298,16 @@ def test_cover_first_glossary_last_with_summary():
    headings = [b.text for b in cover.blocks if b.kind == "heading"]
     assert any("Resumen" in h for h in headings), \
         "the cover does not include the aggregate summary"
-    # The summary reflects the body chapters (e.g. the numeric/categorical ones).
-    cover_text = " ".join(
-        b.text for b in cover.blocks if getattr(b, "kind", "") == "markdown")
-    assert "Distribuciones" in cover_text, \
-        "the cover summary does not mention the body chapters"
+    # The index ("Índice") is now a clickable list of TocEntry blocks (one per
+    # body chapter), not a markdown bullet list. Verify both the heading and that
+    # the entries name the body chapters.
+    assert any("Índice" in h for h in headings), \
+        "the cover does not include the Índice section"
+    toc_labels = " ".join(
+        getattr(b, "label", "") for b in cover.blocks
+        if getattr(b, "kind", "") == "toc_entry")
+    assert "Distribuciones" in toc_labels, \
+        "the cover index does not mention the body chapters"


 # --------------------------------------------------------------------------- #
@@ -46,11 +46,23 @@ _MUTED = "#8a8a8a"
 _RULE = "#cccccc"
 _HEAD_BG = "#eef3f6"

+# Rasterization DPI for every embedded raster (figure/table image) AND for the
+# page save itself. Raised from the old 150/default-100 to 220 so a reader can
+# pinch-zoom on a phone and still see crisp detail (axis labels, table cells)
+# without pixelation. Text stays vectorial (pdf.fonttype=42) so it remains
+# selectable regardless of DPI — only the embedded images gain resolution. 220 is
+# a deliberate balance: noticeably sharper than 150 while keeping the file size
+# reasonable. ``savefig.dpi`` matters because matplotlib re-rasterizes each
+# ``imshow`` when PdfPages writes the page; without it the final image would land
+# at ~100 dpi no matter how sharp the intermediate PNG was.
+_RASTER_DPI = 220
+
 _RC = {
    "font.size": 10,
    "font.family": "sans-serif",
    "figure.facecolor": "white",
    "savefig.facecolor": "white",
+    "savefig.dpi": _RASTER_DPI,
    "pdf.fonttype": 42,  # embed TrueType — text stays selectable on mobile.
 }

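The pixel sizes quoted in the commit message (~492 px before, ~1081 px after) follow from pixels = inches × dpi. Working backwards from them, both correspond to an image roughly 4.9 in wide. This is an inference, not a value stated in the diff. It means the gain comes purely from resolution. `raster_px` is a hypothetical helper:

```python
def raster_px(width_in: float, dpi: float) -> int:
    """Pixel size of a raster spanning ``width_in`` inches at ``dpi``."""
    return round(width_in * dpi)


# Physical width implied by the sizes measured in the commit message.
old_width_in = 492 / 100    # page re-rasterized at matplotlib's default 100 dpi
new_width_in = 1081 / 220   # page saved with savefig.dpi = 220
print(f"{old_width_in:.2f} in vs {new_width_in:.2f} in")  # 4.92 in vs 4.91 in

# Same physical width, 2.2x the pixels along each axis.
print(raster_px(4.9, 100), raster_px(4.9, 220))  # 490 1078
```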
@@ -80,6 +92,10 @@ class _PdfState:
        # points (1/72") with a top-left origin — same convention as PyMuPDF.
        self.term_sources = []       # [{key, page, rect:[x0,y0,x1,y1]}]
        self.term_dests = {}         # key -> {page, point:[x,y]}
+        # Clickable index (cover → chapter). Sources are the cover's TocEntry
+        # rects; chapter_starts maps a chapter id AND its title to its first page.
+        self.toc_sources = []        # [{target_id, page, rect:[x0,y0,x1,y1]}]
+        self.chapter_starts = {}     # id|title -> {page, point:[x,y]}


 # --------------------------------------------------------------------------- #
@@ -317,10 +333,18 @@ def _place_kv_table(st: _PdfState, block) -> None:
    if title:
        _place_heading(st, model.Heading(title, level=2))
    rows = getattr(block, "rows", []) or []
+    # ``value_align="right"`` pins the value column to the right margin (label
+    # left, number flush right) — used by the cover's analysis summary.
+    right = str(getattr(block, "value_align", "left")).lower() == "right"
    key_w = 1.9  # inches reserved for the label column.
+    # Both alignments wrap against the same width (usable width minus the label
+    # column and a small gap), so right-aligned values never overlap the label.
    val_chars = tl.chars_per_line(_USABLE_W - key_w - 0.1, _FS_BODY)
    lh = tl.line_height_in(_FS_BODY)
-    for row in rows:
+    # ``data_idx`` is the 0-based logical row index: even rows (1-based) are
+    # zebra-shaded → 0-based odd indices, matching the data-table convention so
+    # every table in the document carries the same striping.
+    for data_idx, row in enumerate(rows):
        try:
            label, value = row[0], row[1]
        except Exception:  # noqa: BLE001
@@ -329,11 +353,25 @@ def _place_kv_table(st: _PdfState, block) -> None:
        row_h = lh * len(v_lines) + _ROW_VPAD
        _ensure_space(st, row_h)
        y0 = st.y
+        # Faint zebra fill for even rows, drawn first (zorder 0) so striping
+        # never hides the text/value drawn on top.
+        if data_idx % 2 == 1:
+            st.fig.add_artist(Rectangle(
+                (_xf(_ML), _yf(y0 + row_h)), _xf(_ML + _USABLE_W) - _xf(_ML),
+                _yf(y0) - _yf(y0 + row_h), transform=st.fig.transFigure,
+                color=_ZEBRA, lw=0, zorder=0))
        st.fig.text(_xf(_ML), _yf(y0), tl.strip_inline_md(model._safe_str(label)),
-                    fontsize=_FS_BODY, color=_MUTED, ha="left", va="top")
+                    fontsize=_FS_BODY, color=_MUTED, ha="left", va="top",
+                    zorder=2)
        for k, vl in enumerate(v_lines):
-            st.fig.text(_xf(_ML + key_w), _yf(y0 + k * lh), vl,
-                        fontsize=_FS_BODY, color=_INK, ha="left", va="top")
+            if right:
+                st.fig.text(_xf(_ML + _USABLE_W), _yf(y0 + k * lh), vl,
+                            fontsize=_FS_BODY, color=_INK, ha="right",
+                            va="top", zorder=2)
+            else:
+                st.fig.text(_xf(_ML + key_w), _yf(y0 + k * lh), vl,
+                            fontsize=_FS_BODY, color=_INK, ha="left",
+                            va="top", zorder=2)
        st.y = y0 + row_h
    st.y += _GAP

@@ -363,6 +401,57 @@ def _col_widths(header: list, rows: list, fs: float) -> list:
    return widths


+# Minimal legible characters reserved per column when deciding whether a table
+# can be shown as selectable text. Below this width per column the cells become
+# unreadable, so the table is rasterized to a zoomable high-res image instead.
+_MIN_LEGIBLE_CHARS = 8
+
+
+def _table_fits_as_text(header: list, rows: list) -> bool:
+    """True when the table fits the usable width as readable text.
+
+    A table whose columns cannot each get a minimal legible width within the A5
+    usable width (typically many columns, e.g. a 19-column ``df.head``) is flagged
+    so it is rendered as a single high-resolution image — the reader zooms in on
+    the phone and reads every cell, nothing cut — instead of being squeezed until
+    unreadable. Narrow tables (few columns) keep the selectable-text rendering."""
+    header = header or []
+    rows = rows or []
+    ncol = len(header) if header else (len(rows[0]) if rows else 1)
+    ncol = max(1, ncol)
+    cw = tl.avg_char_width_in(_FS_CELL)
+    min_needed = ncol * (_MIN_LEGIBLE_CHARS * cw + _CELL_PAD * 2)
+    return min_needed <= _USABLE_W
+
+
+def _table_figure_block(block):
+    """Wrap a too-wide table as a lazily-rasterized Figure (cached on the block).
+
+    The table is drawn once via ``render_table_as_figure`` (header shading + zebra)
+    and embedded as one high-res image scaled to fit entirely. The same Figure is
+    reused for measuring and placing so keep-together stays consistent. The table
+    title/note are drawn inside the image (self-describing when zoomed/shared), so
+    the block-level caption is left empty to avoid a duplicate title."""
+    cached = getattr(block, "_aeda_tablefig", None)
+    if cached is not None:
+        return cached
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    title = getattr(block, "title", None)
+    note = getattr(block, "note", None)
+
+    def _make():
+        from datascience.render_table_as_figure import render_table_as_figure
+        return render_table_as_figure(header, rows, title=title, note=note)
+
+    fig = model.Figure(make=_make, caption=None)
+    try:
+        block._aeda_tablefig = fig
+    except Exception:  # noqa: BLE001 — block may reject attributes; degrade.
+        pass
+    return fig
+
+
 def _wrap_row(cells: list, widths: list, fs: float) -> list:
    """Wrap each cell to its column width → list of line-lists per cell."""
    out = []
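`_table_figure_block` memoizes the wrapping Figure on the block itself. As a result, `_measure_data_table` and `_place_data_table` see the same object, and the lazy `make` lets the table be drawn once.

Below is a minimal sketch of that caching pattern. It uses hypothetical stand-ins (`LazyFigure`, `Table`, `table_figure_for`, a fake renderer) instead of the model classes. This diff does not show whether the real `model.Figure` also caches the result of `make()`, so the stand-in's caching is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class LazyFigure:
    """Stand-in for model.Figure(make=...): builds its payload on first use."""
    make: Callable[[], Any]
    _built: Optional[Any] = field(default=None, repr=False)

    def resolve(self):
        if self._built is None:
            self._built = self.make()
        return self._built


@dataclass
class Table:
    """Stand-in for a DataTable block."""
    header: list
    rows: list


def table_figure_for(block, render):
    """Return the single LazyFigure for ``block``, cached on the block itself."""
    cached = getattr(block, "_tablefig", None)
    if cached is not None:
        return cached
    header, rows = list(block.header), list(block.rows)
    fig = LazyFigure(make=lambda: render(header, rows))
    try:
        block._tablefig = fig
    except AttributeError:  # e.g. a frozen or slotted block: stay uncached
        pass
    return fig


calls = []


def fake_render(header, rows):
    calls.append(len(header))
    return f"table image {len(header)}x{len(rows)}"


table = Table(header=[f"c{i}" for i in range(19)], rows=[[0] * 19])
measured = table_figure_for(table, fake_render)   # used when measuring
placed = table_figure_for(table, fake_render)     # used when placing
print(measured is placed, placed.resolve(), len(calls))  # True table image 19x1 1
```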
@@ -402,11 +491,16 @@ def _draw_table_row(st: _PdfState, cells_lines: list, widths: list, fs: float,


 def _place_data_table(st: _PdfState, block) -> None:
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    # Too many columns to be legible as text → render the whole table as one
+    # high-res image, scaled to fit entirely (the reader zooms to read it).
+    if not _table_fits_as_text(header, rows):
+        _place_figure(st, _table_figure_block(block))
+        return
    title = getattr(block, "title", None)
    if title:
        _place_heading(st, model.Heading(title, level=2))
-    header = list(getattr(block, "header", []) or [])
-    rows = list(getattr(block, "rows", []) or [])
    fs = _FS_CELL
    widths = _col_widths(header, rows, fs)
    header_lines = _wrap_row(header, widths, fs) if header else None
@@ -464,8 +558,11 @@ def _resolve_figure(block):


 def _png_from_figure(fig) -> bytes:
+    # ``bbox_inches='tight'`` is kept so the real aspect ratio is what we measure
+    # and place. The page save (savefig.dpi in _RC) re-rasterizes this at the same
+    # high DPI, so the embedded image stays crisp for phone zoom.
    buf = io.BytesIO()
-    fig.savefig(buf, format="png", dpi=150, bbox_inches="tight")
+    fig.savefig(buf, format="png", dpi=_RASTER_DPI, bbox_inches="tight")
    buf.seek(0)
    return buf.read()

@@ -707,12 +804,16 @@ def _measure_data_table(block) -> float:
    Counts the optional title heading, the wrapped header row, every wrapped data
    row (per-column wrap via the same ``_col_widths``/``_wrap_row`` the placer
    uses) and the optional note. Keep this in sync with ``_place_data_table``."""
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    # Mirror the placer: a too-wide table is drawn as a single image, so its
+    # keep-together height is the image's, not the (squeezed) text layout's.
+    if not _table_fits_as_text(header, rows):
+        return _measure_figure_like(_table_figure_block(block))
    h = 0.0
    title = getattr(block, "title", None)
    if title:
        h += _measure_heading_text(title, 2)
-    header = list(getattr(block, "header", []) or [])
-    rows = list(getattr(block, "rows", []) or [])
    fs = _FS_CELL
    widths = _col_widths(header, rows, fs)
    lh = tl.line_height_in(fs)
@@ -744,6 +845,10 @@ def _measure_block(st: _PdfState, block) -> float:
            lines = tl.wrap(getattr(block, "text", ""),
                            tl.chars_per_line(_USABLE_W, _FS_NOTE))
            return tl.line_height_in(_FS_NOTE) * len(lines) + _GAP
+        if kind == "toc_entry":
+            lines = tl.wrap(tl.strip_inline_md(getattr(block, "label", "")),
+                            tl.chars_per_line(_USABLE_W - 0.22, _FS_BODY)) or [""]
+            return tl.line_height_in(_FS_BODY) * len(lines) + _GAP * 0.4
        if kind == "kv_table":
            return _measure_kv_table(block)
        if kind == "data_table":
@@ -828,6 +933,38 @@ def _place_glossary_entry(st: _PdfState, block) -> None:
    st.y += _GAP * 0.5


+def _place_toc_entry(st: _PdfState, block) -> None:
+    """Render one clickable index line and record it as a link source.
+
+    Drawn as a bulleted line in the accent link colour; its rectangle is recorded
+    in ``st.toc_sources`` so the post-processor turns it into a real jump to the
+    target chapter's first page. If the target is never resolved the line still
+    shows as plain (accent) text — never cut, never broken."""
+    label = tl.strip_inline_md(getattr(block, "label", "")) or ""
+    target_id = getattr(block, "target_id", "") or ""
+    fs = _FS_BODY
+    lh = tl.line_height_in(fs)
+    bullet = "•  "
+    indent = 0.22
+    max_chars = tl.chars_per_line(_USABLE_W - indent, fs)
+    lines = tl.wrap(label, max_chars) or [""]
+    for idx, ln in enumerate(lines):
+        _ensure_space(st, lh)
+        x = _ML
+        st.fig.text(_xf(x), _yf(st.y), bullet if idx == 0 else "   ",
+                    fontsize=fs, color=_LINK, ha="left", va="top")
+        x += indent
+        w = _text_width_in(st, ln, fs, False)
+        st.fig.text(_xf(x), _yf(st.y), ln, fontsize=fs, color=_LINK,
+                    ha="left", va="top")
+        if target_id and idx == 0:
+            st.toc_sources.append({
+                "target_id": target_id, "page": st.page - 1,
+                "rect": _pt_rect(_ML, st.y, x + w, st.y + lh)})
+        st.y += lh
+    st.y += _GAP * 0.4
+
+
 _PLACERS = {
    "heading": _place_heading,
    "markdown": _place_markdown,
@@ -839,6 +976,7 @@ _PLACERS = {
    "note": _place_note,
    "group": _place_group,
    "glossary_entry": _place_glossary_entry,
+    "toc_entry": _place_toc_entry,
 }


@@ -870,6 +1008,15 @@ def render_pdf(chapters: list, out_path: str, meta: dict = None) -> dict:
                    st.chapter = ch
                    st.chapter_pages = 0
                    _new_page(st)  # each chapter starts on a fresh page.
+                    # Record this chapter's first page as a link target for the
+                    # cover index (keyed by id AND title, since the cover only
+                    # knows titles). Point is the top of the content area.
+                    _start = {"page": st.page - 1,
+                              "point": [_ML * 72.0, _CONTENT_TOP * 72.0]}
+                    if ch.id:
+                        st.chapter_starts[ch.id] = _start
+                    if getattr(ch, "title", ""):
+                        st.chapter_starts.setdefault(ch.title, _start)
                    for block in ch.blocks:
                        placer = _PLACERS.get(getattr(block, "kind", ""),
                                              _place_note)
@@ -902,7 +1049,7 @@ def render_pdf(chapters: list, out_path: str, meta: dict = None) -> dict:

    note = f"{n_pages} páginas"
    if n_links:
-        note += f" · {n_links} enlaces de glosario"
+        note += f" · {n_links} enlaces internos"
    if notes:
        note += " · " + "; ".join(notes)
    return {"path": out_path, "n_pages": n_pages, "chapters": chapters_meta,
@@ -910,9 +1057,11 @@ def render_pdf(chapters: list, out_path: str, meta: dict = None) -> dict:


 def _wire_glossary_links(st: _PdfState, out_path: str, notes: list) -> int:
-    """Build {source rect → glossary dest} links and apply them via PyMuPDF.
+    """Apply internal PDF links via PyMuPDF: glossary terms + the cover index.

-    Returns the number of links applied (0 if there is nothing to wire or the
+    Builds two sets of GOTO links — every in-text glossary term → its entry, and
+    every cover ``TocEntry`` → its chapter's first page — and applies them in one
+    pass. Returns the number of links applied (0 if there is nothing to wire or the
    post-processor is unavailable). Never raises."""
    try:
        links = []
@@ -923,6 +1072,14 @@ def _wire_glossary_links(st: _PdfState, out_path: str, notes: list) -> int:
            links.append({
                "src_page": src["page"], "src_rect": src["rect"],
                "dst_page": dest["page"], "dst_point": dest["point"]})
+        # Cover index → chapter first page (clickable, navigable table of contents).
+        for src in st.toc_sources:
+            dest = st.chapter_starts.get(src.get("target_id"))
+            if not dest:
+                continue
+            links.append({
+                "src_page": src["page"], "src_rect": src["rect"],
+                "dst_page": dest["page"], "dst_point": dest["point"]})
        if not links:
            return 0
        from datascience.add_pdf_internal_links import add_pdf_internal_links
@@ -930,7 +1087,7 @@ def _wire_glossary_links(st: _PdfState, out_path: str, notes: list) -> int:
        if isinstance(res, dict) and res.get("status") == "ok":
            return int(res.get("n_links") or 0)
        if isinstance(res, dict) and res.get("error"):
-            notes.append(f"glosario sin enlaces: {res.get('error')}")
+            notes.append(f"enlaces internos no aplicados: {res.get('error')}")
    except Exception as e:  # noqa: BLE001 — links are best-effort.
-        notes.append(f"glosario sin enlaces: {e}")
+        notes.append(f"enlaces internos no aplicados: {e}")
    return 0
@@ -51,6 +51,12 @@ _FS_H1, _FS_H2, _FS_H3 = 20, 16, 13
 _FS_BODY, _FS_CELL, _FS_NOTE = 14, 11, 11
 _GAP = 0.12

+# Rasterization DPI for every embedded figure/table image. Raised from 150 to 220
+# so a viewer can zoom into a slide (or a shared picture) and read crisp detail —
+# axis labels, table cells — without pixelation. Kept moderate so the deck size
+# stays reasonable. Same value as the PDF renderer.
+_RASTER_DPI = 220
+

 class _PptxState:
    def __init__(self, prs, title: str):
@@ -65,6 +71,10 @@ class _PptxState:
        # Glossary wiring (mejora 6): runs to link and per-term target slide.
        self.term_runs = []           # [(key, run)]
        self.term_anchor_slide = {}   # key -> Slide (glossary entry)
+        # Clickable index (cover → chapter). toc_runs are the cover's index runs;
+        # chapter_starts maps a chapter id AND its title to its first slide.
+        self.toc_runs = []            # [(target_id, run, src_slide)]
+        self.chapter_starts = {}      # id|title -> Slide (chapter first slide)


 def _rgb(c):
@@ -135,7 +145,7 @@ def _ensure(st: _PptxState, height: float) -> None:


 def _add_text(st: _PptxState, lines: list, fs: float, color, bold=False,
-              italic=False, indent=0.0, bullet=False) -> None:
+              italic=False, indent=0.0, bullet=False, underline=False) -> None:
    lh = tl.line_height_in(fs)
    height = lh * len(lines) + 0.05
    _ensure(st, height)
@@ -153,6 +163,7 @@ def _add_text(st: _PptxState, lines: list, fs: float, color, bold=False,
        run.font.size = Pt(fs)
        run.font.bold = bold
        run.font.italic = italic
+        run.font.underline = underline
        run.font.color.rgb = _rgb(color)
    st.y += height

@@ -206,10 +217,16 @@ def _add_rich_text(st: _PptxState, rich_lines: list, fs: float, color,
 def _place_heading(st: _PptxState, block) -> None:
    level = max(1, min(3, int(getattr(block, "level", 1) or 1)))
    fs = {1: _FS_H1, 2: _FS_H2, 3: _FS_H3}[level]
+    # Optional per-heading emphasis (cover dataset name): a larger font and an
+    # underline. ``size_pt`` overrides the per-level size when set.
+    size_override = getattr(block, "size_pt", None)
+    if isinstance(size_override, (int, float)) and size_override > 0:
+        fs = float(size_override)
+    underline = bool(getattr(block, "underline", False))
    text = tl.strip_inline_md(getattr(block, "text", ""))
    st.last_heading = text or st.last_heading
    lines = tl.wrap(text, tl.chars_per_line(_USABLE_W, fs))
-    _add_text(st, lines, fs, _INK, bold=True)
+    _add_text(st, lines, fs, _INK, bold=True, underline=underline)
    st.y += 0.04


@@ -302,6 +319,58 @@ def _col_widths(header, rows):
    return [_USABLE_W * w / total for w in clamped]


+# Minimal legible characters reserved per column when deciding whether a table
+# can be shown as a native (selectable) PowerPoint table. Below this width per
+# column the cells become unreadable, so the table is rasterized to a zoomable
+# high-res image instead. The 16:9 slide is wide, so more columns fit than on A5.
+_MIN_LEGIBLE_CHARS = 8
+_CELL_PAD = 0.05  # horizontal padding per cell side, in inches.
+
+
+def _table_fits_as_text(header: list, rows: list) -> bool:
+    """True when the table fits the usable slide width as a readable table.
+
+    A table whose columns cannot each get a minimal legible width within the slide
+    usable width (typically many columns, e.g. a 19-column ``df.head``) is flagged
+    so it is rendered as one high-resolution image — the viewer zooms in and reads
+    every cell — instead of being squeezed unreadable. Narrow tables keep the
+    native selectable table."""
+    header = header or []
+    rows = rows or []
+    ncol = len(header) if header else (len(rows[0]) if rows else 1)
+    ncol = max(1, ncol)
+    cw = tl.avg_char_width_in(_FS_CELL)
+    min_needed = ncol * (_MIN_LEGIBLE_CHARS * cw + _CELL_PAD * 2)
+    return min_needed <= _USABLE_W
+
+
+def _table_figure_block(block):
+    """Wrap a too-wide table as a lazily-rasterized Figure (cached on the block).
+
+    Drawn once via ``render_table_as_figure`` (header shading + zebra) and embedded
+    as one high-res image scaled to fit entirely. The title/note are drawn inside
+    the image (self-describing when zoomed/shared), so no separate caption is
+    emitted. Reused for measuring and placing so keep-together stays consistent."""
+    cached = getattr(block, "_aeda_tablefig", None)
+    if cached is not None:
+        return cached
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    title = getattr(block, "title", None)
+    note = getattr(block, "note", None)
+
+    def _make():
+        from datascience.render_table_as_figure import render_table_as_figure
+        return render_table_as_figure(header, rows, title=title, note=note)
+
+    fig = model.Figure(make=_make, caption=None)
+    try:
+        block._aeda_tablefig = fig
+    except Exception:  # noqa: BLE001 — block may reject attributes; degrade.
+        pass
+    return fig
+
+
 def _row_height_in(cells, widths, fs) -> float:
    lh = tl.line_height_in(fs)
    maxlines = 1
@@ -365,11 +434,27 @@ def _style_cell(cell, fs, color, bold, fill) -> None:

 def _place_data_table(st: _PptxState, block, shaded_header=True,
                      key_value=False) -> None:
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    # Too many columns to be legible as a native table → render the whole table as
+    # one high-res picture, scaled to fit entirely (the viewer zooms to read it).
+    # KVTables (rendered here as a 2-column Campo/Valor table) are excluded: they
+    # always fit in width and stay as a selectable table.
+    if not key_value and not _table_fits_as_text(header, rows):
+        figblock = _table_figure_block(block)
+        data, _asp = _figure_bytes_cached(figblock)
+        if data is None:
+            _add_text(st, ["(tabla no disponible)"], _FS_NOTE, _MUTED,
+                      italic=True)
+            st.y += _GAP
+            return
+        _place_picture_bytes(st, data, None,
+                             max_h_in=getattr(figblock, "height_in", None),
+                             force_caption=False)
+        return
    title = getattr(block, "title", None)
    if title:
        _place_heading(st, model.Heading(title, level=2))
-    header = list(getattr(block, "header", []) or [])
-    rows = list(getattr(block, "rows", []) or [])
    fs = _FS_CELL
    widths = _col_widths(header, rows)
    header_h = _row_height_in(header, widths, fs) if header else 0.0
@@ -429,7 +514,7 @@ def _resolve_png(block):
    try:
        import matplotlib.pyplot as plt
        buf = io.BytesIO()
-        f.savefig(buf, format="png", dpi=150, bbox_inches="tight")
+        f.savefig(buf, format="png", dpi=_RASTER_DPI, bbox_inches="tight")
        buf.seek(0)
        return buf.read()
    except Exception:  # noqa: BLE001
@@ -476,12 +561,15 @@ def _figure_bytes_cached(block):


 def _place_picture_bytes(st: _PptxState, data: bytes, caption,
-                         max_h_in=None) -> None:
+                         max_h_in=None, force_caption=True) -> None:
    # Mejora 4 — every figure on a slide carries a visible caption/title. If the
    # block has no caption, fall back to the current section heading, then to a
-    # generic label, so no image is ever shown untitled.
-    caption = (model._safe_str(caption).strip()
-               or model._safe_str(st.last_heading).strip() or "Figura")
+    # generic label, so no image is ever shown untitled. ``force_caption=False``
+    # suppresses that fallback (used for table images, whose title is inside the
+    # picture) so no redundant caption is drawn.
+    caption = model._safe_str(caption).strip()
+    if not caption and force_caption:
+        caption = model._safe_str(st.last_heading).strip() or "Figura"
    w_px, h_px = _img_size_px(data)
    aspect = (h_px / w_px) if w_px else 0.66
    # Reserve the caption's REAL (possibly multi-line) height FIRST, then scale
@@ -489,9 +577,11 @@ def _place_picture_bytes(st: _PptxState, data: bytes, caption,
    # so its caption always fits on the SAME slide and no image is untitled.
    # cap_real = what _add_text consumes; cap_reserve adds the post-image gap and
    # a small cushion so the caption never spills to the next slide.
-    cap_lines = tl.wrap(caption, tl.chars_per_line(_USABLE_W, _FS_NOTE))
-    cap_real = tl.line_height_in(_FS_NOTE) * len(cap_lines) + 0.05
-    cap_reserve = cap_real + 0.05 + 0.10
+    cap_lines = tl.wrap(caption, tl.chars_per_line(_USABLE_W, _FS_NOTE)) \
+        if caption else []
+    cap_real = (tl.line_height_in(_FS_NOTE) * len(cap_lines) + 0.05) \
+        if cap_lines else 0.0
+    cap_reserve = (cap_real + 0.05 + 0.10) if cap_lines else 0.05
    max_h = _CONTENT_BOTTOM - _CONTENT_TOP
    # height_in hint (model.Figure/Image): cap the target height so a figure in a
    # keep-together Group shrinks to leave room for its heading and text.
@@ -510,7 +600,8 @@ def _place_picture_bytes(st: _PptxState, data: bytes, caption,
    st.slide.shapes.add_picture(io.BytesIO(data), Inches(left), Inches(st.y),
                                width=Inches(target_w), height=Inches(target_h))
    st.y += target_h + 0.05
-    _add_text(st, cap_lines, _FS_NOTE, _MUTED, italic=True)
+    if cap_lines:
+        _add_text(st, cap_lines, _FS_NOTE, _MUTED, italic=True)
    st.y += _GAP


@@ -552,9 +643,11 @@ def _place_note(st: _PptxState, block) -> None:
 # WITHOUT drawing it so a Group can move whole to the next slide before drawing.
 # Over-estimating only triggers an earlier slide break, never a content cut.
 # --------------------------------------------------------------------------- #
-def _measure_heading_text(text: str, level: int) -> float:
+def _measure_heading_text(text: str, level: int, size_pt=None) -> float:
    level = max(1, min(3, int(level or 1)))
    fs = {1: _FS_H1, 2: _FS_H2, 3: _FS_H3}[level]
+    if isinstance(size_pt, (int, float)) and size_pt > 0:
+        fs = float(size_pt)
    lines = tl.wrap(tl.strip_inline_md(text), tl.chars_per_line(_USABLE_W, fs))
    return tl.line_height_in(fs) * len(lines) + 0.05 + 0.04

@@ -654,12 +747,16 @@ def _measure_kv_table(block) -> float:
 def _measure_data_table(block) -> float:
    """Faithful DataTable height — matches ``_place_data_table`` (title heading +
    wrapped header + every wrapped row + optional note). Keep in sync."""
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    # Mirror the placer: a too-wide table is drawn as one image, so its
+    # keep-together height is the image's, not the (squeezed) table layout's.
+    if not _table_fits_as_text(header, rows):
+        return _measure_figure_like(_table_figure_block(block))
    h = 0.0
    title = getattr(block, "title", None)
    if title:
        h += _measure_heading_text(title, 2)
-    header = list(getattr(block, "header", []) or [])
-    rows = list(getattr(block, "rows", []) or [])
    fs = _FS_CELL
    widths = _col_widths(header, rows)
    if header:
@@ -679,7 +776,8 @@ def _measure_block(st: _PptxState, block) -> float:
    try:
        if kind == "heading":
            return _measure_heading_text(getattr(block, "text", ""),
-                                         getattr(block, "level", 1))
+                                         getattr(block, "level", 1),
+                                         size_pt=getattr(block, "size_pt", None))
        if kind == "markdown":
            return _measure_markdown(block)
        if kind in ("figure", "image"):
@@ -688,6 +786,10 @@ def _measure_block(st: _PptxState, block) -> float:
            lines = tl.wrap(getattr(block, "text", ""),
                            tl.chars_per_line(_USABLE_W, _FS_NOTE))
            return tl.line_height_in(_FS_NOTE) * len(lines) + 0.05 + _GAP
+        if kind == "toc_entry":
+            lines = tl.wrap(tl.strip_inline_md(getattr(block, "label", "")),
+                            tl.chars_per_line(_USABLE_W - 0.3, _FS_BODY)) or [""]
+            return tl.line_height_in(_FS_BODY) * len(lines) + 0.05
        if kind == "kv_table":
            return _measure_kv_table(block)
        if kind == "data_table":
@@ -800,6 +902,73 @@ def _fit_group_blocks(st: _PptxState, blocks: list, avail_full: float) -> list:
    return out


+def _fit_img(width_col: float, aspect: float, max_h: float):
+    """Scale an image to ``width_col`` then clamp to ``max_h`` keeping aspect."""
+    w = width_col
+    h = w * aspect
+    if h > max_h:
+        h = max_h
+        w = (h / aspect) if aspect else width_col
+    return w, h
+
+
+def _place_group_side_by_side(st: _PptxState, block, avail_full: float) -> bool:
+    """Place a Group's table (left ~55%) next to its figure (right ~45%).
+
+    Both the table and the figure are rasterized to high-res images and placed in
+    two columns of the SAME slide; blocks before the pair (e.g. a heading) render
+    full width above it, all other blocks below. Returns True on success, False
+    (so the caller falls back to stacking) when the group has no table+figure
+    pair or it cannot fit side by side on one slide. Never raises by itself."""
+    blocks = getattr(block, "blocks", []) or []
+    tbl = next((b for b in blocks
+                if getattr(b, "kind", "") in ("data_table", "kv_table")), None)
+    fig = next((b for b in blocks
+                if getattr(b, "kind", "") in ("figure", "image")), None)
+    if tbl is None or fig is None:
+        return False
+    gap_col = 0.3
+    left_w = _USABLE_W * 0.55 - gap_col / 2.0
+    right_w = _USABLE_W * 0.45 - gap_col / 2.0
+    if left_w <= 1.0 or right_w <= 1.0:
+        return False
+    tdata, tasp = _figure_bytes_cached(_table_figure_block(tbl))
+    fdata, fasp = _figure_bytes_cached(fig)
+    if not tdata or not fdata:
+        return False
+    ti, fi = blocks.index(tbl), blocks.index(fig)
+    lo = min(ti, fi)
+    lead = list(blocks[:lo])
+    rest = [b for b in blocks[lo + 1:] if b is not tbl and b is not fig]
+    lead_h = sum(_measure_block(st, b) for b in lead)
+    rest_h = sum(_measure_block(st, b) for b in rest)
+    col_max_h = avail_full - lead_h - rest_h - _GAP * 2
+    if col_max_h < 1.2:
+        return False  # not enough vertical room to put the pair side by side.
+    tw, th = _fit_img(left_w, tasp, col_max_h)
+    fw, fh = _fit_img(right_w, fasp, col_max_h)
+    band = max(th, fh)
+    needed = lead_h + band + rest_h + _GAP * 2
+    if needed > avail_full:
+        return False  # taller than a whole slide even side by side → stack.
+    if needed > _remaining(st):
+        _new_slide(st, cont=True)
+    for b in lead:
+        _PLACERS.get(getattr(b, "kind", ""), _place_note)(st, b)
+    top = st.y
+    f_left = _ML + left_w + gap_col
+    st.slide.shapes.add_picture(
+        io.BytesIO(tdata), Inches(_ML + (left_w - tw) / 2.0),
+        Inches(top + (band - th) / 2.0), width=Inches(tw), height=Inches(th))
+    st.slide.shapes.add_picture(
+        io.BytesIO(fdata), Inches(f_left + (right_w - fw) / 2.0),
+        Inches(top + (band - fh) / 2.0), width=Inches(fw), height=Inches(fh))
+    st.y = top + band + _GAP
+    for b in rest:
+        _PLACERS.get(getattr(b, "kind", ""), _place_note)(st, b)
+    return True
+
+
 def _place_group(st: _PptxState, block) -> None:
    """Render a keep-together Group: move it whole to the next slide if needed."""
    blocks = getattr(block, "blocks", []) or []
@@ -810,6 +979,14 @@ def _place_group(st: _PptxState, block) -> None:
    if getattr(block, "page_break_before", False) and st.y > _CONTENT_TOP + 1e-6:
        _new_slide(st, cont=True)
    avail_full = _CONTENT_BOTTOM - _CONTENT_TOP
+    # layout="side_by_side": try table-left / figure-right on one slide; if that
+    # fails for any reason, fall through to the normal stacked keep-together below.
+    if str(getattr(block, "layout", "stack")).lower() == "side_by_side":
+        try:
+            if _place_group_side_by_side(st, block, avail_full):
+                return
+        except Exception:  # noqa: BLE001 — degrade to stacking, never abort.
+            pass
    # Trim oversized tables first (keeps the chart on the same slide), then shrink
    # the figure to share the remaining room.
    blocks = _fit_group_blocks(st, blocks, avail_full)
@@ -843,6 +1020,44 @@ def _place_glossary_entry(st: _PptxState, block) -> None:
    st.y += _GAP


+def _place_toc_entry(st: _PptxState, block) -> None:
+    """Render one clickable index line and record its run as a link source.
+
+    Drawn as a bulleted line in the accent link colour; the run is recorded in
+    ``st.toc_runs`` so it later becomes a native slide-jump to the target chapter's
+    first slide. If the target is never resolved the line still shows as plain
+    (accent) text — never cut."""
+    label = tl.strip_inline_md(getattr(block, "label", "")) or ""
+    target_id = getattr(block, "target_id", "") or ""
+    fs = _FS_BODY
+    lines = tl.wrap(label, tl.chars_per_line(_USABLE_W - 0.3, fs)) or [""]
+    lh = tl.line_height_in(fs)
+    height = lh * len(lines) + 0.05
+    _ensure(st, height)
+    box = st.slide.shapes.add_textbox(
+        Inches(_ML), Inches(st.y), Inches(_USABLE_W), Inches(height))
+    tf = box.text_frame
+    tf.word_wrap = True
+    first = True
+    link_run = None
+    for idx, ln in enumerate(lines):
+        p = tf.paragraphs[0] if first else tf.add_paragraph()
+        first = False
+        r0 = p.add_run()
+        r0.text = "•  " if idx == 0 else "   "
+        r0.font.size = Pt(fs)
+        r0.font.color.rgb = _rgb(_LINK)
+        run = p.add_run()
+        run.text = ln
+        run.font.size = Pt(fs)
+        run.font.color.rgb = _rgb(_LINK)
+        if idx == 0:
+            link_run = run
+    if target_id and link_run is not None:
+        st.toc_runs.append((target_id, link_run, st.slide))
+    st.y += height
+
+
 _PLACERS = {
    "heading": _place_heading,
    "markdown": _place_markdown,
@@ -854,6 +1069,7 @@ _PLACERS = {
    "note": _place_note,
    "group": _place_group,
    "glossary_entry": _place_glossary_entry,
+    "toc_entry": _place_toc_entry,
 }


@@ -889,6 +1105,12 @@ def render_pptx(chapters: list, out_path: str, meta: dict = None) -> dict:
            st.chapter = ch
            st.chapter_slides = 0
            _new_slide(st, cont=False)
+            # Record this chapter's first slide as a link target for the cover
+            # index (keyed by id AND title, since the cover only knows titles).
+            if ch.id:
+                st.chapter_starts[ch.id] = st.slide
+            if getattr(ch, "title", ""):
+                st.chapter_starts.setdefault(ch.title, st.slide)
            for block in ch.blocks:
                placer = _PLACERS.get(getattr(block, "kind", ""), _place_note)
                try:
@@ -916,7 +1138,7 @@ def render_pptx(chapters: list, out_path: str, meta: dict = None) -> dict:

    note = f"{n_slides} slides"
    if n_links:
-        note += f" · {n_links} enlaces de glosario"
+        note += f" · {n_links} enlaces internos"
    if notes:
        note += " · " + "; ".join(notes)
    return {"path": out_path, "n_slides": n_slides, "chapters": chapters_meta,
@@ -924,19 +1146,21 @@ def render_pptx(chapters: list, out_path: str, meta: dict = None) -> dict:


 def _wire_glossary_links(st: _PptxState, notes: list) -> int:
-    """Turn each recorded term run into a native jump to its glossary slide.
+    """Apply native slide-jumps: glossary terms + the cover index.

-    Returns the number of links applied. A term whose only appearance is inside
-    its own glossary entry (source slide == target slide) is skipped. Never
+    Each in-text glossary term run jumps to its glossary entry slide, and each
+    cover ``TocEntry`` run jumps to its chapter's first slide. Returns the total
+    number of links applied. A run whose target is its own slide is skipped. Never
    raises."""
-    if not st.term_runs or not st.term_anchor_slide:
+    if not (st.term_runs and st.term_anchor_slide) and not (
+            st.toc_runs and st.chapter_starts):
        return 0
-    linked = 0
    try:
        from datascience.pptx_link_run_to_slide import pptx_link_run_to_slide
    except Exception as e:  # noqa: BLE001
-        notes.append(f"glosario sin enlaces: {e}")
+        notes.append(f"enlaces internos no aplicados: {e}")
        return 0
+    linked = 0
    for key, run, src_slide in st.term_runs:
        tgt = st.term_anchor_slide.get(key)
        if tgt is None or tgt is src_slide:
@@ -946,4 +1170,14 @@ def _wire_glossary_links(st: _PptxState, notes: list) -> int:
                linked += 1
        except Exception:  # noqa: BLE001 — links are best-effort.
            pass
+    # Cover index → chapter first slide (clickable, navigable table of contents).
+    for target_id, run, src_slide in st.toc_runs:
+        tgt = st.chapter_starts.get(target_id)
+        if tgt is None or tgt is src_slide:
+            continue
+        try:
+            if pptx_link_run_to_slide(run, src_slide, tgt):
+                linked += 1
+        except Exception:  # noqa: BLE001 — links are best-effort.
+            pass
    return linked
@@ -0,0 +1,283 @@
+"""Golden tests for the global render-quality features (issue: eda-render-quality).
+
+Covers, with executable evidence:
+  * High DPI: every embedded figure is rasterized at 220 dpi, so a phone reader
+    can zoom in and still see crisp detail.
+  * Wide table → image: a table too wide to be legible as text (e.g. a 19-column
+    df.head) is rendered as one high-res image that scales to fit entirely, while
+    a narrow table keeps its selectable-text/native-table rendering.
+  * ``Group(layout="side_by_side")``: in PPTX the table and figure are placed in
+    two columns of the same slide; in PDF the same group stacks vertically.
+  * Backward compatibility: a Group without ``layout`` defaults to ``"stack"`` and
+    a fitting table renders exactly as before.
+
+Renderers are invoked for real; PDFs are inspected with PyMuPDF and PPTX decks
+with python-pptx.
+"""
+
+from __future__ import annotations
+
+import os
+import tempfile
+
+import matplotlib
+
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt  # noqa: E402
+
+import pytest  # noqa: E402
+
+from datascience.automatic_eda import model  # noqa: E402
+from datascience.automatic_eda.render_pdf_impl import (  # noqa: E402
+    render_pdf, _RASTER_DPI as _PDF_DPI, _table_fits_as_text as _pdf_fits)
+from datascience.automatic_eda.render_pptx_impl import (  # noqa: E402
+    render_pptx, _RASTER_DPI as _PPTX_DPI, _table_fits_as_text as _pptx_fits)
+
+
+# --------------------------------------------------------------------------- #
+# Helpers.
+# --------------------------------------------------------------------------- #
+def _simple_fig():
+    """A small, real matplotlib figure for the figure blocks."""
+    fig, ax = plt.subplots(figsize=(4, 3))
+    ax.plot([0, 1, 2, 3], [1, 3, 2, 4])
+    ax.set_title("demo")
+    return fig
+
+
+def _wide_table(n_cols=19, n_rows=5):
+    header = [f"columna_{i}" for i in range(n_cols)]
+    rows = [[f"v{r}_{c}" for c in range(n_cols)] for r in range(n_rows)]
+    return model.DataTable(header=header, rows=rows, title="Primeras filas")
+
+
+def _narrow_table():
+    return model.DataTable(header=["a", "b", "c"],
+                           rows=[["1", "2", "3"], ["4", "5", "6"]],
+                           title="Tabla estrecha")
+
+
+def _chapter(blocks, cid="cap", title="Capítulo"):
+    return [model.Chapter(id=cid, title=title, version="1.0.0", blocks=blocks)]
+
+
+# --------------------------------------------------------------------------- #
+# 1) High DPI — the unit constant and a real embedded image.
+# --------------------------------------------------------------------------- #
+def test_raster_dpi_is_high_both_renderers():
+    assert _PDF_DPI >= 200, "el DPI del PDF debe ser alto (>=200)"
+    assert _PPTX_DPI >= 200, "el DPI del PPTX debe ser alto (>=200)"
+
+
+def test_pdf_embedded_figure_is_high_resolution(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "fig.pdf")
+    res = render_pdf(_chapter([model.Figure(make=_simple_fig, caption="demo")]),
+                     out, {"title": "T"})
+    assert res["path"] == out
+    doc = fitz.open(out)
+    try:
+        widths = []
+        for page in doc:
+            for img in page.get_images(full=True):
+                xref = img[0]
+                info = doc.extract_image(xref)
+                widths.append(info.get("width", 0))
+        assert widths, "no se incrustó ninguna imagen en el PDF"
+        # The page save re-rasterizes the placed figure at 220 dpi (~1000+ px wide);
+        # the old ~100 dpi page save gave ~500 px, so >= 800 px proves the DPI bump.
+        assert max(widths) >= 800, \
+            f"la figura embebida no es de alta resolución: {max(widths)} px"
+    finally:
+        doc.close()
+
+
+# --------------------------------------------------------------------------- #
+# 2) Wide table → image (PDF and PPTX); narrow table stays text.
+# --------------------------------------------------------------------------- #
+def test_fit_criterion_flags_wide_and_keeps_narrow():
+    wide = _wide_table()
+    narrow = _narrow_table()
+    assert not _pdf_fits(wide.header, wide.rows), \
+        "una tabla de 19 columnas debería NO caber como texto en A5"
+    assert not _pptx_fits(wide.header, wide.rows), \
+        "una tabla de 19 columnas debería NO caber como tabla nativa en 16:9"
+    assert _pdf_fits(narrow.header, narrow.rows), \
+        "una tabla de 3 columnas debería caber como texto en A5"
+    assert _pptx_fits(narrow.header, narrow.rows), \
+        "una tabla de 3 columnas debería caber como tabla nativa en 16:9"
+
+
+def test_wide_table_rendered_as_image_pdf(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "wide.pdf")
+    res = render_pdf(_chapter([_wide_table()]), out, {"title": "T"})
+    assert res["path"] == out
+    doc = fitz.open(out)
+    try:
+        n_images = sum(len(page.get_images(full=True)) for page in doc)
+        text = "".join(page.get_text() for page in doc)
+    finally:
+        doc.close()
+    assert n_images >= 1, "la tabla ancha no se rasterizó como imagen en el PDF"
+    # The cells are now inside the image, not selectable text. A unique cell value
+    # must therefore NOT appear as extractable text (it lives in the picture).
+    assert "v4_18" not in text, \
+        "la tabla ancha sigue como texto seleccionable (no se hizo imagen)"
+
+
+def test_narrow_table_stays_selectable_text_pdf(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "narrow.pdf")
+    render_pdf(_chapter([_narrow_table()]), out, {"title": "T"})
+    doc = fitz.open(out)
+    try:
+        text = "".join(page.get_text() for page in doc)
+    finally:
+        doc.close()
+    # Narrow table is selectable text: its header/cells are extractable.
+    for v in ("a", "b", "c", "1", "6"):
+        assert v in text, f"la celda '{v}' debería ser texto seleccionable"
+
+
+def test_wide_table_rendered_as_picture_pptx(tmp_path):
+    pptx = pytest.importorskip("pptx")
+    from pptx.enum.shapes import MSO_SHAPE_TYPE
+    out = str(tmp_path / "wide.pptx")
+    res = render_pptx(_chapter([_wide_table()]), out, {"title": "T"})
+    assert res["path"] == out
+    prs = pptx.Presentation(out)
+    pics = sum(1 for s in prs.slides for sh in s.shapes
+               if sh.shape_type == MSO_SHAPE_TYPE.PICTURE)
+    assert pics >= 1, "la tabla ancha no se colocó como imagen en el PPTX"
+
+
+# --------------------------------------------------------------------------- #
+# 3) Group(layout="side_by_side"): two columns in PPTX, stacked in PDF.
+# --------------------------------------------------------------------------- #
+def _side_by_side_group():
+    return model.Group(
+        blocks=[model.Heading(text="Columna X", level=2),
+                _narrow_table(),
+                model.Figure(make=_simple_fig, caption="grafico")],
+        layout="side_by_side")
+
+
+def test_side_by_side_places_two_columns_pptx(tmp_path):
+    pptx = pytest.importorskip("pptx")
+    from pptx.enum.shapes import MSO_SHAPE_TYPE
+    from pptx.util import Inches
+    out = str(tmp_path / "sbs.pptx")
+    render_pptx(_chapter([_side_by_side_group()]), out, {"title": "T"})
+    prs = pptx.Presentation(out)
+    # Find the slide that holds the pair (table image + figure image).
+    centre_emu = int(Inches(13.333 / 2.0))
+    placed = False
+    for s in prs.slides:
+        lefts = [sh.left for sh in s.shapes
+                 if sh.shape_type == MSO_SHAPE_TYPE.PICTURE
+                 and sh.left is not None]
+        if len(lefts) >= 2:
+            # one picture starts in the left half, another in the right half.
+            if min(lefts) < centre_emu and max(lefts) > centre_emu:
+                placed = True
+                break
+    assert placed, \
+        "side_by_side no colocó tabla y figura en dos columnas de la misma slide"
+
+
+def test_side_by_side_stacks_in_pdf(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "sbs.pdf")
+    res = render_pdf(_chapter([_side_by_side_group()]), out, {"title": "T"})
+    assert res["path"] == out and res["n_pages"] >= 1
+    doc = fitz.open(out)
+    try:
+        n_images = sum(len(page.get_images(full=True)) for page in doc)
+        text = "".join(page.get_text() for page in doc)
+    finally:
+        doc.close()
+    # PDF stacks the group: the narrow table stays selectable text (its cell "1"
+    # is extractable) and the figure is the only embedded image, unlike the
+    # two-picture pair that PPTX produces.
+    assert n_images == 1, "el PDF no debería usar el layout de dos imágenes"
+    assert "Columna X" in text and "1" in text, \
+        "la tabla del grupo debería seguir como texto apilado en el PDF"
+
+
+# --------------------------------------------------------------------------- #
+# 4) Backward compatibility — default layout stacks, fitting table unchanged.
+# --------------------------------------------------------------------------- #
+def test_group_default_layout_is_stack():
+    g = model.Group(blocks=[_narrow_table()])
+    assert g.layout == "stack", "el layout por defecto debe ser 'stack'"
+
+
+# --------------------------------------------------------------------------- #
+# 5) Clickable cover index ("Índice") → chapter first page/slide.
+# --------------------------------------------------------------------------- #
+def _doc_with_index():
+    portada = model.Chapter(id="portada", title="Portada", version="1.0.0",
+                            blocks=[model.Heading(text="Índice", level=2),
+                                    model.TocEntry(label="Distribuciones",
+                                                   target_id="Distribuciones")])
+    cap = model.Chapter(id="num", title="Distribuciones", version="1.0.0",
+                        blocks=[model.Markdown(text="contenido del capítulo")])
+    return [portada, cap]
+
+
+def test_cover_index_is_clickable_pdf(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "idx.pdf")
+    res = render_pdf(_doc_with_index(), out, {"title": "T"})
+    assert res["path"] == out
+    doc = fitz.open(out)
+    try:
+        # The cover (page 0) must carry a GOTO link jumping to a later page.
+        goto = [lk for lk in doc[0].get_links()
+                if lk.get("kind") == fitz.LINK_GOTO and lk.get("page", 0) > 0]
+    finally:
+        doc.close()
+    assert goto, "el índice de la portada no produjo enlaces clicables en el PDF"
+
+
+def test_cover_index_shows_heading_pdf(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "idxh.pdf")
+    render_pdf(_doc_with_index(), out, {"title": "T"})
+    doc = fitz.open(out)
+    try:
+        text = "".join(page.get_text() for page in doc)
+    finally:
+        doc.close()
+    assert "Índice" in text, "la portada no muestra el encabezado 'Índice'"
+    assert "Este informe incluye" not in text, \
+        "la portada aún muestra el texto antiguo 'Este informe incluye'"
+
+
+def test_cover_index_is_clickable_pptx(tmp_path):
+    pptx = pytest.importorskip("pptx")
+    out = str(tmp_path / "idx.pptx")
+    render_pptx(_doc_with_index(), out, {"title": "T"})
+    prs = pptx.Presentation(out)
+    cover_xml = prs.slides[0]._element.xml
+    assert "hlinksldjump" in cover_xml, \
+        "el índice de la portada no produjo un salto de slide nativo en el PPTX"
+
+
+def test_default_group_renders_like_before_pptx(tmp_path):
+    pptx = pytest.importorskip("pptx")
+    from pptx.enum.shapes import MSO_SHAPE_TYPE
+    out = str(tmp_path / "stack.pptx")
+    grp = model.Group(blocks=[model.Heading(text="Y", level=2),
+                              _narrow_table(),
+                              model.Figure(make=_simple_fig, caption="g")])
+    render_pptx(_chapter([grp]), out, {"title": "T"})
+    prs = pptx.Presentation(out)
+    # Stacked group: the narrow table is a NATIVE table (selectable), and there is
+    # exactly one picture (the figure) — not the two-image side-by-side layout.
+    n_tables = sum(1 for s in prs.slides for sh in s.shapes if sh.has_table)
+    n_pics = sum(1 for s in prs.slides for sh in s.shapes
+                 if sh.shape_type == MSO_SHAPE_TYPE.PICTURE)
+    assert n_tables >= 1, "el grupo apilado debería usar una tabla nativa"
+    assert n_pics == 1, "el grupo apilado no debería duplicar imágenes"
@@ -0,0 +1,77 @@
+---
+name: generate_synthetic_eda_folder
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def generate_synthetic_eda_folder(out_dir: str, n_rows: int = 2000, seed: int = 42) -> dict"
+description: "Generates a folder with 3 RELATED CSVs (customers, orders, reviews), deterministic per seed (Faker + numpy), to exercise the multi-table AutomaticEDA engine / profile_database. orders.customer_id and reviews.customer_id are 100% contained in customers.customer_id (uuid PK), so containment-based FK detection (min_inclusion=0.9) discovers both relations. customers is the parent table; it reuses helpers from generate_synthetic_eda_table (multi-language text, valid lat/lon, amount with outliers). dict-no-throw style: never raises."
+tags: [eda, synthetic, faker, testing, fixture, datascience]
+params:
+  - name: out_dir
+    desc: "Output folder. Created with mkdir -p if it does not exist. Receives customers.csv, orders.csv and reviews.csv."
+  - name: n_rows
+    desc: "Number of customers (rows of customers). orders has exactly 2*n_rows rows and reviews exactly n_rows rows. Default 2000."
+  - name: seed
+    desc: "Seed for Faker (Faker.seed) and numpy (np.random.default_rng). Same seed -> byte-identical CSVs. Default 42."
+output: "dict-no-throw dict. On success {status:'ok', out_dir, files:{customers,orders,reviews}, n_customers, n_orders, n_reviews, expected_relations:[{from_table,from_col,to_table,to_col}, ...], seed}. On error (without raising, e.g. n_rows<=0) {status:'error', error:str}. expected_relations declares the 2 FKs orders->customers and reviews->customers (both via customer_id)."
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: []
+tested: true
+tests: ["test_genera_ok_y_archivos", "test_determinismo_mismo_seed", "test_seeds_distintos_difieren", "test_fk_containment", "test_review_text_mediana_palabras", "test_n_rows_invalido"]
+test_file_path: "python/functions/datascience/generate_synthetic_eda_folder_test.py"
+file_path: "python/functions/datascience/generate_synthetic_eda_folder.py"
+---
+
+## Example
+
+```bash
+# Generates /tmp/eda_folder/{customers,orders,reviews}.csv (300 customers, seed 42)
+fn run generate_synthetic_eda_folder /tmp/eda_folder 300 42
+```
+
+```python
+import sys, os
+sys.path.insert(0, os.path.join("python", "functions"))
+from datascience import generate_synthetic_eda_folder
+
+res = generate_synthetic_eda_folder("/tmp/eda_folder", n_rows=300, seed=42)
+# res["files"] -> {"customers": ".../customers.csv", "orders": ..., "reviews": ...}
+# res["expected_relations"] -> orders.customer_id and reviews.customer_id -> customers.customer_id
+# Then profile the folder/database with the eda group:
+#   fn run profile_database /tmp/eda_folder
+```
+
+## When to use it
+
+- When you need a REPRODUCIBLE multi-table fixture to evaluate folder/database EDA (`profile_database`, join graph, inter-table relations chapter) with real, detectable FK relations.
+- When writing tests for containment-based foreign-key detection: orders and reviews reference `customer_id` values that are 100% contained in customers (inclusion 1.0 >= min_inclusion 0.9).
+- As the multi-table counterpart of `generate_synthetic_eda_table` (which covers single-table EDA).
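A minimal sketch of the containment check described above, assuming inclusion is measured over the distinct non-null values of the child column (the exact definition used by `profile_database` is not shown in this diff). `inclusion_ratio` and `is_fk_candidate` are hypothetical helpers, not part of the `eda` group:

```python
def inclusion_ratio(child_values, parent_values):
    """Fraction of distinct non-null child values that also appear in the parent.

    Hypothetical helper. With pandas, pass ``series.dropna()``: NaN is not
    ``None`` and would otherwise be counted as a value.
    """
    child = {v for v in child_values if v is not None}
    if not child:
        return 0.0  # empty child column: treat as not contained
    parent = set(parent_values)
    return len(child & parent) / len(child)


def is_fk_candidate(child_values, parent_values, min_inclusion=0.9):
    """True when the child column is contained in the parent at >= min_inclusion."""
    return inclusion_ratio(child_values, parent_values) >= min_inclusion


# orders.customer_id is sampled from customers.customer_id -> inclusion 1.0
customers = ["c1", "c2", "c3", "c4"]
orders = ["c1", "c1", "c3"]
print(inclusion_ratio(orders, customers), is_fk_candidate(orders, customers))  # 1.0 True
```

With the generated CSVs, each argument would be something like `pd.read_csv(path)["customer_id"].dropna()`.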
+
+## Gotchas
+
+- **Impure**: writes 3 CSVs to disk (`mkdir -p` on the folder). Overwrites existing CSVs with the same name.
+- **Requires `faker`, `numpy` and `pandas`** in the venv. Without `faker` it returns `{status:'error'}` (does not raise).
+- **Containment depends on generation order**: customers is generated FIRST and orders/reviews sample their `customer_id` from it. If the order were reversed, the FK would no longer be contained and the detector would not find it.
+- **`signup_date`/`ts` are written as ISO text in the CSV** (`YYYY-MM-DD` / `YYYY-MM-DD HH:MM:SS`): in a CSV everything is text; the profiler promotes them to datetime when reading.
+- **Determinism depends on call order**: `Faker.seed(seed)` + `np.random.default_rng(seed)` are seeded at the start; with the same seed (and the same sequence of draws) the CSVs are byte-identical.
+- **Reuses private helpers** from `generate_synthetic_eda_table` (`_make_fakers`, `_make_latlon`, `_make_reviews`, `_amount_with_outliers`): do not change those signatures without updating this function.
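The byte-for-byte determinism gotcha can be checked without loading the CSVs. A minimal sketch using the standard library's `hashlib`; `folder_digest` is a hypothetical helper and the default file names assume the three CSVs this function writes:

```python
import hashlib
import os

CSV_NAMES = ("customers.csv", "orders.csv", "reviews.csv")


def folder_digest(folder, names=CSV_NAMES):
    """SHA-256 over the given files, read in a fixed order (hypothetical helper)."""
    h = hashlib.sha256()
    for name in names:
        h.update(name.encode("utf-8"))  # include the name so swapped files differ
        with open(os.path.join(folder, name), "rb") as fh:
            h.update(fh.read())
    return h.hexdigest()


# Usage, assuming generate_synthetic_eda_folder is importable:
#   generate_synthetic_eda_folder("/tmp/f1", n_rows=250, seed=11)
#   generate_synthetic_eda_folder("/tmp/f2", n_rows=250, seed=11)
#   assert folder_digest("/tmp/f1") == folder_digest("/tmp/f2")
```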
+
+## Notes
+
+Generated structure:
+
+| File | PK | FK | Key columns |
+|---|---|---|---|
+| customers.csv | customer_id (uuid) | none | name, country, signup_date, latitude, longitude, email |
+| orders.csv | order_id (uuid) | customer_id -> customers | amount (lognormal + outliers), category, ts |
+| reviews.csv | review_id (uuid) | customer_id -> customers | review_text (multi-language, median words >= 20), rating (1..5) |
+
+orders has exactly 2x as many rows as customers, and reviews exactly 1x. Every `customer_id`
+in orders and reviews is contained in customers (containment ⊆), so inclusion-based FK
+detection discovers the two relations declared in `expected_relations`.
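Because `rng.choice` samples `customer_id` with replacement (numpy's default), containment holds by construction, but not every customer gets a child row. For `k` uniform draws over `n` customers, the expected fraction of customers drawn at least once is `1 - (1 - 1/n)**k`. `expected_coverage` below is an illustrative helper, not part of the module:

```python
def expected_coverage(n_customers, n_draws):
    """Expected fraction of customers drawn at least once when sampling
    n_draws ids uniformly with replacement from n_customers ids."""
    if n_customers <= 0:
        raise ValueError("n_customers must be > 0")
    return 1.0 - (1.0 - 1.0 / n_customers) ** n_draws


n = 300
print(round(expected_coverage(n, 2 * n), 3))  # orders, k = 2n: 0.865
print(round(expected_coverage(n, n), 3))      # reviews, k = n: 0.633
```

So in the `n_rows=300` example, roughly 87% of customers have at least one order and roughly 63% have a review. The remaining customers are parents without children, which does not affect containment.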
@@ -0,0 +1,177 @@
+"""generate_synthetic_eda_folder: related multi-table fixture for database/folder EDA.
+
+Impure function (writes CSVs to disk), deterministic per ``seed``: creates a
+folder with 3 RELATED CSVs (customers, orders, reviews) whose content is
+designed so that the multi-table AutomaticEDA engine / `profile_database` detects
+the FK relations via value containment (orders.customer_id and
+reviews.customer_id are 100% contained in customers.customer_id, above the
+``min_inclusion=0.9`` used by detection).
+
+Reuses the helpers from ``generate_synthetic_eda_table`` (multi-language text,
+valid lat/lon, amount with outliers, fixed country/category lists) to avoid
+reimplementing logic.
+
+dict-no-throw style of the `eda` group: NEVER raises; returns
+``{"status": "error", "error": str}`` on any failure.
+"""
+
+import os
+
+from .generate_synthetic_eda_table import (
+    _CATEGORIES,
+    _COUNTRIES,
+    _amount_with_outliers,
+    _make_fakers,
+    _make_latlon,
+    _make_reviews,
+)
+
+
+def generate_synthetic_eda_folder(out_dir, n_rows=2000, seed=42):
+    """Generates a folder with 3 related CSVs (customers/orders/reviews).
+
+    customers is the parent table (unique uuid PK ``customer_id``). orders and reviews
+    reference ``customer_id`` by sampling it from customers, so ALL of their
+    values are contained in customers (inclusion 1.0 -> detectable FK).
+
+    Impure function (writes to disk), deterministic per ``seed``. NEVER raises.
+
+    Args:
+        out_dir: output folder. Created with ``mkdir -p`` if it does not exist.
+        n_rows: number of customers. orders has exactly 2*n_rows rows and reviews
+            exactly n_rows. Default 2000.
+        seed: seed for Faker and numpy. Default 42.
+
+    Returns:
+        dict-no-throw dict. On success::
+
+            {"status": "ok", "out_dir": ..., "files": {customers, orders, reviews},
+             "n_customers": ..., "n_orders": ..., "n_reviews": ...,
+             "expected_relations": [{from_table, from_col, to_table, to_col}, ...],
+             "seed": seed}
+
+        On error (without raising)::
+
+            {"status": "error", "error": str}
+    """
+    try:
+        import numpy as np
+        import pandas as pd
+
+        n = int(n_rows)
+        if n <= 0:
+            return {"status": "error", "error": f"n_rows debe ser > 0, dado {n_rows!r}"}
+
+        os.makedirs(out_dir, exist_ok=True)
+
+        fakers = _make_fakers(seed)
+        rng = np.random.default_rng(seed)
+
+        # ---------------- customers (parent table) ----------------
+        n_cust = n
+        customer_ids = [fakers["en_US"].uuid4() for _ in range(n_cust)]
+        names = [fakers["en_US"].name() for _ in range(n_cust)]
+        cust_country = rng.choice(_COUNTRIES, n_cust)
+        base = np.datetime64("2022-01-01")
+        signup_offsets = rng.integers(0, 730, n_cust)
+        signup_date = pd.to_datetime(base) + pd.to_timedelta(signup_offsets, unit="D")
+        signup_iso = [d.strftime("%Y-%m-%d") for d in signup_date]
+        lat, lon = _make_latlon(cust_country, rng)
+        cust_email = [fakers["en_US"].email() for _ in range(n_cust)]
+
+        customers = pd.DataFrame(
+            {
+                "customer_id": customer_ids,
+                "name": names,
+                "country": cust_country,
+                "signup_date": signup_iso,
+                "latitude": lat,
+                "longitude": lon,
+                "email": cust_email,
+            }
+        )
+
+        # ---------------- orders (FK -> customers) ----------------
+        n_orders = n_cust * 2
+        order_ids = [fakers["en_US"].uuid4() for _ in range(n_orders)]
+        order_cust = rng.choice(customer_ids, n_orders)  # subset/multiset of customers
+        amount = _amount_with_outliers(n_orders, rng, n_extreme=10)
+        order_cat = rng.choice(_CATEGORIES, n_orders)
+        ts_offsets = rng.integers(0, 730 * 24 * 3600, n_orders)
+        ts = pd.to_datetime(np.datetime64("2022-01-01T00:00:00")) + pd.to_timedelta(
+            ts_offsets, unit="s"
+        )
+        ts_iso = [t.strftime("%Y-%m-%d %H:%M:%S") for t in ts]
+
+        orders = pd.DataFrame(
+            {
+                "order_id": order_ids,
+                "customer_id": order_cust,
+                "amount": amount,
+                "category": order_cat,
+                "ts": ts_iso,
+            }
+        )
+
+        # ---------------- reviews (FK -> customers) ----------------
+        n_reviews = n_cust
+        review_ids = [fakers["en_US"].uuid4() for _ in range(n_reviews)]
+        # Sampled with replacement from customers (usually not every customer appears)
+        # -> strict containment ⊆ customers.
+        rev_cust = rng.choice(customer_ids, n_reviews)
+        review_text = _make_reviews(n_reviews, rng, fakers, null_frac=0.0)
+        rating = rng.integers(1, 6, n_reviews)
+
+        reviews = pd.DataFrame(
+            {
+                "review_id": review_ids,
+                "customer_id": rev_cust,
+                "review_text": review_text,
+                "rating": rating,
+            }
+        )
+
+        files = {
+            "customers": os.path.join(out_dir, "customers.csv"),
+            "orders": os.path.join(out_dir, "orders.csv"),
+            "reviews": os.path.join(out_dir, "reviews.csv"),
+        }
+        customers.to_csv(files["customers"], index=False)
+        orders.to_csv(files["orders"], index=False)
+        reviews.to_csv(files["reviews"], index=False)
+
+        return {
+            "status": "ok",
+            "out_dir": out_dir,
+            "files": files,
+            "n_customers": n_cust,
+            "n_orders": n_orders,
+            "n_reviews": n_reviews,
+            "expected_relations": [
+                {
+                    "from_table": "orders",
+                    "from_col": "customer_id",
+                    "to_table": "customers",
+                    "to_col": "customer_id",
+                },
+                {
+                    "from_table": "reviews",
+                    "from_col": "customer_id",
+                    "to_table": "customers",
+                    "to_col": "customer_id",
+                },
+            ],
+            "seed": seed,
+        }
+    except Exception as exc:  # noqa: BLE001 (dict-no-throw style of the eda group).
+        return {"status": "error", "error": str(exc)}
+
+
+if __name__ == "__main__":
+    import json
+    import sys
+
+    args = sys.argv[1:]
+    out = args[0] if len(args) > 0 else "/tmp/synthetic_eda_folder"
+    rows = int(args[1]) if len(args) > 1 else 2000
+    sd = int(args[2]) if len(args) > 2 else 42
+    print(json.dumps(generate_synthetic_eda_folder(out, rows, sd), indent=2))
@@ -0,0 +1,74 @@
+"""Tests for generate_synthetic_eda_folder."""
+
+import os
+import statistics
+
+import pandas as pd
+
+from datascience.generate_synthetic_eda_folder import generate_synthetic_eda_folder
+
+
+def test_genera_ok_y_archivos(tmp_path):
+    out = str(tmp_path / "folder")
+    res = generate_synthetic_eda_folder(out, n_rows=300, seed=42)
+    assert res["status"] == "ok"
+    assert res["n_customers"] == 300
+    assert res["n_orders"] == 600
+    assert res["n_reviews"] == 300
+    for key in ("customers", "orders", "reviews"):
+        assert os.path.exists(res["files"][key])
+    # Declared expected relations.
+    rels = {(r["from_table"], r["to_table"]) for r in res["expected_relations"]}
+    assert ("orders", "customers") in rels
+    assert ("reviews", "customers") in rels
+
+
+def test_determinismo_mismo_seed(tmp_path):
+    out1 = str(tmp_path / "f1")
+    out2 = str(tmp_path / "f2")
+    generate_synthetic_eda_folder(out1, n_rows=250, seed=11)
+    generate_synthetic_eda_folder(out2, n_rows=250, seed=11)
+    for name in ("customers.csv", "orders.csv", "reviews.csv"):
+        with open(os.path.join(out1, name), "rb") as fh:
+            a = fh.read()
+        with open(os.path.join(out2, name), "rb") as fh:
+            b = fh.read()
+        assert a == b, f"{name} difiere entre dos generaciones con el mismo seed"
+
+
+def test_seeds_distintos_difieren(tmp_path):
+    out1 = str(tmp_path / "f1")
+    out2 = str(tmp_path / "f2")
+    generate_synthetic_eda_folder(out1, n_rows=250, seed=11)
+    generate_synthetic_eda_folder(out2, n_rows=250, seed=12)
+    with open(os.path.join(out1, "customers.csv"), "rb") as fh:
+        a = fh.read()
+    with open(os.path.join(out2, "customers.csv"), "rb") as fh:
+        b = fh.read()
+    assert a != b
+
+
+def test_fk_containment(tmp_path):
+    out = str(tmp_path / "folder")
+    res = generate_synthetic_eda_folder(out, n_rows=300, seed=42)
+    customers = pd.read_csv(res["files"]["customers"])
+    orders = pd.read_csv(res["files"]["orders"])
+    reviews = pd.read_csv(res["files"]["reviews"])
+    cust_ids = set(customers["customer_id"])
+    # Every customer_id in orders and reviews ⊆ customers.
+    assert set(orders["customer_id"]) <= cust_ids
+    assert set(reviews["customer_id"]) <= cust_ids
+    # Primary keys are unique: customer_id, order_id and review_id.
+    assert customers["customer_id"].is_unique
+    assert orders["order_id"].is_unique
+    assert reviews["review_id"].is_unique
+
+
+def test_review_text_mediana_palabras(tmp_path):
+    out = str(tmp_path / "folder")
+    res = generate_synthetic_eda_folder(out, n_rows=300, seed=42)
+    reviews = pd.read_csv(res["files"]["reviews"])
+    words = [len(str(t).split()) for t in reviews["review_text"].dropna()]
+    assert statistics.median(words) >= 20
+
+
+def test_n_rows_invalido(tmp_path):
+    out = str(tmp_path / "folder")
+    res = generate_synthetic_eda_folder(out, n_rows=0, seed=42)
+    assert res["status"] == "error"
@@ -0,0 +1,82 @@
+---
+name: generate_synthetic_eda_table
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def generate_synthetic_eda_table(out_db_path: str, table: str = 'synthetic', n_rows: int = 2000, seed: int = 42) -> dict"
+description: "Generates a synthetic DuckDB table (Faker + numpy, deterministic per seed) whose content is designed to TRIGGER as many chapters of the eda group's AutomaticEDA engine as possible: continuous numeric columns with linear/non-linear correlation, numeric columns with outliers, imbalanced categoricals, multi-language free text with duplicates, a date for time series, valid lat/lon, semantic/PII columns (uuid/email/iban/phone) and nulls with MCAR/MAR patterns. A fixture for evaluating the EDA end to end. dict-no-throw style: never raises."
+tags: [eda, synthetic, faker, testing, fixture, datascience]
+params:
+  - name: out_db_path
+    desc: "Path to the output DuckDB file. It is created (or reused), and the table is replaced with CREATE OR REPLACE TABLE if it already exists."
+  - name: table
+    desc: "Name of the table to create. Validated against ^[A-Za-z_][A-Za-z0-9_]*$ and quoted in the DDL. Default 'synthetic'."
+  - name: n_rows
+    desc: "Number of rows (unique customers). Each row is a customer with its own id/email/iban/phone. Default 2000."
+  - name: seed
+    desc: "Seed for Faker (Faker.seed) and numpy (np.random.default_rng). Same seed -> byte-identical table. Default 42."
+output: "dict-no-throw dict. On success {status:'ok', db_path, table, n_rows, columns:[19 column names], seed}. On error (without raising, e.g. invalid table name or n_rows<=0) {status:'error', error:str}. Columns: customer_id,email,iban,phone,income,spending,age,risk_score,tenure_months,engagement_quad,amount,n_purchases,country,category,plan,review,signup_date,latitude,longitude."
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: []
+tested: true
+tests: ["test_genera_ok_y_columnas", "test_determinismo_mismo_seed", "test_seeds_distintos_difieren", "test_latlon_en_rango", "test_plan_solo_niveles_validos", "test_income_spending_co_nulos", "test_review_mediana_palabras_y_signup_datetime", "test_phone_matchea_regex_internacional", "test_outliers_y_correlaciones", "test_tabla_invalida_devuelve_error"]
+test_file_path: "python/functions/datascience/generate_synthetic_eda_table_test.py"
+file_path: "python/functions/datascience/generate_synthetic_eda_table.py"
+---
+
+## Example
+
+```bash
+# Generates /tmp/x.duckdb with the table `synthetic` (2000 rows, seed 42)
+fn run generate_synthetic_eda_table /tmp/x.duckdb synthetic 2000 42
+```
+
+```python
+import sys, os
+sys.path.insert(0, os.path.join("python", "functions"))
+from datascience import generate_synthetic_eda_table
+
+res = generate_synthetic_eda_table("/tmp/x.duckdb", "synthetic", n_rows=2000, seed=42)
+# res == {"status":"ok", "db_path":"/tmp/x.duckdb", "table":"synthetic",
+#         "n_rows":2000, "columns":[...19...], "seed":42}
+# Then profile it with the eda group:
+#   fn run profile_table /tmp/x.duckdb synthetic
+```
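As the source notes, DuckDB does not parameterize the table name in DDL, so `table` is validated against `^[A-Za-z_][A-Za-z0-9_]*$` and quoted. A minimal sketch of that step in the same dict-no-throw style. `build_create_table_sql` is hypothetical, and the `SELECT * FROM df` source is an assumed DataFrame name, not taken from the module:

```python
import re

# Same pattern as the documented `table` parameter.
_IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_create_table_sql(table):
    """Return {"status": "ok", "sql": ...} or {"status": "error", ...}; never raises.

    Hypothetical helper: `df` in the SELECT is an assumed source name.
    """
    # fullmatch (not match) so a trailing newline cannot slip past `$`.
    if not isinstance(table, str) or not _IDENT_RE.fullmatch(table):
        return {"status": "error", "error": f"invalid table name: {table!r}"}
    # Double quotes make it a quoted SQL identifier; the regex already rules
    # out quote characters, so no escaping is needed.
    return {"status": "ok",
            "sql": f'CREATE OR REPLACE TABLE "{table}" AS SELECT * FROM df'}


print(build_create_table_sql("synthetic")["sql"])
print(build_create_table_sql("bad-name; DROP TABLE x")["status"])  # error
```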
+
+## When to use it
+
+- When you need a REPRODUCIBLE test dataset to evaluate the AutomaticEDA engine end to end: its content deliberately triggers num_distr, cat_distr, text_distr, correlacion, missingness (MCAR/MAR), modelos (PCA/KMeans/outliers), timeseries, geospatial, calidad, agregacion and the semantic / PII detectors (`infer_semantic_type`).
+- When writing tests for EDA chapters and you want a table with a column that triggers EACH detector without hand-crafting data.
+- When you want a deterministic fixture (same seed -> same table) to compare the EDA render across versions.
+
+## Gotchas
+
+- **Impure**: writes to disk (creates/reuses the DuckDB file). Replaces the target table with `CREATE OR REPLACE`.
+- **Requires `faker`, `duckdb`, `numpy` and `pandas`** installed in the venv. Without `faker` generation returns `{status:'error'}` (does not raise).
+- **`signup_date` is stored as TIMESTAMP/DATE in DuckDB** (it is built from `datetime64[ns]`), NOT VARCHAR. This is required for `detect_time_column` to pick it and trigger the timeseries chapter; if it were VARCHAR, date detection would fail.
+- **The `review` text must pass the text_distr gate**: mean characters >= 50 and median words >= 20. That is why each review concatenates two Faker paragraphs (~50 words median); do not reduce the number of sentences or the text_distr chapter will not trigger.
+- **Determinism depends on call order**: `Faker.seed(seed)` + `np.random.default_rng(seed)` are seeded at the start; changing the order of the draws changes the output even with the same seed.
+- **Realistic-looking PII**: `email`/`iban`/`phone`/`customer_id` match the `infer_semantic_type` regexes (email/iban/phone_intl/uuid) 100%; they are synthetic Faker data, not real people.
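The text_distr gate above (mean characters >= 50 and median words >= 20) can be reproduced as a quick pre-check on candidate texts. A minimal sketch, assuming nulls are skipped and words are whitespace-separated tokens (the engine's exact tokenization is not shown here); `passes_text_gate` is a hypothetical helper:

```python
import statistics


def passes_text_gate(texts, min_mean_chars=50, min_median_words=20):
    """Check the documented text_distr thresholds on the non-null texts."""
    docs = [t for t in texts if t is not None]
    if not docs:
        return False
    mean_chars = statistics.mean(len(t) for t in docs)
    median_words = statistics.median(len(t.split()) for t in docs)
    return mean_chars >= min_mean_chars and median_words >= min_median_words


long_review = " ".join(["word"] * 25)  # 25 words, 124 characters
short_review = "Great product, fast delivery."
print(passes_text_gate([long_review, long_review, None]))  # True
print(passes_text_gate([short_review, short_review]))      # False
```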
+
+## Notes
+
+Column -> detector map:
+
+| Column(s) | Type | Detector / chapter |
+|---|---|---|
+| income, spending | continuous num | strong POSITIVE correlation (Pearson > 0.8) |
+| age, risk_score | continuous num | NEGATIVE correlation |
+| tenure_months, engagement_quad | continuous num | NON-LINEAR (quadratic) relation |
+| amount, n_purchases | num + outliers | num_distr / outliers (heavy tail + injected extremes) |
+| country (12), category (6), plan (3, imbalanced) | categorical | cat_distr / agregacion (low entropy in plan) |
+| review | multi-language free text | text_distr (len_mean >= 50, median words >= 20) + exact duplicates |
+| signup_date | DATE/TIMESTAMP | timeseries |
+| latitude, longitude | num [-90,90]/[-180,180] | geospatial (detect_latlon_columns) |
+| customer_id, email, iban, phone | text | semantic_type uuid/email/iban/phone_intl (PII) |
+| income+spending (12% co-null), risk_score (null if plan=alta), review (8%) | nulls with a pattern | missingness MCAR/MAR |
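The `phone` row relies on the `phone_intl` pattern quoted in the `_make_phone_intl` docstring. The sketch below mirrors that helper's `+CC AAA BBB CCC` format and checks it with `re.fullmatch`. It uses the standard library `random` module instead of numpy, an assumption made to keep it dependency-free:

```python
import random
import re

PHONE_INTL = re.compile(r"\+\d[\d\s()-]{6,}\d")


def make_phone_intl(rng):
    """'+CC AAA BBB CCC', mirroring _make_phone_intl (stdlib RNG here)."""
    cc = rng.randrange(1, 99)  # like numpy integers(1, 99): upper bound excluded
    a, b, c = (rng.randrange(100, 999) for _ in range(3))
    return f"+{cc} {a} {b} {c}"


rng = random.Random(42)
phones = [make_phone_intl(rng) for _ in range(1000)]
print(all(PHONE_INTL.fullmatch(p) for p in phones))  # True
```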
@@ -0,0 +1,314 @@
+"""generate_synthetic_eda_table: synthetic fixture to exercise the AutomaticEDA engine.
+
+Impure function (writes a DuckDB file to disk), deterministic per ``seed``:
+builds a single table whose CONTENT is designed to TRIGGER as many chapters
+of the `eda` group's AutomaticEDA engine as possible (num_distr, cat_distr,
+text_distr, correlacion, missingness, modelos, timeseries, geospatial, relaciones,
+calidad, agregacion) plus the semantic / PII detectors (`infer_semantic_type`).
+
+dict-no-throw style of the `eda` group: NEVER raises; catches any error and
+returns ``{"status": "error", "error": str}``.
+
+Determinism: with the same ``seed`` the DataFrame, and therefore the resulting
+DuckDB table, are byte-identical. Faker (``Faker.seed``) and numpy
+(``np.random.default_rng(seed)``) are seeded at the start of each generation.
+"""
+
+import re
+
+# Fixed list of countries (12 -> medium cardinality for cat_distr / agregacion).
+_COUNTRIES = [
+    "ES", "FR", "DE", "IT", "PT", "NL",
+    "BE", "US", "GB", "IE", "SE", "PL",
+]
+
+# Fixed list of product categories (6 -> medium cardinality).
+_CATEGORIES = [
+    "electronics", "clothing", "home", "sports", "books", "toys",
+]
+
+# Plan levels with IMBALANCED probabilities (low entropy for cat_distr).
+_PLANS = ["baja", "media", "alta"]
+_PLAN_PROBS = [0.70, 0.25, 0.05]
+
+# Approximate (lat, lon) centroids per country: coordinates are sampled within
+# [-90, 90] x [-180, 180] so that detect_latlon_columns accepts them.
+_CENTROIDS = {
+    "ES": (40.4, -3.7), "FR": (46.6, 2.2), "DE": (51.1, 10.4), "IT": (41.9, 12.5),
+    "PT": (39.4, -8.2), "NL": (52.1, 5.3), "BE": (50.5, 4.5), "US": (39.0, -98.0),
+    "GB": (54.0, -2.0), "IE": (53.4, -8.0), "SE": (60.1, 18.6), "PL": (52.0, 19.1),
+}
+
+# Locales rotated to generate multi-language text (es/en/fr).
+_TEXT_LOCALES = ["es_ES", "en_US", "fr_FR"]
+
+# Valid SQL identifier (DuckDB does not parameterize the table name in DDL).
+_IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
+
+
+def _make_fakers(seed):
+    """Creates one Faker per locale after seeding the shared generator.
+
+    ``Faker.seed(seed)`` seeds the ``random.Random`` shared by every Faker
+    instance that uses the default generator, so the call order fully
+    determines the output (determinism).
+    """
+    from faker import Faker
+
+    Faker.seed(seed)
+    es_es, en_us, fr_fr = (Faker(loc) for loc in _TEXT_LOCALES)
+    return {"es_ES": es_es, "en_US": en_us, "fr_FR": fr_fr}
+
+
+# Canonical duplicate text (multi-language, > 20 words) injected into a fraction
+# of the rows so that exact-duplicate analysis detects it.
+_DUP_REVIEW = (
+    "Servicio excelente y entrega muy rapida, el producto llego en perfecto "
+    "estado y coincide con la descripcion publicada en la tienda. The customer "
+    "support team answered every question quickly and the packaging was solid "
+    "and well protected during shipping. Je recommande vivement ce vendeur a "
+    "tous mes amis, la qualite est vraiment au rendez-vous cette fois."
+)
+
+
+def _make_reviews(n, rng, fakers, dup_frac=0.04, null_frac=0.08):
+    """Generates ``n`` long, multi-language (es/en/fr) free-text reviews.
+
+    Each review concatenates two Faker paragraphs in the language rotated per row,
+    so the MEDIAN number of words per document stays well above 20 and the mean
+    character count above 50 (the text_distr chapter gates). Exact duplicates
+    (``dup_frac``) and nulls (``null_frac``) are injected.
+
+    Returns a ``list`` of ``str`` or ``None`` (nulls) of length ``n``.
+    """
+    # Sentences per paragraph are precomputed with numpy (deterministic) so that
+    # rng draws are not interleaved inside the faker loop.
+    nb1 = rng.integers(4, 8, n)
+    nb2 = rng.integers(3, 7, n)
+
+    reviews = []
+    for i in range(n):
+        fk = fakers[_TEXT_LOCALES[i % 3]]
+        p1 = fk.paragraph(nb_sentences=int(nb1[i]))
+        p2 = fk.paragraph(nb_sentences=int(nb2[i]))
+        reviews.append(f"{p1} {p2}")
+
+    # Exact duplicates: a fraction of the rows share an identical review.
+    if n > 0 and dup_frac > 0:
+        k_dup = max(1, int(n * dup_frac))
+        dup_idx = rng.choice(n, size=min(k_dup, n), replace=False)
+        for j in dup_idx:
+            reviews[int(j)] = _DUP_REVIEW
+
+    # MCAR-ish nulls: a random fraction of the rows becomes None.
+    if n > 0 and null_frac > 0:
+        k_null = max(1, int(n * null_frac))
+        null_idx = rng.choice(n, size=min(k_null, n), replace=False)
+        for j in null_idx:
+            reviews[int(j)] = None
+
+    return reviews
+
+
+def _make_phone_intl(rng):
+    """Build an international-format phone number that matches phone_intl.
+
+    Target regex (fullmatch): ``\\+\\d[\\d\\s()-]{6,}\\d``. It starts with '+'
+    and a digit, then space-separated digit blocks, and ends in a digit.
+    """
+    cc = int(rng.integers(1, 99))
+    a = int(rng.integers(100, 999))
+    b = int(rng.integers(100, 999))
+    c = int(rng.integers(100, 999))
+    return f"+{cc} {a} {b} {c}"
+
+
+def _make_latlon(countries, rng):
+    """Return (latitudes, longitudes) sampling country centroids + jitter.
+
+    Keeps values within [-90, 90] and [-180, 180] (validity required by
+    detect_latlon_columns). The jitter is small so values stay in range.
+    """
+    import numpy as np
+
+    lats = np.empty(len(countries), dtype=float)
+    lons = np.empty(len(countries), dtype=float)
+    jitter_lat = rng.normal(0.0, 0.5, len(countries))
+    jitter_lon = rng.normal(0.0, 0.5, len(countries))
+    for i, code in enumerate(countries):
+        base_lat, base_lon = _CENTROIDS[code]
+        lats[i] = float(np.clip(base_lat + jitter_lat[i], -90.0, 90.0))
+        lons[i] = float(np.clip(base_lon + jitter_lon[i], -180.0, 180.0))
+    return lats, lons
+
+
+def _amount_with_outliers(n, rng, n_extreme=6, factor=50.0):
+    """Heavy-tailed lognormal series with ``n_extreme`` high outliers (x``factor``)."""
+    import numpy as np
+
+    amount = rng.lognormal(mean=4.0, sigma=1.0, size=n)
+    if n > 0 and n_extreme > 0:
+        idx = rng.choice(n, size=min(n_extreme, n), replace=False)
+        amount[idx] = amount[idx] * factor
+    return amount
+
+
+def generate_synthetic_eda_table(
+    out_db_path, table="synthetic", n_rows=2000, seed=42
+):
+    """Generate a synthetic DuckDB table that triggers as many EDA chapters as possible.
+
+    Builds a DataFrame of ``n_rows`` unique customers with columns chosen to
+    trigger specific detectors of the AutomaticEDA engine (continuous numerics
+    with linear/non-linear correlations, numerics with outliers, imbalanced
+    categoricals, multilingual free text with duplicates, a date for time
+    series, valid lat/lon, semantic/PII columns and nulls with MCAR/MAR
+    patterns), and materializes it in ``out_db_path`` with ``CREATE OR REPLACE TABLE``.
+
+    Impure (writes to disk) and deterministic by ``seed``: the same seed yields
+    a table with identical contents, row by row. NEVER raises.
+
+    Args:
+        out_db_path: path to the output DuckDB file. It is created (or reused) and the
+            table is replaced if it already exists.
+        table: name of the table to create. Validated against
+            ``^[A-Za-z_][A-Za-z0-9_]*$`` and quoted in the DDL.
+        n_rows: number of rows (unique customers), must be > 0. Default 2000.
+        seed: seed for Faker and numpy. Default 42.
+
+    Returns:
+        dict (dict-no-throw contract). On success::
+
+            {"status": "ok", "db_path": out_db_path, "table": table,
+             "n_rows": n_rows, "columns": [<column names>], "seed": seed}
+
+        On error (without raising)::
+
+            {"status": "error", "error": str}
+    """
+    try:
+        import duckdb
+        import numpy as np
+        import pandas as pd
+
+        if not _IDENT_RE.match(table or ""):
+            return {
+                "status": "error",
+                "error": (
+                    f"nombre de tabla invalido: {table!r} "
+                    "(debe casar con ^[A-Za-z_][A-Za-z0-9_]*$)"
+                ),
+            }
+        n = int(n_rows)
+        if n <= 0:
+            return {"status": "error", "error": f"n_rows debe ser > 0, dado {n_rows!r}"}
+
+        fakers = _make_fakers(seed)
+        rng = np.random.default_rng(seed)
+
+        # --- Continuous numerics (high distinct count, correlations) ---
+        income = np.clip(rng.normal(40000.0, 12000.0, n), 1000.0, None)
+        spending = income * 0.35 + rng.normal(0.0, 2000.0, n)  # strong POSITIVE corr
+        age = rng.integers(18, 91, n)
+        risk_score = 90.0 - age * 0.7 + rng.normal(0.0, 5.0, n)  # NEGATIVE corr with age
+        tenure_months = rng.uniform(0.0, 60.0, n)
+        engagement_quad = ((tenure_months - 30.0) ** 2) / 30.0 + rng.normal(0.0, 1.0, n)
+
+        # --- Numerics with clear outliers ---
+        amount = _amount_with_outliers(n, rng)
+        n_purchases = rng.poisson(3.0, n).astype(float)
+        if n > 0:
+            k_hi = min(max(1, int(n * 0.002)) + 2, n)  # max(1, 0.2% of n) + 2 huge values (6 at n=2000)
+            hi_idx = rng.choice(n, size=k_hi, replace=False)
+            n_purchases[hi_idx] = rng.integers(200, 400, len(hi_idx)).astype(float)
+
+        # --- Categoricals ---
+        country = rng.choice(_COUNTRIES, n)
+        category = rng.choice(_CATEGORIES, n)
+        plan = rng.choice(_PLANS, n, p=_PLAN_PROBS)
+
+        # --- Multilingual free text with duplicates ---
+        review = _make_reviews(n, rng, fakers)
+
+        # --- Date / time series (~2-year range, ~daily cadence) ---
+        base = np.datetime64("2022-01-01")
+        offsets = rng.integers(0, 730, n)
+        signup_date = pd.to_datetime(base) + pd.to_timedelta(offsets, unit="D")
+
+        # --- Valid geo lat/lon ---
+        latitude, longitude = _make_latlon(country, rng)
+
+        # --- Semantic / PII columns (>=80% match for infer_semantic_type) ---
+        customer_id = [fakers["en_US"].uuid4() for _ in range(n)]
+        email = [fakers["en_US"].email() for _ in range(n)]
+        iban = [fakers["en_US"].iban() for _ in range(n)]
+        phone = [_make_phone_intl(rng) for _ in range(n)]
+
+        df = pd.DataFrame(
+            {
+                "customer_id": customer_id,
+                "email": email,
+                "iban": iban,
+                "phone": phone,
+                "income": income,
+                "spending": spending,
+                "age": age,
+                "risk_score": risk_score,
+                "tenure_months": tenure_months,
+                "engagement_quad": engagement_quad,
+                "amount": amount,
+                "n_purchases": n_purchases,
+                "country": country,
+                "category": category,
+                "plan": plan,
+                "review": review,
+                "signup_date": signup_date,
+                "latitude": latitude,
+                "longitude": longitude,
+            }
+        )
+
+        # --- Patterned nulls ---
+        # income + spending are missing TOGETHER in the SAME rows (co-occurrence -> MAR).
+        k_co = max(1, int(n * 0.12))
+        co_idx = rng.choice(n, size=min(k_co, n), replace=False)
+        df.loc[co_idx, "income"] = np.nan
+        df.loc[co_idx, "spending"] = np.nan
+        # risk_score is missing when plan == "alta" (plus a pinch of randomness) -> MAR.
+        risk_mask = (df["plan"] == "alta").to_numpy() | (rng.random(n) < 0.02)
+        df.loc[risk_mask, "risk_score"] = np.nan
+
+        columns = list(df.columns)
+
+        con = duckdb.connect(out_db_path)
+        try:
+            con.register("df_synth_eda", df)
+            con.execute(
+                f'CREATE OR REPLACE TABLE "{table}" AS SELECT * FROM df_synth_eda'
+            )
+            con.unregister("df_synth_eda")
+        finally:
+            con.close()
+
+        return {
+            "status": "ok",
+            "db_path": out_db_path,
+            "table": table,
+            "n_rows": n,
+            "columns": columns,
+            "seed": seed,
+        }
+    except Exception as exc:  # noqa: BLE001 - dict-no-throw contract of the eda group.
+        return {"status": "error", "error": str(exc)}
+
+
+if __name__ == "__main__":
+    import json
+    import sys
+
+    args = sys.argv[1:]
+    db_path = args[0] if len(args) > 0 else "/tmp/synthetic_eda.duckdb"
+    tbl = args[1] if len(args) > 1 else "synthetic"
+    rows = int(args[2]) if len(args) > 2 else 2000
+    sd = int(args[3]) if len(args) > 3 else 42
+    print(json.dumps(generate_synthetic_eda_table(db_path, tbl, rows, sd), indent=2))
@@ -0,0 +1,129 @@
+"""Tests for generate_synthetic_eda_table."""
+
+import os
+import re
+import statistics
+
+import duckdb
+
+from datascience.generate_synthetic_eda_table import generate_synthetic_eda_table
+
+_EXPECTED_COLS = [
+    "customer_id", "email", "iban", "phone", "income", "spending", "age",
+    "risk_score", "tenure_months", "engagement_quad", "amount", "n_purchases",
+    "country", "category", "plan", "review", "signup_date", "latitude", "longitude",
+]
+_PHONE_RE = re.compile(r"\+\d[\d\s()-]{6,}\d")
+
+
+def _load(db_path, table="synthetic"):
+    con = duckdb.connect(db_path, read_only=True)
+    try:
+        return con.execute(f'SELECT * FROM "{table}"').fetch_df()
+    finally:
+        con.close()
+
+
+def test_genera_ok_y_columnas(tmp_path):
+    db = str(tmp_path / "t.duckdb")
+    res = generate_synthetic_eda_table(db, "synthetic", n_rows=500, seed=42)
+    assert res["status"] == "ok"
+    assert res["table"] == "synthetic"
+    assert res["n_rows"] == 500
+    assert res["columns"] == _EXPECTED_COLS
+    assert os.path.exists(db)
+    df = _load(db)
+    assert list(df.columns) == _EXPECTED_COLS
+    assert len(df) == 500
+
+
+def test_determinismo_mismo_seed(tmp_path):
+    db1 = str(tmp_path / "a.duckdb")
+    db2 = str(tmp_path / "b.duckdb")
+    generate_synthetic_eda_table(db1, "synthetic", n_rows=400, seed=7)
+    generate_synthetic_eda_table(db2, "synthetic", n_rows=400, seed=7)
+    df1 = _load(db1).astype(str)
+    df2 = _load(db2).astype(str)
+    # Same seed -> identical table, row by row.
+    assert df1.equals(df2)
+
+
+def test_seeds_distintos_difieren(tmp_path):
+    db1 = str(tmp_path / "a.duckdb")
+    db2 = str(tmp_path / "b.duckdb")
+    generate_synthetic_eda_table(db1, "synthetic", n_rows=400, seed=7)
+    generate_synthetic_eda_table(db2, "synthetic", n_rows=400, seed=8)
+    df1 = _load(db1).astype(str)
+    df2 = _load(db2).astype(str)
+    assert not df1.equals(df2)
+
+
+def test_latlon_en_rango(tmp_path):
+    db = str(tmp_path / "t.duckdb")
+    generate_synthetic_eda_table(db, "synthetic", n_rows=500, seed=42)
+    df = _load(db)
+    assert df["latitude"].between(-90, 90).all()
+    assert df["longitude"].between(-180, 180).all()
+
+
+def test_plan_solo_niveles_validos(tmp_path):
+    db = str(tmp_path / "t.duckdb")
+    generate_synthetic_eda_table(db, "synthetic", n_rows=500, seed=42)
+    df = _load(db)
+    assert set(df["plan"].unique()) <= {"baja", "media", "alta"}
+
+
+def test_income_spending_co_nulos(tmp_path):
+    db = str(tmp_path / "t.duckdb")
+    generate_synthetic_eda_table(db, "synthetic", n_rows=600, seed=42)
+    df = _load(db)
+    inc_null = df["income"].isna()
+    sp_null = df["spending"].isna()
+    # income and spending are missing in exactly the SAME rows.
+    assert (inc_null == sp_null).all()
+    assert inc_null.sum() > 0
+
+
+def test_review_mediana_palabras_y_signup_datetime(tmp_path):
+    db = str(tmp_path / "t.duckdb")
+    generate_synthetic_eda_table(db, "synthetic", n_rows=500, seed=42)
+    df = _load(db)
+    words = [len(str(r).split()) for r in df["review"].dropna()]
+    assert statistics.median(words) >= 20
+    # signup_date must be a datetime/date in DuckDB (not VARCHAR).
+    con = duckdb.connect(db, read_only=True)
+    try:
+        dtype = con.execute(
+            "SELECT column_type FROM (DESCRIBE synthetic) WHERE column_name='signup_date'"
+        ).fetchone()[0]
+    finally:
+        con.close()
+    assert dtype.upper().startswith(("DATE", "TIMESTAMP"))
+
+
+def test_phone_matchea_regex_internacional(tmp_path):
+    db = str(tmp_path / "t.duckdb")
+    generate_synthetic_eda_table(db, "synthetic", n_rows=500, seed=42)
+    df = _load(db)
+    phones = [p for p in df["phone"].tolist() if p is not None]
+    assert all(_PHONE_RE.fullmatch(str(p)) for p in phones)
+
+
+def test_outliers_y_correlaciones(tmp_path):
+    db = str(tmp_path / "t.duckdb")
+    generate_synthetic_eda_table(db, "synthetic", n_rows=800, seed=42)
+    df = _load(db)
+    # amount has a tail with clear high outliers.
+    assert df["amount"].max() > df["amount"].median() * 20
+    # Strong positive correlation income~spending, negative age~risk_score.
+    sub = df[["income", "spending"]].dropna()
+    assert sub["income"].corr(sub["spending"]) > 0.8
+    sub2 = df[["age", "risk_score"]].dropna()
+    assert sub2["age"].corr(sub2["risk_score"]) < -0.6
+
+
+def test_tabla_invalida_devuelve_error(tmp_path):
+    db = str(tmp_path / "t.duckdb")
+    res = generate_synthetic_eda_table(db, "bad name;", n_rows=10, seed=42)
+    assert res["status"] == "error"
+    assert "invalido" in res["error"]
@@ -0,0 +1,121 @@
+---
+id: render_table_as_figure_py_datascience
+name: render_table_as_figure
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def render_table_as_figure(header, rows, title=None, note=None, fontsize=9.0, max_cell_chars=40) -> \"matplotlib.figure.Figure\""
+description: "Draws a tabular block (header + rows) as a crisp matplotlib.figure.Figure, ready to be rasterized at high DPI. Meant for tables that do NOT fit as text on a page/slide of the EDA report: it is rasterized at high resolution (the caller uses dpi=220, bbox_inches='tight') and the user zooms in on the phone to read it in full without losing data. Shaded (#eef3f6) bold header, even (1-based) rows with a soft zebra stripe (#f6f8fa), dark ink (#1b1b1b) on white, very thin gray grid (#cccccc). Truncates each cell to max_cell_chars with an ellipsis and str()-s every value (None -> \"\"). figsize proportional to the content (width from the number and text length of the columns, height from the number of rows) so it stays legible under zoom. Agg backend, no pyplot. Defensive: ragged rows are padded with empty cells, and when there is nothing to draw (header and rows both empty or None) or on any internal error it returns a placeholder Figure with the centered text \"(tabla no disponible)\". NEVER raises."
+tags: [eda, table, figure, matplotlib, visualization, rasterize, zoom, render, datascience, impure]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: [matplotlib]
+example: |
+  from datascience.render_table_as_figure import render_table_as_figure
+  header = ["columna", "n_nulos", "%_nulos", "distintos", "tipo", "ejemplo"]
+  rows = [
+      ["ingresos", 12, "1.2%", 980, "float64", "2345.67"],
+      ["edad", 0, "0.0%", 88, "int64", "37"],
+      ["ciudad", 5, "0.5%", 412, "object", "Madrid"],
+  ]
+  fig = render_table_as_figure(header, rows, title="Column summary",
+                               note="rasterize at dpi=220 and zoom in")
+  fig.savefig("/tmp/tabla.png", dpi=220, bbox_inches="tight")
+tested: true
+tests:
+  - "test_returns_figure_with_table"
+  - "test_rows_none_does_not_raise"
+  - "test_header_none_does_not_raise"
+  - "test_empty_lists_return_placeholder_figure"
+  - "test_both_none_return_placeholder_figure"
+  - "test_long_cell_is_truncated"
+  - "test_none_cells_become_empty_strings"
+  - "test_can_rasterize_to_png_high_dpi"
+  - "test_placeholder_can_rasterize"
+  - "test_ragged_rows_are_padded"
+test_file_path: "python/functions/datascience/render_table_as_figure_test.py"
+file_path: "python/functions/datascience/render_table_as_figure.py"
+params:
+  - name: header
+    desc: "List of column names (may be [] or None). Each name is str()-ed, truncated to max_cell_chars and drawn in the shaded bold header row. If it is empty/None (or every name is empty) no header row is drawn (body only)."
+  - name: rows
+    desc: "List of rows; each row is a list of cells with arbitrary values (str()-ed; None -> \"\"). Accepts None (treated as []), scalar rows (wrapped in a single cell) and rows of different lengths (the grid is made rectangular at the maximum width, padding with empty cells). Newlines/tabs inside a cell are collapsed to spaces so it does not spill into other rows."
+  - name: title
+    desc: "Optional title drawn above the table, bold, ink #1b1b1b, left-aligned, at fontsize+3. None or \"\" => no title. Default None."
+  - name: note
+    desc: "Optional note at the foot of the figure, gray #8a8a8a and italic, at max(7, fontsize-1). None or \"\" => no note. Default None."
+  - name: fontsize
+    desc: "Base font size (pt) of every table cell, header included (the header stands out through bold and shading, not size). The title uses fontsize+3 and the note max(7, fontsize-1). A non-numeric value or one <= 0 falls back to 9.0. Default 9.0."
+  - name: max_cell_chars
+    desc: "Truncates each cell's text to this many chars (ending in … when cut) so the width does not blow up. The value goes through int(): one that int() cannot convert falls back to 40; <= 0 leaves every cell empty. Default 40."
+output: "A matplotlib.figure.Figure (figsize proportional to the content: width ≈ 0.9-1.6\" per column depending on its text, total clamped to 3-26\"; height ≈ 0.32\" per drawn row, header included, plus 0.30\" and room for title/note, clamped to 1-60\") with an axis-less Axes holding an ax.table(...), NOT closed. Header background #eef3f6, text #1b1b1b bold; even (1-based) rows zebra #f6f8fa, odd rows white; ink #1b1b1b; borders/grid #cccccc lw 0.4; text left-aligned. Title above (bold) and note below (gray italic) when given. When there is nothing to draw (header and rows both empty or None) or on any internal error, it returns a small placeholder Figure with the centered text \"(tabla no disponible)\". NEVER raises. The caller rasterizes it (dpi=220, bbox_inches='tight') and disposes of it; the function neither shows nor saves it."
+---
+
+## Example
+
+```python
+import sys, os
+sys.path.insert(0, os.path.join("python", "functions"))
+from datascience.render_table_as_figure import render_table_as_figure
+
+# A table that does not fit as text on the slide -> rasterize it and read it by zooming.
+header = ["columna", "n_nulos", "%_nulos", "distintos", "tipo", "ejemplo"]
+rows = [
+    ["ingresos", 12, "1.2%", 980, "float64", "2345.67"],
+    ["edad", 0, "0.0%", 88, "int64", "37"],
+    ["ciudad", 5, "0.5%", 412, "object", "Madrid"],
+    ["categoria_producto", 0, "0.0%", 1840, "object",
+     "un_valor_categorico_muy_largo_que_se_trunca"],
+]
+
+fig = render_table_as_figure(
+    header,
+    rows,
+    title="Column summary",
+    note="rasterize at dpi=220 and zoom in on the phone",
+    fontsize=9.0,
+    max_cell_chars=40,
+)
+
+# The report renderer rasterizes it at high resolution; here we save it to disk.
+fig.savefig("/tmp/tabla.png", dpi=220, bbox_inches="tight")
+```
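+
+To anticipate how large the rasterized image will be (and whether a table should
+be split first), the sizing heuristic can be mirrored outside the function. The
+sketch below is a hypothetical helper, `estimate_table_figsize`, that copies the
+constants from the source (0.9-1.6" per column, 0.32" per row, 3-26" and 1-60"
+clamps); it is a planning aid, not part of this function's API, and it assumes
+`rows` is a list of lists and `max_cell_chars` a valid integer.
+
+```python
+def estimate_table_figsize(header, rows, title=None, note=None, max_cell_chars=40):
+    """Approximate (width, height) in inches, mirroring render_table_as_figure."""
+    header = list(header or [])
+    rows = [list(r) for r in (rows or [])]
+    n_cols = max([len(header)] + [len(r) for r in rows])
+    limit = max(int(max_cell_chars), 0)
+
+    def text_len(value):
+        # Truncated cells keep at most `limit` chars (ellipsis included).
+        return min(len("" if value is None else str(value)), limit)
+
+    has_header = any(text_len(name) for name in header)
+    widths = []
+    for c in range(n_cols):
+        col = [header[c] if c < len(header) else ""] if has_header else []
+        col += [r[c] if c < len(r) else "" for r in rows]
+        longest = max((text_len(v) for v in col), default=0)
+        # 0.9" base + 0.085" per char beyond the 6th, clamped to [0.9, 1.6].
+        widths.append(max(0.9, min(1.6, 0.9 + 0.085 * max(longest - 6, 0))))
+    width = max(3.0, min(26.0, sum(widths)))
+
+    # 0.32" per drawn row (header included) + 0.30", plus title/note bands.
+    height = 0.32 * (len(rows) + (1 if has_header else 0)) + 0.30
+    if title is not None and str(title) != "":
+        height += 0.45
+    if note is not None and str(note) != "":
+        height += 0.30
+    return width, max(1.0, min(60.0, height))
+
+
+w, h = estimate_table_figsize(["a", "b"], [["x", "y"]] * 3, title="T")
+print(f"{w:.2f} x {h:.2f} in, ~{w * 220:.0f} x {h * 220:.0f} px at dpi=220")
+```
+
+Multiplying by the DPI only gives an approximate pixel size: `bbox_inches='tight'`
+then crops or pads the margins.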
+
+## When to use it
+
+Use it in an EDA report when a table **does not fit as text** on a page or
+slide and you prefer a crisp image that the reader can enlarge on a phone to
+read in full (column profiles, count matrices, frequency tables with many rows
+or wide columns). Pass it the header and the rows as they are (values are
+`str()`-ed for you) plus an optional `title`/`note`; the caller rasterizes it
+at `dpi=220` with `bbox_inches='tight'`. It is the "table-as-image" companion
+to the `build_boxplots_figure` / `categorical_top_pie_figure` charts: same
+palette and same contract (Agg, no `pyplot`, the caller disposes of the figure).
+
+## Gotchas
+
+- **Impure because of matplotlib.** It uses the `Agg` backend (set with
+  `matplotlib.use("Agg")` on import, a process-wide setting) and the
+  object-oriented `Figure`/`add_subplot` API, NEVER `pyplot.*`, so it touches no
+  pyplot state and leaks no figures between calls. `pyplot` is not thread-safe;
+  building the `Figure` directly makes this safe to call in a loop.
+- **The caller owns the figure.** It is returned, not shown or saved. Rasterize
+  it, then drop it: it is not registered with `pyplot`, so it is garbage-collected
+  once unreferenced (`matplotlib.pyplot.close(fig)` is harmless, not required).
+- **Meant to be rasterized at high DPI.** `figsize` follows the content, but the
+  legibility comes from the DPI: rasterize with `dpi=220` and
+  `bbox_inches='tight'`. Height is capped at 60" (about 186 table rows at 0.32"
+  per row); for thousands of rows, split or summarize the table first.
+- **Visible cell truncation.** Each cell is cut to `max_cell_chars` (default 40)
+  ending in `…`, and newlines/tabs are collapsed to spaces so no cell spills
+  into other rows. Raise `max_cell_chars` if you need to see the full value (at
+  the cost of width).
+- **Defensive, never raises.** Empty or `None` `header`/`rows`, scalar rows,
+  ragged rows or any internal error are handled without propagating: at worst
+  it returns a placeholder `Figure` with "(tabla no disponible)". Do not wrap
+  the call in try/except for fear of a raise; there is none.
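+
+For very long tables, one option is to page them and render each page as its
+own figure, repeating the header. A minimal sketch with a hypothetical
+`chunk_table` helper; the default of 60 rows per page is an arbitrary
+readability choice, well below the roughly 186 rows at which the source's
+0.32"-per-row + 0.30" formula hits the 60" height cap:
+
+```python
+def chunk_table(header, rows, max_rows=60):
+    """Split rows into pages of at most max_rows, each paired with the header."""
+    if max_rows <= 0:
+        raise ValueError("max_rows must be > 0")
+    header = list(header or [])
+    rows = list(rows or [])
+    pages = [(header, rows[i : i + max_rows]) for i in range(0, len(rows), max_rows)]
+    # Always return at least one page so the caller can still draw the header.
+    return pages or [(header, [])]
+
+
+pages = chunk_table(["id", "value"], [[i, i * i] for i in range(130)])
+print([len(body) for _, body in pages])  # -> [60, 60, 10]
+```
+
+Each `(header, body)` pair would then go to its own
+`render_table_as_figure(header, body, ...)` call.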
@@ -0,0 +1,241 @@
+"""Impure EDA helper: a crisp table rendered as a matplotlib Figure (`eda` group).
+
+Draws a tabular block (header + rows) as a sharp ``matplotlib.figure.Figure``
+ready to be rasterized at high DPI, so a table that does NOT fit as text on a
+page/slide can still be read in full by zooming into the rasterized image on a
+phone. The header is shaded and bold, even rows carry a soft zebra stripe, the
+ink is dark on white and the grid is very thin.
+
+Impure: it touches matplotlib's rendering machinery. It forces the headless Agg
+backend at import (a process-wide setting) and builds ``Figure`` objects directly
+(no ``pyplot``), so it is safe to call repeatedly from a report renderer. It is fully
+defensive and NEVER raises: empty/invalid input or any internal error returns a
+small placeholder figure carrying a centered "(tabla no disponible)".
+"""
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+from matplotlib.figure import Figure  # noqa: E402
+
+# Palette shared with the EDA report renderer so the document stays coherent.
+_HEADER_BG = "#eef3f6"   # header cell background.
+_HEADER_TEXT = "#1b1b1b"  # header cell text (bold).
+_ZEBRA_BG = "#f6f8fa"    # even (1-based) row background stripe.
+_BODY_BG = "#ffffff"     # odd row background.
+_INK = "#1b1b1b"         # body text + title ink.
+_GRID = "#cccccc"        # cell borders / grid (thin).
+_NOTE_TEXT = "#8a8a8a"   # muted gray for the note (italic).
+
+
+def _placeholder_figure(message: str = "(tabla no disponible)") -> "Figure":
+    """Return a small fallback ``Figure`` carrying a single centered message."""
+    fig = Figure(figsize=(6.0, 1.6), dpi=150)
+    ax = fig.add_subplot(111)
+    ax.axis("off")
+    ax.text(
+        0.5,
+        0.5,
+        message,
+        ha="center",
+        va="center",
+        fontsize=11,
+        color=_NOTE_TEXT,
+        style="italic",
+        wrap=True,
+        transform=ax.transAxes,
+    )
+    fig.tight_layout()
+    return fig
+
+
+def _cell_text(value, max_cell_chars: int) -> str:
+    """``str()`` a cell value defensively, None -> "", truncate with an ellipsis."""
+    s = "" if value is None else str(value)
+    # Collapse newlines/tabs so a single cell never spills across table rows.
+    s = s.replace("\n", " ").replace("\r", " ").replace("\t", " ")
+    try:
+        limit = int(max_cell_chars)
+    except (TypeError, ValueError):
+        limit = 40
+    if limit <= 0:
+        return ""
+    if len(s) <= limit:
+        return s
+    if limit == 1:
+        return "…"
+    return s[: limit - 1] + "…"
+
+
+def render_table_as_figure(
+    header,
+    rows,
+    title=None,
+    note=None,
+    fontsize=9.0,
+    max_cell_chars=40,
+):
+    """Draw a crisp table as a matplotlib.figure.Figure, ready to rasterize at high DPI.
+
+    Meant for tables that do NOT fit as text on a page/slide: it is rasterized
+    at high resolution and the user zooms in on the phone to read it in full
+    without losing data. Shaded bold header, even rows with a soft zebra stripe,
+    dark ink on white, very thin grid.
+
+    Args:
+        header: list of column names (may be [] or None).
+        rows: list of rows; each row is a list of cells (any values, str()-ed).
+        title: optional title drawn above the table (or None).
+        note: optional gray/italic note below the table (or None).
+        fontsize: base font size (pt) of the cells, header included.
+        max_cell_chars: truncates cell text to this many chars (ending in …) so the width does not blow up.
+
+    Returns:
+        matplotlib.figure.Figure, NOT closed (the caller rasterizes it and then disposes of it).
+        Never raises: on any error it returns a Figure with the text "(tabla no disponible)".
+    """
+    try:
+        # --- Defensive normalization of header/rows into a rectangular grid.
+        header_list = list(header) if isinstance(header, (list, tuple)) else []
+        raw_rows = list(rows) if isinstance(rows, (list, tuple)) else []
+
+        clean_rows = []
+        for row in raw_rows:
+            if isinstance(row, (list, tuple)):
+                clean_rows.append(list(row))
+            elif row is None:
+                clean_rows.append([])
+            else:
+                # A scalar row becomes a single-cell row instead of being dropped.
+                clean_rows.append([row])
+
+        # Nothing to draw at all -> placeholder.
+        if not header_list and not clean_rows:
+            return _placeholder_figure()
+
+        # Number of columns = widest of header / any row.
+        n_cols = len(header_list)
+        for row in clean_rows:
+            if len(row) > n_cols:
+                n_cols = len(row)
+        if n_cols <= 0:
+            return _placeholder_figure()
+
+        # Base font size, tolerate a bad value.
+        try:
+            base_fs = float(fontsize)
+        except (TypeError, ValueError):
+            base_fs = 9.0
+        if base_fs <= 0:
+            base_fs = 9.0
+
+        # --- Build the truncated, padded text matrix.
+        header_cells = [
+            _cell_text(header_list[c] if c < len(header_list) else "", max_cell_chars)
+            for c in range(n_cols)
+        ]
+        body_cells = []
+        for row in clean_rows:
+            body_cells.append(
+                [
+                    _cell_text(row[c] if c < len(row) else "", max_cell_chars)
+                    for c in range(n_cols)
+                ]
+            )
+
+        has_header = any(t for t in header_cells)
+        n_body = len(body_cells)
+        # Total drawn table rows (header counts as one when present).
+        n_table_rows = n_body + (1 if has_header else 0)
+        if n_table_rows <= 0:
+            return _placeholder_figure()
+
+        # --- figsize proportional to content so it reads under zoom.
+        # Width: per-column width scales with the longest text in that column,
+        # clamped to a sensible per-column range, total capped.
+        per_col_widths = []
+        for c in range(n_cols):
+            col_texts = [header_cells[c]] if has_header else []
+            col_texts += [body_cells[r][c] for r in range(n_body)]
+            longest = max((len(t) for t in col_texts), default=0)
+            # 0.9" base plus ~0.085" per char beyond the 6th, clamped to [0.9, 1.6] inches.
+            w = 0.9 + 0.085 * max(longest - 6, 0)
+            w = max(0.9, min(1.6, w))
+            per_col_widths.append(w)
+        fig_w = sum(per_col_widths)
+        fig_w = max(3.0, min(26.0, fig_w))
+
+        # Height: ~0.32" per row + room for title / note.
+        fig_h = 0.32 * n_table_rows + 0.30
+        if title is not None and str(title) != "":
+            fig_h += 0.45
+        if note is not None and str(note) != "":
+            fig_h += 0.30
+        fig_h = max(1.0, min(60.0, fig_h))
+
+        fig = Figure(figsize=(fig_w, fig_h), dpi=150)
+        ax = fig.add_subplot(111)
+        ax.axis("off")
+
+        # Reserve vertical bands for the optional title (top) and note (bottom)
+        # so the table itself never overlaps them.
+        title_band = 0.10 if (title is not None and str(title) != "") else 0.0
+        note_band = 0.07 if (note is not None and str(note) != "") else 0.0
+        table_bbox = [0.0, note_band, 1.0, max(0.05, 1.0 - title_band - note_band)]
+
+        cell_text = ([header_cells] if has_header else []) + body_cells
+
+        col_widths = [w / fig_w for w in per_col_widths]
+
+        table = ax.table(
+            cellText=cell_text,
+            colWidths=col_widths,
+            cellLoc="left",
+            loc="center",
+            bbox=table_bbox,
+        )
+        table.auto_set_font_size(False)
+        table.set_fontsize(base_fs)
+
+        # --- Style every cell: zebra body, shaded bold header, thin gray grid.
+        for (r, _c), cell in table.get_celld().items():
+            cell.set_edgecolor(_GRID)
+            cell.set_linewidth(0.4)
+            # Small horizontal padding so text does not touch the border.
+            cell.PAD = 0.04
+            if has_header and r == 0:
+                cell.set_facecolor(_HEADER_BG)
+                cell.set_text_props(color=_HEADER_TEXT, fontweight="bold", ha="left")
+            else:
+                body_index = r - 1 if has_header else r  # 0-based body row.
+                # 1-based even rows get the zebra stripe.
+                is_even = ((body_index + 1) % 2) == 0
+                cell.set_facecolor(_ZEBRA_BG if is_even else _BODY_BG)
+                cell.set_text_props(color=_INK, ha="left")
+
+        if title is not None and str(title) != "":
+            ax.set_title(
+                str(title),
+                fontsize=base_fs + 3.0,
+                fontweight="bold",
+                color=_INK,
+                loc="left",
+                pad=8,
+            )
+
+        if note is not None and str(note) != "":
+            fig.text(
+                0.01,
+                0.01,
+                str(note),
+                ha="left",
+                va="bottom",
+                fontsize=max(7.0, base_fs - 1.0),
+                color=_NOTE_TEXT,
+                style="italic",
+            )
+
+        return fig
+    except Exception:  # noqa: BLE001 — never raise from a figure builder.
+        return _placeholder_figure()
@@ -0,0 +1,119 @@
+"""Tests for render_table_as_figure (crisp table as a Figure, eda group).
+
+Uses the Agg backend with no display; figures are never shown or written to
+disk, only to an in-memory BytesIO. Each test explicitly closes the Figure it
+builds (matplotlib.pyplot.close) so no state accumulates between tests.
+"""
+
+from io import BytesIO
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+import matplotlib.pyplot as plt  # noqa: E402
+from matplotlib.figure import Figure  # noqa: E402
+
+from render_table_as_figure import render_table_as_figure
+
+
+def _grid(n_cols, n_rows):
+    """Header of n_cols columns + n_rows rows of cells."""
+    header = [f"col_{c}" for c in range(n_cols)]
+    rows = [[f"r{r}c{c}" for c in range(n_cols)] for r in range(n_rows)]
+    return header, rows
+
+
+def test_returns_figure_with_table():
+    header, rows = _grid(6, 5)
+    fig = render_table_as_figure(header, rows, title="Tabla", note="nota al pie")
+    assert isinstance(fig, Figure)
+    # There is at least one Axes and it contains a table with cells.
+    assert len(fig.axes) >= 1
+    ax = fig.axes[0]
+    assert len(ax.tables) >= 1
+    # 6 columns x (1 header + 5 rows) = 36 cells.
+    assert len(ax.tables[0].get_celld()) == 6 * (5 + 1)
+    plt.close(fig)
+
+
+def test_rows_none_does_not_raise():
+    fig = render_table_as_figure(["a", "b"], None)
+    assert isinstance(fig, Figure)
+    assert len(fig.axes) >= 1
+    plt.close(fig)
+
+
+def test_header_none_does_not_raise():
+    fig = render_table_as_figure(None, [["x", "y"], ["z", "w"]])
+    assert isinstance(fig, Figure)
+    assert len(fig.axes) >= 1
+    plt.close(fig)
+
+
+def test_empty_lists_return_placeholder_figure():
+    fig = render_table_as_figure([], [])
+    assert isinstance(fig, Figure)
+    # Placeholder: a single Axes with text and no table.
+    assert len(fig.axes) >= 1
+    assert len(fig.axes[0].tables) == 0
+    plt.close(fig)
+
+
+def test_both_none_return_placeholder_figure():
+    fig = render_table_as_figure(None, None)
+    assert isinstance(fig, Figure)
+    assert len(fig.axes[0].tables) == 0
+    plt.close(fig)
+
+
+def test_long_cell_is_truncated():
+    long_value = "x" * 200
+    header, _ = _grid(2, 0)
+    fig = render_table_as_figure(header, [[long_value, "ok"]], max_cell_chars=20)
+    assert isinstance(fig, Figure)
+    ax = fig.axes[0]
+    texts = [c.get_text().get_text() for c in ax.tables[0].get_celld().values()]
+    # The long cell appears truncated with an ellipsis, never in full.
+    assert any(t.endswith("…") and len(t) <= 20 for t in texts)
+    assert long_value not in texts
+    plt.close(fig)
+
+
+def test_none_cells_become_empty_strings():
+    fig = render_table_as_figure(["a", "b"], [[None, "v"], ["w", None]])
+    assert isinstance(fig, Figure)
+    ax = fig.axes[0]
+    texts = [c.get_text().get_text() for c in ax.tables[0].get_celld().values()]
+    # There are empty cells (the former None values) and cells with a value.
+    assert "" in texts
+    assert "v" in texts
+    plt.close(fig)
+
+
+def test_can_rasterize_to_png_high_dpi():
+    header, rows = _grid(6, 8)
+    fig = render_table_as_figure(header, rows, title="Render", note="zoom me")
+    buf = BytesIO()
+    # Rasterizing at high DPI with a tight bbox must not raise.
+    fig.savefig(buf, format="png", dpi=220, bbox_inches="tight")
+    assert buf.getbuffer().nbytes > 0
+    plt.close(fig)
+
+
+def test_placeholder_can_rasterize():
+    fig = render_table_as_figure([], [])
+    buf = BytesIO()
+    fig.savefig(buf, format="png", dpi=220, bbox_inches="tight")
+    assert buf.getbuffer().nbytes > 0
+    plt.close(fig)
+
+
+def test_ragged_rows_are_padded():
+    # Rows of different lengths: the grid is padded to the maximum width.
+    fig = render_table_as_figure(["a", "b", "c"], [["1"], ["1", "2", "3", "4"]])
+    assert isinstance(fig, Figure)
+    ax = fig.axes[0]
+    # 4 columns (the widest row) x (1 header + 2 rows) = 12 cells.
+    assert len(ax.tables[0].get_celld()) == 4 * (2 + 1)
+    plt.close(fig)
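The tests above define how `render_table_as_figure` is expected to normalize its inputs:
- None cells become "".
- Cells longer than `max_cell_chars` end in "…" and are at most that long.
- Ragged rows (and the header) are padded to the widest row.
- Empty input yields a placeholder.

Below is a minimal sketch of such a step. It assumes a hypothetical helper `normalize_grid` with a hypothetical default of 40 characters; the real module's internals are not shown in this diff.

```python
def normalize_grid(header, rows, max_cell_chars=40):
    """Hypothetical pre-processing for a table-as-figure renderer.

    - None header/rows are treated as empty lists.
    - None cells become "", every other cell is converted with str().
    - Cells longer than max_cell_chars are cut so that, including the
      trailing ellipsis, they are exactly max_cell_chars characters long.
    - Header and rows are padded with "" to the width of the widest one.
    Returns (header, rows), or None when there is nothing to draw.
    """
    header = list(header or [])
    rows = [list(r or []) for r in (rows or [])]

    def cell(value):
        text = "" if value is None else str(value)
        if max_cell_chars >= 1 and len(text) > max_cell_chars:
            text = text[: max_cell_chars - 1] + "…"
        return text

    width = max([len(header)] + [len(r) for r in rows])
    if width == 0:
        return None  # the caller would fall back to a placeholder figure

    def pad(row):
        return [cell(v) for v in row] + [""] * (width - len(row))

    return pad(header), [pad(r) for r in rows]
```

Returning None for an empty grid keeps the "never raise" contract simple: the figure builder only has to branch once to its placeholder.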
@@ -0,0 +1,466 @@
+"""AutomaticEDA ACCEPTANCE test suite: "every AEDA comes out the way we want".
+
+This suite is the safety net of the EDA subsystem of the `eda` group. It
+guarantees that EVERY chapter of an AutomaticEDA report comes out populated
+with its essential content, that the standalone-chapters feature
+(``only_chapters``) resolves its compute dependencies, that optional chapters
+return None when they do not apply, that the multi-table folder report detects
+the FK, and that the Markdown carries the full appendix (the entire association
+matrix plus describe with skew/kurtosis). Unlike the per-chapter unit tests,
+this suite exercises the pipeline END-TO-END on a deterministic synthetic
+dataset that activates every chapter at once.
+
+Determinism: the dataset is generated with a fixed ``seed`` and the pipeline
+runs without an LLM (``profile_level='standard'``), so the manifest and the
+Markdown are reproducible across runs. A single `standard` render is reused
+through a module-scoped fixture to avoid repeating the expensive computation.
+
+dict-no-throw: the `eda` group pipelines never raise; the tests assert on
+``status == 'ok'`` first and then on the concrete manifest / Markdown content.
+
+Honesty (DoD): the asserts check REAL content (the essential text of each
+chapter), not just the heading. If a chapter stopped emitting its content (a
+change broke the numeric distribution, the Isolation Forest, the full
+correlation matrix, ...), the corresponding test FAILS, naming the chapter and
+the missing fragment; it is not softened to make it pass.
+"""
+
+import json
+import os
+import subprocess
+import sys
+
+import pytest
+
+_HERE = os.path.dirname(os.path.abspath(__file__))
+_FUNCTIONS = os.path.abspath(os.path.join(_HERE, "..", ".."))  # python/functions
+if _FUNCTIONS not in sys.path:
+    sys.path.insert(0, _FUNCTIONS)
+
+from datascience.automatic_eda import CHAPTER_ORDER  # noqa: E402
+from datascience.generate_synthetic_eda_folder import (  # noqa: E402
+    generate_synthetic_eda_folder,
+)
+from datascience.generate_synthetic_eda_table import (  # noqa: E402
+    generate_synthetic_eda_table,
+)
+from pipelines.render_automatic_eda import render_automatic_eda  # noqa: E402
+from pipelines.render_automatic_eda_folder import (  # noqa: E402
+    render_automatic_eda_folder,
+)
+
+# --------------------------------------------------------------------------- #
+# Deterministic parameters of the golden fixture.
+# --------------------------------------------------------------------------- #
+SEED = 42
+N_ROWS = 800
+TABLE = "synthetic"
+
+# The `analisis_llm` chapter is ONLY computed with run_llm=True; it must not
+# appear in the `standard` preset (no LLM, which is what this suite uses). So
+# the chapters expected in a `standard` report are all of CHAPTER_ORDER EXCEPT
+# analisis_llm. CHAPTER_ORDER is the source of truth for the engine's 16
+# chapters (portada … glosario).
+LLM_ONLY_CHAPTERS = {"analisis_llm"}
+EXPECTED_STANDARD = [c for c in CHAPTER_ORDER if c not in LLM_ONLY_CHAPTERS]
+
+
+def _pdf_text(path):
+    """Text of the PDF via pdftotext, or None if the tool is not available."""
+    try:
+        out = subprocess.run(
+            ["pdftotext", "-layout", path, "-"],
+            capture_output=True, text=True, timeout=60,
+        )
+        return out.stdout if out.returncode == 0 else None
+    except Exception:  # noqa: BLE001 (the main checks run on the Markdown).
+        return None
+
+
+def _manifest_chapters(result):
+    """Set of chapter ids present in the result's manifest."""
+    with open(result["manifest_path"], encoding="utf-8") as fh:
+        return set((json.load(fh).get("chapters") or {}).keys())
+
+
+# --------------------------------------------------------------------------- #
+# Module-scoped fixtures: the synthetic dataset is generated ONCE and the
+# `standard` render is computed ONCE; every content test reuses them.
+# --------------------------------------------------------------------------- #
+@pytest.fixture(scope="module")
+def synth_db(tmp_path_factory):
+    """Deterministic synthetic table that activates all 16 engine chapters."""
+    d = tmp_path_factory.mktemp("aeda_accept_synth")
+    db = str(d / "synthetic.duckdb")
+    g = generate_synthetic_eda_table(db, TABLE, n_rows=N_ROWS, seed=SEED)
+    assert g["status"] == "ok", g.get("error")
+    return {"db": db, "table": TABLE, "gen": g}
+
+
+@pytest.fixture(scope="module")
+def standard_run(synth_db, tmp_path_factory):
+    """AutomaticEDA `standard` render (no LLM) on the synthetic dataset.
+
+    Returns the pipeline dict plus the loaded manifest, the Markdown text and
+    the PDF text (if pdftotext is available). Reused by most of the tests.
+    """
+    out = str(tmp_path_factory.mktemp("aeda_accept_std"))
+    r = render_automatic_eda(
+        synth_db["db"], synth_db["table"],
+        profile_level="standard", out_dir=out, basename="synth_std",
+    )
+    assert r["status"] == "ok", r.get("error")
+    with open(r["manifest_path"], encoding="utf-8") as fh:
+        manifest = json.load(fh)
+    md = open(r["aeda_md_path"], encoding="utf-8").read()
+    return {
+        "r": r,
+        "manifest": manifest,
+        "chapters": manifest.get("chapters") or {},
+        "md": md,
+        "pdf_text": _pdf_text(r["pdf_path"]),
+    }
+
+
+@pytest.fixture(scope="module")
+def minimal_db(tmp_path_factory):
+    """Minimal table WITHOUT free text, WITHOUT a date and WITHOUT lat/lon.
+
+    Used to check that text_distr / timeseries / geospatial return None (they
+    do not appear in the manifest) and that the EDA does not crash. Only
+    numeric columns (a, b, c) plus one low-cardinality categorical (grp).
+    """
+    import random
+
+    import duckdb
+
+    d = tmp_path_factory.mktemp("aeda_accept_min")
+    db = str(d / "minimal.duckdb")
+    con = duckdb.connect(db)
+    con.execute("CREATE TABLE minimal (a DOUBLE, b DOUBLE, c INTEGER, grp VARCHAR)")
+    random.seed(7)
+    rows = [
+        (round(random.gauss(10, 2), 3), round(random.gauss(50, 5), 3),
+         random.randint(1, 100), ["x", "y", "z"][i % 3])
+        for i in range(120)
+    ]
+    con.executemany("INSERT INTO minimal VALUES (?,?,?,?)", rows)
+    con.close()
+    return {"db": db, "table": "minimal"}
+
+
+# --------------------------------------------------------------------------- #
+# 1) CHAPTER COVERAGE (golden): the standard manifest carries the 15 expected
+#    non-LLM chapters, none is missing, and analisis_llm does NOT appear without LLM.
+# --------------------------------------------------------------------------- #
+def test_standard_cubre_todos_los_capitulos_esperados(standard_run):
+    chapters = set(standard_run["chapters"].keys())
+    expected = set(EXPECTED_STANDARD)
+    missing = expected - chapters
+    assert not missing, (
+        "expected chapters missing from the standard manifest: "
+        f"{sorted(missing)} (present: {sorted(chapters)})"
+    )
+    # analisis_llm requires run_llm=True: it must NOT appear in standard.
+    assert "analisis_llm" not in chapters, (
+        "analisis_llm appeared without LLM: the standard preset should not compute it"
+    )
+
+
+def test_manifest_top_level_es_valido(standard_run):
+    """The manifest declares the engine and a dict of chapters with per-id metadata."""
+    man = standard_run["manifest"]
+    assert man.get("engine") == "AutomaticEDA"
+    assert man.get("engine_version")
+    chapters = standard_run["chapters"]
+    # Every chapter carries a version and its page/slide count (manifest format).
+    for cid, meta in chapters.items():
+        assert meta.get("version"), f"chapter {cid} has no version in the manifest"
+        assert (meta.get("n_pages") or 0) > 0, f"chapter {cid} has 0 pages"
+
+
+# --------------------------------------------------------------------------- #
+# 2) KEY CONTENT PER CHAPTER (acceptance): each chapter carries its ESSENTIAL
+#    content in the Markdown, not just the heading. A missing fragment names the
+#    chapter and the missing text.
+# --------------------------------------------------------------------------- #
+# STABLE text fragments that each chapter emits in the Markdown of the
+# synthetic dataset. They are not fragile numbers: they are chapter labels and
+# structure plus column names from the fixture. If a chapter stops populating
+# its content, its fragment disappears and the test fails, naming it.
+CHAPTER_NEEDLES = {
+    "portada":      ["800 filas", "19 columnas"],
+    "overview":     ["Primeras filas (df.head)", "Diccionario de columnas",
+                     "customer_id", "signup_date"],
+    "num_distr":    ["Distribuciones numéricas", "vallas Tukey", "income"],
+    "cat_distr":    ["Distribuciones categóricas", "Entropía", "Top categorías",
+                     "country"],
+    "text_distr":   ["Texto libre (NLP)", "TTR", "Términos más frecuentes",
+                     "Idioma dominante"],
+    "calidad":      ["Cómo se calcula la calidad", "Calidad global"],
+    "missingness":  ["Datos faltantes", "Celdas faltantes (global)",
+                     "Faltantes por columna"],
+    "outliers":     ["Valores atípicos por columna", "Filas atípicas (multivariante)",
+                     "Isolation Forest", "Filas analizadas"],
+    "correlacion":  ["Matriz de asociación", "Pares más correlacionados"],
+    "relaciones":   ["Candidatas a clave primaria", "customer_id"],
+    "modelos":      ["PCA — varianza explicada", "Segmentación (KMeans)"],
+    "timeseries":   ["Series temporales", "Columna de fecha", "signup_date"],
+    "geospatial":   ["Análisis geoespacial", "Extensión geográfica", "Centroide"],
+    "agregacion":   ["Agregación por grupos", "Agrupado por"],
+    "glosario":     ["Glosario de términos",
+                     "### Isolation Forest (anomalías multivariantes)",
+                     "### PCA (componentes principales)"],
+}
+
+
+def test_needles_cubren_exactamente_los_capitulos_standard():
+    """Maintenance guard: the needles cover the same 15 non-LLM chapters.
+
+    If someone adds a new chapter to CHAPTER_ORDER, this test is a reminder to
+    document its essential content here (or mark it as LLM-only)."""
+    assert set(CHAPTER_NEEDLES.keys()) == set(EXPECTED_STANDARD), (
+        "CHAPTER_NEEDLES out of sync with the chapters expected in standard: "
+        f"missing needles for {set(EXPECTED_STANDARD) - set(CHAPTER_NEEDLES)}, "
+        f"extra {set(CHAPTER_NEEDLES) - set(EXPECTED_STANDARD)}"
+    )
+
+
+@pytest.mark.parametrize("chapter_id", list(CHAPTER_NEEDLES.keys()))
+def test_capitulo_trae_su_contenido_esencial(standard_run, chapter_id):
+    md = standard_run["md"]
+    # Precondition: the chapter is in the manifest (coverage). Otherwise it is a
+    # coverage failure, not a content failure, and is reported as such.
+    assert chapter_id in standard_run["chapters"], (
+        f"chapter {chapter_id} missing from the manifest (coverage failure)"
+    )
+    for needle in CHAPTER_NEEDLES[chapter_id]:
+        assert needle in md, (
+            f"chapter '{chapter_id}': its essential content is missing from the "
+            f"Markdown; missing fragment: {needle!r}"
+        )
+
+
+def test_outliers_isolation_forest_poblado_no_degradado(standard_run):
+    """The multivariate block (Isolation Forest) comes out with data, not degraded."""
+    md = standard_run["md"]
+    assert "Anomalías multivariantes" in md
+    assert "Filas analizadas" in md, "the Isolation Forest table is not populated"
+    assert "No se pudo analizar la anomalía multivariante" not in md, (
+        "the multivariate block came out degraded in the full report"
+    )
+    # The profile carries the models block with the multivariate outliers.
+    models = (standard_run["r"]["profile"] or {}).get("models") or {}
+    assert models.get("outliers") is not None, "profile['models']['outliers'] is empty"
+
+
+# --------------------------------------------------------------------------- #
+# 3) STANDALONE CHAPTERS WITH RESOLVED DEPS (only_chapters acceptance): asking
+#    for a single chapter leaves it POPULATED because dependency resolution
+#    enables the computation it needs, even if the caller did not request it.
+# --------------------------------------------------------------------------- #
+def test_only_outliers_isolation_forest_poblado(synth_db, tmp_path):
+    """only=['outliers'] without an explicit run_models → IsolationForest populated."""
+    out = str(tmp_path / "only_out")
+    r = render_automatic_eda(
+        synth_db["db"], synth_db["table"],
+        only_chapters=["outliers"], out_dir=out, basename="only_outliers",
+    )
+    assert r["status"] == "ok", r.get("error")
+    # Document = portada + outliers + glosario, nothing else.
+    assert _manifest_chapters(r) == {"portada", "outliers", "glosario"}
+    md = open(r["aeda_md_path"], encoding="utf-8").read()
+    assert "Filas atípicas (multivariante)" in md
+    assert "Filas analizadas" in md, "Isolation Forest table not populated"
+    assert "No se pudo analizar la anomalía multivariante" not in md, (
+        "the multivariate block came out degraded despite resolving the deps"
+    )
+    # Resolution enabled run_models → the profile carries the models block.
+    assert ((r["profile"] or {}).get("models") or {}).get("outliers") is not None
+
+
+def test_only_timeseries_rango_temporal_presente(synth_db, tmp_path):
+    """only=['timeseries'] → time range populated (run_series resolved)."""
+    out = str(tmp_path / "only_ts")
+    r = render_automatic_eda(
+        synth_db["db"], synth_db["table"],
+        only_chapters=["timeseries"], out_dir=out, basename="only_ts",
+    )
+    assert r["status"] == "ok", r.get("error")
+    assert "timeseries" in _manifest_chapters(r)
+    md = open(r["aeda_md_path"], encoding="utf-8").read()
+    assert "Columna de fecha" in md
+    assert "signup_date" in md, "the series does not name its date column"
+    # run_series resolved through deps → the profile carries the series analysis.
+    assert (r["profile"] or {}).get("series") is not None, (
+        "only=['timeseries'] must enable run_series through dependencies"
+    )
+
+
+def test_only_correlacion_scatters_presentes(synth_db, tmp_path):
+    """only=['correlacion'] → matrix plus scatters of the strong pairs."""
+    out = str(tmp_path / "only_corr")
+    r = render_automatic_eda(
+        synth_db["db"], synth_db["table"],
+        only_chapters=["correlacion"], out_dir=out, basename="only_corr",
+    )
+    assert r["status"] == "ok", r.get("error")
+    assert _manifest_chapters(r) == {"portada", "correlacion", "glosario"}
+    md = open(r["aeda_md_path"], encoding="utf-8").read()
+    assert "Matriz de asociación" in md
+    assert "Relaciones más fuertes (scatter)" in md, "the scatters are missing"
+    assert "Dispersión de" in md, "no scatter figure was emitted"
+
+
+# --------------------------------------------------------------------------- #
+# 4) NONE WHEN NOT APPLICABLE: on a table with no long text, no date and no
+#    lat/lon, text_distr / timeseries / geospatial do NOT appear and the EDA does not crash.
+# --------------------------------------------------------------------------- #
+def test_capitulos_opcionales_ausentes_cuando_no_aplican(minimal_db, tmp_path):
+    out = str(tmp_path / "minimal_out")
+    r = render_automatic_eda(
+        minimal_db["db"], minimal_db["table"],
+        profile_level="standard", out_dir=out, basename="minimal",
+    )
+    assert r["status"] == "ok", r.get("error")
+    chapters = _manifest_chapters(r)
+    for absent in ("text_distr", "timeseries", "geospatial"):
+        assert absent not in chapters, (
+            f"chapter {absent} appeared on a table that does not warrant it "
+            f"(present: {sorted(chapters)})"
+        )
+    # The document is still valid: portada + glosario + the chapters that do
+    # apply (at least overview and num_distr).
+    assert {"portada", "glosario", "overview", "num_distr"} <= chapters
+
+
+# --------------------------------------------------------------------------- #
+# 5) MULTI-TABLE FOLDER (acceptance): the folder report profiles the N tables
+#    and the relationships chapter detects the FK by containment.
+# --------------------------------------------------------------------------- #
+def test_folder_multitabla_con_fk_detectada(tmp_path):
+    fdir = str(tmp_path / "folder")
+    g = generate_synthetic_eda_folder(fdir, n_rows=300, seed=SEED)
+    assert g["status"] == "ok", g.get("error")
+
+    out = str(tmp_path / "fout")
+    rf = render_automatic_eda_folder(fdir, out_dir=out, basename="folder")
+    assert rf["status"] == "ok", rf.get("error")
+
+    # All 3 tables were profiled.
+    assert rf["n_tables"] == 3, f"expected 3 tables, saw {rf['n_tables']}"
+
+    # The base manifest carries the inter-table relationships chapter.
+    with open(rf["manifest_path"], encoding="utf-8") as fh:
+        chapters = set((json.load(fh).get("chapters") or {}).keys())
+    assert "relaciones" in chapters, (
+        f"the folder document does not include the relationships chapter: {chapters}"
+    )
+
+    # The Markdown names the 3 tables and declares the FK detected by containment.
+    md = open(rf["md_path"], encoding="utf-8").read()
+    for tbl in ("customers", "orders", "reviews"):
+        assert tbl in md, f"table {tbl} does not appear in the folder report"
+    assert "FK candidatas" in md, "the candidate FKs are not declared"
+    assert "orders.customer_id" in md and "customers.customer_id" in md, (
+        "the orders→customers FK was not detected by containment"
+    )
+    assert "reviews.customer_id" in md, "the reviews→customers FK was not detected"
+
+
+# --------------------------------------------------------------------------- #
+# 6) MD COMPLETENESS (regression): the Markdown carries the appendix with the
+#    FULL association matrix (every pair, not just the top) and the describe with
+#    skew/kurtosis for every numeric column. Protects an already-merged fix.
+# --------------------------------------------------------------------------- #
+def test_md_apendice_matriz_correlacion_completa(standard_run):
+    md = standard_run["md"]
+    assert "Matriz de asociación — todos los pares" in md, (
+        "the appendix with the full association matrix is missing"
+    )
+    # A LOW-correlation num-num pair that the chapter's top list would NEVER show:
+    # its presence proves the appendix lists ALL pairs, not just the top.
+    assert "income ↔ longitude" in md, (
+        "the appendix does not contain the low-correlation pairs: it is not the "
+        "full matrix, only the chapter's top-k"
+    )
+
+
+def test_md_apendice_describe_con_skew_kurtosis(standard_run):
+    md = standard_run["md"]
+    assert "Estadísticos numéricos completos (describe)" in md, (
+        "the full describe appendix is missing"
+    )
+    # The header of the appendix describe carries the skew and kurtosis columns
+    # (a substring unique to that header). Without them the describe is incomplete.
+    assert "| skew | kurtosis |" in md, (
+        "the appendix describe does not carry the skew/kurtosis columns"
+    )
+
+
+# --------------------------------------------------------------------------- #
+# 7) THE 3 OUTPUTS ARE NON-EMPTY: PDF with pages, PPTX with slides, MD with a
+#    minimum number of characters, and all three files on disk. Valid manifest.
+# --------------------------------------------------------------------------- #
+def test_tres_salidas_no_vacias(standard_run):
+    r = standard_run["r"]
+    assert r["pdf_path"] and os.path.exists(r["pdf_path"])
+    assert r["pptx_path"] and os.path.exists(r["pptx_path"])
+    assert r["aeda_md_path"] and os.path.exists(r["aeda_md_path"])
+    assert (r["n_pages"] or 0) > 0, "the PDF has no pages"
+    assert (r["n_slides"] or 0) > 0, "the PPTX has no slides"
+    # The full report is large: a generous minimum guards against an empty or
+    # truncated MD without tying the test to an exact size.
+    assert (r["md_chars"] or 0) > 10000, f"MD too short: {r['md_chars']} chars"
+    assert r["manifest_path"] and os.path.exists(r["manifest_path"])
+
+
+def test_pdf_texto_extraible_con_contenido(standard_run):
+    """If pdftotext is available, the PDF must carry real text (not just
+    images): the cover names the dataset and its shape. If the tool is
+    missing, the test is skipped (that is not an EDA failure)."""
+    txt = standard_run["pdf_text"]
+    if txt is None:
+        pytest.skip("pdftotext not available")
+    assert len(txt) > 5000, "the PDF has barely any extractable text"
+    assert "Portada" in txt or "synthetic" in txt, (
+        "the PDF text does not contain the expected cover"
+    )
+
+
+# --------------------------------------------------------------------------- #
+# DETERMINISM: two renders of the SAME dataset produce the SAME manifest (same
+# chapters and the same n_pages/n_slides per chapter). generated_at differs by
+# timestamp, which is why the chapters dict is compared rather than the file.
+# --------------------------------------------------------------------------- #
+def test_render_es_determinista(synth_db, tmp_path):
+    out1 = str(tmp_path / "det1")
+    out2 = str(tmp_path / "det2")
+    r1 = render_automatic_eda(synth_db["db"], synth_db["table"],
+                              profile_level="standard", out_dir=out1, basename="d1")
+    r2 = render_automatic_eda(synth_db["db"], synth_db["table"],
+                              profile_level="standard", out_dir=out2, basename="d2")
+    assert r1["status"] == "ok" and r2["status"] == "ok"
+    c1 = json.load(open(r1["manifest_path"], encoding="utf-8")).get("chapters")
+    c2 = json.load(open(r2["manifest_path"], encoding="utf-8")).get("chapters")
+    assert c1 == c2, "the manifest is not deterministic across two renders of the same dataset"
+
+
+# --------------------------------------------------------------------------- #
+# SLOW (optional, skippable): `full` report with LLM narrative. Requires network
+# / credentials and is NOT deterministic, so it is off unless explicitly opted in
+# via the EDA_ACCEPT_LLM=1 environment variable. It is skipped with skipif (not
+# a custom marker) to avoid depending on mark registration in the repo config.
+# --------------------------------------------------------------------------- #
+@pytest.mark.skipif(
+    os.environ.get("EDA_ACCEPT_LLM") != "1",
+    reason="full+LLM is slow/non-deterministic; export EDA_ACCEPT_LLM=1 to run it",
+)
+def test_full_incluye_capitulo_analisis_llm(synth_db, tmp_path):
+    out = str(tmp_path / "full")
+    r = render_automatic_eda(synth_db["db"], synth_db["table"],
+                             profile_level="full", out_dir=out, basename="full")
+    assert r["status"] == "ok", r.get("error")
+    assert "analisis_llm" in _manifest_chapters(r), (
+        "the full preset must include the LLM analysis chapter"
+    )
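Section 3 of the suite pins down two cases of the `only_chapters` dependency resolution: `outliers` must switch on `run_models`, and `timeseries` must switch on `run_series`. The document also always keeps `portada` and `glosario`. Below is a minimal sketch of that resolution. It assumes a hypothetical `CHAPTER_DEPS` map that lists only those two dependencies; the engine's full map, including what `correlacion` needs for its scatters, is not shown in this diff.

```python
# Chapters the tests expect in every standalone render.
ALWAYS_INCLUDED = ("portada", "glosario")

# Hypothetical, partial map: only the two dependencies the acceptance tests assert.
CHAPTER_DEPS = {
    "outliers": {"run_models"},
    "timeseries": {"run_series"},
}


def resolve_only_chapters(only_chapters, flags=None):
    """Return (chapters, flags) for a render restricted to `only_chapters`.

    Requested chapters keep the caller's order (duplicates dropped) between the
    cover and the glossary; every compute flag a requested chapter depends on
    is switched on, even if the caller left it off.
    """
    flags = dict(flags or {})
    requested = list(dict.fromkeys(
        c for c in only_chapters if c not in ALWAYS_INCLUDED
    ))
    chapters = [ALWAYS_INCLUDED[0], *requested, ALWAYS_INCLUDED[1]]
    for chapter in requested:
        for flag in sorted(CHAPTER_DEPS.get(chapter, ())):
            flags[flag] = True
    return chapters, flags
```

For example, `resolve_only_chapters(["outliers"])` gives the same chapter set that `test_only_outliers_isolation_forest_poblado` expects, with `run_models` enabled.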
@@ -9,6 +9,7 @@ dependencies = [
    "contextily>=1.7.0",
    "cryptography>=46.0.6",
    "duckdb>=1.5.2",
+    "faker>=40.27.0",
    "fpdf2>=2.8.7",
    "geopandas>=1.1.3",
    "google-api-python-client>=2.197.0",
@@ -839,6 +839,18 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/c1/ea/53f2148663b321f21b5a606bd5f191517cf40b7072c0497d3c92c4a13b1e/executing-2.2.1-py2.py3-none-any.whl", hash = "sha256:760643d3452b4d777d295bb167ccc74c64a81df23fb5e08eff250c425a4b2017", size = 28317, upload-time = "2025-09-01T09:48:08.5Z" },
 ]

+[[package]]
+name = "faker"
+version = "40.27.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "tzdata", marker = "sys_platform == 'win32'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/1a/7b/c62c98764137c949be240ad83f763b6f96cf76055952a3e2835359acc3af/faker-40.27.0.tar.gz", hash = "sha256:f697cf07f461474ad7d511164c21f45317e69f1d531d25f3e0f872b639e346a1", size = 2018361, upload-time = "2026-06-30T18:05:17.775Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/c6/b2/788aae329da3d7e4f08f8e1a82e82243c3376c0f3f49b75ae29eea40b371/faker-40.27.0-py3-none-any.whl", hash = "sha256:6099bd6d7bc79041b46c28e100815e2558952bcf384b76ce6c71c8bdca744256", size = 2057897, upload-time = "2026-06-30T18:05:15.555Z" },
+]
+
 [[package]]
 name = "fastapi"
 version = "0.136.3"
@@ -890,6 +902,7 @@ dependencies = [
    { name = "contextily" },
    { name = "cryptography" },
    { name = "duckdb" },
+    { name = "faker" },
    { name = "fpdf2" },
    { name = "geopandas" },
    { name = "google-api-python-client" },
@@ -949,6 +962,7 @@ requires-dist = [
    { name = "contextily", specifier = ">=1.7.0" },
    { name = "cryptography", specifier = ">=46.0.6" },
    { name = "duckdb", specifier = ">=1.5.2" },
+    { name = "faker", specifier = ">=40.27.0" },
    { name = "fpdf2", specifier = ">=2.8.7" },
    { name = "geopandas", specifier = ">=1.1.3" },
    { name = "gliner", marker = "extra == 'nlp'", specifier = ">=0.2.13" },
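Section 5 of the acceptance suite expects the folder report to detect the orders→customers and reviews→customers FKs "by containment". One common way to score that is sketched here as an assumption, not as the engine's actual rule. A child column is a candidate FK to a parent column when the parent column is unique and at least `threshold` of the child's distinct non-null values occur in it. `containment`, `fk_candidates` and the 0.95 threshold are hypothetical.

```python
def containment(child_values, parent_values):
    """Share of the child's distinct non-null values that occur in the parent."""
    child = {v for v in child_values if v is not None}
    if not child:
        return 0.0
    parent = {v for v in parent_values if v is not None}
    return len(child & parent) / len(child)


def fk_candidates(tables, threshold=0.95):
    """Yield (child.col, parent.col, score) where containment >= threshold.

    `tables` maps table name -> {column name -> list of values}. Hypothetical
    rule: only unique (key-candidate) parent columns can be referenced.
    """
    for parent_name, parent_cols in tables.items():
        for pcol, pvals in parent_cols.items():
            non_null = [v for v in pvals if v is not None]
            if not non_null or len(set(non_null)) != len(non_null):
                continue  # not a key candidate
            for child_name, child_cols in tables.items():
                if child_name == parent_name:
                    continue
                for ccol, cvals in child_cols.items():
                    score = containment(cvals, pvals)
                    if score >= threshold:
                        yield f"{child_name}.{ccol}", f"{parent_name}.{pcol}", score
```

Containment alone can flag coincidental overlaps, such as two small integer ID ranges. A real detector would therefore usually also weigh column names or types; the sketch leaves that out.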
Author	SHA1	Message	Date
egutierrez	a74a5a047f	feat(eda): global render quality (DPI 220, wide tables as images, side_by_side layout, clickable index). Cross-cutting improvements to the AutomaticEDA engine (PDF + PPTX) on top of the block model: 1. Global high DPI: every embedded figure/image is rasterized at 220 dpi (previously 150, and in the PDF the page was saved at ~100 dpi, re-rasterizing the imshow images). In the PDF, savefig.dpi=220 is applied to the page; text stays vector and selectable. This allows zooming in on a phone without pixelation. Measured embedded image: ~1081px (previously ~492px). 2. Wide table → high-resolution image: when a DataTable has too many columns to be legible as text (criterion _table_fits_as_text), it is drawn whole as a crisp image (new function render_table_as_figure_py_datascience: shaded header + zebra stripes) scaled to fit completely, so the reader can zoom in and read it without losing data. Tables that do fit remain selectable text / native tables. Applies to PDF and PPTX. The 19-column df.head of the synthetic dataset is no longer cut off: it comes out as an image. 3. Group.layout: new backward-compatible hint (default "stack"). "side_by_side" places the table on the left (~55%) and the figure on the right (~45%) of the same PPTX slide (falling back to stacking if there is no table+figure pair or they do not fit); in the PDF it is treated as "stack" (the mobile A5 width does not allow two columns). Intended for the cat_distr chapter to place the chart next to the table in the PPT. 4. Cover with a clickable index: the chapter list changes from "Este informe incluye..." (markdown) to an "Índice" Heading plus one TocEntry per chapter. The renderer records where each chapter starts and wires each entry as a real jump (PDF: PyMuPDF GOTO link; PPTX: native slide jump), reusing the clickable-glossary mechanism. Model: Group gains `layout`; new TocEntry block; normalizers and __init__ updated. Contract: documented in docs/automatic_eda_contract.md §11.4 (including the exact contract of the layout field for the cat_distr agent). Tests: new render_quality_test.py (13 golden: real high DPI, wide table→image in PDF/PPTX, narrow→text, side_by_side two columns in PPTX / stacked in PDF, clickable index in PDF+PPTX, default-layout backward compatibility). render_features_test updated to the new index. Suite: 188 passed (module) + 38 passed/1 skipped (acceptance + pipeline). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-07-01 01:34:21 +02:00
egutierrez	f2eb782a5f	merge(eda): portada v2 (no Criterios block, LLM description, summary on the right) + global PDF zebra + large/underlined PPTX name	2026-06-30 22:53:46 +02:00
egutierrez	80d10010f5	feat(eda): portada cap01 + global zebra and render emphasis. Iterates on the PORTADA chapter of AutomaticEDA and adds two global improvements to the PDF/PPTX renderers: 1. Global zebra (PDF): _place_kv_table now shades the even rows just like DataTables do, so every table in the document is striped (not only DataTables). The same consistent pattern applies when splitting a table and repeating its header. 2. The cover uses the rich LLM description (profile['llm']['summary']) when the profile has one; the noisy text "active la interpretación LLM (run_llm)…" is removed from the derived fallback. The chapter does not force LLM calls; it only consumes profile['llm'] if present. 3. The "Criterios de calidad" block is removed from the cover (PDF and PPTX); the "Calidad" score is kept. 4. "Resumen del análisis" (PDF): values are aligned to the right margin via the new KVTable.value_align="right". 5. The dataset name on the PPTX cover is larger (44pt) and underlined via the new hints Heading.underline / Heading.size_pt (the PDF ignores them). Bump portada CHAPTER_VERSION 1.2.0 -> 1.3.0. Verified: suite 213 passed / 1 skipped (incl. acceptance of the 16 chapters); golden zebra = 185 zebra rows across 13 chapters of the full PDF; cover with run_llm has no "Criterios de calidad", has the rich LLM description and right-aligned values; PPTX with a 44pt underlined name; the no-LLM edge case falls back to the derived text without noise; fn index without errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 22:44:33 +02:00
egutierrez	ecc22d6d57	merge(eda): acceptance suite for the 16 chapters (29 passed, rescued from an executor whose auth had gone down)	2026-06-30 22:07:21 +02:00
agent	7bdb8bffb5	test(eda): acceptance suite for the 16 chapters of AutomaticEDA. A battery that shields the subsystem: coverage of the 16 chapters on the Faker synthetic dataset, essential content per chapter (parametrized needles), standalone chapters with dependency resolution (only_chapters=[outliers] populates IsolationForest without run_models; timeseries; correlacion), None when not applicable, multi-table folder with FK, MD completeness (full correlation matrix + skew/kurtosis), 3 non-empty outputs, determinism. Skippable full+LLM test. 29 passed, 1 skipped. No findings: the 16 chapters come out as they should.	2026-06-30 22:07:15 +02:00
egutierrez	4139394326	merge(eda): only_chapters with automatic resolution of per-chapter compute dependencies	2026-06-30 21:37:16 +02:00
egutierrez	4773781323	merge(eda): Faker synthetic generators (all-in-one table + multi-table folder) that activate every chapter	2026-06-30 21:26:20 +02:00
egutierrez	ea6678ec23	feat(eda): Faker synthetic dataset generators that exercise AutomaticEDA. Adds two impure dict-no-throw functions, deterministic per seed, to the datascience domain (eda group): - generate_synthetic_eda_table: a 19-column DuckDB table (correlated numerics + outliers, imbalanced categoricals, long multi-language es/en/fr text, a DATE column, valid lat/lon, PII email/iban/phone/uuid, nulls with co-occurring MCAR/MAR patterns). Activates 14 chapters of the AutomaticEDA engine (num_distr, cat_distr, text_distr, calidad, missingness, correlacion, relaciones, modelos, timeseries, geospatial, agregacion, glosario + portada/overview). - generate_synthetic_eda_folder: 3 related CSVs (customers/orders/reviews) with a customer FK detectable by containment, for the multi-table folder EDA. Determinism via Faker.seed_instance + numpy.default_rng. Tests: 16 passed (including determinism by hash, lat/lon ranges, income/spending co-nulls, median review length >=20 words, international phone format, FK containment). Adds faker (40.27.0) to python/pyproject.toml + uv.lock. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 21:25:31 +02:00