chore: auto-commit (8 files)

- scratchpad/gen_docs.py
- scratchpad/gen_intel.py
- scratchpad/gen_verify.py
- scratchpad/intel_build.json
- scratchpad/intel_lineage.json
- scratchpad/lineage_graph.json
- scratchpad/trace_intel.py
- scratchpad/trace_lineage.py

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chore: auto-commit (10 files)
2026-07-01 19:00:06 +02:00 · 2026-07-01 17:58:03 +02:00 · 2026-07-01 12:45:39 +02:00 · 2026-07-01 11:42:49 +02:00 · 2026-07-01 11:41:56 +02:00 · 2026-07-01 02:16:25 +02:00
74 changed files with 8721 additions and 197 deletions
@@ -41,12 +41,13 @@ reconocido se degrada a `Note`, nunca lanza).
 | `Heading(text, level=1)` | section title, `level` 1 (large) … 3 (small) | one or more lines in bold; level 1 gets an accent underline |
 | `Markdown(text)` | lightweight markdown text | see the subset below; **never cuts mid-line** |
 | `KVTable(rows, title=None)` | `rows = [(key, value), ...]` | 2-column label/value table; the value wraps |
-| `DataTable(header, rows, title=None, note=None)` | `header=[...]`, `rows=[[...],...]` | table with a header; **split by rows, repeating the header**; long cells wrap within their column |
+| `DataTable(header, rows, title=None, note=None)` | `header=[...]`, `rows=[[...],...]` | table with a header; **if it fits** as text it is split by rows, repeating the header; **if it does NOT fit** (too many columns) the whole table is rasterized as a high-resolution image that the reader can zoom into. See §11.4 |
 | `Figure(fig=None, make=None, caption=None, height_in=None)` | an already-built `matplotlib.figure.Figure` (`fig`) or a callable `make()->Figure` (lazy) | rasterized and scaled to fit whole (never cropped) |
 | `Image(path, caption=None, height_in=None)` | path to a PNG/JPG | scaled to fit whole |
 | `Caption(text)` / `Note(text)` | small auxiliary text | caption/note in gray; `Note` is also the fallback for unknown blocks |
-| `Group(blocks, title=None)` | **keep-together** unit: its blocks stay together | the renderer measures the whole group and moves it intact to the next page/slide if it does not fit; it shrinks the figure to leave room for the title+text. See §11 |
+| `Group(blocks, title=None, page_break_before=False, layout="stack")` | **keep-together** unit: its blocks stay together | the renderer measures the whole group and moves it intact to the next page/slide if it does not fit; it shrinks the figure to leave room for the title+text. `layout="side_by_side"` places table+figure in two columns (PPTX only). See §11 and §11.4 |
 | `GlossaryEntry(key, label, definition)` | a glossary entry (clickable destination) | generated by the `glosario` chapter; records its position as the destination of the marked terms. See §11 |
+| `TocEntry(label, target_id)` | a **clickable table-of-contents** entry on the cover | generated by the `portada` chapter; the renderer wires it as a jump to the start of the chapter whose `id` or `title` matches `target_id`. See §11.4 |

 `Figure`/`Image` accept `height_in` (hint): the renderer **clamps** the figure to that maximum height (`Group` uses it to shrink the figure). Every figure scales leaving room for its caption on the same page/slide; in PPTX the caption is **always** visible (if no `caption` is given, it falls back to the last heading or to "Figura").

@@ -397,6 +398,65 @@ cabecera con su fondo propio. Es automático en PDF y PPTX; el patrón se mantie
 when a long table is split and repeats its header (the row index is logical, not per
 page). Nothing needs to be done in the chapters.

+### 11.4 Global render quality: high DPI, wide table → image, figure beside table, clickable index
+
+Four cross-cutting engine capabilities, **all automatic except `layout`** (which a
+chapter enables explicitly). They apply to PDF and PPTX unless stated otherwise.
+
+**(a) High DPI (automatic).** Every embedded figure/image is rasterized at **220 dpi**
+(constant `_RASTER_DPI` in both renderers; in PDF it is also applied to the page's
+`savefig`, because matplotlib re-rasterizes each `imshow` when it writes the page). Goal:
+zoom in on a phone and read detail (axes, cells) without pixelation. Text remains
+vector and selectable. Nothing needs to be done in the chapters.
+
+**(b) Wide table → high-resolution image (automatic).** When a `DataTable` has
+**too many columns to be legible as text** within the usable width (criterion
+`_table_fits_as_text`: the table does not fit when minimum legible width per column ×
+number of columns > usable width; in practice this triggers on `df.head`-style tables with
+many columns), instead of squeezing the columns until they are illegible, the table is drawn
+**whole as a high-resolution image** (function `render_table_as_figure_py_datascience`:
+shaded header + zebra striping) scaled to fit completely, so the reader can **zoom** and
+read it without losing data. If the table **does fit**, it stays as selectable text (PDF) /
+a native table (PPTX). `KVTable`s (2 columns) always fit and stay as text. Nothing needs to
+be done in the chapters.
+
+**(c) Figure beside the table: `Group(layout="side_by_side")`.** A layout hint that a
+chapter enables so that its **table sits on the left and its figure on the right** on the
+same slide, instead of stacked:
+
+```python
+model.Group(
+    layout="side_by_side",
+    blocks=[
+        model.Heading(text=str(name), level=2),       # full width, on top
+        model.DataTable(header=..., rows=...),         # LEFT column (~55%)
+        model.Figure(make=_grafico_perezoso(...)),     # RIGHT column (~45%)
+        model.Markdown(text="explanation…"),           # full width, at the bottom
+    ])
+```
+
+Exact contract of the field:
+
+| Field | Value | Effect |
+|---|---|---|
+| `layout` | `"stack"` (default) | historical behavior: vertical stacking (keep-together). |
+| `layout` | `"side_by_side"` | **PPTX**: the table (rasterized to an image) takes the left column (~55% of the usable width) and the figure the right one (~45%); any other block (heading, markdown) goes full width above/below. If there is no table+figure pair, or they do not fit side by side on one slide, it **falls back automatically to stacked**. **PDF**: treated **the same as `stack`** (the mobile A5 width cannot hold two legible columns). Unknown values degrade to `"stack"`. |
+
+It is **backward compatible**: a `Group` without `layout` (or with `layout="stack"`) behaves
+exactly as before. The `cat_distr` chapter is the intended consumer (chart to the right of
+the category table in PPT); this engine only provides the support.
+
+**(d) Clickable index on the cover: `TocEntry`.** The cover emits a `Heading("Índice")`
+followed by one `TocEntry(label, target_id)` per chapter. The renderer records the
+start page/slide of **each** chapter (indexed by `id` **and** by `title`) and wires
+each `TocEntry` as a real jump to that start: in **PDF** via
+`add_pdf_internal_links_py_datascience` (PyMuPDF GOTO link), in **PPTX** via
+`pptx_link_run_to_slide_py_datascience` (native slide jump). Because the cover only knows
+the chapter **titles**, `target_id` is matched against the destination's `title` (or
+`id`). If a destination does not resolve, the entry is still shown as text
+(in link color) and is never cut off. It is the same mechanism as the clickable glossary
+terms (§11.1), reused in the cover → chapter direction.
+
 ---

 ## 10. Future integration with `profile_table` (next phase)
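
Reviewer note on §11.4(b), not part of the diff: the fit criterion described there reads as a plain width check. Below is a minimal sketch under assumed numbers. `table_fits_as_text`, `choose_table_render` and `MIN_COL_WIDTH_IN` are hypothetical stand-ins, not the engine's `_table_fits_as_text` or its real constants.

```python
# Hypothetical stand-in for the engine's `_table_fits_as_text` criterion.
MIN_COL_WIDTH_IN = 0.6  # assumed minimum legible width per column, in inches


def table_fits_as_text(n_cols: int, usable_width_in: float,
                       min_col_width_in: float = MIN_COL_WIDTH_IN) -> bool:
    """Return True when every column can get at least the minimum legible width."""
    if n_cols <= 0:
        return True  # nothing to lay out
    return n_cols * min_col_width_in <= usable_width_in


def choose_table_render(n_cols: int, usable_width_in: float) -> str:
    """Pick 'text' (selectable/native table) or 'image' (rasterized whole)."""
    return "text" if table_fits_as_text(n_cols, usable_width_in) else "image"
```

With these assumed values a 2-column table on a 5-inch usable width stays as text, while a 20-column `df.head`-style table would be rasterized.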
@@ -79,8 +79,10 @@ from .render_paper_pdf import render_paper_pdf
 from .draw_join_graph_figure import draw_join_graph_figure
 from .generate_synthetic_eda_table import generate_synthetic_eda_table
 from .generate_synthetic_eda_folder import generate_synthetic_eda_folder
+from .load_bq_table_to_duckdb import load_bq_table_to_duckdb

 __all__ = [
+    "load_bq_table_to_duckdb",
    "generate_synthetic_eda_table",
    "generate_synthetic_eda_folder",
    "render_paper_pdf",
@@ -29,6 +29,7 @@ from .model import (  # noqa: F401
    KVTable,
    Markdown,
    Note,
+    TocEntry,
    as_blocks,
    as_chapters,
    merge_manifest,
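
Reviewer note on the new `TocEntry` export, not part of the diff: §11.4(d) says the renderer indexes each chapter start by `id` and by `title` and leaves unresolved entries as plain text. The sketch below shows one way that lookup could work. `ChapterStart`, `build_target_index` and `resolve_toc_target` are hypothetical names, and letting ids win over titles on a collision is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ChapterStart:
    id: str
    title: str
    page: int  # page (PDF) or slide (PPTX) index where the chapter starts


def build_target_index(starts: List[ChapterStart]) -> Dict[str, int]:
    """Index every chapter start by both its id and its title."""
    index: Dict[str, int] = {s.id: s.page for s in starts}
    for s in starts:
        # Assumption: an id takes precedence if a title collides with it.
        index.setdefault(s.title, s.page)
    return index


def resolve_toc_target(target_id: str, index: Dict[str, int]) -> Optional[int]:
    """Return the destination page, or None so the entry renders as plain text."""
    return index.get(target_id)
```

A cover that only knows titles would call `resolve_toc_target("Distribuciones categóricas", index)`; a `None` result means the entry is drawn without a link.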
@@ -52,6 +53,7 @@ __all__ = [
    "Group",
    "GlossaryEntry",
    "GlossaryCollector",
+    "TocEntry",
    "Chapter",
    "as_blocks",
    "as_chapters",
@@ -0,0 +1,109 @@
+"""Tests for the `only` filter of build_document (chapter selection).
+
+They verify that:
+  - only=None keeps the historical behavior (all chapters).
+  - only=[ids] restricts the BODY to those ids, while portada stays first and
+    glosario stays last (the glossary appears when a chapter registered terms).
+  - only=[] yields the minimal document (portada only: an empty glossary is omitted).
+  - the selection also travels through the reserved key ctx['_only_chapters']
+    (the channel used by the renderers, which call build_document without `only`),
+    and that key is never leaked to the chapters.
+"""
+
+import os
+import sys
+
+_HERE = os.path.dirname(os.path.abspath(__file__))
+_FUNCTIONS = os.path.abspath(os.path.join(_HERE, "..", "..", ".."))  # python/functions
+if _FUNCTIONS not in sys.path:
+    sys.path.insert(0, _FUNCTIONS)
+
+from datascience.automatic_eda import build_document  # noqa: E402
+
+
+def _profile_with_cat_and_num():
+    """Minimal profile that makes cat_distr and num_distr build (non-empty body)."""
+    return {
+        "table": "ventas", "n_rows": 120, "n_cols": 2, "quality_score": 91,
+        "duplicate_pct": 1.5, "null_cell_pct": 0.8,
+        "columns": [
+            {"name": "region", "inferred_type": "categorical",
+             "categorical": {
+                 "top": [{"value": "norte", "count": 50, "pct": 0.42},
+                         {"value": "sur", "count": 40, "pct": 0.33},
+                         {"value": "este", "count": 30, "pct": 0.25}],
+                 "mode": "norte", "n_distinct": 3, "entropy": 1.55,
+                 "imbalance": 0.1}},
+            {"name": "importe", "inferred_type": "numeric",
+             "numeric": {"mean": 50.0, "median": 48.0, "std": 10.0,
+                         "min": 10, "max": 99, "iqr": 15,
+                         "histogram": [{"lo": 0, "hi": 50, "count": 40},
+                                       {"lo": 50, "hi": 100, "count": 80}]}},
+        ],
+    }
+
+
+def test_only_none_is_full_document():
+    """Backward compat: without `only`, all applicable chapters are emitted."""
+    chs = build_document(_profile_with_cat_and_num(), ctx={"dataset_name": "v"})
+    ids = [c.id for c in chs]
+    assert ids[0] == "portada"
+    assert ids[-1] == "glosario"
+    # The body carries the distributions (cat/num), not just portada+glosario.
+    assert "num_distr" in ids
+    assert "cat_distr" in ids
+
+
+def test_only_restricts_body_but_keeps_cover_and_glossary():
+    # cat_distr registers the term "entropía" in the glossary, so the glossary
+    # (destination of the clickable term) appears, which demonstrates the
+    # "portada first + chapter + glosario last" contract.
+    chs = build_document(_profile_with_cat_and_num(),
+                         ctx={"dataset_name": "v"}, only=["cat_distr"])
+    ids = [c.id for c in chs]
+    assert ids[0] == "portada", f"portada no es la primera: {ids}"
+    assert ids[-1] == "glosario", f"glosario no es la última: {ids}"
+    assert "cat_distr" in ids
+    # num_distr was left out of the selection.
+    assert "num_distr" not in ids
+
+
+def test_only_empty_yields_minimal_document():
+    # only=[] -> empty body. The cover is always present; the glossary only appears
+    # if some chapter registered terms (pre-existing pattern: an empty glossary is
+    # omitted). With no body there are no terms, so the minimal document is portada only.
+    chs = build_document(_profile_with_cat_and_num(),
+                         ctx={"dataset_name": "v"}, only=[])
+    ids = [c.id for c in chs]
+    assert ids == ["portada"], \
+        f"only=[] debe dar el documento mínimo (solo portada), no {ids}"
+
+
+def test_selection_via_reserved_ctx_key():
+    """The selection travels through ctx['_only_chapters'] when `only` is not passed."""
+    chs = build_document(_profile_with_cat_and_num(),
+                         ctx={"dataset_name": "v",
+                              "_only_chapters": ["cat_distr"]})
+    ids = [c.id for c in chs]
+    assert "cat_distr" in ids
+    assert "num_distr" not in ids
+    assert ids[0] == "portada" and ids[-1] == "glosario"
+
+
+def test_explicit_only_arg_wins_over_ctx_key():
+    """If both are passed, the `only` argument wins over the ctx key."""
+    chs = build_document(_profile_with_cat_and_num(),
+                         ctx={"dataset_name": "v",
+                              "_only_chapters": ["cat_distr"]},
+                         only=["num_distr"])
+    ids = [c.id for c in chs]
+    assert "num_distr" in ids
+    assert "cat_distr" not in ids
+
+
+def test_reserved_key_not_leaked_to_caller_ctx():
+    """build_document does not mutate the caller's ctx (internal copy)."""
+    ctx = {"dataset_name": "v", "_only_chapters": ["num_distr"]}
+    build_document(_profile_with_cat_and_num(), ctx=ctx)
+    # The reserved key is still in the caller's dict (the caller's dict was not mutated).
+    assert ctx["_only_chapters"] == ["num_distr"]
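
Reviewer note, not part of the diff: the tests above pin down three rules (explicit `only` wins over the ctx key, the caller's ctx is not mutated, and the reserved key does not reach the chapters). The following is a minimal sketch of a resolver that satisfies them. `resolve_only` is a hypothetical helper written for illustration; it is not the code inside `build_document`.

```python
from typing import Optional, Sequence

RESERVED_KEY = "_only_chapters"  # reserved ctx key named in the tests above


def resolve_only(only: Optional[Sequence[str]], ctx: Optional[dict]):
    """Return (selection, chapter_ctx) without mutating the caller's ctx.

    selection is None (no filter) or a list of chapter ids.
    """
    chapter_ctx = dict(ctx or {})                   # internal copy; caller's dict untouched
    from_ctx = chapter_ctx.pop(RESERVED_KEY, None)  # key is not passed on to chapters
    selection = only if only is not None else from_ctx
    return (None if selection is None else list(selection)), chapter_ctx
```

Testing `only is not None` rather than truthiness keeps `only=[]` meaningful: an empty list is a real selection (empty body), not a request for the full document.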
@@ -0,0 +1,205 @@
+"""chapter_deps: central map of per-chapter compute dependencies for the EDA.
+
+SINGLE source of truth for what each chapter in ``CHAPTER_ORDER`` needs in order
+to be computed IN FULL (without falling into its degraded "insufficient data"
+branch). The ``render_automatic_eda`` pipeline consumes it when asked to render a
+SUBSET of chapters (kwarg ``only_chapters``): before profiling, it resolves the
+requirements of the requested chapters and enables ONLY the computation those
+chapters need, so that a single chapter always arrives populated while no CPU/LLM
+time is wasted on pieces that no requested chapter uses.
+
+Design: the map is CENTRAL (this module), NOT a per-chapter constant. This avoids
+touching the ``chapters/<id>.py`` files (each agent owns its chapter) and removes
+the risk of collisions between branches. If a chapter changes what it reads from
+``profile``/``ctx``, THIS map is updated: it is where the engine looks.
+
+Two kinds of dependency, derived by inspecting what each chapter reads:
+
+  - ``profile_flags``: ``profile_table`` cost flags that must be ENABLED so that
+    the ``profile`` carries the block the chapter reads. These are the expensive ones:
+      * ``run_models``  -> ``profile['models']`` (KMeans/IsolationForest/PCA).
+        Read by ``outliers`` (multivariate fallback) and ``modelos``.
+      * ``run_series``  -> ``profile['series']`` (time-series analysis).
+        Read by ``timeseries``.
+      * ``run_llm``     -> ``profile['llm']`` (model interpretation).
+        Read by ``analisis_llm``.
+
+  - ``ctx``: labels of the RAW DATA pieces built by ``build_eda_render_ctx`` that
+    the chapter reads from ``ctx``. If the list is empty, the chapter needs no raw
+    data, and the pipeline can skip ``build_eda_render_ctx`` entirely when no
+    requested chapter asks for any.
+    Labels and the real keys they map to (see ``CTX_LABEL_TO_KEYS``):
+      * ``head_rows``      -> ``ctx['head_rows']``      (overview: real df.head).
+      * ``raw_numeric``    -> ``ctx['raw_numeric']``    (outliers/modelos/
+        correlacion/missingness/geospatial: row-aligned numeric sample).
+      * ``timeseries_raw`` -> ``ctx['timeseries_raw']`` (timeseries: raw series).
+      * ``geo_points``     -> ``ctx['geo_points']`` (+ ``raw_numeric``)
+        (geospatial: lat/lon).
+      * ``db_path_table``  -> ``ctx['db_path']`` + ``ctx['table']`` (agregacion/
+        text_distr/missingness/relaciones: push-down of their own queries).
+
+``portada`` and ``glosario`` are NOT optional: the pipeline ALWAYS includes them
+(the cover summarizes the document and the glossary is the destination of the
+clickable terms), so they are declared here with no compute requirements.
+
+All functions in this module are PURE (no I/O, deterministic), so they lend
+themselves to direct unit testing.
+"""
+
+from __future__ import annotations
+
+# Central map. One entry per id in CHAPTER_ORDER. ``profile_flags`` lists the
+# cost flags to enable; ``ctx`` lists the raw-data labels the chapter reads. An
+# empty list means "does not need that kind of dependency".
+CHAPTER_DEPS = {
+    # Portada and glosario: ALWAYS present, no computation of their own (the cover
+    # reads the document_summary assembled by build_document; the glossary reads the
+    # terms registered by the other chapters). They are declared so that the map
+    # covers all of CHAPTER_ORDER and validation recognizes them.
+    "portada":      {"profile_flags": [], "ctx": []},
+    "overview":     {"profile_flags": [], "ctx": ["head_rows"]},
+    "analisis_llm": {"profile_flags": ["run_llm"], "ctx": []},
+    "num_distr":    {"profile_flags": [], "ctx": []},
+    "cat_distr":    {"profile_flags": [], "ctx": []},
+    # text_distr pushes down its own text query (it does not use raw_numeric); it
+    # needs db_path/table in the ctx to do so.
+    "text_distr":   {"profile_flags": [], "ctx": ["db_path_table"]},
+    "calidad":      {"profile_flags": [], "ctx": []},
+    # missingness reads the raw numeric sample (co-occurrence of missing values) and
+    # may push down a null-pattern query using db_path/table.
+    "missingness":  {"profile_flags": [], "ctx": ["raw_numeric", "db_path_table"]},
+    # outliers runs IsolationForest LIVE on ctx['raw_numeric']; run_models also
+    # ensures the profile['models']['outliers'] fallback in case the ctx is missing.
+    "outliers":     {"profile_flags": ["run_models"], "ctx": ["raw_numeric"]},
+    "correlacion":  {"profile_flags": [], "ctx": ["raw_numeric"]},
+    "relaciones":   {"profile_flags": [], "ctx": ["db_path_table"]},
+    "modelos":      {"profile_flags": ["run_models"], "ctx": ["raw_numeric"]},
+    "timeseries":   {"profile_flags": ["run_series"], "ctx": ["timeseries_raw"]},
+    "geospatial":   {"profile_flags": [], "ctx": ["geo_points", "raw_numeric"]},
+    "agregacion":   {"profile_flags": [], "ctx": ["db_path_table"]},
+    "glosario":     {"profile_flags": [], "ctx": []},
+}
+
+# Chapters the document ALWAYS includes, regardless of only_chapters.
+ALWAYS_PRESENT = ("portada", "glosario")
+
+# Recognized cost flags (order does not matter; they are returned as a set).
+KNOWN_PROFILE_FLAGS = ("run_models", "run_series", "run_llm")
+
+# Maps each ctx label to the REAL keys produced by build_eda_render_ctx.
+# ``db_path_table`` is special: db_path/table are always set for a valid backend
+# and are harmless, so they are never pruned (they do not appear in
+# DATA_CTX_KEYS). The rest (head_rows/raw_numeric/timeseries_raw/geo_points) are
+# the prunable data pieces.
+CTX_LABEL_TO_KEYS = {
+    "head_rows":      {"head_rows"},
+    "raw_numeric":    {"raw_numeric"},
+    "timeseries_raw": {"timeseries_raw"},
+    "geo_points":     {"geo_points", "raw_numeric"},
+    "db_path_table":  set(),  # db_path/table are always present; never pruned.
+}
+
+# Raw-data ctx keys that can be pruned when no requested chapter needs them
+# (the ones that cost sampling). db_path/table do NOT belong here.
+DATA_CTX_KEYS = ("head_rows", "raw_numeric", "timeseries_raw", "geo_points")
+
+
+def _as_id_list(chapter_ids):
+    """Defensively normalize the input to a list of string ids. None -> []."""
+    if chapter_ids is None:
+        return []
+    if isinstance(chapter_ids, str):
+        return [chapter_ids]
+    return [c for c in chapter_ids if isinstance(c, str)]
+
+
+def validate_chapter_ids(chapter_ids, order):
+    """Split the requested ids into valid and unknown with respect to ``order``.
+
+    Args:
+        chapter_ids: list (or str) of requested chapter ids.
+        order: canonical list of valid ids (CHAPTER_ORDER).
+
+    Returns:
+        dict ``{"valid": [...], "unknown": [...]}`` preserving the order in which
+        the ids appear in the input. Pure function.
+    """
+    valid_set = set(order or [])
+    valid, unknown = [], []
+    for cid in _as_id_list(chapter_ids):
+        (valid if cid in valid_set else unknown).append(cid)
+    return {"valid": valid, "unknown": unknown}
+
+
+def resolve_requirements(chapter_ids):
+    """Union the compute requirements of the requested chapters.
+
+    This is the heart of dependency resolution: given the subset of chapters to
+    render, it returns EVERYTHING that must be enabled/built for those chapters
+    to arrive COMPLETE, and nothing more.
+
+    The ``ALWAYS_PRESENT`` chapters (portada/glosario) are added implicitly
+    because the pipeline always includes them; since they have no requirements
+    they do not change the result, but they are included for a coherent set.
+
+    Args:
+        chapter_ids: list (or str) of chapter ids. Unknown ids are silently
+            ignored (strict validation is the caller's job).
+            None or an empty list -> empty requirements.
+
+    Returns:
+        dict ``{"profile_flags": set[str], "ctx_keys": set[str]}`` where
+        ``ctx_keys`` holds the ctx LABELS (not the real keys). Pure
+        function.
+    """
+    ids = set(_as_id_list(chapter_ids)) | set(ALWAYS_PRESENT)
+    profile_flags = set()
+    ctx_keys = set()
+    for cid in ids:
+        dep = CHAPTER_DEPS.get(cid)
+        if not isinstance(dep, dict):
+            continue
+        for f in dep.get("profile_flags", []) or []:
+            if f in KNOWN_PROFILE_FLAGS:
+                profile_flags.add(f)
+        for k in dep.get("ctx", []) or []:
+            ctx_keys.add(k)
+    return {"profile_flags": profile_flags, "ctx_keys": ctx_keys}
+
+
+def resolve_profile_flags(chapter_ids):
+    """Shortcut: only the set of profile_flags to enable for the requested chapters.
+
+    Pure function. Returns a set ⊆ KNOWN_PROFILE_FLAGS.
+    """
+    return resolve_requirements(chapter_ids)["profile_flags"]
+
+
+def needs_render_ctx(chapter_ids):
+    """True if any requested chapter declares a ctx label (raw data or db_path/table).
+
+    When it is False, the pipeline can skip ``build_eda_render_ctx`` entirely
+    (a real CPU/I/O saving): the requested chapters read nothing from the
+    render ctx. Pure function.
+    """
+    return bool(resolve_requirements(chapter_ids)["ctx_keys"])
+
+
+def resolve_ctx_data_keys(chapter_ids):
+    """REAL ctx data keys to KEEP for the requested chapters.
+
+    Translates the ctx labels into the concrete keys produced by
+    ``build_eda_render_ctx`` (head_rows/raw_numeric/timeseries_raw/geo_points).
+    The pipeline prunes from the ctx any data key NOT in this set, so that a
+    single chapter does not drag along data pieces it does not use. db_path/table
+    are never pruned (they do not appear here). Pure function.
+
+    Returns:
+        set[str], a subset of DATA_CTX_KEYS.
+    """
+    req = resolve_requirements(chapter_ids)
+    keep = set()
+    for label in req["ctx_keys"]:
+        keep |= CTX_LABEL_TO_KEYS.get(label, set())
+    # Only prunable data keys (db_path/table are handled separately).
+    return {k for k in keep if k in DATA_CTX_KEYS}
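
Reviewer note, not part of the diff: the module docstring says the pipeline turns the resolved requirements into profiling flags and prunes unused data keys from the ctx. The sketch below illustrates that consumption on a trimmed copy of the map (three chapters only, so it runs on its own). `plan_render` and `prune_ctx` are hypothetical helpers, and passing the flags as boolean kwargs to `profile_table` is an assumption about the pipeline, not something this diff shows.

```python
# Trimmed copy of the map from the diff, so the sketch is self-contained.
CHAPTER_DEPS = {
    "num_distr":  {"profile_flags": [], "ctx": []},
    "outliers":   {"profile_flags": ["run_models"], "ctx": ["raw_numeric"]},
    "timeseries": {"profile_flags": ["run_series"], "ctx": ["timeseries_raw"]},
}
KNOWN_PROFILE_FLAGS = ("run_models", "run_series", "run_llm")
DATA_CTX_KEYS = ("head_rows", "raw_numeric", "timeseries_raw", "geo_points")


def plan_render(only_chapters):
    """Hypothetical helper: derive profile kwargs and the ctx data keys to keep."""
    flags, keep = set(), set()
    for cid in only_chapters:
        dep = CHAPTER_DEPS.get(cid, {})  # unknown ids contribute nothing
        flags.update(dep.get("profile_flags", []))
        keep.update(k for k in dep.get("ctx", []) if k in DATA_CTX_KEYS)
    profile_kwargs = {f: (f in flags) for f in KNOWN_PROFILE_FLAGS}
    return profile_kwargs, keep


def prune_ctx(ctx, keep):
    """Drop prunable data keys not in `keep`; db_path/table are left alone."""
    return {k: v for k, v in ctx.items() if k not in DATA_CTX_KEYS or k in keep}
```

Requesting only `outliers` would enable `run_models`, keep `raw_numeric`, and drop an unused `head_rows` sample, while `db_path` and `table` pass through untouched.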
@@ -0,0 +1,160 @@
+"""Tests for the central per-chapter dependency map (chapter_deps).
+
+All functions under test are PURE (no I/O): they are exercised directly,
+without DuckDB or rendering. They cover requirement resolution (golden + edge
+cases), id validation and the efficiency helpers (which computation is skipped).
+"""
+
+import os
+import sys
+
+_HERE = os.path.dirname(os.path.abspath(__file__))
+_FUNCTIONS = os.path.abspath(os.path.join(_HERE, "..", "..", ".."))  # python/functions
+if _FUNCTIONS not in sys.path:
+    sys.path.insert(0, _FUNCTIONS)
+
+from datascience.automatic_eda.chapter_deps import (  # noqa: E402
+    ALWAYS_PRESENT,
+    CHAPTER_DEPS,
+    DATA_CTX_KEYS,
+    needs_render_ctx,
+    resolve_ctx_data_keys,
+    resolve_profile_flags,
+    resolve_requirements,
+    validate_chapter_ids,
+)
+from datascience.automatic_eda.chapters_registry import CHAPTER_ORDER  # noqa: E402
+
+
+# --------------------------------------------------------------------------- #
+# The map covers all of CHAPTER_ORDER (no gaps, no extra keys).
+# --------------------------------------------------------------------------- #
+def test_chapter_deps_covers_every_chapter_in_order():
+    assert set(CHAPTER_DEPS) == set(CHAPTER_ORDER), (
+        "CHAPTER_DEPS debe declarar exactamente los ids de CHAPTER_ORDER")
+    # Each entry has the expected shape.
+    for cid, dep in CHAPTER_DEPS.items():
+        assert isinstance(dep.get("profile_flags"), list), cid
+        assert isinstance(dep.get("ctx"), list), cid
+
+
+# --------------------------------------------------------------------------- #
+# resolve_requirements, golden case: outliers requires run_models + raw_numeric.
+# --------------------------------------------------------------------------- #
+def test_resolve_outliers_requires_run_models_and_raw_numeric():
+    req = resolve_requirements(["outliers"])
+    assert "run_models" in req["profile_flags"]
+    assert "raw_numeric" in req["ctx_keys"]
+    assert "run_series" not in req["profile_flags"]
+    assert "run_llm" not in req["profile_flags"]
+
+
+def test_resolve_timeseries_requires_run_series():
+    req = resolve_requirements(["timeseries"])
+    assert req["profile_flags"] == {"run_series"}
+    assert "timeseries_raw" in req["ctx_keys"]
+
+
+def test_resolve_analisis_llm_requires_run_llm():
+    assert resolve_requirements(["analisis_llm"])["profile_flags"] == {"run_llm"}
+
+
+def test_resolve_union_of_several_chapters():
+    req = resolve_requirements(["outliers", "timeseries", "analisis_llm"])
+    assert req["profile_flags"] == {"run_models", "run_series", "run_llm"}
+
+
+# --------------------------------------------------------------------------- #
+# Efficiency: chapters that do NOT need expensive flags do not enable them.
+# --------------------------------------------------------------------------- #
+def test_resolve_geospatial_needs_no_cost_flags():
+    """geospatial is built from the ctx's geo_points/raw_numeric, NOT from the models."""
+    req = resolve_requirements(["geospatial"])
+    assert req["profile_flags"] == set(), \
+        "geospatial no debe activar run_models/run_series/run_llm"
+    assert "geo_points" in req["ctx_keys"]
+
+
+def test_resolve_correlacion_needs_raw_numeric_but_no_models():
+    req = resolve_requirements(["correlacion"])
+    assert req["profile_flags"] == set()
+    assert "raw_numeric" in req["ctx_keys"]
+
+
+def test_always_present_chapters_add_no_requirements():
+    """portada and glosario are always present, but bring no computation with them."""
+    for cid in ALWAYS_PRESENT:
+        req = resolve_requirements([cid])
+        assert req["profile_flags"] == set()
+        assert req["ctx_keys"] == set()
+
+
+def test_resolve_profile_flags_shortcut():
+    assert resolve_profile_flags(["modelos"]) == {"run_models"}
+    assert resolve_profile_flags(["num_distr"]) == set()
+
+
+# --------------------------------------------------------------------------- #
+# needs_render_ctx: when build_eda_render_ctx can be skipped entirely.
+# --------------------------------------------------------------------------- #
+def test_needs_render_ctx_true_when_chapter_reads_raw_data():
+    assert needs_render_ctx(["outliers"]) is True
+    assert needs_render_ctx(["agregacion"]) is True  # db_path/table push-down
+    assert needs_render_ctx(["timeseries"]) is True
+
+
+def test_needs_render_ctx_false_for_purely_aggregated_chapters():
+    """num_distr / cat_distr / calidad read only the aggregated profile."""
+    assert needs_render_ctx(["num_distr"]) is False
+    assert needs_render_ctx(["cat_distr", "calidad"]) is False
+
+
+# --------------------------------------------------------------------------- #
+# resolve_ctx_data_keys, pruning: which DATA keys to keep (db_path/table excluded).
+# --------------------------------------------------------------------------- #
+def test_resolve_ctx_data_keys_outliers_keeps_only_raw_numeric():
+    assert resolve_ctx_data_keys(["outliers"]) == {"raw_numeric"}
+
+
+def test_resolve_ctx_data_keys_geospatial_keeps_geo_and_numeric():
+    assert resolve_ctx_data_keys(["geospatial"]) == {"geo_points", "raw_numeric"}
+
+
+def test_resolve_ctx_data_keys_aggregation_keeps_nothing_prunable():
+    """agregacion uses db_path/table (always present): 0 prunable keys."""
+    assert resolve_ctx_data_keys(["agregacion"]) == set()
+
+
+def test_resolve_ctx_data_keys_subset_of_data_keys():
+    keep = resolve_ctx_data_keys(["overview", "timeseries", "geospatial"])
+    assert keep <= set(DATA_CTX_KEYS)
+    assert {"head_rows", "timeseries_raw", "geo_points", "raw_numeric"} == keep
+
+
+# --------------------------------------------------------------------------- #
+# validate_chapter_ids: splits valid from unknown ids, preserving order.
+# --------------------------------------------------------------------------- #
+def test_validate_separates_known_and_unknown():
+    out = validate_chapter_ids(["outliers", "nope", "timeseries", "ghost"],
+                               CHAPTER_ORDER)
+    assert out["valid"] == ["outliers", "timeseries"]
+    assert out["unknown"] == ["nope", "ghost"]
+
+
+def test_validate_all_known():
+    out = validate_chapter_ids(["portada", "glosario"], CHAPTER_ORDER)
+    assert out["unknown"] == []
+
+
+# --------------------------------------------------------------------------- #
+# Robustness: odd inputs never raise.
+# --------------------------------------------------------------------------- #
+def test_resolve_handles_none_and_empty():
+    assert resolve_requirements(None)["profile_flags"] == set()
+    assert resolve_requirements([])["profile_flags"] == set()
+    # unknown ids are silently ignored during resolution.
+    assert resolve_requirements(["no_existe"])["ctx_keys"] == set()
+
+
+def test_resolve_accepts_single_string():
+    assert resolve_requirements("outliers")["profile_flags"] == {"run_models"}
@@ -5,28 +5,32 @@ page (PDF) / slide (PPTX)**: every column is wrapped in a keep-together
 ``model.Group`` with ``page_break_before=True`` (except the first, which may share
 the intro's page), so its chart sits next to its tables and no column is split.

-A short intro names the clickable **[[term:entropia]]entropía[[/term]]** term —
-the full definition lives in the GLOSARIO chapter, so it is NOT repeated inline
-here (one click jumps to the glossary entry). The intro also carries the dataset
-row total used as a comparison baseline.
+Per column the Group is laid out ``side_by_side`` (PPTX: cardinality table LEFT,
+chart RIGHT; PDF: stacked) and contains, in order:

-Per column the Group contains, in order:
-
-1. A cardinality key/value table: distinct values, ``% distinct`` (distinct /
+1. The column name plus, when the LLM layer ran, its business **description** and
+   **unit** (read from ``profile['llm']['dictionary']``, matched by column name).
+2. A cardinality key/value table: distinct values, ``% distinct`` (distinct /
   total rows), total dataset rows, singleton values (frequency 1), entropy with
   its theoretical maximum and the normalized ratio, mode, imbalance and
   string-length stats.
-2. A short note flagging problematic cardinality (id-like ≈100% distinct, or a
+3. A short note flagging problematic cardinality (id-like ≈100% distinct, or a
   single dominating category).
-3. A ``top-k`` table (value / count / %).
-4. A **donut pie chart** of the most common categories (top-k + an "Otros"
+4. A ``top-k`` table (value / count / %).
+5. A **horizontal bar chart** of the most common categories (top-k + an "Otros"
   bucket), drawn lazily so the renderers scale it to fit entirely.

+A short intro names the clickable **[[term:entropia]]entropía[[/term]]** and
+**[[term:pagina_categorica]]cómo se organiza cada página[[/term]]** terms; their
+full definitions live in the GLOSARIO chapter, so they are NOT repeated inline
+here (one click jumps to the glossary entry). The intro no longer carries the
+dataset row total: each column's cardinality table already shows it.
+
 Data comes from the ``eda`` group: each ``columns[i]['categorical']`` is the
 output of ``summarize_categorical`` (``top[{value,count,pct}]``, ``mode``,
 ``n_distinct``, ``entropy``, ``imbalance``, ``len_min/mean/max``). The derived
-cardinality metrics and the pie figure are delegated to two registry functions
-(``categorical_cardinality_block`` and ``categorical_top_pie_figure``); both are
+cardinality metrics and the bar figure are delegated to two registry functions
+(``categorical_cardinality_block`` and ``categorical_top_bar_figure``); both are
 imported lazily and degrade to a minimal inline fallback so this chapter never
 raises even if they are unavailable.

@@ -39,10 +43,21 @@ import math

 from .. import model

-CHAPTER_VERSION = "1.2.0"
+CHAPTER_VERSION = "1.3.0"
 CHAPTER_ID = "cat_distr"
 CHAPTER_TITLE = "Distribuciones categóricas"

+# Key under which eda_llm_insights stores its interpretive block in the profile.
+LLM_KEY = "llm"
+
+# Second glossary term this chapter names: "how each categorical page is laid
+# out". The long paragraph that used to describe it inline in the intro now lives
+# in the GLOSARIO chapter (canonical definition in ``glosario._BASELINE_TERMS``),
+# so the explanation is relocated, not lost; the intro only names the clickable
+# term and this chapter only registers its key and label.
+_TERM_PAGINA_KEY = "pagina_categorica"
+_TERM_PAGINA_LABEL = "Cómo se organiza cada página categórica"
+
 # Glossary term this chapter explains. Registered in the shared collector and
 # marked clickable on its first appearance (end-to-end glossary example —
 # mejora 6). Other chapters hook their own terms the same way (see the contract).
@@ -59,14 +74,14 @@ _TERM_ENTROPIA_DEF = (
 # Cap the number of categorical columns rendered to keep the document bounded;
 # the rest are summarized in a closing note (no silent truncation).
 MAX_COLS = 40
-# Rows shown in each top-k table and explicit slices in the pie. Kept moderate so
-# the whole column — cardinality table + top-k table + donut — fits on ONE
+# Rows shown in each top-k table and explicit bars in the chart. Kept moderate so
+# the whole column — cardinality table + top-k table + bar chart — fits on ONE
 # page/slide with the chart next to its tables; the table note still reports
 # "top N of M" so nothing is silently hidden. For id-like columns (≈100%
 # distinct) the top-k table is dropped entirely (it would be a list of unique
-# values — pure noise), which also frees the room the donut needs (see build).
+# values — pure noise), which also frees the room the chart needs (see build).
 TOP_TABLE_ROWS = 8
-PIE_TOP_K = 6
+CHART_TOP_K = 6
 # Truncate very long category labels in tables (the renderer also wraps). Kept
 # tight so a column with long id-like values (names, tickets) still fits its page.
 LABEL_MAX = 28
@@ -208,26 +223,74 @@ def _fallback_cardinality(cat: dict, n_rows) -> dict:
    }


-def _pie_make(top, n_distinct, title, n_rows):
-    """Return a zero-arg callable that builds the donut figure lazily."""
+def _llm_index(profile: dict, ctx: dict) -> dict:
+    """Map column name -> its LLM dictionary entry (description/unit/...).
+
+    Reads the ``llm.dictionary`` list that ``eda_llm_insights`` stored in the
+    profile (``profile['llm']``; falls back to ``ctx['llm']``). Returns an empty
+    dict when ``run_llm`` did not run, so the caller degrades cleanly. Fully
+    defensive: never raises on malformed input.
+    """
+    llm = profile.get(LLM_KEY)
+    if not isinstance(llm, dict):
+        llm = ctx.get(LLM_KEY)
+    if not isinstance(llm, dict):
+        return {}
+    entries = llm.get("dictionary")
+    if not isinstance(entries, (list, tuple)):
+        return {}
+    index: dict = {}
+    for e in entries:
+        if not isinstance(e, dict):
+            continue
+        col = e.get("column")
+        if col is None:
+            continue
+        index[model._safe_str(col)] = e
+    return index
+
+
+def _llm_desc_unit_block(name: str, llm_index: dict):
+    """Markdown block with the LLM business description + unit of a column, or
+    None when no LLM entry matches the column (clean fallback without LLM)."""
+    entry = llm_index.get(model._safe_str(name))
+    if not isinstance(entry, dict):
+        return None
+    raw_desc = entry.get("description") or entry.get("business_meaning")
+    desc = " ".join(model._safe_str(raw_desc).split()) if raw_desc else ""
+    raw_unit = entry.get("unit")
+    unit = " ".join(model._safe_str(raw_unit).split()) if raw_unit else ""
+    parts = []
+    if desc:
+        parts.append(f"**Descripción:** {desc}")
+    if unit:
+        parts.append(f"**Unidad:** {unit}")
+    if not parts:
+        return None
+    return model.Markdown(text=" · ".join(parts))
+
+
+def _bar_make(top, n_distinct, title, n_rows):
+    """Return a zero-arg callable that builds the bar figure lazily."""

    def make():
        try:
-            from datascience.categorical_top_pie_figure import (
-                categorical_top_pie_figure,
+            from datascience.categorical_top_bar_figure import (
+                categorical_top_bar_figure,
            )

-            return categorical_top_pie_figure(
+            return categorical_top_bar_figure(
                top=top, n_distinct=n_distinct or 0, title=title,
-                top_k=PIE_TOP_K, n_rows=n_rows)
+                top_k=CHART_TOP_K, n_rows=n_rows)
        except Exception:  # noqa: BLE001 — minimal local fallback figure.
-            return _fallback_pie(top, title)
+            return _fallback_bar(top, title)

    return make


-def _fallback_pie(top, title):
-    """Minimal donut figure used only if the registry function is unavailable."""
+def _fallback_bar(top, title):
+    """Minimal horizontal-bar figure used only if the registry function is
+    unavailable. Largest category on top, the rest folded into "Otros"."""
    import matplotlib

    matplotlib.use("Agg")
@@ -238,8 +301,8 @@ def _fallback_pie(top, title):
    items = [t for t in (top or [])
             if isinstance(t, dict) and isinstance(t.get("count"), (int, float))]
    items = sorted(items, key=lambda t: t.get("count") or 0, reverse=True)
-    head = items[:PIE_TOP_K]
-    rest = items[PIE_TOP_K:]
+    head = items[:CHART_TOP_K]
+    rest = items[CHART_TOP_K:]
    labels = [_truncate(t.get("value"), 20) for t in head]
    sizes = [float(t.get("count") or 0) for t in head]
    if rest:
@@ -249,10 +312,13 @@ def _fallback_pie(top, title):
        ax.text(0.5, 0.5, "sin datos categóricos", ha="center", va="center")
        ax.axis("off")
        return fig
-    ax.pie(sizes, labels=None, wedgeprops={"width": 0.42},
-           autopct=lambda p: f"{p:.0f}%" if p >= 4 else "")
-    ax.legend(labels, loc="center left", bbox_to_anchor=(1.0, 0.5),
-              fontsize=7, frameon=False)
+    # barh draws bottom-up, so reverse to put the largest category on top.
+    y_pos = range(len(labels))
+    ax.barh(list(y_pos), list(reversed(sizes)), color="#4C72B0",
+            edgecolor="white")
+    ax.set_yticks(list(y_pos))
+    ax.set_yticklabels(list(reversed(labels)), fontsize=7)
+    ax.set_xlabel("conteo", fontsize=8)
    ax.set_title(_truncate(title, 40))
    fig.tight_layout()
    return fig
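Review note: the fallback relies on two small ideas, folding the tail into an "Otros" bucket and reversing the order because `barh` draws the first bar at the bottom. The sketch below isolates both without matplotlib. The names `fold_top_k` and `barh_order` are hypothetical, and summing the remaining counts into "Otros" is an assumption about the lines this hunk elides.

```python
def fold_top_k(items, k, other_label="Otros"):
    """Keep the k largest categories and fold the remainder into one bucket.

    Assumption: the tail is summed into a single `other_label` entry.
    """
    # Drop malformed rows (non-dict, or a count that is not a number).
    valid = [t for t in (items or [])
             if isinstance(t, dict) and isinstance(t.get("count"), (int, float))]
    valid.sort(key=lambda t: t["count"], reverse=True)
    head, rest = valid[:k], valid[k:]
    labels = [str(t.get("value")) for t in head]
    sizes = [float(t["count"]) for t in head]
    if rest:
        labels.append(other_label)
        sizes.append(float(sum(t["count"] for t in rest)))
    return labels, sizes


def barh_order(labels, sizes):
    """Reverse both lists so the largest category ends up as the top bar."""
    return list(reversed(labels)), list(reversed(sizes))


top = [
    {"value": "a", "count": 5},
    {"value": "b", "count": 3},
    {"value": "c", "count": 2},
    {"value": "d", "count": 1},
    {"value": "bad", "count": None},   # skipped: count is not numeric
]
labels, sizes = fold_top_k(top, k=2)
plot_labels, plot_sizes = barh_order(labels, sizes)
```

Keeping the folding step separate from the drawing step would also let a unit test cover the "Otros" arithmetic without rendering a figure.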
@@ -373,22 +439,17 @@ def _topk_table(cat: dict):
                           note=note)


-def _intro_blocks(n_rows, mark_term: bool = False):
-    total = _fmt_int(n_rows)
-    # Mark the first appearance of the term as a clickable glossary jump when the
-    # term was registered (mark_term). The full definition of entropy lives in the
-    # GLOSARIO chapter, so the intro only names the clickable term here instead of
-    # repeating the long explanation (avoids the redundancy with the glossary).
+def _intro_blocks(mark_term: bool = False):
+    # The full explanation of entropy AND of how each categorical page is laid out
+    # lives in the GLOSARIO chapter; the chapter body keeps only the minimal
+    # clickable terms — no descriptive prose — to avoid duplicating the glossary.
+    # The dataset row total is not repeated here: each column's cardinality table
+    # already carries "Total filas (dataset)".
    entropia = ("[[term:entropia]]entropía[[/term]]" if mark_term
                else "entropía")
-    text = (
-        f"Cada columna categórica ocupa su propia página: sus métricas de "
-        f"cardinalidad —incluida la {entropia}—, una nota que señala cardinalidad "
-        "problemática, la tabla de las categorías más frecuentes y un gráfico de "
-        "tarta (donut) de las más comunes, todo junto."
-    )
-    if n_rows is not None:
-        text += f" El dataset tiene {total} filas en total como referencia."
+    pagina = ("[[term:pagina_categorica]]cómo se organiza cada página[[/term]]"
+              if mark_term else "cómo se organiza cada página")
+    text = f"Términos: {entropia} · {pagina}."
    return [
        model.Heading(text="Entropía y cardinalidad", level=2),
        model.Markdown(text=text),
@@ -406,15 +467,22 @@ def build_cat_distr(profile: dict, ctx: dict):
        return None

    n_rows = profile.get("n_rows")
-    # Register "entropía" in the shared glossary collector (if present) and mark
-    # its first appearance clickable. End-to-end glossary example (mejora 6).
+    # Register "entropía" and the "how each categorical page is laid out" term in
+    # the shared glossary collector (if present) and mark their first appearance
+    # clickable. End-to-end glossary example (mejora 6).
    glossary = ctx.get("glossary")
    mark_term = False
    if isinstance(glossary, model.GlossaryCollector):
        glossary.add(_TERM_ENTROPIA_KEY, _TERM_ENTROPIA_LABEL,
                     _TERM_ENTROPIA_DEF)
+        glossary.add(_TERM_PAGINA_KEY, _TERM_PAGINA_LABEL)
        mark_term = True
-    blocks = list(_intro_blocks(n_rows, mark_term=mark_term))
+    blocks = list(_intro_blocks(mark_term=mark_term))
+
+    # Business description + unit per column come from the LLM dictionary
+    # (profile['llm']['dictionary'], matched by column name); absent without
+    # run_llm, in which case the per-column description block is simply omitted.
+    llm_index = _llm_index(profile, ctx)

    rendered = cat_cols[:MAX_COLS]
    for idx, col in enumerate(rendered):
@@ -422,31 +490,36 @@ def build_cat_distr(profile: dict, ctx: dict):
        cat = col.get("categorical") or {}
        card = _normalize_card(_cardinality(cat, n_rows))

-        # One Group per categorical column: heading + cardinality table + flag
-        # note + top-k table + donut figure are kept together and the renderer
-        # starts each on a fresh page/slide (page_break_before) so every column
-        # gets its own page with its chart next to its tables. The first column
-        # may share the intro's page (no forced break) to avoid a near-empty page.
-        col_blocks = [
-            model.Heading(text=str(name), level=2),
-            _cardinality_block(card),
-        ]
+        # One Group per categorical column: heading + (optional) LLM description +
+        # cardinality table + flag note + top-k table + bar figure are kept
+        # together and the renderer starts each on a fresh page/slide
+        # (page_break_before) so every column gets its own page with its chart next
+        # to its tables. The first column may share the intro's page (no forced
+        # break) to avoid a near-empty page.
+        col_blocks = [model.Heading(text=str(name), level=2)]
+        desc_block = _llm_desc_unit_block(name, llm_index)
+        if desc_block is not None:
+            col_blocks.append(desc_block)
+        col_blocks.append(_cardinality_block(card))
        note = _flag_note(card)
        if note is not None:
            col_blocks.append(note)
        # For id-like columns (≈100% distinct) the top-k is a list of unique
        # values — pure noise; skip it (the flag note already explains why) and
-        # let the donut take that room so the whole column fits one page/slide.
+        # let the bar chart take that room so the whole column fits one page/slide.
        if not card.get("id_like"):
            topk = _topk_table(cat)
            if topk is not None:
                col_blocks.append(topk)
        col_blocks.append(model.Figure(
-            make=_pie_make(cat.get("top") or [], card.get("n_distinct"),
+            make=_bar_make(cat.get("top") or [], card.get("n_distinct"),
                           str(name), n_rows),
            caption=(f"Categorías más comunes de «{_truncate(name, 32)}» "
-                     "(donut: top-k + «Otros»)")))
-        blocks.append(model.Group(blocks=col_blocks,
+                     "(barras: top-k + «Otros»)")))
+        # layout="side_by_side": in PPTX the cardinality table goes on the LEFT and
+        # the bar chart on the RIGHT of the same slide; the PDF renderer stacks them
+        # (the A5 mobile page is too narrow for two readable columns).
+        blocks.append(model.Group(blocks=col_blocks, layout="side_by_side",
                                  page_break_before=(idx > 0)))

    if len(cat_cols) > len(rendered):
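Review note: the LLM lookup added in this file is meant to degrade silently when `run_llm` did not run or the block is malformed. The sketch below reproduces that behaviour in isolation. It is a simplified stand-in, not the chapter's code: `build_llm_index` and `describe` are hypothetical names, plain `str` replaces `model._safe_str`, and a string is returned instead of a `model.Markdown` block.

```python
def build_llm_index(profile, ctx):
    """Map column name -> dictionary entry; return {} on malformed input."""
    llm = profile.get("llm")
    if not isinstance(llm, dict):
        llm = ctx.get("llm")            # fall back to the context
    if not isinstance(llm, dict):
        return {}
    entries = llm.get("dictionary")
    if not isinstance(entries, (list, tuple)):
        return {}
    # Skip entries that are not dicts or that carry no column name.
    return {str(e["column"]): e for e in entries
            if isinstance(e, dict) and e.get("column") is not None}


def describe(name, index):
    """Return 'description · unit' text for a column, or None if nothing to show."""
    entry = index.get(str(name))
    if not isinstance(entry, dict):
        return None
    raw_desc = entry.get("description") or entry.get("business_meaning")
    desc = " ".join(str(raw_desc).split()) if raw_desc else ""
    raw_unit = entry.get("unit")
    unit = " ".join(str(raw_unit).split()) if raw_unit else ""
    parts = []
    if desc:
        parts.append(f"**Descripción:** {desc}")
    if unit:
        parts.append(f"**Unidad:** {unit}")
    return " · ".join(parts) if parts else None


profile = {"llm": {"dictionary": [
    {"column": "categoria", "description": "Familia de  producto",
     "unit": "categoría"},
    {"column": "uuid", "business_meaning": "Identificador único", "unit": ""},
    "garbage",                           # ignored: not a dict
    {"description": "no column key"},    # ignored: no column name
]}}
index = build_llm_index(profile, {})
```

An empty unit is dropped rather than rendered as a bare label, which matches the `uuid` entry used in the test profile of this change.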
@@ -2,12 +2,14 @@

 Self-contained: builds synthetic TableProfiles (no DuckDB) so the suite is fast
 and deterministic. Verifies that ``build_cat_distr`` emits the blocks the user
-asked for (distinct/total/%-distinct/unique metrics, top-k table and a donut
+asked for (distinct/total/%-distinct/unique metrics, top-k table and a bar
 figure), that EACH categorical column is wrapped in its own keep-together
-``Group`` that starts on a fresh page/slide (one column per page, chart next to
-its tables), that the long entropy explanation is NOT repeated inline (it lives
-in the glossary — only the clickable term is kept), that the chapter renders
-inside the full document to both PDF and PPTX showing that content, that a
+``Group`` laid out ``side_by_side`` (PPTX: table left / bars right) that starts on
+a fresh page/slide (one column per page, chart next to its tables), that the LLM
+business description + unit are shown per column when the profile carries an LLM
+block, that the long entropy / page-layout explanations are NOT repeated inline
+(they live in the glossary — only the clickable terms are kept), that the chapter
+renders inside the full document to both PDF and PPTX showing that content, that a
 profile with no categorical columns yields ``None`` without raising, and that
 long labels / many columns are never cut in either output.
 """
@@ -116,6 +118,10 @@ def test_golden_build_cat_distr_emite_bloques_pedidos():
    assert "log2" not in md.text          # redundant explanation removed.
    assert "máxima diversidad" not in md.text

+    # The donut/pie is gone: the intro no longer mentions tarta/donut (the chart
+    # is now a bar chart; the long page-layout explanation moved to the glossary).
+    assert "donut" not in md.text and "tarta" not in md.text
+
    # Per-column blocks are wrapped in keep-together Groups: flatten to inspect.
    flat = _flatten(ch.blocks)
    kv = next(b for b in flat if isinstance(b, KVTable))
@@ -128,11 +134,13 @@ def test_golden_build_cat_distr_emite_bloques_pedidos():
    assert any("Entropía" in lbl for lbl in labels)
    assert "únicos" in values and "%" in values
    assert "bits" in values and "norm" in values   # entropy + max + normalized.
-    # Top-k table + pie figure.
+    # Top-k table + bar figure.
    dt = next(b for b in flat if isinstance(b, DataTable))
    assert dt.header == ["Valor", "Conteo", "%"]
    assert any("neumaticos" in str(cell) for row in dt.rows for cell in row)
    assert any(isinstance(b, Figure) for b in flat)
+    # Each per-column Group is laid out side_by_side (table left / bars right).
+    assert all(g.layout == "side_by_side" for g in _column_groups(ch))
    # id-like column flagged with a Note that also explains the top-k is dropped.
    idnote = next((b for b in flat
                   if isinstance(b, Note) and "identificador" in b.text), None)
@@ -140,9 +148,9 @@ def test_golden_build_cat_distr_emite_bloques_pedidos():
    assert "No se lista el top" in idnote.text


-def test_golden_idlike_omite_topk_y_conserva_donut():
+def test_golden_idlike_omite_topk_y_conserva_grafico():
    # The id-like column (uuid, 100% distinct) must NOT carry a top-k DataTable
-    # (it would be a list of unique values), but must still keep its donut Figure
+    # (it would be a list of unique values), but must still keep its bar Figure
    # and its cardinality table so it stays a full per-column page.
    ch = build_cat_distr(_profile(), {})
    groups = _column_groups(ch)
@@ -151,7 +159,7 @@ def test_golden_idlike_omite_topk_y_conserva_donut():
    kinds = [b.kind for b in uuid_group.blocks]
    assert "data_table" not in kinds      # top-k of unique values dropped.
    assert "kv_table" in kinds            # cardinality kept.
-    assert "figure" in kinds              # donut kept (chart per column).
+    assert "figure" in kinds              # bar chart kept (chart per column).
    # A non-id-like column keeps its top-k table.
    cat_group = next(g for g in groups
                     if any(getattr(b, "text", "") == "categoria"
@@ -205,7 +213,7 @@ def test_golden_render_pdf_una_pagina_por_columna():
        assert "Entrop" in txt
        assert "distintos" in txt
        assert "categoria" in txt and "neumaticos" in txt
-        assert "donut" in txt           # figure caption rendered as text.
+        assert "barras" in txt          # bar-chart caption rendered as text (PDF).
        assert "identificador" in txt   # id-like note rendered.


@@ -258,9 +266,11 @@ def _profile_high_card() -> dict:


 def test_golden_pptx_una_slide_por_columna_con_su_grafico():
-    """Each categorical column occupies EXACTLY ONE cat_distr slide that carries
-    BOTH its cardinality table and its donut figure (picture) — i.e. the chart is
-    never separated from its table, even for a high-cardinality column."""
+    """Each categorical column occupies EXACTLY ONE cat_distr slide that carries
+    its chart (picture) on that same slide: the chart is never separated from its
+    column, even for a high-cardinality column. With the side_by_side layout the
+    table is rasterized to an image, so the check relies on the presence of a
+    picture (not on the table's text)."""
    from pptx.enum.shapes import MSO_SHAPE_TYPE

    prof = _profile_high_card()
@@ -272,7 +282,7 @@ def test_golden_pptx_una_slide_por_columna_con_su_grafico():
        prs = Presentation(out)

        # Per column: the cat_distr slides whose text mentions it, and whether the
-        # owning slide also has the donut caption + an actual picture shape.
+        # owning slide also carries an actual picture shape (its chart).
        slides_with_col = {n: [] for n in cat_names}
        owner_has_chart = {n: False for n in cat_names}
        for i, sl in enumerate(prs.slides):
@@ -288,15 +298,106 @@ def test_golden_pptx_una_slide_por_columna_con_su_grafico():
            for n in cat_names:
                if n in txt:
                    slides_with_col[n].append(i)
-                    has_table = "Cardinalidad" in txt or "distintos" in txt
-                    if has_pic and "donut" in txt and has_table:
+                    if has_pic:
                        owner_has_chart[n] = True

        for n in cat_names:
            # Exactly one slide carries the column (not split across slides).
            assert len(slides_with_col[n]) == 1, (n, slides_with_col[n])
-            # That single slide also holds its table AND its donut picture.
-            assert owner_has_chart[n], (n, "tabla y donut no están en el mismo slide")
+            # That single slide also holds its chart picture.
+            assert owner_has_chart[n], (n, "el gráfico no está en el slide de la columna")
+
+
+def test_golden_pptx_columna_side_by_side_tabla_izq_barra_der():
+    """With the side_by_side layout, a categorical column places its cardinality
+    table (image) in the left half and its bar chart (image) in the right half of
+    the SAME slide. Checks that at least one column ends up in two columns
+    (table left / bars right), which is evidence of side_by_side in PPTX."""
+    from pptx.enum.shapes import MSO_SHAPE_TYPE
+    from pptx.util import Inches
+
+    with tempfile.TemporaryDirectory() as d:
+        out = os.path.join(d, "eda.pptx")
+        render_automatic_eda_pptx(_profile(), out, {"title": "EDA"})
+        prs = Presentation(out)
+        centre = int(Inches(13.333 / 2.0))   # half of the 16:9 slide width.
+        two_col_slides = 0
+        for sl in prs.slides:
+            texts, lefts = [], []
+            for sh in sl.shapes:
+                if sh.has_text_frame:
+                    texts.append(sh.text_frame.text)
+                if (sh.shape_type == MSO_SHAPE_TYPE.PICTURE
+                        and sh.left is not None):
+                    lefts.append(sh.left)
+            txt = re.sub(r"\s+", " ", " ".join(texts))
+            if "Distribuciones categ" not in txt:
+                continue
+            # One picture starts in the left half, another in the right half.
+            if len(lefts) >= 2 and min(lefts) < centre and max(lefts) > centre:
+                two_col_slides += 1
+        assert two_col_slides >= 1, (
+            "ninguna columna quedó con tabla-izq / barras-der (side_by_side)")
+
+
+def _profile_with_llm() -> dict:
+    """The base profile plus an ``llm`` block (as eda_llm_insights would store it
+    with run_llm=True): a data dictionary with description/unit per column."""
+    prof = _profile()
+    prof["llm"] = {
+        "dictionary": [
+            {"column": "categoria",
+             "description": "Familia de producto del recambio",
+             "business_meaning": "Agrupa el catálogo por tipo de pieza",
+             "unit": "categoría"},
+            {"column": "uuid",
+             "description": "Identificador único de registro",
+             "unit": ""},
+        ],
+    }
+    return prof
+
+
+def test_llm_descripcion_y_unidad_por_columna():
+    # With an LLM dictionary, each categorical column whose name matches shows its
+    # business description and unit in a per-column markdown block.
+    ch = build_cat_distr(_profile_with_llm(), {})
+    groups = _column_groups(ch)
+    cat_group = next(g for g in groups
+                     if any(getattr(b, "text", "") == "categoria"
+                            for b in g.blocks))
+    md = " ".join(b.text for b in cat_group.blocks
+                  if getattr(b, "kind", "") == "markdown")
+    assert "Descripción" in md and "Familia de producto" in md
+    assert "Unidad" in md and "categoría" in md
+
+
+def test_edge_sin_llm_no_anade_descripcion():
+    # Without an LLM block the per-column description markdown is simply omitted;
+    # the column still renders its cardinality table and bar figure.
+    ch = build_cat_distr(_profile(), {})
+    for g in _column_groups(ch):
+        mds = [b.text for b in g.blocks if getattr(b, "kind", "") == "markdown"]
+        assert not any("Descripción" in t for t in mds)
+
+
+def test_pagina_categorica_clicable_y_definicion_en_glosario():
+    # The "how each categorical page is laid out" term is registered + marked
+    # clickable in the intro, and its full definition lands in the glossary
+    # chapter (canonical baseline catalog), not inline.
+    from datascience.automatic_eda.chapters.glosario import build_glosario
+
+    gc = GlossaryCollector()
+    ch = build_cat_distr(_profile(), {"glossary": gc})
+    md = next(b for b in ch.blocks if isinstance(b, Markdown))
+    assert "[[term:pagina_categorica]]" in md.text
+    assert gc.has("pagina_categorica")
+    glos = build_glosario(_profile(), {"glossary": gc})
+    entry = next(b for b in glos.blocks
+                 if getattr(b, "kind", "") == "glossary_entry"
+                 and b.key == "pagina_categorica")
+    assert "barras" in entry.definition
+    assert "identificador" in entry.definition


 def test_edge_sin_categoricas_devuelve_none():
@@ -17,10 +17,63 @@ from __future__ import annotations

 from .. import model

-CHAPTER_VERSION = "1.0.0"
+CHAPTER_VERSION = "1.1.0"
 CHAPTER_ID = "glosario"
 CHAPTER_TITLE = "Glosario"

+# Canonical definitions for cross-cutting terms — the "how to read it" entries
+# that do not belong to a single chapter. A chapter only needs to *register* the
+# term (``ctx['glossary'].add(key, label)``) and mark its in-text appearance with
+# ``[[term:key]]…[[/term]]``; this chapter supplies the full definition here when
+# the collector carries the term without one. Keeping the prose in a single place
+# avoids repeating a long paragraph inline in every chapter that names the term
+# (the explanation moved out of the NUM DISTR and CAT DISTR intros lives here).
+_BASELINE_TERMS = {
+    "histograma_boxplot": {
+        "label": "Cómo leer el histograma y el boxplot",
+        "definition": (
+            "Para cada columna numérica se muestra su histograma con tres líneas "
+            "de referencia: la media (línea roja discontinua), la mediana (línea "
+            "verde continua) y la banda ±1σ (zona sombreada que cubre una "
+            "desviación estándar a cada lado de la media). Debajo, alineado al "
+            "mismo eje horizontal, un boxplot de Tukey: la caja abarca del primer "
+            "al tercer cuartil (P25–P75), la línea interior es la mediana y los "
+            "bigotes llegan hasta 1,5·IQR; los puntos rojos señalan que hay "
+            "valores más allá de las vallas (posibles atípicos). Comparar la media "
+            "con la mediana revela la asimetría: si la media supera a la mediana la "
+            "cola larga cae hacia los valores altos (asimetría a la derecha), y al "
+            "revés hacia los bajos."),
+    },
+    "pagina_categorica": {
+        "label": "Cómo se organiza cada página categórica",
+        "definition": (
+            "Cada columna categórica ocupa su propia página: muestra sus métricas "
+            "de cardinalidad —incluida la entropía—, una nota que señala "
+            "cardinalidad problemática (columnas que se comportan como "
+            "identificador, con casi todos los valores distintos, o dominadas por "
+            "una sola categoría), la tabla de las categorías más frecuentes (top-k, "
+            "con su conteo y porcentaje) y un gráfico de barras de las categorías "
+            "más comunes (top-k más una barra «Otros» que agrupa la cola). El total "
+            "de filas del dataset se usa como referencia para interpretar los "
+            "conteos."),
+    },
+}
+
+
+def _resolve_term(term: dict) -> tuple:
+    """Return (label, definition) for a collected term, completing a missing
+    definition (and, if absent, the label) from the canonical baseline catalog."""
+    key = model._safe_str(term.get("key"))
+    label = model._safe_str(term.get("label"))
+    definition = model._safe_str(term.get("definition"))
+    base = _BASELINE_TERMS.get(key)
+    if base:
+        if not definition.strip():
+            definition = model._safe_str(base.get("definition"))
+        if not label.strip() or label == key:
+            label = model._safe_str(base.get("label")) or label
+    return label, definition
+

 def build_glosario(profile: dict, ctx: dict):
    """Build the glossary Chapter from the shared collector, or None if empty."""
@@ -36,12 +89,14 @@ def build_glosario(profile: dict, ctx: dict):
            "Cada término va resaltado en el texto y, al pulsarlo, salta a su "
            "definición en esta sección.")),
    ]
-    # One clickable destination per term, alphabetically by visible label.
+    # One clickable destination per term, alphabetically by visible label. A term
+    # registered without a definition is completed from the canonical baseline.
    for term in glossary.terms(by="label"):
+        label, definition = _resolve_term(term)
        blocks.append(model.GlossaryEntry(
            key=model._safe_str(term.get("key")),
-            label=model._safe_str(term.get("label")),
-            definition=model._safe_str(term.get("definition"))))
+            label=label,
+            definition=definition))

    return model.Chapter(id=CHAPTER_ID, title=CHAPTER_TITLE,
                         version=CHAPTER_VERSION, blocks=blocks)
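Review note: the contract introduced here is that a chapter may register a term with only key and label, and the glossary completes the definition from a canonical catalog. A minimal sketch of that resolution rule follows. `BASELINE`, its placeholder definition and `resolve_term` are illustrative names and data, and plain `str` stands in for `model._safe_str`.

```python
# Hypothetical one-entry catalog; the definition text is a placeholder.
BASELINE = {
    "pagina_categorica": {
        "label": "Cómo se organiza cada página categórica",
        "definition": "Placeholder definition for illustration.",
    },
}


def resolve_term(term, baseline=BASELINE):
    """Return (label, definition), filling gaps from the baseline catalog."""
    key = str(term.get("key") or "")
    label = str(term.get("label") or "")
    definition = str(term.get("definition") or "")
    base = baseline.get(key)
    if base:
        # A chapter-supplied definition always wins over the catalog.
        if not definition.strip():
            definition = str(base.get("definition") or "")
        # The label is replaced only when missing or equal to the raw key.
        if not label.strip() or label == key:
            label = str(base.get("label") or "") or label
    return label, definition


registered_only = resolve_term({"key": "pagina_categorica",
                                "label": "pagina_categorica"})
with_own_text = resolve_term({"key": "pagina_categorica",
                              "label": "Layout",
                              "definition": "Chapter-specific text."})
unknown = resolve_term({"key": "otro", "label": "Otro"})
```

A term whose key is not in the catalog and that carries no definition still resolves to an empty definition, so callers may want to treat that case explicitly.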
@@ -35,10 +35,21 @@ try:
 except Exception:  # noqa: BLE001 — keep the chapter importable no matter what.
    build_boxplot_stats = None  # type: ignore[assignment]

-CHAPTER_VERSION = "1.2.0"
+CHAPTER_VERSION = "1.3.0"
 CHAPTER_ID = "num_distr"
 CHAPTER_TITLE = "Distribuciones numéricas"

+# Glossary term this chapter names. The long "how to read the histogram and
+# the boxplot" paragraph used to live inline in the intro; it now lives in the
+# GLOSARIO chapter (canonical definition in ``glosario._BASELINE_TERMS``) and the
+# intro only names the clickable term. One click jumps to the full explanation,
+# so the information is relocated, not lost (glossary improvement).
+_TERM_HISTOBOX_KEY = "histograma_boxplot"
+_TERM_HISTOBOX_LABEL = "Cómo leer el histograma y el boxplot"
+
+# Key under which eda_llm_insights stores its interpretive block in the profile.
+LLM_KEY = "llm"
+
 # Plain-Spanish gloss for every label ``detect_distribution_type`` can emit, so a
 # non-expert reader understands the shape and the suggested next step (MUST-4.3).
 _DIST_GLOSS = {
@@ -99,6 +110,53 @@ def _numeric_columns(profile: dict) -> list:
    return out


+def _llm_index(profile: dict, ctx: dict) -> dict:
+    """Map column name -> its LLM dictionary entry (description/unit/...).
+
+    Reads the ``llm.dictionary`` list that ``eda_llm_insights`` stored in the
+    profile (``profile['llm']``; falls back to ``ctx['llm']``). Returns an empty
+    dict when ``run_llm`` did not run, so the caller degrades cleanly. Fully
+    defensive: never raises on malformed input.
+    """
+    llm = profile.get(LLM_KEY)
+    if not isinstance(llm, dict):
+        llm = ctx.get(LLM_KEY)
+    if not isinstance(llm, dict):
+        return {}
+    entries = llm.get("dictionary")
+    if not isinstance(entries, (list, tuple)):
+        return {}
+    index: dict = {}
+    for e in entries:
+        if not isinstance(e, dict):
+            continue
+        col = e.get("column")
+        if col is None:
+            continue
+        index[model._safe_str(col)] = e
+    return index
+
+
+def _llm_desc_unit_block(name: str, llm_index: dict):
+    """Markdown block with the LLM business description + unit of a column, or
+    None when no LLM entry matches the column (clean fallback without LLM)."""
+    entry = llm_index.get(model._safe_str(name))
+    if not isinstance(entry, dict):
+        return None
+    raw_desc = entry.get("description") or entry.get("business_meaning")
+    desc = " ".join(model._safe_str(raw_desc).split()) if raw_desc else ""
+    raw_unit = entry.get("unit")
+    unit = " ".join(model._safe_str(raw_unit).split()) if raw_unit else ""
+    parts = []
+    if desc:
+        parts.append(f"**Descripción:** {desc}")
+    if unit:
+        parts.append(f"**Unidad:** {unit}")
+    if not parts:
+        return None
+    return model.Markdown(text=" · ".join(parts))
+
+
 def _make_hist_box(name: str, numeric: dict, box: dict):
    """Build the histogram (with mean/median/±σ lines) + boxplot figure.

@@ -271,15 +329,26 @@ def build_num_distr(profile: dict, ctx: dict):
    if not numerics:
        return None  # chapter does not apply to a dataset with no numerics.

+    # Register the "how to read the histogram and boxplot" term in the shared
+    # glossary collector (if present) and mark its first appearance clickable. The
+    # full explanation (colour code, 1,5·IQR rule, asymmetry reading) lives in the
+    # GLOSARIO chapter instead of inline here: the intro only names the term.
+    glossary = ctx.get("glossary")
+    mark_term = False
+    if isinstance(glossary, model.GlossaryCollector):
+        glossary.add(_TERM_HISTOBOX_KEY, _TERM_HISTOBOX_LABEL)
+        mark_term = True
+    como_leer = (f"[[term:{_TERM_HISTOBOX_KEY}]]cómo leer estos gráficos[[/term]]"
+                 if mark_term else "cómo leer estos gráficos")
    intro = (
-        "Para cada columna numérica se muestra su **histograma** con tres líneas "
-        "de referencia: la **media** (línea roja discontinua), la **mediana** "
-        "(línea verde continua) y la banda **±1σ** (zona sombreada). Debajo, "
-        "alineado al mismo eje, un **boxplot de Tukey**: la caja abarca del "
-        "primer al tercer cuartil (P25–P75), la línea interior es la mediana y "
-        "los bigotes llegan hasta 1,5·IQR; los puntos rojos señalan que hay "
-        "valores más allá de las vallas. Comparar media y mediana revela la "
-        "asimetría de la distribución.")
+        "Cada columna numérica muestra su **histograma** (con la **media**, la "
+        "**mediana** y la banda **±1σ**) y, debajo y al mismo eje, su **boxplot "
+        f"de Tukey** — {como_leer}.")
+
+    # Business description + unit per column come from the LLM dictionary
+    # (profile['llm']['dictionary'], matched by column name); absent without
+    # run_llm, in which case the per-column description block is simply omitted.
+    llm_index = _llm_index(profile, ctx)

    blocks = [
        model.Heading(text=CHAPTER_TITLE, level=1),
@@ -293,17 +362,20 @@ def build_num_distr(profile: dict, ctx: dict):
                box = build_boxplot_stats(numeric) or {}
            except Exception:  # noqa: BLE001 — degrade, never raise.
                box = {}
-        # Keep the column heading, its figure and its stats note together on the
-        # same page/slide (mejora 3 — keep-together): the renderers measure the
-        # whole Group and move it whole when it would not fit.
-        blocks.append(model.Group(blocks=[
-            model.Heading(text=str(name), level=2),
-            model.Figure(
+        # Keep the column heading, its (optional) LLM description, its figure and
+        # its stats note together on the same page/slide (mejora 3 —
+        # keep-together): the renderers measure the whole Group and move it whole
+        # when it would not fit.
+        col_blocks = [model.Heading(text=str(name), level=2)]
+        desc_block = _llm_desc_unit_block(name, llm_index)
+        if desc_block is not None:
+            col_blocks.append(desc_block)
+        col_blocks.append(model.Figure(
            make=_figure_maker(name, numeric, box),
            caption=f"Distribución de «{name}» — histograma "
-                        f"(media/mediana/±σ) y boxplot."),
-            model.Markdown(text=_stats_note(name, numeric, box)),
-        ]))
+                    f"(media/mediana/±σ) y boxplot."))
+        col_blocks.append(model.Markdown(text=_stats_note(name, numeric, box)))
+        blocks.append(model.Group(blocks=col_blocks))

    return model.Chapter(id=CHAPTER_ID, title=CHAPTER_TITLE,
                         version=CHAPTER_VERSION, blocks=blocks)
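The hunks above index the LLM dictionary by column name and turn a matching entry into a one-line description/unit block. The sketch below reproduces that lookup in isolation, under stated assumptions: `safe_str` is a hypothetical stand-in for `model._safe_str`, the key `"llm"` is assumed to be what `LLM_KEY` holds, and the helper returns a plain string where the chapter wraps the text in `model.Markdown`.

```python
def safe_str(value):
    """Hypothetical stand-in for model._safe_str: stringify without raising."""
    try:
        return str(value)
    except Exception:
        return ""


def llm_index(profile, ctx):
    """Map column name -> dictionary entry; {} when no usable LLM block exists."""
    llm = profile.get("llm")
    if not isinstance(llm, dict):
        llm = ctx.get("llm")  # fall back to the copy in the context
    if not isinstance(llm, dict):
        return {}
    entries = llm.get("dictionary")
    if not isinstance(entries, (list, tuple)):
        return {}
    return {
        safe_str(e["column"]): e
        for e in entries
        if isinstance(e, dict) and e.get("column") is not None
    }


def desc_unit_text(name, index):
    """Return the 'Descripción · Unidad' line for a column, or None."""
    entry = index.get(safe_str(name))
    if not isinstance(entry, dict):
        return None
    raw_desc = entry.get("description") or entry.get("business_meaning")
    raw_unit = entry.get("unit")
    # split() + join collapses runs of whitespace and newlines to single spaces.
    desc = " ".join(safe_str(raw_desc).split()) if raw_desc else ""
    unit = " ".join(safe_str(raw_unit).split()) if raw_unit else ""
    parts = []
    if desc:
        parts.append(f"**Descripción:** {desc}")
    if unit:
        parts.append(f"**Unidad:** {unit}")
    return " · ".join(parts) if parts else None


profile = {"llm": {"dictionary": [
    {"column": "precio", "description": "Precio de venta del producto",
     "unit": "EUR"},
    {"column": "alcohol", "business_meaning": "Grado   alcohólico\n",
     "unit": " % vol "},
    "not-a-dict",  # malformed entries are skipped, not raised on
]}}
index = llm_index(profile, {})
print(desc_unit_text("precio", index))
print(desc_unit_text("sin_entrada", index))
```

Returning `None` for an unmatched column is what lets the chapter omit the block instead of emitting an empty one.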
@@ -101,7 +101,7 @@ def test_golden_chapter_estructura_y_bloques():


 def test_golden_media_mediana_sigma_y_boxplot_presentes():
-    # The intro documents the three reference lines and the Tukey boxplot; the
+    # The short intro names the three reference lines and the Tukey boxplot; the
    # per-column note carries the actual mean/median/σ numbers and the shape.
    ch = build_num_distr(_profile(n_numeric=1, extra_categorical=False), {})
    md_texts = " ".join(b.text for b in _flatten(ch.blocks)
@@ -110,10 +110,58 @@ def test_golden_media_mediana_sigma_y_boxplot_presentes():
    assert "±1σ" in md_texts or "σ" in md_texts
    assert "boxplot" in md_texts.lower()
    assert "Tukey" in md_texts
+    # The long "how to read it" explanation moved to the glossary: the colour-code
+    # / 1,5·IQR walkthrough is no longer inline in the chapter body.
+    assert "1,5·IQR" not in md_texts
+    assert "línea roja" not in md_texts
    # distribution_type gloss surfaced for the column (right-skewed preset).
    assert _DIST_GLOSS["right-skewed"].split(";")[0][:20] in md_texts


+def test_glosario_histograma_boxplot_clicable_y_definicion():
+    # With a glossary collector the intro marks the clickable term and the FULL
+    # explanation (the long paragraph removed from the body) lands in the glossary.
+    from datascience.automatic_eda.chapters.glosario import build_glosario
+
+    gc = model.GlossaryCollector()
+    prof = _profile(n_numeric=1, extra_categorical=False)
+    ch = build_num_distr(prof, {"glossary": gc})
+    intro = next(b for b in ch.blocks if b.kind == "markdown")
+    assert "[[term:histograma_boxplot]]" in intro.text
+    assert gc.has("histograma_boxplot")
+    glos = build_glosario(prof, {"glossary": gc})
+    entry = next(b for b in glos.blocks
+                 if getattr(b, "kind", "") == "glossary_entry"
+                 and b.key == "histograma_boxplot")
+    assert "boxplot" in entry.definition.lower()
+    assert "1,5·IQR" in entry.definition
+
+
+def test_llm_descripcion_y_unidad_por_columna():
+    # With an LLM dictionary, each numeric column whose name matches shows its
+    # business description and unit in a per-column markdown block.
+    prof = _profile(n_numeric=2)
+    prof["llm"] = {"dictionary": [
+        {"column": "precio", "description": "Precio de venta del producto",
+         "unit": "EUR"},
+        {"column": "alcohol", "business_meaning": "Grado alcohólico",
+         "unit": "% vol"},
+    ]}
+    ch = build_num_distr(prof, {})
+    md_all = " ".join(b.text for b in _flatten(ch.blocks)
+                      if b.kind == "markdown")
+    assert "Precio de venta" in md_all and "EUR" in md_all
+    assert "Grado alcohólico" in md_all and "% vol" in md_all
+
+
+def test_edge_sin_llm_no_anade_descripcion():
+    # Without an LLM block the per-column description markdown is simply omitted.
+    ch = build_num_distr(_profile(n_numeric=2), {})
+    md_all = " ".join(b.text for b in _flatten(ch.blocks)
+                      if b.kind == "markdown")
+    assert "Descripción" not in md_all
+
+
 def test_boxplot_stats_se_consumen_del_registry():
    # The chapter must feed build_boxplot_stats (group eda) and the resulting
    # box must carry the Tukey fences for the figure.
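The glossary tests in this hunk depend on two behaviours: a term is wrapped in `[[term:key]]…[[/term]]` only when a collector is present in `ctx`, and registering the same key twice keeps the first definition. A minimal sketch of both follows. `TinyGlossary` is a hypothetical stand-in for `model.GlossaryCollector`, of which only `add` and `has` are visible in this commit, and the label string is illustrative.

```python
class TinyGlossary:
    """Hypothetical stand-in for model.GlossaryCollector (not the real class)."""

    def __init__(self):
        self._terms = {}

    def add(self, key, label, definition=None):
        # First registration wins: later calls for the same key are ignored.
        self._terms.setdefault(key, (label, definition))

    def has(self, key):
        return key in self._terms

    def label(self, key):
        return self._terms[key][0]


def term(mark, key, text):
    """Wrap text in the clickable-term markup only when marking is enabled."""
    return f"[[term:{key}]]{text}[[/term]]" if mark else text


def intro_phrase(ctx):
    """Register the term if a collector is present, then return the phrase."""
    glossary = ctx.get("glossary")
    mark = isinstance(glossary, TinyGlossary)
    if mark:
        glossary.add("histograma_boxplot", "Histograma y boxplot")  # label assumed
    return term(mark, "histograma_boxplot", "cómo leer estos gráficos")


gc = TinyGlossary()
print(intro_phrase({}))                # plain text, no markup
print(intro_phrase({"glossary": gc}))  # marked and registered
gc.add("histograma_boxplot", "Otra etiqueta")  # ignored: key already present
```

Without a collector the phrase degrades to plain text, which is why the chapter still builds when called with an empty `ctx`.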
@@ -0,0 +1,593 @@
+"""Outliers chapter (OUTLIERS) — univariate + multivariate atypical values.
+
+Today the analysis of atypical values is scattered across the document: the
+NUM DISTR chapter mentions the per-column outlier count inside each distribution
+figure, and the MODELOS chapter runs Isolation Forest as one of several cheap
+models. This chapter gathers and deepens the whole outlier story in a single
+place, with its interpretation: an [[term:outlier]]outlier[[/term]] is **not
+necessarily an error** — it can be a legitimate, extreme but real observation —
+so the reading is exploratory (what to look at), never confirmatory (what to
+delete).
+
+Sections, in order:
+
+1. **Resumen univariante por columna** — for every numeric column, the number
+   and percentage of atypical values by two complementary criteria: Tukey's
+   1.5·IQR rule ([[term:tukey_fence]]vallas de Tukey[[/term]]) and the
+   [[term:zscore]]z-score[[/term]] rule (|z| > 3). The most contaminated columns
+   are flagged. The fences come from the pure registry function
+   ``build_boxplot_stats`` (derived from the profile percentiles); the per-column
+   counts use the raw sample in ``ctx['raw_numeric']`` when available (the exact
+   count), degrading to the profile's own z-score counts otherwise.
+2. **Boxplots** — a single figure with the Tukey boxplots of the most
+   contaminated columns (box, whiskers and atypical points), delegated to the
+   reusable registry helper ``build_boxplots_figure``.
+3. **Multivariante (filas anómalas)** — rows that are atypical considering ALL
+   columns at once, via the registry function ``isolation_forest_outliers``: the
+   count and percentage of anomalous rows, the most anomalous rows with their
+   score, and the dimensions that make each one rare (top columns by |z|, via
+   ``summarize_outlier_dims``). Run live on ``ctx['raw_numeric']`` (the same
+   numeric columns ``summarize_outlier_dims`` uses, so the row indexing stays
+   coherent and the dimension breakdown is correct); falls back to the
+   precomputed ``profile['models']['outliers']`` only when no raw sample is
+   available (e.g. the lite preset), where no per-row breakdown is shown.
+4. **Interpretación** — outlier ≠ error: how to tell a data-entry error from a
+   genuine extreme value, and what to do (inspect, winsorize, or re-express —
+   linking to the Tukey re-expression the profile already computes).
+
+The chapter activates whenever the table has at least one numeric column; with
+no numeric column it returns ``None`` and disappears from the document.
+
+Reads everything defensively (``.get``) and never raises: every registry
+delegation is imported lazily and degraded to an honest note on any failure.
+
+Contract: build_<id>(profile, ctx) -> Chapter | None ; CHAPTER_VERSION = "x.y.z".
+"""
+
+from __future__ import annotations
+
+from .. import model
+
+CHAPTER_VERSION = "1.0.0"
+CHAPTER_ID = "outliers"
+CHAPTER_TITLE = "Valores atípicos"
+
+# z-score threshold for the univariate rule: |z| > 3 flags a value more than 3σ
+# from the mean (≈99.7% of a normal distribution lies within ±3σ).
+_Z_THRESH = 3.0
+# How many columns to draw in the boxplots figure (most contaminated first) and
+# how many anomalous rows to list in the multivariate table.
+_TOP_BOX = 12
+_TOP_ROWS = 12
+# Cap on the raw atypical values passed as boxplot fliers, so a heavy-tailed
+# column does not flood the figure with thousands of points.
+_MAX_FLIERS = 200
+# How many columns flagged as "most contaminated" in the summary note.
+_TOP_FLAGGED = 3
+
+# Glossary terms this chapter explains (contract §11.1). Registered in the shared
+# collector and marked clickable on first appearance. ``isolation_forest`` and
+# ``zscore`` may also be registered by the MODELOS chapter — ``add`` is
+# idempotent (first definition wins), so registering them here is harmless and
+# keeps this chapter self-contained when MODELOS does not render.
+_TERM_DEFS = {
+    "outlier": (
+        "Valor atípico (outlier)",
+        "Una observación que se aparta mucho del grueso de los datos. Un atípico "
+        "NO es necesariamente un error: puede ser un fallo de medida o de "
+        "registro, pero también un dato real extremo (un cliente que gasta diez "
+        "veces la media, un día de ventas excepcional). Por eso se señalan para "
+        "revisarlos, no para borrarlos automáticamente.",
+    ),
+    "tukey_fence": (
+        "Vallas de Tukey (1,5·IQR)",
+        "Regla clásica para marcar atípicos a partir de los cuartiles: se calcula "
+        "el rango intercuartílico IQR = P75 − P25 y se trazan dos vallas, una "
+        "inferior en P25 − 1,5·IQR y otra superior en P75 + 1,5·IQR. Los valores "
+        "que caen fuera de esas vallas se consideran atípicos. Es robusta porque "
+        "se apoya en la mediana y los cuartiles, no en la media.",
+    ),
+    "zscore": (
+        "z-score (puntuación típica)",
+        "Mide a cuántas desviaciones típicas está un valor de la media de su "
+        "columna: z = (valor − media) / desviación típica. Un |z| grande (aquí > "
+        "3) señala un valor alejado del centro. A diferencia de las vallas de "
+        "Tukey, el z-score usa media y desviación, así que es más sensible a la "
+        "presencia de los propios atípicos.",
+    ),
+    "isolation_forest": (
+        "Isolation Forest (anomalías multivariantes)",
+        "Algoritmo de detección de anomalías que considera TODAS las columnas a "
+        "la vez: construye árboles que parten el espacio con cortes aleatorios y "
+        "mide cuántos cortes hacen falta para aislar cada fila. Las filas raras "
+        "se aíslan con muy pocos cortes y se marcan como atípicas según un umbral "
+        "de contaminación. Detecta combinaciones de valores poco frecuentes que "
+        "ninguna columna por separado revelaría.",
+    ),
+}
+
+
+# --------------------------------------------------------------------------- #
+# Lazy registry delegations (each degrades to None / no-op on any failure).
+# --------------------------------------------------------------------------- #
+def _load_build_boxplot_stats():
+    try:
+        from datascience.build_boxplot_stats import build_boxplot_stats
+        return build_boxplot_stats
+    except Exception:  # noqa: BLE001
+        return None
+
+
+def _load_detect_outliers():
+    # detect_outliers lives in the monolithic ``datascience.datascience`` module
+    # (file_path datascience.py), not in its own submodule — try both shapes.
+    try:
+        from datascience.datascience import detect_outliers
+        return detect_outliers
+    except Exception:  # noqa: BLE001
+        try:
+            from datascience import detect_outliers
+            return detect_outliers
+        except Exception:  # noqa: BLE001
+            return None
+
+
+def _load_isolation_forest():
+    try:
+        from datascience.isolation_forest_outliers import isolation_forest_outliers
+        return isolation_forest_outliers
+    except Exception:  # noqa: BLE001
+        return None
+
+
+def _load_summarize_dims():
+    try:
+        from datascience.summarize_outlier_dims import summarize_outlier_dims
+        return summarize_outlier_dims
+    except Exception:  # noqa: BLE001
+        return None
+
+
+# --------------------------------------------------------------------------- #
+# Defensive formatters (own copy: the chapter never imports siblings).
+# --------------------------------------------------------------------------- #
+def _fmt_num(value, decimals: int = 3) -> str:
+    if value is None:
+        return "—"
+    if isinstance(value, bool):
+        return "sí" if value else "no"
+    if isinstance(value, int):
+        return f"{value:,}".replace(",", ".")
+    if isinstance(value, float):
+        if value != value:  # NaN
+            return "—"
+        if value in (float("inf"), float("-inf")):
+            return str(value)
+        text = f"{value:.{decimals}f}".rstrip("0").rstrip(".")
+        return text if text else "0"
+    return model._safe_str(value)
+
+
+def _fmt_int(value) -> str:
+    if value is None:
+        return "—"
+    try:
+        return f"{int(round(float(value))):,}".replace(",", ".")
+    except (TypeError, ValueError, OverflowError):  # inf -> OverflowError
+        return model._safe_str(value)
+
+
+def _fmt_pct(value, decimals: int = 2) -> str:
+    """Format an already-0-100 value as a percentage. None -> placeholder."""
+    if value is None:
+        return "—"
+    try:
+        return f"{float(value):.{decimals}f}%"
+    except (TypeError, ValueError):
+        return model._safe_str(value)
+
+
+def _term(mark: bool, key: str, text: str) -> str:
+    return f"[[term:{key}]]{text}[[/term]]" if mark else text
+
+
+def _is_dict(v) -> bool:
+    return isinstance(v, dict)
+
+
+# --------------------------------------------------------------------------- #
+# Profile reads.
+# --------------------------------------------------------------------------- #
+def _numeric_columns(profile: dict) -> list:
+    """Return [(name, numeric_dict)] for numeric columns with usable stats."""
+    out = []
+    for col in profile.get("columns") or []:
+        if not isinstance(col, dict):
+            continue
+        if col.get("inferred_type") != "numeric":
+            continue
+        num = col.get("numeric")
+        if not isinstance(num, dict) or not num:
+            continue
+        if num.get("mean") is None and num.get("median") is None:
+            continue
+        out.append((col.get("name") or "(columna)", num))
+    return out
+
+
+def _clean_values(raw):
+    """Return the finite float values of a raw column list (drop None/NaN/inf)."""
+    if not isinstance(raw, (list, tuple)):
+        return None
+    vals = []
+    for v in raw:
+        if v is None or isinstance(v, bool):
+            continue
+        try:
+            f = float(v)
+        except (TypeError, ValueError):
+            continue
+        if f != f or f in (float("inf"), float("-inf")):
+            continue
+        vals.append(f)
+    return vals
+
+
+# --------------------------------------------------------------------------- #
+# Per-column univariate summary.
+# --------------------------------------------------------------------------- #
+def _univariate_row(name, numeric, raw_vals, box_fn, detect_fn):
+    """Compute one univariate summary row + boxplot inputs for a column.
+
+    Returns a dict with the table cells and, when raw values are available, the
+    exact Tukey/z counts and the list of atypical (flier) values; otherwise it
+    degrades to the profile's own z-score counts and the fence flags.
+    """
+    box = {}
+    if box_fn is not None:
+        try:
+            box = box_fn(numeric) or {}
+        except Exception:  # noqa: BLE001
+            box = {}
+    lf = box.get("lower_fence")
+    uf = box.get("upper_fence")
+
+    vals = _clean_values(raw_vals)
+    n_tukey = pct_tukey = None
+    n_z = pct_z = None
+    low_extreme = high_extreme = None
+    fliers = []
+    contamination = None  # metric used to rank columns (prefer Tukey %).
+
+    if vals:
+        n = len(vals)
+        tukey_out = []
+        for v in vals:
+            below = (lf is not None and v < lf)
+            above = (uf is not None and v > uf)
+            if below or above:
+                tukey_out.append(v)
+        n_tukey = len(tukey_out)
+        pct_tukey = 100.0 * n_tukey / n if n else None
+        if tukey_out:
+            low_extreme = min(tukey_out)
+            high_extreme = max(tukey_out)
+            fliers = tukey_out[:_MAX_FLIERS]
+        # z-score rule via the registry function (returns parallel bools).
+        if detect_fn is not None:
+            try:
+                flags = detect_fn(vals, _Z_THRESH) or []
+                n_z = int(sum(1 for b in flags if b))
+                pct_z = 100.0 * n_z / n if n else None
+            except Exception:  # noqa: BLE001
+                n_z = pct_z = None
+        contamination = pct_tukey
+    else:
+        # Degrade: no raw sample for this column. The profile's own outlier
+        # count/pct come from the z-score block (build_boxplot_stats note); the
+        # Tukey count is unknown, so only the fence flags are available.
+        n_z = numeric.get("n_outliers")
+        pct_z = numeric.get("outlier_pct")
+        if box.get("has_low_outliers") and box.get("min") is not None:
+            low_extreme = box.get("min")
+        if box.get("has_high_outliers") and box.get("max") is not None:
+            high_extreme = box.get("max")
+        contamination = pct_z if isinstance(pct_z, (int, float)) else None
+
+    # Compact "extremos atípicos" cell: down/up arrows for the low/high tail.
+    extremes = []
+    if low_extreme is not None:
+        extremes.append(f"↓ {_fmt_num(low_extreme)}")
+    if high_extreme is not None:
+        extremes.append(f"↑ {_fmt_num(high_extreme)}")
+    extremes_cell = "  ".join(extremes) if extremes else "—"
+
+    return {
+        "name": model._safe_str(name),
+        "n_tukey": n_tukey,
+        "pct_tukey": pct_tukey,
+        "n_z": n_z,
+        "pct_z": pct_z,
+        "lower_fence": lf,
+        "upper_fence": uf,
+        "extremes": extremes_cell,
+        "box": box,
+        "fliers": fliers,
+        "has_raw": bool(vals),
+        "contamination": contamination if isinstance(contamination, (int, float)) else -1.0,
+    }
+
+
+def _univariate_table(rows: list) -> model.DataTable:
+    header = ["Columna", "Atípicos Tukey", "% Tukey", "Atípicos z", "% z",
+              "Valla inf.", "Valla sup.", "Extremos atípicos"]
+    table_rows = []
+    for r in rows:
+        table_rows.append([
+            r["name"],
+            _fmt_int(r["n_tukey"]) if r["n_tukey"] is not None else "—",
+            _fmt_pct(r["pct_tukey"]) if r["pct_tukey"] is not None else "—",
+            _fmt_int(r["n_z"]) if r["n_z"] is not None else "—",
+            _fmt_pct(r["pct_z"]) if r["pct_z"] is not None else "—",
+            _fmt_num(r["lower_fence"]),
+            _fmt_num(r["upper_fence"]),
+            r["extremes"],
+        ])
+    return model.DataTable(
+        header=header, rows=table_rows,
+        title="Valores atípicos por columna",
+        note="Tukey = fuera de las vallas 1,5·IQR · z = |z-score| > 3 · "
+             "ordenado de más a menos contaminada")
+
+
+# --------------------------------------------------------------------------- #
+# Multivariate (Isolation Forest) section.
+# --------------------------------------------------------------------------- #
+def _resolve_multivariate(profile: dict, ctx: dict, raw_numeric):
+    """Return (outliers_dict_or_None, source).
+
+    Prefers a LIVE Isolation Forest over ``raw_numeric`` so the detector and
+    ``summarize_outlier_dims`` use EXACTLY the same numeric columns and the same
+    valid-row indexing — otherwise the precomputed ``profile['models']
+    ['outliers']`` (run by MODELOS over a possibly different column subset) would
+    yield ``row_index`` values that no longer point at the rows
+    ``summarize_outlier_dims`` reconstructs, mislabelling the "dimensions that
+    make each row rare". Falls back to the precomputed block when no raw sample
+    is available (e.g. the lite preset drops ``raw_numeric``)."""
+    if _is_dict(raw_numeric) and raw_numeric:
+        iso = _load_isolation_forest()
+        if iso is not None:
+            try:
+                out = iso(raw_numeric)
+                if _is_dict(out) and out.get("n_outliers") is not None and out.get("n_rows_used"):
+                    return out, "live"
+            except Exception:  # noqa: BLE001
+                pass
+    # Fallback: the model the MODELOS chapter already computed (no raw sample to
+    # recompute against, so no per-row dimension breakdown either).
+    models = profile.get("models") if _is_dict(profile.get("models")) else {}
+    pre = models.get("outliers") if _is_dict(models) else None
+    if _is_dict(pre) and pre.get("n_outliers") is not None and pre.get("n_rows_used"):
+        return pre, "precomputed"
+    return None, "none"
+
+
+def _multivariate_blocks(outliers: dict, raw_numeric, mark: bool) -> list:
+    isof = _term(mark, "isolation_forest", "**Isolation Forest**")
+    blocks = [
+        model.Heading(text="Filas atípicas (multivariante)", level=2),
+        model.Markdown(text=(
+            f"Hasta aquí cada columna se ha mirado por separado. {isof} busca "
+            "filas raras considerando **todas las columnas a la vez**: una fila "
+            "puede ser normal en cada variable y aun así ser atípica por la "
+            "**combinación** de sus valores (p. ej. una edad baja con una tarifa "
+            "muy alta). La tabla resume cuántas filas se marcaron y el umbral de "
+            "decisión.")),
+        model.KVTable(rows=[
+            ("Filas analizadas", _fmt_int(outliers.get("n_rows_used"))),
+            ("Columnas consideradas", _fmt_int(outliers.get("n_features"))),
+            ("Filas atípicas", _fmt_int(outliers.get("n_outliers"))),
+            ("% filas atípicas", _fmt_pct(outliers.get("outlier_pct"))),
+            ("Umbral de decisión", _fmt_num(outliers.get("threshold"), 4)),
+        ], title="Anomalías multivariantes"),
+    ]
+
+    rows_in = outliers.get("outlier_rows") or []
+    if not rows_in:
+        return blocks
+
+    # Enrich each anomalous row with the dimensions that make it rare, when the
+    # raw sample is available (summarize_outlier_dims reconstructs the same
+    # valid-row indexing as isolation_forest_outliers).
+    dims_by_row = {}
+    if _is_dict(raw_numeric) and raw_numeric:
+        summ = _load_summarize_dims()
+        if summ is not None:
+            try:
+                enriched = summ(raw_numeric, rows_in, top_k=3) or []
+                for e in enriched:
+                    if _is_dict(e) and e.get("row_index") is not None:
+                        dims_by_row[e.get("row_index")] = e.get("dims") or []
+            except Exception:  # noqa: BLE001
+                dims_by_row = {}
+
+    has_dims = bool(dims_by_row)
+    header = ["Fila (entre válidas)", "Score"]
+    if has_dims:
+        header.append("Dimensiones que la hacen rara (col = valor, z)")
+    table_rows = []
+    for r in rows_in[:_TOP_ROWS]:
+        if not _is_dict(r):
+            continue
+        ridx = r.get("row_index")
+        cells = [_fmt_int(ridx), _fmt_num(r.get("score"), 4)]
+        if has_dims:
+            dims = dims_by_row.get(ridx) or []
+            parts = []
+            for d in dims:
+                if not _is_dict(d):
+                    continue
+                parts.append(
+                    f"{model._safe_str(d.get('col'))} = {_fmt_num(d.get('value'))} "
+                    f"(z {_fmt_num(d.get('z'), 2)})")
+            cells.append("; ".join(parts) if parts else "—")
+        table_rows.append(cells)
+
+    if table_rows:
+        shown = len(table_rows)
+        total = outliers.get("n_outliers")
+        note = "las filas más anómalas primero (score más bajo = más rara)"
+        if isinstance(total, int) and total > shown:
+            note += f" — top {shown} de {total}"
+        if not has_dims:
+            note += (" · no se pudo recuperar la muestra cruda para explicar las "
+                     "dimensiones de cada fila")
+        blocks.append(model.DataTable(
+            header=header, rows=table_rows,
+            title="Filas más atípicas", note=note))
+    return blocks
+
+
+# --------------------------------------------------------------------------- #
+# Interpretation section.
+# --------------------------------------------------------------------------- #
+def _interpretation_block(mark: bool) -> model.Markdown:
+    outlier = _term(mark, "outlier", "atípico")
+    text = (
+        f"**Un {outlier} no es necesariamente un error.** Conviene distinguir "
+        "dos casos antes de actuar:\n\n"
+        "- **Error de dato** (medida, registro o unidad equivocada): una edad de "
+        "200 años, un importe negativo donde no puede haberlo, un decimal "
+        "desplazado. Estos sí se corrigen o se eliminan, idealmente en el origen.\n"
+        "- **Dato real extremo**: una observación legítima de la cola de la "
+        "distribución (un cliente que gasta mucho más, una tarifa de lujo, un día "
+        "de ventas excepcional). Borrarla sesga el análisis y oculta información "
+        "valiosa.\n\n"
+        "**Qué hacer.** Primero, **revisar** los valores señalados arriba contra "
+        "su origen para decidir cuál de los dos casos es. Si son errores, "
+        "corregirlos. Si son datos reales que distorsionan medias y modelos, hay "
+        "alternativas a borrarlos: **winsorizar** (recortar los extremos a un "
+        "percentil), o **re-expresar** la variable (por ejemplo una "
+        "transformación logarítmica o la escalera de re-expresión de Tukey que "
+        "este mismo perfil ya calcula para las columnas asimétricas), que suele "
+        "domar la cola sin perder ninguna fila. La elección depende del objetivo: "
+        "esta lectura es **exploratoria** —orienta dónde mirar—, no una regla "
+        "automática de limpieza.")
+    return model.Markdown(text=text)
+
+
+# --------------------------------------------------------------------------- #
+# Entry point.
+# --------------------------------------------------------------------------- #
+def build_outliers(profile: dict, ctx: dict):
+    """Build the OUTLIERS Chapter, or None if the dataset has no numeric column."""
+    profile = profile or {}
+    ctx = ctx or {}
+    if not isinstance(profile, dict):
+        return None
+
+    numerics = _numeric_columns(profile)
+    if not numerics:
+        return None  # chapter does not apply to a dataset with no numerics.
+
+    # Register glossary terms (if a collector is present) and mark them clickable.
+    glossary = ctx.get("glossary")
+    mark = False
+    if isinstance(glossary, model.GlossaryCollector):
+        for key, (label, definition) in _TERM_DEFS.items():
+            glossary.add(key, label, definition)
+        mark = True
+
+    raw_numeric = ctx.get("raw_numeric")
+    raw_numeric = raw_numeric if isinstance(raw_numeric, dict) else {}
+
+    box_fn = _load_build_boxplot_stats()
+    detect_fn = _load_detect_outliers()
+
+    # --- Univariate summary ------------------------------------------------- #
+    uni_rows = []
+    for name, numeric in numerics:
+        uni_rows.append(_univariate_row(
+            name, numeric, raw_numeric.get(name), box_fn, detect_fn))
+    # Rank columns by contamination (Tukey % when available, else z %).
+    uni_rows.sort(key=lambda r: r.get("contamination", -1.0), reverse=True)
+
+    intro = (
+        "Este capítulo reúne en un solo sitio el análisis de los **valores "
+        "atípicos** de la tabla, que en el resto del informe aparecen dispersos. "
+        f"Un {_term(mark, 'outlier', 'atípico')} es una observación que se aparta "
+        "mucho del grueso de los datos. Cada columna numérica se evalúa con dos "
+        f"criterios complementarios: las {_term(mark, 'tukey_fence', 'vallas de Tukey')} "
+        "(fuera de P25−1,5·IQR o P75+1,5·IQR, robusto a la propia cola) y el "
+        f"{_term(mark, 'zscore', 'z-score')} (|z| > 3, sensible a la media). La "
+        "tabla está ordenada de la columna más contaminada a la menos.")
+
+    blocks = [
+        model.Heading(text=CHAPTER_TITLE, level=1),
+        model.Markdown(text=intro),
+        _univariate_table(uni_rows),
+    ]
+
+    # Flag the most contaminated columns explicitly.
+    flagged = [r["name"] for r in uni_rows
+               if r.get("contamination", -1.0) > 0][:_TOP_FLAGGED]
+    if flagged:
+        names = ", ".join(f"**{n}**" for n in flagged)
+        blocks.append(model.Markdown(text=(
+            f"Las columnas con mayor proporción de atípicos son {names}: "
+            "concentran el grueso de los valores fuera de las vallas y son las "
+            "primeras a revisar.")))
+
+    # --- Boxplots figure ---------------------------------------------------- #
+    box_entries = [
+        {"name": r["name"], "box": r["box"], "fliers": r.get("fliers")}
+        for r in uni_rows
+        if r.get("box")
+    ][:_TOP_BOX]
+    if box_entries:
+        def _boxplots_make(entries=box_entries):
+            try:
+                from datascience.build_boxplots_figure import build_boxplots_figure
+                return build_boxplots_figure(
+                    entries, title="Boxplots de Tukey por columna",
+                    max_boxes=_TOP_BOX)
+            except Exception:  # noqa: BLE001 — minimal fallback figure.
+                import matplotlib
+                matplotlib.use("Agg")
+                from matplotlib.figure import Figure
+                fig = Figure(figsize=(5.0, 2.2))
+                ax = fig.add_subplot(111)
+                ax.text(0.5, 0.5, "(boxplots no disponibles)",
+                        ha="center", va="center")
+                ax.axis("off")
+                return fig
+
+        blocks.append(model.Group(blocks=[
+            model.Heading(text="Boxplots", level=2),
+            model.Markdown(text=(
+                "Cada caja abarca del primer al tercer cuartil (P25–P75), la línea "
+                "interior es la mediana y los bigotes llegan hasta 1,5·IQR; los "
+                "puntos son los valores que caen fuera de las vallas (atípicos por "
+                "Tukey).")),
+            model.Figure(
+                make=_boxplots_make,
+                caption="Boxplots de Tukey de las columnas más contaminadas."),
+        ]))
+
+    # --- Multivariate ------------------------------------------------------- #
+    outliers, _src = _resolve_multivariate(profile, ctx, raw_numeric)
+    if outliers is not None:
+        blocks.extend(_multivariate_blocks(outliers, raw_numeric, mark))
+    else:
+        blocks.append(model.Heading(text="Filas atípicas (multivariante)", level=2))
+        blocks.append(model.Note(
+            "No se pudo analizar la anomalía multivariante: hacen falta al menos "
+            "dos columnas numéricas y la muestra cruda (o los modelos del perfil) "
+            "para correr Isolation Forest."))
+
+    # --- Interpretation ----------------------------------------------------- #
+    blocks.append(model.Heading(text="Cómo interpretar los atípicos", level=2))
+    blocks.append(_interpretation_block(mark))
+
+    return model.Chapter(id=CHAPTER_ID, title=CHAPTER_TITLE,
+                         version=CHAPTER_VERSION, blocks=blocks)
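
The two univariate rules named in the chapter intro (Tukey fences at P25 − 1.5·IQR / P75 + 1.5·IQR, and |z| > 3) can be sketched as below. This is a minimal illustration, not the chapter's actual implementation: the helper names `percentile`, `tukey_outliers` and `zscore_outliers` are hypothetical, and the sample only mirrors the shape of the `Fare` fixture used in the tests.

```python
import math


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0..1) on a sorted, non-empty list."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def tukey_outliers(values, k=1.5):
    """Values outside [P25 - k*IQR, P75 + k*IQR] (hypothetical helper)."""
    s = sorted(values)
    p25, p75 = percentile(s, 0.25), percentile(s, 0.75)
    iqr = p75 - p25
    lo_fence, hi_fence = p25 - k * iqr, p75 + k * iqr
    return [v for v in values if v < lo_fence or v > hi_fence]


def zscore_outliers(values, threshold=3.0):
    """Values with |z| > threshold, using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return []  # a constant column has no z-score outliers
    return [v for v in values if abs((v - mean) / std) > threshold]


# Heavy-tailed sample: bulk in 7..31 plus four extreme values.
fare = [7.0 + (i % 25) for i in range(120)] + [180.0, 210.0, 263.0, 512.0]
tukey = tukey_outliers(fare)   # [180.0, 210.0, 263.0, 512.0]
z = zscore_outliers(fare)      # [210.0, 263.0, 512.0]
```

On this sample the z-score rule misses 180.0 because the tail itself inflates the mean and the standard deviation, which is the "sensitive to the mean" caveat the intro text mentions; the quartile-based fences are not affected.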
@@ -0,0 +1,304 @@
+"""Tests for the OUTLIERS chapter — DoD: golden + edges + error path.
+
+Self-contained: builds synthetic ``numeric`` blocks + a raw_numeric sample (no
+DuckDB) so the suite is fast and deterministic. Verifies that the chapter emits
+the univariate per-column table, a boxplots figure, the multivariate Isolation
+Forest section and the outlier≠error interpretation; that the most contaminated
+column is ranked first; that a profile with no numeric column yields None; that
+None/empty never raises; that the glossary terms are registered; and that the
+chapter renders into both PDF and PPTX without cutting its title.
+"""
+
+import math
+import os
+import re
+import tempfile
+
+from pypdf import PdfReader
+
+from datascience.automatic_eda.chapters.outliers import (
+    build_outliers, CHAPTER_VERSION, CHAPTER_TITLE, _TERM_DEFS,
+)
+from datascience.automatic_eda import model
+from datascience.render_automatic_eda_pdf import render_automatic_eda_pdf
+from datascience.render_automatic_eda_pptx import render_automatic_eda_pptx
+
+
+def _percentile(sorted_vals, q):
+    """Linear-interpolation percentile (q in 0..1) on an already-sorted list."""
+    if not sorted_vals:
+        return None
+    if len(sorted_vals) == 1:
+        return float(sorted_vals[0])
+    pos = q * (len(sorted_vals) - 1)
+    lo = int(math.floor(pos))
+    hi = int(math.ceil(pos))
+    if lo == hi:
+        return float(sorted_vals[lo])
+    frac = pos - lo
+    return float(sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac)
+
+
+def _col_from_values(values, nbins=10):
+    """Build a ``numeric`` sub-block shaped like describe_numeric's output from a
+    concrete list of raw values, so the profile percentiles and the raw sample
+    are consistent (the boxplot fences match the raw data)."""
+    vals = [float(v) for v in values]
+    s = sorted(vals)
+    n = len(s)
+    mean = sum(vals) / n
+    var = sum((v - mean) ** 2 for v in vals) / n
+    std = math.sqrt(var)
+    median = _percentile(s, 0.5)
+    p25 = _percentile(s, 0.25)
+    p75 = _percentile(s, 0.75)
+    mn, mx = s[0], s[-1]
+    # z-score outlier count (population), what the profile's n_outliers carries.
+    n_out = sum(1 for v in vals if std > 0 and abs((v - mean) / std) > 3.0)
+    width = (mx - mn) / nbins if mx > mn else 1.0
+    hist = [{"lo": mn + i * width, "hi": mn + (i + 1) * width, "count": 1}
+            for i in range(nbins)]
+    return {
+        "min": mn, "max": mx, "mean": mean, "median": median, "std": std,
+        "p25": p25, "p50": median, "p75": p75, "iqr": (p75 - p25),
+        "n_outliers": n_out, "outlier_pct": 100.0 * n_out / n,
+        "distribution_type": "right-skewed", "histogram": hist,
+    }
+
+
+def _fare_values():
+    """A heavy-tailed column (bulk 7-31, four values in 180-512): clear Tukey/z outliers."""
+    base = [7.0 + (i % 25) for i in range(120)]      # bulk 7..31
+    tail = [180.0, 210.0, 263.0, 512.0]              # extreme upper tail
+    return base + tail
+
+
+def _age_values():
+    """A roughly symmetric column with a few extreme values at both ends."""
+    base = [22.0 + (i % 40) for i in range(120)]     # 22..61
+    return base + [80.0, 0.5, 74.0, 1.0]
+
+
+def _quiet_values():
+    """A clean column with no atypical values."""
+    return [50.0 + (i % 5) for i in range(124)]
+
+
+def _profile_and_ctx(with_models=True, with_raw=True):
+    fare = _fare_values()
+    age = _age_values()
+    quiet = _quiet_values()
+    cols = [
+        {"name": "Fare", "inferred_type": "numeric", "numeric": _col_from_values(fare)},
+        {"name": "Age", "inferred_type": "numeric", "numeric": _col_from_values(age)},
+        {"name": "Quiet", "inferred_type": "numeric", "numeric": _col_from_values(quiet)},
+        {"name": "Sexo", "inferred_type": "categorical",
+         "categorical": {"top": [{"value": "male", "count": 80}]}},
+    ]
+    profile = {"table": "titanic", "n_rows": len(fare), "n_cols": len(cols),
+               "columns": cols}
+    if with_models:
+        profile["models"] = {
+            "outliers": {
+                "n_outliers": 4, "outlier_pct": 3.2,
+                "outlier_rows": [
+                    {"row_index": 123, "score": -0.21},
+                    {"row_index": 121, "score": -0.15},
+                ],
+                "threshold": -0.02, "n_rows_used": 124, "n_features": 3,
+            }
+        }
+    ctx = {}
+    if with_raw:
+        ctx["raw_numeric"] = {"Fare": fare, "Age": age, "Quiet": quiet}
+    return profile, ctx
+
+
+def _pdf_text(path: str) -> str:
+    txt = "".join((pg.extract_text() or "") for pg in PdfReader(path).pages)
+    return re.sub(r"\s+", " ", txt)
+
+
+def _flatten(blocks):
+    out = []
+    for b in blocks:
+        if getattr(b, "kind", "") == "group":
+            out.extend(_flatten(getattr(b, "blocks", []) or []))
+        else:
+            out.append(b)
+    return out
+
+
+# --------------------------------------------------------------------------- #
+# Golden.
+# --------------------------------------------------------------------------- #
+def test_golden_estructura_y_secciones():
+    profile, ctx = _profile_and_ctx()
+    ctx["glossary"] = model.GlossaryCollector()
+    ch = build_outliers(profile, ctx)
+    assert ch is not None
+    assert ch.id == "outliers"
+    assert ch.version == CHAPTER_VERSION
+
+    flat = _flatten(ch.blocks)
+    kinds = [b.kind for b in flat]
+    # Title heading + univariate DataTable + boxplots Figure + multivariate
+    # KVTable + interpretation Markdown.
+    assert kinds[0] == "heading" and flat[0].text == CHAPTER_TITLE
+    tables = [b for b in flat if b.kind == "data_table"]
+    titles = [t.title for t in tables]
+    assert any(t and "atípicos por columna" in t for t in titles)
+    assert any(b.kind == "figure" for b in flat), "falta la figura de boxplots"
+    assert any(b.kind == "kv_table" for b in flat), "falta el resumen multivariante"
+
+    # The boxplots figure maker yields a real matplotlib figure (or its fallback).
+    fig = next(b for b in flat if b.kind == "figure").make()
+    assert fig is not None
+    import matplotlib.pyplot as plt
+    plt.close(fig)
+
+
+def test_golden_fare_es_la_mas_contaminada():
+    # The univariate table must rank Fare (heavy tail) first and report a
+    # non-zero Tukey percentage for it.
+    profile, ctx = _profile_and_ctx()
+    ch = build_outliers(profile, ctx)
+    table = next(b for b in _flatten(ch.blocks)
+                 if b.kind == "data_table" and b.title
+                 and "atípicos por columna" in b.title)
+    first_col = table.rows[0][0]
+    assert first_col == "Fare", f"esperaba Fare primera, fue {first_col}"
+    # % Tukey column (index 2) of the first row must be > 0.
+    pct_cell = table.rows[0][2]
+    assert pct_cell not in ("—", "0%", "0.00%"), f"% Tukey de Fare vacío: {pct_cell}"
+    # The z-score rule (detect_outliers) must actually run with raw_numeric: at
+    # least one column reports a non-empty z count/percentage (regression guard
+    # for the detect_outliers import path).
+    z_pcts = [r[4] for r in table.rows]
+    assert any(c not in ("—",) for c in z_pcts), f"columna z toda vacía: {z_pcts}"
+    z_counts = [r[3] for r in table.rows]
+    assert any(c not in ("—",) for c in z_counts), f"conteo z vacío: {z_counts}"
+
+
+def test_golden_interpretacion_outlier_no_es_error():
+    profile, ctx = _profile_and_ctx()
+    ch = build_outliers(profile, ctx)
+    md = " ".join(b.text for b in _flatten(ch.blocks) if b.kind == "markdown")
+    assert "no es necesariamente un error" in md.lower()
+    # Mentions the actionable options (winsorize / re-express).
+    assert "winsoriz" in md.lower()
+    assert "re-expres" in md.lower() or "logarítmic" in md.lower()
+
+
+def test_golden_terminos_glosario_registrados():
+    profile, ctx = _profile_and_ctx()
+    gloss = model.GlossaryCollector()
+    ctx["glossary"] = gloss
+    build_outliers(profile, ctx)
+    for key in _TERM_DEFS:
+        assert gloss.has(key), f"término '{key}' no registrado en el glosario"
+    # Terms are marked clickable in the body text.
+    md = " ".join(b.text for b in _flatten(build_outliers(profile, ctx).blocks)
+                  if b.kind == "markdown")
+    assert "[[term:outlier]]" in md and "[[term:tukey_fence]]" in md
+
+
+# --------------------------------------------------------------------------- #
+# Multivariate.
+# --------------------------------------------------------------------------- #
+def test_multivariante_live_con_raw_y_dims():
+    # With a raw sample the chapter runs Isolation Forest live (over the same
+    # columns summarize_outlier_dims uses) and lists the anomalous rows with the
+    # dimensions that make each one rare.
+    profile, ctx = _profile_and_ctx(with_models=False, with_raw=True)
+    ch = build_outliers(profile, ctx)
+    flat = _flatten(ch.blocks)
+    kv = next(b for b in flat if b.kind == "kv_table")
+    flat_kv = " ".join(f"{k} {v}" for (k, v) in kv.rows)
+    assert "Filas atípicas" in flat_kv
+    # A non-zero number of anomalous rows is reported.
+    n_cell = dict(kv.rows).get("Filas atípicas")
+    assert n_cell not in (None, "—", "0"), f"sin filas atípicas: {n_cell}"
+    # The anomalous-rows table carries the per-row dimension breakdown.
+    tbls = [b for b in flat if b.kind == "data_table" and b.title
+            and "más atípicas" in b.title]
+    assert tbls, "falta la tabla de filas más atípicas"
+    assert any("hacen rara" in h for h in tbls[0].header), \
+        f"falta la columna de dimensiones: {tbls[0].header}"
+
+
+def test_multivariante_precomputed_sin_raw():
+    # Without a raw sample the chapter falls back to profile['models']['outliers']
+    # (lite preset path); the precomputed n_outliers (4) surfaces in the KV table.
+    profile, ctx = _profile_and_ctx(with_models=True, with_raw=False)
+    ch = build_outliers(profile, ctx)
+    kv = next(b for b in _flatten(ch.blocks) if b.kind == "kv_table")
+    assert any("4" in str(v) for (k, v) in kv.rows)
+
+
+def test_multivariante_ausente_degrada_a_nota():
+    # No models and no raw sample → an honest note, never a crash.
+    profile, ctx = _profile_and_ctx(with_models=False, with_raw=False)
+    ch = build_outliers(profile, ctx)
+    assert ch is not None
+    notes = [b.text for b in _flatten(ch.blocks) if b.kind == "note"]
+    assert any("Isolation Forest" in n for n in notes)
+
+
+# --------------------------------------------------------------------------- #
+# Edges / error path.
+# --------------------------------------------------------------------------- #
+def test_edge_sin_columnas_numericas_devuelve_none():
+    prof = {"columns": [{"name": "c", "inferred_type": "categorical",
+                         "categorical": {"top": [{"value": "x", "count": 3}]}}]}
+    assert build_outliers(prof, {}) is None
+
+
+def test_edge_solo_texto_sintetico_devuelve_none():
+    # A text-only synthetic table (no numeric column) yields None (does not break).
+    prof = {"table": "notas", "n_rows": 3, "n_cols": 1,
+            "columns": [{"name": "comentario", "inferred_type": "text",
+                         "text": {"n_docs": 3}}]}
+    assert build_outliers(prof, {}) is None
+
+
+def test_edge_profile_none_y_vacio_no_revienta():
+    assert build_outliers(None, None) is None
+    assert build_outliers({}, {}) is None
+    assert build_outliers({"columns": []}, {}) is None
+
+
+def test_edge_sin_raw_numeric_degrada_a_perfil():
+    # Without raw_numeric the chapter still builds, using the profile z-score
+    # counts; the univariate table exists and Tukey counts degrade to '—'.
+    profile, ctx = _profile_and_ctx(with_models=True, with_raw=False)
+    ch = build_outliers(profile, ctx)
+    assert ch is not None
+    table = next(b for b in _flatten(ch.blocks)
+                 if b.kind == "data_table" and b.title
+                 and "atípicos por columna" in b.title)
+    # Every row keeps the full 8-column shape even without the raw sample
+    # (z columns come from the profile; Tukey cells may read '—').
+    assert all(len(r) == 8 for r in table.rows)
+
+
+# --------------------------------------------------------------------------- #
+# Anti-cut render.
+# --------------------------------------------------------------------------- #
+def test_render_pdf_y_pptx_incluyen_el_capitulo():
+    profile, ctx = _profile_and_ctx()
+    # The renderers take the full profile and build the whole document through
+    # the chapter registry, so there is no standalone render: run the full
+    # pipeline and check that this chapter shows up in the output.
+    with tempfile.TemporaryDirectory() as d:
+        pdf = os.path.join(d, "out.pdf")
+        res_pdf = render_automatic_eda_pdf(profile, pdf,
+                                           {"write_manifest": False, "ctx": ctx})
+        assert res_pdf["path"] == pdf
+        txt = _pdf_text(pdf)
+        assert CHAPTER_TITLE in txt, "el capítulo OUTLIERS no aparece en el PDF"
+        assert "Fare" in txt
+        pptx = os.path.join(d, "out.pptx")
+        res_pptx = render_automatic_eda_pptx(profile, pptx,
+                                             {"write_manifest": False, "ctx": ctx})
+        assert res_pptx["path"] == pptx
+        assert res_pptx["n_slides"] >= 1
@@ -7,11 +7,21 @@ as needed, the renderers paginate):
   NOT carry the raw head, so this is read from ``ctx['head_rows']`` /
   ``profile['head_rows']`` (a list of row dicts). When absent the chapter shows
   an honest placeholder documenting the missing key instead of inventing data.
-2. Column dictionary — name / type / nulls / non-null examples. Examples come
+2. Column dictionary — name / type / nulls / non-null examples plus, when the
+   LLM layer ran, the business **description** and **unit** of each column so the
+   reader knows at a glance what every column is and in which unit. Examples come
   from ``columns[i]['examples']`` when present; otherwise they are derived from
   real non-null profile values (categorical top values, numeric min/median/max)
   so the cell is never empty nor fabricated.
-3. ``df.describe`` — mean / median / min / max / std for every numeric column.
+3. ``df.describe`` — mean / median / min / max / std for every numeric column,
+   plus its **unit** (same LLM source) so the stats read in context.
+
+The description/unit come from the ``llm`` block that ``eda_llm_insights`` (group
+``eda``) already stored in the profile (``profile['llm']['dictionary']``, a list
+of ``{"column","description","business_meaning","unit"}`` entries) — this chapter
+only **consumes** it, matching by column name; it never calls the LLM nor
+recomputes anything. When the block is absent (``run_llm`` did not run) those
+cells degrade to ``"—"`` and the tables still render.

 Contract: build_<id>(profile, ctx) -> Chapter | None ; CHAPTER_VERSION = "x.y.z".
 """
@@ -20,13 +30,59 @@ from __future__ import annotations

 from .. import model

-CHAPTER_VERSION = "1.1.0"
+CHAPTER_VERSION = "1.2.0"
 CHAPTER_ID = "overview"
 CHAPTER_TITLE = "Overview"

 # Profile/ctx keys the calculation phase must add for a full head + examples.
 HEAD_KEY = "head_rows"          # list[dict] — df.head(n)
 EXAMPLES_KEY = "examples"       # per column: list of non-null sample values
+LLM_KEY = "llm"                 # interpretive block from eda_llm_insights
+
+
+def _llm_dict_index(profile: dict, ctx: dict) -> dict:
+    """Map column name -> its LLM dictionary entry (description/unit/...).
+
+    Reads the ``llm.dictionary`` list that ``eda_llm_insights`` stored in the
+    profile (``profile['llm']``; falls back to ``ctx['llm']``). Returns an empty
+    dict when no LLM block ran, so the caller degrades to "—" cells. Fully
+    defensive: never raises on malformed input.
+    """
+    llm = profile.get(LLM_KEY)
+    if not isinstance(llm, dict):
+        llm = ctx.get(LLM_KEY)
+    if not isinstance(llm, dict):
+        return {}
+    entries = llm.get("dictionary")
+    if not isinstance(entries, (list, tuple)):
+        return {}
+    index: dict = {}
+    for e in entries:
+        if not isinstance(e, dict):
+            continue
+        col = e.get("column")
+        if col is None:
+            continue
+        index[model._safe_str(col)] = e
+    return index
+
+
+def _llm_desc(entry) -> str:
+    """Business description of a column from its LLM entry, or "—"."""
+    if not isinstance(entry, dict):
+        return "—"
+    raw = entry.get("description") or entry.get("business_meaning")
+    text = " ".join(model._safe_str(raw).split()) if raw is not None else ""
+    return text or "—"
+
+
+def _llm_unit(entry) -> str:
+    """Unit of a column from its LLM entry, or "—"."""
+    if not isinstance(entry, dict):
+        return "—"
+    raw = entry.get("unit")
+    text = " ".join(model._safe_str(raw).split()) if raw is not None else ""
+    return text or "—"


 def _fmt_num(value, decimals: int = 3) -> str:
@@ -104,9 +160,12 @@ def _head_block(profile: dict, ctx: dict):
        "pasarlo en ctx['head_rows'] para mostrar las primeras filas.")


-def _columns_block(profile: dict):
+def _columns_block(profile: dict, llm_index: dict):
    cols = profile.get("columns") or []
-    header = ["Columna", "Tipo", "Nulos", "Ejemplos (no nulos)"]
+    # Descripción / Unidad come from the LLM dictionary (matched by column name);
+    # they read "—" when run_llm did not run, so the table always renders.
+    header = ["Columna", "Tipo", "Nulos", "Ejemplos (no nulos)",
+              "Descripción", "Unidad"]
    rows = []
    for c in cols:
        if not isinstance(c, dict):
@@ -126,15 +185,18 @@ def _columns_block(profile: dict):
            nulls = str(null_count)
        else:
            nulls = "—"
-        rows.append([name, ctype, nulls, _examples_for(c)])
+        entry = llm_index.get(model._safe_str(name))
+        rows.append([name, ctype, nulls, _examples_for(c),
+                     _llm_desc(entry), _llm_unit(entry)])
    if not rows:
        return None
    return model.DataTable(header=header, rows=rows, title="Columnas")


-def _describe_block(profile: dict):
+def _describe_block(profile: dict, llm_index: dict):
    cols = profile.get("columns") or []
-    header = ["Columna", "mean", "median", "min", "max", "std"]
+    # "Unidad" (LLM source) lets the reader know in which unit each stat is.
+    header = ["Columna", "mean", "median", "min", "max", "std", "Unidad"]
    rows = []
    for c in cols:
        if not isinstance(c, dict) or c.get("inferred_type") != "numeric":
@@ -142,13 +204,16 @@ def _describe_block(profile: dict):
        num = c.get("numeric") or {}
        if not num:
            continue
+        name = c.get("name") or "(col)"
+        entry = llm_index.get(model._safe_str(name))
        rows.append([
-            c.get("name") or "(col)",
+            name,
            _fmt_num(num.get("mean")),
            _fmt_num(num.get("median")),
            _fmt_num(num.get("min")),
            _fmt_num(num.get("max")),
            _fmt_num(num.get("std")),
+            _llm_unit(entry),
        ])
    if not rows:
        return None
@@ -163,16 +228,18 @@ def build_overview(profile: dict, ctx: dict):
    if not cols and not (ctx.get(HEAD_KEY) or profile.get(HEAD_KEY)):
        return None

+    llm_index = _llm_dict_index(profile, ctx)
+
    blocks = [
        model.Heading(text="Primeras filas (df.head)", level=2),
        _head_block(profile, ctx),
    ]
-    cols_block = _columns_block(profile)
+    cols_block = _columns_block(profile, llm_index)
    if cols_block is not None:
        blocks.append(model.Heading(
            text="Diccionario de columnas", level=2))
        blocks.append(cols_block)
-    desc_block = _describe_block(profile)
+    desc_block = _describe_block(profile, llm_index)
    if desc_block is not None:
        blocks.append(model.Heading(
            text="Resumen estadístico numérico", level=2))
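
The name-matched lookup and its fallback to "—" can be exercised in isolation. The sketch below is a simplified stand-in for `_llm_dict_index` / `_llm_unit`: the names `llm_index` and `cell` are hypothetical, and plain `str` replaces `model._safe_str`, whose exact behaviour is not shown in this diff.

```python
def llm_index(profile, ctx):
    """Map column name -> LLM dictionary entry; {} when no usable block exists."""
    llm = profile.get("llm")
    if not isinstance(llm, dict):
        llm = ctx.get("llm")  # fallback path: block passed through ctx
    if not isinstance(llm, dict):
        return {}
    entries = llm.get("dictionary")
    if not isinstance(entries, (list, tuple)):
        return {}
    # Malformed entries (non-dicts, missing "column") are skipped, not raised on.
    return {str(e["column"]): e for e in entries
            if isinstance(e, dict) and e.get("column") is not None}


def cell(entry, key):
    """Whitespace-normalised value of entry[key], or the placeholder "—"."""
    if not isinstance(entry, dict):
        return "—"
    raw = entry.get(key)
    text = " ".join(str(raw).split()) if raw is not None else ""
    return text or "—"


profile = {"llm": {"dictionary": [
    {"column": "Pclass", "unit": " clase  (1-3) "},
    "junk",  # ignored
]}}
idx = llm_index(profile, {})
unit_known = cell(idx.get("Pclass"), "unit")      # "clase (1-3)"
unit_missing = cell(idx.get("Survived"), "unit")  # "—"
```

Building the index once and passing it to both table builders keeps the lookup cost at one pass over the dictionary regardless of how many tables consume it.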
@@ -56,7 +56,21 @@ def _head_rows() -> list:
    ]


-def _profile(with_head: bool = True) -> dict:
+def _llm() -> dict:
+    """Interpretive block as eda_llm_insights stores it under profile['llm']."""
+    return {
+        "summary": "Pasajeros del Titanic.",
+        "dictionary": [
+            {"column": "PassengerId", "description": "Identificador del pasajero",
+             "business_meaning": "Clave única de cada pasajero", "unit": "id"},
+            {"column": "Pclass", "description": "Clase del billete",
+             "business_meaning": "Clase socioeconómica", "unit": "clase (1-3)"},
+            # No entry for Survived/Name/Sex on purpose -> they degrade to "—".
+        ],
+    }
+
+
+def _profile(with_head: bool = True, with_llm: bool = False) -> dict:
    prof = {
        "table": "titanic",
        "source": "/data/titanic.csv",
@@ -68,6 +82,8 @@ def _profile(with_head: bool = True) -> dict:
    }
    if with_head:
        prof["head_rows"] = _head_rows()
+    if with_llm:
+        prof["llm"] = _llm()
    return prof


@@ -185,3 +201,70 @@ def test_edge_none_y_vacio_no_rompen():
    assert ch is not None
    tables = [b for b in _flatten(ch.blocks) if isinstance(b, DataTable)]
    assert tables and len(tables[0].rows) == 3
+
+
+def _table_by_header(blocks, marker: str):
+    """Return the first DataTable whose header contains ``marker``."""
+    for b in _flatten(blocks):
+        if isinstance(b, DataTable) and marker in b.header:
+            return b
+    return None
+
+
+def test_golden_diccionario_lleva_descripcion_y_unidad_del_llm():
+    # With run_llm: the column dictionary gains "Descripción" and "Unidad"
+    # columns populated from profile['llm']['dictionary'], matched by name.
+    ch = build_overview(_profile(with_llm=True), {})
+    assert ch is not None
+    dic = _table_by_header(ch.blocks, "Descripción")
+    assert dic is not None
+    assert dic.header == ["Columna", "Tipo", "Nulos", "Ejemplos (no nulos)",
+                          "Descripción", "Unidad"]
+    by_name = {row[0]: row for row in dic.rows}
+    # PassengerId has an LLM entry -> description + unit populated.
+    assert by_name["PassengerId"][4] == "Identificador del pasajero"
+    assert by_name["PassengerId"][5] == "id"
+    assert by_name["Pclass"][5] == "clase (1-3)"
+    # Columns with no LLM entry degrade to "—" without breaking the row.
+    assert by_name["Survived"][4] == "—" and by_name["Survived"][5] == "—"
+
+
+def test_golden_describe_lleva_unidad_del_llm():
+    ch = build_overview(_profile(with_llm=True), {})
+    desc = _table_by_header(ch.blocks, "std")
+    assert desc is not None
+    assert desc.header[-1] == "Unidad"
+    by_name = {row[0]: row for row in desc.rows}
+    assert by_name["PassengerId"][-1] == "id"
+    assert by_name["Pclass"][-1] == "clase (1-3)"
+    # Numeric column with no LLM unit still renders, unit "—".
+    assert by_name["Survived"][-1] == "—"
+
+
+def test_edge_sin_llm_descripcion_unidad_son_guion():
+    # No profile['llm'] at all: the new cells degrade to "—" and nothing breaks.
+    ch = build_overview(_profile(), {})
+    assert ch is not None
+    dic = _table_by_header(ch.blocks, "Unidad")
+    assert dic is not None
+    for row in dic.rows:
+        assert row[4] == "—" and row[5] == "—"
+    desc = _table_by_header(ch.blocks, "std")
+    assert all(row[-1] == "—" for row in desc.rows)
+
+
+def test_golden_llm_via_ctx_tambien_funciona():
+    # LLM block arriving through ctx['llm'] (fallback path) is consumed too.
+    ch = build_overview(_profile(with_llm=False), {"llm": _llm()})
+    dic = _table_by_header(ch.blocks, "Descripción")
+    by_name = {row[0]: row for row in dic.rows}
+    assert by_name["PassengerId"][5] == "id"
+
+
+def test_golden_render_pdf_muestra_descripcion_y_unidad():
+    with tempfile.TemporaryDirectory() as d:
+        out = os.path.join(d, "eda.pdf")
+        render_automatic_eda_pdf(_profile(with_llm=True), out, {"title": "EDA"})
+        txt = _pdf_text(out)
+        assert "Descripción" in txt and "Unidad" in txt
+        assert "Identificador del pasajero" in txt
@@ -26,7 +26,7 @@ from datetime import datetime, timezone

 from .. import model

-CHAPTER_VERSION = "1.2.0"
+CHAPTER_VERSION = "1.4.0"
 CHAPTER_ID = "portada"
 CHAPTER_TITLE = "Portada"

@@ -35,12 +35,9 @@ CHAPTER_TITLE = "Portada"
 # row represents) from it when the LLM layer ran (``run_llm``).
 _LLM_KEY = "llm"

-# Default human description of what the table quality score measures. Chapters
-# can override it via ctx["quality_criteria"].
-_DEFAULT_QUALITY_CRITERIA = (
-    "media de los scores por columna (0–100): completitud (sin nulos/vacíos), "
-    "validez (tipo y rango coherentes) y consistencia (sin duplicados/constantes)."
-)
+# Font size (pt) for the dataset name on the PPTX cover slide — notably larger
+# than the default H1 so the dataset name stands out (shown underlined too).
+_PPTX_TITLE_PT = 44.0


 def _storage_from_source(source: str) -> str:
@@ -120,11 +117,20 @@ def _summary_blocks(summary) -> list:

    blocks = [model.Heading(text="Resumen del análisis", level=2)]
    if rows:
-        blocks.append(model.KVTable(rows=rows))
+        # Values pinned to the right margin (numbers flush right, label left).
+        blocks.append(model.KVTable(rows=rows, value_align="right"))
    if titles:
-        bullets = "\n".join(f"- {model._safe_str(t)}" for t in titles)
-        blocks.append(model.Markdown(
-            text="Este informe incluye los siguientes capítulos:\n" + bullets))
+        # Clickable index ("Índice"): one TocEntry per chapter title. Each entry
+        # becomes a real jump to that chapter's first page/slide once the document
+        # is laid out (the renderers register every chapter start and wire the
+        # links; ``target_id`` is matched against the chapter title). The cover only
+        # knows chapter titles, so the title doubles as the link target.
+        blocks.append(model.Heading(text="Índice", level=2))
+        for t in titles:
+            label = model._safe_str(t)
+            if not label:
+                continue
+            blocks.append(model.TocEntry(label=label, target_id=label))
    return blocks


@@ -213,9 +219,7 @@ def _derive_description(profile: dict, ctx: dict) -> str:
    score = profile.get("quality_score")
    if score is not None:
        parts.append(f"Calidad media estimada: {score}/100.")
-    parts.append(
-        "Resumen derivado del perfil; active la interpretación LLM (`run_llm`) "
-        "para una descripción de negocio más rica.")
+    parts.append("Resumen derivado del perfil.")
    return " ".join(parts)


@@ -259,7 +263,6 @@ def build_portada(profile: dict, ctx: dict):
    shape = f"{_fmt_int(n_rows)} filas × {_fmt_int(n_cols)} columnas"

    score = profile.get("quality_score")
-    quality_criteria = ctx.get("quality_criteria") or _DEFAULT_QUALITY_CRITERIA
    quality_value = "—" if score is None else f"{score} / 100"

    llm = _llm_block(profile, ctx)
@@ -282,8 +285,11 @@ def build_portada(profile: dict, ctx: dict):

    # Title + dataset size shown together and BIG (Heading) at the top, kept on
    # the same page (Group). The size is no longer buried in the metadata table.
+    # The dataset name is shown big and underlined on the PPTX cover slide
+    # (size_pt/underline are honoured by the PPTX renderer; the PDF ignores them).
    cover = [
-        model.Heading(text=str(dataset_name), level=1),
+        model.Heading(text=str(dataset_name), level=1, underline=True,
+                      size_pt=_PPTX_TITLE_PT),
        model.Markdown(text="**Automatic-EDA** · informe exploratorio automático"),
        model.Heading(text=shape, level=2),
    ]
@@ -295,7 +301,6 @@ def build_portada(profile: dict, ctx: dict):
            ("Almacenamiento", storage),
            ("Generado", when),
            ("Calidad", quality_value),
-            ("Criterios de calidad", quality_criteria),
        ]),
        model.Heading(text="Descripción", level=2),
        model.Markdown(text=str(description)),
@@ -34,6 +34,7 @@ CHAPTER_ORDER = [
    "text_distr",    # free-text / NLP distributions (non-tabular content)
    "calidad",       # data quality
    "missingness",   # missing-data patterns (co-occurrence of absences; MCAR/MAR)
+    "outliers",      # atypical values: univariate (Tukey/z) + multivariate (IsolationForest)
    "correlacion",   # correlations / associations
    "relaciones",    # key relations: declared/candidate PK + FK (inter/intra-table)
    "modelos",       # cheap models (PCA/KMeans/outliers)
@@ -72,24 +73,51 @@ def build_chapter(chapter_id: str, profile: dict, ctx: dict):
    return model.as_chapter(result)


-def build_document(profile: dict, ctx: dict = None) -> list:
-    """Build the full ordered list of chapters for a TableProfile.
+def build_document(profile: dict, ctx: dict = None, only: list = None) -> list:
+    """Build the ordered list of chapters for a TableProfile.

    Args:
        profile: the ``eda`` group TableProfile dict (may be None/empty).
        ctx: optional context dict carrying presentation metadata not present in
            the profile (dataset_name, source_origin, storage, generated_at,
            description, granularity, quality_criteria, head_rows, ...).
+        only: optional list of chapter ids to render. ``None`` (default) keeps
+            the historical behaviour — every implemented & applicable chapter in
+            canonical order. A list restricts the BODY to just those ids (in
+            canonical order), but the cover (``portada``) and glossary
+            (``glosario``) are ALWAYS included so the document stays valid and
+            the clickable terms keep a destination — so passing ``only=["x"]``
+            yields portada + x + glosario. Unknown ids are simply skipped (the
+            caller is responsible for strict validation). ``only=[]`` yields the
+            minimal document (portada + glosario only). This argument is additive
+            and backward-compatible: existing callers that omit it keep the
+            previous behaviour (default ``None``).

    Returns:
        list[Chapter] in canonical order, containing only the chapters that are
-        implemented and applicable. Never raises.
+        implemented, applicable and selected. Never raises.
    """
    if not isinstance(profile, dict):
        profile = {}
    # Copy ctx so the shared collector / summary we add do not leak to the caller.
    ctx = dict(ctx) if isinstance(ctx, dict) else {}

+    # only=None -> all body chapters (historical). only=list -> restrict body to
+    # that selection (portada/glosario are added unconditionally below). The
+    # renderers call build_document(profile, meta['ctx']) without an `only`
+    # argument, so the pipeline forwards the selection through a reserved ctx key
+    # (``_only_chapters``); an explicit `only` argument always wins. The key is
+    # popped from the local ctx copy so it never reaches the chapters.
+    if only is None:
+        _carried = ctx.pop("_only_chapters", None)
+        if isinstance(_carried, (list, tuple, set)):
+            only = list(_carried)
+    else:
+        ctx.pop("_only_chapters", None)
+    # A set makes the membership test cheap; the iteration order stays
+    # CHAPTER_ORDER. only=[] is a valid (empty) selection -> minimal document.
+    only_set = set(only) if isinstance(only, (list, tuple, set)) else None
+
    # A single glossary collector is shared by every chapter via ctx['glossary'].
    # Chapters call ctx['glossary'].add(key, label, definition) and mark in-text
    # appearances with [[term:key]]…[[/term]]; the glosario chapter renders the
@@ -105,6 +133,10 @@ def build_document(profile: dict, ctx: dict = None) -> list:
    for cid in CHAPTER_ORDER:
        if cid in (_PORTADA, _GLOSARIO):
            continue
+        # When a selection is given, skip body chapters outside it. portada and
+        # glosario are never filtered (handled out of this loop).
+        if only_set is not None and cid not in only_set:
+            continue
        ch = build_chapter(cid, profile, ctx)
        if ch is not None and ch.blocks:
            body.append(ch)
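
The selection rules described in the `only` docstring can be sketched in isolation. This is a minimal, hedged illustration and not the project's code: `select_chapter_ids` is a hypothetical helper, and the body ids `resumen` and `distribuciones` are made up for the example (only `portada`, `glosario` and the `_only_chapters` key come from the diff).

```python
# Hypothetical canonical order; only "portada" and "glosario" are real ids.
CHAPTER_ORDER = ["portada", "resumen", "distribuciones", "glosario"]
_PORTADA, _GLOSARIO = "portada", "glosario"


def select_chapter_ids(only=None, ctx=None):
    """Return the chapter ids that would be built, in canonical order."""
    ctx = dict(ctx) if isinstance(ctx, dict) else {}
    if only is None:
        # No explicit argument: fall back to the selection carried in ctx.
        carried = ctx.pop("_only_chapters", None)
        if isinstance(carried, (list, tuple, set)):
            only = list(carried)
    else:
        # An explicit argument wins; the reserved key is still discarded.
        ctx.pop("_only_chapters", None)
    only_set = set(only) if isinstance(only, (list, tuple, set)) else None
    body = [cid for cid in CHAPTER_ORDER
            if cid not in (_PORTADA, _GLOSARIO)
            and (only_set is None or cid in only_set)]
    # Cover and glossary are always present, whatever the selection.
    return [_PORTADA, *body, _GLOSARIO]


everything = select_chapter_ids()
single = select_chapter_ids(only=["distribuciones"])
minimal = select_chapter_ids(only=[])
carried = select_chapter_ids(ctx={"_only_chapters": ["resumen"]})
```

Note that the order of `only` does not matter: the output always follows `CHAPTER_ORDER`, and unknown ids drop out silently.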
@@ -38,10 +38,18 @@ ENGINE_NAME = "AutomaticEDA"
 # --------------------------------------------------------------------------- #
@dataclass
 class Heading:
-    """A section heading. ``level`` 1 (largest) .. 3 (smallest)."""
+    """A section heading. ``level`` 1 (largest) .. 3 (smallest).
+
+    ``underline`` and ``size_pt`` are optional emphasis hints honoured by the
+    PPTX renderer (the cover uses them to show the dataset name big and
+    underlined). ``size_pt`` overrides the per-level font size when set; the PDF
+    renderer ignores both so its layout is unchanged.
+    """

    text: str = ""
    level: int = 1
+    underline: bool = False
+    size_pt: Optional[float] = None
    kind: str = field(default="heading", init=False)


@@ -62,10 +70,17 @@ class Markdown:

@dataclass
 class KVTable:
-    """A two-column key/value table. ``rows`` is a list of ``(label, value)``."""
+    """A two-column key/value table. ``rows`` is a list of ``(label, value)``.
+
+    ``value_align`` controls the horizontal alignment of the value column in the
+    PDF renderer: ``"left"`` (default) keeps values next to the label column;
+    ``"right"`` pins them to the right margin (used by the cover's analysis
+    summary so the numbers line up flush right).
+    """

    rows: list = field(default_factory=list)
    title: Optional[str] = None
+    value_align: str = "left"
    kind: str = field(default="kv_table", init=False)


@@ -145,11 +160,21 @@ class Group:
    a chapter can give each unit its own page — e.g. one categorical column per
    page (see CAT DISTR). It is purely additive: the default False keeps the plain
    keep-together behaviour for every existing chapter.
+
+    ``layout`` is a hint for how the group's children are arranged:
+    ``"stack"`` (default) keeps the historical top-to-bottom flow; ``"side_by_side"``
+    asks the PPTX renderer to place the group's table to the LEFT and its figure to
+    the RIGHT of the same slide (table ~55% width, figure ~45%), measuring so both
+    fit and falling back to stacking when they do not. The PDF renderer treats
+    ``"side_by_side"`` exactly like ``"stack"`` (the A5 mobile page is too narrow for
+    two readable columns). Unknown values degrade to ``"stack"``. Purely additive:
+    the default keeps every existing chapter unchanged.
    """

    blocks: list = field(default_factory=list)
    title: Optional[str] = None
    page_break_before: bool = False
+    layout: str = "stack"
    kind: str = field(default="group", init=False)


@@ -168,6 +193,22 @@ class GlossaryEntry:
    kind: str = field(default="glossary_entry", init=False)


+@dataclass
+class TocEntry:
+    """One clickable index (table-of-contents) entry shown on the cover.
+
+    Rendered as a single line — the chapter ``label`` in the accent link colour —
+    that, once the document is laid out, becomes a real click jumping to the first
+    page/slide of the target chapter (PDF link annotation via PyMuPDF; PPTX native
+    slide jump). ``target_id`` is matched against each chapter's ``id`` *and* its
+    ``title`` (the cover only knows chapter titles), so either resolves. If the
+    target cannot be resolved the entry still renders as plain text (never cut)."""
+
+    label: str = ""
+    target_id: str = ""
+    kind: str = field(default="toc_entry", init=False)
+
+
@dataclass
 class Chapter:
    """An ordered set of blocks with an id, a title and a generation version."""
@@ -192,13 +233,14 @@ _BLOCK_BY_KIND = {
    "note": Note,
    "group": Group,
    "glossary_entry": GlossaryEntry,
+    "toc_entry": TocEntry,
 }


 def as_block(obj: Any):
    """Coerce a value into a block dataclass. Unknown values become a Note."""
    if isinstance(obj, (Heading, Markdown, KVTable, DataTable, Figure, Image,
-                        Caption, Note, Group, GlossaryEntry)):
+                        Caption, Note, Group, GlossaryEntry, TocEntry)):
        if isinstance(obj, Group):
            obj.blocks = as_blocks(obj.blocks)
        return obj
@@ -210,13 +252,20 @@ def as_block(obj: Any):
        # Build only with fields the dataclass accepts (ignore extras).
        try:
            if cls is Heading:
+                size_pt = obj.get("size_pt")
                return Heading(text=_safe_str(obj.get("text")),
-                               level=int(obj.get("level", 1) or 1))
+                               level=int(obj.get("level", 1) or 1),
+                               underline=bool(obj.get("underline", False)),
+                               size_pt=(float(size_pt)
+                                        if isinstance(size_pt, (int, float))
+                                        else None))
            if cls is Markdown:
                return Markdown(text=_safe_str(obj.get("text")))
            if cls is KVTable:
                return KVTable(rows=list(obj.get("rows") or []),
-                               title=obj.get("title"))
+                               title=obj.get("title"),
+                               value_align=_safe_str(
+                                   obj.get("value_align")) or "left")
            if cls is DataTable:
                return DataTable(header=list(obj.get("header") or []),
                                 rows=list(obj.get("rows") or []),
@@ -237,11 +286,15 @@ def as_block(obj: Any):
                return Group(blocks=as_blocks(obj.get("blocks")),
                             title=obj.get("title"),
                             page_break_before=bool(
-                                 obj.get("page_break_before", False)))
+                                 obj.get("page_break_before", False)),
+                             layout=_safe_str(obj.get("layout")) or "stack")
            if cls is GlossaryEntry:
                return GlossaryEntry(key=_safe_str(obj.get("key")),
                                     label=_safe_str(obj.get("label")),
                                     definition=_safe_str(obj.get("definition")))
+            if cls is TocEntry:
+                return TocEntry(label=_safe_str(obj.get("label")),
+                                target_id=_safe_str(obj.get("target_id")))
        except Exception:  # noqa: BLE001 — never raise on a malformed block.
            return Note(text=_safe_str(obj))
    return Note(text=_safe_str(obj))
@@ -298,11 +298,16 @@ def test_cover_first_glossary_last_with_summary():
    headings = [b.text for b in cover.blocks if b.kind == "heading"]
    assert any("Resumen" in h for h in headings), \
        "la portada no incluye el resumen agregado"
-    # The summary reflects the body chapters (e.g. the numeric/categorical ones).
-    cover_text = " ".join(
-        b.text for b in cover.blocks if getattr(b, "kind", "") == "markdown")
-    assert "Distribuciones" in cover_text, \
-        "el resumen de portada no menciona los capítulos del cuerpo"
+    # The index ("Índice") is now a clickable list of TocEntry blocks (one per
+    # body chapter), not a markdown bullet list. Verify both the heading and that
+    # the entries name the body chapters.
+    assert any("Índice" in h for h in headings), \
+        "la portada no incluye la sección Índice"
+    toc_labels = " ".join(
+        getattr(b, "label", "") for b in cover.blocks
+        if getattr(b, "kind", "") == "toc_entry")
+    assert "Distribuciones" in toc_labels, \
+        "el índice de portada no menciona los capítulos del cuerpo"


 # --------------------------------------------------------------------------- #
@@ -46,11 +46,23 @@ _MUTED = "#8a8a8a"
 _RULE = "#cccccc"
 _HEAD_BG = "#eef3f6"

+# Rasterization DPI for every embedded raster (figure/table image) AND for the
+# page save itself. Raised from the old 150/default-100 to 220 so a reader can
+# pinch-zoom on a phone and still see crisp detail (axis labels, table cells)
+# without pixelation. Text stays vectorial (pdf.fonttype=42) so it remains
+# selectable regardless of DPI — only the embedded images gain resolution. 220 is
+# a deliberate balance: noticeably sharper than 150 while keeping the file size
+# reasonable. ``savefig.dpi`` matters because matplotlib re-rasterizes each
+# ``imshow`` when PdfPages writes the page; without it the final image would land
+# at ~100 dpi no matter how sharp the intermediate PNG was.
+_RASTER_DPI = 220
+
 _RC = {
    "font.size": 10,
    "font.family": "sans-serif",
    "figure.facecolor": "white",
    "savefig.facecolor": "white",
+    "savefig.dpi": _RASTER_DPI,
    "pdf.fonttype": 42,  # embed TrueType — text stays selectable on mobile.
 }

@@ -80,6 +92,10 @@ class _PdfState:
        # points (1/72") with a top-left origin — same convention as PyMuPDF.
        self.term_sources = []       # [{key, page, rect:[x0,y0,x1,y1]}]
        self.term_dests = {}         # key -> {page, point:[x,y]}
+        # Clickable index (cover → chapter). Sources are the cover's TocEntry
+        # rects; chapter_starts maps a chapter id AND its title to its first page.
+        self.toc_sources = []        # [{target_id, page, rect:[x0,y0,x1,y1]}]
+        self.chapter_starts = {}     # id|title -> {page, point:[x,y]}


 # --------------------------------------------------------------------------- #
@@ -317,10 +333,18 @@ def _place_kv_table(st: _PdfState, block) -> None:
    if title:
        _place_heading(st, model.Heading(title, level=2))
    rows = getattr(block, "rows", []) or []
+    # ``value_align="right"`` pins the value column to the right margin (label
+    # left, number flush right) — used by the cover's analysis summary.
+    right = str(getattr(block, "value_align", "left")).lower() == "right"
    key_w = 1.9  # inches reserved for the label column.
+    # Both alignments wrap the value against the same width: the usable width
+    # minus the label column (and a small gutter).
    val_chars = tl.chars_per_line(_USABLE_W - key_w - 0.1, _FS_BODY)
    lh = tl.line_height_in(_FS_BODY)
-    for row in rows:
+    # ``data_idx`` is the 0-based row index. The even rows in 1-based counting
+    # (2nd, 4th, ...) are zebra-shaded, i.e. the odd ``data_idx`` values, matching
+    # the data-table convention so every table in the document is striped alike.
+    for data_idx, row in enumerate(rows):
        try:
            label, value = row[0], row[1]
        except Exception:  # noqa: BLE001
@@ -329,11 +353,25 @@ def _place_kv_table(st: _PdfState, block) -> None:
        row_h = lh * len(v_lines) + _ROW_VPAD
        _ensure_space(st, row_h)
        y0 = st.y
+        # Faint zebra fill for even rows, drawn first (zorder 0) so striping
+        # never hides the text/value drawn on top.
+        if data_idx % 2 == 1:
+            st.fig.add_artist(Rectangle(
+                (_xf(_ML), _yf(y0 + row_h)), _xf(_ML + _USABLE_W) - _xf(_ML),
+                _yf(y0) - _yf(y0 + row_h), transform=st.fig.transFigure,
+                color=_ZEBRA, lw=0, zorder=0))
        st.fig.text(_xf(_ML), _yf(y0), tl.strip_inline_md(model._safe_str(label)),
-                    fontsize=_FS_BODY, color=_MUTED, ha="left", va="top")
+                    fontsize=_FS_BODY, color=_MUTED, ha="left", va="top",
+                    zorder=2)
        for k, vl in enumerate(v_lines):
+            if right:
+                st.fig.text(_xf(_ML + _USABLE_W), _yf(y0 + k * lh), vl,
+                            fontsize=_FS_BODY, color=_INK, ha="right",
+                            va="top", zorder=2)
+            else:
                st.fig.text(_xf(_ML + key_w), _yf(y0 + k * lh), vl,
-                        fontsize=_FS_BODY, color=_INK, ha="left", va="top")
+                            fontsize=_FS_BODY, color=_INK, ha="left",
+                            va="top", zorder=2)
        st.y = y0 + row_h
    st.y += _GAP

@@ -363,6 +401,57 @@ def _col_widths(header: list, rows: list, fs: float) -> list:
    return widths


+# Minimal legible characters reserved per column when deciding whether a table
+# can be shown as selectable text. Below this width per column the cells become
+# unreadable, so the table is rasterized to a zoomable high-res image instead.
+_MIN_LEGIBLE_CHARS = 8
+
+
+def _table_fits_as_text(header: list, rows: list) -> bool:
+    """True when the table fits the usable width as readable text.
+
+    A table whose columns cannot each get a minimal legible width within the A5
+    usable width (typically many columns, e.g. a 19-column ``df.head``) is flagged
+    so it is rendered as a single high-resolution image — the reader zooms in on
+    the phone and reads every cell, nothing cut — instead of being squeezed until
+    unreadable. Narrow tables (few columns) keep the selectable-text rendering."""
+    header = header or []
+    rows = rows or []
+    ncol = len(header) if header else (len(rows[0]) if rows else 1)
+    ncol = max(1, ncol)
+    cw = tl.avg_char_width_in(_FS_CELL)
+    min_needed = ncol * (_MIN_LEGIBLE_CHARS * cw + _CELL_PAD * 2)
+    return min_needed <= _USABLE_W
+
+
+def _table_figure_block(block):
+    """Wrap a too-wide table as a lazily-rasterized Figure (cached on the block).
+
+    The table is drawn once via ``render_table_as_figure`` (header shading + zebra)
+    and embedded as one high-res image scaled to fit entirely. The same Figure is
+    reused for measuring and placing so keep-together stays consistent. The table
+    title/note are drawn inside the image (self-describing when zoomed/shared), so
+    the block-level caption is left empty to avoid a duplicate title."""
+    cached = getattr(block, "_aeda_tablefig", None)
+    if cached is not None:
+        return cached
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    title = getattr(block, "title", None)
+    note = getattr(block, "note", None)
+
+    def _make():
+        from datascience.render_table_as_figure import render_table_as_figure
+        return render_table_as_figure(header, rows, title=title, note=note)
+
+    fig = model.Figure(make=_make, caption=None)
+    try:
+        block._aeda_tablefig = fig
+    except Exception:  # noqa: BLE001 — block may reject attributes; degrade.
+        pass
+    return fig
+
+
 def _wrap_row(cells: list, widths: list, fs: float) -> list:
    """Wrap each cell to its column width → list of line-lists per cell."""
    out = []
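
The legibility test in `_table_fits_as_text` reduces to one inequality: `ncol * (min_chars * char_width + 2 * pad) <= usable_width`. The sketch below reproduces that arithmetic with stand-in numbers. The usable width (5.0 in) and average character width (0.07 in) are assumptions chosen for illustration, not values taken from the renderer, and the 0.05 in padding is borrowed from the PPTX constant.

```python
def table_fits_as_text(ncol, usable_w_in, char_w_in,
                       min_chars=8, cell_pad_in=0.05):
    """True when every column can get a minimal legible width."""
    ncol = max(1, ncol)
    min_needed = ncol * (min_chars * char_w_in + 2 * cell_pad_in)
    return min_needed <= usable_w_in


# Assumed page geometry (hypothetical values, for illustration only).
USABLE_W_IN = 5.0
CHAR_W_IN = 0.07

# Each column needs 8 * 0.07 + 2 * 0.05 = 0.66 in under these assumptions.
fits_7 = table_fits_as_text(7, USABLE_W_IN, CHAR_W_IN)    # 4.62 in needed
fits_8 = table_fits_as_text(8, USABLE_W_IN, CHAR_W_IN)    # 5.28 in needed
fits_19 = table_fits_as_text(19, USABLE_W_IN, CHAR_W_IN)  # 12.54 in needed
```

Under these assumed numbers the cut-off falls between 7 and 8 columns, so a 19-column `df.head` would take the rasterized path.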
@@ -402,11 +491,16 @@ def _draw_table_row(st: _PdfState, cells_lines: list, widths: list, fs: float,


 def _place_data_table(st: _PdfState, block) -> None:
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    # Too many columns to be legible as text → render the whole table as one
+    # high-res image, scaled to fit entirely (the reader zooms to read it).
+    if not _table_fits_as_text(header, rows):
+        _place_figure(st, _table_figure_block(block))
+        return
    title = getattr(block, "title", None)
    if title:
        _place_heading(st, model.Heading(title, level=2))
-    header = list(getattr(block, "header", []) or [])
-    rows = list(getattr(block, "rows", []) or [])
    fs = _FS_CELL
    widths = _col_widths(header, rows, fs)
    header_lines = _wrap_row(header, widths, fs) if header else None
@@ -464,8 +558,11 @@ def _resolve_figure(block):


 def _png_from_figure(fig) -> bytes:
+    # ``bbox_inches='tight'`` is kept so the real aspect ratio is what we measure
+    # and place. The page save (savefig.dpi in _RC) re-rasterizes this at the same
+    # high DPI, so the embedded image stays crisp for phone zoom.
    buf = io.BytesIO()
-    fig.savefig(buf, format="png", dpi=150, bbox_inches="tight")
+    fig.savefig(buf, format="png", dpi=_RASTER_DPI, bbox_inches="tight")
    buf.seek(0)
    return buf.read()
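
As a rough check on what the DPI change buys, the pixel size of a rasterized figure is its size in inches times the DPI. The sketch below is plain arithmetic with a hypothetical 5 x 3 in figure; it ignores the trimming that `bbox_inches="tight"` applies, so real dimensions will differ somewhat.

```python
def raster_size_px(width_in, height_in, dpi):
    """Pixel dimensions of a figure rasterized at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


old_px = raster_size_px(5, 3, 150)   # previous setting
new_px = raster_size_px(5, 3, 220)   # _RASTER_DPI in this change

# Pixel count grows with the square of the DPI ratio.
pixel_ratio = (new_px[0] * new_px[1]) / (old_px[0] * old_px[1])
```

The pixel count, and therefore roughly the uncompressed image payload, grows by about 2.15x, which is the file-size trade-off the comment on `_RASTER_DPI` refers to.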

@@ -707,12 +804,16 @@ def _measure_data_table(block) -> float:
    Counts the optional title heading, the wrapped header row, every wrapped data
    row (per-column wrap via the same ``_col_widths``/``_wrap_row`` the placer
    uses) and the optional note. Keep this in sync with ``_place_data_table``."""
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    # Mirror the placer: a too-wide table is drawn as a single image, so its
+    # keep-together height is the image's, not the (squeezed) text layout's.
+    if not _table_fits_as_text(header, rows):
+        return _measure_figure_like(_table_figure_block(block))
    h = 0.0
    title = getattr(block, "title", None)
    if title:
        h += _measure_heading_text(title, 2)
-    header = list(getattr(block, "header", []) or [])
-    rows = list(getattr(block, "rows", []) or [])
    fs = _FS_CELL
    widths = _col_widths(header, rows, fs)
    lh = tl.line_height_in(fs)
@@ -744,6 +845,10 @@ def _measure_block(st: _PdfState, block) -> float:
            lines = tl.wrap(getattr(block, "text", ""),
                            tl.chars_per_line(_USABLE_W, _FS_NOTE))
            return tl.line_height_in(_FS_NOTE) * len(lines) + _GAP
+        if kind == "toc_entry":
+            lines = tl.wrap(tl.strip_inline_md(getattr(block, "label", "")),
+                            tl.chars_per_line(_USABLE_W - 0.22, _FS_BODY)) or [""]
+            return tl.line_height_in(_FS_BODY) * len(lines) + _GAP * 0.4
        if kind == "kv_table":
            return _measure_kv_table(block)
        if kind == "data_table":
@@ -828,6 +933,38 @@ def _place_glossary_entry(st: _PdfState, block) -> None:
    st.y += _GAP * 0.5


+def _place_toc_entry(st: _PdfState, block) -> None:
+    """Render one clickable index line and record it as a link source.
+
+    Drawn as a bulleted line in the accent link colour; its rectangle is recorded
+    in ``st.toc_sources`` so the post-processor turns it into a real jump to the
+    target chapter's first page. If the target is never resolved the line still
+    shows as plain (accent) text — never cut, never broken."""
+    label = tl.strip_inline_md(getattr(block, "label", "")) or ""
+    target_id = getattr(block, "target_id", "") or ""
+    fs = _FS_BODY
+    lh = tl.line_height_in(fs)
+    bullet = "•  "
+    indent = 0.22
+    max_chars = tl.chars_per_line(_USABLE_W - indent, fs)
+    lines = tl.wrap(label, max_chars) or [""]
+    for idx, ln in enumerate(lines):
+        _ensure_space(st, lh)
+        x = _ML
+        st.fig.text(_xf(x), _yf(st.y), bullet if idx == 0 else "   ",
+                    fontsize=fs, color=_LINK, ha="left", va="top")
+        x += indent
+        w = _text_width_in(st, ln, fs, False)
+        st.fig.text(_xf(x), _yf(st.y), ln, fontsize=fs, color=_LINK,
+                    ha="left", va="top")
+        if target_id and idx == 0:
+            st.toc_sources.append({
+                "target_id": target_id, "page": st.page - 1,
+                "rect": _pt_rect(_ML, st.y, x + w, st.y + lh)})
+        st.y += lh
+    st.y += _GAP * 0.4
+
+
 _PLACERS = {
    "heading": _place_heading,
    "markdown": _place_markdown,
@@ -839,6 +976,7 @@ _PLACERS = {
    "note": _place_note,
    "group": _place_group,
    "glossary_entry": _place_glossary_entry,
+    "toc_entry": _place_toc_entry,
 }


@@ -870,6 +1008,15 @@ def render_pdf(chapters: list, out_path: str, meta: dict = None) -> dict:
                    st.chapter = ch
                    st.chapter_pages = 0
                    _new_page(st)  # each chapter starts on a fresh page.
+                    # Record this chapter's first page as a link target for the
+                    # cover index (keyed by id AND title, since the cover only
+                    # knows titles). Point is the top of the content area.
+                    _start = {"page": st.page - 1,
+                              "point": [_ML * 72.0, _CONTENT_TOP * 72.0]}
+                    if ch.id:
+                        st.chapter_starts[ch.id] = _start
+                    if getattr(ch, "title", ""):
+                        st.chapter_starts.setdefault(ch.title, _start)
                    for block in ch.blocks:
                        placer = _PLACERS.get(getattr(block, "kind", ""),
                                              _place_note)
@@ -902,7 +1049,7 @@ def render_pdf(chapters: list, out_path: str, meta: dict = None) -> dict:

    note = f"{n_pages} páginas"
    if n_links:
-        note += f" · {n_links} enlaces de glosario"
+        note += f" · {n_links} enlaces internos"
    if notes:
        note += " · " + "; ".join(notes)
    return {"path": out_path, "n_pages": n_pages, "chapters": chapters_meta,
@@ -910,9 +1057,11 @@ def render_pdf(chapters: list, out_path: str, meta: dict = None) -> dict:


 def _wire_glossary_links(st: _PdfState, out_path: str, notes: list) -> int:
-    """Build {source rect → glossary dest} links and apply them via PyMuPDF.
+    """Apply internal PDF links via PyMuPDF: glossary terms + the cover index.

-    Returns the number of links applied (0 if there is nothing to wire or the
+    Builds two sets of GOTO links — every in-text glossary term → its entry, and
+    every cover ``TocEntry`` → its chapter's first page — and applies them in one
+    pass. Returns the number of links applied (0 if there is nothing to wire or the
    post-processor is unavailable). Never raises."""
    try:
        links = []
@@ -923,6 +1072,14 @@ def _wire_glossary_links(st: _PdfState, out_path: str, notes: list) -> int:
            links.append({
                "src_page": src["page"], "src_rect": src["rect"],
                "dst_page": dest["page"], "dst_point": dest["point"]})
+        # Cover index → chapter first page (clickable, navigable table of contents).
+        for src in st.toc_sources:
+            dest = st.chapter_starts.get(src.get("target_id"))
+            if not dest:
+                continue
+            links.append({
+                "src_page": src["page"], "src_rect": src["rect"],
+                "dst_page": dest["page"], "dst_point": dest["point"]})
        if not links:
            return 0
        from datascience.add_pdf_internal_links import add_pdf_internal_links
@@ -930,7 +1087,7 @@ def _wire_glossary_links(st: _PdfState, out_path: str, notes: list) -> int:
        if isinstance(res, dict) and res.get("status") == "ok":
            return int(res.get("n_links") or 0)
        if isinstance(res, dict) and res.get("error"):
-            notes.append(f"glosario sin enlaces: {res.get('error')}")
+            notes.append(f"enlaces internos no aplicados: {res.get('error')}")
    except Exception as e:  # noqa: BLE001 — links are best-effort.
-        notes.append(f"glosario sin enlaces: {e}")
+        notes.append(f"enlaces internos no aplicados: {e}")
    return 0
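
The cover-index wiring can be followed end to end with plain dicts. This is a hedged sketch of the lookup only: `record_chapter_start` and `resolve_toc_links` are hypothetical names, the chapter id `cat_distr` and the rectangles are invented, and the PyMuPDF step that actually writes the links is left out.

```python
def record_chapter_start(chapter_starts, chapter_id, title, page):
    """Register a chapter's first page under its id and its title."""
    start = {"page": page, "point": [0.0, 0.0]}  # placeholder point
    if chapter_id:
        chapter_starts[chapter_id] = start
    if title:
        # setdefault: a title never overwrites an existing key.
        chapter_starts.setdefault(title, start)


def resolve_toc_links(toc_sources, chapter_starts):
    """Pair each index rectangle with its destination, skipping unknowns."""
    links = []
    for src in toc_sources:
        dest = chapter_starts.get(src.get("target_id"))
        if not dest:
            continue  # unresolved entries simply stay as plain text
        links.append({"src_page": src["page"], "src_rect": src["rect"],
                      "dst_page": dest["page"], "dst_point": dest["point"]})
    return links


chapter_starts = {}
record_chapter_start(chapter_starts, "cat_distr", "Distribuciones", 3)
toc_sources = [
    {"target_id": "Distribuciones", "page": 0, "rect": [10, 20, 200, 34]},
    {"target_id": "cat_distr", "page": 0, "rect": [10, 40, 200, 54]},
    {"target_id": "missing", "page": 0, "rect": [10, 60, 200, 74]},
]
links = resolve_toc_links(toc_sources, chapter_starts)
```

Because titles are registered with `setdefault`, a key that is already present (for example an id identical to a later chapter's title) keeps its first destination.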
@@ -51,6 +51,12 @@ _FS_H1, _FS_H2, _FS_H3 = 20, 16, 13
 _FS_BODY, _FS_CELL, _FS_NOTE = 14, 11, 11
 _GAP = 0.12

+# Rasterization DPI for every embedded figure/table image. Raised from 150 to 220
+# so a viewer can zoom into a slide (or a shared picture) and read crisp detail —
+# axis labels, table cells — without pixelation. Kept moderate so the deck size
+# stays reasonable. Same value as the PDF renderer.
+_RASTER_DPI = 220
+

 class _PptxState:
    def __init__(self, prs, title: str):
@@ -65,6 +71,10 @@ class _PptxState:
        # Glossary wiring (mejora 6): runs to link and per-term target slide.
        self.term_runs = []           # [(key, run)]
        self.term_anchor_slide = {}   # key -> Slide (glossary entry)
+        # Clickable index (cover → chapter). toc_runs are the cover's index runs;
+        # chapter_starts maps a chapter id AND its title to its first slide.
+        self.toc_runs = []            # [(target_id, run, src_slide)]
+        self.chapter_starts = {}      # id|title -> Slide (chapter first slide)


 def _rgb(c):
@@ -135,7 +145,7 @@ def _ensure(st: _PptxState, height: float) -> None:


 def _add_text(st: _PptxState, lines: list, fs: float, color, bold=False,
-              italic=False, indent=0.0, bullet=False) -> None:
+              italic=False, indent=0.0, bullet=False, underline=False) -> None:
    lh = tl.line_height_in(fs)
    height = lh * len(lines) + 0.05
    _ensure(st, height)
@@ -153,6 +163,7 @@ def _add_text(st: _PptxState, lines: list, fs: float, color, bold=False,
        run.font.size = Pt(fs)
        run.font.bold = bold
        run.font.italic = italic
+        run.font.underline = underline
        run.font.color.rgb = _rgb(color)
    st.y += height

@@ -206,10 +217,16 @@ def _add_rich_text(st: _PptxState, rich_lines: list, fs: float, color,
 def _place_heading(st: _PptxState, block) -> None:
    level = max(1, min(3, int(getattr(block, "level", 1) or 1)))
    fs = {1: _FS_H1, 2: _FS_H2, 3: _FS_H3}[level]
+    # Optional per-heading emphasis (cover dataset name): a larger font and an
+    # underline. ``size_pt`` overrides the per-level size when set.
+    size_override = getattr(block, "size_pt", None)
+    if isinstance(size_override, (int, float)) and size_override > 0:
+        fs = float(size_override)
+    underline = bool(getattr(block, "underline", False))
    text = tl.strip_inline_md(getattr(block, "text", ""))
    st.last_heading = text or st.last_heading
    lines = tl.wrap(text, tl.chars_per_line(_USABLE_W, fs))
-    _add_text(st, lines, fs, _INK, bold=True)
+    _add_text(st, lines, fs, _INK, bold=True, underline=underline)
    st.y += 0.04


@@ -302,6 +319,58 @@ def _col_widths(header, rows):
    return [_USABLE_W * w / total for w in clamped]


+# Minimal legible characters reserved per column when deciding whether a table
+# can be shown as a native (selectable) PowerPoint table. Below this width per
+# column the cells become unreadable, so the table is rasterized to a zoomable
+# high-res image instead. The 16:9 slide is wide, so more columns fit than on A5.
+_MIN_LEGIBLE_CHARS = 8
+_CELL_PAD = 0.05
+
+
+def _table_fits_as_text(header: list, rows: list) -> bool:
+    """True when the table fits the usable slide width as a readable table.
+
+    A table whose columns cannot each get a minimal legible width within the slide
+    usable width (typically many columns, e.g. a 19-column ``df.head``) is flagged
+    so it is rendered as one high-resolution image — the viewer zooms in and reads
+    every cell — instead of being squeezed unreadable. Narrow tables keep the
+    native selectable table."""
+    header = header or []
+    rows = rows or []
+    ncol = len(header) if header else (len(rows[0]) if rows else 1)
+    ncol = max(1, ncol)
+    cw = tl.avg_char_width_in(_FS_CELL)
+    min_needed = ncol * (_MIN_LEGIBLE_CHARS * cw + _CELL_PAD * 2)
+    return min_needed <= _USABLE_W
+
+
+def _table_figure_block(block):
+    """Wrap a too-wide table as a lazily-rasterized Figure (cached on the block).
+
+    Drawn once via ``render_table_as_figure`` (header shading + zebra) and embedded
+    as one high-res image scaled to fit entirely. The title/note are drawn inside
+    the image (self-describing when zoomed/shared), so no separate caption is
+    emitted. Reused for measuring and placing so keep-together stays consistent."""
+    cached = getattr(block, "_aeda_tablefig", None)
+    if cached is not None:
+        return cached
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    title = getattr(block, "title", None)
+    note = getattr(block, "note", None)
+
+    def _make():
+        from datascience.render_table_as_figure import render_table_as_figure
+        return render_table_as_figure(header, rows, title=title, note=note)
+
+    fig = model.Figure(make=_make, caption=None)
+    try:
+        block._aeda_tablefig = fig
+    except Exception:  # noqa: BLE001 — block may reject attributes; degrade.
+        pass
+    return fig
+
+
 def _row_height_in(cells, widths, fs) -> float:
    lh = tl.line_height_in(fs)
    maxlines = 1
@@ -365,11 +434,27 @@ def _style_cell(cell, fs, color, bold, fill) -> None:

 def _place_data_table(st: _PptxState, block, shaded_header=True,
                      key_value=False) -> None:
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    # Too many columns to be legible as a native table → render the whole table as
+    # one high-res picture, scaled to fit entirely (the viewer zooms to read it).
+    # KVTables (rendered here as a 2-column Campo/Valor table) are excluded: they
+    # always fit in width and stay as a selectable table.
+    if not key_value and not _table_fits_as_text(header, rows):
+        figblock = _table_figure_block(block)
+        data, _asp = _figure_bytes_cached(figblock)
+        if data is None:
+            _add_text(st, ["(tabla no disponible)"], _FS_NOTE, _MUTED,
+                      italic=True)
+            st.y += _GAP
+            return
+        _place_picture_bytes(st, data, None,
+                             max_h_in=getattr(figblock, "height_in", None),
+                             force_caption=False)
+        return
    title = getattr(block, "title", None)
    if title:
        _place_heading(st, model.Heading(title, level=2))
-    header = list(getattr(block, "header", []) or [])
-    rows = list(getattr(block, "rows", []) or [])
    fs = _FS_CELL
    widths = _col_widths(header, rows)
    header_h = _row_height_in(header, widths, fs) if header else 0.0
@@ -429,7 +514,7 @@ def _resolve_png(block):
    try:
        import matplotlib.pyplot as plt
        buf = io.BytesIO()
-        f.savefig(buf, format="png", dpi=150, bbox_inches="tight")
+        f.savefig(buf, format="png", dpi=_RASTER_DPI, bbox_inches="tight")
        buf.seek(0)
        return buf.read()
    except Exception:  # noqa: BLE001
@@ -476,12 +561,15 @@ def _figure_bytes_cached(block):


 def _place_picture_bytes(st: _PptxState, data: bytes, caption,
-                         max_h_in=None) -> None:
+                         max_h_in=None, force_caption=True) -> None:
    # Mejora 4 — every figure on a slide carries a visible caption/title. If the
    # block has no caption, fall back to the current section heading, then to a
-    # generic label, so no image is ever shown untitled.
-    caption = (model._safe_str(caption).strip()
-               or model._safe_str(st.last_heading).strip() or "Figura")
+    # generic label, so no image is ever shown untitled. ``force_caption=False``
+    # suppresses that fallback (used for table images, whose title is inside the
+    # picture) so no redundant caption is drawn.
+    caption = model._safe_str(caption).strip()
+    if not caption and force_caption:
+        caption = model._safe_str(st.last_heading).strip() or "Figura"
    w_px, h_px = _img_size_px(data)
    aspect = (h_px / w_px) if w_px else 0.66
    # Reserve the caption's REAL (possibly multi-line) height FIRST, then scale
@@ -489,9 +577,11 @@ def _place_picture_bytes(st: _PptxState, data: bytes, caption,
    # so its caption always fits on the SAME slide and no image is untitled.
    # cap_real = what _add_text consumes; cap_reserve adds the post-image gap and
    # a small cushion so the caption never spills to the next slide.
-    cap_lines = tl.wrap(caption, tl.chars_per_line(_USABLE_W, _FS_NOTE))
-    cap_real = tl.line_height_in(_FS_NOTE) * len(cap_lines) + 0.05
-    cap_reserve = cap_real + 0.05 + 0.10
+    cap_lines = tl.wrap(caption, tl.chars_per_line(_USABLE_W, _FS_NOTE)) \
+        if caption else []
+    cap_real = (tl.line_height_in(_FS_NOTE) * len(cap_lines) + 0.05) \
+        if cap_lines else 0.0
+    cap_reserve = (cap_real + 0.05 + 0.10) if cap_lines else 0.05
    max_h = _CONTENT_BOTTOM - _CONTENT_TOP
    # height_in hint (model.Figure/Image): cap the target height so a figure in a
    # keep-together Group shrinks to leave room for its heading and text.
@@ -510,6 +600,7 @@ def _place_picture_bytes(st: _PptxState, data: bytes, caption,
    st.slide.shapes.add_picture(io.BytesIO(data), Inches(left), Inches(st.y),
                                width=Inches(target_w), height=Inches(target_h))
    st.y += target_h + 0.05
+    if cap_lines:
        _add_text(st, cap_lines, _FS_NOTE, _MUTED, italic=True)
    st.y += _GAP

@@ -552,9 +643,11 @@ def _place_note(st: _PptxState, block) -> None:
 # WITHOUT drawing it so a Group can move whole to the next slide before drawing.
 # Over-estimating only triggers an earlier slide break, never a content cut.
 # --------------------------------------------------------------------------- #
-def _measure_heading_text(text: str, level: int) -> float:
+def _measure_heading_text(text: str, level: int, size_pt=None) -> float:
    level = max(1, min(3, int(level or 1)))
    fs = {1: _FS_H1, 2: _FS_H2, 3: _FS_H3}[level]
+    if isinstance(size_pt, (int, float)) and size_pt > 0:
+        fs = float(size_pt)
    lines = tl.wrap(tl.strip_inline_md(text), tl.chars_per_line(_USABLE_W, fs))
    return tl.line_height_in(fs) * len(lines) + 0.05 + 0.04

@@ -654,12 +747,16 @@ def _measure_kv_table(block) -> float:
 def _measure_data_table(block) -> float:
    """Faithful DataTable height — matches ``_place_data_table`` (title heading +
    wrapped header + every wrapped row + optional note). Keep in sync."""
+    header = list(getattr(block, "header", []) or [])
+    rows = list(getattr(block, "rows", []) or [])
+    # Mirror the placer: a too-wide table is drawn as one image, so its
+    # keep-together height is the image's, not the (squeezed) table layout's.
+    if not _table_fits_as_text(header, rows):
+        return _measure_figure_like(_table_figure_block(block))
    h = 0.0
    title = getattr(block, "title", None)
    if title:
        h += _measure_heading_text(title, 2)
-    header = list(getattr(block, "header", []) or [])
-    rows = list(getattr(block, "rows", []) or [])
    fs = _FS_CELL
    widths = _col_widths(header, rows)
    if header:
@@ -679,7 +776,8 @@ def _measure_block(st: _PptxState, block) -> float:
    try:
        if kind == "heading":
            return _measure_heading_text(getattr(block, "text", ""),
-                                         getattr(block, "level", 1))
+                                         getattr(block, "level", 1),
+                                         size_pt=getattr(block, "size_pt", None))
        if kind == "markdown":
            return _measure_markdown(block)
        if kind in ("figure", "image"):
@@ -688,6 +786,10 @@ def _measure_block(st: _PptxState, block) -> float:
            lines = tl.wrap(getattr(block, "text", ""),
                            tl.chars_per_line(_USABLE_W, _FS_NOTE))
            return tl.line_height_in(_FS_NOTE) * len(lines) + 0.05 + _GAP
+        if kind == "toc_entry":
+            lines = tl.wrap(tl.strip_inline_md(getattr(block, "label", "")),
+                            tl.chars_per_line(_USABLE_W - 0.3, _FS_BODY)) or [""]
+            return tl.line_height_in(_FS_BODY) * len(lines) + 0.05
        if kind == "kv_table":
            return _measure_kv_table(block)
        if kind == "data_table":
@@ -800,6 +902,73 @@ def _fit_group_blocks(st: _PptxState, blocks: list, avail_full: float) -> list:
    return out


+def _fit_img(width_col: float, aspect: float, max_h: float):
+    """Scale an image to ``width_col`` then clamp to ``max_h`` keeping aspect."""
+    w = width_col
+    h = w * aspect
+    if h > max_h:
+        h = max_h
+        w = (h / aspect) if aspect else width_col
+    return w, h
+
+
+def _place_group_side_by_side(st: _PptxState, block, avail_full: float) -> bool:
+    """Place a Group's table (left ~55%) next to its figure (right ~45%).
+
+    Both the table and the figure are rasterized to high-res images and placed in
+    two columns of the SAME slide. Blocks that precede the pair (e.g. a heading)
+    render full width above it; the remaining blocks render full width below.
+    Returns True on success. Returns False (so the caller falls back to stacking)
+    when the group has no table+figure pair or the pair cannot fit side by side
+    on one slide. Never raises deliberately; the caller catches anything
+    unexpected."""
+    blocks = getattr(block, "blocks", []) or []
+    tbl = next((b for b in blocks
+                if getattr(b, "kind", "") in ("data_table", "kv_table")), None)
+    fig = next((b for b in blocks
+                if getattr(b, "kind", "") in ("figure", "image")), None)
+    if tbl is None or fig is None:
+        return False
+    gap_col = 0.3
+    left_w = _USABLE_W * 0.55 - gap_col / 2.0
+    right_w = _USABLE_W * 0.45 - gap_col / 2.0
+    if left_w <= 1.0 or right_w <= 1.0:
+        return False
+    tdata, tasp = _figure_bytes_cached(_table_figure_block(tbl))
+    fdata, fasp = _figure_bytes_cached(fig)
+    if not tdata or not fdata:
+        return False
+    ti, fi = blocks.index(tbl), blocks.index(fig)
+    lo = min(ti, fi)
+    lead = list(blocks[:lo])
+    rest = [b for b in blocks[lo + 1:] if b is not tbl and b is not fig]
+    lead_h = sum(_measure_block(st, b) for b in lead)
+    rest_h = sum(_measure_block(st, b) for b in rest)
+    col_max_h = avail_full - lead_h - rest_h - _GAP * 2
+    if col_max_h < 1.2:
+        return False  # not enough vertical room to put the pair side by side.
+    tw, th = _fit_img(left_w, tasp, col_max_h)
+    fw, fh = _fit_img(right_w, fasp, col_max_h)
+    band = max(th, fh)
+    needed = lead_h + band + rest_h + _GAP * 2
+    if needed > avail_full:
+        return False  # taller than a whole slide even side by side → stack.
+    if needed > _remaining(st):
+        _new_slide(st, cont=True)
+    for b in lead:
+        _PLACERS.get(getattr(b, "kind", ""), _place_note)(st, b)
+    top = st.y
+    f_left = _ML + left_w + gap_col
+    st.slide.shapes.add_picture(
+        io.BytesIO(tdata), Inches(_ML + (left_w - tw) / 2.0),
+        Inches(top + (band - th) / 2.0), width=Inches(tw), height=Inches(th))
+    st.slide.shapes.add_picture(
+        io.BytesIO(fdata), Inches(f_left + (right_w - fw) / 2.0),
+        Inches(top + (band - fh) / 2.0), width=Inches(fw), height=Inches(fh))
+    st.y = top + band + _GAP
+    for b in rest:
+        _PLACERS.get(getattr(b, "kind", ""), _place_note)(st, b)
+    return True
+
+
 def _place_group(st: _PptxState, block) -> None:
    """Render a keep-together Group: move it whole to the next slide if needed."""
    blocks = getattr(block, "blocks", []) or []
@@ -810,6 +979,14 @@ def _place_group(st: _PptxState, block) -> None:
    if getattr(block, "page_break_before", False) and st.y > _CONTENT_TOP + 1e-6:
        _new_slide(st, cont=True)
    avail_full = _CONTENT_BOTTOM - _CONTENT_TOP
+    # layout="side_by_side": try table-left / figure-right on one slide; if that
+    # is not possible for any reason, fall through to the normal stacked
+    # keep-together layout below.
+    if str(getattr(block, "layout", "stack")).lower() == "side_by_side":
+        try:
+            if _place_group_side_by_side(st, block, avail_full):
+                return
+        except Exception:  # noqa: BLE001 — degrade to stacking, never abort.
+            pass
    # Trim oversized tables first (keeps the chart on the same slide), then shrink
    # the figure to share the remaining room.
    blocks = _fit_group_blocks(st, blocks, avail_full)
@@ -843,6 +1020,44 @@ def _place_glossary_entry(st: _PptxState, block) -> None:
    st.y += _GAP


+def _place_toc_entry(st: _PptxState, block) -> None:
+    """Render one clickable index line and record its run as a link source.
+
+    Drawn as a bulleted line in the accent link colour; the run is recorded in
+    ``st.toc_runs`` so it later becomes a native slide-jump to the target chapter's
+    first slide. If the target is never resolved the line still shows as plain
+    (accent) text — never cut."""
+    label = tl.strip_inline_md(getattr(block, "label", "")) or ""
+    target_id = getattr(block, "target_id", "") or ""
+    fs = _FS_BODY
+    lines = tl.wrap(label, tl.chars_per_line(_USABLE_W - 0.3, fs)) or [""]
+    lh = tl.line_height_in(fs)
+    height = lh * len(lines) + 0.05
+    _ensure(st, height)
+    box = st.slide.shapes.add_textbox(
+        Inches(_ML), Inches(st.y), Inches(_USABLE_W), Inches(height))
+    tf = box.text_frame
+    tf.word_wrap = True
+    first = True
+    link_run = None
+    for idx, ln in enumerate(lines):
+        p = tf.paragraphs[0] if first else tf.add_paragraph()
+        first = False
+        r0 = p.add_run()
+        r0.text = "•  " if idx == 0 else "   "
+        r0.font.size = Pt(fs)
+        r0.font.color.rgb = _rgb(_LINK)
+        run = p.add_run()
+        run.text = ln
+        run.font.size = Pt(fs)
+        run.font.color.rgb = _rgb(_LINK)
+        if idx == 0:
+            link_run = run
+    if target_id and link_run is not None:
+        st.toc_runs.append((target_id, link_run, st.slide))
+    st.y += height
+
+
 _PLACERS = {
    "heading": _place_heading,
    "markdown": _place_markdown,
@@ -854,6 +1069,7 @@ _PLACERS = {
    "note": _place_note,
    "group": _place_group,
    "glossary_entry": _place_glossary_entry,
+    "toc_entry": _place_toc_entry,
 }


@@ -889,6 +1105,12 @@ def render_pptx(chapters: list, out_path: str, meta: dict = None) -> dict:
            st.chapter = ch
            st.chapter_slides = 0
            _new_slide(st, cont=False)
+            # Record this chapter's first slide as a link target for the cover
+            # index (keyed by id AND title, since the cover only knows titles).
+            if ch.id:
+                st.chapter_starts[ch.id] = st.slide
+            if getattr(ch, "title", ""):
+                st.chapter_starts.setdefault(ch.title, st.slide)
            for block in ch.blocks:
                placer = _PLACERS.get(getattr(block, "kind", ""), _place_note)
                try:
@@ -916,7 +1138,7 @@ def render_pptx(chapters: list, out_path: str, meta: dict = None) -> dict:

    note = f"{n_slides} slides"
    if n_links:
-        note += f" · {n_links} enlaces de glosario"
+        note += f" · {n_links} enlaces internos"
    if notes:
        note += " · " + "; ".join(notes)
    return {"path": out_path, "n_slides": n_slides, "chapters": chapters_meta,
@@ -924,19 +1146,21 @@ def render_pptx(chapters: list, out_path: str, meta: dict = None) -> dict:


 def _wire_glossary_links(st: _PptxState, notes: list) -> int:
-    """Turn each recorded term run into a native jump to its glossary slide.
+    """Apply native slide-jumps: glossary terms + the cover index.

-    Returns the number of links applied. A term whose only appearance is inside
-    its own glossary entry (source slide == target slide) is skipped. Never
+    Each in-text glossary term run jumps to its glossary entry slide, and each
+    cover ``TocEntry`` run jumps to its chapter's first slide. Returns the total
+    number of links applied. A run whose target is its own slide is skipped. Never
    raises."""
-    if not st.term_runs or not st.term_anchor_slide:
+    if not (st.term_runs and st.term_anchor_slide) and not (
+            st.toc_runs and st.chapter_starts):
        return 0
-    linked = 0
    try:
        from datascience.pptx_link_run_to_slide import pptx_link_run_to_slide
    except Exception as e:  # noqa: BLE001
-        notes.append(f"glosario sin enlaces: {e}")
+        notes.append(f"enlaces internos no aplicados: {e}")
        return 0
+    linked = 0
    for key, run, src_slide in st.term_runs:
        tgt = st.term_anchor_slide.get(key)
        if tgt is None or tgt is src_slide:
@@ -946,4 +1170,14 @@ def _wire_glossary_links(st: _PptxState, notes: list) -> int:
                linked += 1
        except Exception:  # noqa: BLE001 — links are best-effort.
            pass
+    # Cover index → chapter first slide (clickable, navigable table of contents).
+    for target_id, run, src_slide in st.toc_runs:
+        tgt = st.chapter_starts.get(target_id)
+        if tgt is None or tgt is src_slide:
+            continue
+        try:
+            if pptx_link_run_to_slide(run, src_slide, tgt):
+                linked += 1
+        except Exception:  # noqa: BLE001 — links are best-effort.
+            pass
    return linked
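
The column arithmetic behind `_place_group_side_by_side` can be sketched on its own. This is a minimal illustration, not the renderer's code: `fit_img` mirrors `_fit_img` from the diff, while `side_by_side_boxes`, the 12.0 in usable width and the 0.5 in left margin are hypothetical values chosen for the example (the real `_USABLE_W` and `_ML` are not shown in this diff). Vertical centring within the band is left out.

```python
def fit_img(width_col: float, aspect: float, max_h: float):
    """Scale to the column width, then clamp the height keeping aspect (h / w)."""
    w = width_col
    h = w * aspect
    if h > max_h:
        h = max_h
        w = (h / aspect) if aspect else width_col
    return w, h


def side_by_side_boxes(usable_w, table_aspect, fig_aspect, col_max_h,
                       margin_left=0.5, gap_col=0.3):
    """Return ((left, width, height) for table, same for figure, band height).

    Hypothetical helper: the 55% / 45% split and the 0.3 in gutter follow the
    diff; margin_left is an assumed value.
    """
    left_w = usable_w * 0.55 - gap_col / 2.0
    right_w = usable_w * 0.45 - gap_col / 2.0
    tw, th = fit_img(left_w, table_aspect, col_max_h)
    fw, fh = fit_img(right_w, fig_aspect, col_max_h)
    fig_col_left = margin_left + left_w + gap_col
    # Each picture is centred horizontally inside its own column.
    table_box = (margin_left + (left_w - tw) / 2.0, tw, th)
    figure_box = (fig_col_left + (right_w - fw) / 2.0, fw, fh)
    return table_box, figure_box, max(th, fh)


table_box, figure_box, band = side_by_side_boxes(12.0, 0.5, 0.75, 3.0)
print(table_box, figure_box, band)
```

Subtracting half the gutter from each column keeps `left_w + gap_col + right_w` equal to the usable width, so the pair never overflows the slide horizontally.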
@@ -0,0 +1,283 @@
+"""Golden tests for the global render-quality features (issue: eda-render-quality).
+
+Covers, with executable evidence:
+  * High DPI: every embedded figure is rasterized at 220 dpi, so a phone reader
+    can zoom in and still see crisp detail.
+  * Wide table → image: a table too wide to be legible as text (e.g. a 19-column
+    df.head) is rendered as one high-res image that scales to fit entirely, while
+    a narrow table keeps its selectable-text/native-table rendering.
+  * ``Group(layout="side_by_side")``: in PPTX the table and figure are placed in
+    two columns of the same slide; in PDF the same group stacks vertically.
+  * Backward compatibility: a Group without ``layout`` defaults to ``"stack"`` and
+    a fitting table renders exactly as before.
+
+Renderers are invoked for real; PDFs are inspected with PyMuPDF and PPTX decks
+with python-pptx.
+"""
+
+from __future__ import annotations
+
+import os
+import tempfile
+
+import matplotlib
+
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt  # noqa: E402
+
+import pytest  # noqa: E402
+
+from datascience.automatic_eda import model  # noqa: E402
+from datascience.automatic_eda.render_pdf_impl import (  # noqa: E402
+    render_pdf, _RASTER_DPI as _PDF_DPI, _table_fits_as_text as _pdf_fits)
+from datascience.automatic_eda.render_pptx_impl import (  # noqa: E402
+    render_pptx, _RASTER_DPI as _PPTX_DPI, _table_fits_as_text as _pptx_fits)
+
+
+# --------------------------------------------------------------------------- #
+# Helpers.
+# --------------------------------------------------------------------------- #
+def _simple_fig():
+    """A small, real matplotlib figure for the figure blocks."""
+    fig, ax = plt.subplots(figsize=(4, 3))
+    ax.plot([0, 1, 2, 3], [1, 3, 2, 4])
+    ax.set_title("demo")
+    return fig
+
+
+def _wide_table(n_cols=19, n_rows=5):
+    header = [f"columna_{i}" for i in range(n_cols)]
+    rows = [[f"v{r}_{c}" for c in range(n_cols)] for r in range(n_rows)]
+    return model.DataTable(header=header, rows=rows, title="Primeras filas")
+
+
+def _narrow_table():
+    return model.DataTable(header=["a", "b", "c"],
+                           rows=[["1", "2", "3"], ["4", "5", "6"]],
+                           title="Tabla estrecha")
+
+
+def _chapter(blocks, cid="cap", title="Capítulo"):
+    return [model.Chapter(id=cid, title=title, version="1.0.0", blocks=blocks)]
+
+
+# --------------------------------------------------------------------------- #
+# 1) High DPI — the unit constant and a real embedded image.
+# --------------------------------------------------------------------------- #
+def test_raster_dpi_is_high_both_renderers():
+    assert _PDF_DPI >= 200, "el DPI del PDF debe ser alto (>=200)"
+    assert _PPTX_DPI >= 200, "el DPI del PPTX debe ser alto (>=200)"
+
+
+def test_pdf_embedded_figure_is_high_resolution(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "fig.pdf")
+    res = render_pdf(_chapter([model.Figure(make=_simple_fig, caption="demo")]),
+                     out, {"title": "T"})
+    assert res["path"] == out
+    doc = fitz.open(out)
+    try:
+        widths = []
+        for page in doc:
+            for img in page.get_images(full=True):
+                xref = img[0]
+                info = doc.extract_image(xref)
+                widths.append(info.get("width", 0))
+        assert widths, "no se incrustó ninguna imagen en el PDF"
+        # A 4 in wide figure rasterized at 220 dpi is about 880 px wide (4 x 220)
+        # before tight-bbox cropping; at the old 150 dpi it would be about 600 px.
+        # The 800 px threshold sits between the two, so it proves the DPI bump.
+        assert max(widths) >= 800, \
+            f"la figura embebida no es de alta resolución: {max(widths)} px"
+    finally:
+        doc.close()
+
+
+# --------------------------------------------------------------------------- #
+# 2) Wide table → image (PDF and PPTX); narrow table stays text.
+# --------------------------------------------------------------------------- #
+def test_fit_criterion_flags_wide_and_keeps_narrow():
+    wide = _wide_table()
+    narrow = _narrow_table()
+    assert not _pdf_fits(wide.header, wide.rows), \
+        "una tabla de 19 columnas debería NO caber como texto en A5"
+    assert not _pptx_fits(wide.header, wide.rows), \
+        "una tabla de 19 columnas debería NO caber como tabla nativa en 16:9"
+    assert _pdf_fits(narrow.header, narrow.rows), \
+        "una tabla de 3 columnas debería caber como texto en A5"
+    assert _pptx_fits(narrow.header, narrow.rows), \
+        "una tabla de 3 columnas debería caber como tabla nativa en 16:9"
+
+
+def test_wide_table_rendered_as_image_pdf(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "wide.pdf")
+    res = render_pdf(_chapter([_wide_table()]), out, {"title": "T"})
+    assert res["path"] == out
+    doc = fitz.open(out)
+    try:
+        n_images = sum(len(page.get_images(full=True)) for page in doc)
+        text = "".join(page.get_text() for page in doc)
+    finally:
+        doc.close()
+    assert n_images >= 1, "la tabla ancha no se rasterizó como imagen en el PDF"
+    # The cells are now inside the image, not selectable text. A unique cell value
+    # must therefore NOT appear as extractable text (it lives in the picture).
+    assert "v4_18" not in text, \
+        "la tabla ancha sigue como texto seleccionable (no se hizo imagen)"
+
+
+def test_narrow_table_stays_selectable_text_pdf(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "narrow.pdf")
+    render_pdf(_chapter([_narrow_table()]), out, {"title": "T"})
+    doc = fitz.open(out)
+    try:
+        text = "".join(page.get_text() for page in doc)
+    finally:
+        doc.close()
+    # Narrow table is selectable text: its header/cells are extractable.
+    for v in ("a", "b", "c", "1", "6"):
+        assert v in text, f"la celda '{v}' debería ser texto seleccionable"
+
+
+def test_wide_table_rendered_as_picture_pptx(tmp_path):
+    pptx = pytest.importorskip("pptx")
+    from pptx.enum.shapes import MSO_SHAPE_TYPE
+    out = str(tmp_path / "wide.pptx")
+    res = render_pptx(_chapter([_wide_table()]), out, {"title": "T"})
+    assert res["path"] == out
+    prs = pptx.Presentation(out)
+    pics = sum(1 for s in prs.slides for sh in s.shapes
+               if sh.shape_type == MSO_SHAPE_TYPE.PICTURE)
+    assert pics >= 1, "la tabla ancha no se colocó como imagen en el PPTX"
+
+
+# --------------------------------------------------------------------------- #
+# 3) Group(layout="side_by_side"): two columns in PPTX, stacked in PDF.
+# --------------------------------------------------------------------------- #
+def _side_by_side_group():
+    return model.Group(
+        blocks=[model.Heading(text="Columna X", level=2),
+                _narrow_table(),
+                model.Figure(make=_simple_fig, caption="grafico")],
+        layout="side_by_side")
+
+
+def test_side_by_side_places_two_columns_pptx(tmp_path):
+    pptx = pytest.importorskip("pptx")
+    from pptx.enum.shapes import MSO_SHAPE_TYPE
+    from pptx.util import Inches
+    out = str(tmp_path / "sbs.pptx")
+    render_pptx(_chapter([_side_by_side_group()]), out, {"title": "T"})
+    prs = pptx.Presentation(out)
+    # Find the slide that holds the pair (table image + figure image).
+    centre_emu = int(Inches(13.333 / 2.0))
+    placed = False
+    for s in prs.slides:
+        lefts = [sh.left for sh in s.shapes
+                 if sh.shape_type == MSO_SHAPE_TYPE.PICTURE
+                 and sh.left is not None]
+        if len(lefts) >= 2:
+            # one picture starts in the left half, another in the right half.
+            if min(lefts) < centre_emu and max(lefts) > centre_emu:
+                placed = True
+                break
+    assert placed, \
+        "side_by_side no colocó tabla y figura en dos columnas de la misma slide"
+
+
+def test_side_by_side_stacks_in_pdf(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "sbs.pdf")
+    res = render_pdf(_chapter([_side_by_side_group()]), out, {"title": "T"})
+    assert res["path"] == out and res["n_pages"] >= 1
+    doc = fitz.open(out)
+    try:
+        n_images = sum(len(page.get_images(full=True)) for page in doc)
+        text = "".join(page.get_text() for page in doc)
+    finally:
+        doc.close()
+    # PDF stacks: the narrow table stays selectable text (1 of its cells is
+    # extractable) and the figure is the single embedded image — not a 2-column
+    # pair of pictures like PPTX.
+    assert n_images == 1, "el PDF no debería usar el layout de dos imágenes"
+    assert "Columna X" in text and "1" in text, \
+        "la tabla del grupo debería seguir como texto apilado en el PDF"
+
+
+# --------------------------------------------------------------------------- #
+# 4) Backward compatibility — default layout stacks, fitting table unchanged.
+# --------------------------------------------------------------------------- #
+def test_group_default_layout_is_stack():
+    g = model.Group(blocks=[_narrow_table()])
+    assert g.layout == "stack", "el layout por defecto debe ser 'stack'"
+
+
+# --------------------------------------------------------------------------- #
+# 5) Clickable cover index ("Índice") → chapter first page/slide.
+# --------------------------------------------------------------------------- #
+def _doc_with_index():
+    portada = model.Chapter(id="portada", title="Portada", version="1.0.0",
+                            blocks=[model.Heading(text="Índice", level=2),
+                                    model.TocEntry(label="Distribuciones",
+                                                   target_id="Distribuciones")])
+    cap = model.Chapter(id="num", title="Distribuciones", version="1.0.0",
+                        blocks=[model.Markdown(text="contenido del capítulo")])
+    return [portada, cap]
+
+
+def test_cover_index_is_clickable_pdf(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "idx.pdf")
+    res = render_pdf(_doc_with_index(), out, {"title": "T"})
+    assert res["path"] == out
+    doc = fitz.open(out)
+    try:
+        # The cover (page 0) must carry a GOTO link jumping to a later page.
+        goto = [lk for lk in doc[0].get_links()
+                if lk.get("kind") == fitz.LINK_GOTO and lk.get("page", 0) > 0]
+    finally:
+        doc.close()
+    assert goto, "el índice de la portada no produjo enlaces clicables en el PDF"
+
+
+def test_cover_index_shows_heading_pdf(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "idxh.pdf")
+    render_pdf(_doc_with_index(), out, {"title": "T"})
+    doc = fitz.open(out)
+    try:
+        text = "".join(page.get_text() for page in doc)
+    finally:
+        doc.close()
+    assert "Índice" in text, "la portada no muestra el encabezado 'Índice'"
+    assert "Este informe incluye" not in text, \
+        "la portada aún muestra el texto antiguo 'Este informe incluye'"
+
+
+def test_cover_index_is_clickable_pptx(tmp_path):
+    pptx = pytest.importorskip("pptx")
+    out = str(tmp_path / "idx.pptx")
+    render_pptx(_doc_with_index(), out, {"title": "T"})
+    prs = pptx.Presentation(out)
+    cover_xml = prs.slides[0]._element.xml
+    assert "hlinksldjump" in cover_xml, \
+        "el índice de la portada no produjo un salto de slide nativo en el PPTX"
+
+
+def test_default_group_renders_like_before_pptx(tmp_path):
+    pptx = pytest.importorskip("pptx")
+    from pptx.enum.shapes import MSO_SHAPE_TYPE
+    out = str(tmp_path / "stack.pptx")
+    grp = model.Group(blocks=[model.Heading(text="Y", level=2),
+                              _narrow_table(),
+                              model.Figure(make=_simple_fig, caption="g")])
+    render_pptx(_chapter([grp]), out, {"title": "T"})
+    prs = pptx.Presentation(out)
+    # Stacked group: the narrow table is a NATIVE table (selectable), and there is
+    # exactly one picture (the figure) — not the two-image side-by-side layout.
+    n_tables = sum(1 for s in prs.slides for sh in s.shapes if sh.has_table)
+    n_pics = sum(1 for s in prs.slides for sh in s.shapes
+                 if sh.shape_type == MSO_SHAPE_TYPE.PICTURE)
+    assert n_tables >= 1, "el grupo apilado debería usar una tabla nativa"
+    assert n_pics == 1, "el grupo apilado no debería duplicar imágenes"
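
The two-column check in `test_side_by_side_places_two_columns_pptx` reduces to a comparison in EMU (English Metric Units, 914400 per inch, the unit python-pptx uses for shape positions). A minimal sketch without python-pptx, where `has_two_columns` is a hypothetical helper and 13.333 in is the 16:9 slide width the test assumes:

```python
EMU_PER_INCH = 914400  # python-pptx stores lengths in EMU


def has_two_columns(picture_lefts_emu, slide_width_in=13.333):
    """True when one picture starts left of the slide centre and another right."""
    if len(picture_lefts_emu) < 2:
        return False
    centre_emu = int(slide_width_in / 2.0 * EMU_PER_INCH)
    return min(picture_lefts_emu) < centre_emu < max(picture_lefts_emu)


# Left edges of a table image and a figure image, in EMU (illustrative values).
lefts = [int(0.7 * EMU_PER_INCH), int(7.9 * EMU_PER_INCH)]
print(has_two_columns(lefts))
```

The check only looks at left edges, so a wide picture that starts in the left half and spans the centre still counts as "left"; that is enough to tell the paired layout from a stacked one.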
@@ -0,0 +1,125 @@
+---
+id: build_boxplots_figure_py_datascience
+name: build_boxplots_figure
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def build_boxplots_figure(boxes: list, title: str = \"\", max_boxes: int = 12) -> \"matplotlib.figure.Figure\""
+description: "Builds a single matplotlib figure with HORIZONTAL Tukey boxplots (one per column) using ax.bxp: Q1-Q3 box, whiskers up to 1.5*IQR, median line and outlier points. Consumes the output of build_boxplot_stats (one box dict per column, read with .get) plus an optional list of raw outliers per column; if given they are drawn as points (showfliers), otherwise only box[min]/box[max] are marked when there are tail outliers (same as num_distr). Draws at most max_boxes boxes (the first ones, already sorted by contamination by the caller) and reports the truncation with (mostrando N de M). Agg backend without global pyplot; height adapts to the number of boxes. Defensive: skips invalid entries and NEVER raises; with no valid boxes it returns a placeholder figure (sin boxplots). It is the small-multiples version of the num_distr chapter, answering at a glance which columns have the most outliers."
+tags: [eda, outliers, boxplot, tukey, iqr, bxp, matplotlib, figure, visualization, small-multiples, datascience, impure]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: [matplotlib]
+example: |
+  from datascience.build_boxplot_stats import build_boxplot_stats
+  from datascience.build_boxplots_figure import build_boxplots_figure
+  boxes = [
+      {"name": "ingresos", "box": build_boxplot_stats({"min": 1.0, "max": 9e3,
+          "p25": 1e3, "median": 2e3, "p75": 3e3, "n_outliers": 7}), "fliers": None},
+      {"name": "edad", "box": build_boxplot_stats({"min": 0.0, "max": 99.0,
+          "p25": 25.0, "median": 38.0, "p75": 52.0}), "fliers": None},
+  ]
+  fig = build_boxplots_figure(boxes, title="Outliers por columna", max_boxes=12)
+tested: true
+tests:
+  - "test_returns_figure_with_axes"
+  - "test_empty_list_returns_placeholder_figure"
+  - "test_invalid_box_is_skipped_not_raised"
+  - "test_all_invalid_returns_placeholder"
+  - "test_raw_fliers_are_drawn"
+  - "test_max_boxes_truncates_and_does_not_raise"
+test_file_path: "python/functions/datascience/build_boxplots_figure_test.py"
+file_path: "python/functions/datascience/build_boxplots_figure.py"
+params:
+  - name: boxes
+    desc: "List of dicts, each {\"name\": str, \"box\": dict, \"fliers\": list|None}. box is EXACTLY the output of build_boxplot_stats (keys read with .get: q1, median, q3, whisker_lo, whisker_hi, min, max, has_low_outliers, has_high_outliers, lower_fence, upper_fence, n_outliers). fliers is the optional list of raw outliers: if given it is drawn as points; if None/absent only the extremes box[min]/box[max] are marked when there are tail outliers. Entries that are not a dict, have no box dict, or lack q1/median/q3 are skipped. The caller passes them already sorted by contamination (highest first)."
+  - name: title
+    desc: "Figure title (fig.suptitle, left-aligned). Empty => no title. If len(boxes) > max_boxes a note \"(mostrando N de M)\" is appended so the truncation is not silent. Default \"\"."
+  - name: max_boxes
+    desc: "Maximum number of boxes to draw (the first ones in the list). Default 12. A non-integer or <= 0 value falls back to 12. If the list has more entries, the extra ones are dropped but reported in the title with (mostrando N de M)."
+output: "A matplotlib.figure.Figure (figsize 7.0 x adaptive height = max(2.0, 0.5*n + 1.0), dpi 150) with a single Axes that stacks horizontal Tukey boxplots (ax.bxp, orientation=horizontal with a vert=False fallback), one per valid column, top to bottom in the order received. Each box: fill #9ec6df, edge/whiskers/caps #5b8aa6, median #2e8b57, outliers #c0392b. Y-axis labels = column names; X axis labelled \"valor\". Outliers are drawn from raw fliers (showfliers) or, if missing, marked at box[min]/box[max] according to has_low/high_outliers. If no valid box remains (empty list or all invalid) it returns a placeholder Figure with centred text \"(sin boxplots)\"; any unexpected error is caught and a Figure with the error message is returned. NEVER raises. The caller rasterizes/closes the figure; the function neither shows nor saves it."
+---
+
+## Example
+
+```python
+import sys, os
+sys.path.insert(0, os.path.join("python", "functions"))
+from datascience.build_boxplot_stats import build_boxplot_stats
+from datascience.build_boxplots_figure import build_boxplots_figure
+
+# One `box` per numeric column, derived from the `numeric` sub-block of the
+# profile (output of describe_numeric). The caller passes them already sorted
+# by outlier_pct.
+boxes = [
+    {
+        "name": "ingresos",
+        "box": build_boxplot_stats({
+            "min": 1.0, "max": 9000.0,
+            "p25": 1000.0, "median": 2000.0, "p75": 3000.0,
+            "n_outliers": 7,
+        }),
+        "fliers": None,  # raw values unknown -> only the extreme is marked.
+    },
+    {
+        "name": "edad",
+        "box": build_boxplot_stats({
+            "min": 0.0, "max": 99.0,
+            "p25": 25.0, "median": 38.0, "p75": 52.0,
+        }),
+        "fliers": [88.0, 95.0, 99.0],  # raw outliers -> drawn as points.
+    },
+]
+
+fig = build_boxplots_figure(boxes, title="Outliers por columna", max_boxes=12)
+
+# The report renderer rasterizes it; here we only save it for inspection.
+fig.savefig("/tmp/boxplots.png")
+```
+
+## When to use it
+
+Use it in the outliers chapter of an EDA report when you want to compare at a
+glance *which columns are most contaminated by outliers*: unlike `num_distr`
+(which draws one histogram+boxplot per column in separate figures), here you
+stack all the horizontal boxplots in **a single figure** (small multiples).
+First derive each column's `box` with `build_boxplot_stats`, sort the columns by
+`outlier_pct` descending, wrap them as `{"name", "box", "fliers"}` and pass them
+in. If you have the raw values that fall outside the fences, pass them in the
+`fliers` list and they are drawn as points; otherwise the function marks only
+the `min`/`max` extremes when there is a tail outlier.
+
+## Gotchas
+
+- **Impure because of matplotlib.** It touches the rendering machinery. It uses
+  the object-oriented `Figure`/`add_subplot` API and NEVER `pyplot.*`, so it
+  registers no figures in pyplot's global state and leaks none between calls.
+  `pyplot` is NOT thread-safe; because the function builds the `Figure`
+  directly, it is safe to call in a loop from the renderer. Note that importing
+  the module does call `matplotlib.use("Agg")`, which selects the backend for
+  the whole process.
+- **The caller releases the figure.** The function returns the `Figure` but
+  neither shows nor saves it. The consumer must rasterize it and then drop it
+  (`fig.clf()` and discard the reference; `matplotlib.pyplot.close(fig)` only
+  matters if the caller registered the figure with pyplot) so memory does not
+  pile up in large batches.
+- **`fliers` is optional and changes the semantics.** A list of raw outliers is
+  drawn in full as points (`showfliers=True`). With `None`/absent the values are
+  unknown, so a single point is marked at `box["min"]` / `box["max"]` only when
+  `has_low_outliers` / `has_high_outliers` is set, the same convention as
+  `num_distr`. An empty list counts as "known, no outliers": nothing is marked,
+  not even the extremes. Do not invent fliers from the profile: the `box`
+  carries no raw values, only whether the extremes exceed the fences.
+- **Orientation API of `ax.bxp`.** Recent matplotlib takes
+  `orientation="horizontal"`; older versions take `vert=False`. The function
+  tries the first and falls back to the second on `TypeError`, so it works with
+  both. If `bxp` fails altogether, the Axes degrades to a
+  "(boxplot no disponible)" text instead of propagating.
+- **Visible truncation.** `max_boxes` (default 12) caps how many entries are
+  considered; any extra entries are dropped, and the title gains a
+  "(mostrando N de M)" note. The cap is applied before invalid entries are
+  filtered out, so N can be smaller than `max_boxes`. Pass the columns already
+  sorted by contamination so that the dropped ones are the least relevant.
+- **Defensive, never raises.** An empty list, non-dict entries, entries without
+  a `box`, or without `q1`/`median`/`q3` are skipped without propagating; with
+  no valid box left it returns a "(sin boxplots)" placeholder, and any
+  unexpected error is caught into a figure carrying the error text. Do not wrap
+  the call in try/except for fear of a raise: there is none.
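The preparation step described above (derive each `box`, sort by contamination, wrap as `{"name", "box", "fliers"}`) could look roughly like the sketch below. It is only an illustration: the `columns` and `raw_outliers` dicts, the helper name `to_entries`, and the assumption that `outlier_pct` sits next to each `box` are all hypothetical, since the source does not show where `outlier_pct` lives.

```python
# Hypothetical per-column summaries; real values would come from the profile.
columns = {
    "edad": {"box": {"q1": 25.0, "median": 38.0, "q3": 52.0}, "outlier_pct": 0.8},
    "ingresos": {"box": {"q1": 1000.0, "median": 2000.0, "q3": 3000.0}, "outlier_pct": 4.2},
    "score": {"box": {"q1": -1.0, "median": 0.0, "q3": 1.0}, "outlier_pct": 0.0},
}
# Hypothetical raw out-of-fence values, known only for some columns.
raw_outliers = {"edad": [88.0, 95.0, 99.0]}


def to_entries(columns, raw_outliers):
    """Sort columns by outlier_pct (descending) and wrap them for the figure."""
    ordered = sorted(
        columns.items(),
        key=lambda kv: kv[1].get("outlier_pct") or 0.0,
        reverse=True,
    )
    return [
        # fliers stays None when the raw values are unknown, so the figure
        # builder would mark only the min/max extremes for that column.
        {"name": name, "box": info["box"], "fliers": raw_outliers.get(name)}
        for name, info in ordered
    ]


entries = to_entries(columns, raw_outliers)
```

The resulting `entries` list is what would be passed as the first argument of `build_boxplots_figure`; leaving `fliers` as `None` rather than `[]` matters, because an empty list suppresses the extreme markers.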
@@ -0,0 +1,250 @@
+"""Impure EDA helper: a single figure of horizontal Tukey boxplots (`eda` group).
+
+Draws, in one ``matplotlib.figure.Figure``, a stack of horizontal Tukey boxplots
+(one per column) using ``ax.bxp``: each carries its box (Q1–Q3), whiskers (up to
+1.5·IQR), the median line and its outlier points. It consumes the output of the
+pure registry function ``build_boxplot_stats`` (one ``box`` dict per column) plus
+an optional list of raw outlier values per column; it never recomputes anything.
+
+It is the "small-multiples" companion of ``num_distr`` (which draws one
+histogram+boxplot per column): here every column shares a single figure so the
+caller can show, at a glance, *which* columns are the most contaminated by
+outliers (the caller passes them already ordered by contamination).
+
+Impure because it touches matplotlib's rendering machinery. It selects the
+headless Agg backend at import time (a process-wide setting) and uses the
+object-oriented ``Figure`` API (no ``pyplot``), so it registers no figures in
+pyplot's global state and is safe to call repeatedly from a report renderer. It
+is fully defensive and NEVER raises: invalid entries are skipped and, if nothing
+valid remains, it returns a placeholder figure carrying a centered
+"(sin boxplots)".
+"""
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+from matplotlib.figure import Figure  # noqa: E402
+
+# Blue palette shared with the ``num_distr`` chapter so the report stays coherent.
+_BOX_FACE = "#9ec6df"   # box fill.
+_BOX_EDGE = "#5b8aa6"   # box / whisker / cap border.
+_MEDIAN = "#2e8b57"     # median line (sea green).
+_OUTLIER = "#c0392b"    # outlier points (soft red).
+# Muted gray for the placeholder / fallback message text.
+_MUTED_TEXT = "#5f6b7a"
+# Soft red for the error fallback message.
+_ERROR_TEXT = "#b00020"
+
+
+def _num(value):
+    """Coerce ``value`` to float defensively; None for None/bool/non-numeric/NaN."""
+    # bool is a subclass of int; a stat value is never a real bool, so treat
+    # True/False as missing instead of silently coercing to 1.0/0.0.
+    if value is None or isinstance(value, bool):
+        return None
+    try:
+        f = float(value)
+    except (TypeError, ValueError):
+        return None
+    if f != f:  # NaN guard.
+        return None
+    return f
+
+
+def _placeholder_figure(message: str, color: str = _MUTED_TEXT) -> "Figure":
+    """Return a fallback ``Figure`` carrying a single centered message."""
+    fig = Figure(figsize=(7.0, 2.4), dpi=150)
+    ax = fig.add_subplot(111)
+    ax.axis("off")
+    ax.text(
+        0.5,
+        0.5,
+        message,
+        ha="center",
+        va="center",
+        fontsize=12,
+        color=color,
+        wrap=True,
+        transform=ax.transAxes,
+    )
+    fig.tight_layout()
+    return fig
+
+
+def build_boxplots_figure(
+    boxes: list,
+    title: str = "",
+    max_boxes: int = 12,
+) -> "matplotlib.figure.Figure":
+    """Build one figure of stacked horizontal Tukey boxplots (one per column).
+
+    For each entry the function builds a ``bxp`` stats record (``med, q1, q3,
+    whislo, whishi, fliers, label``) from its ``box`` sub-dict (the output of
+    ``build_boxplot_stats``) and draws all of them as horizontal boxplots sharing
+    the X axis, top-to-bottom in the order received (the caller is expected to
+    pass them already sorted by contamination).
+
+    Outliers are shown two ways:
+
+    - If an entry carries a ``fliers`` list (the raw out-of-fence values), they
+      are drawn as red points via ``ax.bxp(..., showfliers=True)``.
+    - If ``fliers`` is ``None``/absent, the raw values are unknown, so only the
+      extremes are marked: a red point at ``box["min"]`` when
+      ``box["has_low_outliers"]`` and at ``box["max"]`` when
+      ``box["has_high_outliers"]`` (same convention as ``num_distr``).
+
+    The function is fully defensive and NEVER raises. Entries that are not dicts,
+    lack a ``box`` dict, or miss any of ``q1``/``median``/``q3`` are skipped. If
+    after filtering no valid box remains it returns a placeholder ``Figure`` with
+    a centered "(sin boxplots)"; any unexpected error is caught and turned into a
+    fallback figure carrying the error text. It always returns a ``Figure``.
+
+    Args:
+        boxes: List of dicts ``{"name": str, "box": dict, "fliers": list|None}``.
+            ``box`` is exactly the output of ``build_boxplot_stats`` (read with
+            ``.get``: ``q1, median, q3, whisker_lo, whisker_hi, min, max,
+            has_low_outliers, has_high_outliers, ...``). ``fliers`` is the
+            optional list of raw outlier values; when present they are plotted,
+            otherwise only the extremes are marked.
+        title: Figure title (``fig.suptitle``). Empty => no title. When the list
+            is longer than ``max_boxes`` a "(mostrando N de M)" note is appended.
+        max_boxes: Draw at most the first ``max_boxes`` entries (default 12). The
+            rest are dropped but their omission is surfaced in the title note, so
+            the truncation is never silent.
+
+    Returns:
+        A ``matplotlib.figure.Figure`` with a single Axes holding the horizontal
+        boxplots (height adaptive to the box count so none overlap). The caller is
+        responsible for rasterizing/closing it; this function never shows nor
+        saves it.
+    """
+    try:
+        if not isinstance(boxes, (list, tuple)) or len(boxes) == 0:
+            return _placeholder_figure("(sin boxplots)")
+
+        total = len(boxes)
+
+        # Cap the number of boxes; tolerate a non-int / non-positive max_boxes.
+        try:
+            cap = int(max_boxes)
+        except (TypeError, ValueError):
+            cap = 12
+        if cap <= 0:
+            cap = 12
+        candidates = list(boxes)[:cap]
+
+        stats_list = []        # bxp stats records, in draw order.
+        labels = []            # Y tick labels (column names).
+        manual_markers = []    # (position, box) for entries without raw fliers.
+        any_fliers = False     # whether to enable showfliers in the bxp call.
+
+        for entry in candidates:
+            if not isinstance(entry, dict):
+                continue
+            box = entry.get("box")
+            if not isinstance(box, dict):
+                continue
+
+            q1 = _num(box.get("q1"))
+            med = _num(box.get("median"))
+            q3 = _num(box.get("q3"))
+            # Without the three quartiles a boxplot cannot be drawn — skip it.
+            if q1 is None or med is None or q3 is None:
+                continue
+
+            # Whisker extremes fall back to the quartiles when missing.
+            whislo = _num(box.get("whisker_lo"))
+            whishi = _num(box.get("whisker_hi"))
+            if whislo is None:
+                whislo = q1
+            if whishi is None:
+                whishi = q3
+
+            name = entry.get("name")
+            label = "" if name is None else str(name)
+
+            position = len(stats_list) + 1  # bxp positions are 1-indexed.
+            fliers_raw = entry.get("fliers")
+            if isinstance(fliers_raw, (list, tuple)):
+                fliers = [v for v in (_num(x) for x in fliers_raw) if v is not None]
+                if fliers:
+                    any_fliers = True
+            else:
+                # Raw values unknown: draw no bxp fliers, mark min/max by hand.
+                fliers = []
+                manual_markers.append((position, box))
+
+            stats_list.append({
+                "med": med,
+                "q1": q1,
+                "q3": q3,
+                "whislo": whislo,
+                "whishi": whishi,
+                "fliers": fliers,
+                "label": label,
+            })
+            labels.append(label)
+
+        if not stats_list:
+            return _placeholder_figure("(sin boxplots)")
+
+        n = len(stats_list)
+        positions = list(range(1, n + 1))
+
+        # Height grows with the box count so none of them overlap.
+        height = max(2.0, 0.5 * n + 1.0)
+        fig = Figure(figsize=(7.0, height), dpi=150)
+        ax = fig.add_subplot(111)
+
+        bxp_kw = dict(
+            showfliers=any_fliers, widths=0.5, patch_artist=True,
+            boxprops={"facecolor": _BOX_FACE, "edgecolor": _BOX_EDGE},
+            medianprops={"color": _MEDIAN, "linewidth": 1.6},
+            whiskerprops={"color": _BOX_EDGE},
+            capprops={"color": _BOX_EDGE},
+            flierprops={"marker": "o", "markersize": 3.5,
+                        "markerfacecolor": _OUTLIER, "markeredgecolor": _OUTLIER,
+                        "linestyle": "none"})
+        try:
+            # ``orientation`` is the current API; older matplotlib uses ``vert``.
+            try:
+                ax.bxp(stats_list, positions=positions,
+                       orientation="horizontal", **bxp_kw)
+            except TypeError:
+                ax.bxp(stats_list, positions=positions, vert=False, **bxp_kw)
+        except Exception:  # noqa: BLE001 — never let bxp kill the whole figure.
+            ax.text(0.5, 0.5, "(boxplot no disponible)", ha="center",
+                    va="center", fontsize=10, color=_MUTED_TEXT,
+                    transform=ax.transAxes)
+
+        # For entries without raw fliers, mark only the out-of-fence extremes.
+        for position, box in manual_markers:
+            mn = _num(box.get("min"))
+            mx = _num(box.get("max"))
+            if box.get("has_low_outliers") and mn is not None:
+                ax.plot([mn], [position], marker="o", markersize=3.5,
+                        color=_OUTLIER, zorder=5)
+            if box.get("has_high_outliers") and mx is not None:
+                ax.plot([mx], [position], marker="o", markersize=3.5,
+                        color=_OUTLIER, zorder=5)
+
+        # Pin the Y tick labels explicitly so they work across matplotlib
+        # versions regardless of whether ``bxp`` consumed the ``label`` key.
+        ax.set_yticks(positions)
+        ax.set_yticklabels(labels, fontsize=8)
+        ax.set_xlabel("valor", fontsize=9)
+        ax.tick_params(labelsize=7)
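+        ax.margins(y=0.15)
+        # bxp puts position 1 at the bottom of a horizontal plot; flip the Y
+        # axis so the first entry received ends up at the top, as documented.
+        ax.invert_yaxis()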
+        for spine in ("top", "right"):
+            ax.spines[spine].set_visible(False)
+
+        # Surface truncation in the title instead of silently dropping boxes.
+        note = f"(mostrando {n} de {total})" if total > cap else ""
+        heading = "  ".join(p for p in (title, note) if p)
+        if heading:
+            fig.suptitle(heading, fontsize=12, x=0.02, ha="left")
+
+        fig.tight_layout()
+        return fig
+    except Exception as exc:  # noqa: BLE001 — never raise from a figure builder.
+        return _placeholder_figure(
+            f"error al dibujar boxplots: {exc}", color=_ERROR_TEXT)
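The selection logic of `build_boxplots_figure` (cap first, then filter, then size the figure and build the truncation note) can be sketched without matplotlib. The helper name `plan_boxes` is hypothetical, and the validity check is deliberately simplified: it tests `is not None` where the module uses its `_num` coercion.

```python
def plan_boxes(boxes, max_boxes=12):
    """Return (n_drawn, figure_height_in, title_note) for a list of entries."""
    # Tolerate a non-int / non-positive cap, falling back to the default 12.
    try:
        cap = int(max_boxes)
    except (TypeError, ValueError):
        cap = 12
    if cap <= 0:
        cap = 12

    # The cap is applied BEFORE filtering, so invalid entries among the first
    # `cap` reduce the number of boxes actually drawn.
    candidates = list(boxes)[:cap]
    valid = [
        e for e in candidates
        if isinstance(e, dict)
        and isinstance(e.get("box"), dict)
        and all(e["box"].get(k) is not None for k in ("q1", "median", "q3"))
    ]

    n = len(valid)
    # Same adaptive-height formula as the figure builder (in inches).
    height = max(2.0, 0.5 * n + 1.0)
    note = f"(mostrando {n} de {len(boxes)})" if len(boxes) > cap else ""
    return n, height, note


valid = {"name": "c", "box": {"q1": 1.0, "median": 2.0, "q3": 3.0}}
print(plan_boxes([valid] * 20, max_boxes=5))
print(plan_boxes(["no_es_dict", {"name": "sin_box"}] + [valid] * 18, max_boxes=5))
```

In the second call two of the first five entries are invalid, so only three boxes would be drawn even though eighteen valid ones were supplied; this is why the caller should drop invalid entries before sorting and passing the list.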
@@ -0,0 +1,109 @@
+"""Tests for build_boxplots_figure (horizontal Tukey boxplots, `eda` group).
+
+Runs on the headless Agg backend; no figure is shown or saved. Each test
+explicitly closes the Figure it builds (matplotlib.pyplot.close) so no state
+accumulates between tests.
+"""
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+import matplotlib.pyplot as plt  # noqa: E402
+from matplotlib.figure import Figure  # noqa: E402
+
+from build_boxplots_figure import build_boxplots_figure
+
+
+def _box(name, q1, median, q3, mn, mx, low=False, high=False, fliers=None):
+    """Build a {name, box, fliers} entry with a build_boxplot_stats-style box."""
+    iqr = q3 - q1
+    return {
+        "name": name,
+        "box": {
+            "q1": q1,
+            "median": median,
+            "q3": q3,
+            "iqr": iqr,
+            "lower_fence": q1 - 1.5 * iqr,
+            "upper_fence": q3 + 1.5 * iqr,
+            "whisker_lo": max(mn, q1 - 1.5 * iqr),
+            "whisker_hi": min(mx, q3 + 1.5 * iqr),
+            "min": mn,
+            "max": mx,
+            "has_low_outliers": low,
+            "has_high_outliers": high,
+            "n_outliers": 0,
+        },
+        "fliers": fliers,
+    }
+
+
+def test_returns_figure_with_axes():
+    boxes = [
+        _box("edad", 10.0, 25.0, 40.0, 1.0, 100.0, high=True),
+        _box("ingresos", 100.0, 200.0, 300.0, 50.0, 400.0),
+        _box("score", -1.0, 0.0, 1.0, -5.0, 5.0, low=True, high=True),
+    ]
+    fig = build_boxplots_figure(boxes, title="Boxplots", max_boxes=12)
+    assert isinstance(fig, Figure)
+    assert len(fig.axes) >= 1
+    # Three boxes -> three tick labels on the Y axis.
+    ax = fig.axes[0]
+    assert len(ax.get_yticks()) == 3
+    plt.close(fig)
+
+
+def test_empty_list_returns_placeholder_figure():
+    fig = build_boxplots_figure([], title="vacío")
+    assert isinstance(fig, Figure)
+    assert len(fig.axes) >= 1
+    plt.close(fig)
+
+
+def test_invalid_box_is_skipped_not_raised():
+    boxes = [
+        {"name": "rota", "box": {"q1": None, "median": None, "q3": None}},
+        {"name": "sin_box"},                         # the box key is missing.
+        "no_es_dict",                                 # non-dict entry.
+        _box("buena", 1.0, 2.0, 3.0, 0.0, 10.0, high=True),
+    ]
+    fig = build_boxplots_figure(boxes)
+    assert isinstance(fig, Figure)
+    ax = fig.axes[0]
+    # Only the valid box survives the filtering.
+    assert len(ax.get_yticks()) == 1
+    plt.close(fig)
+
+
+def test_all_invalid_returns_placeholder():
+    boxes = [
+        {"name": "a", "box": {"q1": None, "median": 1.0, "q3": 2.0}},
+        {"name": "b"},
+    ]
+    fig = build_boxplots_figure(boxes)
+    assert isinstance(fig, Figure)
+    assert len(fig.axes) >= 1
+    plt.close(fig)
+
+
+def test_raw_fliers_are_drawn():
+    boxes = [
+        _box("con_fliers", 10.0, 20.0, 30.0, 5.0, 200.0,
+             high=True, fliers=[150.0, 180.0, 200.0]),
+    ]
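+    fig = build_boxplots_figure(boxes)
+    assert isinstance(fig, Figure)
+    ax = fig.axes[0]
+    # The raw outliers must appear as the X data of one of the drawn lines.
+    assert any(list(line.get_xdata()) == [150.0, 180.0, 200.0]
+               for line in ax.lines)
+    plt.close(fig)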
+
+
+def test_max_boxes_truncates_and_does_not_raise():
+    boxes = [_box(f"c{i}", float(i), float(i + 1), float(i + 2),
+                  float(i - 5), float(i + 10)) for i in range(20)]
+    fig = build_boxplots_figure(boxes, title="muchos", max_boxes=5)
+    assert isinstance(fig, Figure)
+    ax = fig.axes[0]
+    # Only the first 5 boxes are drawn.
+    assert len(ax.get_yticks()) == 5
+    plt.close(fig)
@@ -0,0 +1,111 @@
+---
+id: categorical_top_bar_figure_py_datascience
+name: categorical_top_bar_figure
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def categorical_top_bar_figure(top: list, n_distinct: int = 0, title: str = \"\", top_k: int = 6, n_rows=None) -> \"matplotlib.figure.Figure\""
+description: "Builds a matplotlib figure of horizontal bars for the top_k most frequent categories of a categorical column, largest on top, folding the rest into a gray \"Otros (N categorías)\" bar. Input contract identical to categorical_top_pie_figure (direct donut↔bars swap): it consumes the `top` block of summarize_categorical and returns a matplotlib.figure.Figure ready to be rasterized by the EDA report renderer. Agg backend with no global pyplot; fully defensive against an empty/None top, never raises."
+tags: [eda, categorical, bar, barh, matplotlib, figure, visualization, datascience, impure]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: [matplotlib]
+example: |
+  from categorical_top_bar_figure import categorical_top_bar_figure
+  top = [
+      {"value": "rojo", "count": 40, "pct": 0.4},
+      {"value": "azul", "count": 30, "pct": 0.3},
+      {"value": "verde", "count": 20, "pct": 0.2},
+  ]
+  fig = categorical_top_bar_figure(top, n_distinct=12, title="color", top_k=6, n_rows=100)
+tested: true
+tests:
+  - "test_returns_figure"
+  - "test_ten_items_topk_six_yields_seven_bars"
+  - "test_empty_top_does_not_raise_and_returns_figure"
+  - "test_long_value_truncated"
+  - "test_none_value_and_none_count_are_handled"
+  - "test_n_rows_adds_exact_others_bar"
+test_file_path: "python/functions/datascience/categorical_top_bar_figure_test.py"
+file_path: "python/functions/datascience/categorical_top_bar_figure.py"
+params:
+  - name: top
+    desc: "List of {value, count, pct} dicts sorted by count descending (the `top` block output by summarize_categorical). May be empty or carry incomplete dicts: non-dict items and items with a missing, None or <= 0 count are dropped. A None value is accepted (empty label)."
+  - name: n_distinct
+    desc: "Total number of distinct categories in the column. Labels the aggregated bar as \"Otros (n_distinct - bars shown)\" (floored at 0). If it does not exceed the number of bars shown, the actual overflow of `top` is used as the number of aggregated categories. Default 0."
+  - name: title
+    desc: "Figure title (the column name). Truncated to 48 chars (ellipsis included) when too long. Default \"\" (no title)."
+  - name: top_k
+    desc: "Maximum number of explicit bars. Default 6. The \"Otros\" bar does not count against this limit. With top_k <= 0 at least the largest category is shown."
+  - name: n_rows
+    desc: "Optional. Total number of rows in the dataset. When given and the sum of shown counts < n_rows, the \"Otros\" bar uses (n_rows - sum_shown) as its count so it is exact with respect to the real total. When omitted, \"Otros\" uses the sum of the counts outside the shown top_k (only when top carries more than top_k items). Default None."
+output: "A matplotlib.figure.Figure (figsize 6.4 x height scaled with the number of bars, dpi 150) with one Axes of horizontal bars: the most frequent category on top, the gray \"Otros (N categorías)\" bar at the bottom, each bar annotated at its end with its count and percentage, and category labels (yticklabels) truncated to 22 chars. With no valid counts it still returns a Figure with a centered \"sin datos categóricos\" text (never raises); any unexpected error falls back to a Figure carrying the error text. The caller rasterizes/closes the figure; the function neither shows nor saves it."
+---
+
+## Ejemplo
+
+```python
+from categorical_top_bar_figure import categorical_top_bar_figure
+
+# `top` is the output of the "top" block of summarize_categorical (already sorted desc).
+top = [
+    {"value": "rojo", "count": 40, "pct": 0.40},
+    {"value": "azul", "count": 30, "pct": 0.30},
+    {"value": "verde", "count": 20, "pct": 0.20},
+    {"value": "amarillo", "count": 5, "pct": 0.05},
+]
+
+fig = categorical_top_bar_figure(
+    top,
+    n_distinct=12,            # 12 distinct categories in total
+    title="color_producto",
+    top_k=6,                  # up to 6 explicit bars
+    n_rows=100,               # "Otros" = 100 - 95 = 5, over 12 - 4 = 8 folded categories
+)
+
+# The report renderer rasterizes it; here we only persist it for inspection.
+fig.savefig("/tmp/barras_color.png")
+```
+
+## Cuando usarla
+
+Use it inside an EDA report when you want to compare the **magnitudes** of the
+dominant categories of a categorical column: which category leads, and by how
+much over the next ones. Pass it the `top` block of `summarize_categorical`
+directly (already sorted from largest to smallest) plus `n_distinct`, so that
+the "Otros" bar states how many categories were folded in. It is the bar
+counterpart of the donut `categorical_top_pie_figure`, with an **identical
+input contract**: you can swap one for the other without touching the caller.
+Choose bars when comparing exact sizes matters; choose the donut when the
+share of the total matters.
+
+## Gotchas
+
+- **Impure because of matplotlib.** It touches the rendering machinery. It uses
+  the object-oriented `Figure`/`add_subplot` API and NEVER `pyplot.*`, so it
+  registers no figures in pyplot's global state and leaks none between calls.
+  `pyplot` is NOT thread-safe; the function avoids that risk by building the
+  `Figure` directly, so it is safe to call in a loop from the renderer. Note
+  that importing the module does call `matplotlib.use("Agg")`, which selects
+  the backend for the whole process.
+- **The caller releases the figure.** The function returns the `Figure` but
+  neither shows nor saves it. The consumer must rasterize it and then release
+  it (`fig.clf()`, or `matplotlib.pyplot.close(fig)` if the caller used pyplot)
+  so memory does not pile up over large batches of columns.
+- **`barh` draws bottom-up.** The most frequent category ends up on top because
+  the display order is reversed before plotting; the "Otros" bar always sits at
+  the bottom. Do not reorder `top` expecting a different layout: the function
+  assumes it already arrives sorted by count, descending.
+- **An exact "Otros" magnitude needs `n_rows`.** Without `n_rows`, the "Otros"
+  bar is computed from the overflow present in `top`; if the producer already
+  trimmed `top` to `top_k`, there is no "Otros" bar even when more categories
+  exist. Pass `n_rows` (the dataset's total row count) for a bar that is correct
+  with respect to the real total, together with an `n_distinct` larger than the
+  number of bars shown: the bar is drawn only when both its count and its
+  category count are positive.
+- **Defensive, never raises.** `top=[]`, `value=None`, `count=None` or
+  non-numeric counts are handled without error: in the worst case it returns a
+  `Figure` with "sin datos categóricos", and any unexpected exception falls back
+  to a `Figure` carrying the error text. Do not wrap the call in try/except for
+  fear of a raise: there is none.
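The rules for the "Otros" bar described above can be condensed into a small pure-Python sketch. The helper `others_bar` is hypothetical and takes already-cleaned positive counts (sorted descending) instead of the `{value, count, pct}` dicts; it mirrors the documented behavior but is not the module's code.

```python
def others_bar(counts, n_distinct=0, top_k=6, n_rows=None):
    """Return (n_categories, count) for the "Otros" bar, or None if not drawn."""
    # Explicit bars: the first top_k counts, or at least the largest one.
    shown = counts[: max(top_k, 0)] or counts[:1]
    overflow = counts[len(shown):]

    # Number of folded categories: n_distinct minus the bars shown, falling
    # back to the overflow actually present when n_distinct is too small.
    categories = max(n_distinct - len(shown), 0)
    if categories == 0 and overflow:
        categories = len(overflow)

    # Magnitude: exact when n_rows is known, otherwise the overflow sum.
    count = 0
    if n_rows is not None and n_rows > sum(shown):
        count = n_rows - sum(shown)
    if count <= 0:
        count = sum(overflow)

    # The bar exists only if both the count and the category count are positive.
    return (categories, count) if count > 0 and categories > 0 else None


print(others_bar([40, 30, 20, 5], n_distinct=12, top_k=6, n_rows=100))
print(others_bar([40, 30, 20, 5], n_distinct=12, top_k=6))
```

The first call reproduces the documented example (8 folded categories, 5 rows); the second shows the trimmed-`top` case, where no bar is drawn because neither `n_rows` nor any overflow is available.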
@@ -0,0 +1,233 @@
+"""Impure EDA helper: horizontal bar figure of the most common categories (`eda` group).
+
+Builds a horizontal bar chart of the ``top_k`` most frequent categories of a
+categorical column, folding everything else into a single gray
+"Otros (N categorías)" bar. The most frequent category sits at the top, each bar
+labelled with its count (and percentage) at the end. Returns a ready-to-rasterize
+``matplotlib.figure.Figure``; it never shows nor saves it.
+
+This is the "magnitude" twin of ``categorical_top_pie_figure``: identical input
+contract (same ``top``/``n_distinct``/``title``/``top_k``/``n_rows`` signature) so
+it can be swapped in directly, but it communicates comparable magnitudes via bars
+instead of proportions via wedges.
+
+Impure because it touches matplotlib's rendering machinery. It uses the headless
+Agg backend and the object-oriented ``Figure`` API (no ``pyplot``) so it leaks no
+global state and is safe to call repeatedly from a report renderer.
+"""
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+from matplotlib.figure import Figure  # noqa: E402
+
+
+# Gray reserved for the aggregated "Otros" bar.
+_OTHER_COLOR = "#9e9e9e"
+# Muted gray for secondary text (title fallback, no-data message).
+_MUTED_TEXT = "#5f6b7a"
+# Soft red for the error fallback message.
+_ERROR_TEXT = "#b00020"
+# Pleasant, colour-blind-friendly qualitative palette for the explicit bars.
+_PALETTE = [
+    "#4C72B0",
+    "#DD8452",
+    "#55A868",
+    "#C44E52",
+    "#8172B3",
+    "#937860",
+    "#DA8BC3",
+    "#8C8C8C",
+    "#CCB974",
+    "#64B5CD",
+]
+
+
+def _truncate(text, width: int = 22) -> str:
+    """Truncate ``text`` to ``width`` chars, appending an ellipsis if cut."""
+    s = "" if text is None else str(text)
+    if len(s) <= width:
+        return s
+    if width <= 1:
+        return s[:width]
+    return s[: width - 1] + "…"
+
+
+def _message_figure(message: str, color: str = _MUTED_TEXT, title: str = "") -> "Figure":
+    """Return a fallback ``Figure`` carrying a single centered message."""
+    fig = Figure(figsize=(6.4, 4.0), dpi=150)
+    ax = fig.add_subplot(111)
+    ax.axis("off")
+    ax.text(
+        0.5,
+        0.5,
+        message,
+        ha="center",
+        va="center",
+        fontsize=12,
+        color=color,
+        wrap=True,
+        transform=ax.transAxes,
+    )
+    if title:
+        ax.set_title(_truncate(title, 48), fontsize=12, loc="center", pad=8)
+    fig.tight_layout()
+    return fig
+
+
+def categorical_top_bar_figure(
+    top: list,
+    n_distinct: int = 0,
+    title: str = "",
+    top_k: int = 6,
+    n_rows=None,
+) -> "matplotlib.figure.Figure":
+    """Build a horizontal bar figure of the most common categories of a column.
+
+    Renders the ``top_k`` most frequent categories as explicit horizontal bars,
+    largest at the top, and aggregates every remaining category into a single
+    gray "Otros (N categorías)" bar at the bottom. Each bar is annotated with its
+    count and percentage of the total at the end of the bar; the category names
+    are truncated Y tick labels.
+
+    The function shares the exact input contract of
+    ``categorical_top_pie_figure`` (the donut twin) so it is a drop-in swap. It is
+    fully defensive: empty input, missing/``None`` values or counts never raise.
+    When there is nothing valid to draw it still returns a ``Figure`` carrying a
+    centered "sin datos categóricos" message, and any unexpected error is caught
+    and turned into a fallback ``Figure`` carrying the error text.
+
+    Args:
+        top: List of ``{value, count, pct}`` dicts, already sorted by ``count``
+            descending (the ``top`` block of ``summarize_categorical``). May be
+            empty or carry incomplete/``None`` entries; non-dict items, items
+            without a positive numeric ``count`` and ``None`` counts are skipped.
+        n_distinct: Total number of distinct categories in the column. Used to
+            label the aggregated bar as "Otros (n_distinct - bars shown)"
+            (floored at 0). When it does not exceed the number of shown bars,
+            the overflow actually present in ``top`` is used instead.
+        title: Figure title (the column name). Truncated when too long.
+        top_k: Maximum number of explicit bars. Default 6. The "Otros" bar does
+            not count against this limit.
+        n_rows: Optional total row count of the dataset. When given and the sum of
+            shown counts is below ``n_rows``, the "Otros" bar uses
+            ``n_rows - sum_shown`` as its count so it is exact with respect to the
+            real total. When omitted, "Otros" uses the sum of the counts that fall
+            outside the shown ``top_k`` (only when ``top`` carries more than
+            ``top_k`` items).
+
+    Returns:
+        A ``matplotlib.figure.Figure`` with a single horizontal-bar Axes. The
+        caller is responsible for rasterizing/closing it.
+    """
+    try:
+        safe_title = _truncate(title, 48)
+
+        # --- Defensive parse: keep only well-formed {value, count} with count > 0.
+        cleaned = []
+        if isinstance(top, list):
+            for item in top:
+                if not isinstance(item, dict):
+                    continue
+                count = item.get("count")
+                if count is None:
+                    continue
+                try:
+                    count = float(count)
+                except (TypeError, ValueError):
+                    continue
+                # NaN and infinite counts would break the bar annotations.
+                if not (0 < count < float("inf")):
+                    continue
+                cleaned.append((item.get("value"), count))
+
+        if not cleaned:
+            return _message_figure("sin datos categóricos", title=title)
+
+        # --- Split into shown bars and the aggregated remainder.
+        shown = cleaned[: max(int(top_k), 0)]
+        if not shown:  # top_k <= 0 — show at least the largest category.
+            shown = cleaned[:1]
+
+        sum_shown = sum(c for _, c in shown)
+        overflow_count = sum(c for _, c in cleaned[len(shown):])
+
+        # How many categories are folded into "Otros".
+        try:
+            nd = int(n_distinct)
+        except (TypeError, ValueError):
+            nd = 0
+        others_categories = max(nd - len(shown), 0)
+        # If n_distinct is unknown/too small, fall back to the overflow we
+        # actually have in `top` beyond the shown bars.
+        overflow_items = len(cleaned) - len(shown)
+        if others_categories == 0 and overflow_items > 0:
+            others_categories = overflow_items
+
+        # Count attributed to the "Otros" bar.
+        others_count = 0.0
+        if n_rows is not None:
+            try:
+                total_rows = float(n_rows)
+            except (TypeError, ValueError):
+                total_rows = None
+            if total_rows is not None and total_rows > sum_shown:
+                others_count = total_rows - sum_shown
+        if others_count <= 0:
+            others_count = overflow_count
+
+        # --- Build the display order (top to bottom): largest .. smallest, Otros.
+        display_labels = [_truncate(v, 22) for v, _ in shown]
+        display_values = [c for _, c in shown]
+        display_colors = [_PALETTE[i % len(_PALETTE)] for i in range(len(shown))]
+
+        has_others = others_count > 0 and others_categories > 0
+        if has_others:
+            display_labels.append(f"Otros ({others_categories} categorías)")
+            display_values.append(others_count)
+            display_colors.append(_OTHER_COLOR)
+
+        total = sum(display_values) or 1.0
+
+        # barh draws bottom-up, so reverse the display order before plotting to
+        # land the largest category on top and "Otros" at the bottom.
+        labels = list(reversed(display_labels))
+        values = list(reversed(display_values))
+        colors = list(reversed(display_colors))
+        y_pos = range(len(values))
+
+        # Height scales with the number of bars so dense reports stay readable.
+        n_bars = len(values)
+        height = max(2.4, min(0.4 * n_bars + 1.2, 14.0))
+        fig = Figure(figsize=(6.4, height), dpi=150)
+        ax = fig.add_subplot(111)
+
+        ax.barh(list(y_pos), values, color=colors, edgecolor="white")
+        ax.set_yticks(list(y_pos))
+        ax.set_yticklabels(labels, fontsize=8)
+        ax.set_xlabel("conteo", fontsize=9)
+
+        max_val = max(values) if values else 1.0
+        ax.set_xlim(0, max_val * 1.18 if max_val > 0 else 1.0)
+
+        # Annotate each bar with its count and percentage at the end of the bar.
+        for y, val in zip(y_pos, values):
+            pct = val / total * 100.0
+            ax.text(
+                val + max_val * 0.012,
+                y,
+                f"{int(round(val))} ({pct:.0f}%)",
+                va="center",
+                ha="left",
+                fontsize=7,
+                color="#202020",
+            )
+
+        if safe_title:
+            ax.set_title(safe_title, fontsize=13, loc="left", pad=10)
+
+        fig.tight_layout()
+        return fig
+    except Exception as exc:  # noqa: BLE001 — never raise from a figure builder.
+        return _message_figure(
+            f"error al dibujar barras: {exc}", color=_ERROR_TEXT
+        )
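Review note: the "Otros" fallback and the height clamp in the hunk above can be read in isolation. What follows is a minimal sketch, not the module's code. `others_count` and `bar_figure_height` are hypothetical helper names introduced for illustration, and `sum_shown` / `overflow_count` stand for the values the function computes earlier.

```python
def others_count(sum_shown, overflow_count, n_rows=None):
    """Rows attributed to the aggregated "Otros" bar (hypothetical helper)."""
    count = 0.0
    if n_rows is not None:
        try:
            total_rows = float(n_rows)
        except (TypeError, ValueError):
            total_rows = None
        # Prefer the exact remainder when the real row count is known.
        if total_rows is not None and total_rows > sum_shown:
            count = total_rows - sum_shown
    # Otherwise fall back to the summed counts of the overflow items.
    if count <= 0:
        count = overflow_count
    return count


def bar_figure_height(n_bars):
    """Height in inches: 0.4 per bar plus a margin, clamped to [2.4, 14.0]."""
    return max(2.4, min(0.4 * n_bars + 1.2, 14.0))


print(others_count(sum_shown=30, overflow_count=0, n_rows=100))  # 70.0
print(others_count(sum_shown=30, overflow_count=12))             # 12
```

The first call mirrors `test_n_rows_adds_exact_others_bar`: three shown categories summing to 30 out of 100 rows leave 70 for "Otros".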
@@ -0,0 +1,103 @@
+"""Tests for categorical_top_bar_figure (top-category bar chart, eda group).
+
+Uses the Agg backend; figures are built without pyplot and are never shown or
+saved. Each test explicitly closes the Figure it built
+(matplotlib.pyplot.close) so no state accumulates between tests.
+"""
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+import matplotlib.pyplot as plt  # noqa: E402
+from matplotlib.figure import Figure  # noqa: E402
+
+from categorical_top_bar_figure import categorical_top_bar_figure
+
+
+def _make_top(n):
+    """Build n items {value, count, pct}, sorted by count descending."""
+    return [
+        {"value": f"cat_{i}", "count": n - i, "pct": (n - i) / sum(range(1, n + 1))}
+        for i in range(n)
+    ]
+
+
+def _bar_count(ax):
+    """Return the number of bars (length of the Axes' first BarContainer)."""
+    if ax.containers:
+        return len(ax.containers[0])
+    return 0
+
+
+def test_returns_figure():
+    fig = categorical_top_bar_figure(_make_top(3), n_distinct=3, title="col")
+    assert isinstance(fig, Figure)
+    plt.close(fig)
+
+
+def test_ten_items_topk_six_yields_seven_bars():
+    top = _make_top(10)
+    fig = categorical_top_bar_figure(top, n_distinct=10, title="muchas", top_k=6)
+    ax = fig.axes[0]
+    # 6 explicit categories + 1 "Otros" bar.
+    assert _bar_count(ax) == 7
+    plt.close(fig)
+
+
+def test_empty_top_does_not_raise_and_returns_figure():
+    fig = categorical_top_bar_figure([], n_distinct=0, title="vacía")
+    assert isinstance(fig, Figure)
+    # No data: there must be no bars.
+    assert _bar_count(fig.axes[0]) == 0
+    plt.close(fig)
+
+
+def test_long_value_truncated():
+    long_value = "una_categoria_con_un_nombre_larguisimo_que_excede_el_limite"
+    top = [
+        {"value": long_value, "count": 10, "pct": 0.5},
+        {"value": "corta", "count": 10, "pct": 0.5},
+    ]
+    fig = categorical_top_bar_figure(top, n_distinct=2, title="col", top_k=6)
+    ax = fig.axes[0]
+    tick_texts = [t.get_text() for t in ax.get_yticklabels()]
+    # The long value shows up truncated with an ellipsis, NOT in full.
+    assert any("…" in t for t in tick_texts)
+    assert long_value not in " ".join(tick_texts)
+    plt.close(fig)
+
+
+def test_none_value_and_none_count_are_handled():
+    top = [
+        {"value": None, "count": 5, "pct": 0.5},
+        {"value": "b", "count": None, "pct": 0.0},  # count None -> dropped
+        {"value": "c", "count": 5, "pct": 0.5},
+    ]
+    fig = categorical_top_bar_figure(top, n_distinct=2, title="con nones", top_k=6)
+    assert isinstance(fig, Figure)
+    # Only 2 valid items, no overflow -> 2 bars, no "Otros".
+    assert _bar_count(fig.axes[0]) == 2
+    plt.close(fig)
+
+
+def test_n_rows_adds_exact_others_bar():
+    # The 3 shown categories sum to 30; the real dataset has 100 rows -> "Otros" = 70.
+    top = [
+        {"value": "a", "count": 15, "pct": 0.15},
+        {"value": "b", "count": 10, "pct": 0.10},
+        {"value": "c", "count": 5, "pct": 0.05},
+    ]
+    fig = categorical_top_bar_figure(
+        top, n_distinct=20, title="col", top_k=3, n_rows=100
+    )
+    ax = fig.axes[0]
+    # 3 explicit bars + "Otros".
+    assert _bar_count(ax) == 4
+    tick_texts = [t.get_text() for t in ax.get_yticklabels()]
+    # The "Otros" bar reports n_distinct - top_k = 17 categories.
+    assert any("Otros (17 categorías)" in t for t in tick_texts)
+    # Its annotation carries the count 70.
+    annotation_texts = [t.get_text() for t in ax.texts]
+    assert any("70" in t for t in annotation_texts)
+    plt.close(fig)
@@ -0,0 +1,86 @@
+---
+name: load_bq_table_to_duckdb
+kind: function
+lang: py
+domain: datascience
+version: "1.1.0"
+purity: impure
+signature: "def load_bq_table_to_duckdb(table_fqn: str, duckdb_path: str, dest_table: str = '', sample_frac: float = None, max_rows: int = 0, project_id: str = '', pseudonymize_cols: list = None) -> dict"
+description: "BigQuery -> local DuckDB adapter for the eda group. Pulls a Google BigQuery table or view into a local DuckDB file (by default IN FULL, all rows; sampling is opt-in via sample_frac) so that the functions in the eda capability group (which only speak DuckDB/PostgreSQL) can profile it. Fetches via the BigQuery Storage Read API (Arrow) with a REST fallback. Pseudonymizes PII columns with a truncated SHA-1 hash before materializing (LOPDGDD/GDPR)."
+tags: [eda, bigquery, duckdb, datascience]
+params:
+  - name: table_fqn
+    desc: "Fully qualified name of the BigQuery table/view: `project.dataset.table`."
+  - name: duckdb_path
+    desc: "Path of the local DuckDB file in which to materialize the table (the destination table is created or overwritten)."
+  - name: dest_table
+    desc: "Name of the destination DuckDB table. Empty = last segment of the FQN, sanitized."
+  - name: sample_frac
+    desc: "None (DEFAULT) = FULL, fetches all rows. A float in (0,1) enables opt-in sampling with `WHERE rand() < frac` (~frac of the total). Views do not support TABLESAMPLE, hence rand()."
+  - name: max_rows
+    desc: "Optional hard cap on rows (LIMIT). 0 (DEFAULT) = no cap. Combined with sample_frac when both are passed."
+  - name: project_id
+    desc: "GCP billing project. Empty = first segment of the FQN, else the ADC project."
+  - name: pseudonymize_cols
+    desc: "List of PII columns to pseudonymize with a truncated SHA-1 hash before materializing (LOPDGDD/GDPR). Preserves nulls and cardinality."
+output: "dict, in dict-no-throw style. On success {status:'ok', duckdb_path, table, n_rows_source, n_rows_fetched, sampled, sample_frac, columns, pseudonymized}. On error {status:'error', error}."
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: []
+tested: false
+tests: []
+test_file_path: ""
+file_path: "python/functions/datascience/load_bq_table_to_duckdb.py"
+---
+
+## Example
+
+```python
+from datascience.load_bq_table_to_duckdb import load_bq_table_to_duckdb
+
+# FULL by default: pulls ALL rows of the view (3.8M) into DuckDB.
+r = load_bq_table_to_duckdb(
+    "autingo-159109.customer_marts.customer_profile",
+    "/tmp/eda_bq.duckdb",
+    pseudonymize_cols=["document_number", "full_name", "email", "phone"],
+)
+print(r["table"], r["n_rows_fetched"], "de", r["n_rows_source"], "sampled=", r["sampled"])
+
+# Opt-in sampling: ~5% of the rows.
+r = load_bq_table_to_duckdb(
+    "autingo-159109.customer_marts.customer_profile",
+    "/tmp/eda_bq_sample.duckdb",
+    sample_frac=0.05,
+    pseudonymize_cols=["document_number", "full_name", "email", "phone"],
+)
+```
+
+## When to use it
+
+- Before profiling a BigQuery table/view with the `eda` group (which only speaks DuckDB/PostgreSQL): it pulls the FULL source into local DuckDB (or a sample, with `sample_frac`) with PII pseudonymization applied.
+- When you need a single BigQuery -> local DuckDB -> `eda` group bridge without rewriting the bridge inline every time.
+- When you want an EDA over business data to keep its analytical value (cardinality, nulls, distribution) without embedding real personal data.
+
+## Gotchas
+
+- **Impure**: performs network I/O (BigQuery) and writes to disk (DuckDB). Requires configured ADC (`gcloud auth application-default login`).
+- **403 USER_PROJECT_DENIED**: avoided by applying `creds.with_quota_project(None)` when the ADC carries a foreign quota project (memory note `bq_direct_quota_project`).
+- **TABLESAMPLE does not work on views**: sampling (opt-in, `sample_frac`) uses `WHERE rand() < frac`, which works on both tables and views. A `sample_frac` outside (0,1) is ignored and the fetch stays FULL. `max_rows` is a `LIMIT` acting as an optional hard cap.
+- **FULL by default**: `sample_frac=None` fetches ALL rows. The result is loaded into RAM as a pandas DataFrame before being materialized in DuckDB, so a table with many millions of rows and many columns can consume several GB. For tables too large to fit, pass `sample_frac` (sample) or `max_rows` (cap). The fetch uses the BigQuery Storage Read API (Arrow) when `google-cloud-bigquery-storage` and `pyarrow` are available, which is much faster than REST for millions of rows; otherwise it falls back to the REST converter automatically.
+- **Pseudonymization is a one-way hash** (SHA-1 truncated to 12 hex characters): it cannot be inverted to recover the original value, which is appropriate for EDA. It preserves nulls and the missingness pattern, and preserves cardinality up to the rare collision that truncating to 48 bits allows.
+- **dict-no-throw**: never raises; on any failure (invalid FQN, auth, query) it returns `{status:'error', error:str}`.
+
+## Notes
+
+Adapter for the `eda` capability group: the rest of the group's functions profile
+DuckDB/PostgreSQL but do not speak BigQuery natively. This function fills that
+gap by materializing a single DuckDB table from the DataFrame returned by the
+BigQuery query. When the destination table name does not match `^[A-Za-z_][A-Za-z0-9_]*$`,
+every character outside `[A-Za-z0-9_]` becomes `_` before the name is quoted in the `CREATE OR REPLACE TABLE`.
+
+## Capability growth log
+
+- v1.1.0 (2026-07-01): FULL becomes the DEFAULT. `max_rows=300000, sample=True` is replaced by `sample_frac=None` (None = all rows) plus `max_rows=0` (optional hard cap). Sampling is explicit opt-in. Fetch is accelerated via the BigQuery Storage Read API (Arrow) with a REST fallback. Standard user preference: EDAs run over the full data unless asked otherwise.
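Review note: the interplay between `sample_frac` and `max_rows` described in this doc can be sketched as a pure function. This is an illustrative sketch under the rules stated above, not the adapter itself; `build_fetch_sql` is a hypothetical name and nothing here talks to BigQuery.

```python
def build_fetch_sql(table_fqn, sample_frac=None, max_rows=0):
    """Return (sql, sampled) for a full, sampled and/or capped fetch (sketch)."""
    where = ""
    sampled = False
    # Sampling only applies to a fraction strictly between 0 and 1.
    if sample_frac is not None and 0 < float(sample_frac) < 1:
        where = f" WHERE rand() < {float(sample_frac)}"
        sampled = True
    # max_rows > 0 adds a hard LIMIT; 0 means no cap.
    limit = f" LIMIT {int(max_rows)}" if max_rows and int(max_rows) > 0 else ""
    return f"SELECT * FROM `{table_fqn}`{where}{limit}", sampled


print(build_fetch_sql("proj.ds.tbl"))
# ('SELECT * FROM `proj.ds.tbl`', False)
print(build_fetch_sql("proj.ds.tbl", sample_frac=0.05, max_rows=1000))
# ('SELECT * FROM `proj.ds.tbl` WHERE rand() < 0.05 LIMIT 1000', True)
```

Because the filter is evaluated per row, the sampled row count is only approximately `frac` of the total, which is why the result dict reports `n_rows_fetched` alongside `n_rows_source`.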
@@ -0,0 +1,157 @@
+"""load_bq_table_to_duckdb: BigQuery -> local DuckDB adapter for the `eda` group.
+
+Pulls a Google BigQuery table or view into a local DuckDB file (by default IN
+FULL, all rows, or a sample when `sample_frac` is passed) so that the functions
+in the `eda` capability group (which profile DuckDB/PostgreSQL) can analyze it
+without a native BigQuery adapter. Materializes a single DuckDB table from a
+pandas DataFrame.
+
+Default mode = FULL: `sample_frac=None` fetches the whole view/table (standard
+user preference: EDAs run over the full data unless asked otherwise). Sampling
+is explicit opt-in: `sample_frac=0.05` fetches ~5%; `max_rows` is an optional
+hard cap (0 = no cap). The fetch uses the BigQuery Storage Read API (Arrow) when
+available, falling back to the REST converter.
+
+LOPDGDD/GDPR pseudonymization: the columns listed in `pseudonymize_cols` are
+transformed with a truncated SHA-1 hash BEFORE being written to disk, preserving
+nulls, cardinality and the missingness pattern without dumping the real value
+(national ID, name, email, phone, etc.). The EDA keeps its analytical value
+without embedding real personal data.
+
+Authentication: ADC (gcloud auth). Applies creds.with_quota_project(None) to
+avoid the 403 USER_PROJECT_DENIED when the ADC carries a foreign quota project.
+
+`eda` group dict-no-throw style: never raises; returns {status:'error', ...}.
+"""
+
+import hashlib
+import re
+
+_FQN_RE = re.compile(r"^[A-Za-z0-9_.\-]+$")
+
+
+def _pseudonymize_series(values):
+    """Truncated SHA-1 hash (12 hex chars) of each non-null value; keeps None/NaN."""
+    out = []
+    for v in values:
+        # Lists and dicts are never null; every other value defers to
+        # pandas.isna via _safe_isna, which covers float NaN, pd.NA and NaT.
+        is_null = v is None or (not isinstance(v, (list, dict)) and _safe_isna(v))
+        if is_null:
+            out.append(None)
+        else:
+            h = hashlib.sha1(str(v).encode("utf-8")).hexdigest()[:12]
+            out.append(h)
+    return out
+
+
+def _safe_isna(v):
+    import pandas as pd
+    try:
+        return bool(pd.isna(v))
+    except (TypeError, ValueError):
+        return False
+
+
+def load_bq_table_to_duckdb(
+    table_fqn: str,
+    duckdb_path: str,
+    dest_table: str = "",
+    sample_frac: float = None,
+    max_rows: int = 0,
+    project_id: str = "",
+    pseudonymize_cols: list = None,
+) -> dict:
+    try:
+        import duckdb
+        import google.auth
+        from google.cloud import bigquery
+
+        if not table_fqn or not _FQN_RE.match(table_fqn):
+            return {"status": "error", "error": f"table_fqn inválido: {table_fqn!r}"}
+
+        # dest_table: derive from the last FQN segment when not passed.
+        dest = dest_table or table_fqn.split(".")[-1]
+        if not re.match(r"^[A-Za-z_][A-Za-z0-9_]*$", dest):
+            dest = re.sub(r"[^A-Za-z0-9_]", "_", dest) or "bq_table"
+
+        # ADC auth with the quota-project fix (403 USER_PROJECT_DENIED).
+        creds, adc_project = google.auth.default(
+            scopes=["https://www.googleapis.com/auth/bigquery"]
+        )
+        if hasattr(creds, "with_quota_project"):
+            creds = creds.with_quota_project(None)
+        proj = project_id or table_fqn.split(".")[0] or adc_project
+        client = bigquery.Client(project=proj, credentials=creds)
+
+        # Source row count.
+        cnt = client.query(
+            f"SELECT COUNT(*) AS n FROM `{table_fqn}`"
+        ).result()
+        n_source = 0
+        for row in cnt:
+            n_source = int(row["n"])
+
+        # Default mode = FULL (sample_frac=None -> all rows). Sampling is
+        # opt-in: sample_frac in (0,1) samples that fraction with
+        # `WHERE rand() < frac` (works on tables and views; TABLESAMPLE does
+        # not work on views). max_rows>0 is an optional hard cap (LIMIT); 0 = no cap.
+        sampled = False
+        where = ""
+        if sample_frac is not None and 0 < float(sample_frac) < 1:
+            where = f" WHERE rand() < {float(sample_frac)}"
+            sampled = True
+        limit = f" LIMIT {int(max_rows)}" if max_rows and int(max_rows) > 0 else ""
+        sql = f"SELECT * FROM `{table_fqn}`{where}{limit}"
+
+        # Fetch: BigQuery Storage Read API (Arrow, fast for millions of rows),
+        # falling back to the REST converter if the library is missing or fails.
+        try:
+            df = client.query(sql).result().to_dataframe(create_bqstorage_client=True)
+        except Exception:  # noqa: BLE001
+            df = client.query(sql).result().to_dataframe(create_bqstorage_client=False)
+        n_fetched = len(df)
+
+        # Normalize db-dtypes dtypes: BigQuery's REST converter maps DATE/TIME
+        # columns to the `dbdate`/`dbtime` extension dtypes of db-dtypes, which
+        # DuckDB does NOT recognize when registering the DataFrame ("Data type
+        # 'dbdate' not recognized"). Convert them to standard types DuckDB ingests:
+        # DATE -> datetime64[ns], TIME -> string. DuckDB accepts the other dtypes
+        # (datetime64 from TIMESTAMP, nullable Int64/boolean, object) as they are.
+        import pandas as pd
+        for col in df.columns:
+            dt = str(df[col].dtype)
+            if dt == "dbdate":
+                df[col] = pd.to_datetime(df[col], errors="coerce")
+            elif dt == "dbtime":
+                df[col] = df[col].astype("string").astype(object)
+
+        # Pseudonymize PII columns before writing to disk.
+        pseudo_applied = []
+        for col in (pseudonymize_cols or []):
+            if col in df.columns:
+                df[col] = _pseudonymize_series(df[col].tolist())
+                pseudo_applied.append(col)
+
+        # Materialize into DuckDB (one table from the DataFrame).
+        con = duckdb.connect(duckdb_path)
+        try:
+            con.register("_src_df", df)
+            con.execute(f'CREATE OR REPLACE TABLE "{dest}" AS SELECT * FROM _src_df')
+            con.unregister("_src_df")
+        finally:
+            con.close()
+
+        return {
+            "status": "ok",
+            "duckdb_path": duckdb_path,
+            "table": dest,
+            "n_rows_source": n_source,
+            "n_rows_fetched": n_fetched,
+            "sampled": sampled,
+            "sample_frac": float(sample_frac) if sampled else None,
+            "columns": list(df.columns),
+            "pseudonymized": pseudo_applied,
+        }
+    except Exception as e:  # noqa: BLE001
+        return {"status": "error", "error": str(e)}
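Review note: the pseudonymization step in the hunk above can be tried without pandas or BigQuery. The sketch below is a standard-library approximation, not the module's helper: `pseudonymize` is a hypothetical name, and `math.isnan` stands in for `pandas.isna`, so `pd.NA` and `NaT` are not covered here.

```python
import hashlib
import math


def pseudonymize(values):
    """Map each non-null value to a 12-hex-char SHA-1 prefix; keep nulls as None."""
    out = []
    for v in values:
        # Assumption: only None and float NaN count as null in this sketch.
        is_nan = isinstance(v, float) and math.isnan(v)
        if v is None or is_nan:
            out.append(None)
        else:
            out.append(hashlib.sha1(str(v).encode("utf-8")).hexdigest()[:12])
    return out


hashed = pseudonymize(["ana@example.com", None, "ana@example.com", "bob@example.com"])
# Equal inputs share a token and nulls stay null, so value counts and the
# missingness pattern survive the transformation.
print(hashed[0] == hashed[2], hashed[1] is None, len(hashed[0]))  # True True 12
```

The hash is deterministic and unsalted, so the same value maps to the same token across runs and tables, which is what keeps joins and frequency profiles meaningful.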
@@ -0,0 +1,121 @@
+---
+id: render_table_as_figure_py_datascience
+name: render_table_as_figure
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def render_table_as_figure(header, rows, title=None, note=None, fontsize=9.0, max_cell_chars=40) -> \"matplotlib.figure.Figure\""
+description: "Draws a tabular block (header + rows) as a crisp matplotlib.figure.Figure, ready to rasterize at high DPI. Meant for tables that do NOT fit as text on a page/slide of the EDA report: it is rasterized at high resolution (the caller uses dpi=220, bbox_inches='tight') and the user zooms in on a phone to read it in full without losing data. Shaded (#eef3f6), bold header; even rows (1-based) with a soft zebra stripe (#f6f8fa); dark ink (#1b1b1b) on white; very thin gray grid (#cccccc). Truncates each cell to max_cell_chars with an ellipsis and str()-s each value (None -> \"\"). figsize proportional to content (width from the number and length of columns, height from the number of rows) so it stays legible under zoom. Agg backend, no global pyplot. Defensive: ragged rows are padded to a rectangle; header and rows both empty or None, or any internal error, return a placeholder Figure with the centered text \"(tabla no disponible)\". NEVER raises."
+tags: [eda, table, figure, matplotlib, visualization, rasterize, zoom, render, datascience, impure]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: [matplotlib]
+example: |
+  from datascience.render_table_as_figure import render_table_as_figure
+  header = ["columna", "n_nulos", "%_nulos", "distintos", "tipo", "ejemplo"]
+  rows = [
+      ["ingresos", 12, "1.2%", 980, "float64", "2345.67"],
+      ["edad", 0, "0.0%", 88, "int64", "37"],
+      ["ciudad", 5, "0.5%", 412, "object", "Madrid"],
+  ]
+  fig = render_table_as_figure(header, rows, title="Resumen de columnas",
+                               note="rasteriza a dpi=220 y haz zoom")
+  fig.savefig("/tmp/tabla.png", dpi=220, bbox_inches="tight")
+tested: true
+tests:
+  - "test_returns_figure_with_table"
+  - "test_rows_none_does_not_raise"
+  - "test_header_none_does_not_raise"
+  - "test_empty_lists_return_placeholder_figure"
+  - "test_both_none_return_placeholder_figure"
+  - "test_long_cell_is_truncated"
+  - "test_none_cells_become_empty_strings"
+  - "test_can_rasterize_to_png_high_dpi"
+  - "test_placeholder_can_rasterize"
+  - "test_ragged_rows_are_padded"
+test_file_path: "python/functions/datascience/render_table_as_figure_test.py"
+file_path: "python/functions/datascience/render_table_as_figure.py"
+params:
+  - name: header
+    desc: "List of column names (may be [] or None). Each name is str()-ed, truncated to max_cell_chars and drawn in the shaded, bold header row. If empty/None, no header row is drawn (body only)."
+  - name: rows
+    desc: "List of rows; each row is a list of cells holding arbitrary values (they are str()-ed; None -> \"\"). Accepts None (treated as []), scalar rows (wrapped into a single cell) and rows of differing length (the grid is made rectangular at the maximum width, padding with empty cells). Newlines/tabs inside a cell are collapsed to spaces so it does not spill into other rows."
+  - name: title
+    desc: "Optional title drawn above the table, bold in ink #1b1b1b, left-aligned. None or \"\" => no title. Default None."
+  - name: note
+    desc: "Optional note at the foot of the figure, in gray #8a8a8a and italic. None or \"\" => no note. Default None."
+  - name: fontsize
+    desc: "Base font size (pt) for the table cells, header row included. The title uses fontsize+3 and the note max(7, fontsize-1). A non-numeric or <= 0 value falls back to 9.0. Default 9.0."
+  - name: max_cell_chars
+    desc: "Truncates each cell's text to this number of characters (with a trailing … when cut) so the width does not blow up. A value that int() cannot convert falls back to 40; <= 0 leaves the cells empty. Default 40."
+output: "A matplotlib.figure.Figure (figsize proportional to content: width ≈ 0.9-1.6\" per column depending on its text, total clamped to 3-26\"; height ≈ 0.32\" per row including the header, plus room for title/note, clamped to 1-60\") holding one axes-off Axes that contains an ax.table(...), returned NOT closed. Header: background #eef3f6, text #1b1b1b, bold; even rows (1-based) zebra #f6f8fa, odd rows white; ink #1b1b1b; borders/grid #cccccc, lw 0.4; text left-aligned. Title above (bold) and note below (gray italic) when passed. If header and rows are both empty or None, or on any internal error, returns a small placeholder Figure with the centered text \"(tabla no disponible)\". NEVER raises. The caller rasterizes it (dpi=220, bbox_inches='tight') and closes it; the function neither shows nor saves it."
+---
+
+## Example
+
+```python
+import sys, os
+sys.path.insert(0, os.path.join("python", "functions"))
+from datascience.render_table_as_figure import render_table_as_figure
+
+# A table that does not fit as text on the slide -> rasterize it and read it zoomed in.
+header = ["columna", "n_nulos", "%_nulos", "distintos", "tipo", "ejemplo"]
+rows = [
+    ["ingresos", 12, "1.2%", 980, "float64", "2345.67"],
+    ["edad", 0, "0.0%", 88, "int64", "37"],
+    ["ciudad", 5, "0.5%", 412, "object", "Madrid"],
+    ["categoria_producto", 0, "0.0%", 1840, "object",
+     "un_valor_categorico_muy_largo_que_se_trunca"],
+]
+
+fig = render_table_as_figure(
+    header,
+    rows,
+    title="Resumen de columnas",
+    note="rasteriza a dpi=220 y haz zoom en el móvil",
+    fontsize=9.0,
+    max_cell_chars=40,
+)
+
+# The report renderer rasterizes it at high resolution; here we just persist it.
+fig.savefig("/tmp/tabla.png", dpi=220, bbox_inches="tight")
+```
+
+## When to use it
+
+Use it in an EDA report when a table **does not fit as text** on a page or
+slide and you would rather have a crisp image the reader can enlarge on a phone
+to read in full (column profiles, count matrices, frequency tables with many
+rows or wide columns). Pass the header and rows as they are (values are
+`str()`-ed for you) plus an optional `title`/`note`; the caller rasterizes it
+at `dpi=220` with `bbox_inches='tight'`. It is the "table-as-image" counterpart
+of the `build_boxplots_figure` / `categorical_top_pie_figure` charts: same
+palette and same contract (Agg, no `pyplot`, the caller closes the figure).
+
+## Gotchas
+
+- **Impure because of matplotlib.** It touches the rendering machinery. It uses
+  the `Agg` backend and the object-oriented `Figure`/`add_subplot` API, NEVER
+  `pyplot.*`, so it neither touches global state nor leaks figures between
+  calls. `pyplot` is NOT thread-safe; this function builds the `Figure`
+  directly, so it is safe to call in a loop from the renderer.
+- **The caller closes the figure.** It returns the `Figure` but neither shows
+  nor saves it. The consumer must rasterize it and then release it
+  (`matplotlib.pyplot.close(fig)`) so memory does not pile up in large batches.
+- **Meant to be rasterized at high DPI.** The `figsize` is proportional to the
+  content, but real legibility comes from the DPI: rasterize with `dpi=220` and
+  `bbox_inches='tight'`. A table with a great many rows grows in height (capped
+  at ~60"); for thousands of rows, split the table or summarize before passing it.
+- **Visible cell truncation.** Each cell is cut to `max_cell_chars` (default 40)
+  with a trailing `…`, and newlines/tabs are collapsed to spaces, so no cell
+  spills into other rows. Raise `max_cell_chars` if you need to see the full
+  value (at the cost of width).
+- **Defensive, never raises.** Empty or `None` `header`/`rows`, scalar rows,
+  rows of differing length or any internal error are handled without
+  propagating: in the worst case it returns a placeholder `Figure` reading
+  "(tabla no disponible)". Do not wrap the call in try/except for fear of a raise; there is none.
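Review note: the sizing rules quoted in the `output` field above can be sketched as plain arithmetic. This is a hedged approximation that takes its constants from the implementation later in this commit; `table_figsize` is a hypothetical helper, not part of the module.

```python
def table_figsize(col_max_chars, n_table_rows, has_title=False, has_note=False):
    """Return (width, height) in inches for a table figure (sizing sketch)."""
    # Each column: 0.9" base plus 0.085" per character beyond 6, clamped to [0.9, 1.6].
    widths = [max(0.9, min(1.6, 0.9 + 0.085 * max(n - 6, 0))) for n in col_max_chars]
    width = max(3.0, min(26.0, sum(widths)))
    # 0.32" per drawn row plus a margin, with extra room for title and note.
    height = 0.32 * n_table_rows + 0.30
    if has_title:
        height += 0.45
    if has_note:
        height += 0.30
    return width, max(1.0, min(60.0, height))


print(table_figsize([3, 3], 1))          # (3.0, 1.0): both lower bounds apply
print(table_figsize([40] * 30, 500))     # (26.0, 60.0): both upper bounds apply
```

The upper height bound is the reason the gotcha above recommends splitting or summarizing tables with thousands of rows: past roughly 185 rows the figure stops growing and rows get squeezed.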
@@ -0,0 +1,241 @@
+"""Impure EDA helper: a crisp table rendered as a matplotlib Figure (`eda` group).
+
+Draws a tabular block (header + rows) as a sharp ``matplotlib.figure.Figure``
+ready to be rasterized at high DPI, so a table that does NOT fit as text on a
+page/slide can still be read in full by zooming into the rasterized image on a
+phone. The header is shaded and bold, even rows carry a soft zebra stripe, the
+ink is dark on white and the grid is very thin.
+
+Impure because it touches matplotlib's rendering machinery. It uses the headless
+Agg backend and the object-oriented ``Figure`` API (no ``pyplot``) so it leaks no
+global state and is safe to call repeatedly from a report renderer. It is fully
+defensive and NEVER raises: empty/invalid input or any internal error returns a
+small placeholder figure carrying a centered "(tabla no disponible)".
+"""
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+from matplotlib.figure import Figure  # noqa: E402
+
+# Palette shared with the EDA report renderer so the document stays coherent.
+_HEADER_BG = "#eef3f6"   # header cell background.
+_HEADER_TEXT = "#1b1b1b"  # header cell text (bold).
+_ZEBRA_BG = "#f6f8fa"    # even (1-based) row background stripe.
+_BODY_BG = "#ffffff"     # odd row background.
+_INK = "#1b1b1b"         # body text + title ink.
+_GRID = "#cccccc"        # cell borders / grid (thin).
+_NOTE_TEXT = "#8a8a8a"   # muted gray for the note (italic).
+
+
+def _placeholder_figure(message: str = "(tabla no disponible)") -> "Figure":
+    """Return a small fallback ``Figure`` carrying a single centered message."""
+    fig = Figure(figsize=(6.0, 1.6), dpi=150)
+    ax = fig.add_subplot(111)
+    ax.axis("off")
+    ax.text(
+        0.5,
+        0.5,
+        message,
+        ha="center",
+        va="center",
+        fontsize=11,
+        color=_NOTE_TEXT,
+        style="italic",
+        wrap=True,
+        transform=ax.transAxes,
+    )
+    fig.tight_layout()
+    return fig
+
+
+def _cell_text(value, max_cell_chars: int) -> str:
+    """``str()`` a cell value defensively, None -> "", truncate with an ellipsis."""
+    s = "" if value is None else str(value)
+    # Collapse newlines/tabs so a single cell never spills across table rows.
+    s = s.replace("\n", " ").replace("\r", " ").replace("\t", " ")
+    try:
+        limit = int(max_cell_chars)
+    except (TypeError, ValueError):
+        limit = 40
+    if limit <= 0:
+        return ""
+    if len(s) <= limit:
+        return s
+    if limit == 1:
+        return "…"
+    return s[: limit - 1] + "…"
+
+
+def render_table_as_figure(
+    header,
+    rows,
+    title=None,
+    note=None,
+    fontsize=9.0,
+    max_cell_chars=40,
+):
+    """Draw a crisp table as a matplotlib.figure.Figure, ready to rasterize at high DPI.
+
+    Meant for tables that do NOT fit as text on a page/slide: it is rasterized
+    at high resolution and the user zooms in on a phone to read it in full
+    without losing data. Shaded, bold header; even rows with a soft zebra stripe;
+    dark ink on white; very thin grid.
+
+    Args:
+        header: list of column names (may be []).
+        rows: list of rows; each row is a list of cells (arbitrary values, str()-ed).
+        title: optional title drawn above the table (or None).
+        note: optional gray/italic note under the table (or None).
+        fontsize: base font size (pt) for the cells.
+        max_cell_chars: truncates cell text to this many chars (with a trailing …) so the width does not blow up.
+
+    Returns:
+        matplotlib.figure.Figure, NOT closed (the caller rasterizes and closes it).
+        Never raises: on any error it returns a Figure with the text "(tabla no disponible)".
+    """
+    try:
+        # --- Defensive normalization of header/rows into a rectangular grid.
+        header_list = list(header) if isinstance(header, (list, tuple)) else []
+        raw_rows = list(rows) if isinstance(rows, (list, tuple)) else []
+
+        clean_rows = []
+        for row in raw_rows:
+            if isinstance(row, (list, tuple)):
+                clean_rows.append(list(row))
+            elif row is None:
+                clean_rows.append([])
+            else:
+                # A scalar row becomes a single-cell row instead of being dropped.
+                clean_rows.append([row])
+
+        # Nothing to draw at all -> placeholder.
+        if not header_list and not clean_rows:
+            return _placeholder_figure()
+
+        # Number of columns = widest of header / any row.
+        n_cols = len(header_list)
+        for row in clean_rows:
+            if len(row) > n_cols:
+                n_cols = len(row)
+        if n_cols <= 0:
+            return _placeholder_figure()
+
+        # Base font size, tolerate a bad value.
+        try:
+            base_fs = float(fontsize)
+        except (TypeError, ValueError):
+            base_fs = 9.0
+        if base_fs <= 0:
+            base_fs = 9.0
+
+        # --- Build the truncated, padded text matrix.
+        header_cells = [
+            _cell_text(header_list[c] if c < len(header_list) else "", max_cell_chars)
+            for c in range(n_cols)
+        ]
+        body_cells = []
+        for row in clean_rows:
+            body_cells.append(
+                [
+                    _cell_text(row[c] if c < len(row) else "", max_cell_chars)
+                    for c in range(n_cols)
+                ]
+            )
+
+        has_header = any(t for t in header_cells)
+        n_body = len(body_cells)
+        # Total drawn table rows (header counts as one when present).
+        n_table_rows = n_body + (1 if has_header else 0)
+        if n_table_rows <= 0:
+            return _placeholder_figure()
+
+        # --- figsize proportional to content so it reads under zoom.
+        # Width: per-column width scales with the longest text in that column,
+        # clamped to a sensible per-column range, total capped.
+        per_col_widths = []
+        for c in range(n_cols):
+            col_texts = [header_cells[c]] if has_header else []
+            col_texts += [body_cells[r][c] for r in range(n_body)]
+            longest = max((len(t) for t in col_texts), default=0)
+            # ~0.085" per char at the base font, clamped to [0.9, 1.6] inches.
+            w = 0.9 + 0.085 * max(longest - 6, 0)
+            w = max(0.9, min(1.6, w))
+            per_col_widths.append(w)
+        fig_w = sum(per_col_widths)
+        fig_w = max(3.0, min(26.0, fig_w))
+
+        # Height: ~0.32" per row + room for title / note.
+        fig_h = 0.32 * n_table_rows + 0.30
+        if title is not None and str(title) != "":
+            fig_h += 0.45
+        if note is not None and str(note) != "":
+            fig_h += 0.30
+        fig_h = max(1.0, min(60.0, fig_h))
+
+        fig = Figure(figsize=(fig_w, fig_h), dpi=150)
+        ax = fig.add_subplot(111)
+        ax.axis("off")
+
+        # Reserve vertical bands for the optional title (top) and note (bottom)
+        # so the table itself never overlaps them.
+        title_band = 0.10 if (title is not None and str(title) != "") else 0.0
+        note_band = 0.07 if (note is not None and str(note) != "") else 0.0
+        table_bbox = [0.0, note_band, 1.0, max(0.05, 1.0 - title_band - note_band)]
+
+        cell_text = ([header_cells] if has_header else []) + body_cells
+
+        col_widths = [w / fig_w for w in per_col_widths]
+
+        table = ax.table(
+            cellText=cell_text,
+            colWidths=col_widths,
+            cellLoc="left",
+            loc="center",
+            bbox=table_bbox,
+        )
+        table.auto_set_font_size(False)
+        table.set_fontsize(base_fs)
+
+        # --- Style every cell: zebra body, shaded bold header, thin gray grid.
+        for (r, _c), cell in table.get_celld().items():
+            cell.set_edgecolor(_GRID)
+            cell.set_linewidth(0.4)
+            # Small horizontal padding so text does not touch the border.
+            cell.PAD = 0.04
+            if has_header and r == 0:
+                cell.set_facecolor(_HEADER_BG)
+                cell.set_text_props(color=_HEADER_TEXT, fontweight="bold", ha="left")
+            else:
+                body_index = r - 1 if has_header else r  # 0-based body row.
+                # 1-based even rows get the zebra stripe.
+                is_even = ((body_index + 1) % 2) == 0
+                cell.set_facecolor(_ZEBRA_BG if is_even else _BODY_BG)
+                cell.set_text_props(color=_INK, ha="left")
+
+        if title is not None and str(title) != "":
+            ax.set_title(
+                str(title),
+                fontsize=base_fs + 3.0,
+                fontweight="bold",
+                color=_INK,
+                loc="left",
+                pad=8,
+            )
+
+        if note is not None and str(note) != "":
+            fig.text(
+                0.01,
+                0.01,
+                str(note),
+                ha="left",
+                va="bottom",
+                fontsize=max(7.0, base_fs - 1.0),
+                color=_NOTE_TEXT,
+                style="italic",
+            )
+
+        return fig
+    except Exception:  # noqa: BLE001 — never raise from a figure builder.
+        return _placeholder_figure()
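Review note: the sizing arithmetic in the hunk above can be checked in isolation. The following is a minimal sketch, not the function's actual code path. `estimate_table_figsize` is a hypothetical helper that reproduces only the width and height formulas visible in this hunk (0.9 in base per column, about 0.085 in per character beyond the sixth, and the clamps shown), without touching matplotlib.

```python
def estimate_table_figsize(col_texts, n_rows, has_title=False, has_note=False):
    """Mirror the figsize arithmetic of the hunk above (hypothetical helper).

    col_texts: one list of strings per column (header text included, if any).
    n_rows: number of drawn table rows, counting the header as one.
    """
    per_col_widths = []
    for texts in col_texts:
        longest = max((len(t) for t in texts), default=0)
        # 0.9" base; each character beyond the sixth adds ~0.085".
        w = 0.9 + 0.085 * max(longest - 6, 0)
        per_col_widths.append(max(0.9, min(1.6, w)))
    fig_w = max(3.0, min(26.0, sum(per_col_widths)))

    fig_h = 0.32 * n_rows + 0.30
    if has_title:
        fig_h += 0.45
    if has_note:
        fig_h += 0.30
    fig_h = max(1.0, min(60.0, fig_h))
    return fig_w, fig_h, per_col_widths


# Six short columns, header plus five body rows, with title and note.
columns = [[f"col_{c}"] + [f"r{r}c{c}" for r in range(5)] for c in range(6)]
fig_w, fig_h, widths = estimate_table_figsize(
    columns, n_rows=6, has_title=True, has_note=True
)
```

With cells this short every column sits at the 0.9 in floor, so the figure comes out about 5.4 in wide. A 20-character cell would reach the 1.6 in ceiling, since 0.9 + 0.085 * 14 = 2.09 before clamping.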
@@ -0,0 +1,119 @@
+"""Tests for render_table_as_figure (a crisp table rendered as a Figure, eda group).
+
+Uses the Agg backend with no display; figures are never shown or written to disk,
+only to an in-memory BytesIO. Each test explicitly closes the Figure it builds
+(matplotlib.pyplot.close) so that no state accumulates between tests.
+"""
+
+from io import BytesIO
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+import matplotlib.pyplot as plt  # noqa: E402
+from matplotlib.figure import Figure  # noqa: E402
+
+from render_table_as_figure import render_table_as_figure
+
+
+def _grid(n_cols, n_rows):
+    """Return a header of n_cols columns plus n_rows rows of cells."""
+    header = [f"col_{c}" for c in range(n_cols)]
+    rows = [[f"r{r}c{c}" for c in range(n_cols)] for r in range(n_rows)]
+    return header, rows
+
+
+def test_returns_figure_with_table():
+    header, rows = _grid(6, 5)
+    fig = render_table_as_figure(header, rows, title="Tabla", note="nota al pie")
+    assert isinstance(fig, Figure)
+    # There is at least one Axes, and that Axes holds a table with cells.
+    assert len(fig.axes) >= 1
+    ax = fig.axes[0]
+    assert len(ax.tables) >= 1
+    # 6 columns x (1 header + 5 rows) = 36 cells.
+    assert len(ax.tables[0].get_celld()) == 6 * (5 + 1)
+    plt.close(fig)
+
+
+def test_rows_none_does_not_raise():
+    fig = render_table_as_figure(["a", "b"], None)
+    assert isinstance(fig, Figure)
+    assert len(fig.axes) >= 1
+    plt.close(fig)
+
+
+def test_header_none_does_not_raise():
+    fig = render_table_as_figure(None, [["x", "y"], ["z", "w"]])
+    assert isinstance(fig, Figure)
+    assert len(fig.axes) >= 1
+    plt.close(fig)
+
+
+def test_empty_lists_return_placeholder_figure():
+    fig = render_table_as_figure([], [])
+    assert isinstance(fig, Figure)
+    # Placeholder: one Axes with text and no table.
+    assert len(fig.axes) >= 1
+    assert len(fig.axes[0].tables) == 0
+    plt.close(fig)
+
+
+def test_both_none_return_placeholder_figure():
+    fig = render_table_as_figure(None, None)
+    assert isinstance(fig, Figure)
+    assert len(fig.axes[0].tables) == 0
+    plt.close(fig)
+
+
+def test_long_cell_is_truncated():
+    long_value = "x" * 200
+    header, _ = _grid(2, 0)
+    fig = render_table_as_figure(header, [[long_value, "ok"]], max_cell_chars=20)
+    assert isinstance(fig, Figure)
+    ax = fig.axes[0]
+    texts = [c.get_text().get_text() for c in ax.tables[0].get_celld().values()]
+    # The long cell shows up truncated with an ellipsis, never in full.
+    assert any(t.endswith("…") and len(t) <= 20 for t in texts)
+    assert long_value not in texts
+    plt.close(fig)
+
+
+def test_none_cells_become_empty_strings():
+    fig = render_table_as_figure(["a", "b"], [[None, "v"], ["w", None]])
+    assert isinstance(fig, Figure)
+    ax = fig.axes[0]
+    texts = [c.get_text().get_text() for c in ax.tables[0].get_celld().values()]
+    # There are empty cells (the None values) and cells with a value.
+    assert "" in texts
+    assert "v" in texts
+    plt.close(fig)
+
+
+def test_can_rasterize_to_png_high_dpi():
+    header, rows = _grid(6, 8)
+    fig = render_table_as_figure(header, rows, title="Render", note="zoom me")
+    buf = BytesIO()
+    # Rasterizing at high DPI with a tight bbox must not raise.
+    fig.savefig(buf, format="png", dpi=220, bbox_inches="tight")
+    assert buf.getbuffer().nbytes > 0
+    plt.close(fig)
+
+
+def test_placeholder_can_rasterize():
+    fig = render_table_as_figure([], [])
+    buf = BytesIO()
+    fig.savefig(buf, format="png", dpi=220, bbox_inches="tight")
+    assert buf.getbuffer().nbytes > 0
+    plt.close(fig)
+
+
+def test_ragged_rows_are_padded():
+    # Rows of different lengths: the grid is rectangularized to the maximum width.
+    fig = render_table_as_figure(["a", "b", "c"], [["1"], ["1", "2", "3", "4"]])
+    assert isinstance(fig, Figure)
+    ax = fig.axes[0]
+    # 4 columns (the widest row) x (1 header + 2 rows) = 12 cells.
+    assert len(ax.tables[0].get_celld()) == 4 * (2 + 1)
+    plt.close(fig)
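Review note: the tests above pin down three input-cleaning behaviours (None cells become empty strings, ragged rows are padded to the widest row, long cells are truncated with an ellipsis). The sketch below shows one way to get that behaviour. `normalize_grid` and its default `max_cell_chars=40` are assumptions made for illustration; the real cleaning code in `render_table_as_figure` is not shown in this diff and may differ.

```python
def normalize_grid(header, rows, max_cell_chars=40):
    """Rectangularize a header plus ragged rows (hypothetical helper).

    None becomes "", every value is stringified, rows are padded with "" up to
    the widest row, and cells longer than max_cell_chars are cut to that length
    with a trailing ellipsis.
    """
    header = list(header or [])
    rows = [list(r or []) for r in (rows or [])]
    n_cols = max([len(header)] + [len(r) for r in rows])

    def clean(value):
        text = "" if value is None else str(value)
        if len(text) > max_cell_chars:
            # Keep max_cell_chars characters in total, ellipsis included.
            text = text[: max(max_cell_chars - 1, 0)] + "…"
        return text

    def pad(cells):
        cells = [clean(v) for v in cells]
        return cells + [""] * (n_cols - len(cells))

    return pad(header), [pad(r) for r in rows]


header_cells, body_cells = normalize_grid(
    ["a", "b", "c"], [["1"], [None, "2", "3", "4"]]
)
```

Here the three-column header is padded to four cells to match the widest row, which is the same 4 x (1 + 2) = 12 cell count that `test_ragged_rows_are_padded` asserts.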
@@ -0,0 +1,79 @@
+---
+name: summarize_outlier_dims
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: pure
+signature: "def summarize_outlier_dims(raw_numeric: dict, outlier_rows: list, top_k: int = 3) -> list"
+description: "Explains WHICH columns make each anomalous row detected by isolation_forest_outliers unusual. For each {row_index, score} it rebuilds the valid row (same numeric-column filter and same dropping of rows with None as the detector, so row_index matches) and returns the top_k columns with the largest population |z-score| (ddof=0). Explainability layer for the multivariate outlier step in EDA. Pure and deterministic; on empty/invalid inputs or when there are no valid rows it returns [] without crashing."
+tags: [eda, models, outliers, anomaly-detection, explainability, z-score, multivariate]
+params:
+  - name: raw_numeric
+    desc: "dict {column_name: [values]} aligned by row (like ctx['raw_numeric'] from the AutomaticEDA engine). Only columns whose values are all numeric are used (None allowed per row; bool/str/NaN/Inf discard the whole column); this filter is IDENTICAL to the one in isolation_forest_outliers so that row_index matches."
+  - name: outlier_rows
+    desc: "List of {row_index, score} exactly as returned by isolation_forest_outliers. row_index counts ONLY the valid rows (no None) in order of appearance, 0-based. Out-of-range or malformed entries are defensively ignored."
+  - name: top_k
+    desc: "Number of columns (those with the largest |z-score|) to report per outlier. Default 3. Invalid values (non-int, bool, <1) fall back to 3."
+output: "List of dicts {row_index: int, score: float, dims: [{col: str, value: float, z: float}, ...]} in the same order as outlier_rows. dims holds up to top_k columns sorted by descending |z|, with z (population z-score, ddof=0) rounded to 3 decimals; if a column has std==0 its z is 0. Out-of-range/malformed entries of outlier_rows are omitted. For an empty/non-dict raw_numeric, a non-list outlier_rows, 0 numeric columns, or 0 valid rows it returns []."
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: ""
+imports: []
+tested: true
+tests: ["test_row_index_skips_none_rows", "test_extreme_row_flagged_via_isolation", "test_out_of_range_row_index_is_ignored", "test_degrades_to_empty_on_invalid_inputs"]
+test_file_path: "python/functions/datascience/summarize_outlier_dims_test.py"
+file_path: "python/functions/datascience/summarize_outlier_dims.py"
+---
+
+## Example
+
+```python
+from datascience import isolation_forest_outliers, summarize_outlier_dims
+
+# Dense cloud around the origin plus one row with an extreme value in "c".
+raw_numeric = {
+    "a": [0.1, 0.2, -0.1, 0.0, 0.3, -0.2, 0.15, -0.05, 0.25, 0.2, -0.3, 0.1],
+    "b": [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.1, 0.95, 1.05, 0.9, 1.15, 1.0],
+    "c": [5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 4.95, 5.05, 4.9, 500.0, 5.1, 5.0],
+}
+
+result = isolation_forest_outliers(raw_numeric, contamination=0.1)
+summary = summarize_outlier_dims(raw_numeric, result["outlier_rows"], top_k=3)
+
+for item in summary:
+    top = item["dims"][0]
+    print(item["row_index"], top["col"], top["value"], top["z"])
+# The row holding the value 500 comes out with top dim "c" and a high |z|: that is what makes it unusual.
+```
+
+## When to use it
+
+Right **after** `isolation_forest_outliers`, once you know WHICH rows are
+anomalous and want to explain WHY: in which columns they deviate most from the
+rest. Useful for filling in the outliers section of an EDA report/notebook with
+"row 9 is unusual mainly because of `c` (z=+3.3)" instead of just an opaque
+row_index. Pass the same `raw_numeric` you gave the detector and its
+`outlier_rows` untouched; `row_index` points to the same row because both
+functions apply the same column filter and drop the same rows containing None.
+
+## Gotchas
+
+- **Same `raw_numeric` as the detector**: `row_index` only matches if you pass
+  the same column dict (same order, same lists) that you passed to
+  `isolation_forest_outliers`. If you change the columns or their order, the
+  indices no longer map.
+- **`row_index` is relative to the valid rows**: rows with `None` in any used
+  column are dropped and the indices are recomputed over the remaining rows
+  (0-based, in order of appearance). It does not map 1:1 to the input lists
+  when there are None values.
+- **Population z-score (ddof=0)**: the population standard deviation is used,
+  consistent with the detector's scaling. Columns with `std==0` (all values
+  equal) yield `z=0`, so they never show up as "unusual".
+- **Returns `[]` instead of crashing**: non-dict/non-list input, 0 numeric
+  columns, 0 valid rows, or every entry out of range -> empty list.
+  It does not raise exceptions.
+- **Does not call `isolation_forest_outliers`**: it only consumes its output. It
+  is an independent function (does not import it), so `uses_functions` is empty.
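Review note: the second gotcha says `row_index` does not map 1:1 to the input lists when None values are present. The following is a minimal sketch of how a caller could translate a valid-row index back to a position in the original lists. `valid_to_original_index` is a hypothetical helper, not part of this module, and it assumes the dict it receives already contains only the numeric columns that the detector keeps.

```python
def valid_to_original_index(columns):
    """Map each valid-row index to its position in the input lists.

    Hypothetical helper, not part of the module. `columns` is a dict
    {name: [values]} assumed to contain only the numeric columns that the
    detector keeps; a row is valid when none of its values is None.
    """
    if not columns:
        return []
    n_rows = min(len(values) for values in columns.values())
    mapping = []
    for i in range(n_rows):
        if all(values[i] is not None for values in columns.values()):
            mapping.append(i)  # position in `mapping` is the valid index
    return mapping


raw = {
    "a": [0.1, None, 0.3, 0.2],
    "c": [5.0, 5.1, 500.0, 4.9],
}
mapping = valid_to_original_index(raw)
original_row = mapping[1]  # valid index 1 is original row 2 (row 1 had a None)
```

Looking up `mapping[row_index]` for each reported outlier gives the row number a reader of the source table would expect.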
@@ -0,0 +1,144 @@
+"""Explain which dimensions (columns) make each anomalous row unusual.
+
+Takes the multivariate output of `isolation_forest_outliers` (a list of
+`{row_index, score}`) and, for each outlier, returns the columns with the largest
+|z-score| relative to the distribution of the valid rows. It is the
+"explainability" layer of the multivariate outlier step in the EDA phase: the
+Isolation Forest says WHICH rows are unusual, this function says WHY (in which
+columns they deviate most).
+
+Pure and deterministic: it rebuilds EXACTLY the same "valid rows" that
+`isolation_forest_outliers` uses (same numeric-column filter and same dropping
+of rows with None), so `row_index` points to the same row in both
+functions. It performs no I/O and depends on no state.
+"""
+
+import math
+
+import numpy as np
+
+
+def _is_finite_number(v) -> bool:
+    """True if v is a finite int/float. bool does NOT count; neither do NaN/Inf."""
+    if isinstance(v, bool):
+        return False
+    if not isinstance(v, (int, float)):
+        return False
+    if isinstance(v, float) and (math.isnan(v) or math.isinf(v)):
+        return False
+    return True
+
+
+def summarize_outlier_dims(
+    raw_numeric: dict,
+    outlier_rows: list,
+    top_k: int = 3,
+) -> list:
+    """Summarize the dimensions that deviate most for each anomalous row.
+
+    Args:
+        raw_numeric: dict {column_name: [values]} aligned by row (like
+            ctx['raw_numeric'] from the AutomaticEDA engine). Only columns
+            whose values are all numeric are used (None allowed per row; bool,
+            str, NaN and Inf discard the whole column); the filter is identical
+            to the one in isolation_forest_outliers.
+        outlier_rows: list of {row_index, score} as returned by
+            isolation_forest_outliers. row_index counts ONLY the valid rows
+            (no None) in order of appearance, starting at 0.
+        top_k: number of columns (those with the largest |z-score|) to report
+            per outlier. Default 3. Invalid values fall back to 3.
+
+    Returns:
+        List of dicts {row_index, score, dims} in the same order as
+        outlier_rows, where dims is the list of up to top_k columns sorted by
+        descending |z|: [{col, value, z}, ...] with z rounded to 3 decimals.
+        Out-of-range or malformed entries of outlier_rows are omitted
+        (defensive). For an empty/non-dict raw_numeric, a non-list
+        outlier_rows, 0 numeric columns, or 0 valid rows it returns [].
+    """
+    # Defensive validation of the main arguments.
+    if not isinstance(raw_numeric, dict) or not isinstance(outlier_rows, list):
+        return []
+    if not isinstance(top_k, int) or isinstance(top_k, bool) or top_k < 1:
+        top_k = 3
+
+    # Numeric-column selection: identical to isolation_forest_outliers.
+    # A column is kept only if all its values are numeric (None allowed per
+    # row); any bool/str/NaN/Inf discards the whole column.
+    numeric_cols: dict[str, list] = {}
+    for name, values in raw_numeric.items():
+        if not isinstance(values, (list, tuple)):
+            continue
+        ok = True
+        for v in values:
+            if v is None:
+                continue
+            if not _is_finite_number(v):
+                ok = False
+                break
+        if ok:
+            numeric_cols[name] = list(values)
+
+    if len(numeric_cols) < 1:
+        return []
+
+    col_names = list(numeric_cols.keys())
+    try:
+        n_rows_total = min(len(numeric_cols[c]) for c in col_names)
+    except ValueError:
+        return []
+
+    # Rebuild the valid rows with the SAME criterion as the detector: row i
+    # takes one value per column; if any value is None, the row is dropped
+    # and does NOT advance the valid index. That way row_index in outlier_rows
+    # points into this same sequence (0-based, in order of appearance).
+    valid_rows: list[list[float]] = []
+    for i in range(n_rows_total):
+        row = [numeric_cols[c][i] for c in col_names]
+        if any(v is None for v in row):
+            continue
+        valid_rows.append([float(v) for v in row])
+
+    if not valid_rows:
+        return []
+
+    matrix = np.asarray(valid_rows, dtype=float)
+    n_valid = matrix.shape[0]
+    means = matrix.mean(axis=0)
+    stds = matrix.std(axis=0, ddof=0)  # population std (ddof=0)
+
+    out: list = []
+    for entry in outlier_rows:
+        if not isinstance(entry, dict):
+            continue
+        ri = entry.get("row_index")
+        # bool is a subclass of int: exclude it explicitly.
+        if not isinstance(ri, int) or isinstance(ri, bool):
+            continue
+        if ri < 0 or ri >= n_valid:
+            continue
+
+        try:
+            score = float(entry.get("score"))
+        except (TypeError, ValueError):
+            score = 0.0
+
+        row = matrix[ri]
+        dims = []
+        for j, name in enumerate(col_names):
+            std = stds[j]
+            if std == 0.0:
+                z = 0.0
+            else:
+                z = float((row[j] - means[j]) / std)
+            dims.append({"col": name, "value": float(row[j]), "z": z})
+
+        # Largest |z| first; stable sort, so ties keep column order.
+        dims.sort(key=lambda d: abs(d["z"]), reverse=True)
+        dims = dims[:top_k]
+        for d in dims:
+            d["z"] = round(d["z"], 3)
+
+        out.append({"row_index": int(ri), "score": score, "dims": dims})
+
+    return out
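Review note: the z-score reported per column is the population z-score, `(x - mean) / std` with `ddof=0`, and a zero-variance column is reported as `z = 0`. The sketch below is a pure-Python re-implementation of that formula for a single column, written only as a worked check. `population_z_scores` is a hypothetical name, and the data is column `c` from the documentation example in this commit.

```python
import math


def population_z_scores(values):
    """Population z-scores (ddof=0) of one column; all zeros if std == 0."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    if std == 0.0:
        return [0.0] * n
    return [(v - mean) / std for v in values]


# Column "c" from the documentation example: 11 values near 5 and one at 500.
c = [5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 4.95, 5.05, 4.9, 500.0, 5.1, 5.0]
z = population_z_scores(c)
z_extreme = z[9]

# With a population std, no single value among n can exceed sqrt(n - 1).
bound = math.sqrt(len(c) - 1)
```

The extreme value lands just under that bound, at roughly 3.317, which is consistent with the "z=+3.3" quoted in the function's documentation. It also means that on a 12-row column no |z| threshold above about 3.32 can ever be met, however extreme the outlier is.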
@@ -0,0 +1,93 @@
+"""Tests for summarize_outlier_dims."""
+
+from isolation_forest_outliers import isolation_forest_outliers
+from summarize_outlier_dims import summarize_outlier_dims
+
+
+# Shared dataset: 3 columns, 13 rows. ORIGINAL row 6 has None in "a" (it is
+# dropped), so ORIGINAL row 10, the one with an extreme value in "c", lands at
+# VALID index 9 (not 10). This verifies that None rows are skipped.
+A = [0.1, 0.2, -0.1, 0.0, 0.3, -0.2, None, 0.15, -0.05, 0.25, 0.2, -0.3, 0.1]
+B = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.3, 1.1, 0.95, 1.05, 0.9, 1.15, 1.0]
+C = [5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 5.3, 4.95, 5.05, 4.9, 500.0, 5.1, 5.0]
+RAW = {"a": A, "b": B, "c": C}
+
+# Original -> valid map (skipping original 6):
+#   orig: 0 1 2 3 4 5 7 8 9 10 11 12
+#  valid: 0 1 2 3 4 5 6 7 8  9 10 11
+# => the extreme in "c" (original 10) sits at valid index 9.
+EXTREME_VALID_INDEX = 9
+
+
+def test_row_index_skips_none_rows():
+    # Direct mapping (independent of IsolationForest randomness): valid index 9
+    # must correspond to the row with c == 500, which shows the None in
+    # original row 6 was skipped correctly.
+    summary = summarize_outlier_dims(
+        RAW, [{"row_index": EXTREME_VALID_INDEX, "score": -0.5}], top_k=3
+    )
+    assert len(summary) == 1
+    entry = summary[0]
+    assert entry["row_index"] == EXTREME_VALID_INDEX
+    assert entry["score"] == -0.5
+    # The dominant dimension is "c", with its extreme value and a high |z|.
+    top = entry["dims"][0]
+    assert top["col"] == "c"
+    assert top["value"] == 500.0
+    assert abs(top["z"]) > 2.0
+    # top_k honored: at most 3 dims.
+    assert len(entry["dims"]) <= 3
+
+
+def test_extreme_row_flagged_via_isolation():
+    # Real integration: detect outliers and explain them.
+    result = isolation_forest_outliers(RAW, contamination=0.1)
+    assert "note" not in result
+    outlier_rows = result["outlier_rows"]
+    assert outlier_rows  # at least one outlier
+
+    summary = summarize_outlier_dims(RAW, outlier_rows, top_k=3)
+    # Same length as outlier_rows (every index is in range).
+    assert len(summary) == len(outlier_rows)
+
+    by_index = {e["row_index"]: e for e in summary}
+    # The extreme point must be among the detected outliers...
+    assert EXTREME_VALID_INDEX in by_index
+    # ...and its top dimension must be "c" (where it deviates by several sigmas).
+    extreme = by_index[EXTREME_VALID_INDEX]
+    assert extreme["dims"][0]["col"] == "c"
+    assert abs(extreme["dims"][0]["z"]) > 2.0
+
+
+def test_out_of_range_row_index_is_ignored():
+    # Out-of-range indices are skipped instead of crashing.
+    summary = summarize_outlier_dims(
+        RAW,
+        [
+            {"row_index": 999, "score": -1.0},
+            {"row_index": -1, "score": -1.0},
+            {"row_index": EXTREME_VALID_INDEX, "score": -0.5},
+        ],
+        top_k=2,
+    )
+    # Only the valid index survives; the other two are discarded.
+    assert len(summary) == 1
+    assert summary[0]["row_index"] == EXTREME_VALID_INDEX
+    assert len(summary[0]["dims"]) <= 2
+
+
+def test_degrades_to_empty_on_invalid_inputs():
+    # Empty raw_numeric plus empty outlier_rows.
+    assert summarize_outlier_dims({}, [], 3) == []
+    # raw_numeric is not a dict.
+    assert summarize_outlier_dims("not a dict", [{"row_index": 0}], 3) == []
+    # outlier_rows is not a list.
+    assert summarize_outlier_dims(RAW, "not a list", 3) == []
+    # No numeric columns (all strings) -> [].
+    assert summarize_outlier_dims(
+        {"s": ["x", "y", "z"]}, [{"row_index": 0, "score": -1.0}], 3
+    ) == []
+    # Malformed entries inside outlier_rows are ignored (they do not crash).
+    assert summarize_outlier_dims(
+        RAW, ["nope", 42, {"no_row_index": 1}], 3
+    ) == []
@@ -0,0 +1,466 @@
+"""ACCEPTANCE test suite for AutomaticEDA: "every AEDA comes out the way we want".
+
+This suite is the safety net of the EDA subsystem in the `eda` group: it guarantees
+that EVERY chapter of an AutomaticEDA report comes out populated and with its
+essential content, that the standalone-chapter feature (``only_chapters``) resolves
+its compute dependencies, that optional chapters return None when they do not
+apply, that the multi-table folder report detects the FK, and that the Markdown
+carries the complete appendix (full association matrix plus describe with
+skew/kurtosis). Unlike the per-chapter unit tests, this suite exercises the
+pipeline END-TO-END on a deterministic synthetic dataset that
+activates all chapters at once.
+
+Determinism: the dataset is generated with a fixed ``seed`` and the pipeline runs
+without an LLM (``profile_level='standard'``), so the manifest and the Markdown
+are reproducible across runs. A single `standard` render is reused through a
+module-scoped fixture to avoid repeating the expensive computation.
+
+dict-no-throw: pipelines in the `eda` group never raise; here we assert on
+``status == 'ok'`` and then on the concrete content of the manifest / Markdown.
+
+Honesty (DoD): the asserts check real CONTENT (the essential text of each
+chapter), not just the heading. If a chapter stopped emitting its content (say a
+change broke the numeric distribution, the Isolation Forest, the full
+correlation matrix, ...), the corresponding test FAILS naming the chapter and
+the missing fragment; it is not softened to make it pass.
+"""
+
+import json
+import os
+import subprocess
+import sys
+
+import pytest
+
+_HERE = os.path.dirname(os.path.abspath(__file__))
+_FUNCTIONS = os.path.abspath(os.path.join(_HERE, "..", ".."))  # python/functions
+if _FUNCTIONS not in sys.path:
+    sys.path.insert(0, _FUNCTIONS)
+
+from datascience.automatic_eda import CHAPTER_ORDER  # noqa: E402
+from datascience.generate_synthetic_eda_folder import (  # noqa: E402
+    generate_synthetic_eda_folder,
+)
+from datascience.generate_synthetic_eda_table import (  # noqa: E402
+    generate_synthetic_eda_table,
+)
+from pipelines.render_automatic_eda import render_automatic_eda  # noqa: E402
+from pipelines.render_automatic_eda_folder import (  # noqa: E402
+    render_automatic_eda_folder,
+)
+
+# --------------------------------------------------------------------------- #
+# Deterministic parameters of the golden fixture.
+# --------------------------------------------------------------------------- #
+SEED = 42
+N_ROWS = 800
+TABLE = "synthetic"
+
+# The `analisis_llm` chapter is ONLY computed with run_llm=True; in the
+# `standard` preset (no LLM, which is what this suite uses) it must not appear.
+# The chapters expected in a `standard` report are therefore all those in
+# CHAPTER_ORDER EXCEPT analisis_llm. CHAPTER_ORDER is the source of truth for the
+# engine's 16 chapters (portada ... glosario).
+LLM_ONLY_CHAPTERS = {"analisis_llm"}
+EXPECTED_STANDARD = [c for c in CHAPTER_ORDER if c not in LLM_ONLY_CHAPTERS]
+
+
+def _pdf_text(path):
+    """Text of the PDF via pdftotext, or None if the tool is unavailable."""
+    try:
+        out = subprocess.run(
+            ["pdftotext", "-layout", path, "-"],
+            capture_output=True, text=True, timeout=60,
+        )
+        return out.stdout if out.returncode == 0 else None
+    except Exception:  # noqa: BLE001 - the main verification is on the MD.
+        return None
+
+
+def _manifest_chapters(result):
+    """Set of chapter ids present in the result's manifest."""
+    with open(result["manifest_path"], encoding="utf-8") as fh:
+        return set((json.load(fh).get("chapters") or {}).keys())
+
+
+# --------------------------------------------------------------------------- #
+# Module-scoped fixtures: the synthetic dataset is generated ONCE and the
+# `standard` render is computed ONCE; every content test reuses them.
+# --------------------------------------------------------------------------- #
+@pytest.fixture(scope="module")
+def synth_db(tmp_path_factory):
+    """Deterministic synthetic table that activates the engine's 16 chapters."""
+    d = tmp_path_factory.mktemp("aeda_accept_synth")
+    db = str(d / "synthetic.duckdb")
+    g = generate_synthetic_eda_table(db, TABLE, n_rows=N_ROWS, seed=SEED)
+    assert g["status"] == "ok", g.get("error")
+    return {"db": db, "table": TABLE, "gen": g}
+
+
+@pytest.fixture(scope="module")
+def standard_run(synth_db, tmp_path_factory):
+    """AutomaticEDA `standard` render (no LLM) over the synthetic dataset.
+
+    Returns the pipeline dict plus the loaded manifest, the Markdown text, and
+    the PDF text (if pdftotext is available). Reused by most of the tests.
+    """
+    out = str(tmp_path_factory.mktemp("aeda_accept_std"))
+    r = render_automatic_eda(
+        synth_db["db"], synth_db["table"],
+        profile_level="standard", out_dir=out, basename="synth_std",
+    )
+    assert r["status"] == "ok", r.get("error")
+    with open(r["manifest_path"], encoding="utf-8") as fh:
+        manifest = json.load(fh)
+    with open(r["aeda_md_path"], encoding="utf-8") as fh:
+        md = fh.read()
+    return {
+        "r": r,
+        "manifest": manifest,
+        "chapters": manifest.get("chapters") or {},
+        "md": md,
+        "pdf_text": _pdf_text(r["pdf_path"]),
+    }
+
+
+@pytest.fixture(scope="module")
+def minimal_db(tmp_path_factory):
+    """Minimal table with NO free text, NO date and NO lat/lon.
+
+    Used to check that text_distr / timeseries / geospatial return None (they
+    do not appear in the manifest) and that the EDA does not crash. Only
+    continuous numerics plus one low-cardinality categorical.
+    """
+    import random
+
+    import duckdb
+
+    d = tmp_path_factory.mktemp("aeda_accept_min")
+    db = str(d / "minimal.duckdb")
+    con = duckdb.connect(db)
+    con.execute("CREATE TABLE minimal (a DOUBLE, b DOUBLE, c INTEGER, grp VARCHAR)")
+    random.seed(7)
+    rows = [
+        (round(random.gauss(10, 2), 3), round(random.gauss(50, 5), 3),
+         random.randint(1, 100), ["x", "y", "z"][i % 3])
+        for i in range(120)
+    ]
+    con.executemany("INSERT INTO minimal VALUES (?,?,?,?)", rows)
+    con.close()
+    return {"db": db, "table": "minimal"}
+
+
+# --------------------------------------------------------------------------- #
+# 1) CHAPTER COVERAGE (golden): the standard manifest carries the 15 expected
+#    non-LLM chapters, none is missing, and analisis_llm does NOT appear without an LLM.
+# --------------------------------------------------------------------------- #
+def test_standard_cubre_todos_los_capitulos_esperados(standard_run):
+    chapters = set(standard_run["chapters"].keys())
+    expected = set(EXPECTED_STANDARD)
+    missing = expected - chapters
+    assert not missing, (
+        "capítulos esperados ausentes del manifest standard: "
+        f"{sorted(missing)} (presentes: {sorted(chapters)})"
+    )
+    # analisis_llm requires run_llm=True: it must NOT appear in standard.
+    assert "analisis_llm" not in chapters, (
+        "analisis_llm apareció sin LLM: el preset standard no debería computarlo"
+    )
+
+
+def test_manifest_top_level_es_valido(standard_run):
+    """The manifest declares the engine and a dict of chapters with per-id metadata."""
+    man = standard_run["manifest"]
+    assert man.get("engine") == "AutomaticEDA"
+    assert man.get("engine_version")
+    chapters = standard_run["chapters"]
+    # Each chapter carries a version plus a page/slide count (manifest format).
+    for cid, meta in chapters.items():
+        assert meta.get("version"), f"capítulo {cid} sin version en el manifest"
+        assert (meta.get("n_pages") or 0) > 0, f"capítulo {cid} con 0 páginas"
+
+
+# --------------------------------------------------------------------------- #
+# 2) KEY CONTENT PER CHAPTER (acceptance): each chapter carries its ESSENTIAL
+#    content in the Markdown, not just the heading. A missing fragment names the
+#    chapter and the text that is missing.
+# --------------------------------------------------------------------------- #
+# STABLE text fragments that each chapter emits in the Markdown for the synthetic
+# dataset. They are not fragile numbers: they are chapter labels/structure plus
+# column names from the fixture. If a chapter stops populating its content, its
+# fragment disappears and the test fails naming it.
+CHAPTER_NEEDLES = {
+    "portada":      ["800 filas", "19 columnas"],
+    "overview":     ["Primeras filas (df.head)", "Diccionario de columnas",
+                     "customer_id", "signup_date"],
+    "num_distr":    ["Distribuciones numéricas", "vallas Tukey", "income"],
+    "cat_distr":    ["Distribuciones categóricas", "Entropía", "Top categorías",
+                     "country"],
+    "text_distr":   ["Texto libre (NLP)", "TTR", "Términos más frecuentes",
+                     "Idioma dominante"],
+    "calidad":      ["Cómo se calcula la calidad", "Calidad global"],
+    "missingness":  ["Datos faltantes", "Celdas faltantes (global)",
+                     "Faltantes por columna"],
+    "outliers":     ["Valores atípicos por columna", "Filas atípicas (multivariante)",
+                     "Isolation Forest", "Filas analizadas"],
+    "correlacion":  ["Matriz de asociación", "Pares más correlacionados"],
+    "relaciones":   ["Candidatas a clave primaria", "customer_id"],
+    "modelos":      ["PCA — varianza explicada", "Segmentación (KMeans)"],
+    "timeseries":   ["Series temporales", "Columna de fecha", "signup_date"],
+    "geospatial":   ["Análisis geoespacial", "Extensión geográfica", "Centroide"],
+    "agregacion":   ["Agregación por grupos", "Agrupado por"],
+    "glosario":     ["Glosario de términos",
+                     "### Isolation Forest (anomalías multivariantes)",
+                     "### PCA (componentes principales)"],
+}
+
+
+def test_needles_cubren_exactamente_los_capitulos_standard():
+    """Maintenance guard: the needles cover the same 15 non-LLM chapters.
+
+    If someone adds a new chapter to CHAPTER_ORDER, this test is a reminder that
+    its essential content must be documented here (or marked as LLM-only)."""
+    assert set(CHAPTER_NEEDLES.keys()) == set(EXPECTED_STANDARD), (
+        "CHAPTER_NEEDLES desincronizado con los capítulos esperados de standard: "
+        f"falta needles para {set(EXPECTED_STANDARD) - set(CHAPTER_NEEDLES)}, "
+        f"sobra {set(CHAPTER_NEEDLES) - set(EXPECTED_STANDARD)}"
+    )
+
+
+@pytest.mark.parametrize("chapter_id", list(CHAPTER_NEEDLES.keys()))
+def test_capitulo_trae_su_contenido_esencial(standard_run, chapter_id):
+    md = standard_run["md"]
+    # Precondition: the chapter is in the manifest (coverage). If it is not, that
+    # is a coverage failure, not a content failure, and it is reported as such.
+    assert chapter_id in standard_run["chapters"], (
+        f"capítulo {chapter_id} ausente del manifest (fallo de cobertura)"
+    )
+    for needle in CHAPTER_NEEDLES[chapter_id]:
+        assert needle in md, (
+            f"capítulo '{chapter_id}': falta su contenido esencial en el Markdown "
+            f"— fragmento ausente: {needle!r}"
+        )
+
+
+def test_outliers_isolation_forest_poblado_no_degradado(standard_run):
+    """The multivariate block (Isolation Forest) comes out with data, not degraded."""
+    md = standard_run["md"]
+    assert "Anomalías multivariantes" in md
+    assert "Filas analizadas" in md, "el Isolation Forest no trae su tabla poblada"
+    assert "No se pudo analizar la anomalía multivariante" not in md, (
+        "el bloque multivariante salió degradado en el informe completo"
+    )
+    # The profile carries the models block with the multivariate outliers.
+    models = (standard_run["r"]["profile"] or {}).get("models") or {}
+    assert models.get("outliers") is not None, "profile['models']['outliers'] vacío"
+
+
+# --------------------------------------------------------------------------- #
+# 3) STANDALONE CHAPTERS WITH RESOLVED DEPS (only_chapters acceptance): asking
+#    for a single chapter leaves it POPULATED because dependency resolution
+#    enables the computation it needs, even if the caller did not request it.
+# --------------------------------------------------------------------------- #
+def test_only_outliers_isolation_forest_poblado(synth_db, tmp_path):
+    """only=['outliers'] without an explicit run_models → IsolationForest populated."""
+    out = str(tmp_path / "only_out")
+    r = render_automatic_eda(
+        synth_db["db"], synth_db["table"],
+        only_chapters=["outliers"], out_dir=out, basename="only_outliers",
+    )
+    assert r["status"] == "ok", r.get("error")
+    # Document = portada (cover) + outliers + glosario (glossary), nothing else.
+    assert _manifest_chapters(r) == {"portada", "outliers", "glosario"}
+    md = open(r["aeda_md_path"], encoding="utf-8").read()
+    assert "Filas atípicas (multivariante)" in md
+    assert "Filas analizadas" in md, "Isolation Forest sin tabla poblada"
+    assert "No se pudo analizar la anomalía multivariante" not in md, (
+        "el multivariante salió degradado pese a resolver las deps"
+    )
+    # Dependency resolution enabled run_models → the profile carries the models block.
+    assert ((r["profile"] or {}).get("models") or {}).get("outliers") is not None
+
+
+def test_only_timeseries_rango_temporal_presente(synth_db, tmp_path):
+    """only=['timeseries'] → time range populated (run_series resolved)."""
+    out = str(tmp_path / "only_ts")
+    r = render_automatic_eda(
+        synth_db["db"], synth_db["table"],
+        only_chapters=["timeseries"], out_dir=out, basename="only_ts",
+    )
+    assert r["status"] == "ok", r.get("error")
+    assert "timeseries" in _manifest_chapters(r)
+    md = open(r["aeda_md_path"], encoding="utf-8").read()
+    assert "Columna de fecha" in md
+    assert "signup_date" in md, "la serie no nombra su columna de fecha"
+    # run_series resolved through deps → the profile carries the series analysis.
+    assert (r["profile"] or {}).get("series") is not None, (
+        "only=['timeseries'] debe activar run_series por dependencias"
+    )
+
+
+def test_only_correlacion_scatters_presentes(synth_db, tmp_path):
+    """only=['correlacion'] → matrix + scatters of the strong pairs."""
+    out = str(tmp_path / "only_corr")
+    r = render_automatic_eda(
+        synth_db["db"], synth_db["table"],
+        only_chapters=["correlacion"], out_dir=out, basename="only_corr",
+    )
+    assert r["status"] == "ok", r.get("error")
+    assert _manifest_chapters(r) == {"portada", "correlacion", "glosario"}
+    md = open(r["aeda_md_path"], encoding="utf-8").read()
+    assert "Matriz de asociación" in md
+    assert "Relaciones más fuertes (scatter)" in md, "faltan los scatters"
+    assert "Dispersión de" in md, "no se emitió ninguna figura de dispersión"
+
+
+# --------------------------------------------------------------------------- #
+# 4) NONE WHEN NOT APPLICABLE: on a table with no long text, date or lat/lon,
+#    text_distr / timeseries / geospatial do NOT appear and the EDA does not crash.
+# --------------------------------------------------------------------------- #
+def test_capitulos_opcionales_ausentes_cuando_no_aplican(minimal_db, tmp_path):
+    out = str(tmp_path / "minimal_out")
+    r = render_automatic_eda(
+        minimal_db["db"], minimal_db["table"],
+        profile_level="standard", out_dir=out, basename="minimal",
+    )
+    assert r["status"] == "ok", r.get("error")
+    chapters = _manifest_chapters(r)
+    for absent in ("text_distr", "timeseries", "geospatial"):
+        assert absent not in chapters, (
+            f"capítulo {absent} apareció en una tabla que no lo justifica "
+            f"(presentes: {sorted(chapters)})"
+        )
+    # The document is still valid: portada (cover) + glosario (glossary) + the
+    # chapters that do apply (at least overview and num_distr, as asserted below).
+    assert {"portada", "glosario", "overview", "num_distr"} <= chapters
+
+
+# --------------------------------------------------------------------------- #
+# 5) MULTI-TABLE FOLDER (acceptance): the folder report profiles all N tables
+#    and the relationships chapter detects the FK by containment.
+# --------------------------------------------------------------------------- #
+def test_folder_multitabla_con_fk_detectada(tmp_path):
+    fdir = str(tmp_path / "folder")
+    g = generate_synthetic_eda_folder(fdir, n_rows=300, seed=SEED)
+    assert g["status"] == "ok", g.get("error")
+
+    out = str(tmp_path / "fout")
+    rf = render_automatic_eda_folder(fdir, out_dir=out, basename="folder")
+    assert rf["status"] == "ok", rf.get("error")
+
+    # All 3 tables were profiled.
+    assert rf["n_tables"] == 3, f"esperadas 3 tablas, vistas {rf['n_tables']}"
+
+    # The base manifest carries the inter-table relationships chapter.
+    with open(rf["manifest_path"], encoding="utf-8") as fh:
+        chapters = set((json.load(fh).get("chapters") or {}).keys())
+    assert "relaciones" in chapters, (
+        f"el documento de carpeta no incluye el capítulo de relaciones: {chapters}"
+    )
+
+    # The Markdown names all 3 tables and declares the FK detected by containment.
+    md = open(rf["md_path"], encoding="utf-8").read()
+    for tbl in ("customers", "orders", "reviews"):
+        assert tbl in md, f"la tabla {tbl} no aparece en el informe de carpeta"
+    assert "FK candidatas" in md, "no se declaran las FK candidatas"
+    assert "orders.customer_id" in md and "customers.customer_id" in md, (
+        "la FK orders→customers no se detectó por containment"
+    )
+    assert "reviews.customer_id" in md, "la FK reviews→customers no se detectó"
+
+
+# --------------------------------------------------------------------------- #
+# 6) MD COMPLETENESS (regression): the Markdown carries the appendix with the
+#    FULL association matrix (all pairs, not just the top) and the describe with
+#    skew/kurtosis for every numeric column. Protects an already merged fix.
+# --------------------------------------------------------------------------- #
+def test_md_apendice_matriz_correlacion_completa(standard_run):
+    md = standard_run["md"]
+    assert "Matriz de asociación — todos los pares" in md, (
+        "falta el apéndice con la matriz de asociación completa"
+    )
+    # A LOW-correlation num-num pair that the chapter's top list would NEVER show:
+    # its presence proves the appendix lists ALL pairs, not just the top.
+    assert "income ↔ longitude" in md, (
+        "el apéndice no contiene los pares de baja correlación: no es la matriz "
+        "completa, solo el top-k del capítulo"
+    )
+
+
+def test_md_apendice_describe_con_skew_kurtosis(standard_run):
+    md = standard_run["md"]
+    assert "Estadísticos numéricos completos (describe)" in md, (
+        "falta el apéndice describe completo"
+    )
+    # The appendix describe header carries the skew and kurtosis columns (a
+    # substring unique to that header). Without them the describe is incomplete.
+    assert "| skew | kurtosis |" in md, (
+        "el describe del apéndice no trae las columnas skew/kurtosis"
+    )
+
+
+# --------------------------------------------------------------------------- #
+# 7) ALL 3 OUTPUTS NON-EMPTY: PDF with pages, PPTX with slides, MD with a minimum
+#    number of characters, and all three files on disk. Valid manifest.
+# --------------------------------------------------------------------------- #
+def test_tres_salidas_no_vacias(standard_run):
+    r = standard_run["r"]
+    assert r["pdf_path"] and os.path.exists(r["pdf_path"])
+    assert r["pptx_path"] and os.path.exists(r["pptx_path"])
+    assert r["aeda_md_path"] and os.path.exists(r["aeda_md_path"])
+    assert (r["n_pages"] or 0) > 0, "el PDF no tiene páginas"
+    assert (r["n_slides"] or 0) > 0, "el PPTX no tiene slides"
+    # The full report is large: a generous minimum guards against an empty or
+    # truncated MD without pinning an exact size.
+    assert (r["md_chars"] or 0) > 10000, f"MD demasiado corto: {r['md_chars']} chars"
+    assert r["manifest_path"] and os.path.exists(r["manifest_path"])
+
+
+def test_pdf_texto_extraible_con_contenido(standard_run):
+    """If pdftotext is available, the PDF must carry real text (not just
+    images): the cover names the dataset and its shape. If the tool is
+    missing, the test is skipped (that is not an EDA failure)."""
+    txt = standard_run["pdf_text"]
+    if txt is None:
+        pytest.skip("pdftotext no disponible")
+    assert len(txt) > 5000, "el PDF apenas tiene texto extraíble"
+    assert "Portada" in txt or "synthetic" in txt, (
+        "el texto del PDF no contiene la portada esperada"
+    )
+
+
+# --------------------------------------------------------------------------- #
+# DETERMINISM: two renders of the SAME dataset produce the SAME manifest
+# (same chapters and same n_pages/n_slides per chapter). generated_at differs
+# because it is a timestamp, so the chapters dict is compared, not the file.
+# --------------------------------------------------------------------------- #
+def test_render_es_determinista(synth_db, tmp_path):
+    out1 = str(tmp_path / "det1")
+    out2 = str(tmp_path / "det2")
+    r1 = render_automatic_eda(synth_db["db"], synth_db["table"],
+                              profile_level="standard", out_dir=out1, basename="d1")
+    r2 = render_automatic_eda(synth_db["db"], synth_db["table"],
+                              profile_level="standard", out_dir=out2, basename="d2")
+    assert r1["status"] == "ok" and r2["status"] == "ok"
+    c1 = json.load(open(r1["manifest_path"], encoding="utf-8")).get("chapters")
+    c2 = json.load(open(r2["manifest_path"], encoding="utf-8")).get("chapters")
+    assert c1 == c2, "el manifest no es determinista entre dos renders del mismo dataset"
+
+
+# --------------------------------------------------------------------------- #
+# SLOW (optional, skippable): `full` report with LLM narrative. Requires network /
+# credentials and is NOT deterministic, so it stays off unless explicitly opted in
+# via the EDA_ACCEPT_LLM=1 environment variable. It is skipped with skipif (not a
+# custom marker) to avoid depending on mark registration in the repo config.
+# --------------------------------------------------------------------------- #
+@pytest.mark.skipif(
+    os.environ.get("EDA_ACCEPT_LLM") != "1",
+    reason="full+LLM es lento/no determinista; exporta EDA_ACCEPT_LLM=1 para correrlo",
+)
+def test_full_incluye_capitulo_analisis_llm(synth_db, tmp_path):
+    out = str(tmp_path / "full")
+    r = render_automatic_eda(synth_db["db"], synth_db["table"],
+                             profile_level="full", out_dir=out, basename="full")
+    assert r["status"] == "ok", r.get("error")
+    assert "analisis_llm" in _manifest_chapters(r), (
+        "el preset full debe incluir el capítulo de análisis LLM"
+    )
@@ -0,0 +1,106 @@
+---
+name: profile_bq_table
+kind: pipeline
+lang: py
+domain: pipelines
+purity: impure
+version: "1.1.0"
+signature: "def profile_bq_table(table_fqn: str, sample_frac: float = None, max_rows: int = 0, pseudonymize_cols: list = None, run_models: bool = True, run_series: bool = False, run_llm: bool = False, project_id: str = \"\", report_dir: str = \"reports\", duckdb_path: str = \"\", keep_duckdb: bool = False) -> dict"
+description: "One-shot EDA of a BigQuery table or view: materializes the FULL source by default (all rows; opt-in sampling with sample_frac; optional PII pseudonymization, LOPDGDD/GDPR) into a local DuckDB with load_bq_table_to_duckdb and profiles it end-to-end with profile_table from the eda capability group, emitting the AutomaticEDA report (mobile A5 PDF + 16:9 PPTX), Markdown and a JSON sidecar. It is the BigQuery adapter the eda group was missing, built by composition (BigQuery -> local DuckDB -> profile_table) without duplicating the profiling or rendering logic. It answers a request like 'give me an EDA of this BigQuery table' in a single call, over all rows by default."
+tags: [eda, bigquery, launcher]
+uses_functions:
+  - load_bq_table_to_duckdb_py_datascience
+  - profile_table_py_pipelines
+uses_types: []
+returns: []
+returns_optional: false
+error_type: error_go_core
+imports: []
+tested: false
+tests: []
+test_file_path: ""
+file_path: "python/functions/pipelines/profile_bq_table.py"
+params:
+  - name: table_fqn
+    desc: "FQN of the BigQuery table/view: `project.dataset.table`."
+  - name: sample_frac
+    desc: "None (DEFAULT) = FULL, profiles ALL rows of the source. A float in (0,1) enables opt-in sampling (`WHERE rand() < frac`, ~frac of the total)."
+  - name: max_rows
+    desc: "Optional hard cap on rows (LIMIT). 0 (DEFAULT) = no cap. Combined with sample_frac if both are passed."
+  - name: pseudonymize_cols
+    desc: "PII columns to pseudonymize (hash) before materializing (LOPDGDD/GDPR). Preserves nulls and cardinality."
+  - name: run_models
+    desc: "PCA/KMeans/IsolationForest/normality on numeric columns. Default True (complete AutomaticEDA report)."
+  - name: run_series
+    desc: "Time-series analysis per numeric column. Default False."
+  - name: run_llm
+    desc: "1 LLM call on the aggregated profile (never raw rows). Default False."
+  - name: project_id
+    desc: "GCP billing project. Empty = first segment of the FQN."
+  - name: report_dir
+    desc: "Output directory for the reports. Default 'reports' (gitignored local artifact)."
+  - name: duckdb_path
+    desc: "DuckDB path to use. Empty = self-managed temporary file."
+  - name: keep_duckdb
+    desc: "If True, keeps the materialized DuckDB (for the Jupyter notebook). Default False."
+output: "dict (dict-no-throw style). On success {status:'ok', table_fqn, load:{n_rows_source,n_rows_fetched,sampled,sample_frac,pseudonymized,table}, duckdb_path, report_md_path, report_json_path, aeda_pdf_path, aeda_pptx_path, aeda_manifest_path, profile}. On error {status:'error', error, stage}."
+---
+
+## Example
+
+```python
+from pipelines.profile_bq_table import profile_bq_table
+
+# FULL by default: EDA over ALL rows of the view (3.8M).
+r = profile_bq_table(
+    "autingo-159109.customer_marts.customer_profile",
+    pseudonymize_cols=["document_number", "full_name", "email", "phone", "postal_code", "salesforce_customer_id"],
+    run_models=True,
+)
+print(r["load"]["n_rows_fetched"], "rows profiled, sampled=", r["load"]["sampled"])
+print(r["aeda_pdf_path"]); print(r["aeda_pptx_path"]); print(r["report_md_path"])
+
+# Opt-in sampling: EDA over ~5 % of the rows (huge table / fast iteration).
+r = profile_bq_table(
+    "autingo-159109.customer_marts.customer_profile",
+    sample_frac=0.05,
+    pseudonymize_cols=["document_number", "full_name", "email", "phone", "postal_code", "salesforce_customer_id"],
+)
+```
+
+## When to use it
+
+When someone asks for an EDA of a BigQuery table or view ("give me an EDA of this
+BigQuery table"). It is the BigQuery adapter of the `eda` capability group, built
+by composition: it brings the FULL source (all rows, by default) into a local
+DuckDB and delegates all profiling and rendering to `profile_table`, with no
+native BigQuery adapter and no duplicated EDA logic. Use it as the first step when
+you receive an unknown BigQuery dataset, before modeling or cleaning, or to audit
+the quality of a view already in production. For fast iteration, or for tables
+that do not fit in RAM, pass `sample_frac` (opt-in sampling).
+
+## Gotchas
+
+- Impure: requires BigQuery ADC (Application Default Credentials) to be configured
+  so that `load_bq_table_to_duckdb` can authenticate against the project.
+- FULL by default: `sample_frac=None` profiles ALL rows of the source. A view with
+  millions of rows is pulled entirely into RAM (possibly several GB) before being
+  materialized in DuckDB; the fetch uses the BigQuery Storage Read API (Arrow) when
+  available, which is much faster than REST. To bound cost/memory, pass
+  `sample_frac` in (0,1) (opt-in sampling) or `max_rows` (hard cap). If resource
+  limits prevent loading everything, say so explicitly, stating how many rows were loaded.
+- Pseudonymize PII with `pseudonymize_cols` to comply with LOPDGDD/GDPR BEFORE
+  writing to disk: names, DNI/NIE, email, phone, address, customer IDs, etc. The
+  values are hashed, preserving nulls and cardinality. Without pseudonymization, the
+  materialized data (DuckDB + reports) contains real personal data [POL-MMNSEG-001-1.0].
+- The temporary DuckDB is deleted on exit unless `keep_duckdb=True` (pass it to
+  keep exploring the data from a Jupyter notebook). If you pass an explicit
+  `duckdb_path`, the pipeline never deletes the file at that path; the returned
+  `duckdb_path` is still `None` unless `keep_duckdb=True`.
+- Writes reports to `report_dir` (default 'reports', a gitignored local artifact):
+  Markdown + JSON sidecar + mobile A5 PDF + 16:9 PPTX of the AutomaticEDA report.
+- `run_llm=True` spends tokens (haiku) but only sends the aggregated profile, never
+  raw rows or personal data.
+
+## Capability growth log
+
+- v1.1.0 (2026-07-01): FULL is now the pipeline DEFAULT. `max_rows=300000, sample=True` is replaced by `sample_frac=None` (None = profile all rows) + `max_rows=0` (optional hard cap). Sampling is an explicit opt-in (`sample_frac`). This matches the user's standard preference: EDAs run over the full data unless stated otherwise. Inherits the accelerated fetch (Arrow/bqstorage) from `load_bq_table_to_duckdb` v1.1.0.
@@ -0,0 +1,138 @@
+"""profile_bq_table: one-shot EDA of a BigQuery table/view with the `eda` group.
+
+Impure pipeline: materializes a BigQuery table or view (by default in FULL, i.e.
+all rows, or a sample if `sample_frac` is passed, with optional PII
+pseudonymization, LOPDGDD/GDPR) into a local DuckDB with `load_bq_table_to_duckdb`,
+and profiles it end-to-end with `profile_table` from the `eda` capability group,
+emitting the AutomaticEDA report (mobile A5 PDF + 16:9 PPTX), Markdown and a JSON
+sidecar. It is the BigQuery adapter the `eda` group was missing, built by
+composition (BigQuery -> local DuckDB -> profile_table) without duplicating the
+profiling or rendering logic.
+
+Default mode = FULL: `sample_frac=None` profiles ALL rows of the source (the
+user's standard preference: EDAs run over the full data unless stated otherwise).
+Sampling is an explicit opt-in: `sample_frac=0.05` profiles ~5 % of the rows;
+`max_rows` is an optional hard cap (0 = no cap).
+
+Composed registry functions (their logic is NOT reimplemented):
+  - load_bq_table_to_duckdb : brings the BigQuery table/view into a local DuckDB
+                              (full by default, or a sample if sample_frac is set).
+  - profile_table           : one-shot orchestrator of the `eda` group that profiles
+                              the materialized DuckDB and emits the AutomaticEDA report.
+
+dict-no-throw style of the `eda` group: never raises; returns {status:'error', ...}.
+"""
+
+import os
+import tempfile
+
+from datascience import load_bq_table_to_duckdb
+from pipelines.profile_table import profile_table
+
+
+def profile_bq_table(
+    table_fqn: str,
+    sample_frac: float = None,
+    max_rows: int = 0,
+    pseudonymize_cols: list = None,
+    run_models: bool = True,
+    run_series: bool = False,
+    run_llm: bool = False,
+    project_id: str = "",
+    report_dir: str = "reports",
+    duckdb_path: str = "",
+    keep_duckdb: bool = False,
+) -> dict:
+    """One-shot EDA of a BigQuery table/view.
+
+    By default it profiles ALL rows of the source (`sample_frac=None`, FULL mode).
+    It materializes the source (with optional PII pseudonymization) into a local
+    DuckDB and profiles it with `profile_table` from the `eda` group, emitting the
+    AutomaticEDA report (mobile A5 PDF + 16:9 PPTX) + Markdown + JSON sidecar.
+
+    Args:
+        table_fqn: FQN of the BigQuery table/view ("project.dataset.table").
+        sample_frac: None (default) = FULL, profiles all rows. A float in (0,1)
+            enables opt-in sampling (`WHERE rand() < frac`, ~frac of the total).
+        max_rows: Optional hard cap on rows (LIMIT). 0 (default) = no cap.
+        pseudonymize_cols: PII columns to pseudonymize (hash) before materializing.
+        run_models: Cheap models (PCA/KMeans/IsolationForest/normality).
+        run_series: Time-series analysis per numeric column.
+        run_llm: 1 LLM call on the aggregated profile (never raw rows).
+        project_id: GCP billing project. Empty = first segment of the FQN.
+        report_dir: Output directory for the reports.
+        duckdb_path: DuckDB path to use. Empty = self-managed temporary file.
+        keep_duckdb: If True, keeps the materialized DuckDB.
+
+    Returns:
+        dict in dict-no-throw style with the pipeline result (see output in the .md).
+    """
+    tmp_created = False
+    try:
+        # Temporary DuckDB if no path is given.
+        if not duckdb_path:
+            fd, duckdb_path = tempfile.mkstemp(prefix="eda_bq_", suffix=".duckdb")
+            os.close(fd)
+            os.remove(duckdb_path)  # let DuckDB create the file from scratch
+            tmp_created = True
+
+        load = load_bq_table_to_duckdb(
+            table_fqn,
+            duckdb_path,
+            sample_frac=sample_frac,
+            max_rows=max_rows,
+            project_id=project_id,
+            pseudonymize_cols=pseudonymize_cols,
+        )
+        if load.get("status") != "ok":
+            return {
+                "status": "error",
+                "error": load.get("error", "load fallo"),
+                "stage": "load",
+            }
+
+        prof = profile_table(
+            duckdb_path,
+            load["table"],
+            backend="duckdb",
+            run_models=run_models,
+            run_series=run_series,
+            run_llm=run_llm,
+            emit_automatic=True,   # mobile A5 PDF + 16:9 PPTX
+            emit_pdf=False,
+            write_report=True,     # Markdown + JSON sidecar
+            report_dir=report_dir,
+        )
+        if prof.get("status") != "ok":
+            return {
+                "status": "error",
+                "error": prof.get("error", "profile fallo"),
+                "stage": "profile",
+                "load": load,
+            }
+
+        return {
+            "status": "ok",
+            "table_fqn": table_fqn,
+            "load": {
+                k: load[k]
+                for k in ("n_rows_source", "n_rows_fetched", "sampled", "sample_frac", "pseudonymized", "table")
+                if k in load
+            },
+            "duckdb_path": duckdb_path if keep_duckdb else None,
+            "report_md_path": prof.get("report_md_path"),
+            "report_json_path": prof.get("report_json_path"),
+            "aeda_pdf_path": prof.get("aeda_pdf_path"),
+            "aeda_pptx_path": prof.get("aeda_pptx_path"),
+            "aeda_manifest_path": prof.get("aeda_manifest_path"),
+            "profile": prof.get("profile"),
+        }
+    except Exception as e:  # noqa: BLE001
+        return {"status": "error", "error": str(e), "stage": "exception"}
+    finally:
+        # Clean up the temporary DuckDB unless asked to keep it.
+        if tmp_created and not keep_duckdb and duckdb_path and os.path.exists(duckdb_path):
+            try:
+                os.remove(duckdb_path)
+            except OSError:
+                pass
@@ -4,8 +4,8 @@ kind: pipeline
 lang: py
 domain: pipelines
 purity: impure
-version: "1.1.0"
-signature: "def render_automatic_eda(db_path: str, table: str, backend: str = \"duckdb\", sample: int = None, run_models: bool = None, run_series: bool = None, run_llm: bool = None, profile_level: str = \"standard\", out_dir: str = \"reports\", basename: str = None, ctx_extra: dict = None) -> dict"
+version: "1.2.0"
+signature: "def render_automatic_eda(db_path: str, table: str, backend: str = \"duckdb\", sample: int = None, run_models: bool = None, run_series: bool = None, run_llm: bool = None, profile_level: str = \"standard\", out_dir: str = \"reports\", basename: str = None, ctx_extra: dict = None, emit_md: bool = True, only_chapters: list = None) -> dict"
 description: "COMPLETE one-shot AutomaticEDA report of a DuckDB/PostgreSQL table: profiles with profile_table, builds the ctx with the raw data (build_eda_render_ctx: raw_numeric for models/geo, timeseries_raw for series, geo_points for the map, db_path/table for the push-down aggregation) and emits PDF (mobile A5) AND PPTX (16:9) of the same chapter-based document, with the 11 chapters genuinely POPULATED (clusters painted over the PCA, temporal evolution, geographic map and aggregation tables), not degraded. The profile_level parameter is a CPU/LLM consumption preset (lite/standard/full) that maps to the run_models/run_series/run_llm/sample flags; an explicit flag always takes precedence over the preset. lite=low consumption (no LLM, no series, models limited to PCA+normality without KMeans/IsolationForest, reduced sample); standard=historical behavior; full=standard+LLM narrative. Returns the PDF/PPTX paths and the per-chapter version manifest."
 tags: [eda, duckdb, postgres, profiling, pipeline, dataops, report, pdf, pptx]
 uses_functions:
@@ -46,6 +46,10 @@ params:
    desc: "Base name of the files, without extension. Default 'aeda_<table>_<timestamp>'."
  - name: ctx_extra
    desc: "Optional dict with extra presentation/context keys that are merged into the ctx (dataset_name, description, source_origin, ...); they do not overwrite the data keys computed by build_eda_render_ctx."
+  - name: emit_md
+    desc: "In addition to the PDF and the PPTX, emits a self-contained Markdown of the same chapter-based document (text + markdown tables, no binaries) to paste into an LLM. Default True. The path is returned in aeda_md_path."
+  - name: only_chapters
+    desc: "Optional list of chapter ids to render (a subset of CHAPTER_ORDER) to iterate on/test a single chapter without generating the whole document. Default None => COMPLETE document (backward compatible). When a list is passed: (1) it is VALIDATED against CHAPTER_ORDER, an unknown id or an empty list returns a clear error listing the valid ones; (2) the compute dependencies of those chapters are RESOLVED (automatic_eda.chapter_deps), enabling the flags they need (run_models/run_series/run_llm) even if the caller did not request them and building ONLY the ctx pieces they read, so the standalone chapter ALWAYS arrives populated (e.g. ['outliers'] enables run_models and keeps raw_numeric -> complete Isolation Forest) without wasting CPU/LLM on what no requested chapter uses; (3) the document and its manifest contain ONLY those chapters PLUS the cover (first) and the glossary (last, when there are clickable terms). An explicit caller flag takes precedence over dependency resolution."
 output: "dict {status:'ok', pdf_path:str, pptx_path:str, manifest_path:str|None, n_pages:int, n_slides:int, pdf_note:str, pptx_note:str, profile:<TableProfile>} or {status:'error', error:str} (dict-no-throw)."
 ---

@@ -69,6 +73,21 @@ r = render_automatic_eda("/tmp/ventas.duckdb", "ventas", profile_level="full")
 # Precedence: the explicit flag ALWAYS takes precedence over the preset. lite but with LLM:
 r = render_automatic_eda("/tmp/ventas.duckdb", "ventas",
                         profile_level="lite", run_llm=True)  # the LLM DOES run
+
+# STANDALONE chapter: iterate on/test one chapter without generating the whole
+# document. Dependency resolution enables the computation the chapter needs even
+# if it is not passed explicitly. Requesting only 'outliers' enables run_models
+# and keeps raw_numeric -> the Isolation Forest block comes out COMPLETE.
+# Document = portada (cover) + outliers + glosario (glossary).
+r = render_automatic_eda("/tmp/ventas.duckdb", "ventas", only_chapters=["outliers"])
+
+# Several standalone chapters at once (their dependencies are merged):
+r = render_automatic_eda("/tmp/ventas.duckdb", "ventas",
+                         only_chapters=["correlacion", "missingness"])
+
+# unknown id -> clear error listing the valid ones (dict-no-throw, does not raise):
+r = render_automatic_eda("/tmp/ventas.duckdb", "ventas", only_chapters=["nope"])
+# {'status': 'error', 'error': 'only_chapters con ids desconocidos: nope. Capitulos validos: portada, overview, ...'}
 ```

 ## When to use it
@@ -86,6 +105,16 @@ Para un EDA **barato/rapido** (CI, vistazo previo, maquina sin GPU o sin red) us
 time series and the LLM. For the **maximum**, with narrative interpretation per
 chapter, use `profile_level="full"`. The default `"standard"` keeps the previous behavior.

+When you are **iterating on or testing ONE specific chapter** (tuning the outliers
+render, checking the geospatial map, debugging the aggregation) use
+`only_chapters=[...]`: it generates the document with only those chapters (+ cover
+and glossary), but **resolves their compute dependencies** so that the standalone
+chapter never comes out degraded: requesting `['outliers']` enables run_models and
+keeps `raw_numeric` even if you do not pass them, while wasting no CPU/LLM on what
+no requested chapter needs (requesting `['geospatial']` runs no models). It is much
+faster than rendering the whole report on every iteration. The central dependency
+map lives in `automatic_eda/chapter_deps.py` (source of truth).
+
 ## Gotchas

 - Impure: WRITES the PDF, the PPTX and `automatic_eda_manifest.json` to `out_dir`.
@@ -111,9 +140,29 @@ temporal y el LLM. Para el **maximo** con interpretacion narrativa por capitulo,
 - The raw ctx data is sampled with `sample` (LIMIT); the whole table is not pulled
  into RAM. With huge tables, raise `sample` if you want more representativeness
  (cost: more memory).
+- **`only_chapters` and the glossary**: the glossary (last chapter) only appears if
+  some body chapter registered clickable terms. A standalone chapter that registers
+  no terms (e.g. `timeseries`, `geospatial`) comes out as cover + that chapter,
+  without a glossary, because there is nothing to link. This is correct, not a bug.
+- **`only_chapters` with `profile_level="lite"`**: for standalone chapters the preset
+  only governs `sample`; the models do NOT take the "lite" path (which would prune
+  `ctx['raw_numeric']` and leave outliers without its live multivariate analysis).
+  For standalone chapters, dependency resolution is in charge, not the model-cost
+  preset.

 ## Capability growth log

+- v1.2.0 (2026-06-30): adds the `only_chapters` parameter: renders a SUBSET of
+  chapters (to iterate on/test a single one), resolving their compute
+  dependencies via `automatic_eda/chapter_deps.py` (central CHAPTER_DEPS map). It
+  enables the cost flags the chapter needs (run_models/run_series/run_llm) even if
+  the caller does not pass them and builds only the ctx pieces the chapter reads, so
+  the standalone chapter ALWAYS arrives populated (golden: ['outliers'] -> complete
+  Isolation Forest) without spending on what it does not use. The selection
+  travels to build_document through the reserved key `ctx['_only_chapters']` (the
+  renderers do not change). Validates ids (clear dict-no-throw error). Additive,
+  backward-compatible change: `only_chapters=None` produces the complete document,
+  identical to v1.1.0.
 - v1.1.0 (2026-06-30): adds the `profile_level` parameter (lite/standard/full), a
  CPU/LLM consumption preset that maps to the run_models/run_series/run_llm/sample
  flags. lite limits the models to PCA+normality (wired to run_eda_models with
@@ -99,6 +99,7 @@ def render_automatic_eda(
    basename: str = None,
    ctx_extra: dict = None,
    emit_md: bool = True,
+    only_chapters: list = None,
 ) -> dict:
    """Profiles a table and emits the full AutomaticEDA report (PDF + PPTX).

@@ -150,6 +151,29 @@ def render_automatic_eda(
            SAME per-chapter document (plain text + markdown tables, no
            binaries), meant to be pasted into an LLM. Default True. The path is
            returned under the ``aeda_md_path`` key. Does not alter the other outputs.
+        only_chapters: optional list of chapter ids to render (a SUBSET of
+            CHAPTER_ORDER) to iterate on or test a specific chapter without
+            generating the whole document. Default None => FULL document,
+            identical to today's (backward compatible). When a list is passed:
+
+            - It is VALIDATED against CHAPTER_ORDER; an unknown id returns a
+              clear error listing the valid ones (dict-no-throw, never raises).
+              An empty list ``[]`` also returns an error (pass at least one
+              chapter, or None).
+            - The compute dependencies of those chapters are RESOLVED
+              (``automatic_eda.chapter_deps``): the cost flags they need
+              (run_models / run_series / run_llm) are enabled EVEN IF the caller
+              did not request them, and ONLY the ``ctx`` pieces those chapters
+              read are built. A single chapter therefore ALWAYS arrives
+              populated (e.g. ``only_chapters=['outliers']`` enables run_models
+              and keeps ``ctx['raw_numeric']`` so the IsolationForest block
+              renders in full), while no CPU/LLM is wasted on anything that no
+              requested chapter uses (requesting only ``geospatial`` runs no
+              models).
+            - The document (PDF/PPTX/MD) and its manifest contain ONLY those
+              chapters, PLUS the cover (first) and the glossary (last). Per the
+              usage notes, the glossary only appears when a requested chapter
+              registers clickable terms; otherwise there is nothing to link to.
+            - An explicit caller flag (run_models/run_series/run_llm != None)
+              ALWAYS takes precedence over what the dependencies resolve.

    Returns:
        dict (never raises). On success::
@@ -169,10 +193,55 @@ def render_automatic_eda(
        # "standard" (historical behavior), without raising.
        preset = _PROFILE_PRESETS.get(profile_level, _PROFILE_PRESETS["standard"])
        sample = preset["sample"] if sample is None else sample
+        model_opts = preset["model_opts"]
+
+        # 0.bis) "Single chapters" mode: validate the selection and RESOLVE its
+        # compute dependencies. This is what guarantees that a requested chapter
+        # arrives complete (it enables what it needs) without wasting work on
+        # what it does not. When only_chapters is None the historical (preset)
+        # path is kept.
+        if only_chapters is not None:
+            from datascience.automatic_eda import CHAPTER_ORDER
+            from datascience.automatic_eda.chapter_deps import (
+                needs_render_ctx,
+                resolve_ctx_data_keys,
+                resolve_requirements,
+                validate_chapter_ids,
+            )
+
+            if not isinstance(only_chapters, (list, tuple)):
+                return {"status": "error",
+                        "error": "only_chapters debe ser una lista de ids de "
+                                 "capítulo o None (documento completo)."}
+            only_chapters = list(only_chapters)
+            if not only_chapters:
+                return {"status": "error",
+                        "error": "only_chapters=[] está vacío. Pasa al menos un "
+                                 "capítulo, o None para el documento completo. "
+                                 "Capítulos válidos: " + ", ".join(CHAPTER_ORDER)}
+            checked = validate_chapter_ids(only_chapters, CHAPTER_ORDER)
+            if checked["unknown"]:
+                return {"status": "error",
+                        "error": "only_chapters con ids desconocidos: "
+                                 + ", ".join(checked["unknown"])
+                                 + ". Capítulos válidos: "
+                                 + ", ".join(CHAPTER_ORDER)}
+            only_chapters = checked["valid"]
+
+            # Dependencies set the DEFAULT of each cost flag (efficiency: whatever
+            # no requested chapter needs stays False); an explicit caller flag
+            # (!= None) still takes precedence.
+            dep_flags = resolve_requirements(only_chapters)["profile_flags"]
+            run_models = ("run_models" in dep_flags) if run_models is None else run_models
+            run_series = ("run_series" in dep_flags) if run_series is None else run_series
+            run_llm = ("run_llm" in dep_flags) if run_llm is None else run_llm
+            # Single-chapter mode does not use the "cheap models" (lite) path,
+            # which prunes ctx['raw_numeric']: a chapter such as outliers needs it
+            # for its live multivariate block. The preset only governs `sample`.
+            model_opts = None
+        else:
            run_models = preset["run_models"] if run_models is None else run_models
            run_series = preset["run_series"] if run_series is None else run_series
            run_llm = preset["run_llm"] if run_llm is None else run_llm
-        model_opts = preset["model_opts"]

        # On the "cheap models" (lite) path profile_table does NOT run the
        # models: this pipeline runs them with run_eda_models and the granularity
@@ -217,10 +286,25 @@ def render_automatic_eda(
        if ctx_extra:
            base_ctx.update(ctx_extra)

+        # In single-chapter mode, if NO requested chapter needs raw ctx data,
+        # build_eda_render_ctx is skipped entirely (real I/O savings): only
+        # presentation + db_path/table are kept. If one does need it, the ctx is
+        # built and then the data pieces that no requested chapter uses are
+        # PRUNED (db_path/table are never pruned).
+        if only_chapters is not None and not needs_render_ctx(only_chapters):
+            ctx = dict(base_ctx)
+            ctx["db_path"] = db_path
+            ctx["table"] = table
+        else:
            ctx = build_eda_render_ctx(
                db_path, table, prof, backend=backend, sample=sample,
                base_ctx=base_ctx,
            )
+            if only_chapters is not None and isinstance(ctx, dict):
+                keep = resolve_ctx_data_keys(only_chapters)
+                for k in ("head_rows", "raw_numeric", "timeseries_raw", "geo_points"):
+                    if k not in keep:
+                        ctx.pop(k, None)

        # 2.5) Lite path: cheap models (PCA + normality, no KMeans or
        # IsolationForest). profile_table did not run the models; they are run here
@@ -245,6 +329,13 @@ def render_automatic_eda(
                ctx.pop("raw_numeric", None)

        # 3) Render both formats from the SAME per-chapter document.
+        # In single-chapter mode, the selection travels to build_document via a
+        # reserved ctx key (the renderers call build_document without passing
+        # `only`): build_document filters the body to those chapters and always
+        # adds cover (first) + glossary (last). build_document consumes the key
+        # and removes it, so it never reaches the chapters.
+        if only_chapters is not None and isinstance(ctx, dict):
+            ctx["_only_chapters"] = list(only_chapters)
        os.makedirs(out_dir, exist_ok=True)
        ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
        base = basename or f"aeda_{table}_{ts}"
@@ -283,6 +374,7 @@ def render_automatic_eda(
            "pdf_note": rpdf.get("note"),
            "pptx_note": rpptx.get("note"),
            "md_note": rmd.get("note"),
+            "only_chapters": only_chapters,
            "profile": prof,
        }
    except Exception as e:  # noqa: BLE001 - dict-no-throw: degrade, never raise.
@@ -0,0 +1,235 @@
+"""Tests for the `only_chapters` mode of the render_automatic_eda pipeline.
+
+Covers the "single chapters with dependency resolution" task:
+
+  - Golden (real DuckDB): requesting ONLY one chapter produces a document with
+    just cover + that chapter + glossary, and the chapter arrives COMPLETE
+    because dependency resolution enabled the compute it needs even though the
+    caller did not request it (outliers → run_models + raw_numeric → populated
+    IsolationForest; timeseries → run_series; correlacion → raw_numeric).
+  - Efficiency: requesting a chapter that does NOT need expensive flags
+    (geospatial) does not enable them, and a purely aggregated chapter
+    (num_distr) does not even build the raw-data ctx.
+  - Edge: unknown id / empty list / non-list return a clear error without
+    raising; only_chapters=None keeps the historical behavior.
+"""
+
+import json
+import os
+import random
+import sys
+from datetime import date, timedelta
+
+_HERE = os.path.dirname(os.path.abspath(__file__))
+_FUNCTIONS = os.path.abspath(os.path.join(_HERE, "..", ".."))  # python/functions
+if _FUNCTIONS not in sys.path:
+    sys.path.insert(0, _FUNCTIONS)
+
+import duckdb  # noqa: E402
+
+from pipelines.render_automatic_eda import render_automatic_eda  # noqa: E402
+
+
+def _make_db_models(path):
+    """DB with a date + 3 continuous numeric columns in 3 Gaussian clusters.
+
+    Guarantees material for outliers/models (>=2 numeric columns → IsolationForest),
+    timeseries (DATE column) and correlacion (numeric columns). Same shape as the
+    fixture of the base pipeline test.
+    """
+    con = duckdb.connect(path)
+    con.execute("CREATE TABLE pts (d DATE, grp VARCHAR, x1 DOUBLE, x2 DOUBLE, x3 DOUBLE)")
+    random.seed(42)
+    centers = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0), (20.0, 5.0, 15.0)]
+    d0 = date(2024, 1, 1)
+    rows = []
+    for i in range(150):
+        cx, cy, cz = centers[i % 3]
+        rows.append((
+            d0 + timedelta(days=i), f"g{i % 3}",
+            round(cx + random.gauss(0, 1.0), 4),
+            round(cy + random.gauss(0, 1.0), 4),
+            round(cz + random.gauss(0, 1.0), 4),
+        ))
+    con.executemany("INSERT INTO pts VALUES (?,?,?,?,?)", rows)
+    con.close()
+
+
+def _manifest_chapters(result):
+    with open(result["manifest_path"], encoding="utf-8") as fh:
+        return set((json.load(fh).get("chapters") or {}).keys())
+
+
+# --------------------------------------------------------------------------- #
+# GOLDEN: standalone outliers, IsolationForest populated by dependency resolution.
+# --------------------------------------------------------------------------- #
+def test_only_outliers_isolation_forest_populated_without_explicit_run_models(tmp_path):
+    """The heart of the task: requesting ONLY 'outliers' without an explicit
+    run_models enables run_models through dependencies and keeps
+    ctx['raw_numeric'], so the multivariate block (Isolation Forest) renders
+    with data instead of degraded."""
+    db = str(tmp_path / "pts.duckdb")
+    _make_db_models(db)
+    out = str(tmp_path / "out")
+
+    # NB: run_models is not passed; dependency resolution must enable it.
+    r = render_automatic_eda(db, "pts", only_chapters=["outliers"],
+                             out_dir=out, basename="only_outliers")
+    assert r["status"] == "ok", r.get("error")
+    assert r["only_chapters"] == ["outliers"]
+
+    # Document = cover + outliers + glossary, nothing else.
+    assert _manifest_chapters(r) == {"portada", "outliers", "glosario"}
+
+    # The multivariate block came out POPULATED (not the degradation note). This
+    # is checked in the Markdown (same per-chapter document, reliable plain text).
+    with open(r["aeda_md_path"], encoding="utf-8") as fh:
+        md = fh.read()
+    assert "Filas atípicas (multivariante)" in md
+    assert "Filas analizadas" in md, "the Isolation Forest block lacks its populated table"
+    assert "No se pudo analizar la anomalía multivariante" not in md, \
+        "the multivariate block came out degraded despite resolving the deps"
+
+    # Dependency resolution enabled run_models → the profile carries the models block.
+    assert ((r["profile"] or {}).get("models") or {}).get("outliers") is not None
+
+
+# --------------------------------------------------------------------------- #
+# GOLDEN: standalone timeseries enables run_series.
+# --------------------------------------------------------------------------- #
+def test_only_timeseries_activates_run_series(tmp_path):
+    db = str(tmp_path / "pts.duckdb")
+    _make_db_models(db)
+    out = str(tmp_path / "out")
+
+    r = render_automatic_eda(db, "pts", only_chapters=["timeseries"],
+                             out_dir=out, basename="only_ts")
+    assert r["status"] == "ok", r.get("error")
+    assert "timeseries" in _manifest_chapters(r)
+    assert "modelos" not in _manifest_chapters(r)
+    # run_series resolved by deps → the profile carries the series analysis.
+    assert (r["profile"] or {}).get("series") is not None, \
+        "only_chapters=['timeseries'] must enable run_series"
+
+
+# --------------------------------------------------------------------------- #
+# GOLDEN: standalone correlacion builds raw_numeric (without enabling models).
+# --------------------------------------------------------------------------- #
+def test_only_correlacion_builds_raw_numeric_without_models(tmp_path):
+    db = str(tmp_path / "pts.duckdb")
+    _make_db_models(db)
+    out = str(tmp_path / "out")
+
+    r = render_automatic_eda(db, "pts", only_chapters=["correlacion"],
+                             out_dir=out, basename="only_corr")
+    assert r["status"] == "ok", r.get("error")
+    assert _manifest_chapters(r) == {"portada", "correlacion", "glosario"}
+    # Efficiency: correlacion does not need the models → they were not run.
+    assert ((r["profile"] or {}).get("models") or {}).get("outliers") is None
+    assert (r["profile"] or {}).get("series") is None
+
+
+# --------------------------------------------------------------------------- #
+# Efficiency and precedence, via stub (no DuckDB).
+# --------------------------------------------------------------------------- #
+def _patch(monkeypatch, cap):
+    import pipelines.render_automatic_eda as mod
+
+    def fake_pt(db, t, **kw):
+        cap["run_models"] = kw.get("run_models")
+        cap["run_series"] = kw.get("run_series")
+        cap["run_llm"] = kw.get("run_llm")
+        return {"status": "ok", "profile": {"columns": []}}
+
+    def fake_ctx(db, t, prof, **kw):
+        cap["ctx_called"] = True
+        return {"db_path": db, "table": t}
+
+    cap["ctx_called"] = False
+    monkeypatch.setattr(mod, "profile_table", fake_pt)
+    monkeypatch.setattr(mod, "build_eda_render_ctx", fake_ctx)
+    monkeypatch.setattr(mod, "render_automatic_eda_pdf",
+                        lambda *a, **k: {"path": "x.pdf", "n_pages": 1,
+                                         "manifest_path": "m.json"})
+    monkeypatch.setattr(mod, "render_automatic_eda_pptx",
+                        lambda *a, **k: {"path": "x.pptx", "n_slides": 1})
+    monkeypatch.setattr(mod, "render_automatic_eda_markdown",
+                        lambda *a, **k: {"path": "x.md", "n_chars": 1})
+
+
+def test_only_geospatial_does_not_activate_cost_flags(monkeypatch):
+    """Efficiency: requesting only geospatial does NOT run models/series/LLM."""
+    cap = {}
+    _patch(monkeypatch, cap)
+    render_automatic_eda("db", "t", only_chapters=["geospatial"])
+    assert cap["run_models"] is False
+    assert cap["run_series"] is False
+    assert cap["run_llm"] is False
+
+
+def test_only_outliers_activates_run_models_via_deps(monkeypatch):
+    cap = {}
+    _patch(monkeypatch, cap)
+    render_automatic_eda("db", "t", only_chapters=["outliers"])
+    assert cap["run_models"] is True
+    assert cap["run_series"] is False
+
+
+def test_explicit_flag_overrides_dependency_resolution(monkeypatch):
+    """An explicit run_models=False wins, even though outliers would request it via deps."""
+    cap = {}
+    _patch(monkeypatch, cap)
+    render_automatic_eda("db", "t", only_chapters=["outliers"], run_models=False)
+    assert cap["run_models"] is False
+
+
+def test_purely_aggregated_chapter_skips_render_ctx(monkeypatch):
+    """num_distr only reads the profile → build_eda_render_ctx is not called."""
+    cap = {}
+    _patch(monkeypatch, cap)
+    render_automatic_eda("db", "t", only_chapters=["num_distr"])
+    assert cap["ctx_called"] is False, \
+        "num_distr does not need raw data: the ctx must not be built"
+
+
+def test_chapter_that_needs_ctx_builds_it(monkeypatch):
+    cap = {}
+    _patch(monkeypatch, cap)
+    render_automatic_eda("db", "t", only_chapters=["outliers"])
+    assert cap["ctx_called"] is True
+
+
+# --------------------------------------------------------------------------- #
+# EDGE: clear errors without raising.
+# --------------------------------------------------------------------------- #
+def test_unknown_chapter_id_returns_clear_error(tmp_path):
+    r = render_automatic_eda(str(tmp_path / "x.duckdb"), "t",
+                             only_chapters=["no_existe"])
+    assert r["status"] == "error"
+    assert "no_existe" in r["error"]
+    assert "Capítulos válidos" in r["error"]
+    # A known valid id appears in the list.
+    assert "outliers" in r["error"]
+
+
+def test_empty_only_list_returns_error(tmp_path):
+    r = render_automatic_eda(str(tmp_path / "x.duckdb"), "t", only_chapters=[])
+    assert r["status"] == "error"
+    assert "vac" in r["error"].lower()
+
+
+def test_only_chapters_not_a_list_returns_error(tmp_path):
+    r = render_automatic_eda(str(tmp_path / "x.duckdb"), "t",
+                             only_chapters="outliers")
+    assert r["status"] == "error"
+
+
+def test_only_none_keeps_full_document(tmp_path):
+    """Backward compat: only_chapters=None generates the full document."""
+    db = str(tmp_path / "pts.duckdb")
+    _make_db_models(db)
+    out = str(tmp_path / "out")
+    r = render_automatic_eda(db, "pts", out_dir=out, basename="full")
+    assert r["status"] == "ok", r.get("error")
+    chapters = _manifest_chapters(r)
+    # Full document: many more chapters than cover/glossary.
+    assert {"portada", "glosario", "overview", "correlacion"} <= chapters
+    assert len(chapters) > 4
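
The edge-case tests above pin down the contract of the validation step: an unknown id, an empty list and a non-list all yield an error dict rather than an exception. A minimal sketch of that contract follows. It assumes a hypothetical `check_only_chapters` helper and a shortened `CHAPTER_ORDER_SKETCH`; neither is the project's code (the real check is `validate_chapter_ids` in `chapter_deps`, whose internals are not shown in this diff), and returning the valid ids in document order without duplicates is also an assumption.

```python
# Shortened, hypothetical chapter order; the real CHAPTER_ORDER is longer.
CHAPTER_ORDER_SKETCH = ["portada", "overview", "correlacion", "outliers",
                        "timeseries", "geospatial", "glosario"]


def check_only_chapters(only_chapters, order=CHAPTER_ORDER_SKETCH):
    """Validate a chapter selection without raising (dict-no-throw).

    Returns {"status": "ok", "valid": [...]} or {"status": "error", "error": str}.
    """
    if not isinstance(only_chapters, (list, tuple)):
        return {"status": "error",
                "error": "only_chapters must be a list of chapter ids or None."}
    if not only_chapters:
        return {"status": "error",
                "error": "only_chapters is empty. Valid chapters: " + ", ".join(order)}
    unknown = [c for c in only_chapters if c not in order]
    if unknown:
        return {"status": "error",
                "error": "unknown chapter ids: " + ", ".join(map(str, unknown))
                         + ". Valid chapters: " + ", ".join(order)}
    # Assumption: keep document order and drop duplicates.
    valid = [c for c in order if c in only_chapters]
    return {"status": "ok", "valid": valid}


print(check_only_chapters(["timeseries", "outliers"]))
print(check_only_chapters(["no_existe"]))
```

Listing the valid ids inside the error message mirrors what `test_unknown_chapter_id_returns_clear_error` expects: the caller can fix the call from the message alone.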
@@ -0,0 +1,7 @@
+import google.auth
+from google.cloud import bigquery
+_creds, _ = google.auth.default(scopes=['https://www.googleapis.com/auth/bigquery'])
+_creds = _creds.with_quota_project(None)
+client = bigquery.Client(project='autingo-159109', location='europe-west1', credentials=_creds)
+def q(sql):
+    return client.query(sql).result().to_dataframe()
@@ -0,0 +1 @@
+{"c1": 12363, "c2": 12364, "c3": 12365}
@@ -0,0 +1,61 @@
+ensena,year,mes,diego,bq_neto,match
+Aurgi,2023,feb,80.52,,
+Aurgi,2023,mar,89.94,,
+Aurgi,2023,abr,76.87,,
+Aurgi,2023,may,87.95,,
+Aurgi,2023,jun,97.84,,
+Aurgi,2023,jul,138.24,,
+Aurgi,2023,ago,89.7,,
+Aurgi,2023,sep,61.53,,
+Aurgi,2023,oct,56.48,,
+Aurgi,2023,nov,73.2,,
+Aurgi,2023,dic,78.81,,
+Aurgi,2024,ene,75.34,75.35,100.0
+Aurgi,2024,feb,60.21,60.21,100.0
+Aurgi,2024,mar,70.62,71.26,99.1
+Aurgi,2024,abr,70.46,70.46,100.0
+Aurgi,2024,may,84.76,84.76,100.0
+Aurgi,2024,jun,108.7,108.7,100.0
+Aurgi,2024,jul,141.2,141.2,100.0
+Aurgi,2024,ago,100.18,100.18,100.0
+Aurgi,2024,sep,67.91,67.91,100.0
+Aurgi,2024,oct,81.31,81.31,100.0
+Aurgi,2024,nov,71.57,71.57,100.0
+Aurgi,2024,dic,74.33,74.33,100.0
+Aurgi,2025,ene,86.28,86.28,100.0
+Aurgi,2025,feb,53.05,53.05,100.0
+Aurgi,2025,mar,86.75,86.75,100.0
+Aurgi,2025,abr,83.89,83.89,100.0
+Aurgi,2025,may,84.24,84.24,100.0
+Aurgi,2025,jun,134.46,134.46,100.0
+Aurgi,2025,jul,101.17,174.32,58.0
+MT,2023,feb,30.19,,
+MT,2023,mar,41.89,,
+MT,2023,abr,36.16,,
+MT,2023,may,42.01,,
+MT,2023,jun,44.24,,
+MT,2023,jul,63.61,,
+MT,2023,ago,40.7,,
+MT,2023,sep,28.6,,
+MT,2023,oct,28.79,,
+MT,2023,nov,30.3,,
+MT,2023,dic,35.21,,
+MT,2024,ene,38.13,38.13,100.0
+MT,2024,feb,32.44,32.44,100.0
+MT,2024,mar,35.17,35.18,100.0
+MT,2024,abr,35.38,35.38,100.0
+MT,2024,may,37.58,37.58,100.0
+MT,2024,jun,44.54,44.54,100.0
+MT,2024,jul,58.92,58.92,100.0
+MT,2024,ago,40.97,40.98,100.0
+MT,2024,sep,35.03,35.03,100.0
+MT,2024,oct,38.86,38.86,100.0
+MT,2024,nov,36.48,36.48,100.0
+MT,2024,dic,40.52,40.52,100.0
+MT,2025,ene,39.16,39.16,100.0
+MT,2025,feb,28.16,28.16,100.0
+MT,2025,mar,42.26,42.26,100.0
+MT,2025,abr,44.04,44.04,100.0
+MT,2025,may,52.71,52.71,100.0
+MT,2025,jun,63.54,63.54,100.0
+MT,2025,jul,49.47,84.94,58.2
@@ -0,0 +1 @@
+https://reports.autingo.es/dashboard/1142
@@ -0,0 +1,60 @@
+STRUCT(DATE(2023,2,1) AS mes, 80.515 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2023,3,1) AS mes, 89.936 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2023,4,1) AS mes, 76.866 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2023,5,1) AS mes, 87.952 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2023,6,1) AS mes, 97.84 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2023,7,1) AS mes, 138.24 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2023,8,1) AS mes, 89.7 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2023,9,1) AS mes, 61.53 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2023,10,1) AS mes, 56.48 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2023,11,1) AS mes, 73.2 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2023,12,1) AS mes, 78.81 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2024,1,1) AS mes, 75.345 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2024,2,1) AS mes, 60.211 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2024,3,1) AS mes, 70.62 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2024,4,1) AS mes, 70.456 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2024,5,1) AS mes, 84.759 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2024,6,1) AS mes, 108.702 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2024,7,1) AS mes, 141.204 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2024,8,1) AS mes, 100.181 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2024,9,1) AS mes, 67.91 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2024,10,1) AS mes, 81.307 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2024,11,1) AS mes, 71.569 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2024,12,1) AS mes, 74.329 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2025,1,1) AS mes, 86.277 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2025,2,1) AS mes, 53.054 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2025,3,1) AS mes, 86.749 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2025,4,1) AS mes, 83.888 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2025,5,1) AS mes, 84.24 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2025,6,1) AS mes, 134.464 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2025,7,1) AS mes, 101.168 AS diego_neto_k, 1 AS company_id),
+    STRUCT(DATE(2023,2,1) AS mes, 30.189 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2023,3,1) AS mes, 41.89 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2023,4,1) AS mes, 36.16 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2023,5,1) AS mes, 42.011 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2023,6,1) AS mes, 44.24 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2023,7,1) AS mes, 63.61 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2023,8,1) AS mes, 40.7 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2023,9,1) AS mes, 28.6 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2023,10,1) AS mes, 28.79 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2023,11,1) AS mes, 30.3 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2023,12,1) AS mes, 35.207 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2024,1,1) AS mes, 38.132 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2024,2,1) AS mes, 32.438 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2024,3,1) AS mes, 35.174 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2024,4,1) AS mes, 35.382 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2024,5,1) AS mes, 37.584 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2024,6,1) AS mes, 44.54 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2024,7,1) AS mes, 58.921 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2024,8,1) AS mes, 40.974 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2024,9,1) AS mes, 35.029 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2024,10,1) AS mes, 38.861 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2024,11,1) AS mes, 36.48 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2024,12,1) AS mes, 40.522 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2025,1,1) AS mes, 39.161 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2025,2,1) AS mes, 28.16 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2025,3,1) AS mes, 42.263 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2025,4,1) AS mes, 44.04 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2025,5,1) AS mes, 52.71 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2025,6,1) AS mes, 63.544 AS diego_neto_k, 2 AS company_id),
+    STRUCT(DATE(2025,7,1) AS mes, 49.469 AS diego_neto_k, 2 AS company_id)
@@ -0,0 +1,8 @@
+import json
+import sys
+
+import google.auth
+from google.cloud import bigquery
+
+# Default credentials scoped to BigQuery, with the quota project cleared.
+creds = google.auth.default(
+    scopes=['https://www.googleapis.com/auth/bigquery'])[0].with_quota_project(None)
+client = bigquery.Client(project='autingo-159109', location='europe-west1', credentials=creds)
+
+# Read SQL from stdin and print each result row as one JSON line.
+sql = sys.stdin.read()
+for row in client.query(sql).result():
+    print(json.dumps(dict(row), default=str))
@@ -0,0 +1,152 @@
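+import json
+import os
+import sys
+import urllib.error
+import urllib.request
+
+# Base URL (MB) and API key (KEY) come from the environment.
+MB = os.environ["MB"]
+KEY = os.environ["KEY"]
+
+
+def api(method, path, body=None, timeout=180):
+    """Send a JSON request to MB + path and return the decoded JSON response."""
+    data = json.dumps(body).encode() if body is not None else None
+    req = urllib.request.Request(MB + path, data=data, method=method,
+        headers={"X-API-KEY": KEY, "Content-Type": "application/json"})
+    try:
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            return json.load(resp)
+    except urllib.error.HTTPError as e:
+        # Show the start of the error body before re-raising.
+        print(f"HTTP {e.code} on {method} {path}:", e.read().decode()[:1200])
+        raise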
+
+# Bridge document -> service_request (channel + charged), exactly as in 1094 card 11751.
+BASE = r"""
+WITH vf AS (
+  SELECT document_id, LOGICAL_OR(is_pw) is_pw FROM (
+    SELECT CAST(document_id AS STRING) document_id, ANY_VALUE(is_precaweb) is_pw
+      FROM `autingo-159109.anjana_bi_datamart.VENTAS_aurgi` GROUP BY 1
+    UNION ALL
+    SELECT CAST(document_id AS STRING), ANY_VALUE(is_precaweb)
+      FROM `autingo-159109.anjana_bi_datamart.VENTAS_Motortown` GROUP BY 1
+  ) GROUP BY 1
+),
+lineas AS (
+  SELECT
+    CAST(s.numeroDocumento AS STRING) AS numdoc,
+    CAST(s.idCentro AS STRING)        AS idcentro,
+    DATE(s.Fecha)                     AS fecha,
+    s.Base_imponible_linea            AS bil
+  FROM {{#4494}} s
+  WHERE DATE(s.Fecha) >= DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY)
+    [[AND DATE(s.Fecha) >= {{fecha_desde}}]]
+    [[AND DATE(s.Fecha) <= {{fecha_hasta}}]]
+),
+web AS (
+  SELECT l.numdoc, l.fecha, l.bil, oc.name AS centro, oc.Companies__name AS ambito
+  FROM lineas l
+  LEFT JOIN vf ON l.numdoc = vf.document_id
+  LEFT JOIN `autingo-159109.rag_datasets.Objeto_Centros` oc
+    ON l.idcentro = CAST(oc.nav_id AS STRING)
+  WHERE (COALESCE(vf.is_pw, FALSE) OR oc.name IN ('Aurgi Web','MT Web'))
+    AND (oc.Companies__name IS NULL OR oc.Companies__name NOT IN ('Aurgi Glass','MotorTown Glass'))
+    [[AND oc.name IN ({{centro}})]]
+    [[AND oc.Companies__name IN ({{ensena}})]]
+),
+sr_link AS (
+  SELECT CAST(inv.nav_id AS STRING) numdoc, CAST(j.service_request_id AS STRING) sr_id
+  FROM `autingo-159109.psql_dcpublic.tpv_orders_invoice` inv
+  JOIN `autingo-159109.psql_dcpublic.tpv_precawebs_servicerequestjob` j ON j.order_id = inv.order_id
+  WHERE inv.nav_id IS NOT NULL
+  UNION DISTINCT
+  SELECT CAST(invoice_number AS STRING), CAST(service_request_id AS STRING)
+  FROM `autingo-159109.psql_dcpublic.logistic_orders`
+  WHERE invoice_number IS NOT NULL AND invoice_number != ''
+),
+sr_link1 AS (SELECT numdoc, MIN(sr_id) sr_id FROM sr_link GROUP BY 1),
+sr AS (
+  SELECT CAST(id AS STRING) sr_id, channel_id, charged
+  FROM `autingo-159109.psql_dcpublic.service_requests`
+),
+doc AS (
+  SELECT
+    w.numdoc,
+    ANY_VALUE(w.fecha)      AS fecha,
+    SUM(w.bil)              AS venta,
+    ANY_VALUE(sl.sr_id)     AS sr_id,
+    ANY_VALUE(sr.channel_id) AS channel_id,
+    ANY_VALUE(sr.charged)   AS charged
+  FROM web w
+  LEFT JOIN sr_link1 sl USING (numdoc)
+  LEFT JOIN sr ON sr.sr_id = sl.sr_id
+  GROUP BY w.numdoc
+),
+fin AS (
+  SELECT
+    numdoc, fecha, venta,
+    CASE WHEN sr_id IS NULL THEN 'Sin solicitud'
+         WHEN channel_id = 1 THEN 'aurgi.com'
+         WHEN channel_id = 2 THEN 'motortown.es'
+         WHEN channel_id = 3 THEN 'Autingo'
+         WHEN channel_id IN (11,13,14,15,6,8) THEN 'Marketplaces'
+         WHEN channel_id = 10 THEN 'Talleres Digitales'
+         ELSE 'Otros' END AS canal,
+    CASE WHEN sr_id IS NULL THEN 'Sin solicitud'
+         WHEN charged THEN 'Pago web'
+         ELSE 'Pago tienda' END AS forma_pago
+  FROM doc
+)
+"""
+
+CARDS = {
+  "total": {
+    "name": "Venta web total (facturacion NAV / modelo 4494)",
+    "sql": BASE + "SELECT ROUND(SUM(venta),0) AS venta_web_eur, COUNT(DISTINCT numdoc) AS documentos FROM fin",
+    "display": "scalar",
+  },
+  "canal": {
+    "name": "Venta web por canal",
+    "sql": BASE + "SELECT canal, ROUND(SUM(venta),0) AS venta_eur, COUNT(DISTINCT numdoc) AS documentos FROM fin GROUP BY canal ORDER BY venta_eur DESC",
+    "display": "bar",
+  },
+  "pago": {
+    "name": "Venta web por forma de pago",
+    "sql": BASE + "SELECT forma_pago, ROUND(SUM(venta),0) AS venta_eur, COUNT(DISTINCT numdoc) AS documentos FROM fin GROUP BY forma_pago ORDER BY venta_eur DESC",
+    "display": "row",
+  },
+  "matriz": {
+    "name": "Venta web: matriz canal x forma de pago",
+    "sql": BASE + "SELECT canal, forma_pago, ROUND(SUM(venta),0) AS venta_eur, COUNT(DISTINCT numdoc) AS documentos FROM fin GROUP BY canal, forma_pago ORDER BY venta_eur DESC",
+    "display": "table",
+  },
+  "evolutivo": {
+    "name": "Venta web mensual por canal",
+    "sql": BASE + "SELECT DATE_TRUNC(fecha, MONTH) AS mes, canal, ROUND(SUM(venta),0) AS venta_eur FROM fin GROUP BY mes, canal ORDER BY mes, venta_eur DESC",
+    "display": "bar",
+  },
+}
+
+TAGS = {
+  "#4494": {"type":"card","name":"#4494","id":"card__4494","display-name":"#4494","card-id":4494},
+  "fecha_desde": {"type":"date","name":"fecha_desde","id":"tag-fecha-desde","display-name":"Fecha desde"},
+  "fecha_hasta": {"type":"date","name":"fecha_hasta","id":"tag-fecha-hasta","display-name":"Fecha hasta"},
+  "centro": {"type":"text","name":"centro","id":"tag-centro","display-name":"Centro"},
+  "ensena": {"type":"text","name":"ensena","id":"tag-ensena","display-name":"Ensena"},
+}
+
+def dq(sql):
+    return {"type":"native","database":6,"native":{"query":sql,"template-tags":TAGS}}
+
+def test_query(sql, params=None):
+    body = dq(sql)
+    body["parameters"] = params or []
+    r = api("POST", "/api/dataset", body)
+    if r.get("error"):
+        print("QUERY ERROR:", r.get("error")); return None
+    cols = [c["name"] for c in r["data"]["cols"]]
+    rows = r["data"]["rows"]
+    return cols, rows
+
+if __name__ == "__main__":
+    which = sys.argv[1] if len(sys.argv) > 1 else "all"
+    # YTD 2026 parameter, used to verify the reconciliation.
+    p_ytd = [{"type":"date/single","value":"2026-01-01","target":["variable",["template-tag","fecha_desde"]]}]
+    for k, c in CARDS.items():
+        if which != "all" and which != k:
+            continue
+        print(f"\n===== TEST {k}: {c['name']} =====")
+        res = test_query(c["sql"], p_ytd)
+        if res:
+            cols, rows = res
+            print("cols:", cols)
+            for row in rows[:15]:
+                print(" ", row)
@@ -0,0 +1 @@
+{"total": 12367, "canal": 12368, "pago": 12369, "matriz": 12370, "evolutivo": 12371}
@@ -0,0 +1,42 @@
+import json
+import sys
+sys.path.insert(0, "scratchpad/exf")
+from build import api, CARDS, dq
+
+COLLECTION = 583  # "Claude" collection (next to 1094)
+
+CUR = {"number_style":"currency","currency":"EUR","currency_style":"symbol","decimals":0}
+
+def viz(kind):
+    if kind == "total":
+        return {"column_settings":{'["name","venta_web_eur"]':CUR},
+                "scalar.field":"venta_web_eur"}
+    if kind == "canal":
+        return {"graph.dimensions":["canal"],"graph.metrics":["venta_eur"],
+                "graph.x_axis.title_text":"Canal","graph.y_axis.title_text":"Venta web (EUR)",
+                "column_settings":{'["name","venta_eur"]':CUR},"graph.show_values":True}
+    if kind == "pago":
+        return {"graph.dimensions":["forma_pago"],"graph.metrics":["venta_eur"],
+                "column_settings":{'["name","venta_eur"]':CUR},"graph.show_values":True}
+    if kind == "matriz":
+        return {"column_settings":{'["name","venta_eur"]':CUR},
+                "table.columns":[
+                  {"name":"canal","enabled":True},{"name":"forma_pago","enabled":True},
+                  {"name":"venta_eur","enabled":True},{"name":"documentos","enabled":True}]}
+    if kind == "evolutivo":
+        return {"graph.dimensions":["mes","canal"],"graph.metrics":["venta_eur"],
+                "stackable.stack_type":"stacked","column_settings":{'["name","venta_eur"]':CUR},
+                "graph.x_axis.title_text":"Mes","graph.y_axis.title_text":"Venta web (EUR)"}
+    return {}
+
+created = {}
+for k, c in CARDS.items():
+    body = {"name": c["name"], "display": c["display"],
+            "dataset_query": dq(c["sql"]),
+            "visualization_settings": viz(k),
+            "collection_id": COLLECTION}
+    r = api("POST", "/api/card", body)
+    created[k] = r["id"]
+    print(f"card {k}: id {r['id']}  {c['name']}")
+
+with open("scratchpad/exf/cards.json", "w") as fh:
+    json.dump(created, fh)
+print("CARDS:", created)
@@ -0,0 +1 @@
+{"dashboard_id": 1143}
@@ -0,0 +1,54 @@
+import json, sys
+sys.path.insert(0, "scratchpad/exf")
+from build import api
+
+with open("scratchpad/exf/cards.json") as fh:
+    C = json.load(fh)
+COLLECTION = 583
+
+# 1) create an empty dashboard
+dash = api("POST", "/api/dashboard", {
+    "name": "Venta Web por Canal y Forma de Pago (facturacion NAV / modelo 4494)",
+    "collection_id": COLLECTION,
+    "description": "Solo venta web (origen precaweb) tomada del modelo 4494 (SUM Base_imponible_linea, facturacion NAV neta), desglosada por canal (channel_id) y forma de pago (pago web vs pago tienda), segun las convenciones del dashboard 1094. Glass excluido. Default: YTD 2026.",
+})
+DID = dash["id"]
+print("dashboard id:", DID)
+
+# 2) dashboard parameters
+PARAMS = [
+  {"id":"p_desde","name":"Fecha desde","slug":"fecha_desde","type":"date/single","default":"2026-01-01"},
+  {"id":"p_hasta","name":"Fecha hasta","slug":"fecha_hasta","type":"date/single"},
+  {"id":"p_centro","name":"Centro","slug":"centro","type":"string/=","sectionId":"string"},
+  {"id":"p_ensena","name":"Ensena","slug":"ensena","type":"string/=","sectionId":"string"},
+]
+
+def mappings(cid):
+    return [
+      {"parameter_id":"p_desde","card_id":cid,"target":["variable",["template-tag","fecha_desde"]]},
+      {"parameter_id":"p_hasta","card_id":cid,"target":["variable",["template-tag","fecha_hasta"]]},
+      {"parameter_id":"p_centro","card_id":cid,"target":["variable",["template-tag","centro"]]},
+      {"parameter_id":"p_ensena","card_id":cid,"target":["variable",["template-tag","ensena"]]},
+    ]
+
+# 3) layout (24-column grid); each tuple is (row, col, size_x, size_y)
+LAYOUT = {
+  "total":     (0, 0, 6, 4),
+  "pago":      (0, 6, 18, 4),
+  "canal":     (4, 0, 12, 7),
+  "matriz":    (4, 12, 12, 7),
+  "evolutivo": (11, 0, 24, 7),
+}
+dashcards = []
+neg = -1
+for k,(row,col,sx,sy) in LAYOUT.items():
+    cid = C[k]
+    dashcards.append({
+      "id": neg, "card_id": cid, "row": row, "col": col, "size_x": sx, "size_y": sy,
+      "series": [], "parameter_mappings": mappings(cid), "visualization_settings": {}
+    })
+    neg -= 1
+
+r = api("PUT", f"/api/dashboard/{DID}", {"dashcards": dashcards, "parameters": PARAMS})
+print("dashcards saved:", len(r.get("dashcards",[])))
+print("URL: https://reports.autingo.es/dashboard/%d" % DID)
+with open("scratchpad/exf/dash.json", "w") as fh:
+    json.dump({"dashboard_id": DID}, fh)
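The dashboard script places five cards on a 24-column grid by hand. A minimal sketch of an offline check for that layout follows. It assumes the `(row, col, size_x, size_y)` tuple order used in the script's loop; the helper names `overlaps` and `layout_problems` are hypothetical and not part of the commit.

```python
from itertools import combinations

GRID_COLUMNS = 24  # grid width taken from the layout comment in the script

# Same tuples as the LAYOUT dict in the dashboard script: (row, col, size_x, size_y).
LAYOUT = {
    "total":     (0, 0, 6, 4),
    "pago":      (0, 6, 18, 4),
    "canal":     (4, 0, 12, 7),
    "matriz":    (4, 12, 12, 7),
    "evolutivo": (11, 0, 24, 7),
}

def overlaps(a, b):
    """Return True when two (row, col, size_x, size_y) rectangles share a cell."""
    ar, ac, aw, ah = a
    br, bc, bw, bh = b
    return ac < bc + bw and bc < ac + aw and ar < br + bh and br < ar + ah

def layout_problems(layout, columns=GRID_COLUMNS):
    """List human-readable problems: bad sizes, cards past the grid edge, overlaps."""
    problems = []
    for name, (row, col, size_x, size_y) in layout.items():
        if row < 0 or col < 0 or size_x <= 0 or size_y <= 0:
            problems.append(f"{name}: invalid position or size")
        elif col + size_x > columns:
            problems.append(f"{name}: exceeds the {columns}-column grid")
    for (name_a, a), (name_b, b) in combinations(layout.items(), 2):
        if overlaps(a, b):
            problems.append(f"{name_a} overlaps {name_b}")
    return problems

print(layout_problems(LAYOUT) or "layout OK")
```

Running a check like this before the `PUT` call would catch a mistyped tuple locally instead of after the dashboard has been saved.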
@@ -0,0 +1,313 @@
+"""Generate the lineage documentation folder on the Windows Desktop.
+
+From the traced graph (scratchpad/lineage_graph.json) it writes:
+  00_INDICE.txt                         summary + layer map + table of all objects
+  01_marts/<view>.txt                   one per customer_marts view: what it is + lineage tree + SQL
+  02_intermedio_clientes_intel/*.txt    base tables of the customer intelligence pipeline
+  03_producto/*.txt                     product catalog chain (views with SQL + base tables)
+  04_fuentes/*.txt                      source tables (Postgres replica, Navision, images, rates)
+
+All .txt files are written with CRLF so they open cleanly in Windows Notepad.
+"""
+import json
+import os
+import textwrap
+
+DEST = "/mnt/c/Users/egutierrez/Desktop/linaje_customer_marts"
+with open("scratchpad/lineage_graph.json") as fh:
+    DATA = json.load(fh)
+G = DATA["graph"]
+PROJECT = DATA["project"]
+
+# ---------------------------------------------------------------------------
+# Descriptions ("what it is") per object. The SQL/DDL included in each file is the
+# source of truth; these lines are a summary to orient the reader.
+# ---------------------------------------------------------------------------
+DESC = {
+    # ---- customer_marts (final marts, grain = persona_id / customer) ----
+    "customer_marts.customer_profile":
+        "Ficha maestra 360 del cliente: identidad + features agregadas + score CLV + segmento. Vista de perfil que consolida todo lo demas.",
+    "customer_marts.customer_monetary":
+        "Metricas monetarias del cliente (gasto total, ticket medio, recencia/frecuencia/valor). Componente M del RFM.",
+    "customer_marts.customer_channel":
+        "Canal del cliente: canal preferido transaccional, mix aurgi/motortown/web/servicio, canal de entrada (canal8) y fuentes de origen.",
+    "customer_marts.customer_contactability":
+        "Contactabilidad del cliente: disponibilidad de email/telefono y consentimientos, a partir de la dimension persona + features + segmento.",
+    "customer_marts.customer_category_spend":
+        "Gasto del cliente desglosado por categoria de producto, a partir de la tabla de hechos de transaccion.",
+    "customer_marts.customer_brand_affinity":
+        "Afinidad de marca del cliente: que marcas compra y con que peso, cruzando transacciones con el catalogo de producto (Objeto_productos).",
+    "customer_marts.customer_product":
+        "Productos comprados por el cliente (detalle de que ha adquirido) desde la tabla de hechos de transaccion.",
+    "customer_marts.customer_store_spend":
+        "Gasto del cliente por centro/tienda desde la tabla de hechos de transaccion.",
+    "customer_marts.customer_temporal":
+        "Patrones temporales de compra del cliente (estacionalidad, recencia, frecuencia) desde transacciones + features.",
+    "customer_marts.customer_vehicles":
+        "Vehiculos asociados al cliente: dimension vehiculo + features de vehiculo + mapping N:N persona-vehiculo.",
+    "customer_marts.customer_payment_method":
+        "Metodo de pago del cliente reconstruido desde los pedidos TPV (orders/invoice/payment/payment_types).",
+    "customer_marts.customer_promo_usage":
+        "Uso de promociones/descuentos por el cliente (pedidos con descuento) desde transacciones + pedidos TPV + segmento.",
+    "customer_marts.customer_promo_tolerance":
+        "Tolerancia del cliente a promociones: respuesta a campanas + sensibilidad a descuentos en pedidos.",
+    "customer_marts.customer_predictive":
+        "Senales predictivas del cliente: score CLV, proxima mejor accion (recomendaciones) y segmento.",
+
+    # ---- clientes_intel (intermediate layer; base tables of the customer intelligence pipeline) ----
+    "clientes_intel.dim_persona":
+        "Dimension PERSONA: identidad de cliente consolidada (una fila por persona_id). Nucleo de la doble identidad persona+vehiculo.",
+    "clientes_intel.dim_vehiculo":
+        "Dimension VEHICULO: una fila por vehiculo (matricula/bastidor) con sus atributos.",
+    "clientes_intel.fact_transaccion":
+        "Tabla de HECHOS de transaccion: linea/venta por cliente. Base de casi todos los marts monetarios y de producto.",
+    "clientes_intel.fact_campana_respuesta":
+        "Tabla de HECHOS de respuesta a campanas de marketing (envio/apertura/conversion) por cliente.",
+    "clientes_intel.feat_cliente_persona":
+        "Features agregadas a nivel PERSONA (RFM, mix de canal, indicadores derivados). Alimenta perfil, monetary, channel, temporal, contactability.",
+    "clientes_intel.feat_cliente_vehiculo":
+        "Features agregadas a nivel VEHICULO. Alimenta customer_vehicles.",
+    "clientes_intel.seg_cliente_360":
+        "Segmentacion 360 del cliente (segmentos de negocio / clusters). Alimenta perfil, channel, contactability, predictive, promo_usage.",
+    "clientes_intel.score_clv":
+        "Score de valor de vida del cliente (CLV). Alimenta perfil y predictive.",
+    "clientes_intel.reco_acciones":
+        "Recomendaciones / proxima mejor accion (NBA) por cliente. Alimenta customer_predictive.",
+    "clientes_intel.map_persona_canal8":
+        "Mapeo persona -> canal8 (canal de entrada). Puente para customer_channel.",
+    "clientes_intel.map_persona_fuente":
+        "Mapeo persona -> fuente(s) de origen (de que sistema/canal proviene el cliente). Puente para customer_channel.",
+    "clientes_intel.map_persona_vehiculo":
+        "Mapeo N:N persona <-> vehiculo. Puente para customer_vehicles.",
+
+    # ---- product catalog chain ----
+    "anjana_bi_datamart.Objeto_productos":
+        "Vista maestra de PRODUCTO: catalogo Navision + categorias CGQ + imagenes + tasa/margen por material. Se usa para afinidad de marca.",
+    "anjana_bi_datamart.Cruce_16_07_cgq":
+        "Tabla de cruce de categorias CGQ (categoria/subcategoria/tipo) usada por Objeto_productos.",
+    "claude_bi.productos_tasa_mat":
+        "Tabla de tasa/margen por material de producto. La consume Objeto_productos.",
+    "external_datasets.product_object_images":
+        "Imagenes de producto (imagen principal/secundaria). Dataset externo. La consume Objeto_productos.",
+    "stg_anjana_bi.producto":
+        "Staging de producto: cruza item de Navision con equivalencias de matriculas (SAF). Capa de preparacion sobre las tablas de SQL Server.",
+
+    # ---- base sources ----
+    "psql_dcpublic.products":
+        "Catalogo de productos. Replica en BigQuery de la BBDD Postgres ANJANA (DCPublic).",
+    "psql_dcpublic.product_categories":
+        "Categorias de producto. Replica Postgres ANJANA (DCPublic).",
+    "psql_dcpublic.product_groups":
+        "Grupos de producto. Replica Postgres ANJANA (DCPublic).",
+    "psql_dcpublic.tpv_orders_order":
+        "Pedidos TPV (cabecera de pedido). Replica Postgres ANJANA (DCPublic).",
+    "psql_dcpublic.tpv_orders_orderitem":
+        "Lineas de pedido TPV. Replica Postgres ANJANA (DCPublic).",
+    "psql_dcpublic.tpv_orders_invoice":
+        "Facturas TPV. Replica Postgres ANJANA (DCPublic).",
+    "psql_dcpublic.tpv_orders_payment":
+        "Pagos de pedidos TPV. Replica Postgres ANJANA (DCPublic).",
+    "psql_dcpublic.tpv_payment_types":
+        "Tipos de pago TPV (catalogo). Replica Postgres ANJANA (DCPublic).",
+    "mssql2022_dbo.item":
+        "Catalogo de articulos de Navision (SQL Server 2022, esquema dbo).",
+    "mssql2022_dbo.equivalencias_matriculas_saf":
+        "Equivalencias de matriculas (SAF) en Navision (SQL Server 2022, esquema dbo).",
+}
+
+TYPE_ES = {
+    "VIEW": "VISTA (tiene SQL propio)",
+    "MATERIALIZED VIEW": "VISTA MATERIALIZADA (tiene SQL propio)",
+    "BASE TABLE": "TABLA BASE (datos materializados; sin SQL de definicion, solo esquema)",
+    "EXTERNAL": "TABLA EXTERNA",
+    "UNKNOWN": "DESCONOCIDO",
+}
+
+# Destination folder per object.
+def folder_of(key: str) -> str:
+    ds = key.split(".", 1)[0]
+    if ds == "customer_marts":
+        return "01_marts"
+    if ds == "clientes_intel":
+        return "02_intermedio_clientes_intel"
+    if ds in ("anjana_bi_datamart", "claude_bi", "external_datasets", "stg_anjana_bi"):
+        return "03_producto"
+    return "04_fuentes"
+
+def fname_of(key: str) -> str:
+    return key.replace(".", "__") + ".txt"
+
+def relpath_of(key: str) -> str:
+    return f"{folder_of(key)}/{fname_of(key)}"
+
+def desc_of(key: str) -> str:
+    return DESC.get(key, "(sin descripcion)")
+
+# ---------------------------------------------------------------------------
+# Recursive lineage tree (for the marts).
+# ---------------------------------------------------------------------------
+def render_tree(key: str, prefix: str | None = None, is_last: bool = True, seen=None) -> list[str]:
+    if seen is None:
+        seen = set()
+    tag = {"VIEW": "[vista]", "MATERIALIZED VIEW": "[vista mat]",
+           "BASE TABLE": "[TABLA BASE/FUENTE]", "EXTERNAL": "[externa]",
+           "UNKNOWN": "[?]"}.get(G.get(key, {"type": "UNKNOWN"})["type"], "")
+
+    if prefix is None:  # root
+        lines = [f"{key}  {tag}"]
+        child_prefix = ""
+    else:
+        connector = "└── " if is_last else "├── "
+        lines = [f"{prefix}{connector}{key}  {tag}"]
+        child_prefix = prefix + ("    " if is_last else "│   ")
+
+    if key in seen:
+        lines[-1] += "  (ya expandido arriba)"
+        return lines
+    seen.add(key)
+    refs = G.get(key, {"refs": []}).get("refs", [])
+    for i, r in enumerate(refs):
+        lines += render_tree(r, child_prefix, i == len(refs) - 1, seen)
+    return lines
+
+# ---------------------------------------------------------------------------
+# Output writing.
+# ---------------------------------------------------------------------------
+def w(path: str, text: str):
+    full = os.path.join(DEST, path)
+    os.makedirs(os.path.dirname(full), exist_ok=True)
+    with open(full, "w", newline="\r\n", encoding="utf-8") as f:
+        f.write(text)
+
+SEP = "=" * 78 + "\n"
+
+def object_file(key: str, include_tree: bool) -> str:
+    node = G[key]
+    out = []
+    out.append(SEP)
+    out.append(f"OBJETO : {PROJECT}.{key}\n")
+    out.append(f"TIPO   : {TYPE_ES.get(node['type'], node['type'])}\n")
+    out.append(f"DATASET: {key.split('.',1)[0]}\n")
+    out.append(SEP)
+    out.append("\nQUE ES\n------\n")
+    out.append(textwrap.fill(desc_of(key), width=78) + "\n")
+
+    if node.get("refs"):
+        out.append("\nDEPENDE DIRECTAMENTE DE\n-----------------------\n")
+        for r in node["refs"]:
+            out.append(f"  - {PROJECT}.{r}   -> ver {relpath_of(r)}\n")
+
+    if include_tree:
+        out.append("\nLINAJE COMPLETO (hasta la fuente)\n---------------------------------\n")
+        out.append("\n".join(render_tree(key)) + "\n")
+
+    out.append("\nSQL / DDL\n---------\n")
+    if node["type"] in ("VIEW", "MATERIALIZED VIEW"):
+        out.append("(Definicion de la vista. Este es el SQL que puedes copiar.)\n\n")
+    else:
+        out.append("(Tabla base: no tiene SQL de transformacion. Se incluye el CREATE TABLE\n"
+                    " con el esquema de columnas para referencia.)\n\n")
+    out.append(node["ddl"].strip() + "\n")
+    return "".join(out)
+
+# Marts: include the lineage tree.
+marts = sorted(k for k in G if k.startswith("customer_marts."))
+for k in marts:
+    w(f"01_marts/{fname_of(k)}", object_file(k, include_tree=True))
+
+# Remaining objects: no tree (or a tree only for views that have dependencies).
+for k in sorted(G):
+    if k.startswith("customer_marts."):
+        continue
+    include_tree = G[k]["type"] in ("VIEW", "MATERIALIZED VIEW") and bool(G[k].get("refs"))
+    w(relpath_of(k), object_file(k, include_tree=include_tree))
+
+# ---------------------------------------------------------------------------
+# INDEX.
+# ---------------------------------------------------------------------------
+idx = []
+idx.append(SEP)
+idx.append("INDICE - LINAJE DEL DATASET customer_marts\n")
+idx.append(f"Proyecto BigQuery: {PROJECT}\n")
+idx.append(SEP)
+idx.append("""
+QUE ES ESTA CARPETA
+-------------------
+Documenta, para cada tabla/vista del dataset `customer_marts`, de donde salen sus
+datos: la cadena completa desde el mart final hasta las tablas fuente, con el SQL
+de cada vista listo para copiar y compartir.
+
+Cada objeto tiene su propio .txt con:
+  - QUE ES (resumen de una linea; la SQL es la fuente de verdad)
+  - DE QUE DEPENDE (dependencias directas, con la ruta a su archivo)
+  - LINAJE COMPLETO (arbol hasta la fuente) -- solo en los marts y vistas
+  - SQL / DDL (el codigo: definicion de la vista, o el esquema si es tabla base)
+
+MAPA DE CAPAS
+-------------
+  customer_marts (VISTAS finales, grano = cliente/persona_id)
+        |
+        v
+  clientes_intel (TABLAS BASE: capa intermedia construida por el pipeline de
+        |          inteligencia de clientes -- dim_*, feat_*, seg_*, score_*,
+        |          reco_*, fact_*, map_*)
+        v
+  Fuentes:
+     - psql_dcpublic.*   Replica en BigQuery de la BBDD Postgres ANJANA (TPV + catalogo)
+     - anjana_bi_datamart / claude_bi / external_datasets / stg_anjana_bi
+                          Cadena de catalogo de PRODUCTO (Objeto_productos y sus fuentes)
+     - mssql2022_dbo.*   Navision (SQL Server 2022, esquema dbo)
+
+NOTA: las tablas de `clientes_intel` son TABLAS BASE: no son vistas, sino tablas que
+un pipeline reconstruye cada dia con sentencias CREATE TABLE AS SELECT (CTAS). Su
+esquema esta en 02_intermedio_clientes_intel/. El SQL REAL que las construye (y que
+baja hasta TPV / customers / users / Navision / Salesforce) esta en la carpeta
+05_construccion_clientes_intel/ -- ver tambien 00b_FUENTES_DE_CLIENTE.txt.
+
+""")
+
+idx.append(SEP)
+idx.append("CARPETAS\n")
+idx.append(SEP)
+idx.append("""
+  01_marts/                      Las 14 vistas de customer_marts (con arbol de linaje)
+  02_intermedio_clientes_intel/  Las 12 tablas base intermedias (esquema)
+  03_producto/                   Cadena de catalogo de producto (vistas + bases)
+  04_fuentes/                    Tablas fuente (replica Postgres, Navision, imagenes, tasas)
+  05_construccion_clientes_intel/  El SQL (CTAS) que construye cada tabla de clientes_intel
+  00b_FUENTES_DE_CLIENTE.txt     Que consulta lee cada fuente de cliente (TPV/customers/
+                                 users/Navision/Salesforce)
+
+""")
+
+def index_block(title, keys):
+    lines = [SEP, title + "\n", SEP, "\n"]
+    for k in keys:
+        t = {"VIEW": "vista", "MATERIALIZED VIEW": "vista_mat", "BASE TABLE": "tabla",
+             "EXTERNAL": "externa", "UNKNOWN": "?"}.get(G[k]["type"], "")
+        lines.append(f"[{t:9s}] {k}\n")
+        lines.append(f"            {desc_of(k)}\n")
+        lines.append(f"            archivo: {relpath_of(k)}\n\n")
+    return "".join(lines)
+
+idx.append(index_block("1) MARTS FINALES (customer_marts)", marts))
+idx.append(index_block("2) CAPA INTERMEDIA (clientes_intel)",
+                       sorted(k for k in G if k.startswith("clientes_intel."))))
+idx.append(index_block("3) CADENA DE PRODUCTO",
+                       sorted(k for k in G if folder_of(k) == "03_producto")))
+idx.append(index_block("4) FUENTES BASE",
+                       sorted(k for k in G if folder_of(k) == "04_fuentes")))
+
+w("00_INDICE.txt", "".join(idx))
+
+# Final count: only .txt files, so the label printed below matches what is counted.
+n_files = sum(name.endswith(".txt") for _, _, files in os.walk(DEST) for name in files)
+print(f"Escrito en: {DEST}")
+print(f"Archivos .txt generados: {n_files}")
+print("Estructura:")
+for root, dirs, files in sorted(os.walk(DEST)):
+    rel = os.path.relpath(root, DEST)
+    if rel == ".":
+        for f in sorted(files):
+            print(f"  {f}")
+    else:
+        print(f"  {rel}/  ({len(files)} archivos)")
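The `render_tree` function above shares one `seen` set across the whole traversal, so a node reachable through two paths is expanded only the first time it appears. A minimal sketch of that behavior on a toy graph follows. The object names, the `GRAPH` and `TYPE_TAGS` dicts, and the English marker text are illustrative assumptions, not the real lineage data.

```python
# Toy lineage graph with hypothetical object names (not the real datasets).
GRAPH = {
    "marts.profile": {"type": "VIEW", "refs": ["intel.dim_persona", "intel.feat_persona"]},
    "intel.feat_persona": {"type": "BASE TABLE", "refs": ["intel.dim_persona"]},
    "intel.dim_persona": {"type": "BASE TABLE", "refs": []},
}
TYPE_TAGS = {"VIEW": "[view]", "BASE TABLE": "[base table]"}

def render_tree(key, prefix=None, is_last=True, seen=None):
    """Render the dependency tree of `key` as a list of text lines."""
    if seen is None:
        seen = set()
    node = GRAPH.get(key, {"type": "UNKNOWN", "refs": []})
    tag = TYPE_TAGS.get(node["type"], "[?]")
    if prefix is None:  # root node: no connector
        line = f"{key}  {tag}"
        child_prefix = ""
    else:
        connector = "└── " if is_last else "├── "
        line = f"{prefix}{connector}{key}  {tag}"
        child_prefix = prefix + ("    " if is_last else "│   ")
    if key in seen:
        # Already expanded elsewhere in the tree: print it once more, but do not recurse.
        return [line + "  (already expanded above)"]
    seen.add(key)
    lines = [line]
    refs = node["refs"]
    for i, ref in enumerate(refs):
        lines += render_tree(ref, child_prefix, i == len(refs) - 1, seen)
    return lines

tree = render_tree("marts.profile")
print("\n".join(tree))
```

Because the set is shared rather than copied per branch, the same mechanism also stops the recursion if the graph ever contains a cycle.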
@@ -0,0 +1,164 @@
+"""Generate 05_construccion_clientes_intel/ (CTAS SQL for each clientes_intel table)
+and 00b_FUENTES_DE_CLIENTE.txt (map from customer source to the query that reads it).
+
+Data sources: scratchpad/intel_build.json (build SQL captured from
+INFORMATION_SCHEMA.JOBS) and scratchpad/intel_lineage.json (tables involved).
+"""
+import json
+import os
+import textwrap
+
+DEST = "/mnt/c/Users/egutierrez/Desktop/linaje_customer_marts"
+PROJECT = "autingo-159109"
+with open("scratchpad/intel_build.json") as fh:
+    builds = json.load(fh)
+with open("scratchpad/intel_lineage.json") as fh:
+    lin = json.load(fh)
+
+# Tables whose build SQL we write: those in the customer_marts lineage
+# plus those that read customer/Salesforce sources.
+EXTRA = ["seg_vega_persona", "fact_campana_respuesta__sfnew"]
+want = sorted(set(lin["intel_involved"]) | set(EXTRA))
+want = [t for t in want if t in builds]  # only those with captured SQL
+
+DESC = {
+    "_persona_records":
+        "IDENTIDAD DEL CLIENTE (nucleo). UNION de 7 fuentes -> normaliza DNI/NIE/CIF, email y "
+        "telefono -> resuelve persona_id (FARM_FINGERPRINT de persona_key) con nivel de confianza. "
+        "AQUI es donde se juntan TPV customers, customers web, OTR, Navision, citaprevia, users y "
+        "Salesforce contacts_latest.",
+    "dim_persona":
+        "Dimension PERSONA final: una fila por persona_id, elegida desde _persona_records "
+        "(prioriza el mejor registro por fuente/confianza) + banderas de contacto.",
+    "dim_vehiculo":
+        "Dimension VEHICULO: una fila por vehiculo (matricula/bastidor) desde TPV vehicles, OTR, "
+        "citaprevia matriculas y calibrado de ano de matricula.",
+    "map_persona_fuente":
+        "Mapeo persona -> fuente(s) de origen (tpv/web/otr/navision/citaprevia/users/salesforce). "
+        "Registra de que sistemas proviene cada persona.",
+    "map_persona_vehiculo":
+        "Mapeo N:N persona <-> vehiculo (quien conduce/posee que coche) desde OTR, TPV vehicleowner "
+        "y citaprevia matriculas.",
+    "map_persona_canal8":
+        "Mapeo persona -> canal8 (canal de entrada del cliente).",
+    "fact_transaccion":
+        "Tabla de HECHOS de transaccion (linea/venta por persona). Base de los marts monetarios.",
+    "fact_visita":
+        "Tabla de HECHOS de visita (visitas del cliente al taller/tienda).",
+    "fact_campana_respuesta":
+        "HECHOS de respuesta a campanas: cruza envios/aperturas/clics/sms de Salesforce con personas.",
+    "fact_campana_respuesta__sfnew":
+        "Variante de fact_campana_respuesta con el esquema nuevo de Salesforce (email_sent/opened/clicked/sms).",
+    "feat_cliente_persona":
+        "Features agregadas por PERSONA (RFM, mix de canal, ticket medio, margen, recencia...).",
+    "feat_cliente_vehiculo":
+        "Features agregadas por VEHICULO.",
+    "seg_cliente_360":
+        "Segmentacion 360 del cliente (segmentos/clusters de negocio).",
+    "seg_vega_persona":
+        "Segmentacion VEGA por persona (contactabilidad/valor); lee fuentes de cliente para calcular "
+        "disponibilidad de contacto.",
+    "seg_cluster_persona":
+        "Clustering de personas (asignacion de cluster) que alimenta la segmentacion.",
+    "reco_acciones":
+        "Recomendaciones / proxima mejor accion (NBA) por cliente.",
+    "data_points_contacto":
+        "Puntos de dato de contacto (email/telefono) consolidados y calidad por persona.",
+    "_margen_rate_producto":
+        "Tasa de margen por producto (auxiliar para features monetarias).",
+    "_plate_year_calib":
+        "Calibrado del ano a partir de la matricula (auxiliar para dim_vehiculo).",
+    "dim_cp_provincia":
+        "Diccionario codigo postal -> provincia/CCAA.",
+    "tipologia_cliente":
+        "Tipologia de cliente (clasificacion de negocio).",
+}
+
+# Short description of each customer source.
+SRC_DESC = {
+    "psql_dcpublic.tpv_customers": "Clientes del TPV (mostrador). Replica Postgres ANJANA (DCPublic).",
+    "psql_dcpublic.customers": "Clientes web (e-commerce). Replica Postgres ANJANA (DCPublic).",
+    "psql_dcpublic.otr_customers": "Clientes de OTR (ordenes de reparacion/taller). Replica Postgres ANJANA.",
+    "psql_dcpublic.users": "Usuarios (cuentas). Replica Postgres ANJANA (DCPublic).",
+    "mssql2022_dbo.anjana_customer": "Cliente de NAVISION (SQL Server 2022, esquema dbo). Campos no_/e_mail/movil/name/post_code.",
+    "salesforce_ew1.contacts_latest": "Contactos de SALESFORCE (ultima version). Dataset en europe-west1.",
+    "salesforce_ew1.email_sent": "Envios de email de Salesforce (Marketing Cloud).",
+    "salesforce_ew1.email_opened": "Aperturas de email de Salesforce.",
+    "salesforce_ew1.email_clicked": "Clics de email de Salesforce.",
+    "salesforce_ew1.sms": "SMS de Salesforce.",
+    "citaprevia_aurphcp.clientes": "Clientes de CITA PREVIA (aurphcp).",
+    "citaprevia_aurphcp.clientes_matriculas": "Matriculas por cliente en cita previa.",
+    "psql_dcpublic.tpv_vehicles_vehicle": "Vehiculos del TPV. Replica Postgres ANJANA.",
+    "psql_dcpublic.tpv_vehicles_vehicleowner": "Propietarios de vehiculo del TPV (N:N). Replica Postgres ANJANA.",
+}
+
+CUST_SOURCES = list(SRC_DESC.keys())
+
+SEP = "=" * 78 + "\n"
+
+def w(path, text):
+    full = os.path.join(DEST, path)
+    os.makedirs(os.path.dirname(full), exist_ok=True)
+    with open(full, "w", newline="\r\n", encoding="utf-8") as f:
+        f.write(text)
+
+def build_file(tbl):
+    b = builds[tbl]
+    out = [SEP, f"OBJETO : {PROJECT}.clientes_intel.{tbl}\n",
+           f"TIPO   : TABLA BASE construida por {b['stmt']} (se reconstruye periodicamente)\n",
+           f"ULTIMA EJECUCION CAPTURADA: {b['last_run']}\n", SEP,
+           "\nQUE ES\n------\n",
+           textwrap.fill(DESC.get(tbl, "(sin descripcion)"), width=78) + "\n"]
+    if b["refs"]:
+        out.append("\nLEE DE (tablas fuente / intermedias)\n------------------------------------\n")
+        for r in b["refs"]:
+            note = "   << FUENTE DE CLIENTE" if r in SRC_DESC else ""
+            out.append(f"  - {PROJECT}.{r}{note}\n")
+    out.append("\nSQL DE CONSTRUCCION (copiable)\n------------------------------\n\n")
+    out.append(b["query"].strip() + "\n")
+    return "".join(out)
+
+for t in want:
+    w(f"05_construccion_clientes_intel/{t}.txt", build_file(t))
+
+# 00b_FUENTES_DE_CLIENTE.txt
+f = [SEP, "FUENTES DE CLIENTE  ->  QUE CONSULTA DE clientes_intel LAS USA\n", SEP,
+     "\nResponde a: de donde salen los clientes (TPV, web, OTR, Navision, Salesforce, cita\n"
+     "previa) y en que consulta se juntan. El punto de union de identidades es\n"
+     "_persona_records (ver 05_construccion_clientes_intel/_persona_records.txt).\n\n"]
+
+f.append(SEP + "RESUMEN: LO QUE PEDISTE\n" + SEP + "\n")
+mapping = [
+    ("TPV customers", "psql_dcpublic.tpv_customers"),
+    ("customers (web)", "psql_dcpublic.customers"),
+    ("customers (OTR / taller)", "psql_dcpublic.otr_customers"),
+    ("users", "psql_dcpublic.users"),
+    ("customer de NAVISION", "mssql2022_dbo.anjana_customer"),
+    ("SALESFORCE (contactos)", "salesforce_ew1.contacts_latest"),
+]
+for label, src in mapping:
+    f.append(f"  {label:26s} -> {PROJECT}.{src}\n")
+f.append("\n  SI: tenemos Salesforce. El dataset es `salesforce_ew1` (europe-west1):\n"
+         "      contactos en contacts_latest; marketing en email_sent/opened/clicked y sms.\n\n")
+
+for src in CUST_SOURCES:
+    consumers = sorted(t for t, b in builds.items() if src in b["refs"])
+    f.append(SEP)
+    f.append(f"{PROJECT}.{src}\n")
+    f.append(SEP)
+    f.append(f"  {SRC_DESC[src]}\n")
+    f.append("  La leen estas tablas de clientes_intel (con su SQL en 05_construccion_...):\n")
+    if consumers:
+        for t in consumers:
+            star = "  [SQL disponible]" if t in want else ""
+            f.append(f"     - {t}  ({builds[t]['stmt']}){star}\n")
+    else:
+        f.append("     (ninguna la referencia directamente)\n")
+    f.append("\n")
+
+w("00b_FUENTES_DE_CLIENTE.txt", "".join(f))
+
+print("Generado:")
+print(f"  05_construccion_clientes_intel/  -> {len(want)} archivos con SQL de construccion")
+print(f"  00b_FUENTES_DE_CLIENTE.txt")
+print("\nTablas con SQL de construccion escrito:")
+for t in want:
+    print(f"  - {t}")
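All three generator scripts rely on the same `w` helper, which opens the file with `newline="\r\n"` so that every `"\n"` written becomes CRLF on disk. A minimal sketch of that behavior follows, writing to a temporary directory instead of the Desktop path; the function name `write_crlf` and the sample text are assumptions for illustration only.

```python
import os
import tempfile

def write_crlf(path, text):
    """Write `text` to `path`, translating each "\\n" to CRLF on disk."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", newline="\r\n", encoding="utf-8") as fh:
        fh.write(text)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "sub", "demo.txt")
    write_crlf(target, "OBJETO\n------\n")
    # Read back in binary mode to see the bytes actually stored.
    with open(target, "rb") as fh:
        raw = fh.read()

print(raw)
```

One caveat worth keeping in mind: Python translates only `"\n"`, so text that already contains `"\r\n"` (for example SQL captured from a tool that emits Windows line endings) would end up as `"\r\r\n"`. Normalizing with `text.replace("\r\n", "\n")` before writing avoids that.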
@@ -0,0 +1,126 @@
+"""Generate 00c_VERIFICACION.txt (lineage completeness check) and
+06_otros_outputs_clientes_intel/ (SQL of the clientes_intel tables that do NOT end up
+in customer_marts, so that none is left behind).
+"""
+import json
+import os
+import textwrap
+
+DEST = "/mnt/c/Users/egutierrez/Desktop/linaje_customer_marts"
+PROJECT = "autingo-159109"
+with open("scratchpad/intel_build.json") as fh:
+    builds = json.load(fh)
+with open("scratchpad/intel_lineage.json") as fh:
+    lin = json.load(fh)
+involved = set(lin["intel_involved"])
+
+# Full clientes_intel catalog (40 objects), reconstructed as: involved + known leftovers.
+LEFTOVER = [
+    "_presupuesto_persona", "_veh_cluster_feat", "_veh_tec_feat", "audit_persona_divergencias",
+    "calidad_email_snapshot", "f0_audit_keys", "fact_impacto_campana", "map_mutualista_particular",
+    "reco_promo_personalizada", "reco_promo_segmento", "rpt_campana", "rpt_campana_lift",
+    "rpt_campana_usuario", "rpt_impacto_persona", "seg_audiencia", "seg_vega_persona",
+    "sf_contact_map", "tipologia_cliente_resumen", "veh_cluster",
+]
+
+# Classification by purpose (where each leftover goes).
+CATEGORY = {
+    "rpt_campana": "Informe de campanas (BI / dashboards de marketing)",
+    "rpt_campana_lift": "Informe de campanas: lift (BI / dashboards)",
+    "rpt_campana_usuario": "Informe de campanas por usuario (BI / dashboards)",
+    "rpt_impacto_persona": "Informe de impacto por persona (BI / dashboards)",
+    "fact_impacto_campana": "Hechos de impacto de campana (base de los informes)",
+    "reco_promo_personalizada": "Recomendacion de promo personalizada (activacion)",
+    "reco_promo_segmento": "Recomendacion de promo por segmento (activacion)",
+    "seg_audiencia": "Audiencias para activacion (probable push a Salesforce/Marketing)",
+    "sf_contact_map": "Mapa de contactos Salesforce (sincronizacion de IDs)",
+    "audit_persona_divergencias": "Auditoria de calidad: divergencias en resolucion de persona",
+    "calidad_email_snapshot": "Auditoria de calidad: snapshot de emails",
+    "f0_audit_keys": "Auditoria de claves (control interno del pipeline)",
+    "_presupuesto_persona": "Auxiliar: presupuestos por persona (interim)",
+    "_veh_cluster_feat": "Auxiliar: features para clustering de vehiculo (interim)",
+    "_veh_tec_feat": "Auxiliar: features tecnicas de vehiculo (interim)",
+    "veh_cluster": "Clustering de vehiculo (resultado; no lo usan los marts hoy)",
+    "tipologia_cliente_resumen": "Resumen de tipologia de cliente (BI)",
+    "map_mutualista_particular": "Vista auxiliar: mapa mutualista/particular",
+    "seg_vega_persona": "Segmentacion VEGA por persona (contactabilidad; lee fuentes de cliente)",
+}
+
+SEP = "=" * 78 + "\n"
+
+def w(path, text):
+    full = os.path.join(DEST, path)
+    os.makedirs(os.path.dirname(full), exist_ok=True)
+    with open(full, "w", newline="\r\n", encoding="utf-8") as f:
+        f.write(text)
+
+# --- 06: SQL of the leftovers that have a captured build ---
+written = []
+for t in LEFTOVER:
+    b = builds.get(t)
+    if not b:
+        continue
+    out = [SEP, f"OBJETO : {PROJECT}.clientes_intel.{t}\n",
+           f"TIPO   : {b['stmt']}   (NO alimenta customer_marts)\n",
+           f"ULTIMA EJECUCION CAPTURADA: {b['last_run']}\n", SEP,
+           "\nQUE ES / A DONDE VA\n-------------------\n",
+           textwrap.fill(CATEGORY.get(t, "(sin clasificar)"), width=78) + "\n"]
+    if b["refs"]:
+        out.append("\nLEE DE\n------\n")
+        for r in b["refs"]:
+            out.append(f"  - {PROJECT}.{r}\n")
+    out.append("\nSQL DE CONSTRUCCION (copiable)\n------------------------------\n\n")
+    out.append(b["query"].strip() + "\n")
+    w(f"06_otros_outputs_clientes_intel/{t}.txt", "".join(out))
+    written.append(t)
+
+# --- 00c: completeness check ---
+v = [SEP, "VERIFICACION DE COMPLETITUD DEL LINAJE\n", SEP, "\n"]
+v.append("PREGUNTA: todo esto acaba en customer_marts? Comprobado.\n\n")
+v.append("""RESPUESTA CORTA
+---------------
+La cadena customer_marts -> fuentes esta COMPLETA (todas las referencias resueltas,
+0 tablas sin identificar). PERO customer_marts NO es el unico destino: es UNO de los
+consumidores de la capa clientes_intel.
+
+  - clientes_intel tiene 40 objetos.
+  - 21 de ellos alimentan (directa o indirectamente) las 14 vistas de customer_marts.
+  - 19 NO van a customer_marts: son OTRAS salidas del mismo pipeline (informes de
+    campana, recomendaciones de promo, audiencias, auditorias, auxiliares).
+
+El unico dataset MODELADO que lee clientes_intel es customer_marts. El resto de lo que
+lee clientes_intel y customer_marts son consultas de BI / ad-hoc (tablas temporales
+_hexhash / anon...), es decir Metabase u otros lo consumen directamente. En ese sentido
+customer_marts SI es terminal en el modelo (aguas abajo solo hay BI).
+
+""")
+
+v.append(SEP + "1) LAS 21 TABLAS DE clientes_intel QUE SI ALIMENTAN customer_marts\n" + SEP + "\n")
+for t in sorted(involved):
+    b = builds.get(t, {})
+    v.append(f"  - {t}  ({b.get('stmt','(sin job)')})\n")
+
+v.append("\n" + SEP + "2) LAS 19 TABLAS DE clientes_intel QUE NO VAN A customer_marts\n" + SEP + "\n")
+v.append("   (SQL de cada una en 06_otros_outputs_clientes_intel/)\n\n")
+for t in LEFTOVER:
+    sql_note = "" if t in written else "   [sin SQL de job capturado]"
+    v.append(f"  - {t:28s} {CATEGORY.get(t,'')}{sql_note}\n")
+
+v.append("\n" + SEP + "3) FUENTES BASE ALCANZADAS (fin del linaje)\n" + SEP + "\n")
+v.append("   Fuera de clientes_intel, el pipeline lee de:\n\n")
+for s in sorted(lin["external_sources"]):
+    v.append(f"  - {PROJECT}.{s}\n")
+
+v.append("\n" + SEP + "4) NOTAS DE COBERTURA\n" + SEP + "\n")
+v.append("""  - score_clv y seg_cluster_vehiculo: usadas por customer_marts pero sin CTAS reciente
+    en el historial de jobs (son modelos ML / cargas antiguas). Su esquema esta en
+    02_intermedio_clientes_intel/; no hay un SQL de un solo job que las reconstruya.
+  - El SQL de construccion se tomo del ULTIMO job exitoso de cada tabla
+    (INFORMATION_SCHEMA.JOBS, region europe-west1, ventana 120 dias). Si una tabla se
+    reconstruye con otra logica fuera de esa ventana, no se captura aqui.
+  - customer_marts: 14 vistas = el dataset entero (no falta ninguna).
+""")
+
+w("00c_VERIFICACION.txt", "".join(v))
+
+print(f"06_otros_outputs_clientes_intel/ -> {len(written)} archivos")
+print("00c_VERIFICACION.txt -> escrito")
+print("\nLeftovers sin SQL capturado:", [t for t in LEFTOVER if t not in written] or "ninguno")
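The "short answer" text in the report above writes its counts as literals (40 objects, 21 feeding customer_marts, 19 not). A minimal sketch of deriving those numbers from the lineage instead, so the prose cannot drift from the data. It assumes the full object inventory is available as a plain list; `coverage_summary` and its argument names are hypothetical and do not come from the commit:

```python
def coverage_summary(all_objects, involved):
    """Split a layer's objects into those that feed the marts and the leftovers."""
    universe = set(all_objects)
    feeding = universe & set(involved)   # ignore names that are not in the inventory
    leftover = universe - feeding
    return {
        "total": len(universe),
        "feeding": sorted(feeding),
        "leftover": sorted(leftover),
    }


# Toy data for illustration, not the real clientes_intel inventory.
summary = coverage_summary(
    all_objects=["dim_persona", "score_clv", "seg_audiencia", "veh_cluster"],
    involved=["dim_persona", "score_clv"],
)
line = (
    f"{summary['total']} objects: {len(summary['feeding'])} feed customer_marts, "
    f"{len(summary['leftover'])} do not"
)
print(line)
```

Formatting the sentence from `summary` keeps the three numbers consistent by construction (total = feeding + leftover).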
@@ -0,0 +1,53 @@
+{
+  "intel_involved": [
+    "_margen_rate_producto",
+    "_persona_records",
+    "_plate_year_calib",
+    "data_points_contacto",
+    "dim_cp_provincia",
+    "dim_persona",
+    "dim_vehiculo",
+    "fact_campana_respuesta",
+    "fact_transaccion",
+    "fact_visita",
+    "feat_cliente_persona",
+    "feat_cliente_vehiculo",
+    "map_persona_canal8",
+    "map_persona_fuente",
+    "map_persona_vehiculo",
+    "reco_acciones",
+    "score_clv",
+    "seg_cliente_360",
+    "seg_cluster_persona",
+    "seg_cluster_vehiculo",
+    "tipologia_cliente"
+  ],
+  "external_sources": [
+    "anjana_bi_amg.margenes_mat",
+    "citaprevia_aurphcp.clientes",
+    "citaprevia_aurphcp.clientes_matriculas",
+    "claude_bi.churn_scores_current",
+    "claude_bi.conversion_cqg_base_mat",
+    "claude_bi.todos_datos_lineas_mat",
+    "mssql2022_dbo.anjana_customer",
+    "ontologia.aurgiCitas_mat",
+    "psql_dcpublic.call_transactions",
+    "psql_dcpublic.car_makes",
+    "psql_dcpublic.car_model_families",
+    "psql_dcpublic.car_models",
+    "psql_dcpublic.car_versions",
+    "psql_dcpublic.customers",
+    "psql_dcpublic.otr_customers",
+    "psql_dcpublic.otr_vehicles",
+    "psql_dcpublic.tecrmi_license_plates",
+    "psql_dcpublic.tpv_customers",
+    "psql_dcpublic.tpv_vehicles_vehicle",
+    "psql_dcpublic.tpv_vehicles_vehicleowner",
+    "psql_dcpublic.users",
+    "salesforce_ew1.contacts_latest",
+    "salesforce_ew1.email_clicked",
+    "salesforce_ew1.email_opened",
+    "salesforce_ew1.email_sent",
+    "salesforce_ew1.sms"
+  ]
+}
@@ -0,0 +1,51 @@
+"""Helper: run SQL against Metabase BigQuery db=6 via REST API.
+
+Usage:
+    python3 mbq.py "SELECT 1"
+    python3 mbq.py < query.sql
+Reads API key from `pass metabase/aurgi-api-key`.
+Prints columns header + rows as TSV.
+"""
+import os
+import sys
+import json
+import subprocess
+
+sys.path.insert(0, "python/functions")
+from metabase import MetabaseClient, metabase_execute_query
+
+MB_URL = "https://reports.autingo.es"
+DB_ID = 6
+
+
+def get_key():
+    return subprocess.check_output(["pass", "show", "metabase/aurgi-api-key"]).decode().splitlines()[0].strip()
+
+
+def run(sql, max_results=2000):
+    import httpx
+    c = MetabaseClient(MB_URL, get_key())
+    try:
+        res = metabase_execute_query(c, DB_ID, sql, max_results=max_results)
+    except httpx.HTTPStatusError as e:
+        # Report failures on stderr so stdout carries only the TSV result.
+        print("HTTP", e.response.status_code, e.response.text[:3000], file=sys.stderr)
+        return
+    data = res.get("data", {})
+    cols = [col.get("display_name") or col.get("name") for col in data.get("cols", [])]
+    rows = data.get("rows", [])
+    # A response that carries an error, or whose status is not "completed", is a failure.
+    if res.get("error") or (res.get("status") and res.get("status") != "completed"):
+        print("ERROR:", json.dumps(res.get("error") or res, ensure_ascii=False)[:2000], file=sys.stderr)
+        return
+    print("\t".join(str(x) for x in cols))
+    for r in rows:
+        print("\t".join("" if v is None else str(v) for v in r))
+    print(f"-- {len(rows)} rows", file=sys.stderr)
+
+
+if __name__ == "__main__":
+    if len(sys.argv) > 1:
+        sql = sys.argv[1]
+    else:
+        sql = sys.stdin.read()
+    run(sql)
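The helper above mixes the HTTP call with the TSV printing. A minimal sketch of the formatting step on its own, which makes it testable without a Metabase instance. The response shape (`cols` with `display_name`/`name`, plus `rows`) is taken from the script above; the function name `to_tsv` and the sample payload are hypothetical:

```python
def to_tsv(data):
    """Render a Metabase-style {'cols': [...], 'rows': [...]} payload as TSV text."""
    cols = [c.get("display_name") or c.get("name") for c in data.get("cols", [])]
    lines = ["\t".join(str(c) for c in cols)]
    for row in data.get("rows", []):
        # None becomes an empty cell, as in the script above.
        lines.append("\t".join("" if v is None else str(v) for v in row))
    return "\n".join(lines)


# Made-up payload for illustration only.
sample = {
    "cols": [{"display_name": "id"}, {"name": "email"}],
    "rows": [[1, None], [2, "a@b.es"]],
}
tsv = to_tsv(sample)
print(tsv)
```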
@@ -0,0 +1,106 @@
+"""Trace how clientes_intel is built: for each table, fetch the SQL of the last job
+that wrote it (INFORMATION_SCHEMA.JOBS) plus its referenced_tables, and walk
+backwards to the source tables (TPV, customers, users, Navision, Salesforce).
+
+Dumps the builds to scratchpad/intel_build.json and the lineage to
+scratchpad/intel_lineage.json.
+"""
+import json
+import warnings
+
+warnings.filterwarnings("ignore")
+import google.auth
+from google.cloud import bigquery
+
+PROJECT = "autingo-159109"
+REGION = "region-europe-west1"
+
+creds, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/bigquery"])
+creds = creds.with_quota_project(None)
+c = bigquery.Client(project=PROJECT, credentials=creds)
+
+# Latest job per destination table in clientes_intel: query + referenced_tables + stmt.
+sql = f"""
+WITH j AS (
+  SELECT
+    dest.table_id AS tbl,
+    query,
+    statement_type AS stmt,
+    creation_time,
+    ARRAY(
+      SELECT AS STRUCT rt.project_id, rt.dataset_id, rt.table_id
+      FROM UNNEST(referenced_tables) rt
+    ) AS refs,
+    ROW_NUMBER() OVER (PARTITION BY dest.table_id ORDER BY creation_time DESC) AS rn
+  FROM `{PROJECT}`.`{REGION}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
+       UNNEST([destination_table]) dest
+  WHERE dest.dataset_id = 'clientes_intel'
+    AND state = 'DONE' AND error_result IS NULL
+    AND statement_type IS NOT NULL
+    AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 120 DAY)
+)
+SELECT tbl, query, stmt, creation_time, refs FROM j WHERE rn = 1
+ORDER BY tbl
+"""
+
+builds = {}
+for r in c.query(sql).result():
+    # Dataset-qualified names; drop only the table's own self-reference, so a
+    # same-named table in another dataset is not lost from the lineage.
+    refs = {f"{rt['dataset_id']}.{rt['table_id']}" for rt in r.refs}
+    refs.discard(f"clientes_intel.{r.tbl}")
+    builds[r.tbl] = {
+        "query": r.query or "",
+        "stmt": r.stmt,
+        "last_run": str(r.creation_time),
+        "refs": sorted(refs),
+    }
+
+with open("scratchpad/intel_build.json", "w", encoding="utf-8") as f:
+    json.dump(builds, f, indent=2, ensure_ascii=False)
+print(f"tablas clientes_intel con SQL de construccion capturado: {len(builds)}\n")
+
+# Depth-first walk (explicit stack) from the 12 tables used by customer_marts.
+SEED = [
+    "dim_persona", "dim_vehiculo", "fact_transaccion", "fact_campana_respuesta",
+    "feat_cliente_persona", "feat_cliente_vehiculo", "seg_cliente_360", "score_clv",
+    "reco_acciones", "map_persona_canal8", "map_persona_fuente", "map_persona_vehiculo",
+]
+intel_involved = set()
+external_sources = set()
+stack = list(SEED)
+while stack:
+    t = stack.pop()
+    if t in intel_involved:
+        continue
+    intel_involved.add(t)
+    b = builds.get(t)
+    if not b:
+        continue
+    for ref in b["refs"]:
+        ds, tbl = ref.split(".", 1)
+        if ds == "clientes_intel":
+            if tbl not in intel_involved:
+                stack.append(tbl)
+        else:
+            external_sources.add(ref)
+
+print("== tablas clientes_intel implicadas en el linaje de customer_marts ==")
+for t in sorted(intel_involved):
+    b = builds.get(t, {})
+    print(f"  {t:26s} {b.get('stmt','(sin job)')}")
+
+print("\n== FUENTES EXTERNAS (fuera de clientes_intel) usadas por el pipeline ==")
+for s in sorted(external_sources):
+    print(f"  {s}")
+
+# Flag the CUSTOMER sources the user asked about.
+KEYS = ["customer", "customers", "cliente", "user", "usuario", "tpv", "salesforce",
+        "sf_", "contact", "mkt_cloud", "persona"]
+print("\n== fuentes que parecen de CLIENTE/usuario ==")
+for s in sorted(external_sources):
+    low = s.lower()
+    if any(k in low for k in KEYS):
+        print(f"  {s}")
+
+with open("scratchpad/intel_lineage.json", "w", encoding="utf-8") as f:
+    json.dump({
+        "intel_involved": sorted(intel_involved),
+        "external_sources": sorted(external_sources),
+    }, f, indent=2, ensure_ascii=False)
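The walk in the script above can be isolated as a small pure function. This is a sketch under the same assumptions as the script: `refs` are `dataset.table` strings, and tables without a captured job are kept but not expanded. The name `walk_lineage` and the toy `builds` dict are hypothetical:

```python
def walk_lineage(builds, seeds, dataset="clientes_intel"):
    """Iterative depth-first walk: returns (tables inside `dataset`, external refs)."""
    involved, external = set(), set()
    stack = list(seeds)
    while stack:
        table = stack.pop()
        if table in involved:
            continue  # already expanded; this also breaks cycles
        involved.add(table)
        for ref in builds.get(table, {}).get("refs", []):
            ds, tbl = ref.split(".", 1)
            if ds == dataset:
                stack.append(tbl)
            else:
                external.add(ref)
    return involved, external


# Toy graph with a deliberate cycle between mart_a and dim_x.
toy_builds = {
    "mart_a": {"refs": ["clientes_intel.dim_x", "raw.orders"]},
    "dim_x": {"refs": ["clientes_intel.mart_a", "raw.customers"]},
}
involved, external = walk_lineage(toy_builds, ["mart_a"])
print(sorted(involved), sorted(external))
```

An explicit stack avoids Python's recursion limit on deep chains, and the `involved` check guarantees termination even when two tables reference each other.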
@@ -0,0 +1,158 @@
+"""Trace the recursive lineage of the customer_marts views down to the source tables.
+
+For each object: get its type (VIEW/BASE TABLE/EXTERNAL/MATERIALIZED VIEW) and its DDL
+from INFORMATION_SCHEMA.TABLES, extract the references to other tables from the DDL, and
+recurse into those that are views. Dumps the full graph to scratchpad/lineage_graph.json.
+"""
+import json
+import re
+import sys
+import warnings
+
+warnings.filterwarnings("ignore")
+
+import google.auth
+from google.cloud import bigquery
+
+PROJECT = "autingo-159109"
+
+creds, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/bigquery"])
+creds = creds.with_quota_project(None)
+client = bigquery.Client(project=PROJECT, credentials=creds)
+
+# Per-dataset metadata cache: {dataset: {table_name: {"type":..., "ddl":...}}}
+dataset_cache: dict[str, dict] = {}
+
+
+def load_dataset(dataset: str) -> dict:
+    """Load every table/view of a dataset (one query per dataset)."""
+    if dataset in dataset_cache:
+        return dataset_cache[dataset]
+    result: dict[str, dict] = {}
+    try:
+        sql = f"""
+        SELECT table_name, table_type, ddl
+        FROM `{PROJECT}`.`{dataset}`.INFORMATION_SCHEMA.TABLES
+        """
+        for r in client.query(sql).result():
+            result[r.table_name] = {"type": r.table_type, "ddl": r.ddl or ""}
+    except Exception as e:  # noqa: BLE001
+        print(f"  [warn] no se pudo leer dataset {dataset}: {e}", file=sys.stderr)
+    dataset_cache[dataset] = result
+    return result
+
+
+# In the DDL that INFORMATION_SCHEMA emits, references to other tables are ALWAYS
+# backtick-quoted and fully qualified: `project.dataset.table`. CTE/JOIN aliases
+# (dp, fcp, f...) never carry backticks, so restricting the search to backticked
+# text removes all the noise.
+BACKTICK_RE = re.compile(r"`([^`]+)`")
+# Variant with each part in its own backticks: `proj`.`dataset`.`table`
+MULTIPART_RE = re.compile(
+    r"`([A-Za-z0-9_-]+)`\.`([A-Za-z0-9_-]+)`(?:\.`([A-Za-z0-9_-]+)`)?"
+)
+
+
+def _norm(proj: str, ds: str, tbl: str) -> tuple[str, str] | None:
+    if ds.upper() == "INFORMATION_SCHEMA" or tbl.upper() == "INFORMATION_SCHEMA":
+        return None
+    return (ds, tbl)
+
+
+def extract_refs(ddl: str) -> set[tuple[str, str]]:
+    """Return the set of (dataset, table) pairs referenced in the body of the DDL.
+
+    Keeps only the SELECT (after the first 'AS') so the object's own name is not captured.
+    """
+    body = ddl
+    m = re.search(r"\bAS\b", ddl, flags=re.IGNORECASE)
+    if m:
+        body = ddl[m.end():]
+
+    refs: set[tuple[str, str]] = set()
+
+    # Style `project.dataset.table` (everything inside one pair of backticks).
+    for tok in BACKTICK_RE.findall(body):
+        parts = [p for p in tok.split(".") if p]
+        if len(parts) == 3:
+            r = _norm(parts[0], parts[1], parts[2])
+        elif len(parts) == 2:
+            r = _norm(PROJECT, parts[0], parts[1])
+        else:
+            r = None
+        if r:
+            refs.add(r)
+
+    # Style `proj`.`dataset`.`table` (one part per backtick pair, 3 qualified parts).
+    # NOTE: `alias`.`column` (2 parts, each in its own backticks) is a column
+    # reference, NOT a table reference; it is discarded by requiring all 3 parts.
+    for mt in MULTIPART_RE.finditer(body):
+        g1, g2, g3 = mt.group(1), mt.group(2), mt.group(3)
+        if g3:
+            r = _norm(g1, g2, g3)
+            if r:
+                refs.add(r)
+
+    return refs
+
+
+graph: dict[str, dict] = {}  # key "dataset.table" -> {type, ddl, refs:[...]}
+visited: set[str] = set()
+
+
+def visit(dataset: str, table: str, depth: int = 0):
+    key = f"{dataset}.{table}"
+    if key in visited:
+        return
+    visited.add(key)
+    meta = load_dataset(dataset).get(table)
+    if meta is None:
+        graph[key] = {"type": "UNKNOWN", "ddl": "", "refs": [], "depth": depth}
+        return
+    ddl = meta["ddl"]
+    ttype = meta["type"]
+    refs: list[str] = []
+    if ttype in ("VIEW", "MATERIALIZED VIEW"):
+        for ds, tbl in sorted(extract_refs(ddl)):
+            # Skip self-references
+            if ds == dataset and tbl == table:
+                continue
+            refs.append(f"{ds}.{tbl}")
+    graph[key] = {"type": ttype, "ddl": ddl, "refs": refs, "depth": depth}
+    for ref in refs:
+        rds, rtbl = ref.split(".", 1)
+        visit(rds, rtbl, depth + 1)
+
+
+# Seeds: the 14 views of customer_marts.
+SEEDS = [
+    "customer_brand_affinity", "customer_category_spend", "customer_channel",
+    "customer_contactability", "customer_monetary", "customer_payment_method",
+    "customer_predictive", "customer_product", "customer_profile",
+    "customer_promo_tolerance", "customer_promo_usage", "customer_store_spend",
+    "customer_temporal", "customer_vehicles",
+]
+for s in SEEDS:
+    visit("customer_marts", s, 0)
+
+out = {
+    "project": PROJECT,
+    "seeds": [f"customer_marts.{s}" for s in SEEDS],
+    "graph": graph,
+}
+with open("scratchpad/lineage_graph.json", "w", encoding="utf-8") as f:
+    json.dump(out, f, indent=2, ensure_ascii=False)
+
+# Summary
+n_view = sum(1 for v in graph.values() if v["type"] in ("VIEW", "MATERIALIZED VIEW"))
+n_base = sum(1 for v in graph.values() if v["type"] == "BASE TABLE")
+n_ext = sum(1 for v in graph.values() if v["type"] == "EXTERNAL")
+n_unk = sum(1 for v in graph.values() if v["type"] == "UNKNOWN")
+print(f"objetos totales: {len(graph)}  vistas: {n_view}  base: {n_base}  external: {n_ext}  desconocidos: {n_unk}")
+print("\n== objetos por dataset ==")
+by_ds: dict[str, int] = {}
+for k in graph:
+    ds = k.split(".", 1)[0]
+    by_ds[ds] = by_ds.get(ds, 0) + 1
+for ds, n in sorted(by_ds.items(), key=lambda x: -x[1]):
+    print(f"  {n:3d}  {ds}")
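The backtick-based extraction in the script above is easiest to follow on a small DDL string. The sketch below is a condensed version of the same idea, not the committed function: it drops the project part and the INFORMATION_SCHEMA filter, and the sample DDL is invented for illustration.

```python
import re

BACKTICK_RE = re.compile(r"`([^`]+)`")
# Three separately quoted parts: `proj`.`dataset`.`table`
MULTIPART_RE = re.compile(r"`([\w-]+)`\.`([\w-]+)`\.`([\w-]+)`")


def extract_refs(ddl):
    """Return {(dataset, table)} found in backticks after the first AS keyword."""
    m = re.search(r"\bAS\b", ddl, flags=re.IGNORECASE)
    body = ddl[m.end():] if m else ddl
    refs = set()
    # `project.dataset.table` or `dataset.table` inside a single pair of backticks.
    for token in BACKTICK_RE.findall(body):
        parts = token.split(".")
        if len(parts) in (2, 3) and all(parts):
            refs.add((parts[-2], parts[-1]))
    # One backtick pair per part; `alias`.`column` has only 2 parts and never matches.
    for m3 in MULTIPART_RE.finditer(body):
        refs.add((m3.group(2), m3.group(3)))
    return refs


# Invented DDL: one single-backtick reference, one multi-part reference,
# and a quoted alias.column that must be ignored.
ddl = (
    "CREATE VIEW `p.customer_marts.v` AS SELECT `a`.`col` "
    "FROM `p.clientes_intel.dim_persona` a "
    "JOIN `p`.`raw`.`orders` o ON a.id = o.id"
)
refs = extract_refs(ddl)
print(sorted(refs))
```

Cutting the text at the first `AS` is what keeps the view's own name (`customer_marts.v`) out of the result. A DDL whose header contains the word "as" before the real keyword (for example in an OPTIONS description) would be cut too early, so this remains a heuristic.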
Author	SHA1	Message	Date
egutierrez	2ebc9efeb2	chore: auto-commit (8 archivos) - scratchpad/gen_docs.py - scratchpad/gen_intel.py - scratchpad/gen_verify.py - scratchpad/intel_build.json - scratchpad/intel_lineage.json - scratchpad/lineage_graph.json - scratchpad/trace_intel.py - scratchpad/trace_lineage.py Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-07-01 19:00:06 +02:00
egutierrez	fbdf80bd71	chore: auto-commit (10 archivos) - scratchpad/ap.parquet - scratchpad/bq.py - scratchpad/cards.json - scratchpad/citas_recon.csv - scratchpad/dash.txt - scratchpad/diego.parquet - scratchpad/diego_literals.sql - scratchpad/exf/ - scratchpad/va.parquet - scratchpad/vm.parquet Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-07-01 17:58:03 +02:00
egutierrez	8408863cfa	feat(eda): pipeline BQ-EDA sobre tablas BigQuery (grupo eda) Añade el conector y el pipeline para hacer EDA automático sobre tablas/vistas de BigQuery, reutilizando profile_table del grupo eda sin duplicar profiling: - load_bq_table_to_duckdb (datascience): trae una tabla BQ a DuckDB con seudonimización SHA-1 de columnas PII y normalización de dtypes. Por defecto carga el total de filas (sample_frac=None); el muestreo es opt-in explícito. - profile_bq_table (pipeline): orquesta load -> profile_table -> render report (JSON + Markdown + PDF/PPTX). Full por defecto. Ambas tageadas eda+bigquery, v1.1.0. El default full responde a la preferencia del operador: los EDA se corren sobre el total salvo indicación contraria. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-07-01 12:45:39 +02:00
egutierrez	7273823087	Merge remote-tracking branch 'origin/master' # Conflicts: # .claude/settings.local.json	2026-07-01 11:42:49 +02:00
egutierrez	76592e4dc0	chore: auto-commit (2 archivos) - .claude/settings.local.json - scratchpad/mbq.py Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-07-01 11:41:56 +02:00
egutierrez	26569c7015	chore: auto-commit (1 archivos) - logs/ardour_mcp_server.log Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-07-01 02:16:25 +02:00
egutierrez	44622339fa	merge(eda): cap4/cap5 distribuciones — parrafos al glosario, desc LLM+unidad por columna, donut->barras, PPT side_by_side	2026-07-01 02:11:53 +02:00
egutierrez	c0d44a6352	fix(eda): cat_distr — intro del cuerpo reducida a términos clicables mínimos Quita la frase descriptiva del cuerpo del capítulo ('Cada columna categórica ocupa su propia página — ...: cardinalidad, top de categorías y gráfico de barras. El dataset tiene N filas...'); ya vivía duplicada en la entrada de glosario 'pagina_categorica'. El intro deja solo los términos clicables mínimos ([[term:entropia]] · [[term:pagina_categorica]]) bajo el heading 'Entropía y cardinalidad'. El total de filas del dataset sigue disponible por columna en la tabla de cardinalidad ('Total filas (dataset)').	2026-07-01 02:10:39 +02:00
egutierrez	cab0fbf0a3	feat(eda): CAP4/CAP5 distribuciones — párrafos al glosario, desc LLM + unidad por columna, donut→barras, PPT figura a la derecha CAP4 num_distr: - Mueve el párrafo introductorio largo del histograma/boxplot al glosario (nuevo término clicable "histograma_boxplot"); el cuerpo del capítulo solo nombra el término con [[term:histograma_boxplot]] y la explicación completa (código de colores, 1,5·IQR, lectura de asimetría) vive en la entrada del glosario. La información se traslada, no se pierde. - Añade por columna numérica la descripción de negocio del LLM y la unidad, leídas de profile['llm']['dictionary'] (empareja por nombre de columna). Sin bloque LLM el bloque de descripción se omite limpiamente. CAP5 cat_distr: - Mueve el párrafo "Cada columna categórica ocupa su propia página..." al glosario (nuevo término clicable "pagina_categorica"); el intro solo nombra los términos entropía y pagina_categorica. - Añade descripción LLM + unidad por columna (misma fuente que CAP4). - Cambia el donut/pie por gráfico de barras horizontales (nueva función del registry categorical_top_bar_figure_py_datascience, contrato de entrada idéntico al donut para swap directo) más su fallback inline de barras. - Marca cada Group de columna con layout="side_by_side": en PPTX la tabla de cardinalidad queda a la izquierda y la barra a la derecha; en PDF se apila (A5 estrecho). No toca los renderers — el soporte de layout ya existía. Glosario: - Catálogo canónico _BASELINE_TERMS con las definiciones de los dos términos nuevos; build_glosario completa la definición de un término registrado sin ella desde el catálogo (los chapters solo registran clave+label). Tests actualizados (donut→barras, side_by_side, LLM desc/unidad, glosario) y nueva función con sus tests. Suite del subsistema + acceptance verde.	2026-07-01 02:01:07 +02:00
egutierrez	7f304adc9c	merge(eda): render quality global — DPI 220, tablas anchas como imagen, layout side_by_side, indice clicable	2026-07-01 01:36:10 +02:00
egutierrez	a74a5a047f	feat(eda): render quality global — DPI 220, tablas anchas como imagen, layout side_by_side, índice clicable Mejoras transversales del motor AutomaticEDA (PDF + PPTX) sobre el modelo de bloques: 1. DPI alto global: toda figura/imagen embebida se rasteriza a 220 dpi (antes 150, y en PDF la página se guardaba a ~100 dpi re-rasterizando los imshow). En PDF se aplica savefig.dpi=220 a la página; el texto sigue vectorial y seleccionable. Permite ampliar en el móvil sin pixelar. Imagen embebida medida: ~1081px (antes ~492px). 2. Tabla ancha → imagen de alta resolución: cuando un DataTable tiene demasiadas columnas para ser legible como texto (criterio _table_fits_as_text), se dibuja entera como una imagen nítida (nueva función render_table_as_figure_py_datascience: cabecera sombreada + zebra) escalada para caber completa, de modo que el lector hace zoom y la lee sin perder datos. Las tablas que sí caben siguen como texto seleccionable / tabla nativa. Aplica en PDF y PPTX. El df.head de 19 columnas del dataset sintético ya no se corta: sale como imagen. 3. Group.layout: nuevo hint retrocompatible (default "stack"). "side_by_side" coloca la tabla a la izquierda (~55%) y la figura a la derecha (~45%) en la misma slide PPTX (cae a apilado si no hay par tabla+figura o no caben); en PDF se trata como "stack" (el ancho A5 móvil no admite dos columnas). Pensado para que el capítulo cat_distr ponga el gráfico al lado de la tabla en PPT. 4. Portada con índice clicable: la lista de capítulos pasa de "Este informe incluye..." (markdown) a un Heading "Índice" + un TocEntry por capítulo. El renderer registra el inicio de cada capítulo y cablea cada entrada como salto real (PDF: link GOTO PyMuPDF; PPTX: salto a slide nativo), reutilizando el mecanismo del glosario clicable. Modelo: Group gana `layout`; nuevo bloque TocEntry; normalizers y __init__ actualizados. 
Contrato: documentado en docs/automatic_eda_contract.md §11.4 (incluye el contrato exacto del campo layout para el agente de cat_distr). Tests: nuevo render_quality_test.py (13 golden: DPI alto real, tabla ancha→imagen PDF/PPTX, narrow→texto, side_by_side PPTX dos columnas / PDF apilado, índice clicable PDF+PPTX, retrocompatibilidad layout por defecto). render_features_test actualizado al índice nuevo. Suite: 188 passed (módulo) + 38 passed/1 skipped (acceptance + pipeline). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-07-01 01:34:21 +02:00
egutierrez	44be1d6b58	merge(eda): cap2 overview enriquece diccionario y describe con descripcion+unidad del LLM	2026-07-01 01:14:37 +02:00
egutierrez	64306f3b1c	feat(eda): overview enriquece diccionario y describe con descripcion+unidad del LLM La tabla DICCIONARIO de columnas del capitulo overview gana columnas "Descripcion" y "Unidad", y la tabla DESCRIBE gana "Unidad", consumiendo profile['llm']['dictionary'] (entradas column/description/business_meaning/unit producidas por eda_llm_insights) emparejadas por nombre de columna. Lectura defensiva: sin bloque LLM (run_llm no corrio) las celdas degradan a "—" y las tablas siguen renderizando. No recalcula nada ni llama al LLM. CHAPTER_VERSION 1.1.0 -> 1.2.0. Tests: golden (descripcion+unidad pobladas para income), edge (sin LLM -> "—"), fallback ctx['llm'], y render PDF con las columnas nuevas visibles. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-07-01 01:13:02 +02:00
egutierrez	f2eb782a5f	merge(eda): portada v2 (sin Criterios, descripcion LLM, resumen a la derecha) + zebra global PDF + nombre PPTX grande/subrayado	2026-06-30 22:53:46 +02:00
egutierrez	80d10010f5	feat(eda): portada cap01 + zebra global y emphasis de render Itera el capítulo PORTADA del AutomaticEDA y dos mejoras globales de los renderers PDF/PPTX: 1. Zebra global (PDF): _place_kv_table ahora sombrea las filas pares igual que las DataTable, así toda tabla del documento queda rayada (no solo las DataTable). Mismo patrón coherente al partir/repetir cabecera. 2. Portada usa la descripción LLM rica (profile['llm']['summary']) cuando el perfil la tiene; se elimina del fallback derivado el texto ruido "active la interpretación LLM (run_llm)…". No fuerza llamadas LLM en el capítulo, solo consume profile['llm'] si está. 3. Se quita el bloque "Criterios de calidad" de la portada (PDF y PPTX); el score "Calidad" se mantiene. 4. "Resumen del análisis" (PDF): los valores se alinean al margen derecho via el nuevo KVTable.value_align="right". 5. Nombre del dataset en la portada PPTX más grande (44pt) y subrayado via los nuevos hints Heading.underline / Heading.size_pt (el PDF los ignora). Bump CHAPTER_VERSION de portada 1.2.0 -> 1.3.0. Verificado: suite 213 passed / 1 skipped (incl. aceptación de los 16 capítulos); golden zebra = 185 filas zebra en 13 capítulos del PDF completo; portada con run_llm sin "Criterios de calidad", con descripción LLM rica y valores a la derecha; PPTX con nombre 44pt subrayado; edge sin LLM cae al fallback derivado sin ruido; fn index sin error. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 22:44:33 +02:00
egutierrez	ecc22d6d57	merge(eda): suite de aceptacion de los 16 capitulos (29 passed, rescatado de ejecutor con auth caida)	2026-06-30 22:07:21 +02:00
agent	7bdb8bffb5	test(eda): suite de aceptacion de los 16 capitulos del AutomaticEDA Bateria que blinda el subsistema: cobertura de los 16 capitulos sobre el dataset sintetico Faker, contenido esencial por capitulo (needles parametrizados), capitulos sueltos con resolucion de dependencias (only_chapters=[outliers] puebla IsolationForest sin run_models; timeseries; correlacion), None cuando no aplica, folder multi-tabla con FK, completitud del MD (matriz de correlacion completa + skew/kurtosis), 3 salidas no vacias, determinismo. Test full+LLM skippeable. 29 passed, 1 skipped. Sin hallazgos: los 16 capitulos salen como deben.	2026-06-30 22:07:15 +02:00
egutierrez	4139394326	merge(eda): only_chapters con resolucion automatica de dependencias de computo por capitulo	2026-06-30 21:37:16 +02:00
egutierrez	54a9ab70c7	feat(eda): render AutomaticEDA por capítulos sueltos con resolución de dependencias Permite renderizar un SUBCONJUNTO de capítulos del informe AutomaticEDA (only_chapters=[...]) para iterar/testear un capítulo concreto sin generar el documento entero, garantizando que el capítulo pedido SIEMPRE llegue poblado. - Nuevo módulo automatic_eda/chapter_deps.py: mapa central CHAPTER_DEPS (fuente de verdad) que declara, por capítulo de CHAPTER_ORDER, qué flags de cómputo (run_models/run_series/run_llm) y qué piezas de ctx (raw_numeric, timeseries_raw, geo_points, head_rows, db_path/table) necesita para no salir degradado. Helpers puros: resolve_requirements, resolve_profile_flags, needs_render_ctx, resolve_ctx_data_keys, validate_chapter_ids. - build_document(profile, ctx, only=None): parámetro only opcional que restringe el cuerpo a esos capítulos (portada primera + glosario última siempre). Lee la clave reservada ctx['_only_chapters'] cuando only es None, para propagar la selección a través de los renderers sin modificarlos. Retrocompatible. - render_automatic_eda(..., only_chapters=None): valida los ids (error claro dict-no-throw), resuelve las dependencias activando el cómputo necesario aunque el caller no lo pidiera (un flag explícito siempre prima) y construyendo solo las piezas de ctx que los capítulos pedidos leen (salta build_eda_render_ctx entero si ninguno necesita datos crudos). only_chapters=None produce el documento completo idéntico al de hoy. - Tests: chapter_deps_test.py (resolución pura), build_document_only_test.py (filtro), render_automatic_eda_only_test.py (golden con DuckDB: outliers suelto con IsolationForest poblado por resolución; timeseries activa run_series; eficiencia geospatial sin modelos; edge cases). - .md del pipeline: documenta only_chapters + emit_md; version 1.1.0 -> 1.2.0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 21:35:46 +02:00
egutierrez	4773781323	merge(eda): generadores sinteticos Faker (tabla todo-en-uno + carpeta multi-tabla) que activan todos los capitulos	2026-06-30 21:26:20 +02:00
egutierrez	50c05d126c	merge(eda): capitulo OUTLIERS — univariante (Tukey/z) + multivariante (IsolationForest)	2026-06-30 21:15:05 +02:00
egutierrez	6f88f184f1	feat(eda): capítulo OUTLIERS — valores atípicos univariantes + multivariantes Nuevo capítulo dedicado `outliers` para el motor AutomaticEDA que reúne y profundiza en un solo sitio el análisis de valores atípicos, hoy disperso entre `num_distr` (conteo por columna) y `modelos` (IsolationForest). Se registra en `chapters_registry.py` entre `missingness` y `correlacion` (bloque de calidad de datos: calidad → missingness → outliers). Contenido del capítulo: - Resumen univariante por columna: nº y % de atípicos por Tukey (1.5·IQR) y por z-score (\|z\| > 3), con vallas inferior/superior y valores extremos. Ordenado por contaminación y marcando las columnas más afectadas. Reusa las funciones del registry `build_boxplot_stats` (vallas desde los percentiles del profile) y `detect_outliers` (regla z-score sobre la muestra cruda de `ctx`). - Boxplots de Tukey de las columnas más contaminadas (caja, bigotes y puntos atípicos), delegados a la función nueva `build_boxplots_figure`. - Multivariante: filas anómalas considerando todas las columnas a la vez con `isolation_forest_outliers` — nº y % de filas, las más anómalas con su score y las dimensiones que las hacen raras (top columnas por \|z\|, vía la función nueva `summarize_outlier_dims`). El detector se corre en vivo sobre `raw_numeric` para que el indexado de filas coincida exactamente con el de las dimensiones; cae al bloque precomputado del perfil cuando no hay muestra cruda (preset lite). - Interpretación exploratoria: un atípico no es necesariamente un error (distingue error de dato vs dato real extremo) y recomendaciones (revisar, winsorizar o re-expresar, enlazando con la re-expresión de Tukey del perfil). Términos clicables registrados en el glosario compartido: `outlier`, `tukey_fence`, `zscore`, `isolation_forest`. 
Funciones nuevas del registry (dominio datascience, grupo eda): - `build_boxplots_figure_py_datascience` (figure helper, impura) - `summarize_outlier_dims_py_datascience` (pura) El capítulo se activa con ≥1 columna numérica y devuelve None en su ausencia; lee todo defensivo y nunca lanza. Tests: capítulo (golden + edges + error path + render PDF/PPTX) y ambas funciones nuevas. Suite de no-regresión de AutomaticEDA verde. Verificado end-to-end con el dataset Titanic (Fare/Parch/SibSp como las columnas más contaminadas). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 21:12:40 +02:00
				`@@ -0,0 +1 @@`
				`{"total": 12367, "canal": 12368, "pago": 12369, "matriz": 12370, "evolutivo": 12371}`