feat(eda): overview enriches the dictionary and describe tables with the LLM's description+unit

The column DICCIONARIO table of the overview chapter gains "Descripcion" and "Unidad" columns, and the DESCRIBE table gains "Unidad". Both consume profile['llm']['dictionary'] (column/description/business_meaning/unit entries produced by eda_llm_insights), matched by column name. The read is defensive: without an LLM block (run_llm did not run) the cells degrade to "—" and the tables still render. Nothing is recomputed and the LLM is never called. CHAPTER_VERSION 1.1.0 -> 1.2.0.

Tests: golden (description+unit populated for income), edge (no LLM -> "—"), ctx['llm'] fallback, and PDF render with the new columns visible.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
merge(eda): portada v2 (no Criterios, LLM description, summary on the right) + global PDF zebra striping + large/underlined PPTX name
2026-07-01 01:13:02 +02:00 · 2026-06-30 22:53:46 +02:00 · 2026-06-30 22:44:33 +02:00 · 2026-06-30 22:07:21 +02:00 · 2026-06-30 22:07:15 +02:00 · 2026-06-30 21:37:16 +02:00
22 changed files with 3296 additions and 50 deletions
@@ -0,0 +1,109 @@
+"""Tests for the `only` filter of build_document (chapter selection).
+
+They verify that:
+  - only=None keeps the historical behaviour (all applicable chapters).
+  - only=[ids] restricts the BODY to those ids; the cover (portada) stays first
+    and the glossary (glosario), when any chapter registered terms, stays last.
+  - only=[] yields the minimal document (cover only: an empty glossary is omitted).
+  - the selection also travels through the reserved key ctx['_only_chapters']
+    (the channel used by the renderers, which call build_document without
+    `only`), and build_document never mutates the caller's ctx.
+"""
+
+import os
+import sys
+
+_HERE = os.path.dirname(os.path.abspath(__file__))
+_FUNCTIONS = os.path.abspath(os.path.join(_HERE, "..", "..", ".."))  # python/functions
+if _FUNCTIONS not in sys.path:
+    sys.path.insert(0, _FUNCTIONS)
+
+from datascience.automatic_eda import build_document  # noqa: E402
+
+
+def _profile_with_cat_and_num():
+    """Minimal profile that makes cat_distr and num_distr build (non-empty body)."""
+    return {
+        "table": "ventas", "n_rows": 120, "n_cols": 2, "quality_score": 91,
+        "duplicate_pct": 1.5, "null_cell_pct": 0.8,
+        "columns": [
+            {"name": "region", "inferred_type": "categorical",
+             "categorical": {
+                 "top": [{"value": "norte", "count": 50, "pct": 0.42},
+                         {"value": "sur", "count": 40, "pct": 0.33},
+                         {"value": "este", "count": 30, "pct": 0.25}],
+                 "mode": "norte", "n_distinct": 3, "entropy": 1.55,
+                 "imbalance": 0.1}},
+            {"name": "importe", "inferred_type": "numeric",
+             "numeric": {"mean": 50.0, "median": 48.0, "std": 10.0,
+                         "min": 10, "max": 99, "iqr": 15,
+                         "histogram": [{"lo": 0, "hi": 50, "count": 40},
+                                       {"lo": 50, "hi": 100, "count": 80}]}},
+        ],
+    }
+
+
+def test_only_none_is_full_document():
+    """Backward compat: without `only`, every applicable chapter is emitted."""
+    chs = build_document(_profile_with_cat_and_num(), ctx={"dataset_name": "v"})
+    ids = [c.id for c in chs]
+    assert ids[0] == "portada"
+    assert ids[-1] == "glosario"
+    # The body carries the distributions (cat/num), not just cover + glossary.
+    assert "num_distr" in ids
+    assert "cat_distr" in ids
+
+
+def test_only_restricts_body_but_keeps_cover_and_glossary():
+    # cat_distr registers the term "entropía" in the glossary, so the glossary
+    # (the target of the clickable term) appears. This demonstrates the contract
+    # "cover first + chapter + glossary last".
+    chs = build_document(_profile_with_cat_and_num(),
+                         ctx={"dataset_name": "v"}, only=["cat_distr"])
+    ids = [c.id for c in chs]
+    assert ids[0] == "portada", f"portada is not first: {ids}"
+    assert ids[-1] == "glosario", f"glosario is not last: {ids}"
+    assert "cat_distr" in ids
+    # num_distr was left out of the selection.
+    assert "num_distr" not in ids
+
+
+def test_only_empty_yields_minimal_document():
+    # only=[] -> empty body. The cover is always present; the glossary appears only
+    # if some chapter registered terms (pre-existing pattern: an empty glossary is
+    # omitted). No body means no terms, so the minimal document is the cover alone.
+    chs = build_document(_profile_with_cat_and_num(),
+                         ctx={"dataset_name": "v"}, only=[])
+    ids = [c.id for c in chs]
+    assert ids == ["portada"], \
+        f"only=[] must yield the minimal document (cover only), not {ids}"
+
+
+def test_selection_via_reserved_ctx_key():
+    """The selection travels through ctx['_only_chapters'] when `only` is not passed."""
+    chs = build_document(_profile_with_cat_and_num(),
+                         ctx={"dataset_name": "v",
+                              "_only_chapters": ["cat_distr"]})
+    ids = [c.id for c in chs]
+    assert "cat_distr" in ids
+    assert "num_distr" not in ids
+    assert ids[0] == "portada" and ids[-1] == "glosario"
+
+
+def test_explicit_only_arg_wins_over_ctx_key():
+    """If both are given, the `only` argument overrides the ctx key."""
+    chs = build_document(_profile_with_cat_and_num(),
+                         ctx={"dataset_name": "v",
+                              "_only_chapters": ["cat_distr"]},
+                         only=["num_distr"])
+    ids = [c.id for c in chs]
+    assert "num_distr" in ids
+    assert "cat_distr" not in ids
+
+
+def test_reserved_key_not_leaked_to_caller_ctx():
+    """build_document does not mutate the caller's ctx (it works on an internal copy)."""
+    ctx = {"dataset_name": "v", "_only_chapters": ["num_distr"]}
+    build_document(_profile_with_cat_and_num(), ctx=ctx)
+    # The reserved key is still in the caller's dict (build_document used its own copy).
+    assert ctx["_only_chapters"] == ["num_distr"]
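The precedence and no-mutation rules exercised by these tests can be sketched on their own. This is a minimal illustration, not the real internals of build_document: `resolve_selection` and `RESERVED_KEY` are hypothetical names, and stripping the reserved key from the internal copy is an assumption made for the sketch.

```python
# Hypothetical helper: NOT the real build_document internals.
RESERVED_KEY = "_only_chapters"


def resolve_selection(ctx, only=None):
    """Return (selection, chapter_ctx) without mutating the caller's ctx.

    selection is None (no filter) or a list of chapter ids. The explicit
    `only` argument takes precedence over the reserved ctx key.
    """
    chapter_ctx = dict(ctx or {})                   # shallow copy; caller's dict untouched
    from_ctx = chapter_ctx.pop(RESERVED_KEY, None)  # assumed: key removed from the copy only
    selection = only if only is not None else from_ctx
    return (None if selection is None else list(selection)), chapter_ctx


ctx = {"dataset_name": "v", "_only_chapters": ["cat_distr"]}
selection, chapter_ctx = resolve_selection(ctx, only=["num_distr"])
```

Comparing `only` against None rather than testing its truthiness is what keeps `only=[]` (empty body) distinct from "no filter at all".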
@@ -0,0 +1,205 @@
+"""chapter_deps: central map of per-chapter compute dependencies for the EDA.
+
+SINGLE source of truth for what each chapter in ``CHAPTER_ORDER`` needs in order
+to be computed IN FULL (without falling into its degraded "insufficient data"
+branch). The ``render_automatic_eda`` pipeline consumes it when asked to render a
+SUBSET of chapters (kwarg ``only_chapters``): before profiling, it resolves the
+requirements of the requested chapters and enables ONLY the compute they need,
+so a standalone chapter always arrives populated and no CPU/LLM time is wasted
+on pieces that no requested chapter uses.
+
+Design: the map is CENTRAL (this module), NOT a per-chapter constant. This
+avoids touching ``chapters/<id>.py`` (each agent owns its chapter) and removes
+the risk of collisions between branches. If a chapter changes what it reads from
+``profile``/``ctx``, THIS map is updated: it is where the engine looks.
+
+Two kinds of dependency, derived by inspecting what each chapter reads:
+
+  - ``profile_flags``: ``profile_table`` cost flags that must be ENABLED so the
+    ``profile`` carries the block the chapter reads. These are the expensive ones:
+      * ``run_models``  -> ``profile['models']`` (KMeans/IsolationForest/PCA).
+        Read by ``outliers`` (multivariate fallback) and ``modelos``.
+      * ``run_series``  -> ``profile['series']`` (time-series analysis).
+        Read by ``timeseries``.
+      * ``run_llm``     -> ``profile['llm']`` (the model's interpretation).
+        Read by ``analisis_llm``.
+
+  - ``ctx``: labels for the RAW DATA pieces that ``build_eda_render_ctx`` builds
+    and the chapter reads from ``ctx``. An empty list means the chapter needs no
+    raw data; when no requested chapter declares any label, the pipeline can skip
+    ``build_eda_render_ctx`` entirely.
+    Labels and the real keys they map to (see ``CTX_LABEL_TO_KEYS``):
+      * ``head_rows``      -> ``ctx['head_rows']``      (overview: the real df.head).
+      * ``raw_numeric``    -> ``ctx['raw_numeric']``    (outliers/modelos/
+        correlacion/missingness/geospatial: row-aligned numeric sample).
+      * ``timeseries_raw`` -> ``ctx['timeseries_raw']`` (timeseries: raw series).
+      * ``geo_points``     -> ``ctx['geo_points']`` (+ ``raw_numeric``)
+        (geospatial: lat/lon).
+      * ``db_path_table``  -> ``ctx['db_path']`` + ``ctx['table']`` (agregacion/
+        text_distr/missingness/relaciones: push-down of their own queries).
+
+``portada`` and ``glosario`` are NOT optional: the pipeline keeps them regardless
+of ``only_chapters`` (the cover summarises the document and the glossary is the
+target of the clickable terms), so they are declared here with no requirements.
+
+Every function in this module is PURE (no I/O, deterministic), so each one can
+be unit-tested directly.
+"""
+
+from __future__ import annotations
+
+# Central map. One entry per CHAPTER_ORDER id. ``profile_flags`` lists the cost
+# flags to enable; ``ctx`` lists the raw-data labels the chapter reads. An empty
+# list means "no dependency of that kind".
+CHAPTER_DEPS = {
+    # Cover and glossary: ALWAYS present, with no compute of their own (the cover
+    # reads the document_summary assembled by build_document; the glossary reads
+    # the terms the other chapters registered). Declared so that the map covers
+    # all of CHAPTER_ORDER and validation recognises them.
+    "portada":      {"profile_flags": [], "ctx": []},
+    "overview":     {"profile_flags": [], "ctx": ["head_rows"]},
+    "analisis_llm": {"profile_flags": ["run_llm"], "ctx": []},
+    "num_distr":    {"profile_flags": [], "ctx": []},
+    "cat_distr":    {"profile_flags": [], "ctx": []},
+    # text_distr pushes down its own text query (it does not use raw_numeric); it
+    # needs db_path/table in the ctx to do so.
+    "text_distr":   {"profile_flags": [], "ctx": ["db_path_table"]},
+    "calidad":      {"profile_flags": [], "ctx": []},
+    # missingness reads the raw numeric sample (co-occurrence of missing values)
+    # and may push down a null-pattern query using db_path/table.
+    "missingness":  {"profile_flags": [], "ctx": ["raw_numeric", "db_path_table"]},
+    # outliers runs IsolationForest LIVE on ctx['raw_numeric']; run_models also
+    # guarantees the profile['models']['outliers'] fallback if the ctx is missing.
+    "outliers":     {"profile_flags": ["run_models"], "ctx": ["raw_numeric"]},
+    "correlacion":  {"profile_flags": [], "ctx": ["raw_numeric"]},
+    "relaciones":   {"profile_flags": [], "ctx": ["db_path_table"]},
+    "modelos":      {"profile_flags": ["run_models"], "ctx": ["raw_numeric"]},
+    "timeseries":   {"profile_flags": ["run_series"], "ctx": ["timeseries_raw"]},
+    "geospatial":   {"profile_flags": [], "ctx": ["geo_points", "raw_numeric"]},
+    "agregacion":   {"profile_flags": [], "ctx": ["db_path_table"]},
+    "glosario":     {"profile_flags": [], "ctx": []},
+}
+
+# Chapters the document ALWAYS includes, regardless of only_chapters.
+ALWAYS_PRESENT = ("portada", "glosario")
+
+# Recognised cost flags (order is irrelevant; the resolvers return them as a set).
+KNOWN_PROFILE_FLAGS = ("run_models", "run_series", "run_llm")
+
+# Maps each ctx label to the REAL keys produced by build_eda_render_ctx.
+# ``db_path_table`` is special: db_path/table are always set for a valid backend
+# and are harmless, so they are never pruned (and do not appear in
+# DATA_CTX_KEYS). The rest (head_rows/raw_numeric/timeseries_raw/geo_points) are
+# the prunable data pieces.
+CTX_LABEL_TO_KEYS = {
+    "head_rows":      {"head_rows"},
+    "raw_numeric":    {"raw_numeric"},
+    "timeseries_raw": {"timeseries_raw"},
+    "geo_points":     {"geo_points", "raw_numeric"},
+    "db_path_table":  set(),  # db_path/table are always present; never pruned.
+}
+
+# Raw-data ctx keys that can be pruned when no requested chapter needs them (the
+# ones that cost sampling). db_path/table are NOT included here.
+DATA_CTX_KEYS = ("head_rows", "raw_numeric", "timeseries_raw", "geo_points")
+
+
+def _as_id_list(chapter_ids):
+    """Normalise the input to a list of string ids, defensively. None -> []."""
+    if chapter_ids is None:
+        return []
+    if isinstance(chapter_ids, str):
+        return [chapter_ids]
+    return [c for c in chapter_ids if isinstance(c, str)]
+
+
+def validate_chapter_ids(chapter_ids, order):
+    """Split the requested ids into valid and unknown ones relative to ``order``.
+
+    Args:
+        chapter_ids: list (or str) of requested chapter ids.
+        order: canonical list of valid ids (CHAPTER_ORDER).
+
+    Returns:
+        dict ``{"valid": [...], "unknown": [...]}`` preserving the order in which
+        the ids appear in the input. Pure function.
+    """
+    valid_set = set(order or [])
+    valid, unknown = [], []
+    for cid in _as_id_list(chapter_ids):
+        (valid if cid in valid_set else unknown).append(cid)
+    return {"valid": valid, "unknown": unknown}
+
+
+def resolve_requirements(chapter_ids):
+    """Merge the compute requirements of the requested chapters.
+
+    This is the core of dependency resolution: given the subset of chapters to
+    render, it returns EVERYTHING that must be enabled/built for those chapters
+    to arrive COMPLETE, and nothing else.
+
+    The ``ALWAYS_PRESENT`` chapters (portada/glosario) are added implicitly
+    because the pipeline always includes them; having no requirements, they do
+    not change the result, but they are included so the set stays coherent.
+
+    Args:
+        chapter_ids: list (or str) of chapter ids. Unknown ids are silently
+            ignored (strict validation is the caller's job).
+            None or an empty list -> empty requirements.
+
+    Returns:
+        dict ``{"profile_flags": set[str], "ctx_keys": set[str]}`` where
+        ``ctx_keys`` holds the ctx LABELS (not the real keys). Pure
+        function.
+    """
+    ids = set(_as_id_list(chapter_ids)) | set(ALWAYS_PRESENT)
+    profile_flags = set()
+    ctx_keys = set()
+    for cid in ids:
+        dep = CHAPTER_DEPS.get(cid)
+        if not isinstance(dep, dict):
+            continue
+        for f in dep.get("profile_flags", []) or []:
+            if f in KNOWN_PROFILE_FLAGS:
+                profile_flags.add(f)
+        for k in dep.get("ctx", []) or []:
+            ctx_keys.add(k)
+    return {"profile_flags": profile_flags, "ctx_keys": ctx_keys}
+
+
+def resolve_profile_flags(chapter_ids):
+    """Shortcut: just the set of profile_flags to enable for the requested chapters.
+
+    Pure function. Returns a set ⊆ KNOWN_PROFILE_FLAGS.
+    """
+    return resolve_requirements(chapter_ids)["profile_flags"]
+
+
+def needs_render_ctx(chapter_ids):
+    """True if any requested chapter declares a ctx dependency.
+
+    That includes ``db_path_table`` (push-down queries), not only raw data. When
+    it is False, the pipeline can skip ``build_eda_render_ctx`` altogether (a real
+    CPU/I/O saving): the requested chapters read nothing from it. Pure function.
+    """
+    return bool(resolve_requirements(chapter_ids)["ctx_keys"])
+
+
+def resolve_ctx_data_keys(chapter_ids):
+    """REAL ctx data keys to KEEP for the requested chapters.
+
+    Translates ctx labels into the concrete keys produced by
+    ``build_eda_render_ctx`` (head_rows/raw_numeric/timeseries_raw/geo_points).
+    The pipeline prunes from the ctx any data key that is NOT in this set, so a
+    standalone chapter does not drag along data it does not use. db_path/table
+    are never pruned (they do not appear here). Pure function.
+
+    Returns:
+        set[str], a subset of DATA_CTX_KEYS.
+    """
+    req = resolve_requirements(chapter_ids)
+    keep = set()
+    for label in req["ctx_keys"]:
+        keep |= CTX_LABEL_TO_KEYS.get(label, set())
+    # Only prunable data keys (db_path/table are handled separately).
+    return {k for k in keep if k in DATA_CTX_KEYS}
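The pruning step that the docstrings above attribute to the pipeline can be sketched as follows. This is an illustrative sketch only: `prune_ctx` is a hypothetical helper (the real pipeline code is not part of this diff), `DATA_CTX_KEYS` is copied here so the snippet runs on its own, and the ctx values are made up.

```python
# Copied from chapter_deps so the sketch is self-contained.
DATA_CTX_KEYS = ("head_rows", "raw_numeric", "timeseries_raw", "geo_points")


def prune_ctx(ctx, keep):
    """Hypothetical helper: drop prunable data keys that are not in `keep`.

    Keys outside DATA_CTX_KEYS (db_path, table, dataset_name, ...) always survive.
    Returns a new dict; the input ctx is not modified.
    """
    return {k: v for k, v in ctx.items()
            if k not in DATA_CTX_KEYS or k in keep}


ctx = {
    "db_path": "/tmp/example.duckdb",   # illustrative value
    "table": "ventas",
    "head_rows": [{"region": "norte"}],
    "raw_numeric": {"importe": [10.0, 48.0, 99.0]},
    "geo_points": [],
}
# For ["outliers"], resolve_ctx_data_keys returns {"raw_numeric"} (see the tests).
pruned = prune_ctx(ctx, keep={"raw_numeric"})
```

Filtering against DATA_CTX_KEYS, rather than keeping only the keys in `keep`, is what lets db_path/table pass through untouched, in line with the "never pruned" rule.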
@@ -0,0 +1,160 @@
+"""Tests for the central per-chapter dependency map (chapter_deps).
+
+Every function under test is PURE (no I/O), so it is exercised directly, with
+no DuckDB and no rendering. They cover requirement resolution (golden + edge
+cases), id validation and the efficiency helpers (which compute is skipped).
+"""
+
+import os
+import sys
+
+_HERE = os.path.dirname(os.path.abspath(__file__))
+_FUNCTIONS = os.path.abspath(os.path.join(_HERE, "..", "..", ".."))  # python/functions
+if _FUNCTIONS not in sys.path:
+    sys.path.insert(0, _FUNCTIONS)
+
+from datascience.automatic_eda.chapter_deps import (  # noqa: E402
+    ALWAYS_PRESENT,
+    CHAPTER_DEPS,
+    DATA_CTX_KEYS,
+    needs_render_ctx,
+    resolve_ctx_data_keys,
+    resolve_profile_flags,
+    resolve_requirements,
+    validate_chapter_ids,
+)
+from datascience.automatic_eda.chapters_registry import CHAPTER_ORDER  # noqa: E402
+
+
+# --------------------------------------------------------------------------- #
+# The map covers all of CHAPTER_ORDER (no gaps, no extra keys).
+# --------------------------------------------------------------------------- #
+def test_chapter_deps_covers_every_chapter_in_order():
+    assert set(CHAPTER_DEPS) == set(CHAPTER_ORDER), (
+        "CHAPTER_DEPS must declare exactly the ids in CHAPTER_ORDER")
+    # Each entry has the expected shape.
+    for cid, dep in CHAPTER_DEPS.items():
+        assert isinstance(dep.get("profile_flags"), list), cid
+        assert isinstance(dep.get("ctx"), list), cid
+
+
+# --------------------------------------------------------------------------- #
+# resolve_requirements, golden case: outliers requires run_models + raw_numeric.
+# --------------------------------------------------------------------------- #
+def test_resolve_outliers_requires_run_models_and_raw_numeric():
+    req = resolve_requirements(["outliers"])
+    assert "run_models" in req["profile_flags"]
+    assert "raw_numeric" in req["ctx_keys"]
+    assert "run_series" not in req["profile_flags"]
+    assert "run_llm" not in req["profile_flags"]
+
+
+def test_resolve_timeseries_requires_run_series():
+    req = resolve_requirements(["timeseries"])
+    assert req["profile_flags"] == {"run_series"}
+    assert "timeseries_raw" in req["ctx_keys"]
+
+
+def test_resolve_analisis_llm_requires_run_llm():
+    assert resolve_requirements(["analisis_llm"])["profile_flags"] == {"run_llm"}
+
+
+def test_resolve_union_of_several_chapters():
+    req = resolve_requirements(["outliers", "timeseries", "analisis_llm"])
+    assert req["profile_flags"] == {"run_models", "run_series", "run_llm"}
+
+
+# --------------------------------------------------------------------------- #
+# Efficiency: chapters that do NOT need expensive flags do not enable them.
+# --------------------------------------------------------------------------- #
+def test_resolve_geospatial_needs_no_cost_flags():
+    """geospatial comes from the ctx's geo_points/raw_numeric, NOT from the models."""
+    req = resolve_requirements(["geospatial"])
+    assert req["profile_flags"] == set(), \
+        "geospatial must not enable run_models/run_series/run_llm"
+    assert "geo_points" in req["ctx_keys"]
+
+
+def test_resolve_correlacion_needs_raw_numeric_but_no_models():
+    req = resolve_requirements(["correlacion"])
+    assert req["profile_flags"] == set()
+    assert "raw_numeric" in req["ctx_keys"]
+
+
+def test_always_present_chapters_add_no_requirements():
+    """portada and glosario are always present, but they add no compute."""
+    for cid in ALWAYS_PRESENT:
+        req = resolve_requirements([cid])
+        assert req["profile_flags"] == set()
+        assert req["ctx_keys"] == set()
+
+
+def test_resolve_profile_flags_shortcut():
+    assert resolve_profile_flags(["modelos"]) == {"run_models"}
+    assert resolve_profile_flags(["num_distr"]) == set()
+
+
+# --------------------------------------------------------------------------- #
+# needs_render_ctx: when build_eda_render_ctx can be skipped entirely.
+# --------------------------------------------------------------------------- #
+def test_needs_render_ctx_true_when_chapter_reads_raw_data():
+    assert needs_render_ctx(["outliers"]) is True
+    assert needs_render_ctx(["agregacion"]) is True  # db_path/table push-down
+    assert needs_render_ctx(["timeseries"]) is True
+
+
+def test_needs_render_ctx_false_for_purely_aggregated_chapters():
+    """num_distr / cat_distr / calidad only read the aggregated profile."""
+    assert needs_render_ctx(["num_distr"]) is False
+    assert needs_render_ctx(["cat_distr", "calidad"]) is False
+
+
+# --------------------------------------------------------------------------- #
+# resolve_ctx_data_keys, pruning: which DATA keys to keep (never db_path/table).
+# --------------------------------------------------------------------------- #
+def test_resolve_ctx_data_keys_outliers_keeps_only_raw_numeric():
+    assert resolve_ctx_data_keys(["outliers"]) == {"raw_numeric"}
+
+
+def test_resolve_ctx_data_keys_geospatial_keeps_geo_and_numeric():
+    assert resolve_ctx_data_keys(["geospatial"]) == {"geo_points", "raw_numeric"}
+
+
+def test_resolve_ctx_data_keys_aggregation_keeps_nothing_prunable():
+    """agregacion uses db_path/table (always present): zero prunable keys."""
+    assert resolve_ctx_data_keys(["agregacion"]) == set()
+
+
+def test_resolve_ctx_data_keys_subset_of_data_keys():
+    keep = resolve_ctx_data_keys(["overview", "timeseries", "geospatial"])
+    assert keep <= set(DATA_CTX_KEYS)
+    assert {"head_rows", "timeseries_raw", "geo_points", "raw_numeric"} == keep
+
+
+# --------------------------------------------------------------------------- #
+# validate_chapter_ids: splits valid from unknown ids, preserving order.
+# --------------------------------------------------------------------------- #
+def test_validate_separates_known_and_unknown():
+    out = validate_chapter_ids(["outliers", "nope", "timeseries", "ghost"],
+                               CHAPTER_ORDER)
+    assert out["valid"] == ["outliers", "timeseries"]
+    assert out["unknown"] == ["nope", "ghost"]
+
+
+def test_validate_all_known():
+    out = validate_chapter_ids(["portada", "glosario"], CHAPTER_ORDER)
+    assert out["unknown"] == []
+
+
+# --------------------------------------------------------------------------- #
+# Robustness: odd inputs never raise.
+# --------------------------------------------------------------------------- #
+def test_resolve_handles_none_and_empty():
+    assert resolve_requirements(None)["profile_flags"] == set()
+    assert resolve_requirements([])["profile_flags"] == set()
+    # unknown ids are silently ignored during resolution.
+    assert resolve_requirements(["no_existe"])["ctx_keys"] == set()
+
+
+def test_resolve_accepts_single_string():
+    assert resolve_requirements("outliers")["profile_flags"] == {"run_models"}
@@ -0,0 +1,593 @@
+"""Outliers chapter (OUTLIERS) — univariate + multivariate atypical values.
+
+Today the analysis of atypical values is scattered across the document: the
+NUM DISTR chapter mentions the per-column outlier count inside each distribution
+figure, and the MODELOS chapter runs Isolation Forest as one of several cheap
+models. This chapter gathers and deepens the whole outlier story in a single
+place, with its interpretation: an [[term:outlier]]outlier[[/term]] is **not
+necessarily an error** — it can be a legitimate, extreme but real observation —
+so the reading is exploratory (what to look at), never confirmatory (what to
+delete).
+
+Sections, in order:
+
+1. **Resumen univariante por columna** — for every numeric column, the number
+   and percentage of atypical values by two complementary criteria: Tukey's
+   1.5·IQR rule ([[term:tukey_fence]]vallas de Tukey[[/term]]) and the
+   [[term:zscore]]z-score[[/term]] rule (|z| > 3). The most contaminated columns
+   are flagged. The fences come from the pure registry function
+   ``build_boxplot_stats`` (derived from the profile percentiles); the per-column
+   counts use the raw sample in ``ctx['raw_numeric']`` when available (the exact
+   count), degrading to the profile's own z-score counts otherwise.
+2. **Boxplots** — a single figure with the Tukey boxplots of the most
+   contaminated columns (box, whiskers and atypical points), delegated to the
+   reusable registry helper ``build_boxplots_figure``.
+3. **Multivariante (filas anómalas)** — rows that are atypical considering ALL
+   columns at once, via the registry function ``isolation_forest_outliers``: the
+   count and percentage of anomalous rows, the most anomalous rows with their
+   score, and the dimensions that make each one rare (top columns by |z|, via
+   ``summarize_outlier_dims``). Run live on ``ctx['raw_numeric']`` (the same
+   numeric columns ``summarize_outlier_dims`` uses, so the row indexing stays
+   coherent and the dimension breakdown is correct); falls back to the
+   precomputed ``profile['models']['outliers']`` only when no raw sample is
+   available (e.g. the lite preset), where no per-row breakdown is shown.
+4. **Interpretación** — outlier ≠ error: how to tell a data-entry error from a
+   genuine extreme value, and what to do (inspect, winsorize, or re-express —
+   linking to the Tukey re-expression the profile already computes).
+
+The chapter activates whenever the table has at least one numeric column; with
+no numeric column it returns ``None`` and disappears from the document.
+
+Reads everything defensively (``.get``) and never raises: every registry
+delegation is imported lazily and degraded to an honest note on any failure.
+
+Contract: build_<id>(profile, ctx) -> Chapter | None ; CHAPTER_VERSION = "x.y.z".
+"""
+
+from __future__ import annotations
+
+from .. import model
+
+CHAPTER_VERSION = "1.0.0"
+CHAPTER_ID = "outliers"
+CHAPTER_TITLE = "Valores atípicos"
+
+# z-score threshold for the univariate z rule: |z| > 3 flags a value more than 3
+# standard deviations from the mean (≈99.7% of a normal distribution lies within ±3σ).
+_Z_THRESH = 3.0
+# How many columns to draw in the boxplots figure (most contaminated first) and
+# how many anomalous rows to list in the multivariate table.
+_TOP_BOX = 12
+_TOP_ROWS = 12
+# Cap on the raw atypical values passed as boxplot fliers, so a heavy-tailed
+# column does not flood the figure with thousands of points.
+_MAX_FLIERS = 200
+# How many columns to flag as "most contaminated" in the summary note.
+_TOP_FLAGGED = 3
+
+# Glossary terms this chapter explains (contract §11.1). Registered in the shared
+# collector and marked clickable on first appearance. ``isolation_forest`` and
+# ``zscore`` may also be registered by the MODELOS chapter — ``add`` is
+# idempotent (first definition wins), so registering them here is harmless and
+# keeps this chapter self-contained when MODELOS does not render.
+_TERM_DEFS = {
+    "outlier": (
+        "Valor atípico (outlier)",
+        "Una observación que se aparta mucho del grueso de los datos. Un atípico "
+        "NO es necesariamente un error: puede ser un fallo de medida o de "
+        "registro, pero también un dato real extremo (un cliente que gasta diez "
+        "veces la media, un día de ventas excepcional). Por eso se señalan para "
+        "revisarlos, no para borrarlos automáticamente.",
+    ),
+    "tukey_fence": (
+        "Vallas de Tukey (1,5·IQR)",
+        "Regla clásica para marcar atípicos a partir de los cuartiles: se calcula "
+        "el rango intercuartílico IQR = P75 − P25 y se trazan dos vallas, una "
+        "inferior en P25 − 1,5·IQR y otra superior en P75 + 1,5·IQR. Los valores "
+        "que caen fuera de esas vallas se consideran atípicos. Es robusta porque "
+        "se apoya en los cuartiles, no en la media ni en la desviación típica.",
+    ),
+    "zscore": (
+        "z-score (puntuación típica)",
+        "Mide a cuántas desviaciones típicas está un valor de la media de su "
+        "columna: z = (valor − media) / desviación típica. Un |z| grande (aquí > "
+        "3) señala un valor alejado del centro. A diferencia de las vallas de "
+        "Tukey, el z-score usa media y desviación, así que es más sensible a la "
+        "presencia de los propios atípicos.",
+    ),
+    "isolation_forest": (
+        "Isolation Forest (anomalías multivariantes)",
+        "Algoritmo de detección de anomalías que considera TODAS las columnas a "
+        "la vez: construye árboles que parten el espacio con cortes aleatorios y "
+        "mide cuántos cortes hacen falta para aislar cada fila. Las filas raras "
+        "se aíslan con muy pocos cortes y se marcan como atípicas según un umbral "
+        "de contaminación. Detecta combinaciones de valores poco frecuentes que "
+        "ninguna columna por separado revelaría.",
+    ),
+}
+
+
+# --------------------------------------------------------------------------- #
+# Lazy registry delegations (each degrades to None / no-op on any failure).
+# --------------------------------------------------------------------------- #
+def _load_build_boxplot_stats():
+    try:
+        from datascience.build_boxplot_stats import build_boxplot_stats
+        return build_boxplot_stats
+    except Exception:  # noqa: BLE001
+        return None
+
+
+def _load_detect_outliers():
+    # detect_outliers lives in the monolithic ``datascience.datascience`` module
+    # (file_path datascience.py), not in its own submodule — try both shapes.
+    try:
+        from datascience.datascience import detect_outliers
+        return detect_outliers
+    except Exception:  # noqa: BLE001
+        try:
+            from datascience import detect_outliers
+            return detect_outliers
+        except Exception:  # noqa: BLE001
+            return None
+
+
+def _load_isolation_forest():
+    try:
+        from datascience.isolation_forest_outliers import isolation_forest_outliers
+        return isolation_forest_outliers
+    except Exception:  # noqa: BLE001
+        return None
+
+
+def _load_summarize_dims():
+    try:
+        from datascience.summarize_outlier_dims import summarize_outlier_dims
+        return summarize_outlier_dims
+    except Exception:  # noqa: BLE001
+        return None
+
+
+# --------------------------------------------------------------------------- #
+# Defensive formatters (own copy: the chapter never imports siblings).
+# --------------------------------------------------------------------------- #
+def _fmt_num(value, decimals: int = 3) -> str:
+    if value is None:
+        return "—"
+    if isinstance(value, bool):
+        return "sí" if value else "no"
+    if isinstance(value, int):
+        return f"{value:,}".replace(",", ".")
+    if isinstance(value, float):
+        if value != value:  # NaN
+            return "—"
+        if value in (float("inf"), float("-inf")):
+            return str(value)
+        text = f"{value:.{decimals}f}"  # strip zeros only after a decimal point
+        return text.rstrip("0").rstrip(".") if "." in text else text
+    return model._safe_str(value)
+
+
+def _fmt_int(value) -> str:
+    if value is None:
+        return "—"
+    try:
+        return f"{int(round(float(value))):,}".replace(",", ".")
+    except (TypeError, ValueError):
+        return model._safe_str(value)
+
+
+def _fmt_pct(value, decimals: int = 2) -> str:
+    """Format an already-0-100 value as a percentage. None -> placeholder."""
+    if value is None:
+        return "—"
+    try:
+        return f"{float(value):.{decimals}f}%"
+    except (TypeError, ValueError):
+        return model._safe_str(value)
+
+
+def _term(mark: bool, key: str, text: str) -> str:
+    return f"[[term:{key}]]{text}[[/term]]" if mark else text
+
+
+def _is_dict(v) -> bool:
+    return isinstance(v, dict)
+
+
+# --------------------------------------------------------------------------- #
+# Profile reads.
+# --------------------------------------------------------------------------- #
+def _numeric_columns(profile: dict) -> list:
+    """Return [(name, numeric_dict)] for numeric columns with usable stats."""
+    out = []
+    for col in profile.get("columns") or []:
+        if not isinstance(col, dict):
+            continue
+        if col.get("inferred_type") != "numeric":
+            continue
+        num = col.get("numeric")
+        if not isinstance(num, dict) or not num:
+            continue
+        if num.get("mean") is None and num.get("median") is None:
+            continue
+        out.append((col.get("name") or "(columna)", num))
+    return out
+
+
+def _clean_values(raw):
+    """Return the finite float values of a raw column list (drop None/NaN/inf)."""
+    if not isinstance(raw, (list, tuple)):
+        return None
+    vals = []
+    for v in raw:
+        if v is None or isinstance(v, bool):
+            continue
+        try:
+            f = float(v)
+        except (TypeError, ValueError):
+            continue
+        if f != f or f in (float("inf"), float("-inf")):
+            continue
+        vals.append(f)
+    return vals
+
+
+# --------------------------------------------------------------------------- #
+# Per-column univariate summary.
+# --------------------------------------------------------------------------- #
+def _univariate_row(name, numeric, raw_vals, box_fn, detect_fn):
+    """Compute one univariate summary row + boxplot inputs for a column.
+
+    Returns a dict with the table cells and, when raw values are available, the
+    exact Tukey/z counts and the list of atypical (flier) values; otherwise it
+    degrades to the profile's own z-score counts and the fence flags.
+    """
+    box = {}
+    if box_fn is not None:
+        try:
+            box = box_fn(numeric) or {}
+        except Exception:  # noqa: BLE001
+            box = {}
+    lf = box.get("lower_fence")
+    uf = box.get("upper_fence")
+
+    vals = _clean_values(raw_vals)
+    n_tukey = pct_tukey = None
+    n_z = pct_z = None
+    low_extreme = high_extreme = None
+    fliers = []
+    contamination = None  # metric used to rank columns (prefer Tukey %).
+
+    if vals:
+        n = len(vals)
+        tukey_out = []
+        for v in vals:
+            below = (lf is not None and v < lf)
+            above = (uf is not None and v > uf)
+            if below or above:
+                tukey_out.append(v)
+        n_tukey = len(tukey_out)
+        pct_tukey = 100.0 * n_tukey / n if n else None
+        if tukey_out:
+            low_extreme = min(tukey_out)
+            high_extreme = max(tukey_out)
+            fliers = tukey_out[:_MAX_FLIERS]
+        # z-score rule via the registry function (returns parallel bools).
+        if detect_fn is not None:
+            try:
+                flags = detect_fn(vals, _Z_THRESH) or []
+                n_z = int(sum(1 for b in flags if b))
+                pct_z = 100.0 * n_z / n if n else None
+            except Exception:  # noqa: BLE001
+                n_z = pct_z = None
+        contamination = pct_tukey
+    else:
+        # Degrade: no raw sample for this column. The profile's own outlier
+        # count/pct come from the z-score block (build_boxplot_stats note); the
+        # Tukey count is unknown, only the fence flags are available.
+        n_z = numeric.get("n_outliers")
+        pct_z = numeric.get("outlier_pct")
+        if box.get("has_low_outliers") and box.get("min") is not None:
+            low_extreme = box.get("min")
+        if box.get("has_high_outliers") and box.get("max") is not None:
+            high_extreme = box.get("max")
+        contamination = pct_z if isinstance(pct_z, (int, float)) else None
+
+    # Compact "extremos atípicos" cell: down/up arrows for the low/high tail.
+    extremes = []
+    if low_extreme is not None:
+        extremes.append(f"↓ {_fmt_num(low_extreme)}")
+    if high_extreme is not None:
+        extremes.append(f"↑ {_fmt_num(high_extreme)}")
+    extremes_cell = "  ".join(extremes) if extremes else "—"
+
+    return {
+        "name": model._safe_str(name),
+        "n_tukey": n_tukey,
+        "pct_tukey": pct_tukey,
+        "n_z": n_z,
+        "pct_z": pct_z,
+        "lower_fence": lf,
+        "upper_fence": uf,
+        "extremes": extremes_cell,
+        "box": box,
+        "fliers": fliers,
+        "has_raw": bool(vals),
+        "contamination": contamination if isinstance(contamination, (int, float)) else -1.0,
+    }
+
+
+def _univariate_table(rows: list) -> model.DataTable:
+    header = ["Columna", "Atípicos Tukey", "% Tukey", "Atípicos z", "% z",
+              "Valla inf.", "Valla sup.", "Extremos atípicos"]
+    table_rows = []
+    for r in rows:
+        table_rows.append([
+            r["name"],
+            _fmt_int(r["n_tukey"]) if r["n_tukey"] is not None else "—",
+            _fmt_pct(r["pct_tukey"]) if r["pct_tukey"] is not None else "—",
+            _fmt_int(r["n_z"]) if r["n_z"] is not None else "—",
+            _fmt_pct(r["pct_z"]) if r["pct_z"] is not None else "—",
+            _fmt_num(r["lower_fence"]),
+            _fmt_num(r["upper_fence"]),
+            r["extremes"],
+        ])
+    return model.DataTable(
+        header=header, rows=table_rows,
+        title="Valores atípicos por columna",
+        note="Tukey = fuera de las vallas 1,5·IQR · z = |z-score| > 3 · "
+             "ordenado de más a menos contaminada")
+
+
+# --------------------------------------------------------------------------- #
+# Multivariate (Isolation Forest) section.
+# --------------------------------------------------------------------------- #
+def _resolve_multivariate(profile: dict, ctx: dict, raw_numeric):
+    """Return (outliers_dict_or_None, source).
+
+    Prefers a LIVE Isolation Forest over ``raw_numeric`` so the detector and
+    ``summarize_outlier_dims`` use EXACTLY the same numeric columns and the same
+    valid-row indexing — otherwise the precomputed ``profile['models']
+    ['outliers']`` (run by MODELOS over a possibly different column subset) would
+    yield ``row_index`` values that no longer point at the rows
+    ``summarize_outlier_dims`` reconstructs, mislabelling the "dimensions that
+    make each row rare". Falls back to the precomputed block when no raw sample
+    is available (e.g. the lite preset drops ``raw_numeric``)."""
+    if _is_dict(raw_numeric) and raw_numeric:
+        iso = _load_isolation_forest()
+        if iso is not None:
+            try:
+                out = iso(raw_numeric)
+                if _is_dict(out) and out.get("n_outliers") is not None and out.get("n_rows_used"):
+                    return out, "live"
+            except Exception:  # noqa: BLE001
+                pass
+    # Fallback: the model the MODELOS chapter already computed (no raw sample to
+    # recompute against, so no per-row dimension breakdown either).
+    models = profile.get("models") if _is_dict(profile.get("models")) else {}
+    pre = models.get("outliers") if _is_dict(models) else None
+    if _is_dict(pre) and pre.get("n_outliers") is not None and pre.get("n_rows_used"):
+        return pre, "precomputed"
+    return None, "none"
+
+
+def _multivariate_blocks(outliers: dict, raw_numeric, mark: bool) -> list:
+    isof = _term(mark, "isolation_forest", "**Isolation Forest**")
+    blocks = [
+        model.Heading(text="Filas atípicas (multivariante)", level=2),
+        model.Markdown(text=(
+            f"Hasta aquí cada columna se ha mirado por separado. {isof} busca "
+            "filas raras considerando **todas las columnas a la vez**: una fila "
+            "puede ser normal en cada variable y aun así ser atípica por la "
+            "**combinación** de sus valores (p. ej. una edad baja con una tarifa "
+            "muy alta). La tabla resume cuántas filas se marcaron y el umbral de "
+            "decisión.")),
+        model.KVTable(rows=[
+            ("Filas analizadas", _fmt_int(outliers.get("n_rows_used"))),
+            ("Columnas consideradas", _fmt_int(outliers.get("n_features"))),
+            ("Filas atípicas", _fmt_int(outliers.get("n_outliers"))),
+            ("% filas atípicas", _fmt_pct(outliers.get("outlier_pct"))),
+            ("Umbral de decisión", _fmt_num(outliers.get("threshold"), 4)),
+        ], title="Anomalías multivariantes"),
+    ]
+
+    rows_in = outliers.get("outlier_rows") or []
+    if not rows_in:
+        return blocks
+
+    # Enrich each anomalous row with the dimensions that make it rare, when the
+    # raw sample is available (summarize_outlier_dims reconstructs the same
+    # valid-row indexing as isolation_forest_outliers).
+    dims_by_row = {}
+    if _is_dict(raw_numeric) and raw_numeric:
+        summ = _load_summarize_dims()
+        if summ is not None:
+            try:
+                enriched = summ(raw_numeric, rows_in, top_k=3) or []
+                for e in enriched:
+                    if _is_dict(e) and e.get("row_index") is not None:
+                        dims_by_row[e.get("row_index")] = e.get("dims") or []
+            except Exception:  # noqa: BLE001
+                dims_by_row = {}
+
+    has_dims = bool(dims_by_row)
+    header = ["Fila (entre válidas)", "Score"]
+    if has_dims:
+        header.append("Dimensiones que la hacen rara (col = valor, z)")
+    table_rows = []
+    for r in rows_in[:_TOP_ROWS]:
+        if not _is_dict(r):
+            continue
+        ridx = r.get("row_index")
+        cells = [_fmt_int(ridx), _fmt_num(r.get("score"), 4)]
+        if has_dims:
+            dims = dims_by_row.get(ridx) or []
+            parts = []
+            for d in dims:
+                if not _is_dict(d):
+                    continue
+                parts.append(
+                    f"{model._safe_str(d.get('col'))} = {_fmt_num(d.get('value'))} "
+                    f"(z {_fmt_num(d.get('z'), 2)})")
+            cells.append("; ".join(parts) if parts else "—")
+        table_rows.append(cells)
+
+    if table_rows:
+        shown = len(table_rows)
+        total = outliers.get("n_outliers")
+        note = "las filas más anómalas primero (score más bajo = más rara)"
+        if isinstance(total, int) and total > shown:
+            note += f" — top {shown} de {total}"
+        if not has_dims:
+            note += (" · no se pudo recuperar la muestra cruda para explicar las "
+                     "dimensiones de cada fila")
+        blocks.append(model.DataTable(
+            header=header, rows=table_rows,
+            title="Filas más atípicas", note=note))
+    return blocks
+
+
+# --------------------------------------------------------------------------- #
+# Interpretation section.
+# --------------------------------------------------------------------------- #
+def _interpretation_block(mark: bool) -> model.Markdown:
+    outlier = _term(mark, "outlier", "atípico")
+    text = (
+        f"**Un {outlier} no es necesariamente un error.** Conviene distinguir "
+        "dos casos antes de actuar:\n\n"
+        "- **Error de dato** (medida, registro o unidad equivocada): una edad de "
+        "200 años, un importe negativo donde no puede haberlo, un decimal "
+        "desplazado. Estos sí se corrigen o se eliminan, idealmente en el origen.\n"
+        "- **Dato real extremo**: una observación legítima de la cola de la "
+        "distribución (un cliente que gasta mucho más, una tarifa de lujo, un día "
+        "de ventas excepcional). Borrarla sesga el análisis y oculta información "
+        "valiosa.\n\n"
+        "**Qué hacer.** Primero, **revisar** los valores señalados arriba contra "
+        "su origen para decidir cuál de los dos casos es. Si son errores, "
+        "corregirlos. Si son datos reales que distorsionan medias y modelos, hay "
+        "alternativas a borrarlos: **winsorizar** (recortar los extremos a un "
+        "percentil), o **re-expresar** la variable (por ejemplo una "
+        "transformación logarítmica o la escalera de re-expresión de Tukey que "
+        "este mismo perfil ya calcula para las columnas asimétricas), que suele "
+        "domar la cola sin perder ninguna fila. La elección depende del objetivo: "
+        "esta lectura es **exploratoria** —orienta dónde mirar—, no una regla "
+        "automática de limpieza.")
+    return model.Markdown(text=text)
+
+
+# --------------------------------------------------------------------------- #
+# Entry point.
+# --------------------------------------------------------------------------- #
+def build_outliers(profile: dict, ctx: dict):
+    """Build the OUTLIERS Chapter, or None if the dataset has no numeric column."""
+    profile = profile or {}
+    ctx = ctx or {}
+    if not isinstance(profile, dict):
+        return None
+
+    numerics = _numeric_columns(profile)
+    if not numerics:
+        return None  # chapter does not apply to a dataset with no numerics.
+
+    # Register glossary terms (if a collector is present) and mark them clickable.
+    glossary = ctx.get("glossary")
+    mark = False
+    if isinstance(glossary, model.GlossaryCollector):
+        for key, (label, definition) in _TERM_DEFS.items():
+            glossary.add(key, label, definition)
+        mark = True
+
+    raw_numeric = ctx.get("raw_numeric")
+    raw_numeric = raw_numeric if isinstance(raw_numeric, dict) else {}
+
+    box_fn = _load_build_boxplot_stats()
+    detect_fn = _load_detect_outliers()
+
+    # --- Univariate summary ------------------------------------------------- #
+    uni_rows = []
+    for name, numeric in numerics:
+        uni_rows.append(_univariate_row(
+            name, numeric, raw_numeric.get(name), box_fn, detect_fn))
+    # Rank columns by contamination (Tukey % when available, else z %).
+    uni_rows.sort(key=lambda r: r.get("contamination", -1.0), reverse=True)
+
+    intro = (
+        "Este capítulo reúne en un solo sitio el análisis de los **valores "
+        "atípicos** de la tabla, que en el resto del informe aparecen dispersos. "
+        f"Un {_term(mark, 'outlier', 'atípico')} es una observación que se aparta "
+        "mucho del grueso de los datos. Cada columna numérica se evalúa con dos "
+        f"criterios complementarios: las {_term(mark, 'tukey_fence', 'vallas de Tukey')} "
+        "(fuera de P25−1,5·IQR o P75+1,5·IQR, robusto a la propia cola) y el "
+        f"{_term(mark, 'zscore', 'z-score')} (|z| > 3, sensible a la media). La "
+        "tabla está ordenada de la columna más contaminada a la menos.")
+
+    blocks = [
+        model.Heading(text=CHAPTER_TITLE, level=1),
+        model.Markdown(text=intro),
+        _univariate_table(uni_rows),
+    ]
+
+    # Flag the most contaminated columns explicitly.
+    flagged = [r["name"] for r in uni_rows
+               if r.get("contamination", -1.0) > 0][:_TOP_FLAGGED]
+    if flagged:
+        names = ", ".join(f"**{n}**" for n in flagged)
+        blocks.append(model.Markdown(text=(
+            f"Las columnas con mayor proporción de atípicos son {names}: "
+            "concentran el grueso de los valores fuera de las vallas y son las "
+            "primeras a revisar.")))
+
+    # --- Boxplots figure ---------------------------------------------------- #
+    box_entries = [
+        {"name": r["name"], "box": r["box"], "fliers": r.get("fliers")}
+        for r in uni_rows
+        if r.get("box")
+    ][:_TOP_BOX]
+    if box_entries:
+        def _boxplots_make(entries=box_entries):
+            try:
+                from datascience.build_boxplots_figure import build_boxplots_figure
+                return build_boxplots_figure(
+                    entries, title="Boxplots de Tukey por columna",
+                    max_boxes=_TOP_BOX)
+            except Exception:  # noqa: BLE001 — minimal fallback figure.
+                import matplotlib
+                matplotlib.use("Agg")
+                from matplotlib.figure import Figure
+                fig = Figure(figsize=(5.0, 2.2))
+                ax = fig.add_subplot(111)
+                ax.text(0.5, 0.5, "(boxplots no disponibles)",
+                        ha="center", va="center")
+                ax.axis("off")
+                return fig
+
+        blocks.append(model.Group(blocks=[
+            model.Heading(text="Boxplots", level=2),
+            model.Markdown(text=(
+                "Cada caja abarca del primer al tercer cuartil (P25–P75), la línea "
+                "interior es la mediana y los bigotes llegan hasta 1,5·IQR; los "
+                "puntos son los valores que caen fuera de las vallas (atípicos por "
+                "Tukey).")),
+            model.Figure(
+                make=_boxplots_make,
+                caption="Boxplots de Tukey de las columnas más contaminadas."),
+        ]))
+
+    # --- Multivariate ------------------------------------------------------- #
+    outliers, _src = _resolve_multivariate(profile, ctx, raw_numeric)
+    if outliers is not None:
+        blocks.extend(_multivariate_blocks(outliers, raw_numeric, mark))
+    else:
+        blocks.append(model.Heading(text="Filas atípicas (multivariante)", level=2))
+        blocks.append(model.Note(
+            "No se pudo analizar la anomalía multivariante: hacen falta al menos "
+            "dos columnas numéricas y la muestra cruda (o los modelos del perfil) "
+            "para correr Isolation Forest."))
+
+    # --- Interpretation ----------------------------------------------------- #
+    blocks.append(model.Heading(text="Cómo interpretar los atípicos", level=2))
+    blocks.append(_interpretation_block(mark))
+
+    return model.Chapter(id=CHAPTER_ID, title=CHAPTER_TITLE,
+                         version=CHAPTER_VERSION, blocks=blocks)
@@ -0,0 +1,304 @@
+"""Tests for the OUTLIERS chapter — DoD: golden + edges + error path.
+
+Self-contained: builds synthetic ``numeric`` blocks + a raw_numeric sample (no
+DuckDB) so the suite is fast and deterministic. Verifies that the chapter emits
+the univariate per-column table, a boxplots figure, the multivariate Isolation
+Forest section and the outlier≠error interpretation; that the most contaminated
+column is ranked first; that a profile with no numeric column yields None; that
+None/empty never raises; that the glossary terms are registered; and that the
+chapter renders into both PDF and PPTX without cutting its title.
+"""
+
+import math
+import os
+import re
+import tempfile
+
+from pypdf import PdfReader
+
+from datascience.automatic_eda.chapters.outliers import (
+    build_outliers, CHAPTER_VERSION, CHAPTER_TITLE, _TERM_DEFS,
+)
+from datascience.automatic_eda import model
+from datascience.render_automatic_eda_pdf import render_automatic_eda_pdf
+from datascience.render_automatic_eda_pptx import render_automatic_eda_pptx
+
+
+def _percentile(sorted_vals, q):
+    """Linear-interpolation percentile (q in 0..1) on an already-sorted list."""
+    if not sorted_vals:
+        return None
+    if len(sorted_vals) == 1:
+        return float(sorted_vals[0])
+    pos = q * (len(sorted_vals) - 1)
+    lo = int(math.floor(pos))
+    hi = int(math.ceil(pos))
+    if lo == hi:
+        return float(sorted_vals[lo])
+    frac = pos - lo
+    return float(sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac)
+
+
+def _col_from_values(values, nbins=10):
+    """Build a ``numeric`` sub-block shaped like describe_numeric's output from a
+    concrete list of raw values, so the profile percentiles and the raw sample
+    are consistent (the boxplot fences match the raw sample)."""
+    vals = [float(v) for v in values]
+    s = sorted(vals)
+    n = len(s)
+    mean = sum(vals) / n
+    var = sum((v - mean) ** 2 for v in vals) / n
+    std = math.sqrt(var)
+    median = _percentile(s, 0.5)
+    p25 = _percentile(s, 0.25)
+    p75 = _percentile(s, 0.75)
+    mn, mx = s[0], s[-1]
+    # z-score outlier count (population), what the profile's n_outliers carries.
+    n_out = sum(1 for v in vals if std > 0 and abs((v - mean) / std) > 3.0)
+    width = (mx - mn) / nbins if mx > mn else 1.0
+    hist = [{"lo": mn + i * width, "hi": mn + (i + 1) * width, "count": 1}
+            for i in range(nbins)]
+    return {
+        "min": mn, "max": mx, "mean": mean, "median": median, "std": std,
+        "p25": p25, "p50": median, "p75": p75, "iqr": (p75 - p25),
+        "n_outliers": n_out, "outlier_pct": 100.0 * n_out / n,
+        "distribution_type": "right-skewed", "histogram": hist,
+    }
+
+
+def _fare_values():
+    """A heavy-tailed column (bulk 7-31, a tail of 180-512): clear Tukey/z outliers."""
+    base = [7.0 + (i % 25) for i in range(120)]      # bulk 7..31
+    tail = [180.0, 210.0, 263.0, 512.0]              # extreme upper tail
+    return base + tail
+
+
+def _age_values():
+    """A roughly symmetric column with four tail values (two low, two high)."""
+    base = [22.0 + (i % 40) for i in range(120)]     # 22..61
+    return base + [80.0, 0.5, 74.0, 1.0]
+
+
+def _quiet_values():
+    """A clean column with no atypical values."""
+    return [50.0 + (i % 5) for i in range(124)]
+
+
+def _profile_and_ctx(with_models=True, with_raw=True):
+    fare = _fare_values()
+    age = _age_values()
+    quiet = _quiet_values()
+    cols = [
+        {"name": "Fare", "inferred_type": "numeric", "numeric": _col_from_values(fare)},
+        {"name": "Age", "inferred_type": "numeric", "numeric": _col_from_values(age)},
+        {"name": "Quiet", "inferred_type": "numeric", "numeric": _col_from_values(quiet)},
+        {"name": "Sexo", "inferred_type": "categorical",
+         "categorical": {"top": [{"value": "male", "count": 80}]}},
+    ]
+    profile = {"table": "titanic", "n_rows": len(fare), "n_cols": len(cols),
+               "columns": cols}
+    if with_models:
+        profile["models"] = {
+            "outliers": {
+                "n_outliers": 4, "outlier_pct": 3.2,
+                "outlier_rows": [
+                    {"row_index": 123, "score": -0.21},
+                    {"row_index": 121, "score": -0.15},
+                ],
+                "threshold": -0.02, "n_rows_used": 124, "n_features": 3,
+            }
+        }
+    ctx = {}
+    if with_raw:
+        ctx["raw_numeric"] = {"Fare": fare, "Age": age, "Quiet": quiet}
+    return profile, ctx
+
+
+def _pdf_text(path: str) -> str:
+    txt = "".join((pg.extract_text() or "") for pg in PdfReader(path).pages)
+    return re.sub(r"\s+", " ", txt)
+
+
+def _flatten(blocks):
+    out = []
+    for b in blocks:
+        if getattr(b, "kind", "") == "group":
+            out.extend(_flatten(getattr(b, "blocks", []) or []))
+        else:
+            out.append(b)
+    return out
+
+
+# --------------------------------------------------------------------------- #
+# Golden.
+# --------------------------------------------------------------------------- #
+def test_golden_estructura_y_secciones():
+    profile, ctx = _profile_and_ctx()
+    ctx["glossary"] = model.GlossaryCollector()
+    ch = build_outliers(profile, ctx)
+    assert ch is not None
+    assert ch.id == "outliers"
+    assert ch.version == CHAPTER_VERSION
+
+    flat = _flatten(ch.blocks)
+    kinds = [b.kind for b in flat]
+    # Title heading + univariate DataTable + boxplots Figure + multivariate
+    # KVTable + interpretation Markdown.
+    assert kinds[0] == "heading" and flat[0].text == CHAPTER_TITLE
+    tables = [b for b in flat if b.kind == "data_table"]
+    titles = [t.title for t in tables]
+    assert any(t and "atípicos por columna" in t for t in titles)
+    assert any(b.kind == "figure" for b in flat), "falta la figura de boxplots"
+    assert any(b.kind == "kv_table" for b in flat), "falta el resumen multivariante"
+
+    # The boxplots figure maker yields a real matplotlib figure (or its fallback).
+    fig = next(b for b in flat if b.kind == "figure").make()
+    assert fig is not None
+    import matplotlib.pyplot as plt
+    plt.close(fig)
+
+
+def test_golden_fare_es_la_mas_contaminada():
+    # The univariate table must rank Fare (heavy tail) first and report a
+    # non-zero Tukey percentage for it.
+    profile, ctx = _profile_and_ctx()
+    ch = build_outliers(profile, ctx)
+    table = next(b for b in _flatten(ch.blocks)
+                 if b.kind == "data_table" and b.title
+                 and "atípicos por columna" in b.title)
+    first_col = table.rows[0][0]
+    assert first_col == "Fare", f"esperaba Fare primera, fue {first_col}"
+    # % Tukey column (index 2) of the first row must be > 0.
+    pct_cell = table.rows[0][2]
+    assert pct_cell not in ("—", "0%", "0.00%"), f"% Tukey de Fare vacío: {pct_cell}"
+    # The z-score rule (detect_outliers) must actually run with raw_numeric: at
+    # least one column reports a non-empty z count/percentage (regression guard
+    # for the detect_outliers import path).
+    z_pcts = [r[4] for r in table.rows]
+    assert any(c not in ("—",) for c in z_pcts), f"columna z toda vacía: {z_pcts}"
+    z_counts = [r[3] for r in table.rows]
+    assert any(c not in ("—",) for c in z_counts), f"conteo z vacío: {z_counts}"
+
+
+def test_golden_interpretacion_outlier_no_es_error():
+    profile, ctx = _profile_and_ctx()
+    ch = build_outliers(profile, ctx)
+    md = " ".join(b.text for b in _flatten(ch.blocks) if b.kind == "markdown")
+    assert "no es necesariamente un error" in md.lower()
+    # Mentions the actionable options (winsorize / re-express).
+    assert "winsoriz" in md.lower()
+    assert "re-expres" in md.lower() or "logarítmic" in md.lower()
+
+
+def test_golden_terminos_glosario_registrados():
+    profile, ctx = _profile_and_ctx()
+    gloss = model.GlossaryCollector()
+    ctx["glossary"] = gloss
+    build_outliers(profile, ctx)
+    for key in _TERM_DEFS:
+        assert gloss.has(key), f"término '{key}' no registrado en el glosario"
+    # Terms are marked clickable in the body text.
+    md = " ".join(b.text for b in _flatten(build_outliers(profile, ctx).blocks)
+                  if b.kind == "markdown")
+    assert "[[term:outlier]]" in md and "[[term:tukey_fence]]" in md
+
+
+# --------------------------------------------------------------------------- #
+# Multivariate.
+# --------------------------------------------------------------------------- #
+def test_multivariante_live_con_raw_y_dims():
+    # With a raw sample the chapter runs Isolation Forest live (over the same
+    # columns summarize_outlier_dims uses) and lists the anomalous rows with the
+    # dimensions that make each one rare.
+    profile, ctx = _profile_and_ctx(with_models=False, with_raw=True)
+    ch = build_outliers(profile, ctx)
+    flat = _flatten(ch.blocks)
+    kv = next(b for b in flat if b.kind == "kv_table")
+    flat_kv = " ".join(f"{k} {v}" for (k, v) in kv.rows)
+    assert "Filas atípicas" in flat_kv
+    # A non-zero number of anomalous rows is reported.
+    n_cell = dict(kv.rows).get("Filas atípicas")
+    assert n_cell not in (None, "—", "0"), f"sin filas atípicas: {n_cell}"
+    # The anomalous-rows table carries the per-row dimension breakdown.
+    tbls = [b for b in flat if b.kind == "data_table" and b.title
+            and "más atípicas" in b.title]
+    assert tbls, "falta la tabla de filas más atípicas"
+    assert any("hacen rara" in h for h in tbls[0].header), \
+        f"falta la columna de dimensiones: {tbls[0].header}"
+
+
+def test_multivariante_precomputed_sin_raw():
+    # Without a raw sample the chapter falls back to profile['models']['outliers']
+    # (lite preset path); the precomputed n_outliers (4) surfaces in the KV table.
+    profile, ctx = _profile_and_ctx(with_models=True, with_raw=False)
+    ch = build_outliers(profile, ctx)
+    kv = next(b for b in _flatten(ch.blocks) if b.kind == "kv_table")
+    assert dict(kv.rows).get("Filas atípicas") == "4"
+
+
+def test_multivariante_ausente_degrada_a_nota():
+    # No models and no raw sample → an honest note, never a crash.
+    profile, ctx = _profile_and_ctx(with_models=False, with_raw=False)
+    ch = build_outliers(profile, ctx)
+    assert ch is not None
+    notes = [b.text for b in _flatten(ch.blocks) if b.kind == "note"]
+    assert any("Isolation Forest" in n for n in notes)
+
+
+# --------------------------------------------------------------------------- #
+# Edges / error path.
+# --------------------------------------------------------------------------- #
+def test_edge_sin_columnas_numericas_devuelve_none():
+    prof = {"columns": [{"name": "c", "inferred_type": "categorical",
+                         "categorical": {"top": [{"value": "x", "count": 3}]}}]}
+    assert build_outliers(prof, {}) is None
+
+
+def test_edge_solo_texto_sintetico_devuelve_none():
+    # A text-only synthetic table (no numeric column) yields None (does not break).
+    prof = {"table": "notas", "n_rows": 3, "n_cols": 1,
+            "columns": [{"name": "comentario", "inferred_type": "text",
+                         "text": {"n_docs": 3}}]}
+    assert build_outliers(prof, {}) is None
+
+
+def test_edge_profile_none_y_vacio_no_revienta():
+    assert build_outliers(None, None) is None
+    assert build_outliers({}, {}) is None
+    assert build_outliers({"columns": []}, {}) is None
+
+
+def test_edge_sin_raw_numeric_degrada_a_perfil():
+    # Without raw_numeric the chapter still builds, using the profile z-score
+    # counts; the univariate table exists and Tukey counts degrade to '—'.
+    profile, ctx = _profile_and_ctx(with_models=True, with_raw=False)
+    ch = build_outliers(profile, ctx)
+    assert ch is not None
+    table = next(b for b in _flatten(ch.blocks)
+                 if b.kind == "data_table" and b.title
+                 and "atípicos por columna" in b.title)
+    # z column comes from the profile; Tukey count is unknown ('—').
+    assert all(len(r) == 8 and r[1] == "—" for r in table.rows)
+
+
+# --------------------------------------------------------------------------- #
+# Anti-cut render.
+# --------------------------------------------------------------------------- #
+def test_render_pdf_y_pptx_incluyen_el_capitulo():
+    profile, ctx = _profile_and_ctx()
+    # The renderers build the whole document and reach this chapter through the
+    # registry, so there is no standalone one-chapter render: pass the profile
+    # directly and check that the chapter shows up in the full output.
+    with tempfile.TemporaryDirectory() as d:
+        pdf = os.path.join(d, "out.pdf")
+        res_pdf = render_automatic_eda_pdf(profile, pdf,
+                                           {"write_manifest": False, "ctx": ctx})
+        assert res_pdf["path"] == pdf
+        txt = _pdf_text(pdf)
+        assert CHAPTER_TITLE in txt, "el capítulo OUTLIERS no aparece en el PDF"
+        assert "Fare" in txt
+        pptx = os.path.join(d, "out.pptx")
+        res_pptx = render_automatic_eda_pptx(profile, pptx,
+                                             {"write_manifest": False, "ctx": ctx})
+        assert res_pptx["path"] == pptx
+        assert res_pptx["n_slides"] >= 1
@@ -7,11 +7,21 @@ as needed, the renderers paginate):
   NOT carry the raw head, so this is read from ``ctx['head_rows']`` /
   ``profile['head_rows']`` (a list of row dicts). When absent the chapter shows
   an honest placeholder documenting the missing key instead of inventing data.
-2. Column dictionary — name / type / nulls / non-null examples. Examples come
+2. Column dictionary — name / type / nulls / non-null examples plus, when the
+   LLM layer ran, the business **description** and **unit** of each column so the
+   reader knows at a glance what every column is and in which unit. Examples come
   from ``columns[i]['examples']`` when present; otherwise they are derived from
   real non-null profile values (categorical top values, numeric min/median/max)
   so the cell is never empty nor fabricated.
-3. ``df.describe`` — mean / median / min / max / std for every numeric column.
+3. ``df.describe`` — mean / median / min / max / std for every numeric column,
+   plus its **unit** (same LLM source) so the stats read in context.
+
+The description/unit come from the ``llm`` block that ``eda_llm_insights`` (group
+``eda``) already stored in the profile (``profile['llm']['dictionary']``, a list
+of ``{"column","description","business_meaning","unit"}`` entries) — this chapter
+only **consumes** it, matching by column name; it never calls the LLM nor
+recomputes anything. When the block is absent (``run_llm`` did not run) those
+cells degrade to ``"—"`` and the tables still render.

 Contract: build_<id>(profile, ctx) -> Chapter | None ; CHAPTER_VERSION = "x.y.z".
 """
@@ -20,13 +30,59 @@ from __future__ import annotations

 from .. import model

-CHAPTER_VERSION = "1.1.0"
+CHAPTER_VERSION = "1.2.0"
 CHAPTER_ID = "overview"
 CHAPTER_TITLE = "Overview"

 # Profile/ctx keys the calculation phase must add for a full head + examples.
 HEAD_KEY = "head_rows"          # list[dict] — df.head(n)
 EXAMPLES_KEY = "examples"       # per column: list of non-null sample values
+LLM_KEY = "llm"                 # interpretive block from eda_llm_insights
+
+
+def _llm_dict_index(profile: dict, ctx: dict) -> dict:
+    """Map column name -> its LLM dictionary entry (description/unit/...).
+
+    Reads the ``llm.dictionary`` list that ``eda_llm_insights`` stored in the
+    profile (``profile['llm']``; falls back to ``ctx['llm']``). Returns an empty
+    dict when no LLM block ran, so the caller degrades to "—" cells. Fully
+    defensive: never raises on malformed input.
+    """
+    llm = profile.get(LLM_KEY)
+    if not isinstance(llm, dict):
+        llm = ctx.get(LLM_KEY)
+    if not isinstance(llm, dict):
+        return {}
+    entries = llm.get("dictionary")
+    if not isinstance(entries, (list, tuple)):
+        return {}
+    index: dict = {}
+    for e in entries:
+        if not isinstance(e, dict):
+            continue
+        col = e.get("column")
+        if col is None:
+            continue
+        index[model._safe_str(col)] = e
+    return index
+
+
+def _llm_desc(entry) -> str:
+    """Business description of a column from its LLM entry, or "—"."""
+    if not isinstance(entry, dict):
+        return "—"
+    raw = entry.get("description") or entry.get("business_meaning")
+    text = " ".join(model._safe_str(raw).split()) if raw is not None else ""
+    return text or "—"
+
+
+def _llm_unit(entry) -> str:
+    """Unit of a column from its LLM entry, or "—"."""
+    if not isinstance(entry, dict):
+        return "—"
+    raw = entry.get("unit")
+    text = " ".join(model._safe_str(raw).split()) if raw is not None else ""
+    return text or "—"


 def _fmt_num(value, decimals: int = 3) -> str:
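Reviewer sketch of the lookup-and-degrade pattern added in this hunk, written so it runs on its own. It is a sketch under stated assumptions, not the chapter's code: `index_by_column` and `cell` are hypothetical names, and plain `str` stands in for `model._safe_str`, whose behaviour is not shown in the diff.

```python
DASH = "—"  # placeholder cell used when no LLM entry (or field) exists


def index_by_column(entries):
    """Map column name -> LLM dictionary entry, skipping malformed items."""
    index = {}
    if not isinstance(entries, (list, tuple)):
        return index
    for e in entries:
        if isinstance(e, dict) and e.get("column") is not None:
            index[str(e["column"])] = e  # assumption: str() replaces _safe_str
    return index


def cell(entry, *keys):
    """First truthy value among ``keys``, whitespace-normalised, else DASH."""
    if not isinstance(entry, dict):
        return DASH
    raw = None
    for k in keys:
        raw = raw or entry.get(k)  # same semantics as ``a or b`` in the diff
    text = " ".join(str(raw).split()) if raw is not None else ""
    return text or DASH


llm_dictionary = [
    {"column": "Pclass", "description": " Clase  del billete ", "unit": "clase (1-3)"},
    "junk",                # non-dict entries are ignored
    {"unit": "orphan"},    # entries without a column name are ignored
]
idx = index_by_column(llm_dictionary)
row = ["Pclass", cell(idx.get("Pclass"), "description", "business_meaning"),
       cell(idx.get("Pclass"), "unit")]
missing = cell(idx.get("Survived"), "unit")
```

Building the index once and passing it to both table builders keeps the match by column name in a single place, which is what `build_overview` does with `llm_index`.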
@@ -104,9 +160,12 @@ def _head_block(profile: dict, ctx: dict):
        "pasarlo en ctx['head_rows'] para mostrar las primeras filas.")


-def _columns_block(profile: dict):
+def _columns_block(profile: dict, llm_index: dict):
    cols = profile.get("columns") or []
-    header = ["Columna", "Tipo", "Nulos", "Ejemplos (no nulos)"]
+    # Descripción / Unidad come from the LLM dictionary (matched by column name);
+    # they read "—" when run_llm did not run, so the table always renders.
+    header = ["Columna", "Tipo", "Nulos", "Ejemplos (no nulos)",
+              "Descripción", "Unidad"]
    rows = []
    for c in cols:
        if not isinstance(c, dict):
@@ -126,15 +185,18 @@ def _columns_block(profile: dict):
            nulls = str(null_count)
        else:
            nulls = "—"
-        rows.append([name, ctype, nulls, _examples_for(c)])
+        entry = llm_index.get(model._safe_str(name))
+        rows.append([name, ctype, nulls, _examples_for(c),
+                     _llm_desc(entry), _llm_unit(entry)])
    if not rows:
        return None
    return model.DataTable(header=header, rows=rows, title="Columnas")


-def _describe_block(profile: dict):
+def _describe_block(profile: dict, llm_index: dict):
    cols = profile.get("columns") or []
-    header = ["Columna", "mean", "median", "min", "max", "std"]
+    # "Unidad" (LLM source) lets the reader know in which unit each stat is.
+    header = ["Columna", "mean", "median", "min", "max", "std", "Unidad"]
    rows = []
    for c in cols:
        if not isinstance(c, dict) or c.get("inferred_type") != "numeric":
@@ -142,13 +204,16 @@ def _describe_block(profile: dict):
        num = c.get("numeric") or {}
        if not num:
            continue
+        name = c.get("name") or "(col)"
+        entry = llm_index.get(model._safe_str(name))
        rows.append([
-            c.get("name") or "(col)",
+            name,
            _fmt_num(num.get("mean")),
            _fmt_num(num.get("median")),
            _fmt_num(num.get("min")),
            _fmt_num(num.get("max")),
            _fmt_num(num.get("std")),
+            _llm_unit(entry),
        ])
    if not rows:
        return None
@@ -163,16 +228,18 @@ def build_overview(profile: dict, ctx: dict):
    if not cols and not (ctx.get(HEAD_KEY) or profile.get(HEAD_KEY)):
        return None

+    llm_index = _llm_dict_index(profile, ctx)
+
    blocks = [
        model.Heading(text="Primeras filas (df.head)", level=2),
        _head_block(profile, ctx),
    ]
-    cols_block = _columns_block(profile)
+    cols_block = _columns_block(profile, llm_index)
    if cols_block is not None:
        blocks.append(model.Heading(
            text="Diccionario de columnas", level=2))
        blocks.append(cols_block)
-    desc_block = _describe_block(profile)
+    desc_block = _describe_block(profile, llm_index)
    if desc_block is not None:
        blocks.append(model.Heading(
            text="Resumen estadístico numérico", level=2))
@@ -56,7 +56,21 @@ def _head_rows() -> list:
    ]


-def _profile(with_head: bool = True) -> dict:
+def _llm() -> dict:
+    """Interpretive block as eda_llm_insights stores it under profile['llm']."""
+    return {
+        "summary": "Pasajeros del Titanic.",
+        "dictionary": [
+            {"column": "PassengerId", "description": "Identificador del pasajero",
+             "business_meaning": "Clave única de cada pasajero", "unit": "id"},
+            {"column": "Pclass", "description": "Clase del billete",
+             "business_meaning": "Clase socioeconómica", "unit": "clase (1-3)"},
+            # No entry for Survived/Name/Sex on purpose -> they degrade to "—".
+        ],
+    }
+
+
+def _profile(with_head: bool = True, with_llm: bool = False) -> dict:
    prof = {
        "table": "titanic",
        "source": "/data/titanic.csv",
@@ -68,6 +82,8 @@ def _profile(with_head: bool = True) -> dict:
    }
    if with_head:
        prof["head_rows"] = _head_rows()
+    if with_llm:
+        prof["llm"] = _llm()
    return prof


@@ -185,3 +201,70 @@ def test_edge_none_y_vacio_no_rompen():
    assert ch is not None
    tables = [b for b in _flatten(ch.blocks) if isinstance(b, DataTable)]
    assert tables and len(tables[0].rows) == 3
+
+
+def _table_by_header(blocks, marker: str):
+    """Return the first DataTable whose header contains ``marker``."""
+    for b in _flatten(blocks):
+        if isinstance(b, DataTable) and marker in b.header:
+            return b
+    return None
+
+
+def test_golden_diccionario_lleva_descripcion_y_unidad_del_llm():
+    # With run_llm: the column dictionary gains "Descripción" and "Unidad"
+    # columns populated from profile['llm']['dictionary'], matched by name.
+    ch = build_overview(_profile(with_llm=True), {})
+    assert ch is not None
+    dic = _table_by_header(ch.blocks, "Descripción")
+    assert dic is not None
+    assert dic.header == ["Columna", "Tipo", "Nulos", "Ejemplos (no nulos)",
+                          "Descripción", "Unidad"]
+    by_name = {row[0]: row for row in dic.rows}
+    # PassengerId has an LLM entry -> description + unit populated.
+    assert by_name["PassengerId"][4] == "Identificador del pasajero"
+    assert by_name["PassengerId"][5] == "id"
+    assert by_name["Pclass"][5] == "clase (1-3)"
+    # Columns with no LLM entry degrade to "—" without breaking the row.
+    assert by_name["Survived"][4] == "—" and by_name["Survived"][5] == "—"
+
+
+def test_golden_describe_lleva_unidad_del_llm():
+    ch = build_overview(_profile(with_llm=True), {})
+    desc = _table_by_header(ch.blocks, "std")
+    assert desc is not None
+    assert desc.header[-1] == "Unidad"
+    by_name = {row[0]: row for row in desc.rows}
+    assert by_name["PassengerId"][-1] == "id"
+    assert by_name["Pclass"][-1] == "clase (1-3)"
+    # Numeric column with no LLM unit still renders, unit "—".
+    assert by_name["Survived"][-1] == "—"
+
+
+def test_edge_sin_llm_descripcion_unidad_son_guion():
+    # No profile['llm'] at all: the new cells degrade to "—" and nothing breaks.
+    ch = build_overview(_profile(), {})
+    assert ch is not None
+    dic = _table_by_header(ch.blocks, "Unidad")
+    assert dic is not None
+    for row in dic.rows:
+        assert row[4] == "—" and row[5] == "—"
+    desc = _table_by_header(ch.blocks, "std")
+    assert all(row[-1] == "—" for row in desc.rows)
+
+
+def test_golden_llm_via_ctx_tambien_funciona():
+    # LLM block arriving through ctx['llm'] (fallback path) is consumed too.
+    ch = build_overview(_profile(with_llm=False), {"llm": _llm()})
+    dic = _table_by_header(ch.blocks, "Descripción")
+    by_name = {row[0]: row for row in dic.rows}
+    assert by_name["PassengerId"][5] == "id"
+
+
+def test_golden_render_pdf_muestra_descripcion_y_unidad():
+    with tempfile.TemporaryDirectory() as d:
+        out = os.path.join(d, "eda.pdf")
+        render_automatic_eda_pdf(_profile(with_llm=True), out, {"title": "EDA"})
+        txt = _pdf_text(out)
+        assert "Descripción" in txt and "Unidad" in txt
+        assert "Identificador del pasajero" in txt
@@ -26,7 +26,7 @@ from datetime import datetime, timezone

 from .. import model

-CHAPTER_VERSION = "1.2.0"
+CHAPTER_VERSION = "1.3.0"
 CHAPTER_ID = "portada"
 CHAPTER_TITLE = "Portada"

@@ -35,12 +35,9 @@ CHAPTER_TITLE = "Portada"
 # row represents) from it when the LLM layer ran (``run_llm``).
 _LLM_KEY = "llm"

-# Default human description of what the table quality score measures. Chapters
-# can override it via ctx["quality_criteria"].
-_DEFAULT_QUALITY_CRITERIA = (
-    "media de los scores por columna (0–100): completitud (sin nulos/vacíos), "
-    "validez (tipo y rango coherentes) y consistencia (sin duplicados/constantes)."
-)
+# Font size (pt) for the dataset name on the PPTX cover slide — notably larger
+# than the default H1 so the dataset name stands out (shown underlined too).
+_PPTX_TITLE_PT = 44.0


 def _storage_from_source(source: str) -> str:
@@ -120,7 +117,8 @@ def _summary_blocks(summary) -> list:

    blocks = [model.Heading(text="Resumen del análisis", level=2)]
    if rows:
-        blocks.append(model.KVTable(rows=rows))
+        # Values pinned to the right margin (numbers flush right, label left).
+        blocks.append(model.KVTable(rows=rows, value_align="right"))
    if titles:
        bullets = "\n".join(f"- {model._safe_str(t)}" for t in titles)
        blocks.append(model.Markdown(
@@ -213,9 +211,7 @@ def _derive_description(profile: dict, ctx: dict) -> str:
    score = profile.get("quality_score")
    if score is not None:
        parts.append(f"Calidad media estimada: {score}/100.")
-    parts.append(
-        "Resumen derivado del perfil; active la interpretación LLM (`run_llm`) "
-        "para una descripción de negocio más rica.")
+    parts.append("Resumen derivado del perfil.")
    return " ".join(parts)


@@ -259,7 +255,6 @@ def build_portada(profile: dict, ctx: dict):
    shape = f"{_fmt_int(n_rows)} filas × {_fmt_int(n_cols)} columnas"

    score = profile.get("quality_score")
-    quality_criteria = ctx.get("quality_criteria") or _DEFAULT_QUALITY_CRITERIA
    quality_value = "—" if score is None else f"{score} / 100"

    llm = _llm_block(profile, ctx)
@@ -282,8 +277,11 @@ def build_portada(profile: dict, ctx: dict):

    # Title + dataset size shown together and BIG (Heading) at the top, kept on
    # the same page (Group). The size is no longer buried in the metadata table.
+    # The dataset name is shown big and underlined on the PPTX cover slide
+    # (size_pt/underline are honoured by the PPTX renderer; the PDF ignores them).
    cover = [
-        model.Heading(text=str(dataset_name), level=1),
+        model.Heading(text=str(dataset_name), level=1, underline=True,
+                      size_pt=_PPTX_TITLE_PT),
        model.Markdown(text="**Automatic-EDA** · informe exploratorio automático"),
        model.Heading(text=shape, level=2),
    ]
@@ -295,7 +293,6 @@ def build_portada(profile: dict, ctx: dict):
            ("Almacenamiento", storage),
            ("Generado", when),
            ("Calidad", quality_value),
-            ("Criterios de calidad", quality_criteria),
        ]),
        model.Heading(text="Descripción", level=2),
        model.Markdown(text=str(description)),
@@ -34,6 +34,7 @@ CHAPTER_ORDER = [
    "text_distr",    # free-text / NLP distributions (non-tabular content)
    "calidad",       # data quality
    "missingness",   # missing-data patterns (co-occurrence of absences; MCAR/MAR)
+    "outliers",      # atypical values: univariate (Tukey/z) + multivariate (IsolationForest)
    "correlacion",   # correlations / associations
    "relaciones",    # key relations: declared/candidate PK + FK (inter/intra-table)
    "modelos",       # cheap models (PCA/KMeans/outliers)
@@ -72,24 +73,51 @@ def build_chapter(chapter_id: str, profile: dict, ctx: dict):
    return model.as_chapter(result)


-def build_document(profile: dict, ctx: dict = None) -> list:
-    """Build the full ordered list of chapters for a TableProfile.
+def build_document(profile: dict, ctx: dict = None, only: list = None) -> list:
+    """Build the ordered list of chapters for a TableProfile.

    Args:
        profile: the ``eda`` group TableProfile dict (may be None/empty).
        ctx: optional context dict carrying presentation metadata not present in
            the profile (dataset_name, source_origin, storage, generated_at,
            description, granularity, quality_criteria, head_rows, ...).
+        only: optional list of chapter ids to render. ``None`` (default) keeps
+            the historical behaviour — every implemented & applicable chapter in
+            canonical order. A list restricts the BODY to just those ids (in
+            canonical order), but the cover (``portada``) and glossary
+            (``glosario``) are ALWAYS included so the document stays valid and
+            the clickable terms keep a destination — so passing ``only=["x"]``
+            yields portada + x + glosario. Unknown ids are simply skipped (the
+            caller is responsible for strict validation). ``only=[]`` yields the
+            minimal document (portada + glosario only). This argument is additive
+            and backward-compatible: the signature is unchanged for existing
+            callers (default ``None``).

    Returns:
        list[Chapter] in canonical order, containing only the chapters that are
-        implemented and applicable. Never raises.
+        implemented, applicable and selected. Never raises.
    """
    if not isinstance(profile, dict):
        profile = {}
    # Copy ctx so the shared collector / summary we add do not leak to the caller.
    ctx = dict(ctx) if isinstance(ctx, dict) else {}

+    # only=None -> all body chapters (historical). only=list -> restrict body to
+    # that selection (portada/glosario are added unconditionally below). The
+    # renderers call build_document(profile, meta['ctx']) without an `only`
+    # argument, so the pipeline forwards the selection through a reserved ctx key
+    # (``_only_chapters``); an explicit `only` argument always wins. The key is
+    # popped from the local ctx copy so it never reaches the chapters.
+    if only is None:
+        _carried = ctx.pop("_only_chapters", None)
+        if isinstance(_carried, (list, tuple, set)):
+            only = list(_carried)
+    else:
+        ctx.pop("_only_chapters", None)
+    # A set makes the membership test cheap; the iteration order stays
+    # CHAPTER_ORDER. only=[] is a valid (empty) selection -> minimal document.
+    only_set = set(only) if isinstance(only, (list, tuple, set)) else None
+
    # A single glossary collector is shared by every chapter via ctx['glossary'].
    # Chapters call ctx['glossary'].add(key, label, definition) and mark in-text
    # appearances with [[term:key]]…[[/term]]; the glosario chapter renders the
@@ -105,6 +133,10 @@ def build_document(profile: dict, ctx: dict = None) -> list:
    for cid in CHAPTER_ORDER:
        if cid in (_PORTADA, _GLOSARIO):
            continue
+        # When a selection is given, skip body chapters outside it. portada and
+        # glosario are never filtered (handled out of this loop).
+        if only_set is not None and cid not in only_set:
+            continue
        ch = build_chapter(cid, profile, ctx)
        if ch is not None and ch.blocks:
            body.append(ch)
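Reviewer sketch of how the `only` selection resolves, reduced to chapter ids so it runs standalone. Assumptions: `select_chapters` is a hypothetical helper, the chapter list is shortened, and the real `build_document` additionally drops chapters that build to `None` or to no blocks.

```python
# Shortened, hypothetical registry order; the real CHAPTER_ORDER has more ids.
CHAPTER_ORDER = ["portada", "overview", "calidad", "outliers", "glosario"]
ALWAYS = ("portada", "glosario")


def select_chapters(ctx=None, only=None):
    """Return (ordered chapter ids, ctx copy without the reserved key)."""
    ctx = dict(ctx) if isinstance(ctx, dict) else {}
    # The reserved key is always popped so it never reaches the chapters.
    carried = ctx.pop("_only_chapters", None)
    # An explicit ``only`` wins; otherwise use the selection carried in ctx.
    if only is None and isinstance(carried, (list, tuple, set)):
        only = list(carried)
    only_set = set(only) if isinstance(only, (list, tuple, set)) else None
    body = [cid for cid in CHAPTER_ORDER
            if cid not in ALWAYS
            and (only_set is None or cid in only_set)]
    return ["portada", *body, "glosario"], ctx


full, _ = select_chapters()
picked, _ = select_chapters({}, only=["outliers", "unknown_id"])
via_ctx, clean_ctx = select_chapters({"_only_chapters": ["calidad"]})
minimal, _ = select_chapters({"_only_chapters": ["calidad"]}, only=[])
```

Iterating `CHAPTER_ORDER` and testing membership in a set means the caller's ordering of `only` never changes the document order, and unknown ids fall out without any special case.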
@@ -38,10 +38,18 @@ ENGINE_NAME = "AutomaticEDA"
 # --------------------------------------------------------------------------- #
 @dataclass
 class Heading:
-    """A section heading. ``level`` 1 (largest) .. 3 (smallest)."""
+    """A section heading. ``level`` 1 (largest) .. 3 (smallest).
+
+    ``underline`` and ``size_pt`` are optional emphasis hints honoured by the
+    PPTX renderer (the cover uses them to show the dataset name big and
+    underlined). ``size_pt`` overrides the per-level font size when set; the PDF
+    renderer ignores both so its layout is unchanged.
+    """

    text: str = ""
    level: int = 1
+    underline: bool = False
+    size_pt: Optional[float] = None
    kind: str = field(default="heading", init=False)


@@ -62,10 +70,17 @@ class Markdown:

 @dataclass
 class KVTable:
-    """A two-column key/value table. ``rows`` is a list of ``(label, value)``."""
+    """A two-column key/value table. ``rows`` is a list of ``(label, value)``.
+
+    ``value_align`` controls the horizontal alignment of the value column in the
+    PDF renderer: ``"left"`` (default) keeps values next to the label column;
+    ``"right"`` pins them to the right margin (used by the cover's analysis
+    summary so the numbers line up flush right).
+    """

    rows: list = field(default_factory=list)
    title: Optional[str] = None
+    value_align: str = "left"
    kind: str = field(default="kv_table", init=False)


@@ -210,13 +225,20 @@ def as_block(obj: Any):
        # Build only with fields the dataclass accepts (ignore extras).
        try:
            if cls is Heading:
+                size_pt = obj.get("size_pt")
                return Heading(text=_safe_str(obj.get("text")),
-                               level=int(obj.get("level", 1) or 1))
+                               level=int(obj.get("level", 1) or 1),
+                               underline=bool(obj.get("underline", False)),
+                               size_pt=(float(size_pt)
+                                        if isinstance(size_pt, (int, float))
+                                        else None))
            if cls is Markdown:
                return Markdown(text=_safe_str(obj.get("text")))
            if cls is KVTable:
                return KVTable(rows=list(obj.get("rows") or []),
-                               title=obj.get("title"))
+                               title=obj.get("title"),
+                               value_align=_safe_str(
+                                   obj.get("value_align")) or "left")
            if cls is DataTable:
                return DataTable(header=list(obj.get("header") or []),
                                 rows=list(obj.get("rows") or []),
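Reviewer sketch of the new `Heading` hints end to end: coercion from a dict (as in `as_block`) and the font-size override (as in the PPTX `_place_heading`). Assumptions: `heading_from_dict` and `effective_font_size` are hypothetical names, `str` stands in for `_safe_str`, and the per-level sizes are placeholder values since `_FS_H1`/`_FS_H2`/`_FS_H3` are not shown in the diff.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Heading:
    text: str = ""
    level: int = 1
    underline: bool = False
    size_pt: Optional[float] = None
    kind: str = field(default="heading", init=False)


def heading_from_dict(obj: dict) -> Heading:
    """Coerce a plain dict; non-numeric ``size_pt`` degrades to None."""
    size_pt = obj.get("size_pt")
    return Heading(
        text=str(obj.get("text") or ""),
        level=int(obj.get("level", 1) or 1),
        underline=bool(obj.get("underline", False)),
        size_pt=float(size_pt) if isinstance(size_pt, (int, float)) else None,
    )


# Placeholder per-level sizes (pt); the renderer's real constants may differ.
PER_LEVEL = {1: 24.0, 2: 18.0, 3: 14.0}


def effective_font_size(h: Heading) -> float:
    """``size_pt`` overrides the per-level size only when it is a positive number."""
    level = max(1, min(3, int(h.level or 1)))
    fs = PER_LEVEL[level]
    if isinstance(h.size_pt, (int, float)) and h.size_pt > 0:
        fs = float(h.size_pt)
    return fs


cover = heading_from_dict({"text": "titanic", "size_pt": 44, "underline": 1})
plain = heading_from_dict({"text": "Overview", "level": 2, "size_pt": "big"})
```

Using the same override rule in both the placing and the measuring path matters: if only one honoured `size_pt`, a Group's measured height would disagree with what is drawn.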
@@ -317,10 +317,18 @@ def _place_kv_table(st: _PdfState, block) -> None:
    if title:
        _place_heading(st, model.Heading(title, level=2))
    rows = getattr(block, "rows", []) or []
+    # ``value_align="right"`` pins the value column to the right margin (label
+    # left, number flush right) — used by the cover's analysis summary.
+    right = str(getattr(block, "value_align", "left")).lower() == "right"
    key_w = 1.9  # inches reserved for the label column.
+    # Both alignments wrap values against the same width (usable width minus
+    # the label column); only the anchor point and ``ha`` differ below.
    val_chars = tl.chars_per_line(_USABLE_W - key_w - 0.1, _FS_BODY)
    lh = tl.line_height_in(_FS_BODY)
-    for row in rows:
+    # ``data_idx`` is the 0-based row index. Odd indices (the 2nd, 4th, ... rows)
+    # are zebra-shaded, matching the data-table convention so every table in
+    # the document carries the same striping.
+    for data_idx, row in enumerate(rows):
        try:
            label, value = row[0], row[1]
        except Exception:  # noqa: BLE001
@@ -329,11 +337,25 @@ def _place_kv_table(st: _PdfState, block) -> None:
        row_h = lh * len(v_lines) + _ROW_VPAD
        _ensure_space(st, row_h)
        y0 = st.y
+        # Faint zebra fill for even rows, drawn first (zorder 0) so striping
+        # never hides the text/value drawn on top.
+        if data_idx % 2 == 1:
+            st.fig.add_artist(Rectangle(
+                (_xf(_ML), _yf(y0 + row_h)), _xf(_ML + _USABLE_W) - _xf(_ML),
+                _yf(y0) - _yf(y0 + row_h), transform=st.fig.transFigure,
+                color=_ZEBRA, lw=0, zorder=0))
        st.fig.text(_xf(_ML), _yf(y0), tl.strip_inline_md(model._safe_str(label)),
-                    fontsize=_FS_BODY, color=_MUTED, ha="left", va="top")
+                    fontsize=_FS_BODY, color=_MUTED, ha="left", va="top",
+                    zorder=2)
        for k, vl in enumerate(v_lines):
-            st.fig.text(_xf(_ML + key_w), _yf(y0 + k * lh), vl,
-                        fontsize=_FS_BODY, color=_INK, ha="left", va="top")
+            if right:
+                st.fig.text(_xf(_ML + _USABLE_W), _yf(y0 + k * lh), vl,
+                            fontsize=_FS_BODY, color=_INK, ha="right",
+                            va="top", zorder=2)
+            else:
+                st.fig.text(_xf(_ML + key_w), _yf(y0 + k * lh), vl,
+                            fontsize=_FS_BODY, color=_INK, ha="left",
+                            va="top", zorder=2)
        st.y = y0 + row_h
    st.y += _GAP
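Reviewer sketch of the per-row decisions this hunk makes (which rows get the zebra fill, and where the value text is anchored), without any drawing. Assumptions: `kv_row_layout` is a hypothetical helper and the margin and usable-width numbers are placeholders; only `key_w = 1.9` comes from the diff.

```python
def kv_row_layout(n_rows, value_align="left",
                  margin_left=0.75, usable_w=7.0, key_w=1.9):
    """Per-row layout decisions, in inches, for a two-column key/value table."""
    right = str(value_align).lower() == "right"
    # Right-aligned values anchor at the right margin; left-aligned values
    # anchor at the start of the value column.
    x = margin_left + usable_w if right else margin_left + key_w
    ha = "right" if right else "left"
    # 0-based odd indices (2nd, 4th, ... rows) carry the zebra fill.
    return [{"shaded": i % 2 == 1, "x": x, "ha": ha} for i in range(n_rows)]


summary = kv_row_layout(4, value_align="RIGHT")
default = kv_row_layout(2)
```

The fill is drawn with `zorder=0` and the text with `zorder=2` in the diff, so the shading decision can be made before anything is placed.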

@@ -135,7 +135,7 @@ def _ensure(st: _PptxState, height: float) -> None:


 def _add_text(st: _PptxState, lines: list, fs: float, color, bold=False,
-              italic=False, indent=0.0, bullet=False) -> None:
+              italic=False, indent=0.0, bullet=False, underline=False) -> None:
    lh = tl.line_height_in(fs)
    height = lh * len(lines) + 0.05
    _ensure(st, height)
@@ -153,6 +153,7 @@ def _add_text(st: _PptxState, lines: list, fs: float, color, bold=False,
        run.font.size = Pt(fs)
        run.font.bold = bold
        run.font.italic = italic
+        run.font.underline = underline
        run.font.color.rgb = _rgb(color)
    st.y += height

@@ -206,10 +207,16 @@ def _add_rich_text(st: _PptxState, rich_lines: list, fs: float, color,
 def _place_heading(st: _PptxState, block) -> None:
    level = max(1, min(3, int(getattr(block, "level", 1) or 1)))
    fs = {1: _FS_H1, 2: _FS_H2, 3: _FS_H3}[level]
+    # Optional per-heading emphasis (cover dataset name): a larger font and an
+    # underline. ``size_pt`` overrides the per-level size when set.
+    size_override = getattr(block, "size_pt", None)
+    if isinstance(size_override, (int, float)) and size_override > 0:
+        fs = float(size_override)
+    underline = bool(getattr(block, "underline", False))
    text = tl.strip_inline_md(getattr(block, "text", ""))
    st.last_heading = text or st.last_heading
    lines = tl.wrap(text, tl.chars_per_line(_USABLE_W, fs))
-    _add_text(st, lines, fs, _INK, bold=True)
+    _add_text(st, lines, fs, _INK, bold=True, underline=underline)
    st.y += 0.04


@@ -552,9 +559,11 @@ def _place_note(st: _PptxState, block) -> None:
 # WITHOUT drawing it so a Group can move whole to the next slide before drawing.
 # Over-estimating only triggers an earlier slide break, never a content cut.
 # --------------------------------------------------------------------------- #
-def _measure_heading_text(text: str, level: int) -> float:
+def _measure_heading_text(text: str, level: int, size_pt=None) -> float:
    level = max(1, min(3, int(level or 1)))
    fs = {1: _FS_H1, 2: _FS_H2, 3: _FS_H3}[level]
+    if isinstance(size_pt, (int, float)) and size_pt > 0:
+        fs = float(size_pt)
    lines = tl.wrap(tl.strip_inline_md(text), tl.chars_per_line(_USABLE_W, fs))
    return tl.line_height_in(fs) * len(lines) + 0.05 + 0.04

@@ -679,7 +688,8 @@ def _measure_block(st: _PptxState, block) -> float:
    try:
        if kind == "heading":
            return _measure_heading_text(getattr(block, "text", ""),
-                                         getattr(block, "level", 1))
+                                         getattr(block, "level", 1),
+                                         size_pt=getattr(block, "size_pt", None))
        if kind == "markdown":
            return _measure_markdown(block)
        if kind in ("figure", "image"):
@@ -0,0 +1,125 @@
+---
+id: build_boxplots_figure_py_datascience
+name: build_boxplots_figure
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def build_boxplots_figure(boxes: list, title: str = \"\", max_boxes: int = 12) -> \"matplotlib.figure.Figure\""
+description: "Builds a single matplotlib figure of HORIZONTAL Tukey boxplots (one per column) using ax.bxp: Q1-Q3 box, whiskers up to 1.5*IQR, median line and outlier points. Consumes the output of build_boxplot_stats (one box dict per column, read with .get) plus an optional list of raw outliers per column; when present they are drawn as points (showfliers), otherwise only box[min]/box[max] are marked when there are tail outliers (same as num_distr). Draws at most max_boxes boxes (the first ones, already sorted by contamination by the caller) and flags the truncation with (mostrando N de M). Agg backend with no global pyplot; height adapts to the number of boxes. Defensive: skips invalid entries and NEVER raises; with no valid boxes it returns a placeholder figure (sin boxplots). It is the small-multiples version of the num_distr chapter, answering at a glance which columns have the most outliers."
+tags: [eda, outliers, boxplot, tukey, iqr, bxp, matplotlib, figure, visualization, small-multiples, datascience, impure]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: [matplotlib]
+example: |
+  from datascience.build_boxplot_stats import build_boxplot_stats
+  from datascience.build_boxplots_figure import build_boxplots_figure
+  boxes = [
+      {"name": "ingresos", "box": build_boxplot_stats({"min": 1.0, "max": 9e3,
+          "p25": 1e3, "median": 2e3, "p75": 3e3, "n_outliers": 7}), "fliers": None},
+      {"name": "edad", "box": build_boxplot_stats({"min": 0.0, "max": 99.0,
+          "p25": 25.0, "median": 38.0, "p75": 52.0}), "fliers": None},
+  ]
+  fig = build_boxplots_figure(boxes, title="Outliers por columna", max_boxes=12)
+tested: true
+tests:
+  - "test_returns_figure_with_axes"
+  - "test_empty_list_returns_placeholder_figure"
+  - "test_invalid_box_is_skipped_not_raised"
+  - "test_all_invalid_returns_placeholder"
+  - "test_raw_fliers_are_drawn"
+  - "test_max_boxes_truncates_and_does_not_raise"
+test_file_path: "python/functions/datascience/build_boxplots_figure_test.py"
+file_path: "python/functions/datascience/build_boxplots_figure.py"
+params:
+  - name: boxes
+    desc: "List of dicts, each {\"name\": str, \"box\": dict, \"fliers\": list|None}. box is EXACTLY the output of build_boxplot_stats (keys read with .get: q1, median, q3, whisker_lo, whisker_hi, min, max, has_low_outliers, has_high_outliers, lower_fence, upper_fence, n_outliers). fliers is the optional list of raw outliers: when given it is drawn as points; when None/absent only the extremes box[min]/box[max] are marked if there are tail outliers. Entries that are not a dict, lack a box dict, or lack q1/median/q3 are skipped. The caller passes them already sorted by contamination (highest first)."
+  - name: title
+    desc: "Figure title (fig.suptitle, left-aligned). Empty => no title. If len(boxes) > max_boxes a note \"(mostrando N de M)\" is appended so the truncation is not silent. Default \"\"."
+  - name: max_boxes
+    desc: "Maximum number of boxes to draw (the first ones in the list). Default 12. A non-integer value or one <= 0 falls back to 12. If the list has more entries, the extras are dropped but reported in the title with (mostrando N de M)."
+output: "A matplotlib.figure.Figure (figsize 7.0 x adaptive height = max(2.0, 0.5*n + 1.0), dpi 150) with a single Axes stacking horizontal Tukey boxplots (ax.bxp, orientation=horizontal with fallback vert=False), one per valid column, top to bottom in the order received. Each box: fill #9ec6df, edge/whiskers/caps #5b8aa6, median #2e8b57, outliers #c0392b. Y-axis labels = column names; X axis labelled \"valor\". Outliers are drawn from raw fliers (showfliers) or, when missing, marked at box[min]/box[max] according to has_low/high_outliers. If no valid box remains (empty list or all invalid) it returns a placeholder Figure with centered text \"(sin boxplots)\"; any unexpected error is caught and returned as a Figure carrying the error message. NEVER raises. The caller rasterizes/closes the figure; the function neither shows nor saves it."
+---
+
+## Example
+
+```python
+import sys, os
+sys.path.insert(0, os.path.join("python", "functions"))
+from datascience.build_boxplot_stats import build_boxplot_stats
+from datascience.build_boxplots_figure import build_boxplots_figure
+
+# One `box` per numeric column, derived from the profile's `numeric` sub-block
+# (output of describe_numeric). The caller passes them sorted by outlier_pct.
+boxes = [
+    {
+        "name": "ingresos",
+        "box": build_boxplot_stats({
+            "min": 1.0, "max": 9000.0,
+            "p25": 1000.0, "median": 2000.0, "p75": 3000.0,
+            "n_outliers": 7,
+        }),
+        "fliers": None,  # raw values unknown -> only the extreme is marked.
+    },
+    {
+        "name": "edad",
+        "box": build_boxplot_stats({
+            "min": 0.0, "max": 99.0,
+            "p25": 25.0, "median": 38.0, "p75": 52.0,
+        }),
+        "fliers": [88.0, 95.0, 99.0],  # raw outliers -> drawn as points.
+    },
+]
+
+fig = build_boxplots_figure(boxes, title="Outliers por columna", max_boxes=12)
+
+# The report renderer rasterizes it; here we only persist it for inspection.
+fig.savefig("/tmp/boxplots.png")
+```
+
+## When to use it
+
+Use it in the outliers chapter of an EDA report when you want to compare at a
+glance *which columns are most contaminated by atypical values*: unlike
+`num_distr` (which draws one histogram+boxplot per column in separate
+figures), here you stack all the horizontal boxplots in **a single figure**
+(small multiples). First derive each column's `box` with
+`build_boxplot_stats`, sort them by `outlier_pct` descending, wrap them as
+`{"name", "box", "fliers"}` and pass them in. If you have the raw values beyond
+the fences, supply them in the `fliers` list and they are drawn as points;
+otherwise the function marks only the `min`/`max` extremes when there is a tail.
+
+## Gotchas
+
+- **Impure because of matplotlib.** It touches the rendering machinery. It uses
+  the `Agg` backend and the object-oriented `Figure`/`add_subplot` API, NEVER
+  `pyplot.*`, so it neither touches global state nor leaks figures between
+  calls. `pyplot` is NOT thread-safe; this function builds the `Figure`
+  directly, so it is safe to call in a loop from the renderer.
+- **El caller cierra la figura.** Devuelve el `Figure` pero no lo muestra ni lo
+  guarda. Quien la consume debe rasterizarla y luego liberarla
+  (`matplotlib.pyplot.close(fig)`) para no acumular memoria en lotes grandes.
+- **`fliers` opcional, semántica distinta.** Si pasas la lista de outliers
+  crudos se dibujan todos como puntos (`showfliers=True`). Si es `None`/ausente
+  los valores son desconocidos y solo se marca un punto en `box["min"]` /
+  `box["max"]` cuando `has_low_outliers` / `has_high_outliers` — mismo criterio
+  que `num_distr`. No inventes fliers a partir del profile: el `box` no trae los
+  valores crudos, solo si los extremos superan las vallas.
+- **API de orientación de `ax.bxp`.** matplotlib reciente usa
+  `orientation="horizontal"`; las versiones antiguas usan `vert=False`. La
+  función prueba la primera y cae a la segunda en `except TypeError`, así que
+  funciona en ambas. Si `bxp` falla del todo, el Axes degrada a un texto
+  "(boxplot no disponible)" en vez de propagar.
+- **Truncación visible.** `max_boxes` (default 12) limita el nº de cajas para que
+  ninguna se solape; si la lista trae más, las sobrantes se descartan pero se
+  avisa en el título con "(mostrando N de M)". Pasa las columnas ya ordenadas por
+  contaminación para que las descartadas sean las menos relevantes.
+- **Defensiva, nunca lanza.** Lista vacía, entradas no-dict, sin `box`, o sin
+  `q1`/`median`/`q3` se omiten sin propagar; sin cajas válidas devuelve un
+  placeholder "(sin boxplots)" y cualquier error inesperado se captura en una
+  figura con el texto del error. No envuelvas la llamada en try/except por miedo
+  a un raise — no lo hay.
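
The ordering and truncation advice can be sketched in plain Python. `split_for_max_boxes` is a hypothetical helper (not a registry function) that previews which columns a given `max_boxes` would keep; it assumes every entry is valid, whereas in the real function N counts only the boxes that survive filtering.

```python
def split_for_max_boxes(columns, max_boxes=12):
    """Sort (name, outlier_pct) pairs and split them at max_boxes.

    Hypothetical helper for illustration; returns (kept, dropped, note).
    """
    # Most contaminated first, so the dropped tail is the least relevant.
    ranked = sorted(columns, key=lambda c: c[1], reverse=True)
    kept, dropped = ranked[:max_boxes], ranked[max_boxes:]
    # Same wording as the title note emitted by build_boxplots_figure.
    note = f"(mostrando {len(kept)} de {len(ranked)})" if dropped else ""
    return kept, dropped, note


columns = [("edad", 0.4), ("ingresos", 7.2), ("score", 0.0), ("saldo", 3.1)]
kept, dropped, note = split_for_max_boxes(columns, max_boxes=3)
print([name for name, _ in kept])     # ['ingresos', 'saldo', 'edad']
print([name for name, _ in dropped])  # ['score']
print(note)                           # (mostrando 3 de 4)
```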
@@ -0,0 +1,250 @@
+"""Impure EDA helper: a single figure of horizontal Tukey boxplots (`eda` group).
+
+Draws, in one ``matplotlib.figure.Figure``, a stack of horizontal Tukey boxplots
+(one per column) using ``ax.bxp``: each carries its box (Q1–Q3), whiskers (up to
+1.5·IQR), the median line and its outlier points. It consumes the output of the
+pure registry function ``build_boxplot_stats`` (one ``box`` dict per column) plus
+an optional list of raw outlier values per column; it never recomputes anything.
+
+It is the "small-multiples" companion of ``num_distr`` (which draws one
+histogram+boxplot per column): here every column shares a single figure so the
+caller can show, at a glance, *which* columns are the most contaminated by
+outliers (the caller passes them already ordered by contamination).
+
+Impure because it touches matplotlib's rendering machinery. It uses the headless
+Agg backend and the object-oriented ``Figure`` API (no ``pyplot``) so it leaks no
+global state and is safe to call repeatedly from a report renderer. It is fully
+defensive and NEVER raises: invalid entries are skipped and, if nothing valid
+remains, it returns a placeholder figure carrying a centered "(sin boxplots)".
+"""
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+from matplotlib.figure import Figure  # noqa: E402
+
+# Blue palette shared with the ``num_distr`` chapter so the report stays coherent.
+_BOX_FACE = "#9ec6df"   # box fill.
+_BOX_EDGE = "#5b8aa6"   # box / whisker / cap border.
+_MEDIAN = "#2e8b57"     # median line (sea green).
+_OUTLIER = "#c0392b"    # outlier points (soft red).
+# Muted gray for the placeholder / fallback message text.
+_MUTED_TEXT = "#5f6b7a"
+# Soft red for the error fallback message.
+_ERROR_TEXT = "#b00020"
+
+
+def _num(value):
+    """Coerce ``value`` to float defensively; None for None/bool/non-numeric/NaN."""
+    # bool is a subclass of int; a stat value is never a real bool, so treat
+    # True/False as missing instead of silently coercing to 1.0/0.0.
+    if value is None or isinstance(value, bool):
+        return None
+    try:
+        f = float(value)
+    except (TypeError, ValueError):
+        return None
+    if f != f:  # NaN guard.
+        return None
+    return f
+
+
+def _placeholder_figure(message: str, color: str = _MUTED_TEXT) -> "Figure":
+    """Return a fallback ``Figure`` carrying a single centered message."""
+    fig = Figure(figsize=(7.0, 2.4), dpi=150)
+    ax = fig.add_subplot(111)
+    ax.axis("off")
+    ax.text(
+        0.5,
+        0.5,
+        message,
+        ha="center",
+        va="center",
+        fontsize=12,
+        color=color,
+        wrap=True,
+        transform=ax.transAxes,
+    )
+    fig.tight_layout()
+    return fig
+
+
+def build_boxplots_figure(
+    boxes: list,
+    title: str = "",
+    max_boxes: int = 12,
+) -> "matplotlib.figure.Figure":
+    """Build one figure of stacked horizontal Tukey boxplots (one per column).
+
+    For each entry the function builds a ``bxp`` stats record (``med, q1, q3,
+    whislo, whishi, fliers, label``) from its ``box`` sub-dict (the output of
+    ``build_boxplot_stats``) and draws all of them as horizontal boxplots sharing
+    the X axis, top-to-bottom in the order received (the caller is expected to
+    pass them already sorted by contamination).
+
+    Outliers are shown two ways:
+
+    - If an entry carries a ``fliers`` list (the raw out-of-fence values), they
+      are drawn as red points via ``ax.bxp(..., showfliers=True)``.
+    - If ``fliers`` is ``None``/absent, the raw values are unknown, so only the
+      extremes are marked: a red point at ``box["min"]`` when
+      ``box["has_low_outliers"]`` and at ``box["max"]`` when
+      ``box["has_high_outliers"]`` (same convention as ``num_distr``).
+
+    The function is fully defensive and NEVER raises. Entries that are not dicts,
+    lack a ``box`` dict, or miss any of ``q1``/``median``/``q3`` are skipped. If
+    after filtering no valid box remains it returns a placeholder ``Figure`` with
+    a centered "(sin boxplots)"; any unexpected error is caught and turned into a
+    fallback figure carrying the error text. It always returns a ``Figure``.
+
+    Args:
+        boxes: List of dicts ``{"name": str, "box": dict, "fliers": list|None}``.
+            ``box`` is exactly the output of ``build_boxplot_stats`` (read with
+            ``.get``: ``q1, median, q3, whisker_lo, whisker_hi, min, max,
+            has_low_outliers, has_high_outliers, ...``). ``fliers`` is the
+            optional list of raw outlier values; when present they are plotted,
+            otherwise only the extremes are marked.
+        title: Figure title (``fig.suptitle``). Empty => no title. When the list
+            is longer than ``max_boxes`` a "(mostrando N de M)" note is appended.
+        max_boxes: Draw at most the first ``max_boxes`` entries (default 12). The
+            rest are dropped but their omission is surfaced in the title note, so
+            the truncation is never silent.
+
+    Returns:
+        A ``matplotlib.figure.Figure`` with a single Axes holding the horizontal
+        boxplots (height adaptive to the box count so none overlap). The caller is
+        responsible for rasterizing/closing it; this function never shows nor
+        saves it.
+    """
+    try:
+        if not isinstance(boxes, (list, tuple)) or len(boxes) == 0:
+            return _placeholder_figure("(sin boxplots)")
+
+        total = len(boxes)
+
+        # Cap the number of boxes; tolerate a non-int / non-positive max_boxes.
+        try:
+            cap = int(max_boxes)
+        except (TypeError, ValueError):
+            cap = 12
+        if cap <= 0:
+            cap = 12
+        candidates = list(boxes)[:cap]
+
+        stats_list = []        # bxp stats records, in draw order.
+        labels = []            # Y tick labels (column names).
+        manual_markers = []    # (position, box) for entries without raw fliers.
+        any_fliers = False     # whether to enable showfliers in the bxp call.
+
+        for entry in candidates:
+            if not isinstance(entry, dict):
+                continue
+            box = entry.get("box")
+            if not isinstance(box, dict):
+                continue
+
+            q1 = _num(box.get("q1"))
+            med = _num(box.get("median"))
+            q3 = _num(box.get("q3"))
+            # Without the three quartiles a boxplot cannot be drawn — skip it.
+            if q1 is None or med is None or q3 is None:
+                continue
+
+            # Whisker extremes fall back to the quartiles when missing.
+            whislo = _num(box.get("whisker_lo"))
+            whishi = _num(box.get("whisker_hi"))
+            if whislo is None:
+                whislo = q1
+            if whishi is None:
+                whishi = q3
+
+            name = entry.get("name")
+            label = "" if name is None else str(name)
+
+            position = len(stats_list) + 1  # bxp positions are 1-indexed.
+            fliers_raw = entry.get("fliers")
+            if isinstance(fliers_raw, (list, tuple)):
+                fliers = [v for v in (_num(x) for x in fliers_raw) if v is not None]
+                if fliers:
+                    any_fliers = True
+            else:
+                # Raw values unknown: draw no bxp fliers, mark min/max by hand.
+                fliers = []
+                manual_markers.append((position, box))
+
+            stats_list.append({
+                "med": med,
+                "q1": q1,
+                "q3": q3,
+                "whislo": whislo,
+                "whishi": whishi,
+                "fliers": fliers,
+                "label": label,
+            })
+            labels.append(label)
+
+        if not stats_list:
+            return _placeholder_figure("(sin boxplots)")
+
+        n = len(stats_list)
+        positions = list(range(1, n + 1))
+
+        # Height grows with the box count so none of them overlap.
+        height = max(2.0, 0.5 * n + 1.0)
+        fig = Figure(figsize=(7.0, height), dpi=150)
+        ax = fig.add_subplot(111)
+
+        bxp_kw = dict(
+            showfliers=any_fliers, widths=0.5, patch_artist=True,
+            boxprops={"facecolor": _BOX_FACE, "edgecolor": _BOX_EDGE},
+            medianprops={"color": _MEDIAN, "linewidth": 1.6},
+            whiskerprops={"color": _BOX_EDGE},
+            capprops={"color": _BOX_EDGE},
+            flierprops={"marker": "o", "markersize": 3.5,
+                        "markerfacecolor": _OUTLIER, "markeredgecolor": _OUTLIER,
+                        "linestyle": "none"})
+        try:
+            # ``orientation`` is the current API; older matplotlib uses ``vert``.
+            try:
+                ax.bxp(stats_list, positions=positions,
+                       orientation="horizontal", **bxp_kw)
+            except TypeError:
+                ax.bxp(stats_list, positions=positions, vert=False, **bxp_kw)
+        except Exception:  # noqa: BLE001 — never let bxp kill the whole figure.
+            ax.text(0.5, 0.5, "(boxplot no disponible)", ha="center",
+                    va="center", fontsize=10, color=_MUTED_TEXT,
+                    transform=ax.transAxes)
+
+        # For entries without raw fliers, mark only the out-of-fence extremes.
+        for position, box in manual_markers:
+            mn = _num(box.get("min"))
+            mx = _num(box.get("max"))
+            if box.get("has_low_outliers") and mn is not None:
+                ax.plot([mn], [position], marker="o", markersize=3.5,
+                        color=_OUTLIER, zorder=5)
+            if box.get("has_high_outliers") and mx is not None:
+                ax.plot([mx], [position], marker="o", markersize=3.5,
+                        color=_OUTLIER, zorder=5)
+
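+        # Pin the Y tick labels explicitly so they work across matplotlib
+        # versions regardless of whether ``bxp`` consumed the ``label`` key.
+        ax.set_yticks(positions)
+        ax.set_yticklabels(labels, fontsize=8)
+        ax.set_xlabel("valor", fontsize=9)
+        ax.tick_params(labelsize=7)
+        ax.margins(y=0.15)
+        # Position 1 sits at the bottom of a default Y axis; invert it so the
+        # first entry received is drawn at the top, as the docstring promises.
+        ax.invert_yaxis()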
+        for spine in ("top", "right"):
+            ax.spines[spine].set_visible(False)
+
+        # Surface truncation in the title instead of silently dropping boxes.
+        note = f"(mostrando {n} de {total})" if total > cap else ""
+        heading = "  ".join(p for p in (title, note) if p)
+        if heading:
+            fig.suptitle(heading, fontsize=12, x=0.02, ha="left")
+
+        fig.tight_layout()
+        return fig
+    except Exception as exc:  # noqa: BLE001 — never raise from a figure builder.
+        return _placeholder_figure(
+            f"error al dibujar boxplots: {exc}", color=_ERROR_TEXT)
@@ -0,0 +1,109 @@
+"""Tests for build_boxplots_figure (horizontal Tukey boxplots, `eda` group).
+
+Uses the headless Agg backend; figures are neither shown nor saved. Each test
+explicitly calls matplotlib.pyplot.close on the Figure it built so that no
+state accumulates between tests.
+"""
+
+import matplotlib
+
+matplotlib.use("Agg")
+
+import matplotlib.pyplot as plt  # noqa: E402
+from matplotlib.figure import Figure  # noqa: E402
+
+from build_boxplots_figure import build_boxplots_figure
+
+
+def _box(name, q1, median, q3, mn, mx, low=False, high=False, fliers=None):
+    """Construye una entrada {name, box, fliers} con un box estilo build_boxplot_stats."""
+    iqr = q3 - q1
+    return {
+        "name": name,
+        "box": {
+            "q1": q1,
+            "median": median,
+            "q3": q3,
+            "iqr": iqr,
+            "lower_fence": q1 - 1.5 * iqr,
+            "upper_fence": q3 + 1.5 * iqr,
+            "whisker_lo": max(mn, q1 - 1.5 * iqr),
+            "whisker_hi": min(mx, q3 + 1.5 * iqr),
+            "min": mn,
+            "max": mx,
+            "has_low_outliers": low,
+            "has_high_outliers": high,
+            "n_outliers": 0,
+        },
+        "fliers": fliers,
+    }
+
+
+def test_returns_figure_with_axes():
+    boxes = [
+        _box("edad", 10.0, 25.0, 40.0, 1.0, 100.0, high=True),
+        _box("ingresos", 100.0, 200.0, 300.0, 50.0, 400.0),
+        _box("score", -1.0, 0.0, 1.0, -5.0, 5.0, low=True, high=True),
+    ]
+    fig = build_boxplots_figure(boxes, title="Boxplots", max_boxes=12)
+    assert isinstance(fig, Figure)
+    assert len(fig.axes) >= 1
+    # Tres cajas -> tres etiquetas en el eje Y.
+    ax = fig.axes[0]
+    assert len(ax.get_yticks()) == 3
+    plt.close(fig)
+
+
+def test_empty_list_returns_placeholder_figure():
+    fig = build_boxplots_figure([], title="vacío")
+    assert isinstance(fig, Figure)
+    assert len(fig.axes) >= 1
+    plt.close(fig)
+
+
+def test_invalid_box_is_skipped_not_raised():
+    boxes = [
+        {"name": "rota", "box": {"q1": None, "median": None, "q3": None}},
+        {"name": "sin_box"},                         # falta la clave box.
+        "no_es_dict",                                 # entrada no-dict.
+        _box("buena", 1.0, 2.0, 3.0, 0.0, 10.0, high=True),
+    ]
+    fig = build_boxplots_figure(boxes)
+    assert isinstance(fig, Figure)
+    ax = fig.axes[0]
+    # Solo la caja válida sobrevive al filtrado.
+    assert len(ax.get_yticks()) == 1
+    plt.close(fig)
+
+
+def test_all_invalid_returns_placeholder():
+    boxes = [
+        {"name": "a", "box": {"q1": None, "median": 1.0, "q3": 2.0}},
+        {"name": "b"},
+    ]
+    fig = build_boxplots_figure(boxes)
+    assert isinstance(fig, Figure)
+    assert len(fig.axes) >= 1
+    plt.close(fig)
+
+
+def test_raw_fliers_are_drawn():
+    boxes = [
+        _box("con_fliers", 10.0, 20.0, 30.0, 5.0, 200.0,
+             high=True, fliers=[150.0, 180.0, 200.0]),
+    ]
+    fig = build_boxplots_figure(boxes)
+    assert isinstance(fig, Figure)
+    assert len(fig.axes) >= 1
+    plt.close(fig)
+
+
+def test_max_boxes_truncates_and_does_not_raise():
+    boxes = [_box(f"c{i}", float(i), float(i + 1), float(i + 2),
+                  float(i - 5), float(i + 10)) for i in range(20)]
+    fig = build_boxplots_figure(boxes, title="muchos", max_boxes=5)
+    assert isinstance(fig, Figure)
+    ax = fig.axes[0]
+    # Solo se dibujan las primeras 5 cajas.
+    assert len(ax.get_yticks()) == 5
+    plt.close(fig)
@@ -0,0 +1,79 @@
+---
+name: summarize_outlier_dims
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: pure
+signature: "def summarize_outlier_dims(raw_numeric: dict, outlier_rows: list, top_k: int = 3) -> list"
+description: "Explica QUE columnas hacen rara cada fila anomala detectada por isolation_forest_outliers. Para cada {row_index, score} reconstruye la fila valida (mismo filtro de columnas numericas y mismo descarte de filas con None que el detector, asi row_index coincide) y devuelve las top_k columnas de mayor |z-score| poblacional (ddof=0). Capa de explicabilidad del paso de outliers multivariante en EDA. Pura y determinista; ante entradas vacias/invalidas o sin filas validas devuelve [] sin petar."
+tags: [eda, models, outliers, anomaly-detection, explainability, z-score, multivariate]
+params:
+  - name: raw_numeric
+    desc: "dict {nombre_columna: [valores]} alineado por fila (como ctx['raw_numeric'] del motor AutomaticEDA). Solo se usan columnas con todos los valores numericos (None permitido por fila; bool/str/NaN/Inf descartan la columna entera) — filtro IDENTICO al de isolation_forest_outliers para que row_index coincida."
+  - name: outlier_rows
+    desc: "Lista de {row_index, score} tal cual la devuelve isolation_forest_outliers. row_index cuenta SOLO las filas validas (sin None) en orden de aparicion, base 0. Entradas fuera de rango o malformadas se ignoran defensivamente."
+  - name: top_k
+    desc: "Numero de columnas (las de mayor |z-score|) a reportar por outlier. Default 3. Valores invalidos (no-int, bool, <1) caen a 3."
+output: "Lista paralela a outlier_rows (mismo orden) de dicts {row_index: int, score: float, dims: [{col: str, value: float, z: float}, ...]}. dims trae hasta top_k columnas ordenadas por |z| descendente, con z (z-score poblacional, ddof=0) redondeado a 3 decimales; si una columna tiene std==0 su z es 0. Las entradas de outlier_rows fuera de rango/malformadas se omiten. Ante raw_numeric vacio/no-dict, outlier_rows no-lista, 0 columnas numericas o 0 filas validas devuelve []."
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: ""
+imports: []
+tested: true
+tests: ["test_row_index_skips_none_rows", "test_extreme_row_flagged_via_isolation", "test_out_of_range_row_index_is_ignored", "test_degrades_to_empty_on_invalid_inputs"]
+test_file_path: "python/functions/datascience/summarize_outlier_dims_test.py"
+file_path: "python/functions/datascience/summarize_outlier_dims.py"
+---
+
+## Ejemplo
+
+```python
+from datascience import isolation_forest_outliers, summarize_outlier_dims
+
+# Dense cloud around the origin + 1 row with an extreme value in "c".
+raw_numeric = {
+    "a": [0.1, 0.2, -0.1, 0.0, 0.3, -0.2, 0.15, -0.05, 0.25, 0.2, -0.3, 0.1],
+    "b": [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.1, 0.95, 1.05, 0.9, 1.15, 1.0],
+    "c": [5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 4.95, 5.05, 4.9, 500.0, 5.1, 5.0],
+}
+
+result = isolation_forest_outliers(raw_numeric, contamination=0.1)
+summary = summarize_outlier_dims(raw_numeric, result["outlier_rows"], top_k=3)
+
+for item in summary:
+    top = item["dims"][0]
+    print(item["row_index"], top["col"], top["value"], top["z"])
+# The row holding 500 comes out with top dim "c" and a high |z|: that is what makes it odd.
+```
+
+## Cuando usarla
+
+Use it right **after** `isolation_forest_outliers`, once you know WHICH rows are
+anomalous and want to explain WHY: in which columns they deviate most from the
+rest. It is useful for filling the outliers section of an EDA report/notebook
+with "row 9 is odd mainly because of `c` (z=+3.3)" instead of a bare, opaque
+row_index. Pass the same `raw_numeric` you gave the detector and its
+`outlier_rows` untouched; `row_index` points at the same row because both
+functions apply the same column filter and discard the same rows with None.
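
A minimal sketch of the ranking idea, using only the standard library: `top_dims` is a hypothetical helper, not the registry implementation. It assumes every column is already clean (no `None`, no column filtering) and uses the population standard deviation (`ddof=0`).

```python
from statistics import fmean, pstdev


def top_dims(columns, row, top_k=3):
    """Rank the columns of one row by |population z-score| (ddof=0)."""
    dims = []
    for name, values in columns.items():
        mean = fmean(values)
        std = pstdev(values)  # population standard deviation (ddof=0)
        # A constant column has std == 0 and can never look odd: z is 0.
        z = 0.0 if std == 0.0 else (values[row] - mean) / std
        dims.append({"col": name, "value": float(values[row]), "z": z})
    # Largest |z| first; the sort is stable, so ties keep column order.
    dims.sort(key=lambda d: abs(d["z"]), reverse=True)
    return [{**d, "z": round(d["z"], 3)} for d in dims[:top_k]]


columns = {
    "a": [0.0, 0.0, 0.0, 0.0, 0.0],
    "c": [5.0, 5.0, 5.0, 5.0, 105.0],
}
print(top_dims(columns, row=4))
# [{'col': 'c', 'value': 105.0, 'z': 2.0}, {'col': 'a', 'value': 0.0, 'z': 0.0}]
```

A single extreme value among `n` otherwise identical values has a population z-score of `sqrt(n - 1)`, which is why the 12-row example in this document reports roughly 3.3.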
+
+## Gotchas
+
+- **Same `raw_numeric` as the detector**: `row_index` only matches if you pass
+  the same columns and the same row-aligned lists you gave to
+  `isolation_forest_outliers`. Adding or removing a column, or reordering the
+  rows, can change which rows are discarded for `None` and shifts the indices.
+  Column order alone does not affect `row_index`.
+- **`row_index` is relative to the valid rows**: rows with `None` in any used
+  column are discarded and the indices are recomputed over the remaining rows
+  (0-based, in order of appearance). It does not map 1:1 to the input lists when
+  there are None values.
+- **Population z-score (ddof=0)**: the population standard deviation is used,
+  consistent with the detector's scaling. Columns with `std==0` (all values
+  equal) give `z=0`, so they never show up as "odd".
+- **Returns `[]` instead of blowing up**: non-dict/non-list input, 0 numeric
+  columns, 0 valid rows, or every entry out of range -> empty list. It raises
+  no exceptions.
+- **Does not call `isolation_forest_outliers`**: it only consumes its output. It
+  is an independent function (it does not import it), which is why
+  `uses_functions` is empty.
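
To report an outlier against the original table, the valid `row_index` has to be mapped back to an input position. A minimal sketch, assuming every column in `raw_numeric` is one the detector would keep; `valid_to_original_index` is a hypothetical helper, not a registry function.

```python
def valid_to_original_index(raw_numeric):
    """Return the original row positions of the valid (None-free) rows.

    Hypothetical helper: assumes every column in raw_numeric is one the
    detector would keep (numeric values, None allowed).
    """
    # Rows are aligned by position; use the shortest column as the row count.
    n_rows = min((len(v) for v in raw_numeric.values()), default=0)
    mapping = []
    for i in range(n_rows):
        # A row is valid only if no used column holds None at that position.
        if all(values[i] is not None for values in raw_numeric.values()):
            mapping.append(i)
    return mapping


raw = {
    "a": [0.1, None, 0.3, 0.2],
    "c": [5.0, 5.1, 4.9, 500.0],
}
mapping = valid_to_original_index(raw)
print(mapping)     # [0, 2, 3]
print(mapping[2])  # 3 -> valid row_index 2 is original row 3
```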
@@ -0,0 +1,144 @@
+"""Explain which dimensions (columns) make each anomalous row odd.
+
+Takes the multivariate output of `isolation_forest_outliers` (a list of
+`{row_index, score}`) and, for each outlier, returns the columns with the
+largest |z-score| relative to the distribution of the valid rows. It is the
+"explainability" layer of the multivariate outlier step in the EDA phase: the
+Isolation Forest says WHICH rows are odd, this function says WHY (in which
+columns they deviate most).
+
+Pure and deterministic: it rebuilds EXACTLY the same "valid rows" that
+`isolation_forest_outliers` uses (same numeric-column filter and same discard of
+rows with None), so `row_index` points at the same row in both functions. It
+does no I/O and depends on no state.
+"""
+
+import math
+
+import numpy as np
+
+
+def _is_finite_number(v) -> bool:
+    """True if v is a finite int/float. bool does NOT count; neither do NaN/Inf."""
+    if isinstance(v, bool):
+        return False
+    if not isinstance(v, (int, float)):
+        return False
+    if isinstance(v, float) and (math.isnan(v) or math.isinf(v)):
+        return False
+    return True
+
+
+def summarize_outlier_dims(
+    raw_numeric: dict,
+    outlier_rows: list,
+    top_k: int = 3,
+) -> list:
+    """Summarize the dimensions that deviate most for each anomalous row.
+
+    Args:
+        raw_numeric: dict {column_name: [values]} aligned by row (like
+            ctx['raw_numeric'] from the AutomaticEDA engine). Only columns whose
+            values are all numeric are used (None is allowed per row; bool, str,
+            NaN and Inf discard the whole column), the same filter as
+            isolation_forest_outliers.
+        outlier_rows: list of {row_index, score} as returned by
+            isolation_forest_outliers. row_index counts ONLY the valid rows
+            (no None) in order of appearance, starting at 0.
+        top_k: number of columns (those with the largest |z-score|) to report
+            per outlier. Default 3. Invalid values fall back to 3.
+
+    Returns:
+        List in the same order as outlier_rows of dicts
+        {row_index, score, dims}, where dims holds up to top_k columns sorted by
+        |z| descending: [{col, value, z}, ...] with z rounded to 3 decimals.
+        Out-of-range or malformed outlier_rows entries are omitted (defensive),
+        so the result can be shorter than outlier_rows. With an empty/non-dict
+        raw_numeric, a non-list outlier_rows, 0 numeric columns or 0 valid rows
+        it returns [].
+    """
+    # Defensive validation of the main arguments.
+    if not isinstance(raw_numeric, dict) or not isinstance(outlier_rows, list):
+        return []
+    if not isinstance(top_k, int) or isinstance(top_k, bool) or top_k < 1:
+        top_k = 3
+
+    # Numeric column selection: identical to isolation_forest_outliers.
+    # A column is kept only if all its values are numeric (None allowed per
+    # row); any bool/str/NaN/Inf discards the whole column.
+    numeric_cols: dict[str, list] = {}
+    for name, values in raw_numeric.items():
+        if not isinstance(values, (list, tuple)):
+            continue
+        ok = True
+        for v in values:
+            if v is None:
+                continue
+            if not _is_finite_number(v):
+                ok = False
+                break
+        if ok:
+            numeric_cols[name] = list(values)
+
+    if len(numeric_cols) < 1:
+        return []
+
+    col_names = list(numeric_cols.keys())
+    try:
+        n_rows_total = min(len(numeric_cols[c]) for c in col_names)
+    except ValueError:
+        return []
+
+    # Rebuild the valid rows with the SAME criterion as the detector: row i
+    # takes one value per column; if any value is None, the row is discarded
+    # and does NOT advance the valid index. That way row_index in outlier_rows
+    # points into this same sequence (0-based, in order of appearance).
+    valid_rows: list[list[float]] = []
+    for i in range(n_rows_total):
+        row = [numeric_cols[c][i] for c in col_names]
+        if any(v is None for v in row):
+            continue
+        valid_rows.append([float(v) for v in row])
+
+    if not valid_rows:
+        return []
+
+    matrix = np.asarray(valid_rows, dtype=float)
+    n_valid = matrix.shape[0]
+    means = matrix.mean(axis=0)
+    stds = matrix.std(axis=0, ddof=0)  # population std (ddof=0)
+
+    out: list = []
+    for entry in outlier_rows:
+        if not isinstance(entry, dict):
+            continue
+        ri = entry.get("row_index")
+        # bool is a subclass of int: exclude it explicitly.
+        if not isinstance(ri, int) or isinstance(ri, bool):
+            continue
+        if ri < 0 or ri >= n_valid:
+            continue
+
+        try:
+            score = float(entry.get("score"))
+        except (TypeError, ValueError):
+            score = 0.0
+
+        row = matrix[ri]
+        dims = []
+        for j, name in enumerate(col_names):
+            std = stds[j]
+            if std == 0.0:
+                z = 0.0
+            else:
+                z = float((row[j] - means[j]) / std)
+            dims.append({"col": name, "value": float(row[j]), "z": z})
+
+        # Largest |z| first; stable sort, ties keep column order.
+        dims.sort(key=lambda d: abs(d["z"]), reverse=True)
+        dims = dims[:top_k]
+        for d in dims:
+            d["z"] = round(d["z"], 3)
+
+        out.append({"row_index": int(ri), "score": score, "dims": dims})
+
+    return out
@@ -0,0 +1,93 @@
+"""Tests for summarize_outlier_dims."""
+
+from isolation_forest_outliers import isolation_forest_outliers
+from summarize_outlier_dims import summarize_outlier_dims
+
+
+# Shared dataset: 3 columns, 13 rows. ORIGINAL row 6 has None in "a" (it is
+# discarded), so ORIGINAL row 10, which holds an extreme value in "c", lands at
+# VALID index 9 (not 10). This verifies that None rows are skipped.
+A = [0.1, 0.2, -0.1, 0.0, 0.3, -0.2, None, 0.15, -0.05, 0.25, 0.2, -0.3, 0.1]
+B = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.3, 1.1, 0.95, 1.05, 0.9, 1.15, 1.0]
+C = [5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 5.3, 4.95, 5.05, 4.9, 500.0, 5.1, 5.0]
+RAW = {"a": A, "b": B, "c": C}
+
+# Original -> valid map (skipping original 6):
+#   orig: 0 1 2 3 4 5 7 8 9 10 11 12
+#  valid: 0 1 2 3 4 5 6 7 8  9 10 11
+# => the extreme in "c" (original 10) sits at valid index 9.
+EXTREME_VALID_INDEX = 9
+
+
+def test_row_index_skips_none_rows():
+    # Mapeo directo (sin depender de la aleatoriedad de IsolationForest): el
+    # indice valido 9 debe corresponder a la fila con c == 500 -> el None de la
+    # fila original 6 se salto correctamente.
+    summary = summarize_outlier_dims(
+        RAW, [{"row_index": EXTREME_VALID_INDEX, "score": -0.5}], top_k=3
+    )
+    assert len(summary) == 1
+    entry = summary[0]
+    assert entry["row_index"] == EXTREME_VALID_INDEX
+    assert entry["score"] == -0.5
+    # La dimension dominante es "c", con su valor extremo y |z| alto.
+    top = entry["dims"][0]
+    assert top["col"] == "c"
+    assert top["value"] == 500.0
+    assert abs(top["z"]) > 2.0
+    # top_k respetado: como mucho 3 dims.
+    assert len(entry["dims"]) <= 3
+
+
+def test_extreme_row_flagged_via_isolation():
+    # Integracion real: detectar outliers y explicarlos.
+    result = isolation_forest_outliers(RAW, contamination=0.1)
+    assert "note" not in result
+    outlier_rows = result["outlier_rows"]
+    assert outlier_rows  # al menos un outlier
+
+    summary = summarize_outlier_dims(RAW, outlier_rows, top_k=3)
+    # Paralela a outlier_rows (todos los indices estan en rango).
+    assert len(summary) == len(outlier_rows)
+
+    by_index = {e["row_index"]: e for e in summary}
+    # El punto extremo debe estar entre los outliers detectados...
+    assert EXTREME_VALID_INDEX in by_index
+    # ...y su dimension top debe ser "c" (donde se desvia ~muchas sigmas).
+    extreme = by_index[EXTREME_VALID_INDEX]
+    assert extreme["dims"][0]["col"] == "c"
+    assert abs(extreme["dims"][0]["z"]) > 2.0
+
+
+def test_out_of_range_row_index_is_ignored():
+    # Indices fuera de rango se omiten en lugar de petar.
+    summary = summarize_outlier_dims(
+        RAW,
+        [
+            {"row_index": 999, "score": -1.0},
+            {"row_index": -1, "score": -1.0},
+            {"row_index": EXTREME_VALID_INDEX, "score": -0.5},
+        ],
+        top_k=2,
+    )
+    # Solo sobrevive el indice valido; los otros dos se descartan.
+    assert len(summary) == 1
+    assert summary[0]["row_index"] == EXTREME_VALID_INDEX
+    assert len(summary[0]["dims"]) <= 2
+
+
+def test_degrades_to_empty_on_invalid_inputs():
+    # raw_numeric vacio + outlier_rows vacio.
+    assert summarize_outlier_dims({}, [], 3) == []
+    # raw_numeric no es dict.
+    assert summarize_outlier_dims("not a dict", [{"row_index": 0}], 3) == []
+    # outlier_rows no es lista.
+    assert summarize_outlier_dims(RAW, "not a list", 3) == []
+    # Sin columnas numericas (todas con strings) -> [].
+    assert summarize_outlier_dims(
+        {"s": ["x", "y", "z"]}, [{"row_index": 0, "score": -1.0}], 3
+    ) == []
+    # Entradas malformadas dentro de outlier_rows se ignoran (no petan).
+    assert summarize_outlier_dims(
+        RAW, ["nope", 42, {"no_row_index": 1}], 3
+    ) == []
@@ -0,0 +1,466 @@
+"""ACCEPTANCE test suite for AutomaticEDA: "every AEDA comes out the way we want".
+
+This suite is the safety net of the `eda` group's EDA subsystem. It guarantees
+that EVERY chapter of an AutomaticEDA report comes out populated with its
+essential content, that the single-chapter feature (``only_chapters``) resolves
+its compute dependencies, that optional chapters return None when they do not
+apply, that the multi-table folder report detects the FK, and that the Markdown
+carries the full appendix (the entire association matrix + describe with
+skew/kurtosis). Unlike each chapter's unit tests, this suite exercises the
+pipeline END-TO-END on a deterministic synthetic dataset that activates all
+chapters at once.
+
+Determinism: the dataset is generated with a fixed ``seed`` and the pipeline
+runs without the LLM (``profile_level='standard'``), so the manifest and the
+Markdown are reproducible across runs. A single `standard` render is reused via
+a module-scoped fixture to avoid repeating the expensive computation.
+
+dict-no-throw: the `eda` group's pipelines never raise; the tests assert on
+``status == 'ok'`` and then on the concrete content of the manifest / Markdown.
+
+Honesty (DoD): the asserts check real CONTENT (each chapter's essential text),
+not just the heading. If a chapter stopped emitting its content (because a
+change broke the numeric distribution, the Isolation Forest, the full
+correlation matrix, …), the corresponding test FAILS naming the chapter and
+the missing fragment; it is not softened to make it pass.
+"""
+
+import json
+import os
+import subprocess
+import sys
+
+import pytest
+
+_HERE = os.path.dirname(os.path.abspath(__file__))
+_FUNCTIONS = os.path.abspath(os.path.join(_HERE, "..", ".."))  # python/functions
+if _FUNCTIONS not in sys.path:
+    sys.path.insert(0, _FUNCTIONS)
+
+from datascience.automatic_eda import CHAPTER_ORDER  # noqa: E402
+from datascience.generate_synthetic_eda_folder import (  # noqa: E402
+    generate_synthetic_eda_folder,
+)
+from datascience.generate_synthetic_eda_table import (  # noqa: E402
+    generate_synthetic_eda_table,
+)
+from pipelines.render_automatic_eda import render_automatic_eda  # noqa: E402
+from pipelines.render_automatic_eda_folder import (  # noqa: E402
+    render_automatic_eda_folder,
+)
+
+# --------------------------------------------------------------------------- #
+# Deterministic parameters of the golden fixture.
+# --------------------------------------------------------------------------- #
+SEED = 42
+N_ROWS = 800
+TABLE = "synthetic"
+
+# The `analisis_llm` chapter is ONLY computed with run_llm=True; under the
+# `standard` preset (no LLM, which is what this suite uses) it must not appear.
+# So the chapters expected in a `standard` report are all of CHAPTER_ORDER
+# EXCEPT analisis_llm. CHAPTER_ORDER is the source of truth for the engine's
+# 16 chapters (portada … glosario).
+LLM_ONLY_CHAPTERS = {"analisis_llm"}
+EXPECTED_STANDARD = [c for c in CHAPTER_ORDER if c not in LLM_ONLY_CHAPTERS]
+
+
+def _pdf_text(path):
+    """Text of the PDF via pdftotext, or None if the tool is unavailable."""
+    try:
+        out = subprocess.run(
+            ["pdftotext", "-layout", path, "-"],
+            capture_output=True, text=True, timeout=60,
+        )
+        return out.stdout if out.returncode == 0 else None
+    except Exception:  # noqa: BLE001  # the main verification is on the MD.
+        return None
+
+
+def _manifest_chapters(result):
+    """Set of chapter ids present in the result's manifest."""
+    with open(result["manifest_path"], encoding="utf-8") as fh:
+        return set((json.load(fh).get("chapters") or {}).keys())
+
+
+# --------------------------------------------------------------------------- #
+# Module-scoped fixtures: the synthetic dataset is generated ONCE and the
+# `standard` render is computed ONCE; all content tests reuse it.
+# --------------------------------------------------------------------------- #
+@pytest.fixture(scope="module")
+def synth_db(tmp_path_factory):
+    """Deterministic synthetic table that activates the engine's 16 chapters."""
+    d = tmp_path_factory.mktemp("aeda_accept_synth")
+    db = str(d / "synthetic.duckdb")
+    g = generate_synthetic_eda_table(db, TABLE, n_rows=N_ROWS, seed=SEED)
+    assert g["status"] == "ok", g.get("error")
+    return {"db": db, "table": TABLE, "gen": g}
+
+
+@pytest.fixture(scope="module")
+def standard_run(synth_db, tmp_path_factory):
+    """AutomaticEDA `standard` render (no LLM) over the synthetic dataset.
+
+    Returns the pipeline dict plus the loaded manifest, the Markdown text and
+    the PDF text (if pdftotext is available). Reused by most of the tests.
+    """
+    out = str(tmp_path_factory.mktemp("aeda_accept_std"))
+    r = render_automatic_eda(
+        synth_db["db"], synth_db["table"],
+        profile_level="standard", out_dir=out, basename="synth_std",
+    )
+    assert r["status"] == "ok", r.get("error")
+    with open(r["manifest_path"], encoding="utf-8") as fh:
+        manifest = json.load(fh)
+    # Read the Markdown through a context manager so the handle is closed.
+    with open(r["aeda_md_path"], encoding="utf-8") as fh:
+        md = fh.read()
+    return {
+        "r": r,
+        "manifest": manifest,
+        "chapters": manifest.get("chapters") or {},
+        "md": md,
+        "pdf_text": _pdf_text(r["pdf_path"]),
+    }
+
+
+@pytest.fixture(scope="module")
+def minimal_db(tmp_path_factory):
+    """Minimal table with NO free text, NO date and NO lat/lon.
+
+    Used to check that text_distr / timeseries / geospatial return None
+    (they do not appear in the manifest) and that the EDA does not crash. Only
+    numeric columns + one low-cardinality categorical.
+    """
+    import random
+
+    import duckdb
+
+    d = tmp_path_factory.mktemp("aeda_accept_min")
+    db = str(d / "minimal.duckdb")
+    con = duckdb.connect(db)
+    con.execute("CREATE TABLE minimal (a DOUBLE, b DOUBLE, c INTEGER, grp VARCHAR)")
+    random.seed(7)
+    rows = [
+        (round(random.gauss(10, 2), 3), round(random.gauss(50, 5), 3),
+         random.randint(1, 100), ["x", "y", "z"][i % 3])
+        for i in range(120)
+    ]
+    con.executemany("INSERT INTO minimal VALUES (?,?,?,?)", rows)
+    con.close()
+    return {"db": db, "table": "minimal"}
+
+
+# --------------------------------------------------------------------------- #
+# 1) CHAPTER COVERAGE (golden): the standard manifest carries the 15 expected
+#    non-LLM chapters, none is missing, and analisis_llm does NOT appear without the LLM.
+# --------------------------------------------------------------------------- #
+def test_standard_cubre_todos_los_capitulos_esperados(standard_run):
+    chapters = set(standard_run["chapters"].keys())
+    expected = set(EXPECTED_STANDARD)
+    missing = expected - chapters
+    assert not missing, (
+        "capítulos esperados ausentes del manifest standard: "
+        f"{sorted(missing)} (presentes: {sorted(chapters)})"
+    )
+    # analisis_llm requires run_llm=True: under standard it must NOT appear.
+    assert "analisis_llm" not in chapters, (
+        "analisis_llm apareció sin LLM: el preset standard no debería computarlo"
+    )
+
+
+def test_manifest_top_level_es_valido(standard_run):
+    """The manifest declares the engine and a dict of chapters with per-id metadata."""
+    man = standard_run["manifest"]
+    assert man.get("engine") == "AutomaticEDA"
+    assert man.get("engine_version")
+    chapters = standard_run["chapters"]
+    # Each chapter carries version + number of pages/slides (manifest format).
+    for cid, meta in chapters.items():
+        assert meta.get("version"), f"capítulo {cid} sin version en el manifest"
+        assert (meta.get("n_pages") or 0) > 0, f"capítulo {cid} con 0 páginas"
+
+
+# --------------------------------------------------------------------------- #
+# 2) KEY CONTENT PER CHAPTER (acceptance): each chapter carries its ESSENTIAL
+#    content in the Markdown, not just the heading. A missing fragment names the
+#    chapter and the missing text.
+# --------------------------------------------------------------------------- #
+# STABLE text fragments that each chapter emits in the Markdown of the synthetic
+# dataset. They are not fragile numbers: they are chapter labels/structure plus
+# column names from the fixture. If a chapter stops populating its content, its
+# fragment disappears and the test fails naming it.
+CHAPTER_NEEDLES = {
+    "portada":      ["800 filas", "19 columnas"],
+    "overview":     ["Primeras filas (df.head)", "Diccionario de columnas",
+                     "customer_id", "signup_date"],
+    "num_distr":    ["Distribuciones numéricas", "vallas Tukey", "income"],
+    "cat_distr":    ["Distribuciones categóricas", "Entropía", "Top categorías",
+                     "country"],
+    "text_distr":   ["Texto libre (NLP)", "TTR", "Términos más frecuentes",
+                     "Idioma dominante"],
+    "calidad":      ["Cómo se calcula la calidad", "Calidad global"],
+    "missingness":  ["Datos faltantes", "Celdas faltantes (global)",
+                     "Faltantes por columna"],
+    "outliers":     ["Valores atípicos por columna", "Filas atípicas (multivariante)",
+                     "Isolation Forest", "Filas analizadas"],
+    "correlacion":  ["Matriz de asociación", "Pares más correlacionados"],
+    "relaciones":   ["Candidatas a clave primaria", "customer_id"],
+    "modelos":      ["PCA — varianza explicada", "Segmentación (KMeans)"],
+    "timeseries":   ["Series temporales", "Columna de fecha", "signup_date"],
+    "geospatial":   ["Análisis geoespacial", "Extensión geográfica", "Centroide"],
+    "agregacion":   ["Agregación por grupos", "Agrupado por"],
+    "glosario":     ["Glosario de términos",
+                     "### Isolation Forest (anomalías multivariantes)",
+                     "### PCA (componentes principales)"],
+}
+
+
+def test_needles_cubren_exactamente_los_capitulos_standard():
+    """Maintenance guard: the needles cover the same 15 non-LLM chapters.
+
+    If someone adds a new chapter to CHAPTER_ORDER, this test is a reminder that
+    its essential content must be documented here (or marked as LLM-only)."""
+    assert set(CHAPTER_NEEDLES.keys()) == set(EXPECTED_STANDARD), (
+        "CHAPTER_NEEDLES desincronizado con los capítulos esperados de standard: "
+        f"falta needles para {set(EXPECTED_STANDARD) - set(CHAPTER_NEEDLES)}, "
+        f"sobra {set(CHAPTER_NEEDLES) - set(EXPECTED_STANDARD)}"
+    )
+
+
+@pytest.mark.parametrize("chapter_id", list(CHAPTER_NEEDLES.keys()))
+def test_capitulo_trae_su_contenido_esencial(standard_run, chapter_id):
+    md = standard_run["md"]
+    # Precondition: the chapter is in the manifest (coverage). If it is not, that
+    # is a coverage failure, not a content failure, and is reported as such.
+    assert chapter_id in standard_run["chapters"], (
+        f"capítulo {chapter_id} ausente del manifest (fallo de cobertura)"
+    )
+    for needle in CHAPTER_NEEDLES[chapter_id]:
+        assert needle in md, (
+            f"capítulo '{chapter_id}': falta su contenido esencial en el Markdown "
+            f"— fragmento ausente: {needle!r}"
+        )
+
+
+def test_outliers_isolation_forest_poblado_no_degradado(standard_run):
+    """The multivariate block (Isolation Forest) comes out with data, not degraded."""
+    md = standard_run["md"]
+    assert "Anomalías multivariantes" in md
+    assert "Filas analizadas" in md, "el Isolation Forest no trae su tabla poblada"
+    assert "No se pudo analizar la anomalía multivariante" not in md, (
+        "el bloque multivariante salió degradado en el informe completo"
+    )
+    # The profile carries the models block with the multivariate outliers.
+    models = (standard_run["r"]["profile"] or {}).get("models") or {}
+    assert models.get("outliers") is not None, "profile['models']['outliers'] vacío"
+
+
+# --------------------------------------------------------------------------- #
+# 3) SINGLE CHAPTERS WITH RESOLVED DEPS (acceptance of only_chapters): requesting
+#    a single chapter leaves it POPULATED because dependency resolution enables
+#    the computation it needs, even if the caller did not ask for it.
+# --------------------------------------------------------------------------- #
+def test_only_outliers_isolation_forest_poblado(synth_db, tmp_path):
+    """only=['outliers'] without explicit run_models → IsolationForest populated."""
+    out = str(tmp_path / "only_out")
+    r = render_automatic_eda(
+        synth_db["db"], synth_db["table"],
+        only_chapters=["outliers"], out_dir=out, basename="only_outliers",
+    )
+    assert r["status"] == "ok", r.get("error")
+    # Document = portada + outliers + glosario, nothing else.
+    assert _manifest_chapters(r) == {"portada", "outliers", "glosario"}
+    with open(r["aeda_md_path"], encoding="utf-8") as fh:
+        md = fh.read()
+    assert "Filas atípicas (multivariante)" in md
+    assert "Filas analizadas" in md, "Isolation Forest sin tabla poblada"
+    assert "No se pudo analizar la anomalía multivariante" not in md, (
+        "el multivariante salió degradado pese a resolver las deps"
+    )
+    # The resolution enabled run_models → the profile carries the models block.
+    assert ((r["profile"] or {}).get("models") or {}).get("outliers") is not None
+
+
+def test_only_timeseries_rango_temporal_presente(synth_db, tmp_path):
+    """only=['timeseries'] → time range populated (run_series resolved)."""
+    out = str(tmp_path / "only_ts")
+    r = render_automatic_eda(
+        synth_db["db"], synth_db["table"],
+        only_chapters=["timeseries"], out_dir=out, basename="only_ts",
+    )
+    assert r["status"] == "ok", r.get("error")
+    assert "timeseries" in _manifest_chapters(r)
+    with open(r["aeda_md_path"], encoding="utf-8") as fh:
+        md = fh.read()
+    assert "Columna de fecha" in md
+    assert "signup_date" in md, "la serie no nombra su columna de fecha"
+    # run_series resolved via deps → the profile carries the series analysis.
+    assert (r["profile"] or {}).get("series") is not None, (
+        "only=['timeseries'] debe activar run_series por dependencias"
+    )
+
+
+def test_only_correlacion_scatters_presentes(synth_db, tmp_path):
+    """only=['correlacion'] → matrix + scatters of the strong pairs."""
+    out = str(tmp_path / "only_corr")
+    r = render_automatic_eda(
+        synth_db["db"], synth_db["table"],
+        only_chapters=["correlacion"], out_dir=out, basename="only_corr",
+    )
+    assert r["status"] == "ok", r.get("error")
+    assert _manifest_chapters(r) == {"portada", "correlacion", "glosario"}
+    with open(r["aeda_md_path"], encoding="utf-8") as fh:
+        md = fh.read()
+    assert "Matriz de asociación" in md
+    assert "Relaciones más fuertes (scatter)" in md, "faltan los scatters"
+    assert "Dispersión de" in md, "no se emitió ninguna figura de dispersión"
+
+
+# --------------------------------------------------------------------------- #
+# 4) NONE WHEN NOT APPLICABLE: on a table with no long text, no date and no
+#    lat/lon, text_distr / timeseries / geospatial do NOT appear and the EDA does not crash.
+# --------------------------------------------------------------------------- #
+def test_capitulos_opcionales_ausentes_cuando_no_aplican(minimal_db, tmp_path):
+    out = str(tmp_path / "minimal_out")
+    r = render_automatic_eda(
+        minimal_db["db"], minimal_db["table"],
+        profile_level="standard", out_dir=out, basename="minimal",
+    )
+    assert r["status"] == "ok", r.get("error")
+    chapters = _manifest_chapters(r)
+    for absent in ("text_distr", "timeseries", "geospatial"):
+        assert absent not in chapters, (
+            f"capítulo {absent} apareció en una tabla que no lo justifica "
+            f"(presentes: {sorted(chapters)})"
+        )
+    # The document is still valid: portada + glosario + the chapters that do
+    # apply (at least overview and num_distr).
+    assert {"portada", "glosario", "overview", "num_distr"} <= chapters
+
+
+# --------------------------------------------------------------------------- #
+# 5) MULTI-TABLE FOLDER (acceptance): the folder report profiles the N tables
+#    and the relations chapter detects the FK by containment.
+# --------------------------------------------------------------------------- #
+def test_folder_multitabla_con_fk_detectada(tmp_path):
+    fdir = str(tmp_path / "folder")
+    g = generate_synthetic_eda_folder(fdir, n_rows=300, seed=SEED)
+    assert g["status"] == "ok", g.get("error")
+
+    out = str(tmp_path / "fout")
+    rf = render_automatic_eda_folder(fdir, out_dir=out, basename="folder")
+    assert rf["status"] == "ok", rf.get("error")
+
+    # All 3 tables were profiled.
+    assert rf["n_tables"] == 3, f"esperadas 3 tablas, vistas {rf['n_tables']}"
+
+    # The base manifest carries the inter-table relations chapter.
+    with open(rf["manifest_path"], encoding="utf-8") as fh:
+        chapters = set((json.load(fh).get("chapters") or {}).keys())
+    assert "relaciones" in chapters, (
+        f"el documento de carpeta no incluye el capítulo de relaciones: {chapters}"
+    )
+
+    # The Markdown names the 3 tables and declares the FK detected by containment.
+    with open(rf["md_path"], encoding="utf-8") as fh:
+        md = fh.read()
+    for tbl in ("customers", "orders", "reviews"):
+        assert tbl in md, f"la tabla {tbl} no aparece en el informe de carpeta"
+    assert "FK candidatas" in md, "no se declaran las FK candidatas"
+    assert "orders.customer_id" in md and "customers.customer_id" in md, (
+        "la FK orders→customers no se detectó por containment"
+    )
+    assert "reviews.customer_id" in md, "la FK reviews→customers no se detectó"
+
+
+# --------------------------------------------------------------------------- #
+# 6) MD COMPLETENESS (regression): the Markdown carries the appendix with the
+#    FULL association matrix (all pairs, not just the top) and the describe with
+#    skew/kurtosis for every numeric column. Protects an already-merged fix.
+# --------------------------------------------------------------------------- #
+def test_md_apendice_matriz_correlacion_completa(standard_run):
+    md = standard_run["md"]
+    assert "Matriz de asociación — todos los pares" in md, (
+        "falta el apéndice con la matriz de asociación completa"
+    )
+    # A LOW-correlation num-num pair that the chapter's top list would NEVER show:
+    # its presence proves the appendix lists ALL pairs, not just the top.
+    assert "income ↔ longitude" in md, (
+        "el apéndice no contiene los pares de baja correlación: no es la matriz "
+        "completa, solo el top-k del capítulo"
+    )
+
+
+def test_md_apendice_describe_con_skew_kurtosis(standard_run):
+    md = standard_run["md"]
+    assert "Estadísticos numéricos completos (describe)" in md, (
+        "falta el apéndice describe completo"
+    )
+    # The header of the appendix describe carries the skew and kurtosis columns
+    # (a substring unique to that header). Without them the describe is incomplete.
+    assert "| skew | kurtosis |" in md, (
+        "el describe del apéndice no trae las columnas skew/kurtosis"
+    )
+
+
+# --------------------------------------------------------------------------- #
+# 7) ALL 3 OUTPUTS NON-EMPTY: PDF with pages, PPTX with slides, MD with a minimum
+#    number of characters, and the three files on disk. Valid manifest.
+# --------------------------------------------------------------------------- #
+def test_tres_salidas_no_vacias(standard_run):
+    r = standard_run["r"]
+    assert r["pdf_path"] and os.path.exists(r["pdf_path"])
+    assert r["pptx_path"] and os.path.exists(r["pptx_path"])
+    assert r["aeda_md_path"] and os.path.exists(r["aeda_md_path"])
+    assert (r["n_pages"] or 0) > 0, "el PDF no tiene páginas"
+    assert (r["n_slides"] or 0) > 0, "el PPTX no tiene slides"
+    # The full report is large: a generous minimum guards against an empty or
+    # truncated MD without tying the test to an exact size.
+    assert (r["md_chars"] or 0) > 10000, f"MD demasiado corto: {r['md_chars']} chars"
+    assert r["manifest_path"] and os.path.exists(r["manifest_path"])
+
+
+def test_pdf_texto_extraible_con_contenido(standard_run):
+    """If pdftotext is available, the PDF must carry real text (not only
+    images): the cover names the dataset and its shape. If the tool is
+    missing, the test is skipped (it is not an EDA failure)."""
+    txt = standard_run["pdf_text"]
+    if txt is None:
+        pytest.skip("pdftotext no disponible")
+    assert len(txt) > 5000, "el PDF apenas tiene texto extraíble"
+    assert "Portada" in txt or "synthetic" in txt, (
+        "el texto del PDF no contiene la portada esperada"
+    )
+
+
+# --------------------------------------------------------------------------- #
+# DETERMINISM: two renders of the SAME dataset produce the SAME manifest
+# (same chapters and same n_pages/n_slides per chapter). generated_at differs
+# by timestamp, which is why the chapters dict is compared, not the file.
+# --------------------------------------------------------------------------- #
+def test_render_es_determinista(synth_db, tmp_path):
+    out1 = str(tmp_path / "det1")
+    out2 = str(tmp_path / "det2")
+    r1 = render_automatic_eda(synth_db["db"], synth_db["table"],
+                              profile_level="standard", out_dir=out1, basename="d1")
+    r2 = render_automatic_eda(synth_db["db"], synth_db["table"],
+                              profile_level="standard", out_dir=out2, basename="d2")
+    assert r1["status"] == "ok" and r2["status"] == "ok"
+    with open(r1["manifest_path"], encoding="utf-8") as fh:
+        c1 = json.load(fh).get("chapters")
+    with open(r2["manifest_path"], encoding="utf-8") as fh:
+        c2 = json.load(fh).get("chapters")
+    assert c1 == c2, "el manifest no es determinista entre dos renders del mismo dataset"
+
+
+# --------------------------------------------------------------------------- #
+# SLOW (optional, skippable): `full` report with LLM narrative. It requires network /
+# credentials and is NOT deterministic, so it is off unless explicitly opted in
+# through the environment variable EDA_ACCEPT_LLM=1. It is skipped with skipif (not
+# a custom marker) to avoid depending on mark registration in the repo config.
+# --------------------------------------------------------------------------- #
+@pytest.mark.skipif(
+    os.environ.get("EDA_ACCEPT_LLM") != "1",
+    reason="full+LLM es lento/no determinista; exporta EDA_ACCEPT_LLM=1 para correrlo",
+)
+def test_full_incluye_capitulo_analisis_llm(synth_db, tmp_path):
+    out = str(tmp_path / "full")
+    r = render_automatic_eda(synth_db["db"], synth_db["table"],
+                             profile_level="full", out_dir=out, basename="full")
+    assert r["status"] == "ok", r.get("error")
+    assert "analisis_llm" in _manifest_chapters(r), (
+        "el preset full debe incluir el capítulo de análisis LLM"
+    )
@@ -4,8 +4,8 @@ kind: pipeline
 lang: py
 domain: pipelines
 purity: impure
-version: "1.1.0"
-signature: "def render_automatic_eda(db_path: str, table: str, backend: str = \"duckdb\", sample: int = None, run_models: bool = None, run_series: bool = None, run_llm: bool = None, profile_level: str = \"standard\", out_dir: str = \"reports\", basename: str = None, ctx_extra: dict = None) -> dict"
+version: "1.2.0"
+signature: "def render_automatic_eda(db_path: str, table: str, backend: str = \"duckdb\", sample: int = None, run_models: bool = None, run_series: bool = None, run_llm: bool = None, profile_level: str = \"standard\", out_dir: str = \"reports\", basename: str = None, ctx_extra: dict = None, emit_md: bool = True, only_chapters: list = None) -> dict"
 description: "Informe AutomaticEDA COMPLETO one-shot de una tabla DuckDB/PostgreSQL: perfila con profile_table, construye el ctx con los datos crudos (build_eda_render_ctx: raw_numeric para modelos/geo, timeseries_raw para series, geo_points para el mapa, db_path/table para la agregacion push-down) y emite PDF (A5 movil) Y PPTX (16:9) del mismo documento por capitulos, con los 11 capitulos POBLADOS de verdad (clusters pintados sobre el PCA, evolucion temporal, mapa geografico y tablas de agregacion), no degradados. El parametro profile_level es un preset de consumo CPU/LLM (lite/standard/full) que mapea a los flags run_models/run_series/run_llm/sample; un flag explicito siempre prima sobre el preset. lite=bajo consumo (sin LLM, sin serie, modelos solo PCA+normalidad sin KMeans/IsolationForest, sample reducido); standard=comportamiento historico; full=standard+narrativa LLM. Devuelve las rutas de PDF/PPTX y el manifiesto de versiones por capitulo."
 tags: [eda, duckdb, postgres, profiling, pipeline, dataops, report, pdf, pptx]
 uses_functions:
@@ -46,6 +46,10 @@ params:
    desc: "Nombre base de los archivos sin extension. Default 'aeda_<table>_<timestamp>'."
  - name: ctx_extra
    desc: "Dict opcional con claves de presentacion/contexto extra que se mezclan en el ctx (dataset_name, description, source_origin, ...); no pisan las claves de datos calculadas por build_eda_render_ctx."
+  - name: emit_md
+    desc: "In addition to the PDF and the PPTX, emits a self-contained Markdown of the same chapter-based document (text + markdown tables, no binaries) to paste into an LLM. Default True. The path is returned in aeda_md_path."
+  - name: only_chapters
+    desc: "Optional list of chapter ids to render (a subset of CHAPTER_ORDER) to iterate on/test a single chapter without generating the whole document. Default None => FULL document (backward-compatible). When a list is passed: (1) it is VALIDATED against CHAPTER_ORDER; an unknown id or an empty list returns a clear error listing the valid ones; (2) the compute dependencies of those chapters are RESOLVED (automatic_eda.chapter_deps), enabling the flags they need (run_models/run_series/run_llm) even if the caller did not ask for them and building ONLY the ctx pieces they read, so the single chapter ALWAYS arrives populated (e.g. ['outliers'] enables run_models and keeps raw_numeric -> complete Isolation Forest) without wasting CPU/LLM on what no requested chapter uses; (3) the document and its manifest contain ONLY those chapters PLUS portada (first) and glosario (last, when there are clickable terms). An explicit caller flag takes precedence over dependency resolution."
 output: "dict {status:'ok', pdf_path:str, pptx_path:str, manifest_path:str|None, n_pages:int, n_slides:int, pdf_note:str, pptx_note:str, profile:<TableProfile>} o {status:'error', error:str} (dict-no-throw)."
 ---

@@ -69,6 +73,21 @@ r = render_automatic_eda("/tmp/ventas.duckdb", "ventas", profile_level="full")
 # Precedencia: el flag explicito SIEMPRE prima sobre el preset. lite pero con LLM:
 r = render_automatic_eda("/tmp/ventas.duckdb", "ventas",
                         profile_level="lite", run_llm=True)  # el LLM SI se ejecuta
+
+# SINGLE chapter: iterate on/test one chapter without generating the whole
+# document. Dependency resolution enables the computation the chapter needs even
+# when it is not passed explicitly. Requesting only 'outliers' enables run_models
+# and keeps raw_numeric -> the Isolation Forest block comes out COMPLETE.
+# Document = portada + outliers + glosario.
+r = render_automatic_eda("/tmp/ventas.duckdb", "ventas", only_chapters=["outliers"])
+
+# Several single chapters at once (their dependencies are unioned):
+r = render_automatic_eda("/tmp/ventas.duckdb", "ventas",
+                         only_chapters=["correlacion", "missingness"])
+
+# unknown id -> clear error listing the valid ones (dict-no-throw, does not raise):
+r = render_automatic_eda("/tmp/ventas.duckdb", "ventas", only_chapters=["nope"])
+# {'status': 'error', 'error': 'only_chapters con ids desconocidos: nope. Capitulos validos: portada, overview, ...'}
 ```
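
The unknown-id case above could be handled along these lines. This is a minimal sketch, not the project's code: `split_chapter_ids`, `check_only_chapters` and the shortened `CHAPTER_ORDER` list are hypothetical stand-ins, and returning the valid ids in canonical order is an assumption. Only the `valid`/`unknown` result keys and the error wording mirror what the pipeline shows.

```python
# Hypothetical, shortened chapter list; the real CHAPTER_ORDER has more ids.
CHAPTER_ORDER = ["portada", "overview", "outliers", "correlacion", "glosario"]


def split_chapter_ids(requested, chapter_order):
    """Split requested ids into known ones (canonical order) and unknown ones."""
    known = set(chapter_order)
    wanted = set(requested)
    return {
        "valid": [c for c in chapter_order if c in wanted],
        "unknown": [c for c in requested if c not in known],
    }


def check_only_chapters(only_chapters):
    """Return an error dict (never raise) or the cleaned list of chapter ids."""
    if not only_chapters:
        return {"status": "error", "error": "only_chapters=[] está vacío."}
    checked = split_chapter_ids(only_chapters, CHAPTER_ORDER)
    if checked["unknown"]:
        return {
            "status": "error",
            "error": "only_chapters con ids desconocidos: "
                     + ", ".join(checked["unknown"])
                     + ". Capítulos válidos: " + ", ".join(CHAPTER_ORDER),
        }
    return {"status": "ok", "chapters": checked["valid"]}
```

Returning a dict instead of raising keeps the caller's contract uniform: every failure path is inspected through `status`, exactly like a successful render.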

 ## Cuando usarla
@@ -86,6 +105,16 @@ Para un EDA **barato/rapido** (CI, vistazo previo, maquina sin GPU o sin red) us
 temporal y el LLM. Para el **maximo** con interpretacion narrativa por capitulo,
 `profile_level="full"`. El default `"standard"` mantiene el comportamiento previo.

+When you are **iterating on or testing ONE specific chapter** (tuning the outliers
+render, checking the geospatial map, debugging the aggregation) use
+`only_chapters=[...]`: it generates the document with only those chapters (+ portada
+and glosario), but **resolves their compute dependencies** so that the single
+chapter never comes out degraded. Requesting `['outliers']` enables run_models and
+keeps `raw_numeric` even if you do not pass them, and at the same time no CPU/LLM
+is wasted on what no requested chapter needs (requesting `['geospatial']` does not
+run models). It is much faster than rendering the whole report on every iteration.
+The central dependency map lives in `automatic_eda/chapter_deps.py` (source of truth).
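
The resolution rule described above (union the needs of the requested chapters, let explicit caller flags win) can be sketched as follows. This is an illustration under stated assumptions, not the contents of `chapter_deps.py`: the `CHAPTER_FLAGS` map and `resolve_flags` are hypothetical names, and only the three rows shown reflect behaviour this document states.

```python
# Hypothetical per-chapter flag needs; the real map lives in chapter_deps.py
# and may differ. Rows mirror what this document states about each chapter.
CHAPTER_FLAGS = {
    "outliers": {"run_models"},    # Isolation Forest needs the models block
    "timeseries": {"run_series"},  # series analysis needs run_series
    "geospatial": set(),           # does not run models
}
COST_FLAGS = ("run_models", "run_series", "run_llm")


def resolve_flags(only_chapters, explicit):
    """Union the flags the requested chapters need; explicit caller flags win."""
    needed = set()
    for chapter_id in only_chapters:
        needed |= CHAPTER_FLAGS.get(chapter_id, set())
    resolved = {}
    for flag in COST_FLAGS:
        caller_value = explicit.get(flag)  # None means "not passed by the caller"
        resolved[flag] = (flag in needed) if caller_value is None else caller_value
    return resolved


# Requesting only 'outliers' turns on run_models and nothing else.
flags = resolve_flags(["outliers"], {})
```

Treating `None` as "not passed" is what lets a caller force a flag off (for example `run_models=False`) even when a requested chapter would otherwise enable it.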
+
 ## Gotchas

 - Impura: ESCRIBE el PDF, el PPTX y `automatic_eda_manifest.json` en `out_dir`.
@@ -111,9 +140,29 @@ temporal y el LLM. Para el **maximo** con interpretacion narrativa por capitulo,
 - Los datos crudos del ctx se muestrean con `sample` (LIMIT), no se trae la tabla
  entera a RAM; con tablas enormes sube `sample` si quieres mas representatividad
  (coste: mas memoria).
+- **`only_chapters` and the glossary**: the glossary (last chapter) only appears if
+  some body chapter registered clickable terms. A single chapter that registers
+  no terms (e.g. `timeseries`, `geospatial`) comes out as portada + that chapter,
+  without a glossary, because there is nothing to link: this is correct, not a bug.
+- **`only_chapters` with `profile_level="lite"`**: for single chapters the preset
+  only governs `sample`; the models do NOT take the "lite" path (which would prune
+  `ctx['raw_numeric']` and leave outliers without its live multivariate block).
+  For single chapters, dependency resolution is in charge, not the model-cost
+  preset.

 ## Capability growth log

+- v1.2.0 (2026-06-30): adds the `only_chapters` parameter. It renders a
+  SUBSET of chapters (to iterate on/test a single one), resolving their
+  compute dependencies via `automatic_eda/chapter_deps.py` (central map
+  CHAPTER_DEPS): it enables the cost flags the chapter needs (run_models/
+  run_series/run_llm) even if the caller does not pass them and builds only the
+  ctx pieces it reads, so the single chapter ALWAYS arrives populated (golden:
+  ['outliers'] -> complete Isolation Forest) without wasting work on what it does
+  not use. The selection travels to build_document through the reserved key
+  `ctx['_only_chapters']` (the renderers do not change). It validates ids (clear
+  dict-no-throw error). Additive, backward-compatible change:
+  `only_chapters=None` produces the full document, identical to v1.1.0.
 - v1.1.0 (2026-06-30) — anade el parametro `profile_level` (lite/standard/full),
  preset de consumo CPU/LLM que mapea a los flags run_models/run_series/run_llm/
  sample. lite limita los modelos a PCA+normalidad (cableado a run_eda_models con
@@ -99,6 +99,7 @@ def render_automatic_eda(
    basename: str = None,
    ctx_extra: dict = None,
    emit_md: bool = True,
+    only_chapters: list = None,
 ) -> dict:
    """Perfila una tabla y emite el informe AutomaticEDA completo (PDF + PPTX).

@@ -150,6 +151,29 @@ def render_automatic_eda(
            MISMO documento por capítulos (texto plano + tablas markdown, sin
            binarios), pensado para pegar a un LLM. Default True. La ruta sale en
            la clave de retorno ``aeda_md_path``. No altera las demás salidas.
+        only_chapters: optional list of chapter ids to render (a SUBSET of
+            CHAPTER_ORDER) to iterate on/test a specific chapter without
+            generating the whole document. Default None => FULL document,
+            identical to today's (backward-compatible). When a list is passed:
+
+            - It is VALIDATED against CHAPTER_ORDER; an unknown id returns a clear
+              error listing the valid ones (dict-no-throw, does not raise). An
+              empty list ``[]`` also returns an error (pass at least one chapter
+              or None).
+            - The compute dependencies of those chapters are RESOLVED
+              (``automatic_eda.chapter_deps``): the cost flags they need
+              (run_models / run_series / run_llm) are enabled EVEN IF the caller
+              did not ask for them, and ONLY the pieces of ``ctx`` that those
+              chapters read are built. A single chapter therefore ALWAYS arrives
+              populated (e.g. ``only_chapters=['outliers']`` enables run_models
+              and keeps ``ctx['raw_numeric']`` so the IsolationForest block comes
+              out complete), and at the same time no CPU/LLM is wasted on what no
+              requested chapter uses (requesting only ``geospatial`` does not run
+              models).
+            - The document (PDF/PPTX/MD) and its manifest contain ONLY those
+              chapters, PLUS the portada (first) and the glosario (last), which
+              are always included so that the document is valid and the
+              clickable terms have a target.
+            - An explicit caller flag (run_models/run_series/run_llm != None)
+              ALWAYS takes precedence over what the dependencies resolve.

    Returns:
        dict (nunca lanza). En éxito::
@@ -169,11 +193,56 @@ def render_automatic_eda(
        # "standard" (comportamiento histórico), sin lanzar.
        preset = _PROFILE_PRESETS.get(profile_level, _PROFILE_PRESETS["standard"])
        sample = preset["sample"] if sample is None else sample
-        run_models = preset["run_models"] if run_models is None else run_models
-        run_series = preset["run_series"] if run_series is None else run_series
-        run_llm = preset["run_llm"] if run_llm is None else run_llm
        model_opts = preset["model_opts"]

+        # 0.bis) "Standalone chapters" mode: validate the selection and RESOLVE
+        # its compute dependencies. This is what guarantees that a requested
+        # chapter arrives complete (it enables what the chapter needs) without
+        # wasting work on what it does not need.
+        # When only_chapters is None the historical (preset) path is kept.
+        if only_chapters is not None:
+            from datascience.automatic_eda import CHAPTER_ORDER
+            from datascience.automatic_eda.chapter_deps import (
+                needs_render_ctx,
+                resolve_ctx_data_keys,
+                resolve_requirements,
+                validate_chapter_ids,
+            )
+
+            if not isinstance(only_chapters, (list, tuple)):
+                return {"status": "error",
+                        "error": "only_chapters debe ser una lista de ids de "
+                                 "capítulo o None (documento completo)."}
+            only_chapters = list(only_chapters)
+            if not only_chapters:
+                return {"status": "error",
+                        "error": "only_chapters=[] está vacío. Pasa al menos un "
+                                 "capítulo, o None para el documento completo. "
+                                 "Capítulos válidos: " + ", ".join(CHAPTER_ORDER)}
+            checked = validate_chapter_ids(only_chapters, CHAPTER_ORDER)
+            if checked["unknown"]:
+                return {"status": "error",
+                        "error": "only_chapters con ids desconocidos: "
+                                 + ", ".join(checked["unknown"])
+                                 + ". Capítulos válidos: "
+                                 + ", ".join(CHAPTER_ORDER)}
+            only_chapters = checked["valid"]
+
+            # The dependencies set the DEFAULT of each cost flag (efficiency:
+            # whatever no requested chapter needs stays False); an explicit
+            # flag from the caller (!= None) still takes precedence.
+            dep_flags = resolve_requirements(only_chapters)["profile_flags"]
+            run_models = ("run_models" in dep_flags) if run_models is None else run_models
+            run_series = ("run_series" in dep_flags) if run_series is None else run_series
+            run_llm = ("run_llm" in dep_flags) if run_llm is None else run_llm
+            # Standalone chapters do not use the "cheap models" (lite) path,
+            # which prunes ctx['raw_numeric']: a chapter such as outliers needs
+            # it for its live multivariate analysis. The preset only governs
+            # `sample`.
+            model_opts = None
+        else:
+            run_models = preset["run_models"] if run_models is None else run_models
+            run_series = preset["run_series"] if run_series is None else run_series
+            run_llm = preset["run_llm"] if run_llm is None else run_llm
+
        # On the "cheap models" (lite) path profile_table does NOT run the
        # models: this pipeline runs them with run_eda_models at the preset's
        # granularity, avoiding the CPU cost of KMeans + IsolationForest.
@@ -217,10 +286,25 @@ def render_automatic_eda(
        if ctx_extra:
            base_ctx.update(ctx_extra)

-        ctx = build_eda_render_ctx(
-            db_path, table, prof, backend=backend, sample=sample,
-            base_ctx=base_ctx,
-        )
+        # In standalone-chapters mode, if NO requested chapter needs raw data
+        # from the ctx, build_eda_render_ctx is skipped entirely (a real I/O
+        # saving): only presentation keys + db_path/table are kept. If some
+        # chapter does need it, the ctx is built and then the data pieces that
+        # no requested chapter uses are PRUNED (db_path/table are never pruned).
+        if only_chapters is not None and not needs_render_ctx(only_chapters):
+            ctx = dict(base_ctx)
+            ctx["db_path"] = db_path
+            ctx["table"] = table
+        else:
+            ctx = build_eda_render_ctx(
+                db_path, table, prof, backend=backend, sample=sample,
+                base_ctx=base_ctx,
+            )
+            if only_chapters is not None and isinstance(ctx, dict):
+                keep = resolve_ctx_data_keys(only_chapters)
+                for k in ("head_rows", "raw_numeric", "timeseries_raw", "geo_points"):
+                    if k not in keep:
+                        ctx.pop(k, None)

        # 2.5) Lite path: cheap models (PCA + normality, without KMeans or
        # IsolationForest). profile_table did not run the models; they are run here
@@ -245,6 +329,13 @@ def render_automatic_eda(
                ctx.pop("raw_numeric", None)

        # 3) Render every output format (PDF/PPTX/MD) from the SAME chapter-based document.
+        # In standalone-chapters mode, the selection travels to build_document
+        # through a reserved ctx key (the renderers call build_document without
+        # passing `only`): build_document filters the body down to those
+        # chapters and always adds the cover (first) + glossary (last).
+        # build_document consumes the key and removes it, so it never reaches
+        # the chapters.
+        if only_chapters is not None and isinstance(ctx, dict):
+            ctx["_only_chapters"] = list(only_chapters)
        os.makedirs(out_dir, exist_ok=True)
        ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
        base = basename or f"aeda_{table}_{ts}"
@@ -283,6 +374,7 @@ def render_automatic_eda(
            "pdf_note": rpdf.get("note"),
            "pptx_note": rpptx.get("note"),
            "md_note": rmd.get("note"),
+            "only_chapters": only_chapters,
            "profile": prof,
        }
    except Exception as e:  # noqa: BLE001  # dict-no-throw: degrade, never raise.
@@ -0,0 +1,235 @@
+"""Tests for the `only_chapters` mode of the render_automatic_eda pipeline.
+
+Covers the "standalone chapters with dependency resolution" task:
+
+  - Golden (real DuckDB): requesting ONLY one chapter produces a document with
+    just cover + that chapter + glossary, and the chapter arrives COMPLETE
+    because dependency resolution enabled the compute it needs even though the
+    caller did not request it (outliers -> run_models + raw_numeric ->
+    IsolationForest populated; timeseries -> run_series; correlacion ->
+    raw_numeric).
+  - Efficiency: requesting a chapter that does NOT need expensive flags
+    (geospatial) does not enable them, and a purely aggregated chapter
+    (num_distr) does not even build the raw-data ctx.
+  - Edge: unknown id / empty list / non-list return a clear error without
+    raising; only_chapters=None keeps the historical behavior.
+"""
+
+import json
+import os
+import random
+import sys
+from datetime import date, timedelta
+
+_HERE = os.path.dirname(os.path.abspath(__file__))
+_FUNCTIONS = os.path.abspath(os.path.join(_HERE, "..", ".."))  # python/functions
+if _FUNCTIONS not in sys.path:
+    sys.path.insert(0, _FUNCTIONS)
+
+import duckdb  # noqa: E402
+
+from pipelines.render_automatic_eda import render_automatic_eda  # noqa: E402
+
+
+def _make_db_models(path):
+    """DB with a date plus 3 continuous numeric columns in 3 Gaussian clusters.
+
+    Guarantees material for outliers/models (>=2 numeric columns ->
+    IsolationForest), timeseries (DATE column) and correlacion (numeric
+    columns). Same shape as the fixture of the base pipeline test.
+    """
+    con = duckdb.connect(path)
+    con.execute("CREATE TABLE pts (d DATE, grp VARCHAR, x1 DOUBLE, x2 DOUBLE, x3 DOUBLE)")
+    random.seed(42)
+    centers = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0), (20.0, 5.0, 15.0)]
+    d0 = date(2024, 1, 1)
+    rows = []
+    for i in range(150):
+        cx, cy, cz = centers[i % 3]
+        rows.append((
+            d0 + timedelta(days=i), f"g{i % 3}",
+            round(cx + random.gauss(0, 1.0), 4),
+            round(cy + random.gauss(0, 1.0), 4),
+            round(cz + random.gauss(0, 1.0), 4),
+        ))
+    con.executemany("INSERT INTO pts VALUES (?,?,?,?,?)", rows)
+    con.close()
+
+
+def _manifest_chapters(result):
+    with open(result["manifest_path"], encoding="utf-8") as fh:
+        return set((json.load(fh).get("chapters") or {}).keys())
+
+
+# --------------------------------------------------------------------------- #
+# GOLDEN: standalone outliers, IsolationForest populated by dep resolution.
+# --------------------------------------------------------------------------- #
+def test_only_outliers_isolation_forest_populated_without_explicit_run_models(tmp_path):
+    """The heart of the task: requesting ONLY 'outliers' without an explicit
+    run_models enables run_models through dependencies and keeps
+    ctx['raw_numeric'], so the multivariate block (Isolation Forest) renders
+    with data instead of degraded."""
+    db = str(tmp_path / "pts.duckdb")
+    _make_db_models(db)
+    out = str(tmp_path / "out")
+
+    # NB: run_models is not passed; dependency resolution must enable it.
+    r = render_automatic_eda(db, "pts", only_chapters=["outliers"],
+                             out_dir=out, basename="only_outliers")
+    assert r["status"] == "ok", r.get("error")
+    assert r["only_chapters"] == ["outliers"]
+
+    # Document = cover (portada) + outliers + glossary (glosario), nothing else.
+    assert _manifest_chapters(r) == {"portada", "outliers", "glosario"}
+
+    # The multivariate block rendered POPULATED (not the degradation note). This
+    # is checked in the Markdown (same chapter-based document, reliable plain text).
+    with open(r["aeda_md_path"], encoding="utf-8") as fh:
+        md = fh.read()
+    assert "Filas atípicas (multivariante)" in md
+    assert "Filas analizadas" in md, "Isolation Forest did not render its populated table"
+    assert "No se pudo analizar la anomalía multivariante" not in md, \
+        "the multivariate block rendered degraded despite resolving the deps"
+
+    # Resolution enabled run_models -> the profile carries the models block.
+    assert ((r["profile"] or {}).get("models") or {}).get("outliers") is not None
+
+
+# --------------------------------------------------------------------------- #
+# GOLDEN: standalone timeseries enables run_series.
+# --------------------------------------------------------------------------- #
+def test_only_timeseries_activates_run_series(tmp_path):
+    db = str(tmp_path / "pts.duckdb")
+    _make_db_models(db)
+    out = str(tmp_path / "out")
+
+    r = render_automatic_eda(db, "pts", only_chapters=["timeseries"],
+                             out_dir=out, basename="only_ts")
+    assert r["status"] == "ok", r.get("error")
+    assert "timeseries" in _manifest_chapters(r)
+    assert "modelos" not in _manifest_chapters(r)
+    # run_series resolved via deps -> the profile carries the series analysis.
+    assert (r["profile"] or {}).get("series") is not None, \
+        "only_chapters=['timeseries'] must enable run_series"
+
+
+# --------------------------------------------------------------------------- #
+# GOLDEN: standalone correlacion builds raw_numeric (without enabling models).
+# --------------------------------------------------------------------------- #
+def test_only_correlacion_builds_raw_numeric_without_models(tmp_path):
+    db = str(tmp_path / "pts.duckdb")
+    _make_db_models(db)
+    out = str(tmp_path / "out")
+
+    r = render_automatic_eda(db, "pts", only_chapters=["correlacion"],
+                             out_dir=out, basename="only_corr")
+    assert r["status"] == "ok", r.get("error")
+    assert _manifest_chapters(r) == {"portada", "correlacion", "glosario"}
+    # Efficiency: correlacion does not need the models -> they were not run.
+    assert ((r["profile"] or {}).get("models") or {}).get("outliers") is None
+    assert (r["profile"] or {}).get("series") is None
+
+
+# --------------------------------------------------------------------------- #
+# Efficiency and precedence, via stubs (no DuckDB).
+# --------------------------------------------------------------------------- #
+def _patch(monkeypatch, cap):
+    import pipelines.render_automatic_eda as mod
+
+    def fake_pt(db, t, **kw):
+        cap["run_models"] = kw.get("run_models")
+        cap["run_series"] = kw.get("run_series")
+        cap["run_llm"] = kw.get("run_llm")
+        return {"status": "ok", "profile": {"columns": []}}
+
+    def fake_ctx(db, t, prof, **kw):
+        cap["ctx_called"] = True
+        return {"db_path": db, "table": t}
+
+    cap["ctx_called"] = False
+    monkeypatch.setattr(mod, "profile_table", fake_pt)
+    monkeypatch.setattr(mod, "build_eda_render_ctx", fake_ctx)
+    monkeypatch.setattr(mod, "render_automatic_eda_pdf",
+                        lambda *a, **k: {"path": "x.pdf", "n_pages": 1,
+                                         "manifest_path": "m.json"})
+    monkeypatch.setattr(mod, "render_automatic_eda_pptx",
+                        lambda *a, **k: {"path": "x.pptx", "n_slides": 1})
+    monkeypatch.setattr(mod, "render_automatic_eda_markdown",
+                        lambda *a, **k: {"path": "x.md", "n_chars": 1})
+
+
+def test_only_geospatial_does_not_activate_cost_flags(monkeypatch):
+    """Efficiency: requesting only geospatial does NOT run models/series/LLM."""
+    cap = {}
+    _patch(monkeypatch, cap)
+    render_automatic_eda("db", "t", only_chapters=["geospatial"])
+    assert cap["run_models"] is False
+    assert cap["run_series"] is False
+    assert cap["run_llm"] is False
+
+
+def test_only_outliers_activates_run_models_via_deps(monkeypatch):
+    cap = {}
+    _patch(monkeypatch, cap)
+    render_automatic_eda("db", "t", only_chapters=["outliers"])
+    assert cap["run_models"] is True
+    assert cap["run_series"] is False
+
+
+def test_explicit_flag_overrides_dependency_resolution(monkeypatch):
+    """Explicit run_models=False wins, even if outliers requests it via deps."""
+    cap = {}
+    _patch(monkeypatch, cap)
+    render_automatic_eda("db", "t", only_chapters=["outliers"], run_models=False)
+    assert cap["run_models"] is False
+
+
+def test_purely_aggregated_chapter_skips_render_ctx(monkeypatch):
+    """num_distr only reads the profile -> build_eda_render_ctx is not called."""
+    cap = {}
+    _patch(monkeypatch, cap)
+    render_automatic_eda("db", "t", only_chapters=["num_distr"])
+    assert cap["ctx_called"] is False, \
+        "num_distr needs no raw data: the ctx must not be built"
+
+
+def test_chapter_that_needs_ctx_builds_it(monkeypatch):
+    cap = {}
+    _patch(monkeypatch, cap)
+    render_automatic_eda("db", "t", only_chapters=["outliers"])
+    assert cap["ctx_called"] is True
+
+
+# --------------------------------------------------------------------------- #
+# EDGE: clear errors without raising.
+# --------------------------------------------------------------------------- #
+def test_unknown_chapter_id_returns_clear_error(tmp_path):
+    r = render_automatic_eda(str(tmp_path / "x.duckdb"), "t",
+                             only_chapters=["no_existe"])
+    assert r["status"] == "error"
+    assert "no_existe" in r["error"]
+    assert "Capítulos válidos" in r["error"]
+    # Some known valid id appears in the list.
+    assert "outliers" in r["error"]
+
+
+def test_empty_only_list_returns_error(tmp_path):
+    r = render_automatic_eda(str(tmp_path / "x.duckdb"), "t", only_chapters=[])
+    assert r["status"] == "error"
+    assert "vac" in r["error"].lower()
+
+
+def test_only_chapters_not_a_list_returns_error(tmp_path):
+    r = render_automatic_eda(str(tmp_path / "x.duckdb"), "t",
+                             only_chapters="outliers")
+    assert r["status"] == "error"
+
+
+def test_only_none_keeps_full_document(tmp_path):
+    """Backward compat: only_chapters=None generates the full document."""
+    db = str(tmp_path / "pts.duckdb")
+    _make_db_models(db)
+    out = str(tmp_path / "out")
+    r = render_automatic_eda(db, "pts", out_dir=out, basename="full")
+    assert r["status"] == "ok", r.get("error")
+    chapters = _manifest_chapters(r)
+    # Full document: many more chapters than cover/glossary.
+    assert {"portada", "glosario", "overview", "correlacion"} <= chapters
+    assert len(chapters) > 4
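The precedence rule that the tests above pin down (an explicit flag from the caller wins, otherwise the resolved dependencies set the default) can be sketched in isolation. This is only an illustration: the helper names `resolve_flag` and `resolve_all` and the sample dependency set are hypothetical; the pipeline itself inlines this logic.

```python
from typing import Dict, Optional, Set

COST_FLAGS = ("run_models", "run_series", "run_llm")


def resolve_flag(name: str, explicit: Optional[bool], dep_flags: Set[str]) -> bool:
    """Effective value of one cost flag (hypothetical helper, not in the source).

    None means "the caller expressed no preference", so the dependencies of
    the requested chapters decide. Any explicit bool wins, including False.
    """
    if explicit is not None:
        return explicit
    return name in dep_flags


def resolve_all(explicit: Dict[str, Optional[bool]],
                dep_flags: Set[str]) -> Dict[str, bool]:
    """Apply resolve_flag to every cost flag."""
    return {n: resolve_flag(n, explicit.get(n), dep_flags) for n in COST_FLAGS}


# Assumed dependency set for a standalone 'outliers' request.
deps = {"run_models"}
print(resolve_all({}, deps))
print(resolve_all({"run_models": False}, deps))
```

Keeping `None` distinct from `False` is what lets the caller switch off a flag that the dependencies would otherwise enable.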
Author	SHA1	Message	Date
egutierrez	64306f3b1c	feat(eda): overview enriches the dictionary and describe tables with the LLM's description+unit The column DICTIONARY table of the overview chapter gains "Descripcion" and "Unidad" columns, and the DESCRIBE table gains "Unidad", consuming profile['llm']['dictionary'] (column/description/business_meaning/unit entries produced by eda_llm_insights) matched by column name. Defensive read: without an LLM block (run_llm did not run) the cells degrade to "—" and the tables still render. Nothing is recomputed and the LLM is not called. CHAPTER_VERSION 1.1.0 -> 1.2.0. Tests: golden (description+unit populated for income), edge (no LLM -> "—"), ctx['llm'] fallback, and PDF render with the new columns visible. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-07-01 01:13:02 +02:00
egutierrez	f2eb782a5f	merge(eda): cover v2 (no Criteria block, LLM description, summary on the right) + global PDF zebra + large/underlined PPTX name	2026-06-30 22:53:46 +02:00
egutierrez	80d10010f5	feat(eda): cover cap01 + global zebra and render emphasis Iterates on the PORTADA (cover) chapter of AutomaticEDA plus two global improvements to the PDF/PPTX renderers: 1. Global zebra (PDF): _place_kv_table now shades even rows just like DataTable does, so every table in the document is striped (not only DataTable instances). The same consistent pattern applies when a table is split and its header repeated. 2. The cover uses the rich LLM description (profile['llm']['summary']) when the profile has it; the noise text "active la interpretación LLM (run_llm)…" is removed from the derived fallback. It does not force LLM calls in the chapter; it only consumes profile['llm'] if present. 3. The "Criterios de calidad" block is removed from the cover (PDF and PPTX); the "Calidad" score is kept. 4. "Resumen del análisis" (PDF): values are aligned to the right margin via the new KVTable.value_align="right". 5. The dataset name on the PPTX cover is larger (44pt) and underlined via the new hints Heading.underline / Heading.size_pt (the PDF ignores them). Bump CHAPTER_VERSION of portada 1.2.0 -> 1.3.0. Verified: suite 213 passed / 1 skipped (incl. acceptance of the 16 chapters); golden zebra = 185 zebra rows across 13 chapters of the full PDF; cover with run_llm without "Criterios de calidad", with the rich LLM description and right-aligned values; PPTX with the 44pt underlined name; edge without LLM falls back to the derived fallback without noise; fn index without error. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 22:44:33 +02:00
egutierrez	ecc22d6d57	merge(eda): acceptance suite for the 16 chapters (29 passed, rescued from an executor with auth down)	2026-06-30 22:07:21 +02:00
agent	7bdb8bffb5	test(eda): acceptance suite for the 16 chapters of AutomaticEDA A battery that hardens the subsystem: coverage of the 16 chapters on the synthetic Faker dataset, essential content per chapter (parametrized needles), standalone chapters with dependency resolution (only_chapters=[outliers] populates IsolationForest without run_models; timeseries; correlacion), None when not applicable, multi-table folder with FK, completeness of the MD (full correlation matrix + skew/kurtosis), 3 non-empty outputs, determinism. Skippable full+LLM test. 29 passed, 1 skipped. No findings: the 16 chapters render as they should.	2026-06-30 22:07:15 +02:00
egutierrez	4139394326	merge(eda): only_chapters with automatic resolution of per-chapter compute dependencies	2026-06-30 21:37:16 +02:00
egutierrez	54a9ab70c7	feat(eda): render AutomaticEDA by standalone chapters with dependency resolution Allows rendering a SUBSET of chapters of the AutomaticEDA report (only_chapters=[...]) to iterate on or test a single chapter without generating the whole document, guaranteeing that the requested chapter ALWAYS arrives populated. - New module automatic_eda/chapter_deps.py: central map CHAPTER_DEPS (source of truth) that declares, for each chapter in CHAPTER_ORDER, which compute flags (run_models/run_series/run_llm) and which ctx pieces (raw_numeric, timeseries_raw, geo_points, head_rows, db_path/table) it needs in order not to render degraded. Pure helpers: resolve_requirements, resolve_profile_flags, needs_render_ctx, resolve_ctx_data_keys, validate_chapter_ids. - build_document(profile, ctx, only=None): optional only parameter that restricts the body to those chapters (cover first + glossary last, always). It reads the reserved key ctx['_only_chapters'] when only is None, to propagate the selection through the renderers without modifying them. Backward compatible. - render_automatic_eda(..., only_chapters=None): validates the ids (clear dict-no-throw error), resolves the dependencies by enabling the needed compute even if the caller did not request it (an explicit flag always takes precedence) and building only the ctx pieces that the requested chapters read (it skips build_eda_render_ctx entirely if none needs raw data). only_chapters=None produces the full document, identical to today's. - Tests: chapter_deps_test.py (pure resolution), build_document_only_test.py (filter), render_automatic_eda_only_test.py (golden with DuckDB: standalone outliers with IsolationForest populated by resolution; timeseries enables run_series; geospatial efficiency without models; edge cases). - Pipeline .md: documents only_chapters + emit_md; version 1.1.0 -> 1.2.0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 21:35:46 +02:00
egutierrez	4773781323	merge(eda): Faker synthetic generators (all-in-one table + multi-table folder) that activate every chapter	2026-06-30 21:26:20 +02:00
egutierrez	50c05d126c	merge(eda): OUTLIERS chapter, univariate (Tukey/z) + multivariate (IsolationForest)	2026-06-30 21:15:05 +02:00
egutierrez	6f88f184f1	feat(eda): OUTLIERS chapter, univariate + multivariate outliers New dedicated `outliers` chapter for the AutomaticEDA engine that gathers and deepens, in one place, the outlier analysis that today is scattered between `num_distr` (per-column count) and `modelos` (IsolationForest). It is registered in `chapters_registry.py` between `missingness` and `correlacion` (data-quality block: calidad → missingness → outliers). Chapter content: - Univariate summary per column: count and % of outliers by Tukey (1.5·IQR) and by z-score (|z| > 3), with lower/upper fences and extreme values. Sorted by contamination, flagging the most affected columns. Reuses the registry functions `build_boxplot_stats` (fences from the profile's percentiles) and `detect_outliers` (z-score rule on the raw sample from `ctx`). - Tukey boxplots of the most contaminated columns (box, whiskers and outlier points), delegated to the new function `build_boxplots_figure`. - Multivariate: anomalous rows considering all columns at once with `isolation_forest_outliers`: count and % of rows, the most anomalous ones with their score and the dimensions that make them unusual (top columns by |z|, via the new function `summarize_outlier_dims`). The detector runs live on `raw_numeric` so that row indexing matches exactly that of the dimensions; it falls back to the profile's precomputed block when there is no raw sample (lite preset). - Exploratory interpretation: an outlier is not necessarily an error (it distinguishes a data error from a genuine extreme value) and recommendations (review, winsorize or re-express, linking to the profile's Tukey re-expression). Clickable terms registered in the shared glossary: `outlier`, `tukey_fence`, `zscore`, `isolation_forest`. 
New registry functions (datascience domain, eda group): - `build_boxplots_figure_py_datascience` (figure helper, impure) - `summarize_outlier_dims_py_datascience` (pure) The chapter activates with ≥1 numeric column and returns None otherwise; it reads everything defensively and never raises. Tests: chapter (golden + edges + error path + PDF/PPTX render) and both new functions. AutomaticEDA non-regression suite green. Verified end-to-end with the Titanic dataset (Fare/Parch/SibSp as the most contaminated columns). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 21:12:40 +02:00
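
The commit log describes a central CHAPTER_DEPS map with pure helpers, but the module itself is not shown here. The following is a minimal sketch of how such a map might work, under stated assumptions: the abridged CHAPTER_ORDER, the per-chapter entries, the `"ctx_keys"` result key, and the request-order handling in `validate_chapter_ids` are all hypothetical. Only the helper names, the `"profile_flags"`, `"valid"` and `"unknown"` keys, and the behaviors exercised by the tests come from the source.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical, abridged stand-ins for the real CHAPTER_ORDER / CHAPTER_DEPS.
CHAPTER_ORDER = ["portada", "overview", "num_distr", "outliers",
                 "correlacion", "timeseries", "geospatial", "glosario"]

# Per chapter: cost flags it needs and raw-data ctx pieces it reads.
# The ctx pieces for timeseries and geospatial are assumptions.
CHAPTER_DEPS: Dict[str, Dict[str, Set[str]]] = {
    "outliers": {"flags": {"run_models"}, "ctx": {"raw_numeric"}},
    "correlacion": {"flags": set(), "ctx": {"raw_numeric"}},
    "timeseries": {"flags": {"run_series"}, "ctx": {"timeseries_raw"}},
    "geospatial": {"flags": set(), "ctx": {"geo_points"}},
    "num_distr": {"flags": set(), "ctx": set()},
}


def validate_chapter_ids(ids: Iterable[str], order: List[str]) -> Dict[str, List[str]]:
    """Split ids into valid and unknown, de-duplicated, in request order."""
    valid: List[str] = []
    unknown: List[str] = []
    for cid in ids:
        bucket = valid if cid in order else unknown
        if cid not in bucket:
            bucket.append(cid)
    return {"valid": valid, "unknown": unknown}


def resolve_requirements(ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Union of the flags and ctx pieces needed by the requested chapters."""
    flags: Set[str] = set()
    ctx_keys: Set[str] = set()
    for cid in ids:
        deps = CHAPTER_DEPS.get(cid, {})
        flags |= deps.get("flags", set())
        ctx_keys |= deps.get("ctx", set())
    return {"profile_flags": flags, "ctx_keys": ctx_keys}


def needs_render_ctx(ids: Iterable[str]) -> bool:
    """True when at least one requested chapter reads raw data from ctx."""
    return bool(resolve_requirements(ids)["ctx_keys"])


print(resolve_requirements(["outliers", "timeseries"]))
print(needs_render_ctx(["num_distr"]))
```

A declarative map like this keeps the resolution helpers pure, which is why they can be tested without DuckDB or any renderer.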