feat(eda): render AutomaticEDA by individual chapters with dependency resolution

Allows rendering a SUBSET of chapters of the AutomaticEDA report (only_chapters=[...]) to iterate on or test a specific chapter without generating the whole document, guaranteeing that the requested chapter ALWAYS arrives populated.

- New module automatic_eda/chapter_deps.py: central CHAPTER_DEPS map (source of truth) that declares, for each chapter in CHAPTER_ORDER, which compute flags (run_models/run_series/run_llm) and which ctx pieces (raw_numeric, timeseries_raw, geo_points, head_rows, db_path/table) it needs so that it does not come out degraded. Pure helpers: resolve_requirements, resolve_profile_flags, needs_render_ctx, resolve_ctx_data_keys, validate_chapter_ids.
- build_document(profile, ctx, only=None): optional only parameter that restricts the body to those chapters (cover always first, glossary always last). Reads the reserved key ctx['_only_chapters'] when only is None, to propagate the selection through the renderers without modifying them. Backward-compatible.
- render_automatic_eda(..., only_chapters=None): validates the ids (clear error returned as a dict rather than raised), resolves the dependencies by enabling the required computation even if the caller did not request it (an explicit flag always takes precedence), and builds only the ctx pieces that the requested chapters read (skips build_eda_render_ctx entirely if none of them needs raw data). only_chapters=None produces the full document, identical to today's.
- Tests: chapter_deps_test.py (pure resolution), build_document_only_test.py (filter), render_automatic_eda_only_test.py (golden with DuckDB: standalone outliers chapter with IsolationForest populated via resolution; timeseries enables run_series; geospatial efficiency without models; edge cases).
- Pipeline .md: documents only_chapters + emit_md; version 1.1.0 -> 1.2.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 21:35:46 +02:00
parent 50c05d126c
commit 54a9ab70c7
7 changed files with 893 additions and 12 deletions
@@ -73,24 +73,51 @@ def build_chapter(chapter_id: str, profile: dict, ctx: dict):
    return model.as_chapter(result)


-def build_document(profile: dict, ctx: dict = None) -> list:
-    """Build the full ordered list of chapters for a TableProfile.
+def build_document(profile: dict, ctx: dict = None, only: list = None) -> list:
+    """Build the ordered list of chapters for a TableProfile.

    Args:
        profile: the ``eda`` group TableProfile dict (may be None/empty).
        ctx: optional context dict carrying presentation metadata not present in
            the profile (dataset_name, source_origin, storage, generated_at,
            description, granularity, quality_criteria, head_rows, ...).
+        only: optional list of chapter ids to render. ``None`` (default) keeps
+            the historical behaviour — every implemented & applicable chapter in
+            canonical order. A list restricts the BODY to just those ids (in
+            canonical order), but the cover (``portada``) and glossary
+            (``glosario``) are ALWAYS included so the document stays valid and
+            the clickable terms keep a destination — so passing ``only=["x"]``
+            yields portada + x + glosario. Unknown ids are simply skipped (the
+            caller is responsible for strict validation). ``only=[]`` yields the
+            minimal document (portada + glosario only). This argument is additive
+            and backward-compatible: calls that omit it behave exactly as before
+            (default ``None``).

    Returns:
        list[Chapter] in canonical order, containing only the chapters that are
-        implemented and applicable. Never raises.
+        implemented, applicable and selected. Never raises.
    """
    if not isinstance(profile, dict):
        profile = {}
    # Copy ctx so the shared collector / summary we add do not leak to the caller.
    ctx = dict(ctx) if isinstance(ctx, dict) else {}

+    # only=None -> all body chapters (historical). only=list -> restrict body to
+    # that selection (portada/glosario are added unconditionally below). The
+    # renderers call build_document(profile, meta['ctx']) without an `only`
+    # argument, so the pipeline forwards the selection through a reserved ctx key
+    # (``_only_chapters``); an explicit `only` argument always wins. The key is
+    # popped from the local ctx copy so it never reaches the chapters.
+    if only is None:
+        _carried = ctx.pop("_only_chapters", None)
+        if isinstance(_carried, (list, tuple, set)):
+            only = list(_carried)
+    else:
+        ctx.pop("_only_chapters", None)
+    # A set makes the membership test cheap; the iteration order stays
+    # CHAPTER_ORDER. only=[] is a valid (empty) selection -> minimal document.
+    only_set = set(only) if isinstance(only, (list, tuple, set)) else None
+
    # A single glossary collector is shared by every chapter via ctx['glossary'].
    # Chapters call ctx['glossary'].add(key, label, definition) and mark in-text
    # appearances with [[term:key]]…[[/term]]; the glosario chapter renders the
@@ -106,6 +133,10 @@ def build_document(profile: dict, ctx: dict = None) -> list:
    for cid in CHAPTER_ORDER:
        if cid in (_PORTADA, _GLOSARIO):
            continue
+        # When a selection is given, skip body chapters outside it. portada and
+        # glosario are never filtered (handled out of this loop).
+        if only_set is not None and cid not in only_set:
+            continue
        ch = build_chapter(cid, profile, ctx)
        if ch is not None and ch.blocks:
            body.append(ch)
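
The selection precedence in the hunks above (an explicit `only` wins over the reserved `ctx['_only_chapters']` key, and portada/glosario are never filtered) can be sketched in isolation. This is a minimal illustration, not the module's code: `resolve_selection` and `selected_ids` are hypothetical helper names, the `CHAPTER_ORDER` below is a shortened stand-in, and the sketch ignores the "implemented and applicable" check that the real `build_document` applies through `build_chapter`.

```python
from typing import Iterable, Optional

# Assumption: shortened stand-in for the real CHAPTER_ORDER.
CHAPTER_ORDER = ["portada", "outliers", "timeseries", "geospatial", "glosario"]
ALWAYS_INCLUDED = ("portada", "glosario")


def resolve_selection(ctx: Optional[dict], only: Optional[Iterable[str]] = None):
    """Return (ctx_copy, only_set), mirroring the precedence shown in the diff."""
    # Work on a copy so the caller's ctx is never mutated.
    ctx = dict(ctx) if isinstance(ctx, dict) else {}
    # The reserved key is always removed so it never reaches the chapters.
    carried = ctx.pop("_only_chapters", None)
    # The carried selection is used only when no explicit `only` was given.
    if only is None and isinstance(carried, (list, tuple, set)):
        only = list(carried)
    # None means "no filter"; an empty set is a valid, empty selection.
    only_set = set(only) if isinstance(only, (list, tuple, set)) else None
    return ctx, only_set


def selected_ids(ctx: Optional[dict], only: Optional[Iterable[str]] = None) -> list:
    """Chapter ids that would be attempted, in canonical order."""
    _, only_set = resolve_selection(ctx, only)
    body = [
        cid
        for cid in CHAPTER_ORDER
        if cid not in ALWAYS_INCLUDED and (only_set is None or cid in only_set)
    ]
    return ["portada", *body, "glosario"]
```

Under these assumptions, `selected_ids({}, only=["outliers"])` gives portada + outliers + glosario, `only=[]` gives the minimal document, and unknown ids are silently dropped, which is why strict validation is left to the caller.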