feat(eda): render AutomaticEDA by individual chapters with dependency resolution

Allows rendering a SUBSET of the AutomaticEDA report's chapters
(only_chapters=[...]) to iterate on or test a specific chapter without generating
the whole document, guaranteeing that the requested chapter ALWAYS arrives populated.

- New module automatic_eda/chapter_deps.py: central CHAPTER_DEPS map (source of
  truth) that declares, for each chapter in CHAPTER_ORDER, which compute flags
  (run_models/run_series/run_llm) and which ctx pieces (raw_numeric, timeseries_raw,
  geo_points, head_rows, db_path/table) it needs so it does not render degraded.
  Pure helpers: resolve_requirements, resolve_profile_flags, needs_render_ctx,
  resolve_ctx_data_keys, validate_chapter_ids.
- build_document(profile, ctx, only=None): optional only parameter that restricts
  the body to those chapters (cover always first, glossary always last). Reads the
  reserved key ctx['_only_chapters'] when only is None, to propagate the
  selection through the renderers without modifying them. Backward-compatible.
- render_automatic_eda(..., only_chapters=None): validates the ids (clear error
  returned as a dict, never raised), resolves the dependencies by enabling the
  required compute even if the caller did not request it (an explicit flag always
  takes precedence), and builds only the ctx pieces that the requested chapters
  read (skips build_eda_render_ctx entirely if none needs raw data).
  only_chapters=None produces the full document, identical to today's.
- Tests: chapter_deps_test.py (pure resolution), build_document_only_test.py
  (filter), render_automatic_eda_only_test.py (golden with DuckDB: standalone
  outliers chapter with IsolationForest populated via resolution; timeseries
  enables run_series; geospatial efficiency without models; edge cases).
- Pipeline .md: documents only_chapters + emit_md; version 1.1.0 -> 1.2.0.
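
The flow described in the bullets above (validate ids, resolve compute flags,
filter the body) can be sketched roughly as follows. This is only an
illustration under stated assumptions: the CHAPTER_DEPS entries, the
CHAPTER_ORDER contents, the helper signatures, the error-dict shape and the
select_chapter_ids helper are all hypothetical, since this commit page does not
show chapter_deps.py.

```python
# Hypothetical sketch. The map contents and signatures below are assumed for
# illustration and are not taken from the real automatic_eda/chapter_deps.py.
CHAPTER_ORDER = ["portada", "outliers", "timeseries", "geospatial", "glosario"]
CHAPTER_DEPS = {
    "outliers": {"flags": {"run_models"}, "ctx": {"raw_numeric"}},
    "timeseries": {"flags": {"run_series"}, "ctx": {"timeseries_raw"}},
    "geospatial": {"flags": set(), "ctx": {"geo_points"}},
}
ALL_FLAGS = ("run_models", "run_series", "run_llm")


def validate_chapter_ids(only_chapters):
    """Return an error dict for unknown ids instead of raising, else None."""
    unknown = [cid for cid in only_chapters if cid not in CHAPTER_DEPS]
    if unknown:
        return {"ok": False, "error": f"unknown chapter ids: {unknown}"}
    return None


def resolve_profile_flags(only_chapters, explicit=None):
    """Enable the flags the selected chapters need; explicit values win."""
    needed = set()
    for cid in only_chapters:
        needed |= CHAPTER_DEPS[cid]["flags"]
    resolved = {flag: flag in needed for flag in ALL_FLAGS}
    # A flag the caller set explicitly (True or False) overrides the resolution.
    resolved.update({k: v for k, v in (explicit or {}).items() if v is not None})
    return resolved


def select_chapter_ids(ctx=None, only=None):
    """Mimic the selection rules: explicit `only` wins over the reserved ctx key."""
    ctx = dict(ctx) if isinstance(ctx, dict) else {}
    carried = ctx.pop("_only_chapters", None)
    if only is None and isinstance(carried, (list, tuple, set)):
        only = list(carried)
    only_set = set(only) if isinstance(only, (list, tuple, set)) else None
    body = [
        cid
        for cid in CHAPTER_ORDER
        if cid not in ("portada", "glosario")
        and (only_set is None or cid in only_set)
    ]
    return ["portada", *body, "glosario"]
```

Under these assumptions, requesting only "outliers" switches on run_models
without the caller asking for it, and the resulting document is cover, outliers,
glossary, in canonical order regardless of the order in which ids were passed.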

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 21:35:46 +02:00
parent 50c05d126c
commit 54a9ab70c7
7 changed files with 893 additions and 12 deletions
@@ -73,24 +73,51 @@ def build_chapter(chapter_id: str, profile: dict, ctx: dict):
return model.as_chapter(result)
def build_document(profile: dict, ctx: dict = None) -> list:
"""Build the full ordered list of chapters for a TableProfile.
def build_document(profile: dict, ctx: dict = None, only: list = None) -> list:
"""Build the ordered list of chapters for a TableProfile.
Args:
profile: the ``eda`` group TableProfile dict (may be None/empty).
ctx: optional context dict carrying presentation metadata not present in
the profile (dataset_name, source_origin, storage, generated_at,
description, granularity, quality_criteria, head_rows, ...).
only: optional list of chapter ids to render. ``None`` (default) keeps
the historical behaviour — every implemented & applicable chapter in
canonical order. A list restricts the BODY to just those ids (in
canonical order), but the cover (``portada``) and glossary
(``glosario``) are ALWAYS included so the document stays valid and
the clickable terms keep a destination — so passing ``only=["x"]``
yields portada + x + glosario. Unknown ids are simply skipped (the
caller is responsible for strict validation). ``only=[]`` yields the
minimal document (portada + glosario only). This argument is additive
and backward-compatible: the signature is unchanged for existing
callers (default ``None``).
Returns:
list[Chapter] in canonical order, containing only the chapters that are
implemented and applicable. Never raises.
implemented, applicable and selected. Never raises.
"""
if not isinstance(profile, dict):
profile = {}
# Copy ctx so the shared collector / summary we add do not leak to the caller.
ctx = dict(ctx) if isinstance(ctx, dict) else {}
# only=None -> all body chapters (historical). only=list -> restrict body to
# that selection (portada/glosario are added unconditionally below). The
# renderers call build_document(profile, meta['ctx']) without an `only`
# argument, so the pipeline forwards the selection through a reserved ctx key
# (``_only_chapters``); an explicit `only` argument always wins. The key is
# popped from the local ctx copy so it never reaches the chapters.
if only is None:
_carried = ctx.pop("_only_chapters", None)
if isinstance(_carried, (list, tuple, set)):
only = list(_carried)
else:
ctx.pop("_only_chapters", None)
# A set makes the membership test cheap; the iteration order stays
# CHAPTER_ORDER. only=[] is a valid (empty) selection -> minimal document.
only_set = set(only) if isinstance(only, (list, tuple, set)) else None
# A single glossary collector is shared by every chapter via ctx['glossary'].
# Chapters call ctx['glossary'].add(key, label, definition) and mark in-text
# appearances with [[term:key]]…[[/term]]; the glosario chapter renders the
@@ -106,6 +133,10 @@ def build_document(profile: dict, ctx: dict = None) -> list:
for cid in CHAPTER_ORDER:
if cid in (_PORTADA, _GLOSARIO):
continue
# When a selection is given, skip body chapters outside it. portada and
# glosario are never filtered (handled out of this loop).
if only_set is not None and cid not in only_set:
continue
ch = build_chapter(cid, profile, ctx)
if ch is not None and ch.blocks:
body.append(ch)