feat(eda): render AutomaticEDA by individual chapters with dependency resolution

Allows rendering a SUBSET of the chapters of the AutomaticEDA report (only_chapters=[...]) to iterate on or test one specific chapter without generating the whole document, while guaranteeing that the requested chapter ALWAYS arrives populated.

- New module automatic_eda/chapter_deps.py: central map CHAPTER_DEPS (source of truth) that declares, for each chapter in CHAPTER_ORDER, which compute flags (run_models/run_series/run_llm) and which ctx pieces (raw_numeric, timeseries_raw, geo_points, head_rows, db_path/table) it needs so that it does not render degraded. Pure helpers: resolve_requirements, resolve_profile_flags, needs_render_ctx, resolve_ctx_data_keys, validate_chapter_ids.
- build_document(profile, ctx, only=None): optional only parameter that restricts the body to those chapters (cover always first, glossary always last). Reads the reserved key ctx['_only_chapters'] when only is None, to propagate the selection through the renderers without modifying them. Backward-compatible.
- render_automatic_eda(..., only_chapters=None): validates the ids (clear error returned as a dict, nothing is raised), resolves the dependencies by enabling the required computation even if the caller did not ask for it (an explicit flag always takes precedence), and builds only the ctx pieces that the requested chapters read (build_eda_render_ctx is skipped entirely if none of them needs raw data). only_chapters=None produces the full document, identical to today's.
- Tests: chapter_deps_test.py (pure resolution), build_document_only_test.py (filter), render_automatic_eda_only_test.py (golden with DuckDB: standalone outliers chapter with IsolationForest populated through resolution; timeseries enables run_series; geospatial efficiency without models; edge cases).
- Pipeline .md: documents only_chapters + emit_md; version 1.1.0 -> 1.2.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
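The dependency resolution described in the message can be pictured with a minimal sketch. The contents of CHAPTER_DEPS and the helper signatures are not visible in this view, so the map entries, the `ALL_FLAGS` tuple, the `"flags"`/`"ctx"` layout and the argument shapes below are assumptions made for illustration only; the chapter ids are taken from the test description.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Flag names come from the commit message; grouping them in a tuple is an assumption.
ALL_FLAGS = ("run_models", "run_series", "run_llm")

# Hypothetical entries: the real CHAPTER_DEPS map is not shown in this commit view.
CHAPTER_DEPS: Dict[str, Dict[str, Set[str]]] = {
    "outliers": {"flags": {"run_models"}, "ctx": {"raw_numeric"}},
    "timeseries": {"flags": {"run_series"}, "ctx": {"timeseries_raw"}},
    "geospatial": {"flags": set(), "ctx": {"geo_points"}},
}


def resolve_requirements(chapter_ids: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Union the compute flags and ctx keys needed by the requested chapters."""
    flags: Set[str] = set()
    ctx_keys: Set[str] = set()
    for cid in chapter_ids:
        deps = CHAPTER_DEPS.get(cid, {})  # unknown ids contribute nothing
        flags |= deps.get("flags", set())
        ctx_keys |= deps.get("ctx", set())
    return flags, ctx_keys


def resolve_profile_flags(
    chapter_ids: Iterable[str],
    explicit: Optional[Dict[str, Optional[bool]]] = None,
) -> Dict[str, bool]:
    """Enable what the chapters need; an explicit (non-None) flag always wins."""
    needed, _ = resolve_requirements(chapter_ids)
    resolved = {name: name in needed for name in ALL_FLAGS}
    for name, value in (explicit or {}).items():
        if name in resolved and value is not None:
            resolved[name] = bool(value)
    return resolved
```

Under these assumptions, requesting only `timeseries` switches on `run_series` and nothing else, while a caller that passes `run_models=False` explicitly keeps that value even when `outliers` is requested.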
@@ -73,24 +73,51 @@ def build_chapter(chapter_id: str, profile: dict, ctx: dict):
     return model.as_chapter(result)
 
 
-def build_document(profile: dict, ctx: dict = None) -> list:
-    """Build the full ordered list of chapters for a TableProfile.
+def build_document(profile: dict, ctx: dict = None, only: list = None) -> list:
+    """Build the ordered list of chapters for a TableProfile.
 
     Args:
         profile: the ``eda`` group TableProfile dict (may be None/empty).
         ctx: optional context dict carrying presentation metadata not present in
             the profile (dataset_name, source_origin, storage, generated_at,
             description, granularity, quality_criteria, head_rows, ...).
+        only: optional list of chapter ids to render. ``None`` (default) keeps
+            the historical behaviour: every implemented & applicable chapter in
+            canonical order. A list restricts the BODY to just those ids (in
+            canonical order), but the cover (``portada``) and glossary
+            (``glosario``) are ALWAYS included so the document stays valid and
+            the clickable terms keep a destination, so passing ``only=["x"]``
+            yields portada + x + glosario. Unknown ids are simply skipped (the
+            caller is responsible for strict validation). ``only=[]`` yields the
+            minimal document (portada + glosario only). This argument is additive
+            and backward-compatible: the signature is unchanged for existing
+            callers (default ``None``).
 
     Returns:
         list[Chapter] in canonical order, containing only the chapters that are
-        implemented and applicable. Never raises.
+        implemented, applicable and selected. Never raises.
     """
     if not isinstance(profile, dict):
         profile = {}
     # Copy ctx so the shared collector / summary we add do not leak to the caller.
     ctx = dict(ctx) if isinstance(ctx, dict) else {}
 
+    # only=None -> all body chapters (historical). only=list -> restrict body to
+    # that selection (portada/glosario are added unconditionally below). The
+    # renderers call build_document(profile, meta['ctx']) without an `only`
+    # argument, so the pipeline forwards the selection through a reserved ctx key
+    # (``_only_chapters``); an explicit `only` argument always wins. The key is
+    # popped from the local ctx copy so it never reaches the chapters.
+    if only is None:
+        _carried = ctx.pop("_only_chapters", None)
+        if isinstance(_carried, (list, tuple, set)):
+            only = list(_carried)
+    else:
+        ctx.pop("_only_chapters", None)
+    # A set makes the membership test cheap; the iteration order stays
+    # CHAPTER_ORDER. only=[] is a valid (empty) selection -> minimal document.
+    only_set = set(only) if isinstance(only, (list, tuple, set)) else None
+
     # A single glossary collector is shared by every chapter via ctx['glossary'].
     # Chapters call ctx['glossary'].add(key, label, definition) and mark in-text
     # appearances with [[term:key]]…[[/term]]; the glosario chapter renders the
@@ -106,6 +133,10 @@ def build_document(profile: dict, ctx: dict = None) -> list:
     for cid in CHAPTER_ORDER:
         if cid in (_PORTADA, _GLOSARIO):
             continue
+        # When a selection is given, skip body chapters outside it. portada and
+        # glosario are never filtered (handled out of this loop).
+        if only_set is not None and cid not in only_set:
+            continue
         ch = build_chapter(cid, profile, ctx)
         if ch is not None and ch.blocks:
             body.append(ch)
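The selection logic in the two hunks above can be exercised in isolation with a small stand-in. This is only a sketch: `select_chapters` is a hypothetical helper that returns chapter ids instead of Chapter objects, the body ids in `CHAPTER_ORDER` are illustrative (only `portada` and `glosario` appear in the diff), and the check that drops chapters with no blocks is left out.

```python
from typing import List, Optional

_PORTADA, _GLOSARIO = "portada", "glosario"
# Hypothetical canonical order; only portada/glosario are confirmed by the diff.
CHAPTER_ORDER = [_PORTADA, "resumen", "outliers", "timeseries", _GLOSARIO]


def select_chapters(ctx: Optional[dict] = None, only: Optional[list] = None) -> List[str]:
    """Return the chapter ids a filtered document would contain, in canonical order."""
    # Work on a copy so the caller's ctx keeps its reserved key untouched.
    ctx = dict(ctx) if isinstance(ctx, dict) else {}
    # The reserved key is always removed from the copy; it is only used as the
    # selection when no explicit `only` argument was given.
    carried = ctx.pop("_only_chapters", None)
    if only is None and isinstance(carried, (list, tuple, set)):
        only = list(carried)
    only_set = set(only) if isinstance(only, (list, tuple, set)) else None

    body = [
        cid
        for cid in CHAPTER_ORDER
        if cid not in (_PORTADA, _GLOSARIO)
        and (only_set is None or cid in only_set)
    ]
    # Cover first and glossary last, regardless of the selection.
    return [_PORTADA, *body, _GLOSARIO]
```

With this stand-in, `select_chapters(only=["timeseries", "outliers"])` keeps canonical order rather than the order of the request, `only=[]` gives the minimal cover plus glossary document, and an unknown id is silently ignored, matching the docstring's note that strict validation is the caller's job.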