feat(eda): motor AutomaticEDA fase 4a — render fixes + keep-together + glosario clicable
Mejoras transversales del motor de render (no del contenido de capítulos): 1. Fix negrita pisa texto (PDF): _place_rich_lines mide el ancho REAL de cada span con las métricas de fuente del renderer (peso correcto) en vez del grid de ancho medio; negrita y normal en la misma línea ya no se solapan. 2. Zebra striping: filas pares sombreadas (#f6f8fa) en DataTable (PDF + PPTX), coherente al partir tablas largas (índice de fila lógico, no por página). 3. Keep-together: bloque Group nuevo; el renderer mide el grupo entero y lo mueve completo a la página/slide siguiente si no cabe, y encoge la figura (height_in) para dejar sitio a su título y texto. num_distr lo usa. 4. Caption siempre visible en toda figura PPTX (fallback al heading); la figura reserva el alto de su caption para que ambos quepan en el mismo slide. 5. Portada construida al final (con resumen agregado del análisis vía ctx['document_summary']) pero colocada primera por build_document. 6. Glosario: capítulo nuevo (último) + GlossaryCollector en ctx; los capítulos registran términos y marcan apariciones con [[term:key]]...[[/term]]. Links clicables reales: PDF (PyMuPDF, link GOTO) y PPTX (slide-jump nativo). Enganchado "entropía" en cat_distr como ejemplo end-to-end. Funciones reutilizables delegadas a fn-constructor (tag eda): - add_pdf_internal_links_py_datascience (PyMuPDF) - pptx_link_run_to_slide_py_datascience (slide-jump) Contrato docs/automatic_eda_contract.md actualizado (§1/§3/§5 + §11 nueva) con la API de glosario, keep-together y zebra para la siguiente fase. PyMuPDF declarado en pyproject. Suite verde (90 tests); golden titanic verificado. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -128,6 +128,39 @@ class Note:
|
||||
kind: str = field(default="note", init=False)
|
||||
|
||||
|
||||
@dataclass
|
||||
class Group:
|
||||
"""A keep-together unit: its blocks render on the SAME page/slide.
|
||||
|
||||
Renderers measure the whole group first; if it does not fit in the remaining
|
||||
space they move it *whole* to the next page (PDF) or slide (PPTX) before
|
||||
drawing anything — so a heading never gets stranded apart from the figure and
|
||||
text it introduces. If the group is taller than a full page even on its own,
|
||||
it starts on a fresh page and flows (honest degradation, never cut). Use it to
|
||||
bind ``Heading`` + ``Markdown`` + ``Figure`` of one idea together (see the
|
||||
DISTR NUM / AGREGACION chapters).
|
||||
"""
|
||||
|
||||
blocks: list = field(default_factory=list)
|
||||
title: Optional[str] = None
|
||||
kind: str = field(default="group", init=False)
|
||||
|
||||
|
||||
@dataclass
|
||||
class GlossaryEntry:
|
||||
"""One glossary term: a clickable destination at the end of the document.
|
||||
|
||||
Rendered as the term ``label`` (heading) plus its ``definition`` (markdown).
|
||||
The renderers register its page/slide position as the link target so every
|
||||
in-text appearance of the same ``key`` becomes a real clickable jump (PDF link
|
||||
annotation via PyMuPDF; PPTX internal slide jump)."""
|
||||
|
||||
key: str = ""
|
||||
label: str = ""
|
||||
definition: str = ""
|
||||
kind: str = field(default="glossary_entry", init=False)
|
||||
|
||||
|
||||
@dataclass
|
||||
class Chapter:
|
||||
"""An ordered set of blocks with an id, a title and a generation version."""
|
||||
@@ -150,13 +183,17 @@ _BLOCK_BY_KIND = {
|
||||
"image": Image,
|
||||
"caption": Caption,
|
||||
"note": Note,
|
||||
"group": Group,
|
||||
"glossary_entry": GlossaryEntry,
|
||||
}
|
||||
|
||||
|
||||
def as_block(obj: Any):
|
||||
"""Coerce a value into a block dataclass. Unknown values become a Note."""
|
||||
if isinstance(obj, (Heading, Markdown, KVTable, DataTable, Figure, Image,
|
||||
Caption, Note)):
|
||||
Caption, Note, Group, GlossaryEntry)):
|
||||
if isinstance(obj, Group):
|
||||
obj.blocks = as_blocks(obj.blocks)
|
||||
return obj
|
||||
if isinstance(obj, dict):
|
||||
kind = obj.get("kind")
|
||||
@@ -189,6 +226,13 @@ def as_block(obj: Any):
|
||||
return Caption(text=_safe_str(obj.get("text")))
|
||||
if cls is Note:
|
||||
return Note(text=_safe_str(obj.get("text")))
|
||||
if cls is Group:
|
||||
return Group(blocks=as_blocks(obj.get("blocks")),
|
||||
title=obj.get("title"))
|
||||
if cls is GlossaryEntry:
|
||||
return GlossaryEntry(key=_safe_str(obj.get("key")),
|
||||
label=_safe_str(obj.get("label")),
|
||||
definition=_safe_str(obj.get("definition")))
|
||||
except Exception: # noqa: BLE001 — never raise on a malformed block.
|
||||
return Note(text=_safe_str(obj))
|
||||
return Note(text=_safe_str(obj))
|
||||
@@ -246,6 +290,67 @@ def _safe_str(v: Any) -> str:
|
||||
return ""
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# Glossary collector — chapters register the terms they use; the glosario
|
||||
# chapter renders them at the end and the renderers wire the clickable links.
|
||||
# --------------------------------------------------------------------------- #
|
||||
class GlossaryCollector:
|
||||
"""Accumulates glossary terms registered by chapters during document build.
|
||||
|
||||
A single instance is created by :func:`build_document` and passed to every
|
||||
chapter via ``ctx['glossary']``. A chapter calls ``add(key, label,
|
||||
definition)`` to declare a term it explains (e.g. ``"entropia"`` →
|
||||
"Entropía"), and marks each in-text appearance with the inline span
|
||||
``[[term:key]]texto visible[[/term]]`` (see ``text_layout.parse_inline_rich``).
|
||||
The ``glosario`` chapter reads ``terms()`` to emit one :class:`GlossaryEntry`
|
||||
per term; the renderers turn every marked appearance into a real click that
|
||||
jumps to that entry. First registration of a key wins (idempotent); never
|
||||
raises."""
|
||||
|
||||
def __init__(self):
|
||||
self._terms: dict = {}
|
||||
self._order: list = []
|
||||
|
||||
def add(self, key: Any, label: Any = None, definition: Any = "") -> str:
|
||||
"""Register a term and return its normalized key (''. if invalid)."""
|
||||
try:
|
||||
k = _safe_str(key).strip()
|
||||
if not k:
|
||||
return ""
|
||||
if k not in self._terms:
|
||||
self._terms[k] = {
|
||||
"key": k,
|
||||
"label": _safe_str(label).strip() or k,
|
||||
"definition": _safe_str(definition),
|
||||
}
|
||||
self._order.append(k)
|
||||
return k
|
||||
except Exception: # noqa: BLE001 — collecting a term never breaks a build.
|
||||
return ""
|
||||
|
||||
def has(self, key: Any) -> bool:
|
||||
return _safe_str(key).strip() in self._terms
|
||||
|
||||
def get(self, key: Any) -> Optional[dict]:
|
||||
return self._terms.get(_safe_str(key).strip())
|
||||
|
||||
def terms(self, by: str = "label") -> list:
|
||||
"""Return the registered terms as dicts.
|
||||
|
||||
``by='label'`` (default) sorts alphabetically by visible label;
|
||||
``by='order'`` keeps first-appearance order."""
|
||||
if by == "order":
|
||||
return [self._terms[k] for k in self._order]
|
||||
return sorted(self._terms.values(),
|
||||
key=lambda t: _safe_str(t.get("label")).lower())
|
||||
|
||||
def __len__(self) -> int:
|
||||
return len(self._terms)
|
||||
|
||||
def __bool__(self) -> bool:
|
||||
return bool(self._terms)
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# Manifest — per-chapter versions and page/slide counts for tracking.
|
||||
# --------------------------------------------------------------------------- #
|
||||
|
||||
Reference in New Issue
Block a user