fn_registry/python/functions/datascience/render_automatic_eda_pptx.py
egutierrez 9cdde4a341 feat(eda): AutomaticEDA core, chapter-based document + anti-cut PDF/PPTX renderers
Introduces the intermediate layer between the content of an EDA and its
output format. A document is a list of versioned chapters; each chapter is
an ordered set of blocks (heading, markdown, kv_table, data_table,
figure, image, caption, note) that are independent of the format.
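
As a rough illustration of the document model described above, here is a minimal sketch. The class and field names (Block, Chapter, kind, payload) and the to_chapter helper are assumptions made for this example; they are not taken from model.py, whose actual dataclasses and normalizers may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Block:
    """One format-independent unit of content (hypothetical shape)."""
    kind: str            # e.g. "heading", "markdown", "data_table"
    payload: Any = None


@dataclass
class Chapter:
    """A versioned, ordered collection of blocks (hypothetical shape)."""
    title: str
    version: str = "0"
    blocks: list = field(default_factory=list)


def to_chapter(obj) -> Optional[Chapter]:
    """Defensive normalizer: accept a Chapter or a dict, never raise.

    Returns None for anything that cannot be interpreted as a chapter.
    """
    if isinstance(obj, Chapter):
        return obj
    if not isinstance(obj, dict):
        return None
    raw_blocks = obj.get("blocks")
    if not isinstance(raw_blocks, (list, tuple)):
        raw_blocks = []
    blocks = []
    for raw in raw_blocks:
        if isinstance(raw, Block):
            blocks.append(raw)
        elif isinstance(raw, dict) and "kind" in raw:
            blocks.append(Block(str(raw["kind"]), raw.get("payload")))
        # Anything else is skipped silently instead of raising.
    return Chapter(str(obj.get("title", "")), str(obj.get("version", "0")), blocks)
```

Because the renderers only ever see normalized chapters, malformed input degrades to fewer blocks rather than to an exception.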

Core (support package python/functions/datascience/automatic_eda/):
- model.py: block dataclasses + Chapter, defensive normalizers
  (accept a dataclass or a dict, never raise), ENGINE_VERSION and the
  per-chapter manifest (automatic_eda_manifest.json).
- text_layout.py: shared measurement/wrapping on a character grid.
- chapters_registry.py: pre-declared CHAPTER_ORDER + build_document with
  convention-based auto-discovery of chapters (lets chapters be added in
  parallel without editing the registry).
- render_pdf_impl.py: A5 portrait paginator for mobile that MEASURES every
  block and never cuts: text as whole lines, long tables split by rows with
  the header repeated, figures/images scaled to fit whole. Versioned footer
  per chapter.
- render_pptx_impl.py: same principle on 16:9 slides (continues on a
  "(cont.)" slide; tables repeat the header; figures exported to PNG and scaled).
- chapters/portada.py and chapters/overview.py: reference chapters. Cover
  (portada) with name, Automatic-EDA label, source, storage (inferred from
  source), European-format date, rows×cols, description, granularity and
  quality with criteria. Overview with df.head (honest placeholder if
  head_rows is missing), column dictionary (type/nulls/examples) and numeric
  describe.

Public registry functions (eda group, dict-no-throw):
- render_automatic_eda_pdf / render_automatic_eda_pptx: accept chapters or a
  TableProfile (building the chapters with build_document) and write the
  manifest. Additive: they do not replace render_eda_pdf.
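
The dict-no-throw convention mentioned above (always return a dict, report failures through a note field) might look roughly like this. The function write_report and its payload argument are hypothetical and serve only to show the pattern; the registry functions return more keys than this sketch does.

```python
import os


def write_report(out_path: str, payload: bytes) -> dict:
    """Write payload to out_path and report the outcome as a dict.

    Never raises: on failure, 'path' stays None and 'note' holds the reason.
    """
    result = {"path": None, "note": ""}
    try:
        parent = os.path.dirname(os.path.abspath(out_path))
        os.makedirs(parent, exist_ok=True)   # create parent dirs if needed
        with open(out_path, "wb") as fh:
            fh.write(payload)
        result["path"] = out_path
    except Exception as exc:
        # Convert any failure into data the caller can inspect.
        result["note"] = f"{type(exc).__name__}: {exc}"
    return result
```

Callers check result["path"] instead of wrapping the call in try/except, which keeps pipeline code uniform across renderers.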

Self-contained tests (no DuckDB) for both renderers: golden (cover +
overview), splitting of long tables with the header repeated, no cutting of
long cells or long markdown, a None/{} profile producing a valid 1-page/slide
output, and the error path for a non-writable directory. 23 green tests
(including the earlier render_eda_pdf tests, untouched).

New dependency python-pptx>=1.0.2 declared in python/pyproject.toml.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 14:30:31 +02:00

77 lines
3.2 KiB
Python

"""render_automatic_eda_pptx: chapter-based EDA report as a 16:9 PPTX deck.

Public ``eda``-group entry point that renders an AutomaticEDA document (a list
of chapters, or an ``eda`` TableProfile from which the canonical chapters are
built) into a PowerPoint deck for sharing. Same anti-cut principle as the PDF
renderer: every block is measured and, when it does not fit, continues on a new
slide titled ``<Chapter> (cont.)``; data tables split by rows repeating the
header; matplotlib figures are exported to PNG and inserted scaled to fit
entirely. Each slide is stamped ``<Chapter> · v<version>`` and a per-chapter
manifest (``automatic_eda_manifest.json``) is written next to the output.

dict-no-throw: never raises. Returns ``{path, n_slides, chapters,
manifest_path, note}``; on a fatal error ``path`` is None and ``note`` explains
why (e.g. python-pptx not installed).

Engine: ``python-pptx`` (added dependency; declared in python/pyproject.toml).
"""
from __future__ import annotations

import os

from datascience.automatic_eda import build_document, merge_manifest, render_pptx
from datascience.automatic_eda.model import as_chapter, as_chapters


def _coerce_chapters(chapters_or_profile, meta: dict) -> list:
    """Accept chapters OR an eda profile and return a list of Chapter."""
    arg = chapters_or_profile
    if isinstance(arg, (list, tuple)):
        return as_chapters(list(arg))
    if isinstance(arg, dict):
        if "blocks" in arg and "columns" not in arg:
            ch = as_chapter(arg)
            return [ch] if ch is not None else []
        return build_document(arg, (meta or {}).get("ctx"))
    return []


def render_automatic_eda_pptx(chapters_or_profile, out_path: str,
                              meta: dict = None) -> dict:
    """Render an AutomaticEDA document into a shareable PPTX deck.

    Args:
        chapters_or_profile: a list of chapters (``Chapter`` dataclasses or
            dicts) or an ``eda`` TableProfile dict (chapters built via
            ``build_document(profile, meta['ctx'])``).
        out_path: filesystem path for the PPTX (parent dirs are created).
        meta: optional dict. Recognised keys: ``title``, ``ctx``,
            ``manifest_path`` (defaults to ``automatic_eda_manifest.json`` beside
            ``out_path``), ``write_manifest`` (False to skip), ``generated_at``.

    Returns:
        dict (never raises): ``{path, n_slides, chapters, manifest_path, note}``.
    """
    meta = dict(meta or {})
    chapters = _coerce_chapters(chapters_or_profile, meta)
    result = render_pptx(chapters, out_path, meta)
    manifest_path = None
    if meta.get("write_manifest", True) and result.get("path"):
        manifest_path = meta.get("manifest_path")
        if not manifest_path:
            manifest_path = os.path.join(
                os.path.dirname(os.path.abspath(out_path)),
                "automatic_eda_manifest.json")
        generated_at = meta.get("generated_at") or _now_iso()
        merge_manifest(manifest_path, "pptx", result.get("chapters") or [],
                       generated_at)
    result["manifest_path"] = manifest_path
    return result


def _now_iso() -> str:
    from datetime import datetime, timezone
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
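
The manifest-path rule used by render_automatic_eda_pptx can be isolated into a small standalone sketch. The helper name resolve_manifest_path is hypothetical and does not exist in the module; it only mirrors the meta keys handled above (write_manifest, manifest_path) and omits the additional condition that the render itself must have succeeded.

```python
import os


def resolve_manifest_path(out_path, meta=None):
    """Return where the manifest would be written, or None if skipped.

    Mirrors the meta handling of the renderer entry point: an explicit
    'manifest_path' wins, otherwise the manifest sits beside out_path.
    """
    meta = dict(meta or {})
    if not meta.get("write_manifest", True):
        return None                      # caller opted out
    explicit = meta.get("manifest_path")
    if explicit:
        return explicit
    out_dir = os.path.dirname(os.path.abspath(out_path))
    return os.path.join(out_dir, "automatic_eda_manifest.json")


print(resolve_manifest_path("reports/deck.pptx"))
print(resolve_manifest_path("reports/deck.pptx", {"write_manifest": False}))
```

Placing the manifest beside the output by default means a PDF and a PPTX rendered into the same directory share one manifest file, which is consistent with the merge_manifest call taking a format label ("pptx") as an argument.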