feat(eda): AutomaticEDA core: chapter-based document + no-cut PDF/PPTX renderers

Introduces the intermediate layer between the content of an EDA and its
output format. A document is a list of versioned chapters; each chapter is
an ordered set of format-independent blocks (heading, markdown, kv_table,
data_table, figure, image, caption, note).

Core (support package python/functions/datascience/automatic_eda/):
- model.py: block dataclasses + Chapter, defensive normalizers (accept a
  dataclass or a dict, never raise), ENGINE_VERSION and the per-chapter
  manifest (automatic_eda_manifest.json).
- text_layout.py: measurement/wrapping on a shared character grid.
- chapters_registry.py: pre-declared CHAPTER_ORDER + build_document with
  convention-based chapter auto-discovery (chapters can be added in
  parallel without editing the registry).
- render_pdf_impl.py: A5 portrait, mobile-oriented paginator that MEASURES
  each block and never cuts: text by whole lines, long tables split by rows
  repeating the header, figures/images scaled to fit whole. Versioned footer
  per chapter.
- render_pptx_impl.py: same principle on 16:9 slides (continues on a
  "(cont.)" slide; tables repeat the header; figures exported as scaled PNGs).
- chapters/portada.py and chapters/overview.py: reference chapters. Portada
  (cover) with name, Automatic-EDA label, source, storage (inferred from
  source), European-format date, rows×cols, description, granularity and
  quality with its criteria. Overview with df.head (honest placeholder if
  head_rows is missing), column dictionary (type/nulls/examples) and numeric
  describe.

Public registry functions (eda group, dict-no-throw):
- render_automatic_eda_pdf / render_automatic_eda_pptx: accept chapters or a
  TableProfile (they build the chapters with build_document) and write the
  manifest. Additive: they do not replace render_eda_pdf.

Self-contained tests (no DuckDB) for both renderers: golden (portada +
overview), splitting of long tables repeating the header, no cutting of long
cells or long markdown, profile None/{} still valid with 1 page/slide, and
the error path for a non-writable directory. 23 tests passing (including the
earlier render_eda_pdf tests, untouched).

New dependency python-pptx>=1.0.2 declared in python/pyproject.toml.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
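The no-cut rule for long tables described in the commit message (split by rows, repeating the header on each continuation) can be sketched roughly as follows. This is an illustration only, not the code in render_pdf_impl.py: `split_table` and `rows_per_page` are hypothetical names, and the sketch assumes every row has the same height, whereas the paginator is described as measuring each block.

```python
from __future__ import annotations

from typing import Any


def split_table(header: list[Any], rows: list[list[Any]],
                rows_per_page: int) -> list[list[list[Any]]]:
    """Split rows into page-sized chunks, each one starting with the header.

    Hypothetical helper: assumes a fixed number of body rows fits per page.
    """
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    if not rows:
        # An empty table still shows its header once.
        return [[list(header)]]
    pages = []
    for start in range(0, len(rows), rows_per_page):
        chunk = rows[start:start + rows_per_page]
        # Whole rows only: a row is never divided between two pages.
        pages.append([list(header)] + [list(r) for r in chunk])
    return pages


pages = split_table(["col", "n"], [["a", 1], ["b", 2], ["c", 3]],
                    rows_per_page=2)
```

With three rows and room for two per page, `pages` holds two chunks and both begin with the header row.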
@@ -0,0 +1,57 @@
"""AutomaticEDA: chapter-based, versioned EDA document with PDF + PPTX output.

Public surface (support package for the registry functions
``render_automatic_eda_pdf`` and ``render_automatic_eda_pptx``):

- Document model: ``Heading``, ``Markdown``, ``KVTable``, ``DataTable``,
  ``Figure``, ``Image``, ``Caption``, ``Note``, ``Chapter``; normalizers
  ``as_blocks`` / ``as_chapters``; ``ENGINE_VERSION`` / ``ENGINE_NAME``.
- ``build_document(profile, ctx)``: assemble the ordered chapters of a profile.
- ``render_pdf(chapters, out_path, meta)`` / ``render_pptx(...)``: the two
  renderers (used by the public registry functions).
- ``merge_manifest(...)``: write/update the per-chapter version manifest.
"""

from __future__ import annotations

from .model import (  # noqa: F401
    ENGINE_NAME,
    ENGINE_VERSION,
    Caption,
    Chapter,
    DataTable,
    Figure,
    Heading,
    Image,
    KVTable,
    Markdown,
    Note,
    as_blocks,
    as_chapters,
    merge_manifest,
)
from .chapters_registry import CHAPTER_ORDER, build_chapter, build_document  # noqa: F401
from .render_pdf_impl import render_pdf  # noqa: F401
from .render_pptx_impl import render_pptx  # noqa: F401

__all__ = [
    "ENGINE_NAME",
    "ENGINE_VERSION",
    "Heading",
    "Markdown",
    "KVTable",
    "DataTable",
    "Figure",
    "Image",
    "Caption",
    "Note",
    "Chapter",
    "as_blocks",
    "as_chapters",
    "merge_manifest",
    "CHAPTER_ORDER",
    "build_chapter",
    "build_document",
    "render_pdf",
    "render_pptx",
]
@@ -0,0 +1,7 @@
"""AutomaticEDA chapters.

Each chapter is a module ``<id>.py`` exposing ``build_<id>(profile, ctx) ->
Chapter | None`` and a ``CHAPTER_VERSION`` constant. The canonical document
order lives in :mod:`automatic_eda.chapters_registry`. Implemented today:
``portada`` and ``overview`` (the reference chapters other agents copy).
"""
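A new chapter written against this contract might look roughly like the sketch below. It is hypothetical: the `calidad` id comes from the registry's declared order, but the content (null counts per column) and the 0.1.0 version are invented for illustration. It returns a plain dict instead of a `Chapter`, relying on the model's normalizers accepting dicts as the model docstring states.

```python
"""Hypothetical chapters/calidad.py, written against the chapter contract."""
from __future__ import annotations

CHAPTER_VERSION = "0.1.0"


def build_calidad(profile: dict, ctx: dict):
    """Return the chapter as a plain dict, or None when it does not apply."""
    columns = [c for c in ((profile or {}).get("columns") or [])
               if isinstance(c, dict)]
    if not columns:
        return None  # nothing to report, so the registry omits the chapter
    rows = [(c.get("name") or "(col)", str(c.get("null_count", "-")))
            for c in columns]
    return {
        "id": "calidad",
        "title": "Calidad",
        "version": CHAPTER_VERSION,
        "blocks": [
            # Blocks are tagged with a ``kind`` so they can be normalized.
            {"kind": "heading", "text": "Nulos por columna", "level": 2},
            {"kind": "kv_table", "rows": rows},
        ],
    }


chapter = build_calidad({"columns": [{"name": "age", "null_count": 3}]}, {})
```

Returning None for an empty profile mirrors the "does not apply" case in the contract, so the document simply omits the chapter.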
@@ -0,0 +1,176 @@
"""Overview chapter: df.head, column dictionary and describe (reference).

Second reference chapter for AutomaticEDA. Renders (across as many pages/slides
as needed, the renderers paginate):

1. ``df.head``: the first rows of the table. The current ``TableProfile`` does
   NOT carry the raw head, so this is read from ``ctx['head_rows']`` /
   ``profile['head_rows']`` (a list of row dicts). When absent the chapter shows
   an honest placeholder documenting the missing key instead of inventing data.
2. Column dictionary: name / type / nulls / non-null examples. Examples come
   from ``columns[i]['examples']`` when present; otherwise they are derived from
   real non-null profile values (categorical top values, numeric min/median/max)
   so the cell is never empty nor fabricated.
3. ``df.describe``: mean / median / min / max / std for every numeric column.

Contract: build_<id>(profile, ctx) -> Chapter | None ; CHAPTER_VERSION = "x.y.z".
"""

from __future__ import annotations

from .. import model

CHAPTER_VERSION = "1.0.0"
CHAPTER_ID = "overview"
CHAPTER_TITLE = "Overview"

# Profile/ctx keys the calculation phase must add for a full head + examples.
HEAD_KEY = "head_rows"  # list[dict], i.e. df.head(n) as row dicts
EXAMPLES_KEY = "examples"  # per column: list of non-null sample values


def _fmt_num(value, decimals: int = 3) -> str:
    """Format a number with '.' as thousands separator and trimmed decimals."""
    if value is None:
        return "—"
    if isinstance(value, bool):
        return str(value)
    if isinstance(value, int):
        return f"{value:,}".replace(",", ".")
    if isinstance(value, float):
        if value != value:  # NaN
            return "NaN"
        if value in (float("inf"), float("-inf")):
            return str(value)
        text = f"{value:.{decimals}f}".rstrip("0").rstrip(".")
        return text if text else "0"
    return str(value)


def _fmt_pct(value, decimals: int = 1) -> str:
    """Format a 0..1 fraction as a percentage string."""
    if value is None:
        return "—"
    try:
        return f"{float(value) * 100:.{decimals}f}%"
    except (TypeError, ValueError):
        return str(value)


def _examples_for(col: dict) -> str:
    """Build a short string of real non-null example values for a column."""
    explicit = col.get(EXAMPLES_KEY)
    if isinstance(explicit, (list, tuple)) and explicit:
        return ", ".join(model._safe_str(v) for v in explicit[:4])
    cat = col.get("categorical") or {}
    top = cat.get("top") or []
    if top:
        vals = [model._safe_str((t or {}).get("value")) for t in top[:4]
                if isinstance(t, dict)]
        vals = [v for v in vals if v]
        if vals:
            return ", ".join(vals)
    num = col.get("numeric") or {}
    if num:
        bits = []
        for key in ("min", "median", "max"):
            v = num.get(key)
            if v is not None:
                bits.append(_fmt_num(v))
        if bits:
            return ", ".join(bits)
    return "—"


def _head_block(profile: dict, ctx: dict):
    """Return a DataTable for df.head, or a Note documenting the missing key."""
    head = ctx.get(HEAD_KEY) or profile.get(HEAD_KEY)
    if isinstance(head, list) and head and isinstance(head[0], dict):
        # Column order from the profile; fall back to the first row's keys.
        cols = [c.get("name") for c in (profile.get("columns") or [])
                if isinstance(c, dict) and c.get("name")]
        if not cols:
            cols = list(head[0].keys())
        # Skip non-dict rows so one malformed row cannot drop the chapter.
        rows = [[model._safe_str(r.get(c)) for c in cols]
                for r in head[:10] if isinstance(r, dict)]
        return model.DataTable(header=cols, rows=rows,
                               note=f"primeras {len(rows)} filas")
    return model.Note(
        "df.head no disponible: el TableProfile no incluye 'head_rows'. La fase "
        "de cálculo debe añadir profile['head_rows'] (lista de dicts fila) o "
        "pasarlo en ctx['head_rows'] para mostrar las primeras filas.")


def _columns_block(profile: dict):
    """Return the column dictionary as a DataTable, or None without columns."""
    cols = profile.get("columns") or []
    header = ["Columna", "Tipo", "Nulos", "Ejemplos (no nulos)"]
    rows = []
    for c in cols:
        if not isinstance(c, dict):
            continue
        name = c.get("name") or "(col)"
        ctype = c.get("inferred_type") or c.get("physical_type") or "—"
        sem = c.get("semantic_type")
        if sem:
            ctype = f"{ctype} ({sem})"
        null_pct = c.get("null_pct")
        null_count = c.get("null_count")
        if null_pct is not None:
            nulls = _fmt_pct(null_pct)
            if null_count is not None:
                nulls += f" ({null_count})"
        elif null_count is not None:
            nulls = str(null_count)
        else:
            nulls = "—"
        rows.append([name, ctype, nulls, _examples_for(c)])
    if not rows:
        return None
    return model.DataTable(header=header, rows=rows, title="Columnas")


def _describe_block(profile: dict):
    """Return describe stats for numeric columns, or None if there are none."""
    cols = profile.get("columns") or []
    header = ["Columna", "mean", "median", "min", "max", "std"]
    rows = []
    for c in cols:
        if not isinstance(c, dict) or c.get("inferred_type") != "numeric":
            continue
        num = c.get("numeric") or {}
        if not num:
            continue
        rows.append([
            c.get("name") or "(col)",
            _fmt_num(num.get("mean")),
            _fmt_num(num.get("median")),
            _fmt_num(num.get("min")),
            _fmt_num(num.get("max")),
            _fmt_num(num.get("std")),
        ])
    if not rows:
        return None
    return model.DataTable(header=header, rows=rows, title="Estadística (describe)")


def build_overview(profile: dict, ctx: dict):
    """Build the Overview Chapter, or None if there are no columns and no head rows."""
    profile = profile or {}
    ctx = ctx or {}
    cols = profile.get("columns") or []
    if not cols and not (ctx.get(HEAD_KEY) or profile.get(HEAD_KEY)):
        return None

    blocks = [
        model.Heading(text="Primeras filas (df.head)", level=2),
        _head_block(profile, ctx),
    ]
    cols_block = _columns_block(profile)
    if cols_block is not None:
        blocks.append(model.Heading(
            text="Diccionario de columnas", level=2))
        blocks.append(cols_block)
    desc_block = _describe_block(profile)
    if desc_block is not None:
        blocks.append(model.Heading(
            text="Resumen estadístico numérico", level=2))
        blocks.append(desc_block)

    return model.Chapter(id=CHAPTER_ID, title=CHAPTER_TITLE,
                         version=CHAPTER_VERSION, blocks=blocks)
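The overview chapter expects `head_rows` as a list of row dicts supplied by the calculation phase. How that phase produces it is not part of this commit; one possible way, sketched here with only the standard library, is to read the first rows of a CSV. The helper name `head_rows_from_csv` and the sample data are assumptions made for illustration.

```python
from __future__ import annotations

import csv
import io
from itertools import islice


def head_rows_from_csv(text: str, n: int = 10) -> list[dict]:
    """Return the first ``n`` data rows of CSV text as a list of row dicts."""
    reader = csv.DictReader(io.StringIO(text))
    # islice stops after n rows, so the rest of the file is never parsed.
    return [dict(row) for row in islice(reader, n)]


sample = "id,city\n1,Madrid\n2,Sevilla\n3,Bilbao\n"
# The resulting dict is the shape the chapter reads from ctx['head_rows'].
ctx = {"head_rows": head_rows_from_csv(sample, n=2)}
```

Values stay as strings here because `csv` does no type inference; the chapter stringifies every cell anyway.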
@@ -0,0 +1,156 @@
"""Cover chapter (PORTADA): the reference chapter for AutomaticEDA.

Builds the document cover from a TableProfile plus an optional ``ctx`` of
presentation metadata. Reads everything defensively (``.get``) and degrades
honestly: a field that is neither in the profile nor in ``ctx`` is shown as a
placeholder rather than invented, leaving a hook for the LLM layer to fill it.

Contract for chapter authors (see ``docs/capabilities/automatic_eda.md``):
    build_<id>(profile: dict, ctx: dict) -> Chapter | None
    CHAPTER_VERSION = "x.y.z"
"""

from __future__ import annotations

from datetime import datetime, timezone

from .. import model

CHAPTER_VERSION = "1.0.0"
CHAPTER_ID = "portada"
CHAPTER_TITLE = "Portada"

# Default human description of what the table quality score measures. Callers
# can override it via ctx["quality_criteria"].
_DEFAULT_QUALITY_CRITERIA = (
    "media de los scores por columna (0–100): completitud (sin nulos/vacíos), "
    "validez (tipo y rango coherentes) y consistencia (sin duplicados/constantes)."
)


def _storage_from_source(source: str) -> str:
    """Infer the storage technology the dataset currently lives in.

    Heuristic on the profile ``source`` string (a path, DSN or backend name).
    Returns a human label; falls back to the raw source when unknown.
    """
    s = (source or "").strip().lower()
    if not s:
        return "—"
    if s.endswith((".csv", ".tsv")):
        return "CSV"
    if s.endswith((".parquet", ".pq")):
        return "Parquet"
    if s.endswith((".json", ".ndjson")):
        return "JSON"
    if s.endswith((".xlsx", ".xls")):
        return "Excel"
    # Checked before the generic ".db" rule so that a source such as
    # "sqlite:///app.db" is not labelled DuckDB.
    if "sqlite" in s:
        return "SQLite"
    if s.endswith((".duckdb", ".ddb", ".db")) or s == "duckdb":
        return "DuckDB"
    if s.startswith(("postgres://", "postgresql://")) or "postgres" in s:
        return "PostgreSQL"
    # A bare "project.dataset.table" identifier (two dots, no spaces) is
    # treated as BigQuery; the parentheses make the and/or grouping explicit.
    if "bigquery" in s or (s.count(".") == 2 and " " not in s):
        return "BigQuery"
    # Unknown: show the raw source so nothing is hidden.
    return source


def _fmt_int(v) -> str:
    """Format an integer with '.' as thousands separator."""
    if v is None:
        return "—"
    try:
        return f"{int(v):,}".replace(",", ".")
    except (TypeError, ValueError):
        return str(v)


def _fmt_date_eu(value) -> str:
    """Format a date/ISO string as European DD/MM/AAAA HH:mm (UI convention).

    Accepts a datetime, an ISO-8601 string (with or without microseconds/tz) or
    any other string. Non-parseable strings are returned verbatim so nothing is
    lost; None yields a placeholder.
    """
    if value is None:
        return "—"
    if isinstance(value, datetime):
        return value.strftime("%d/%m/%Y %H:%M")
    s = str(value).strip()
    if not s:
        return "—"
    try:
        dt = datetime.fromisoformat(s.replace("Z", "+00:00"))
        return dt.strftime("%d/%m/%Y %H:%M")
    except (TypeError, ValueError):
        # Try a couple of common forms before giving up.
        for fmt in ("%Y-%m-%d %H:%M:%S UTC", "%Y-%m-%d %H:%M UTC",
                    "%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
            try:
                return datetime.strptime(s, fmt).strftime("%d/%m/%Y %H:%M")
            except ValueError:
                continue
        return s


def build_portada(profile: dict, ctx: dict):
    """Build the cover Chapter; missing fields are shown as placeholders."""
    profile = profile or {}
    ctx = ctx or {}

    dataset_name = (ctx.get("dataset_name") or profile.get("table")
                    or "(dataset sin nombre)")
    source = profile.get("source") or ""
    # Where the dataset comes from (origin), distinct from where it is stored.
    source_origin = ctx.get("source_origin") or source or "—"
    storage = ctx.get("storage") or _storage_from_source(source)

    when = _fmt_date_eu(
        ctx.get("generated_at") or profile.get("profiled_at")
        or datetime.now(timezone.utc))

    n_rows = profile.get("n_rows")
    n_cols = profile.get("n_cols")
    shape = f"{_fmt_int(n_rows)} filas × {_fmt_int(n_cols)} columnas"

    score = profile.get("quality_score")
    quality_criteria = ctx.get("quality_criteria") or _DEFAULT_QUALITY_CRITERIA
    quality_value = "—" if score is None else f"{score} / 100"

    # Granularity: ctx wins; else derive from key candidates; else be honest.
    granularity = ctx.get("granularity")
    if not granularity:
        keys = profile.get("key_candidates") or []
        if keys:
            granularity = ("Cada fila parece identificada por "
                           + ", ".join(str(k) for k in keys[:3]) + ".")
        else:
            granularity = ("Cada fila es… (granularidad no determinada — "
                           "pendiente de la capa de cálculo/LLM).")

    description = ctx.get("description")
    if not description:
        description = ("Descripción no provista — pendiente de la capa LLM "
                       "(`run_llm`) o de `ctx['description']`.")

    blocks = [
        model.Heading(text=str(dataset_name), level=1),
        model.Markdown(text="**Automatic-EDA** · informe exploratorio automático"),
        model.KVTable(rows=[
            ("Fuente", source_origin),
            ("Almacenamiento", storage),
            ("Generado", when),
            ("Tamaño", shape),
            ("Calidad", quality_value),
            ("Criterios de calidad", quality_criteria),
        ]),
        model.Heading(text="Descripción", level=2),
        model.Markdown(text=str(description)),
        model.Heading(text="Granularidad", level=2),
        model.Markdown(text=str(granularity)),
    ]

    return model.Chapter(id=CHAPTER_ID, title=CHAPTER_TITLE,
                         version=CHAPTER_VERSION, blocks=blocks)
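The date handling on the cover can be tried in isolation with a sketch like the one below. It is a reduced version that covers only the ISO-8601 path; `to_european` is a hypothetical name and the sample timestamp is invented.

```python
from datetime import datetime


def to_european(value: str) -> str:
    """Format an ISO-8601 string as DD/MM/YYYY HH:MM, or return it unchanged."""
    try:
        # Python versions before 3.11 reject a trailing "Z", so it is mapped
        # to an explicit UTC offset first.
        parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    except ValueError:
        return value  # unparseable input is kept verbatim, nothing is lost
    return parsed.strftime("%d/%m/%Y %H:%M")


formatted = to_european("2024-03-05T14:07:30Z")   # "05/03/2024 14:07"
untouched = to_european("not a date")             # "not a date"
```

No timezone conversion takes place: the wall-clock time in the string is the one displayed.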
@@ -0,0 +1,89 @@
"""Chapter registry: the canonical order of an AutomaticEDA document.

``CHAPTER_ORDER`` declares every chapter the engine will *ever* place, in the
order they appear in the document. Each id maps by convention to a module
``automatic_eda/chapters/<id>.py`` exposing ``build_<id>(profile, ctx) ->
Chapter | None`` and a ``CHAPTER_VERSION`` constant.

This pre-declared order is what lets many agents add chapters in parallel
without contention: an agent only creates its own ``chapters/<id>.py`` module
and never edits this file. ``build_document`` imports each chapter lazily; a
chapter whose module does not exist yet (not implemented) is simply skipped, so
the document is always renderable with whatever chapters are present today.

``build_document`` never raises: a chapter that errors out is dropped
silently, and a chapter that returns ``None`` (does not apply to this dataset,
e.g. time series on a dataset with no date column) is omitted.
"""

from __future__ import annotations

import importlib

from . import model

# Canonical document order. Implemented today: portada, overview. The rest are
# placeholders other agents will fill by creating chapters/<id>.py; they will
# appear in this exact position automatically once their module exists.
CHAPTER_ORDER = [
    "portada",       # cover
    "overview",      # df.head + columns/types/nulls/examples + describe
    "num_distr",     # numeric distributions
    "cat_distr",     # categorical distributions
    "calidad",       # data quality
    "correlacion",   # correlations / associations
    "modelos",       # cheap models (PCA/KMeans/outliers)
    "analisis_llm",  # LLM interpretation
    "timeseries",    # time-series analysis
    "geospatial",    # geospatial
    "agregacion",    # aggregations / pivots
]


def build_chapter(chapter_id: str, profile: dict, ctx: dict):
    """Build a single chapter by id, or None if absent/not-applicable/error.

    Looks up ``automatic_eda.chapters.<chapter_id>`` and calls its
    ``build_<chapter_id>(profile, ctx)``. Returns a normalized Chapter, or None
    when the module is missing, the builder returns None, or anything raises.
    """
    mod_name = f"{__package__}.chapters.{chapter_id}"
    try:
        mod = importlib.import_module(mod_name)
    except Exception:  # noqa: BLE001 - chapter not implemented yet, skip it.
        return None
    builder = getattr(mod, f"build_{chapter_id}", None)
    if builder is None:
        return None
    try:
        result = builder(profile or {}, ctx or {})
    except Exception:  # noqa: BLE001 - a broken chapter never aborts the doc.
        return None
    return model.as_chapter(result)


def build_document(profile: dict | None, ctx: dict | None = None) -> list:
    """Build the full ordered list of chapters for a TableProfile.

    Args:
        profile: the ``eda`` group TableProfile dict (may be None/empty).
        ctx: optional context dict carrying presentation metadata not present in
            the profile (dataset_name, source_origin, storage, generated_at,
            description, granularity, quality_criteria, head_rows, ...).

    Returns:
        list[Chapter] in canonical order, containing only the chapters that are
        implemented and applicable. Never raises.
    """
    # Anything that is not a dict (including None) degrades to an empty dict.
    if not isinstance(profile, dict):
        profile = {}
    if not isinstance(ctx, dict):
        ctx = {}
    chapters = []
    for cid in CHAPTER_ORDER:
        ch = build_chapter(cid, profile, ctx)
        if ch is not None and ch.blocks:
            chapters.append(ch)
    return chapters
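The convention-based discovery described in the registry docstring can be illustrated with a self-contained sketch. Everything here is hypothetical scaffolding: the `_sketch_chapters` package name, the shortened `ORDER` list and the fake module registered in `sys.modules` stand in for real `chapters/<id>.py` files so that the example runs without touching disk.

```python
from __future__ import annotations

import importlib
import sys
import types

ORDER = ["portada", "overview", "timeseries"]  # hypothetical canonical order
PACKAGE = "_sketch_chapters"                   # hypothetical package name

# Register one fake chapter module; the other ids stay unimplemented.
_mod = types.ModuleType(f"{PACKAGE}.overview")
_mod.build_overview = lambda profile, ctx: {"id": "overview", "blocks": ["..."]}
sys.modules[_mod.__name__] = _mod


def build_one(chapter_id: str, profile: dict, ctx: dict):
    """Import <PACKAGE>.<id> and call build_<id>; return None on any failure."""
    try:
        module = importlib.import_module(f"{PACKAGE}.{chapter_id}")
    except ImportError:
        return None  # module not written yet, so the chapter is skipped
    builder = getattr(module, f"build_{chapter_id}", None)
    if builder is None:
        return None
    try:
        return builder(profile, ctx)
    except Exception:  # a broken chapter must not abort the document
        return None


document = []
for cid in ORDER:
    built = build_one(cid, {}, {})
    if built is not None:
        document.append(built)
```

Only `overview` ends up in `document`; adding a module for another id would make it appear at its declared position without any change to `ORDER`.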
@@ -0,0 +1,310 @@
|
||||
"""AutomaticEDA document model — format-independent blocks and chapters.
|
||||
|
||||
This is the intermediate layer between *content* (what an EDA chapter wants to
|
||||
say) and *output format* (PDF for mobile reading, PPTX for sharing). A document
|
||||
is an ordered list of :class:`Chapter`. A chapter is ``{id, title, version,
|
||||
blocks}``. A block is one of a small, closed set of presentation primitives
|
||||
(heading, markdown, key/value table, data table, figure, image, caption, note).
|
||||
|
||||
Neither renderer knows anything about the EDA profile: they only know how to lay
|
||||
out blocks so that **nothing is ever cut** — long text wraps to whole lines,
|
||||
long tables split by rows repeating the header, figures and images are scaled to
|
||||
fit entirely. Each chapter declares its own ``version`` so every page/slide can
|
||||
be stamped ``<Chapter> · v<version>`` and tracked in a manifest for continuous,
|
||||
per-chapter improvement.
|
||||
|
||||
Reading is defensive throughout (the ``eda`` group "dict-no-throw" style): the
|
||||
normalizers accept dataclass blocks *or* plain dicts, coerce anything unknown
|
||||
into a readable :class:`Note` instead of raising, and the renderers degrade a
|
||||
malformed block to text rather than crashing the whole document.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any, Callable, Optional
|
||||
|
||||
# Global engine version. Bump when the document model or a renderer changes in a
|
||||
# way that affects output. Individual chapters carry their own CHAPTER_VERSION.
|
||||
ENGINE_VERSION = "1.0.0"
|
||||
ENGINE_NAME = "AutomaticEDA"
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# Block primitives. Each carries a stable ``kind`` string so renderers can
|
||||
# dispatch by kind (works for dataclass instances and for plain dicts alike).
|
||||
# --------------------------------------------------------------------------- #
|
||||
@dataclass
|
||||
class Heading:
|
||||
"""A section heading. ``level`` 1 (largest) .. 3 (smallest)."""
|
||||
|
||||
text: str = ""
|
||||
level: int = 1
|
||||
kind: str = field(default="heading", init=False)
|
||||
|
||||
|
||||
@dataclass
|
||||
class Markdown:
|
||||
"""A block of light markdown text.
|
||||
|
||||
Supported subset (everything else is rendered verbatim, never dropped):
|
||||
``#``/``##``/``###`` headings, ``-``/``*`` bullet lists, ``| a | b |``
|
||||
tables (consecutive pipe lines become a data table), blank lines as
|
||||
paragraph breaks, and ``**bold**`` inline markers (markers are stripped, the
|
||||
text is kept). Text is wrapped to whole lines so it is never cut mid-line.
|
||||
"""
|
||||
|
||||
text: str = ""
|
||||
kind: str = field(default="markdown", init=False)
|
||||
|
||||
|
||||
@dataclass
|
||||
class KVTable:
|
||||
"""A two-column key/value table. ``rows`` is a list of ``(label, value)``."""
|
||||
|
||||
rows: list = field(default_factory=list)
|
||||
title: Optional[str] = None
|
||||
kind: str = field(default="kv_table", init=False)
|
||||
|
||||
|
||||
@dataclass
|
||||
class DataTable:
|
||||
"""A tabular block with a header row.
|
||||
|
||||
If it does not fit in the remaining page/slide space it is split by rows,
|
||||
**repeating the header** on each continuation. Long cell text wraps inside
|
||||
its column (the row grows taller) so no cell content is ever lost.
|
||||
"""
|
||||
|
||||
header: list = field(default_factory=list)
|
||||
rows: list = field(default_factory=list) # list[list[Any]]
|
||||
title: Optional[str] = None
|
||||
note: Optional[str] = None
|
||||
kind: str = field(default="data_table", init=False)
|
||||
|
||||
|
||||
@dataclass
|
||||
class Figure:
|
||||
"""A matplotlib figure, scaled to fit entirely (never cropped).
|
||||
|
||||
Provide either an already-built ``fig`` (a ``matplotlib.figure.Figure``) or
|
||||
a zero-arg ``make`` callable that returns one (lazy: only built when the
|
||||
renderer needs it). ``height_in`` is an optional hint for the target height
|
||||
on the page; renderers clamp it to the available space preserving aspect.
|
||||
"""
|
||||
|
||||
fig: Any = None
|
||||
make: Optional[Callable[[], Any]] = None
|
||||
caption: Optional[str] = None
|
||||
height_in: Optional[float] = None
|
||||
kind: str = field(default="figure", init=False)
|
||||
|
||||
|
||||
@dataclass
|
||||
class Image:
|
||||
"""A raster image (PNG/JPG) by path, scaled to fit entirely."""
|
||||
|
||||
path: str = ""
|
||||
caption: Optional[str] = None
|
||||
height_in: Optional[float] = None
|
||||
kind: str = field(default="image", init=False)
|
||||
|
||||
|
||||
@dataclass
|
||||
class Caption:
|
||||
"""Small auxiliary text rendered under a figure/table."""
|
||||
|
||||
text: str = ""
|
||||
kind: str = field(default="caption", init=False)
|
||||
|
||||
|
||||
@dataclass
|
||||
class Note:
|
||||
"""Small auxiliary note (italic). Also the fallback for unknown content."""
|
||||
|
||||
text: str = ""
|
||||
kind: str = field(default="note", init=False)
|
||||
|
||||
|
||||
@dataclass
|
||||
class Chapter:
|
||||
"""An ordered set of blocks with an id, a title and a generation version."""
|
||||
|
||||
id: str = ""
|
||||
title: str = ""
|
||||
version: str = "1.0.0"
|
||||
blocks: list = field(default_factory=list)
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# Defensive normalizers — accept dataclasses OR plain dicts, never raise.
|
||||
# --------------------------------------------------------------------------- #
|
||||
_BLOCK_BY_KIND = {
|
||||
"heading": Heading,
|
||||
"markdown": Markdown,
|
||||
"kv_table": KVTable,
|
||||
"data_table": DataTable,
|
||||
"figure": Figure,
|
||||
"image": Image,
|
||||
"caption": Caption,
|
||||
"note": Note,
|
||||
}
|
||||
|
||||
|
||||
def as_block(obj: Any):
|
||||
"""Coerce a value into a block dataclass. Unknown values become a Note."""
|
||||
if isinstance(obj, (Heading, Markdown, KVTable, DataTable, Figure, Image,
|
||||
Caption, Note)):
|
||||
return obj
|
||||
if isinstance(obj, dict):
|
||||
kind = obj.get("kind")
|
||||
cls = _BLOCK_BY_KIND.get(kind)
|
||||
if cls is None:
|
||||
return Note(text=_safe_str(obj))
|
||||
# Build only with fields the dataclass accepts (ignore extras).
|
||||
try:
|
||||
if cls is Heading:
|
||||
return Heading(text=_safe_str(obj.get("text")),
|
||||
level=int(obj.get("level", 1) or 1))
|
||||
if cls is Markdown:
|
||||
return Markdown(text=_safe_str(obj.get("text")))
|
||||
if cls is KVTable:
|
||||
return KVTable(rows=list(obj.get("rows") or []),
|
||||
title=obj.get("title"))
|
||||
if cls is DataTable:
|
||||
return DataTable(header=list(obj.get("header") or []),
|
||||
rows=list(obj.get("rows") or []),
|
||||
title=obj.get("title"), note=obj.get("note"))
|
||||
if cls is Figure:
|
||||
return Figure(fig=obj.get("fig"), make=obj.get("make"),
|
||||
caption=obj.get("caption"),
|
||||
height_in=obj.get("height_in"))
|
||||
if cls is Image:
|
||||
return Image(path=_safe_str(obj.get("path")),
|
||||
caption=obj.get("caption"),
|
||||
height_in=obj.get("height_in"))
|
||||
if cls is Caption:
|
||||
return Caption(text=_safe_str(obj.get("text")))
|
||||
if cls is Note:
|
||||
return Note(text=_safe_str(obj.get("text")))
|
||||
except Exception: # noqa: BLE001 — never raise on a malformed block.
|
||||
return Note(text=_safe_str(obj))
|
||||
return Note(text=_safe_str(obj))
|
||||
|
||||
|
||||
def as_blocks(seq: Any) -> list:
|
||||
"""Normalize an arbitrary sequence into a list of block dataclasses."""
|
||||
if seq is None:
|
||||
return []
|
||||
if not isinstance(seq, (list, tuple)):
|
||||
return [as_block(seq)]
|
||||
return [as_block(b) for b in seq]
|
||||
|
||||
|
||||
def as_chapter(obj: Any) -> Optional[Chapter]:
|
||||
"""Coerce a value into a Chapter (or None). Accepts a dict or a Chapter."""
|
||||
if obj is None:
|
||||
return None
|
||||
if isinstance(obj, Chapter):
|
||||
obj.blocks = as_blocks(obj.blocks)
|
||||
return obj
|
||||
if isinstance(obj, dict):
|
||||
return Chapter(
|
||||
id=_safe_str(obj.get("id")),
|
||||
title=_safe_str(obj.get("title")) or _safe_str(obj.get("id")),
|
||||
version=_safe_str(obj.get("version")) or "1.0.0",
|
||||
blocks=as_blocks(obj.get("blocks")),
|
||||
)
|
||||
return None
|
||||
|
||||
|
||||
def as_chapters(seq: Any) -> list:
|
||||
"""Normalize a sequence of chapters, dropping anything that can't coerce."""
|
||||
if seq is None:
|
||||
return []
|
||||
if isinstance(seq, Chapter):
|
||||
return [as_chapter(seq)]
|
||||
if not isinstance(seq, (list, tuple)):
|
||||
return []
|
||||
out = []
|
||||
for c in seq:
|
||||
ch = as_chapter(c)
|
||||
if ch is not None:
|
||||
out.append(ch)
|
||||
return out
|
||||
|
||||
|
||||
def _safe_str(v: Any) -> str:
|
||||
"""str() that never raises and maps None to ''."""
|
||||
if v is None:
|
||||
return ""
|
||||
try:
|
||||
return str(v)
|
||||
except Exception: # noqa: BLE001
|
||||
return ""
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# Manifest — per-chapter versions and page/slide counts for tracking.
|
||||
# --------------------------------------------------------------------------- #
|
||||
def merge_manifest(manifest_path: str, renderer: str, chapters_meta: list,
                   generated_at: str,
                   engine_version: str = ENGINE_VERSION) -> dict:
    """Read-modify-write the AutomaticEDA manifest, merging one renderer's run.

    The manifest lives next to the outputs as ``automatic_eda_manifest.json``
    and records, per chapter, its version plus the page count (PDF) and slide
    count (PPTX). Calling either renderer creates or updates it. Never raises:
    if the file cannot be written, the merged in-memory manifest is still
    returned.

    Args:
        manifest_path: path to the JSON manifest to create or update.
        renderer: "pdf" or "pptx"; selects which count key is written.
        chapters_meta: list of ``{"id", "version", "n_pages"|"n_slides"}``.
        generated_at: ISO-ish timestamp string for this run.
        engine_version: AutomaticEDA engine version.

    Returns:
        The merged manifest dict (also written to disk on success).
    """
    data: dict = {}
    try:
        if manifest_path and os.path.exists(manifest_path):
            with open(manifest_path, "r", encoding="utf-8") as fh:
                loaded = json.load(fh)
            if isinstance(loaded, dict):
                data = loaded
    except Exception:  # noqa: BLE001 (a corrupt manifest is overwritten)
        data = {}

    data["engine"] = ENGINE_NAME
    data["engine_version"] = engine_version
    data["generated_at"] = generated_at
    chapters = data.get("chapters")
    if not isinstance(chapters, dict):
        chapters = {}
    if not isinstance(chapters_meta, (list, tuple)):
        chapters_meta = []  # tolerate None or a non-sequence without raising.
    count_key = "n_slides" if renderer == "pptx" else "n_pages"
    for cm in chapters_meta:
        if not isinstance(cm, dict):
            continue
        # JSON object keys are strings: coercing here keeps an unhashable id
        # from raising and stops 1 and "1" from becoming two entries.
        cid = _safe_str(cm.get("id"))
        if not cid:
            continue
        entry = chapters.get(cid)
        if not isinstance(entry, dict):
            entry = {}
        entry["version"] = cm.get("version") or entry.get("version") or "1.0.0"
        entry[count_key] = cm.get(count_key, cm.get("n_pages", cm.get("n_slides")))
        chapters[cid] = entry
    data["chapters"] = chapters

    try:
        parent = os.path.dirname(os.path.abspath(manifest_path))
        os.makedirs(parent, exist_ok=True)
        with open(manifest_path, "w", encoding="utf-8") as fh:
            json.dump(data, fh, ensure_ascii=False, indent=2, default=str)
    except Exception:  # noqa: BLE001 (never raise from the manifest writer)
        pass
    return data
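
# Illustration only (not part of model.py): a minimal in-memory sketch of the
# merge semantics described in the merge_manifest docstring, assuming each
# renderer run contributes only its own count key. The helper name
# `merge_counts` is hypothetical and it skips all file I/O and the defensive
# checks of the real function; it only shows how a PDF run followed by a PPTX
# run ends up in a single per-chapter entry.

```python
def merge_counts(manifest: dict, renderer: str, chapters_meta: list) -> dict:
    """Hypothetical, I/O-free version of the per-chapter merge step."""
    chapters = dict(manifest.get("chapters") or {})
    count_key = "n_slides" if renderer == "pptx" else "n_pages"
    for cm in chapters_meta:
        entry = dict(chapters.get(cm["id"]) or {})
        entry["version"] = cm.get("version") or entry.get("version") or "1.0.0"
        entry[count_key] = cm.get(count_key)
        chapters[cm["id"]] = entry
    return {**manifest, "chapters": chapters}


manifest = merge_counts({}, "pdf",
                        [{"id": "portada", "version": "1.0.0", "n_pages": 1}])
manifest = merge_counts(manifest, "pptx",
                        [{"id": "portada", "version": "1.0.0", "n_slides": 2}])
print(manifest["chapters"]["portada"])
# {'version': '1.0.0', 'n_pages': 1, 'n_slides': 2}
```

# The second run adds n_slides without discarding the n_pages written by the
# first one, which is why the manifest can be updated by either renderer in
# any order.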
|
||||
@@ -0,0 +1,532 @@
|
||||
"""AutomaticEDA PDF renderer — A5 portrait, mobile-first, never cuts content.
|
||||
|
||||
A flow paginator: it measures each block (using the deterministic character grid
|
||||
from :mod:`text_layout`) and places it top-to-bottom on the current page. When a
|
||||
unit does not fit in the remaining space it moves whole to the next page —
|
||||
text by whole lines (never mid-line, never mid-word), data tables by rows
|
||||
**repeating the header**, figures/images scaled to fit entirely (never cropped).
|
||||
|
||||
Each chapter starts on a fresh page and every page is stamped in the footer with
|
||||
``<Chapter> · v<version>`` plus the engine version and a running page number, so
|
||||
output is versioned per chapter for continuous improvement.
|
||||
|
||||
dict-no-throw: a failure inside one block is caught and noted; the PDF is always
|
||||
produced and at least one page is guaranteed. Engine: matplotlib ``PdfPages``.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import io
|
||||
import os
|
||||
|
||||
import matplotlib
|
||||
|
||||
matplotlib.use("Agg")
|
||||
|
||||
import matplotlib.image as mpimg # noqa: E402
|
||||
import matplotlib.pyplot as plt # noqa: E402
|
||||
from matplotlib.backends.backend_pdf import PdfPages # noqa: E402
|
||||
from matplotlib.patches import Rectangle # noqa: E402
|
||||
|
||||
from . import model # noqa: E402
|
||||
from . import text_layout as tl # noqa: E402
|
||||
|
||||
# A5 portrait, inches.
|
||||
_W, _H = 5.83, 8.27
|
||||
_ML, _MR, _MT, _MB = 0.5, 0.42, 0.55, 0.5
|
||||
_FOOTER_H = 0.34
|
||||
_USABLE_W = _W - _ML - _MR
|
||||
_CONTENT_TOP = _MT
|
||||
_CONTENT_BOTTOM = _H - _MB - _FOOTER_H
|
||||
|
||||
# Palette / type (inherits the Tufte-ish mobile look of render_eda_pdf).
|
||||
_INK = "#1b1b1b"
|
||||
_ACCENT = "#2a6f97"
|
||||
_MUTED = "#8a8a8a"
|
||||
_RULE = "#cccccc"
|
||||
_HEAD_BG = "#eef3f6"
|
||||
|
||||
_RC = {
|
||||
"font.size": 10,
|
||||
"font.family": "sans-serif",
|
||||
"figure.facecolor": "white",
|
||||
"savefig.facecolor": "white",
|
||||
"pdf.fonttype": 42, # embed TrueType — text stays selectable on mobile.
|
||||
}
|
||||
|
||||
# Font sizes (pt) and derived line heights (in).
|
||||
_FS_H1, _FS_H2, _FS_H3 = 17, 13, 11
|
||||
_FS_BODY, _FS_CELL, _FS_NOTE = 10.5, 9.0, 9.0
|
||||
_GAP = 0.12 # vertical gap after a block, inches.
|
||||
_CELL_PAD = 0.06 # horizontal padding inside a table cell, inches.
|
||||
_ROW_VPAD = 0.05 # vertical padding inside a table row, inches.
|
||||
|
||||
|
||||
class _PdfState:
|
||||
"""Mutable layout cursor for the running PDF document."""
|
||||
|
||||
def __init__(self, pdf, title: str):
|
||||
self.pdf = pdf
|
||||
self.title = title
|
||||
self.fig = None
|
||||
self.y = _CONTENT_TOP # inches from the top of the page.
|
||||
self.page = 0 # global page counter.
|
||||
self.chapter = None # current Chapter (for the footer).
|
||||
self.chapter_pages = 0 # pages produced for the current chapter.
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# Coordinate helpers (inches-from-top → matplotlib figure fraction).
|
||||
# --------------------------------------------------------------------------- #
|
||||
def _yf(y_in: float) -> float:
|
||||
return 1.0 - (y_in / _H)
|
||||
|
||||
|
||||
def _xf(x_in: float) -> float:
|
||||
return x_in / _W
|
||||
|
||||
|
||||
def _new_page(st: _PdfState) -> None:
|
||||
"""Close the current page (if any) and open a fresh one with a footer."""
|
||||
_flush_page(st)
|
||||
st.fig = plt.figure(figsize=(_W, _H))
|
||||
st.y = _CONTENT_TOP
|
||||
st.page += 1
|
||||
st.chapter_pages += 1
|
||||
_draw_footer(st)
|
||||
|
||||
|
||||
def _flush_page(st: _PdfState) -> None:
|
||||
if st.fig is not None:
|
||||
st.pdf.savefig(st.fig)
|
||||
plt.close(st.fig)
|
||||
st.fig = None
|
||||
|
||||
|
||||
def _draw_footer(st: _PdfState) -> None:
|
||||
ch = st.chapter
|
||||
left = ""
|
||||
if ch is not None:
|
||||
left = f"{ch.title} · v{ch.version}"
|
||||
right = f"{model.ENGINE_NAME} v{model.ENGINE_VERSION} · p.{st.page}"
|
||||
yb = (_MB * 0.45) / _H
|
||||
st.fig.text(_xf(_ML), yb, left, fontsize=7.5, color=_MUTED,
|
||||
ha="left", va="center")
|
||||
st.fig.text(_xf(_W - _MR), yb, right, fontsize=7.5, color=_MUTED,
|
||||
ha="right", va="center")
|
||||
# A thin rule above the footer.
|
||||
st.fig.add_artist(Rectangle(
|
||||
(_xf(_ML), (_MB + _FOOTER_H * 0.5) / _H),
|
||||
_xf(_W - _MR) - _xf(_ML), 0.0008,
|
||||
transform=st.fig.transFigure, color=_RULE, lw=0.6))
|
||||
|
||||
|
||||
def _remaining(st: _PdfState) -> float:
|
||||
return _CONTENT_BOTTOM - st.y
|
||||
|
||||
|
||||
def _ensure_space(st: _PdfState, height: float) -> None:
|
||||
"""Open a new page if ``height`` does not fit in the remaining space."""
|
||||
if _remaining(st) < height:
|
||||
_new_page(st)
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# Block placers. Each advances st.y and paginates as needed.
|
||||
# --------------------------------------------------------------------------- #
|
||||
def _place_heading(st: _PdfState, block) -> None:
|
||||
level = max(1, min(3, int(getattr(block, "level", 1) or 1)))
|
||||
fs = {1: _FS_H1, 2: _FS_H2, 3: _FS_H3}[level]
|
||||
text = tl.strip_inline_md(getattr(block, "text", ""))
|
||||
max_chars = tl.chars_per_line(_USABLE_W, fs)
|
||||
lines = tl.wrap(text, max_chars)
|
||||
lh = tl.line_height_in(fs, leading=1.2)
|
||||
block_h = lh * len(lines) + 0.06
|
||||
# Keep at least the heading + a couple of body lines together when possible.
|
||||
_ensure_space(st, min(block_h + tl.line_height_in(_FS_BODY) * 2,
|
||||
_CONTENT_BOTTOM - _CONTENT_TOP))
|
||||
for ln in lines:
|
||||
_ensure_space(st, lh)
|
||||
st.fig.text(_xf(_ML), _yf(st.y), ln, fontsize=fs, fontweight="bold",
|
||||
color=_INK, ha="left", va="top")
|
||||
st.y += lh
|
||||
if level == 1:
|
||||
# Accent underline under a top-level heading.
|
||||
st.fig.add_artist(Rectangle(
|
||||
(_xf(_ML), _yf(st.y + 0.02)), _xf(_ML + 1.4) - _xf(_ML), 0.0016,
|
||||
transform=st.fig.transFigure, color=_ACCENT, lw=0))
|
||||
st.y += 0.10
|
||||
st.y += _GAP
|
||||
|
||||
|
||||
def _place_text_lines(st: _PdfState, lines: list, fs: float, color: str,
|
||||
style: str = "normal", indent: float = 0.0) -> None:
|
||||
lh = tl.line_height_in(fs)
|
||||
for ln in lines:
|
||||
_ensure_space(st, lh)
|
||||
st.fig.text(_xf(_ML + indent), _yf(st.y), ln, fontsize=fs, color=color,
|
||||
ha="left", va="top", style=style)
|
||||
st.y += lh
|
||||
|
||||
|
||||
def _place_markdown(st: _PdfState, block) -> None:
|
||||
raw = getattr(block, "text", "") or ""
|
||||
md_lines = str(raw).split("\n")
|
||||
i = 0
|
||||
n = len(md_lines)
|
||||
while i < n:
|
||||
line = md_lines[i]
|
||||
stripped = line.strip()
|
||||
# Consecutive pipe-table lines → a DataTable.
|
||||
if stripped.startswith("|") and stripped.endswith("|"):
|
||||
j = i
|
||||
tbl_lines = []
|
||||
while j < n and md_lines[j].strip().startswith("|") \
|
||||
and md_lines[j].strip().endswith("|"):
|
||||
tbl_lines.append(md_lines[j])
|
||||
j += 1
|
||||
parsed = tl.parse_md_table(tbl_lines)
|
||||
if parsed:
|
||||
header, rows = parsed
|
||||
_place_data_table(st, model.DataTable(header=header, rows=rows))
|
||||
i = j
|
||||
continue
|
||||
if stripped == "":
|
||||
st.y += tl.line_height_in(_FS_BODY) * 0.5
|
||||
i += 1
|
||||
continue
|
||||
if stripped.startswith("### "):
|
||||
_place_heading(st, model.Heading(stripped[4:], level=3))
|
||||
i += 1
|
||||
continue
|
||||
if stripped.startswith("## "):
|
||||
_place_heading(st, model.Heading(stripped[3:], level=2))
|
||||
i += 1
|
||||
continue
|
||||
if stripped.startswith("# "):
|
||||
_place_heading(st, model.Heading(stripped[2:], level=1))
|
||||
i += 1
|
||||
continue
|
||||
if stripped.startswith("- ") or stripped.startswith("* "):
|
||||
content = tl.strip_inline_md(stripped[2:])
|
||||
bullet_chars = tl.chars_per_line(_USABLE_W - 0.22, _FS_BODY)
|
||||
wrapped = tl.wrap(content, bullet_chars)
|
||||
first = True
|
||||
for w in wrapped:
|
||||
prefix = "• " if first else " "
|
||||
_place_text_lines(st, [prefix + w], _FS_BODY, _INK,
|
||||
indent=0.0)
|
||||
first = False
|
||||
i += 1
|
||||
continue
|
||||
# Plain paragraph (gather following plain lines into one paragraph).
|
||||
para = [tl.strip_inline_md(stripped)]
|
||||
j = i + 1
|
||||
while j < n:
|
||||
nxt = md_lines[j].strip()
|
||||
if nxt == "" or nxt.startswith(("|", "#", "- ", "* ")):
|
||||
break
|
||||
para.append(tl.strip_inline_md(nxt))
|
||||
j += 1
|
||||
text = " ".join(para)
|
||||
max_chars = tl.chars_per_line(_USABLE_W, _FS_BODY)
|
||||
_place_text_lines(st, tl.wrap(text, max_chars), _FS_BODY, _INK)
|
||||
i = j
|
||||
st.y += _GAP
|
||||
|
||||
|
||||
def _place_kv_table(st: _PdfState, block) -> None:
    title = getattr(block, "title", None)
    if title:
        _place_heading(st, model.Heading(title, level=2))
    rows = getattr(block, "rows", []) or []
    key_w = 1.9  # inches reserved for the label column.
    key_chars = tl.chars_per_line(key_w - 0.1, _FS_BODY)
    val_chars = tl.chars_per_line(_USABLE_W - key_w - 0.1, _FS_BODY)
    lh = tl.line_height_in(_FS_BODY)
    for row in rows:
        try:
            label, value = row[0], row[1]
        except Exception:  # noqa: BLE001
            label, value = model._safe_str(row), ""
        # Wrap the label too, so a long label cannot run into the value column.
        k_lines = tl.wrap(tl.strip_inline_md(model._safe_str(label)), key_chars)
        v_lines = tl.wrap(model._safe_str(value), val_chars)
        # Reserve at least one line so a row with an empty value keeps its height.
        row_h = lh * max(1, len(k_lines), len(v_lines)) + _ROW_VPAD
        _ensure_space(st, row_h)
        y0 = st.y
        for k, kl in enumerate(k_lines):
            st.fig.text(_xf(_ML), _yf(y0 + k * lh), kl,
                        fontsize=_FS_BODY, color=_MUTED, ha="left", va="top")
        for k, vl in enumerate(v_lines):
            st.fig.text(_xf(_ML + key_w), _yf(y0 + k * lh), vl,
                        fontsize=_FS_BODY, color=_INK, ha="left", va="top")
        st.y = y0 + row_h
    st.y += _GAP
|
||||
|
||||
|
||||
def _col_widths(header: list, rows: list, fs: float) -> list:
|
||||
"""Distribute usable width across columns proportional to content length."""
|
||||
ncol = len(header) if header else (len(rows[0]) if rows else 1)
|
||||
ncol = max(1, ncol)
|
||||
natural = [3] * ncol
|
||||
for c in range(ncol):
|
||||
if header and c < len(header):
|
||||
natural[c] = max(natural[c], len(model._safe_str(header[c])))
|
||||
for r in rows:
|
||||
if c < len(r):
|
||||
natural[c] = max(natural[c], len(model._safe_str(r[c])))
|
||||
# Clamp so one very long column does not starve the others.
|
||||
clamped = [min(max(w, 4), 40) for w in natural]
|
||||
total = float(sum(clamped)) or 1.0
|
||||
widths = [_USABLE_W * w / total for w in clamped]
|
||||
# Enforce a minimum readable column width.
|
||||
min_w = 0.45
|
||||
widths = [max(w, min_w) for w in widths]
|
||||
# Renormalize if the minimums pushed us over the usable width.
|
||||
s = sum(widths)
|
||||
if s > _USABLE_W:
|
||||
widths = [w * _USABLE_W / s for w in widths]
|
||||
return widths
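
# Illustration only (not part of the renderer): a worked example of the
# width distribution in _col_widths, assuming the usable width of this module
# (5.83 - 0.5 - 0.42 = 4.91 in). The function name `distribute_widths` and the
# sample lengths are hypothetical. It shows one consequence of the final
# renormalization: a column raised to the 0.45 in minimum can end up slightly
# below it again once all widths are scaled back to the usable width.

```python
def distribute_widths(natural, usable_w=4.91, min_w=0.45, lo=4, hi=40):
    """Clamp content lengths, share the width proportionally, then renormalize."""
    clamped = [min(max(n, lo), hi) for n in natural]
    total = float(sum(clamped)) or 1.0
    widths = [usable_w * c / total for c in clamped]
    widths = [max(w, min_w) for w in widths]
    s = sum(widths)
    if s > usable_w:
        widths = [w * usable_w / s for w in widths]
    return widths


# Longest cell per column, in characters: a short id, a name, a long free text.
widths = distribute_widths([3, 40, 100])
print([round(w, 3) for w in widths])
print(round(sum(widths), 3))
```

# The 100-character column is clamped to 40, so it gets the same share as the
# 40-character one; the widths always add up to the usable width.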
|
||||
|
||||
|
||||
def _wrap_row(cells: list, widths: list, fs: float) -> list:
|
||||
"""Wrap each cell to its column width → list of line-lists per cell."""
|
||||
out = []
|
||||
for c, w in enumerate(widths):
|
||||
text = model._safe_str(cells[c]) if c < len(cells) else ""
|
||||
max_chars = tl.chars_per_line(w - _CELL_PAD * 2, fs)
|
||||
out.append(tl.wrap(text, max_chars))
|
||||
return out
|
||||
|
||||
|
||||
def _draw_table_row(st: _PdfState, cells_lines: list, widths: list, fs: float,
|
||||
y0: float, header: bool) -> float:
|
||||
lh = tl.line_height_in(fs)
|
||||
nlines = max((len(c) for c in cells_lines), default=1)
|
||||
row_h = lh * nlines + _ROW_VPAD * 2
|
||||
if header:
|
||||
st.fig.add_artist(Rectangle(
|
||||
(_xf(_ML), _yf(y0 + row_h)), _xf(_ML + _USABLE_W) - _xf(_ML),
|
||||
_yf(y0) - _yf(y0 + row_h), transform=st.fig.transFigure,
|
||||
color=_HEAD_BG, lw=0, zorder=0))
|
||||
x = _ML
|
||||
for c, lines in enumerate(cells_lines):
|
||||
for k, ln in enumerate(lines):
|
||||
st.fig.text(_xf(x + _CELL_PAD), _yf(y0 + _ROW_VPAD + k * lh), ln,
|
||||
fontsize=fs, color=_INK,
|
||||
fontweight="bold" if header else "normal",
|
||||
ha="left", va="top", zorder=2)
|
||||
x += widths[c]
|
||||
# Bottom rule of the row.
|
||||
st.fig.add_artist(Rectangle(
|
||||
(_xf(_ML), _yf(y0 + row_h)), _xf(_ML + _USABLE_W) - _xf(_ML), 0.0006,
|
||||
transform=st.fig.transFigure, color=_RULE, lw=0, zorder=1))
|
||||
return row_h
|
||||
|
||||
|
||||
def _place_data_table(st: _PdfState, block) -> None:
|
||||
title = getattr(block, "title", None)
|
||||
if title:
|
||||
_place_heading(st, model.Heading(title, level=2))
|
||||
header = list(getattr(block, "header", []) or [])
|
||||
rows = list(getattr(block, "rows", []) or [])
|
||||
fs = _FS_CELL
|
||||
widths = _col_widths(header, rows, fs)
|
||||
header_lines = _wrap_row(header, widths, fs) if header else None
|
||||
lh = tl.line_height_in(fs)
|
||||
|
||||
def header_h() -> float:
|
||||
if not header_lines:
|
||||
return 0.0
|
||||
return lh * max((len(c) for c in header_lines), default=1) + _ROW_VPAD * 2
|
||||
|
||||
def draw_header() -> None:
|
||||
if header_lines:
|
||||
st.y += _draw_table_row(st, header_lines, widths, fs, st.y,
|
||||
header=True)
|
||||
|
||||
# Ensure header + first row fit, else start on a new page.
|
||||
first_row_h = 0.0
|
||||
if rows:
|
||||
first_lines = _wrap_row(rows[0], widths, fs)
|
||||
first_row_h = lh * max((len(c) for c in first_lines), default=1) \
|
||||
+ _ROW_VPAD * 2
|
||||
_ensure_space(st, header_h() + max(first_row_h, lh))
|
||||
draw_header()
|
||||
for r in rows:
|
||||
cells_lines = _wrap_row(r, widths, fs)
|
||||
row_h = lh * max((len(c) for c in cells_lines), default=1) \
|
||||
+ _ROW_VPAD * 2
|
||||
if _remaining(st) < row_h:
|
||||
_new_page(st)
|
||||
draw_header() # repeat header on the continuation page.
|
||||
st.y += _draw_table_row(st, cells_lines, widths, fs, st.y, header=False)
|
||||
note = getattr(block, "note", None)
|
||||
if note:
|
||||
_place_text_lines(st, tl.wrap(model._safe_str(note),
|
||||
tl.chars_per_line(_USABLE_W, _FS_NOTE)),
|
||||
_FS_NOTE, _MUTED, style="italic")
|
||||
st.y += _GAP
|
||||
|
||||
|
||||
def _resolve_figure(block):
|
||||
fig = getattr(block, "fig", None)
|
||||
if fig is not None:
|
||||
return fig, False
|
||||
make = getattr(block, "make", None)
|
||||
if callable(make):
|
||||
try:
|
||||
return make(), True
|
||||
except Exception: # noqa: BLE001
|
||||
return None, False
|
||||
return None, False
|
||||
|
||||
|
||||
def _png_from_figure(fig) -> bytes:
|
||||
buf = io.BytesIO()
|
||||
fig.savefig(buf, format="png", dpi=150, bbox_inches="tight")
|
||||
buf.seek(0)
|
||||
return buf.read()
|
||||
|
||||
|
||||
def _place_image_array(st: _PdfState, arr, caption) -> None:
    h_px, w_px = arr.shape[0], arr.shape[1]
    aspect = (h_px / w_px) if (h_px and w_px) else 1.0
    cap_lines = []
    if caption:
        cap_lines = tl.wrap(model._safe_str(caption),
                            tl.chars_per_line(_USABLE_W, _FS_NOTE))
    # Caption height: its wrapped lines plus the gap under the image and a
    # little slack against float rounding.
    cap_h = (tl.line_height_in(_FS_NOTE) * len(cap_lines) + 0.06
             if cap_lines else 0.0)
    # Reserve room for the caption so image and caption share one page.
    max_h = max(_CONTENT_BOTTOM - _CONTENT_TOP - cap_h, 1.0)
    target_w = _USABLE_W
    target_h = target_w * aspect
    if target_h > max_h:
        target_h = max_h
        target_w = target_h / aspect
    # Move the whole image to the next page if it does not fit in the remaining
    # space. Skip this at the top of a page: a new page would only add a blank
    # one, and the image is already clamped to the full content height.
    if _remaining(st) < target_h + cap_h and st.y > _CONTENT_TOP:
        _new_page(st)
    left_frac = _xf(_ML + (_USABLE_W - target_w) / 2.0)
    bottom_frac = _yf(st.y + target_h)
    ax = st.fig.add_axes([left_frac, bottom_frac, target_w / _W, target_h / _H])
    ax.imshow(arr)
    ax.axis("off")
    st.y += target_h + 0.04
    if cap_lines:
        _place_text_lines(st, cap_lines, _FS_NOTE, _MUTED, style="italic")
    st.y += _GAP
|
||||
|
||||
|
||||
def _place_figure(st: _PdfState, block) -> None:
|
||||
fig, owned = _resolve_figure(block)
|
||||
if fig is None:
|
||||
_place_text_lines(st, ["(figura no disponible)"], _FS_NOTE, _MUTED,
|
||||
style="italic")
|
||||
st.y += _GAP
|
||||
return
|
||||
try:
|
||||
png = _png_from_figure(fig)
|
||||
finally:
|
||||
if owned:
|
||||
try:
|
||||
plt.close(fig)
|
||||
except Exception: # noqa: BLE001
|
||||
pass
|
||||
arr = mpimg.imread(io.BytesIO(png))
|
||||
_place_image_array(st, arr, getattr(block, "caption", None))
|
||||
|
||||
|
||||
def _place_image(st: _PdfState, block) -> None:
|
||||
path = getattr(block, "path", "")
|
||||
if not path or not os.path.exists(path):
|
||||
_place_text_lines(st, [f"(imagen no encontrada: {path})"], _FS_NOTE,
|
||||
_MUTED, style="italic")
|
||||
st.y += _GAP
|
||||
return
|
||||
arr = mpimg.imread(path)
|
||||
_place_image_array(st, arr, getattr(block, "caption", None))
|
||||
|
||||
|
||||
def _place_caption(st: _PdfState, block) -> None:
    # _safe_str guards against a missing or None ``text`` attribute.
    text = model._safe_str(getattr(block, "text", ""))
    _place_text_lines(st, tl.wrap(text, tl.chars_per_line(_USABLE_W, _FS_NOTE)),
                      _FS_NOTE, _MUTED, style="italic")
    st.y += _GAP


def _place_note(st: _PdfState, block) -> None:
    # Also the fallback placer for unknown block kinds, so ``text`` may be absent.
    text = model._safe_str(getattr(block, "text", ""))
    _place_text_lines(st, tl.wrap(text, tl.chars_per_line(_USABLE_W, _FS_NOTE)),
                      _FS_NOTE, _MUTED, style="italic")
    st.y += _GAP
|
||||
|
||||
|
||||
_PLACERS = {
|
||||
"heading": _place_heading,
|
||||
"markdown": _place_markdown,
|
||||
"kv_table": _place_kv_table,
|
||||
"data_table": _place_data_table,
|
||||
"figure": _place_figure,
|
||||
"image": _place_image,
|
||||
"caption": _place_caption,
|
||||
"note": _place_note,
|
||||
}
|
||||
|
||||
|
||||
def render_pdf(chapters: list, out_path: str, meta: dict = None) -> dict:
|
||||
"""Render a list of Chapters into an A5-portrait, mobile-readable PDF.
|
||||
|
||||
Never raises. Returns ``{path, n_pages, chapters, note}`` where ``chapters``
|
||||
is a list of ``{id, version, n_pages}`` for the manifest. On a fatal write
|
||||
error ``path`` is None and ``note`` explains why.
|
||||
"""
|
||||
meta = meta or {}
|
||||
chapters = model.as_chapters(chapters)
|
||||
notes = []
|
||||
|
||||
try:
|
||||
parent = os.path.dirname(os.path.abspath(out_path))
|
||||
os.makedirs(parent, exist_ok=True)
|
||||
except OSError as e:
|
||||
return {"path": None, "n_pages": 0, "chapters": [],
|
||||
"note": f"no se pudo crear el directorio destino: {e}"}
|
||||
|
||||
title = meta.get("title") or model.ENGINE_NAME
|
||||
chapters_meta = []
|
||||
try:
|
||||
with plt.rc_context(_RC):
|
||||
with PdfPages(out_path) as pdf:
|
||||
st = _PdfState(pdf, title)
|
||||
for ch in chapters:
|
||||
st.chapter = ch
|
||||
st.chapter_pages = 0
|
||||
_new_page(st) # each chapter starts on a fresh page.
|
||||
for block in ch.blocks:
|
||||
placer = _PLACERS.get(getattr(block, "kind", ""),
|
||||
_place_note)
|
||||
try:
|
||||
placer(st, block)
|
||||
except Exception as e: # noqa: BLE001
|
||||
notes.append(
|
||||
f"bloque '{getattr(block, 'kind', '?')}' del "
|
||||
f"capítulo '{ch.id}' omitido: {e}")
|
||||
chapters_meta.append({"id": ch.id, "version": ch.version,
|
||||
"n_pages": st.chapter_pages})
|
||||
_flush_page(st)
|
||||
if st.page == 0:
|
||||
# No chapters at all → guarantee one valid page.
|
||||
st.chapter = model.Chapter(id="vacio", title=title,
|
||||
version=model.ENGINE_VERSION)
|
||||
_new_page(st)
|
||||
_place_note(st, model.Note(
|
||||
"(documento vacío — sin capítulos aplicables)"))
|
||||
_flush_page(st)
|
||||
n_pages = st.page
|
||||
except Exception as e: # noqa: BLE001
|
||||
return {"path": None, "n_pages": 0, "chapters": [],
|
||||
"note": f"fallo al escribir el PDF: {e}"}
|
||||
|
||||
note = f"{n_pages} páginas"
|
||||
if notes:
|
||||
note += " · " + "; ".join(notes)
|
||||
return {"path": out_path, "n_pages": n_pages, "chapters": chapters_meta,
|
||||
"note": note}
|
||||
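
# Illustration only (not part of the renderer): a minimal sketch of the
# row-wise table pagination used by _place_data_table, assuming the table
# starts at the top of a page and every row is shorter than a page. The name
# `paginate_rows` and the sample heights are hypothetical; heights are in
# inches. Each page reserves the header height first, which models the header
# being redrawn on every continuation page.

```python
def paginate_rows(row_heights, header_h, page_h):
    """Return the row indices placed on each page; rows are never split."""
    pages = [[]]
    y = header_h  # the header is drawn first on every page.
    for i, rh in enumerate(row_heights):
        if page_h - y < rh and pages[-1]:
            pages.append([])   # row does not fit: move it whole to a new page
            y = header_h       # ...and repeat the header there.
        pages[-1].append(i)
        y += rh
    return pages


print(paginate_rows([1.0, 1.0, 1.0, 1.0, 1.0], header_h=0.5, page_h=3.0))
# [[0, 1], [2, 3], [4]]
```

# With a 0.5 in header on a 3.0 in page only two 1.0 in rows fit, so five rows
# take three pages and no row is ever cut across a page break.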
@@ -0,0 +1,518 @@
|
||||
"""AutomaticEDA PPTX renderer — 16:9 slides, never cuts content.
|
||||
|
||||
Same flow principle as the PDF renderer but onto PowerPoint slides: measure each
|
||||
block and place it top-to-bottom; when it does not fit in the remaining slide
|
||||
space, continue on a new slide titled ``<Chapter> (cont.)``. Data tables split by
|
||||
rows **repeating the header**; figures/images are scaled to fit entirely. Every
|
||||
slide carries a footer ``<Chapter> · v<version>`` plus the engine version.
|
||||
|
||||
dict-no-throw: a failure inside one block is caught and noted; the deck is always
|
||||
produced with at least one slide. Engine: ``python-pptx`` (added dependency).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import io
|
||||
import os
|
||||
|
||||
from . import model
|
||||
from . import text_layout as tl
|
||||
|
||||
try:
|
||||
from pptx import Presentation
|
||||
from pptx.util import Inches, Pt, Emu
|
||||
from pptx.dml.color import RGBColor
|
||||
from pptx.enum.text import PP_ALIGN
|
||||
_PPTX_OK = True
|
||||
_PPTX_ERR = ""
|
||||
except Exception as _e: # noqa: BLE001 — surfaced as a dict-no-throw note.
|
||||
_PPTX_OK = False
|
||||
_PPTX_ERR = str(_e)
|
||||
|
||||
# 16:9 widescreen, inches.
|
||||
_W, _H = 13.333, 7.5
|
||||
_ML, _MR = 0.7, 0.7
|
||||
_TITLE_TOP, _TITLE_H = 0.28, 0.7
|
||||
_CONTENT_TOP = 1.12
|
||||
_FOOTER_H = 0.4
|
||||
_CONTENT_BOTTOM = _H - _FOOTER_H - 0.15
|
||||
_USABLE_W = _W - _ML - _MR
|
||||
|
||||
_INK = (0x1B, 0x1B, 0x1B)
|
||||
_ACCENT = (0x2A, 0x6F, 0x97)
|
||||
_MUTED = (0x8A, 0x8A, 0x8A)
|
||||
_HEAD_BG = (0xEE, 0xF3, 0xF6)
|
||||
_WHITE = (0xFF, 0xFF, 0xFF)
|
||||
|
||||
_FS_TITLE = 26
|
||||
_FS_H1, _FS_H2, _FS_H3 = 20, 16, 13
|
||||
_FS_BODY, _FS_CELL, _FS_NOTE = 14, 11, 11
|
||||
_GAP = 0.12
|
||||
|
||||
|
||||
class _PptxState:
|
||||
def __init__(self, prs, title: str):
|
||||
self.prs = prs
|
||||
self.title = title
|
||||
self.slide = None
|
||||
self.y = _CONTENT_TOP
|
||||
self.chapter = None
|
||||
self.slide_no = 0
|
||||
self.chapter_slides = 0
|
||||
|
||||
|
||||
def _rgb(c):
|
||||
return RGBColor(*c)
|
||||
|
||||
|
||||
def _new_slide(st: _PptxState, cont: bool = False) -> None:
|
||||
blank = st.prs.slide_layouts[6]
|
||||
st.slide = st.prs.slides.add_slide(blank)
|
||||
st.y = _CONTENT_TOP
|
||||
st.slide_no += 1
|
||||
st.chapter_slides += 1
|
||||
_draw_title(st, cont)
|
||||
_draw_footer(st)
|
||||
|
||||
|
||||
def _draw_title(st: _PptxState, cont: bool) -> None:
|
||||
ch = st.chapter
|
||||
title = ch.title if ch is not None else st.title
|
||||
if cont:
|
||||
title = f"{title} (cont.)"
|
||||
box = st.slide.shapes.add_textbox(
|
||||
Inches(_ML), Inches(_TITLE_TOP), Inches(_USABLE_W), Inches(_TITLE_H))
|
||||
tf = box.text_frame
|
||||
tf.word_wrap = True
|
||||
p = tf.paragraphs[0]
|
||||
run = p.add_run()
|
||||
run.text = title
|
||||
run.font.size = Pt(_FS_TITLE)
|
||||
run.font.bold = True
|
||||
run.font.color.rgb = _rgb(_INK)
|
||||
|
||||
|
||||
def _draw_footer(st: _PptxState) -> None:
|
||||
ch = st.chapter
|
||||
left = f"{ch.title} · v{ch.version}" if ch is not None else ""
|
||||
right = f"{model.ENGINE_NAME} v{model.ENGINE_VERSION} · {st.slide_no}"
|
||||
box = st.slide.shapes.add_textbox(
|
||||
Inches(_ML), Inches(_H - _FOOTER_H), Inches(_USABLE_W),
|
||||
Inches(_FOOTER_H * 0.7))
|
||||
tf = box.text_frame
|
||||
tf.word_wrap = False
|
||||
p = tf.paragraphs[0]
|
||||
r = p.add_run()
|
||||
r.text = left
|
||||
r.font.size = Pt(9)
|
||||
r.font.color.rgb = _rgb(_MUTED)
|
||||
# Right-aligned engine stamp on a second textbox.
|
||||
box2 = st.slide.shapes.add_textbox(
|
||||
Inches(_ML), Inches(_H - _FOOTER_H), Inches(_USABLE_W),
|
||||
Inches(_FOOTER_H * 0.7))
|
||||
tf2 = box2.text_frame
|
||||
p2 = tf2.paragraphs[0]
|
||||
p2.alignment = PP_ALIGN.RIGHT
|
||||
r2 = p2.add_run()
|
||||
r2.text = right
|
||||
r2.font.size = Pt(9)
|
||||
r2.font.color.rgb = _rgb(_MUTED)
|
||||
|
||||
|
||||
def _remaining(st: _PptxState) -> float:
|
||||
return _CONTENT_BOTTOM - st.y
|
||||
|
||||
|
||||
def _ensure(st: _PptxState, height: float) -> None:
|
||||
if _remaining(st) < height:
|
||||
_new_slide(st, cont=True)
|
||||
|
||||
|
||||
def _add_text(st: _PptxState, lines: list, fs: float, color, bold=False,
              italic=False, indent=0.0, bullet=False) -> None:
    lh = tl.line_height_in(fs)
    height = lh * len(lines) + 0.05
    _ensure(st, height)
    box = st.slide.shapes.add_textbox(
        Inches(_ML + indent), Inches(st.y), Inches(_USABLE_W - indent),
        Inches(height))
    tf = box.text_frame
    tf.word_wrap = True
    for k, ln in enumerate(lines):
        p = tf.paragraphs[0] if k == 0 else tf.add_paragraph()
        run = p.add_run()
        if bullet:
            # Only the first wrapped line of a bullet item carries the marker;
            # continuation lines are indented instead of getting a new bullet.
            ln = ("• " if k == 0 else "  ") + ln
        run.text = ln
        run.font.size = Pt(fs)
        run.font.bold = bold
        run.font.italic = italic
        run.font.color.rgb = _rgb(color)
    st.y += height
|
||||
|
||||
|
||||
def _place_heading(st: _PptxState, block) -> None:
|
||||
level = max(1, min(3, int(getattr(block, "level", 1) or 1)))
|
||||
fs = {1: _FS_H1, 2: _FS_H2, 3: _FS_H3}[level]
|
||||
text = tl.strip_inline_md(getattr(block, "text", ""))
|
||||
lines = tl.wrap(text, tl.chars_per_line(_USABLE_W, fs))
|
||||
_add_text(st, lines, fs, _INK, bold=True)
|
||||
st.y += 0.04
|
||||
|
||||
|
||||
def _place_markdown(st: _PptxState, block) -> None:
|
||||
raw = str(getattr(block, "text", "") or "")
|
||||
md_lines = raw.split("\n")
|
||||
i, n = 0, len(md_lines)
|
||||
while i < n:
|
||||
stripped = md_lines[i].strip()
|
||||
if stripped.startswith("|") and stripped.endswith("|"):
|
||||
j = i
|
||||
tbl = []
|
||||
while j < n and md_lines[j].strip().startswith("|") \
|
||||
and md_lines[j].strip().endswith("|"):
|
||||
tbl.append(md_lines[j])
|
||||
j += 1
|
||||
parsed = tl.parse_md_table(tbl)
|
||||
if parsed:
|
||||
header, rows = parsed
|
||||
_place_data_table(st, model.DataTable(header=header, rows=rows))
|
||||
i = j
|
||||
continue
|
||||
if stripped == "":
|
||||
st.y += tl.line_height_in(_FS_BODY) * 0.4
|
||||
i += 1
|
||||
continue
|
||||
if stripped.startswith("### "):
|
||||
_place_heading(st, model.Heading(stripped[4:], level=3))
|
||||
i += 1
|
||||
continue
|
||||
if stripped.startswith("## "):
|
||||
_place_heading(st, model.Heading(stripped[3:], level=2))
|
||||
i += 1
|
||||
continue
|
||||
if stripped.startswith("# "):
|
||||
_place_heading(st, model.Heading(stripped[2:], level=1))
|
||||
i += 1
|
||||
continue
|
||||
if stripped.startswith("- ") or stripped.startswith("* "):
|
||||
content = tl.strip_inline_md(stripped[2:])
|
||||
lines = tl.wrap(content, tl.chars_per_line(_USABLE_W - 0.3, _FS_BODY))
|
||||
_add_text(st, lines, _FS_BODY, _INK, bullet=True)
|
||||
i += 1
|
||||
continue
|
||||
para = [tl.strip_inline_md(stripped)]
|
||||
j = i + 1
|
||||
while j < n:
|
||||
nxt = md_lines[j].strip()
|
||||
if nxt == "" or nxt.startswith(("|", "#", "- ", "* ")):
|
||||
break
|
||||
para.append(tl.strip_inline_md(nxt))
|
||||
j += 1
|
||||
text = " ".join(para)
|
||||
_add_text(st, tl.wrap(text, tl.chars_per_line(_USABLE_W, _FS_BODY)),
|
||||
_FS_BODY, _INK)
|
||||
i = j
|
||||
st.y += _GAP
|
||||
|
||||
|
||||
def _place_kv_table(st: _PptxState, block) -> None:
|
||||
title = getattr(block, "title", None)
|
||||
if title:
|
||||
_place_heading(st, model.Heading(title, level=2))
|
||||
rows = getattr(block, "rows", []) or []
|
||||
data_rows = []
|
||||
for row in rows:
|
||||
try:
|
||||
label, value = row[0], row[1]
|
||||
except Exception: # noqa: BLE001
|
||||
label, value = str(row), ""
|
||||
data_rows.append([model._safe_str(label), model._safe_str(value)])
|
||||
_place_data_table(st, model.DataTable(header=["Campo", "Valor"],
|
||||
rows=data_rows), shaded_header=True,
|
||||
key_value=True)
|
||||
|
||||
|
||||
def _col_widths(header, rows):
|
||||
ncol = len(header) if header else (len(rows[0]) if rows else 1)
|
||||
ncol = max(1, ncol)
|
||||
natural = [3] * ncol
|
||||
for c in range(ncol):
|
||||
if header and c < len(header):
|
||||
natural[c] = max(natural[c], len(model._safe_str(header[c])))
|
||||
for r in rows:
|
||||
if c < len(r):
|
||||
natural[c] = max(natural[c], len(model._safe_str(r[c])))
|
||||
clamped = [min(max(w, 4), 44) for w in natural]
|
||||
total = float(sum(clamped)) or 1.0
|
||||
return [_USABLE_W * w / total for w in clamped]
|
||||
|
||||
|
||||
def _row_height_in(cells, widths, fs) -> float:
|
||||
lh = tl.line_height_in(fs)
|
||||
maxlines = 1
|
||||
for c, w in enumerate(widths):
|
||||
text = model._safe_str(cells[c]) if c < len(cells) else ""
|
||||
lines = tl.wrap(text, tl.chars_per_line(w - 0.12, fs))
|
||||
maxlines = max(maxlines, len(lines))
|
||||
return lh * maxlines + 0.10
|
||||
|
||||
|
||||
def _emit_table(st: _PptxState, header, chunk, widths, fs) -> None:
    nrows = len(chunk) + (1 if header else 0)
    ncol = len(widths)
    if nrows == 0:
        return  # nothing to draw: no header and no rows.
    # Pre-measure each row so the shape and its rows match the heights the
    # paginator budgeted for (pptx still auto-grows rows).
    heights = []
    if header:
        heights.append(_row_height_in(header, widths, fs))
    for r in chunk:
        heights.append(_row_height_in(r, widths, fs))
    total_h = sum(heights)
    gtable = st.slide.shapes.add_table(
        nrows, ncol, Inches(_ML), Inches(st.y), Inches(_USABLE_W),
        Inches(total_h)).table
    gtable.first_row = bool(header)
    gtable.horz_banding = False
    for c in range(ncol):
        gtable.columns[c].width = Inches(widths[c])
    # add_table splits the height evenly; apply the measured height per row.
    for i, h in enumerate(heights):
        gtable.rows[i].height = Inches(h)
    ridx = 0
    if header:
        for c in range(ncol):
            cell = gtable.cell(0, c)
            cell.text = model._safe_str(header[c]) if c < len(header) else ""
            _style_cell(cell, fs, _INK, bold=True, fill=_HEAD_BG)
        ridx = 1
    for r in chunk:
        for c in range(ncol):
            cell = gtable.cell(ridx, c)
            cell.text = model._safe_str(r[c]) if c < len(r) else ""
            _style_cell(cell, fs, _INK, bold=False, fill=_WHITE)
        ridx += 1
    st.y += total_h + _GAP
|
||||
|
||||
|
||||
def _style_cell(cell, fs, color, bold, fill) -> None:
|
||||
cell.fill.solid()
|
||||
cell.fill.fore_color.rgb = _rgb(fill)
|
||||
cell.margin_left = Inches(0.05)
|
||||
cell.margin_right = Inches(0.05)
|
||||
cell.margin_top = Inches(0.02)
|
||||
cell.margin_bottom = Inches(0.02)
|
||||
for p in cell.text_frame.paragraphs:
|
||||
for run in p.runs:
|
||||
run.font.size = Pt(fs)
|
||||
run.font.bold = bold
|
||||
run.font.color.rgb = _rgb(color)
|
||||
|
||||
|
||||
def _place_data_table(st: _PptxState, block, shaded_header=True,
                      key_value=False) -> None:
    title = getattr(block, "title", None)
    if title:
        _place_heading(st, model.Heading(title, level=2))
    header = list(getattr(block, "header", []) or [])
    rows = list(getattr(block, "rows", []) or [])
    fs = _FS_CELL
    widths = _col_widths(header, rows)
    header_h = _row_height_in(header, widths, fs) if header else 0.0

    idx = 0
    n = len(rows)
    if n == 0:
        # Header-only table still rendered (one slide).
        _ensure(st, header_h + 0.2)
        _emit_table(st, header, [], widths, fs)
        return
    while idx < n:
        # Greedily fill the current slide with as many rows as fit.
        if _remaining(st) < header_h + _row_height_in(rows[idx], widths, fs):
            _new_slide(st, cont=True)
        avail = _remaining(st) - header_h
        chunk = []
        used = 0.0
        while idx < n:
            rh = _row_height_in(rows[idx], widths, fs)
            # A row taller than the free space is still emitted alone.
            if used + rh > avail and chunk:
                break
            chunk.append(rows[idx])
            used += rh
            idx += 1
        _emit_table(st, header, chunk, widths, fs)
    note = getattr(block, "note", None)
    if note:
        _add_text(st, tl.wrap(model._safe_str(note),
                              tl.chars_per_line(_USABLE_W, _FS_NOTE)),
                  _FS_NOTE, _MUTED, italic=True)


def _img_size_px(data: bytes):
    try:
        from PIL import Image
        with Image.open(io.BytesIO(data)) as im:
            return im.size  # (w, h)
    except Exception:  # noqa: BLE001
        return (1200, 800)


def _resolve_png(block):
    fig = getattr(block, "fig", None)
    make = getattr(block, "make", None)
    f = fig
    owned = False
    if f is None and callable(make):
        try:
            f = make()
            owned = True
        except Exception:  # noqa: BLE001
            f = None
    if f is None:
        return None
    try:
        import matplotlib.pyplot as plt
        buf = io.BytesIO()
        f.savefig(buf, format="png", dpi=150, bbox_inches="tight")
        buf.seek(0)
        return buf.read()
    except Exception:  # noqa: BLE001
        return None
    finally:
        # Only close figures created here; caller-supplied ones stay open.
        if owned:
            try:
                import matplotlib.pyplot as plt
                plt.close(f)
            except Exception:  # noqa: BLE001
                pass


def _place_picture_bytes(st: _PptxState, data: bytes, caption) -> None:
    w_px, h_px = _img_size_px(data)
    aspect = (h_px / w_px) if w_px else 0.66
    max_h = _CONTENT_BOTTOM - _CONTENT_TOP
    target_w = _USABLE_W
    target_h = target_w * aspect
    # Scale down, keeping the aspect ratio, so the picture fits one slide.
    if target_h > max_h:
        target_h = max_h
        target_w = target_h / aspect if aspect else _USABLE_W
    cap_h = tl.line_height_in(_FS_NOTE) + 0.05 if caption else 0.0
    if _remaining(st) < target_h + cap_h:
        _new_slide(st, cont=True)
    left = _ML + (_USABLE_W - target_w) / 2.0
    st.slide.shapes.add_picture(io.BytesIO(data), Inches(left), Inches(st.y),
                                width=Inches(target_w), height=Inches(target_h))
    st.y += target_h + 0.05
    if caption:
        _add_text(st, tl.wrap(model._safe_str(caption),
                              tl.chars_per_line(_USABLE_W, _FS_NOTE)),
                  _FS_NOTE, _MUTED, italic=True)
    st.y += _GAP


def _place_figure(st: _PptxState, block) -> None:
    png = _resolve_png(block)
    if png is None:
        _add_text(st, ["(figura no disponible)"], _FS_NOTE, _MUTED, italic=True)
        st.y += _GAP
        return
    _place_picture_bytes(st, png, getattr(block, "caption", None))


def _place_image(st: _PptxState, block) -> None:
    path = getattr(block, "path", "")
    if not path or not os.path.exists(path):
        _add_text(st, [f"(imagen no encontrada: {path})"], _FS_NOTE, _MUTED,
                  italic=True)
        st.y += _GAP
        return
    try:
        with open(path, "rb") as fh:
            data = fh.read()
    except Exception as e:  # noqa: BLE001
        _add_text(st, [f"(no se pudo leer la imagen: {e})"], _FS_NOTE, _MUTED,
                  italic=True)
        st.y += _GAP
        return
    _place_picture_bytes(st, data, getattr(block, "caption", None))


def _place_caption(st: _PptxState, block) -> None:
    _add_text(st, tl.wrap(getattr(block, "text", ""),
                          tl.chars_per_line(_USABLE_W, _FS_NOTE)),
              _FS_NOTE, _MUTED, italic=True)
    st.y += _GAP


def _place_note(st: _PptxState, block) -> None:
    _place_caption(st, block)


_PLACERS = {
    "heading": _place_heading,
    "markdown": _place_markdown,
    "kv_table": _place_kv_table,
    "data_table": _place_data_table,
    "figure": _place_figure,
    "image": _place_image,
    "caption": _place_caption,
    "note": _place_note,
}


def render_pptx(chapters: list, out_path: str, meta: dict = None) -> dict:
    """Render a list of Chapters into a 16:9 PPTX deck. Never raises.

    Returns ``{path, n_slides, chapters, note}`` where ``chapters`` is a list of
    ``{id, version, n_slides}`` for the manifest. On a fatal error ``path`` is
    None and ``note`` explains why (e.g. python-pptx not installed).
    """
    meta = meta or {}
    if not _PPTX_OK:
        return {"path": None, "n_slides": 0, "chapters": [],
                "note": f"python-pptx no disponible: {_PPTX_ERR}"}

    chapters = model.as_chapters(chapters)
    notes = []
    try:
        parent = os.path.dirname(os.path.abspath(out_path))
        os.makedirs(parent, exist_ok=True)
    except OSError as e:
        return {"path": None, "n_slides": 0, "chapters": [],
                "note": f"no se pudo crear el directorio destino: {e}"}

    title = meta.get("title") or model.ENGINE_NAME
    chapters_meta = []
    try:
        prs = Presentation()
        prs.slide_width = Inches(_W)
        prs.slide_height = Inches(_H)
        st = _PptxState(prs, title)
        for ch in chapters:
            st.chapter = ch
            st.chapter_slides = 0
            _new_slide(st, cont=False)
            for block in ch.blocks:
                # Unknown block kinds fall back to the note placer.
                placer = _PLACERS.get(getattr(block, "kind", ""), _place_note)
                try:
                    placer(st, block)
                except Exception as e:  # noqa: BLE001
                    # A failing block is skipped and reported, not fatal.
                    notes.append(
                        f"bloque '{getattr(block, 'kind', '?')}' del capítulo "
                        f"'{ch.id}' omitido: {e}")
            chapters_meta.append({"id": ch.id, "version": ch.version,
                                  "n_slides": st.chapter_slides})
        if st.slide_no == 0:
            st.chapter = model.Chapter(id="vacio", title=title,
                                       version=model.ENGINE_VERSION)
            _new_slide(st, cont=False)
            _place_note(st, model.Note(
                "(documento vacío — sin capítulos aplicables)"))
        prs.save(out_path)
        n_slides = st.slide_no
    except Exception as e:  # noqa: BLE001
        return {"path": None, "n_slides": 0, "chapters": [],
                "note": f"fallo al escribir el PPTX: {e}"}

    note = f"{n_slides} slides"
    if notes:
        note += " · " + "; ".join(notes)
    return {"path": out_path, "n_slides": n_slides, "chapters": chapters_meta,
            "note": note}
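
Reviewer sketch (not part of the commit): the greedy row partitioning that `_place_data_table` performs can be reasoned about in isolation. The function below is a minimal, standalone approximation under stated assumptions. `chunk_rows`, `first_avail` and `page_avail` are hypothetical names, row heights are passed in pre-measured, and the `_GAP` spacing added after each emitted table is ignored.

```python
from typing import List


def chunk_rows(row_heights: List[float], first_avail: float,
               page_avail: float, header_h: float = 0.0) -> List[List[int]]:
    """Greedily group row indices into per-slide chunks (hypothetical helper).

    Every chunk reserves ``header_h`` for the repeated header. A row taller
    than the free space is still emitted alone rather than dropped.
    """
    chunks: List[List[int]] = []
    idx, n = 0, len(row_heights)
    remaining = first_avail
    while idx < n:
        # Header plus the next row does not fit: continue on a fresh slide.
        if remaining < header_h + row_heights[idx]:
            remaining = page_avail
        avail = remaining - header_h
        chunk: List[int] = []
        used = 0.0
        while idx < n:
            rh = row_heights[idx]
            if used + rh > avail and chunk:
                break
            chunk.append(idx)
            used += rh
            idx += 1
        chunks.append(chunk)
        remaining -= header_h + used
    return chunks


# Five 1-inch rows, 2.5 in left on the first slide, 3.5 in on later slides.
print(chunk_rows([1.0] * 5, first_avail=2.5, page_avail=3.5, header_h=0.5))
# [[0, 1], [2, 3, 4]]
```

The `and chunk` guard is the important design choice: it guarantees progress even when one row exceeds a whole slide, at the cost of letting that single row overflow.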
@@ -0,0 +1,107 @@
"""Shared text-measurement helpers for the AutomaticEDA renderers.

Both renderers flow content top-to-bottom and must know, *before* placing a
block, how much vertical space it will take. That is what guarantees nothing is
cut: a unit either fits in the remaining space or moves to the next page/slide
whole. Measuring proportional text exactly in matplotlib/pptx is impractical, so
we use a deterministic character-grid estimate (chars-per-line from an average
glyph width). For typical text it slightly over-estimates, which is the safe
direction: the intent is never to claim something fits when it would overflow.

Wrapping is word-aware (``textwrap``) and additionally hard-splits any single
token longer than the line so a 200-character value still wraps instead of
overflowing. That is wrapping, not loss: every character is still rendered.
"""

from __future__ import annotations

import textwrap


def avg_char_width_in(fontsize_pt: float) -> float:
    """Approximate average glyph width in inches for a sans-serif font.

    ~0.5 of the point size is a conservative mean advance width for proportional
    sans fonts; dividing by 72 converts points to inches.
    """
    return 0.5 * fontsize_pt / 72.0


def line_height_in(fontsize_pt: float, leading: float = 1.32) -> float:
    """Line height in inches for a given font size and leading."""
    return leading * fontsize_pt / 72.0


def chars_per_line(width_in: float, fontsize_pt: float) -> int:
    """How many average glyphs fit in ``width_in`` at ``fontsize_pt``."""
    cw = avg_char_width_in(fontsize_pt)
    if cw <= 0:
        return 80
    n = int(width_in / cw)
    return max(1, n)


def wrap(text: str, max_chars: int) -> list:
    """Word-wrap ``text`` to lines of at most ``max_chars``, never losing chars.

    Long tokens (no spaces) are hard-split so they cannot overflow. Existing
    newlines are honored as hard breaks. Empty input yields a single empty line
    so callers can still reserve a row.
    """
    if max_chars < 1:
        max_chars = 1
    s = "" if text is None else str(text)
    out: list = []
    for raw_line in s.split("\n"):
        if raw_line == "":
            out.append("")
            continue
        # textwrap with break_long_words so no token overflows the column.
        wrapped = textwrap.wrap(
            raw_line, width=max_chars, break_long_words=True,
            break_on_hyphens=False, replace_whitespace=True,
            drop_whitespace=True,
        )
        if not wrapped:
            out.append("")
        else:
            out.extend(wrapped)
    return out or [""]


def strip_inline_md(text: str) -> str:
    """Strip a tiny subset of inline markdown markers, keeping the text.

    Removes the ``**bold**`` / ``__bold__`` / `` `code` `` markers so the
    content is preserved without trying to style spans (which the line-grid
    layout cannot do). Single-asterisk ``*em*`` markers are left untouched.
    Nothing is dropped except the markers themselves.
    """
    if not text:
        return ""
    s = str(text)
    for marker in ("**", "__", "`"):
        s = s.replace(marker, "")
    return s


def parse_md_table(lines: list):
    """Parse consecutive ``| a | b |`` lines into ``(header, rows)`` or None.

    Accepts an optional separator row (``|---|---|``) right after the header,
    which is ignored. Returns None if the lines are not a pipe table.
    """
    cells_rows = []
    for ln in lines:
        s = ln.strip()
        if not (s.startswith("|") and s.endswith("|")):
            return None
        parts = [c.strip() for c in s.strip("|").split("|")]
        cells_rows.append(parts)
    if not cells_rows:
        return None
    header = cells_rows[0]
    body = cells_rows[1:]
    # Drop a markdown separator row (all cells are dashes/colons).
    if body and all(set(c) <= set("-: ") and "-" in c for c in body[0]):
        body = body[1:]
    return header, body
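
Reviewer sketch (not part of the commit): a renderer could combine the helpers above to measure a text block before placing it, roughly as follows. To keep the example standalone, the two helper formulas are re-declared inline instead of imported from `text_layout`, and `measure_text` is a hypothetical name introduced only for illustration.

```python
import textwrap


def chars_per_line(width_in: float, fontsize_pt: float) -> int:
    """Glyphs per line, assuming an average advance of 0.5 em."""
    cw = 0.5 * fontsize_pt / 72.0
    return max(1, int(width_in / cw)) if cw > 0 else 80


def line_height_in(fontsize_pt: float, leading: float = 1.32) -> float:
    """Line height in inches for a given font size and leading."""
    return leading * fontsize_pt / 72.0


def measure_text(text: str, width_in: float, fontsize_pt: float):
    """Hypothetical helper: return (lines, height_in) for a text block."""
    max_chars = chars_per_line(width_in, fontsize_pt)
    lines = []
    for raw in str(text).split("\n"):
        # Long tokens are hard-split, so no character is ever dropped.
        lines.extend(textwrap.wrap(raw, width=max_chars,
                                   break_long_words=True,
                                   break_on_hyphens=False) or [""])
    return lines, len(lines) * line_height_in(fontsize_pt)


# A 70-character token in a 2-inch column at 9 pt (32 glyphs per line).
lines, height = measure_text("x" * 70, width_in=2.0, fontsize_pt=9)
print([len(ln) for ln in lines])  # [32, 32, 6]
```

The returned height (three lines at 9 pt with 1.32 leading, about 0.495 in) is what a placer would compare against the space remaining on the page or slide before deciding to break.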