feat(eda): global render quality (220 DPI, wide tables as images, side_by_side layout, clickable index)

Cross-cutting improvements to the AutomaticEDA engine (PDF + PPTX) on top of the block model:

1. Global high DPI: every embedded figure/image is rasterized at 220 dpi (previously 150;
   in PDF the page was also saved at ~100 dpi, re-rasterizing the imshow images). In PDF,
   savefig.dpi=220 is applied to the page; text stays vector and selectable. This lets
   readers zoom in on a phone without pixelation. Measured embedded image: ~1081 px
   (previously ~492 px).

2. Wide table → high-resolution image: when a DataTable has too many columns to be
   legible as text (criterion: _table_fits_as_text), it is drawn in full as a sharp image
   (new function render_table_as_figure_py_datascience: shaded header + zebra rows), scaled
   to fit entirely, so the reader can zoom in and read it without losing data. Tables that
   do fit stay as selectable text / native tables. Applies to both PDF and PPTX. The
   19-column df.head of the synthetic dataset is no longer cut off: it is rendered as an image.

3. Group.layout: new backward-compatible hint (default "stack"). "side_by_side" places the
   table on the left (~55%) and the figure on the right (~45%) of the same PPTX slide
   (falling back to stacked if there is no table+figure pair or they do not fit); in PDF it
   is treated as "stack" (the mobile A5 width does not allow two columns). Intended so the
   cat_distr chapter can place the chart next to the table in PPT.

4. Cover page with a clickable index: the chapter list changes from "Este informe incluye..."
   ("This report includes...", markdown) to an "Índice" (index) Heading plus one TocEntry per
   chapter. The renderer records where each chapter starts and wires each entry as a real jump
   (PDF: PyMuPDF GOTO link; PPTX: native jump to slide), reusing the clickable-glossary mechanism.
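Item 2 hinges on `_table_fits_as_text`, whose body is not part of this diff. The hypothetical heuristic below only illustrates the kind of check involved: estimate each column's width from its longest (truncated) cell at the body font size, add padding, and compare the total with the usable width of the page or slide. The function name `table_fits_as_text`, its parameters, and the 0.55 em average glyph width are assumptions, not the commit's actual criterion.

```python
from typing import Sequence


def table_fits_as_text(
    header: Sequence[object],
    rows: Sequence[Sequence[object]],
    available_width_pt: float,
    font_size_pt: float = 9.0,
    char_width_em: float = 0.55,   # assumed average glyph width, in em
    cell_padding_pt: float = 6.0,  # assumed left/right padding per cell
    max_cell_chars: int = 40,      # mirrors the truncation limit used in the diff
) -> bool:
    """Hypothetical stand-in for _table_fits_as_text.

    Returns True if the table's estimated text width fits in available_width_pt.
    """
    n_cols = max([len(header)] + [len(r) for r in rows])
    if n_cols == 0:
        return True  # nothing to lay out
    char_w = font_size_pt * char_width_em
    total = 0.0
    for c in range(n_cols):
        # Gather the header cell plus every body cell of column c (ragged rows allowed).
        column = ([header[c]] if c < len(header) else []) + [
            r[c] for r in rows if c < len(r)
        ]
        longest = max((min(len(str(v)), max_cell_chars) for v in column), default=0)
        total += longest * char_w + 2 * cell_padding_pt
    return total <= available_width_pt
```

With `available_width_pt=360` standing in for an A5 text column (A5 is about 420 pt wide before margins), a 19-column `df.head` comes out far too wide and would be routed to the image path, while a three-column table stays as text.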

Model: Group gains `layout`; new TocEntry block; normalizers and __init__ updated.
Contract: documented in docs/automatic_eda_contract.md §11.4 (including the exact contract
of the layout field for the cat_distr agent).
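A minimal sketch of the model change, assuming dataclass-based blocks. Only the `layout` field, its "stack" default, and the TocEntry block name come from this commit; the `TableBlock`/`FigureBlock` stand-ins, every other field name, and the helpers `resolve_layout`, `split_columns` and `toc_targets` are hypothetical. `resolve_layout` encodes the fallback rules of item 3; `toc_targets` illustrates item 4 by looking up each entry's recorded chapter start (skipping chapters that were never rendered is an assumption).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TableBlock:  # hypothetical stand-in for DataTable
    header: List[str]
    rows: List[List[object]]


@dataclass
class FigureBlock:  # hypothetical stand-in for an embedded figure
    image_path: str


@dataclass
class TocEntry:
    title: str
    chapter_id: str  # hypothetical key linking the entry to its chapter


@dataclass
class Group:
    blocks: List[object] = field(default_factory=list)
    layout: str = "stack"  # "stack" (default, backward compatible) or "side_by_side"


def resolve_layout(group: Group, target: str, pair_fits: bool = True) -> str:
    """Effective layout for a render target ("pdf" or "pptx")."""
    if group.layout != "side_by_side" or target == "pdf":
        return "stack"  # PDF (mobile A5) is always single-column
    has_table = any(isinstance(b, TableBlock) for b in group.blocks)
    has_figure = any(isinstance(b, FigureBlock) for b in group.blocks)
    if has_table and has_figure and pair_fits:
        return "side_by_side"
    return "stack"  # no table+figure pair, or the pair does not fit


def split_columns(content_width: float, table_share: float = 0.55) -> Tuple[float, float]:
    """Split a slide's content width: table on the left (~55%), figure on the right."""
    left = content_width * table_share
    return left, content_width - left


def toc_targets(
    entries: List[TocEntry], chapter_starts: Dict[str, int]
) -> List[Tuple[str, int]]:
    """Pair each TOC entry with the page/slide index where its chapter starts."""
    return [
        (e.title, chapter_starts[e.chapter_id])
        for e in entries
        if e.chapter_id in chapter_starts
    ]
```

Keeping "stack" as the default means groups serialized before this change resolve exactly as they did, which is what the backward-compatibility golden test guards.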

Tests: new render_quality_test.py (13 golden tests: real high DPI, wide table → image in
PDF/PPTX, narrow → text, side_by_side two columns in PPTX / stacked in PDF, clickable index
in PDF+PPTX, backward compatibility of the default layout). render_features_test updated
for the new index.
Suite: 188 passed (module) + 38 passed/1 skipped (acceptance + pipeline).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 01:34:21 +02:00
parent f2eb782a5f
commit a74a5a047f
11 changed files with 1272 additions and 43 deletions
@@ -0,0 +1,241 @@
"""Impure EDA helper: a crisp table rendered as a matplotlib Figure (`eda` group).
Draws a tabular block (header + rows) as a sharp ``matplotlib.figure.Figure``
ready to be rasterized at high DPI, so a table that does NOT fit as text on a
page/slide can still be read in full by zooming into the rasterized image on a
phone. The header is shaded and bold, even rows carry a soft zebra stripe, the
ink is dark on white and the grid is very thin.
Impure because it touches matplotlib's rendering machinery. It uses the headless
Agg backend and the object-oriented ``Figure`` API (no ``pyplot``) so it leaks no
global state and is safe to call repeatedly from a report renderer. It is fully
defensive and NEVER raises: empty/invalid input or any internal error returns a
small placeholder figure carrying a centered "(tabla no disponible)".
"""
import matplotlib

matplotlib.use("Agg")

from matplotlib.figure import Figure  # noqa: E402

# Palette shared with the EDA report renderer so the document stays coherent.
_HEADER_BG = "#eef3f6"  # header cell background.
_HEADER_TEXT = "#1b1b1b"  # header cell text (bold).
_ZEBRA_BG = "#f6f8fa"  # even (1-based) row background stripe.
_BODY_BG = "#ffffff"  # odd row background.
_INK = "#1b1b1b"  # body text + title ink.
_GRID = "#cccccc"  # cell borders / grid (thin).
_NOTE_TEXT = "#8a8a8a"  # muted gray for the note (italic).


def _placeholder_figure(message: str = "(tabla no disponible)") -> "Figure":
    """Return a small fallback ``Figure`` carrying a single centered message."""
    fig = Figure(figsize=(6.0, 1.6), dpi=150)
    ax = fig.add_subplot(111)
    ax.axis("off")
    ax.text(
        0.5,
        0.5,
        message,
        ha="center",
        va="center",
        fontsize=11,
        color=_NOTE_TEXT,
        style="italic",
        wrap=True,
        transform=ax.transAxes,
    )
    fig.tight_layout()
    return fig


def _cell_text(value, max_cell_chars: int) -> str:
    """``str()`` a cell value defensively, None -> "", truncate with an ellipsis."""
    s = "" if value is None else str(value)
    # Collapse newlines/tabs so a single cell never spills across table rows.
    s = s.replace("\n", " ").replace("\r", " ").replace("\t", " ")
    try:
        limit = int(max_cell_chars)
    except (TypeError, ValueError):
        limit = 40
    if limit <= 0:
        return ""
    if len(s) <= limit:
        return s
    if limit == 1:
        return ""
    # Keep limit - 1 chars and spend the last one on the ellipsis.
    return s[: limit - 1] + "…"


def render_table_as_figure(
    header,
    rows,
    title=None,
    note=None,
    fontsize=9.0,
    max_cell_chars=40,
):
    """Draw a crisp table as a matplotlib.figure.Figure, ready to be rasterized at high DPI.

    Intended for tables that do NOT fit as text on a page/slide: the figure is
    rasterized at high resolution so the user can zoom in on a phone and read
    the whole table without losing data. Shaded bold header, soft zebra stripe
    on even rows, dark ink on white, very thin grid.

    Args:
        header: list of column names (may be []).
        rows: list of rows; each row is a list of cells (any values, passed through str()).
        title: optional title drawn above the table (or None).
        note: optional gray/italic note drawn below the table (or None).
        fontsize: base font size (pt) of the cells.
        max_cell_chars: truncate cell text to this many chars (with a trailing …) so the width does not blow up.

    Returns:
        matplotlib.figure.Figure, NOT closed (the caller rasterizes and closes it).
        Never raises: on any error it returns a Figure with the text "(tabla no disponible)".
    """
    try:
        # --- Defensive normalization of header/rows into a rectangular grid.
        header_list = list(header) if isinstance(header, (list, tuple)) else []
        raw_rows = list(rows) if isinstance(rows, (list, tuple)) else []
        clean_rows = []
        for row in raw_rows:
            if isinstance(row, (list, tuple)):
                clean_rows.append(list(row))
            elif row is None:
                clean_rows.append([])
            else:
                # A scalar row becomes a single-cell row instead of being dropped.
                clean_rows.append([row])

        # Nothing to draw at all -> placeholder.
        if not header_list and not clean_rows:
            return _placeholder_figure()

        # Number of columns = widest of header / any row.
        n_cols = len(header_list)
        for row in clean_rows:
            if len(row) > n_cols:
                n_cols = len(row)
        if n_cols <= 0:
            return _placeholder_figure()

        # Base font size, tolerate a bad value.
        try:
            base_fs = float(fontsize)
        except (TypeError, ValueError):
            base_fs = 9.0
        if base_fs <= 0:
            base_fs = 9.0

        # --- Build the truncated, padded text matrix.
        header_cells = [
            _cell_text(header_list[c] if c < len(header_list) else "", max_cell_chars)
            for c in range(n_cols)
        ]
        body_cells = []
        for row in clean_rows:
            body_cells.append(
                [
                    _cell_text(row[c] if c < len(row) else "", max_cell_chars)
                    for c in range(n_cols)
                ]
            )
        has_header = any(t for t in header_cells)
        n_body = len(body_cells)
        # Total drawn table rows (header counts as one when present).
        n_table_rows = n_body + (1 if has_header else 0)
        if n_table_rows <= 0:
            return _placeholder_figure()

        # --- figsize proportional to content so it reads under zoom.
        # Width: per-column width scales with the longest text in that column,
        # clamped to a sensible per-column range, total capped.
        per_col_widths = []
        for c in range(n_cols):
            col_texts = [header_cells[c]] if has_header else []
            col_texts += [body_cells[r][c] for r in range(n_body)]
            longest = max((len(t) for t in col_texts), default=0)
            # 0.9" base plus ~0.085" per char beyond the first 6, clamped to [0.9, 1.6] inches.
            w = 0.9 + 0.085 * max(longest - 6, 0)
            w = max(0.9, min(1.6, w))
            per_col_widths.append(w)
        fig_w = sum(per_col_widths)
        fig_w = max(3.0, min(26.0, fig_w))
        # Height: ~0.32" per row + room for title / note.
        fig_h = 0.32 * n_table_rows + 0.30
        if title is not None and str(title) != "":
            fig_h += 0.45
        if note is not None and str(note) != "":
            fig_h += 0.30
        fig_h = max(1.0, min(60.0, fig_h))

        fig = Figure(figsize=(fig_w, fig_h), dpi=150)
        ax = fig.add_subplot(111)
        ax.axis("off")
        # Reserve vertical bands for the optional title (top) and note (bottom)
        # so the table itself never overlaps them.
        title_band = 0.10 if (title is not None and str(title) != "") else 0.0
        note_band = 0.07 if (note is not None and str(note) != "") else 0.0
        table_bbox = [0.0, note_band, 1.0, max(0.05, 1.0 - title_band - note_band)]
        cell_text = ([header_cells] if has_header else []) + body_cells
        col_widths = [w / fig_w for w in per_col_widths]
        table = ax.table(
            cellText=cell_text,
            colWidths=col_widths,
            cellLoc="left",
            loc="center",
            bbox=table_bbox,
        )
        table.auto_set_font_size(False)
        table.set_fontsize(base_fs)

        # --- Style every cell: zebra body, shaded bold header, thin gray grid.
        for (r, _c), cell in table.get_celld().items():
            cell.set_edgecolor(_GRID)
            cell.set_linewidth(0.4)
            # Small horizontal padding so text does not touch the border.
            cell.PAD = 0.04
            if has_header and r == 0:
                cell.set_facecolor(_HEADER_BG)
                cell.set_text_props(color=_HEADER_TEXT, fontweight="bold", ha="left")
            else:
                body_index = r - 1 if has_header else r  # 0-based body row.
                # 1-based even rows get the zebra stripe.
                is_even = ((body_index + 1) % 2) == 0
                cell.set_facecolor(_ZEBRA_BG if is_even else _BODY_BG)
                cell.set_text_props(color=_INK, ha="left")

        if title is not None and str(title) != "":
            ax.set_title(
                str(title),
                fontsize=base_fs + 3.0,
                fontweight="bold",
                color=_INK,
                loc="left",
                pad=8,
            )
        if note is not None and str(note) != "":
            fig.text(
                0.01,
                0.01,
                str(note),
                ha="left",
                va="bottom",
                fontsize=max(7.0, base_fs - 1.0),
                color=_NOTE_TEXT,
                style="italic",
            )
        return fig
    except Exception:  # noqa: BLE001 - never raise from a figure builder.
        return _placeholder_figure()
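The docstring leaves rasterization to the caller. A minimal caller-side sketch of the 220 dpi step from item 1, assuming the renderer rasterizes into in-memory PNG bytes; `figure_to_png` and `png_size` are hypothetical helpers, not functions from this commit. Pixel dimensions are figure inches times dpi, so going from ~100 to 220 dpi scales an image by about 2.2×, in line with the measured ~492 px → ~1081 px.

```python
import io
import struct

from matplotlib.figure import Figure


def figure_to_png(fig: Figure, dpi: int = 220) -> bytes:
    """Rasterize a Figure to PNG bytes at the given dpi (hypothetical helper)."""
    buf = io.BytesIO()
    # A bare Figure (no pyplot) can still be saved; PNG output uses the Agg renderer.
    fig.savefig(buf, format="png", dpi=dpi)
    return buf.getvalue()


def png_size(data: bytes) -> tuple:
    """Read (width, height) in pixels from a PNG's IHDR chunk."""
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG")
    # After the 8-byte signature and the IHDR length/type fields,
    # bytes 16..24 hold width and height as big-endian uint32.
    return struct.unpack(">II", data[16:24])


if __name__ == "__main__":
    fig = Figure(figsize=(5.0, 2.0))
    fig.add_subplot(111).plot([0, 1, 2], [0, 1, 4])
    print(png_size(figure_to_png(fig, dpi=100)))  # (500, 200)
    print(png_size(figure_to_png(fig, dpi=220)))  # (1100, 440)
```

Because the Figure is built with the object-oriented API, it is never registered with pyplot, so there is no global figure manager to close after rasterizing.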