a74a5a047f
Cross-cutting improvements to the AutomaticEDA engine (PDF + PPTX) on top of the block model:

1. Global high DPI: every embedded figure/image is rasterized at 220 dpi (previously 150, and in PDF the page was saved at ~100 dpi, re-rasterizing the imshow output). In PDF, savefig.dpi=220 is applied to the page; the text stays vector and selectable. This lets the reader zoom in on a phone without pixelation. Measured embedded image: ~1081px (previously ~492px).

2. Wide table → high-resolution image: when a DataTable has too many columns to be legible as text (criterion _table_fits_as_text), it is drawn in full as a crisp image (new function render_table_as_figure_py_datascience: shaded header + zebra stripes) scaled to fit completely, so the reader can zoom in and read it without losing data. Tables that do fit remain selectable text / native tables. Applies to PDF and PPTX. The 19-column df.head of the synthetic dataset is no longer cut off: it is rendered as an image.

3. Group.layout: new backward-compatible hint (default "stack"). "side_by_side" places the table on the left (~55%) and the figure on the right (~45%) on the same PPTX slide (it falls back to stacked if there is no table+figure pair or they do not fit); in PDF it is treated as "stack" (the mobile A5 width does not allow two columns). Intended to let the cat_distr chapter put the chart next to the table in PPT.

4. Cover page with clickable index: the chapter list changes from "Este informe incluye..." (markdown) to a Heading "Índice" + one TocEntry per chapter. The renderer records where each chapter starts and wires each entry as a real jump (PDF: PyMuPDF GOTO link; PPTX: native jump to slide), reusing the clickable-glossary mechanism.

Model: Group gains `layout`; new TocEntry block; normalizers and __init__ updated.

Contract: documented in docs/automatic_eda_contract.md §11.4 (includes the exact contract of the layout field for the cat_distr agent).

Tests: new render_quality_test.py (13 golden: real high DPI, wide table→image PDF/PPTX, narrow→text, side_by_side PPTX two columns / PDF stacked, clickable index PDF+PPTX, backward compatibility of the default layout). render_features_test updated for the new index. Suite: 188 passed (module) + 38 passed/1 skipped (acceptance + pipeline).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
242 lines
8.9 KiB
Python
"""Impure EDA helper: a crisp table rendered as a matplotlib Figure (`eda` group).

Draws a tabular block (header + rows) as a sharp ``matplotlib.figure.Figure``
ready to be rasterized at high DPI, so a table that does NOT fit as text on a
page/slide can still be read in full by zooming into the rasterized image on a
phone. The header is shaded and bold, even rows carry a soft zebra stripe, the
ink is dark on white and the grid is very thin.

Impure because it touches matplotlib's rendering machinery. It selects the
headless Agg backend at import time and builds figures with the object-oriented
``Figure`` API (no ``pyplot``), so no figure is registered in pyplot's global
state and it is safe to call repeatedly from a report renderer. It is fully
defensive and NEVER raises: empty/invalid input or any internal error returns a
small placeholder figure carrying a centered "(tabla no disponible)".
"""

import matplotlib

matplotlib.use("Agg")

from matplotlib.figure import Figure  # noqa: E402

# Palette shared with the EDA report renderer so the document stays coherent.
_HEADER_BG = "#eef3f6"  # header cell background.
_HEADER_TEXT = "#1b1b1b"  # header cell text (bold).
_ZEBRA_BG = "#f6f8fa"  # even (1-based) row background stripe.
_BODY_BG = "#ffffff"  # odd row background.
_INK = "#1b1b1b"  # body text + title ink.
_GRID = "#cccccc"  # cell borders / grid (thin).
_NOTE_TEXT = "#8a8a8a"  # muted gray for the note (italic).


def _placeholder_figure(message: str = "(tabla no disponible)") -> "Figure":
    """Return a small fallback ``Figure`` carrying a single centered message."""
    fig = Figure(figsize=(6.0, 1.6), dpi=150)
    ax = fig.add_subplot(111)
    ax.axis("off")
    ax.text(
        0.5,
        0.5,
        message,
        ha="center",
        va="center",
        fontsize=11,
        color=_NOTE_TEXT,
        style="italic",
        wrap=True,
        transform=ax.transAxes,
    )
    fig.tight_layout()
    return fig


def _cell_text(value, max_cell_chars: int) -> str:
    """``str()`` a cell value defensively, None -> "", truncate with an ellipsis."""
    s = "" if value is None else str(value)
    # Collapse newlines/tabs so a single cell never spills across table rows.
    s = s.replace("\n", " ").replace("\r", " ").replace("\t", " ")
    try:
        limit = int(max_cell_chars)
    except (TypeError, ValueError):
        limit = 40
    if limit <= 0:
        return ""
    if len(s) <= limit:
        return s
    if limit == 1:
        return "…"
    return s[: limit - 1] + "…"


def render_table_as_figure(
    header,
    rows,
    title=None,
    note=None,
    fontsize=9.0,
    max_cell_chars=40,
):
    """Draw a crisp table as a matplotlib.figure.Figure, ready to rasterize at high DPI.

    Intended for tables that do NOT fit as text on a page/slide: the figure is
    rasterized at high resolution and the user zooms in on a phone to read the
    whole table without losing data. Shaded, bold header; even rows with a soft
    zebra stripe; dark ink on white; very thin grid.

    Args:
        header: list or tuple of column names (may be empty); any other type is treated as empty.
        rows: list or tuple of rows; each row is a list of cells (any values, converted with str()).
        title: optional title drawn above the table (or None).
        note: optional gray, italic note below the table (or None).
        fontsize: base font size (pt) for the cells.
        max_cell_chars: truncates cell text to this many characters (with a trailing …)
            so the width does not blow up.

    Returns:
        matplotlib.figure.Figure, NOT closed (the caller rasterizes it and closes it).
        Never raises: on any error it returns a Figure with the text "(tabla no disponible)".
    """
    try:
        # --- Defensive normalization of header/rows into a rectangular grid.
        header_list = list(header) if isinstance(header, (list, tuple)) else []
        raw_rows = list(rows) if isinstance(rows, (list, tuple)) else []

        clean_rows = []
        for row in raw_rows:
            if isinstance(row, (list, tuple)):
                clean_rows.append(list(row))
            elif row is None:
                clean_rows.append([])
            else:
                # A scalar row becomes a single-cell row instead of being dropped.
                clean_rows.append([row])

        # Nothing to draw at all -> placeholder.
        if not header_list and not clean_rows:
            return _placeholder_figure()

        # Number of columns = widest of header / any row.
        n_cols = len(header_list)
        for row in clean_rows:
            if len(row) > n_cols:
                n_cols = len(row)
        if n_cols <= 0:
            return _placeholder_figure()

        # Base font size, tolerate a bad value.
        try:
            base_fs = float(fontsize)
        except (TypeError, ValueError):
            base_fs = 9.0
        if base_fs <= 0:
            base_fs = 9.0

        # --- Build the truncated, padded text matrix.
        header_cells = [
            _cell_text(header_list[c] if c < len(header_list) else "", max_cell_chars)
            for c in range(n_cols)
        ]
        body_cells = []
        for row in clean_rows:
            body_cells.append(
                [
                    _cell_text(row[c] if c < len(row) else "", max_cell_chars)
                    for c in range(n_cols)
                ]
            )

        has_header = any(t for t in header_cells)
        n_body = len(body_cells)
        # Total drawn table rows (header counts as one when present).
        n_table_rows = n_body + (1 if has_header else 0)
        if n_table_rows <= 0:
            return _placeholder_figure()

        # --- figsize proportional to content so it reads under zoom.
        # Width: per-column width scales with the longest text in that column,
        # clamped to a sensible per-column range, total capped.
        per_col_widths = []
        for c in range(n_cols):
            col_texts = [header_cells[c]] if has_header else []
            col_texts += [body_cells[r][c] for r in range(n_body)]
            longest = max((len(t) for t in col_texts), default=0)
            # ~0.085" per char at the base font, clamped to [0.9, 1.6] inches.
            w = 0.9 + 0.085 * max(longest - 6, 0)
            w = max(0.9, min(1.6, w))
            per_col_widths.append(w)
        fig_w = sum(per_col_widths)
        fig_w = max(3.0, min(26.0, fig_w))

        # Height: ~0.32" per row + room for title / note.
        fig_h = 0.32 * n_table_rows + 0.30
        if title is not None and str(title) != "":
            fig_h += 0.45
        if note is not None and str(note) != "":
            fig_h += 0.30
        fig_h = max(1.0, min(60.0, fig_h))

        fig = Figure(figsize=(fig_w, fig_h), dpi=150)
        ax = fig.add_subplot(111)
        ax.axis("off")

        # Reserve vertical bands for the optional title (top) and note (bottom)
        # so the table itself never overlaps them.
        title_band = 0.10 if (title is not None and str(title) != "") else 0.0
        note_band = 0.07 if (note is not None and str(note) != "") else 0.0
        table_bbox = [0.0, note_band, 1.0, max(0.05, 1.0 - title_band - note_band)]

        cell_text = ([header_cells] if has_header else []) + body_cells

        col_widths = [w / fig_w for w in per_col_widths]

        table = ax.table(
            cellText=cell_text,
            colWidths=col_widths,
            cellLoc="left",
            loc="center",
            bbox=table_bbox,
        )
        table.auto_set_font_size(False)
        table.set_fontsize(base_fs)

        # --- Style every cell: zebra body, shaded bold header, thin gray grid.
        for (r, _c), cell in table.get_celld().items():
            cell.set_edgecolor(_GRID)
            cell.set_linewidth(0.4)
            # Small horizontal padding so text does not touch the border.
            cell.PAD = 0.04
            if has_header and r == 0:
                cell.set_facecolor(_HEADER_BG)
                cell.set_text_props(color=_HEADER_TEXT, fontweight="bold", ha="left")
            else:
                body_index = r - 1 if has_header else r  # 0-based body row.
                # 1-based even rows get the zebra stripe.
                is_even = ((body_index + 1) % 2) == 0
                cell.set_facecolor(_ZEBRA_BG if is_even else _BODY_BG)
                cell.set_text_props(color=_INK, ha="left")

        if title is not None and str(title) != "":
            ax.set_title(
                str(title),
                fontsize=base_fs + 3.0,
                fontweight="bold",
                color=_INK,
                loc="left",
                pad=8,
            )

        if note is not None and str(note) != "":
            fig.text(
                0.01,
                0.01,
                str(note),
                ha="left",
                va="bottom",
                fontsize=max(7.0, base_fs - 1.0),
                color=_NOTE_TEXT,
                style="italic",
            )

        return fig
    except Exception:  # noqa: BLE001  # never raise from a figure builder.
        return _placeholder_figure()
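
The sizing arithmetic inside render_table_as_figure is easier to follow with concrete numbers. The sketch below is illustrative only: `figure_size_inches` is a hypothetical helper that does not exist in the module. It copies the constants from the function body and assumes the caller rasterizes at 220 dpi, the value named in the commit message.

```python
def figure_size_inches(longest_per_column, n_table_rows, has_title=False, has_note=False):
    """Hypothetical helper: reproduce the width/height arithmetic, in inches."""
    widths = []
    for longest in longest_per_column:
        # 0.9" covers up to 6 characters; each extra character adds ~0.085".
        w = 0.9 + 0.085 * max(longest - 6, 0)
        # Per-column clamp to [0.9, 1.6] inches.
        widths.append(max(0.9, min(1.6, w)))
    # Total width clamped to [3, 26] inches.
    fig_w = max(3.0, min(26.0, sum(widths)))

    # ~0.32" per drawn row plus a fixed margin, then room for title / note.
    fig_h = 0.32 * n_table_rows + 0.30
    if has_title:
        fig_h += 0.45
    if has_note:
        fig_h += 0.30
    # Total height clamped to [1, 60] inches.
    fig_h = max(1.0, min(60.0, fig_h))
    return fig_w, fig_h


# 19 columns whose longest cell is 12 characters; header + 5 body rows.
width_in, height_in = figure_size_inches([12] * 19, n_table_rows=6)

# Pixel size under the assumed 220 dpi rasterization.
ASSUMED_DPI = 220
width_px = round(width_in * ASSUMED_DPI)
height_px = round(height_in * ASSUMED_DPI)
print(width_in, height_in, width_px, height_px)
```

With these inputs each column asks for 1.41", so the 19 columns exceed the 26" cap and the width is clamped. The table then relies on the bounding box passed to `ax.table` to fit the columns inside the figure.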