feat(eda): capítulo RELACIONES para AutomaticEDA

Añade el capítulo `relaciones` al motor AutomaticEDA: analiza las relaciones de clave de la tabla/base y se coloca tras `correlacion`, antes de `modelos`, en CHAPTER_ORDER. Capas que renderiza (solo las que aplican; None si no hay nada que decir): - Claves declaradas: PK/FK/UNIQUE reales del esquema DuckDB, vía la nueva función `detect_declared_keys_duckdb` (lee `duckdb_constraints()`). - Candidatos a clave primaria: los `key_candidates` del TableProfile. - FK candidatas inter-tabla: reusa `infer_fk_containment_duckdb` (containment + señal de nombre) y `build_join_graph` (roles de nodos + diagrama Mermaid pegable). Solo si la fuente DuckDB tiene varias tablas. - FK candidatas intra-tabla: heurística nombre + cardinalidad, vía la nueva función pura `suggest_intratable_fk_candidates`, marcada como sugerencia. Engancha al glosario clicable los términos PK, FK, containment/inclusión y cardinalidad (contrato §11.1) y usa Group (keep-together) para el grafo. Funciones nuevas del registry (grupo `eda`): - detect_declared_keys_duckdb (impure, datascience) + test. - suggest_intratable_fk_candidates (pure, datascience) + test. Tests: relaciones_test.py (golden intra + inter, edges, no-cut render) + los tests de ambas funciones. Suite automatic_eda + render_automatic_eda verde (89 passed). Golden end-to-end con el pipeline render_automatic_eda verificado sobre titanic (intra) y una BD customers/orders (inter). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
merge: Fase 4a AutomaticEDA motor+glosario (verificado met)
2026-06-30 18:15:15 +02:00 · 2026-06-30 17:45:30 +02:00 · 2026-06-30 17:35:19 +02:00 · 2026-06-30 16:19:52 +02:00
29 changed files with 3745 additions and 107 deletions
@@ -25,7 +25,8 @@ cabecera, y figuras/imágenes se escalan para caber enteras.
 ```
 Document  = list[Chapter]
 Chapter   = { id: str, title: str, version: str, blocks: list[Block] }
-Block     = Heading | Markdown | KVTable | DataTable | Figure | Image | Caption | Note
+Block     = Heading | Markdown | KVTable | DataTable | Figure | Image | Caption
+          | Note | Group | GlossaryEntry
 ```

 Importa el modelo desde `datascience.automatic_eda.model` (o
@@ -44,6 +45,10 @@ reconocido se degrada a `Note`, nunca lanza).
 | `Figure(fig=None, make=None, caption=None, height_in=None)` | una `matplotlib.figure.Figure` ya construida (`fig`) o un callable `make()->Figure` (perezoso) | se rasteriza y escala para caber entera (nunca recortada) |
 | `Image(path, caption=None, height_in=None)` | ruta a PNG/JPG | se escala para caber entera |
 | `Caption(text)` / `Note(text)` | texto auxiliar pequeño | pie/nota en gris; `Note` es además el fallback de lo desconocido |
+| `Group(blocks, title=None)` | unidad **keep-together**: sus bloques se mantienen juntos | el renderer mide el grupo entero y lo mueve completo a la página/slide siguiente si no cabe; encoge la figura para dejar sitio al título+texto. Ver §11 |
+| `GlossaryEntry(key, label, definition)` | una entrada del glosario (destino clicable) | la genera el capítulo `glosario`; registra su posición como destino de los términos marcados. Ver §11 |
+
+`Figure`/`Image` aceptan `height_in` (hint): el renderer **clampa** la figura a esa altura máxima (lo usa `Group` para encoger la figura). Toda figura escala dejando sitio a su caption en la misma página/slide; en PPTX el caption es **siempre** visible (si no se da `caption`, cae al último heading o a "Figura").

 ### Subset de markdown soportado (`Markdown`)

@@ -84,8 +89,9 @@ El orden canónico está **pre-declarado** en

 ```python
 CHAPTER_ORDER = [
-    "portada", "overview", "num_distr", "cat_distr", "calidad", "correlacion",
-    "modelos", "analisis_llm", "timeseries", "geospatial", "agregacion",
+    "portada", "overview", "analisis_llm", "num_distr", "cat_distr", "calidad",
+    "correlacion", "modelos", "timeseries", "geospatial", "agregacion",
+    "glosario",
 ]
 ```

@@ -95,6 +101,15 @@ CHAPTER_ORDER = [
 `CHAPTER_ORDER`) y aparecerá automáticamente en su posición. Esto permite que muchos
 agentes trabajen **en paralelo** sin contención: cada uno toca solo su archivo.

+**Dos capítulos tienen posición especial** (los gestiona `build_document`, no toques esto):
+
+- `portada`: se **construye el último** (después del cuerpo) para poder resumir el
+  análisis, pero se **coloca el primero**. Recibe `ctx['document_summary']` (ver §5) con
+  un resumen agregado del resto. Decisión del usuario: la portada refleja hallazgos.
+- `glosario`: se construye y se **coloca el último**. Lee los términos que los demás
+  capítulos registraron en `ctx['glossary']` (ver §11). Si no se registró ninguno, el
+  capítulo devuelve `None` y desaparece.
+
 Si tu capítulo usa un `<id>` que aún no está en `CHAPTER_ORDER`, añádelo en la posición
 correcta (única edición compartida; coordínala con el orquestador).

@@ -143,6 +158,8 @@ defensivo). Esto habilita el **seguimiento y la mejora continua por capítulo**.
 | `granularity` | "Cada fila es…" (portada). Default: derivado de `key_candidates` |
 | `quality_criteria` | criterios del score de calidad (portada) |
 | `head_rows` | `list[dict]` con `df.head` (overview). Ver §7 |
+| `glossary` | `GlossaryCollector` compartido — los capítulos registran términos en él. Lo crea `build_document`; ver §11 |
+| `document_summary` | dict con el resumen agregado del cuerpo (n_rows, n_cols, quality_score, n_numeric, n_categorical, chapter_titles, …). Lo calcula `build_document` y lo consume la portada |

 Un capítulo puede definir y consumir sus propias claves `ctx` — documenta cuáles en su
 docstring.
@@ -279,6 +296,109 @@ sus bloques presentes y el no-corte (texto largo intacto en la salida). Patrón:

 ---

+## 11. Glosario, keep-together y zebra (motor, fase 4a)
+
+Tres capacidades transversales del motor que **todos** los capítulos pueden usar. La 6.1
+(glosario) requiere que el capítulo coopere (registrar + marcar términos); la 6.2
+(keep-together) es opt-in por capítulo (envolver bloques en `Group`); la 6.3 (zebra) es
+automática (no hay nada que hacer).
+
+### 11.1 Glosario con términos clicables
+
+El glosario es un capítulo nuevo (`chapters/glosario.py`) que se renderiza **siempre el
+último** y lista cada término técnico que algún capítulo haya registrado. Cada aparición
+del término en el texto se vuelve un **clic real** que salta a su entrada: en PDF como
+*link annotation* interno (post-proceso con PyMuPDF, porque `PdfPages` no soporta
+hyperlinks internos), en PPTX como *slide-jump* nativo (`ppaction://hlinksldjump`).
+
+**API exacta para un capítulo (dos pasos):**
+
+1. **Registrar el término** en el colector compartido `ctx['glossary']` (un
+   `model.GlossaryCollector`, creado por `build_document` y pasado a todos los capítulos):
+
+   ```python
+   glossary = ctx.get("glossary")
+   if isinstance(glossary, model.GlossaryCollector):
+       glossary.add("entropia", "Entropía (de Shannon)", "Medida, en bits, de …")
+   ```
+
+   `add(key, label, definition)` es idempotente (la primera definición de cada `key` gana).
+   `key` debe ser `[A-Za-z0-9_]+`. Si no hay colector en `ctx` (renderizado suelto), el
+   capítulo simplemente no marca términos — degrada sin romper.
+
+2. **Marcar cada aparición** en el texto de un bloque `Markdown` con el span inline
+   `[[term:KEY]]texto visible[[/term]]`. El texto visible puede llevar `**negrita**`. El
+   marcador no altera el texto visible (se elimina como cualquier marcador inline); solo
+   añade el destino clicable.
+
+   ```python
+   # En cat_distr (ejemplo real ya implementado):
+   "La [[term:entropia]]**entropía de Shannon**[[/term]] mide cómo de repartidos…"
+   ```
+
+Eso es todo: el capítulo `glosario` recoge los términos (orden alfabético por `label`),
+emite un `GlossaryEntry` por término, y los renderers cablean los enlaces automáticamente.
+Si ningún capítulo registró términos, el glosario no aparece.
+
+**Helpers de `text_layout` (no reimplementar):** `parse_inline_rich(text)` →
+`[(texto, is_bold, term_key), …]`; `wrap_rich_terms(text, max_chars)` → líneas de esos
+spans sin corte. `strip_inline_md` ya elimina los marcadores `[[term:…]]`/`[[/term]]`.
+(Las funciones previas `parse_inline_bold` / `wrap_rich` siguen existiendo, sin términos.)
+
+**Funciones del registry que cablean los enlaces** (grupo `eda`, ya invocadas por los
+renderers; degradan en silencio si faltan): `add_pdf_internal_links_py_datascience`
+(PyMuPDF, link GOTO) y `pptx_link_run_to_slide_py_datascience` (salto a slide nativo).
+Dependencia: `pymupdf` (declarada en `python/pyproject.toml`).
+
+**Trabajo de la siguiente fase — enganchar más términos.** El mecanismo está hecho y
+probado de extremo a extremo con `entropia` (en `cat_distr`). Cada capítulo debe registrar
+y marcar SUS términos con el mismo patrón de dos pasos. Candidatos por capítulo:
+
+| Capítulo | Términos a enganchar (key sugerida) |
+|---|---|
+| `cat_distr` | `entropia` ✅ (hecho) |
+| `calidad` | `completitud`, `validez`, `consistencia` |
+| `correlacion` | `cramers_v`, `fdr` (comparaciones múltiples), método de correlación usado |
+| `modelos` | `pca`, `silhouette`, `isolation_forest` |
+| `timeseries` | `estacionariedad`, `acf_pacf`, `stl` |
+| `num_distr` | `iqr`, `curtosis`, `outlier` (vallas de Tukey) |
+
+Define la definición de cada término en su capítulo (constante local, como
+`_TERM_ENTROPIA_DEF` en `cat_distr`) y márcalo en su primera aparición.
+
+### 11.2 Keep-together: gráfico junto a su título y texto (`Group`)
+
+Para que un encabezado no quede en una página/slide y su figura en la siguiente, envuelve
+los bloques de una misma idea en un `model.Group`:
+
+```python
+blocks.append(model.Group(blocks=[
+    model.Heading(text=str(name), level=2),
+    model.Figure(make=_figura_perezosa(...), caption="…"),
+    model.Markdown(text="explicación…"),
+]))
+```
+
+El renderer **mide el grupo entero** antes de dibujar nada: si no cabe en lo que queda de
+página/slide pero cabe en una entera, lo mueve **completo** a la siguiente; y **encoge la
+figura** (vía `height_in`) lo justo para que el título + texto + figura quepan juntos. Si
+el grupo es más alto que una página entera, empieza en una nueva y fluye (degradación
+honesta, nunca corta). Ejemplo real implementado: `num_distr` envuelve cada columna
+(heading + figura histograma/boxplot + nota) en un `Group`.
+
+Recomendado para `agregacion` y cualquier capítulo donde una figura deba ir pegada a su
+título/explicación. Coste: si un capítulo inspecciona `chapter.blocks` en sus tests, ahora
+encontrará `Group`s — aplana con un helper recursivo (ver `num_distr_test.py::_flatten`).
+
+### 11.3 Zebra striping en tablas (automático)
+
+Todo `DataTable` se renderiza con **filas pares sombreadas** (gris muy suave `#f6f8fa`) y
+cabecera con su fondo propio. Es automático en PDF y PPTX; el patrón se mantiene coherente
+cuando una tabla larga se parte y repite cabecera (el índice de fila es lógico, no por
+página). No hay nada que hacer en los capítulos.
+
+---
+
 ## 10. Integración futura con `profile_table` (siguiente fase)

 `profile_table(emit_pdf=True)` usa hoy `render_eda_pdf` (intacto). En la siguiente fase
@@ -34,6 +34,7 @@ from .theils_u import theils_u
 from .correlation_ratio import correlation_ratio
 from .mutual_info_columns import mutual_info_columns
 from .infer_fk_containment_duckdb import infer_fk_containment_duckdb
+from .detect_declared_keys_duckdb import detect_declared_keys_duckdb
 from .build_join_graph import build_join_graph
 from .association_matrix import association_matrix
 from .correlation_matrix_duckdb import correlation_matrix_duckdb
@@ -68,11 +69,15 @@ from .extract_timeseries_raw import extract_timeseries_raw
 from .build_eda_render_ctx import build_eda_render_ctx
 from .profile_datetime import profile_datetime
 from .resample_timeseries import resample_timeseries
+from .add_pdf_internal_links import add_pdf_internal_links
+from .suggest_intratable_fk_candidates import suggest_intratable_fk_candidates

 __all__ = [
+    "suggest_intratable_fk_candidates",
    "detect_time_column",
    "extract_timeseries_raw",
    "build_eda_render_ctx",
+    "add_pdf_internal_links",
    "profile_datetime",
    "resample_timeseries",
    "render_automatic_eda_pdf",
@@ -95,6 +100,7 @@ __all__ = [
    "correlation_ratio",
    "mutual_info_columns",
    "infer_fk_containment_duckdb",
+    "detect_declared_keys_duckdb",
    "build_join_graph",
    "association_matrix",
    "correlation_matrix_duckdb",
@@ -0,0 +1,85 @@
+---
+name: add_pdf_internal_links
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def add_pdf_internal_links(pdf_path: str, links: list) -> dict"
+description: "Postprocesa un PDF YA escrito insertando link annotations internos de tipo GOTO ('ir a') con PyMuPDF (import fitz). Pensado para PDFs generados por matplotlib PdfPages, que NO soporta hyperlinks internos: tras escribir el PDF se reabre y, por cada entrada de `links`, se añade una anotacion clicable desde un rectangulo de una pagina origen (src_page + src_rect en puntos top-left) hasta un punto de una pagina destino (dst_page + dst_point). Caso de uso tipico del grupo eda: hacer clicables los terminos de un AutomaticEDA que apuntan a su entrada en el glosario al final del documento. Estilo dict-no-throw: NUNCA lanza; valida cada link y SALTA (n_skipped++) los malformados o fuera de rango en vez de fallar. Guarda de forma segura escribiendo a un temporal en el mismo directorio y haciendo os.replace atomico (evita corromper el original). Devuelve {status:ok,n_links,n_skipped} o {status:error,error}; si pymupdf no esta disponible o el archivo no existe devuelve status error."
+tags: [eda, datascience, pdf, links, glossary, pymupdf, fitz, postprocess, python]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: []
+params:
+  - name: pdf_path
+    desc: "ruta al PDF existente (str no vacio). Se reescribe IN SITU (in-place) tras añadir los links: se guarda a un temporal `.<base>.tmp_links` en el mismo directorio y se reemplaza atomicamente con os.replace. Si no es str o no existe el archivo -> {status:error}."
+  - name: links
+    desc: "lista de dicts, uno por link a insertar. Cada dict: src_page (int 0-based de la pagina origen), src_rect ([x0,y0,x1,y1] del rectangulo clicable en PUNTOS PDF 1/72\" con origen ARRIBA-IZQUIERDA), dst_page (int 0-based de la pagina destino), dst_point ([x,y] punto destino, mismos puntos top-left). Las entradas que no son dict, con page fuera de rango [0,page_count), src_rect que no tenga 4 numeros o dst_point que no tenga 2 numeros se SALTAN (n_skipped++), no lanzan. None se trata como lista vacia."
+output: "dict (NUNCA lanza): en exito {\"status\":\"ok\",\"n_links\":int,\"n_skipped\":int} con n_links = anotaciones GOTO insertadas y n_skipped = entradas invalidas saltadas. En fallo {\"status\":\"error\",\"error\":str}: pymupdf no disponible, pdf_path no es str / no existe, links no es lista, o cualquier excepcion global (el PDF original queda intacto porque el replace solo ocurre tras un save correcto)."
+tested: true
+tests: ["test_add_goto_link_basico", "test_links_invalidos_se_saltan", "test_archivo_inexistente_devuelve_error"]
+test_file_path: "python/functions/datascience/add_pdf_internal_links_test.py"
+file_path: "python/functions/datascience/add_pdf_internal_links.py"
+---
+
+## Ejemplo
+
+```python
+import sys, os
+sys.path.insert(0, os.path.join("python", "functions"))
+from datascience import add_pdf_internal_links
+
+# Tienes un PDF ya escrito por matplotlib PdfPages (sin hyperlinks internos).
+# Quieres que el texto "Margen bruto" de la pagina 0 (rectangulo en puntos
+# top-left) salte a su entrada del glosario en la ultima pagina (indice 7).
+res = add_pdf_internal_links(
+    "reports/eda.pdf",
+    [
+        {"src_page": 0, "src_rect": [72, 120, 180, 134], "dst_page": 7, "dst_point": [72, 200]},
+        {"src_page": 0, "src_rect": [72, 140, 180, 154], "dst_page": 7, "dst_point": [72, 260]},
+    ],
+)
+# res == {"status": "ok", "n_links": 2, "n_skipped": 0}
+```
+
+## Cuando usarla
+
+Justo DESPUES de escribir un PDF con matplotlib `PdfPages` (o cualquier motor
+que no genere hyperlinks internos) cuando necesitas que ciertos terminos o
+referencias sean clicables y salten a otra pagina del mismo documento — el caso
+canonico es enlazar los terminos de un AutomaticEDA con su entrada de glosario
+al final. Es un paso de postproceso: primero generas el PDF y calculas en que
+rectangulo quedo cada termino (en puntos PDF), luego pasas esa lista a esta
+funcion para inyectar las anotaciones GOTO.
+
+## Gotchas
+
+- **Impura — reescribe el archivo IN SITU.** El PDF en `pdf_path` se reemplaza
+  por la version con los links. El guardado es seguro: escribe a un temporal
+  `.<base>.tmp_links` en el MISMO directorio y hace `os.replace` atomico tras
+  cerrar el documento, asi un fallo a mitad no corrompe el original. Aun asi,
+  conserva una copia si el PDF es valioso.
+- **Sistema de coordenadas: puntos top-left, igual que matplotlib.** PyMuPDF y
+  matplotlib (PdfPages) usan ambos PUNTOS PDF (1/72") con el origen ARRIBA-
+  IZQUIERDA, asi que los rectangulos/puntos COINCIDEN: el `src_rect` que calcules
+  con la geometria de la figura matplotlib se pasa tal cual, sin invertir el eje
+  Y. (Ojo: el espacio de datos de matplotlib SI tiene el origen abajo; lo que
+  coincide es el espacio de la PAGINA en puntos.)
+- **Indices de pagina 0-based.** `src_page` / `dst_page` son indices base 0
+  (la primera pagina es 0). Fuera del rango `[0, page_count)` el link se SALTA
+  (cuenta en `n_skipped`), no lanza.
+- **dict-no-throw, validacion por-link.** Las entradas malformadas (no dict,
+  page fuera de rango, `src_rect` sin 4 numeros, `dst_point` sin 2 numeros) se
+  saltan individualmente e incrementan `n_skipped`; el resto de links validos se
+  insertan igual. La funcion solo devuelve `{status:error}` ante fallos globales
+  (pymupdf ausente, archivo inexistente, `links` no es lista).
+- **`error_type: error_go_core` es metadata del registry, no comportamiento.**
+  Toda funcion impura debe declararlo y el indexer lo exige, pero el codigo NUNCA
+  lanza esa excepcion: degrada al dict de estado.
+- **Requiere PyMuPDF (`import fitz`).** Si no esta instalado devuelve
+  `{"status":"error","error":"pymupdf no disponible: ..."}`. En el registry el
+  venv `python/.venv` ya lo trae.
@@ -0,0 +1,132 @@
+"""Postprocesa un PDF existente insertando link annotations internos (GOTO).
+
+Motor: PyMuPDF (``import fitz``). Pensado para PDFs generados por matplotlib
+``PdfPages``, que no soporta hyperlinks internos: tras escribir el PDF, esta
+funcion lo reabre y le añade anotaciones "ir a" (GOTO) desde un rectangulo de
+una pagina origen hasta un punto de una pagina destino. Util para hacer
+clicables terminos que apuntan a su entrada en un glosario al final del
+documento.
+
+Estilo dict-no-throw del grupo `eda`: NUNCA lanza; devuelve un dict de estado.
+"""
+
+import os
+
+
+def add_pdf_internal_links(pdf_path: str, links: list) -> dict:
+    """Añade link annotations internos (GOTO) a un PDF ya escrito.
+
+    Postprocesa un PDF (p.ej. generado por matplotlib PdfPages, que NO soporta
+    hyperlinks internos) insertando, por cada entrada de ``links``, una
+    anotacion de tipo "ir a" desde un rectangulo de una pagina origen hasta un
+    punto de una pagina destino. Sirve para hacer clicables terminos que apuntan
+    a su entrada en un glosario al final del documento.
+
+    Args:
+        pdf_path: ruta al PDF existente (se reescribe in situ).
+        links: lista de dicts, cada uno:
+            {
+              "src_page": int,            # indice 0-based de la pagina origen
+              "src_rect": [x0,y0,x1,y1],  # rectangulo clicable, en PUNTOS PDF
+                                          # (1/72") con origen ARRIBA-IZQUIERDA
+              "dst_page": int,            # indice 0-based de la pagina destino
+              "dst_point": [x, y],        # punto destino, mismos puntos top-left
+            }
+
+    Returns:
+        dict (NUNCA lanza): {"status":"ok","n_links":int,"n_skipped":int}
+        o {"status":"error","error":str}. Si pymupdf no esta disponible o el
+        archivo no existe -> {"status":"error", ...}.
+    """
+    try:
+        try:
+            import fitz  # PyMuPDF
+        except Exception as exc:  # ImportError u otro fallo de carga
+            return {"status": "error", "error": f"pymupdf no disponible: {exc}"}
+
+        if not isinstance(pdf_path, str) or not pdf_path:
+            return {"status": "error", "error": "pdf_path debe ser una ruta no vacia"}
+        if not os.path.isfile(pdf_path):
+            return {"status": "error", "error": f"el archivo no existe: {pdf_path}"}
+
+        if links is None:
+            links = []
+        if not isinstance(links, (list, tuple)):
+            return {"status": "error", "error": "links debe ser una lista de dicts"}
+
+        doc = fitz.open(pdf_path)
+        try:
+            n_pages = doc.page_count
+            n_ok = 0
+            n_skipped = 0
+
+            for link in links:
+                if not isinstance(link, dict):
+                    n_skipped += 1
+                    continue
+
+                src_page = link.get("src_page")
+                dst_page = link.get("dst_page")
+                src_rect = link.get("src_rect")
+                dst_point = link.get("dst_point")
+
+                # src_page / dst_page: enteros 0-based en rango.
+                if not _is_int(src_page) or not _is_int(dst_page):
+                    n_skipped += 1
+                    continue
+                if not (0 <= src_page < n_pages) or not (0 <= dst_page < n_pages):
+                    n_skipped += 1
+                    continue
+
+                # src_rect: 4 numeros.
+                if not _is_num_seq(src_rect, 4):
+                    n_skipped += 1
+                    continue
+                # dst_point: 2 numeros.
+                if not _is_num_seq(dst_point, 2):
+                    n_skipped += 1
+                    continue
+
+                try:
+                    doc[int(src_page)].insert_link(
+                        {
+                            "kind": fitz.LINK_GOTO,
+                            "from": fitz.Rect(*[float(v) for v in src_rect]),
+                            "page": int(dst_page),
+                            "to": fitz.Point(*[float(v) for v in dst_point]),
+                        }
+                    )
+                    n_ok += 1
+                except Exception:
+                    n_skipped += 1
+                    continue
+
+            # Guardado seguro: escribir a temporal en el mismo directorio y
+            # reemplazar atomicamente (evita corromper el PDF original).
+            directory = os.path.dirname(os.path.abspath(pdf_path)) or "."
+            base = os.path.basename(pdf_path)
+            tmp_path = os.path.join(directory, f".{base}.tmp_links")
+            doc.save(tmp_path)
+        finally:
+            doc.close()
+
+        os.replace(tmp_path, pdf_path)
+
+        return {"status": "ok", "n_links": n_ok, "n_skipped": n_skipped}
+    except Exception as exc:  # degrada cualquier fallo a dict de error
+        return {"status": "error", "error": str(exc)}
+
+
+def _is_int(value) -> bool:
+    """True si value es un entero (no bool)."""
+    return isinstance(value, int) and not isinstance(value, bool)
+
+
+def _is_num_seq(value, length: int) -> bool:
+    """True si value es una secuencia de `length` numeros (int/float, no bool)."""
+    if not isinstance(value, (list, tuple)) or len(value) != length:
+        return False
+    for v in value:
+        if isinstance(v, bool) or not isinstance(v, (int, float)):
+            return False
+    return True
@@ -0,0 +1,77 @@
+"""Tests para add_pdf_internal_links."""
+
+import os
+import sys
+
+import pytest
+
+sys.path.insert(0, os.path.dirname(__file__))
+
+from add_pdf_internal_links import add_pdf_internal_links
+
+
+def test_add_goto_link_basico(tmp_path):
+    """Golden: un PDF de 2 paginas recibe un link GOTO de la pag 0 a la pag 1."""
+    fitz = pytest.importorskip("fitz")
+
+    # 1) PDF temporal de 2 paginas A5 (~419x595 puntos).
+    pdf = str(tmp_path / "doc.pdf")
+    doc = fitz.open()
+    doc.new_page(width=419, height=595)
+    doc.new_page(width=419, height=595)
+    doc.save(pdf)
+    doc.close()
+
+    # 2) Insertar un link interno desde la pag 0 hacia la pag 1.
+    res = add_pdf_internal_links(
+        pdf,
+        [{"src_page": 0, "src_rect": [50, 50, 200, 70], "dst_page": 1, "dst_point": [40, 40]}],
+    )
+    assert res["status"] == "ok"
+    assert res["n_links"] == 1
+    assert res["n_skipped"] == 0
+
+    # 3) Reabrir y verificar que la pag 0 tiene un link GOTO a la pag 1.
+    doc = fitz.open(pdf)
+    try:
+        links = doc[0].get_links()
+        goto = [l for l in links if l.get("kind") == fitz.LINK_GOTO and l.get("page") == 1]
+        assert len(goto) >= 1
+    finally:
+        doc.close()
+
+
+def test_links_invalidos_se_saltan(tmp_path):
+    """Edge: entradas malformadas o fuera de rango incrementan n_skipped, no lanzan."""
+    fitz = pytest.importorskip("fitz")
+
+    pdf = str(tmp_path / "doc.pdf")
+    doc = fitz.open()
+    doc.new_page(width=419, height=595)
+    doc.new_page(width=419, height=595)
+    doc.save(pdf)
+    doc.close()
+
+    res = add_pdf_internal_links(
+        pdf,
+        [
+            # valido
+            {"src_page": 0, "src_rect": [10, 10, 90, 30], "dst_page": 1, "dst_point": [20, 20]},
+            # dst_page fuera de rango
+            {"src_page": 0, "src_rect": [10, 40, 90, 60], "dst_page": 9, "dst_point": [20, 20]},
+            # src_rect con 3 numeros
+            {"src_page": 0, "src_rect": [10, 70, 90], "dst_page": 1, "dst_point": [20, 20]},
+            # no es dict
+            "no-soy-un-dict",
+        ],
+    )
+    assert res["status"] == "ok"
+    assert res["n_links"] == 1
+    assert res["n_skipped"] == 3
+
+
+def test_archivo_inexistente_devuelve_error():
+    """Error path: pdf_path inexistente -> status error sin lanzar."""
+    res = add_pdf_internal_links("/ruta/que/no/existe_xyz.pdf", [])
+    assert res["status"] == "error"
+    assert "error" in res
@@ -21,6 +21,9 @@ from .model import (  # noqa: F401
    Chapter,
    DataTable,
    Figure,
+    GlossaryCollector,
+    GlossaryEntry,
+    Group,
    Heading,
    Image,
    KVTable,
@@ -45,6 +48,9 @@ __all__ = [
    "Image",
    "Caption",
    "Note",
+    "Group",
+    "GlossaryEntry",
+    "GlossaryCollector",
    "Chapter",
    "as_blocks",
    "as_chapters",
@@ -33,10 +33,23 @@ import math

 from .. import model

-CHAPTER_VERSION = "1.0.0"
+CHAPTER_VERSION = "1.1.0"
 CHAPTER_ID = "cat_distr"
 CHAPTER_TITLE = "Distribuciones categóricas"

+# Glossary term this chapter explains. Registered in the shared collector and
+# marked clickable on its first appearance (end-to-end glossary example —
+# mejora 6). Other chapters hook their own terms the same way (see the contract).
+_TERM_ENTROPIA_KEY = "entropia"
+_TERM_ENTROPIA_LABEL = "Entropía (de Shannon)"
+_TERM_ENTROPIA_DEF = (
+    "Medida, en bits, de cómo de repartidos están los valores de una columna "
+    "categórica. Vale 0 cuando una sola categoría concentra todas las filas "
+    "(máxima previsibilidad) y alcanza su máximo, log2(k) para k categorías "
+    "distintas, cuando todas aparecen por igual (máxima diversidad). La entropía "
+    "normalizada (entropía dividida por su máximo) la lleva al rango 0–1 para "
+    "comparar columnas con distinto número de categorías.")
+
 # Cap the number of categorical columns rendered to keep the document bounded;
 # the rest are summarized in a closing note (no silent truncation).
 MAX_COLS = 40
@@ -337,10 +350,14 @@ def _topk_table(cat: dict):
                           note=note)


-def _intro_blocks(n_rows):
+def _intro_blocks(n_rows, mark_term: bool = False):
    total = _fmt_int(n_rows)
+    # Mark the first appearance of the term as a clickable glossary jump when the
+    # term was registered (mark_term). The visible text is identical either way.
+    entropia = ("[[term:entropia]]**entropía de Shannon**[[/term]]" if mark_term
+                else "**entropía de Shannon**")
    text = (
-        "La **entropía de Shannon** mide cómo de repartidos están los valores de "
+        f"La {entropia} mide cómo de repartidos están los valores de "
        "una columna categórica, en bits. Vale 0 cuando una sola categoría "
        "concentra todas las filas (máxima previsibilidad) y alcanza su máximo, "
        "log2(k) para k categorías distintas, cuando todas aparecen por igual "
@@ -370,7 +387,15 @@ def build_cat_distr(profile: dict, ctx: dict):
        return None

    n_rows = profile.get("n_rows")
-    blocks = list(_intro_blocks(n_rows))
+    # Register "entropía" in the shared glossary collector (if present) and mark
+    # its first appearance clickable. End-to-end glossary example (mejora 6).
+    glossary = ctx.get("glossary")
+    mark_term = False
+    if isinstance(glossary, model.GlossaryCollector):
+        glossary.add(_TERM_ENTROPIA_KEY, _TERM_ENTROPIA_LABEL,
+                     _TERM_ENTROPIA_DEF)
+        mark_term = True
+    blocks = list(_intro_blocks(n_rows, mark_term=mark_term))

    rendered = cat_cols[:MAX_COLS]
    for col in rendered:
@@ -0,0 +1,47 @@
+"""Glossary chapter (GLOSARIO) — always the last chapter, clickable terms.
+
+Renders one entry per glossary term that the other chapters registered during
+the document build through ``ctx['glossary'].add(key, label, definition)`` (see
+``GlossaryCollector`` in ``model.py``). Each entry is a clickable destination:
+every in-text appearance a chapter marked with ``[[term:key]]texto[[/term]]``
+becomes a real jump to its entry here — PDF link annotations (PyMuPDF) and PPTX
+native slide jumps, both wired by the renderers.
+
+Returns ``None`` when no term was registered (there is nothing to show), so the
+chapter simply disappears from documents that did not mark any term.
+
+Contract: build_<id>(profile, ctx) -> Chapter | None ; CHAPTER_VERSION = "x.y.z".
+"""
+
+from __future__ import annotations
+
+from .. import model
+
+CHAPTER_VERSION = "1.0.0"
+CHAPTER_ID = "glosario"
+CHAPTER_TITLE = "Glosario"
+
+
+def build_glosario(profile: dict, ctx: dict):
+    """Build the glossary Chapter from the shared collector, or None if empty."""
+    ctx = ctx or {}
+    glossary = ctx.get("glossary")
+    if not isinstance(glossary, model.GlossaryCollector) or not glossary:
+        return None
+
+    blocks = [
+        model.Heading(text="Glosario de términos", level=1),
+        model.Markdown(text=(
+            "Definición de los términos técnicos que aparecen en el informe. "
+            "Cada término va resaltado en el texto y, al pulsarlo, salta a su "
+            "definición en esta sección.")),
+    ]
+    # One clickable destination per term, alphabetically by visible label.
+    for term in glossary.terms(by="label"):
+        blocks.append(model.GlossaryEntry(
+            key=model._safe_str(term.get("key")),
+            label=model._safe_str(term.get("label")),
+            definition=model._safe_str(term.get("definition"))))
+
+    return model.Chapter(id=CHAPTER_ID, title=CHAPTER_TITLE,
+                         version=CHAPTER_VERSION, blocks=blocks)
@@ -34,7 +34,7 @@ try:
 except Exception:  # noqa: BLE001 — keep the chapter importable no matter what.
    build_boxplot_stats = None  # type: ignore[assignment]

-CHAPTER_VERSION = "1.0.0"
+CHAPTER_VERSION = "1.1.0"
 CHAPTER_ID = "num_distr"
 CHAPTER_TITLE = "Distribuciones numéricas"

@@ -278,12 +278,17 @@ def build_num_distr(profile: dict, ctx: dict):
                box = build_boxplot_stats(numeric) or {}
            except Exception:  # noqa: BLE001 — degrade, never raise.
                box = {}
-        blocks.append(model.Heading(text=str(name), level=2))
-        blocks.append(model.Figure(
-            make=_figure_maker(name, numeric, box),
-            caption=f"Distribución de «{name}» — histograma (media/mediana/±σ) "
-                    f"y boxplot."))
-        blocks.append(model.Markdown(text=_stats_note(name, numeric, box)))
+        # Keep the column heading, its figure and its stats note together on the
+        # same page/slide (mejora 3 — keep-together): the renderers measure the
+        # whole Group and move it whole when it would not fit.
+        blocks.append(model.Group(blocks=[
+            model.Heading(text=str(name), level=2),
+            model.Figure(
+                make=_figure_maker(name, numeric, box),
+                caption=f"Distribución de «{name}» — histograma "
+                        f"(media/mediana/±σ) y boxplot."),
+            model.Markdown(text=_stats_note(name, numeric, box)),
+        ]))

    return model.Chapter(id=CHAPTER_ID, title=CHAPTER_TITLE,
                         version=CHAPTER_VERSION, blocks=blocks)
@@ -65,19 +65,33 @@ def _pdf_text(path: str) -> str:
    return re.sub(r"\s+", " ", txt)


+def _flatten(blocks):
+    """Expand keep-together Groups so the per-column heading/figure/markdown are
+    inspectable as a flat block list (the chapter wraps each column in a Group)."""
+    out = []
+    for b in blocks:
+        if getattr(b, "kind", "") == "group":
+            out.extend(_flatten(getattr(b, "blocks", []) or []))
+        else:
+            out.append(b)
+    return out
+
+
 def test_golden_chapter_estructura_y_bloques():
    ch = build_num_distr(_profile(n_numeric=2), {})
    assert ch is not None
    assert ch.id == "num_distr"
    assert ch.version == CHAPTER_VERSION
-    kinds = [b.kind for b in ch.blocks]
+    # Per-column blocks are wrapped in keep-together Groups: flatten to inspect.
+    flat = _flatten(ch.blocks)
+    kinds = [b.kind for b in flat]
    # Heading + intro Markdown, then per column: Heading + Figure + Markdown.
    assert kinds[0] == "heading"
    assert kinds[1] == "markdown"
    assert kinds.count("figure") == 2          # one figure per numeric column.
    assert kinds.count("heading") == 1 + 2     # chapter title + one per column.
    # Each figure has a lazy maker that produces a real matplotlib figure.
-    figs = [b for b in ch.blocks if b.kind == "figure"]
+    figs = [b for b in flat if b.kind == "figure"]
    fig = figs[0].make()
    assert fig is not None
    # Two stacked axes: histogram + boxplot share the figure.
@@ -90,7 +104,8 @@ def test_golden_media_mediana_sigma_y_boxplot_presentes():
    # The intro documents the three reference lines and the Tukey boxplot; the
    # per-column note carries the actual mean/median/σ numbers and the shape.
    ch = build_num_distr(_profile(n_numeric=1, extra_categorical=False), {})
-    md_texts = " ".join(b.text for b in ch.blocks if b.kind == "markdown")
+    md_texts = " ".join(b.text for b in _flatten(ch.blocks)
+                        if b.kind == "markdown")
    assert "media" in md_texts and "mediana" in md_texts
    assert "±1σ" in md_texts or "σ" in md_texts
    assert "boxplot" in md_texts.lower()
@@ -126,7 +141,8 @@ def test_anti_corte_muchas_columnas_pdf_y_pptx():
    # 8 numeric columns + long note text: nothing may be cut. Every column
    # heading must survive in both the PDF text and the PPTX deck.
    ch = build_num_distr(_profile(n_numeric=8), {})
-    names = [b.text for b in ch.blocks if b.kind == "heading" and b.level == 2]
+    names = [b.text for b in _flatten(ch.blocks)
+             if b.kind == "heading" and b.level == 2]
    assert len(names) == 8
    with tempfile.TemporaryDirectory() as d:
        pdf = os.path.join(d, "num.pdf")
@@ -17,7 +17,7 @@ from datetime import datetime, timezone

 from .. import model

-CHAPTER_VERSION = "1.0.0"
+CHAPTER_VERSION = "1.1.0"
 CHAPTER_ID = "portada"
 CHAPTER_TITLE = "Portada"

@@ -67,6 +67,53 @@ def _fmt_int(v) -> str:
        return str(v)


+def _fmt_pct(value) -> str:
+    """Format a percentage that may arrive as a 0–1 fraction or a 0–100 number."""
+    if value is None:
+        return "—"
+    try:
+        v = float(value)
+    except (TypeError, ValueError):
+        return str(value)
+    if 0 < v <= 1.0:
+        v *= 100.0
+    return f"{v:.1f}%"
+
+
+def _summary_blocks(summary) -> list:
+    """Mini-summary of the rest of the analysis, shown on the cover (mejora 5).
+
+    The cover is built AFTER the body (``build_document`` passes the aggregated
+    ``ctx['document_summary']``), so it can reflect what the analysis found:
+    shape, column types, quality flags and which chapters were included. Returns
+    an empty list when there is no summary (the cover degrades to its metadata
+    table only)."""
+    if not isinstance(summary, dict) or not summary:
+        return []
+    rows = []
+    n_num = summary.get("n_numeric")
+    n_cat = summary.get("n_categorical")
+    if n_num is not None or n_cat is not None:
+        rows.append(("Columnas numéricas / categóricas",
+                     f"{_fmt_int(n_num)} / {_fmt_int(n_cat)}"))
+    if summary.get("duplicate_pct") is not None:
+        rows.append(("Filas duplicadas", _fmt_pct(summary.get("duplicate_pct"))))
+    if summary.get("null_cell_pct") is not None:
+        rows.append(("Celdas nulas", _fmt_pct(summary.get("null_cell_pct"))))
+    titles = summary.get("chapter_titles") or []
+    if titles:
+        rows.append(("Capítulos del informe", _fmt_int(len(titles))))
+
+    blocks = [model.Heading(text="Resumen del análisis", level=2)]
+    if rows:
+        blocks.append(model.KVTable(rows=rows))
+    if titles:
+        bullets = "\n".join(f"- {model._safe_str(t)}" for t in titles)
+        blocks.append(model.Markdown(
+            text="Este informe incluye los siguientes capítulos:\n" + bullets))
+    return blocks
+
+
 def _fmt_date_eu(value) -> str:
    """Format a date/ISO string as European DD/MM/AAAA HH:mm (UI convention).

@@ -152,5 +199,8 @@ def build_portada(profile: dict, ctx: dict):
        model.Markdown(text=str(granularity)),
    ]

+    # Mini-summary of the rest of the analysis (built last, shown on the cover).
+    blocks.extend(_summary_blocks(ctx.get("document_summary")))
+
    return model.Chapter(id=CHAPTER_ID, title=CHAPTER_TITLE,
                         version=CHAPTER_VERSION, blocks=blocks)
@@ -0,0 +1,500 @@
+"""Key-relations chapter (RELACIONES) — the keys / join structure of the data.
+
+This chapter is the *relational* section of an AutomaticEDA report. It answers a
+single question for the table (or the whole DuckDB source it lives in): **how do
+the keys relate?** It composes, without reimplementing them, the registry's
+relation primitives and degrades honestly when a layer does not apply.
+
+It renders, in order, only the layers that have something to say:
+
+1. **Declared keys** (real schema constraints) — when the DuckDB source declares
+   PRIMARY KEY / FOREIGN KEY / UNIQUE constraints, they are read verbatim via
+   ``detect_declared_keys_duckdb`` and shown as ground truth: which column is the
+   PK, which columns are FKs and the table/column they point to.
+2. **Primary-key candidates** — the ``key_candidates`` the TableProfile already
+   carries (columns whose cardinality equals the row count, with no nulls). These
+   are *candidates*: a column that could serve as the row identifier.
+3. **Foreign-key candidates** when none are declared:
+   - **Inter-table** (the DuckDB source has several tables): real FK candidates by
+     name signal + value containment via ``infer_fk_containment_duckdb``, plus the
+     join graph (roles + a pasteable Mermaid diagram) via ``build_join_graph``.
+   - **Intra-table** (a single table): columns that *look* like a foreign key by a
+     name+cardinality heuristic (``suggest_intratable_fk_candidates``). This is a
+     **suggestion**, explicitly flagged as a heuristic, never an assertion.
+
+``build_relaciones(profile, ctx) -> Chapter | None``: returns ``None`` when there
+is nothing to say (no declared key, no key candidates, and no FK candidate —
+inter- or intra-table). Reads everything defensively (``.get``) and never raises:
+anything missing degrades to a note or is omitted; a failing registry call drops
+its layer instead of aborting the chapter.
+
+ctx keys this chapter consumes (all optional):
+    db_path, table : str — the DuckDB file and table being profiled (set by
+        ``build_eda_render_ctx``). ``db_path`` is needed to read declared
+        constraints, to list the sibling tables, and to run the containment-based
+        FK inference. Without it, only the profile-derived layers (PK candidates,
+        intra-table FK heuristic) are available.
+    glossary : model.GlossaryCollector — shared glossary; the chapter registers
+        the relational terms (PK, FK, containment, cardinality) and marks their
+        first appearance clickable.
+
+Contract: build_<id>(profile, ctx) -> Chapter | None ; CHAPTER_VERSION = "x.y.z".
+"""
+
+from __future__ import annotations
+
+from .. import model
+
+# Pure/impure registry functions (group ``eda``) this chapter composes. Imported
+# defensively (module-leaf imports, like the AGREGACION chapter) so the chapter
+# still builds — degrading the affected layer to nothing — if a function is
+# somehow unavailable / not indexed yet.
+try:
+    from datascience.detect_declared_keys_duckdb import detect_declared_keys_duckdb
+except Exception:  # noqa: BLE001 — keep the chapter importable no matter what.
+    detect_declared_keys_duckdb = None  # type: ignore[assignment]
+try:
+    from datascience.infer_fk_containment_duckdb import infer_fk_containment_duckdb
+except Exception:  # noqa: BLE001
+    infer_fk_containment_duckdb = None  # type: ignore[assignment]
+try:
+    from datascience.build_join_graph import build_join_graph
+except Exception:  # noqa: BLE001
+    build_join_graph = None  # type: ignore[assignment]
+try:
+    from datascience.suggest_intratable_fk_candidates import (
+        suggest_intratable_fk_candidates,
+    )
+except Exception:  # noqa: BLE001
+    suggest_intratable_fk_candidates = None  # type: ignore[assignment]
+try:
+    from infra import duckdb_list_tables
+except Exception:  # noqa: BLE001
+    duckdb_list_tables = None  # type: ignore[assignment]
+
+CHAPTER_VERSION = "1.0.0"
+CHAPTER_ID = "relaciones"
+CHAPTER_TITLE = "Relaciones de clave"
+
+# Cap the inter-table FK table so a wide schema does not blow up the page; the
+# rest is summarized in a closing note (no silent truncation).
+MAX_FK_ROWS = 40
+
+# --------------------------------------------------------------------------- #
+# Glossary terms this chapter explains. Registered in the shared collector and
+# marked clickable on their first appearance (contract §11.1).
+# --------------------------------------------------------------------------- #
+_TERMS = {
+    "pk": (
+        "Clave primaria (PK)",
+        "Columna (o conjunto de columnas) que identifica de forma única cada fila "
+        "de una tabla: sus valores no se repiten y no son nulos. Una tabla tiene "
+        "como mucho una clave primaria; es el ancla por la que otras tablas la "
+        "referencian.",
+    ),
+    "fk": (
+        "Clave foránea (FK)",
+        "Columna de una tabla cuyos valores apuntan a la clave primaria de otra "
+        "tabla (o de la misma), creando una relación entre ambas. Una FK suele ser "
+        "N:1: muchas filas de la tabla origen comparten el mismo valor de la tabla "
+        "destino.",
+    ),
+    "containment": (
+        "Containment / inclusión",
+        "Señal con la que se infiere una clave foránea sin que la base la declare: "
+        "la fracción de valores distintos de una columna A que también aparecen "
+        "como valores de otra columna B. Si casi todos los valores de A están "
+        "contenidos en B (inclusión ≈ 1) y B parece una clave, A → B es una FK "
+        "candidata.",
+    ),
+    "cardinalidad": (
+        "Cardinalidad",
+        "Número de valores distintos de una columna. Cardinalidad igual al número "
+        "de filas (y sin nulos) señala un identificador (candidato a clave "
+        "primaria); cardinalidad alta pero menor que el número de filas, con "
+        "valores repetidos, es típica de una clave foránea.",
+    ),
+}
+
+
+def _register_terms(ctx: dict) -> bool:
+    """Register the relational terms in the shared glossary. Returns whether the
+    in-text appearances should be marked clickable."""
+    glossary = ctx.get("glossary")
+    if not isinstance(glossary, model.GlossaryCollector):
+        return False
+    for key, (label, definition) in _TERMS.items():
+        glossary.add(key, label, definition)
+    return True
+
+
+# --------------------------------------------------------------------------- #
+# Formatting helpers (mirror the other chapters' defensive style).
+# --------------------------------------------------------------------------- #
+def _fmt_int(value) -> str:
+    if value is None:
+        return "—"
+    try:
+        return f"{int(value):,}".replace(",", ".")
+    except (TypeError, ValueError):
+        return model._safe_str(value)
+
+
+def _fmt_pct_fraction(value, decimals: int = 1) -> str:
+    """Format a 0–1 fraction as a percentage. None -> placeholder."""
+    if value is None:
+        return "—"
+    try:
+        v = float(value)
+    except (TypeError, ValueError):
+        return model._safe_str(value)
+    if v <= 1.0:
+        v *= 100.0
+    return f"{v:.{decimals}f}%"
+
+
+def _fmt_ratio(value, decimals: int = 3) -> str:
+    """Format an already-0–1 ratio (inclusion) as a plain number."""
+    if value is None:
+        return "—"
+    try:
+        return f"{float(value):.{decimals}f}".rstrip("0").rstrip(".")
+    except (TypeError, ValueError):
+        return model._safe_str(value)
+
+
+def _is_dict(v) -> bool:
+    return isinstance(v, dict)
+
+
+def _columns_by_name(profile: dict) -> dict:
+    """Index the profile columns by name for quick metric lookup."""
+    out = {}
+    for col in (profile.get("columns") or []):
+        if _is_dict(col) and col.get("name") is not None:
+            out[col.get("name")] = col
+    return out
+
+
+# --------------------------------------------------------------------------- #
+# Layer 1 — declared keys (real schema constraints).
+# --------------------------------------------------------------------------- #
+def _declared_keys(db_path: str, table: str):
+    """Read declared PK/FK/UNIQUE for the source, or None if unavailable."""
+    if not db_path or detect_declared_keys_duckdb is None:
+        return None
+    try:
+        out = detect_declared_keys_duckdb(db_path, table)
+    except Exception:  # noqa: BLE001 — dict-no-throw: treat as unavailable.
+        return None
+    if not _is_dict(out) or out.get("status") != "ok":
+        return None
+    return out
+
+
+def _declared_section(declared: dict) -> list:
+    """Blocks for the declared-keys layer, or [] if there is nothing declared."""
+    pks = [p for p in (declared.get("primary_keys") or []) if _is_dict(p)]
+    fks = [f for f in (declared.get("foreign_keys") or []) if _is_dict(f)]
+    uqs = [u for u in (declared.get("unique") or []) if _is_dict(u)]
+    if not (pks or fks or uqs):
+        return []
+
+    blocks = [
+        model.Heading(text="Claves declaradas en el esquema", level=2),
+        model.Markdown(text=(
+            "La base **declara** estas relaciones de clave como restricciones "
+            "reales del esquema (constraints). Son la verdad de referencia: no se "
+            "infieren, se leen tal cual de la definición de las tablas.")),
+    ]
+
+    if pks:
+        rows = [[model._safe_str(p.get("table")),
+                 ", ".join(model._safe_str(c) for c in (p.get("columns") or []))]
+                for p in pks]
+        blocks.append(model.DataTable(
+            header=["Tabla", "Columna(s) PK"], rows=rows,
+            title="Claves primarias declaradas",
+            note="Cada fila: la clave primaria declarada de una tabla."))
+
+    if fks:
+        rows = []
+        for f in fks:
+            src = ", ".join(model._safe_str(c) for c in (f.get("columns") or []))
+            dst = ", ".join(
+                model._safe_str(c) for c in (f.get("referenced_columns") or []))
+            rows.append([
+                model._safe_str(f.get("table")), src,
+                model._safe_str(f.get("referenced_table")), dst])
+        blocks.append(model.DataTable(
+            header=["Tabla origen", "Columna(s) FK", "→ Tabla destino",
+                    "Columna(s) destino"],
+            rows=rows, title="Claves foráneas declaradas",
+            note="Cada fila: una FK declarada — origen → destino."))
+
+    if uqs:
+        rows = [[model._safe_str(u.get("table")),
+                 ", ".join(model._safe_str(c) for c in (u.get("columns") or []))]
+                for u in uqs]
+        blocks.append(model.DataTable(
+            header=["Tabla", "Columna(s) UNIQUE"], rows=rows,
+            title="Restricciones UNIQUE declaradas"))
+
+    return blocks
+
+
+# --------------------------------------------------------------------------- #
+# Layer 2 — primary-key candidates (from the profile).
+# --------------------------------------------------------------------------- #
+def _pk_candidates_section(profile: dict, mark: bool) -> list:
+    """Blocks for the PK-candidates layer, or [] if there are none."""
+    keys = [k for k in (profile.get("key_candidates") or []) if k is not None]
+    if not keys:
+        return []
+    by_name = _columns_by_name(profile)
+
+    pk = ("[[term:pk]]**clave primaria**[[/term]]" if mark
+          else "**clave primaria**")
+    intro = (
+        f"Estas columnas son **candidatas a {pk}**: su "
+        "[[term:cardinalidad]]cardinalidad[[/term]] iguala al número de filas y no "
+        "tienen nulos, así que cada valor identifica una fila distinta. Son "
+        "candidatas, no una clave declarada: la base no las marca como tal."
+        if mark else
+        "Estas columnas son **candidatas a clave primaria**: su cardinalidad "
+        "iguala al número de filas y no tienen nulos, así que cada valor "
+        "identifica una fila distinta.")
+
+    rows = []
+    for name in keys:
+        col = by_name.get(name) or {}
+        rows.append([
+            model._safe_str(name),
+            _fmt_int(col.get("distinct_count")),
+            _fmt_pct_fraction(col.get("unique_pct")),
+            model._safe_str(col.get("inferred_type") or col.get("physical_type") or "—"),
+        ])
+    return [
+        model.Heading(text="Candidatos a clave primaria", level=2),
+        model.Markdown(text=intro),
+        model.DataTable(
+            header=["Columna", "Valores distintos", "% único", "Tipo"],
+            rows=rows, title="Candidatas a clave primaria",
+            note=f"{_fmt_int(profile.get('n_rows'))} filas en total como referencia."),
+    ]
+
+
+# --------------------------------------------------------------------------- #
+# Layer 3a — inter-table FK candidates (containment) + join graph.
+# --------------------------------------------------------------------------- #
+def _list_source_tables(db_path: str) -> list:
+    """List the tables in the DuckDB source, or [] if it can't be listed."""
+    if not db_path or duckdb_list_tables is None:
+        return []
+    try:
+        out = duckdb_list_tables(db_path)
+    except Exception:  # noqa: BLE001
+        return []
+    if not _is_dict(out) or out.get("status") != "ok":
+        return []
+    return [t for t in (out.get("tables") or []) if isinstance(t, str)]
+
+
+def _inter_table_section(db_path: str, tables: list, mark: bool) -> list:
+    """Blocks for the inter-table FK layer (containment + join graph), or []."""
+    if infer_fk_containment_duckdb is None or len(tables) < 2:
+        return []
+    try:
+        fk = infer_fk_containment_duckdb(db_path, tables=tables)
+    except Exception:  # noqa: BLE001
+        return []
+    if not _is_dict(fk) or fk.get("status") != "ok":
+        return []
+    candidates = [c for c in (fk.get("fk_candidates") or []) if _is_dict(c)]
+    if not candidates:
+        return []
+
+    containment = ("[[term:containment]]containment (inclusión de valores)[[/term]]"
+                   if mark else "containment (inclusión de valores)")
+    fk_term = "[[term:fk]]**claves foráneas**[[/term]]" if mark else "**claves foráneas**"
+    blocks = [
+        model.Heading(text="Claves foráneas candidatas (inter-tabla)", level=2),
+        model.Markdown(text=(
+            f"La fuente tiene varias tablas. Estas {fk_term} candidatas se infieren "
+            f"por señal de nombre y por {containment}: una columna de una tabla cuyos "
+            "valores están contenidos en la clave de otra. No están declaradas por "
+            "la base; son la relación más probable según los datos.")),
+    ]
+
+    shown = candidates[:MAX_FK_ROWS]
+    rows = []
+    for c in shown:
+        rows.append([
+            f"{model._safe_str(c.get('from_table'))}.{model._safe_str(c.get('from_col'))}",
+            f"{model._safe_str(c.get('to_table'))}.{model._safe_str(c.get('to_col'))}",
+            _fmt_ratio(c.get("inclusion")),
+            model._safe_str(c.get("cardinality") or "—"),
+            "sí" if c.get("name_match") else "no",
+        ])
+    note = "Ordenadas por señal de nombre e inclusión."
+    if len(candidates) > len(shown):
+        note += f" Se muestran {len(shown)} de {len(candidates)} candidatas."
+    blocks.append(model.DataTable(
+        header=["Origen", "→ Destino", "Inclusión", "Cardinalidad", "Coincide nombre"],
+        rows=rows, title="FK candidatas por containment", note=note))
+
+    # Join graph: node roles + a pasteable Mermaid diagram, kept together.
+    if build_join_graph is not None:
+        try:
+            graph = build_join_graph(candidates, tables=tables)
+        except Exception:  # noqa: BLE001
+            graph = None
+        if _is_dict(graph):
+            graph_blocks = [model.Heading(text="Grafo de relaciones", level=3)]
+            nodes = [n for n in (graph.get("nodes") or []) if _is_dict(n)]
+            if nodes:
+                node_rows = [[
+                    model._safe_str(n.get("table")),
+                    model._safe_str(n.get("role") or "—"),
+                    _fmt_int(n.get("out_degree")),
+                    _fmt_int(n.get("in_degree")),
+                ] for n in nodes]
+                graph_blocks.append(model.DataTable(
+                    header=["Tabla", "Rol", "FK salientes", "FK entrantes"],
+                    rows=node_rows, title="Tablas y su rol en el grafo",
+                    note="Rol: fact (apunta a otras), dimension (referenciada), "
+                         "bridge (ambas), standalone (aislada)."))
+            hubs = [h for h in (graph.get("hubs") or []) if h]
+            if hubs:
+                graph_blocks.append(model.Markdown(text=(
+                    "Tablas con más relaciones salientes (candidatas a tabla de "
+                    "hechos): " + ", ".join(model._safe_str(h) for h in hubs) + ".")))
+            mermaid = model._safe_str(graph.get("mermaid")).strip()
+            if mermaid:
+                graph_blocks.append(model.Markdown(text=(
+                    "Diagrama de las relaciones (pegable en un bloque Mermaid):")))
+                graph_blocks.append(model.Markdown(
+                    text="```mermaid\n" + mermaid + "\n```"))
+            if len(graph_blocks) > 1:
+                blocks.append(model.Group(blocks=graph_blocks,
+                                          title="Grafo de relaciones"))
+
+    skipped = [s for s in (fk.get("skipped") or []) if s]
+    if skipped:
+        blocks.append(model.Note(
+            "Algunos pares se omitieron por tamaño: "
+            + "; ".join(model._safe_str(s) for s in skipped) + "."))
+    return blocks
+
+
+# --------------------------------------------------------------------------- #
+# Layer 3b — intra-table FK candidates (name+cardinality heuristic).
+# --------------------------------------------------------------------------- #
+def _intra_table_section(profile: dict, mark: bool) -> list:
+    """Blocks for the intra-table FK heuristic layer, or [] if no candidates."""
+    if suggest_intratable_fk_candidates is None:
+        return []
+    try:
+        cands = suggest_intratable_fk_candidates(profile)
+    except Exception:  # noqa: BLE001
+        return []
+    cands = [c for c in (cands or []) if _is_dict(c)]
+    if not cands:
+        return []
+
+    fk_term = "[[term:fk]]**claves foráneas**[[/term]]" if mark else "**claves foráneas**"
+    blocks = [
+        model.Heading(text="Posibles claves foráneas (heurística de nombre)", level=2),
+        model.Markdown(text=(
+            f"No hay otras tablas que referenciar, pero algunas columnas **parecen** "
+            f"{fk_term} por su nombre (terminan en «id») y su cardinalidad (muchos "
+            "valores repetidos, N:1). Es una **sugerencia heurística**, no una "
+            "afirmación: el nombre de la tabla destino es una conjetura y no se "
+            "comprueba inclusión de valores contra ninguna tabla real.")),
+    ]
+    rows = []
+    for c in cands:
+        rows.append([
+            model._safe_str(c.get("column")),
+            model._safe_str(c.get("ref_table_guess") or "—"),
+            _fmt_int(c.get("distinct_count")),
+            _fmt_pct_fraction(c.get("unique_pct")),
+            model._safe_str(c.get("inferred_type") or c.get("physical_type") or "—"),
+            model._safe_str(c.get("reason") or ""),
+        ])
+    blocks.append(model.DataTable(
+        header=["Columna", "Posible tabla", "Valores distintos", "% único",
+                "Tipo", "Motivo"],
+        rows=rows, title="Posibles FK por nombre y cardinalidad",
+        note="Heurística: posibles falsos positivos/negativos. No confirma containment."))
+    blocks.append(model.Note(
+        "Estas sugerencias se basan solo en el nombre y la cardinalidad. Para "
+        "confirmarlas haría falta la tabla destino y comprobar la inclusión de "
+        "valores (containment)."))
+    return blocks
+
+
+# --------------------------------------------------------------------------- #
+# Entry point.
+# --------------------------------------------------------------------------- #
+def _intro_blocks(mark: bool) -> list:
+    pk = "[[term:pk]]clave primaria[[/term]]" if mark else "clave primaria"
+    fk = "[[term:fk]]clave foránea[[/term]]" if mark else "clave foránea"
+    text = (
+        f"Este capítulo analiza las **relaciones de clave** de la tabla: qué columna "
+        f"identifica cada fila (la {pk}) y qué columnas referencian a otra tabla (las "
+        f"{fk}). Cuando la base las **declara** como restricciones del esquema, se "
+        "muestran tal cual; cuando no, se proponen las más probables a partir de los "
+        "datos —por inclusión de valores entre tablas (containment) o, en una sola "
+        "tabla, por una heurística de nombre y cardinalidad— siempre marcadas como "
+        "candidatas, nunca como hechos.")
+    return [model.Heading(text=CHAPTER_TITLE, level=1), model.Markdown(text=text)]
+
+
+def build_relaciones(profile: dict, ctx: dict):
+    """Build the RELACIONES Chapter, or None if there is nothing to say.
+
+    Args:
+        profile: the ``eda`` group TableProfile dict (may be None/empty).
+        ctx: presentation context. Consumes ``db_path`` + ``table`` (to read
+            declared constraints, list sibling tables and run the containment FK
+            inference) and ``glossary`` (to register the relational terms).
+
+    Returns:
+        A ``model.Chapter`` with the applicable relation layers; or ``None`` when
+        the dataset has no declared key, no key candidates and no FK candidate
+        (neither inter- nor intra-table).
+    """
+    if not isinstance(profile, dict):
+        profile = {}
+    ctx = ctx if isinstance(ctx, dict) else {}
+    db_path = ctx.get("db_path")
+    table = ctx.get("table")
+
+    mark = _register_terms(ctx)
+
+    # Build each layer; the chapter is the concatenation of the non-empty ones.
+    declared = _declared_keys(db_path, table)
+    declared_blocks = _declared_section(declared) if declared else []
+    declared_has_fk = bool(declared and declared.get("foreign_keys"))
+
+    pk_blocks = _pk_candidates_section(profile, mark)
+
+    tables = _list_source_tables(db_path)
+    inter_blocks = _inter_table_section(db_path, tables, mark)
+
+    # The intra-table heuristic only makes sense when no real FK is available for
+    # this table — neither declared nor inferred inter-table. Otherwise the real
+    # relations already answer the question and the heuristic is just noise.
+    if declared_has_fk or inter_blocks:
+        intra_blocks = []
+    else:
+        intra_blocks = _intra_table_section(profile, mark)
+
+    body = declared_blocks + pk_blocks + inter_blocks + intra_blocks
+    if not body:
+        return None  # chapter does not apply: nothing to say about relations.
+
+    blocks = _intro_blocks(mark) + body
+    return model.Chapter(id=CHAPTER_ID, title=CHAPTER_TITLE,
+                         version=CHAPTER_VERSION, blocks=blocks)
@@ -0,0 +1,273 @@
+"""Tests for the RELACIONES chapter — DoD: golden(s) + edges + no-cut render.
+
+Two goldens covering the two real paths of the chapter:
+
+- **Intra-table** (a single table, no db source for relations): the chapter shows
+  the primary-key candidates from the profile and the heuristic foreign-key
+  suggestions (name + cardinality), explicitly flagged as a heuristic. Renders to
+  PDF and PPTX with nothing cut.
+- **Inter-table** (a real DuckDB file with two related tables, customers/orders,
+  with a declared FK): the chapter shows the declared keys, the containment-based
+  FK candidates and the join graph (roles + a pasteable Mermaid diagram).
+
+Edges: a profile with no key candidate and no FK-looking column returns None;
+``None`` / ``{}`` profiles do not raise. The chapter registers its glossary terms.
+
+Layers that depend on the sibling registry functions delegated alongside this
+chapter (``detect_declared_keys_duckdb``, ``suggest_intratable_fk_candidates``)
+are asserted **conditionally on the function being importable**, so the chapter's
+honest-degradation contract is what is tested, never a hard dependency on import
+timing.
+"""
+
+import os
+import tempfile
+
+import duckdb
+from pptx import Presentation
+from pypdf import PdfReader
+
+from datascience.automatic_eda.chapters.relaciones import build_relaciones
+from datascience.automatic_eda.model import Chapter, Group, GlossaryCollector
+from datascience.render_automatic_eda_pdf import render_automatic_eda_pdf
+from datascience.render_automatic_eda_pptx import render_automatic_eda_pptx
+
+# The optional sibling functions: their layers are asserted only when present.
+try:
+    from datascience.detect_declared_keys_duckdb import detect_declared_keys_duckdb
+except Exception:  # noqa: BLE001
+    detect_declared_keys_duckdb = None
+try:
+    from datascience.suggest_intratable_fk_candidates import (
+        suggest_intratable_fk_candidates,
+    )
+except Exception:  # noqa: BLE001
+    suggest_intratable_fk_candidates = None
+
+
+# --------------------------------------------------------------------------- #
+# Helpers.
+# --------------------------------------------------------------------------- #
+def _flatten(blocks) -> list:
+    """Flatten Group blocks so a test can inspect every leaf block."""
+    out = []
+    for b in blocks:
+        if isinstance(b, Group):
+            out.extend(_flatten(b.blocks))
+        else:
+            out.append(b)
+    return out
+
+
+def _text_of(chapter: Chapter) -> str:
+    """Collect all visible text of a chapter's blocks into one string."""
+    parts = []
+    for b in _flatten(chapter.blocks):
+        for attr in ("text", "title", "note"):
+            v = getattr(b, attr, None)
+            if isinstance(v, str):
+                parts.append(v)
+        header = getattr(b, "header", None)
+        if isinstance(header, list):
+            parts.extend(str(c) for c in header)
+        rows = getattr(b, "rows", None)
+        if isinstance(rows, list):
+            for r in rows:
+                if isinstance(r, (list, tuple)):
+                    parts.extend(str(c) for c in r)
+                else:
+                    parts.append(str(r))
+    return "\n".join(parts)
+
+
+def _render_both(chapter: Chapter, tag: str):
+    """Render the chapter to PDF and PPTX; return (pdf_text, n_slides)."""
+    tmp = tempfile.mkdtemp(prefix=f"relaciones_{tag}_")
+    pdf_path = os.path.join(tmp, "out.pdf")
+    pptx_path = os.path.join(tmp, "out.pptx")
+    meta = {"title": f"EDA — {tag}"}
+    render_automatic_eda_pdf([chapter], pdf_path, meta)
+    render_automatic_eda_pptx([chapter], pptx_path, meta)
+    assert os.path.exists(pdf_path) and os.path.getsize(pdf_path) > 0
+    assert os.path.exists(pptx_path) and os.path.getsize(pptx_path) > 0
+    text = "".join(p.extract_text() or "" for p in PdfReader(pdf_path).pages)
+    n_slides = len(Presentation(pptx_path).slides)
+    return text, n_slides
+
+
+# --------------------------------------------------------------------------- #
+# Fixtures.
+# --------------------------------------------------------------------------- #
+def _titanic_profile() -> dict:
+    """A single-table profile: a PK candidate + a column that looks like a FK."""
+    return {
+        "table": "titanic",
+        "source": "/data/titanic.csv",
+        "n_rows": 891,
+        "n_cols": 4,
+        "key_candidates": ["PassengerId"],
+        "columns": [
+            {"name": "PassengerId", "inferred_type": "numeric",
+             "physical_type": "BIGINT", "distinct_count": 891,
+             "unique_pct": 1.0, "flags": ["possible_id"]},
+            {"name": "ticket_id", "inferred_type": "numeric",
+             "physical_type": "BIGINT", "distinct_count": 681,
+             "unique_pct": 0.76, "flags": []},
+            {"name": "fare", "inferred_type": "numeric",
+             "physical_type": "DOUBLE", "distinct_count": 248,
+             "unique_pct": 0.28, "flags": []},
+            {"name": "sex", "inferred_type": "categorical",
+             "physical_type": "VARCHAR", "distinct_count": 2,
+             "unique_pct": 0.002, "flags": []},
+        ],
+    }
+
+
+def _make_relational_db(path: str) -> None:
+    """Create a small DuckDB with customers(id) <- orders(customer_id), real FK."""
+    con = duckdb.connect(path)
+    con.execute("CREATE TABLE customers(id INTEGER PRIMARY KEY, name TEXT)")
+    con.execute(
+        "CREATE TABLE orders(id INTEGER PRIMARY KEY, "
+        "customer_id INTEGER REFERENCES customers(id), amount DOUBLE)")
+    con.execute("INSERT INTO customers VALUES "
+                "(1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')")
+    con.execute("INSERT INTO orders VALUES "
+                "(1,1,10.0),(2,1,20.0),(3,2,30.0),(4,3,40.0),"
+                "(5,3,50.0),(6,4,60.0),(7,5,70.0),(8,2,80.0)")
+    con.close()
+
+
+def _orders_profile() -> dict:
+    """A profile for the `orders` table of the relational DB."""
+    return {
+        "table": "orders",
+        "source": "orders",
+        "n_rows": 8,
+        "n_cols": 3,
+        "key_candidates": ["id"],
+        "columns": [
+            {"name": "id", "inferred_type": "numeric", "physical_type": "INTEGER",
+             "distinct_count": 8, "unique_pct": 1.0, "flags": ["possible_id"]},
+            {"name": "customer_id", "inferred_type": "numeric",
+             "physical_type": "INTEGER", "distinct_count": 5, "unique_pct": 0.625,
+             "flags": []},
+            {"name": "amount", "inferred_type": "numeric", "physical_type": "DOUBLE",
+             "distinct_count": 8, "unique_pct": 1.0, "flags": []},
+        ],
+    }
+
+
+# --------------------------------------------------------------------------- #
+# Golden 1 — intra-table.
+# --------------------------------------------------------------------------- #
+def test_golden_intra_table_pk_and_fk_heuristic():
+    """Single table: PK candidate shown; FK heuristic shown (if fn available);
+    renders to PDF + PPTX with nothing cut."""
+    prof = _titanic_profile()
+    glossary = GlossaryCollector()
+    # No db_path: only the profile-derived layers apply (no declared, no inter).
+    chapter = build_relaciones(prof, {"glossary": glossary})
+
+    assert isinstance(chapter, Chapter)
+    assert chapter.id == "relaciones"
+    text = _text_of(chapter)
+
+    # PK candidate is always present (comes from the profile).
+    assert "Candidatos a clave primaria" in text
+    assert "PassengerId" in text
+
+    # Glossary terms got registered.
+    for key in ("pk", "fk", "cardinalidad"):
+        assert glossary.has(key)
+
+    # FK heuristic layer: present iff the delegated function is importable.
+    if suggest_intratable_fk_candidates is not None:
+        assert "Posibles claves foráneas" in text
+        assert "ticket_id" in text
+        # The float measure and the PK itself are NOT suggested as FKs.
+        assert "Posibles FK por nombre" in text
+
+    pdf_text, n_slides = _render_both(chapter, "intra")
+    assert "PassengerId" in pdf_text
+    assert n_slides >= 1
+
+
+# --------------------------------------------------------------------------- #
+# Golden 2 — inter-table (real DuckDB).
+# --------------------------------------------------------------------------- #
+def test_golden_inter_table_containment_and_join_graph():
+    """Two related tables: declared FK (if fn available) + containment FK
+    candidate + Mermaid join graph."""
+    tmp = tempfile.mkdtemp(prefix="relaciones_db_")
+    db_path = os.path.join(tmp, "shop.duckdb")
+    _make_relational_db(db_path)
+
+    prof = _orders_profile()
+    glossary = GlossaryCollector()
+    chapter = build_relaciones(
+        prof, {"db_path": db_path, "table": "orders", "glossary": glossary})
+
+    assert isinstance(chapter, Chapter)
+    text = _text_of(chapter)
+
+    # Inter-table containment FK candidate: customer_id -> customers.id. This path
+    # uses infer_fk_containment_duckdb + build_join_graph, both already in the
+    # registry, so it must be present.
+    assert "Claves foráneas candidatas (inter-tabla)" in text
+    assert "orders.customer_id" in text
+    assert "customers.id" in text
+    # Join graph with a pasteable Mermaid diagram.
+    assert "Grafo de relaciones" in text
+    assert "mermaid" in text
+    assert "graph LR" in text
+    assert "containment" in text.lower()
+
+    # Declared-keys layer: present iff the delegated function is importable.
+    if detect_declared_keys_duckdb is not None:
+        assert "Claves declaradas en el esquema" in text
+        assert "Claves foráneas declaradas" in text
+
+    pdf_text, n_slides = _render_both(chapter, "inter")
+    assert "customer_id" in pdf_text
+    assert n_slides >= 1
+
+
+# --------------------------------------------------------------------------- #
+# Edges.
+# --------------------------------------------------------------------------- #
+def test_none_when_no_relations():
+    """No key candidates, no FK-looking columns, no db source -> None."""
+    prof = {
+        "table": "flat", "n_rows": 100, "n_cols": 2, "key_candidates": [],
+        "columns": [
+            {"name": "value", "inferred_type": "numeric", "physical_type": "DOUBLE",
+             "distinct_count": 50, "unique_pct": 0.5, "flags": []},
+            {"name": "label", "inferred_type": "categorical",
+             "physical_type": "VARCHAR", "distinct_count": 3, "unique_pct": 0.03,
+             "flags": []},
+        ],
+    }
+    assert build_relaciones(prof, {}) is None
+
+
+def test_empty_and_none_profile_do_not_raise():
+    """None / {} profile and missing ctx degrade to None without raising."""
+    assert build_relaciones(None, None) is None
+    assert build_relaciones({}, {}) is None
+    assert build_relaciones({}, {"glossary": GlossaryCollector()}) is None
+
+
+def test_pk_candidate_only_builds_chapter():
+    """A profile with only a key candidate (no FK anything, no db) still builds:
+    the relations chapter applies because there is a PK candidate to report."""
+    prof = {
+        "table": "t", "n_rows": 10, "n_cols": 1, "key_candidates": ["row_id"],
+        "columns": [
+            {"name": "row_id", "inferred_type": "numeric", "physical_type": "BIGINT",
+             "distinct_count": 10, "unique_pct": 1.0, "flags": ["possible_id"]},
+        ],
+    }
+    chapter = build_relaciones(prof, {})
+    assert isinstance(chapter, Chapter)
+    assert "Candidatos a clave primaria" in _text_of(chapter)
@@ -26,19 +26,27 @@ from . import model
 # placeholders other agents will fill by creating chapters/<id>.py — they will
 # appear in this exact position automatically once their module exists.
 CHAPTER_ORDER = [
-    "portada",       # cover
+    "portada",       # cover — BUILT LAST, PLACED FIRST (see build_document).
    "overview",      # df.head + columns/types/nulls/examples + describe
    "analisis_llm",  # LLM interpretation — sits next to overview (user request)
    "num_distr",     # numeric distributions
    "cat_distr",     # categorical distributions
    "calidad",       # data quality
    "correlacion",   # correlations / associations
+    "relaciones",    # key relations: declared/candidate PK + FK (inter/intra-table)
    "modelos",       # cheap models (PCA/KMeans/outliers)
    "timeseries",    # time-series analysis
    "geospatial",    # geospatial
    "agregacion",    # aggregations / pivots
+    "glosario",      # glossary — ALWAYS LAST; clickable term destinations.
 ]

+# Chapters whose position is special-cased by build_document: portada is built
+# last (so it can summarize the rest) but placed first; glosario is built and
+# placed last (it reads the terms every other chapter registered).
+_PORTADA = "portada"
+_GLOSARIO = "glosario"
+

 def build_chapter(chapter_id: str, profile: dict, ctx: dict):
    """Build a single chapter by id, or None if absent/not-applicable/error.
@@ -75,15 +83,72 @@ def build_document(profile: dict, ctx: dict = None) -> list:
        list[Chapter] in canonical order, containing only the chapters that are
        implemented and applicable. Never raises.
    """
-    if profile is None:
-        profile = {}
    if not isinstance(profile, dict):
        profile = {}
-    if ctx is None:
-        ctx = {}
-    chapters = []
+    # Copy ctx so the shared collector / summary we add do not leak to the caller.
+    ctx = dict(ctx) if isinstance(ctx, dict) else {}
+
+    # A single glossary collector is shared by every chapter via ctx['glossary'].
+    # Chapters call ctx['glossary'].add(key, label, definition) and mark in-text
+    # appearances with [[term:key]]…[[/term]]; the glosario chapter renders the
+    # registered terms and the renderers wire the clickable links.
+    glossary = ctx.get("glossary")
+    if not isinstance(glossary, model.GlossaryCollector):
+        glossary = model.GlossaryCollector()
+        ctx["glossary"] = glossary
+
+    # 1) Body: every chapter except portada (built last) and glosario (placed
+    # last), in canonical order. This also fills the glossary collector.
+    body = []
    for cid in CHAPTER_ORDER:
+        if cid in (_PORTADA, _GLOSARIO):
+            continue
        ch = build_chapter(cid, profile, ctx)
        if ch is not None and ch.blocks:
-            chapters.append(ch)
+            body.append(ch)
+
+    # 2) Aggregated summary of the rest, for the cover (user decision: the cover
+    # is BUILT after the body so it can reflect what the analysis found).
+    ctx["document_summary"] = _summarize_document(profile, body)
+
+    # 3) Build the cover last, place it FIRST.
+    portada = build_chapter(_PORTADA, profile, ctx)
+    # 4) Build the glossary last (reads the terms the body registered), place LAST.
+    glosario = build_chapter(_GLOSARIO, profile, ctx)
+
+    chapters = []
+    if portada is not None and portada.blocks:
+        chapters.append(portada)
+    chapters.extend(body)
+    if glosario is not None and glosario.blocks:
+        chapters.append(glosario)
    return chapters
+
+
+def _summarize_document(profile: dict, body: list) -> dict:
+    """Aggregate a tiny findings summary of the body for the cover. Never raises.
+
+    Returns a dict with dataset shape, quality, column-type counts and the list
+    of chapters actually included — enough for the cover to show a mini-summary
+    of the analysis without re-deriving anything."""
+    try:
+        cols = profile.get("columns") or []
+        n_num = sum(1 for c in cols if isinstance(c, dict)
+                    and c.get("inferred_type") == "numeric")
+        n_cat = sum(1 for c in cols if isinstance(c, dict)
+                    and isinstance(c.get("categorical"), dict)
+                    and c.get("categorical", {}).get("top")
+                    and c.get("inferred_type") != "numeric")
+        return {
+            "n_chapters": len(body),
+            "chapter_titles": [getattr(c, "title", "") for c in body],
+            "n_rows": profile.get("n_rows"),
+            "n_cols": profile.get("n_cols"),
+            "quality_score": profile.get("quality_score"),
+            "n_numeric": n_num,
+            "n_categorical": n_cat,
+            "duplicate_pct": profile.get("duplicate_pct"),
+            "null_cell_pct": profile.get("null_cell_pct"),
+        }
+    except Exception:  # noqa: BLE001 — the summary is best-effort.
+        return {"n_chapters": len(body) if isinstance(body, list) else 0}
@@ -128,6 +128,39 @@ class Note:
    kind: str = field(default="note", init=False)


+@dataclass
+class Group:
+    """A keep-together unit: its blocks render on the SAME page/slide.
+
+    Renderers measure the whole group first; if it does not fit in the remaining
+    space they move it *whole* to the next page (PDF) or slide (PPTX) before
+    drawing anything — so a heading never gets stranded apart from the figure and
+    text it introduces. If the group is taller than a full page even on its own,
+    it starts on a fresh page and flows (honest degradation, never cut). Use it to
+    bind ``Heading`` + ``Markdown`` + ``Figure`` of one idea together (see the
+    DISTR NUM / AGREGACION chapters).
+    """
+
+    blocks: list = field(default_factory=list)
+    title: Optional[str] = None
+    kind: str = field(default="group", init=False)
+
+
+@dataclass
+class GlossaryEntry:
+    """One glossary term: a clickable destination at the end of the document.
+
+    Rendered as the term ``label`` (heading) plus its ``definition`` (markdown).
+    The renderers register its page/slide position as the link target so every
+    in-text appearance of the same ``key`` becomes a real clickable jump (PDF link
+    annotation via PyMuPDF; PPTX internal slide jump)."""
+
+    key: str = ""
+    label: str = ""
+    definition: str = ""
+    kind: str = field(default="glossary_entry", init=False)
+
+
@dataclass
 class Chapter:
    """An ordered set of blocks with an id, a title and a generation version."""
@@ -150,13 +183,17 @@ _BLOCK_BY_KIND = {
    "image": Image,
    "caption": Caption,
    "note": Note,
+    "group": Group,
+    "glossary_entry": GlossaryEntry,
 }


 def as_block(obj: Any):
    """Coerce a value into a block dataclass. Unknown values become a Note."""
    if isinstance(obj, (Heading, Markdown, KVTable, DataTable, Figure, Image,
-                        Caption, Note)):
+                        Caption, Note, Group, GlossaryEntry)):
+        if isinstance(obj, Group):
+            obj.blocks = as_blocks(obj.blocks)
        return obj
    if isinstance(obj, dict):
        kind = obj.get("kind")
@@ -189,6 +226,13 @@ def as_block(obj: Any):
                return Caption(text=_safe_str(obj.get("text")))
            if cls is Note:
                return Note(text=_safe_str(obj.get("text")))
+            if cls is Group:
+                return Group(blocks=as_blocks(obj.get("blocks")),
+                             title=obj.get("title"))
+            if cls is GlossaryEntry:
+                return GlossaryEntry(key=_safe_str(obj.get("key")),
+                                     label=_safe_str(obj.get("label")),
+                                     definition=_safe_str(obj.get("definition")))
        except Exception:  # noqa: BLE001 — never raise on a malformed block.
            return Note(text=_safe_str(obj))
    return Note(text=_safe_str(obj))
@@ -246,6 +290,67 @@ def _safe_str(v: Any) -> str:
        return ""


+# --------------------------------------------------------------------------- #
+# Glossary collector — chapters register the terms they use; the glosario
+# chapter renders them at the end and the renderers wire the clickable links.
+# --------------------------------------------------------------------------- #
+class GlossaryCollector:
+    """Accumulates glossary terms registered by chapters during document build.
+
+    A single instance is created by :func:`build_document` and passed to every
+    chapter via ``ctx['glossary']``. A chapter calls ``add(key, label,
+    definition)`` to declare a term it explains (e.g. ``"entropia"`` →
+    "Entropía"), and marks each in-text appearance with the inline span
+    ``[[term:key]]texto visible[[/term]]`` (see ``text_layout.parse_inline_rich``).
+    The ``glosario`` chapter reads ``terms()`` to emit one :class:`GlossaryEntry`
+    per term; the renderers turn every marked appearance into a real click that
+    jumps to that entry. First registration of a key wins (idempotent); never
+    raises."""
+
+    def __init__(self):
+        self._terms: dict = {}
+        self._order: list = []
+
+    def add(self, key: Any, label: Any = None, definition: Any = "") -> str:
+        """Register a term and return its normalized key (''. if invalid)."""
+        try:
+            k = _safe_str(key).strip()
+            if not k:
+                return ""
+            if k not in self._terms:
+                self._terms[k] = {
+                    "key": k,
+                    "label": _safe_str(label).strip() or k,
+                    "definition": _safe_str(definition),
+                }
+                self._order.append(k)
+            return k
+        except Exception:  # noqa: BLE001 — collecting a term never breaks a build.
+            return ""
+
+    def has(self, key: Any) -> bool:
+        return _safe_str(key).strip() in self._terms
+
+    def get(self, key: Any) -> Optional[dict]:
+        return self._terms.get(_safe_str(key).strip())
+
+    def terms(self, by: str = "label") -> list:
+        """Return the registered terms as dicts.
+
+        ``by='label'`` (default) sorts alphabetically by visible label;
+        ``by='order'`` keeps first-appearance order."""
+        if by == "order":
+            return [self._terms[k] for k in self._order]
+        return sorted(self._terms.values(),
+                      key=lambda t: _safe_str(t.get("label")).lower())
+
+    def __len__(self) -> int:
+        return len(self._terms)
+
+    def __bool__(self) -> bool:
+        return bool(self._terms)
+
+
 # --------------------------------------------------------------------------- #
 # Manifest — per-chapter versions and page/slide counts for tracking.
 # --------------------------------------------------------------------------- #
@@ -0,0 +1,354 @@
+"""Tests for the AutomaticEDA engine features added in phase 4a.
+
+Covers, with executable evidence, the six render-engine improvements:
+
+1. Bold no longer overlaps the following text in the PDF (real width measured).
+2. Zebra striping on data tables (PDF Rectangle fills + PPTX cell fills).
+3. Keep-together: a Group moves whole to the next page/slide (heading never gets
+   stranded from its figure).
+4. Every PPTX figure carries a visible caption/title (fallback to the heading).
+5. Cover is built last but placed first and reflects an aggregated summary.
+6. Glossary is the last chapter; the term "entropía" is a real clickable link in
+   the PDF (PyMuPDF GOTO annotation) and in the PPTX (native slide-jump run).
+
+Self-contained: synthetic profiles, no DuckDB. Heavy renderer checks (fitz/pptx)
+skip cleanly when the optional engine is missing.
+"""
+
+import os
+import sys
+
+import pytest
+
+_HERE = os.path.dirname(os.path.abspath(__file__))
+_FUNCTIONS = os.path.abspath(os.path.join(_HERE, "..", "..", ".."))  # python/functions
+if _FUNCTIONS not in sys.path:
+    sys.path.insert(0, _FUNCTIONS)
+
+import matplotlib  # noqa: E402
+
+matplotlib.use("Agg")
+import matplotlib.colors as mcolors  # noqa: E402
+import matplotlib.pyplot as plt  # noqa: E402
+from matplotlib.patches import Rectangle  # noqa: E402
+
+from datascience.automatic_eda import model  # noqa: E402
+from datascience.automatic_eda import render_pdf_impl as RP  # noqa: E402
+from datascience.automatic_eda import render_pptx_impl as RX  # noqa: E402
+from datascience.automatic_eda import build_document  # noqa: E402
+from datascience.render_automatic_eda_pdf import render_automatic_eda_pdf  # noqa: E402
+from datascience.render_automatic_eda_pptx import render_automatic_eda_pptx  # noqa: E402
+
+
+class _FakePdf:
+    """Stand-in for PdfPages so the placers can call _new_page in unit tests."""
+
+    def savefig(self, fig):  # noqa: D401
+        pass
+
+
+def _small_fig():
+    fig = plt.figure(figsize=(4.0, 1.5))
+    ax = fig.add_subplot(111)
+    ax.plot([0, 1, 2], [1, 3, 2])
+    return fig
+
+
+def _profile_with_cat_and_num():
+    """A tiny profile that triggers cat_distr (→ entropía term) and num_distr."""
+    return {
+        "table": "ventas", "n_rows": 120, "n_cols": 2, "quality_score": 91,
+        "duplicate_pct": 1.5, "null_cell_pct": 0.8,
+        "columns": [
+            {"name": "region", "inferred_type": "categorical",
+             "categorical": {
+                 "top": [{"value": "norte", "count": 50, "pct": 0.42},
+                         {"value": "sur", "count": 40, "pct": 0.33},
+                         {"value": "este", "count": 30, "pct": 0.25}],
+                 "mode": "norte", "n_distinct": 3, "entropy": 1.55,
+                 "imbalance": 0.1}},
+            {"name": "importe", "inferred_type": "numeric",
+             "numeric": {"mean": 50.0, "median": 48.0, "std": 10.0,
+                         "min": 10, "max": 99, "iqr": 15,
+                         "histogram": [{"lo": 0, "hi": 50, "count": 40},
+                                       {"lo": 50, "hi": 100, "count": 80}]}},
+        ],
+    }
+
+
+# --------------------------------------------------------------------------- #
+# 1) Bold does not overlap the following text (PDF).
+# --------------------------------------------------------------------------- #
+def test_pdf_bold_span_does_not_overlap_following_text():
+    fig = plt.figure(figsize=(RP._W, RP._H))
+    st = RP._PdfState(_FakePdf(), "t")
+    st.fig = fig
+    st.page = 1
+    # A wide bold token immediately followed by normal text on the SAME line.
+    rich = [[("PALABRAMUYANCHAENNEGRITA", True, None),
+             (" texto normal justo después", False, None)]]
+    RP._place_rich_lines(st, rich, RP._FS_BODY, RP._INK)
+
+    renderer = fig.canvas.get_renderer()
+    boxes = sorted((t.get_window_extent(renderer) for t in fig.texts),
+                   key=lambda b: b.x0)
+    assert len(boxes) == 2, "se esperaban dos spans dibujados"
+    # The bold span ends before the normal span starts (no overlap). 1px slack.
+    assert boxes[0].x1 <= boxes[1].x0 + 1.0, \
+        "la negrita se solapa con el texto siguiente"
+    plt.close(fig)
+
+
+# --------------------------------------------------------------------------- #
+# 2) Zebra striping.
+# --------------------------------------------------------------------------- #
+def _facecolor_eq(artist, hexcolor) -> bool:
+    want = mcolors.to_rgba(hexcolor)
+    got = artist.get_facecolor()
+    return all(abs(a - b) < 0.02 for a, b in zip(got[:3], want[:3]))
+
+
+def test_pdf_table_has_zebra_striping():
+    fig = plt.figure(figsize=(RP._W, RP._H))
+    st = RP._PdfState(_FakePdf(), "t")
+    st.fig = fig
+    st.page = 1
+    st.chapter = model.Chapter(id="c", title="C", version="1.0.0")
+    dt = model.DataTable(header=["A", "B"],
+                         rows=[["1", "x"], ["2", "y"], ["3", "z"], ["4", "w"]])
+    RP._place_data_table(st, dt)
+    zebra = [a for a in fig.findobj(Rectangle) if _facecolor_eq(a, RP._ZEBRA)]
+    # 4 data rows → even rows (1-based 2 and 4) shaded = 2 zebra rectangles.
+    assert len(zebra) == 2, f"esperadas 2 filas zebra, hay {len(zebra)}"
+    plt.close(fig)
+
+
+def test_pptx_table_has_zebra_striping(tmp_path):
+    pptx = pytest.importorskip("pptx")
+    from pptx import Presentation
+    from pptx.dml.color import RGBColor
+
+    doc = [model.Chapter(id="c", title="Tabla", version="1.0.0", blocks=[
+        model.DataTable(header=["A", "B"],
+                        rows=[["1", "x"], ["2", "y"], ["3", "z"], ["4", "w"]])])]
+    out = str(tmp_path / "zebra.pptx")
+    assert render_automatic_eda_pptx(doc, out, {"write_manifest": False})["path"]
+
+    prs = Presentation(out)
+    table = None
+    for slide in prs.slides:
+        for sh in slide.shapes:
+            if sh.has_table:
+                table = sh.table
+                break
+    assert table is not None, "no se encontró la tabla en el deck"
+    zebra = RGBColor(0xF6, 0xF8, 0xFA)
+    white = RGBColor(0xFF, 0xFF, 0xFF)
+    # Row 0 = header; data rows follow. Even data rows (table rows 2, 4) shaded.
+    assert table.cell(1, 0).fill.fore_color.rgb == white
+    assert table.cell(2, 0).fill.fore_color.rgb == zebra
+    assert table.cell(4, 0).fill.fore_color.rgb == zebra
+
+
+# --------------------------------------------------------------------------- #
+# 3) Keep-together (Group): heading + figure never split.
+# --------------------------------------------------------------------------- #
+def test_pdf_group_moves_whole_to_next_page_when_it_does_not_fit():
+    fig = plt.figure(figsize=(RP._W, RP._H))
+    st = RP._PdfState(_FakePdf(), "t")
+    st.fig = fig
+    st.page = 1
+    st.chapter = model.Chapter(id="c", title="C", version="1.0.0")
+    grp = model.Group(blocks=[
+        model.Heading(text="Sección con figura", level=2),
+        model.Figure(make=_small_fig, caption="cap"),
+        model.Markdown(text="Descripción breve de la figura."),
+    ])
+    # Only ~0.4in left: the group does not fit here but fits on a fresh page.
+    st.y = RP._CONTENT_BOTTOM - 0.4
+    page_before = st.page
+    RP._place_group(st, grp)
+    # Exactly one page break: the whole group (heading+figure+text) stays
+    # together on the new page — no second break inside it.
+    assert st.page == page_before + 1
+    plt.close(st.fig)
+
+
+def test_pdf_group_does_not_break_when_it_fits():
+    fig = plt.figure(figsize=(RP._W, RP._H))
+    st = RP._PdfState(_FakePdf(), "t")
+    st.fig = fig
+    st.page = 1
+    st.chapter = model.Chapter(id="c", title="C", version="1.0.0")
+    grp = model.Group(blocks=[
+        model.Heading(text="Cabe entera", level=2),
+        model.Figure(make=_small_fig, caption="cap"),
+    ])
+    st.y = RP._CONTENT_TOP  # empty page → fits, must not break.
+    page_before = st.page
+    RP._place_group(st, grp)
+    assert st.page == page_before
+    plt.close(st.fig)
+
+
+def test_pptx_group_moves_whole_to_next_slide(tmp_path):
+    pytest.importorskip("pptx")
+    from pptx import Presentation
+    from pptx.util import Inches
+
+    prs = Presentation()
+    prs.slide_width = Inches(RX._W)
+    prs.slide_height = Inches(RX._H)
+    st = RX._PptxState(prs, "t")
+    st.chapter = model.Chapter(id="c", title="C", version="1.0.0")
+    RX._new_slide(st, cont=False)
+    grp = model.Group(blocks=[
+        model.Heading(text="Sección con figura", level=2),
+        model.Figure(make=_small_fig, caption="cap"),
+        model.Markdown(text="Descripción breve."),
+    ])
+    st.y = RX._CONTENT_BOTTOM - 0.4  # does not fit here.
+    slide_before = st.slide_no
+    RX._place_group(st, grp)
+    assert st.slide_no == slide_before + 1  # one jump; group kept together.
+
+
+# --------------------------------------------------------------------------- #
+# 4) Every PPTX figure carries a visible caption/title.
+# --------------------------------------------------------------------------- #
+def test_pptx_figure_without_caption_gets_heading_title(tmp_path):
+    pytest.importorskip("pptx")
+    from pptx import Presentation
+    from pptx.enum.shapes import MSO_SHAPE_TYPE
+
+    doc = [model.Chapter(id="c", title="Cap", version="1.0.0", blocks=[
+        model.Heading(text="Mi sección gráfica", level=2),
+        model.Figure(make=_small_fig),  # NO caption provided.
+    ])]
+    out = str(tmp_path / "cap.pptx")
+    assert render_automatic_eda_pptx(doc, out, {"write_manifest": False})["path"]
+
+    prs = Presentation(out)
+    for slide in prs.slides:
+        has_pic = any(sh.shape_type == MSO_SHAPE_TYPE.PICTURE
+                      for sh in slide.shapes)
+        if not has_pic:
+            continue
+        italic = [r.text for sh in slide.shapes if sh.has_text_frame
+                  for p in sh.text_frame.paragraphs for r in p.runs
+                  if r.font.italic and r.text.strip()]
+        assert italic, "la figura no lleva caption visible en su slide"
+        assert any("Mi sección gráfica" in t for t in italic), \
+            "el caption no cayó al título de la sección"
+        return
+    pytest.fail("no se encontró ningún slide con imagen")
+
+
+def test_pptx_no_figure_slide_is_ever_untitled(tmp_path):
+    """Invariant: across many figures (incl. tall ones), NO slide with an image
+    lacks a visible caption — the caption never spills to the next slide."""
+    pytest.importorskip("pptx")
+    from pptx import Presentation
+    from pptx.enum.shapes import MSO_SHAPE_TYPE
+
+    def _tall_fig():
+        fig = plt.figure(figsize=(5.0, 4.6))  # nearly square → fills the slide.
+        fig.add_subplot(111).bar([1, 2, 3], [4, 5, 6])
+        return fig
+
+    blocks = []
+    for i in range(6):
+        blocks.append(model.Heading(text=f"Gráfico {i}", level=2))
+        blocks.append(model.Figure(
+            make=_tall_fig,
+            caption=("Una descripción de la figura deliberadamente larga para "
+                     "que el caption ocupe más de una línea al envolverse en el "
+                     f"ancho del slide — figura número {i} del bloque.")))
+    doc = [model.Chapter(id="c", title="Muchas figuras", version="1.0.0",
+                         blocks=blocks)]
+    out = str(tmp_path / "many.pptx")
+    assert render_automatic_eda_pptx(doc, out, {"write_manifest": False})["path"]
+
+    prs = Presentation(out)
+    missing = []
+    pics = 0
+    for i, slide in enumerate(prs.slides):
+        if not any(sh.shape_type == MSO_SHAPE_TYPE.PICTURE
+                   for sh in slide.shapes):
+            continue
+        pics += 1
+        italic = [r.text for sh in slide.shapes if sh.has_text_frame
+                  for p in sh.text_frame.paragraphs for r in p.runs
+                  if r.font.italic and r.text.strip()]
+        if not italic:
+            missing.append(i)
+    assert pics >= 6, f"esperadas >=6 figuras, hay {pics}"
+    assert not missing, f"slides con imagen sin caption: {missing}"
+
+
+# --------------------------------------------------------------------------- #
+# 5) Cover built last, placed first, with an aggregated summary.
+# --------------------------------------------------------------------------- #
+def test_cover_first_glossary_last_with_summary():
+    chs = build_document(_profile_with_cat_and_num(), ctx={"dataset_name": "v"})
+    ids = [c.id for c in chs]
+    assert ids[0] == "portada", f"la portada no es la primera: {ids}"
+    assert ids[-1] == "glosario", f"el glosario no es el último: {ids}"
+    cover = chs[0]
+    headings = [b.text for b in cover.blocks if b.kind == "heading"]
+    assert any("Resumen" in h for h in headings), \
+        "la portada no incluye el resumen agregado"
+    # The summary reflects the body chapters (e.g. the numeric/categorical ones).
+    cover_text = " ".join(
+        b.text for b in cover.blocks if getattr(b, "kind", "") == "markdown")
+    assert "Distribuciones" in cover_text, \
+        "el resumen de portada no menciona los capítulos del cuerpo"
+
+
+# --------------------------------------------------------------------------- #
+# 6) Glossary clickable in PDF (PyMuPDF GOTO) and PPTX (native slide jump).
+# --------------------------------------------------------------------------- #
+def test_pdf_glossary_term_is_clickable(tmp_path):
+    fitz = pytest.importorskip("fitz")
+    out = str(tmp_path / "glos.pdf")
+    res = render_automatic_eda_pdf(_profile_with_cat_and_num(), out,
+                                   {"ctx": {"dataset_name": "v"},
+                                    "write_manifest": False})
+    assert res["path"] == out and os.path.exists(out)
+
+    doc = fitz.open(out)
+    goto = [(pno, l) for pno in range(doc.page_count)
+            for l in doc[pno].get_links() if l.get("kind") == fitz.LINK_GOTO]
+    doc.close()
+    assert goto, "no hay ningún enlace interno (entropía → glosario) en el PDF"
+    # Destination must be a real page in the document (the glossary page).
+    assert all(0 <= l.get("page", -1) for _p, l in goto)
+
+
+def test_pptx_glossary_term_is_clickable(tmp_path):
+    pytest.importorskip("pptx")
+    from pptx import Presentation
+    from pptx.oxml.ns import qn
+
+    out = str(tmp_path / "glos.pptx")
+    res = render_automatic_eda_pptx(_profile_with_cat_and_num(), out,
+                                    {"ctx": {"dataset_name": "v"},
+                                     "write_manifest": False})
+    assert res["path"] == out and os.path.exists(out)
+
+    prs = Presentation(out)
+    found = False
+    for slide in prs.slides:
+        for sh in slide.shapes:
+            if not sh.has_text_frame:
+                continue
+            for p in sh.text_frame.paragraphs:
+                for r in p.runs:
+                    rpr = r._r.find(qn("a:rPr"))
+                    if rpr is None:
+                        continue
+                    hl = rpr.find(qn("a:hlinkClick"))
+                    if hl is not None and \
+                            hl.get("action") == "ppaction://hlinksldjump":
+                        found = True
+    assert found, "ningún término tiene hyperlink de salto a slide en el PPTX"
@@ -60,6 +60,8 @@ _FS_BODY, _FS_CELL, _FS_NOTE = 10.5, 9.0, 9.0
 _GAP = 0.12          # vertical gap after a block, inches.
 _CELL_PAD = 0.06     # horizontal padding inside a table cell, inches.
 _ROW_VPAD = 0.05     # vertical padding inside a table row, inches.
+_ZEBRA = "#f6f8fa"   # very light grey for zebra-striped (even) table rows.
+_LINK = "#2a6f97"    # accent colour for clickable glossary terms.


 class _PdfState:
@@ -73,6 +75,11 @@ class _PdfState:
        self.page = 0                # global page counter.
        self.chapter = None          # current Chapter (for the footer).
        self.chapter_pages = 0       # pages produced for the current chapter.
+        self.last_heading = ""       # text of the most recent heading.
+        # Glossary wiring (mejora 6). Pages are 0-based; rects/points are in PDF
+        # points (1/72") with a top-left origin — same convention as PyMuPDF.
+        self.term_sources = []       # [{key, page, rect:[x0,y0,x1,y1]}]
+        self.term_dests = {}         # key -> {page, point:[x,y]}


 # --------------------------------------------------------------------------- #
@@ -121,6 +128,35 @@ def _draw_footer(st: _PdfState) -> None:
        transform=st.fig.transFigure, color=_RULE, lw=0.6))


+def _text_width_in(st: _PdfState, s: str, fs: float, bold: bool) -> float:
+    """Real rendered width (inches) of ``s`` at ``fs`` with the given weight.
+
+    Measured with the Agg renderer's own font metrics (the same TrueType the PDF
+    backend embeds), so a **bold** span advances the cursor by its ACTUAL width —
+    fixing the bug where bold text overlapped the following normal text because
+    the cursor advanced by the normal-weight average-glyph estimate. Falls back to
+    the deterministic character grid if the renderer is unavailable, so it never
+    raises.
+    """
+    if not s:
+        return 0.0
+    try:
+        from matplotlib.font_manager import FontProperties
+        renderer = st.fig.canvas.get_renderer()
+        prop = FontProperties(family="sans-serif", size=fs,
+                              weight="bold" if bold else "normal")
+        w_px, _h, _d = renderer.get_text_width_height_descent(s, prop, False)
+        return w_px / float(st.fig.dpi)
+    except Exception:  # noqa: BLE001 — fall back to the conservative grid metric.
+        return tl.avg_char_width_in(fs) * len(s)
+
+
+def _pt_rect(x0_in: float, y_top_in: float, x1_in: float,
+             y_bottom_in: float) -> list:
+    """An inches box (top-left origin) → a PDF-points rect for PyMuPDF links."""
+    return [x0_in * 72.0, y_top_in * 72.0, x1_in * 72.0, y_bottom_in * 72.0]
+
+
 def _remaining(st: _PdfState) -> float:
    return _CONTENT_BOTTOM - st.y

@@ -138,6 +174,7 @@ def _place_heading(st: _PdfState, block) -> None:
    level = max(1, min(3, int(getattr(block, "level", 1) or 1)))
    fs = {1: _FS_H1, 2: _FS_H2, 3: _FS_H3}[level]
    text = tl.strip_inline_md(getattr(block, "text", ""))
+    st.last_heading = text or st.last_heading
    max_chars = tl.chars_per_line(_USABLE_W, fs)
    lines = tl.wrap(text, max_chars)
    lh = tl.line_height_in(fs, leading=1.2)
@@ -171,17 +208,19 @@ def _place_text_lines(st: _PdfState, lines: list, fs: float, color: str,

 def _place_rich_lines(st: _PdfState, rich_lines: list, fs: float, color: str,
                      indent: float = 0.0, prefixes=None) -> None:
-    """Draw pre-wrapped lines of styled segments (bold spans rendered bold).
+    """Draw pre-wrapped lines of styled segments (bold + clickable term spans).

-    Each line is ``[(text, is_bold), ...]``. Segments are placed left-to-right,
-    advancing x by the deterministic character grid (same metric the wrapper
-    used), so a bold span is rendered with ``fontweight='bold'`` without
-    changing the line's measured width — the no-cut guarantee is preserved.
+    Each line is a list of ``(text, is_bold)`` or ``(text, is_bold, term_key)``
+    segments. Segments are placed left-to-right, advancing x by the segment's
+    REAL rendered width (measured with the renderer's font metrics for the actual
+    weight) — this is what stops a bold span from overlapping the following text:
+    the cursor no longer advances by the normal-weight estimate. A segment with a
+    ``term_key`` is drawn in the accent colour and its rectangle is recorded in
+    ``st.term_sources`` so it becomes a clickable jump to the glossary entry.
    ``prefixes`` is an optional ``(first_line, other_lines)`` pair (e.g. a
    bullet) drawn before the segments.
    """
    lh = tl.line_height_in(fs)
-    cw = tl.avg_char_width_in(fs)
    for idx, segs in enumerate(rich_lines):
        _ensure_space(st, lh)
        x = _ML + indent
@@ -190,14 +229,23 @@ def _place_rich_lines(st: _PdfState, rich_lines: list, fs: float, color: str,
            if prefix:
                st.fig.text(_xf(x), _yf(st.y), prefix, fontsize=fs, color=color,
                            ha="left", va="top")
-                x += cw * len(prefix)
-        for seg_text, is_bold in segs:
+                x += _text_width_in(st, prefix, fs, False)
+        for seg in segs:
+            if len(seg) == 3:
+                seg_text, is_bold, term = seg
+            else:
+                seg_text, is_bold, term = seg[0], seg[1], None
            if seg_text == "":
                continue
-            st.fig.text(_xf(x), _yf(st.y), seg_text, fontsize=fs, color=color,
-                        ha="left", va="top",
+            w = _text_width_in(st, seg_text, fs, bool(is_bold))
+            st.fig.text(_xf(x), _yf(st.y), seg_text, fontsize=fs,
+                        color=(_LINK if term else color), ha="left", va="top",
                        fontweight="bold" if is_bold else "normal")
-            x += cw * len(seg_text)
+            if term:
+                st.term_sources.append({
+                    "key": term, "page": st.page - 1,
+                    "rect": _pt_rect(x, st.y, x + w, st.y + lh)})
+            x += w
        st.y += lh


@@ -242,7 +290,7 @@ def _place_markdown(st: _PdfState, block) -> None:
        if stripped.startswith("- ") or stripped.startswith("* "):
            content = stripped[2:]  # keep inline markers for bold rendering.
            bullet_chars = tl.chars_per_line(_USABLE_W - 0.22, _FS_BODY)
-            rich = tl.wrap_rich(content, bullet_chars)
+            rich = tl.wrap_rich_terms(content, bullet_chars)
            _place_rich_lines(st, rich, _FS_BODY, _INK,
                              prefixes=("•  ", "   "))
            i += 1
@@ -258,7 +306,8 @@ def _place_markdown(st: _PdfState, block) -> None:
            j += 1
        text = " ".join(para)
        max_chars = tl.chars_per_line(_USABLE_W, _FS_BODY)
-        _place_rich_lines(st, tl.wrap_rich(text, max_chars), _FS_BODY, _INK)
+        _place_rich_lines(st, tl.wrap_rich_terms(text, max_chars), _FS_BODY,
+                          _INK)
        i = j
    st.y += _GAP

@@ -325,15 +374,18 @@ def _wrap_row(cells: list, widths: list, fs: float) -> list:


 def _draw_table_row(st: _PdfState, cells_lines: list, widths: list, fs: float,
-                    y0: float, header: bool) -> float:
+                    y0: float, header: bool, zebra: bool = False) -> float:
    lh = tl.line_height_in(fs)
    nlines = max((len(c) for c in cells_lines), default=1)
    row_h = lh * nlines + _ROW_VPAD * 2
-    if header:
+    # Background: header band, or a faint zebra fill for even data rows. Drawn
+    # below the text/rule (zorder 0) so striping never hides cell content.
+    bg = _HEAD_BG if header else (_ZEBRA if zebra else None)
+    if bg is not None:
        st.fig.add_artist(Rectangle(
            (_xf(_ML), _yf(y0 + row_h)), _xf(_ML + _USABLE_W) - _xf(_ML),
            _yf(y0) - _yf(y0 + row_h), transform=st.fig.transFigure,
-            color=_HEAD_BG, lw=0, zorder=0))
+            color=bg, lw=0, zorder=0))
    x = _ML
    for c, lines in enumerate(cells_lines):
        for k, ln in enumerate(lines):
@@ -378,14 +430,18 @@ def _place_data_table(st: _PdfState, block) -> None:
            + _ROW_VPAD * 2
    _ensure_space(st, header_h() + max(first_row_h, lh))
    draw_header()
-    for r in rows:
+    # ``data_idx`` is the LOGICAL row index (not reset across page breaks) so the
+    # zebra pattern stays coherent when a long table splits and repeats the
+    # header: even rows (1-based) are shaded → 0-based odd indices.
+    for data_idx, r in enumerate(rows):
        cells_lines = _wrap_row(r, widths, fs)
        row_h = lh * max((len(c) for c in cells_lines), default=1) \
            + _ROW_VPAD * 2
        if _remaining(st) < row_h:
            _new_page(st)
            draw_header()  # repeat header on the continuation page.
-        st.y += _draw_table_row(st, cells_lines, widths, fs, st.y, header=False)
+        st.y += _draw_table_row(st, cells_lines, widths, fs, st.y,
+                                header=False, zebra=(data_idx % 2 == 1))
    note = getattr(block, "note", None)
    if note:
        _place_text_lines(st, tl.wrap(model._safe_str(note),
@@ -414,53 +470,98 @@ def _png_from_figure(fig) -> bytes:
    return buf.read()


-def _place_image_array(st: _PdfState, arr, caption) -> None:
+def _figure_png_cached(block):
+    """Rasterize a Figure to PNG bytes ONCE and cache (bytes, aspect).
+
+    Measuring (keep-together) and drawing must agree on the REAL aspect ratio:
+    ``bbox_inches='tight'`` changes it vs ``figsize``, so we rasterize once and
+    reuse the bytes for both. Cached on the block; never raises."""
+    cached = getattr(block, "_aeda_png", None)
+    if cached is not None:
+        return cached
+    fig, owned = _resolve_figure(block)
+    data = None
+    if fig is not None:
+        try:
+            data = _png_from_figure(fig)
+        finally:
+            if owned:
+                try:
+                    plt.close(fig)
+                except Exception:  # noqa: BLE001
+                    pass
+    aspect = 0.66
+    if data is not None:
+        try:
+            arr = mpimg.imread(io.BytesIO(data))
+            aspect = (arr.shape[0] / arr.shape[1]) if arr.shape[1] else 0.66
+        except Exception:  # noqa: BLE001
+            aspect = 0.66
+    try:
+        block._aeda_png = (data, aspect)
+        return block._aeda_png
+    except Exception:  # noqa: BLE001 — block may reject attributes; degrade.
+        return (data, aspect)
+
+
+def _image_aspect(block) -> float:
+    """Real aspect (h/w) of an Image block by path, for measurement."""
+    path = getattr(block, "path", "")
+    if path and os.path.exists(path):
+        try:
+            arr = mpimg.imread(path)
+            return (arr.shape[0] / arr.shape[1]) if arr.shape[1] else 0.66
+        except Exception:  # noqa: BLE001
+            pass
+    return 0.66
+
+
+def _place_image_array(st: _PdfState, arr, caption, max_h_in=None) -> None:
    h_px, w_px = arr.shape[0], arr.shape[1]
    aspect = (h_px / w_px) if w_px else 1.0
+    # Reserve the caption's REAL (possibly multi-line) height FIRST, then scale
+    # the image to (max_h - cap_reserve) so figure + caption always fit the same
+    # page. cap_reserve adds a cushion so the caption never spills to next page.
+    cap_lines = (tl.wrap(model._safe_str(caption),
+                         tl.chars_per_line(_USABLE_W, _FS_NOTE))
+                 if caption else [])
+    cap_real = tl.line_height_in(_FS_NOTE) * len(cap_lines) if caption else 0.0
+    cap_reserve = (cap_real + 0.04 + 0.08) if caption else 0.0
    max_h = _CONTENT_BOTTOM - _CONTENT_TOP
+    # height_in hint (model.Figure/Image): cap the height so a figure in a
+    # keep-together Group shrinks to leave room for its heading and text.
+    if isinstance(max_h_in, (int, float)) and max_h_in > 0:
+        max_h = min(max_h, float(max_h_in))
+    max_img_h = max(max_h - cap_reserve, 0.6)
    target_w = _USABLE_W
    target_h = target_w * aspect
-    if target_h > max_h:
-        target_h = max_h
+    if target_h > max_img_h:
+        target_h = max_img_h
        target_w = target_h / aspect if aspect else _USABLE_W
-    cap_h = tl.line_height_in(_FS_NOTE) + 0.04 if caption else 0.0
    # Move whole image to next page if it does not fit in remaining space.
-    if _remaining(st) < target_h + cap_h:
-        if (max_h) >= target_h + cap_h:
-            _new_page(st)
-        else:
-            # Taller than a full page even at min — already clamped to max_h.
-            _new_page(st)
+    if _remaining(st) < target_h + cap_reserve:
+        _new_page(st)
    left_frac = _xf(_ML + (_USABLE_W - target_w) / 2.0)
    bottom_frac = _yf(st.y + target_h)
    ax = st.fig.add_axes([left_frac, bottom_frac, target_w / _W, target_h / _H])
    ax.imshow(arr)
    ax.axis("off")
    st.y += target_h + 0.04
-    if caption:
-        _place_text_lines(st, tl.wrap(model._safe_str(caption),
-                          tl.chars_per_line(_USABLE_W, _FS_NOTE)),
-                          _FS_NOTE, _MUTED, style="italic")
+    if cap_lines:
+        _place_text_lines(st, cap_lines, _FS_NOTE, _MUTED, style="italic")
    st.y += _GAP


 def _place_figure(st: _PdfState, block) -> None:
-    fig, owned = _resolve_figure(block)
-    if fig is None:
+    png, _aspect = _figure_png_cached(block)
+    if png is None:
        _place_text_lines(st, ["(figura no disponible)"], _FS_NOTE, _MUTED,
                          style="italic")
        st.y += _GAP
        return
-    try:
-        png = _png_from_figure(fig)
-    finally:
-        if owned:
-            try:
-                plt.close(fig)
-            except Exception:  # noqa: BLE001
-                pass
    arr = mpimg.imread(io.BytesIO(png))
-    _place_image_array(st, arr, getattr(block, "caption", None))
+    _place_image_array(st, arr, getattr(block, "caption", None),
+                       max_h_in=getattr(block, "height_in", None))


 def _place_image(st: _PdfState, block) -> None:
@@ -471,7 +572,8 @@ def _place_image(st: _PdfState, block) -> None:
        st.y += _GAP
        return
    arr = mpimg.imread(path)
-    _place_image_array(st, arr, getattr(block, "caption", None))
+    _place_image_array(st, arr, getattr(block, "caption", None),
+                       max_h_in=getattr(block, "height_in", None))


 def _place_caption(st: _PdfState, block) -> None:
@@ -488,6 +590,189 @@ def _place_note(st: _PdfState, block) -> None:
    st.y += _GAP


+# --------------------------------------------------------------------------- #
+# Block measurement (mejora 3 — keep-together). These estimate a block's height
+# WITHOUT drawing it, so a Group can decide to move whole to the next page before
+# anything is drawn. Over-estimating is safe: it only triggers an earlier page
+# break, never a content cut (the placers keep their own no-cut pagination).
+# --------------------------------------------------------------------------- #
+def _measure_heading_text(text: str, level: int) -> float:
+    level = max(1, min(3, int(level or 1)))
+    fs = {1: _FS_H1, 2: _FS_H2, 3: _FS_H3}[level]
+    lines = tl.wrap(tl.strip_inline_md(text), tl.chars_per_line(_USABLE_W, fs))
+    h = tl.line_height_in(fs, leading=1.2) * len(lines) + 0.06
+    if level == 1:
+        h += 0.10
+    return h + _GAP
+
+
+def _measure_markdown(block) -> float:
+    raw = str(getattr(block, "text", "") or "")
+    md_lines = raw.split("\n")
+    h = 0.0
+    i, n = 0, len(md_lines)
+    while i < n:
+        stripped = md_lines[i].strip()
+        if stripped.startswith("|") and stripped.endswith("|"):
+            j = i
+            while j < n and md_lines[j].strip().startswith("|") \
+                    and md_lines[j].strip().endswith("|"):
+                j += 1
+            h += (tl.line_height_in(_FS_CELL) + _ROW_VPAD * 2) * (j - i) + _GAP
+            i = j
+            continue
+        if stripped == "":
+            h += tl.line_height_in(_FS_BODY) * 0.5
+            i += 1
+            continue
+        if stripped.startswith("### "):
+            h += _measure_heading_text(stripped[4:], 3)
+            i += 1
+            continue
+        if stripped.startswith("## "):
+            h += _measure_heading_text(stripped[3:], 2)
+            i += 1
+            continue
+        if stripped.startswith("# "):
+            h += _measure_heading_text(stripped[2:], 1)
+            i += 1
+            continue
+        if stripped.startswith("- ") or stripped.startswith("* "):
+            lines = tl.wrap_rich_terms(
+                stripped[2:], tl.chars_per_line(_USABLE_W - 0.22, _FS_BODY))
+            h += tl.line_height_in(_FS_BODY) * len(lines)
+            i += 1
+            continue
+        para = [stripped]
+        j = i + 1
+        while j < n:
+            nxt = md_lines[j].strip()
+            if nxt == "" or nxt.startswith(("|", "#", "- ", "* ")):
+                break
+            para.append(nxt)
+            j += 1
+        lines = tl.wrap_rich_terms(" ".join(para),
+                                   tl.chars_per_line(_USABLE_W, _FS_BODY))
+        h += tl.line_height_in(_FS_BODY) * len(lines)
+        i = j
+    return h + _GAP
+
+
+def _measure_figure_like(block) -> float:
+    max_h = _CONTENT_BOTTOM - _CONTENT_TOP
+    hint = getattr(block, "height_in", None)
+    if isinstance(hint, (int, float)) and hint > 0:
+        target_h = min(float(hint), max_h)
+    else:
+        # Real rasterized aspect (cached) so measuring matches drawing.
+        if getattr(block, "kind", "") == "image":
+            aspect = _image_aspect(block)
+        else:
+            _data, aspect = _figure_png_cached(block)
+        target_h = min(_USABLE_W * aspect, max_h)
+    cap = getattr(block, "caption", None)
+    cap_h = tl.line_height_in(_FS_NOTE) + 0.04 if cap else 0.0
+    return target_h + 0.04 + cap_h + _GAP
+
+
+def _measure_block(st: _PdfState, block) -> float:
+    kind = getattr(block, "kind", "")
+    try:
+        if kind == "heading":
+            return _measure_heading_text(getattr(block, "text", ""),
+                                         getattr(block, "level", 1))
+        if kind == "markdown":
+            return _measure_markdown(block)
+        if kind in ("figure", "image"):
+            return _measure_figure_like(block)
+        if kind in ("caption", "note"):
+            lines = tl.wrap(getattr(block, "text", ""),
+                            tl.chars_per_line(_USABLE_W, _FS_NOTE))
+            return tl.line_height_in(_FS_NOTE) * len(lines) + _GAP
+        if kind == "kv_table":
+            rows = getattr(block, "rows", []) or []
+            return (tl.line_height_in(_FS_BODY) + _ROW_VPAD) * (len(rows) + 1) \
+                + _GAP
+        if kind == "data_table":
+            rows = getattr(block, "rows", []) or []
+            return (tl.line_height_in(_FS_CELL) + _ROW_VPAD * 2) \
+                * (len(rows) + 1) + _GAP
+        if kind == "group":
+            return sum(_measure_block(st, b)
+                       for b in (getattr(block, "blocks", []) or []))
+    except Exception:  # noqa: BLE001 — a measurement never aborts rendering.
+        pass
+    return tl.line_height_in(_FS_BODY)
+
+
+def _shrink_group_figures(st: _PdfState, blocks: list, avail_full: float) -> None:
+    """Cap each figure's height (via height_in) so the whole group fits a page.
+
+    The figure shrinks just enough to leave room for its heading, text and
+    caption — keep-together puts the chart on the SAME page as its title and
+    description instead of pushing it to the next page."""
+    fig_blocks = [b for b in blocks
+                  if getattr(b, "kind", "") in ("figure", "image")]
+    if not fig_blocks:
+        return
+    nonfig_h = sum(_measure_block(st, b) for b in blocks
+                   if getattr(b, "kind", "") not in ("figure", "image"))
+    fig_overhead = tl.line_height_in(_FS_NOTE) + 0.04 + 0.04 + _GAP
+    budget = avail_full - nonfig_h - 0.08 * len(fig_blocks)
+    if budget <= 0.8:
+        return
+    per = budget / len(fig_blocks) - fig_overhead
+    if per <= 0.6:
+        return
+    for fb in fig_blocks:
+        cur = getattr(fb, "height_in", None)
+        fb.height_in = (min(float(cur), per)
+                        if isinstance(cur, (int, float)) and cur > 0 else per)
+
+
+def _place_group(st: _PdfState, block) -> None:
+    """Render a keep-together Group: move it whole to the next page if needed."""
+    blocks = getattr(block, "blocks", []) or []
+    if not blocks:
+        return
+    avail_full = _CONTENT_BOTTOM - _CONTENT_TOP
+    _shrink_group_figures(st, blocks, avail_full)
+    total = sum(_measure_block(st, b) for b in blocks)
+    if total <= avail_full:
+        # Fits on one page: keep it together by moving whole when it won't fit.
+        if total > _remaining(st):
+            _new_page(st)
+    elif st.y > _CONTENT_TOP + 1e-6:
+        # Taller than a full page: at least start it on a fresh page, then flow.
+        _new_page(st)
+    for b in blocks:
+        placer = _PLACERS.get(getattr(b, "kind", ""), _place_note)
+        try:
+            placer(st, b)
+        except Exception:  # noqa: BLE001 — a bad block never aborts the group.
+            pass
+
+
+def _place_glossary_entry(st: _PdfState, block) -> None:
+    """Render one glossary term and register it as a clickable link target."""
+    key = getattr(block, "key", "")
+    label = getattr(block, "label", "") or key
+    definition = getattr(block, "definition", "")
+    # Reserve the term + its first definition line together, then anchor the
+    # destination at the resolved page/position before drawing.
+    _ensure_space(st, tl.line_height_in(_FS_H3, leading=1.2)
+                  + tl.line_height_in(_FS_BODY) * 2)
+    if key:
+        st.term_dests[key] = {"page": st.page - 1,
+                              "point": [_ML * 72.0, st.y * 72.0]}
+    _place_heading(st, model.Heading(text=str(label), level=3))
+    if definition:
+        _place_text_lines(st, tl.wrap(model._safe_str(definition),
+                          tl.chars_per_line(_USABLE_W, _FS_BODY)),
+                          _FS_BODY, _INK)
+    st.y += _GAP * 0.5
+
+
 _PLACERS = {
    "heading": _place_heading,
    "markdown": _place_markdown,
@@ -497,6 +782,8 @@ _PLACERS = {
    "image": _place_image,
    "caption": _place_caption,
    "note": _place_note,
+    "group": _place_group,
+    "glossary_entry": _place_glossary_entry,
 }


@@ -553,8 +840,42 @@ def render_pdf(chapters: list, out_path: str, meta: dict = None) -> dict:
        return {"path": None, "n_pages": 0, "chapters": [],
                "note": f"fallo al escribir el PDF: {e}"}

+    # Mejora 6 — wire clickable glossary links now the PDF is closed on disk.
+    # PdfPages cannot emit internal hyperlinks, so we post-process with PyMuPDF
+    # (delegated registry function). Degrades silently if it is unavailable.
+    n_links = _wire_glossary_links(st, out_path, notes)
+
    note = f"{n_pages} páginas"
+    if n_links:
+        note += f" · {n_links} enlaces de glosario"
    if notes:
        note += " · " + "; ".join(notes)
    return {"path": out_path, "n_pages": n_pages, "chapters": chapters_meta,
            "note": note}
+
+
+def _wire_glossary_links(st: _PdfState, out_path: str, notes: list) -> int:
+    """Build {source rect → glossary dest} links and apply them via PyMuPDF.
+
+    Returns the number of links applied (0 if there is nothing to wire or the
+    post-processor is unavailable). Never raises."""
+    try:
+        links = []
+        for src in st.term_sources:
+            dest = st.term_dests.get(src.get("key"))
+            if not dest:
+                continue
+            links.append({
+                "src_page": src["page"], "src_rect": src["rect"],
+                "dst_page": dest["page"], "dst_point": dest["point"]})
+        if not links:
+            return 0
+        from datascience.add_pdf_internal_links import add_pdf_internal_links
+        res = add_pdf_internal_links(out_path, links)
+        if isinstance(res, dict) and res.get("status") == "ok":
+            return int(res.get("n_links") or 0)
+        if isinstance(res, dict) and res.get("error"):
+            notes.append(f"glosario sin enlaces: {res.get('error')}")
+    except Exception as e:  # noqa: BLE001 — links are best-effort.
+        notes.append(f"glosario sin enlaces: {e}")
+    return 0
@@ -43,6 +43,8 @@ _ACCENT = (0x2A, 0x6F, 0x97)
 _MUTED = (0x8A, 0x8A, 0x8A)
 _HEAD_BG = (0xEE, 0xF3, 0xF6)
 _WHITE = (0xFF, 0xFF, 0xFF)
+_ZEBRA = (0xF6, 0xF8, 0xFA)   # faint grey for even (zebra) data rows.
+_LINK = (0x2A, 0x6F, 0x97)    # accent colour for clickable glossary terms.

 _FS_TITLE = 26
 _FS_H1, _FS_H2, _FS_H3 = 20, 16, 13
@@ -59,6 +61,10 @@ class _PptxState:
        self.chapter = None
        self.slide_no = 0
        self.chapter_slides = 0
+        self.last_heading = ""        # text of the most recent heading.
+        # Glossary wiring (mejora 6): runs to link and per-term target slide.
+        self.term_runs = []           # [(key, run)]
+        self.term_anchor_slide = {}   # key -> Slide (glossary entry)


 def _rgb(c):
@@ -155,9 +161,13 @@ def _add_rich_text(st: _PptxState, rich_lines: list, fs: float, color,
                   indent=0.0, bullet=False) -> None:
    """Add pre-wrapped lines of styled segments as one paragraph per line.

-    Each line is ``[(text, is_bold), ...]``; every segment becomes its own run
-    so ``**bold**`` spans render with native PowerPoint bold (``run.font.bold``)
-    without affecting the measured height (one paragraph per pre-wrapped line).
+    Each line is a list of ``(text, is_bold)`` or ``(text, is_bold, term_key)``
+    segments; every segment becomes its own run so ``**bold**`` spans render with
+    native PowerPoint bold (``run.font.bold``) without affecting the measured
+    height (one paragraph per pre-wrapped line). A segment carrying a
+    ``term_key`` is drawn in the accent colour and its run is recorded in
+    ``st.term_runs`` so it later becomes a native hyperlink jumping to the
+    glossary slide of that term.
    """
    lh = tl.line_height_in(fs)
    height = lh * len(rich_lines) + 0.05
@@ -176,14 +186,20 @@ def _add_rich_text(st: _PptxState, rich_lines: list, fs: float, color,
            r0.text = "•  "
            r0.font.size = Pt(fs)
            r0.font.color.rgb = _rgb(color)
-        for seg_text, is_bold in segs:
+        for seg in segs:
+            if len(seg) == 3:
+                seg_text, is_bold, term = seg
+            else:
+                seg_text, is_bold, term = seg[0], seg[1], None
            if seg_text == "":
                continue
            run = p.add_run()
            run.text = seg_text
            run.font.size = Pt(fs)
            run.font.bold = bool(is_bold)
-            run.font.color.rgb = _rgb(color)
+            run.font.color.rgb = _rgb(_LINK if term else color)
+            if term:
+                st.term_runs.append((term, run, st.slide))
    st.y += height


@@ -191,6 +207,7 @@ def _place_heading(st: _PptxState, block) -> None:
    level = max(1, min(3, int(getattr(block, "level", 1) or 1)))
    fs = {1: _FS_H1, 2: _FS_H2, 3: _FS_H3}[level]
    text = tl.strip_inline_md(getattr(block, "text", ""))
+    st.last_heading = text or st.last_heading
    lines = tl.wrap(text, tl.chars_per_line(_USABLE_W, fs))
    _add_text(st, lines, fs, _INK, bold=True)
    st.y += 0.04
@@ -233,12 +250,12 @@ def _place_markdown(st: _PptxState, block) -> None:
            continue
        if stripped.startswith("- ") or stripped.startswith("* "):
            content = stripped[2:]  # keep inline markers for bold rendering.
-            rich = tl.wrap_rich(content,
-                                tl.chars_per_line(_USABLE_W - 0.3, _FS_BODY))
+            rich = tl.wrap_rich_terms(content,
+                                      tl.chars_per_line(_USABLE_W - 0.3, _FS_BODY))
            _add_rich_text(st, rich, _FS_BODY, _INK, bullet=True)
            i += 1
            continue
-        para = [stripped]  # keep inline markers; wrap_rich renders **bold**.
+        para = [stripped]  # keep inline markers; wrap_rich_terms renders **bold**.
        j = i + 1
        while j < n:
            nxt = md_lines[j].strip()
@@ -247,8 +264,8 @@ def _place_markdown(st: _PptxState, block) -> None:
            para.append(nxt)
            j += 1
        text = " ".join(para)
-        _add_rich_text(st, tl.wrap_rich(text, tl.chars_per_line(_USABLE_W, _FS_BODY)),
-                       _FS_BODY, _INK)
+        _add_rich_text(st, tl.wrap_rich_terms(
+            text, tl.chars_per_line(_USABLE_W, _FS_BODY)), _FS_BODY, _INK)
        i = j
    st.y += _GAP

@@ -295,7 +312,8 @@ def _row_height_in(cells, widths, fs) -> float:
    return lh * maxlines + 0.10


-def _emit_table(st: _PptxState, header, chunk, widths, fs) -> None:
+def _emit_table(st: _PptxState, header, chunk, widths, fs,
+                start_index: int = 0) -> None:
    nrows = len(chunk) + (1 if header else 0)
    ncol = len(widths)
    # Pre-measure total height to size the shape (pptx still auto-grows rows).
@@ -319,11 +337,14 @@ def _emit_table(st: _PptxState, header, chunk, widths, fs) -> None:
            cell.text = model._safe_str(header[c]) if c < len(header) else ""
            _style_cell(cell, fs, _INK, bold=True, fill=_HEAD_BG)
        ridx = 1
-    for r in chunk:
+    # Zebra striping: shade even data rows (1-based) using the GLOBAL row index
+    # (start_index offset) so the pattern stays coherent across split chunks.
+    for k, r in enumerate(chunk):
+        fill = _ZEBRA if (start_index + k) % 2 == 1 else _WHITE
        for c in range(ncol):
            cell = gtable.cell(ridx, c)
            cell.text = model._safe_str(r[c]) if c < len(r) else ""
-            _style_cell(cell, fs, _INK, bold=False, fill=_WHITE)
+            _style_cell(cell, fs, _INK, bold=False, fill=fill)
        ridx += 1
    st.y += total_h + _GAP

@@ -367,6 +388,7 @@ def _place_data_table(st: _PptxState, block, shaded_header=True,
        avail = _remaining(st) - header_h
        chunk = []
        used = 0.0
+        chunk_start = idx  # global index of the first row in this chunk (zebra).
        while idx < n:
            rh = _row_height_in(rows[idx], widths, fs)
            if used + rh > avail and chunk:
@@ -374,7 +396,7 @@ def _place_data_table(st: _PptxState, block, shaded_header=True,
            chunk.append(rows[idx])
            used += rh
            idx += 1
-        _emit_table(st, header, chunk, widths, fs)
+        _emit_table(st, header, chunk, widths, fs, start_index=chunk_start)
    note = getattr(block, "note", None)
    if note:
        _add_text(st, tl.wrap(model._safe_str(note),
@@ -421,54 +443,97 @@ def _resolve_png(block):
                pass


-def _place_picture_bytes(st: _PptxState, data: bytes, caption) -> None:
+def _figure_bytes_cached(block):
+    """Rasterize a figure/image to PNG bytes ONCE and cache (bytes, aspect).
+
+    Measuring (keep-together) and drawing must agree on the real aspect ratio —
+    ``bbox_inches='tight'`` changes it vs ``figsize``, so we rasterize once and
+    reuse the bytes for both. Cached on the block; never raises."""
+    cached = getattr(block, "_aeda_png", None)
+    if cached is not None:
+        return cached
+    kind = getattr(block, "kind", "")
+    data = None
+    if kind == "image":
+        path = getattr(block, "path", "")
+        if path and os.path.exists(path):
+            try:
+                with open(path, "rb") as fh:
+                    data = fh.read()
+            except Exception:  # noqa: BLE001
+                data = None
+    else:
+        data = _resolve_png(block)
+    aspect = 0.66
+    if data is not None:
+        w_px, h_px = _img_size_px(data)
+        aspect = (h_px / w_px) if w_px else 0.66
+    try:
+        block._aeda_png = (data, aspect)
+        return block._aeda_png
+    except Exception:  # noqa: BLE001 — block may reject attributes; degrade.
+        return (data, aspect)
+
+
+def _place_picture_bytes(st: _PptxState, data: bytes, caption,
+                         max_h_in=None) -> None:
+    # Mejora 4 — every figure on a slide carries a visible caption/title. If the
+    # block has no caption, fall back to the current section heading, then to a
+    # generic label, so no image is ever shown untitled.
+    caption = (model._safe_str(caption).strip()
+               or model._safe_str(st.last_heading).strip() or "Figura")
    w_px, h_px = _img_size_px(data)
    aspect = (h_px / w_px) if w_px else 0.66
+    # Reserve the caption's REAL (possibly multi-line) height FIRST, then scale
+    # the image to (max_h - cap_reserve): a figure never fills the whole slide,
+    # so its caption always fits on the SAME slide and no image is untitled.
+    # cap_real = what _add_text consumes; cap_reserve adds the post-image gap and
+    # a small cushion so the caption never spills to the next slide.
+    cap_lines = tl.wrap(caption, tl.chars_per_line(_USABLE_W, _FS_NOTE))
+    cap_real = tl.line_height_in(_FS_NOTE) * len(cap_lines) + 0.05
+    cap_reserve = cap_real + 0.05 + 0.10
    max_h = _CONTENT_BOTTOM - _CONTENT_TOP
+    # height_in hint (model.Figure/Image): cap the target height so a figure in a
+    # keep-together Group shrinks to leave room for its heading and text.
+    if isinstance(max_h_in, (int, float)) and max_h_in > 0:
+        max_h = min(max_h, float(max_h_in))
+    max_img_h = max(max_h - cap_reserve, 0.6)
    target_w = _USABLE_W
    target_h = target_w * aspect
-    if target_h > max_h:
-        target_h = max_h
+    if target_h > max_img_h:
+        target_h = max_img_h
        target_w = target_h / aspect if aspect else _USABLE_W
-    cap_h = tl.line_height_in(_FS_NOTE) + 0.05 if caption else 0.0
-    if _remaining(st) < target_h + cap_h:
+    # Keep the image and its caption together on the same slide.
+    if _remaining(st) < target_h + cap_reserve:
        _new_slide(st, cont=True)
    left = _ML + (_USABLE_W - target_w) / 2.0
    st.slide.shapes.add_picture(io.BytesIO(data), Inches(left), Inches(st.y),
                                width=Inches(target_w), height=Inches(target_h))
    st.y += target_h + 0.05
-    if caption:
-        _add_text(st, tl.wrap(model._safe_str(caption),
-                  tl.chars_per_line(_USABLE_W, _FS_NOTE)), _FS_NOTE, _MUTED,
-                  italic=True)
+    _add_text(st, cap_lines, _FS_NOTE, _MUTED, italic=True)
    st.y += _GAP


 def _place_figure(st: _PptxState, block) -> None:
-    png = _resolve_png(block)
+    png, _aspect = _figure_bytes_cached(block)
    if png is None:
        _add_text(st, ["(figura no disponible)"], _FS_NOTE, _MUTED, italic=True)
        st.y += _GAP
        return
-    _place_picture_bytes(st, png, getattr(block, "caption", None))
+    _place_picture_bytes(st, png, getattr(block, "caption", None),
+                         max_h_in=getattr(block, "height_in", None))


 def _place_image(st: _PptxState, block) -> None:
-    path = getattr(block, "path", "")
-    if not path or not os.path.exists(path):
+    data, _aspect = _figure_bytes_cached(block)
+    if data is None:
+        path = getattr(block, "path", "")
        _add_text(st, [f"(imagen no encontrada: {path})"], _FS_NOTE, _MUTED,
                  italic=True)
        st.y += _GAP
        return
-    try:
-        with open(path, "rb") as fh:
-            data = fh.read()
-    except Exception as e:  # noqa: BLE001
-        _add_text(st, [f"(no se pudo leer la imagen: {e})"], _FS_NOTE, _MUTED,
-                  italic=True)
-        st.y += _GAP
-        return
-    _place_picture_bytes(st, data, getattr(block, "caption", None))
+    _place_picture_bytes(st, data, getattr(block, "caption", None),
+                         max_h_in=getattr(block, "height_in", None))


 def _place_caption(st: _PptxState, block) -> None:
@@ -482,6 +547,170 @@ def _place_note(st: _PptxState, block) -> None:
    _place_caption(st, block)


+# --------------------------------------------------------------------------- #
+# Block measurement (mejora 3 — keep-together). Estimate a block's slide height
+# WITHOUT drawing it so a Group can move whole to the next slide before drawing.
+# Over-estimating only triggers an earlier slide break, never a content cut.
+# --------------------------------------------------------------------------- #
+def _measure_heading_text(text: str, level: int) -> float:
+    level = max(1, min(3, int(level or 1)))
+    fs = {1: _FS_H1, 2: _FS_H2, 3: _FS_H3}[level]
+    lines = tl.wrap(tl.strip_inline_md(text), tl.chars_per_line(_USABLE_W, fs))
+    return tl.line_height_in(fs) * len(lines) + 0.05 + 0.04
+
+
+def _measure_markdown(block) -> float:
+    raw = str(getattr(block, "text", "") or "")
+    md_lines = raw.split("\n")
+    h = 0.0
+    i, n = 0, len(md_lines)
+    while i < n:
+        stripped = md_lines[i].strip()
+        if stripped.startswith("|") and stripped.endswith("|"):
+            j = i
+            while j < n and md_lines[j].strip().startswith("|") \
+                    and md_lines[j].strip().endswith("|"):
+                j += 1
+            h += (tl.line_height_in(_FS_CELL) + 0.10) * (j - i) + _GAP
+            i = j
+            continue
+        if stripped == "":
+            h += tl.line_height_in(_FS_BODY) * 0.4
+            i += 1
+            continue
+        if stripped.startswith("### "):
+            h += _measure_heading_text(stripped[4:], 3)
+            i += 1
+            continue
+        if stripped.startswith("## "):
+            h += _measure_heading_text(stripped[3:], 2)
+            i += 1
+            continue
+        if stripped.startswith("# "):
+            h += _measure_heading_text(stripped[2:], 1)
+            i += 1
+            continue
+        if stripped.startswith("- ") or stripped.startswith("* "):
+            lines = tl.wrap_rich_terms(
+                stripped[2:], tl.chars_per_line(_USABLE_W - 0.3, _FS_BODY))
+            h += tl.line_height_in(_FS_BODY) * len(lines) + 0.05
+            i += 1
+            continue
+        para = [stripped]
+        j = i + 1
+        while j < n:
+            nxt = md_lines[j].strip()
+            if nxt == "" or nxt.startswith(("|", "#", "- ", "* ")):
+                break
+            para.append(nxt)
+            j += 1
+        lines = tl.wrap_rich_terms(" ".join(para),
+                                   tl.chars_per_line(_USABLE_W, _FS_BODY))
+        h += tl.line_height_in(_FS_BODY) * len(lines) + 0.05
+        i = j
+    return h + _GAP
+
+
+def _measure_figure_like(block) -> float:
+    max_h = _CONTENT_BOTTOM - _CONTENT_TOP
+    hint = getattr(block, "height_in", None)
+    if isinstance(hint, (int, float)) and hint > 0:
+        max_h = min(max_h, float(hint))
+    # Use the REAL rasterized aspect (cached) so measuring matches drawing — this
+    # is what keeps a figure together with its heading instead of splitting.
+    _data, aspect = _figure_bytes_cached(block)
+    target_h = min(_USABLE_W * aspect, max_h)
+    # Caption is always emitted now (mejora 4), so always reserve its line.
+    cap_h = tl.line_height_in(_FS_NOTE) + 0.05
+    return target_h + 0.05 + cap_h + _GAP
+
+
+def _measure_block(st: _PptxState, block) -> float:
+    kind = getattr(block, "kind", "")
+    try:
+        if kind == "heading":
+            return _measure_heading_text(getattr(block, "text", ""),
+                                         getattr(block, "level", 1))
+        if kind == "markdown":
+            return _measure_markdown(block)
+        if kind in ("figure", "image"):
+            return _measure_figure_like(block)
+        if kind in ("caption", "note"):
+            lines = tl.wrap(getattr(block, "text", ""),
+                            tl.chars_per_line(_USABLE_W, _FS_NOTE))
+            return tl.line_height_in(_FS_NOTE) * len(lines) + 0.05 + _GAP
+        if kind in ("kv_table", "data_table"):
+            rows = getattr(block, "rows", []) or []
+            return (tl.line_height_in(_FS_CELL) + 0.10) * (len(rows) + 1) + _GAP
+        if kind == "group":
+            return sum(_measure_block(st, b)
+                       for b in (getattr(block, "blocks", []) or []))
+    except Exception:  # noqa: BLE001 — a measurement never aborts rendering.
+        pass
+    return tl.line_height_in(_FS_BODY)
+
+
+def _shrink_group_figures(st: _PptxState, blocks: list, avail_full: float) -> None:
+    """Cap each figure's height (via height_in) so the whole group fits a slide.
+
+    The figure shrinks just enough to leave room for its heading, text and
+    caption — that is how keep-together puts a chart on the SAME slide as its
+    title and description instead of pushing it to the next slide."""
+    fig_blocks = [b for b in blocks
+                  if getattr(b, "kind", "") in ("figure", "image")]
+    if not fig_blocks:
+        return
+    nonfig_h = sum(_measure_block(st, b) for b in blocks
+                   if getattr(b, "kind", "") not in ("figure", "image"))
+    fig_overhead = tl.line_height_in(_FS_NOTE) + 0.05 + 0.05 + _GAP
+    budget = avail_full - nonfig_h - 0.10 * len(fig_blocks)
+    if budget <= 1.0:
+        return  # not enough room to keep together; let it flow (degrade).
+    per = budget / len(fig_blocks) - fig_overhead
+    if per <= 0.8:
+        return
+    for fb in fig_blocks:
+        cur = getattr(fb, "height_in", None)
+        fb.height_in = (min(float(cur), per)
+                        if isinstance(cur, (int, float)) and cur > 0 else per)
+
+
+def _place_group(st: _PptxState, block) -> None:
+    """Render a keep-together Group: move it whole to the next slide if needed."""
+    blocks = getattr(block, "blocks", []) or []
+    if not blocks:
+        return
+    avail_full = _CONTENT_BOTTOM - _CONTENT_TOP
+    _shrink_group_figures(st, blocks, avail_full)
+    total = sum(_measure_block(st, b) for b in blocks)
+    if total <= avail_full:
+        if total > _remaining(st):
+            _new_slide(st, cont=True)
+    elif st.y > _CONTENT_TOP + 1e-6:
+        _new_slide(st, cont=True)
+    for b in blocks:
+        placer = _PLACERS.get(getattr(b, "kind", ""), _place_note)
+        try:
+            placer(st, b)
+        except Exception:  # noqa: BLE001 — a bad block never aborts the group.
+            pass
+
+
+def _place_glossary_entry(st: _PptxState, block) -> None:
+    """Render one glossary term and register its slide as the link target."""
+    key = getattr(block, "key", "")
+    label = getattr(block, "label", "") or key
+    definition = getattr(block, "definition", "")
+    _ensure(st, tl.line_height_in(_FS_H3) + tl.line_height_in(_FS_BODY) * 2)
+    if key:
+        st.term_anchor_slide[key] = st.slide
+    _place_heading(st, model.Heading(text=str(label), level=3))
+    if definition:
+        _add_text(st, tl.wrap(model._safe_str(definition),
+                  tl.chars_per_line(_USABLE_W, _FS_BODY)), _FS_BODY, _INK)
+    st.y += _GAP
+
+
 _PLACERS = {
    "heading": _place_heading,
    "markdown": _place_markdown,
@@ -491,6 +720,8 @@ _PLACERS = {
    "image": _place_image,
    "caption": _place_caption,
    "note": _place_note,
+    "group": _place_group,
+    "glossary_entry": _place_glossary_entry,
 }


@@ -542,6 +773,9 @@ def render_pptx(chapters: list, out_path: str, meta: dict = None) -> dict:
            _new_slide(st, cont=False)
            _place_note(st, model.Note(
                "(documento vacío — sin capítulos aplicables)"))
+        # Mejora 6 — wire clickable glossary terms to their entry slide (native
+        # PowerPoint slide-jump). Delegated registry function; degrades silently.
+        n_links = _wire_glossary_links(st, notes)
        prs.save(out_path)
        n_slides = st.slide_no
    except Exception as e:  # noqa: BLE001
@@ -549,7 +783,35 @@ def render_pptx(chapters: list, out_path: str, meta: dict = None) -> dict:
                "note": f"fallo al escribir el PPTX: {e}"}

    note = f"{n_slides} slides"
+    if n_links:
+        note += f" · {n_links} enlaces de glosario"
    if notes:
        note += " · " + "; ".join(notes)
    return {"path": out_path, "n_slides": n_slides, "chapters": chapters_meta,
            "note": note}
+
+
+def _wire_glossary_links(st: _PptxState, notes: list) -> int:
+    """Turn each recorded term run into a native jump to its glossary slide.
+
+    Returns the number of links applied. A term whose only appearance is inside
+    its own glossary entry (source slide == target slide) is skipped. Never
+    raises."""
+    if not st.term_runs or not st.term_anchor_slide:
+        return 0
+    linked = 0
+    try:
+        from datascience.pptx_link_run_to_slide import pptx_link_run_to_slide
+    except Exception as e:  # noqa: BLE001
+        notes.append(f"glosario sin enlaces: {e}")
+        return 0
+    for key, run, src_slide in st.term_runs:
+        tgt = st.term_anchor_slide.get(key)
+        if tgt is None or tgt is src_slide:
+            continue
+        try:
+            if pptx_link_run_to_slide(run, src_slide, tgt):
+                linked += 1
+        except Exception:  # noqa: BLE001 — links are best-effort.
+            pass
+    return linked
@@ -24,6 +24,13 @@ import textwrap
 # the visible text matches ``strip_inline_md`` exactly.
 _INLINE_SPAN_RE = re.compile(r"(\*\*.+?\*\*|__.+?__|`.+?`)")

+# Glossary term span: ``[[term:key]]texto visible[[/term]]``. The visible text
+# (which may itself contain ``**bold**``) is kept and tagged with ``key`` so the
+# renderers can turn each appearance into a clickable jump to the glossary entry.
+_TERM_SPAN_RE = re.compile(r"\[\[term:([A-Za-z0-9_]+)\]\](.*?)\[\[/term\]\]",
+                           re.S)
+_TERM_OPEN_RE = re.compile(r"\[\[term:[A-Za-z0-9_]+\]\]")
+

 def avg_char_width_in(fontsize_pt: float) -> float:
    """Approximate average glyph width in inches for a sans-serif font.
@@ -86,11 +93,21 @@ def strip_inline_md(text: str) -> str:
    if not text:
        return ""
    s = str(text)
+    # Drop glossary term markers, keeping the visible inner text.
+    s = _TERM_SPAN_RE.sub(lambda m: m.group(2), s)
+    s = _TERM_OPEN_RE.sub("", s)      # leftover unbalanced open marker.
+    s = s.replace("[[/term]]", "")    # leftover unbalanced close marker.
    for marker in ("**", "__", "`"):
        s = s.replace(marker, "")
    return s


+def _strip_term_markers(s: str) -> str:
+    """Remove any (balanced or leftover) glossary term markers, keeping text."""
+    s = _TERM_OPEN_RE.sub("", s)
+    return s.replace("[[/term]]", "")
+
+
 def _strip_leftover_markers(s: str) -> str:
    """Drop any unbalanced inline markers from a plain (non-span) fragment.

@@ -222,6 +239,118 @@ def wrap_rich(text: str, max_chars: int):
    return lines or [[("", False)]]


+def parse_inline_rich(text: str):
+    """Split ``text`` into ``[(fragment, is_bold, term_key), ...]``.
+
+    Extends :func:`parse_inline_bold` with glossary term spans
+    ``[[term:key]]visible[[/term]]``: the inner ``visible`` text is parsed for
+    ``**bold**`` as usual and every resulting fragment carries ``term_key`` so the
+    renderers can make it clickable. Text outside a term span gets ``term_key =
+    None``. Unbalanced term markers are stripped (kept identical to
+    :func:`strip_inline_md`). The concatenation of all fragment texts equals
+    ``strip_inline_md(text)`` — visible characters and wrapping are unchanged; only
+    the bold flag and the term key are added. Adjacent fragments with the same
+    (bold, term) are merged.
+    """
+    s = "" if text is None else str(text)
+    if not s:
+        return []
+    out = []
+
+    def _emit(fragment: str, bold: bool, term) -> None:
+        if fragment == "":
+            return
+        if out and out[-1][1] == bold and out[-1][2] == term:
+            out[-1] = (out[-1][0] + fragment, bold, term)
+        else:
+            out.append((fragment, bold, term))
+
+    def _emit_bolded(segment: str, term) -> None:
+        # Reuse the bold parser on a term-marker-free segment.
+        for frag, bold in parse_inline_bold(_strip_term_markers(segment)):
+            _emit(frag, bold, term)
+
+    pos = 0
+    for m in _TERM_SPAN_RE.finditer(s):
+        if m.start() > pos:
+            _emit_bolded(s[pos:m.start()], None)
+        _emit_bolded(m.group(2), m.group(1))
+        pos = m.end()
+    if pos < len(s):
+        _emit_bolded(s[pos:], None)
+    return out
+
+
+def wrap_rich_terms(text: str, max_chars: int):
+    """Like :func:`wrap_rich` but preserving glossary term keys per fragment.
+
+    Returns ``list[list[(fragment, is_bold, term_key)]]`` — one inner list per
+    output line. Wrapping is word-aware and hard-splits over-long tokens so no
+    line exceeds ``max_chars`` (the renderers measure these very lines). Term and
+    bold flags never widen a line: the visible width matches :func:`wrap`.
+    """
+    if max_chars < 1:
+        max_chars = 1
+    spans = parse_inline_rich(text)
+    if not spans:
+        return [[("", False, None)]]
+
+    tokens = []  # each: (word, bold, term) or ("\n", None, None)
+    for frag, bold, term in spans:
+        parts = frag.split("\n")
+        for pi, part in enumerate(parts):
+            if pi > 0:
+                tokens.append(("\n", None, None))
+            for word in part.split(" "):
+                if word == "":
+                    continue
+                tokens.append((word, bold, term))
+
+    lines = []
+    cur = []
+    cur_len = 0
+
+    def _flush():
+        nonlocal cur, cur_len
+        merged = []
+        for k, (word, bold, term) in enumerate(cur):
+            piece = word if k == 0 else " " + word
+            if merged and merged[-1][1] == bold and merged[-1][2] == term:
+                merged[-1] = (merged[-1][0] + piece, bold, term)
+            else:
+                merged.append((piece, bold, term))
+        lines.append(merged or [("", False, None)])
+        cur = []
+        cur_len = 0
+
+    for word, bold, term in tokens:
+        if bold is None:  # forced newline
+            _flush()
+            continue
+        if len(word) > max_chars:
+            if cur:
+                _flush()
+            chunks = _hard_split(word, max_chars)
+            for ci, chunk in enumerate(chunks):
+                if ci < len(chunks) - 1:
+                    lines.append([(chunk, bold, term)])
+                else:
+                    cur = [(chunk, bold, term)]
+                    cur_len = len(chunk)
+            continue
+        add = len(word) if cur_len == 0 else cur_len + 1 + len(word)
+        if cur_len != 0 and add > max_chars:
+            _flush()
+            cur = [(word, bold, term)]
+            cur_len = len(word)
+        else:
+            cur.append((word, bold, term))
+            cur_len = add
+    if cur:
+        _flush()
+    return lines or [[("", False, None)]]
+
+
 def parse_md_table(lines: list):
    """Parse consecutive ``| a | b |`` lines into ``(header, rows)`` or None.

@@ -0,0 +1,107 @@
+---
+name: detect_declared_keys_duckdb
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def detect_declared_keys_duckdb(db_path: str, table: str = None) -> dict"
+description: "Detecta las claves DECLARADAS (constraints reales) de un schema DuckDB leyendo la table function duckdb_constraints(): extrae PRIMARY KEY, FOREIGN KEY y UNIQUE (ignora NOT NULL y CHECK) y las devuelve normalizadas con sus columnas, y para las FK con su tabla y columnas referenciadas. Con table=None procesa todas las tablas; con table='X' filtra a PK/UNIQUE de X y a FK cuyo origen es X (case-sensitive). A diferencia de infer_fk_containment_duckdb (que INFIERE FKs candidatas por containment de valores cuando el schema no las declara), esta funcion devuelve las relaciones de clave REALES del schema. Estilo dict-no-throw: nunca lanza. Parte del grupo eda (relaciones de clave)."
+tags: [eda, duckdb, datascience, relations, primary-key, foreign-key, schema, exploratory-data-analysis]
+params:
+  - name: db_path
+    desc: "Ruta al archivo DuckDB. Debe existir (lectura read-only via duckdb_query_readonly; no se crea). Un path inexistente devuelve {status:'error', ...}."
+  - name: table
+    desc: "Si se pasa, filtra los resultados a esa tabla: incluye PRIMARY KEY y UNIQUE cuya tabla sea `table`, y FOREIGN KEY cuya tabla ORIGEN sea `table` (no la referenciada). None (default) devuelve los constraints de todas las tablas. La comparacion es case-sensitive (nombres tal cual los devuelve DuckDB)."
+output: "dict dict-no-throw. En exito {status:'ok', primary_keys:[{table:str, columns:[str,...]}, ...], foreign_keys:[{table:str, columns:[str,...], referenced_table:str, referenced_columns:[str,...]}, ...], unique:[{table:str, columns:[str,...]}, ...], tables:[str,...]} donde tables es la lista ordenada de tablas (origen) que poseen al menos un constraint PK/FK/UNIQUE emitido. Solo se emiten constraints de clave: NOT NULL y CHECK se ignoran. En error {status:'error', error:str}."
+uses_functions: [duckdb_query_readonly_py_infra]
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: []
+tested: true
+tests: ["test_golden_detecta_pks_y_fk", "test_golden_ignora_not_null_y_check", "test_edge_filtra_por_tabla_orders", "test_edge_filtra_por_tabla_customers", "test_edge_unique_declarado", "test_edge_sin_constraints_listas_vacias", "test_error_db_inexistente_no_lanza", "test_shape_resultado"]
+test_file_path: "python/functions/datascience/detect_declared_keys_duckdb_test.py"
+file_path: "python/functions/datascience/detect_declared_keys_duckdb.py"
+---
+
+## Ejemplo
+
+```python
+import sys, os, duckdb
+sys.path.insert(0, os.path.join("python", "functions"))
+from datascience import detect_declared_keys_duckdb
+
+# Base de ejemplo en /tmp: orders.customer_id -> customers.id (FK declarada)
+path = "/tmp/declared_keys_demo.duckdb"
+if os.path.exists(path):
+    os.remove(path)
+con = duckdb.connect(path)
+con.execute("CREATE TABLE customers(id INTEGER PRIMARY KEY, name TEXT)")
+con.execute(
+    "CREATE TABLE orders("
+    "  id INTEGER PRIMARY KEY,"
+    "  customer_id INTEGER REFERENCES customers(id),"
+    "  amt DOUBLE)"
+)
+con.close()
+
+res = detect_declared_keys_duckdb(path)
+if res["status"] == "ok":
+    for pk in res["primary_keys"]:
+        print(f"PK  {pk['table']}({', '.join(pk['columns'])})")
+    for fk in res["foreign_keys"]:
+        print(f"FK  {fk['table']}({', '.join(fk['columns'])}) -> "
+              f"{fk['referenced_table']}({', '.join(fk['referenced_columns'])})")
+    # PK  customers(id)
+    # PK  orders(id)
+    # FK  orders(customer_id) -> customers(id)
+else:
+    print("error:", res["error"])
+
+# Filtrar a una tabla concreta (PK/UNIQUE de orders + FK con origen orders):
+solo_orders = detect_declared_keys_duckdb(path, table="orders")
+print(solo_orders["tables"])  # ['orders']
+```
+
+## Cuando usarla
+
+- Cuando exploras un esquema DuckDB y quieres mostrar las relaciones de clave REALES (PK/FK/UNIQUE) que el schema ha declarado, sin inferir nada.
+- Como paso del capitulo RELACIONES del grupo `eda`: primero mira las claves declaradas con esta funcion; si el schema no declara FKs, complementa con `infer_fk_containment_duckdb` (inferencia por containment).
+- Antes de documentar o migrar un esquema, para listar el contrato de integridad referencial que el motor ya conoce.
+- Para validar que las constraints que esperas (esa FK que creaste con `REFERENCES`) realmente estan declaradas en la base materializada.
+
+## Gotchas
+
+- **Impura**: lee de disco via la primitiva read-only `duckdb_query_readonly` (no crea ni modifica la base). El `db_path` debe existir; un path inexistente devuelve `{status:'error'}` (read_only NO crea la base).
+- **Requiere `duckdb_constraints()`**: usa la table function `duckdb_constraints()`, disponible en DuckDB modernos (verificado en 1.5.2). En versiones antiguas sin esa funcion, la query falla y se devuelve `{status:'error'}`.
+- **Solo claves DECLARADAS**: devuelve lo que el schema declaro con `PRIMARY KEY` / `FOREIGN KEY (... REFERENCES ...)` / `UNIQUE`. Una tabla materializada con `CREATE TABLE AS SELECT` NO lleva constraints — para esos casos no habra claves que mostrar y hay que INFERIRLAS (`infer_fk_containment_duckdb`).
+- **NOT NULL y CHECK se ignoran**: `duckdb_constraints()` tambien emite filas `NOT NULL` (DuckDB genera una por cada columna PK) y `CHECK`; esta funcion las descarta y solo conserva PK/FK/UNIQUE.
+- **Nombres case-sensitive**: el filtro `table='Orders'` no casa con una tabla `orders`. Se comparan los nombres tal cual los devuelve DuckDB.
+- **FK atribuida al origen**: una FOREIGN KEY se atribuye a su tabla ORIGEN (el `table` de la entrada), no a la referenciada. El filtro `table='X'` trae las FK cuyo origen es X, no las que apuntan a X.
+- **`tables` = tablas dueñas de constraints emitidos**: la lista `tables` contiene solo las tablas que poseen al menos un PK/FK/UNIQUE en el resultado (su campo `table`), ordenadas. No incluye tablas referenciadas que no tengan constraint propio en la salida.
+- **Columnas como listas**: `constraint_column_names` y `referenced_column_names` son columnas LIST de DuckDB; en 1.5.2 llegan como listas Python. La funcion las normaliza a listas de strings con una red de seguridad por si llegaran como string.
+
+## Notas
+
+`duckdb_constraints()` devuelve una fila por constraint con los campos
+`table_name`, `constraint_type`, `constraint_column_names`, `referenced_table`,
+`referenced_column_names`. Mapeo a la salida:
+
+```text
+PRIMARY KEY -> primary_keys[]: {table, columns}
+UNIQUE      -> unique[]:       {table, columns}
+FOREIGN KEY -> foreign_keys[]: {table, columns, referenced_table, referenced_columns}
+NOT NULL    -> ignorado
+CHECK       -> ignorado
+```
+
+Para una FK, `referenced_table` y `referenced_column_names` vienen poblados; para
+PK/UNIQUE, `referenced_table` es NULL y `referenced_column_names` una lista vacia.
+
+Complementa a `infer_fk_containment_duckdb`: esta funcion devuelve las relaciones
+de clave REALES del schema (declaradas); la otra INFIERE FKs candidatas por
+containment de valores cuando el schema no las declaro. En el capitulo RELACIONES
+de AutomaticEDA se usan en orden: primero las declaradas, luego la inferencia como
+respaldo.
@@ -0,0 +1,127 @@
+"""detect_declared_keys_duckdb — lee las claves DECLARADAS de un schema DuckDB.
+
+Funcion impura: lee de disco a traves de la primitiva read-only del grupo
+`duckdb` (duckdb_query_readonly). Pertenece al grupo de capacidad `eda`
+(relaciones de clave): a diferencia de infer_fk_containment_duckdb, que INFIERE
+FOREIGN KEYs candidatas por containment de valores, esta funcion devuelve las
+constraints REALES que el schema ha declarado (PRIMARY KEY / FOREIGN KEY /
+UNIQUE) leyendo la table function `duckdb_constraints()`.
+
+Es la pieza del capitulo RELACIONES de AutomaticEDA que muestra las relaciones de
+clave reales cuando existen — frente a la inferencia, que se usa cuando el schema
+no las declaro.
+
+Estilo dict-no-throw del grupo duckdb: nunca lanza; captura cualquier error y
+devuelve {status:'error', error:str}.
+"""
+
+from infra import duckdb_query_readonly
+
+
+def _as_list(value) -> list:
+    """Normaliza el valor de una columna LIST de DuckDB a una lista de strings.
+
+    En DuckDB 1.5.2, `constraint_column_names` y `referenced_column_names` llegan
+    ya como listas Python a traves de duckdb_query_readonly. Este helper es solo
+    una red de seguridad: si por cualquier motivo llegara como string (p.ej. la
+    representacion `[id, customer_id]`), la parsea de forma defensiva.
+    """
+    if value is None:
+        return []
+    if isinstance(value, (list, tuple)):
+        return [str(v) for v in value]
+    if isinstance(value, str):
+        s = value.strip()
+        if s.startswith("[") and s.endswith("]"):
+            s = s[1:-1]
+        if not s.strip():
+            return []
+        return [
+            part.strip().strip("'\"")
+            for part in s.split(",")
+            if part.strip().strip("'\"")
+        ]
+    return [str(value)]
+
+
+def detect_declared_keys_duckdb(db_path: str, table: str = None) -> dict:
+    """Detecta las claves PRIMARY KEY / FOREIGN KEY / UNIQUE declaradas en DuckDB.
+
+    Lee la table function `duckdb_constraints()` y extrae solo las constraints de
+    clave (PRIMARY KEY, FOREIGN KEY, UNIQUE), ignorando NOT NULL y CHECK.
+
+    Args:
+        db_path: ruta al archivo DuckDB. Debe existir (lectura read-only; no se
+            crea). Un path inexistente devuelve {status:'error', ...} sin lanzar.
+        table: si se pasa, filtra los resultados a esa tabla: incluye PRIMARY KEY
+            y UNIQUE cuya tabla sea `table`, y FOREIGN KEY cuya tabla ORIGEN sea
+            `table`. None (default) devuelve los constraints de todas las tablas.
+            La comparacion de nombres es case-sensitive (tal cual los devuelve
+            DuckDB).
+
+    Returns:
+        dict dict-no-throw. En exito:
+            {status:'ok',
+             primary_keys:[{table:str, columns:[str, ...]}, ...],
+             foreign_keys:[{table:str, columns:[str, ...],
+                            referenced_table:str,
+                            referenced_columns:[str, ...]}, ...],
+             unique:[{table:str, columns:[str, ...]}, ...],
+             tables:[str, ...]}   # tablas (origen) con algun PK/FK/UNIQUE emitido
+        En error (sin lanzar): {status:'error', error:str}.
+    """
+    try:
+        sql = (
+            "SELECT table_name, constraint_type, constraint_column_names, "
+            "referenced_table, referenced_column_names FROM duckdb_constraints()"
+        )
+        res = duckdb_query_readonly(db_path, sql)
+        if res["status"] != "ok":
+            return {"status": "error", "error": res["error"]}
+
+        primary_keys = []
+        foreign_keys = []
+        unique = []
+        tables = set()
+
+        for row in res["rows"]:
+            ctype = row["constraint_type"]
+            tname = row["table_name"]
+
+            # Filtro por tabla origen: para PK/FK/UNIQUE el dueño del constraint es
+            # `table_name`. Una FK se atribuye a su tabla origen (no a la
+            # referenciada), igual que el filtro pide.
+            if table is not None and tname != table:
+                continue
+
+            cols = _as_list(row["constraint_column_names"])
+
+            if ctype == "PRIMARY KEY":
+                primary_keys.append({"table": tname, "columns": cols})
+                tables.add(tname)
+            elif ctype == "UNIQUE":
+                unique.append({"table": tname, "columns": cols})
+                tables.add(tname)
+            elif ctype == "FOREIGN KEY":
+                foreign_keys.append(
+                    {
+                        "table": tname,
+                        "columns": cols,
+                        "referenced_table": row["referenced_table"],
+                        "referenced_columns": _as_list(
+                            row["referenced_column_names"]
+                        ),
+                    }
+                )
+                tables.add(tname)
+            # NOT NULL y CHECK se ignoran: no son relaciones de clave.
+
+        return {
+            "status": "ok",
+            "primary_keys": primary_keys,
+            "foreign_keys": foreign_keys,
+            "unique": unique,
+            "tables": sorted(tables),
+        }
+    except Exception as e:  # noqa: BLE001
+        return {"status": "error", "error": str(e)}
@@ -0,0 +1,167 @@
+"""Tests para detect_declared_keys_duckdb."""
+
+import duckdb
+import pytest
+
+from .detect_declared_keys_duckdb import detect_declared_keys_duckdb
+
+
+@pytest.fixture
+def db(tmp_path):
+    """DuckDB temporal con claves declaradas.
+
+    - customers(id PRIMARY KEY, name)
+    - orders(id PRIMARY KEY, customer_id REFERENCES customers(id), amt)
+
+    Esto declara dos PRIMARY KEY (customers.id, orders.id) y una FOREIGN KEY
+    (orders.customer_id -> customers.id). DuckDB ademas genera constraints
+    NOT NULL para las columnas PK, que la funcion debe ignorar.
+    """
+    path = str(tmp_path / "keys_test.duckdb")
+    con = duckdb.connect(path)
+    con.execute("CREATE TABLE customers(id INTEGER PRIMARY KEY, name TEXT)")
+    con.execute(
+        "CREATE TABLE orders("
+        "  id INTEGER PRIMARY KEY,"
+        "  customer_id INTEGER REFERENCES customers(id),"
+        "  amt DOUBLE"
+        ")"
+    )
+    con.close()
+    return path
+
+
+def _pk_for(res, table):
+    """Devuelve la entrada primary_keys cuya tabla es `table`, o None."""
+    for pk in res["primary_keys"]:
+        if pk["table"] == table:
+            return pk
+    return None
+
+
+def test_golden_detecta_pks_y_fk(db):
+    """Golden: detecta las dos PK y la FK declaradas, con valores concretos."""
+    res = detect_declared_keys_duckdb(db)
+    assert res["status"] == "ok"
+
+    # PRIMARY KEY de customers y de orders.
+    pk_customers = _pk_for(res, "customers")
+    pk_orders = _pk_for(res, "orders")
+    assert pk_customers is not None
+    assert pk_customers["columns"] == ["id"]
+    assert pk_orders is not None
+    assert pk_orders["columns"] == ["id"]
+
+    # FOREIGN KEY orders.customer_id -> customers.id.
+    assert len(res["foreign_keys"]) == 1
+    fk = res["foreign_keys"][0]
+    assert fk["table"] == "orders"
+    assert fk["columns"] == ["customer_id"]
+    assert fk["referenced_table"] == "customers"
+    assert fk["referenced_columns"] == ["id"]
+
+    # tables incluye ambas (origen de algun constraint).
+    assert res["tables"] == ["customers", "orders"]
+
+
+def test_golden_ignora_not_null_y_check(db):
+    """NOT NULL (auto-generado por las PK) no aparece como clave."""
+    res = detect_declared_keys_duckdb(db)
+    assert res["status"] == "ok"
+    # Solo 2 PK reales (no las NOT NULL que DuckDB genera por cada columna PK).
+    assert len(res["primary_keys"]) == 2
+    # No hay UNIQUE declarado en este schema.
+    assert res["unique"] == []
+
+
+def test_edge_filtra_por_tabla_orders(db):
+    """Edge table='orders': PK de orders + su FK; NO la PK de customers."""
+    res = detect_declared_keys_duckdb(db, table="orders")
+    assert res["status"] == "ok"
+
+    # Solo la PK de orders.
+    assert len(res["primary_keys"]) == 1
+    assert res["primary_keys"][0]["table"] == "orders"
+    assert res["primary_keys"][0]["columns"] == ["id"]
+    # La PK de customers NO esta.
+    assert _pk_for(res, "customers") is None
+
+    # La FK de orders si esta (origen = orders).
+    assert len(res["foreign_keys"]) == 1
+    assert res["foreign_keys"][0]["table"] == "orders"
+    assert res["foreign_keys"][0]["referenced_table"] == "customers"
+
+    # tables solo contiene orders (la dueña de los constraints emitidos).
+    assert res["tables"] == ["orders"]
+
+
+def test_edge_filtra_por_tabla_customers(db):
+    """Edge table='customers': solo su PK; ninguna FK (orders queda fuera)."""
+    res = detect_declared_keys_duckdb(db, table="customers")
+    assert res["status"] == "ok"
+    assert len(res["primary_keys"]) == 1
+    assert res["primary_keys"][0]["table"] == "customers"
+    assert res["foreign_keys"] == []
+    assert res["tables"] == ["customers"]
+
+
+def test_edge_unique_declarado(tmp_path):
+    """Edge: una constraint UNIQUE declarada aparece en `unique`."""
+    path = str(tmp_path / "unique_test.duckdb")
+    con = duckdb.connect(path)
+    con.execute("CREATE TABLE products(sku INTEGER UNIQUE, name TEXT)")
+    con.close()
+
+    res = detect_declared_keys_duckdb(path)
+    assert res["status"] == "ok"
+    assert len(res["unique"]) == 1
+    assert res["unique"][0]["table"] == "products"
+    assert res["unique"][0]["columns"] == ["sku"]
+    assert res["primary_keys"] == []
+    assert res["foreign_keys"] == []
+    assert res["tables"] == ["products"]
+
+
+def test_edge_sin_constraints_listas_vacias(tmp_path):
+    """Edge: tabla sin PK/FK/UNIQUE -> todas las listas vacias, status ok."""
+    path = str(tmp_path / "no_keys.duckdb")
+    con = duckdb.connect(path)
+    con.execute("CREATE TABLE log(a INTEGER, b INTEGER)")
+    con.close()
+
+    res = detect_declared_keys_duckdb(path)
+    assert res["status"] == "ok"
+    assert res["primary_keys"] == []
+    assert res["foreign_keys"] == []
+    assert res["unique"] == []
+    assert res["tables"] == []
+
+
+def test_error_db_inexistente_no_lanza(tmp_path):
+    """Error: db_path inexistente -> status error, sin lanzar excepcion."""
+    path = str(tmp_path / "does_not_exist.duckdb")
+    res = detect_declared_keys_duckdb(path)
+    assert res["status"] == "error"
+    assert isinstance(res["error"], str)
+    assert res["error"] != ""
+
+
+def test_shape_resultado(db):
+    """El retorno tiene exactamente las claves esperadas."""
+    res = detect_declared_keys_duckdb(db)
+    assert set(res.keys()) == {
+        "status",
+        "primary_keys",
+        "foreign_keys",
+        "unique",
+        "tables",
+    }
+    for pk in res["primary_keys"]:
+        assert set(pk.keys()) == {"table", "columns"}
+    for fk in res["foreign_keys"]:
+        assert set(fk.keys()) == {
+            "table",
+            "columns",
+            "referenced_table",
+            "referenced_columns",
+        }
@@ -0,0 +1,85 @@
+---
+name: pptx_link_run_to_slide
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: impure
+signature: "def pptx_link_run_to_slide(run, source_slide, target_slide) -> bool"
+description: "Convierte un run de texto de python-pptx en un hyperlink INTERNO 'ir a la diapositiva'. python-pptx soporta run.hyperlink.address para URLs externas pero NO para saltar a otra slide del mismo deck; esta función crea ese salto manipulando el XML: añade una relación slide->slide (RT.SLIDE) y un <a:hlinkClick> con action='ppaction://hlinksldjump' y el r:id de la relación, insertado como primer hijo del <a:rPr> del run (orden del schema CT_TextCharacterProperties). Idempotente (elimina un hlinkClick previo antes de insertar). Al pulsar el texto en PowerPoint o visores compatibles se navega a target_slide. Motor python-pptx. No lanza nunca: cualquier excepción -> return False."
+tags: [eda, pptx, hyperlink, slide-jump, navigation, glossary, automatic-eda, python-pptx, xml, datascience, python]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: ["python-pptx"]
+params:
+  - name: run
+    desc: "el pptx.text.text._Run cuyo texto se vuelve clicable. Debe pertenecer a un run real (expone ._r, el elemento <a:r>). Un objeto sin ._r hace que la función devuelva False sin lanzar."
+  - name: source_slide
+    desc: "la Slide que contiene el run. Su part recibe la relación slide->slide (relate_to con RELATIONSHIP_TYPE.SLIDE); el r:id resultante se referencia en el hlinkClick."
+  - name: target_slide
+    desc: "la Slide de destino del salto. Debe pertenecer al MISMO Presentation que source_slide para que la relación interna sea válida."
+output: "bool. True si se aplicó el hyperlink interno (relación creada + <a:hlinkClick> insertado en el rPr del run); False si algo lo impidió (run inválido, slides de presentaciones distintas, etc.). Nunca lanza."
+tested: true
+tests: ["test_golden_run_se_vuelve_salto_a_otra_slide", "test_idempotente_reaplica_sin_duplicar_hlinkclick", "test_error_path_run_invalido_devuelve_false_sin_lanzar"]
+test_file_path: "python/functions/datascience/pptx_link_run_to_slide_test.py"
+file_path: "python/functions/datascience/pptx_link_run_to_slide.py"
+---
+
+## Ejemplo
+
+```python
+from pptx import Presentation
+from pptx.util import Inches
+from pptx.oxml.ns import qn
+
+from datascience.pptx_link_run_to_slide import pptx_link_run_to_slide
+
+prs = Presentation()
+blank = prs.slide_layouts[6]            # layout en blanco
+slide0 = prs.slides.add_slide(blank)
+slide1 = prs.slides.add_slide(blank)    # destino del salto (p.ej. el glosario)
+
+box = slide0.shapes.add_textbox(Inches(1), Inches(1), Inches(4), Inches(1))
+run = box.text_frame.paragraphs[0].add_run()
+run.text = "ir al glosario"
+
+ok = pptx_link_run_to_slide(run, slide0, slide1)
+print(ok)  # -> True
+
+# El run quedó con <a:rPr><a:hlinkClick action="ppaction://hlinksldjump" r:id="rIdN"/></a:rPr>
+hlink = run._r.get_or_add_rPr().find(qn("a:hlinkClick"))
+print(hlink.get("action"))  # -> ppaction://hlinksldjump
+prs.save("deck_con_salto.pptx")
+```
+
+## Cuando usarla
+
+Cuando construyas un deck PPTX con **navegación interna** y quieras que un texto salte a
+otra diapositiva al pulsarlo: un **glosario clicable** (cada término enlaza a su slide de
+definición), un **índice/tabla de contenidos navegable**, botones "volver a la portada", o
+referencias cruzadas entre capítulos. Es la pieza que `python-pptx` no cubre de fábrica —
+úsala sobre los runs ya creados por renderers como `render_automatic_eda_pptx` del grupo
+`eda` para enriquecer el deck con saltos sin reescribir el XML a mano cada vez.
+
+## Gotchas
+
+- **Impura**: muta el XML del run y crea una relación nueva en el part de `source_slide`.
+- **Solo navega en visores que respetan `ppaction://hlinksldjump`**: PowerPoint y la
+  mayoría de visores compatibles lo siguen; algunos visores web/ligeros lo ignoran (el
+  texto se ve igual pero no salta).
+- **Mismo Presentation**: `source_slide` y `target_slide` deben pertenecer al mismo deck.
+  Si son de presentaciones distintas, la relación interna no es válida y el salto no
+  funcionará (la función puede devolver True por crear la relación, pero el resultado en
+  el visor no será el esperado).
+- **El `<a:hlinkClick>` vive en el `<a:rPr>` del run**, no como hijo directo del `<a:r>`.
+  Para localizarlo: `run._r.get_or_add_rPr().find(qn("a:hlinkClick"))` (un `find` sobre
+  `run._r` devuelve `None` porque solo mira hijos directos del `<a:r>`).
+- **Idempotente**: si el run ya tenía un `hlinkClick` (p.ej. una URL externa o un salto
+  previo), se elimina antes de insertar el nuevo — un run tiene como mucho un click-link.
+- **Nunca lanza**: cualquier excepción (run sin `._r`, slides incompatibles, etc.) se
+  traga y devuelve `False`. Comprobar el booleano si el salto es crítico.
+- **Dependencia python-pptx**: declarada en `python/pyproject.toml`. Tests con
+  `~/fn_registry/python/.venv/bin/python3` (tiene `python-pptx` instalado).
@@ -0,0 +1,50 @@
+"""Convierte un run de texto de python-pptx en un hyperlink interno "ir a la diapositiva".
+
+python-pptx expone ``run.hyperlink.address`` para URLs externas, pero NO ofrece una
+API pública para saltar a otra diapositiva del mismo deck. Esta función crea ese salto
+interno manipulando el XML: añade una relación ``slide -> slide`` y un
+``<a:hlinkClick>`` con la acción ``ppaction://hlinksldjump`` en el run, de modo que al
+pulsar el texto en PowerPoint (o en visores que respetan esa acción) se navega a la
+diapositiva de destino.
+"""
+
+from pptx.opc.constants import RELATIONSHIP_TYPE as RT
+from pptx.oxml.ns import qn
+
+
+def pptx_link_run_to_slide(run, source_slide, target_slide) -> bool:
+    """Convierte un run de texto en un hyperlink interno "ir a la diapositiva".
+
+    Añade una relación ``slide -> slide`` desde la slide origen al part de la slide
+    destino y crea un ``<a:hlinkClick>`` con ``action="ppaction://hlinksldjump"`` como
+    primer hijo del ``<a:rPr>`` del run (orden válido del schema
+    ``CT_TextCharacterProperties``). La operación es idempotente: un ``hlinkClick``
+    previo en el mismo run se elimina antes de insertar el nuevo.
+
+    Args:
+        run: el ``pptx.text.text._Run`` cuyo texto se vuelve clicable.
+        source_slide: la ``Slide`` que contiene el run.
+        target_slide: la ``Slide`` de destino del salto.
+
+    Returns:
+        True si se aplicó el hyperlink; False si algo impidió aplicarlo (no lanza).
+    """
+    try:
+        rId = source_slide.part.relate_to(target_slide.part, RT.SLIDE)
+        rPr = run._r.get_or_add_rPr()
+        # Elimina un hlinkClick previo si lo hubiera (idempotente).
+        for existing in rPr.findall(qn("a:hlinkClick")):
+            rPr.remove(existing)
+        hlink = rPr.makeelement(
+            qn("a:hlinkClick"),
+            {
+                qn("r:id"): rId,
+                "action": "ppaction://hlinksldjump",
+            },
+        )
+        # a:hlinkClick debe ir como primer hijo de rPr
+        # (orden del schema CT_TextCharacterProperties).
+        rPr.insert(0, hlink)
+        return True
+    except Exception:
+        return False
@@ -0,0 +1,73 @@
+"""Tests for pptx_link_run_to_slide — salto interno run -> diapositiva.
+
+Self-contained: construye una Presentation en memoria con dos slides en blanco,
+un textbox con un run en la slide 0, y verifica que la función inyecta un
+``<a:hlinkClick>`` con ``action="ppaction://hlinksldjump"`` y un ``r:id`` que
+resuelve al part de la slide 1.
+"""
+
+import pytest
+
+pytest.importorskip("pptx")
+
+from pptx import Presentation  # noqa: E402
+from pptx.oxml.ns import qn  # noqa: E402
+from pptx.util import Inches  # noqa: E402
+
+from datascience.pptx_link_run_to_slide import pptx_link_run_to_slide  # noqa: E402
+
+
+def _two_slide_deck_with_run():
+    prs = Presentation()
+    blank = prs.slide_layouts[6]  # layout en blanco
+    slide0 = prs.slides.add_slide(blank)
+    slide1 = prs.slides.add_slide(blank)
+
+    box = slide0.shapes.add_textbox(Inches(1), Inches(1), Inches(4), Inches(1))
+    tf = box.text_frame
+    para = tf.paragraphs[0]
+    run = para.add_run()
+    run.text = "ir al glosario"
+    return prs, slide0, slide1, run
+
+
+def test_golden_run_se_vuelve_salto_a_otra_slide():
+    prs, slide0, slide1, run = _two_slide_deck_with_run()
+
+    ok = pptx_link_run_to_slide(run, slide0, slide1)
+    assert ok is True
+
+    # El hlinkClick es hijo del rPr del run (orden del schema
+    # CT_TextCharacterProperties), no hijo directo del <a:r>.
+    rPr = run._r.get_or_add_rPr()
+    hlink = rPr.find(qn("a:hlinkClick"))
+    assert hlink is not None
+    assert hlink.get("action") == "ppaction://hlinksldjump"
+
+    rId = hlink.get(qn("r:id"))
+    assert rId, "el hlinkClick debe llevar un r:id no vacío"
+
+    # El rId debe existir en las relaciones de la slide origen y apuntar
+    # al part de la slide destino.
+    rels = slide0.part.rels
+    assert rId in rels
+    assert rels[rId].target_part is slide1.part
+
+
+def test_idempotente_reaplica_sin_duplicar_hlinkclick():
+    prs, slide0, slide1, run = _two_slide_deck_with_run()
+
+    assert pptx_link_run_to_slide(run, slide0, slide1) is True
+    assert pptx_link_run_to_slide(run, slide0, slide1) is True
+
+    rPr = run._r.get_or_add_rPr()
+    hlinks = rPr.findall(qn("a:hlinkClick"))
+    assert len(hlinks) == 1
+
+
+def test_error_path_run_invalido_devuelve_false_sin_lanzar():
+    prs, slide0, slide1, _run = _two_slide_deck_with_run()
+
+    # Un objeto sin ._r ni soporte de relación -> la función no lanza, devuelve False.
+    ok = pptx_link_run_to_slide(object(), slide0, slide1)
+    assert ok is False
@@ -0,0 +1,91 @@
+---
+name: suggest_intratable_fk_candidates
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: pure
+signature: "def suggest_intratable_fk_candidates(profile: dict, max_candidates: int = 20) -> list"
+description: "Sobre el TableProfile de UNA tabla (el dict de profile_table), sugiere por heuristica de nombre + cardinalidad que columnas PARECEN una clave foranea hacia otra tabla, cuando no hay relaciones inter-tabla que medir (una sola tabla). Es una SUGERENCIA, no una afirmacion: el ref_table_guess es el stem del nombre (customer_id -> customer) y NO confirma containment. Pura: solo lee el dict, sin I/O; nunca lanza (devuelve [])."
+tags: [eda, datascience, relationships, foreign-key, fk, heuristic, schema, python]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: ""
+imports: []
+params:
+  - name: profile
+    desc: "TableProfile (dict que produce profile_table / summarize_table_*). Se leen de forma defensiva `columns` (lista de ColumnProfile con name/inferred_type/physical_type/distinct_count/unique_pct/flags), `n_rows` (int) y `key_candidates` (lista de nombres de columna ya candidatos a PK, que se excluyen). Si no es dict o no trae columns -> []."
+  - name: max_candidates
+    desc: "Tope de sugerencias devueltas (default 20). Las columnas candidatas se ordenan por distinct_count descendente (mas informativas primero) antes de cortar a este maximo."
+output: "list (posiblemente vacia) de dicts, uno por columna sugerida, con claves: `column` (nombre), `ref_table_guess` (tabla conjeturada por el stem del nombre, p.ej. customer_id -> 'customer'), `reason` (frase humana que deja claro que es heuristica sin confirmar containment), `distinct_count` (int|None), `unique_pct` (float|None, fraccion 0-1 tal como viene del profile), `inferred_type` (str), `physical_type` (str). Nunca lanza."
+tested: true
+tests: ["test_golden_customer_id_detectado_otras_no", "test_camelcase_albumid_detectado", "test_constante_status_id_no_aparece", "test_profile_vacio_y_none_devuelven_lista_vacia", "test_category_id_casi_unico_parece_pk_no_aparece", "test_ref_table_guess_multitoken_y_orden_por_distinct", "test_max_candidates_corta_la_lista", "test_id_generico_solo_nunca_es_fk"]
+test_file_path: "python/functions/datascience/suggest_intratable_fk_candidates_test.py"
+file_path: "python/functions/datascience/suggest_intratable_fk_candidates.py"
+---
+
+## Ejemplo
+
+```python
+from datascience import suggest_intratable_fk_candidates
+
+# TableProfile de UNA tabla (tipo titanic): customer_id es FK N:1; id es la PK;
+# amount es una medida float; name es categorica sin sufijo de id.
+profile = {
+    "n_rows": 891,
+    "key_candidates": ["id"],
+    "columns": [
+        {"name": "id", "inferred_type": "numeric", "physical_type": "BIGINT",
+         "distinct_count": 891, "unique_pct": 1.0, "flags": ["possible_id"]},
+        {"name": "customer_id", "inferred_type": "numeric", "physical_type": "BIGINT",
+         "distinct_count": 137, "unique_pct": 0.15, "flags": []},
+        {"name": "amount", "inferred_type": "numeric", "physical_type": "DOUBLE",
+         "distinct_count": 400, "unique_pct": 0.45, "flags": []},
+        {"name": "name", "inferred_type": "categorical", "physical_type": "VARCHAR",
+         "distinct_count": 700, "unique_pct": 0.78, "flags": []},
+    ],
+}
+
+out = suggest_intratable_fk_candidates(profile)
+[c["column"] for c in out]              # -> ["customer_id"]
+out[0]["ref_table_guess"]               # -> "customer"
+out[0]["reason"]
+# -> "el nombre termina en '_id' y es N:1 (137 valores distintos < 891 filas):
+#     parece (heuristica por nombre, sin confirmar containment) una referencia a
+#     una tabla «customer»"
+```
+
+## Cuando usarla
+
+Cuando el EDA tiene SOLO UNA tabla y, por tanto, no se puede inferir una FK
+inter-tabla por containment (no hay otra tabla cuyos valores contener). Es el plan B
+del capitulo RELACIONES de AutomaticEDA: en vez de medir solapamiento de valores
+entre tablas (lo correcto cuando hay varias, ver `infer_fk_containment_duckdb` /
+`build_join_graph`), conjetura por el NOMBRE de la columna (`<algo>_id`) y por su
+CARDINALIDAD N:1 que columnas parecen apuntar a una entidad externa. Usala para
+enriquecer el reporte con "estas columnas parecen referencias a otras tablas" sin
+prometer que esa tabla exista. NO la uses si tienes varias tablas: ahi mide
+containment de verdad.
+
+## Gotchas
+
+- Es **heuristica**, no una verdad: produce **falsos positivos** (una columna
+  `period_id` que en realidad es un codigo libre, no una FK) y **falsos negativos**
+  (una FK que no se llama `*_id`, p.ej. `parent`, `owner`, `sku`). No la trates como
+  una afirmacion de esquema.
+- `ref_table_guess` es una **conjetura por el nombre** (el stem sin el sufijo id):
+  `customer_id` -> `customer`, `AlbumId` -> `album`, `manager_staff_id` ->
+  `manager_staff`. Puede no coincidir con el nombre real de la tabla (plurales,
+  prefijos, alias). Es una pista, no un join garantizado.
+- **NO confirma containment**: no comprueba que los valores de la columna existan en
+  ninguna otra tabla (no puede — solo recibe el perfil de una tabla). Para confirmar
+  una FK real con varias tablas usa `infer_fk_containment_duckdb`.
+- Excluye deliberadamente: el `id`/`Id`/`ID` generico a secas (suele ser la PK
+  propia, no una referencia), las columnas constantes, las que parecen unicas
+  (`unique_pct >= 0.99`, mas PK que FK) y los tipos no-clave (float/decimal son
+  medidas; date/time/timestamp y boolean no son claves). En camelCase, `paid`,
+  `valid`, `grid` (con `id` en minuscula y sin separador) NO se confunden con FK.
+- `unique_pct` se interpreta como **fraccion 0-1** (tal como la emite el profile), no
+  como porcentaje 0-100.
@@ -0,0 +1,202 @@
+"""suggest_intratable_fk_candidates — heuristica de FK intra-tabla del grupo `eda`.
+
+Sobre el TableProfile de UNA tabla (el dict que produce ``profile_table``), sugiere
+por heuristica de NOMBRE + CARDINALIDAD que columnas PARECEN una clave foranea hacia
+otra tabla, util cuando no hay relaciones inter-tabla disponibles (una sola tabla y,
+por tanto, sin containment cruzado que medir). Es una SUGERENCIA, no una afirmacion:
+no confirma que exista la tabla referida ni que los valores esten contenidos en ella.
+
+La consume el capitulo RELACIONES de AutomaticEDA cuando solo hay una tabla.
+
+Funcion PURA: solo lee el dict (lectura defensiva con ``.get``), no hace I/O y nunca
+lanza por inputs raros (devuelve ``[]``).
+"""
+
+# inferred_type que es compatible con una clave foranea (entero/categorico).
+_FK_INFERRED_OK = {"numeric", "categorical", "integer"}
+
+# Prefijos de physical_type que admiten ser clave foranea (enteros, texto, uuid).
+_FK_PHYSICAL_PREFIXES = (
+    "int", "bigint", "smallint", "tinyint", "hugeint", "uint",
+    "varchar", "text", "char", "bpchar", "string", "uuid",
+)
+
+# Prefijos de physical_type que EXCLUYEN ser clave foranea: medidas en coma flotante
+# (float/double/decimal/numeric/real), temporales (date/time/timestamp/interval) y
+# boolean. Se comprueban ANTES que las senales positivas (la exclusion gana: una
+# columna numeric con physical DOUBLE es una medida, no una FK).
+_FK_PHYSICAL_EXCLUDE = (
+    "float", "double", "decimal", "numeric", "real",
+    "date", "time", "timestamp", "interval",
+    "bool",
+)
+
+
+def _fk_name_signal(name):
+    """Detecta el sufijo de clave foranea en el nombre y devuelve ``(stem, sufijo)``.
+
+    Reconoce ``<algo>_id`` (snake), ``<Algo>Id`` y ``<algo>ID`` (camel). NO reconoce
+    el ``id``/``Id``/``ID`` generico a secas (suele ser la PK propia de la tabla, no
+    una referencia). En camelCase la ``I`` mayuscula marca el limite de palabra, asi
+    que ``paid``/``valid``/``grid`` (``id`` en minuscula y sin separador) NO matchean.
+
+    El ``stem`` se devuelve en minusculas y sirve de ``ref_table_guess`` (la tabla a
+    la que probablemente apunta): ``customer_id`` -> ``"customer"``, ``AlbumId`` ->
+    ``"album"``, ``manager_staff_id`` -> ``"manager_staff"``. Devuelve ``None`` si no
+    hay senal de nombre.
+    """
+    if not isinstance(name, str):
+        return None
+    raw = name.strip()
+    if not raw:
+        return None
+    # Snake: termina en "_id" (indiferente a mayusculas en la parte "id").
+    if raw.lower().endswith("_id"):
+        stem = raw[:-3].rstrip("_-. ")
+        if not stem:
+            return None
+        return (stem.lower(), "_id")
+    # Camel todo-mayuscula: "...ID" (p.ej. customerID).
+    if raw.endswith("ID"):
+        stem = raw[:-2].rstrip("_-. ")
+        if not stem:
+            return None
+        return (stem.lower(), "ID")
+    # Camel: "...Id" (p.ej. AlbumId).
+    if raw.endswith("Id"):
+        stem = raw[:-2].rstrip("_-. ")
+        if not stem:
+            return None
+        return (stem.lower(), "Id")
+    return None
+
+
+def _fk_type_compatible(col):
+    """True si el tipo de la columna admite ser clave foranea.
+
+    Compatible si el ``physical_type`` NO es una medida flotante, una temporal ni
+    boolean, Y ademas (``inferred_type`` en {numeric, categorical, integer} O el
+    ``physical_type`` empieza por entero/varchar/text/char/uuid). La comparacion es
+    indistinta a mayusculas/minusculas.
+    """
+    phys = (col.get("physical_type") or "").strip().lower()
+    inferred = (col.get("inferred_type") or "").strip().lower()
+    # Exclusion por tipo fisico (gana sobre cualquier senal positiva).
+    for bad in _FK_PHYSICAL_EXCLUDE:
+        if phys.startswith(bad):
+            return False
+    # Senal positiva por tipo inferido.
+    if inferred in _FK_INFERRED_OK:
+        return True
+    # Senal positiva por tipo fisico (entero/texto/uuid).
+    for good in _FK_PHYSICAL_PREFIXES:
+        if phys.startswith(good):
+            return True
+    return False
+
+
+def suggest_intratable_fk_candidates(profile: dict, max_candidates: int = 20) -> list:
+    """Sugiere columnas que parecen una FK intra-tabla por nombre + cardinalidad.
+
+    Heuristica (no afirma nada): una columna es candidata a clave foranea si su nombre
+    tiene sufijo de id con stem no vacio (``<algo>_id`` / ``<Algo>Id`` / ``<algo>ID``,
+    NUNCA el ``id`` generico), no es ya candidata a PK, no es constante, tiene
+    cardinalidad alta pero por debajo del numero de filas (N:1, no unica) y un tipo
+    compatible con clave (entero/categorico/texto/uuid; nunca float/fecha/boolean).
+
+    Args:
+        profile: TableProfile (dict de ``profile_table``). Se leen, de forma
+            defensiva, ``columns`` (lista de ColumnProfile), ``n_rows`` y
+            ``key_candidates`` (nombres de columna ya candidatos a PK).
+        max_candidates: tope de sugerencias devueltas (default 20). Las columnas se
+            ordenan por ``distinct_count`` descendente (mas informativas primero)
+            antes de cortar.
+
+    Returns:
+        list de dicts (posiblemente vacia), uno por columna sugerida, con claves:
+        ``column``, ``ref_table_guess`` (stem del nombre), ``reason`` (frase humana),
+        ``distinct_count``, ``unique_pct`` (fraccion 0-1 tal como viene del profile),
+        ``inferred_type``, ``physical_type``. Nunca lanza: si ``profile`` no es dict o
+        no hay columnas, devuelve ``[]``.
+    """
+    if not isinstance(profile, dict):
+        return []
+    columns = profile.get("columns")
+    if not isinstance(columns, list):
+        return []
+
+    n_rows = profile.get("n_rows")
+    has_n_rows = (
+        isinstance(n_rows, int) and not isinstance(n_rows, bool) and n_rows > 0
+    )
+
+    key_candidates = profile.get("key_candidates")
+    if not isinstance(key_candidates, (list, tuple, set)):
+        key_candidates = []
+    key_set = set(key_candidates)
+
+    out = []
+    for col in columns:
+        if not isinstance(col, dict):
+            continue
+        name = col.get("name")
+
+        # 1) Senal de nombre: sufijo de id con stem no vacio.
+        signal = _fk_name_signal(name)
+        if signal is None:
+            continue
+        ref_guess, suffix = signal
+
+        # 2) No es ya candidata a PK (clave primaria de la propia tabla).
+        if name in key_set:
+            continue
+
+        # 3) No constante y con >= 2 valores distintos.
+        flags = col.get("flags") or []
+        if "constant" in flags:
+            continue
+        dc = col.get("distinct_count")
+        if not (isinstance(dc, int) and not isinstance(dc, bool) and dc >= 2):
+            continue
+
+        # 4) Cardinalidad alta pero < n_rows (no es PK) y no parece unica.
+        if has_n_rows and dc >= n_rows:
+            continue
+        unique_pct = col.get("unique_pct")
+        has_unique = (
+            isinstance(unique_pct, (int, float)) and not isinstance(unique_pct, bool)
+        )
+        if has_unique and unique_pct >= 0.99:
+            continue
+
+        # 5) Tipo compatible con clave foranea (entero/categorico/texto; no medida).
+        if not _fk_type_compatible(col):
+            continue
+
+        out.append(
+            {
+                "column": name,
+                "ref_table_guess": ref_guess,
+                "reason": _build_reason(suffix, dc, n_rows if has_n_rows else None, ref_guess),
+                "distinct_count": dc,
+                "unique_pct": float(unique_pct) if has_unique else None,
+                "inferred_type": col.get("inferred_type") or "",
+                "physical_type": col.get("physical_type") or "",
+            }
+        )
+
+    # Mas informativas primero (mayor cardinalidad), luego corte.
+    out.sort(key=lambda d: d.get("distinct_count") or 0, reverse=True)
+    return out[: max(0, int(max_candidates))]
+
+
+def _build_reason(suffix, dc, n_rows, ref_guess):
+    """Frase humana que deja claro que la sugerencia es heuristica, no confirmada."""
+    if n_rows is not None:
+        card = f"es N:1 ({dc} valores distintos < {n_rows} filas)"
+    else:
+        card = f"tiene {dc} valores distintos que se repiten (cardinalidad N:1)"
+    return (
+        f"el nombre termina en '{suffix}' y {card}: parece (heuristica por nombre, "
+        f"sin confirmar containment) una referencia a una tabla «{ref_guess}»"
+    )
@@ -0,0 +1,157 @@
+"""Tests para suggest_intratable_fk_candidates (funcion pura, sin I/O)."""
+
+from suggest_intratable_fk_candidates import suggest_intratable_fk_candidates
+
+
+def _col(name, inferred_type="numeric", physical_type="BIGINT", distinct_count=10,
+         unique_pct=0.1, flags=None):
+    """Construye un ColumnProfile minimo a mano (el dict que emite profile_table)."""
+    return {
+        "name": name,
+        "inferred_type": inferred_type,
+        "physical_type": physical_type,
+        "semantic_type": "",
+        "distinct_count": distinct_count,
+        "unique_pct": unique_pct,
+        "null_count": 0,
+        "null_pct": 0.0,
+        "flags": list(flags) if flags else [],
+    }
+
+
+def test_golden_customer_id_detectado_otras_no():
+    # Tabla tipo titanic: customer_id es FK N:1; id es la PK; amount es medida;
+    # name es categorica sin sufijo de id. Solo customer_id debe aparecer.
+    profile = {
+        "n_rows": 891,
+        "key_candidates": ["id"],
+        "columns": [
+            _col("id", inferred_type="numeric", physical_type="BIGINT",
+                 distinct_count=891, unique_pct=1.0, flags=["possible_id"]),
+            _col("customer_id", inferred_type="numeric", physical_type="BIGINT",
+                 distinct_count=137, unique_pct=0.15, flags=[]),
+            _col("amount", inferred_type="numeric", physical_type="DOUBLE",
+                 distinct_count=400, unique_pct=0.45),
+            _col("name", inferred_type="categorical", physical_type="VARCHAR",
+                 distinct_count=700, unique_pct=0.78),
+        ],
+    }
+    out = suggest_intratable_fk_candidates(profile)
+    assert isinstance(out, list)
+    assert [c["column"] for c in out] == ["customer_id"]
+    cand = out[0]
+    assert cand["ref_table_guess"] == "customer"
+    assert cand["distinct_count"] == 137
+    assert cand["unique_pct"] == 0.15
+    assert cand["inferred_type"] == "numeric"
+    assert cand["physical_type"] == "BIGINT"
+    # La razon deja claro que es heuristica + cita el sufijo y la tabla.
+    assert "customer" in cand["reason"]
+    assert "_id" in cand["reason"]
+
+
+def test_camelcase_albumid_detectado():
+    # AlbumId (camelCase, VARCHAR) -> detectada, ref_table_guess "album".
+    profile = {
+        "n_rows": 3503,
+        "key_candidates": ["TrackId"],
+        "columns": [
+            _col("AlbumId", inferred_type="categorical", physical_type="VARCHAR",
+                 distinct_count=347, unique_pct=0.10),
+        ],
+    }
+    out = suggest_intratable_fk_candidates(profile)
+    # TrackId es PK candidata (en key_candidates), AlbumId no -> AlbumId aparece.
+    assert [c["column"] for c in out] == ["AlbumId"]
+    assert out[0]["ref_table_guess"] == "album"
+
+
+def test_constante_status_id_no_aparece():
+    # status_id constante (flag "constant", distinct_count 1) NO es FK util.
+    profile = {
+        "n_rows": 1000,
+        "key_candidates": [],
+        "columns": [
+            _col("status_id", inferred_type="numeric", physical_type="INTEGER",
+                 distinct_count=1, unique_pct=0.001, flags=["constant"]),
+        ],
+    }
+    out = suggest_intratable_fk_candidates(profile)
+    assert out == []
+
+
+def test_profile_vacio_y_none_devuelven_lista_vacia():
+    # Lectura defensiva: ni {} ni None lanzan; devuelven [].
+    assert suggest_intratable_fk_candidates({}) == []
+    assert suggest_intratable_fk_candidates(None) == []
+    # profile sin columns o con columns no-lista tampoco lanza.
+    assert suggest_intratable_fk_candidates({"n_rows": 10}) == []
+    assert suggest_intratable_fk_candidates({"columns": "no-soy-lista"}) == []
+
+
+def test_category_id_casi_unico_parece_pk_no_aparece():
+    # unique_pct 0.999 -> parece PK (no N:1) -> NO se sugiere como FK.
+    profile = {
+        "n_rows": 891,
+        "key_candidates": [],
+        "columns": [
+            _col("category_id", inferred_type="numeric", physical_type="BIGINT",
+                 distinct_count=890, unique_pct=0.999),
+        ],
+    }
+    out = suggest_intratable_fk_candidates(profile)
+    assert out == []
+
+
+def test_ref_table_guess_multitoken_y_orden_por_distinct():
+    # manager_staff_id conserva los underscores del stem -> "manager_staff".
+    # Ademas, con varias candidatas, se ordenan por distinct_count descendente.
+    profile = {
+        "n_rows": 10000,
+        "key_candidates": ["staff_id"],  # staff_id es PK aqui, no debe aparecer
+        "columns": [
+            _col("staff_id", inferred_type="numeric", physical_type="BIGINT",
+                 distinct_count=10000, unique_pct=1.0, flags=["possible_id"]),
+            _col("store_id", inferred_type="numeric", physical_type="INTEGER",
+                 distinct_count=2, unique_pct=0.0002),
+            _col("manager_staff_id", inferred_type="numeric", physical_type="INTEGER",
+                 distinct_count=40, unique_pct=0.004),
+        ],
+    }
+    out = suggest_intratable_fk_candidates(profile)
+    cols = [c["column"] for c in out]
+    # staff_id excluida (PK); las otras dos ordenadas por distinct desc.
+    assert cols == ["manager_staff_id", "store_id"]
+    refs = {c["column"]: c["ref_table_guess"] for c in out}
+    assert refs["manager_staff_id"] == "manager_staff"
+    assert refs["store_id"] == "store"
+
+
+def test_max_candidates_corta_la_lista():
+    # max_candidates limita el numero de sugerencias devueltas.
+    profile = {
+        "n_rows": 10000,
+        "key_candidates": [],
+        "columns": [
+            _col("a_id", distinct_count=300, unique_pct=0.03),
+            _col("b_id", distinct_count=200, unique_pct=0.02),
+            _col("c_id", distinct_count=100, unique_pct=0.01),
+        ],
+    }
+    out = suggest_intratable_fk_candidates(profile, max_candidates=2)
+    assert [c["column"] for c in out] == ["a_id", "b_id"]
+
+
+def test_id_generico_solo_nunca_es_fk():
+    # 'id'/'Id'/'ID' a secas (sin stem) jamas se sugieren como FK.
+    profile = {
+        "n_rows": 500,
+        "key_candidates": [],
+        "columns": [
+            _col("id", distinct_count=500, unique_pct=1.0),
+            _col("Id", distinct_count=120, unique_pct=0.24),
+            _col("ID", distinct_count=80, unique_pct=0.16),
+        ],
+    }
+    out = suggest_intratable_fk_candidates(profile)
+    assert out == []
@@ -25,6 +25,7 @@ dependencies = [
    "polars>=1.40.1",
    "pymeshlab>=2025.7.post1",
    "pymssql>=2.3.13",
+    "pymupdf>=1.28.0",
    "pypdf>=6.10.0",
    "pyproj>=3.7.2",
    "python-docx>=1.2.0",
Author	SHA1	Message	Date
egutierrez	68f4ddabce	feat(eda): capítulo RELACIONES para AutomaticEDA Añade el capítulo `relaciones` al motor AutomaticEDA: analiza las relaciones de clave de la tabla/base y se coloca tras `correlacion`, antes de `modelos`, en CHAPTER_ORDER. Capas que renderiza (solo las que aplican; None si no hay nada que decir): - Claves declaradas: PK/FK/UNIQUE reales del esquema DuckDB, vía la nueva función `detect_declared_keys_duckdb` (lee `duckdb_constraints()`). - Candidatos a clave primaria: los `key_candidates` del TableProfile. - FK candidatas inter-tabla: reusa `infer_fk_containment_duckdb` (containment + señal de nombre) y `build_join_graph` (roles de nodos + diagrama Mermaid pegable). Solo si la fuente DuckDB tiene varias tablas. - FK candidatas intra-tabla: heurística nombre + cardinalidad, vía la nueva función pura `suggest_intratable_fk_candidates`, marcada como sugerencia. Engancha al glosario clicable los términos PK, FK, containment/inclusión y cardinalidad (contrato §11.1) y usa Group (keep-together) para el grafo. Funciones nuevas del registry (grupo `eda`): - detect_declared_keys_duckdb (impure, datascience) + test. - suggest_intratable_fk_candidates (pure, datascience) + test. Tests: relaciones_test.py (golden intra + inter, edges, no-cut render) + los tests de ambas funciones. Suite automatic_eda + render_automatic_eda verde (89 passed). Golden end-to-end con el pipeline render_automatic_eda verificado sobre titanic (intra) y una BD customers/orders (inter). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 18:15:15 +02:00
egutierrez	c6d9bc26da	merge: Fase 4a AutomaticEDA motor+glosario (verificado met) - fix negrita-pisa PDF, zebra striping (PDF+PPT), keep-together (Group: heading+figura+texto misma pagina/slide), imagenes con caption en PPT - portada construida-al-final mostrada en posicion 1 (con resumen agregado del cuerpo) - capitulo glosario al final + terminos clicables REALES: PDF link annotation (add_pdf_internal_links, PyMuPDF) + PPT hyperlink nativo (pptx_link_run_to_slide); entropia enganchado en cat_distr como ejemplo E2E - contrato docs/automatic_eda_contract.md §11 (glosario + keep-together + zebra) - pymupdf>=1.28.0	2026-06-30 17:45:30 +02:00
egutierrez	d1a3d58a6b	feat(eda): motor AutomaticEDA fase 4a — render fixes + keep-together + glosario clicable Mejoras transversales del motor de render (no del contenido de capítulos): 1. Fix negrita pisa texto (PDF): _place_rich_lines mide el ancho REAL de cada span con las métricas de fuente del renderer (peso correcto) en vez del grid de ancho medio; negrita y normal en la misma línea ya no se solapan. 2. Zebra striping: filas pares sombreadas (#f6f8fa) en DataTable (PDF + PPTX), coherente al partir tablas largas (índice de fila lógico, no por página). 3. Keep-together: bloque Group nuevo; el renderer mide el grupo entero y lo mueve completo a la página/slide siguiente si no cabe, y encoge la figura (height_in) para dejar sitio a su título y texto. num_distr lo usa. 4. Caption siempre visible en toda figura PPTX (fallback al heading); la figura reserva el alto de su caption para que ambos quepan en el mismo slide. 5. Portada construida al final (con resumen agregado del análisis vía ctx['document_summary']) pero colocada primera por build_document. 6. Glosario: capítulo nuevo (último) + GlossaryCollector en ctx; los capítulos registran términos y marcan apariciones con [[term:key]]...[[/term]]. Links clicables reales: PDF (PyMuPDF, link GOTO) y PPTX (slide-jump nativo). Enganchado "entropía" en cat_distr como ejemplo end-to-end. Funciones reutilizables delegadas a fn-constructor (tag eda): - add_pdf_internal_links_py_datascience (PyMuPDF) - pptx_link_run_to_slide_py_datascience (slide-jump) Contrato docs/automatic_eda_contract.md actualizado (§1/§3/§5 + §11 nueva) con la API de glosario, keep-together y zebra para la siguiente fase. PyMuPDF declarado en pyproject. Suite verde (90 tests); golden titanic verificado. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 17:35:19 +02:00
egutierrez	b5334a2e97	merge: Fase 3 AutomaticEDA wiring (verificado met) - build_eda_render_ctx: arma ctx (raw_numeric, timeseries_raw, geo_points, db_path+table) desde tabla DuckDB - pipeline render_automatic_eda: perfila + ctx + build_document -> PDF + PPTX (11 capitulos poblados) - profile_table: flag emit_automatic emite el report AutomaticEDA (PDF+PPT) sin romper render_eda_pdf - text_layout: render real de negrita en PDF y PPTX - .claude/commands/eda.md actualizado Los 4 capitulos que degradaban (modelos/timeseries/geospatial/agregacion) ahora salen POBLADOS end-to-end.	2026-06-30 16:19:52 +02:00