feat(eda): OUTLIERS chapter (univariate + multivariate outliers)

New dedicated `outliers` chapter for the AutomaticEDA engine that gathers and
deepens, in a single place, the outlier analysis currently scattered between
`num_distr` (per-column count) and `modelos` (IsolationForest). It is registered
in `chapters_registry.py` between `missingness` and `correlacion` (data-quality
block: calidad → missingness → outliers).

Chapter contents:
- Per-column univariate summary: count and % of outliers by Tukey (1.5·IQR) and
  by z-score (|z| > 3), with lower/upper fences and extreme values. Sorted by
  contamination, flagging the most affected columns. Reuses the registry
  functions `build_boxplot_stats` (fences from the profile percentiles) and
  `detect_outliers` (z-score rule on the raw sample in `ctx`).
- Tukey boxplots of the most contaminated columns (box, whiskers and outlier
  points), delegated to the new function `build_boxplots_figure`.
- Multivariate: anomalous rows considering all columns at once with
  `isolation_forest_outliers`: count and % of rows, the most anomalous rows with
  their score, and the dimensions that make them unusual (top columns by |z|,
  via the new function `summarize_outlier_dims`). The detector runs live on
  `raw_numeric` so that its row indexing matches that of the dimensions exactly;
  it falls back to the profile's precomputed block when there is no raw sample
  (lite preset).
- Exploratory interpretation: an outlier is not necessarily an error (data
  error vs. genuine extreme value), plus recommendations (review, winsorize or
  re-express, linking to the profile's Tukey re-expression).
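
As an illustration of the two univariate criteria listed above, here is a
minimal standard-library sketch. It is not the registry code: `tukey_fences`
and `zscore_flags` are hypothetical names, the quartile method ("inclusive")
and the population standard deviation are assumptions, and the sample values
are made up.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: "inclusive" quartiles; the real profile may use another method.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def zscore_flags(values, threshold=3.0):
    """Return one bool per value: True when |z| exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population std
    if std == 0:
        return [False] * len(values)
    return [abs((v - mean) / std) > threshold for v in values]


# Made-up sample with one obvious extreme value.
sample = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 100]
lower, upper = tukey_fences(sample)
tukey_out = [v for v in sample if v < lower or v > upper]
z_out = [v for v, flag in zip(sample, zscore_flags(sample)) if flag]
print(lower, upper, tukey_out, z_out)
```

The fences depend only on the quartiles, so the extreme value barely moves
them, while it inflates the mean and standard deviation that the z-score
relies on.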

Clickable terms registered in the shared glossary: `outlier`,
`tukey_fence`, `zscore`, `isolation_forest`.

New registry functions (datascience domain, eda group):
- `build_boxplots_figure_py_datascience` (figure helper, impure)
- `summarize_outlier_dims_py_datascience` (pure)
The chapter activates with ≥1 numeric column and returns None otherwise; it
reads everything defensively and never raises. Tests: chapter (golden + edges +
error path + PDF/PPTX render) and both new functions. AutomaticEDA no-regression
suite is green. Verified end-to-end with the Titanic dataset (Fare/Parch/SibSp
as the most contaminated columns).
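
The "never raises" behaviour relies on lazy, guarded imports. A minimal sketch
of that pattern, assuming a hypothetical generic helper `load_optional` (the
chapter itself uses one small loader per registry function):

```python
import importlib


def load_optional(module_name, attr):
    """Return module_name.attr, or None if the import or lookup fails."""
    try:
        return getattr(importlib.import_module(module_name), attr)
    except Exception:  # deliberately broad: any failure degrades to None
        return None


def describe(fn, value):
    """Call fn when available, otherwise return an explanatory note."""
    if fn is None:
        return "(not available)"
    return str(fn(value))


sqrt = load_optional("math", "sqrt")
missing = load_optional("no_such_module_for_this_sketch", "anything")
print(describe(sqrt, 4.0), describe(missing, 4.0))
```

Callers check for None and emit a note instead of failing, so a missing
dependency removes one section rather than the whole chapter.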

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 21:12:40 +02:00
parent e5abc18211
commit 6f88f184f1
9 changed files with 1698 additions and 0 deletions
@@ -0,0 +1,593 @@
"""Outliers chapter (OUTLIERS) — univariate + multivariate atypical values.

Today the analysis of atypical values is scattered across the document: the
NUM DISTR chapter mentions the per-column outlier count inside each distribution
figure, and the MODELOS chapter runs Isolation Forest as one of several cheap
models. This chapter gathers and deepens the whole outlier story in a single
place, with its interpretation: an [[term:outlier]]outlier[[/term]] is **not
necessarily an error** — it can be a legitimate, extreme but real observation —
so the reading is exploratory (what to look at), never confirmatory (what to
delete).

Sections, in order:

1. **Resumen univariante por columna** — for every numeric column, the number
   and percentage of atypical values by two complementary criteria: Tukey's
   1.5·IQR rule ([[term:tukey_fence]]vallas de Tukey[[/term]]) and the
   [[term:zscore]]z-score[[/term]] rule (|z| > 3). The most contaminated columns
   are flagged. The fences come from the pure registry function
   ``build_boxplot_stats`` (derived from the profile percentiles); the per-column
   counts use the raw sample in ``ctx['raw_numeric']`` when available (the exact
   count), degrading to the profile's own z-score counts otherwise.
2. **Boxplots** — a single figure with the Tukey boxplots of the most
   contaminated columns (box, whiskers and atypical points), delegated to the
   reusable registry helper ``build_boxplots_figure``.
3. **Multivariante (filas anómalas)** — rows that are atypical considering ALL
   columns at once, via the registry function ``isolation_forest_outliers``: the
   count and percentage of anomalous rows, the most anomalous rows with their
   score, and the dimensions that make each one rare (top columns by |z|, via
   ``summarize_outlier_dims``). Run live on ``ctx['raw_numeric']`` (the same
   numeric columns ``summarize_outlier_dims`` uses, so the row indexing stays
   coherent and the dimension breakdown is correct); falls back to the
   precomputed ``profile['models']['outliers']`` only when no raw sample is
   available (e.g. the lite preset), where no per-row breakdown is shown.
4. **Interpretación** — outlier ≠ error: how to tell a data-entry error from a
   genuine extreme value, and what to do (inspect, winsorize, or re-express —
   linking to the Tukey re-expression the profile already computes).

The chapter activates whenever the table has at least one numeric column; with
no numeric column it returns ``None`` and disappears from the document.

Reads everything defensively (``.get``) and never raises: every registry
delegation is imported lazily and degraded to an honest note on any failure.

Contract: build_<id>(profile, ctx) -> Chapter | None ; CHAPTER_VERSION = "x.y.z".
"""
from __future__ import annotations
from .. import model
CHAPTER_VERSION = "1.0.0"
CHAPTER_ID = "outliers"
CHAPTER_TITLE = "Valores atípicos"
# z-score threshold for the univariate z rule: |z| > 3 flags a value ~3 standard
# deviations from the mean (≈99.7% of a normal distribution lies within ±3σ).
_Z_THRESH = 3.0
# How many columns to draw in the boxplots figure (most contaminated first) and
# how many anomalous rows to list in the multivariate table.
_TOP_BOX = 12
_TOP_ROWS = 12
# Cap on the raw atypical values passed as boxplot fliers, so a heavy-tailed
# column does not flood the figure with thousands of points.
_MAX_FLIERS = 200
# How many columns flagged as "most contaminated" in the summary note.
_TOP_FLAGGED = 3
# Glossary terms this chapter explains (contract §11.1). Registered in the shared
# collector and marked clickable on first appearance. ``isolation_forest`` and
# ``zscore`` may also be registered by the MODELOS chapter — ``add`` is
# idempotent (first definition wins), so registering them here is harmless and
# keeps this chapter self-contained when MODELOS does not render.
_TERM_DEFS = {
    "outlier": (
        "Valor atípico (outlier)",
        "Una observación que se aparta mucho del grueso de los datos. Un atípico "
        "NO es necesariamente un error: puede ser un fallo de medida o de "
        "registro, pero también un dato real extremo (un cliente que gasta diez "
        "veces la media, un día de ventas excepcional). Por eso se señalan para "
        "revisarlos, no para borrarlos automáticamente.",
    ),
    "tukey_fence": (
        "Vallas de Tukey (1,5·IQR)",
        "Regla clásica para marcar atípicos a partir de los cuartiles: se calcula "
        "el rango intercuartílico IQR = P75 − P25 y se trazan dos vallas, una "
        "inferior en P25 − 1,5·IQR y otra superior en P75 + 1,5·IQR. Los valores "
        "que caen fuera de esas vallas se consideran atípicos. Es robusta porque "
        "se apoya en la mediana y los cuartiles, no en la media.",
    ),
    "zscore": (
        "z-score (puntuación típica)",
        "Mide a cuántas desviaciones típicas está un valor de la media de su "
        "columna: z = (valor − media) / desviación típica. Un |z| grande (aquí > "
        "3) señala un valor alejado del centro. A diferencia de las vallas de "
        "Tukey, el z-score usa media y desviación, así que es más sensible a la "
        "presencia de los propios atípicos.",
    ),
    "isolation_forest": (
        "Isolation Forest (anomalías multivariantes)",
        "Algoritmo de detección de anomalías que considera TODAS las columnas a "
        "la vez: construye árboles que parten el espacio con cortes aleatorios y "
        "mide cuántos cortes hacen falta para aislar cada fila. Las filas raras "
        "se aíslan con muy pocos cortes y se marcan como atípicas según un umbral "
        "de contaminación. Detecta combinaciones de valores poco frecuentes que "
        "ninguna columna por separado revelaría.",
    ),
}
# --------------------------------------------------------------------------- #
# Lazy registry delegations (each degrades to None / no-op on any failure).
# --------------------------------------------------------------------------- #
def _load_build_boxplot_stats():
    try:
        from datascience.build_boxplot_stats import build_boxplot_stats
        return build_boxplot_stats
    except Exception:  # noqa: BLE001
        return None


def _load_detect_outliers():
    # detect_outliers lives in the monolithic ``datascience.datascience`` module
    # (file_path datascience.py), not in its own submodule — try both shapes.
    try:
        from datascience.datascience import detect_outliers
        return detect_outliers
    except Exception:  # noqa: BLE001
        try:
            from datascience import detect_outliers
            return detect_outliers
        except Exception:  # noqa: BLE001
            return None


def _load_isolation_forest():
    try:
        from datascience.isolation_forest_outliers import isolation_forest_outliers
        return isolation_forest_outliers
    except Exception:  # noqa: BLE001
        return None


def _load_summarize_dims():
    try:
        from datascience.summarize_outlier_dims import summarize_outlier_dims
        return summarize_outlier_dims
    except Exception:  # noqa: BLE001
        return None
# --------------------------------------------------------------------------- #
# Defensive formatters (own copy: the chapter never imports siblings).
# --------------------------------------------------------------------------- #
def _fmt_num(value, decimals: int = 3) -> str:
    if value is None:
        return ""
    if isinstance(value, bool):
        return "" if value else "no"
    if isinstance(value, int):
        return f"{value:,}".replace(",", ".")
    if isinstance(value, float):
        if value != value:  # NaN
            return ""
        if value in (float("inf"), float("-inf")):
            return str(value)
        text = f"{value:.{decimals}f}".rstrip("0").rstrip(".")
        return text if text else "0"
    return model._safe_str(value)


def _fmt_int(value) -> str:
    if value is None:
        return ""
    try:
        return f"{int(round(float(value))):,}".replace(",", ".")
    except (TypeError, ValueError):
        return model._safe_str(value)


def _fmt_pct(value, decimals: int = 2) -> str:
    """Format an already-0-100 value as a percentage. None -> placeholder."""
    if value is None:
        return ""
    try:
        return f"{float(value):.{decimals}f}%"
    except (TypeError, ValueError):
        return model._safe_str(value)


def _term(mark: bool, key: str, text: str) -> str:
    return f"[[term:{key}]]{text}[[/term]]" if mark else text


def _is_dict(v) -> bool:
    return isinstance(v, dict)
# --------------------------------------------------------------------------- #
# Profile reads.
# --------------------------------------------------------------------------- #
def _numeric_columns(profile: dict) -> list:
    """Return [(name, numeric_dict)] for numeric columns with usable stats."""
    out = []
    for col in profile.get("columns") or []:
        if not isinstance(col, dict):
            continue
        if col.get("inferred_type") != "numeric":
            continue
        num = col.get("numeric")
        if not isinstance(num, dict) or not num:
            continue
        if num.get("mean") is None and num.get("median") is None:
            continue
        out.append((col.get("name") or "(columna)", num))
    return out


def _clean_values(raw):
    """Return the finite float values of a raw column list (drop None/NaN/inf)."""
    if not isinstance(raw, (list, tuple)):
        return None
    vals = []
    for v in raw:
        if v is None or isinstance(v, bool):
            continue
        try:
            f = float(v)
        except (TypeError, ValueError):
            continue
        if f != f or f in (float("inf"), float("-inf")):
            continue
        vals.append(f)
    return vals
# --------------------------------------------------------------------------- #
# Per-column univariate summary.
# --------------------------------------------------------------------------- #
def _univariate_row(name, numeric, raw_vals, box_fn, detect_fn):
    """Compute one univariate summary row + boxplot inputs for a column.

    Returns a dict with the table cells and, when raw values are available, the
    exact Tukey/z counts and the list of atypical (flier) values; otherwise it
    degrades to the profile's own z-score counts and the fence flags.
    """
    box = {}
    if box_fn is not None:
        try:
            box = box_fn(numeric) or {}
        except Exception:  # noqa: BLE001
            box = {}
    lf = box.get("lower_fence")
    uf = box.get("upper_fence")
    vals = _clean_values(raw_vals)
    n_tukey = pct_tukey = None
    n_z = pct_z = None
    low_extreme = high_extreme = None
    fliers = []
    contamination = None  # metric used to rank columns (prefer Tukey %).
    if vals:
        n = len(vals)
        tukey_out = []
        for v in vals:
            below = (lf is not None and v < lf)
            above = (uf is not None and v > uf)
            if below or above:
                tukey_out.append(v)
        n_tukey = len(tukey_out)
        pct_tukey = 100.0 * n_tukey / n if n else None
        if tukey_out:
            low_extreme = min(tukey_out)
            high_extreme = max(tukey_out)
            fliers = tukey_out[:_MAX_FLIERS]
        # z-score rule via the registry function (returns parallel bools).
        if detect_fn is not None:
            try:
                flags = detect_fn(vals, _Z_THRESH) or []
                n_z = int(sum(1 for b in flags if b))
                pct_z = 100.0 * n_z / n if n else None
            except Exception:  # noqa: BLE001
                n_z = pct_z = None
        contamination = pct_tukey
    else:
        # Degrade: no raw sample for this column. The profile's own outlier
        # count/pct come from the z-score block (build_boxplot_stats note); the
        # Tukey count is unknown, only the fence flags are.
        n_z = numeric.get("n_outliers")
        pct_z = numeric.get("outlier_pct")
        if box.get("has_low_outliers") and box.get("min") is not None:
            low_extreme = box.get("min")
        if box.get("has_high_outliers") and box.get("max") is not None:
            high_extreme = box.get("max")
        contamination = pct_z if isinstance(pct_z, (int, float)) else None
    # Compact "extremos atípicos" cell: down/up arrows for the low/high tail.
    extremes = []
    if low_extreme is not None:
        extremes.append(f"↓ {_fmt_num(low_extreme)}")
    if high_extreme is not None:
        extremes.append(f"↑ {_fmt_num(high_extreme)}")
    extremes_cell = " ".join(extremes) if extremes else ""
    return {
        "name": model._safe_str(name),
        "n_tukey": n_tukey,
        "pct_tukey": pct_tukey,
        "n_z": n_z,
        "pct_z": pct_z,
        "lower_fence": lf,
        "upper_fence": uf,
        "extremes": extremes_cell,
        "box": box,
        "fliers": fliers,
        "has_raw": bool(vals),
        "contamination": contamination if isinstance(contamination, (int, float)) else -1.0,
    }


def _univariate_table(rows: list) -> model.DataTable:
    header = ["Columna", "Atípicos Tukey", "% Tukey", "Atípicos z", "% z",
              "Valla inf.", "Valla sup.", "Extremos atípicos"]
    table_rows = []
    for r in rows:
        table_rows.append([
            r["name"],
            _fmt_int(r["n_tukey"]) if r["n_tukey"] is not None else "",
            _fmt_pct(r["pct_tukey"]) if r["pct_tukey"] is not None else "",
            _fmt_int(r["n_z"]) if r["n_z"] is not None else "",
            _fmt_pct(r["pct_z"]) if r["pct_z"] is not None else "",
            _fmt_num(r["lower_fence"]),
            _fmt_num(r["upper_fence"]),
            r["extremes"],
        ])
    return model.DataTable(
        header=header, rows=table_rows,
        title="Valores atípicos por columna",
        note="Tukey = fuera de las vallas 1,5·IQR · z = |z-score| > 3 · "
             "ordenado de más a menos contaminada")
# --------------------------------------------------------------------------- #
# Multivariate (Isolation Forest) section.
# --------------------------------------------------------------------------- #
def _resolve_multivariate(profile: dict, ctx: dict, raw_numeric):
    """Return (outliers_dict_or_None, source).

    Prefers a LIVE Isolation Forest over ``raw_numeric`` so the detector and
    ``summarize_outlier_dims`` use EXACTLY the same numeric columns and the same
    valid-row indexing — otherwise the precomputed ``profile['models']
    ['outliers']`` (run by MODELOS over a possibly different column subset) would
    yield ``row_index`` values that no longer point at the rows
    ``summarize_outlier_dims`` reconstructs, mislabelling the "dimensions that
    make each row rare". Falls back to the precomputed block when no raw sample
    is available (e.g. the lite preset drops ``raw_numeric``)."""
    if _is_dict(raw_numeric) and raw_numeric:
        iso = _load_isolation_forest()
        if iso is not None:
            try:
                out = iso(raw_numeric)
                if _is_dict(out) and out.get("n_outliers") is not None and out.get("n_rows_used"):
                    return out, "live"
            except Exception:  # noqa: BLE001
                pass
    # Fallback: the model the MODELOS chapter already computed (no raw sample to
    # recompute against, so no per-row dimension breakdown either).
    models = profile.get("models") if _is_dict(profile.get("models")) else {}
    pre = models.get("outliers") if _is_dict(models) else None
    if _is_dict(pre) and pre.get("n_outliers") is not None and pre.get("n_rows_used"):
        return pre, "precomputed"
    return None, "none"


def _multivariate_blocks(outliers: dict, raw_numeric, mark: bool) -> list:
    isof = _term(mark, "isolation_forest", "**Isolation Forest**")
    blocks = [
        model.Heading(text="Filas atípicas (multivariante)", level=2),
        model.Markdown(text=(
            f"Hasta aquí cada columna se ha mirado por separado. {isof} busca "
            "filas raras considerando **todas las columnas a la vez**: una fila "
            "puede ser normal en cada variable y aun así ser atípica por la "
            "**combinación** de sus valores (p. ej. una edad baja con una tarifa "
            "muy alta). La tabla resume cuántas filas se marcaron y el umbral de "
            "decisión.")),
        model.KVTable(rows=[
            ("Filas analizadas", _fmt_int(outliers.get("n_rows_used"))),
            ("Columnas consideradas", _fmt_int(outliers.get("n_features"))),
            ("Filas atípicas", _fmt_int(outliers.get("n_outliers"))),
            ("% filas atípicas", _fmt_pct(outliers.get("outlier_pct"))),
            ("Umbral de decisión", _fmt_num(outliers.get("threshold"), 4)),
        ], title="Anomalías multivariantes"),
    ]
    rows_in = outliers.get("outlier_rows") or []
    if not rows_in:
        return blocks
    # Enrich each anomalous row with the dimensions that make it rare, when the
    # raw sample is available (summarize_outlier_dims reconstructs the same
    # valid-row indexing as isolation_forest_outliers).
    dims_by_row = {}
    if _is_dict(raw_numeric) and raw_numeric:
        summ = _load_summarize_dims()
        if summ is not None:
            try:
                enriched = summ(raw_numeric, rows_in, top_k=3) or []
                for e in enriched:
                    if _is_dict(e) and e.get("row_index") is not None:
                        dims_by_row[e.get("row_index")] = e.get("dims") or []
            except Exception:  # noqa: BLE001
                dims_by_row = {}
    has_dims = bool(dims_by_row)
    header = ["Fila (entre válidas)", "Score"]
    if has_dims:
        header.append("Dimensiones que la hacen rara (col = valor, z)")
    table_rows = []
    for r in rows_in[:_TOP_ROWS]:
        if not _is_dict(r):
            continue
        ridx = r.get("row_index")
        cells = [_fmt_int(ridx), _fmt_num(r.get("score"), 4)]
        if has_dims:
            dims = dims_by_row.get(ridx) or []
            parts = []
            for d in dims:
                if not _is_dict(d):
                    continue
                parts.append(
                    f"{model._safe_str(d.get('col'))} = {_fmt_num(d.get('value'))} "
                    f"(z {_fmt_num(d.get('z'), 2)})")
            cells.append("; ".join(parts) if parts else "")
        table_rows.append(cells)
    if table_rows:
        shown = len(table_rows)
        total = outliers.get("n_outliers")
        note = "las filas más anómalas primero (score más bajo = más rara)"
        if isinstance(total, int) and total > shown:
            note += f" — top {shown} de {total}"
        if not has_dims:
            note += (" · no se pudo recuperar la muestra cruda para explicar las "
                     "dimensiones de cada fila")
        blocks.append(model.DataTable(
            header=header, rows=table_rows,
            title="Filas más atípicas", note=note))
    return blocks
# --------------------------------------------------------------------------- #
# Interpretation section.
# --------------------------------------------------------------------------- #
def _interpretation_block(mark: bool) -> model.Markdown:
    outlier = _term(mark, "outlier", "atípico")
    text = (
        f"**Un {outlier} no es necesariamente un error.** Conviene distinguir "
        "dos casos antes de actuar:\n\n"
        "- **Error de dato** (medida, registro o unidad equivocada): una edad de "
        "200 años, un importe negativo donde no puede haberlo, un decimal "
        "desplazado. Estos sí se corrigen o se eliminan, idealmente en el origen.\n"
        "- **Dato real extremo**: una observación legítima de la cola de la "
        "distribución (un cliente que gasta mucho más, una tarifa de lujo, un día "
        "de ventas excepcional). Borrarla sesga el análisis y oculta información "
        "valiosa.\n\n"
        "**Qué hacer.** Primero, **revisar** los valores señalados arriba contra "
        "su origen para decidir cuál de los dos casos es. Si son errores, "
        "corregirlos. Si son datos reales que distorsionan medias y modelos, hay "
        "alternativas a borrarlos: **winsorizar** (recortar los extremos a un "
        "percentil), o **re-expresar** la variable (por ejemplo una "
        "transformación logarítmica o la escalera de re-expresión de Tukey que "
        "este mismo perfil ya calcula para las columnas asimétricas), que suele "
        "domar la cola sin perder ninguna fila. La elección depende del objetivo: "
        "esta lectura es **exploratoria** —orienta dónde mirar—, no una regla "
        "automática de limpieza.")
    return model.Markdown(text=text)
# --------------------------------------------------------------------------- #
# Entry point.
# --------------------------------------------------------------------------- #
def build_outliers(profile: dict, ctx: dict):
    """Build the OUTLIERS Chapter, or None if the dataset has no numeric column."""
    profile = profile or {}
    ctx = ctx or {}
    if not isinstance(profile, dict):
        return None
    numerics = _numeric_columns(profile)
    if not numerics:
        return None  # chapter does not apply to a dataset with no numerics.
    # Register glossary terms (if a collector is present) and mark them clickable.
    glossary = ctx.get("glossary")
    mark = False
    if isinstance(glossary, model.GlossaryCollector):
        for key, (label, definition) in _TERM_DEFS.items():
            glossary.add(key, label, definition)
        mark = True
    raw_numeric = ctx.get("raw_numeric")
    raw_numeric = raw_numeric if isinstance(raw_numeric, dict) else {}
    box_fn = _load_build_boxplot_stats()
    detect_fn = _load_detect_outliers()

    # --- Univariate summary ------------------------------------------------- #
    uni_rows = []
    for name, numeric in numerics:
        uni_rows.append(_univariate_row(
            name, numeric, raw_numeric.get(name), box_fn, detect_fn))
    # Rank columns by contamination (Tukey % when available, else z %).
    uni_rows.sort(key=lambda r: r.get("contamination", -1.0), reverse=True)
    intro = (
        "Este capítulo reúne en un solo sitio el análisis de los **valores "
        "atípicos** de la tabla, que en el resto del informe aparecen dispersos. "
        f"Un {_term(mark, 'outlier', 'atípico')} es una observación que se aparta "
        "mucho del grueso de los datos. Cada columna numérica se evalúa con dos "
        f"criterios complementarios: las {_term(mark, 'tukey_fence', 'vallas de Tukey')} "
        "(fuera de P25−1,5·IQR o P75+1,5·IQR, robusto a la propia cola) y el "
        f"{_term(mark, 'zscore', 'z-score')} (|z| > 3, sensible a la media). La "
        "tabla está ordenada de la columna más contaminada a la menos.")
    blocks = [
        model.Heading(text=CHAPTER_TITLE, level=1),
        model.Markdown(text=intro),
        _univariate_table(uni_rows),
    ]
    # Flag the most contaminated columns explicitly.
    flagged = [r["name"] for r in uni_rows
               if r.get("contamination", -1.0) > 0][:_TOP_FLAGGED]
    if flagged:
        names = ", ".join(f"**{n}**" for n in flagged)
        blocks.append(model.Markdown(text=(
            f"Las columnas con mayor proporción de atípicos son {names}: "
            "concentran el grueso de los valores fuera de las vallas y son las "
            "primeras a revisar.")))

    # --- Boxplots figure ---------------------------------------------------- #
    box_entries = [
        {"name": r["name"], "box": r["box"], "fliers": r.get("fliers")}
        for r in uni_rows
        if r.get("box")
    ][:_TOP_BOX]
    if box_entries:
        def _boxplots_make(entries=box_entries):
            try:
                from datascience.build_boxplots_figure import build_boxplots_figure
                return build_boxplots_figure(
                    entries, title="Boxplots de Tukey por columna",
                    max_boxes=_TOP_BOX)
            except Exception:  # noqa: BLE001 — minimal fallback figure.
                import matplotlib
                matplotlib.use("Agg")
                from matplotlib.figure import Figure
                fig = Figure(figsize=(5.0, 2.2))
                ax = fig.add_subplot(111)
                ax.text(0.5, 0.5, "(boxplots no disponibles)",
                        ha="center", va="center")
                ax.axis("off")
                return fig

        blocks.append(model.Group(blocks=[
            model.Heading(text="Boxplots", level=2),
            model.Markdown(text=(
                "Cada caja abarca del primer al tercer cuartil (P25–P75), la línea "
                "interior es la mediana y los bigotes llegan hasta 1,5·IQR; los "
                "puntos son los valores que caen fuera de las vallas (atípicos por "
                "Tukey).")),
            model.Figure(
                make=_boxplots_make,
                caption="Boxplots de Tukey de las columnas más contaminadas."),
        ]))

    # --- Multivariate ------------------------------------------------------- #
    outliers, _src = _resolve_multivariate(profile, ctx, raw_numeric)
    if outliers is not None:
        blocks.extend(_multivariate_blocks(outliers, raw_numeric, mark))
    else:
        blocks.append(model.Heading(text="Filas atípicas (multivariante)", level=2))
        blocks.append(model.Note(
            "No se pudo analizar la anomalía multivariante: hacen falta al menos "
            "dos columnas numéricas y la muestra cruda (o los modelos del perfil) "
            "para correr Isolation Forest."))

    # --- Interpretation ----------------------------------------------------- #
    blocks.append(model.Heading(text="Cómo interpretar los atípicos", level=2))
    blocks.append(_interpretation_block(mark))
    return model.Chapter(id=CHAPTER_ID, title=CHAPTER_TITLE,
                         version=CHAPTER_VERSION, blocks=blocks)