feat(eda): generadores de datasets sintéticos Faker que ejercitan el AutomaticEDA

Añade dos funciones impuras dict-no-throw, deterministas por seed, al dominio
datascience (grupo eda):

- generate_synthetic_eda_table: una tabla DuckDB de 19 columnas (numéricas
  correlacionadas + outliers, categóricas desbalanceadas, texto largo
  multi-idioma es/en/fr, fecha DATE, lat/lon válidas, PII email/iban/phone/uuid,
  nulos con patrón MCAR/MAR co-ocurrentes). Activa 14 capítulos del motor
  AutomaticEDA (num_distr, cat_distr, text_distr, calidad, missingness,
  correlacion, relaciones, modelos, timeseries, geospatial, agregacion,
  glosario + portada/overview).
- generate_synthetic_eda_folder: 3 CSV relacionados (customers/orders/reviews)
  con FK customer detectable por containment, para el EDA de carpeta multi-tabla.

Determinismo via Faker.seed_instance + numpy.default_rng. Tests: 16 passed
(incluye determinismo por hash, rangos lat/lon, co-nulos income/spending,
mediana palabras review >=20, phone formato internacional, FK containment).

Añade faker (40.27.0) a python/pyproject.toml + uv.lock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-30 21:25:31 +02:00
parent 792b890195
commit ea6678ec23
9 changed files with 872 additions and 0 deletions
+4
View File
@@ -77,8 +77,12 @@ from .add_pdf_internal_links import add_pdf_internal_links
from .suggest_intratable_fk_candidates import suggest_intratable_fk_candidates
from .render_paper_pdf import render_paper_pdf
from .draw_join_graph_figure import draw_join_graph_figure
from .generate_synthetic_eda_table import generate_synthetic_eda_table
from .generate_synthetic_eda_folder import generate_synthetic_eda_folder
__all__ = [
"generate_synthetic_eda_table",
"generate_synthetic_eda_folder",
"render_paper_pdf",
"draw_join_graph_figure",
"suggest_intratable_fk_candidates",