feat(eda): generadores de datasets sintéticos Faker que ejercitan el AutomaticEDA
Añade dos funciones impuras dict-no-throw, deterministas por seed, al dominio datascience (grupo eda): - generate_synthetic_eda_table: una tabla DuckDB de 19 columnas (numéricas correlacionadas + outliers, categóricas desbalanceadas, texto largo multi-idioma es/en/fr, fecha DATE, lat/lon válidas, PII email/iban/phone/uuid, nulos con patrón MCAR/MAR co-ocurrentes). Activa 14 capítulos del motor AutomaticEDA (num_distr, cat_distr, text_distr, calidad, missingness, correlacion, relaciones, modelos, timeseries, geospatial, agregacion, glosario + portada/overview). - generate_synthetic_eda_folder: 3 CSV relacionados (customers/orders/reviews) con FK customer detectable por containment, para el EDA de carpeta multi-tabla. Determinismo via Faker.seed_instance + numpy.default_rng. Tests: 16 passed (incluye determinismo por hash, rangos lat/lon, co-nulos income/spending, mediana palabras review >=20, phone formato internacional, FK containment). Añade faker (40.27.0) a python/pyproject.toml + uv.lock. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -77,8 +77,12 @@ from .add_pdf_internal_links import add_pdf_internal_links
|
||||
from .suggest_intratable_fk_candidates import suggest_intratable_fk_candidates
|
||||
from .render_paper_pdf import render_paper_pdf
|
||||
from .draw_join_graph_figure import draw_join_graph_figure
|
||||
from .generate_synthetic_eda_table import generate_synthetic_eda_table
|
||||
from .generate_synthetic_eda_folder import generate_synthetic_eda_folder
|
||||
|
||||
__all__ = [
|
||||
"generate_synthetic_eda_table",
|
||||
"generate_synthetic_eda_folder",
|
||||
"render_paper_pdf",
|
||||
"draw_join_graph_figure",
|
||||
"suggest_intratable_fk_candidates",
|
||||
|
||||
Reference in New Issue
Block a user