feat(eda): aggregation/OLAP functions for AutomaticEDA (groupby/pivot push-down + LLM selection)

Four new functions in the eda group that feed the AGREGACION chapter:
- select_groupby_keys (pure): picks the groupable categorical columns and the numeric measure columns from the TableProfile.
- groupby_stats_duckdb (impure): GROUP BY pushed down to DuckDB (count/mean/median/std/min/max per group).
- pivot_table_duckdb (impure): A×B pivot pushed down to DuckDB, limited to the top rows/columns so the result is not cut off.
- suggest_aggregations_llm (impure): the LLM picks the interesting aggregations, with a deterministic fallback.
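
As a rough illustration of what "GROUP BY push-down" means for groupby_stats_duckdb, the sketch below only builds the SQL text that would be sent to DuckDB, so the aggregation runs inside the engine rather than in Python. It is not the code from this commit: the names quote_ident and build_groupby_stats_sql, the output column aliases, and the choice of count(*) and stddev_samp are all assumptions made for the example.

```python
def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_groupby_stats_sql(table: str, key: str, measure: str) -> str:
    """Build one GROUP BY query computing per-group statistics.

    Hypothetical helper: the real groupby_stats_duckdb signature and
    output columns are not shown in this diff.
    """
    t, k, m = quote_ident(table), quote_ident(key), quote_ident(measure)
    return (
        f"SELECT {k} AS group_key, "
        f"count(*) AS n, "             # assumption: rows per group, not non-null measures
        f"avg({m}) AS mean, "
        f"median({m}) AS median, "
        f"stddev_samp({m}) AS std, "   # assumption: sample standard deviation
        f"min({m}) AS min, "
        f"max({m}) AS max "
        f"FROM {t} GROUP BY {k}"
    )


if __name__ == "__main__":
    # Example table and column names are invented for the demonstration.
    print(build_groupby_stats_sql("sales", "region", "amount"))
```

The resulting string could be passed to a DuckDB connection's execute method; only the aggregated rows, one per group, would then cross back into Python.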

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 15:33:55 +02:00
parent 415154d9a3
commit 96da9e3015
13 changed files with 2146 additions and 0 deletions
+6
@@ -25,6 +25,7 @@ from .describe_numeric import describe_numeric
from .summarize_categorical import summarize_categorical
from .infer_semantic_type import infer_semantic_type
from .column_quality_score import column_quality_score
from .select_groupby_keys import select_groupby_keys
from .render_eda_markdown import render_eda_markdown
from .detect_distribution_type import detect_distribution_type
from .spearman_corr import spearman_corr
@@ -36,6 +37,8 @@ from .infer_fk_containment_duckdb import infer_fk_containment_duckdb
from .build_join_graph import build_join_graph
from .association_matrix import association_matrix
from .correlation_matrix_duckdb import correlation_matrix_duckdb
from .pivot_table_duckdb import pivot_table_duckdb
from .groupby_stats_duckdb import groupby_stats_duckdb
from .pca_explained import pca_explained
from .kmeans_segments import kmeans_segments
from .isolation_forest_outliers import isolation_forest_outliers
@@ -82,6 +85,8 @@ __all__ = [
"build_join_graph",
"association_matrix",
"correlation_matrix_duckdb",
"pivot_table_duckdb",
"groupby_stats_duckdb",
"pca_explained",
"kmeans_segments",
"isolation_forest_outliers",
@@ -96,6 +101,7 @@ __all__ = [
"summarize_categorical",
"infer_semantic_type",
"column_quality_score",
"select_groupby_keys",
"render_eda_markdown",
"detect_distribution_type",
"pull_gsc_search_analytics",