fn_registry/python/functions/datascience/build_eda_notebook.md at e4a36f1133c5196a952db5c13b6cce51e4202b20


egutierrez 763e06c127 feat(browser): auto-commit con 178 cambios

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-06-20 18:22:23 +02:00

3.5 KiB


name: build_eda_notebook
kind: function
lang:
domain: datascience
version: 1.0.0
purity: impure
signature: def build_eda_notebook(db_path: str, table: str, notebook_path: str, run_models: bool = False, run_llm: bool = False) -> dict
description: Generates a Jupyter EDA notebook (nbformat v4) for a DuckDB table using the eda group. Writes the .ipynb to disk, ready to open or run; it does not execute the notebook. dict-no-throw.
tags: eda, notebook, jupyter, datascience, duckdb, profiling
uses_functions:
uses_types:
returns:
returns_optional: false
error_type: error_go_core
imports: json
params:
name	desc
db_path	Path to the DuckDB file that contains the table to profile. It is referenced inside the notebook; this function does not open it.
table	Name of the DuckDB table to profile.
notebook_path	Output path for the .ipynb. The parent directory is created if it does not exist.
run_models	If True, adds a cell with prof['models'] (PCA explained_variance_ratio, kmeans best_k, outliers n_outliers) and passes run_models=True to profile_table inside the notebook. Default False.
run_llm	If True, adds a cell that calls eda_llm_insights(prof) for LLM-generated insights. Default False.
output: dict. On success {status:'ok', notebook_path:str, n_cells:int}. On error {status:'error', error:str}.
tested: true
tests:
generates notebook ok
notebook is valid nbformat JSON
run_models adds the models cell
run_llm adds the insights cell
no flags adds no optional cells
creates parent directory
test_file_path: python/functions/datascience/build_eda_notebook_test.py
file_path: python/functions/datascience/build_eda_notebook.py

Example

import sys, os
sys.path.insert(0, os.path.join("python", "functions"))
from datascience.build_eda_notebook import build_eda_notebook

r = build_eda_notebook(
    db_path="/home/enmanuel/data/ventas.duckdb",
    table="cubo_ventas",
    notebook_path="/tmp/eda_demo.ipynb",
    run_models=True,
    run_llm=False,
)
# {'status': 'ok', 'notebook_path': '/tmp/eda_demo.ipynb', 'n_cells': 10}
# The notebook is opened/run in Jupyter afterwards; this step only writes the .ipynb.
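
Because the function is dict-no-throw, callers branch on `status` instead of catching exceptions. The following is a minimal sketch that assumes only the documented return shape; `summarize_result` and `count_cells_on_disk` are hypothetical helpers for illustration, not part of the registry.

```python
import json


def summarize_result(r: dict) -> str:
    """Turn a build_eda_notebook-style result dict into a one-line message."""
    if r.get("status") == "ok":
        return f"wrote {r['notebook_path']} ({r['n_cells']} cells)"
    # Error branch: the documented shape is {status: 'error', error: str}.
    return f"failed: {r.get('error', 'unknown error')}"


def count_cells_on_disk(notebook_path: str) -> int:
    """Read the .ipynb as plain JSON and count its cells (nbformat v4 layout)."""
    with open(notebook_path, encoding="utf-8") as fh:
        nb = json.load(fh)
    if nb.get("nbformat") != 4:
        raise ValueError("not an nbformat v4 notebook")
    return len(nb["cells"])
```

Comparing `count_cells_on_disk(r["notebook_path"])` with `r["n_cells"]` is one cheap way to confirm that the file on disk matches what the call reported, without executing the notebook.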

When to use it

Use it when you want to deliver an EDA as an executable notebook rather than a static report: profile a DuckDB table with the eda group and leave a ready-to-run .ipynb. The notebook is then launched in collaborative Jupyter with the functions of the notebook group (jupyter_discover / jupyter_exec / jupyter_write), and the user watches it execute live. It is the basis of the "analysis EDA" deliverable.

Gotchas

Impure: writes an .ipynb file to notebook_path (and creates the parent directory).
Does NOT execute the notebook: it only emits the cells. Jupyter runs them afterwards.
The cells assume that the registry's python/functions directory is reachable from the kernel: the analysis's 00_fn_registry.py startup script exposes it, or, as a fallback, the first cell inserts ~/fn_registry/python/functions into sys.path. If the repo is not there and the kernel does not expose it, the import cells will fail when executed (not when generated).
profile_table is invoked with write_report=False inside the notebook: it writes no report to disk, and the profile lives in the variable prof.
run_llm=True emits a cell that calls eda_llm_insights, which requires a Claude OAuth token available to the kernel; without it, that cell will fail when executed.
dict-no-throw: any write failure is returned as {status:'error', error}; no exception is propagated.
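
The write-side behaviour described above (create the parent directory, emit nbformat v4 JSON, return an error dict instead of raising) could be sketched roughly as follows. This is an illustration built only on the standard library, not the registry's actual implementation; `write_notebook_sketch` and `FALLBACK_CELL` are hypothetical names, and the real function emits EDA-specific cells that are not reproduced here.

```python
import json
import os

# Hypothetical first cell mirroring the documented sys.path fallback.
FALLBACK_CELL = (
    "import os, sys\n"
    "sys.path.insert(0, os.path.expanduser('~/fn_registry/python/functions'))\n"
)


def write_notebook_sketch(notebook_path: str, sources: list) -> dict:
    """Write code cells to an nbformat v4 .ipynb; never raise, always return a dict."""
    try:
        cells = [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": src,
            }
            for src in sources
        ]
        nb = {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
        # Create the parent directory if it is missing.
        parent = os.path.dirname(os.path.abspath(notebook_path))
        os.makedirs(parent, exist_ok=True)
        with open(notebook_path, "w", encoding="utf-8") as fh:
            json.dump(nb, fh, indent=1)
        return {"status": "ok", "notebook_path": notebook_path, "n_cells": len(cells)}
    except Exception as exc:
        # dict-no-throw: report the failure instead of propagating it.
        return {"status": "error", "error": str(exc)}
```

Note that nothing in this sketch runs the cell sources, which is why a missing registry path or a missing OAuth token only surfaces later, when the kernel executes the notebook.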
