egutierrez a03675113a chore: auto-commit (286 files)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-16 16:33:22 +02:00

8.7 KiB

Capability: bigquery

Operates Google BigQuery through the Python SDK google-cloud-bigquery and the Go function fetch_data_frame. Covers: auth (bq_auth), queries with caching, dataset/table management (create/copy/delete/get/list), jobs (cancel, wait), schemas, exports to GCS, and loading into DataFrames (fetch_data_frame, polars).

Functions

| ID | Signature | What it does |
|---|---|---|
| `bq_auth_py_infra` | `def bq_auth(project_id: str = '', credentials_path: str = '') -> BQClient` | Authenticates against Google BigQuery with ADC or a service account JSON. Returns a BQClient ready to use with all the CRUD functions. |
| `bq_cancel_job_py_infra` | `def bq_cancel_job(client: BQClient, job_id: str) -> dict` | Cancels a running job. Returns the job state after cancellation. |
| `bq_copy_table_py_infra` | `def bq_copy_table(client: BQClient, source_dataset: str, source_table: str, dest_dataset: str, dest_table: str, write_disposition: str = 'WRITE_EMPTY') -> dict` | Copies a BigQuery table to another dataset or table within the same project using the SDK's copy_table. Waits for the CopyJob to finish. |
| `bq_create_dataset_py_infra` | `def bq_create_dataset(client: BQClient, dataset_id: str, location: str = 'US', description: str = '', labels: dict[str, str] \| None = None, default_table_expiration_ms: int = 0) -> dict` | Creates a dataset in Google BigQuery with location, description, and labels. Uses client._client.create_dataset() from the official SDK. |
| `bq_create_routine_py_infra` | `def bq_create_routine(client: BQClient, dataset_id: str, routine_id: str, body: str, routine_type: str = 'SCALAR_FUNCTION', language: str = 'SQL', arguments: list[dict] \| None = None, return_type: str = '', description: str = '') -> dict` | Creates a routine (scalar UDF, table function, or stored procedure) in BigQuery. Supports SQL, JavaScript, and Python. |
| `bq_create_table_py_infra` | `def bq_create_table(client: BQClient, dataset_id: str, table_id: str, schema: list[dict], partitioning: dict \| None = None, clustering: list[str] \| None = None, description: str = '', labels: dict \| None = None) -> dict` | Creates a table in BigQuery with a schema, optional partitioning, and clustering. Uses client._client.create_table() from the official SDK. |
| `bq_delete_dataset_py_infra` | `def bq_delete_dataset(client: BQClient, dataset_id: str, delete_contents: bool = False) -> None` | Deletes a dataset from Google BigQuery. IRREVERSIBLE. Uses client._client.delete_dataset() from the official SDK. |
| `bq_delete_routine_py_infra` | `def bq_delete_routine(client: BQClient, dataset_id: str, routine_id: str) -> None` | Deletes a routine from a dataset. |
| `bq_delete_table_py_infra` | `def bq_delete_table(client: BQClient, dataset_id: str, table_id: str) -> None` | Permanently deletes a table from BigQuery. IRREVERSIBLE. Uses client._client.delete_table() from the official SDK. |
| `bq_export_to_gcs_py_infra` | `def bq_export_to_gcs(client: BQClient, dataset_id: str, table_id: str, destination_uri: str, destination_format: str = 'CSV', compression: str = 'NONE') -> dict` | Exports a BigQuery table to Google Cloud Storage using the SDK's extract_table. Supports CSV, JSON, Avro, and Parquet with optional compression. |
| `bq_get_dataset_py_infra` | `def bq_get_dataset(client: BQClient, dataset_id: str) -> dict` | Gets the full details of a Google BigQuery dataset. Uses client._client.get_dataset() from the official SDK. |
| `bq_get_job_py_infra` | `def bq_get_job(client: BQClient, job_id: str) -> dict` | Gets the full details of a job by its ID, including state, bytes processed, and errors. Includes fields beyond those of bq_list_jobs: destination_table, query, and the list of errors. |
| `bq_get_table_py_infra` | `def bq_get_table(client: BQClient, dataset_id: str, table_id: str) -> dict` | Gets the full metadata of a BigQuery table, including schema, statistics, and configuration. Uses client._client.get_table() from the official SDK. |
| `bq_insert_rows_py_infra` | `def bq_insert_rows(client: BQClient, dataset_id: str, table_id: str, rows: list[dict]) -> dict` | Inserts rows into a BigQuery table using streaming inserts (insert_rows_json). Returns the count of inserted rows and the per-row errors. |
| `bq_list_datasets_py_infra` | `def bq_list_datasets(client: BQClient) -> list[dict]` | Lists all datasets in the Google BigQuery project. Uses client._client.list_datasets() from the official SDK. |
| `bq_list_jobs_py_infra` | `def bq_list_jobs(client: BQClient, state_filter: str = '', max_results: int = 50, all_users: bool = False) -> list[dict]` | Lists the project's jobs, filtered by state (running, pending, done). Returns a list of flat dicts with each job's metadata. |
| `bq_list_routines_py_infra` | `def bq_list_routines(client: BQClient, dataset_id: str) -> list[dict]` | Lists the routines in a dataset, including type and language. |
| `bq_list_tables_py_infra` | `def bq_list_tables(client: BQClient, dataset_id: str) -> list[dict]` | Lists all tables (and views) in a BigQuery dataset with summary information. Uses client._client.list_tables() from the official SDK. |
| `bq_load_from_file_py_infra` | `def bq_load_from_file(client: BQClient, file_path: str, dataset_id: str, table_id: str, source_format: str = 'CSV', write_disposition: str = 'WRITE_APPEND', autodetect: bool = True, skip_leading_rows: int = 0) -> dict` | Loads data from a local file into a BigQuery table using the SDK's load_table_from_file. Equivalent to bq_load_from_gcs but for local disk. |
| `bq_load_from_gcs_py_infra` | `def bq_load_from_gcs(client: BQClient, uri: str \| list[str], dataset_id: str, table_id: str, source_format: str = 'CSV', write_disposition: str = 'WRITE_APPEND', autodetect: bool = True, skip_leading_rows: int = 0) -> dict` | Loads data from one or more Google Cloud Storage URIs into a BigQuery table by configuring a LoadJob. Waits for the job to finish. |
| `bq_preview_rows_py_infra` | `def bq_preview_rows(client: BQClient, dataset_id: str, table_id: str, max_results: int = 10) -> dict` | Fetches a sample of rows from a BigQuery table without running a SQL query, at no processing cost. Uses client._client.list_rows() from the official SDK. |
| `bq_query_py_infra` | `def bq_query(client: BQClient, sql: str, params: list[dict] \| None = None, dry_run: bool = False) -> dict` | Runs a SQL query in BigQuery with support for typed parameters and a dry-run mode for cost estimation. |
| `bq_update_dataset_py_infra` | `def bq_update_dataset(client: BQClient, dataset_id: str, description: str \| None = None, labels: dict[str, str] \| None = None, default_table_expiration_ms: int \| None = None) -> dict` | Updates fields of a Google BigQuery dataset. Only modifies the fields passed explicitly (non-None). Uses client._client.update_dataset() from the official SDK. |
| `bq_update_table_py_infra` | `def bq_update_table(client: BQClient, dataset_id: str, table_id: str, schema: list[dict] \| None = None, description: str \| None = None, labels: dict \| None = None) -> dict` | Updates the metadata of a BigQuery table: schema (adding columns only), description, and labels. Uses client._client.update_table() from the official SDK. |
| `fetch_data_frame_go_datascience` | `func FetchDataFrame(dsn, query string) ([]map[string]any, error)` | Runs a SQL query against a DSN and returns the results as a slice of column-to-value maps. |
| `metabase_smartscalar_kpi_sql_py_infra` | `def metabase_smartscalar_kpi_sql(*, act_expr: str, n1_expr: str, body_sql: str, date_expr: str = 'MIN(fecha)') -> str` | Wraps current + n-1 aggregations in the 2-row (periodo, valor) pattern that the smartscalar display of Metabase v0.59 requires to show a comparison against the previous year without asking for a temporal breakout. Generates native BigQuery SQL with UNION ALL over d_min/d_min-52w. |
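
The "only explicitly passed (non-None) fields" behaviour described for `bq_update_dataset` can be sketched as below. This is a minimal illustration, not the registry's implementation: `changed_fields` is a hypothetical helper, and it assumes the selected names are what would be handed to the SDK's `Client.update_dataset(dataset, fields)`, which patches only the listed fields.

```python
from __future__ import annotations


def changed_fields(
    description: str | None = None,
    labels: dict[str, str] | None = None,
    default_table_expiration_ms: int | None = None,
) -> dict[str, object]:
    """Return only the arguments the caller set explicitly (hypothetical helper).

    None means "leave unchanged", so an empty string or an empty dict still
    counts as a deliberate update.
    """
    candidates = {
        "description": description,
        "labels": labels,
        "default_table_expiration_ms": default_table_expiration_ms,
    }
    return {name: value for name, value in candidates.items() if value is not None}


# labels={} is kept (clears the labels); the expiration is left untouched.
updates = changed_fields(description="Curated analytics tables", labels={})
print(sorted(updates))  # ['description', 'labels']
```

Using None as the sentinel is why the update signatures default to `None` while the create signatures default to `''` or `0`.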

Canonical example

Query + load into polars

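import os, sys

import polars as pl

sys.path.insert(0, os.path.join(os.environ["FN_REGISTRY_ROOT"], "python", "functions"))
from infra import bq_auth, bq_query

client = bq_auth(project_id="my-gcp-project")
# bq_query returns a dict (see its signature above), not a DataFrame.
result = bq_query(client, "SELECT * FROM `my-gcp-project.dataset.table` LIMIT 1000")
# Assumption: the rows come back as a list of dicts under a "rows" key; adjust to the actual key.
df = pl.DataFrame(result["rows"])
print(df.shape)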
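
The dry-run mode of `bq_query` is described as a cost-estimation tool. With the official SDK, a dry run reports the bytes a query would scan (`total_bytes_processed` on the query job); how `bq_query` exposes that number in its returned dict is not documented here. The sketch below only covers the arithmetic. `estimate_query_cost` is a hypothetical helper, and the default price is an assumed on-demand rate that should be checked against current pricing for your region.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def estimate_query_cost(bytes_processed: int, usd_per_tib: float = 6.25) -> float:
    """Estimate on-demand cost in USD from a dry run's byte count.

    usd_per_tib is an assumed price, not a value taken from this registry.
    """
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    return bytes_processed / TIB * usd_per_tib


# 512 GiB scanned is half a TiB.
print(estimate_query_cost(512 * 1024 ** 3))  # 3.125
```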

Table management

./fn run bq_list_tables --dataset analytics
./fn run bq_create_table --dataset analytics --table events --schema-file schema.json
./fn run bq_copy_table --src analytics.events --dst analytics.events_backup
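
The contents of `schema.json` are not shown in this section. A plausible shape, assuming the file holds the same `list[dict]` that `bq_create_table` takes as `schema` and follows BigQuery's usual name/type/mode JSON layout, could be produced like this. The column names are invented for illustration and `validate_schema` is a hypothetical helper.

```python
import json

# Assumed columns for the `events` table; replace with the real ones.
schema = [
    {"name": "event_id", "type": "STRING", "mode": "REQUIRED"},
    {"name": "event_ts", "type": "TIMESTAMP", "mode": "REQUIRED"},
    {"name": "payload", "type": "JSON", "mode": "NULLABLE"},
]


def validate_schema(fields: list[dict]) -> None:
    """Fail early on missing keys or duplicate column names."""
    seen = set()
    for field in fields:
        missing = {"name", "type"} - field.keys()
        if missing:
            raise ValueError(f"field {field!r} is missing {sorted(missing)}")
        key = field["name"].lower()  # BigQuery column names are case-insensitive
        if key in seen:
            raise ValueError(f"duplicate column name: {field['name']}")
        seen.add(key)


validate_schema(schema)
schema_json = json.dumps(schema, indent=2)
print(schema_json)  # save this output as schema.json
```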

Load into a Go DataFrame (polars/pandas-go)

./fn run fetch_data_frame --query "SELECT date, sales FROM analytics.daily" --output sales.parquet
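
`FetchDataFrame` returns rows as a slice of column-to-value maps (`[]map[string]any`), which corresponds to a `list[dict]` in Python. Most DataFrame constructors prefer column-oriented input, so a pivot step is often needed. The following is a sketch of that step only; `rows_to_columns` is a hypothetical helper and the sample rows are made up.

```python
def rows_to_columns(rows: list[dict]) -> dict[str, list]:
    """Pivot row-oriented records into a column-oriented mapping.

    Keys missing from a row are filled with None so every column ends up
    with the same length.
    """
    columns: dict[str, list] = {}
    for index, row in enumerate(rows):
        # Back-fill any column that first appears in a later row.
        for name in row:
            if name not in columns:
                columns[name] = [None] * index
        for name, values in columns.items():
            values.append(row.get(name))
    return columns


rows = [
    {"date": "2026-05-01", "sales": 120},
    {"date": "2026-05-02", "sales": 95},
]
print(rows_to_columns(rows))
```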

Boundaries

Does NOT handle interactive ADC auth. Assumes credentials are supplied via a service account JSON, or that gcloud auth application-default login has already been run.
NO billing analytics. For cost/quota analysis, use the BigQuery UI or Cloud Console.
Does NOT orchestrate Dataflow or Dataproc. Only SQL queries + table ops.
Shares functions with the gcp and google-cloud tags (same functions, multi-tag).
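
Because interactive auth is out of scope, it can help to check for credentials before calling `bq_auth`. A minimal sketch, assuming the standard Application Default Credentials conventions (the `GOOGLE_APPLICATION_CREDENTIALS` variable, and the file that `gcloud auth application-default login` writes on Linux/macOS): `credential_source` is a hypothetical helper, and it ignores other sources such as a metadata server or the Windows path.

```python
from __future__ import annotations

import os
from pathlib import Path


def credential_source(env: dict[str, str], home: Path) -> str | None:
    """Report which non-interactive credential source is available, if any."""
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if key_path and Path(key_path).is_file():
        return "key_file"  # credentials file named by the environment variable
    adc_file = home / ".config" / "gcloud" / "application_default_credentials.json"
    if adc_file.is_file():
        return "gcloud_adc"  # written by `gcloud auth application-default login`
    return None


source = credential_source(dict(os.environ), Path.home())
print(source or "no credentials found; run `gcloud auth application-default login` first")
```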
