feat: add BigQuery Python functions and BQClient type

Complete CRUD functions for BigQuery: auth, datasets, tables, queries, jobs, routines, load/export. Adds the BQClient type as a wrapper around the official SDK.
2026-04-07 18:45:02 +02:00
parent c311623a76
commit 690e68a542
33 changed files with 2720 additions and 0 deletions
@@ -0,0 +1,75 @@
+---
+name: bq_load_from_file
+kind: function
+lang: py
+domain: infra
+version: "1.0.0"
+purity: impure
+signature: "def bq_load_from_file(client: BQClient, file_path: str, dataset_id: str, table_id: str, source_format: str = 'CSV', write_disposition: str = 'WRITE_APPEND', autodetect: bool = True, skip_leading_rows: int = 0) -> dict"
+description: "Loads data from a local file into a BigQuery table using the SDK's load_table_from_file. Equivalent to bq_load_from_gcs, but for local disk."
+tags: [bigquery, gcp, load, file, google-cloud, python, etl]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_go_core"
+imports: [google-cloud-bigquery]
+params:
+  - name: client
+    desc: "BQClient authenticated against the GCP project"
+  - name: file_path
+    desc: "absolute or relative path to the local file to load"
+  - name: dataset_id
+    desc: "ID of the destination dataset in BigQuery"
+  - name: table_id
+    desc: "ID of the destination table in BigQuery"
+  - name: source_format
+    desc: "format of the source file: CSV, NEWLINE_DELIMITED_JSON, AVRO, PARQUET, ORC"
+  - name: write_disposition
+    desc: "behavior when the table already exists: WRITE_APPEND appends, WRITE_TRUNCATE replaces, WRITE_EMPTY fails if the table contains data"
+  - name: autodetect
+    desc: "if True, BigQuery infers the schema automatically from the data"
+  - name: skip_leading_rows
+    desc: "number of rows to skip at the start of the file (typically 1 to skip a CSV header)"
+output: "dict with {job_id: ID of the LoadJob, rows_loaded: number of rows loaded, status: DONE or FAILED}"
+tested: false
+tests: []
+test_file_path: ""
+file_path: "python/functions/bigquery/queries.py"
+---
+
+## Example
+
+```python
+from bigquery.client import bq_auth
+from bigquery.queries import bq_load_from_file
+
+client = bq_auth("my-project")
+
+# Load a local CSV file that has a header row
+result = bq_load_from_file(
+    client,
+    "/tmp/export_users.csv",
+    "my_dataset", "users",
+    skip_leading_rows=1,
+    write_disposition="WRITE_TRUNCATE",
+)
+print(f"Loaded {result['rows_loaded']} rows, job: {result['job_id']}")
+
+# Load a local JSONL file
+result = bq_load_from_file(
+    client,
+    "/data/events.jsonl",
+    "my_dataset", "events",
+    source_format="NEWLINE_DELIMITED_JSON",
+)
+```
+
+## Notes
+
+The file is opened in binary mode (`rb`) and uploaded directly to the BigQuery job.
+For very large files, prefer `bq_load_from_gcs`: uploading to GCS first and then
+loading from there is more efficient and allows parallelism.
+
+The function blocks until the job finishes (`job.result()`). Parquet and Avro files
+do not support `skip_leading_rows`; that parameter only applies to CSV.
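
The notes above describe three behaviors: opening the file in binary mode, blocking on `job.result()`, and applying `skip_leading_rows` only to CSV. The following is a minimal sketch of how they could fit together, not the code added in this commit. The names `build_load_options` and `load_local_file` are hypothetical. The sketch also assumes the caller passes a raw `google.cloud.bigquery.Client`, because the shape of the `BQClient` wrapper is not shown here, and that silently dropping `skip_leading_rows` for non-CSV formats is acceptable (raising an error would be an equally reasonable choice).

```python
# Values taken from the parameter descriptions in the front matter above.
_FORMATS = {"CSV", "NEWLINE_DELIMITED_JSON", "AVRO", "PARQUET", "ORC"}
_DISPOSITIONS = {"WRITE_APPEND", "WRITE_TRUNCATE", "WRITE_EMPTY"}


def build_load_options(source_format: str = "CSV",
                       write_disposition: str = "WRITE_APPEND",
                       autodetect: bool = True,
                       skip_leading_rows: int = 0) -> dict:
    """Hypothetical helper: validate arguments and return LoadJobConfig kwargs."""
    if source_format not in _FORMATS:
        raise ValueError(f"unsupported source_format: {source_format!r}")
    if write_disposition not in _DISPOSITIONS:
        raise ValueError(f"unsupported write_disposition: {write_disposition!r}")
    if skip_leading_rows < 0:
        raise ValueError("skip_leading_rows must be >= 0")

    options = {
        "source_format": source_format,
        "write_disposition": write_disposition,
        "autodetect": autodetect,
    }
    # skip_leading_rows only applies to CSV, so it is omitted for other formats.
    if source_format == "CSV":
        options["skip_leading_rows"] = skip_leading_rows
    return options


def load_local_file(sdk_client, file_path: str, dataset_id: str,
                    table_id: str, **load_kwargs) -> dict:
    """Hypothetical sketch of the load step; sdk_client is assumed to be a
    google.cloud.bigquery.Client, not the BQClient wrapper."""
    # Imported lazily so the validation helper works without the SDK installed.
    from google.cloud import bigquery

    job_config = bigquery.LoadJobConfig(**build_load_options(**load_kwargs))
    destination = f"{sdk_client.project}.{dataset_id}.{table_id}"

    # Binary mode, as described in the notes; the upload happens inside this call.
    with open(file_path, "rb") as fh:
        job = sdk_client.load_table_from_file(
            fh, destination, job_config=job_config
        )

    job.result()  # Blocks until the load job finishes; raises if it failed.
    return {
        "job_id": job.job_id,
        "rows_loaded": job.output_rows,
        "status": job.state,
    }
```

Keeping validation in a separate pure function means the CSV-only rule can be checked without credentials or network access, while the impure upload stays in one small function.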