| name | kind | lang | domain | version | purity | signature | description | tags | uses_functions | uses_types | returns | returns_optional | error_type | imports | params | output | tested | tests | test_file_path | file_path |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bq_load_from_gcs | function | py | infra | 1.0.1 | impure | def bq_load_from_gcs(client: BQClient, uri: str \| list[str], dataset_id: str, table_id: str, source_format: str = 'CSV', write_disposition: str = 'WRITE_APPEND', autodetect: bool = True, skip_leading_rows: int = 0) -> dict | Loads data from one or more Google Cloud Storage URIs into a BigQuery table by configuring a LoadJob. Waits for the job to finish. | | | | | false | error_go_core | | | dict with {job_id: ID of the LoadJob, rows_loaded: number of rows loaded, status: DONE or FAILED} | false | | | python/functions/bigquery/queries.py |
Example
from bigquery.client import bq_auth
from bigquery.queries import bq_load_from_gcs
client = bq_auth("my-project")
# Load a single CSV file that has a header row
result = bq_load_from_gcs(
client,
"gs://my-bucket/data/users_2024.csv",
"my_dataset", "users",
skip_leading_rows=1,
)
print(f"Loaded {result['rows_loaded']} rows, job: {result['job_id']}")
# Load several Parquet files, replacing the table contents
result = bq_load_from_gcs(
client,
["gs://my-bucket/export/part_001.parquet", "gs://my-bucket/export/part_002.parquet"],
"my_dataset", "events",
source_format="PARQUET",
write_disposition="WRITE_TRUNCATE",
)
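The example reads `job_id` and `rows_loaded` from the returned dict, but this page does not show how that dict is built. A minimal sketch of one way to assemble it, assuming the underlying job behaves like `google.cloud.bigquery.LoadJob` (`result()` blocks and raises on failure, `job_id` and `output_rows` describe the finished job). `summarize_load_job` and `FakeJob` are hypothetical names used only for illustration, and mapping an exception to `FAILED` is an assumed policy, not necessarily what `bq_load_from_gcs` does.

```python
def summarize_load_job(job) -> dict:
    """Wait for a load job and return a small summary dict.

    `job` is assumed to behave like google.cloud.bigquery.LoadJob:
    result() blocks until the job finishes and raises on failure.
    """
    try:
        job.result()  # blocks until BigQuery reports the job as finished
        status = "DONE"
    except Exception:  # assumed policy: report the failure instead of re-raising
        status = "FAILED"
    return {
        "job_id": job.job_id,
        "rows_loaded": job.output_rows or 0,  # output_rows may be None on failure
        "status": status,
    }


class FakeJob:
    """Stand-in for a LoadJob so the sketch runs without BigQuery access."""

    job_id = "job_123"
    output_rows = 42

    def result(self):
        return self


print(summarize_load_job(FakeJob()))
# {'job_id': 'job_123', 'rows_loaded': 42, 'status': 'DONE'}
```

Keeping the summary step separate from the job submission makes it easy to exercise with a stand-in object, as done here.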
Notes
The load job runs asynchronously in BigQuery; job.result() blocks until it completes.
Wildcards in the URI (gs://bucket/prefix_*.csv) are expanded by BigQuery against the objects in GCS;
all matching files are loaded as a single atomic load job.
autodetect=True is convenient, but it can infer the wrong types for columns
with null or mixed values. For production, define the schema explicitly via
job_config.schema.
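An explicit schema along the lines of the last note could look like the sketch below. It assumes the `google-cloud-bigquery` package; the column list and the helper names `schema_fields_api_repr` and `explicit_csv_load_config` are hypothetical. The documented `bq_load_from_gcs` signature has no schema parameter, so a config like this would have to be wired in separately.

```python
def schema_fields_api_repr(columns):
    """Convert (name, type, mode) tuples into BigQuery REST-style field dicts."""
    return [
        {"name": name, "type": type_, "mode": mode}
        for name, type_, mode in columns
    ]


def explicit_csv_load_config(columns, skip_leading_rows=1):
    """Build a LoadJobConfig with a fixed schema instead of autodetect."""
    # Imported lazily so the pure helper above works without the library installed.
    from google.cloud import bigquery

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=skip_leading_rows,
        autodetect=False,
    )
    job_config.schema = [
        bigquery.SchemaField.from_api_repr(field)
        for field in schema_fields_api_repr(columns)
    ]
    return job_config


# Hypothetical columns for the users table from the example.
USERS_COLUMNS = [
    ("user_id", "INTEGER", "REQUIRED"),
    ("email", "STRING", "NULLABLE"),
    ("signup_date", "DATE", "NULLABLE"),
]

print(schema_fields_api_repr(USERS_COLUMNS)[0])
# {'name': 'user_id', 'type': 'INTEGER', 'mode': 'REQUIRED'}
```

Declaring the types up front means a column that happens to be all null in one file is still loaded with its declared type.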
Capability growth log
- v1.0.1 (2026-07-02), fix: skip_leading_rows is only sent to the LoadJobConfig when source_format is CSV; BigQuery rejected the job for JSON/Avro/Parquet even with a value of 0.
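The v1.0.1 fix can be sketched as a small helper that only adds the CSV-specific option when the format is CSV. This is an illustrative reconstruction, not the code in queries.py; `load_config_kwargs` is a hypothetical name, and the resulting dict is assumed to be passed on as `bigquery.LoadJobConfig(**kwargs)`.

```python
def load_config_kwargs(
    source_format="CSV",
    write_disposition="WRITE_APPEND",
    autodetect=True,
    skip_leading_rows=0,
):
    """Collect LoadJobConfig keyword arguments, adding CSV-only options conditionally."""
    kwargs = {
        "source_format": source_format,
        "write_disposition": write_disposition,
        "autodetect": autodetect,
    }
    # skip_leading_rows is a CSV option; leave it out entirely for other
    # formats, even when its value is 0.
    if source_format.upper() == "CSV":
        kwargs["skip_leading_rows"] = skip_leading_rows
    return kwargs


print(load_config_kwargs("PARQUET", write_disposition="WRITE_TRUNCATE"))
# {'source_format': 'PARQUET', 'write_disposition': 'WRITE_TRUNCATE', 'autodetect': True}
print(load_config_kwargs("CSV", skip_leading_rows=1))
# {'source_format': 'CSV', 'write_disposition': 'WRITE_APPEND', 'autodetect': True, 'skip_leading_rows': 1}
```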