chore: auto-commit (26 files)

- python/functions/bigquery/bq_auth.md
- python/functions/bigquery/bq_load_from_file.md
- python/functions/bigquery/bq_load_from_gcs.md
- python/functions/bigquery/client.py
- python/functions/bigquery/queries.py
- python/functions/datascience/__init__.py
- python/functions/datascience/decode_qr_image.py
- python/functions/datascience/load_bq_table_to_duckdb.md
- python/functions/datascience/load_bq_table_to_duckdb.py
- python/functions/pipelines/profile_bq_table.md
- ...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-07-02 19:00:13 +02:00
parent 2ebc9efeb2
commit 5a4f82cf76
26 changed files with 2573 additions and 94 deletions
@@ -3,11 +3,11 @@ name: bq_auth
 kind: function
 lang: py
 domain: infra
-version: "1.0.0"
+version: "1.1.0"
 purity: impure
-signature: "def bq_auth(project_id: str = '', credentials_path: str = '') -> BQClient"
-description: "Authenticates against Google BigQuery using ADC or a service account JSON. Returns a BQClient ready for use with all CRUD functions."
-tags: [bigquery, gcp, auth, google-cloud, python, pendiente-usar]
+signature: "def bq_auth(project_id: str = '', credentials_path: str = '', drop_quota_project: bool = False) -> BQClient"
+description: "Authenticates against Google BigQuery using ADC or a service account JSON. Returns a BQClient ready for use with all CRUD functions. With drop_quota_project=True it drops the quota project from the user's ADC (creds.with_quota_project(None)) to avoid the 403 USER_PROJECT_DENIED error raised when the ADC carries a quota_project_id from a different project."
+tags: [bigquery, gcp, auth, google-cloud, python, forecast, pendiente-usar]
 uses_functions: []
 uses_types: []
 returns: []
@@ -19,6 +19,8 @@ params:
    desc: "GCP project ID (empty = detect from credentials/environment)"
  - name: credentials_path
    desc: "path to a service account JSON file (empty = Application Default Credentials)"
+  - name: drop_quota_project
+    desc: "if True and credentials_path is empty, resolves ADC via google.auth.default and drops the ADC quota project (with_quota_project(None)); avoids the 403 USER_PROJECT_DENIED error when the user's ADC carries a quota_project_id from a different project. Default False = original behavior"
 output: "BQClient: authenticated client with the project resolved"
 tested: false
 tests: []
@@ -40,6 +42,9 @@ client = bq_auth("my-project-id")
 # Service account
 client = bq_auth(credentials_path="/path/to/service-account.json")

+# Without a quota project (avoids 403 USER_PROJECT_DENIED with user ADC)
+client = bq_auth("autingo-159109", drop_quota_project=True)
+
 # Context manager
 with bq_auth() as client:
    # client is closed automatically
@@ -48,9 +53,14 @@ with bq_auth() as client:

 ## Notes

-Three authentication modes:
+Authentication modes:
 - No arguments: uses Application Default Credentials (ADC); requires `gcloud auth application-default login`
 - With project_id: uses ADC but forces the project
 - With credentials_path: reads the service account JSON directly
+- With drop_quota_project=True (and no credentials_path): resolves ADC via `google.auth.default(scopes=[".../bigquery"])`, applies `creds.with_quota_project(None)` if the attribute exists, and builds the client with those credentials. This fixes a known gotcha: the user's ADC (`egutierrez`) carries `quota_project_id=autingo`, which belongs to a different project, so BigQuery returns `403 USER_PROJECT_DENIED`; dropping the quota project resolves it.

 The BQClient wraps `google.cloud.bigquery.Client` and exposes `_client` so the module's functions can use it internally.
+
+## Capability growth log
+
+- v1.1.0 (2026-07-02): adds `drop_quota_project` to drop the quota project from the user's ADC (`creds.with_quota_project(None)`) and avoid the 403 USER_PROJECT_DENIED error. Default False = behavior identical to the previous version.
@@ -3,7 +3,7 @@ name: bq_load_from_file
 kind: function
 lang: py
 domain: infra
-version: "1.0.0"
+version: "1.0.1"
 purity: impure
 signature: "def bq_load_from_file(client: BQClient, file_path: str, dataset_id: str, table_id: str, source_format: str = 'CSV', write_disposition: str = 'WRITE_APPEND', autodetect: bool = True, skip_leading_rows: int = 0) -> dict"
 description: "Loads data from a local file into a BigQuery table using the SDK's load_table_from_file. Equivalent to bq_load_from_gcs but for local disk."
@@ -73,3 +73,7 @@ loading from there is more efficient and allows parallelism.

 The function blocks until the job finishes (`job.result()`). Parquet and
 Avro files do not accept `skip_leading_rows`; that parameter applies only to CSV.
+
+## Capability growth log
+
+- v1.0.1 (2026-07-02), fix: `skip_leading_rows` is sent to the LoadJobConfig only when `source_format` is CSV; BigQuery rejected the job for JSON/Avro/Parquet even with a value of 0.
@@ -3,7 +3,7 @@ name: bq_load_from_gcs
 kind: function
 lang: py
 domain: infra
-version: "1.0.0"
+version: "1.0.1"
 purity: impure
 signature: "def bq_load_from_gcs(client: BQClient, uri: str | list[str], dataset_id: str, table_id: str, source_format: str = 'CSV', write_disposition: str = 'WRITE_APPEND', autodetect: bool = True, skip_leading_rows: int = 0) -> dict"
 description: "Loads data from one or more Google Cloud Storage URIs into a BigQuery table by configuring a LoadJob. Waits for the job to finish."
@@ -75,3 +75,7 @@ accepts the resulting list of files as a single atomic load.
 `autodetect=True` is convenient but can infer types incorrectly for columns
 with null or mixed values. For production, define the schema explicitly via
 `job_config.schema`.
+
+## Capability growth log
+
+- v1.0.1 (2026-07-02), fix: `skip_leading_rows` is sent to the LoadJobConfig only when `source_format` is CSV; BigQuery rejected the job for JSON/Avro/Parquet even with a value of 0.
@@ -1,6 +1,7 @@
 """Base client for Google BigQuery."""

 from dataclasses import dataclass, field
+import google.auth
 from google.cloud import bigquery
 from google.oauth2 import service_account

@@ -27,7 +28,11 @@ class BQClient:
        self.close()


-def bq_auth(project_id: str = "", credentials_path: str = "") -> BQClient:
+def bq_auth(
+    project_id: str = "",
+    credentials_path: str = "",
+    drop_quota_project: bool = False,
+) -> BQClient:
    """Authenticates against Google BigQuery.

    Three authentication modes:
@@ -35,9 +40,18 @@ def bq_auth(project_id: str = "", credentials_path: str = "") -> BQClient:
    2. Service account JSON: with credentials_path
    3. Explicit project: with project_id (uses ADC for credentials)

+    With drop_quota_project=True (and no credentials_path) it resolves the ADC
+    credentials via google.auth.default and removes the quota project set in the
+    user's ADC (creds.with_quota_project(None)). This avoids the 403
+    USER_PROJECT_DENIED error when the ADC carries a quota_project_id that does
+    not belong to the project being queried.
+
    Args:
        project_id: GCP project ID. Empty = detect from credentials.
        credentials_path: Path to a service account JSON file. Empty = ADC.
+        drop_quota_project: If True and credentials_path is empty, resolves ADC
+            with google.auth.default and drops the ADC quota project
+            (with_quota_project(None)). Default False = original behavior.

    Returns:
        Authenticated BQClient ready to use.
@@ -50,11 +64,19 @@ def bq_auth(project_id: str = "", credentials_path: str = "") -> BQClient:
        >>> client = bq_auth()  # ADC
        >>> client = bq_auth("my-project")  # ADC with explicit project
        >>> client = bq_auth(credentials_path="/path/to/sa.json")  # Service account
+        >>> client = bq_auth("autingo-159109", drop_quota_project=True)  # without quota project
    """
    if credentials_path:
        creds = service_account.Credentials.from_service_account_file(credentials_path)
        proj = project_id or creds.project_id
        client = bigquery.Client(credentials=creds, project=proj)
+    elif drop_quota_project:
+        creds, adc_project = google.auth.default(
+            scopes=["https://www.googleapis.com/auth/bigquery"]
+        )
+        if hasattr(creds, "with_quota_project"):
+            creds = creds.with_quota_project(None)
+        client = bigquery.Client(project=project_id or adc_project, credentials=creds)
    elif project_id:
        client = bigquery.Client(project=project_id)
    else:
@@ -173,11 +173,14 @@ def bq_load_from_gcs(

    job_config = bigquery.LoadJobConfig(
        source_format=format_map.get(source_format, bigquery.SourceFormat.CSV),
-        write_disposition=disposition_map.get(source_format, bigquery.WriteDisposition.WRITE_APPEND),
+        write_disposition=disposition_map.get(write_disposition, bigquery.WriteDisposition.WRITE_APPEND),
        autodetect=autodetect,
-        skip_leading_rows=skip_leading_rows,
    )
-    job_config.write_disposition = disposition_map.get(write_disposition, bigquery.WriteDisposition.WRITE_APPEND)
+    # skip_leading_rows is only valid for CSV: BigQuery rejects the job
+    # ("Only CSV imports may specify leading rows to skip") if the field is
+    # set with any other format, even to 0.
+    if source_format == "CSV":
+        job_config.skip_leading_rows = skip_leading_rows

    table_ref = client._client.dataset(dataset_id).table(table_id)
    uris = uri if isinstance(uri, list) else [uri]
@@ -251,8 +254,12 @@ def bq_load_from_file(
        source_format=format_map.get(source_format, bigquery.SourceFormat.CSV),
        write_disposition=disposition_map.get(write_disposition, bigquery.WriteDisposition.WRITE_APPEND),
        autodetect=autodetect,
-        skip_leading_rows=skip_leading_rows,
    )
+    # skip_leading_rows is only valid for CSV: BigQuery rejects the job
+    # ("Only CSV imports may specify leading rows to skip") if the field is
+    # set with any other format, even to 0.
+    if source_format == "CSV":
+        job_config.skip_leading_rows = skip_leading_rows

    table_ref = client._client.dataset(dataset_id).table(table_id)