chore: auto-commit (95 files)
- cmd/fn/doctor.go
- cmd/fn/main.go
- cpp/apps/primitives_gallery/playground/tables/CMakeLists.txt
- cpp/apps/primitives_gallery/playground/tables/data_table.cpp
- cpp/apps/primitives_gallery/playground/tables/data_table_logic.cpp
- cpp/apps/primitives_gallery/playground/tables/data_table_logic.h
- cpp/apps/primitives_gallery/playground/tables/self_test.cpp
- cpp/apps/primitives_gallery/playground/tables/tql.cpp
- cpp/apps/primitives_gallery/playground/tables/viz.cpp
- cpp/apps/primitives_gallery/playground/tables/viz.h
- ...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,61 @@
---
name: vault_csv_profile
kind: function
lang: py
domain: datascience
version: "1.0.0"
purity: impure
signature: "def vault_csv_profile(vault_path: str, rel_path: str, db_path: str | None = None) -> dict"
description: "Profiles a CSV in the vault: detects the encoding, reads the schema with polars, extracts n_rows and the date columns; persists to csv_profiles and updates files_fts for content search."
tags: [vault, csv, profiling, polars, encoding, datascience, fts]
uses_functions: []
uses_types: []
returns: []
returns_optional: false
error_type: "error_go_core"
imports: [sqlite3, time, pathlib, json, polars, chardet]
params:
  - name: vault_path
    desc: "Absolute path to the root of the vault where the CSV and vault_index.db live."
  - name: rel_path
    desc: "Path to the CSV relative to the vault root (e.g. 'data/raw/ventas.csv')."
  - name: db_path
    desc: "Optional override for the path to vault_index.db. Defaults to <vault_path>/vault_index.db."
output: "Dict with: rel_path (str), cols (list of {name, dtype}), n_rows (int), encoding (str), date_min/date_max (ISO yyyy-mm-dd or None), persisted (bool)."
tested: true
tests:
  - "test_csv_basic"
  - "test_csv_date_detection"
  - "test_csv_encoding_latin1"
  - "test_csv_empty"
  - "test_csv_persists_fts"
test_file_path: "python/functions/datascience/tests/test_vault_csv_profile.py"
file_path: "python/functions/datascience/vault_csv_profile.py"
---

## Example

```python
from vault_csv_profile import vault_csv_profile

result = vault_csv_profile("/vaults/mi_vault", "data/raw/ventas.csv")
# {
#   "rel_path": "data/raw/ventas.csv",
#   "cols": [{"name": "fecha", "dtype": "String"}, {"name": "importe", "dtype": "Float64"}],
#   "n_rows": 1500,
#   "encoding": "utf-8",
#   "date_min": "2023-01-01",
#   "date_max": "2023-12-31",
#   "persisted": True
# }
```
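
In the example, `fecha` is reported with dtype `String` yet `date_min` and `date_max` are filled in, which is consistent with the ≥80% rule for String columns stated in this file's notes. The following is a minimal sketch of that rule in plain Python, not the function's actual code. It assumes ISO `yyyy-mm-dd` values and that empty values are left out of the ratio, neither of which the source specifies; `profile_date_column` is a hypothetical helper name.

```python
from datetime import date


def profile_date_column(values, threshold=0.8):
    """Return (date_min, date_max) as ISO strings, or (None, None).

    Hypothetical helper: the column counts as a date column when at least
    `threshold` of its non-empty values parse as ISO dates (assumption).
    """
    non_empty = [v for v in values if v not in (None, "")]
    parsed = []
    for v in non_empty:
        try:
            parsed.append(date.fromisoformat(v))
        except ValueError:
            pass  # not a date; this value counts against the ratio
    if not non_empty or len(parsed) / len(non_empty) < threshold:
        return None, None
    return min(parsed).isoformat(), max(parsed).isoformat()


fechas = ["2023-01-01", "2023-06-15", "2023-12-31", "2023-03-02", "n/a"]
print(profile_date_column(fechas))                    # 4 of 5 values parse
print(profile_date_column(["a", "b", "2023-01-01"]))  # 1 of 3 values parse
```

With four of five values parseable the column sits exactly on the threshold and is accepted; the second call falls below it and yields `(None, None)`.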

## Notes

- Uses polars (lazy scan) as the primary engine, with pandas as a fallback.
- Encoding detection: chardet with confidence >= 0.6, then fallback attempts in the order utf-8-sig → utf-8 → latin-1 → cp1252.
- Date detection: native polars Date/Datetime columns, or String columns in which ≥80% of the values parse as dates.
- The FTS text consists of the column names plus the first 5 rows, concatenated.
- Upsert into csv_profiles keyed by rel_path; the files_fts rowid is anchored to the rowid of the files table so that vault_search works correctly.
- If vault_index.db does not exist, the function returns the dict without attempting to persist (persisted=False).
- Dependencies: polars, chardet (both installed in python/.venv with uv add).
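
The encoding cascade from the notes can be sketched roughly as below. This is an illustration under assumptions, not the function's code: `pick_encoding` is a hypothetical name, the detector is injected as a callable shaped like `chardet.detect` (which returns a dict with `encoding` and `confidence` keys) so the block runs without chardet installed, and validating the detector's guess by decoding is a choice made for this sketch.

```python
FALLBACK_ENCODINGS = ("utf-8-sig", "utf-8", "latin-1", "cp1252")


def pick_encoding(raw, detector=None, min_confidence=0.6):
    """Choose an encoding name for `raw` bytes (hypothetical helper).

    `detector` is any callable with the shape of chardet.detect; pass
    chardet.detect itself when chardet is available.
    """
    if detector is not None:
        guess = detector(raw)
        name = guess.get("encoding")
        if name and (guess.get("confidence") or 0.0) >= min_confidence:
            try:
                raw.decode(name)
                return name
            except (UnicodeDecodeError, LookupError):
                pass  # wrong guess or unknown codec name: use the fallbacks
    for name in FALLBACK_ENCODINGS:
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None  # not reached in practice: latin-1 decodes any byte string


print(pick_encoding("año".encode("utf-8")))    # utf-8-sig
print(pick_encoding("año".encode("latin-1")))  # latin-1
```

Two properties of this ordering are worth knowing. Python's `utf-8-sig` codec also decodes UTF-8 without a BOM, and `latin-1` accepts every byte, so a literal cascade like this one reports `utf-8-sig` for any valid UTF-8 input and never reaches `cp1252`. The real function may normalise the reported name differently.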
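
As a rough illustration of how the FTS text described in the notes could be assembled, the sketch below joins the header and the first five data rows using only the standard library. `build_fts_text` is a hypothetical helper, and joining with single spaces is an assumption: the source only says the pieces are concatenated.

```python
import csv
import io


def build_fts_text(csv_text, n_rows=5):
    """Join the column names and the first `n_rows` data rows into one string."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields no header
    parts = list(header)
    for i, row in enumerate(reader):
        if i >= n_rows:
            break
        parts.extend(row)
    # Skip empty cells so the result has no doubled separators.
    return " ".join(p for p in parts if p)


sample = "fecha,importe\n2023-01-01,10.5\n2023-01-02,7.0\n"
print(build_fts_text(sample))
# fecha importe 2023-01-01 10.5 2023-01-02 7.0
```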
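
The persistence behaviour in the notes (upsert by rel_path, FTS row sharing the rowid of files, and `persisted=False` when the index is missing) might look roughly like this. It is a sketch under stated assumptions: only the table names `csv_profiles`, `files` and `files_fts` come from the source, while every column name, the `DEMO_SCHEMA` string and the `persist_profile` helper are hypothetical. It also assumes the local SQLite build includes FTS5.

```python
import json
import sqlite3
import tempfile
import time
from pathlib import Path

# Assumed schema, for illustration only.
DEMO_SCHEMA = """
CREATE TABLE files(id INTEGER PRIMARY KEY, rel_path TEXT UNIQUE);
CREATE TABLE csv_profiles(
    rel_path TEXT PRIMARY KEY, n_rows INTEGER, encoding TEXT,
    cols TEXT, updated_at REAL
);
CREATE VIRTUAL TABLE files_fts USING fts5(content);
"""


def persist_profile(profile, fts_text, vault_path, db_path=None):
    """Upsert `profile` and refresh its FTS row; return True if persisted."""
    db = Path(db_path) if db_path else Path(vault_path) / "vault_index.db"
    if not db.exists():
        return False  # mirrors persisted=False when the index is missing
    con = sqlite3.connect(db)
    try:
        with con:  # one transaction for the upsert and the FTS refresh
            con.execute(
                "INSERT INTO csv_profiles(rel_path, n_rows, encoding, cols, updated_at) "
                "VALUES (?, ?, ?, ?, ?) "
                "ON CONFLICT(rel_path) DO UPDATE SET n_rows=excluded.n_rows, "
                "encoding=excluded.encoding, cols=excluded.cols, "
                "updated_at=excluded.updated_at",
                (profile["rel_path"], profile["n_rows"], profile["encoding"],
                 json.dumps(profile["cols"]), time.time()),
            )
            row = con.execute(
                "SELECT rowid FROM files WHERE rel_path = ?", (profile["rel_path"],)
            ).fetchone()
            if row is not None:
                # Reuse the files rowid so FTS hits can be joined back to files.
                con.execute("DELETE FROM files_fts WHERE rowid = ?", (row[0],))
                con.execute(
                    "INSERT INTO files_fts(rowid, content) VALUES (?, ?)",
                    (row[0], fts_text),
                )
        return True
    finally:
        con.close()


with tempfile.TemporaryDirectory() as vault:
    profile = {"rel_path": "data/raw/ventas.csv", "n_rows": 1500,
               "encoding": "utf-8", "cols": [{"name": "fecha", "dtype": "String"}]}
    print(persist_profile(profile, "fecha importe", vault))  # False: no index yet
    con = sqlite3.connect(Path(vault) / "vault_index.db")
    con.executescript(DEMO_SCHEMA)
    con.execute("INSERT INTO files(rel_path) VALUES (?)", (profile["rel_path"],))
    con.commit()
    con.close()
    print(persist_profile(profile, "fecha importe", vault))  # True
```

Deleting and re-inserting the FTS row under the same rowid keeps the sketch idempotent: profiling the same file twice leaves one row in each table.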