---
id: "54"
title: "online_data_recopilation: odr_console MVP (GUI launcher + 5-step loop + 1 collector)"
status: pendiente
type: feature
domain: []
scope: multi-app
priority: alta
depends: []
blocks: []
related: []
created: 2026-05-09
updated: 2026-05-17
tags: []
---

## Objective

C++ ImGui app in `projects/online_data_recopilation/apps/odr_console/` that:

1. Launches any function/pipeline in the registry from a GUI panel with a form auto-generated from `params_schema`.
2. Implements the 5-step reactive loop on its own `operations.db`.
3. Reuses the registry's jobs system (issue 0065) for concurrency.
4. Reuses the enricher protocol, `cdp-cli`, and the osint_graph Python functions `fetch_webpage`, `web_search`, etc.

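The form in item 1 has to turn text typed into the GUI into typed parameters before a job is enqueued. A minimal sketch of that step follows, limited to the strings and ints the MVP supports. The schema shape (`{"type": ..., "default": ...}` per parameter) and the name `coerce_params` are assumptions for illustration; the registry's actual `params_schema` format is not shown in this issue.

```python
from typing import Any, Dict


def coerce_params(schema: Dict[str, Dict[str, Any]], raw: Dict[str, str]) -> Dict[str, Any]:
    """Convert raw form text into typed params (MVP types: string, int).

    `schema` uses a hypothetical shape: {name: {"type": "string"|"int", "default": ...}}.
    """
    out: Dict[str, Any] = {}
    for name, spec in schema.items():
        kind = spec.get("type", "string")
        if name not in raw or raw[name] == "":
            # Empty field: fall back to the default, or reject if there is none.
            if "default" not in spec:
                raise ValueError(f"missing required param: {name}")
            out[name] = spec["default"]
        elif kind == "int":
            out[name] = int(raw[name])  # raises ValueError on non-numeric input
        elif kind == "string":
            out[name] = raw[name]
        else:
            raise ValueError(f"unsupported type for {name}: {kind}")
    return out
```

For example, `coerce_params({"limit": {"type": "int", "default": 30}}, {"limit": "10"})` yields `{"limit": 10}`, and an empty form falls back to the default of 30.
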
## Decisions made

| Topic | Decision |
|---|---|
| Default workers | 4 |
| operations.db | A single one for the app |
| DuckDB | Embedded (link against libduckdb) |
| Collector languages | Python first; bash/go in future issues |
| Browser | CDP via `cdp-cli` (issue 0038) |
| Concurrency | jobs_pool_cpp_core (issue 0065) |
| TBD | Mandatory (apps_tbd rule); sub-repo `dataforge/odr_console` |

## MVP scope (this issue)

### Code skeleton

- `main.cpp`: `fn::run_app` with AppConfig + render() + panels.
- `data_registry.cpp/h`: opens `registry.db` read-only, exposes `search(query)` and `get_function(id)`.
- `data_operations.cpp/h`: opens `operations.db` read-write, CRUD for relations/executions/entities/types_snapshot/assertions/assertion_results.
- `data_duck.cpp/h`: opens `local_files/odr.duckdb`, `query(sql) -> rows`, `ingest_parquet(path, table)`.
- `views_launcher.cpp/h`: FTS5 search panel + result list + params form + "Run" button → enqueues a job.
- `views_jobs.cpp/h`: job queue panel (pending/running/done) + live progress.
- `views_datasets.cpp/h`: DuckDB query editor panel + preview table.
- `CMakeLists.txt`: `add_imgui_app(odr_console ...)` with SQLite, libduckdb, and the registry's jobs_pool.

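The launcher and jobs panels talk to collectors through a small process protocol: the request goes in as JSON on stdin, and progress comes back as `PROGRESS:` lines on stderr. The sketch below shows the launcher side of that exchange in Python, purely to illustrate the data flow; the real app is C++. It assumes the payload after `PROGRESS:` is an integer percentage, which this issue does not specify. The names `parse_progress` and `run_collector` are hypothetical.

```python
import json
import subprocess
from typing import List, Optional, Tuple


def parse_progress(line: str) -> Optional[int]:
    """Return the percentage from a 'PROGRESS: <int>' line, or None otherwise.

    The integer-percent payload is an assumption; only the prefix is in the issue.
    """
    prefix = "PROGRESS:"
    if not line.startswith(prefix):
        return None
    try:
        value = int(line[len(prefix):].strip())
    except ValueError:
        return None
    return max(0, min(100, value))  # clamp to the 0..100 range the panel shows


def run_collector(cmd: List[str], request: dict) -> Tuple[int, List[int]]:
    """Run a collector to completion and return (exit code, progress updates)."""
    proc = subprocess.run(
        cmd, input=json.dumps(request), capture_output=True, text=True
    )
    lines = proc.stderr.splitlines()
    updates = [p for p in map(parse_progress, lines) if p is not None]
    return proc.returncode, updates
```

This version gathers progress only after the process exits. A live panel would instead read stderr line by line while the job runs and feed each parsed value to the UI.
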
### operations.db migrations

`migrations/001_init.sql` holds the full 5-step schema:

```sql
CREATE TABLE IF NOT EXISTS relations (...); -- designed pipelines
CREATE TABLE IF NOT EXISTS executions (...); -- runs with metrics
CREATE TABLE IF NOT EXISTS entities (...); -- collected data
CREATE TABLE IF NOT EXISTS types_snapshot (...); -- copy of the registry schema
CREATE TABLE IF NOT EXISTS assertions (...); -- SQL rules
CREATE TABLE IF NOT EXISTS assertion_results (...); -- evaluation results
```

Reuse the schema from `fn_operations/migrations/`, adapted.

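A migration file such as `001_init.sql` needs something to apply it once and skip it on later launches. One way to do that is sketched below with Python's `sqlite3` module. The `schema_migrations` bookkeeping table and the `apply_migrations` name are assumptions for illustration; they do not appear in this issue or in `fn_operations`.

```python
import sqlite3
from pathlib import Path
from typing import List


def apply_migrations(db_path: str, migrations_dir: str) -> List[str]:
    """Apply *.sql files in name order; return the file names applied by this call."""
    conn = sqlite3.connect(db_path)
    try:
        # Hypothetical bookkeeping table that records which files already ran.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)"
        )
        done = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
        applied: List[str] = []
        for path in sorted(Path(migrations_dir).glob("*.sql")):
            if path.name in done:
                continue  # already applied on a previous launch
            conn.executescript(path.read_text(encoding="utf-8"))
            conn.execute(
                "INSERT INTO schema_migrations (name) VALUES (?)", (path.name,)
            )
            conn.commit()
            applied.append(path.name)
        return applied
    finally:
        conn.close()
```

Because the statements in `001_init.sql` already use `CREATE TABLE IF NOT EXISTS`, re-running the file would be harmless. The bookkeeping table matters once later migrations contain statements that are not idempotent.
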
### MVP collector: `api_hn_top`

`collectors/api_hn_top/`:
- `manifest.yaml`: id, name, description, params (limit), uses_functions (`http_get_json_py_*`).
- `run.py`: reads a JSON object `{ops_db_path, app_dir, registry_root, params}` from stdin, fetches the HN top stories API, writes parquet to `vault/raw/hn_top_<ts>.parquet`, inserts an `entity` with `metadata.{path,row_count,checksum,source}`, and emits `PROGRESS:` lines on stderr.

End-to-end verification:
1. Launch odr_console.
2. Search for "hn_top" in the launcher → click Run.
3. The job appears in the jobs panel and progress reaches 100.
4. The entity is in the `entities` table of operations.db.
5. The parquet file is in `vaults/odr_data/raw/`.
6. The datasets panel lists it and an SQL query returns rows.

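The collector contract described for `run.py` breaks down into three small helpers: read the request, report progress, and build the entity metadata. The sketch below covers only those pieces and leaves out the HN fetch and the parquet write. The helper names, the integer payload after `PROGRESS:`, and the choice of SHA-256 for `checksum` are all assumptions; this issue names the fields but not their formats.

```python
import hashlib
import json
import sys
from pathlib import Path
from typing import Any, Dict, TextIO


def read_request(stream: TextIO) -> Dict[str, Any]:
    """Parse the launcher request: {ops_db_path, app_dir, registry_root, params}."""
    request = json.loads(stream.read())
    for key in ("ops_db_path", "app_dir", "registry_root", "params"):
        if key not in request:
            raise KeyError(f"missing request field: {key}")
    return request


def emit_progress(percent: int, stream: TextIO = sys.stderr) -> None:
    """Write one progress line; the integer-percent payload is an assumption."""
    stream.write(f"PROGRESS: {percent}\n")
    stream.flush()  # flush so the launcher sees updates while the job runs


def build_metadata(path: Path, row_count: int, source: str) -> Dict[str, Any]:
    """Build the entity metadata fields named in this issue."""
    # SHA-256 is an assumed checksum algorithm; the issue does not name one.
    checksum = hashlib.sha256(path.read_bytes()).hexdigest()
    return {
        "path": str(path),
        "row_count": row_count,
        "checksum": checksum,
        "source": source,
    }
```

A real `run.py` would call `read_request(sys.stdin)` first, emit progress around the fetch and write steps, and store `json.dumps(build_metadata(...))` in the new `entities` row.
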
## Out of scope for the MVP (future issues)

- Pipeline builder DAG (`imgui_node_editor`).
- Assertions panel (eval --react).
- Proposals inbox.
- Browser CDP collectors (`browser_capture_dom`, `browser_login_capture`).
- Watchlists / scheduling.
- Global rate limiting.
- Auto-generated form for complex `params_schema` (MVP: strings + ints only).

## Acceptance criteria

- [ ] App builds on WSL + Windows.
- [ ] `app.md` indexed by `fn index` (shows up in `apps`).
- [ ] Gitea repo created (`dataforge/odr_console`) and master branch synced.
- [ ] Collector `api_hn_top` retrieves 30 stories, parquet written, entity created.
- [ ] Datasets panel runs `SELECT count(*) FROM hn_top`.
- [ ] ImGui logs show the `fn_log::log_info` calls from the flow.

## Risks

- C++ build + DuckDB + SQLite + jobs_pool → complex CMake. Clean vendoring + notes in `cpp/PATTERNS.md`.
- libduckdb on Windows: try placing `duckdb.dll` next to the exe.
- Embedded Python collectors (issue 0033 runtime): the MVP can start with the system `python3`; embed later.