---
id: "54"
title: "online_data_recopilation: odr_console MVP (GUI launcher + 5-step loop + 1 collector)"
status: pendiente
type: feature
domain: []
scope: multi-app
priority: alta
depends: []
blocks: []
related: []
created: 2026-05-09
updated: 2026-05-17
tags: []
---
## Goal
C++ ImGui app in `projects/online_data_recopilation/apps/odr_console/` that:
1. Launches any function/pipeline from the registry via a GUI panel with an auto-generated form (`params_schema`).
2. Implements the 5-step reactive loop on its own `operations.db`.
3. Reuses the registry's jobs system (issue 0065) for concurrency.
4. Reuses the enricher protocol, `cdp-cli`, and the Python functions `fetch_webpage`/`web_search`/etc. from osint_graph.
## Decisions made
| Topic | Decision |
|---|---|
| Default workers | 4 |
| operations.db | A single one for the app |
| DuckDB | Embedded (link libduckdb) |
| Collector languages | Python first; bash/go in future issues |
| Browser | CDP via `cdp-cli` (issue 0038) |
| Concurrency | jobs_pool_cpp_core (issue 0065) |
| TBD | Mandatory (apps_tbd rule); sub-repo `dataforge/odr_console` |
## MVP scope (this issue)
### Code skeleton
- `main.cpp`: `fn::run_app` with AppConfig + render() + panels.
- `data_registry.cpp/h`: opens `registry.db` read-only, exposes `search(query)`, `get_function(id)`.
- `data_operations.cpp/h`: opens `operations.db` read-write, CRUD for relations/executions/entities/types_snapshot/assertions/assertion_results.
- `data_duck.cpp/h`: opens `local_files/odr.duckdb`, `query(sql) -> rows`, `ingest_parquet(path, table)`.
- `views_launcher.cpp/h`: FTS5 search panel + results list + params form + "Run" button → enqueues a job.
- `views_jobs.cpp/h`: jobs queue panel (pending/running/done) + live progress.
- `views_datasets.cpp/h`: DuckDB query editor panel + preview table.
- `CMakeLists.txt`: `add_imgui_app(odr_console ...)` with SQLite, libduckdb, and the registry's jobs_pool.
### operations.db migrations
`migrations/001_init.sql` holds the full 5-step schema:
```sql
CREATE TABLE IF NOT EXISTS relations (...); -- designed pipelines
CREATE TABLE IF NOT EXISTS executions (...); -- runs with metrics
CREATE TABLE IF NOT EXISTS entities (...); -- collected data
CREATE TABLE IF NOT EXISTS types_snapshot (...); -- copy of the registry schema
CREATE TABLE IF NOT EXISTS assertions (...); -- SQL rules
CREATE TABLE IF NOT EXISTS assertion_results (...); -- evaluation results
```
Reuse the schema from `fn_operations/migrations/`, adapted.
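
A minimal sketch of how the migration files could be applied to `operations.db` exactly once each, in filename order. This is an illustration rather than the app's actual mechanism (the app itself is C++). The `schema_migrations` bookkeeping table and the `apply_migrations` helper are hypothetical names that do not appear in this issue.

```python
import sqlite3
from pathlib import Path


def apply_migrations(db_path: str, migrations_dir: str) -> list:
    """Apply every *.sql file in migrations_dir once, in filename order.

    `schema_migrations` is a hypothetical bookkeeping table (an assumption).
    Returns the names of the migrations applied by this call.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)"
        )
        done = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
        applied = []
        for path in sorted(Path(migrations_dir).glob("*.sql")):
            if path.name in done:
                continue  # already applied on a previous run
            conn.executescript(path.read_text(encoding="utf-8"))
            conn.execute(
                "INSERT INTO schema_migrations (name) VALUES (?)", (path.name,)
            )
            conn.commit()
            applied.append(path.name)
        return applied
    finally:
        conn.close()
```

Combined with `CREATE TABLE IF NOT EXISTS` in the migration itself, a second call on the same database is a no-op.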
### MVP collector: `api_hn_top`
`collectors/api_hn_top/`:
- `manifest.yaml`: id, name, description, params (limit), uses_functions (`http_get_json_py_*`).
- `run.py`: reads JSON `{ops_db_path, app_dir, registry_root, params}` from stdin, fetches the HN top stories API, writes parquet to `vault/raw/hn_top_<ts>.parquet`, inserts an `entity` with `metadata.{path,row_count,checksum,source}`, and emits `PROGRESS:` on stderr.
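
The `run.py` contract above can be sketched with three small helpers. The sketch leaves out the HN fetch and the parquet write, and rests on assumptions: the helper names are hypothetical, the payload after `PROGRESS:` is taken to be an integer percentage, and SHA-256 stands in for the checksum algorithm, which this issue does not specify.

```python
import hashlib
import json
import sys
from pathlib import Path

# The four request keys named in the issue.
REQUIRED_KEYS = ("ops_db_path", "app_dir", "registry_root", "params")


def read_request(stream=None) -> dict:
    """Parse the JSON request (stdin by default) and check the required keys."""
    request = json.load(stream if stream is not None else sys.stdin)
    missing = [key for key in REQUIRED_KEYS if key not in request]
    if missing:
        raise ValueError(f"missing request keys: {missing}")
    return request


def emit_progress(percent: int, stream=None) -> None:
    """Write one progress line to stderr. The integer payload is an assumption."""
    out = stream if stream is not None else sys.stderr
    print(f"PROGRESS: {percent}", file=out, flush=True)


def build_metadata(path: Path, row_count: int, source: str) -> dict:
    """Build the entity metadata fields; SHA-256 is an assumed checksum choice."""
    checksum = hashlib.sha256(path.read_bytes()).hexdigest()
    return {
        "path": str(path),
        "row_count": row_count,
        "checksum": checksum,
        "source": source,
    }
```

A real `run.py` would call `read_request()`, fetch and write the parquet file, then pass the written path to `build_metadata()` before inserting the entity.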
End-to-end verification:
1. Launch odr_console.
2. Search for "hn_top" in the launcher → click Run.
3. The job appears in the jobs panel and progress reaches 100.
4. Entity present in the `entities` table of operations.db.
5. Parquet present in `vaults/odr_data/raw/`.
6. The datasets panel lists it, and a SQL query returns rows.
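
Step 3 depends on the host turning the collector's stderr into a progress value. A sketch of that parsing, assuming each progress line has the form `PROGRESS: <integer percent>`. Only the `PROGRESS:` prefix comes from this issue; the payload format and the function names are hypothetical.

```python
import re

# Assumed line format: "PROGRESS: <non-negative integer>".
PROGRESS_RE = re.compile(r"^PROGRESS:\s*(\d+)$")


def parse_progress(line: str):
    """Return the percentage (capped at 100), or None for ordinary log lines."""
    match = PROGRESS_RE.match(line.strip())
    if match is None:
        return None
    return min(100, int(match.group(1)))


def last_progress(stderr_lines) -> int:
    """Return the most recent progress value seen, or 0 if there was none."""
    latest = 0
    for line in stderr_lines:
        value = parse_progress(line)
        if value is not None:
            latest = value
    return latest
```

Lines that do not match are ignored, so regular log output can share stderr with progress reports.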
## Out of scope for the MVP (future issues)
- DAG pipeline builder (`imgui_node_editor`).
- Assertions panel (eval --react).
- Proposals inbox.
- Browser CDP collectors (`browser_capture_dom`, `browser_login_capture`).
- Watchlists / scheduling.
- Global rate limiting.
- Auto-generated forms for complex `params_schema` (MVP: strings + ints only).
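
The strings-and-ints restriction in the last item can be illustrated with a small coercion step between the form's raw text and the job params. The schema shape used here (a mapping from param name to `{"type": ..., "default": ...}`) is an assumption, since this issue does not define the `params_schema` format, and `coerce_params` is a hypothetical name.

```python
def coerce_params(schema: dict, raw: dict) -> dict:
    """Convert raw form text into typed params, rejecting types outside the MVP.

    Assumed schema shape: {name: {"type": "string" | "int", "default": ...}}.
    """
    converters = {"string": str, "int": int}
    result = {}
    for name, spec in schema.items():
        kind = spec["type"]
        if kind not in converters:
            raise ValueError(f"unsupported param type for MVP: {kind}")
        if name in raw and raw[name] != "":
            # int() raises ValueError on non-numeric text, which the form can show.
            result[name] = converters[kind](raw[name])
        elif "default" in spec:
            result[name] = spec["default"]
        else:
            raise ValueError(f"missing required param: {name}")
    return result
```

For example, a schema of `{"limit": {"type": "int", "default": 30}}` with the form text `"10"` yields `{"limit": 10}`, and an empty form falls back to the default.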
## Acceptance criteria
- [ ] App builds on WSL + Windows.
- [ ] `app.md` indexed by `fn index` (appears in `apps`).
- [ ] Gitea repo created (`dataforge/odr_console`) and master branch synced.
- [ ] Collector `api_hn_top` retrieves 30 stories, parquet written, entity created.
- [ ] Datasets panel runs `SELECT count(*) FROM hn_top`.
- [ ] ImGui logs show the `fn_log::log_info` calls from the flow.
## Risks
- C++ build + DuckDB + SQLite + jobs_pool → complex CMake. Clean vendoring + notes in `cpp/PATTERNS.md`.
- libduckdb on Windows: test `duckdb.dll` next to the exe.
- Embedded Python for collectors (issue 0033 runtime): the MVP can start with the system `python3`; embed later.