---
id: "54"
title: "online_data_recopilation - odr_console MVP (GUI launcher + 5-step loop + 1 collector)"
status: pendiente
type: feature
domain: []
scope: multi-app
priority: alta
depends: []
blocks: []
related: []
created: 2026-05-09
updated: 2026-05-17
tags: []
---

## Goal

C++ ImGui app in `projects/online_data_recopilation/apps/odr_console/` that:

1. Launches any function/pipeline in the registry from a GUI panel with an auto-generated form (`params_schema`).
2. Implements the 5-step reactive loop on top of its own `operations.db`.
3. Reuses the registry's jobs system (issue 0065) for concurrency.
4. Reuses the enricher protocol + `cdp-cli` + the Python functions `fetch_webpage`/`web_search`/etc. from osint_graph.

## Decisions made

| Topic | Decision |
|---|---|
| Default workers | 4 |
| operations.db | A single one for the whole app |
| DuckDB | Embedded (link against libduckdb) |
| Collector languages | Python first; bash/go in future issues |
| Browser | CDP via `cdp-cli` (issue 0038) |
| Concurrency | jobs_pool_cpp_core (issue 0065) |
| TBD | Mandatory (apps_tbd rule): sub-repo `dataforge/odr_console` |

## MVP scope (this issue)

### Code skeleton

- `main.cpp`: `fn::run_app` with AppConfig + render() + panels.
- `data_registry.cpp/h`: opens `registry.db` read-only, exposes `search(query)`, `get_function(id)`.
- `data_operations.cpp/h`: opens `operations.db` read-write, CRUD for relations/executions/entities/types_snapshot/assertions/assertion_results.
- `data_duck.cpp/h`: opens `local_files/odr.duckdb`, `query(sql) -> rows`, `ingest_parquet(path, table)`.
- `views_launcher.cpp/h`: FTS5 search panel + results list + params form + "Run" button → enqueues a job.
- `views_jobs.cpp/h`: jobs queue panel (pending/running/done) + live progress.
- `views_datasets.cpp/h`: DuckDB query editor panel + preview table.
- `CMakeLists.txt`: `add_imgui_app(odr_console ...)` with SQLite, libduckdb, and the registry's jobs_pool.

### operations.db migrations

`migrations/001_init.sql`: the complete 5-step schema:

```sql
CREATE TABLE IF NOT EXISTS relations (...);         -- designed pipelines
CREATE TABLE IF NOT EXISTS executions (...);        -- runs with metrics
CREATE TABLE IF NOT EXISTS entities (...);          -- collected data
CREATE TABLE IF NOT EXISTS types_snapshot (...);    -- copy of the registry schema
CREATE TABLE IF NOT EXISTS assertions (...);        -- SQL rules
CREATE TABLE IF NOT EXISTS assertion_results (...); -- evaluation results
```

Reuse the schema from `fn_operations/migrations/`, adapted.

### MVP collector: `api_hn_top`

`collectors/api_hn_top/`:

- `manifest.yaml`: id, name, description, params (limit), uses_functions (`http_get_json_py_*`).
- `run.py`: reads a JSON object `{ops_db_path, app_dir, registry_root, params}` from stdin, fetches the HN top stories API, writes a parquet file to `vault/raw/hn_top_.parquet`, inserts an `entity` with `metadata.{path,row_count,checksum,source}`, and emits `PROGRESS:` on stderr.

End-to-end verification:

1. Launch odr_console.
2. Search for "hn_top" in the launcher → click Run.
3. The job appears in the jobs panel and its progress reaches 100.
4. The entity is present in the `entities` table of operations.db.
5. The parquet file is in `vaults/odr_data/raw/`.
6. The datasets panel lists it and a SQL query returns rows.

## Out of scope for the MVP (future issues)

- Pipeline builder DAG (`imgui_node_editor`).
- Assertions panel (eval --react).
- Proposals inbox.
- Browser CDP collectors (`browser_capture_dom`, `browser_login_capture`).
- Watchlists / scheduling.
- Global rate limiting.
- Auto-generated forms for complex `params_schema` (MVP: strings + ints only).

## Acceptance criteria

- [ ] App builds on WSL and Windows.
- [ ] `app.md` is indexed by `fn index` (appears in `apps`).
- [ ] Gitea repo created (`dataforge/odr_console`) and the master branch synced.
- [ ] Collector `api_hn_top` retrieves 30 stories; parquet written, entity created.
- [ ] Datasets panel runs `SELECT count(*) FROM hn_top`.
- [ ] ImGui logs show the `fn_log::log_info` calls from the flow.

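
The `run.py` contract described for `api_hn_top` (JSON request on stdin, `PROGRESS:` lines on stderr, an entity metadata object) could look roughly like the sketch below. It is not the actual collector: the function names are hypothetical, the integer-percentage payload after `PROGRESS:` and the SHA-256 checksum are assumptions, and the HN fetch, the parquet write and the insert into `entities` are left out because this issue does not fix those details.

```python
import hashlib
import json
import sys

# Request keys named in this issue; the helper names below are hypothetical.
REQUIRED_KEYS = ("ops_db_path", "app_dir", "registry_root", "params")


def read_request(stream):
    """Parse the JSON request that the host writes to the collector's stdin."""
    request = json.load(stream)
    missing = [key for key in REQUIRED_KEYS if key not in request]
    if missing:
        raise ValueError(f"missing request keys: {missing}")
    return request


def emit_progress(percent, stream=None):
    """Write one progress line to stderr (or to the given stream).

    Assumption: the payload after `PROGRESS:` is an integer percentage.
    """
    out = stream if stream is not None else sys.stderr
    out.write(f"PROGRESS:{int(percent)}\n")
    out.flush()


def build_entity_metadata(path, payload, row_count, source):
    """Build the metadata.{path,row_count,checksum,source} object.

    Assumption: the checksum is the SHA-256 hex digest of the written bytes.
    """
    return {
        "path": path,
        "row_count": row_count,
        "checksum": hashlib.sha256(payload).hexdigest(),
        "source": source,
    }
```

A real `run.py` would call `read_request(sys.stdin)` first, call `emit_progress` as the fetch advances, and serialize the metadata object when it inserts the entity.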
## Risks

- C++ + DuckDB + SQLite + jobs_pool build → complex CMake. Clean vendoring + notes in `cpp/PATTERNS.md`.
- libduckdb on Windows: test shipping `duckdb.dll` next to the exe.
- Embedded Python collectors (issue 0033 runtime): the MVP can start with the system `python3`; embed later.