---
id: "53"
title: "Extract the jobs system from graph_explorer into the registry (jobs_pool + cache + subprocess worker)"
status: pendiente
type: feature
domain:
- registry-quality
scope: registry-only
priority: alta
depends: []
blocks:
- "54"
related:
- "22"
- "23"
- "0028"
created: 2026-05-09
updated: 2026-05-17
tags: []
---

## Context

`projects/osint_graph/apps/graph_explorer/jobs.{cpp,h}` (1366 + 97 lines) implements:

- A pool of N `std::thread` workers that read a job queue stored in SQLite (table `jobs`).
- One subprocess spawned per job, using a wire protocol: stdin (JSON ctx), stderr (`PROGRESS:<float> <stage>`), stdout (JSON result), plus the exit code.
- A content-addressable cache at `<app_dir>/cache/<sha256[0:2]>/<sha256>.{html,md,...}`.
- Recovery: jobs left in `running` by a previous session are marked `error` during `jobs_init`.
- A `dirty_counter` that the UI reads to refresh after changes.
- `JobRow` persistence with created/started/finished/duration/progress/stage/error/result_json.

The `online_data_recopilation` project (issue 0066) needs the same system. Instead of copy-pasting, **extract it into the registry** so that both apps import it.

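To make the stderr half of the wire protocol concrete, here is a minimal sketch of a parser for `PROGRESS:<float> <stage>` lines. It is not the code in `jobs.cpp`: the names `ProgressUpdate` and `parse_progress_line` are hypothetical, and the sketch assumes that any stderr line without the `PROGRESS:` prefix is ordinary log output and that the stage is whatever follows the number.

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical type: one progress update parsed from a worker's stderr.
struct ProgressUpdate {
    double fraction;    // the <float> after "PROGRESS:" (not clamped here)
    std::string stage;  // the rest of the line; may be empty
};

// Parses one stderr line of the form "PROGRESS:<float> <stage>".
// Returns std::nullopt for any other line so the caller can log it instead.
inline std::optional<ProgressUpdate> parse_progress_line(std::string_view line) {
    constexpr std::string_view prefix = "PROGRESS:";
    if (line.substr(0, prefix.size()) != prefix) return std::nullopt;
    line.remove_prefix(prefix.size());

    // strtod needs a null-terminated buffer, so copy the remainder.
    const std::string rest(line);
    char* end = nullptr;
    const double value = std::strtod(rest.c_str(), &end);
    if (end == rest.c_str()) return std::nullopt;  // no number after the prefix

    // Skip the separator between the number and the stage name.
    std::size_t pos = static_cast<std::size_t>(end - rest.c_str());
    while (pos < rest.size() && (rest[pos] == ' ' || rest[pos] == '\t')) ++pos;

    // Drop a trailing line terminator, if the caller did not strip it.
    std::string stage = rest.substr(pos);
    while (!stage.empty() && (stage.back() == '\n' || stage.back() == '\r')) {
        stage.pop_back();
    }
    return ProgressUpdate{value, stage};
}
```

A caller reading the subprocess stderr line by line would update `progress` and `stage` on the job row whenever this returns a value, and append the line to the error log otherwise.
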
## Extraction plan

1. **Identify the boundaries** between generic logic (to extract) and graph_explorer-specific logic (stays local):
   - Generic: thread pool, SQLite queue, subprocess spawn, wire protocol parser, sha256 cache, recovery.
   - Specific: applying `entities/relations/node_updates` to the graph's `operations.db`.

2. **New registry functions**:

| ID | Domain | What it does |
|---|---|---|
| `jobs_pool_cpp_core` | core | Generic, parameterizable thread pool (workers, callback `on_job(JobRow)`). The name of the `jobs` table is configurable. |
| `subprocess_worker_cpp_infra` | infra | Spawns a subprocess, writes the ctx to its stdin, and captures stderr/stdout following the wire protocol (`PROGRESS:`, final JSON). Returns `WorkerResult{stdout_json, error, exit_code}`. |
| `job_cache_sha256_cpp_infra` | infra | `cache_path(root, key) -> path`, `cache_put(root, key, bytes)`, `cache_get(root, key)`. Layout `<root>/<sha[0:2]>/<sha>`. |
| `worker_manifest_loader_cpp_core` | core | Enumerates `<dir>/<id>/manifest.yaml`, validates the schema, returns `vector<WorkerManifest>`. |

3. **New types**:
   - `JobRow_cpp_core`: struct with the common fields (id, worker_id, target_id, status, progress, stage, error, result_json, timestamps).
   - `WorkerManifest_cpp_core`: struct (id, name, description, applies_to, emits, params, uses_functions).
   - `WorkerResult_cpp_infra`: struct (stdout_json, stderr_log, exit_code, error).

4. **graph_explorer migration**:
   - Replace `jobs.cpp/h` with imports from the registry.
   - The `on_job` callback stays in `entity_ops.cpp`, applying entities/relations.
   - Test: launch the `fetch_webpage` enricher and verify that it still works.

5. **Validation**: `cd projects/osint_graph/apps/graph_explorer && cmake --build build` plus the existing tests.

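The cache layout `<root>/<sha[0:2]>/<sha>` from the plan above can be sketched as follows. This is only an illustration under assumptions: the plan's `cache_path(root, key)` presumably hashes `key` itself, whereas this hypothetical `cache_path_for_digest` takes an already computed lowercase hex SHA-256 digest, since the C++ standard library has no SHA-256 implementation. The optional `ext` parameter is also an assumption, modeled on the `.{html,md,...}` suffixes of the current cache.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <string_view>

namespace fs = std::filesystem;

// Returns true if `s` looks like a SHA-256 digest: 64 lowercase hex characters.
inline bool is_sha256_hex(std::string_view s) {
    if (s.size() != 64) return false;
    for (char c : s) {
        const bool digit = c >= '0' && c <= '9';
        const bool lower_hex = c >= 'a' && c <= 'f';
        if (!digit && !lower_hex) return false;
    }
    return true;
}

// Hypothetical helper: builds <root>/<sha[0:2]>/<sha><ext>.
// `ext` is optional and includes the dot, e.g. ".html".
// Returns std::nullopt when `sha_hex` is not a valid digest, which also keeps
// path separators and ".." out of the resulting path.
inline std::optional<fs::path> cache_path_for_digest(const fs::path& root,
                                                     std::string_view sha_hex,
                                                     std::string_view ext = "") {
    if (!is_sha256_hex(sha_hex)) return std::nullopt;
    std::string file_name(sha_hex);
    file_name += ext;
    return root / std::string(sha_hex.substr(0, 2)) / file_name;
}
```

Sharding by the first two hex characters caps the fan-out at 256 subdirectories under the cache root, and each one holds only the entries that share that prefix.
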
## `jobs` table schema (generic)

```sql
CREATE TABLE IF NOT EXISTS jobs (
    id TEXT PRIMARY KEY,
    worker_id TEXT NOT NULL,     -- formerly "enricher_id"
    target_id TEXT NOT NULL,     -- formerly "node_id" (for odr this is dataset_key, etc.)
    target_label TEXT,           -- formerly "node_name"
    status TEXT NOT NULL,        -- queued|running|done|error|cancelled
    progress REAL DEFAULT 0,
    stage TEXT,
    error TEXT,
    result_json TEXT,
    params_json TEXT,            -- serialized manifest params
    created_at INTEGER NOT NULL,
    started_at INTEGER,
    finished_at INTEGER
);
CREATE INDEX IF NOT EXISTS jobs_status_idx ON jobs(status);
CREATE INDEX IF NOT EXISTS jobs_worker_idx ON jobs(worker_id);
```

`graph_explorer` and `odr_console` share the schema. They differ only in how they interpret `target_id`/`result_json` (per-app callback).

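On the C++ side, the `status` column and the recovery rule from the Context section could be modeled roughly like this. It is a sketch under assumptions, not the existing code: `JobStatus`, `to_string`, `parse_status`, and `recover_on_init` are hypothetical names, and the only behavior taken from this issue is the set of five status strings and the rule that `running` becomes `error` at init.

```cpp
#include <optional>
#include <string_view>

// Hypothetical enum mirroring the values allowed in the `status` column.
enum class JobStatus { Queued, Running, Done, Error, Cancelled };

// Maps a status to the exact string stored in SQLite.
constexpr std::string_view to_string(JobStatus s) {
    switch (s) {
        case JobStatus::Queued:    return "queued";
        case JobStatus::Running:   return "running";
        case JobStatus::Done:      return "done";
        case JobStatus::Error:     return "error";
        case JobStatus::Cancelled: return "cancelled";
    }
    return "error";  // unreachable for valid enum values
}

// Parses a stored status string; unknown values yield std::nullopt.
inline std::optional<JobStatus> parse_status(std::string_view text) {
    for (JobStatus s : {JobStatus::Queued, JobStatus::Running, JobStatus::Done,
                        JobStatus::Error, JobStatus::Cancelled}) {
        if (to_string(s) == text) return s;
    }
    return std::nullopt;
}

// Recovery rule applied at init: a job still marked "running" belongs to a
// previous session and is moved to "error"; every other status is kept.
constexpr JobStatus recover_on_init(JobStatus s) {
    return s == JobStatus::Running ? JobStatus::Error : s;
}
```

Keeping the string mapping in one place means both apps write the same five values, so shared queries against `jobs_status_idx` behave identically in each.
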
## Risks

- graph_explorer is an active C++ app with passing tests. Breaking its imports means breaking production.
- Safe path: TBD branch `issue/0065-extract-jobs-to-registry` in the graph_explorer sub-repo and in the fn_registry sub-repo. Merge both once the build is green.
- A feature flag does NOT apply (this is a code change with no possible runtime toggle).

## Acceptance criteria

- [ ] Registry functions created with tests + .md.
- [ ] graph_explorer compiles and passes the existing tests (32 WSL + 21 Win).
- [ ] The `fetch_webpage` enricher works end-to-end in graph_explorer after the refactor.
- [ ] odr_console (issue 0066) can import `jobs_pool_cpp_core` and launch 1 dummy collector.
- [ ] Documentation updated in `cpp/PATTERNS.md`, mentioning jobs_pool as a standard building block.

## Out of scope

- Migrating the system to Go (a future issue, if it proves worthwhile).
- Changing the wire protocol (it is already stable; do not break existing enrichers).