# 0049g — `graph_sources`: `operations.db` reader + functional abstraction

## Metadata

| Field | Value |
|-------|-------|
| **ID** | 0049g |
| **Status** | pending |
| **Priority** | high |
| **Type** | feature, part of [#0049](0049-osint-graph-viewer.md) |

## Dependencies

**Blocked by:** [0049e](0049e-graph-types-extended.md) (needs `EntityType`/`RelationType`).

---

## Objective

Create the `graph_sources` function with the `GraphLoadFn` abstraction and its first implementation, `graph_load_from_operations`. It is designed so that JSON/JSONL/GraphML loaders can be added later without touching the rest of the code. It also includes a streaming variant for "mass collection".

## Context

`operations.db` is the per-app database in the registry, holding the `entities`, `relations`, `executions` and `assertions` tables. Relevant schema:

```sql
entities (id TEXT PK, type TEXT, status TEXT, metadata JSON, created_at, updated_at)
relations (id TEXT PK, source TEXT, target TEXT, type TEXT, status TEXT, weight REAL, metadata JSON, ...)
```

Mapping to `GraphData`:

- Each distinct value of `entities.type` → one `EntityType` (hash-generated color, default shape `circle`, icon 0). The consumer can override it via `types.yaml` (the `graph_explorer` app does this in 0049k).
- Each distinct value of `relations.type` → one `RelationType`.
- Each `entity` → one `GraphNode` with `user_data = hash64(entity.id)` and `label_idx` pointing into a string pool that holds `metadata.name` if present, otherwise `entity.id`.
- Each `relation` → one `GraphEdge`, resolving `source`/`target` (TEXT) → `node_idx` (uint32) via a hashmap.

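The node and edge rules above can be sketched as follows. This assumes FNV-1a 64 as the concrete `hash64`. `NodeSketch`, `EdgeSketch`, `RelationRow` and `build_graph` are hypothetical stand-ins, not the real 0049e types. Dangling relations are skipped and counted, matching the risk table below.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the real GraphNode/GraphEdge, which carry more
// fields (type index, flags, label_idx, ...).
struct NodeSketch { uint64_t user_data; };
struct EdgeSketch { uint32_t src, dst; };
struct RelationRow { std::string source, target; };

// FNV-1a 64-bit (assumed choice for hash64): the same input always gives the
// same output, so user_data is stable between loads of the same database.
uint64_t fnv1a64(const std::string& s) {
    uint64_t h = 14695981039346656037ull;  // FNV-1a 64 offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;             // FNV-1a 64 prime
    }
    return h;
}

// Appends one node per entity id, then resolves each relation's TEXT endpoints
// to node indices through a hashmap. Dangling relations are skipped; the
// return value is how many were skipped (what stats.errors would count).
int build_graph(const std::vector<std::string>& entity_ids,
                const std::vector<RelationRow>& relations,
                std::vector<NodeSketch>& nodes,
                std::vector<EdgeSketch>& edges) {
    std::unordered_map<std::string, uint32_t> index_of;
    index_of.reserve(entity_ids.size());
    for (const std::string& id : entity_ids) {
        index_of.emplace(id, static_cast<uint32_t>(nodes.size()));
        nodes.push_back({fnv1a64(id)});
    }
    int skipped = 0;
    for (const RelationRow& r : relations) {
        auto s = index_of.find(r.source);
        auto t = index_of.find(r.target);
        if (s == index_of.end() || t == index_of.end()) {
            ++skipped;
            continue;
        }
        edges.push_back({s->second, t->second});
    }
    return skipped;
}
```

Because `entities.id` is a primary key, each id maps to exactly one node index.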
## Architecture

```
cpp/functions/viz/
├── graph_sources.h      # NEW: GraphLoadFn typedef + decls
├── graph_sources.cpp    # NEW: graph_load_from_operations + stream
├── graph_sources.md     # NEW
└── (future JSON/JSONL/GraphML impls go here too)

cpp/tests/
├── test_graph_sources.cpp   # NEW
└── fixtures/
    └── operations_test.db   # NEW: small fixture with 10 entities + 15 relations
```

### API

```cpp
// graph_sources.h
#pragma once

namespace graph {

// Forward declaration; assumes GraphData is a struct in namespace graph.
struct GraphData;

struct GraphLoadStats {
    int nodes_loaded;
    int edges_loaded;
    int types_discovered;
    int rel_types_discovered;
    int errors;
    char error_msg[256];
};

typedef bool (*GraphLoadFn)(const char* uri, GraphData* out, GraphLoadStats* stats);

// The caller owns out->nodes/edges/types/rel_types after the call (must call graph_free).
bool graph_load_from_operations(const char* db_path, GraphData* out, GraphLoadStats* stats);

void graph_free(GraphData* graph);

// Streaming: poll-based reader for new entities/relations.
// The caller pre-allocates GraphData with capacity > expected max; the stream appends in place.
struct GraphStreamSource;
GraphStreamSource* graph_stream_operations_open(const char* db_path, int poll_ms);
int graph_stream_pull(GraphStreamSource* src, GraphData* graph);  // returns # appended
void graph_stream_close(GraphStreamSource* src);

}  // namespace graph
```

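A loader has to fill the fixed-size `error_msg[256]` without overflowing it. Below is a minimal sketch of a hypothetical `fail` helper, which is not part of the planned API. It repeats the `GraphLoadStats` layout so the block compiles on its own.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <cstring>

// Same layout as GraphLoadStats in graph_sources.h, repeated so this compiles alone.
struct GraphLoadStats {
    int nodes_loaded;
    int edges_loaded;
    int types_discovered;
    int rel_types_discovered;
    int errors;
    char error_msg[256];
};

// Hypothetical helper: counts the error and writes a printf-style message into
// error_msg. vsnprintf truncates to the buffer size and always NUL-terminates,
// so long SQLite messages cannot overflow the 256-byte field.
// Returns false so a loader can write `return fail(stats, "...", ...);`.
bool fail(GraphLoadStats* stats, const char* fmt, ...) {
    if (stats != nullptr) {
        stats->errors++;
        va_list args;
        va_start(args, fmt);
        std::vsnprintf(stats->error_msg, sizeof stats->error_msg, fmt, args);
        va_end(args);
    }
    return false;
}
```

Accepting a null `stats` lets callers skip the statistics entirely. Whether the real API allows that is still an open decision.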
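### Default color from the type-name hash

```cpp
#include <cstdint>

// FNV-1a 32-bit over the NUL-terminated type name.
static uint32_t fnv1a(const char* s) {
    uint32_t h = 2166136261u;
    for (; *s; ++s) h = (h ^ static_cast<unsigned char>(*s)) * 16777619u;
    return h;
}

uint32_t default_color_for(const char* type_name) {
    uint32_t h = fnv1a(type_name);
    // Sample from a balanced palette of 16 indigo-friendly colors.
    static const uint32_t palette[16] = { /* RGBA8 */ };
    return palette[h & 0xF];  // low 4 bits pick one of the 16 entries
}
```
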
## Tasks

### Phase 1 — `graph_load_from_operations`

- [ ] **1.1** Implement the function: open SQLite, query types, query entities, query relations, build an `id→node_idx` hashmap, fill `GraphData`.
- [ ] **1.2** Default color from a hash of the `type` name. Default shape `SHAPE_CIRCLE`. Default icon `0`.
- [ ] **1.3** String pool: a `vector<string>` inside `GraphData` (extend the struct with a `string_pool`, or pass it via callback). Decision: internal `char** label_pool; int label_pool_count;` fields + a `graph_label(graph, idx)` helper.
- [ ] **1.4** Error handling: if the DB does not exist or has no `entities` table, return `false` with `error_msg` populated.
- [ ] **1.5** `graph_free` releases everything that `_load_*` allocated. Important: the caller should not have to track which loader allocated what; the API is uniform.

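The decision in 1.3 and the freeing duty in 1.5 can be sketched together. `LabelPoolSketch`, `pool_add`, `pool_add_label` and `pool_free` are hypothetical names. `graph_label` here takes the stand-in struct rather than the real `GraphData`. Treating an empty `metadata.name` as absent is an assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in holding only the label pool fields decided in 1.3,
// plus a capacity field for amortized growth.
struct LabelPoolSketch {
    char** label_pool = nullptr;
    int label_pool_count = 0;
    int label_pool_cap = 0;
};

// Copies `s` into the pool and returns its index (-1 on allocation failure).
int pool_add(LabelPoolSketch* g, const char* s) {
    if (g->label_pool_count == g->label_pool_cap) {
        int cap = g->label_pool_cap ? g->label_pool_cap * 2 : 16;
        void* p = std::realloc(g->label_pool, sizeof(char*) * cap);
        if (!p) return -1;
        g->label_pool = static_cast<char**>(p);
        g->label_pool_cap = cap;
    }
    std::size_t n = std::strlen(s) + 1;
    char* copy = static_cast<char*>(std::malloc(n));
    if (!copy) return -1;
    std::memcpy(copy, s, n);
    g->label_pool[g->label_pool_count] = copy;
    return g->label_pool_count++;
}

// Label rule from the mapping section: metadata.name if present, else entity.id.
// An empty metadata.name is treated as absent (assumption).
int pool_add_label(LabelPoolSketch* g, const char* metadata_name, const char* id) {
    return pool_add(g, (metadata_name && *metadata_name) ? metadata_name : id);
}

const char* graph_label(const LabelPoolSketch* g, int idx) {
    assert(idx >= 0 && idx < g->label_pool_count);
    return g->label_pool[idx];
}

// What graph_free would do for the pool: free every string, then the array.
void pool_free(LabelPoolSketch* g) {
    for (int i = 0; i < g->label_pool_count; ++i) std::free(g->label_pool[i]);
    std::free(g->label_pool);
    *g = LabelPoolSketch{};
}
```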
### Phase 2 — Streaming

- [ ] **2.1** `graph_stream_operations_open` records the current `MAX(updated_at)` of `entities` and `relations`.
- [ ] **2.2** `graph_stream_pull`: query `WHERE updated_at > last_seen`, append to `GraphData` (checking capacity), update `last_seen`, return the count.
- [ ] **2.3** Pin new nodes near the centroid of their parent (if the new entity has a relation to an existing one); optional but useful. Mark them `NF_PINNED` for N frames, with some other mechanism (`flags |= NF_PINNED_TEMP`?) for auto-release. Keep it simple for v1: manual pinning via the app.

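The capacity check in 2.2 can be sketched as follows, assuming the caller-preallocated buffer described in the API comments. `NodeSlot`, `FixedGraph` and `append_nodes` are hypothetical. The point is the returned count and how it relates to advancing `last_seen`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins: only the fields needed to show the capacity check.
struct NodeSlot { uint64_t user_data; };
struct FixedGraph {
    NodeSlot* nodes;     // buffer preallocated by the caller
    int node_count;
    int node_capacity;
};

// Copies as many of the `n` pulled rows as still fit and returns that count,
// i.e. the "# appended" that graph_stream_pull reports. The stream should only
// advance last_seen up to the last row actually appended; otherwise rows that
// did not fit would never be pulled again.
int append_nodes(FixedGraph* g, const NodeSlot* rows, int n) {
    int room = g->node_capacity - g->node_count;
    if (room < 0) room = 0;
    int take = n < room ? n : room;
    for (int i = 0; i < take; ++i) g->nodes[g->node_count + i] = rows[i];
    g->node_count += take;
    return take;
}
```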
### Phase 3 — Tests

- [ ] **3.1** Create the `operations_test.db` fixture from a SQL script: 3 entity types (Person/Email/Domain), 2 relation types (owns/connects), 10 entities, 15 relations.
- [ ] **3.2** Test: load the fixture and check the counts, that `types_discovered == 3`, that `user_data` is deterministic, and that every edge resolves to a valid node index.
- [ ] **3.3** Stream test: insert new rows into the fixture, pull, and verify they were appended.

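Two of the 3.2 invariants can be written as plain predicates over a loaded graph. This is a sketch over a hypothetical `LoadedGraph` view. The real test would build it from `GraphData` after calling `graph_load_from_operations` twice on the fixture.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical flattened view of one load, enough for the 3.2 checks.
struct EdgeIdx { uint32_t src, dst; };
struct LoadedGraph {
    std::vector<uint64_t> user_data;  // one entry per node
    std::vector<EdgeIdx> edges;
};

// Every edge endpoint must be a valid node index.
bool edges_resolve(const LoadedGraph& g) {
    const std::size_t n = g.user_data.size();
    for (const EdgeIdx& e : g.edges)
        if (e.src >= n || e.dst >= n) return false;
    return true;
}

// Two loads of the same fixture must yield the same user_data values.
// Sorted copies are compared, so the check does not depend on row order
// (SQLite gives no order guarantee without ORDER BY).
bool deterministic(LoadedGraph a, LoadedGraph b) {
    std::sort(a.user_data.begin(), a.user_data.end());
    std::sort(b.user_data.begin(), b.user_data.end());
    return a.user_data == b.user_data;
}
```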
### Phase 4 — `.md` frontmatter

- [ ] **4.1** `graph_sources.md`:
  - `purity: impure` (touches disk)
  - `error_type: error_go_core` (... does not apply in C++; use `bool + error_msg` and document it)
  - `uses_types: [graph_data_cpp_viz, entity_type_cpp_viz, relation_type_cpp_viz]`
  - `tested: true`
  - semantic `params` and `output` filled in

### Phase 5 — Cleanup

- [ ] Run `fn index`.
- [ ] Commit `feat(viz): graph_sources with operations.db reader + streaming`.

## Definition of done

- [ ] `graph_load_from_operations("apps/registry_dashboard/operations.db", &g, &s)` loads without errors and returns the discovered types.
- [ ] Tests pass against the fixture.
- [ ] Streaming detects new rows.
- [ ] The `GraphLoadFn` signature is defined and documented: adding a new backend is just one more function with the same signature.

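The last criterion can be illustrated with a small dispatch table keyed on the URI suffix. This is a hypothetical sketch. `load_operations_stub`, `load_json_stub`, `kLoaders` and `pick_loader` are illustrative names, and only the `operations.db` backend is actually planned in 0049g. The types are forward-declared so the block compiles on its own.

```cpp
#include <cassert>
#include <cstring>

// Forward declarations so the sketch compiles on its own.
struct GraphData;
struct GraphLoadStats;
typedef bool (*GraphLoadFn)(const char* uri, GraphData* out, GraphLoadStats* stats);

// Stub backends: stand-ins for graph_load_from_operations and a future JSON loader.
bool load_operations_stub(const char*, GraphData*, GraphLoadStats*) { return true; }
bool load_json_stub(const char*, GraphData*, GraphLoadStats*) { return true; }

struct LoaderEntry { const char* suffix; GraphLoadFn fn; };

// Adding a backend means adding one row with a function of the same signature.
static const LoaderEntry kLoaders[] = {
    {".db", load_operations_stub},
    {".json", load_json_stub},
};

static bool ends_with(const char* s, const char* suffix) {
    std::size_t n = std::strlen(s), m = std::strlen(suffix);
    return n >= m && std::strcmp(s + n - m, suffix) == 0;
}

// Returns the loader for a URI, or nullptr if no backend handles it.
GraphLoadFn pick_loader(const char* uri) {
    for (const LoaderEntry& e : kLoaders)
        if (ends_with(uri, e.suffix)) return e.fn;
    return nullptr;
}
```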
## Risks

| Risk | Mitigation |
|---|---|
| `operations.db` schemas vary between apps | Read only `id`, `type`, `source`, `target`, `weight`, which are stable fields. Everything else comes through the optional `metadata` JSON |
| Relations whose `source`/`target` points to non-existent entities | Skip + increment `stats.errors` |
| String pool growth | Acceptable: an average `entity.id` is ~32 bytes, so 100k ids ≈ 3 MB |
| Stream missing updates when timestamps are equal | Use the `(updated_at, id)` tuple as a tiebreak, or add an autoincrement `seq` column if needed |
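The `(updated_at, id)` tiebreak from the last row can be sketched as a keyset cursor. This assumes `updated_at` is an integer timestamp; ISO-8601 TEXT timestamps compare the same way lexicographically. `Row`, `StreamCursor` and `pull` are hypothetical. The in-memory `pull` mimics what the SQL in the comment would return.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical row as read from entities/relations.
struct Row { int64_t updated_at; std::string id; };

// Cursor over (updated_at, id). In SQL this corresponds to
//   WHERE updated_at > ?1 OR (updated_at = ?1 AND id > ?2)
//   ORDER BY updated_at, id
// so two rows sharing a timestamp are never skipped.
struct StreamCursor {
    int64_t last_ts = INT64_MIN;
    std::string last_id;

    bool is_new(const Row& r) const {
        return std::tie(r.updated_at, r.id) > std::tie(last_ts, last_id);
    }
    void advance(const Row& r) { last_ts = r.updated_at; last_id = r.id; }
};

// Returns the rows after the cursor and advances it to the last one returned.
// `table` is assumed sorted by (updated_at, id), like the ORDER BY above.
std::vector<Row> pull(StreamCursor& c, const std::vector<Row>& table) {
    std::vector<Row> out;
    for (const Row& r : table)
        if (c.is_new(r)) out.push_back(r);
    if (!out.empty()) c.advance(out.back());
    return out;
}
```

With a plain `updated_at > last_seen`, a row committed later with the same timestamp as the last seen row would be skipped. The `id` component closes that gap as long as ids within one timestamp arrive in increasing order. If they can arrive out of order, an autoincrement `seq` column is the safer cursor.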