7c09255c8a
Aggregates the planning artifacts for the 0049 series (umbrella + 0049a..0049k): - New rule cpp_apps.md (registered in INDEX) — standardize structure, CMake patterns, app.md frontmatter and sub-repo for C++ apps; points to the authoritative cpp/PATTERNS.md and cpp/DESIGN_SYSTEM.md. - Feature flag osint_graph_v1 (disabled until 0049k closes). - Issue 0049 (umbrella) and sub-issues 0049b..0049k describing the GPU rendering system, force-layout, types, sources, labels and the final graph_explorer app integration. - README updated with the new rows (all pending; 0049a will flip to completed in the next commit).
0049g - graph_sources: operations.db reader + functional abstraction
Metadata
| Field | Value |
|---|---|
| ID | 0049g |
| Status | pending |
| Priority | high |
| Type | feature (part of #0049) |
Dependencies
Blocked by: 0049e (needs EntityType/RelationType).
Objective
Create the graph_sources function with the GraphLoadFn abstraction and its first implementation, graph_load_from_operations. It is designed so that JSON/JSONL/GraphML backends can be added later without touching the rest of the code. It includes a streaming variant for "bulk collection".
Context
operations.db is the per-app database in the registry; it holds entities, relations, executions and assertions. Relevant schema:
entities (id TEXT PK, type TEXT, status TEXT, metadata JSON, created_at, updated_at)
relations (id TEXT PK, source TEXT, target TEXT, type TEXT, status TEXT, weight REAL, metadata JSON, ...)
Mapping to GraphData:
- Each distinct value of entities.type → one EntityType (hash-generated color, default shape circle, icon 0). The consumer can override these via types.yaml (the graph_explorer app does so in 0049k).
- Each distinct value of relations.type → one RelationType.
- Each entity → one GraphNode with user_data = hash64(entity.id) and label_idx pointing into the string pool, which holds metadata.name if it exists and entity.id otherwise.
- Each relation → one GraphEdge, resolving source/target (TEXT) → node_idx (uint32) via a hashmap.
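The last mapping step can be sketched as follows. This is a minimal illustration, not the planned implementation: EntityRow, RelationRow, EdgeSketch and ResolveResult are hypothetical stand-ins (the real GraphNode/GraphEdge come from 0049e), and counting dangling endpoints as errors follows the mitigation listed under Risks.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for rows read from operations.db.
struct EntityRow   { std::string id, type; };
struct RelationRow { std::string source, target, type; };

// Hypothetical minimal edge; the real GraphEdge is defined in 0049e.
struct EdgeSketch { uint32_t source_idx, target_idx; };

struct ResolveResult {
    std::vector<EdgeSketch> edges;
    int errors = 0;  // relations whose endpoints are not known entities
};

// Build id -> node_idx once, then resolve each relation endpoint by lookup.
ResolveResult resolve_edges(const std::vector<EntityRow>& entities,
                            const std::vector<RelationRow>& relations) {
    assert(entities.size() <= UINT32_MAX);  // node_idx is a uint32
    std::unordered_map<std::string, uint32_t> index_of;
    index_of.reserve(entities.size());
    for (uint32_t i = 0; i < entities.size(); ++i) {
        index_of.emplace(entities[i].id, i);  // on duplicate ids, first one wins
    }

    ResolveResult result;
    for (const RelationRow& rel : relations) {
        auto src = index_of.find(rel.source);
        auto dst = index_of.find(rel.target);
        if (src == index_of.end() || dst == index_of.end()) {
            ++result.errors;  // dangling endpoint: skip the edge
            continue;
        }
        result.edges.push_back({src->second, dst->second});
    }
    return result;
}
```

Node indices here are simply the position of each entity in the input vector, so the edge list stays valid as long as nodes are appended in the same order.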
Architecture
cpp/functions/viz/
├── graph_sources.h # NEW: GraphLoadFn typedef + decls
├── graph_sources.cpp # NEW: graph_load_from_operations + stream
├── graph_sources.md # NEW
└── (future JSON/JSONL/GraphML impls will also live here)
cpp/tests/
├── test_graph_sources.cpp # NEW
└── fixtures/
└── operations_test.db # NEW: small fixture with 10 entities + 15 relations
API
// graph_sources.h
namespace graph {
struct GraphLoadStats {
int nodes_loaded;
int edges_loaded;
int types_discovered;
int rel_types_discovered;
int errors;
char error_msg[256];
};
typedef bool (*GraphLoadFn)(const char* uri, GraphData* out, GraphLoadStats* stats);
// Caller is owner of out->nodes/edges/types/rel_types after the call (must call graph_free).
bool graph_load_from_operations(const char* db_path, GraphData* out, GraphLoadStats* stats);
void graph_free(GraphData* graph);
// Streaming: poll-based reader for new entities/relations.
// Caller pre-allocates GraphData with capacity > expected max. Stream appends in place.
struct GraphStreamSource;
GraphStreamSource* graph_stream_operations_open(const char* db_path, int poll_ms);
int graph_stream_pull(GraphStreamSource*, GraphData* graph); // returns # appended
void graph_stream_close(GraphStreamSource*);
} // namespace graph
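Because every backend shares the GraphLoadFn signature, a caller can hold loaders in a plain table. The sketch below assumes dispatch by file extension, which this issue does not specify; GraphData and GraphLoadStats are reduced to hypothetical placeholders, and both loaders are stubs that perform no I/O.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace graph_sketch {

// Hypothetical placeholders; the real structs live in the viz headers.
struct GraphData { int node_count; };
struct GraphLoadStats { int nodes_loaded; int errors; char error_msg[256]; };

typedef bool (*GraphLoadFn)(const char* uri, GraphData* out, GraphLoadStats* stats);

// Stub backends: correct signature, no real I/O.
bool load_operations_stub(const char*, GraphData* out, GraphLoadStats* stats) {
    out->node_count = 0;
    stats->errors = 0;
    return true;
}
bool load_json_stub(const char*, GraphData*, GraphLoadStats* stats) {
    stats->errors = 1;
    std::strncpy(stats->error_msg, "json backend not implemented",
                 sizeof(stats->error_msg) - 1);
    stats->error_msg[sizeof(stats->error_msg) - 1] = '\0';
    return false;
}

struct Backend { const char* extension; GraphLoadFn load; };

// Adding a backend means adding one row here (assumed registration scheme).
static const Backend kBackends[] = {
    {".db",   load_operations_stub},
    {".json", load_json_stub},
};

// Returns the loader whose extension matches the end of uri, or nullptr.
GraphLoadFn pick_loader(const char* uri) {
    assert(uri != nullptr);
    const std::size_t uri_len = std::strlen(uri);
    for (const Backend& b : kBackends) {
        const std::size_t ext_len = std::strlen(b.extension);
        if (uri_len >= ext_len &&
            std::strcmp(uri + uri_len - ext_len, b.extension) == 0) {
            return b.load;
        }
    }
    return nullptr;
}

}  // namespace graph_sketch
```

A caller would check the result of pick_loader for nullptr before invoking it, and read stats->error_msg whenever the loader returns false.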
Default color by type hash
uint32_t default_color_for(const char* type_name) {
uint32_t h = fnv1a(type_name);
// Sample from a balanced palette of 16 indigo-friendly colors.
static const uint32_t palette[16] = { /* RGBA8 */ };
return palette[h & 0xF];
}
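The snippet above leaves fnv1a and the palette contents open. A self-contained version could look like the sketch below, assuming the 32-bit FNV-1a variant and a 0xRRGGBBAA layout; the 16 palette values are illustrative placeholders, not the project's palette.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
uint32_t fnv1a(const char* text) {
    assert(text != nullptr);
    uint32_t hash = 2166136261u;  // FNV-1a 32-bit offset basis
    for (const unsigned char* p = reinterpret_cast<const unsigned char*>(text);
         *p != 0; ++p) {
        hash ^= *p;
        hash *= 16777619u;        // FNV 32-bit prime
    }
    return hash;
}

// Placeholder palette, assumed RGBA8 packed as 0xRRGGBBAA. Illustrative only.
static const uint32_t kPalette[16] = {
    0x4F46E5FFu, 0x6366F1FFu, 0x818CF8FFu, 0xA5B4FCFFu,
    0x0EA5E9FFu, 0x14B8A6FFu, 0x22C55EFFu, 0x84CC16FFu,
    0xEAB308FFu, 0xF97316FFu, 0xEF4444FFu, 0xEC4899FFu,
    0xA855F7FFu, 0x8B5CF6FFu, 0x64748BFFu, 0x94A3B8FFu,
};

// Same type name always maps to the same palette slot (low 4 bits of the hash).
uint32_t default_color_for(const char* type_name) {
    return kPalette[fnv1a(type_name) & 0xF];
}
```

With only 16 slots, distinct type names will share a color once there are more than a handful of types, which is one reason the types.yaml override matters.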
Tasks
Phase 1: graph_load_from_operations
- 1.1 Implement the function: open SQLite, query types, query entities, query relations, build the id→node_idx hashmap, fill GraphData.
- 1.2 Default color by hash of the type name. Default shape SHAPE_CIRCLE. Default icon 0.
- 1.3 String pool: a vector inside GraphData (either extend the struct with string_pool or pass it via callback). Decision: an internal field char** label_pool; int label_pool_count; plus a helper graph_label(graph, idx).
- 1.4 Error handling: if the DB does not exist or has no entities table, return false with error_msg populated.
- 1.5 graph_free frees everything that _load_* allocated. Important: the caller should not have to track which loader allocated what; the API is uniform.
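Tasks 1.3 and 1.5 can be sketched together. This is a sketch under assumptions: LabelPoolSketch is a hypothetical slice of GraphData, label_pool_capacity and the label_pool_add/label_pool_free helper names are not in the task list, and only graph_label and the two pool fields come from the decision in 1.3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical slice of GraphData holding only the label-pool fields.
struct LabelPoolSketch {
    char** label_pool = nullptr;
    int    label_pool_count = 0;
    int    label_pool_capacity = 0;  // assumption: not named in the task list
};

// Copies text into the pool; returns its index, or -1 if allocation fails.
int label_pool_add(LabelPoolSketch* g, const char* text) {
    assert(g != nullptr && text != nullptr);
    if (g->label_pool_count == g->label_pool_capacity) {
        const int new_cap = g->label_pool_capacity ? g->label_pool_capacity * 2 : 16;
        void* grown = std::realloc(g->label_pool,
                                   sizeof(char*) * static_cast<std::size_t>(new_cap));
        if (grown == nullptr) return -1;
        g->label_pool = static_cast<char**>(grown);
        g->label_pool_capacity = new_cap;
    }
    const std::size_t len = std::strlen(text) + 1;  // include the terminator
    char* copy = static_cast<char*>(std::malloc(len));
    if (copy == nullptr) return -1;
    std::memcpy(copy, text, len);
    g->label_pool[g->label_pool_count] = copy;
    return g->label_pool_count++;
}

// Bounds-checked lookup; returns "" for an invalid index.
const char* graph_label(const LabelPoolSketch* g, int idx) {
    if (g == nullptr || idx < 0 || idx >= g->label_pool_count) return "";
    return g->label_pool[idx];
}

// Frees every string and the pool itself, then resets the fields,
// so calling it a second time is harmless.
void label_pool_free(LabelPoolSketch* g) {
    if (g == nullptr) return;
    for (int i = 0; i < g->label_pool_count; ++i) std::free(g->label_pool[i]);
    std::free(g->label_pool);
    g->label_pool = nullptr;
    g->label_pool_count = 0;
    g->label_pool_capacity = 0;
}
```

Keeping allocation and release in one pair of helpers is what lets graph_free stay uniform across backends: any loader that fills the pool through label_pool_add can be cleaned up the same way.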
Phase 2: Streaming
- 2.1 graph_stream_operations_open stores the current MAX(updated_at) of entities and relations.
- 2.2 graph_stream_pull: query WHERE updated_at > last_seen, append to GraphData (checking capacity), update last_seen, return the count.
- 2.3 Pin new nodes near the parent's centroid (when the new entity has a relation to an existing one). Optional but useful. Mark NF_PINNED for N frames; auto-release would need another mechanism (flags |= NF_PINNED_TEMP?). Keep it simple for v1: manual pinning via the app.
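The pull loop in 2.2 can be illustrated without SQLite. In this sketch the table vector stands in for the query result and is assumed to be sorted by updated_at; RowSketch, NodeBufferSketch and StreamCursorSketch are hypothetical types, not the real GraphData or GraphStreamSource layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical row: just a timestamp and the node payload.
struct RowSketch { int64_t updated_at; uint64_t user_data; };

// Hypothetical fixed-capacity buffer, pre-allocated by the caller.
struct NodeBufferSketch {
    uint64_t* user_data;
    int count;
    int capacity;
};

struct StreamCursorSketch { int64_t last_seen; };

// Appends rows with updated_at > last_seen and returns how many were appended.
// Stops at capacity, so the remaining rows are picked up by a later pull.
int stream_pull(StreamCursorSketch* cur, const std::vector<RowSketch>& table,
                NodeBufferSketch* out) {
    assert(cur != nullptr && out != nullptr && out->count <= out->capacity);
    int appended = 0;
    for (const RowSketch& row : table) {
        if (row.updated_at <= cur->last_seen) continue;  // already consumed
        if (out->count == out->capacity) break;          // never write past capacity
        out->user_data[out->count++] = row.user_data;
        cur->last_seen = row.updated_at;                 // advance only past appended rows
        ++appended;
    }
    return appended;
}
```

Advancing last_seen only for rows that were actually appended means a full buffer delays rows with later timestamps rather than dropping them. Rows that share the timestamp of the last appended row are still skipped, which is the tie problem listed under Risks.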
Phase 3: Tests
- 3.1 Create the operations_test.db fixture from a SQL script: 3 entity types (Person/Email/Domain), 2 relation types (owns/connects), 10 entities, 15 relations.
- 3.2 Test: load the fixture, verify counts, verify that types_discovered == 3, that user_data is deterministic, and that edges resolve to valid indices.
- 3.3 Stream test: insert new rows into the fixture, pull, verify the append.
Phase 4: Frontmatter .md
- 4.1 graph_sources.md:
  - purity: impure (touches disk)
  - error_type: error_go_core (... does not apply in C++; use bool + error_msg and document it)
  - uses_types: [graph_data_cpp_viz, entity_type_cpp_viz, relation_type_cpp_viz]
  - tested: true
  - semantic params and output filled in
Phase 5: Cleanup
- fn index.
- Commit feat(viz): graph_sources con lector operations.db + streaming.
Done criteria
- graph_load_from_operations("apps/registry_dashboard/operations.db", &g, &s) loads without errors and returns the discovered types.
- Tests pass with the fixture.
- Streaming detects new rows.
- The GraphLoadFn signature is defined and documented: adding a new backend means writing one more function with the same signature.
Risks
| Risk | Mitigation |
|---|---|
| operations.db schemas change between apps | Read only id, type, source, target, weight, which are stable fields. Everything else goes through the optional metadata JSON |
| Relations whose source/target point to nonexistent entities | Skip + increment stats.errors |
| String pool growth | Acceptable; an average entity.id is ~32 bytes, so 100k entities is about 3 MB |
| Stream losing updates when timestamps are equal | Use (updated_at, id) as a tuple for tiebreaking, or add an autoincrement seq if needed |
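The (updated_at, id) tiebreak from the last row can be sketched with std::tie, which compares the pair lexicographically. StampedRow, TupleCursor, is_after and pull_new are hypothetical names, and the table vector again stands in for the database.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

struct StampedRow { int64_t updated_at; std::string id; };

// Cursor remembers the greatest (updated_at, id) pair already delivered.
struct TupleCursor {
    int64_t last_updated_at = INT64_MIN;
    std::string last_id;
};

// True when row sorts strictly after the cursor in (updated_at, id) order.
bool is_after(const StampedRow& row, const TupleCursor& cur) {
    return std::tie(row.updated_at, row.id) >
           std::tie(cur.last_updated_at, cur.last_id);
}

// Returns the rows not yet seen and advances the cursor to the greatest key returned.
std::vector<StampedRow> pull_new(const std::vector<StampedRow>& table,
                                 TupleCursor* cur) {
    assert(cur != nullptr);
    std::vector<StampedRow> fresh;
    for (const StampedRow& row : table) {
        if (is_after(row, *cur)) fresh.push_back(row);
    }
    for (const StampedRow& row : fresh) {
        if (is_after(row, *cur)) {
            cur->last_updated_at = row.updated_at;
            cur->last_id = row.id;
        }
    }
    return fresh;
}
```

This catches a row that arrives later with the same timestamp and a greater id, which a plain updated_at > last_seen filter would miss. A late row with the same timestamp and a smaller id is still lost, which is the case where the autoincrement seq fallback would be needed.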