# 0049g - `graph_sources`: `operations.db` reader + functional abstraction

## Metadata

| Field | Value |
|-------|-------|
| **ID** | 0049g |
| **Status** | pending |
| **Priority** | high |
| **Type** | feature, part of [#0049](0049-osint-graph-viewer.md) |

## Dependencies

**Blocked by:** [0049e](0049e-graph-types-extended.md) (needs `EntityType`/`RelationType`).

---

## Goal

Create the `graph_sources` function with the `GraphLoadFn` abstraction and its first implementation, `graph_load_from_operations`. It is designed so that JSON/JSONL/GraphML backends can be added later without touching the rest of the code. It includes a streaming variant for "bulk collection".

## Context

`operations.db` is the database of each app in the registry, with the tables `entities`, `relations`, `executions`, `assertions`. Relevant schema:

```sql
entities    (id TEXT PK, type TEXT, status TEXT, metadata JSON, created_at, updated_at)
relations   (id TEXT PK, source TEXT, target TEXT, type TEXT, status TEXT, weight REAL, metadata JSON, ...)
```

Mapping to `GraphData`:
- Each distinct value of `entities.type` → one `EntityType` (color generated by hash, default shape `circle`, icon 0). The consumer can override these via `types.yaml` (the `graph_explorer` app does so in 0049k).
- Each distinct value of `relations.type` → one `RelationType`.
- Each `entity` → one `GraphNode` with `user_data = hash64(entity.id)` and `label_idx` pointing into the string pool, which holds `entity.id`, or `metadata.name` if it exists.
- Each `relation` → one `GraphEdge`, resolving `source`/`target` (TEXT) → `node_idx` (uint32) via a hashmap.
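
The last mapping step can be sketched as follows. This is a minimal illustration, not the planned implementation: `RelationRow`, `EdgeSketch`, and `resolve_edges` are hypothetical names, and the real `GraphEdge` layout comes from 0049e and may differ. It shows the `id → node_idx` hashmap and the skip-and-count policy for relations that point at unknown entities.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for a row of `relations` and for GraphEdge.
struct RelationRow { std::string source, target; float weight; };
struct EdgeSketch  { uint32_t source_idx, target_idx; float weight; };

// Resolves TEXT endpoints to node indices. Relations whose source or target
// is not a known entity are skipped and counted in *errors.
std::vector<EdgeSketch> resolve_edges(const std::vector<std::string>& entity_ids,
                                      const std::vector<RelationRow>& relations,
                                      int* errors) {
    assert(errors != nullptr);
    std::unordered_map<std::string, uint32_t> id_to_idx;
    id_to_idx.reserve(entity_ids.size());
    for (uint32_t i = 0; i < entity_ids.size(); ++i)
        id_to_idx.emplace(entity_ids[i], i);  // first occurrence of an id wins

    std::vector<EdgeSketch> edges;
    edges.reserve(relations.size());
    for (const RelationRow& r : relations) {
        auto s = id_to_idx.find(r.source);
        auto t = id_to_idx.find(r.target);
        if (s == id_to_idx.end() || t == id_to_idx.end()) {
            ++*errors;  // dangling endpoint: skip this relation
            continue;
        }
        edges.push_back({s->second, t->second, r.weight});
    }
    return edges;
}
```

Node indices here are simply positions in the order the entities were read, so the same query order yields the same indices.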

## Architecture

```
cpp/functions/viz/
├── graph_sources.h                  # NEW: GraphLoadFn typedef + decls
├── graph_sources.cpp                # NEW: graph_load_from_operations + stream
├── graph_sources.md                 # NEW
└── (future JSON/JSONL/GraphML impls will go here too)

cpp/tests/
├── test_graph_sources.cpp           # NEW
└── fixtures/
    └── operations_test.db           # NEW: small fixture with 10 entities + 15 relations
```

### API

```cpp
// graph_sources.h
namespace graph {

struct GraphLoadStats {
    int  nodes_loaded;
    int  edges_loaded;
    int  types_discovered;
    int  rel_types_discovered;
    int  errors;
    char error_msg[256];
};

typedef bool (*GraphLoadFn)(const char* uri, GraphData* out, GraphLoadStats* stats);

// Caller owns out->nodes/edges/types/rel_types after the call (must call graph_free).
bool graph_load_from_operations(const char* db_path, GraphData* out, GraphLoadStats* stats);

void graph_free(GraphData* graph);

// Streaming: poll-based reader for new entities/relations.
// Caller pre-allocates GraphData with capacity > expected max. Stream appends in place.
struct GraphStreamSource;
GraphStreamSource* graph_stream_operations_open(const char* db_path, int poll_ms);
int                graph_stream_pull(GraphStreamSource*, GraphData* graph);  // returns # appended
void               graph_stream_close(GraphStreamSource*);

} // namespace graph
```
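
One way the shared `GraphLoadFn` signature could pay off is a small dispatch table keyed by file suffix. The sketch below is only an illustration under assumptions: `Backend`, `kBackends`, `graph_load_any`, and `load_operations_stub` are hypothetical names, and the structs are trimmed stand-ins so the block compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Trimmed stand-ins; the real definitions live in the graph headers.
struct GraphData { int node_count; };
struct GraphLoadStats { int nodes_loaded; int errors; char error_msg[256]; };
typedef bool (*GraphLoadFn)(const char* uri, GraphData* out, GraphLoadStats* stats);

// Hypothetical placeholder standing in for graph_load_from_operations.
static bool load_operations_stub(const char*, GraphData* out, GraphLoadStats* stats) {
    out->node_count = 10;
    stats->nodes_loaded = 10;
    return true;
}

struct Backend { const char* suffix; GraphLoadFn load; };

// Adding a JSON/JSONL/GraphML backend later would mean adding one row here.
static const Backend kBackends[] = {
    { ".db", load_operations_stub },
};

static bool ends_with(const char* s, const char* suffix) {
    std::size_t n = std::strlen(s), m = std::strlen(suffix);
    return n >= m && std::strcmp(s + n - m, suffix) == 0;
}

// Picks the first backend whose suffix matches; fills error_msg otherwise.
bool graph_load_any(const char* uri, GraphData* out, GraphLoadStats* stats) {
    assert(uri && out && stats);
    for (const Backend& b : kBackends)
        if (ends_with(uri, b.suffix)) return b.load(uri, out, stats);
    stats->errors = 1;
    std::snprintf(stats->error_msg, sizeof stats->error_msg,
                  "no backend for '%s'", uri);
    return false;
}
```

Callers only ever see the `bool` result plus `GraphLoadStats`, so the rest of the code does not need to know which backend ran.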

### Default color from type hash

```cpp
uint32_t default_color_for(const char* type_name) {
    uint32_t h = fnv1a(type_name);
    // Sample from a balanced palette of 16 indigo-friendly colors.
    static const uint32_t palette[16] = { /* RGBA8 */ };
    return palette[h & 0xF];
}
```
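
The snippet above calls `fnv1a` without defining it. A minimal sketch, assuming the standard 32-bit FNV-1a variant is what is meant (offset basis 2166136261, prime 16777619); `palette_slot` is a hypothetical helper added only to show the index computation in isolation.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit FNV-1a over a NUL-terminated string.
inline uint32_t fnv1a(const char* s) {
    assert(s != nullptr);
    uint32_t h = 2166136261u;             // FNV offset basis
    for (; *s; ++s) {
        h ^= static_cast<uint8_t>(*s);    // xor the byte first (the "1a" order)
        h *= 16777619u;                   // then multiply by the FNV prime
    }
    return h;
}

// Index into a 16-entry palette; the same type name always maps to the same slot.
inline uint32_t palette_slot(const char* type_name) {
    return fnv1a(type_name) & 0xF;
}
```

With only 16 slots, distinct type names can share a color, which is one reason the `types.yaml` override matters.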

## Tasks

### Phase 1: `graph_load_from_operations`

- [ ] **1.1** Implement the function: open SQLite, query types, query entities, query relations, build the `id→node_idx` hashmap, fill `GraphData`.
- [ ] **1.2** Default color from a hash of the `type` name. Default shape `SHAPE_CIRCLE`. Default icon `0`.
- [ ] **1.3** String pool: a vector<string> in `GraphData` (extend the struct with `string_pool` or pass it via callback). Decision: an internal field `char** label_pool; int label_pool_count;` plus a helper `graph_label(graph, idx)`.
- [ ] **1.4** Error handling: if the DB does not exist or has no `entities` table, return `false` with `error_msg` populated.
- [ ] **1.5** `graph_free` releases everything `_load_*` allocated. Important: the caller should not have to distinguish which loader allocated the data; the API is uniform.
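
The label pool decision in 1.3 and the uniform release in 1.5 could look roughly like this. It is a sketch under assumptions: `GraphDataSketch` carries only the pool fields, `label_pool_cap`, `graph_label_add`, and `graph_labels_free` are hypothetical additions, and only `graph_label(graph, idx)` is named in the task.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Only the fields relevant to task 1.3; the real GraphData has more.
struct GraphDataSketch {
    char** label_pool;
    int    label_pool_count;
    int    label_pool_cap;   // assumption: capacity bookkeeping for growth
};

// Copies `text` into the pool and returns its index, or -1 on allocation failure.
int graph_label_add(GraphDataSketch* g, const char* text) {
    assert(g && text);
    if (g->label_pool_count == g->label_pool_cap) {
        int new_cap = g->label_pool_cap ? g->label_pool_cap * 2 : 16;
        void* p = std::realloc(g->label_pool, sizeof(char*) * new_cap);
        if (!p) return -1;
        g->label_pool = static_cast<char**>(p);
        g->label_pool_cap = new_cap;
    }
    std::size_t n = std::strlen(text) + 1;      // include the terminator
    char* copy = static_cast<char*>(std::malloc(n));
    if (!copy) return -1;
    std::memcpy(copy, text, n);
    g->label_pool[g->label_pool_count] = copy;
    return g->label_pool_count++;
}

// Bounds-checked lookup; returns "" for an invalid index.
const char* graph_label(const GraphDataSketch* g, int idx) {
    return (idx >= 0 && idx < g->label_pool_count) ? g->label_pool[idx] : "";
}

// Releases everything graph_label_add allocated; safe on a zeroed struct.
void graph_labels_free(GraphDataSketch* g) {
    for (int i = 0; i < g->label_pool_count; ++i) std::free(g->label_pool[i]);
    std::free(g->label_pool);
    *g = GraphDataSketch{};
}
```

Because every label is copied into the pool, the free routine does not need to know which loader produced the strings.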

### Phase 2: Streaming

- [ ] **2.1** `graph_stream_operations_open` stores the current `MAX(updated_at)` of entities and relations.
- [ ] **2.2** `graph_stream_pull`: query `WHERE updated_at > last_seen`, append to `GraphData` (checking capacity), update last_seen, return the count.
- [ ] **2.3** Pin new nodes near the parent's centroid (if the new entity has a relation with an existing one). Optional but useful. Mark `NF_PINNED` for N frames; auto-release would need another mechanism (`flags |= NF_PINNED_TEMP`?). Keep it simple for v1: manual pinning via the app.
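
The capacity check in 2.2 can be sketched as an in-place append that never reallocates. `NodeSketch`, `GraphBuf`, and `append_nodes` are hypothetical names, and the real `GraphData` fields may be laid out differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical trimmed node and buffer types.
struct NodeSketch { uint64_t user_data; };
struct GraphBuf   { NodeSketch* nodes; int node_count; int node_capacity; };

// Appends up to n nodes into the caller's pre-allocated buffer.
// Returns how many were actually appended (may be fewer than n).
int append_nodes(GraphBuf* g, const NodeSketch* fresh, int n) {
    assert(g && fresh && n >= 0);
    int room = g->node_capacity - g->node_count;
    int take = n < room ? n : room;
    for (int i = 0; i < take; ++i)
        g->nodes[g->node_count + i] = fresh[i];
    g->node_count += take;
    return take;
}
```

If fewer rows fit than were read, it seems safer for `last_seen` to advance only past the rows that were appended, so the remainder is picked up by a later pull.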

### Phase 3: Tests

- [ ] **3.1** Create the `operations_test.db` fixture from a SQL script: 3 entity types (Person/Email/Domain), 2 relation types (owns/connects), 10 entities, 15 relations.
- [ ] **3.2** Test: load the fixture, verify counts, verify that types_discovered == 3, that `user_data` is deterministic, and that edges resolve to valid indices.
- [ ] **3.3** Stream test: insert new rows into the fixture, pull, verify the append.

### Phase 4: `.md` frontmatter

- [ ] **4.1** `graph_sources.md`:
  - `purity: impure` (touches disk)
  - `error_type: error_go_core` (does not apply in C++; use `bool + error_msg` and document it)
  - `uses_types: [graph_data_cpp_viz, entity_type_cpp_viz, relation_type_cpp_viz]`
  - `tested: true`
  - semantic `params` and `output` filled in

### Phase 5: Cleanup

- [ ] `fn index`.
- [ ] Commit `feat(viz): graph_sources with operations.db reader + streaming`.

## Definition of done

- [ ] `graph_load_from_operations("apps/registry_dashboard/operations.db", &g, &s)` loads without errors and returns the discovered types.
- [ ] Tests green with the fixture.
- [ ] Streaming detects new rows.
- [ ] The `GraphLoadFn` signature is defined and documented: adding a new backend is one more function with the same signature.

## Risks

| Risk | Mitigation |
|---|---|
| `operations.db` schemas change between apps | Read only `id`, `type`, `source`, `target`, `weight`, which are stable fields. Everything else via optional `metadata` JSON |
| Relations whose source/target point to nonexistent entities | Skip + increment `stats.errors` |
| String pool growth | Acceptable; an average `entity.id` is ~32 bytes, so 100k ≈ 3 MB |
| Stream missing updates when timestamps are equal | Use `(updated_at, id)` as a tuple for tiebreaking, or add an autoincrement `seq` if needed |
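
The tuple tiebreak from the last row can be illustrated in memory, without SQLite. `Row`, `StreamCursor`, and `pull_new` are hypothetical names, and `updated_at` is assumed here to be an integer timestamp; the comment shows the SQL comparison this is meant to mirror.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

struct Row { long long updated_at; std::string id; };

// Cursor over (updated_at, id); both fields advance together.
struct StreamCursor { long long updated_at = -1; std::string id; };

// Returns rows strictly after the cursor in (updated_at, id) order and
// advances the cursor to the last row returned. Roughly mirrors:
//   WHERE (updated_at, id) > (?, ?) ORDER BY updated_at, id
std::vector<Row> pull_new(std::vector<Row> table, StreamCursor* cur) {
    assert(cur != nullptr);
    std::sort(table.begin(), table.end(), [](const Row& a, const Row& b) {
        return std::tie(a.updated_at, a.id) < std::tie(b.updated_at, b.id);
    });
    std::vector<Row> fresh;
    for (const Row& r : table)
        if (std::tie(r.updated_at, r.id) > std::tie(cur->updated_at, cur->id))
            fresh.push_back(r);
    if (!fresh.empty()) {
        cur->updated_at = fresh.back().updated_at;
        cur->id = fresh.back().id;
    }
    return fresh;
}
```

A row written later with the same timestamp but a lexicographically smaller `id` would still be skipped by this cursor, which is the case the autoincrement `seq` column would cover.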