chore(issues): plan 0049 OSINT graph viewer multi-issue

Aggregates the planning artifacts for the 0049 series (umbrella + 0049a..0049k): - New rule cpp_apps.md (registered in INDEX) — standardize structure, CMake patterns, app.md frontmatter and sub-repo for C++ apps; points to the authoritative cpp/PATTERNS.md and cpp/DESIGN_SYSTEM.md. - Feature flag osint_graph_v1 (disabled until 0049k closes). - Issue 0049 (umbrella) and sub-issues 0049b..0049k describing the GPU rendering system, force-layout, types, sources, labels and the final graph_explorer app integration. - README updated with the new rows (all pending; 0049a will flip to completed in the next commit).
2026-04-29 21:08:36 +02:00
parent 4cdd650502
commit f8f72e4bf7
15 changed files with 1742 additions and 0 deletions
@@ -0,0 +1,136 @@
+# 0049h — `graph_force_layout_gpu`: compute shader + spatial hash
+
+## Metadata
+
+| Campo | Valor |
+|-------|-------|
+| **ID** | 0049h |
+| **Estado** | pendiente |
+| **Prioridad** | media-alta |
+| **Tipo** | feature performance — parte de [#0049](0049-osint-graph-viewer.md) |
+
+## Dependencias
+
+**Bloqueada por:** [0049b](0049b-cpp-bump-gl-43.md) (compute shaders), [0049e](0049e-graph-types-extended.md) (`flags.pinned`).
+
+---
+
+## Objetivo
+
+Implementar el force-directed layout en GPU usando compute shaders y SSBOs. Repulsion via spatial hash grid (no Barnes-Hut), atraccion por aristas, integracion con clamp y respeto de `NF_PINNED`. API consistente con la version CPU para que el consumer pueda swappear.
+
+## Contexto
+
+A 50k+ nodos el CPU layout (Barnes-Hut, ~5-10 ms/frame con 20k nodos) impide 60fps con margen. GPU compute con grid hash:
+- Cada celda contiene punteros a sus nodos (atomic insert).
+- Repulsion: cada nodo lee solo su celda + 8 vecinas (3×3) → O(N · density) en vez de O(N log N).
+- Sin estructuras dinamicas → todo arrays planos en SSBO.
+
+## Arquitectura
+
+```
+cpp/functions/viz/
+├── graph_force_layout_gpu.h           # NEW
+├── graph_force_layout_gpu.cpp         # NEW
+├── graph_force_layout_gpu.md          # NEW
+
+cpp/functions/viz/shaders/              # NEW dir (o inline strings)
+├── force_clear_grid.comp               # NEW: zero counters
+├── force_build_grid.comp               # NEW: insert nodes en celdas (atomic counters)
+├── force_repulsion.comp                # NEW: leer celdas vecinas, acumular fuerza
+├── force_attraction.comp               # NEW: por arista, acumular spring force
+└── force_integrate.comp                # NEW: v += f, clamp, x += v, respeta pinned
+
+cpp/tests/
+└── test_graph_force_layout_gpu.cpp     # NEW
+```
+
+### API
+
+```cpp
+struct ForceLayoutGPU;
+
+ForceLayoutGPU* graph_force_layout_gpu_create(int max_nodes, int max_edges,
+                                              int grid_cells_per_side = 64);
+void            graph_force_layout_gpu_upload(ForceLayoutGPU*, const GraphData&);   // copy positions/edges to GPU
+float           graph_force_layout_gpu_step  (ForceLayoutGPU*, GraphData&,
+                                              const ForceLayoutConfig&);            // returns total energy
+void            graph_force_layout_gpu_readback(ForceLayoutGPU*, GraphData&);       // sync GPU positions back to CPU mirror
+void            graph_force_layout_gpu_destroy(ForceLayoutGPU*);
+```
+
+### Buffers GPU
+
+| SSBO | Layout | Tamaño |
+|---|---|---|
+| `positions`  | `vec2[N]`  | 8 × N |
+| `velocities` | `vec2[N]`  | 8 × N |
+| `forces`     | `vec2[N]`  | 8 × N (working) |
+| `flags`      | `uint[N]`  | 4 × N |
+| `edges`      | `uvec2[E]` | 8 × E |
+| `weights`    | `float[E]` | 4 × E |
+| `grid_counts`| `uint[G²]` | 4 × 64² = 16 KB |
+| `grid_cells` | `uint[G²][K]` | 4 × G² × max_nodes_per_cell (32) |
+
+Para 100k nodos: ~3 MB en SSBOs — trivial.
+
+## Tareas
+
+### Fase 1 — Esqueleto + buffer alloc
+
+- [ ] **1.1** Crear `ForceLayoutGPU` con `max_nodes`/`max_edges`. Crear todos los SSBOs.
+- [ ] **1.2** `_upload`: empaqueta posiciones/aristas/flags y llama `glBufferSubData`.
+- [ ] **1.3** Compilar los 5 compute shaders al crear (helper `compile_compute(src)`).
+
+### Fase 2 — Compute shaders
+
+- [ ] **2.1** `force_clear_grid.comp`: 1 thread por celda, zero counter.
+- [ ] **2.2** `force_build_grid.comp`: 1 thread por nodo, calcula celda, `atomicAdd(grid_counts[ci], 1)`, escribe en `grid_cells[ci][slot]` si `slot < K`.
+- [ ] **2.3** `force_repulsion.comp`: 1 thread por nodo. Recorre las 9 celdas vecinas, para cada otro nodo calcula `F = repulsion / dist²`. Acumula en `forces[i]`.
+- [ ] **2.4** `force_attraction.comp`: 1 thread por arista, atomic add a `forces[source]` y `forces[target]`.
+- [ ] **2.5** `force_integrate.comp`: 1 thread por nodo. Si `flags & NF_PINNED`, skip. `v = damping*v + F`, clamp a `max_velocity`, `x += v`. Atomic add a `total_energy_ssbo`.
+
+### Fase 3 — Step pipeline
+
+- [ ] **3.1** `_step` despacha los 5 compute shaders en orden con `glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT)` entre cada uno.
+- [ ] **3.2** Ultimo barrier: `GL_BUFFER_UPDATE_BARRIER_BIT` para que `_readback` vea valores frescos.
+- [ ] **3.3** Bounds calculados en CPU tras readback (cheap).
+
+### Fase 4 — Readback eficiente
+
+- [ ] **4.1** `_readback` usa `glGetBufferSubData` solo de `positions` (8 × N bytes). Para 50k nodos = 400 KB; CPU mirror se actualiza cada frame.
+- [ ] **4.2** Documentar tradeoff: si el consumer no necesita mirror cada frame (p.ej. solo la app que dibuja con la GPU), `_readback` es opcional.
+
+### Fase 5 — Tests
+
+- [ ] **5.1** Test smoke: 100 nodos, 200 aristas; correr 100 steps; verificar que la energia decrece.
+- [ ] **5.2** Test pinned: pinear 1 nodo en (0,0); tras 100 steps debe seguir en (0,0).
+- [ ] **5.3** Test consistency CPU-vs-GPU: para un grafo pequeño (50 nodos), la energia tras N steps debe diferir < 5% entre la version CPU y la GPU (no exacto por floating-point + grid approximation, pero comparable).
+- [ ] **5.4** Test SKIP en WSL si no hay context con compute (el harness ya skipea si no hay GL context).
+
+### Fase 6 — Demo
+
+- [ ] **6.1** Anadir toggle "GPU layout" en `demos_graph` que swappea CPU ↔ GPU. Mostrar fps y energia en tiempo real.
+- [ ] **6.2** Probar con 50k nodos.
+
+### Fase 7 — Cleanup
+
+- [ ] Frontmatter `.md` con `purity: impure`, `uses_types: [graph_data_cpp_viz, force_layout_config_cpp_viz]`.
+- [ ] `fn index`.
+- [ ] Commit `feat(viz): graph_force_layout_gpu compute + spatial hash`.
+
+## Criterio de done
+
+- [ ] 50k nodos + 200k aristas a 60fps con layout iterativo corriendo.
+- [ ] Tests verdes (smoke + pinned + consistency).
+- [ ] CPU usage < 10% durante el layout (medible con Tracy).
+- [ ] Toggle CPU/GPU en `demos_graph` operativo.
+
+## Riesgos
+
+| Riesgo | Mitigacion |
+|---|---|
+| `max_nodes_per_cell` overflow | Si una celda satura, los excedentes se ignoran (con warning); ajustar `grid_cells_per_side` mas grande o aumentar `K` |
+| Atomic contention en repulsion (escrituras a `forces[i]`) | Cada nodo es escrito por un solo thread (su propio); lecturas si concurrentes pero readonly de `positions` |
+| Atomic add sobre `forces` en attraction (multiple aristas tocan el mismo nodo) | Usar `atomicAdd` real sobre uvec2 packed, o serializar con un buffer temp y sumar despues. Alternativa: un compute pass por dimension |
+| Diferencias driver entre vendors (Mesa, NV, AMD) | Usar `#version 430 core`, evitar features vendor-specific. Test en al menos un GPU NVIDIA y uno AMD/Mesa |