Files

T

egutierrez 7c09255c8a chore(issues): plan 0049 OSINT graph viewer multi-issue

Aggregates the planning artifacts for the 0049 series (umbrella + 0049a..0049k):

- New rule cpp_apps.md (registered in INDEX) — standardize structure, CMake
  patterns, app.md frontmatter and sub-repo for C++ apps; points to the
  authoritative cpp/PATTERNS.md and cpp/DESIGN_SYSTEM.md.
- Feature flag osint_graph_v1 (disabled until 0049k closes).
- Issue 0049 (umbrella) and sub-issues 0049b..0049k describing the GPU
  rendering system, force-layout, types, sources, labels and the final
  graph_explorer app integration.
- README updated with the new rows (all pending; 0049a will flip to
  completed in the next commit).

2026-04-29 21:08:36 +02:00

6.5 KiB

Raw Blame History

0049h — `graph_force_layout_gpu`: compute shader + spatial hash

Metadata

Campo	Valor
ID	0049h
Estado	pendiente
Prioridad	media-alta
Tipo	feature performance — parte de #0049

Dependencias

Bloqueada por: 0049b (compute shaders), 0049e (flags.pinned).

Objetivo

Implementar el force-directed layout en GPU usando compute shaders y SSBOs. Repulsion via spatial hash grid (no Barnes-Hut), atraccion por aristas, integracion con clamp y respeto de NF_PINNED. API consistente con la version CPU para que el consumer pueda swappear.

Contexto

A 50k+ nodos el CPU layout (Barnes-Hut, ~5-10 ms/frame con 20k nodos) impide 60fps con margen. GPU compute con grid hash:

Cada celda contiene punteros a sus nodos (atomic insert).
Repulsion: cada nodo lee solo su celda + 8 vecinas (3×3) → O(N · density) en vez de O(N log N).
Sin estructuras dinamicas → todo arrays planos en SSBO.

Arquitectura

cpp/functions/viz/
├── graph_force_layout_gpu.h           # NEW
├── graph_force_layout_gpu.cpp         # NEW
├── graph_force_layout_gpu.md          # NEW

cpp/functions/viz/shaders/              # NEW dir (o inline strings)
├── force_clear_grid.comp               # NEW: zero counters
├── force_build_grid.comp               # NEW: insert nodes en celdas (atomic counters)
├── force_repulsion.comp                # NEW: leer celdas vecinas, acumular fuerza
├── force_attraction.comp               # NEW: por arista, acumular spring force
└── force_integrate.comp                # NEW: v += f, clamp, x += v, respeta pinned

cpp/tests/
└── test_graph_force_layout_gpu.cpp     # NEW

API

struct ForceLayoutGPU;

ForceLayoutGPU* graph_force_layout_gpu_create(int max_nodes, int max_edges,
                                              int grid_cells_per_side = 64);
void            graph_force_layout_gpu_upload(ForceLayoutGPU*, const GraphData&);   // copy positions/edges to GPU
float           graph_force_layout_gpu_step  (ForceLayoutGPU*, GraphData&,
                                              const ForceLayoutConfig&);            // returns total energy
void            graph_force_layout_gpu_readback(ForceLayoutGPU*, GraphData&);       // sync GPU positions back to CPU mirror
void            graph_force_layout_gpu_destroy(ForceLayoutGPU*);

Buffers GPU

SSBO	Layout	Tamaño
`positions`	`vec2[N]`	8 × N
`velocities`	`vec2[N]`	8 × N
`forces`	`vec2[N]`	8 × N (working)
`flags`	`uint[N]`	4 × N
`edges`	`uvec2[E]`	8 × E
`weights`	`float[E]`	4 × E
`grid_counts`	`uint[G²]`	4 × 64² = 16 KB
`grid_cells`	`uint[G²][K]`	4 × G² × max_nodes_per_cell (32)

Para 100k nodos: ~3 MB en SSBOs — trivial.

Tareas

Fase 1 — Esqueleto + buffer alloc

1.1 Crear ForceLayoutGPU con max_nodes/max_edges. Crear todos los SSBOs.
1.2 _upload: empaqueta posiciones/aristas/flags y llama glBufferSubData.
1.3 Compilar los 5 compute shaders al crear (helper compile_compute(src)).

Fase 2 — Compute shaders

2.1 force_clear_grid.comp: 1 thread por celda, zero counter.
2.2 force_build_grid.comp: 1 thread por nodo, calcula celda, atomicAdd(grid_counts[ci], 1), escribe en grid_cells[ci][slot] si slot < K.
2.3 force_repulsion.comp: 1 thread por nodo. Recorre las 9 celdas vecinas, para cada otro nodo calcula F = repulsion / dist². Acumula en forces[i].
2.4 force_attraction.comp: 1 thread por arista, atomic add a forces[source] y forces[target].
2.5 force_integrate.comp: 1 thread por nodo. Si flags & NF_PINNED, skip. v = damping*v + F, clamp a max_velocity, x += v. Atomic add a total_energy_ssbo.

Fase 3 — Step pipeline

3.1 _step despacha los 5 compute shaders en orden con glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT) entre cada uno.
3.2 Ultimo barrier: GL_BUFFER_UPDATE_BARRIER_BIT para que _readback vea valores frescos.
3.3 Bounds calculados en CPU tras readback (cheap).

Fase 4 — Readback eficiente

4.1 _readback usa glGetBufferSubData solo de positions (8 × N bytes). Para 50k nodos = 400 KB; CPU mirror se actualiza cada frame.
4.2 Documentar tradeoff: si el consumer no necesita mirror cada frame (p.ej. solo la app que dibuja con la GPU), _readback es opcional.

Fase 5 — Tests

5.1 Test smoke: 100 nodos, 200 aristas; correr 100 steps; verificar que la energia decrece.
5.2 Test pinned: pinear 1 nodo en (0,0); tras 100 steps debe seguir en (0,0).
5.3 Test consistency CPU-vs-GPU: para un grafo pequeño (50 nodos), la energia tras N steps debe diferir < 5% entre la version CPU y la GPU (no exacto por floating-point + grid approximation, pero comparable).
5.4 Test SKIP en WSL si no hay context con compute (el harness ya skipea si no hay GL context).

Fase 6 — Demo

6.1 Anadir toggle "GPU layout" en demos_graph que swappea CPU ↔ GPU. Mostrar fps y energia en tiempo real.
6.2 Probar con 50k nodos.

Fase 7 — Cleanup

Frontmatter .md con purity: impure, uses_types: [graph_data_cpp_viz, force_layout_config_cpp_viz].
fn index.
Commit feat(viz): graph_force_layout_gpu compute + spatial hash.

Criterio de done

50k nodos + 200k aristas a 60fps con layout iterativo corriendo.
Tests verdes (smoke + pinned + consistency).
CPU usage < 10% durante el layout (medible con Tracy).
Toggle CPU/GPU en demos_graph operativo.

Riesgos

Riesgo	Mitigacion
`max_nodes_per_cell` overflow	Si una celda satura, los excedentes se ignoran (con warning); ajustar `grid_cells_per_side` mas grande o aumentar `K`
Atomic contention en repulsion (escrituras a `forces[i]`)	Cada nodo es escrito por un solo thread (su propio); lecturas si concurrentes pero readonly de `positions`
Atomic add sobre `forces` en attraction (multiple aristas tocan el mismo nodo)	Usar `atomicAdd` real sobre uvec2 packed, o serializar con un buffer temp y sumar despues. Alternativa: un compute pass por dimension
Diferencias driver entre vendors (Mesa, NV, AMD)	Usar `#version 430 core`, evitar features vendor-specific. Test en al menos un GPU NVIDIA y uno AMD/Mesa

6.5 KiB Raw Blame History Unescape Escape

0049h — graph_force_layout_gpu: compute shader + spatial hash