07d06d5e7d
Base stack of OpenGL 4.3 compute shaders for GPU-intensive Monte Carlo workloads. Reuses the graph_force_layout_gpu pattern (SSBO + compute) and integrates with the rest of the registry without adding symbols to gl_loader (everything it needs was already exposed).

- gpu_ssbo: lifecycle of Shader Storage Buffer Objects.
- gpu_compute_program: compiles GLSL 4.3 compute shaders with an injectable preamble (same pattern as gl_shader::compile_fragment).
- gpu_dispatch: dispatch_1d/2d/3d with automatic ceil(N/local) + barrier helpers (storage, uniform, image, buffer_update, all).
- gpu_rng_glsl: PCG32 in GLSL (uniform/normal/below) + SplitMix64 seed walkers to seed N walkers deterministically from a master seed.
- gpu_histogram_1d: SSBO float[N] -> uint[nbins] via atomicAdd.
- gpu_histogram_2d: SSBO float[2N] xy-interleaved -> uint[nx*ny] + to_density helper to feed heatmap_cpp_viz.
- gpu_reduce: workgroup-shared sum/min/max/mean (local size 256, partials combined on the CPU).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3.6 KiB
| name | kind | lang | domain | version | purity | signature | description | tags | uses_functions | uses_types | returns | returns_optional | error_type | imports | tested | tests | test_file_path | file_path | framework | params | output |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gpu_histogram_2d | function | cpp | gfx | 1.0.0 | impure | GpuHistogram2D gpu_histogram_2d_create(int nx, int ny); void gpu_histogram_2d_clear(GpuHistogram2D&); void gpu_histogram_2d_accumulate(GpuHistogram2D&, const Ssbo& samples_xy, int count, float xmin, float xmax, float ymin, float ymax); void gpu_histogram_2d_readback(const GpuHistogram2D&, unsigned int* out); void gpu_histogram_2d_to_density(const unsigned int* counts, int nx, int ny, float* out); void gpu_histogram_2d_destroy(GpuHistogram2D&) | 2D GPU binner: SSBO float[2*N] xy-interleaved -> SSBO uint[nx*ny] row-major via atomicAdd. The output can be normalized to float[] to feed heatmap_cpp_viz / contour_cpp_viz. |  |  |  |  | false | error_go_core |  | false |  |  | cpp/functions/gfx/gpu_histogram_2d.cpp | opengl |  | Bins are accumulated as uint[nx*ny] row-major (idx = y*nx + x). to_density converts the counts to normalized floats in the caller-provided out buffer. accumulate issues barrier_storage; readback issues barrier_buffer_update. |
gpu_histogram_2d
2D binner for sample densities (joint posteriors, walk traces, scatter density). The output is ready for heatmap_cpp_viz (z[]), contour_cpp_viz (z[] with marching squares) and surface_plot_3d_cpp_viz.
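For cross-checking GPU results, a CPU reference that follows the same data layout can be useful. The following is a minimal sketch, not the shader itself: `cpu_histogram_2d_reference` is a hypothetical name that does not exist in the registry, and two behaviors are assumptions because this page does not specify them: samples outside the given range are dropped, and each coordinate is mapped to its bin by linear scaling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of the registry.
// samples_xy: interleaved [x0, y0, x1, y1, ...], at least 2 * count floats.
// Returns counts as uint[nx * ny], row-major (idx = y * nx + x).
// Assumption: samples outside [xmin, xmax) x [ymin, ymax) are dropped.
std::vector<unsigned int> cpu_histogram_2d_reference(
    const std::vector<float>& samples_xy, int count, int nx, int ny,
    float xmin, float xmax, float ymin, float ymax) {
    assert(count >= 0 && nx > 0 && ny > 0 && xmax > xmin && ymax > ymin);
    assert(samples_xy.size() >= static_cast<std::size_t>(2) * count);

    std::vector<unsigned int> counts(static_cast<std::size_t>(nx) * ny, 0u);
    for (int i = 0; i < count; ++i) {
        const float x = samples_xy[2 * i];
        const float y = samples_xy[2 * i + 1];
        // The negated test also rejects NaN coordinates.
        if (!(x >= xmin && x < xmax && y >= ymin && y < ymax)) continue;

        // Assumption: linear mapping of each coordinate to a bin index.
        int bx = static_cast<int>((x - xmin) / (xmax - xmin) * nx);
        int by = static_cast<int>((y - ymin) / (ymax - ymin) * ny);
        if (bx >= nx) bx = nx - 1;  // guard against rounding at the upper edge
        if (by >= ny) by = ny - 1;

        // On the GPU this increment is the atomicAdd on the bins SSBO.
        counts[static_cast<std::size_t>(by) * nx + bx] += 1u;
    }
    return counts;
}
```

Comparing this output with the buffer filled by gpu_histogram_2d_readback on a small sample set is one way to confirm the row-major convention before wiring up a visualizer.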
Typical pattern (mcmc_full / mcmc_visualizer)
auto h2d = fn::gfx::gpu_histogram_2d_create(128, 128);
// Each MCMC step produces one sample (x, y); they are accumulated in xy_ssbo
// as float[2*N]. After N steps:
fn::gfx::gpu_histogram_2d_clear(h2d);
fn::gfx::gpu_histogram_2d_accumulate(h2d, xy_ssbo, N,
                                     -5.0f, 5.0f, -5.0f, 5.0f);
std::vector<unsigned int> counts(128 * 128);
fn::gfx::gpu_histogram_2d_readback(h2d, counts.data());
std::vector<float> density(128 * 128);
fn::gfx::gpu_histogram_2d_to_density(counts.data(), 128, 128, density.data());
fn::viz::heatmap(density.data(), 128, 128, /*...*/);
fn::gfx::gpu_histogram_2d_destroy(h2d);
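The normalization step in this pattern runs on the CPU. A minimal sketch of what such a step could look like follows. `to_density_reference` is a hypothetical stand-in, not the registry function, and dividing by the largest count (so the output lies in [0, 1]) is an assumption; the actual normalization used by gpu_histogram_2d_to_density is not shown on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the to_density step; not the registry code.
// Assumption: each count is divided by the largest count, giving [0, 1].
void to_density_reference(const unsigned int* counts, int nx, int ny,
                          float* out) {
    assert(counts != nullptr && out != nullptr && nx > 0 && ny > 0);
    const std::size_t n =
        static_cast<std::size_t>(nx) * static_cast<std::size_t>(ny);

    const unsigned int max_count = *std::max_element(counts, counts + n);
    // An empty histogram maps to all zeros instead of dividing by zero.
    const float scale =
        max_count > 0u ? 1.0f / static_cast<float>(max_count) : 0.0f;

    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<float>(counts[i]) * scale;
    }
}
```

Scaling by the maximum keeps the fullest bin at exactly 1.0, which suits a heatmap color scale; a probability density would divide by the total count and the bin area instead.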
Layout of the samples SSBO
samples_xy is float[2*count], interleaved. If your MC kernel produces vec2 in std430 (8-byte aligned), the read is the same: the shader interprets the pairs as xy. If you use a struct with padding, compact it first.
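When samples live in a padded struct, the compaction can be done on the CPU before upload. The sketch below assumes a hypothetical `PaddedSample` with two extra floats; neither the struct nor `compact_to_xy` comes from the registry, and the upload through gpu_ssbo is left out because its signature does not appear here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical padded sample as a kernel might emit it (16 bytes per
// element); not a type from the registry.
struct PaddedSample {
    float x;
    float y;
    float weight;  // extra payload the binner does not read
    float pad;     // explicit padding
};

// Copies only x and y into a tightly packed [x0, y0, x1, y1, ...] array,
// which is the float[2 * count] layout the binner expects.
std::vector<float> compact_to_xy(const std::vector<PaddedSample>& samples) {
    std::vector<float> xy;
    xy.reserve(samples.size() * 2);
    for (const PaddedSample& s : samples) {
        xy.push_back(s.x);
        xy.push_back(s.y);
    }
    assert(xy.size() == samples.size() * 2);
    return xy;
}
```

For large sample sets, a round trip through the CPU may cost more than the binning itself; a small compute pass that writes the packed pairs directly would avoid it, at the price of one more shader.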
Performance
For 10^7 samples on a 256×256 grid on an RTX 3070: ~5-7 ms (memory-bound, atomics spread over 65,536 bins, i.e. 256 KiB of uint counters), which is enough for continuous refresh at 60 FPS (16.7 ms per frame).
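The arithmetic behind these figures can be checked with a few helpers. This is only a back-of-the-envelope sketch: the function names are hypothetical, and the 5-7 ms timing is the measurement quoted here, not something the code derives.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kGlslUintBytes = 4;  // a GLSL uint is 32 bits

// Number of bins in an nx-by-ny grid.
constexpr std::size_t bin_count(int nx, int ny) {
    return static_cast<std::size_t>(nx) * static_cast<std::size_t>(ny);
}

// Size in bytes of the uint[nx * ny] counter buffer.
constexpr std::size_t counter_bytes(int nx, int ny) {
    return bin_count(nx, ny) * kGlslUintBytes;
}

// Share of one frame consumed by a pass that takes pass_ms milliseconds.
double frame_fraction(double pass_ms, double fps) {
    assert(fps > 0.0);
    return pass_ms / (1000.0 / fps);
}

static_assert(bin_count(256, 256) == 65536, "256 x 256 grid has 65,536 bins");
static_assert(counter_bytes(256, 256) == 256 * 1024, "256 KiB of counters");
```

At 60 FPS a frame lasts about 16.7 ms, so a 7 ms accumulate pass takes roughly 42% of the frame and leaves the remainder for simulation and drawing.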
Notes
- to_density is CPU-side and keeps fp32 resolution relative to the maximum count, which is adequate for heatmaps. For a CDF or cumulative output, use a different function (not included here).
- The binner holds GL state (program + SSBO). Create one per viewport; it is not safe to share it across CPU-side threads.