d115d8e830
Nuevo dominio cpp/functions/datascience con primitivas puras CPU para post- proceso de samples Monte Carlo y diagnostico de cadenas MCMC. Diseñadas como gemelas CPU de los kernels GPU (rng pareja con gpu_rng_glsl, MH 1D/ND con mc_metropolis_hastings_gpu) para validar numericamente y para datasets pequeños donde el dispatch GPU no compensa. - rng: xoshiro256++ con uniform / normal (Box-Muller) / below (Lemire) / categorical. Determinista bit-exacto dado seed. - stats_summary: sum (Kahan), mean, var/std (Welford one-pass), min, max, quantile / percentile (R type-7). - autocorr: r(k), ACF, tau_int (Sokal) — diagnostico ACF y ESS. - rhat_ess: Gelman-Rubin clasico y split + ESS basico (multi-chain). - beta_dist: lgamma (Lanczos), beta_pdf, beta_cdf (continued fraction), beta_quantile, mean/var/std — para inferencia Beta-Binomial. - drawdown: max_dd absoluto/pct + underwater series para sesiones simuladas y backtests. - samples_to_grid_2d: binning 2D CPU para alimentar heatmap_cpp_viz / contour_cpp_viz desde samples (x[], y[]). - metropolis_hastings: MH 1D y ND con target log-pdf como std::function (no normalizada). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3.8 KiB
3.8 KiB
name, kind, lang, domain, version, purity, signature, description, tags, uses_functions, uses_types, returns, returns_optional, error_type, imports, tested, tests, test_file_path, file_path, params, output
| name | kind | lang | domain | version | purity | signature | description | tags | uses_functions | uses_types | returns | returns_optional | error_type | imports | tested | tests | test_file_path | file_path | params | output | ||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| metropolis_hastings | function | cpp | datascience | 1.0.0 | pure | MHResult mh_run_1d(const std::function<double(double)>& log_pdf, double x0, double sigma, size_t n, double* out, Rng&); MHResult mh_run_nd(const std::function<double(const double*)>& log_pdf, const double* x0, const double* sigma, int d, size_t n, double* out, Rng&) | Metropolis-Hastings 1D y d-dimensional con proposal Gaussian symmetric. Target log-pdf inyectable via std::function (no necesita normalizarse). Devuelve cadena en out[] y accept rate. Pareja CPU del mc_metropolis_hastings_gpu. |
|
|
false |
|
false | cpp/functions/datascience/metropolis_hastings.cpp |
|
MHResult con n_samples, n_accepted y accept_rate. Tipico target: 0.20-0.40 accept rate (ajustar proposal_sigma). |
metropolis_hastings
Random-walk Metropolis estandar. Replica el sampler que usan los 4 calculadores MCMC del set, en formato componible.
Patron 1D (mcmc-bayes / mcmc-full Beta posterior)
double k = 7.0, n_obs = 10.0; // observados
auto log_post = [k, n_obs](double theta) -> double {
if (theta <= 0.0 || theta >= 1.0) return -1e300;
// Beta(2, 2) prior + Binomial likelihood
return (k + 1.0) * std::log(theta) +
(n_obs - k + 1.0) * std::log(1.0 - theta);
};
fn::ds::Rng r;
fn::ds::rng_seed(r, 42);
std::vector<double> chain(10000);
auto res = fn::ds::mh_run_1d(log_post,
/*x0=*/0.5,
/*sigma=*/0.1,
chain.size(),
chain.data(),
r);
// Burn-in: descartar primeros 1000
// res.accept_rate ~ 0.3 si sigma esta bien ajustada
Patron 2D (mcmc-visualizer)
auto log_density = [](const double* xy) -> double {
double x = xy[0], y = xy[1];
// Bimodal target del visualizer
double d1 = (x-2)*(x-2) + (y-2)*(y-2);
double d2 = (x+2)*(x+2) + (y+2)*(y+2);
return std::log(std::exp(-0.5*d1) + std::exp(-0.5*d2));
};
double x0[2] = {0.0, 0.0};
double sigma[2] = {1.0, 1.0};
std::vector<double> chain(10000 * 2);
fn::ds::mh_run_nd(log_density, x0, sigma, 2,
10000, chain.data(), r);
Notas
- No hay tuning automatico del proposal sigma. Si el accept rate es <0.1, baja sigma; si >0.6, sube. Para tunings serios usar adaptive MH (no incluido aqui).
target_log_pdfse evalua 2 veces por sample en el peor caso (current cached, proposal nuevo). Si tu log-pdf es caro (10ms+ por eval), N=10^4 cuesta minutos — para esos casos preferir GPU MH (no caben dependencias caras en GLSL — tienes que portar la log-pdf).- Compatible con
rhat_splityess_basic: corre M cadenas con seeds distintos, apila enchains[m*n + i], llama los diagnosticos. - 1D y ND comparten la maquinaria pero ND lleva el coste de los
std::vector<double>por sample. Si rendimiento importa y conoces d en compile-time, fork-and-specialize.