feat(0133-1+2): columnar snapshot + string interning in data_table

Change 1 — Columnar Snapshot Internal:
- Add ColumnSnapshot struct (type + str_ids/i64/f64 per column) in data_table_internal.h
- Add SnapshotCache struct with pointer-identity sentinel (last_cells_ptr)
- Add SnapshotCache field to UiState singleton
- In render(): rebuild snapshot after join materialization when cells ptr changes
  Uses same pointer-identity pattern as existing stats_last_cells in State
  Int/Float columns parsed once via parse_number; String/Auto interned

Change 2 — String Interning:
- Add StringPool struct (strings + unordered_map<string_view, uint32_t>) to data_table_types.h
- StringPool is per-State (NOT global) for table isolation
- intern(sv) inserts if absent, returns stable uint32_t index
- Cleared + rebuilt on each snapshot rebuild for index coherence
- Add string_pool field to State struct

Documentation:
- Extended header comment in data_table_internal.h describing design,
  StringPool API, invariants (pointer-identity, row→snapshot_row),
  and how stats_last_cells and snapshot coexist independently

Build: fn_module_data_table + tables_qa pass, no new errors (only
pre-existing -Wformat-truncation warnings unrelated to this change).
Public API (data_table.h, TableInput, render() signature) unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-05-22 23:35:12 +02:00
parent 3791afa216
commit ce7470d5f5
3 changed files with 199 additions and 0 deletions
+45
View File
@@ -8,6 +8,8 @@
#include "compute_column_stats.h"
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>
@@ -353,6 +355,44 @@ struct VizPanel {
mutable ViewMode last_non_table = ViewMode::Bar;
};
// ----------------------------------------------------------------------------
// StringPool — interning de strings para columnas de texto (issue 0133).
// Una instancia por State (NOT global) para aislar tablas independientes.
//
// intern(sv) devuelve un indice uint32_t estable para la vida del rebuild.
// El pool se limpia (clear()) al inicio de cada rebuild de snapshot columnar.
//
// Invariante de invalidacion de string_view:
// - El vector `strings` se reserva con reserve() ANTES del primer intern()
// para evitar reallocs que invalidarian los string_view del mapa.
// Si la estimacion es insuficiente (columna con mas unicos de lo esperado),
// el mapa se reconstruye post-push_back: intern() verifica cap antes de
// insertar en el map para cubrir este caso.
// ----------------------------------------------------------------------------
struct StringPool {
std::vector<std::string> strings; // strings unicos, por indice
std::unordered_map<std::string_view, uint32_t> index; // sv→id (sv apunta a strings[i])
void clear() {
strings.clear();
index.clear();
}
// intern: inserta si no existe. Devuelve indice estable.
uint32_t intern(std::string_view sv) {
auto it = index.find(sv);
if (it != index.end()) return it->second;
uint32_t id = (uint32_t)strings.size();
strings.emplace_back(sv);
// Re-apuntar el string_view al almacenamiento interno (strings[id]).
index.emplace(std::string_view(strings[id]), id);
return id;
}
const std::string& at(uint32_t id) const { return strings[id]; }
bool empty() const { return strings.empty(); }
};
// ----------------------------------------------------------------------------
// State: stage pipeline + viz globales.
// ----------------------------------------------------------------------------
@@ -419,6 +459,11 @@ struct State {
std::vector<DrillStep> drill_back;
std::vector<DrillStep> drill_forward;
// String interning pool (issue 0133, Change 2).
// Limpiado y repoblado en cada rebuild del snapshot columnar.
// NOT global — una instancia por State para aislar tablas independientes.
StringPool string_pool;
// Helpers (definidos en compute_stage.cpp).
Stage& raw();
const Stage& raw() const;