TQL — Table Query Language
TQL is a declarative DSL for transforming tables in the playground and in registry applications that display tabular data. It is inspired by Metabase MBQL but serialized as a Lua table (not JSON): this reuses the already embedded lua_engine and allows inline expressions that reuse the Lua DSL for custom columns.
It is designed so that an LLM can emit and read TQL and orchestrate complex visualizations from a prompt.
How Metabase represents visualizations with MBQL
Metabase separates two planes, which TQL replicates:
1. dataset_query (= TQL stages): WHAT data
MBQL pipeline: filter, breakout, aggregation, order-by, expressions. It defines the data to produce and is the only executable part.
2. display + visualization_settings: HOW to show it
Independent of the data. The same query can be rendered in N ways with one click. Metabase structure:
{
"display": "table",
"visualization_settings": {
"table.columns": [
{"name": "lang", "enabled": true, "fieldRef": [...]},
{"name": "count", "enabled": true, "fieldRef": [...]}
],
"table.column_widths": [120, 80],
"table.column_formatting": [
{"columns": ["count"], "type": "single",
"operator": ">", "value": 100, "color": "#e08060"}
],
"graph.dimensions": ["lang"],
"graph.metrics": ["count"],
"graph.colors": ["#ff7f50"],
"stackable.stack_type": null
}
}
Common display values: table, scalar, bar, line, area, pie, row, funnel, gauge, map, pivot, combo, scatter, waterfall.
visualization_settings:
- table.columns: column visibility/order. Each entry is {name, enabled, fieldRef}.
- table.column_formatting: conditional color rules. Operator-based (>, =, is-null, <).
- graph.dimensions / graph.metrics: which cols go on X / Y for charts.
- graph.colors: palette.
- stackable.stack_type: stack mode (stacked, normalized, null).
Drill-through actions (Metabase): after a click on a cell, these actions are offered:
- view-details: view the row detail.
- filter: add a filter by value.
- drill-through: navigate to a sub-query.
- zoom: regroup at a finer granularity (e.g. month → day).
- summarize: pivot.
- quick-filter: presets.
Implications for TQL
TQL adopts that separation: stages (data) + display + columns (viz). Same pattern, Lua syntax.
When a future "Add visualization" button is built, it will add a new display + viz settings to an existing query without touching stages. That gives M visualizations (table, bar, line, scatter) over the same transformed data.
Conceptual model
A TQL query is a list of stages evaluated sequentially. The output of stage N feeds stage N+1:
input_cells (raw dataset)
│
▼
┌─────────────┐
│ Stage 0 │ filter + expressions + sort (no grouping)
│ RAW │ output = filtered/sorted cells with same shape
└─────────────┘
│
▼
┌─────────────┐
│ Stage 1 │ filter + breakout + aggregation + sort
│ GROUPED │ output = grouped rows (cols = breakouts + aggs)
└─────────────┘
│
▼
┌─────────────┐
│ Stage 2 │ filter + breakout + aggregation + sort over stage 1
│ GROUPED │ ...
└─────────────┘
Invariants:
- Stage 0 is ALWAYS Raw: it does NOT admit breakout or aggregation, only filter, expressions, and sort. This guarantees that the raw data stays immutable.
- Stages 1+ admit all clauses.
- Each stage operates on the headers/cols PRODUCED by the previous stage. After a breakout, the available cols change.
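The first invariant can be checked before any stage runs. The following is a minimal sketch, not the project's code: `StageSketch` and `validate_stages` are hypothetical names, and the struct models only the two fields the check needs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a stage: only the fields needed
// to check the Raw-stage invariant are modeled.
struct StageSketch {
    std::vector<std::string> breakouts;
    std::vector<std::string> aggregations;  // function names only, e.g. "count"
};

// Returns an empty string when the pipeline respects the invariants,
// otherwise a short reason.
std::string validate_stages(const std::vector<StageSketch>& stages) {
    if (stages.empty()) return "stages must contain at least the Raw stage";
    const StageSketch& raw = stages.front();
    if (!raw.breakouts.empty() || !raw.aggregations.empty())
        return "stage 0 (Raw) admits only filter, expressions and sort";
    return "";
}
```

Running a check like this up front means a malformed query is rejected before any data is touched.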
Root syntax
return {
  version = 1,
  display = "table",
  -- DATA: transformation pipeline
  stages = {
    -- Stage 0 (Raw): filter + expressions + sort
    {
      filter = { ... },
      expressions = { ... },
      sort = { ... },
    },
    -- Stage 1 (Grouped):
    {
      filter = { ... }, -- over the output of stage 0
      breakout = { ... },
      aggregation = { ... },
      sort = { ... },
    },
    -- Stage 2, 3, ... (unlimited)
  },
  -- VIZ: how to render the resulting table
  columns = {
    {name = "lang", type = "string", visible = true, order = 1},
    {name = "size_kb", type = "float", visible = true, order = 2,
     color_rules = { {equals = "0.0", color = "#e08060"} }},
    {name = "internal", type = "string", visible = false, order = 3},
  },
  visualization_settings = {
    -- Future: display-specific options (chart axes, palette, stack, etc.)
  },
}
stages is required and must have at least 1 element (the Raw stage). Everything else is optional.
filter
List of predicates. Multiple filters are combined with an implicit AND.
filter = {
{"=", "lang", "go"},
{">=", "coverage_pct", 80},
{"contains", "name", "filter"},
}
Supported operators:
| Op | Applies to | Semantics |
|---|---|---|
| = != | any type | numeric or lexical equality |
| > >= < <= | numeric, ISO dates | numeric/lexical comparison |
| contains !contains | strings | substring |
| starts | strings | prefix |
| ends | strings | suffix |
A range filter is two chained filters, >= and <=.
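To make the operator semantics concrete, here is a small sketch of a predicate evaluator. It is an illustration under assumptions, not the actual implementation: `eval_filter` and its helpers are hypothetical, cells and values are plain strings, and a comparison is treated as numeric only when both sides parse fully as numbers (so ISO dates fall back to lexical order).

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Parses s fully as a double; false if it is empty or has trailing characters.
static bool parse_number(const std::string& s, double& out) {
    if (s.empty()) return false;
    char* end = nullptr;
    out = std::strtod(s.c_str(), &end);
    return end == s.c_str() + s.size();
}

// Three-way compare: numeric when both sides are numbers, else lexical.
static int compare_cells(const std::string& a, const std::string& b) {
    double x = 0.0, y = 0.0;
    if (parse_number(a, x) && parse_number(b, y)) return (x > y) - (x < y);
    int c = a.compare(b);
    return (c > 0) - (c < 0);
}

// Evaluates one {"op", "col", value} predicate against a single cell.
bool eval_filter(const std::string& op, const std::string& cell,
                 const std::string& value) {
    if (op == "=")  return compare_cells(cell, value) == 0;
    if (op == "!=") return compare_cells(cell, value) != 0;
    if (op == ">")  return compare_cells(cell, value) > 0;
    if (op == ">=") return compare_cells(cell, value) >= 0;
    if (op == "<")  return compare_cells(cell, value) < 0;
    if (op == "<=") return compare_cells(cell, value) <= 0;
    if (op == "contains")  return cell.find(value) != std::string::npos;
    if (op == "!contains") return cell.find(value) == std::string::npos;
    if (op == "starts") return cell.compare(0, value.size(), value) == 0;
    if (op == "ends")
        return cell.size() >= value.size() &&
               cell.compare(cell.size() - value.size(), value.size(), value) == 0;
    return false;  // unknown operator
}
```

A range such as 80 <= coverage_pct <= 90 is then just `eval_filter(">=", cell, "80") && eval_filter("<=", cell, "90")`, matching the implicit AND between filters.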
expressions
A mapping of name -> Lua formula. Expressions create computed columns that are accessible within the same stage (filter/sort/aggregation can reference them). The Lua syntax is identical to that of interactive custom columns. See the Lua DSL section.
expressions = {
size_bytes = "[size_kb] * 1024",
is_top = "[coverage_pct] >= 90",
full_name = "fn.concat([lang], ':', [domain])",
}
The column type is auto-detected by evaluating a sample.
breakout
List of column names to group by. Order matters: it defines the visual hierarchy of the group.
breakout = { "lang", "domain" }
Each unique combination of (lang, domain) values produces one row in the output. If breakout is empty but there is an aggregation, the whole dataset is reduced to a SINGLE row.
Available only in stages >= 1.
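The grouping rule can be sketched as follows. This is an illustrative assumption about how a breakout could be computed, not the project's `compute_stage`: `Group` and `group_count` are hypothetical names, rows are vectors of strings, and only a count is produced per group.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

struct Group {
    std::vector<std::string> key;  // one value per breakout column
    long count;                    // rows that share this key
};

// Groups rows by the given column indices, keeping groups in the order
// in which their key first appears.
std::vector<Group> group_count(const std::vector<Row>& rows,
                               const std::vector<std::size_t>& breakout_cols) {
    std::vector<Group> out;
    std::map<std::vector<std::string>, std::size_t> index;  // key -> position in out
    for (const Row& r : rows) {
        std::vector<std::string> key;
        for (std::size_t c : breakout_cols) key.push_back(r[c]);
        auto it = index.find(key);
        if (it == index.end()) {
            index[key] = out.size();
            out.push_back({key, 1});
        } else {
            ++out[it->second].count;
        }
    }
    return out;
}
```

With an empty `breakout_cols`, every row maps to the same empty key, so a non-empty input collapses to a single group, which mirrors the single-row case described above.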
aggregation
List of reductions applied to each group. Syntax: {fn, col?, arg?}.
aggregation = {
{"count"},
{"sum", "deps_count"},
{"avg", "size_kb"},
{"min", "size_kb"},
{"max", "coverage_pct"},
{"distinct", "name"},
{"stddev", "coverage_pct"},
{"median", "size_kb"},
{"p25", "size_kb"},
{"p75", "size_kb"},
{"p90", "size_kb"},
{"p99", "size_kb"},
{"percentile", "size_kb", 0.95},
}
Supported functions:
| Fn | Arguments | Output type | Notes |
|---|---|---|---|
| count | - | Int | rows in the group |
| sum | col | Float | sum of non-empty numeric cells |
| avg | col | Float | arithmetic mean |
| min | col | same as col | numeric if the col is numeric; lexical if String/Date |
| max | col | same as col | same as min |
| distinct | col | Int | count of distinct values |
| stddev | col | Float | population standard deviation |
| median | col | Float | 50th percentile with linear interpolation |
| p25 p75 p90 p99 | col | Float | fixed percentiles |
| percentile | col, p (0..1) | Float | generic percentile; p is a float in 0..1 |
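As an illustration of "percentile with linear interpolation", the sketch below uses one common convention: the fractional rank is p * (n - 1) over the sorted values. Whether the implementation uses exactly this convention is an assumption; `percentile` here is a hypothetical helper, and empty cells are assumed to have been dropped already.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Percentile with linear interpolation between the two closest ranks.
// Assumed convention: rank = p * (n - 1) on the sorted values, p in [0, 1].
double percentile(std::vector<double> values, double p) {
    assert(!values.empty() && p >= 0.0 && p <= 1.0);
    std::sort(values.begin(), values.end());
    double rank = p * static_cast<double>(values.size() - 1);
    std::size_t lo = static_cast<std::size_t>(std::floor(rank));
    std::size_t hi = std::min(lo + 1, values.size() - 1);
    double frac = rank - static_cast<double>(lo);
    return values[lo] + frac * (values[hi] - values[lo]);
}
```

Under this convention median is `percentile(v, 0.5)` and p25/p75/p90/p99 are the same call with a fixed p.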
Default alias of the output column:
- count → "count"
- distinct col → "distinct_<col>"
- percentile col p → "p<int(p*100)>_<col>" (e.g. p95_size_kb)
- everything else → "<fn>_<col>" (e.g. avg_size_kb)
To override the alias, an extended form is planned (future, not implemented in v1):
{"avg", "size_kb", alias = "mean_size"}
v1: the alias is auto-generated. If you need a custom alias, create an expression on top of it.
Empty cells are IGNORED when aggregating (they do not break the calculation).
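The default alias rules above can be sketched like this. `default_alias` is a hypothetical stand-in for illustration (the real `aggregation_alias` takes an `Aggregation` struct), and rounding p * 100 instead of truncating it is an assumption made here to avoid floating-point surprises.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Default output-column alias for an aggregation.
// fn is the TQL function name; col may be empty for count.
std::string default_alias(const std::string& fn, const std::string& col,
                          double p = 0.0) {
    if (fn == "count") return "count";
    if (fn == "percentile") {
        // Assumption: round rather than truncate, because products such as
        // 0.57 * 100 land just below the integer in binary floating point.
        long pct = std::lround(p * 100.0);
        return "p" + std::to_string(pct) + "_" + col;
    }
    return fn + "_" + col;  // distinct_<col>, avg_<col>, p90_<col>, ...
}
```

For example, `default_alias("percentile", "size_kb", 0.95)` yields `p95_size_kb`, the alias that a later stage would use to reference that column.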
columns
Per-column render config. An array of entries, one per effective column (original + derived). If the TQL includes columns, its fields override the UI state:
columns = {
  {
    name = "lang", -- header name (required)
    type = "string", -- optional. For original cols: informational. For derived cols: mutable
    visible = true, -- optional, default true
    order = 1, -- optional. 1-based visual position. Default = order of appearance
    color_rules = { -- optional. List of {equals, color hex}
      {equals = "go", color = "#86b56b"},
      {equals = "py", color = "#6b8eb5"},
    },
  },
  -- ...
}
Valid types (type): "string", "int", "float", "bool", "date", "json", "auto".
Color hex format: "#rrggbb" or "#rrggbbaa". With only 6 digits, alpha = ff.
Cols that do not appear in columns keep their current UI state (visible, natural position).
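The color hex format used by color_rules is simple enough to parse by hand. The sketch below is a hypothetical helper (`Rgba` and `parse_hex_color` are not names from the codebase) that accepts both forms and defaults alpha to ff.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

struct Rgba { std::uint8_t r, g, b, a; };

// Parses "#rrggbb" or "#rrggbbaa" into out; returns false for anything else.
bool parse_hex_color(const std::string& s, Rgba& out) {
    if ((s.size() != 7 && s.size() != 9) || s[0] != '#') return false;
    std::uint8_t ch[4] = {0, 0, 0, 0xff};  // alpha defaults to ff
    for (std::size_t i = 1; i < s.size(); i += 2) {
        if (!std::isxdigit(static_cast<unsigned char>(s[i])) ||
            !std::isxdigit(static_cast<unsigned char>(s[i + 1])))
            return false;
        ch[(i - 1) / 2] =
            static_cast<std::uint8_t>(std::stoi(s.substr(i, 2), nullptr, 16));
    }
    out = Rgba{ch[0], ch[1], ch[2], ch[3]};
    return true;
}
```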
display
Visualization type. v1 supports only "table". Future: "bar", "line", "scatter", "pie", "scalar", "area", "pivot". Default: "table".
visualization_settings
Reserved for display-specific configuration. Empty in v1. Future:
visualization_settings = {
["graph.dimensions"] = {"lang"},
["graph.metrics"] = {"count"},
["graph.colors"] = {"#86b56b", "#6b8eb5"},
["stackable.stack_type"] = "stacked",
}
Metabase syntax: keys containing a . go inside brackets [].
sort
List of clauses. Multi-sort follows the order of appearance (the first clause is the primary key).
sort = {
{"desc", "count"},
{"asc", "lang"},
}
Directions: "asc" or "desc".
The col can be any header of the stage output (including aggregations).
If sort is empty, the output keeps the order in which the groups first appear (stable).
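Multi-sort with a primary key and tie-breakers maps naturally onto a stable sort with a lexicographic comparator. The following is a sketch under assumptions: `SortKey`, `Row`, and `sort_rows` are hypothetical, columns are addressed by index rather than by header, and cells compare numerically only when both parse as numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

using Row = std::vector<std::string>;
struct SortKey { std::size_t col; bool desc; };

// Parses s fully as a double; false if it is empty or has trailing characters.
static bool parse_number(const std::string& s, double& out) {
    if (s.empty()) return false;
    char* end = nullptr;
    out = std::strtod(s.c_str(), &end);
    return end == s.c_str() + s.size();
}

// Three-way compare: numeric when both sides are numbers, else lexical.
static int compare_cells(const std::string& a, const std::string& b) {
    double x = 0.0, y = 0.0;
    if (parse_number(a, x) && parse_number(b, y)) return (x > y) - (x < y);
    int c = a.compare(b);
    return (c > 0) - (c < 0);
}

// Stable multi-key sort: the first key is primary; full ties keep input order.
void sort_rows(std::vector<Row>& rows, const std::vector<SortKey>& keys) {
    std::stable_sort(rows.begin(), rows.end(), [&](const Row& a, const Row& b) {
        for (const SortKey& k : keys) {
            int c = compare_cells(a[k.col], b[k.col]);
            if (c != 0) return k.desc ? c > 0 : c < 0;
        }
        return false;  // equal on every key
    });
}
```

Using `std::stable_sort` is what preserves the order of appearance for rows that tie on every key, and an empty key list leaves the rows untouched.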
Complete round-trip example
Question: "For pure functions with coverage >= 80%, group by language and domain, give me the count + average size, order by count descending, then on top of that group by language only to see totals."
return {
stages = {
-- Stage 0: Raw + filter
{
filter = {
{"=", "purity", "pure"},
{">=", "coverage_pct", 80},
},
},
-- Stage 1: group by (lang, domain)
{
breakout = { "lang", "domain" },
aggregation = {
{"count"},
{"avg", "size_kb"},
},
sort = { {"desc", "count"} },
},
-- Stage 2: re-group by lang over the output of stage 1
{
breakout = { "lang" },
aggregation = {
{"sum", "count"}, -- sum of the stage 1 count (total funcs per lang)
{"avg", "avg_size_kb"}, -- mean of the means
},
sort = { {"desc", "sum_count"} },
},
}
}
Drill-down (semantics)
If the user interacts with a grouped cell of stage N, a drill-down happens:
- Take the breakout values in that row (e.g. lang=go, domain=core).
- Go back to stage N-1.
- PUSH one (breakout_col = breakout_value) predicate per breakout onto the filter of stage N-1.
- The user is now "examining" the original rows that make up that group.
It does not remove later stages (they are kept in case the user wants to move forward again).
This is equivalent to:
-- original state
{stages = { {filter={}}, {breakout={"lang","domain"}, aggregation={{"count"}}} }}
-- after drill-down on row (go, core):
{stages = {
  {filter = {{"=","lang","go"}, {"=","domain","core"}}}, -- stage 0 is now filtered
  {breakout={"lang","domain"}, aggregation={{"count"}}} -- stage 1 untouched
}}
-- active_stage = 0 (back to Raw)
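The same state transition can be written as a small function. This is a sketch of the semantics only, since interactive drill-down is listed as pending: `FilterSketch`, `StageSketch`, and `drill_down` are hypothetical names, not the types in data_table_logic.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified types for illustration.
struct FilterSketch { std::string op, col, value; };
struct StageSketch {
    std::vector<FilterSketch> filters;
    std::vector<std::string> breakouts;
};

// Drill-down on a row of stage n (n >= 1): pushes one equality filter per
// breakout column onto stage n-1 and returns the new active stage index.
// Later stages are left in place.
std::size_t drill_down(std::vector<StageSketch>& stages, std::size_t n,
                       const std::vector<std::string>& breakout_values) {
    assert(n >= 1 && n < stages.size());
    assert(breakout_values.size() == stages[n].breakouts.size());
    for (std::size_t i = 0; i < breakout_values.size(); ++i)
        stages[n - 1].filters.push_back(
            {"=", stages[n].breakouts[i], breakout_values[i]});
    return n - 1;
}
```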
Lua DSL for expressions
The strings inside expressions follow the Lua mini-DSL for custom columns. Rules:
- Column access: [col_name] or row.col_name or row["col_name"].
- 1-based index: row[1].
- Auto-return: if the expression does not start with return/if/for/while/do/local/repeat/function, a return is prepended automatically.
- Type-aware: a cell from an Int/Float col arrives as a number; Bool as a boolean; everything else as a string. Empty = nil.
- UTF-8 is fine in names: [año].
- Comments -- and --[[ ]] are respected.
- Builtins available via fn.*: upper, lower, length, substring, contains, starts_with, ends_with, replace, trim, concat, to_number, to_string, to_bool, is_null, is_empty, coalesce, parse_date, year, month, day.
- Sandbox: no io, require, dofile, loadfile, load, package, debug. os is trimmed to date/time/difftime/clock.
Examples:
[size_kb] * 1024 -- arithmetic
fn.concat([lang], ":", [domain]) -- string compose
if [coverage_pct] >= 90 then "well" else "low" end
fn.year([updated_at]) -- date helper
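The auto-return rule can be sketched as a small preprocessing step. This is an assumption about how such a step might look, not the engine's code: `with_auto_return` is a hypothetical helper, and skipping leading whitespace and requiring a word boundary after the keyword (so that an identifier like done_flag is not mistaken for `do`) are choices made for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Prepends "return " unless the expression already starts with one of the
// statement keywords listed in the auto-return rule.
std::string with_auto_return(const std::string& expr) {
    static const char* const kKeywords[] = {
        "return", "if", "for", "while", "do", "local", "repeat", "function"};
    std::size_t start = expr.find_first_not_of(" \t\r\n");
    if (start == std::string::npos) return expr;  // blank input: leave as is
    for (const char* kw : kKeywords) {
        std::size_t len = std::strlen(kw);
        if (expr.compare(start, len, kw) != 0) continue;
        std::size_t next = start + len;
        bool boundary =
            next >= expr.size() ||
            !(std::isalnum(static_cast<unsigned char>(expr[next])) ||
              expr[next] == '_');
        if (boundary) return expr;  // already a statement
    }
    return "return " + expr;
}
```

Leading comments are not handled in this sketch; a fuller version would skip `--` comments before looking for the keyword.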
v1 restrictions
| Not supported | Workaround |
|---|---|
| Joins between tables | Pre-process outside the registry. |
| SQL subqueries | Use chained stages (equivalent model). |
| HAVING post-aggregation | A following stage with a filter on the aggregated cols. |
| LIMIT | TBD: add as limit = N in stage v2. |
| Window functions | TBD. |
| Custom Lua aggregation | TBD: {"lua", "col", "<body>"}. |
| Custom alias in aggregation (v1) | Create an expression post-group. |
Prompt template for LLMs
When you expose TQL to an LLM, give it this preamble:
You output TQL — a Lua table that describes a table transformation. Format:
return {
stages = {
{ filter = {...}, expressions = {...}, sort = {...} }, -- Stage 0 (Raw)
{ filter = {...}, breakout = {...}, aggregation = {...}, sort = {...} }, -- Stage 1+
...
}
}
Rules:
- Stage 0 NEVER has breakout/aggregation (immutable raw layer).
- Filters: {"op", "col", value}. Ops: =, !=, >, >=, <, <=, contains, !contains, starts, ends.
- Aggregations: {"fn", "col"} or {"count"} or {"percentile", "col", 0.95}.
Available fns: count, sum, avg, min, max, distinct, stddev, median, p25, p75, p90, p99, percentile.
- Sort: {{"desc", "col"}, ...}. Multi-sort follows the order of the list.
- An expression value is a Lua expression. Access cols via [col_name].
The available columns of the current input table are: <inject runtime>.
The available column types: <inject runtime>.
Output ONLY the Lua table, no markdown.
C++ API (reference)
Module data_table_logic.h:
// Types
enum class AggFn { Count, Sum, Avg, Min, Max, Distinct, Stddev, Median, P25, P75, P90, P99, Percentile };
struct Aggregation { AggFn fn; std::string col; double arg; std::string alias; };
struct SortClause { std::string col; bool desc; };
struct Stage {
std::vector<Filter> filters;
std::vector<DerivedColumn> derived;
std::vector<std::string> breakouts;
std::vector<Aggregation> aggregations;
std::vector<SortClause> sorts;
};
// Pure functions
std::string aggregation_alias(const Aggregation& a);
ColumnType aggregation_type (const Aggregation& a,
const std::vector<std::string>& in_headers,
const std::vector<ColumnType>& in_types);
StageOutput compute_stage(const char* const* in_cells, int in_rows, int in_cols,
const std::vector<std::string>& in_headers,
const std::vector<ColumnType>& in_types,
const Stage& stage);
compute_stage is PURE. No side effects. Testable headless. Chainable: feed its StageOutput.cells/headers/types as the input of the next compute_stage.
Implementation status
| Feature | Status |
|---|---|
| Stage + Aggregation types | done |
| compute_stage (filter + group + agg + sort) | done (Phase 1) |
| All aggregations (count..percentile) | done |
| aggregation_alias / aggregation_type | done |
| Multi-sort per stage | done |
| E2E logic tests | done (37 checks) |
| tql_emit / tql_apply (Lua round-trip) | Phase 2 (pending) |
| State refactor to vector<Stage> | Phase 3 (pending) |
| UI breadcrumb stages + chips per stage | Phase 3 (pending) |
| Interactive drill-down | Phase 3 (pending) |
| Show TQL / Apply TQL modals | Phase 2 |
| Multi-sort drag-reorder | Phase 4 |
See cpp/apps/primitives_gallery/playground/tables/ for the playground implementation.
SQL transpile subset (phase 11, issue 0080)
TQL emits equivalent DuckDB SQL so that LLM agents can generate either TQL or SQL against the same data. The tql_to_sql.{h,cpp} module provides emit_sql(State, tables). The mapping is MBQL-style, with a CTE chain t0..tN.
Transpilable Lua subset
Lua remains fully powerful and unrestricted in the general runtime (formula evaluation in derived cols under pure TQL). The subset applies ONLY when calling tql_to_sql::emit_sql(). Anything outside the subset produces a compile-time error with a concrete cause + workaround.
Allowed (transpiles to DuckDB SQL):
| Lua | SQL DuckDB | Example |
|---|---|---|
| Number/string/bool/nil literals | same ('x', TRUE, NULL) | 42, "hola", nil |
| Col ref: [colname] | "colname" (quoted) | [size_kb] → "size_kb" |
| Arithmetic: + - * / % - (unary) | same | [a] + [b] * 2 → ("a" + ("b" * 2)) |
| Comparison: == ~= < <= > >= | = <> < <= > >= | [n] >= 10 → ("n" >= 10) |
| Logic: and or not | AND OR NOT | [a] and [b] → ("a" AND "b") |
| String concat: .. | \|\| | [a] .. "_" .. [b] → ("a" \|\| '_' \|\| "b") |
| Ternary: if A then B else C end | CASE WHEN A THEN B ELSE C END | else is mandatory |
| math.floor/ceil/abs/sqrt/sin/cos/log/exp | floor/ceiling/abs/sqrt/sin/cos/ln/exp | math.floor([x]) |
| math.min(a,b)/max(a,b) | least(a,b)/greatest(a,b) | math.min([a], 100) |
| string.upper/lower/len(s) | upper(s)/lower(s)/length(s) | string.upper([name]) |
| string.sub(s, i [, j]) | substring(s, i [, j-i+1]) | string.sub([s], 1, 3) |
| tostring(x)/tonumber(x) | CAST(x AS VARCHAR)/CAST(x AS DOUBLE) | tonumber([n]) |
| Lua parentheses and precedence | same | (a + b) * c |
Outside the subset (compile-time error):
- Closures: function() ... end
- Loops: for/while/repeat
- Locals: local x = ...
- Tables: {...}, t[k], t.field, table.*
- Multi-return, vararg ...
- string.gsub/find/match/format/byte/char/rep
- IO/OS/debug: io.*, os.*, debug.*, package, require, print
- Coroutines, metatables, pcall/xpcall, rawget/rawset
- Recursion, multi-statement bodies (;)
- Length operator #
- Method calls :
- Ternary without else: if A then B end (the subset requires both branches)
Example error message
SQL transpile error en derived col 'fullname':
formula = "[first] .. ' ' .. string.gsub([last], 'X', 'Y')"
causa: function 'string.gsub' not in SQL transpile whitelist
ver docs/TQL.md#sql-transpile-subset
workaround: usar TQL puro (sin SQL emit) o reescribir formula
Stage → SQL mapping
| TQL element | SQL DuckDB |
|---|---|
| Stage 0 Raw | CTE t0 AS (SELECT cols+derived FROM main_t [JOIN ...] [WHERE filters] [ORDER BY sorts]) |
| Stage N>=1 | CTE tN AS (SELECT breakouts+aggs FROM tN-1 [GROUP BY ...] [ORDER BY ...]) |
| breakout "col" | "col" |
| breakout "col:month" | date_trunc('month', "col") |
| breakout "col:year/week/day/hour" | date_trunc('year/week/day/hour', "col") |
| Aggregation Count | COUNT(*) |
| Aggregation Sum/Avg/Min/Max/Stddev | SUM/AVG/MIN/MAX/STDDEV("col") |
| Aggregation Distinct | COUNT(DISTINCT "col") |
| Aggregation Median/P25/P75/P90/P99 | quantile_cont("col", p) |
| Aggregation Percentile p | quantile_cont("col", p) |
| Filter Op::Eq/Neq/Gt/Gte/Lt/Lte | "col" = ? etc (params bound) |
| Filter Op::Contains | "col" LIKE '%v%' (param %v%) |
| Filter Op::StartsWith / EndsWith | LIKE 'v%' / LIKE '%v' |
| Sort {desc, "col"} | ORDER BY "col" DESC |
| Join Left/Inner/Right/Full | LEFT/INNER/RIGHT/FULL OUTER JOIN ... ON ... |
| Join multi-key on={{l1,r1},{l2,r2}} | ON l.l1 = r.r1 AND l.l2 = r.r2 |
| Join fields | cols alias.field AS "alias.field" |
| main_source | FROM "main_source_name" |
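To illustrate the aggregation rows of the mapping, here is a sketch of a function that turns one TQL aggregation into a DuckDB expression. It follows the table above but is not the code in tql_to_sql.{h,cpp}: `agg_to_sql`, `quote_ident`, and `quantile` are hypothetical names, and doubling embedded double quotes in identifiers is an assumption about quoting.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Quotes an identifier for SQL, doubling any embedded double quote.
static std::string quote_ident(const std::string& name) {
    std::string out = "\"";
    for (char c : name) {
        if (c == '"') out += '"';
        out += c;
    }
    return out + "\"";
}

// Builds quantile_cont("col", p).
static std::string quantile(const std::string& col, double p) {
    std::ostringstream os;
    os << "quantile_cont(" << quote_ident(col) << ", " << p << ")";
    return os.str();
}

// Maps a TQL aggregation (by function name) to a DuckDB SQL expression.
std::string agg_to_sql(const std::string& fn, const std::string& col,
                       double p = 0.0) {
    if (fn == "count") return "COUNT(*)";
    if (fn == "distinct") return "COUNT(DISTINCT " + quote_ident(col) + ")";
    if (fn == "median") return quantile(col, 0.5);
    if (fn == "p25") return quantile(col, 0.25);
    if (fn == "p75") return quantile(col, 0.75);
    if (fn == "p90") return quantile(col, 0.9);
    if (fn == "p99") return quantile(col, 0.99);
    if (fn == "percentile") return quantile(col, p);
    if (fn == "sum" || fn == "avg" || fn == "min" || fn == "max" ||
        fn == "stddev") {
        std::string upper;
        for (char c : fn)
            upper += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        return upper + "(" + quote_ident(col) + ")";
    }
    return "";  // unknown function
}
```

Each expression would then appear in the SELECT list of the corresponding CTE tN, next to the breakout columns.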
Doctrine (Metabase-style)
- One-way: TQL → SQL is OK. SQL → TQL is not supported. Reason: the inverse translation is lossy (CTEs, window fns, set ops, lateral, correlated subqueries do not fit in TQL).
- Output: the SQL string can always be emitted. Execution requires DuckDB to be linked (build flag FN_TQL_DUCKDB=1, optional).
- Agent flow: TQL by default. SQL only if the app linked DuckDB. The Ask AI UI shows the SQL toggle only when it is available.
See issue 0080 + tql_to_sql.{h,cpp} for the implementation.