# TQL: Table Query Language

TQL is a declarative DSL for transforming tables in the playground and in registry applications that display tabular data. It is inspired by Metabase MBQL but serialized as a **Lua table** (not JSON): it takes advantage of the already embedded `lua_engine` and allows inline expressions that reuse the Lua DSL for custom columns.

It is designed so that an LLM can emit and read TQL and orchestrate complex visualizations from a single prompt.

---

## How Metabase represents visualizations with MBQL

Metabase separates **two planes**, and TQL replicates both:

### 1. `dataset_query` (= TQL `stages`): WHAT data

MBQL pipeline: filter, breakout, aggregation, order-by, expressions. It defines the data to produce and is the only executable part.

### 2. `display` + `visualization_settings`: HOW to show it

Independent of the data. The same query can be rendered in N ways with one click. Metabase structure:

```json
{
  "display": "table",
  "visualization_settings": {
    "table.columns": [
      {"name": "lang", "enabled": true, "fieldRef": [...]},
      {"name": "count", "enabled": true, "fieldRef": [...]}
    ],
    "table.column_widths": [120, 80],
    "table.column_formatting": [
      {"columns": ["count"], "type": "single",
       "operator": ">", "value": 100, "color": "#e08060"}
    ],
    "graph.dimensions": ["lang"],
    "graph.metrics": ["count"],
    "graph.colors": ["#ff7f50"],
    "stackable.stack_type": null
  }
}
```

Common **`display`** values: `table`, `scalar`, `bar`, `line`, `area`, `pie`, `row`, `funnel`, `gauge`, `map`, `pivot`, `combo`, `scatter`, `waterfall`.

**`visualization_settings`**:
- **table.columns**: column visibility and order. Each entry is `{name, enabled, fieldRef}`.
- **table.column_formatting**: conditional color rules. Operator-based (`>`, `=`, `is-null`, `<`).
- **graph.dimensions/metrics**: which columns go on X / Y for charts.
- **graph.colors**: palette.
- **stackable.stack_type**: stack mode (`stacked`, `normalized`, null).

**Drill-through actions** (Metabase): after a click on a cell, these actions are offered:
- `view-details`: show the row detail.
- `filter`: add a filter by value.
- `drill-through`: navigate to a sub-query.
- `zoom`: regroup at a finer granularity (e.g. month → day).
- `summarize`: pivot.
- `quick-filter`: presets.

### Implications for TQL

TQL adopts that separation: `stages` (data) + `display` + `columns` (viz) + `views` (additional panels). Same pattern, Lua syntax.

When a future "Add visualization" button is built, it will add a new `display` plus viz settings to an existing query without touching `stages`. That gives M visualizations (table, bar, line, scatter) over the same transformed data.

---

## Conceptual model

A TQL query is a list of **stages** that are evaluated sequentially. The output of stage N feeds stage N+1:

```
input_cells (raw dataset)
       │
       ▼
┌─────────────┐
│ Stage 0     │  filter + expressions + sort (no grouping)
│ RAW         │  output = filtered/sorted cells with same shape
└─────────────┘
       │
       ▼
┌─────────────┐
│ Stage 1     │  filter + breakout + aggregation + sort
│ GROUPED     │  output = grouped rows (cols = breakouts + aggs)
└─────────────┘
       │
       ▼
┌─────────────┐
│ Stage 2     │  filter + breakout + aggregation + sort over stage 1
│ GROUPED     │  ...
└─────────────┘
```

**Invariants:**
- Stage 0 is ALWAYS Raw: it does NOT accept `breakout` or `aggregation`, only `filter`, `expressions`, `sort`. This guarantees that the raw data stays immutable.
- Stages 1+ accept all clauses.
- Each stage operates on the headers/columns PRODUCED by the previous stage. After a breakout, the available columns change.

---

## Root syntax

```lua
return {
  version = 1,
  display = "table",
  main_source = "functions",   -- optional: name of the main source

  -- JOINS: join additional tables before stage 0
  joins = { ... },             -- optional

  -- DATA: transformation pipeline
  stages = {
    -- Stage 0 (Raw): filter + expressions + sort
    {
      filter = { ... },
      expressions = { ... },
      sort = { ... },
    },
    -- Stage 1 (Grouped):
    {
      filter = { ... },          -- over the output of stage 0
      breakout = { ... },
      aggregation = { ... },
      sort = { ... },
    },
    -- Stage 2, 3, ... (unlimited)
  },

  -- VIZ: how to render the resulting table
  columns = {
    {name = "lang", type = "string", visible = true, order = 1},
    {name = "size_kb", type = "float", visible = true, order = 2,
     color_rules = { {equals = "0.0", color = "#e08060"} }},
    {name = "internal", type = "string", visible = false, order = 3},
  },

  -- VIEWS: visualization panels (first entry = main, the rest = extras)
  views = {
    {display = "table"},
    {display = "bar", x_col = "lang", y_cols = {"count"}},
  },

  visualization_settings = {},
}
```

`stages` is **mandatory** and must contain at least one element (the Raw stage). Everything else is optional.

---

## `main_source`

Optional string field. It names the main table/source of the dataset. `tql_to_sql` uses it to generate the correct `FROM "main_source"` in the emitted SQL. If it is empty, the engine uses the default table of the context.

```lua
main_source = "functions"
```

In the emitted SQL this becomes `FROM "functions"`. It is useful when the app exposes several tables and the agent needs to state explicitly which one is the base of the query.

---

## `joins`

List of joins applied before stage 0. The fields of the joined tables are added as extra columns that are accessible in every stage.

```lua
joins = {
  {
    alias = "t",                 -- prefix for its columns ("t.field")
    source = "types",            -- name of the table to join
    strategy = "left",           -- "left" | "inner" | "right" | "full"
    on = {{"id", "t.id"}},       -- pairs {left_col, right_col}
    fields = {"t.algebraic", "t.description"},  -- columns to include (optional)
  },
  {
    alias = "u",
    source = "unit_tests",
    strategy = "inner",
    on = {{"id", "u.function_id"}, {"lang", "u.lang"}},  -- multi-key
  },
}
```

**Strategies:**

| Token | SQL semantics |
|---|---|
| `"left"` | `LEFT OUTER JOIN`: all rows from the left side, nulls where there is no match |
| `"inner"` | `INNER JOIN`: only rows with a match in both tables |
| `"right"` | `RIGHT OUTER JOIN`: all rows from the right side |
| `"full"` | `FULL OUTER JOIN`: all rows from both tables |

Default when `strategy` is omitted: `"left"`.

**Fields after the join:** accessible as `"alias.field"` (e.g. `"t.algebraic"`) in filters, breakouts, aggregations and expressions. If `fields` is omitted, all columns of the joined table are included with the alias prefix.

**Multi-key join:** `on` is a list of pairs; it translates to `ON l.k1 = r.k1 AND l.k2 = r.k2`.

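
The multi-key translation can be sketched in C++ as follows. This is only an illustration under assumptions: `build_on_clause` is a hypothetical helper (not a function from `tql_to_sql`), the left table is given a SQL alias, and right-hand columns are assumed to arrive as `"alias.field"` so the prefix is stripped before quoting.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: builds the ON clause for a (possibly multi-key) join.
// `left` is the SQL alias of the main table, `right` the join alias.
std::string build_on_clause(const std::string& left, const std::string& right,
                            const std::vector<std::pair<std::string, std::string>>& on) {
    std::string out = "ON ";
    const std::string prefix = right + ".";
    for (std::size_t i = 0; i < on.size(); ++i) {
        if (i > 0) out += " AND ";
        std::string rcol = on[i].second;
        // Strip the "alias." prefix from the right-hand column, if present.
        if (rcol.compare(0, prefix.size(), prefix) == 0) rcol = rcol.substr(prefix.size());
        out += left + ".\"" + on[i].first + "\" = " + right + ".\"" + rcol + "\"";
    }
    return out;
}
```
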
---

## `filter`

List of predicates. Multiple filters are combined with an implicit AND.

```lua
filter = {
  {"=", "lang", "go"},
  {">=", "coverage_pct", 80},
  {"contains", "name", "filter"},
}
```

**Supported operators:**

| Op | Applies to | Semantics |
|---|---|---|
| `=` `!=` | any type | numeric or lexical equality |
| `>` `>=` `<` `<=` | numeric columns, ISO dates | numeric or lexical comparison |
| `contains` `!contains` | strings | substring |
| `starts` | strings | prefix |
| `ends` | strings | suffix |

**Range filter** = two chained filters, `>=` and `<=`.

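
A minimal sketch of how one predicate could be evaluated against a cell. `eval_filter` and `parse_number` are hypothetical names, and the rule "compare numerically when both sides parse as numbers, otherwise lexically" is an assumption derived from the table above, not the confirmed engine behavior.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Returns true only if the whole string parses as a number.
static bool parse_number(const std::string& s, double& out) {
    if (s.empty()) return false;
    char* end = nullptr;
    out = std::strtod(s.c_str(), &end);
    return end == s.c_str() + s.size();
}

// Hypothetical evaluator for one predicate {op, col, value} against a cell.
bool eval_filter(const std::string& op, const std::string& cell, const std::string& value) {
    if (op == "contains")  return cell.find(value) != std::string::npos;
    if (op == "!contains") return cell.find(value) == std::string::npos;
    if (op == "starts")    return cell.compare(0, value.size(), value) == 0;
    if (op == "ends")
        return cell.size() >= value.size() &&
               cell.compare(cell.size() - value.size(), value.size(), value) == 0;

    // Ordering operators: numeric if both sides are numbers, lexical otherwise
    // (ISO dates sort correctly as plain strings).
    double a = 0.0, b = 0.0;
    int cmp = 0;
    if (parse_number(cell, a) && parse_number(value, b)) {
        cmp = (a < b) ? -1 : (a > b) ? 1 : 0;
    } else {
        const int c = cell.compare(value);
        cmp = (c < 0) ? -1 : (c > 0) ? 1 : 0;
    }
    if (op == "=")  return cmp == 0;
    if (op == "!=") return cmp != 0;
    if (op == ">")  return cmp > 0;
    if (op == ">=") return cmp >= 0;
    if (op == "<")  return cmp < 0;
    if (op == "<=") return cmp <= 0;
    return false;  // unknown operator
}
```
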
---

## `expressions`

Mapping `name -> Lua formula`. Expressions create computed columns that are accessible within the same stage (filter/sort/aggregation can reference them). The Lua syntax is identical to that of the interactive custom columns; see the Lua expression DSL section.

```lua
expressions = {
  size_bytes = "[size_kb] * 1024",
  is_top     = "[coverage_pct] >= 90",
  full_name  = "fn.concat([lang], ':', [domain])",
}
```

The column type is auto-detected by evaluating a sample.

---

## `breakout`

List of column names to group by. Order matters: it defines the visual hierarchy of the group.

```lua
breakout = { "lang", "domain" }
```

Each unique combination of values `(lang, domain)` produces one row in the output. If `breakout` is empty but `aggregation` is present, the whole dataset is reduced to ONE single row.

**Breakout with date granularity**: add a `:granularity` suffix to the column name:

```lua
breakout = { "created_at:month", "lang" }
-- equivalent to GROUP BY date_trunc('month', created_at), lang
```

Available granularities: `year`, `month`, `week`, `day`, `hour`.

**Only available in stages >= 1.**

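
The suffix handling can be sketched like this. `breakout_to_sql` is a hypothetical helper, and treating an unrecognized suffix as part of the column name is an assumption made for the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turns a breakout token such as "created_at:month"
// into the SQL grouping expression described above.
std::string breakout_to_sql(const std::string& token) {
    static const char* kGranularities[] = {"year", "month", "week", "day", "hour"};
    const std::size_t colon = token.rfind(':');
    if (colon != std::string::npos) {
        const std::string gran = token.substr(colon + 1);
        for (const char* g : kGranularities) {
            if (gran == g)
                return "date_trunc('" + gran + "', \"" + token.substr(0, colon) + "\")";
        }
    }
    // Plain column; an unknown suffix is kept as part of the name (assumption).
    return "\"" + token + "\"";
}
```
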
---

## `aggregation`

List of reductions applied to each group. Syntax: `{fn, col?, arg?}`.

```lua
aggregation = {
  {"count"},
  {"sum", "deps_count"},
  {"avg", "size_kb"},
  {"min", "size_kb"},
  {"max", "coverage_pct"},
  {"distinct", "name"},
  {"stddev", "coverage_pct"},
  {"median", "size_kb"},
  {"p25", "size_kb"},
  {"p75", "size_kb"},
  {"p90", "size_kb"},
  {"p99", "size_kb"},
  {"percentile", "size_kb", 0.95},
}
```

**Supported functions:**

| Fn | Arguments | Output type | Notes |
|---|---|---|---|
| `count` | none | Int | rows in the group |
| `sum` | col | Float | sum of non-empty numeric cells |
| `avg` | col | Float | arithmetic mean |
| `min` | col | same as col | numeric if the column is numeric; lexical if String/Date |
| `max` | col | same as col | same as `min` |
| `distinct` | col | Int | count of distinct values |
| `stddev` | col | Float | population standard deviation |
| `median` | col | Float | 50th percentile with linear interpolation |
| `p25` `p75` `p90` `p99` | col | Float | fixed percentiles |
| `percentile` | col, p (0..1) | Float | generic percentile; p is a float in 0..1 |

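
As an illustration of "percentile with linear interpolation", here is a minimal sketch. The exact interpolation convention used by the engine is not stated in this document; the version below interpolates between the two nearest ranks at position `p * (n - 1)`, which is an assumption. `percentile` is a hypothetical free function and does not clamp `p`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Percentile with linear interpolation between the two nearest ranks.
// `p` is expected in [0, 1]; returns NaN for an empty input.
double percentile(std::vector<double> values, double p) {
    if (values.empty()) return std::nan("");
    std::sort(values.begin(), values.end());
    const double rank = p * static_cast<double>(values.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(rank));
    const std::size_t hi = std::min(lo + 1, values.size() - 1);
    const double frac = rank - static_cast<double>(lo);
    return values[lo] + frac * (values[hi] - values[lo]);
}
```
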

**Default alias of the output column:**
- `count` → `"count"`
- `distinct col` → `"distinct_<col>"`
- `percentile col p` → `"p<int(p*100)>_<col>"` (e.g. `p95_size_kb`)
- everything else → `"<fn>_<col>"` (e.g. `avg_size_kb`)

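
The alias rules above can be sketched as a small function. `default_alias` is a hypothetical stand-in for `aggregation_alias` that works on the TQL token instead of the `AggFn` enum. Rounding `p * 100` to the nearest integer (rather than truncating) is a choice made for this sketch, to avoid floating-point surprises such as `0.29 * 100` evaluating to slightly less than 29.

```cpp
#include <cassert>
#include <string>

// Sketch of the default alias rules. `fn` is the TQL token ("count", "avg",
// "percentile", ...); `p` is only used for "percentile".
std::string default_alias(const std::string& fn, const std::string& col, double p = 0.0) {
    if (fn == "count") return "count";
    if (fn == "percentile")
        return "p" + std::to_string(static_cast<int>(p * 100.0 + 0.5)) + "_" + col;
    return fn + "_" + col;  // covers distinct, sum, avg, median, p25, ...
}
```
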

To override the alias, an extended form is planned (future, not implemented in v1):
```lua
{"avg", "size_kb", alias = "mean_size"}
```
v1: the alias is auto-generated. If you need a custom alias, create an expression on top of it.

Empty cells are IGNORED when aggregating (they do not break the computation).

---

## `columns`

Per-column render configuration. An array of entries, one per effective column (original plus derived). If the TQL includes `columns`, its fields override the UI state:

```lua
columns = {
  {
    name = "lang",                 -- header name (mandatory)
    type = "string",               -- optional. For original cols: informational. For derived cols: mutable
    visible = true,                -- optional, default true
    order = 1,                     -- optional. 1-based visual position. Default = order of appearance
    color_rules = {                -- optional. List of {equals, hex color}
      {equals = "go", color = "#86b56b"},
      {equals = "py", color = "#6b8eb5"},
    },
  },
  -- ...
}
```

**Valid types** (`type`): `"string"`, `"int"`, `"float"`, `"bool"`, `"date"`, `"json"`, `"auto"`.

**Hex color format**: `"#rrggbb"` or `"#rrggbbaa"`. With only 6 digits, alpha = `ff`.

**Columns that do not appear in `columns`**: keep their current UI state (visible, natural position).

### `color_rules`

Conditional color rules by exact value. They are applied when rendering each cell of the column: if the cell value equals `equals`, the cell is colored with `color`.

```lua
color_rules = {
  {equals = "go",   color = "#86b56b"},  -- green for Go
  {equals = "py",   color = "#6b8eb5"},  -- blue for Python
  {equals = "bash", color = "#b58f6b"},  -- orange for Bash
}
```

- Only exact equality (string match) is supported. For numeric ranges, use an expression that produces a label ("high"/"low") and apply `color_rules` to that derived column.
- Multiple rules are evaluated in order; the first one that matches wins.
- If nothing matches: the theme's default color.

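
A minimal sketch of the two rules above: first match wins, and a 6-digit hex color gets alpha `ff`. `ColorRule`, `parse_hex_color` and `match_rule` are hypothetical names for illustration, and the packed `0xRRGGBBAA` layout is an assumption, not the renderer's actual representation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct ColorRule { std::string equals; std::string color; };

// Parses "#rrggbb" or "#rrggbbaa" into 0xRRGGBBAA; alpha defaults to ff.
// Returns false on malformed input.
bool parse_hex_color(const std::string& s, std::uint32_t& rgba) {
    if ((s.size() != 7 && s.size() != 9) || s[0] != '#') return false;
    std::uint32_t v = 0;
    for (std::size_t i = 1; i < s.size(); ++i) {
        const char c = s[i];
        std::uint32_t d = 0;
        if (c >= '0' && c <= '9')      d = static_cast<std::uint32_t>(c - '0');
        else if (c >= 'a' && c <= 'f') d = static_cast<std::uint32_t>(10 + (c - 'a'));
        else if (c >= 'A' && c <= 'F') d = static_cast<std::uint32_t>(10 + (c - 'A'));
        else return false;
        v = (v << 4) | d;
    }
    rgba = (s.size() == 7) ? ((v << 8) | 0xffu) : v;
    return true;
}

// Returns the first rule whose `equals` matches the cell exactly, or nullptr
// when no rule matches (the caller then uses the theme's default color).
const ColorRule* match_rule(const std::vector<ColorRule>& rules, const std::string& cell) {
    for (const ColorRule& r : rules)
        if (r.equals == cell) return &r;
    return nullptr;
}
```
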
---

## `display`

Visualization type of the main panel. Default: `"table"`.

**Valid tokens (extracted from `tql_helpers.cpp`):**

| Token | Chart type |
|---|---|
| `"table"` | Data table (default) |
| `"bar"` | Horizontal bars |
| `"column"` | Vertical bars |
| `"grouped_bar"` | Bars grouped by category |
| `"stacked_bar"` | Stacked bars |
| `"line"` | Lines |
| `"area"` | Filled area |
| `"stairs"` | Stairs (step function) |
| `"scatter"` | XY scatter |
| `"bubble"` | XY scatter with variable size |
| `"histogram"` | 1D histogram |
| `"hist2d"` | 2D histogram |
| `"heatmap"` | Heat map |
| `"boxplot"` | Box and whiskers |
| `"stem"` | Stem plot |
| `"errorbars"` | Error bars |
| `"pie"` | Pie chart |
| `"donut"` | Donut |
| `"funnel"` | Funnel |
| `"waterfall"` | Waterfall |
| `"kpi"` | KPI metric (large number) |
| `"kpi_grid"` | Grid of KPIs |
| `"candlestick"` | Candlesticks (OHLC) |
| `"radar"` | Radar / spider |

Invalid token: `tql_apply` emits the warning `"unknown display"` and falls back to `"table"`.

---

## `views`

Array of visualization panels. The first entry (Lua index 1) is the main panel, equivalent to the `display` + `viz_config` of the State; the remaining entries are extra panels shown alongside the table.

```lua
views = {
  -- First entry: main panel
  {display = "bar", x_col = "lang", y_cols = {"count"}, color = "#86b56b"},
  -- Second entry: extra panel
  {display = "pie", cat_col = "lang", y_cols = {"sum_size_kb"}, show_legend = true},
}
```

**Fields per panel:**

| Field | Type | Purpose |
|---|---|---|
| `display` | string | Chart type token (see the `display` table) |
| `x_col` | string | Column for the X axis (bar, column, line, area, scatter, bubble, etc.) |
| `y_cols` | `{string,...}` | Columns for the Y axis. Several columns = several series |
| `cat_col` | string | Category column (pie, donut, funnel, radar) |
| `size_col` | string | Column for the bubble size (bubble) |
| `color` | string | Primary color `"#rrggbb"`. Used for single series or as accent |
| `hist_bins` | int | Number of bins for histogram / hist2d |
| `pie_radius` | float | Inner radius of the donut (donut; 0.0 = solid pie) |
| `show_legend` | bool | Show the legend. Default `true` |
| `show_markers` | bool | Points on lines/area. Default `false` |
| `locked` | bool | Fixed panel: the user can neither close it nor change its type |

If `views` is omitted, the emit step serializes it with a minimal panel that replicates `state.display`.

---

## `visualization_settings`

Reserved for configuration specific to each display type. In v1 it is always empty (`{}`). It is emitted by `tql_emit` to keep the round-trip complete. Planned future shape:

```lua
visualization_settings = {
  ["graph.dimensions"] = {"lang"},
  ["graph.metrics"] = {"count"},
  ["graph.colors"] = {"#86b56b", "#6b8eb5"},
  ["stackable.stack_type"] = "stacked",
}
```

The key names follow Metabase. Because they contain a `.`, they are not valid Lua identifiers and must be written with the bracketed string form `["key"]`.

---

## `sort`

List of clauses. Multi-sort follows the order of appearance (the first clause is the primary key).

```lua
sort = {
  {"desc", "count"},
  {"asc", "lang"},
}
```

Directions: `"asc"` or `"desc"`.

The column can be any header of the stage output (including aggregations).

If `sort` is empty, the output keeps the order of appearance of the groups (stable).

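
A minimal sketch of a stable multi-key sort with the semantics above. `SortKey`, `Row` and `multi_sort` are hypothetical names, columns are addressed by index, and cells are compared lexically for brevity; a fuller version would compare numeric columns numerically.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct SortKey { std::size_t col; bool desc; };
using Row = std::vector<std::string>;

// Stable multi-key sort: the first clause is the primary key, and rows that
// tie on every key keep their original order.
void multi_sort(std::vector<Row>& rows, const std::vector<SortKey>& keys) {
    std::stable_sort(rows.begin(), rows.end(), [&keys](const Row& a, const Row& b) {
        for (const SortKey& k : keys) {
            const int c = a[k.col].compare(b[k.col]);
            if (c != 0) return k.desc ? c > 0 : c < 0;
        }
        return false;  // equal on every key
    });
}
```
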
---

## Full round-trip example

Question: "For pure functions with coverage >= 80%, group by language and domain, give me the count plus the average size, sort by count descending, then on top of that group by language only to see totals."

```lua
return {
  version = 1,
  display = "table",
  stages = {
    -- Stage 0: Raw + filter
    {
      filter = {
        {"=", "purity", "pure"},
        {">=", "coverage_pct", 80},
      },
    },
    -- Stage 1: group by (lang, domain)
    {
      breakout = { "lang", "domain" },
      aggregation = {
        {"count"},
        {"avg", "size_kb"},
      },
      sort = { {"desc", "count"} },
    },
    -- Stage 2: re-group by lang over the output of stage 1
    {
      breakout = { "lang" },
      aggregation = {
        {"sum", "count"},        -- sum of the stage 1 count (total funcs per lang)
        {"avg", "avg_size_kb"},  -- mean of the means (unweighted)
      },
      sort = { {"desc", "sum_count"} },
    },
  }
}
```

---

## Example with join + views

```lua
return {
  version = 1,
  display = "bar",
  main_source = "functions",
  joins = {
    {
      alias = "u",
      source = "unit_tests",
      strategy = "left",
      on = {{"id", "u.function_id"}},
      fields = {"u.name"},
    },
  },
  stages = {
    { filter = {{"=", "lang", "go"}} },
    {
      breakout = {"domain"},
      aggregation = {{"count"}, {"distinct", "id"}},
      sort = {{"desc", "count"}},
    },
  },
  columns = {
    {name = "domain", type = "string", visible = true, order = 1},
    {name = "count",  type = "int",    visible = true, order = 2},
  },
  views = {
    {display = "bar",   x_col = "domain",   y_cols = {"count"}, show_legend = false},
    {display = "donut", cat_col = "domain", y_cols = {"count"}, show_legend = true},
  },
  visualization_settings = {},
}
```

---

## Drill-down (semantics)

When the user interacts with a grouped cell of stage N, a **drill-down** happens:

1. Take the breakout values of that row (e.g. `lang=go`, `domain=core`).
2. Go back to stage N-1.
3. PUSH each `(breakout_col = breakout_value)` onto the `filter` of stage N-1.
4. The user is now examining the original rows that make up that group.

It does **not** remove later stages (they are kept in case the user wants to move forward again).

This is equivalent to:

```lua
-- original state
{stages = { {filter={}}, {breakout={"lang","domain"}, aggregation={{"count"}}} }}

-- after drill-down on row (go, core):
{stages = {
  {filter = {{"=","lang","go"}, {"=","domain","core"}}},  -- stage 0 is now filtered
  {breakout={"lang","domain"}, aggregation={{"count"}}}   -- stage 1 untouched
}}
-- active_stage = 0 (back to Raw)
```

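
The four steps above can be sketched as a small state mutation. The `Filter` and `Stage` structs here are simplified stand-ins for illustration, not the real types from `data_table_logic.h`, and `drill_down` is a hypothetical function. Breakouts with a granularity suffix (for example `created_at:month`) would need extra handling that this sketch omits.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for the real types (assumption).
struct Filter { std::string op, col, value; };
struct Stage {
    std::vector<Filter> filters;
    std::vector<std::string> breakouts;
};

// Drill down from a grouped row of stage `n` (n >= 1): push one equality
// filter per breakout value onto stage n-1 and return the new active stage.
// Later stages are left untouched. `row_values` holds the clicked row's
// breakout values in breakout order.
std::size_t drill_down(std::vector<Stage>& stages, std::size_t n,
                       const std::vector<std::string>& row_values) {
    const Stage& grouped = stages[n];
    Stage& target = stages[n - 1];
    for (std::size_t i = 0; i < grouped.breakouts.size() && i < row_values.size(); ++i)
        target.filters.push_back({"=", grouped.breakouts[i], row_values[i]});
    return n - 1;
}
```
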
---

## Lua expression DSL

The strings inside `expressions` follow the Lua mini-DSL of custom columns. Rules:

- Column access: `[col_name]`, `row.col_name` or `row["col_name"]`.
- 1-based positional access: `row[1]`.
- Auto-return: if the expression does not start with `return`/`if`/`for`/`while`/`do`/`local`/`repeat`/`function`, `return ` is prepended automatically.
- Type-aware: a cell from an Int/Float column arrives as a number, Bool as a boolean, everything else as a string. Empty = nil.
- UTF-8 is fine in names: `[año]`.
- Comments `--` and `--[[ ]]` are respected.
- Column names with spaces and dots are supported inside brackets: `[col con espacio]`, `[alias.field]`.

Examples:

```lua
[size_kb] * 1024                              -- arithmetic
fn.concat([lang], ":", [domain])              -- string compose
if [coverage_pct] >= 90 then "well" else "low" end
fn.year([updated_at])                         -- date helper
fn.coalesce([error_type], "none")             -- null handling
```

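
The auto-return rule can be sketched as below. `with_auto_return` is a hypothetical function written only to illustrate the rule; requiring a word boundary after the keyword and not skipping leading comments are assumptions of this sketch, and the real engine may behave differently.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Prepends "return " unless the formula already starts with a statement keyword.
std::string with_auto_return(const std::string& formula) {
    static const char* kKeywords[] = {"return", "if", "for", "while",
                                      "do", "local", "repeat", "function"};
    std::size_t start = 0;
    while (start < formula.size() &&
           std::isspace(static_cast<unsigned char>(formula[start]))) ++start;
    for (const char* kw : kKeywords) {
        const std::string k(kw);
        if (formula.compare(start, k.size(), k) != 0) continue;
        const std::size_t end = start + k.size();
        // Require a word boundary so that "done_flag" is not mistaken for "do".
        const bool boundary = end >= formula.size() ||
            !(std::isalnum(static_cast<unsigned char>(formula[end])) || formula[end] == '_');
        if (boundary) return formula;
    }
    return "return " + formula;
}
```
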
---

## Available Lua functions (`fn.*`)

The sandbox exposes these functions through the global table `fn`. They are registered in `lua_engine.cpp::register_builtins`:

| Function | Signature | What it does |
|---|---|---|
| `fn.upper(s)` | string -> string | Converts to uppercase (ASCII) |
| `fn.lower(s)` | string -> string | Converts to lowercase (ASCII) |
| `fn.length(s)` | string -> int | Length in bytes (`strlen`); nil -> 0 |
| `fn.substring(s, start [, len])` | string, int[, int] -> string | 1-based substring; omitted len = up to the end |
| `fn.contains(haystack, needle)` | string, string -> bool | True if needle appears in haystack |
| `fn.starts_with(s, prefix)` | string, string -> bool | True if s starts with prefix |
| `fn.ends_with(s, suffix)` | string, string -> bool | True if s ends with suffix |
| `fn.replace(s, find, repl)` | string, string, string -> string | Replaces all occurrences of find with repl |
| `fn.trim(s)` | string -> string | Removes spaces/tabs/newlines from the start and end |
| `fn.concat(...)` | vararg -> string | Concatenates N arguments as a string |
| `fn.to_number(s)` | string -> number\|nil | Parses to a number; nil if not parseable |
| `fn.to_string(x)` | any -> string | Converts to string (uses `luaL_tolstring`) |
| `fn.to_bool(x)` | any -> bool | True if `"true"` or `"1"` |
| `fn.is_null(x)` | any -> bool | True if x is nil |
| `fn.is_empty(x)` | any -> bool | True if x is nil or an empty string |
| `fn.coalesce(...)` | vararg -> any | Returns the first non-nil argument |
| `fn.parse_date(s)` | string -> table\|nil | Parses `"YYYY-MM-DD"` -> `{year, month, day}` |
| `fn.year(s)` | string -> int\|nil | Extracts the year from `"YYYY-..."` |
| `fn.month(s)` | string -> int\|nil | Extracts the month from `"YYYY-MM-..."` |
| `fn.day(s)` | string -> int\|nil | Extracts the day from `"YYYY-MM-DD"` |

In addition, the standard Lua libraries `string`, `table`, `math` and `os` (trimmed) are available.

---

## Sandbox: blocked functions

The engine applies the sandbox through `lua_engine.cpp::apply_sandbox`. Removed globals:

| Global | Why it is blocked |
|---|---|
| `io` | File and stdin/stdout I/O |
| `require` | Loading of external modules |
| `loadfile` | Execution of arbitrary Lua files |
| `dofile` | Same as `loadfile` |
| `load` | Compilation and execution of arbitrary strings |
| `package` | Lua package system |
| `debug` | Introspection of call stack / upvalues |

`os` is replaced by a trimmed version that only exposes `os.date`, `os.time`, `os.difftime` and `os.clock`. The rest of `os` (running commands, exiting, setenv, etc.) is removed.

Expression formulas are compiled with `luaL_loadbufferx(..., "t")`; the `"t"` mode flag rejects precompiled bytecode (only source text is accepted).

---

## v1 restrictions

| Not supported | Workaround |
|---|---|
| `HAVING` post-aggregation | A following stage with a `filter` on the aggregated columns. |
| `LIMIT` | TBD: add as `limit = N` on a stage in v2. |
| Window functions | TBD. |
| Custom Lua aggregation | TBD: `{"lua", "col", "<body>"}`. |
| Custom alias in aggregation (v1) | Create an expression after the grouping. |
| `color_rules` with numeric ranges | Use an expression that emits labels; apply `color_rules` to the label. |
| Multiple sources without a join | Declare each additional source in `joins`. |

---

## Prompt template for LLMs

When you expose TQL to an LLM, give it this preamble:

```
You output TQL — a Lua table that describes a table transformation. Format:

return {
  version = 1,
  display = "table",  -- table|bar|column|grouped_bar|stacked_bar|line|area|stairs|scatter|
                      -- bubble|histogram|hist2d|heatmap|boxplot|stem|errorbars|
                      -- pie|donut|funnel|waterfall|kpi|kpi_grid|candlestick|radar
  main_source = "...",  -- optional: name of main table/source
  joins = { ... },      -- optional: join additional tables
  stages = {
    { filter = {...}, expressions = {...}, sort = {...} },                    -- Stage 0 (Raw)
    { filter = {...}, breakout = {...}, aggregation = {...}, sort = {...} },  -- Stage 1+
    ...
  },
  views = {
    {display="...", x_col="...", y_cols={...}, cat_col="...", color="...", ...},  -- first entry = main panel
    ...  -- extra panels
  },
}

Rules:
- Stage 0 NEVER has breakout/aggregation (immutable raw layer).
- Filters: {"op", "col", value}. Ops: =, !=, >, >=, <, <=, contains, !contains, starts, ends.
- Aggregations: {"fn", "col"} or {"count"} or {"percentile", "col", 0.95}.
  Available fns: count, sum, avg, min, max, distinct, stddev, median, p25, p75, p90, p99, percentile.
- Sort: {{"desc", "col"}, ...}. Multi-sort follows the order of the list.
- Each expressions value is a Lua expression. Access columns via [col_name].
- Joins: alias + source + strategy (left/inner/right/full) + on pairs + optional fields list.
- Views: array of panels; the first entry (Lua index 1) is the main panel. display token from the list above.
- color_rules: { {equals="val", color="#rrggbb"}, ... } inside each entry of columns.

The available columns of the current input table are: <inject runtime>.
The available column types: <inject runtime>.
Output ONLY the Lua table, no markdown.
```

---

## C++ API (reference)

Module `data_table_logic.h`:

```cpp
// Types
enum class AggFn { Count, Sum, Avg, Min, Max, Distinct, Stddev, Median, P25, P75, P90, P99, Percentile };
struct Aggregation { AggFn fn; std::string col; double arg; std::string alias; };
struct SortClause  { std::string col; bool desc; };
struct Stage {
    std::vector<Filter>        filters;
    std::vector<DerivedColumn> derived;
    std::vector<std::string>   breakouts;
    std::vector<Aggregation>   aggregations;
    std::vector<SortClause>    sorts;
};

// Pure functions
std::string aggregation_alias(const Aggregation& a);
ColumnType  aggregation_type (const Aggregation& a,
                              const std::vector<std::string>& in_headers,
                              const std::vector<ColumnType>&  in_types);

StageOutput compute_stage(const char* const* in_cells, int in_rows, int in_cols,
                          const std::vector<std::string>& in_headers,
                          const std::vector<ColumnType>&  in_types,
                          const Stage& stage);
```

`compute_stage` is PURE: no side effects, testable headless. It is chainable: feed its `StageOutput.cells/headers/types` as the input of the next `compute_stage` call.

---

## Implementation status

| Feature | Status |
|---|---|
| `Stage` + `Aggregation` types | done |
| `compute_stage` (filter + group + agg + sort) | done |
| All aggregations (count..percentile) | done |
| `aggregation_alias` / `aggregation_type` | done |
| Multi-sort per stage | done |
| E2E logic tests | done (129 checks in tql_emit_test + tql_apply_test) |
| `tql_emit` / `tql_apply` (Lua round-trip) | done |
| `views` (visualization panels) | done |
| `main_source` | done |
| `joins` (left/inner/right/full, multi-key, fields) | done |
| `color_rules` per column | done |
| `breakout` with date granularity | done |
| Lua sandbox (`fn.*` builtins, no io/require/load) | done |
| 24 viz types (table, bar, column, pie, donut...) | done |
| `tql_to_sql` (DuckDB SQL emit) | done (issue 0080) |
| State refactor to `vector<Stage>` | done |
| UI breadcrumb of stages + chips per stage | done |
| Interactive drill-down | done |
| Show TQL / Apply TQL modals | done |
| Multi-sort drag-reorder | done |

---

## SQL transpile subset (phase 11, issue 0080)

TQL emits equivalent DuckDB SQL so that LLM agents can generate either TQL or SQL against the same data. The module `tql_to_sql.{h,cpp}` provides `emit_sql(State, tables)`. The mapping is MBQL-style, with a CTE chain `t0..tN`.

### Transpilable Lua subset

Lua remains **fully available at general runtime** (formula evaluation in derived columns under pure TQL). The subset applies ONLY when calling `tql_to_sql::emit_sql()`. Anything outside the subset produces a compile-time error with a concrete cause and a workaround.

**Allowed (transpiles to DuckDB SQL):**

| Lua | DuckDB SQL | Example |
|---|---|---|
| Number/string/bool/nil literals | same (`'x'`, `TRUE`, `NULL`) | `42`, `"hola"`, `nil` |
| Column ref: `[colname]` | `"colname"` (quoted) | `[size_kb]` → `"size_kb"` |
| Arithmetic: `+ - * / % - (unary)` | same | `[a] + [b] * 2` → `("a" + ("b" * 2))` |
| Comparison: `== ~= < <= > >=` | `= <> < <= > >=` | `[n] >= 10` → `("n" >= 10)` |
| Logic: `and or not` | `AND OR NOT` | `[a] and [b]` → `("a" AND "b")` |
| String concat: `..` | `\|\|` | `[a] .. "_" .. [b]` → `("a" \|\| '_' \|\| "b")` |
| Ternary: `if A then B else C end` | `CASE WHEN A THEN B ELSE C END` | `else` is mandatory |
| `math.floor/ceil/abs/sqrt/sin/cos/log/exp` | `floor/ceiling/abs/sqrt/sin/cos/ln/exp` | `math.floor([x])` |
| `math.min(a,b)/max(a,b)` | `least(a,b)/greatest(a,b)` | `math.min([a], 100)` |
| `string.upper/lower/len(s)` | `upper(s)/lower(s)/length(s)` | `string.upper([name])` |
| `string.sub(s, i [, j])` | `substring(s, i [, j-i+1])` | `string.sub([s], 1, 3)` |
| `tostring(x)/tonumber(x)` | `CAST(x AS VARCHAR)/CAST(x AS DOUBLE)` | `tonumber([n])` |
| Parentheses and Lua precedence | same | `(a + b) * c` |

**Outside the subset (compile-time error):**

- Closures: `function() ... end`
- Loops: `for/while/repeat`
- Locals: `local x = ...`
- Tables: `{...}`, `t[k]`, `t.field`, `table.*`
- Multi-return, vararg `...`
- `string.gsub/find/match/format/byte/char/rep`
- IO/OS/debug: `io.*`, `os.*`, `debug.*`, `package`, `require`, `print`
- Coroutines, metatables, `pcall/xpcall`, `rawget/rawset`
- Recursion, multi-statement bodies (`;`)
- Length operator `#`
- Method calls `:`
- Ternary without else: `if A then B end` (the subset requires both branches)

### Example error message

```
SQL transpile error en derived col 'fullname':
  formula = "[first] .. ' ' .. string.gsub([last], 'X', 'Y')"
  causa: function 'string.gsub' not in SQL transpile whitelist
  ver docs/TQL.md#sql-transpile-subset
  workaround: usar TQL puro (sin SQL emit) o reescribir formula
```

### Stage → SQL mapping

| TQL element | DuckDB SQL |
|---|---|
| Stage 0 Raw | CTE `t0 AS (SELECT cols+derived FROM main_t [JOIN ...] [WHERE filters] [ORDER BY sorts])` |
| Stage N>=1 | CTE `tN AS (SELECT breakouts+aggs FROM tN-1 [GROUP BY ...] [ORDER BY ...])` |
| breakout `"col"` | `"col"` |
| breakout `"col:month"` | `date_trunc('month', "col")` |
| breakout `"col:year/week/day/hour"` | `date_trunc('year/week/day/hour', "col")` |
| Aggregation Count | `COUNT(*)` |
| Aggregation Sum/Avg/Min/Max/Stddev | `SUM/AVG/MIN/MAX/STDDEV("col")` |
| Aggregation Distinct | `COUNT(DISTINCT "col")` |
| Aggregation Median/P25/P75/P90/P99 | `quantile_cont("col", p)` |
| Aggregation Percentile p | `quantile_cont("col", p)` |
| Filter Op::Eq/Neq/Gt/Gte/Lt/Lte | `"col" = ?` etc. (bound params) |
| Filter Op::Contains | `"col" LIKE '%v%'` (param `%v%`) |
| Filter Op::StartsWith / EndsWith | `LIKE 'v%'` / `LIKE '%v'` |
| Sort `{desc, "col"}` | `ORDER BY "col" DESC` |
| Join Left/Inner/Right/Full | `LEFT OUTER` / `INNER` / `RIGHT OUTER` / `FULL OUTER JOIN ... ON ...` |
| Join multi-key `on={{l1,r1},{l2,r2}}` | `ON l.l1 = r.r1 AND l.l2 = r.r2` |
| Join fields | columns `alias.field AS "alias.field"` |
| `main_source` | `FROM "main_source_name"` |

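
To make the CTE chain concrete, here is a minimal sketch of how per-stage SELECT bodies could be assembled into one statement. `build_cte_chain` is a hypothetical helper, not part of `tql_to_sql`, and the trailing `SELECT * FROM tN` is an assumption about the final statement.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical assembler for the CTE chain: `bodies[i]` is the SELECT text
// already generated for stage i. The final statement reads the last CTE.
std::string build_cte_chain(const std::vector<std::string>& bodies) {
    if (bodies.empty()) return "";
    std::string sql = "WITH ";
    for (std::size_t i = 0; i < bodies.size(); ++i) {
        if (i > 0) sql += ",\n";
        sql += "t" + std::to_string(i) + " AS (" + bodies[i] + ")";
    }
    sql += "\nSELECT * FROM t" + std::to_string(bodies.size() - 1);
    return sql;
}
```
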

### Doctrine (Metabase-style)

- **One-way:** TQL → SQL is supported. SQL → TQL is not. Reason: the reverse translation is lossy (CTEs, window functions, set operations, lateral joins and correlated subqueries do not fit in TQL).
- **Output:** the SQL string can always be emitted. Execution requires DuckDB to be linked (build flag `FN_TQL_DUCKDB=1`, optional).
- **Agent flow:** TQL by default. SQL only if the app links DuckDB. The Ask AI UI shows the SQL toggle only when it is available.

See issue 0080 and `tql_to_sql.{h,cpp}` for the implementation.

---

*Generated from the round-trip tests in `cpp/functions/core/tql_emit_test.cpp` and `cpp/functions/core/tql_apply_test.cpp`: 129 passing checks (41 emit + 88 apply) guarantee State <-> Lua round-trip compatibility.*