fn_registry/dev/issues/0080-tables-llm-api.md at e3c8979e8d961fc67124c8eb4cf4c27fdd5f6813

egutierrez e3c8979e8d chore: auto-commit (95 archivos)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-13 00:50:34 +02:00

id: 0080
title: tables playground: LLM "Ask AI" + TQL/SQL emit (phase 11)
status: partial
priority: medium
created: 2026-05-12
updated: 2026-05-13
notes: pure layer + LLM client + Ask AI modal DONE. DuckDB adapter v2 (optional, build flag FN_TQL_DUCKDB=1)
related_components: cpp/apps/primitives_gallery/playground/tables, lua_engine, tql, duckdb

Context

Phase 11 of the tables playground roadmap. Two capabilities that are built together because they share infrastructure (prompt schema, runtime adapter, round-trip tests):

LLM "Ask AI": a user or agent asks a question in natural language, and the model returns a new TQL (or DuckDB SQL if DuckDB is linked).
TQL → SQL (DuckDB) emitter: lets agents write SQL against the same data model. The SQL is executable if the app linked DuckDB; otherwise only the string is emitted.

One-way design: TQL → SQL yes, SQL → TQL no. The rationale is documented in the Metabase MBQL ↔ SQL research: the reverse translation is lossy (CTEs, window functions, set operations, lateral joins, and correlated subqueries do not fit in MBQL/TQL). The canonical Malloy/Cube/LookML/Metabase pattern is one-way compile-down.

Changes

1. UI "Ask AI"

"Ask AI" button in the toolbar (next to "+ Viz").
Modal:
- Multiline InputText for the question.
- Output mode toggle: TQL (default) | SQL (DuckDB) (visible only if the app was compiled with FN_TQL_DUCKDB=1).
- "Send" button + spinner.
- Side-by-side diff: current vs proposed (highlighted text).
- "Apply" / "Reject" / "Edit before apply" buttons.

2. LLM backend

Provider: Anthropic Claude. API key read via pass anthropic/api-key.
Endpoint: https://api.anthropic.com/v1/messages. Model: claude-sonnet-4-6. Override with the FN_LLM_MODEL env var.
HTTP client: cURL via popen (no new dependencies).
The prompt template includes:
- TQL schema (from docs/TQL.md).
- In SQL mode: the DuckDB dialect + relevant DuckDB functions (date_trunc, regexp_replace, etc.).
- Columns available in stage 0 (name, type) + joinable columns.
- Lua subset grammar (see §4) when applicable.
- Available Lua functions (from lua_engine).
- Current TQL.
- The user's question.
Response: extract the lua (TQL) or sql block from the markdown and strip the prose.
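
The response-parsing step could look roughly like the sketch below. It assumes the model wraps its answer in a markdown fence tagged lua or sql; extract_block is a hypothetical name, not something defined in this issue, and the first matching block wins.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the issue's API: copies the body of the
// first fenced block tagged `lang` ("lua" or "sql") into `out`.
// Returns false if no complete block with that tag exists.
bool extract_block(const std::string& markdown, const std::string& lang,
                   std::string& out) {
    assert(!lang.empty());
    const std::string fence(3, '`');  // three backticks
    const std::string open = fence + lang;
    std::size_t pos = 0;
    while ((pos = markdown.find(open, pos)) != std::string::npos) {
        const bool at_line_start = (pos == 0 || markdown[pos - 1] == '\n');
        const std::size_t after_tag = pos + open.size();
        const std::size_t eol = markdown.find('\n', after_tag);
        // The tag must end its line, so "lua" does not match a "luau" fence.
        const bool tag_ends_line =
            eol != std::string::npos &&
            markdown.find_first_not_of(" \t\r", after_tag) == eol;
        if (at_line_start && tag_ends_line) {
            const std::size_t body = eol + 1;
            const std::size_t close = markdown.find("\n" + fence, eol);
            if (close == std::string::npos) return false;  // unterminated block
            out = (close < body) ? std::string()
                                 : markdown.substr(body, close - body);
            return true;
        }
        pos = after_tag;
    }
    return false;
}
```

Everything outside the fence (the model's prose) is dropped simply by not being copied. Whether to take the first or the last block of a reply is a design choice this sketch does not settle.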

3. TQL → SQL DuckDB emitter

New module tql_to_sql.{h,cpp} (pure). Functions:

struct SqlEmit {
    std::string sql;            // SELECT ... statement
    std::vector<std::string> params;  // bound values (?-placeholders)
    std::vector<std::string> warnings;
    std::string error;          // set if emit failed (formula outside the subset)
};

// Pure: emits DuckDB SQL equivalent to the State pipeline (stages 0..active).
// `tables` provides the schema of each TableInput (not the cells; the caller
// decides how to hydrate the tables in DuckDB).
SqlEmit emit_sql(const State& state, const std::vector<TableInput>& tables,
                  int up_to_stage = -1 /* default = active_stage */);

MBQL-style mapping:

Stage 0 = base CTE t0 with SELECT cols + derived FROM main_t [LEFT/INNER/RIGHT/FULL JOIN joinables ON ...].
Stage N = CTE tN with SELECT breakouts, aggregations FROM tN-1 [WHERE filters] [GROUP BY breakouts] [ORDER BY sorts].
Final query: SELECT * FROM t<active>.
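
A minimal sketch of how the CTE chain could be assembled, assuming each stage's SELECT body has already been emitted as a string. chain_ctes and its parameters are hypothetical names used only for illustration; the real emit_sql works from State, not from pre-built strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: stage_selects[i] is the SELECT body of stage i
// (stage N is expected to read FROM t<N-1>). A negative or out-of-range
// up_to_stage means "use the last stage", mirroring the -1 default above.
std::string chain_ctes(const std::vector<std::string>& stage_selects,
                       int up_to_stage = -1) {
    assert(!stage_selects.empty());
    const int last = static_cast<int>(stage_selects.size()) - 1;
    const int active = (up_to_stage < 0 || up_to_stage > last) ? last
                                                               : up_to_stage;
    std::string sql = "WITH ";
    for (int i = 0; i <= active; ++i) {
        if (i > 0) sql += ",\n     ";
        sql += "t" + std::to_string(i) + " AS (" + stage_selects[i] + ")";
    }
    sql += "\nSELECT * FROM t" + std::to_string(active);
    return sql;
}
```

Stopping at an earlier stage only shortens the WITH list, so every prefix of the pipeline yields a complete query.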

Per-stage emit details:

filter Op::Eq col = "v" → WHERE col = ? with params.push_back(v) (DuckDB accepts $1/?).
breakout "ts:month" → date_trunc('month', ts) AS "ts:month". The granularity suffix maps to DuckDB date_trunc.
aggregation count → COUNT(*) AS count.
aggregation p95(col) → quantile_cont(col, 0.95) AS p95_col.
aggregation distinct col → COUNT(DISTINCT col) AS distinct_col.
sort {desc, col} → ORDER BY col DESC.
Joins: the 4 strategies map directly to LEFT/INNER/RIGHT/FULL JOIN ... ON l.k = r.k.
Derived cols: transpiled via the Lua subset (§4). If a formula is outside the subset → SqlEmit.error = "lua formula 'X' out of subset: <reason>".
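
Two of these rules, sketched under assumptions: the granularity allowlist below is a guess (the issue only mentions month and week), the helper names are hypothetical, and every identifier is double-quoted here even though the examples above show them bare.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Wrap an identifier in double quotes, doubling any embedded quote.
std::string quote_ident(const std::string& name) {
    std::string out = "\"";
    for (char c : name) {
        if (c == '"') out += '"';
        out += c;
    }
    out += '"';
    return out;
}

// "ts:month" -> date_trunc('month', "ts") AS "ts:month".
// A breakout without a recognised suffix is emitted as a plain column.
std::string emit_breakout(const std::string& breakout) {
    // Assumed allowlist; it also keeps arbitrary text out of the SQL literal.
    static const char* const kGranularities[] = {"year", "quarter", "month",
                                                 "week", "day",     "hour"};
    const std::size_t colon = breakout.rfind(':');
    if (colon != std::string::npos) {
        const std::string gran = breakout.substr(colon + 1);
        for (const char* g : kGranularities) {
            if (gran == g) {
                return "date_trunc('" + gran + "', " +
                       quote_ident(breakout.substr(0, colon)) + ") AS " +
                       quote_ident(breakout);
            }
        }
    }
    return quote_ident(breakout);
}

// "p95" on column "latency" -> quantile_cont("latency", 0.95) AS "p95_latency".
// Returns false if `agg` is not of the form p<1-2 digits>.
bool emit_percentile(const std::string& agg, const std::string& col,
                     std::string& out) {
    assert(!col.empty());
    if (agg.size() < 2 || agg.size() > 3 || agg[0] != 'p') return false;
    int pct = 0;
    for (std::size_t i = 1; i < agg.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(agg[i]))) return false;
        pct = pct * 10 + (agg[i] - '0');
    }
    char frac[8];
    std::snprintf(frac, sizeof frac, "0.%02d", pct);  // 95 -> "0.95"
    out = "quantile_cont(" + quote_ident(col) + ", " + frac + ") AS " +
          quote_ident(agg + "_" + col);
    return true;
}
```

Filter values never appear in these strings: per the Op::Eq rule they go into params and only a ? placeholder reaches the SQL text.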

The output is a valid DuckDB SQL string. The emitter does not execute it; that is the responsibility of the optional adapter (§5).

4. Lua subset transpilable to SQL: GRAMMAR

To be documented in a new "SQL transpile subset" section of docs/TQL.md.

Hard rules: Lua remains fully powerful and unrestricted in the general runtime. The subset applies only when the caller requests tql_to_sql::emit_sql(). A formula outside the subset produces a clear error at emit time, NOT at eval time. The playground keeps executing arbitrary Lua without restriction.

Allowed subset (transpiles to SQL):

| Lua | SQL DuckDB |
| --- | --- |
| Literals: number, string `"x"`, bool `true/false`, `nil` | `1.5`, `'x'`, `TRUE/FALSE`, `NULL` |
| Col ref: `[colname]` | `colname` (identifier quoted if necessary) |
| Arithmetic: `+ - * / % - (unary)` | same |
| Comparison: `== ~= < <= > >=` | `= <> < <= > >=` |
| Logic: `and or not` | `AND OR NOT` |
| String concat: `..` | `\|\|` |
| Ternary: `if A then B else C end` | `CASE WHEN A THEN B ELSE C END` |
| Inline ternary: `(A and B) or C` (common Lua pattern; equivalent only when B is never `false`/`nil`) | `CASE WHEN A THEN B ELSE C END` |
| `math.floor/ceil/abs/round/sqrt/sin/cos/log` | `floor/ceiling/abs/round/sqrt/sin/cos/ln` |
| `math.min(a,b)/max(a,b)` | `least(a,b)/greatest(a,b)` |
| `string.upper/lower/len(s)` | `upper(s)/lower(s)/length(s)` |
| `string.sub(s, i, j)` (positive `i`, `j`) | `substring(s, i, j-i+1)` |
| `tostring(x)/tonumber(x)` | `CAST(x AS VARCHAR)/CAST(x AS DOUBLE)` |
| Parentheses and precedence | same |
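
The one-to-one rows of the table can be expressed as a lookup. The sketch below covers only the function-name rows and the string.sub index arithmetic; map_lua_function and emit_string_sub are hypothetical names, and the arguments are assumed to be SQL text that has already been transpiled.

```cpp
#include <cassert>
#include <map>
#include <string>

// Returns the DuckDB function for a Lua call that maps 1:1,
// or an empty string if the name is not in the table above.
std::string map_lua_function(const std::string& lua_name) {
    static const std::map<std::string, std::string> kMap = {
        {"math.floor", "floor"},   {"math.ceil", "ceiling"},
        {"math.abs", "abs"},       {"math.round", "round"},
        {"math.sqrt", "sqrt"},     {"math.sin", "sin"},
        {"math.cos", "cos"},       {"math.log", "ln"},
        {"math.min", "least"},     {"math.max", "greatest"},
        {"string.upper", "upper"}, {"string.lower", "lower"},
        {"string.len", "length"},
    };
    const auto it = kMap.find(lua_name);
    return it == kMap.end() ? std::string() : it->second;
}

// Lua's string.sub(s, i, j) takes an inclusive end index, while SQL
// substring takes a length, hence j - i + 1. Operands are parenthesised
// so that compound expressions keep their meaning.
std::string emit_string_sub(const std::string& s, const std::string& i,
                            const std::string& j) {
    assert(!s.empty() && !i.empty() && !j.empty());
    return "substring(" + s + ", " + i + ", (" + j + ") - (" + i + ") + 1)";
}
```

For example, string.sub([name], 2, 4) selects three characters, and the emitted length expression (4) - (2) + 1 evaluates to 3.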

Outside the subset (transpile-time error):

Closures: function() ... end
Loops: for/while/repeat
Locals: local x = ...
Tables: {...}, t[k], t.field, table.*
Multi-return / vararg
string.gsub/find/match/format (manual mapping possible in v2)
IO: io.*, os.*, print
Coroutines, metatables, debug
Recursion, multi-statement bodies

Example error message:

SQL transpile error in derived col 'fullname':
  formula = "[first] .. ' ' .. table.concat(parts, ',')"
  cause: 'table.concat' is not in the SQL transpile subset
  see docs/TQL.md#sql-transpile-subset
  workaround: use pure TQL (without SQL emit) or rewrite the formula with `..`

Helper: tql_to_sql::is_transpilable(formula, error_out), a pure function that validates a formula without emitting.
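
One way such a validator could work is an allowlist over identifiers: anything that is not a column reference, a literal, an operator, or a known name is rejected. The sketch below is purely lexical and is an assumption about the approach, not the planned implementation, which would presumably parse the expression. It is named is_transpilable_sketch to keep it distinct from the real helper.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

bool is_transpilable_sketch(const std::string& formula, std::string* error_out) {
    assert(error_out != nullptr);
    // Keywords and calls taken from the "allowed subset" table.
    static const std::set<std::string> kAllowed = {
        "and", "or", "not", "if", "then", "else", "end", "true", "false",
        "nil", "tostring", "tonumber", "math.floor", "math.ceil", "math.abs",
        "math.round", "math.sqrt", "math.sin", "math.cos", "math.log",
        "math.min", "math.max", "string.upper", "string.lower", "string.len",
        "string.sub"};
    const std::size_t n = formula.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(formula[i]);
        if (c == '"' || c == '\'') {            // skip a string literal
            ++i;
            while (i < n && static_cast<unsigned char>(formula[i]) != c) {
                if (formula[i] == '\\') ++i;    // skip the escaped character
                ++i;
            }
            ++i;
        } else if (c == '[') {                  // column reference [colname]
            const std::size_t close = formula.find(']', i);
            if (close == std::string::npos) {
                *error_out = "unterminated column reference";
                return false;
            }
            i = close + 1;
        } else if (c == '{') {
            *error_out = "table constructors are not in the SQL transpile subset";
            return false;
        } else if (std::isalpha(c) || c == '_') {
            const std::size_t start = i;
            while (i < n) {
                const unsigned char d = static_cast<unsigned char>(formula[i]);
                const bool ident = std::isalnum(d) || d == '_';
                // A dot joins names only when a letter follows ("math.abs"),
                // so the ".." concat operator ends the word.
                const bool dot = d == '.' && i + 1 < n &&
                    std::isalpha(static_cast<unsigned char>(formula[i + 1]));
                if (!ident && !dot) break;
                ++i;
            }
            const std::string word = formula.substr(start, i - start);
            if (kAllowed.count(word) == 0) {
                *error_out = "'" + word + "' is not in the SQL transpile subset";
                return false;
            }
        } else if (std::isdigit(c)) {           // numeric literal such as 1.5
            while (i < n) {
                const unsigned char d = static_cast<unsigned char>(formula[i]);
                const bool frac = d == '.' && i + 1 < n &&
                    std::isdigit(static_cast<unsigned char>(formula[i + 1]));
                if (!std::isalnum(d) && !frac) break;
                ++i;
            }
        } else {
            ++i;                                // operators, parens, spaces
        }
    }
    return true;
}
```

Because bare identifiers are rejected, loops, locals, and t[k] fail on their first keyword or variable name, while keywords inside string literals are ignored. A lexical check cannot catch structural problems such as multi-statement bodies written only with allowed words.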

5. DuckDB adapter (optional)

The build flag FN_TQL_DUCKDB=1 in cpp/CMakeLists.txt opts in. Vendor DuckDB as header-only or as a library (depending on size). Default OFF: the playground keeps compiling without DuckDB.

Adapter API:

namespace tql_duckdb {
struct Result {
    StageOutput out;        // materialized as a compatible TableInput
    std::string error;
    double duration_ms = 0;
};
// Hydrates `tables` as temp views, executes the SQL, and materializes the result.
Result execute(const std::string& sql,
                const std::vector<std::string>& params,
                const std::vector<TableInput>& tables);
}

Apps that use it (registry_dashboard, sqlite_api) link DuckDB and invoke the adapter when the user/agent requests SQL output. The playground does NOT link it by default: Ask AI offers SQL mode only under #ifdef FN_TQL_DUCKDB.
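
The compile-time gating could be as small as the sketch below. It assumes FN_TQL_DUCKDB reaches the compiler as a preprocessor definition; kSqlModeAvailable and output_modes are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumption: the CMake option is forwarded as -DFN_TQL_DUCKDB=1.
#ifdef FN_TQL_DUCKDB
constexpr bool kSqlModeAvailable = true;
#else
constexpr bool kSqlModeAvailable = false;
#endif

// Output modes the Ask AI toggle would offer in this build.
std::vector<std::string> output_modes() {
    std::vector<std::string> modes{"TQL"};
    if (kSqlModeAvailable) modes.push_back("SQL (DuckDB)");
    return modes;
}
```

Keeping the #ifdef in one constant lets the UI code branch on an ordinary bool instead of repeating the preprocessor check.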

6. Validation + safety

Before applying TQL from the LLM: tql::apply dry-run. On failure, show the error + "Ask AI again with this error".
Before executing SQL from the LLM: parse it with DuckDB in a read-only sandbox (DuckDB connection with no INSERT/UPDATE/DELETE/DROP, attached read-only).
The Lua sandbox already covers side effects in TQL formulas.
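
As a cheap pre-check in front of the read-only connection, the SQL text itself can be screened. The sketch below is a heuristic under stated assumptions, not a substitute for the sandbox: looks_read_only is a hypothetical name, and CREATE, ALTER, COPY, and ATTACH are additions beyond the four keywords the issue lists.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Accepts a single statement that starts with SELECT or WITH and contains no
// write keyword outside quotes. SQL comments are not handled, so a banned word
// inside a comment is rejected too (a safe false positive).
bool looks_read_only(const std::string& sql, std::string* reason) {
    assert(reason != nullptr);
    static const char* const kBanned[] = {"INSERT", "UPDATE", "DELETE", "DROP",
                                          "CREATE", "ALTER",  "COPY",   "ATTACH"};
    const std::size_t n = sql.size();
    std::size_t i = 0;
    bool first_word = true;
    while (i < n) {
        const char c = sql[i];
        if (c == '\'' || c == '"') {   // string literal or quoted identifier
            ++i;
            while (i < n && sql[i] != c) ++i;
            ++i;
        } else if (c == ';') {         // only trailing whitespace may follow
            if (sql.find_first_not_of(" \t\r\n", i + 1) != std::string::npos) {
                *reason = "multiple statements are not allowed";
                return false;
            }
            ++i;
        } else if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            std::string word;
            while (i < n && (std::isalnum(static_cast<unsigned char>(sql[i])) ||
                             sql[i] == '_')) {
                word += static_cast<char>(
                    std::toupper(static_cast<unsigned char>(sql[i])));
                ++i;
            }
            if (first_word) {
                first_word = false;
                if (word != "SELECT" && word != "WITH") {
                    *reason = "statement must start with SELECT or WITH";
                    return false;
                }
            }
            for (const char* banned : kBanned) {
                if (word == banned) {
                    *reason = "write keyword found: " + word;
                    return false;
                }
            }
        } else {
            ++i;
        }
    }
    if (first_word) {
        *reason = "empty statement";
        return false;
    }
    return true;
}
```

A rejection here can feed the same "Ask AI again with this error" loop as a failed TQL dry-run.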

7. Streaming

Stream tokens via SSE (stream=true in the Anthropic API).
Live text in the modal.
When the stream ends, parse the final lua/sql block.
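
Since the client reads cURL output through popen, chunks can end mid-line, so the SSE framing has to be reassembled incrementally. A minimal sketch follows; SseParser is a hypothetical class, and decoding the JSON payload of each event is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct SseEvent {
    std::string event;  // value of the "event:" field
    std::string data;   // "data:" lines joined with '\n'
};

class SseParser {
public:
    // Feed one raw chunk; returns the events completed by this chunk.
    std::vector<SseEvent> feed(const std::string& chunk) {
        std::vector<SseEvent> done;
        buffer_ += chunk;
        std::size_t nl;
        while ((nl = buffer_.find('\n')) != std::string::npos) {
            std::string line = buffer_.substr(0, nl);
            buffer_.erase(0, nl + 1);
            if (!line.empty() && line.back() == '\r') line.pop_back();
            if (line.empty()) {                  // blank line ends an event
                if (!current_.data.empty()) done.push_back(current_);
                current_ = SseEvent();
            } else if (line.compare(0, 6, "event:") == 0) {
                current_.event = field_value(line, 6);
            } else if (line.compare(0, 5, "data:") == 0) {
                if (!current_.data.empty()) current_.data += '\n';
                current_.data += field_value(line, 5);
            }
            // Comment lines (leading ':') and other fields are ignored.
        }
        return done;
    }

private:
    // Text after the colon, minus the single optional leading space.
    static std::string field_value(const std::string& line, std::size_t from) {
        if (from < line.size() && line[from] == ' ') ++from;
        return line.substr(from);
    }

    std::string buffer_;   // bytes received but not yet terminated by '\n'
    SseEvent current_;     // event being assembled
};
```

In the Anthropic stream the text deltas arrive in content_block_delta events and message_stop marks the end, at which point the accumulated text can be handed to the block extraction step.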

8. Conversation persistence

UiState stores the list of turns (question + proposed output + apply result + engine used, TQL/SQL).
The next "Ask AI" turn includes the previous history.
"Reset chat" button.
NOT persisted in TQL (ephemeral UI state).

9. Cost / rate limit

Show estimated tokens before sending (rough: char count / 4).
Cap input at 8000 tokens.
Error handling: 429 / 5xx → message + retry.
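
These three rules fit in a few lines. The function names are hypothetical, and rounding the estimate up is an assumption; the issue only says "char count / 4".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr std::size_t kMaxInputTokens = 8000;  // cap stated above

// Rough estimate: about four characters per token, rounded up.
std::size_t estimate_tokens(const std::string& prompt) {
    return (prompt.size() + 3) / 4;
}

bool within_input_cap(const std::string& prompt) {
    return estimate_tokens(prompt) <= kMaxInputTokens;
}

// 429 (rate limited) and 5xx (server error) are worth retrying;
// other 4xx codes mean the request itself has to change.
bool is_retryable(int http_status) {
    return http_status == 429 || (http_status >= 500 && http_status <= 599);
}
```

Under this estimate the cap corresponds to a prompt of 32000 characters.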

Tests

Pure (no network, no DuckDB linked)

Lua subset validator: is_transpilable returns true for in-subset cases, and false with a clear error for out-of-subset cases (closures, loops, table.*, string.gsub, etc.).
TQL → SQL emit golden tests (~20 cases):
- stage 0 simple filter + sort → SELECT ... WHERE ... ORDER BY ...
- stage 1 group + count → CTE chain with GROUP BY
- granularity suffix :month → date_trunc('month', ts)
- join, 4 strategies with multi-key
- derived cols in subset → CASE/expressions
- derived cols outside subset → non-empty SqlEmit.error + warning
- aggregation p25/p50/p75/p99 → quantile_cont(col, p)
- empty pipeline → SELECT * FROM t0
TQL parsing: the prompt build includes schema + TQL + question in the expected format (mock HTTP).
Response parse: extracts the lua/sql block correctly.

Round-trip (requires DuckDB linked)

Run only if FN_TQL_DUCKDB=1:

TQL → emit SQL → execute in DuckDB → the result matches, bit for bit, the pure compute_stage over the same cells.
Cases: filter, group+agg, inner join, multi-stage chain, breakout granularity month/week, derived col [a] + [b] * 2.

LLM (real network, opt-in)

Integration test with a real ANTHROPIC_API_KEY (make test-llm): simple question → receives valid TQL → apply OK.
Mock test (CI): cURL stub responds with predefined JSON → parser extracts the block OK.

Non-goals

SQL → TQL: not implemented. Documented in the docs and in Ask AI error messages ("SQL is not supported as input, use TQL").
Multi-provider (OpenAI, local): future phase. Anthropic is hardcoded in v1.
Viz generation from the LLM beyond the display token: the viz is chosen by the existing TQL.
Lua subset extension (string.gsub, regex, table.*): postponed to v2 if there is real demand.
DuckDB write ops: SELECT/CTE only. Apps that want INSERT/UPDATE do so outside the playground.

Agent flow (summary)

Agent -> "show me the top 10 langs by total size"
LLM (TQL default) -> emits TQL { stages = {...} }
tql::apply -> State + dry-run OK
User clicks Apply -> compute_stage in memory

Agent -> "the same, but as SQL"
[If FN_TQL_DUCKDB=1 and the app linked the adapter]
LLM (SQL mode toggled) -> emits SELECT ... DuckDB
tql_duckdb::execute(sql, params, tables) -> materialized result
[If NOT linked] -> error "SQL mode requires DuckDB. Build with FN_TQL_DUCKDB=1"

Risks

Restrictive Lua subset in SQL emit: users write arbitrary Lua in the playground → requesting SQL then fails. Mitigation: clear error message + suggested workaround.
DuckDB size: the lib is ~10MB. The cost is paid only if the app opts in with the build flag.
DuckDB dialect drift: SQL functions can change between versions. Pin the DuckDB version in CMake.
LLM hallucinations: invalid TQL → the dry-run rejects it with an error. The "Ask AI again with this error" loop recovers.
API key leak: the pass integration keeps it out of the repo. The build flag NEVER prints the key.
Token cost: large prompt (schema + grammar + TQL). Input cap + visual warning.
