---
name: dag_engine
lang: go
domain: infra
description: "DAG execution engine with a CLI and a web interface. Replaces Dagu with an in-house implementation compatible with the existing YAML format. Stores run history in SQLite."
tags: [service, dag, workflow, scheduler, web, cron]
uses_functions:
  - dag_parse_go_core
  - dag_validate_go_core
  - dag_topo_sort_go_core
  - dag_resolve_env_go_core
  - parse_cron_expr_go_core
  - next_cron_time_go_core
  - cron_ticker_go_infra
  - find_go_core
  - process_spawn_go_infra
  - process_wait_go_infra
uses_types:
  - dag_definition_go_core
  - dag_step_go_core
  - dag_validation_result_go_core
  - cron_schedule_go_core
  - process_handle_go_infra
  - process_result_go_infra
  - DagRun_go_infra
  - DagStepResult_go_infra
framework: "net/http + vite + react"
entry_point: "main.go"
dir_path: "apps/dag_engine"
---

## Architecture

CLI and web server in a single binary:

```
dag-engine run <path.yaml>        # run a DAG from the terminal
dag-engine list [dir]             # list DAGs with schedule and status
dag-engine status [dag_name]      # run history
dag-engine validate <path.yaml>   # validate without running
dag-engine server                 # start HTTP + web frontend
```

### Backend (Go)

- `net/http` with `ServeMux` (Go 1.22+ pattern routing)
- SQLite via `go-sqlite3` for run history
- Executor: parse -> validate -> topo_sort -> spawn/wait per level -> store
- Scheduler: one `cron_ticker` per DAG that has a schedule

### Frontend (Vite + React + Mantine)

- DagList: table of DAGs with schedule, tags, and last status
- DagDetail: metadata + "Run Now" + history
- RunDetail: step timeline with expandable stdout/stderr

### Storage

SQLite `dag_engine.db`:
- `dag_runs`: id, dag_name, status, trigger, started_at, finished_at, error
- `dag_step_results`: id, run_id, step_name, status, exit_code, stdout, stderr, duration_ms

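As a rough picture of how the two tables above might map onto Go records: the column names come from the list, while the Go field types, the nullable `FinishedAt`, and the `Duration` helper are assumptions for illustration and may differ from the real `DagRun` and `DagStepResult` types.

```go
package main

import (
	"fmt"
	"time"
)

// DagRun mirrors one row of dag_runs (field types are assumed).
type DagRun struct {
	ID         int64
	DagName    string
	Status     string
	Trigger    string
	StartedAt  time.Time
	FinishedAt *time.Time // nil while the run is still in progress (assumption)
	Error      string
}

// DagStepResult mirrors one row of dag_step_results (field types are assumed).
type DagStepResult struct {
	ID         int64
	RunID      int64 // references DagRun.ID
	StepName   string
	Status     string
	ExitCode   int
	Stdout     string
	Stderr     string
	DurationMs int64
}

// Duration reports how long the run took, or false if it has not finished.
func (r DagRun) Duration() (time.Duration, bool) {
	if r.FinishedAt == nil {
		return 0, false
	}
	return r.FinishedAt.Sub(r.StartedAt), true
}

func main() {
	start := time.Date(2026, 5, 16, 3, 0, 0, 0, time.UTC)
	end := start.Add(387 * time.Millisecond)
	// Status and Trigger values below are placeholders, not the engine's enums.
	run := DagRun{ID: 1, DagName: "daily-registry-audit", Status: "ok", Trigger: "manual",
		StartedAt: start, FinishedAt: &end}
	d, ok := run.Duration()
	fmt.Println(d, ok)
}
```
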
### Build

```bash
cd frontend && pnpm install && pnpm build
cd .. && CGO_ENABLED=1 go build -tags fts5 -o dag-engine .
```

### Usage

```bash
# CLI
./dag-engine run ~/dagu/dags/example.yaml
./dag-engine list ~/dagu/dags/

# Web server
./dag-engine server --port 8090 --dags-dir ~/dagu/dags/ --scheduler
# Browser: http://localhost:8090
```

## Notes

Compatible with Dagu's YAML format. Reads existing DAGs from `~/dagu/dags/` without modification.
Default port is 8090 (same as Dagu).

### 2026-05-16 - Fix function-not-found in `function:` steps + Logs panel in RunDetail `[done]`

Symptom: `fn_backup` and `daily-registry-audit` failed three nights in a row with `error: function "<id>" not found (tried as ID and name)` even though the functions exist in the root `registry.db`.

Root cause: the systemd service `dag_engine.service` has `WorkingDirectory=/home/lucas/fn_registry/apps/dag_engine`. The `fn` binary resolves `registry.db` via (1) `FN_REGISTRY_ROOT`, (2) a `root()` walk-up that looks for `go.mod`, (3) the executable's directory (`cmd/fn/ops.go:1597-1628`). With `FN_REGISTRY_ROOT` unset, (2) finds the `go.mod` in `apps/dag_engine/` and returns that directory, which held a stale copy, `apps/dag_engine/registry.db` (262 KB, May 15), lacking the newly created functions. This violates the rule in `.claude/rules/db_locations.md` (`registry.db` lives ONLY at the root).

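The three-step lookup described in the root cause can be sketched as below to show why the nested `go.mod` wins. This is an assumption-laden reconstruction, not the code in `cmd/fn/ops.go`: `resolveRoot` and the injected `hasFile` check are hypothetical, and the file layout in `main` is simulated from the paths mentioned above.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveRoot sketches the lookup order: explicit env value first, then the
// nearest ancestor of start containing go.mod, then the executable's dir.
// hasFile is injected so the walk-up can be exercised without touching disk.
func resolveRoot(envRoot, start, exeDir string, hasFile func(path string) bool) string {
	if envRoot != "" {
		return envRoot // (1) FN_REGISTRY_ROOT wins when set
	}
	for dir := filepath.Clean(start); ; dir = filepath.Dir(dir) {
		if hasFile(filepath.Join(dir, "go.mod")) {
			return dir // (2) first go.mod found while walking up
		}
		if dir == filepath.Dir(dir) {
			break // reached the top of the tree
		}
	}
	return exeDir // (3) fall back to the executable's directory
}

func main() {
	// Simulated layout: the app directory has its own go.mod.
	has := func(p string) bool {
		return p == "/home/lucas/fn_registry/apps/dag_engine/go.mod"
	}
	cwd := "/home/lucas/fn_registry/apps/dag_engine" // systemd WorkingDirectory
	exe := "/home/lucas/fn_registry"

	// Env unset: the walk-up stops at the nested module, not the real root.
	fmt.Println(resolveRoot("", cwd, exe, has))
	// Env set: the canonical root is used regardless of the working directory.
	fmt.Println(resolveRoot("/home/lucas/fn_registry", cwd, exe, has))
}
```
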
|
Fix:
|
|
- Borrado `apps/dag_engine/registry.db` stale.
|
|
- `~/.config/systemd/user/dag_engine.service`: anadido `Environment=FN_REGISTRY_ROOT=/home/lucas/fn_registry`, `FN_BIN=/home/lucas/fn_registry/fn`, `PATH=/usr/local/go/bin:/home/lucas/go/bin:...`, `HOME=/home/lucas`. Sin PATH el step `go vet` fallaba con `exec: "go": executable file not found in $PATH`.
|
|
- `apps/dag_engine/executor.go`: para steps `function:` el spawn exporta `FN_REGISTRY_ROOT=<root>` en env y, si `step.dir`/`working_dir` vacios, fija `dir = fnRegistryRoot`. Belt-and-suspenders: aunque alguien lance el binario sin systemd, los `function:` steps usan el root canonico.
|
|
|
|
Verificacion: `POST /api/dags/daily-registry-audit/run` -> step `audit_capabilities` pasa (387 ms) en vez de fallar con not-found. Restantes failures (`audit_artefacts` exit 1, `fn_backup` exit 4 sin respetar `continue_on.exit_code`) son bugs reales independientes — fuera de scope.
|
|
|
|
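The executor change in the fix list above amounts to pinning the environment and working directory of `function:` steps. A sketch of that idea with `os/exec` follows; `functionStepCmd` and its parameters are hypothetical, and the command line is passed in opaquely because the actual `fn` invocation is not shown in this document.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// functionStepCmd builds the command for a `function:` step. It always
// exports FN_REGISTRY_ROOT and falls back to the registry root as working
// directory when the step does not set one. argv must be non-empty.
func functionStepCmd(argv []string, stepDir, registryRoot string) *exec.Cmd {
	cmd := exec.Command(argv[0], argv[1:]...)
	// Appending last matters: for duplicate keys os/exec uses the last value,
	// so this overrides any FN_REGISTRY_ROOT inherited from the parent.
	cmd.Env = append(os.Environ(), "FN_REGISTRY_ROOT="+registryRoot)
	cmd.Dir = stepDir
	if cmd.Dir == "" {
		cmd.Dir = registryRoot
	}
	return cmd
}

func main() {
	// The argv here is a placeholder; the command is built but not run.
	cmd := functionStepCmd([]string{"fn"}, "", "/home/lucas/fn_registry")
	fmt.Println(cmd.Dir)
	fmt.Println(cmd.Env[len(cmd.Env)-1])
}
```
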
### 2026-05-16 - Logs panel in RunDetail (frontend) `[done]`

- `apps/dag_engine/frontend/src/pages/RunDetail.tsx`: new "Logs" `<Paper>` at the bottom with a scrollable `<Code block>` (max-h 480) + Mantine `CopyButton` (icon toggles copy/check, teal).
- The `buildLogText(run, steps)` helper composes plain text: run metadata (dag, path, status, trigger, started/finished ISO, duration ms, error) + per step (`[status] name exit=N Nms`, started, finished, error, stdout, stderr indented 4 spaces).
- Lets you paste the whole log into the LLM for debugging without opening N collapses in `StepTimeline`.
- Frontend build pending: `pnpm build` breaks on pre-existing errors (`StepTimeline.tsx:49` uses the legacy `<Collapse in={opened}>` API; `main.tsx:1` imports `@mantine/core/styles.css` without types). The RunDetail edit type-checks cleanly.

### 2026-05-16 - Canonical databases (quick reference)

- `dag_engine.db`: `apps/dag_engine/dag_engine.db` (+ WAL sidecars). Migrations in `apps/dag_engine/store/migrations/` (`001_init.sql`, `002_step_function_id.sql`). Tables: `dag_runs`, `dag_step_results`.
- A copy of `registry.db` must NOT coexist in this directory (violates `db_locations.md`). If it reappears, delete it.

## Next up

- `audit_artefacts` fails with exit 1 in `daily-registry-audit`: investigate the actual stderr (probably an orphaned artefact or git drift). Independent step; it does not block the rest of the DAG.
- `fn_backup` step `run_backup_all` exits with code 4 and the DAG does not honor `continue_on.exit_code: [4]`. Executor bug: parse `step.ContinueOn.ExitCode []int` and compare against `result.ExitCode`. Today only `step.ContinueOn.Failure` (bool) is checked.
- Frontend `pnpm build` is broken by Mantine API drift in `StepTimeline.tsx` (`<Collapse in={opened}>`) and the CSS type import in `main.tsx`. Fix together with a general refresh of types.

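For the `continue_on.exit_code` item above, the missing check might look like the sketch below. The field names `Failure` and `ExitCode` follow the ones mentioned in the list; the `ContinueOn` struct shape and the `shouldContinue` helper are hypothetical, and the semantics (any listed code is tolerated) are an assumption about the intended behavior.

```go
package main

import "fmt"

// ContinueOn is a hypothetical shape for a step's continue_on block.
type ContinueOn struct {
	Failure  bool  // continue_on.failure
	ExitCode []int // continue_on.exit_code
}

// shouldContinue reports whether the DAG may proceed after a step exits
// with exitCode: success always continues, Failure tolerates any non-zero
// code, and ExitCode tolerates only the listed codes.
func shouldContinue(c ContinueOn, exitCode int) bool {
	if exitCode == 0 || c.Failure {
		return true
	}
	for _, code := range c.ExitCode {
		if code == exitCode {
			return true
		}
	}
	return false
}

func main() {
	c := ContinueOn{ExitCode: []int{4}}
	fmt.Println(shouldContinue(c, 4)) // true
	fmt.Println(shouldContinue(c, 1)) // false
}
```
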
## User documentation

Full guide (YAML format, adding DAGs, troubleshooting, HTTP endpoints):
**[apps/dag_engine/README.md](README.md)**.