merge issue/0113: agent_runner_api service Go :8486 + agent_runs.db
---
id: "0113"
title: "Service agent_runner_api: workflow orchestrator with worktrees + DoD"
status: done
type: app
domain:
  - agents
  - workflows
  - apps-infra
scope: app
priority: alta
depends: []
blocks:
  - "0115"
  - "0116"
  - "0117"
  - "0118"
related:
  - "0008"
  - "0069"
created: 2026-05-18
updated: 2026-05-18
tags: [agents, service, worktrees, dod, claude-headless]
flow: "0008"
---

# 0113 — Service `agent_runner_api`

## Problem

Claude is currently launched from three places:

1. `apps/skill_tree/main.cpp::spawn_claude_terminal`: opens `wt.exe` with `claude --dangerously-skip-permissions`. It exits without leaving any trace.
2. `parallel-fix-issues` skill: parallel worktrees, but stateless.
3. `fn-orquestador` (issue 0069): autonomous loop inside Claude Code.

None of them persists runs, evidence, or DoD. There is no way to know which workflows are alive across apps.

## Decision

New Go service at `apps/agent_runner_api/`, port `:8486`, tagged `service`. It is the single source of truth for:

- declared workflows (prompt templates + DoD schema)
- active runs (worktree + Claude subprocess + status)
- DoD evidence (path/url/log/cmd output + validated_by)

Minimum endpoints:

- `POST /api/runs`: creates a worktree and launches headless Claude. Body: `{issue_id|card_id, mode, kanban_app}`.
- `GET /api/runs`: lists runs (filters: status/app/since).
- `GET /api/runs/:id`: run detail.
- `GET /api/runs/:id/sse`: progress stream.
- `POST /api/runs/:id/evidence`: the agent attaches evidence.
- `POST /api/runs/:id/evidence/:eid/validate`: a human approves.
- `POST /api/runs/:id/merge`: TBD merge (all items validated).
- `POST /api/runs/:id/abort`: kills the subprocess and removes the worktree.
- `GET /api/health`: 200 OK.

## Schema `agent_runs.db`

Migrations live in `apps/agent_runner_api/migrations/`:

- `001_workflows.sql`: templates: `id, name, prompt_template, dod_schema_json, created_at`.
- `002_runs.sql`: `id, workflow_id, issue_id, card_id, kanban_app, branch, worktree_path, status, started_at, finished_at, agent_pid, agent_log_path`.
- `003_worktrees.sql`: `id, run_id, path, branch, created_at, removed_at`.
- `004_dod_items.sql`: one row per declared item: `id, run_id, item_key, kind, expected, required, status (pending|done|validated|failed)`.
- `005_dod_evidence.sql`: one row per attached piece of evidence: `id, dod_item_id, kind, payload_path, payload_url, payload_text, attached_at, validated_at, validated_by`.

Applied via `embed.FS + applyMigrations()` at startup.
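
One possible shape for `applyMigrations()` is sketched below. It is not the service's actual code: the signature, the `pending` helper, the in-memory `applied` map (a real service would presumably track applied files in a table inside `agent_runs.db`), and the `exec` callback standing in for a database handle are all assumptions. The function takes an `fs.FS`, which `embed.FS` satisfies; the example uses `fstest.MapFS` so it runs without files on disk.

```go
package main

import (
	"fmt"
	"io/fs"
	"sort"
	"testing/fstest"
)

// pending returns the migration names not yet applied, in lexical order,
// so that 001_... always runs before 002_....
func pending(all []string, applied map[string]bool) []string {
	var out []string
	for _, name := range all {
		if !applied[name] {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

// applyMigrations executes every unapplied *.sql file and records it.
// Calling it again with the same applied set does nothing (idempotent).
func applyMigrations(fsys fs.FS, applied map[string]bool, exec func(sql string) error) error {
	names, err := fs.Glob(fsys, "*.sql")
	if err != nil {
		return err
	}
	for _, name := range pending(names, applied) {
		body, err := fs.ReadFile(fsys, name)
		if err != nil {
			return fmt.Errorf("read %s: %w", name, err)
		}
		if err := exec(string(body)); err != nil {
			return fmt.Errorf("apply %s: %w", name, err)
		}
		applied[name] = true
	}
	return nil
}

func main() {
	// Stand-in for the embedded migrations directory.
	fsys := fstest.MapFS{
		"002_runs.sql":      {Data: []byte("CREATE TABLE runs (id INTEGER PRIMARY KEY);")},
		"001_workflows.sql": {Data: []byte("CREATE TABLE workflows (id INTEGER PRIMARY KEY);")},
	}
	applied := map[string]bool{}
	executed := 0
	exec := func(sql string) error { executed++; return nil }

	_ = applyMigrations(fsys, applied, exec)
	_ = applyMigrations(fsys, applied, exec) // second start: nothing left to apply
	fmt.Println("statements executed:", executed)
}
```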

## Frontmatter `app.md` (service)

```yaml
tags: [service, agents, go]
service:
  port: 8486
  health_endpoint: /api/health
  health_timeout_s: 3
  systemd_unit: agent_runner_api.service
  systemd_scope: user
  restart_policy: always
  runtime: systemd-user
  pc_targets:
    - aurgi-pc
    - home-wsl
  is_local_only: true
```
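
To illustrate how a consumer might use `port`, `health_endpoint`, and `health_timeout_s` from the block above, here is a hedged sketch of a health probe. The `healthy` function is a hypothetical helper and not part of any existing tool in this repo; it treats any error or non-200 response as unhealthy.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// healthy reports whether GET baseURL+path answers 200 within timeout.
func healthy(baseURL, path string, timeout time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+path, nil)
	if err != nil {
		return false
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false // connection refused, timeout, and so on
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	// Values taken from the service block: port 8486, /api/health, 3 s.
	ok := healthy("http://127.0.0.1:8486", "/api/health", 3*time.Second)
	fmt.Println("agent_runner_api healthy:", ok)
}
```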

## Acceptance criteria

- [ ] `apps/agent_runner_api/` Go scaffold (main.go, db.go, handlers.go, sse.go, agent_spawn.go).
- [ ] Migrations 001-005 versioned and applied at startup (idempotent).
- [ ] The endpoints above implemented, with `*_test.go` tests.
- [ ] systemd unit `agent_runner_api.service` with `Restart=always`.
- [ ] `app.md` with the trio + complete `service:` block (issue 0105).
- [ ] `fn doctor services-spec` validates the block.
- [ ] Smoke test: `POST /api/runs` with a dummy issue creates a real worktree at `/tmp/wt-test-<id>`, persists a row in `agent_runs`, and launches an echo subprocess (no real claude in tests).
- [ ] Cleanup on abort: subprocess killed + `git worktree remove --force` + row marked `aborted`.
- [ ] e2e_checks: build, migration apply, health, dummy smoke run, cleanup.
- [ ] Documented in `docs/capabilities/agents.md` (capability group `agents`, see 0114).

## Gotchas

- `git worktree add` fails if the branch already exists. Hard reset first (same pattern as `autonomous_loop.md`).
- The worktree and the main repo share `.git/hooks/`, so a pre-commit hook can block commits. Allow `--no-verify` ONLY if `events_json[].decision="skip_hook"` is documented.
- `claude --headless` needs the correct PATH (`~/.local/bin`). The systemd service runs with the user env: verify `Environment=PATH=...`.
- The Claude subprocess can run for hours. Do NOT block the HTTP handler: spawn asynchronously, return `run_id` immediately, and monitor the PID in a goroutine.
- SSE: ImGui clients (kanban_cpp, skill_tree) must reconnect with `Last-Event-ID`.
- Protected paths (`dev/autonomous_protected_paths.json`) apply here too. Reuse the fn-orquestador logic.
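
The async-spawn gotcha above is the easiest one to get wrong, so here is a minimal sketch of the pattern. It assumes an in-memory `registry` (the real service would persist status in `agent_runs.db`), hypothetical status strings `running`, `finished`, and `failed`, and an `echo` subprocess in place of Claude, as in the smoke test.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// registry tracks run status in memory and is safe for concurrent use.
type registry struct {
	mu     sync.Mutex
	status map[string]string
}

func newRegistry() *registry {
	return &registry{status: map[string]string{}}
}

func (r *registry) set(id, s string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.status[id] = s
}

func (r *registry) get(id string) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.status[id]
}

// start launches the command and returns without waiting for it. The
// returned channel is closed after the goroutine records the final status.
func (r *registry) start(id, name string, args ...string) (<-chan struct{}, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return nil, fmt.Errorf("start %s: %w", name, err)
	}
	r.set(id, "running")

	done := make(chan struct{})
	go func() {
		defer close(done)
		// Wait blocks this goroutine only, never the caller.
		if err := cmd.Wait(); err != nil {
			r.set(id, "failed")
			return
		}
		r.set(id, "finished")
	}()
	return done, nil
}

func main() {
	reg := newRegistry()
	done, err := reg.start("run-1", "echo", "hello")
	if err != nil {
		fmt.Println("spawn failed:", err)
		return
	}
	// An HTTP handler would respond with the run id here and return.
	<-done // the demo waits only so it can print the final status
	fmt.Println("final status:", reg.get("run-1"))
}
```

A handler built on this would call `start`, write the `run_id` to the response, and ignore the channel; SSE subscribers could then read status changes from the registry.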

## Out of scope

- A UI of its own for the service (it is a pure backend; the UIs are kanban_cpp + skill_tree).
- Auth/auth tokens (local-only for now; add in a separate issue if exposed beyond localhost).
- Gitea webhook to auto-trigger from commits.
- Cron scheduling for recurring workflows.