docs: close issue 0113 (agent_runner_api scaffold + DoD + worktrees)

2026-05-18 18:14:39 +02:00
parent ddb5366884
commit 975c9f66ae
@@ -0,0 +1,113 @@
---
id: "0113"
title: "Service agent_runner_api: workflow orchestrator with worktrees + DoD"
status: done
type: app
domain:
- agents
- workflows
- apps-infra
scope: app
priority: alta
depends: []
blocks:
- "0115"
- "0116"
- "0117"
- "0118"
related:
- "0008"
- "0069"
created: 2026-05-18
updated: 2026-05-18
tags: [agents, service, worktrees, dod, claude-headless]
flow: "0008"
---
# 0113 — Service `agent_runner_api`
## Problem
Claude is currently launched from three places:
1. `apps/skill_tree/main.cpp::spawn_claude_terminal`: opens `wt.exe` with `claude --dangerously-skip-permissions`. It exits without leaving any trace.
2. The `parallel-fix-issues` skill: parallel worktrees, but stateless.
3. `fn-orquestador` (issue 0069): autonomous loop inside Claude Code.
None of them persists runs, evidence, or DoD. There is no way to tell which workflows are alive across apps.
## Decision
A new Go service, `apps/agent_runner_api/`, on port `:8486`, tagged `service`. It is the single source of truth for:
- declared workflows (prompt templates + DoD schema)
- active runs (worktree + Claude subprocess + status)
- DoD evidence (path/url/log/cmd output + validated_by)
Minimum endpoints:
- `POST /api/runs`: creates a worktree and launches headless claude. Body: `{issue_id|card_id, mode, kanban_app}`.
- `GET /api/runs`: lists runs (filters: status/app/since).
- `GET /api/runs/:id`: run detail.
- `GET /api/runs/:id/sse`: progress stream.
- `POST /api/runs/:id/evidence`: the agent attaches evidence.
- `POST /api/runs/:id/evidence/:eid/validate`: a human approves it.
- `POST /api/runs/:id/merge`: TBD merge (all items validated).
- `POST /api/runs/:id/abort`: kills the subprocess and removes the worktree.
- `GET /api/health`: 200 OK.
## Schema `agent_runs.db`
Migrations live in `apps/agent_runner_api/migrations/`:
- `001_workflows.sql`: templates, with columns `id, name, prompt_template, dod_schema_json, created_at`.
- `002_runs.sql`: `id, workflow_id, issue_id, card_id, kanban_app, branch, worktree_path, status, started_at, finished_at, agent_pid, agent_log_path`.
- `003_worktrees.sql`: `id, run_id, path, branch, created_at, removed_at`.
- `004_dod_items.sql`: one row per declared item, with columns `id, run_id, item_key, kind, expected, required, status (pending|done|validated|failed)`.
- `005_dod_evidence.sql`: one row per attached piece of evidence, with columns `id, dod_item_id, kind, payload_path, payload_url, payload_text, attached_at, validated_at, validated_by`.
Applied via `embed.FS + applyMigrations()` at startup.
## Frontmatter `app.md` (service)
```yaml
tags: [service, agents, go]
service:
  port: 8486
  health_endpoint: /api/health
  health_timeout_s: 3
  systemd_unit: agent_runner_api.service
  systemd_scope: user
  restart_policy: always
  runtime: systemd-user
  pc_targets:
    - aurgi-pc
    - home-wsl
  is_local_only: true
```
## Acceptance criteria
- [ ] `apps/agent_runner_api/` Go scaffold (main.go, db.go, handlers.go, sse.go, agent_spawn.go).
- [ ] Migrations 001-005 versioned and applied at startup (idempotent).
- [ ] Endpoints above implemented, with `*_test.go` tests.
- [ ] systemd unit `agent_runner_api.service` with `Restart=always`.
- [ ] `app.md` with the trio + a complete `service:` block (issue 0105).
- [ ] `fn doctor services-spec` validates the block.
- [ ] Smoke test: POST /api/runs with a dummy issue creates a real worktree in `/tmp/wt-test-<id>`, persists a row in `agent_runs`, and launches an echo subprocess (no real claude in tests).
- [ ] Cleanup on abort: subprocess killed + `git worktree remove --force` + row marked `aborted`.
- [ ] e2e_checks: build, migration apply, health, dummy smoke run, cleanup.
- [ ] Documented in `docs/capabilities/agents.md` (capability group `agents`, see 0114).
## Gotchas
- `git worktree add` fails if the branch already exists. Hard-reset first (same pattern as `autonomous_loop.md`).
- The worktree and the main repo share `.git/hooks/`, so a pre-commit hook can block commits. Allow `--no-verify` ONLY if `events_json[].decision="skip_hook"` is documented.
- `claude --headless` needs the right PATH (`~/.local/bin`). The systemd service runs with the user env: verify `Environment=PATH=...`.
- The Claude subprocess can run for hours. Do NOT block the HTTP handler: spawn async, return `run_id` immediately, and monitor the PID in a goroutine.
- SSE: ImGui clients (kanban_cpp, skill_tree) must reconnect with `Last-Event-ID`.
- Protected paths (`dev/autonomous_protected_paths.json`) apply here too. Reuse the fn-orquestador logic.
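The non-blocking spawn described in the gotchas can be sketched as follows. The `registry` type, its methods, and the `running`/`done`/`failed` status strings are assumptions for illustration; the real service would persist status in the runs table. The example launches `echo` rather than Claude, as the smoke-test criterion does, and assumes `echo` is on PATH.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// registry is an in-memory stand-in for the runs table (hypothetical).
type registry struct {
	mu     sync.Mutex
	nextID int
	status map[int]string
}

func newRegistry() *registry { return &registry{status: map[int]string{}} }

// start launches the command and returns the run ID without waiting for it
// to exit. A goroutine waits on the process and records the final status;
// the returned channel is closed once that status has been written.
func (r *registry) start(name string, args ...string) (int, <-chan struct{}, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return 0, nil, err
	}
	r.mu.Lock()
	r.nextID++
	id := r.nextID
	r.status[id] = "running"
	r.mu.Unlock()

	done := make(chan struct{})
	go func() {
		defer close(done) // runs after the unlock below
		err := cmd.Wait()
		r.mu.Lock()
		defer r.mu.Unlock()
		if err != nil {
			r.status[id] = "failed"
		} else {
			r.status[id] = "done"
		}
	}()
	return id, done, nil
}

// get returns the current status for a run ID.
func (r *registry) get(id int) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.status[id]
}

func main() {
	reg := newRegistry()
	id, done, err := reg.start("echo", "hello")
	if err != nil {
		fmt.Println("spawn failed:", err)
		return
	}
	fmt.Println("run started with id", id) // an HTTP handler would return here
	<-done
	fmt.Println("final status:", reg.get(id))
}
```

An HTTP handler would return the ID right after `start` and leave the `done` channel to the SSE stream or a status poller.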
## Out of scope
- A UI of its own for the service (it is a pure backend; the UIs are kanban_cpp + skill_tree).
- Auth/auth tokens (local-only for now; add in a separate issue if it is exposed beyond localhost).
- Gitea webhook for auto-triggering from commits.
- Cron schedule for recurring workflows.