---
id: "0113"
title: "Service agent_runner_api: workflow orchestrator with worktrees + DoD"
status: done
type: app
domain:
  - agents
  - workflows
  - apps-infra
scope: app
priority: alta
depends: []
blocks:
  - "0115"
  - "0116"
  - "0117"
  - "0118"
related:
  - "0008"
  - "0069"
created: 2026-05-18
updated: 2026-05-18
tags: [agents, service, worktrees, dod, claude-headless]
flow: "0008"
---

# 0113: Service `agent_runner_api`

## Problem

Today Claude is launched from three places:

1. `apps/skill_tree/main.cpp::spawn_claude_terminal`: opens `wt.exe` with `claude --dangerously-skip-permissions`. The session ends without leaving any trace.
2. The `parallel-fix-issues` skill: parallel worktrees, but stateless.
3. `fn-orquestador` (issue 0069): an autonomous loop inside Claude Code.

None of them persists runs, evidence, or DoD state, so there is no way to tell which workflows are alive across apps.

## Decision

A new Go service at `apps/agent_runner_api/`, listening on port `:8486` and tagged `service`. It is the single source of truth for:

- declared workflows (prompt templates + DoD schema)
- active runs (worktree + Claude subprocess + status)
- DoD evidence (path/url/log/cmd output + `validated_by`)

Minimum endpoints:

- `POST /api/runs`: creates a worktree and launches Claude headless. Body: `{issue_id|card_id, mode, kanban_app}`.
- `GET /api/runs`: lists runs (filters: status/app/since).
- `GET /api/runs/:id`: run detail.
- `GET /api/runs/:id/sse`: progress stream.
- `POST /api/runs/:id/evidence`: the agent attaches evidence.
- `POST /api/runs/:id/evidence/:eid/validate`: a human approves it.
- `POST /api/runs/:id/merge`: TBD merge (requires all items validated).
- `POST /api/runs/:id/abort`: kills the subprocess and removes the worktree.
- `GET /api/health`: returns 200 OK.

## Schema `agent_runs.db`

Migrations live in `apps/agent_runner_api/migrations/`:

- `001_workflows.sql`: templates. Columns: `id, name, prompt_template, dod_schema_json, created_at`.
- `002_runs.sql`: `id, workflow_id, issue_id, card_id, kanban_app, branch, worktree_path, status, started_at, finished_at, agent_pid, agent_log_path`.
- `003_worktrees.sql`: `id, run_id, path, branch, created_at, removed_at`.
- `004_dod_items.sql`: one row per declared item: `id, run_id, item_key, kind, expected, required, status (pending|done|validated|failed)`.
- `005_dod_evidence.sql`: one row per attached piece of evidence: `id, dod_item_id, kind, payload_path, payload_url, payload_text, attached_at, validated_at, validated_by`.

Applied at startup via `embed.FS` + `applyMigrations()`.

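One possible shape for `applyMigrations()`, as a sketch only. It takes an `fs.FS` (which `embed.FS` satisfies), so an in-memory filesystem can stand in for the embedded directory. The `exec` callback stands in for a database call and the `applied` map for a tracking table; both are assumptions of this example, as is the `demoFS` helper and its SQL.

```go
package main

import (
	"fmt"
	"io/fs"
	"sort"
	"testing/fstest"
)

// applyMigrations runs every *.sql file under dir in lexical order, skipping
// names already recorded in applied. exec stands in for a database call and
// applied for a tracking table (both are assumptions of this sketch).
func applyMigrations(fsys fs.FS, dir string, applied map[string]bool,
	exec func(name, sql string) error) ([]string, error) {
	names, err := fs.Glob(fsys, dir+"/*.sql")
	if err != nil {
		return nil, err
	}
	sort.Strings(names) // zero-padded prefixes (001_, 002_, ...) sort correctly

	var ran []string
	for _, name := range names {
		if applied[name] {
			continue // already applied: this is what keeps startup idempotent
		}
		sql, err := fs.ReadFile(fsys, name)
		if err != nil {
			return ran, err
		}
		if err := exec(name, string(sql)); err != nil {
			return ran, fmt.Errorf("migration %s: %w", name, err)
		}
		applied[name] = true
		ran = append(ran, name)
	}
	return ran, nil
}

// demoFS is an in-memory stand-in for the embedded migrations directory.
// The SQL bodies are placeholders, not the real schema.
func demoFS() fs.FS {
	return fstest.MapFS{
		"migrations/002_runs.sql":      {Data: []byte("CREATE TABLE runs (id TEXT PRIMARY KEY);")},
		"migrations/001_workflows.sql": {Data: []byte("CREATE TABLE workflows (id TEXT PRIMARY KEY);")},
	}
}

func main() {
	applied := map[string]bool{}
	exec := func(name, sql string) error {
		fmt.Println("apply", name)
		return nil
	}

	first, _ := applyMigrations(demoFS(), "migrations", applied, exec)
	second, _ := applyMigrations(demoFS(), "migrations", applied, exec)
	fmt.Println("first pass:", len(first), "second pass:", len(second))
}
```

In the service itself the first argument would be the `embed.FS` holding `migrations/`, and the applied set would presumably be read from the database rather than kept in memory.
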
## Frontmatter `app.md` (service)

```yaml
tags: [service, agents, go]
service:
  port: 8486
  health_endpoint: /api/health
  health_timeout_s: 3
  systemd_unit: agent_runner_api.service
  systemd_scope: user
  restart_policy: always
  runtime: systemd-user
  pc_targets:
    - aurgi-pc
    - home-wsl
  is_local_only: true
```

## Acceptance criteria

- [ ] `apps/agent_runner_api/` Go scaffold (main.go, db.go, handlers.go, sse.go, agent_spawn.go).
- [ ] Migrations 001-005 versioned and applied at startup (idempotent).
- [ ] The endpoints above implemented, with tests in `*_test.go`.
- [ ] systemd unit `agent_runner_api.service` with `Restart=always`.
- [ ] `app.md` with the trio + a complete `service:` block (issue 0105).
- [ ] `fn doctor services-spec` validates the block.
- [ ] Smoke test: `POST /api/runs` with a dummy issue creates a real worktree at `/tmp/wt-test-<id>`, persists a row in `agent_runs`, and launches an echo subprocess (no real Claude in tests).
- [ ] Cleanup on abort: subprocess killed + `git worktree remove --force` + row marked `aborted`.
- [ ] e2e_checks: build, migration apply, health, dummy smoke run, cleanup.
- [ ] Documented in `docs/capabilities/agents.md` (capability group `agents`, see 0114).

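The smoke-test and abort criteria could be exercised with something like the sketch below: spawn a subprocess without blocking, watch it from a goroutine, and kill it on abort. The `run` type, the status strings other than `aborted`, and the `startRun`/`abort` names are hypothetical; worktree removal and the database update are left out.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// run tracks one agent subprocess. Field and status names are illustrative.
type run struct {
	mu     sync.Mutex
	status string // "running", "done", "failed", or "aborted"
	cmd    *exec.Cmd
	done   chan struct{} // closed once the subprocess has exited
}

// Status returns the current status under the lock.
func (r *run) Status() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.status
}

// startRun launches the command and returns immediately; a goroutine waits
// for the process and records the final status.
func startRun(name string, args ...string) (*run, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	r := &run{status: "running", cmd: cmd, done: make(chan struct{})}
	go func() {
		err := cmd.Wait()
		r.mu.Lock()
		if r.status == "running" { // keep "aborted" if abort() got there first
			if err != nil {
				r.status = "failed"
			} else {
				r.status = "done"
			}
		}
		r.mu.Unlock()
		close(r.done)
	}()
	return r, nil
}

// abort kills the subprocess and marks the run aborted. A real service would
// follow this with `git worktree remove --force` and a row update.
func (r *run) abort() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.status != "running" {
		return nil // already finished, nothing to kill
	}
	if err := r.cmd.Process.Kill(); err != nil {
		return err
	}
	r.status = "aborted"
	return nil
}

func main() {
	// echo stands in for the agent, as in the smoke-test criterion.
	r, err := startRun("echo", "hello from dummy agent")
	if err != nil {
		fmt.Println("spawn failed:", err)
		return
	}
	fmt.Println("pid:", r.cmd.Process.Pid, "status after spawn:", r.Status())
	<-r.done
	fmt.Println("final status:", r.Status())
}
```

An HTTP handler built on this would call `startRun`, store the PID, and respond right away; only the goroutine ever waits on the process.
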
## Gotchas

- `git worktree add` fails if the branch already exists. Hard-reset beforehand (same pattern as `autonomous_loop.md`).
- The worktree and the main repo share `.git/hooks/`, so a pre-commit hook can block commits. Allow `--no-verify` ONLY when a documented `events_json[].decision="skip_hook"` entry exists.
- `claude --headless` needs the right PATH (`~/.local/bin`). The systemd service runs with the user env: check `Environment=PATH=...`.
- A Claude subprocess can run for hours. Do NOT block the HTTP handler: spawn asynchronously, return the `run_id` immediately, and monitor the PID in a goroutine.
- SSE: ImGui clients (kanban_cpp, skill_tree) must reconnect with `Last-Event-ID`.
- Protected paths (`dev/autonomous_protected_paths.json`) apply here too. Reuse the fn-orquestador logic.

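For the `Last-Event-ID` reconnect, the server side needs to replay whatever the client missed. A minimal sketch, assuming events carry increasing integer IDs, single-line payloads, and are buffered per run; the `event` type and the `replayFrom`/`formatSSE` helpers are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// event is one progress update; IDs are assumed to be increasing integers.
type event struct {
	ID   int
	Data string // assumed single-line; multi-line data needs one "data:" line each
}

// replayFrom returns the events a reconnecting client has not seen yet.
// lastEventID is the raw Last-Event-ID header value; an empty or unparsable
// value means "send everything buffered".
func replayFrom(buffered []event, lastEventID string) []event {
	last, err := strconv.Atoi(strings.TrimSpace(lastEventID))
	if err != nil {
		return buffered
	}
	var out []event
	for _, e := range buffered {
		if e.ID > last {
			out = append(out, e)
		}
	}
	return out
}

// formatSSE renders one event in the text/event-stream wire format:
// an "id:" line, a "data:" line, and a blank line terminating the event.
func formatSSE(e event) string {
	return fmt.Sprintf("id: %d\ndata: %s\n\n", e.ID, e.Data)
}

func main() {
	buffered := []event{
		{1, "worktree created"},
		{2, "agent started"},
		{3, "evidence attached"},
	}
	// A client that last saw event 1 reconnects and receives events 2 and 3.
	for _, e := range replayFrom(buffered, "1") {
		fmt.Print(formatSSE(e))
	}
}
```

In a handler, the header value would come from `r.Header.Get("Last-Event-ID")`, and the replayed events would be written before switching to live updates.
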
## Out of scope

- A UI of its own for the service (it is a pure backend; the UIs are kanban_cpp + skill_tree).
- Auth/auth tokens (local-only for now; add in a separate issue if the service is exposed beyond localhost).
- Gitea webhook to auto-trigger from commits.
- Cron scheduling for recurring workflows.