---
id: "0113"
title: "Service agent_runner_api: workflow orchestrator with worktrees + DoD"
status: pendiente
type: app
domain:
  - agents
  - workflows
  - apps-infra
scope: app
priority: alta
depends: []
blocks:
  - "0115"
  - "0116"
  - "0117"
  - "0118"
related:
  - "0008"
  - "0069"
created: 2026-05-18
updated: 2026-05-18
tags: [agents, service, worktrees, dod, claude-headless]
flow: "0008"
---

# 0113: Service `agent_runner_api`

## Problem

Claude is currently launched from three places:

1. `apps/skill_tree/main.cpp::spawn_claude_terminal`: opens `wt.exe` with `claude --dangerously-skip-permissions`. It ends with no traceability.
2. The `parallel-fix-issues` skill: parallel worktrees, but stateless.
3. `fn-orquestador` (issue 0069): an autonomous loop inside Claude Code.

None of them persists runs, evidence, or DoD. There is no way to tell which workflows are alive across apps.

## Decision

A new Go service at `apps/agent_runner_api/`, port `:8486`, tag `service`. It is the single source of truth for:

- declared workflows (prompt templates + DoD schema)
- active runs (worktree + Claude subprocess + status)
- DoD evidence (path/url/log/cmd output + `validated_by`)

Minimum endpoints:

- `POST /api/runs`: creates a worktree and launches headless Claude. Body: `{issue_id|card_id, mode, kanban_app}`.
- `GET /api/runs`: lists runs (filters: status/app/since).
- `GET /api/runs/:id`: run detail.
- `GET /api/runs/:id/sse`: progress stream.
- `POST /api/runs/:id/evidence`: the agent attaches evidence.
- `POST /api/runs/:id/evidence/:eid/validate`: a human approves.
- `POST /api/runs/:id/merge`: TBD merge (all items validated).
- `POST /api/runs/:id/abort`: kills the subprocess and removes the worktree.
- `GET /api/health`: 200 OK.

## Schema `agent_runs.db`

Migrations in `apps/agent_runner_api/migrations/`:

- `001_workflows.sql`: templates: `id, name, prompt_template, dod_schema_json, created_at`.
- `002_runs.sql`: `id, workflow_id, issue_id, card_id, kanban_app, branch, worktree_path, status, started_at, finished_at, agent_pid, agent_log_path`.
- `003_worktrees.sql`: `id, run_id, path, branch, created_at, removed_at`.
- `004_dod_items.sql`: one row per declared item: `id, run_id, item_key, kind, expected, required, status (pending|done|validated|failed)`.
- `005_dod_evidence.sql`: one row per attached piece of evidence: `id, dod_item_id, kind, payload_path, payload_url, payload_text, attached_at, validated_at, validated_by`.

Applied at startup via `embed.FS` + `applyMigrations()`.

## Frontmatter `app.md` (service)

```yaml
tags: [service, agents, go]
service:
  port: 8486
  health_endpoint: /api/health
  health_timeout_s: 3
  systemd_unit: agent_runner_api.service
  systemd_scope: user
  restart_policy: always
  runtime: systemd-user
  pc_targets:
    - aurgi-pc
    - home-wsl
  is_local_only: true
```

## Acceptance criteria

- [ ] `apps/agent_runner_api/` Go scaffold (main.go, db.go, handlers.go, sse.go, agent_spawn.go).
- [ ] Migrations 001-005 versioned and applied at startup (idempotent).
- [ ] The endpoints above implemented with `*_test.go` tests.
- [ ] systemd unit `agent_runner_api.service` with `Restart=always`.
- [ ] `app.md` with the trio + complete `service:` block (issue 0105).
- [ ] `fn doctor services-spec` validates the block.
- [ ] Smoke test: POST /api/runs with a dummy issue creates a real worktree in `/tmp/wt-test-`, persists a row in `agent_runs`, and launches an echo subprocess (no real claude in tests).
- [ ] Cleanup on abort: subprocess killed + `git worktree remove --force` + row marked `aborted`.
- [ ] e2e_checks: build, migration apply, health, dummy smoke run, cleanup.
- [ ] Documented in `docs/capabilities/agents.md` (capability group `agents`, see 0114).

## Gotchas

- `git worktree add` fails if the branch already exists. Hard-reset it first (same pattern as `autonomous_loop.md`).
- The worktree and the main repo share `.git/hooks/`, so a pre-commit hook can block commits.
  Allow `--no-verify` ONLY when a documented `events_json[].decision="skip_hook"` entry exists.
- `claude --headless` needs a correct PATH (`~/.local/bin`). The systemd service runs with the user manager's environment, not the login shell's, so verify `Environment=PATH=...` in the unit.
- The Claude subprocess can run for hours. Do NOT block the HTTP handler: spawn asynchronously, return the `run_id` immediately, and monitor the PID in a goroutine.
- SSE: ImGui clients (kanban_cpp, skill_tree) must reconnect with `Last-Event-ID`.
- Protected paths (`dev/autonomous_protected_paths.json`) apply here as well. Reuse the fn-orquestador logic.

## Out of scope

- A UI of its own (the service is pure backend; the UIs are kanban_cpp + skill_tree).
- Auth and auth tokens (local-only for now; add them in a separate issue if the service is exposed outside localhost).
- Gitea webhook to auto-trigger from commits.
- Cron scheduling for recurring workflows.
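
## Sketch: non-blocking spawn

The gotcha about long-running subprocesses (spawn asynchronously, return the `run_id` right away, watch the process in a goroutine) could look roughly like the following. This is a minimal sketch, not the planned `agent_spawn.go`: `Registry`, `Run`, `Spawn` and `Status` are hypothetical names, an in-memory map stands in for the `runs` table, and `echo` stands in for Claude, as the smoke-test criterion suggests.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// Run mirrors a few columns of the runs table (hypothetical, in-memory only).
type Run struct {
	ID     string
	PID    int
	Status string // "running", "done" or "failed"
}

// Registry is an assumed stand-in for agent_runs.db.
type Registry struct {
	mu   sync.Mutex
	runs map[string]*Run
	seq  int
}

func NewRegistry() *Registry {
	return &Registry{runs: make(map[string]*Run)}
}

// Spawn starts the command and returns as soon as the process exists.
// A goroutine waits for the exit and records the final status; the
// returned channel is closed once that status has been written.
func (r *Registry) Spawn(name string, args ...string) (string, <-chan struct{}, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return "", nil, err
	}

	r.mu.Lock()
	r.seq++
	run := &Run{ID: fmt.Sprintf("run-%d", r.seq), PID: cmd.Process.Pid, Status: "running"}
	r.runs[run.ID] = run
	r.mu.Unlock()

	done := make(chan struct{})
	go func() {
		defer close(done)
		werr := cmd.Wait() // blocks this goroutine only, never the caller
		r.mu.Lock()
		defer r.mu.Unlock()
		if werr != nil {
			run.Status = "failed"
		} else {
			run.Status = "done"
		}
	}()
	return run.ID, done, nil
}

// Status returns the current status, or "" for an unknown run id.
func (r *Registry) Status(id string) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if run, ok := r.runs[id]; ok {
		return run.Status
	}
	return ""
}

func main() {
	reg := NewRegistry()
	id, done, err := reg.Spawn("echo", "hello")
	if err != nil {
		panic(err)
	}
	<-done // an HTTP handler would return id here instead of waiting
	fmt.Println(id, reg.Status(id))
}
```

In the real service the handler would presumably write the row to the database before returning and the goroutine would update `status` and `finished_at` there; the `done` channel exists here mainly so tests can wait for completion deterministically.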