---
id: "0128"
title: "agents_and_robots: HTTP API + SSE + apikey + TLS subdomain"
status: pendiente
type: feature
domain:
  - agents
  - infra
  - deploy
scope: app
priority: alta
depends: []
blocks:
  - "0129"
related: []
created: 2026-05-22
updated: 2026-05-22
tags: [agents_and_robots, http, sse, apikey, traefik, systemd]
dod_evidence_schema:
  - id: build_ok
    kind: cmd
    expected: "cd projects/element_agents/apps/agents_and_robots && go build -tags goolm ./cmd/launcher → exit 0"
    required: true
  - id: api_list_authorized
    kind: cmd
    expected: "curl -fsS -H 'Authorization: Bearer $AGENTS_API_KEY' https://agents.organic-machine.com/agents returns JSON with N>=7 agents"
    required: true
  - id: api_list_unauthorized_401
    kind: cmd
    expected: "curl -s -o /dev/null -w '%{http_code}' https://agents.organic-machine.com/agents == 401"
    required: true
  - id: api_start_stop_roundtrip
    kind: cmd
    expected: "POST /agents/test-bot/stop → POST /agents/test-bot/start: status running confirmed via GET /agents/test-bot after 2s"
    required: true
  - id: sse_logs_streaming
    kind: cmd
    expected: "curl -N -H 'Authorization: Bearer $KEY' https://agents.organic-machine.com/sse/agents/assistant-bot/logs delivers >=1 line within 5s with the agent active"
    required: true
  - id: sse_status_broadcast
    kind: cmd
    expected: "curl -N /sse/status receives event {agent_id, old_status, new_status} after a manual stop/start"
    required: true
  - id: systemd_active
    kind: cmd
    expected: "ssh organic-machine.com 'systemctl is-active agents_and_robots.service' == active"
    required: true
  - id: traefik_route
    kind: url
    expected: "agents.organic-machine.com resolves and returns a valid LE cert (curl -vI shows subject CN=agents.organic-machine.com)"
    required: true
  - id: app_md_drift_fixed
    kind: cmd
    expected: "fn doctor services-spec apps/element_agents/apps/agents_and_robots reports OK (no runtime/systemd drift)"
    required: true
---

# 0128: agents_and_robots HTTP API + SSE + apikey + TLS

## Context

Today `agents_and_robots` exposes control only through the local `agentctl` CLI (filesystem-based, `shell/process.Manager`). There is no way to manage agents remotely. We need a secure HTTP backend so that a local C++ frontend (issue 0129) can list agents, start/stop/restart them, and stream logs and status live.

## Decision

**Embed the HTTP daemon INSIDE `cmd/launcher`** as a goroutine. It shares `process.Manager`, access to `shell/memory/*.db`, and the Matrix clients. A single process, with no drift between daemon and supervisor.

**Auth:** `Authorization: Bearer ` checked with `subtle.ConstantTimeCompare`. 32-byte hex key in `.env` (`AGENTS_API_KEY`). 401 when the header is missing or the key is invalid.

**TLS:** Traefik on the organic-machine.com VPS with an automatic LE cert. Subdomain `agents.organic-machine.com` (new DNS A record → VPS IP). Traefik route `agents.organic-machine.com → 127.0.0.1:8487`.

**SSE in-memory pubsub.** NATS is OFF for now (one local client, so a broker is pure overhead). Document a TODO in app.md to add a bus if a second consumer arrives.

## Scope v0.1 (lean)

| Verb | Path | Wraps |
|---|---|---|
| GET | `/health` | 200 OK without auth (liveness) |
| GET | `/agents` | `Scan` + `StatusAll` + `msg_count_24h` (query `shell/memory/*.db`) |
| GET | `/agents/{id}` | detail + config + `LogTail(200)` |
| POST | `/agents/{id}/start` | `Manager.Start` |
| POST | `/agents/{id}/stop` | `Manager.Stop` |
| POST | `/agents/{id}/restart` | Stop+Start, waiting for health |
| GET | `/agents/{id}/logs?n=200` | `LogTail` snapshot |

**SSE:**

- `GET /sse/status`: broadcasts status changes (poll every 2s + diff)
- `GET /sse/agents/{id}/logs`: tail -f of the logfile, emits line events

**Out of scope for v0.1** (deferred to v0.2):

- POST `/agents/{id}/message` (send Matrix message)
- PUT `/agents/{id}/config` (config edit)
- SSE messages stream

## Tasks

1. **New package `internal/api`** with an HTTP server (stdlib `net/http`, no gin/echo).
   - `api.New(mgr *process.Manager, apiKey string, port int) *Server`
   - `Server.Run(ctx) error` starts the server and blocks until ctx is done.
   - Middleware: log + auth + recover.
2. **REST handlers** on top of `process.Manager`. Unit tests with a mock manager.
3. **In-memory SSE pubsub** (`internal/api/pubsub.go`):
   - `Bus` with `Subscribe(topic) <-chan event` + `Publish(topic, event)`.
   - Poller goroutine that calls `StatusAll` every 2s and publishes diffs.
   - One tail goroutine per logfile (`file_tail_follow`: look it up in the registry or create it).
4. **Integrate into the launcher:** `cmd/launcher/main.go` starts `api.Server` in a goroutine if `--api-port > 0`.
5. **Create the systemd unit** `/etc/systemd/system/agents_and_robots.service` with `Restart=always`, `EnvironmentFile=.env`, `ExecStart=.../bin/launcher --log-level info --api-port 8487`.
6. **Traefik route + DNS:**
   - Add `agents.organic-machine.com` to DNS (A record).
   - Add the Traefik config (label in the stack's docker-compose or file provider) pointing to `127.0.0.1:8487`.
7. **Fix app.md drift:** `runtime: systemd-system` is now true. Verify with `fn doctor services-spec`.
8. **Tests:**
   - Go: package `internal/api` with httptest.
   - e2e: `e2e_checks` in `app.md` with a curl smoke test.
9. **Deploy:**
   - `rsync_deploy_bash_infra` or a new `deploy_server` target.
   - Generate `AGENTS_API_KEY` with `openssl rand -hex 32` and write the remote `.env`.
   - `systemctl enable --now agents_and_robots.service`.

## Registry functions to use / propose

Search before coding:

- `mcp__registry__fn_search query="tail follow file" lang="go"`: does `file_tail_follow_go_infra` exist? If not, delegate to fn-constructor.
- `mcp__registry__fn_search query="http auth bearer" lang="go"`: auth middleware.
- `mcp__registry__fn_search query="sse server" lang="go"`: SSE helper.
- `systemd_generate_unit_go_infra` + `systemd_install_go_infra`: generate/install the unit.

## Acceptance

- [ ] `curl -fsS -H 'Authorization: Bearer $KEY' https://agents.organic-machine.com/agents` returns the correct list.
- [ ] No header → 401. Invalid key → 401. Valid key → 200.
- [ ] Start/Stop/Restart change the real state of the process (verifiable with `ps`).
- [ ] SSE logs deliver lines less than 1s after they appear in the file.
- [ ] SSE status broadcasts after a manual stop/start.
- [ ] systemd unit is active and restarts after kill -9.
- [ ] `fn doctor services-spec` reports OK.
- [ ] Go tests pass.

## Human DoD

- **Where:** local terminal → `curl https://agents.organic-machine.com/agents`. SSE verifiable with `curl -N`.
- **Latency:** SSE log lag < 1s. REST list < 200ms.
- **Onboarding:** agents_and_robots README updated with an "HTTP API" section + curl examples.

## Risks

- DNS propagation may take a while (configure with a low TTL).
- Traefik on this VPS: check whether it is managed by Coolify or standalone, and add the route wherever it belongs.
- The current `LogTail` only reads a snapshot; SSE needs a real `tail -f`. If the registry has none, it needs a preliminary round first.
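
## Appendix: auth middleware sketch

The auth rule from the Decision section (Bearer key, `subtle.ConstantTimeCompare`, 401 on a missing or invalid key, `/health` open) could look roughly like the sketch below. It is only an illustration under assumptions: `requireBearer`, `okHandler` and `statusFor` are hypothetical names that do not come from this issue, and the real `internal/api` middleware may be structured differently (or come from the registry search).

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// requireBearer (hypothetical name) wraps next and answers 401 unless the
// request carries "Authorization: Bearer <apiKey>". /health is left open
// because the scope table lists it as a liveness probe without auth.
func requireBearer(apiKey string, next http.Handler) http.Handler {
	want := []byte(apiKey)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/health" {
			next.ServeHTTP(w, r)
			return
		}
		const prefix = "Bearer "
		header := r.Header.Get("Authorization")
		got := []byte(strings.TrimPrefix(header, prefix))
		// An empty configured key must never authenticate anyone.
		// ConstantTimeCompare returns 1 only when both slices are equal.
		if len(want) == 0 || !strings.HasPrefix(header, prefix) ||
			subtle.ConstantTimeCompare(got, want) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler stands in for the real REST handlers; it just answers 200.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// statusFor sends one in-memory GET through h and returns the status code.
func statusFor(h http.Handler, path, authHeader string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := requireBearer("s3cret", okHandler)
	fmt.Println(statusFor(h, "/agents", ""))              // 401: no header
	fmt.Println(statusFor(h, "/agents", "Bearer wrong"))  // 401: invalid key
	fmt.Println(statusFor(h, "/agents", "Bearer s3cret")) // 200: valid key
	fmt.Println(statusFor(h, "/health", ""))              // 200: liveness, no auth
}
```

The four calls in `main` mirror the acceptance item "No header → 401. Invalid key → 401. Valid key → 200", and the same `httptest` pattern would fit the Go tests planned for `internal/api`.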