cc1e88fe55
Both issues delivered end-to-end:

0128 (backend, merged via dataforge/agents_and_robots/pulls/1):
- HTTP daemon in cmd/launcher with apikey Bearer auth + SSE
- LIVE at https://agents.organic-machine.com via Coolify Traefik + LE cert
- systemd Restart=always
- Unified status autodetect fix applied

0129 (frontend, merged via dataforge/agents_dashboard/pulls/1):
- C++ ImGui app in projects/element_agents/apps/agents_dashboard
- 4 panels: Connection / Agents / Logs / Status
- secret_store_cpp_infra new function (DPAPI Windows / XOR Linux)
- Deployed to Windows Desktop, App Hub card visible

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---
id: "0128"
title: "agents_and_robots: HTTP API + SSE + apikey + TLS subdomain"
status: pendiente
type: feature
domain:
  - agents
  - infra
  - deploy
scope: app
priority: alta
depends: []
blocks:
  - "0129"
related: []
created: 2026-05-22
updated: 2026-05-22
tags: [agents_and_robots, http, sse, apikey, traefik, systemd]
dod_evidence_schema:
  - id: build_ok
    kind: cmd
    expected: "cd projects/element_agents/apps/agents_and_robots && go build -tags goolm ./cmd/launcher → exit 0"
    required: true
  - id: api_list_authorized
    kind: cmd
    expected: "curl -fsS -H \"Authorization: Bearer $AGENTS_API_KEY\" https://agents.organic-machine.com/agents returns JSON with N>=7 agents"
    required: true
  - id: api_list_unauthorized_401
    kind: cmd
    expected: "curl -s -o /dev/null -w '%{http_code}' https://agents.organic-machine.com/agents == 401"
    required: true
  - id: api_start_stop_roundtrip
    kind: cmd
    expected: "POST /agents/test-bot/stop → POST /agents/test-bot/start: status running confirmed via GET /agents/test-bot after 2s"
    required: true
  - id: sse_logs_streaming
    kind: cmd
    expected: "curl -N -H \"Authorization: Bearer $KEY\" https://agents.organic-machine.com/sse/agents/assistant-bot/logs delivers >=1 line within 5s while the agent is active"
    required: true
  - id: sse_status_broadcast
    kind: cmd
    expected: "curl -N /sse/status receives event {agent_id, old_status, new_status} after a manual stop/start"
    required: true
  - id: systemd_active
    kind: cmd
    expected: "ssh organic-machine.com 'systemctl is-active agents_and_robots.service' == active"
    required: true
  - id: traefik_route
    kind: url
    expected: "agents.organic-machine.com resolves and serves a valid LE cert (curl -vI shows subject CN=agents.organic-machine.com)"
    required: true
  - id: app_md_drift_fixed
    kind: cmd
    expected: "fn doctor services-spec apps/element_agents/apps/agents_and_robots reports OK (no runtime/systemd drift)"
    required: true
---

# 0128: agents_and_robots HTTP API + SSE + apikey + TLS

## Context

Today `agents_and_robots` exposes control only through the local `agentctl` CLI (filesystem-based, `shell/process.Manager`). There is no way to manage agents remotely.

We need a secure HTTP backend so that a local C++ frontend (issue 0129) can list agents, start/stop/restart them, and stream logs and status live.

## Decision

**Embed the HTTP daemon INSIDE `cmd/launcher`** as a goroutine. It shares `process.Manager`, access to `shell/memory/*.db`, and the Matrix clients. A single process, with no drift between daemon and supervisor.

**Auth:** `Authorization: Bearer <AGENTS_API_KEY>`, compared with `subtle.ConstantTimeCompare`. The key is 32 bytes, hex-encoded, stored in `.env` (`AGENTS_API_KEY`). A request with no header or an invalid key gets 401.

**TLS:** Traefik on the organic-machine.com VPS with an automatic LE cert. Subdomain `agents.organic-machine.com` (new DNS A record → VPS IP). Traefik route `agents.organic-machine.com → 127.0.0.1:8487`.

**SSE with in-memory pubsub.** NATS stays OFF for now (with 1 local client, a broker is pure overhead). Document a TODO in app.md to add a bus if a 2nd consumer shows up.

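To make the in-memory pubsub decision concrete, here is a minimal sketch of a topic bus. It is an assumption-laden illustration: the `Event` fields, the buffer size of 16, the drop-on-full policy, and the extra cancel function returned by `Subscribe` are not in the issue, which only names `Subscribe(topic) <-chan event` and `Publish(topic, event)`.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical payload type; the issue only refers to "event".
type Event struct {
	Topic string
	Data  string
}

// Bus is a minimal in-memory pubsub keyed by topic.
type Bus struct {
	mu   sync.Mutex
	subs map[string]map[chan Event]struct{}
}

func NewBus() *Bus {
	return &Bus{subs: make(map[string]map[chan Event]struct{})}
}

// Subscribe registers a buffered channel for topic. The returned cancel
// function unregisters and closes the channel; calling it twice is safe.
func (b *Bus) Subscribe(topic string) (<-chan Event, func()) {
	ch := make(chan Event, 16) // buffer size is an arbitrary choice
	b.mu.Lock()
	if b.subs[topic] == nil {
		b.subs[topic] = make(map[chan Event]struct{})
	}
	b.subs[topic][ch] = struct{}{}
	b.mu.Unlock()

	var once sync.Once
	cancel := func() {
		once.Do(func() {
			b.mu.Lock()
			delete(b.subs[topic], ch)
			b.mu.Unlock()
			close(ch) // safe: Publish can no longer see ch
		})
	}
	return ch, cancel
}

// Publish sends ev to every subscriber of topic without blocking and
// returns how many received it. A subscriber with a full buffer misses
// the event instead of stalling the publisher.
func (b *Bus) Publish(topic string, ev Event) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	delivered := 0
	for ch := range b.subs[topic] {
		select {
		case ch <- ev:
			delivered++
		default:
		}
	}
	return delivered
}

func main() {
	bus := NewBus()
	ch, cancel := bus.Subscribe("status")
	defer cancel()

	n := bus.Publish("status", Event{Topic: "status", Data: "test-bot: running -> stopped"})
	fmt.Println(n, (<-ch).Data)
}
```

Dropping events for slow consumers keeps the status poller from blocking on a stuck SSE client. Whether a dashboard can tolerate missed events is a design question the issue leaves open.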
## Scope v0.1 (lean)

| Verb | Path | Wraps |
|---|---|---|
| GET | `/health` | 200 OK without auth (liveness) |
| GET | `/agents` | `Scan` + `StatusAll` + `msg_count_24h` (query `shell/memory/*.db`) |
| GET | `/agents/{id}` | detail + config + `LogTail(200)` |
| POST | `/agents/{id}/start` | `Manager.Start` |
| POST | `/agents/{id}/stop` | `Manager.Stop` |
| POST | `/agents/{id}/restart` | Stop+Start, waiting for health |
| GET | `/agents/{id}/logs?n=200` | `LogTail` snapshot |

**SSE:**
- `GET /sse/status`: broadcasts status changes (poll every 2s + diff)
- `GET /sse/agents/{id}/logs`: tail -f of the logfile, emits line events

**Out of scope for v0.1** (deferred to v0.2):
- POST `/agents/{id}/message` (send Matrix message)
- PUT `/agents/{id}/config` (config edit)
- SSE messages stream

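The "poll every 2s + diff" behaviour of `/sse/status` can be illustrated with a pure diff function plus the SSE wire framing. This is a sketch under assumptions: the snapshot shape (agent id to status string), the function names, and the `status` event name are hypothetical; only the `{agent_id, old_status, new_status}` payload comes from the evidence schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// StatusEvent mirrors the {agent_id, old_status, new_status} payload.
type StatusEvent struct {
	AgentID   string `json:"agent_id"`
	OldStatus string `json:"old_status"`
	NewStatus string `json:"new_status"`
}

// diffStatus compares two snapshots (agent id -> status) and returns one
// event per agent that changed, appeared (empty OldStatus) or disappeared
// (empty NewStatus), sorted by agent id for stable output.
func diffStatus(prev, curr map[string]string) []StatusEvent {
	var events []StatusEvent
	for id, now := range curr {
		if old, seen := prev[id]; !seen || old != now {
			events = append(events, StatusEvent{AgentID: id, OldStatus: old, NewStatus: now})
		}
	}
	for id, old := range prev {
		if _, still := curr[id]; !still {
			events = append(events, StatusEvent{AgentID: id, OldStatus: old})
		}
	}
	sort.Slice(events, func(i, j int) bool { return events[i].AgentID < events[j].AgentID })
	return events
}

// sseFrame renders ev in text/event-stream format: an "event:" line, a
// "data:" line with the JSON payload, and a blank line ending the frame.
func sseFrame(name string, ev StatusEvent) (string, error) {
	b, err := json.Marshal(ev)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("event: %s\ndata: %s\n\n", name, b), nil
}

func main() {
	prev := map[string]string{"test-bot": "running", "assistant-bot": "running"}
	curr := map[string]string{"test-bot": "stopped", "assistant-bot": "running"}
	for _, ev := range diffStatus(prev, curr) {
		frame, err := sseFrame("status", ev)
		if err != nil {
			panic(err)
		}
		fmt.Print(frame)
	}
}
```

A poller goroutine would keep the previous snapshot, call the status source on a 2s ticker, and publish each event returned by `diffStatus`. An SSE handler would then write the frame and flush it via `http.Flusher`.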
## Tasks

1. **New package `internal/api`** with an HTTP server (stdlib `net/http`, no gin/echo).
   - `api.New(mgr *process.Manager, apiKey string, port int) *Server`
   - `Server.Run(ctx) error` starts the server and blocks until ctx is done.
   - Middleware: log + auth + recover.
2. **REST handlers** on top of `process.Manager`. Unit tests with a mock manager.
3. **In-memory SSE pubsub** (`internal/api/pubsub.go`):
   - `Bus` with `Subscribe(topic) <-chan event` + `Publish(topic, event)`.
   - Poller goroutine that calls `StatusAll` every 2s and publishes diffs.
   - One tail goroutine per logfile (`file_tail_follow`: look it up in the registry or create it).
4. **Integrate into the launcher:** `cmd/launcher/main.go` starts `api.Server` in a goroutine if `--api-port > 0`.
5. **Create the systemd unit** `/etc/systemd/system/agents_and_robots.service` with `Restart=always`, `EnvironmentFile=.env`, `ExecStart=.../bin/launcher --log-level info --api-port 8487`.
6. **Traefik route + DNS:**
   - Add `agents.organic-machine.com` to DNS (A record).
   - Add the Traefik config (label in the stack's docker-compose or file provider) pointing to `127.0.0.1:8487`.
7. **Fix app.md drift:** `runtime: systemd-system` is now true. Verify with `fn doctor services-spec`.
8. **Tests:**
   - Go: package `internal/api` with httptest.
   - e2e: `e2e_checks` in `app.md` with a curl smoke test.
9. **Deploy:**
   - `rsync_deploy_bash_infra` or a new `deploy_server` target.
   - Generate `AGENTS_API_KEY` with `openssl rand -hex 32` and write it to the remote `.env`.
   - `systemctl enable --now agents_and_robots.service`.

## Registry functions to use / propose

Search before coding:

- `mcp__registry__fn_search query="tail follow file" lang="go"`: does `file_tail_follow_go_infra` exist? If not, delegate to fn-constructor.
- `mcp__registry__fn_search query="http auth bearer" lang="go"`: auth middleware.
- `mcp__registry__fn_search query="sse server" lang="go"`: SSE helper.
- `systemd_generate_unit_go_infra` + `systemd_install_go_infra`: generate/install the unit.

## Acceptance

- [ ] `curl -fsS -H "Authorization: Bearer $KEY" https://agents.organic-machine.com/agents` returns the correct list.
- [ ] No header → 401. Invalid key → 401. Valid key → 200.
- [ ] Start/Stop/Restart change the real process state (verifiable with `ps`).
- [ ] SSE logs deliver lines less than 1s after they appear in the file.
- [ ] SSE status broadcasts after a manual stop/start.
- [ ] systemd unit is active and restarts after kill -9.
- [ ] `fn doctor services-spec` reports OK.
- [ ] Go tests pass.

## Human DoD

- **Where:** local terminal → `curl https://agents.organic-machine.com/agents`. SSE verifiable with `curl -N`.
- **Latency:** SSE log lag < 1s. REST list < 200ms.
- **Onboarding:** agents_and_robots README updated with an "HTTP API" section + curl examples.

## Risks

- DNS propagation may take a while (configure a low TTL).
- Traefik on this VPS: check whether it is managed by Coolify or standalone, and add the route in the right place.
- The current `LogTail` only reads a snapshot; SSE needs a real `tail -f`. If it does not exist in the registry, build it in a prior round.
