---
id: "0105"
title: "Standardize the service: block in app.md + indexer + fn doctor services-spec"
status: in-progress
type: feature
domain:
  - meta
  - apps-infra
  - deploy
  - telemetry
scope: multi-app
priority: alta
depends: []
blocks:
  - "0106"
related:
  - "0085"
  - "0086"
created: 2026-05-17
updated: 2026-05-17
tags: [services, monitoring, frontmatter, indexer, fn-doctor, pc-locations]
---

# 0105: Standardize `service:` in app.md

## Problem

Diagnosis (2026-05-17): `sqlite_api` was down for 20 h without raising an alert. Cause: nothing monitors it. Root cause: there is no uniform way to declare "this app MUST be running on this PC, on this port, with this health endpoint".

Today:
- 10 apps carry `tag: service` in `registry.db`.
- 8/10 report `systemctl active=inactive` according to `fn doctor services` (some because they live only on a remote host, others because they genuinely died).
- `port` is discovered from `--port` in the `ExecStart` of a unit file that may or may not exist locally.
- `health_endpoint` is declared only in `deploy_server/operations.db`, for a single target (registry_api).
- `systemd_unit` is assumed to be `<name>.service`; this is not documented anywhere.
- `pc_targets` (which PCs the app MUST run on) does not exist anywhere.

Consequence: a monitor that reconciles "expected vs. actual" cannot be written without hardcoding every app.

## Decision

Add an optional `service:` block to the `app.md` frontmatter. The block is mandatory for apps with `tag: service`. The indexer parses and persists it. `fn doctor services-spec` audits it.

## Block schema

```yaml
service:
  # HTTP endpoints (optional; stdio/daemon apps set these to null or omit them)
  port: 8484
  health_endpoint: /api/health       # GET path, 200 == healthy
  health_timeout_s: 3

  # systemd identity (when applicable)
  systemd_unit: sqlite_api.service   # exact name
  systemd_scope: user                # user|system|null (docker-compose)
  restart_policy: always             # always|on-failure|none

  # Runtime strategy (extends systemd_scope for non-systemd cases)
  runtime: systemd-user              # systemd-user|systemd-system|docker-compose|stdio|manual

  # Where it MUST run; references pc_locations.pc_id
  pc_targets:
    - aurgi-pc
    - home-wsl

  # Flags
  is_local_only: false               # true => not monitored over SSH; always via a local mechanism
```

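As an illustration of how the schema above could map onto Go, the following is a minimal sketch of a struct plus a validation pass. The names `ServiceSpec`, `inSet` and `Validate` are hypothetical, the `yaml` tags only matter if the indexer decodes with a tag-aware YAML library, and the rules "`runtime` is required" and "`health_endpoint` needs a `port`" are assumptions inferred from the schema comments, not confirmed behavior.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ServiceSpec mirrors the `service:` block. Pointer fields model values
// that may be null or omitted in app.md.
type ServiceSpec struct {
	Port           *int     `yaml:"port"`
	HealthEndpoint *string  `yaml:"health_endpoint"`
	HealthTimeoutS *int     `yaml:"health_timeout_s"`
	SystemdUnit    *string  `yaml:"systemd_unit"`
	SystemdScope   *string  `yaml:"systemd_scope"`
	RestartPolicy  *string  `yaml:"restart_policy"`
	Runtime        string   `yaml:"runtime"`
	PCTargets      []string `yaml:"pc_targets"`
	IsLocalOnly    bool     `yaml:"is_local_only"`
}

// inSet reports whether v is nil (null/omitted) or one of the allowed values.
func inSet(v *string, allowed ...string) bool {
	if v == nil {
		return true
	}
	for _, a := range allowed {
		if *v == a {
			return true
		}
	}
	return false
}

// Validate collects every problem instead of stopping at the first one,
// so an audit can report all of them together.
func (s ServiceSpec) Validate() error {
	var errs []error
	// Enum values are copied from the comments in the schema example.
	if !inSet(&s.Runtime, "systemd-user", "systemd-system", "docker-compose", "stdio", "manual") {
		errs = append(errs, fmt.Errorf("runtime: unknown value %q", s.Runtime))
	}
	if !inSet(s.SystemdScope, "user", "system") {
		errs = append(errs, fmt.Errorf("systemd_scope: unknown value %q", *s.SystemdScope))
	}
	if !inSet(s.RestartPolicy, "always", "on-failure", "none") {
		errs = append(errs, fmt.Errorf("restart_policy: unknown value %q", *s.RestartPolicy))
	}
	if s.Port != nil && (*s.Port < 1 || *s.Port > 65535) {
		errs = append(errs, fmt.Errorf("port: %d is outside 1..65535", *s.Port))
	}
	if s.HealthEndpoint != nil {
		if !strings.HasPrefix(*s.HealthEndpoint, "/") {
			errs = append(errs, fmt.Errorf("health_endpoint: %q must start with /", *s.HealthEndpoint))
		}
		// Assumption: an HTTP health path is meaningless without a port.
		if s.Port == nil {
			errs = append(errs, errors.New("health_endpoint: set but port is null"))
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}
```

Calling `Validate()` on a spec with `Runtime: "systemd-user"`, a port of 8484 and `/api/health` returns `nil`; a spec with an unlisted runtime returns a joined error naming the offending field.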
Rules:
- `port` is null when the app does not expose HTTP (stdio MCP, daemons without an API).
- `health_endpoint` is null when there is no HTTP; the monitor falls back to a process check (systemd active + port listening).
- `pc_targets` is a LIST of `pc_id` values from `pc_locations`. Empty => the app is not monitored.
- `runtime: docker-compose` => the monitor checks containers via `docker compose ps` over SSH on the target PC.
- `is_local_only: true` => the monitor runs only on the PC where the daemon runs (no SSH to its own host is attempted).

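The rules above amount to a small decision procedure. The sketch below shows one way a monitor might turn a spec into per-PC check plans. All names (`CheckKind`, `CheckPlan`, `PlanChecks`, `localPC`) are hypothetical, and two points are assumptions the rules do not settle: `docker-compose` takes precedence over an HTTP check, and a target equal to the local PC is checked without SSH.

```go
package main

// ServiceSpec holds only the fields the rules refer to.
type ServiceSpec struct {
	Port           *int
	HealthEndpoint *string
	Runtime        string
	PCTargets      []string
	IsLocalOnly    bool
}

// CheckKind names the strategy the monitor would use for one target.
type CheckKind string

const (
	CheckHTTP    CheckKind = "http"           // GET health_endpoint, 200 == healthy
	CheckProcess CheckKind = "process"        // systemd active + port listening
	CheckCompose CheckKind = "docker-compose" // docker compose ps
)

// CheckPlan describes one check against one PC.
type CheckPlan struct {
	PCID   string
	Kind   CheckKind
	ViaSSH bool
}

// kindFor picks the strategy for a spec.
func kindFor(spec ServiceSpec) CheckKind {
	switch {
	case spec.Runtime == "docker-compose":
		return CheckCompose // assumption: compose wins over HTTP
	case spec.Port != nil && spec.HealthEndpoint != nil:
		return CheckHTTP
	default:
		return CheckProcess
	}
}

// PlanChecks returns the checks to run from the PC identified by localPC.
// An empty pc_targets list yields no plans (the app is not monitored).
func PlanChecks(spec ServiceSpec, localPC string) []CheckPlan {
	var plans []CheckPlan
	kind := kindFor(spec)
	for _, pc := range spec.PCTargets {
		remote := pc != localPC
		if spec.IsLocalOnly && remote {
			continue // local-only services are never reached over SSH
		}
		plans = append(plans, CheckPlan{PCID: pc, Kind: kind, ViaSSH: remote})
	}
	return plans
}
```

Under these assumptions, a `systemd-user` service targeting `aurgi-pc` and `home-wsl`, planned from `aurgi-pc`, yields a local HTTP check and a second HTTP check over SSH; with `is_local_only: true` only the local one remains.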
## Tasks

- [x] Audit the 10 existing services (actual port, unit name, description)
- [ ] Edit the 10 app.md files with a realistic `service:` block
- [ ] Migration: add columns to the `apps` table (`port INTEGER`, `health_endpoint TEXT`, `health_timeout_s INTEGER`, `systemd_unit TEXT`, `systemd_scope TEXT`, `restart_policy TEXT`, `runtime TEXT`, `is_local_only INTEGER`)
- [ ] Migration: new table `service_targets (app_id TEXT, pc_id TEXT, role TEXT DEFAULT 'primary', PRIMARY KEY(app_id, pc_id))`
- [ ] Indexer: parse the `service:` block from the frontmatter and fill the columns + `service_targets`
- [ ] `fn doctor services-spec` (Go func + subcommand): lists apps with `tag: service` and an incomplete block. Output: tabwriter + `--json`
- [ ] Test: `fn index` over a fixture with a service block produces the correct rows
- [ ] Retroactive fix: `~/.config/systemd/user/sqlite_api.service` with `Restart=always` (not `on-failure`; TERM is not a failure)

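The two migration tasks can be made safe to re-run. The sketch below assumes `registry.db` is SQLite, where `ALTER TABLE ... ADD COLUMN` has no `IF NOT EXISTS` form, so the caller first collects the existing column names (for example from `PRAGMA table_info(apps)`) and only the missing columns are added. `MigrationStatements` and `serviceColumns` are hypothetical names; executing the statements is left to whatever database layer the project already uses.

```go
package main

import "fmt"

// column pairs a new `apps` column with its SQLite type.
type column struct{ Name, Type string }

// serviceColumns lists the columns named in the migration task.
var serviceColumns = []column{
	{"port", "INTEGER"},
	{"health_endpoint", "TEXT"},
	{"health_timeout_s", "INTEGER"},
	{"systemd_unit", "TEXT"},
	{"systemd_scope", "TEXT"},
	{"restart_policy", "TEXT"},
	{"runtime", "TEXT"},
	{"is_local_only", "INTEGER"},
}

// createServiceTargets is idempotent by itself thanks to IF NOT EXISTS.
const createServiceTargets = `CREATE TABLE IF NOT EXISTS service_targets (
  app_id TEXT,
  pc_id  TEXT,
  role   TEXT DEFAULT 'primary',
  PRIMARY KEY (app_id, pc_id)
)`

// MigrationStatements returns the SQL still required, given the set of
// column names that already exist on `apps`. Running it again after a
// successful migration produces no ALTER statements.
func MigrationStatements(existing map[string]bool) []string {
	var stmts []string
	for _, c := range serviceColumns {
		if !existing[c.Name] {
			stmts = append(stmts, fmt.Sprintf("ALTER TABLE apps ADD COLUMN %s %s", c.Name, c.Type))
		}
	}
	return append(stmts, createServiceTargets)
}
```

On a fresh database this yields eight `ALTER TABLE` statements followed by the `CREATE TABLE`; once every column exists, only the `CREATE TABLE IF NOT EXISTS` remains, which does nothing when the table is already there.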
## Material: the 10 current apps

| App | dir | port | health | unit | scope | pc_targets | runtime |
|---|---|---|---|---|---|---|---|
| sqlite_api | projects/fn_monitoring/apps/sqlite_api | 8484 | /api/status | sqlite_api.service | user | aurgi-pc, home-wsl | systemd-user |
| dag_engine | apps/dag_engine | 8090 | /api/dags | dag_engine.service | user | aurgi-pc, home-wsl | systemd-user |
| call_monitor | projects/fn_monitoring/apps/call_monitor | null | null | call_monitor.service | user | aurgi-pc, home-wsl | systemd-user |
| kanban | apps/kanban | 8095 | /api/board | kanban.service | user | aurgi-pc | systemd-user |
| deploy_server | apps/deploy_server | 9090 | /api/health | deploy_server.service | user | aurgi-pc | systemd-user |
| registry_mcp | apps/registry_mcp | null | null | registry_mcp.service | user | aurgi-pc | stdio (manual) |
| registry_api | apps/registry_api | 8420 | /api/status | null | null | organic-machine.com | docker-compose |
| footprint_geo_stack | apps/footprint_geo_stack | 3000 | null | null | null | aurgi-pc | docker-compose |
| element_matrix_chat | projects/element_agents/apps/element_matrix_chat | null | null | null | null | organic-machine.com | docker-compose |
| agents_and_robots | projects/element_agents/apps/agents_and_robots | null | null | agents_and_robots.service | system | organic-machine.com | systemd-remote |

## DoD

- 10 app.md files with a valid `service:` block (parseable, real values).
- `fn index` populates `apps.port/...` and `service_targets`.
- `fn doctor services-spec` reports `OK` for all 10.
- The migration applies idempotently to `registry.db` on aurgi-pc + home-wsl.
- `services_status_go_infra` is extended to read data from the new schema (no hardcoded port discovery).

## Blocks

- 0106: `services_monitor` app (UI + `services_api` backend). It needs `service_targets` + `apps.port`/`health_endpoint` populated.