fn_registry/dev/issues/0105-service-frontmatter-standardization.md
egutierrez b9716a7cd6 chore: snapshot previous WIP + flow 0008 + 7 sub-issues (0112-0119)
Snapshot of WIP accumulated in previous sessions, taken before the wave 1 merge
of flow 0008 (kanban_cpp + agent_runner_api + DoD schema).

Includes:
- dev/flows/0008-kanban-cpp-and-agent-workflows.md
- dev/issues/0112-0119*.md (7 sub-issues)
- Previous WIP in cmd/fn/doctor.go, registry/*, modules/, cpp/, etc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:17:08 +02:00


---
id: "0105"
title: "Standardize the service: block in app.md + indexer + fn doctor services-spec"
status: in-progress
type: feature
domain:
- meta
- apps-infra
- deploy
- telemetry
scope: multi-app
priority: alta
depends: []
blocks:
- "0106"
related:
- "0085"
- "0086"
created: 2026-05-17
updated: 2026-05-17
tags: [services, monitoring, frontmatter, indexer, fn-doctor, pc-locations]
---
# 0105: Standardize `service:` in app.md
## Problem
Diagnosis (2026-05-17): `sqlite_api` was down for 20 h with no alert. Cause: nothing monitors it. Root cause: there is no uniform way to know that "this app MUST be running on this PC, on this port, with this health endpoint".
Today:
- 10 apps have `tag: service` in `registry.db`.
- 8/10 report `systemctl active=inactive` according to `fn doctor services` (some because they live only on a remote host, others because they genuinely died).
- `port` is discovered from `--port` in the `ExecStart` of a unit file that may or may not exist locally.
- `health_endpoint` is declared only in `deploy_server/operations.db`, and only for 1 target (registry_api).
- `systemd_unit` is assumed to be `<name>.service`; this is not documented.
- `pc_targets` (which PCs the app MUST run on) does not exist anywhere.
Consequence: it is impossible to write a monitor that reconciles "expected vs actual" without hardcoding each app.
## Decision
Add an optional `service:` block to the `app.md` frontmatter. It is mandatory for apps with `tag: service`. The indexer parses and persists it. `fn doctor services-spec` audits it.
## Block schema
```yaml
service:
  # HTTP endpoints (optional; stdio/daemon apps leave these null or omit them)
  port: 8484
  health_endpoint: /api/health       # GET route, 200 == healthy
  health_timeout_s: 3
  # systemd identity (when applicable)
  systemd_unit: sqlite_api.service   # exact name
  systemd_scope: user                # user|system|null (docker-compose)
  restart_policy: always             # always|on-failure|none
  # Runtime strategy (extends systemd_scope to non-systemd cases)
  runtime: systemd-user              # systemd-user|systemd-system|docker-compose|stdio|manual
  # Where it MUST run; references pc_locations.pc_id
  pc_targets:
    - aurgi-pc
    - home-wsl
  # Flags
  is_local_only: false               # true => not monitored over SSH; always via a local mechanism
```
Rules:
- `port` is null if the app does not expose HTTP (stdio MCP, daemons without an API).
- `health_endpoint` is null if there is no HTTP; the monitor falls back to a process check (systemd active + port listening).
- `pc_targets` is a LIST of `pc_id` values from `pc_locations`. Empty => the app is not monitored.
- `runtime: docker-compose` => the monitor checks containers via `docker compose ps` over SSH to the target PC.
- `is_local_only: true` => the monitor runs only on the PC where the daemon runs (no SSH to the host itself is attempted).
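
The rules above can be read as a small decision procedure. The following is a minimal sketch in Python, not the project's implementation: `plan_checks`, the `monitor_pc` parameter, and the check names (`http-get`, `process`, `docker-compose-ps`) are hypothetical. It also assumes that `docker-compose` takes precedence over an HTTP check and that `is_local_only` means targets other than the monitor's own host are skipped; the issue does not settle either point.

```python
def plan_checks(service: dict, monitor_pc: str) -> list[dict]:
    """Return one planned check per pc_target, following the rules above."""
    checks = []
    # An empty or missing pc_targets list means the app is not monitored.
    for pc_id in service.get("pc_targets") or []:
        if service.get("is_local_only") and pc_id != monitor_pc:
            # Assumption: local-only services are checked only from their own host.
            continue
        transport = "local" if pc_id == monitor_pc else "ssh"
        if service.get("runtime") == "docker-compose":
            kind = "docker-compose-ps"   # assumption: wins over an HTTP check
        elif service.get("port") and service.get("health_endpoint"):
            kind = "http-get"            # GET health_endpoint, 200 == healthy
        else:
            kind = "process"             # systemd active (+ port listening if set)
        checks.append({"pc_id": pc_id, "transport": transport, "kind": kind})
    return checks


# Example input mirroring the schema block above.
sqlite_api = {
    "port": 8484,
    "health_endpoint": "/api/health",
    "runtime": "systemd-user",
    "pc_targets": ["aurgi-pc", "home-wsl"],
    "is_local_only": False,
}
for check in plan_checks(sqlite_api, monitor_pc="aurgi-pc"):
    print(check)
```

Keeping the decision in a pure function, separate from the code that actually opens sockets or SSH sessions, makes the rules testable without any host being reachable.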
## Tasks
- [x] Audit the 10 existing services (real port, unit name, description)
- [ ] Edit the 10 app.md files with a realistic `service:` block
- [ ] Migration: add columns to the `apps` table (`port INTEGER`, `health_endpoint TEXT`, `health_timeout_s INTEGER`, `systemd_unit TEXT`, `systemd_scope TEXT`, `restart_policy TEXT`, `runtime TEXT`, `is_local_only INTEGER`)
- [ ] Migration: new table `service_targets (app_id TEXT, pc_id TEXT, role TEXT DEFAULT 'primary', PRIMARY KEY(app_id, pc_id))`
- [ ] Indexer: parse the `service:` block from the frontmatter and fill the columns + `service_targets`
- [ ] `fn doctor services-spec` (Go func + subcommand): list apps with `tag: service` and an incomplete block. Output via tabwriter + `--json`
- [ ] Test: `fn index` on a fixture with a service block produces the correct rows
- [ ] Retroactive fix: set `Restart=always` in `~/.config/systemd/user/sqlite_api.service` (not `on-failure`: exiting on TERM is not treated as a failure)
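
The two migration tasks and the indexer task can be sketched together against an in-memory SQLite database. This is an illustration in Python rather than the project's Go code, under stated assumptions: the `apps` table is assumed to have a text primary key named `id` (the real schema is not shown in this issue), and `migrate` and `index_service` are hypothetical names. Since SQLite's `ALTER TABLE ... ADD COLUMN` has no `IF NOT EXISTS` form, the sketch checks `PRAGMA table_info` first so the migration can be re-run safely.

```python
import sqlite3

# Column names and types as listed in the migration task above.
SERVICE_COLUMNS = {
    "port": "INTEGER",
    "health_endpoint": "TEXT",
    "health_timeout_s": "INTEGER",
    "systemd_unit": "TEXT",
    "systemd_scope": "TEXT",
    "restart_policy": "TEXT",
    "runtime": "TEXT",
    "is_local_only": "INTEGER",
}


def migrate(conn: sqlite3.Connection) -> None:
    """Add the service columns and the service_targets table; safe to re-run."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(apps)")}
    for name, sql_type in SERVICE_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE apps ADD COLUMN {name} {sql_type}")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS service_targets ("
        "app_id TEXT, pc_id TEXT, role TEXT DEFAULT 'primary', "
        "PRIMARY KEY (app_id, pc_id))"
    )


def index_service(conn: sqlite3.Connection, app_id: str, service: dict) -> None:
    """Persist one already-parsed service: block for an existing apps row."""
    values = [service.get(col) for col in SERVICE_COLUMNS]
    # Store the boolean flag as 0/1 to match the INTEGER column.
    values[-1] = int(bool(service.get("is_local_only")))
    assignments = ", ".join(f"{col} = ?" for col in SERVICE_COLUMNS)
    conn.execute(f"UPDATE apps SET {assignments} WHERE id = ?", [*values, app_id])
    # Replace the targets wholesale so re-indexing drops removed PCs.
    conn.execute("DELETE FROM service_targets WHERE app_id = ?", (app_id,))
    conn.executemany(
        "INSERT INTO service_targets (app_id, pc_id) VALUES (?, ?)",
        [(app_id, pc_id) for pc_id in service.get("pc_targets") or []],
    )


def demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE apps (id TEXT PRIMARY KEY)")  # stand-in schema
    conn.execute("INSERT INTO apps (id) VALUES ('sqlite_api')")
    migrate(conn)
    migrate(conn)  # second run finds every column present and changes nothing
    index_service(conn, "sqlite_api", {
        "port": 8484, "health_endpoint": "/api/health", "health_timeout_s": 3,
        "systemd_unit": "sqlite_api.service", "systemd_scope": "user",
        "restart_policy": "always", "runtime": "systemd-user",
        "pc_targets": ["aurgi-pc", "home-wsl"], "is_local_only": False,
    })
    return conn
```

A fixture-based test for `fn index` could follow the same shape as `demo`: index a known block, then assert on the resulting `apps` and `service_targets` rows.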
## Material: the 10 current apps
| App | dir | port | health | unit | scope | pc_targets | runtime |
|---|---|---|---|---|---|---|---|
| sqlite_api | projects/fn_monitoring/apps/sqlite_api | 8484 | /api/status | sqlite_api.service | user | aurgi-pc, home-wsl | systemd-user |
| dag_engine | apps/dag_engine | 8090 | /api/dags | dag_engine.service | user | aurgi-pc, home-wsl | systemd-user |
| call_monitor | projects/fn_monitoring/apps/call_monitor | null | null | call_monitor.service | user | aurgi-pc, home-wsl | systemd-user |
| kanban | apps/kanban | 8095 | /api/board | kanban.service | user | aurgi-pc | systemd-user |
| deploy_server | apps/deploy_server | 9090 | /api/health | deploy_server.service | user | aurgi-pc | systemd-user |
| registry_mcp | apps/registry_mcp | null | null | registry_mcp.service | user | aurgi-pc | stdio (manual) |
| registry_api | apps/registry_api | 8420 | /api/status | null | null | organic-machine.com | docker-compose |
| footprint_geo_stack | apps/footprint_geo_stack | 3000 | null | null | null | aurgi-pc | docker-compose |
| element_matrix_chat | projects/element_agents/apps/element_matrix_chat | null | null | null | null | organic-machine.com | docker-compose |
| agents_and_robots | projects/element_agents/apps/agents_and_robots | null | null | agents_and_robots.service | system | organic-machine.com | systemd-remote |
## DoD
- 10 app.md files with a valid `service:` block (parseable, real values).
- `fn index` populates `apps.port/...` and `service_targets`.
- `fn doctor services-spec` reports `OK` for all 10.
- The migration applies idempotently to `registry.db` on aurgi-pc + home-wsl.
- `services_status_go_infra` is extended to read data from the new schema (no hardcoded port discovery).
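
As a rough illustration of what "reports `OK`" could mean for `fn doctor services-spec`, here is a minimal Python sketch of the audit. The issue does not define when a block counts as incomplete, so the checks below are assumptions: `runtime` and `pc_targets` must be declared, systemd runtimes need `systemd_unit` and `systemd_scope`, and `health_endpoint` requires `port`. The function names and the JSON shape are hypothetical too.

```python
import json

SYSTEMD_RUNTIMES = {"systemd-user", "systemd-system"}


def audit_app(app: dict) -> list[str]:
    """Return the problems found in one app's service: block (empty == OK)."""
    if "service" not in (app.get("tags") or []):
        return []  # the block is mandatory only for apps tagged as services
    service = app.get("service")
    if not service:
        return ["missing service: block"]
    problems = []
    if not service.get("runtime"):
        problems.append("runtime not set")
    if "pc_targets" not in service:
        # An empty list is allowed (it means "not monitored"); absence is not.
        problems.append("pc_targets not declared")
    if service.get("runtime") in SYSTEMD_RUNTIMES:
        for key in ("systemd_unit", "systemd_scope"):
            if not service.get(key):
                problems.append(f"{key} not set for systemd runtime")
    if service.get("health_endpoint") and not service.get("port"):
        problems.append("health_endpoint set without port")
    return problems


def audit(apps: list[dict]) -> str:
    """Render a JSON report, roughly what a --json flag might emit."""
    report = []
    for app in apps:
        problems = audit_app(app)
        status = "OK" if not problems else "INCOMPLETE"
        report.append({"app": app["name"], "status": status, "problems": problems})
    return json.dumps(report, indent=2)


apps = [
    {"name": "kanban", "tags": ["service"], "service": {
        "port": 8095, "health_endpoint": "/api/board",
        "systemd_unit": "kanban.service", "systemd_scope": "user",
        "runtime": "systemd-user", "pc_targets": ["aurgi-pc"]}},
    {"name": "dag_engine", "tags": ["service"]},  # block not written yet
]
print(audit(apps))
```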
## Blocks
- 0106: the `services_monitor` app (UI + `services_api` backend). It needs `service_targets` + `apps.port`/`health_endpoint` populated.