agents_and_robots/pkg/tools/devicemesh/README.md
egutierrez bcd246bf85 feat(0144a): tool registry framework for device-mesh
Adds pkg/tools/devicemesh with an HTTP Client for the device_agent + a ToolRegistry
with 16 standard tools (exec, fs.*, git.*, docker.*, proc.*, pkg.*, shell.eval).
RegisterBuiltins filters by user/sudo mode via the RequiresApproval flag.
Hook into pkg/decision with ActionKindDeviceMesh + DeviceMeshAction.
Runner supports dispatch via NewRunnerWithDeviceMesh (NewRunner kept for back-compat).

Tests: 25 new in devicemesh + 4 in runner. Build clean.
2026-05-24 14:07:13 +02:00


# pkg/tools/devicemesh
Tool registry framework that lets an LLM agent in `agents_and_robots` (VPS) call capabilities exposed by a remote `device_agent` over the WireGuard mesh.

Issue: [0144a](../../../dev/issues/0144-agent-per-machine-llm.md) (POC for the broader 0144 spec).
## What it does
```
LLM (Claude)
│ tool_call exec {argv:["ls","/tmp"]}
ToolRegistry.Call("exec", input)
│ 1. ValidateInput against tool's InputSchema
│ 2. ArgMapping(input) → device-facing args
│ 3. Client.Call(CapabilityRequest{capability: "shell.exec", args})
│ 4. ResultMapping(resp.Result) → LLM-facing output
HTTP POST http://10.42.0.10:7474/capability (over mesh WG)
device_agent on home-wsl runs the binary, returns audit_hash + result
```
The LLM never sees the HTTP layer; it sees a flat list of named tools with JSON-Schema inputs.
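As a rough sketch of the four steps above (not the package's actual code), the dispatch path can be modelled as one function. `Dispatch`, `Caller`, and `requireArgv` are hypothetical names, `ToolSpec` carries only the fields this README mentions, and the transport is a fake that does no HTTP.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolSpec lists only the fields named in this README; the real struct may differ.
type ToolSpec struct {
	Name          string
	Capability    string
	ArgMapping    func(map[string]any) (map[string]any, error)
	ResultMapping func(map[string]any) (any, error)
}

// Caller is a hypothetical stand-in for Client.Call.
type Caller func(capability string, args map[string]any) (map[string]any, error)

// requireArgv is a placeholder for the schema check in step 1.
func requireArgv(in map[string]any) error {
	if _, ok := in["argv"]; !ok {
		return errors.New("missing required property argv")
	}
	return nil
}

// Dispatch runs the four steps in order and stops at the first error.
func Dispatch(spec ToolSpec, validate func(map[string]any) error, call Caller, input map[string]any) (any, error) {
	if err := validate(input); err != nil { // 1. validate input
		return nil, fmt.Errorf("invalid input for %s: %w", spec.Name, err)
	}
	args, err := spec.ArgMapping(input) // 2. LLM input to device args
	if err != nil {
		return nil, err
	}
	result, err := call(spec.Capability, args) // 3. remote call
	if err != nil {
		return nil, err
	}
	return spec.ResultMapping(result) // 4. device result to LLM output
}

func main() {
	spec := ToolSpec{
		Name: "exec", Capability: "shell.exec",
		ArgMapping:    func(in map[string]any) (map[string]any, error) { return in, nil },
		ResultMapping: func(r map[string]any) (any, error) { return r["stdout"], nil },
	}
	// Fake transport: reports the capability it was asked to run.
	fake := func(capability string, _ map[string]any) (map[string]any, error) {
		return map[string]any{"stdout": "ran " + capability}, nil
	}
	out, err := Dispatch(spec, requireArgv, fake, map[string]any{"argv": []any{"ls", "/tmp"}})
	fmt.Println(out, err)
}
```

Because the transport is passed in, the same function can be exercised in tests without a reachable `device_agent`.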
## Pieces
| File | Purpose |
|---|---|
| `client.go` | HTTP client to `POST /capability` and `GET /health` of the remote `device_agent`. Generates `request_id` (req_<12bytehex>) and `nonce` (16 random bytes base64) when missing. |
| `types.go` | `ToolSpec` + `ToolRegistry`. Thread-safe registry, `Call` is the single dispatch entry point. |
| `schema.go` | Mini JSON-Schema validator (object/array/string/integer/number/boolean + required + additionalProperties + enum). Enough to reject LLM mistakes without pulling a heavy dep. |
| `tools_builtin.go` | The standard catalog: exec, shell.eval, fs.read, fs.write, fs.list, fs.stat, git.clone, git.commit, git.push, pkg.install, pkg.search, proc.list, proc.kill, docker.list, docker.exec, docker.logs. `RegisterBuiltins(reg, ModeUser|ModeSudo|ModeAll)` filters by `RequiresApproval`. `shell.eval` is special-cased to be registered in BOTH modes, with `RequiresApproval=true` forced in `ModeSudo` via `withApprovalRequired`. |
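For illustration, the identifiers that `client.go` fills in could be generated roughly as follows. This sketch reads `req_<12bytehex>` as 12 random bytes rendered in hex and assumes the standard base64 alphabet for the nonce; `NewRequestID` and `NewNonce` are hypothetical names, not the package's API.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// NewRequestID returns "req_" followed by 12 random bytes in hex (24 characters).
func NewRequestID() (string, error) {
	b := make([]byte, 12)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return "req_" + hex.EncodeToString(b), nil
}

// NewNonce returns 16 random bytes, base64-encoded (standard alphabet assumed).
func NewNonce() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(b), nil
}

func main() {
	id, err := NewRequestID()
	if err != nil {
		panic(err)
	}
	nonce, err := NewNonce()
	if err != nil {
		panic(err)
	}
	fmt.Println(id, nonce)
}
```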
## How to register a new tool
```go
import "github.com/enmanuel/agents/pkg/tools/devicemesh"

// reg is an existing *devicemesh.ToolRegistry.
reg.Register(devicemesh.ToolSpec{
	Name:        "screenshot",
	Description: "Capture the display on the remote device. Returns PNG base64.",
	Capability:  "display.capture",
	InputSchema: map[string]any{
		"type":                 "object",
		"additionalProperties": false,
		"properties": map[string]any{
			"format": map[string]any{"type": "string", "enum": []any{"png", "jpeg"}},
		},
	},
	ArgMapping: func(in map[string]any) (map[string]any, error) {
		// Pure transform: LLM input → device args.
		return in, nil
	},
	ResultMapping: func(r map[string]any) (any, error) {
		// Pure transform: device result → LLM output.
		return r, nil
	},
	RequiresApproval: false, // user-scope
})
```
Then add the tool name to `cfg.DeviceMesh.ToolsAllowed` in the agent's `config.yaml`.
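To show what the `InputSchema` above buys, here is a minimal validator sketch covering only the keywords the `screenshot` schema uses (`required`, `additionalProperties`, string `type`, `enum`). It is an illustration under those assumptions, not the code in `schema.go`; `Validate` and `screenshotSchema` are hypothetical names.

```go
package main

import "fmt"

// screenshotSchema reproduces the InputSchema from the example above.
func screenshotSchema() map[string]any {
	return map[string]any{
		"type":                 "object",
		"additionalProperties": false,
		"properties": map[string]any{
			"format": map[string]any{"type": "string", "enum": []any{"png", "jpeg"}},
		},
	}
}

// Validate checks input against a small subset of JSON Schema.
func Validate(schema, input map[string]any) error {
	props, _ := schema["properties"].(map[string]any)
	if req, ok := schema["required"].([]any); ok {
		for _, r := range req {
			name, _ := r.(string)
			if _, present := input[name]; !present {
				return fmt.Errorf("missing required property %q", name)
			}
		}
	}
	for key, val := range input {
		raw, known := props[key]
		if !known {
			// Unknown keys are rejected only when the schema forbids them.
			if allow, ok := schema["additionalProperties"].(bool); ok && !allow {
				return fmt.Errorf("unexpected property %q", key)
			}
			continue
		}
		prop, _ := raw.(map[string]any)
		if prop["type"] == "string" {
			s, ok := val.(string)
			if !ok {
				return fmt.Errorf("property %q must be a string", key)
			}
			if enum, ok := prop["enum"].([]any); ok && !contains(enum, s) {
				return fmt.Errorf("property %q: %q is not an allowed value", key, s)
			}
		}
	}
	return nil
}

func contains(list []any, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func main() {
	schema := screenshotSchema()
	fmt.Println(Validate(schema, map[string]any{"format": "png"}))
	fmt.Println(Validate(schema, map[string]any{"format": "gif"}))
	fmt.Println(Validate(schema, map[string]any{"quality": 90}))
}
```

The first call passes; the other two fail on the enum and on the unexpected property respectively, which is the kind of LLM mistake the validator is meant to catch before any network call.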
## Wiring (issue 0144c — done)
The launcher now constructs the device mesh registry from `cfg.DeviceMesh` and surfaces every spec as a regular `tools.Tool` consumed by the existing LLM tool-use loop. No special LLM path; the LLM does not know (or care) that the tool's `Exec` ends up making an HTTP call over WireGuard.
```
config.AgentConfig.DeviceMesh (yaml block)
▼ buildDeviceMeshRegistry(cfg, logger) ← devagents/registry_build.go
│ 1. resolve URL (env var override wins when present + non-empty)
│ 2. NewClient(url) + apply Timeout
│ 3. RegisterBuiltins(reg, mode) ← user | sudo | all
│ 4. FilterByAllowed(reg, tools_allowed)
▼ devicemesh.ToolsForLLM(reg) ← pkg/tools/devicemesh/adapter.go
│ 1 tools.Tool per spec; Def.Parameters
│ compressed from JSON-Schema; Exec
│ closure routes through reg.Call
▼ tools.Registry.Register(...) ← devagents/registry_build.go
▼ devagents/llm.go runLLM tool-use loop ← unchanged
```
The same `*ToolRegistry` is also passed to `effects.NewRunnerWithDeviceMesh` so any rule that emits `decision.ActionKindDeviceMesh` (orchestrator pipelines, `!exec` builtin command, etc.) hits the same dispatcher. Both paths produce the same JSON envelope, so audit chains line up regardless of where the call originated.
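Step 1 of `buildDeviceMeshRegistry` (the env var override) can be sketched as a pure function. `ResolveURL` is a hypothetical name, and injecting the lookup function is a choice made here for testability; the real builder may simply call `os.Getenv`.

```go
package main

import (
	"fmt"
	"os"
)

// ResolveURL returns the value of envName when that variable is set and
// non-empty; otherwise it falls back to the configured URL.
func ResolveURL(configured, envName string, getenv func(string) string) string {
	if envName != "" {
		if v := getenv(envName); v != "" {
			return v
		}
	}
	return configured
}

func main() {
	url := ResolveURL("http://10.42.0.10:7474", "AGENT_HOME_WSL_DEVICE_MESH_URL", os.Getenv)
	fmt.Println("device_agent url:", url)
}
```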
### Config block
The agent's `config.yaml` opts in via:
```yaml
device_mesh:
  enabled: true
  device_id: home-wsl            # logged as audit context; aliased as "host"
  mode: user                     # user | sudo | all
  device_agent_url: "http://10.42.0.10:7474"
  device_agent_url_env: AGENT_HOME_WSL_DEVICE_MESH_URL  # optional; wins when set + non-empty
  manifest_id: manifest_home-wsl_v1                      # metadata only; the device enforces
  client_timeout_s: 60           # aliased as "timeout_seconds"
  tools_allowed:                 # whitelist; empty = keep everything mode allowed
    - exec
    - fs.read
    - fs.list
```
Names in `tools_allowed` that the catalog does not provide are logged with a `WARN device_mesh tools_allowed lists unknown tool` and dropped. The template ships extras like `project.create`, `memory.recall`, etc. that arrive in 0144d/e — they degrade gracefully today.
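The whitelist behaviour described above can be sketched over plain name lists. `FilterAllowed` is a hypothetical helper that works on strings rather than on the real `*ToolRegistry`, and the warning text only approximates the logged message.

```go
package main

import "fmt"

// FilterAllowed keeps the registered names that appear in allowed. An empty
// allowed list keeps everything. Names in allowed that are not registered are
// returned separately so the caller can warn about them and drop them.
func FilterAllowed(registered, allowed []string) (kept, unknown []string) {
	if len(allowed) == 0 {
		return append([]string(nil), registered...), nil
	}
	have := make(map[string]bool, len(registered))
	for _, name := range registered {
		have[name] = true
	}
	for _, name := range allowed {
		if have[name] {
			kept = append(kept, name)
		} else {
			unknown = append(unknown, name)
		}
	}
	return kept, unknown
}

func main() {
	registered := []string{"exec", "fs.read", "fs.list", "fs.write"}
	allowed := []string{"exec", "fs.read", "project.create"}
	kept, unknown := FilterAllowed(registered, allowed)
	for _, name := range unknown {
		fmt.Printf("WARN device_mesh tools_allowed lists unknown tool: %s\n", name)
	}
	fmt.Println("kept:", kept)
}
```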
### LLM-side view of a device tool
The adapter compresses the device-mesh `InputSchema` into the flatter `tools.Def.Parameters` shape (each top-level property becomes one `tools.Param`). The description is enriched with a stable marker so the model can spot remote tools at a glance:
```
exec → "Execute a command on the remote device. argv is parsed as exec.Command (NO shell). ... [device_mesh: shell.exec]"
pkg.install → "Install an OS package ... [device_mesh: pkg.install] (approval required)"
```
When `RequiresApproval=true`, the marker also reminds the model the call may be queued, which feeds back into the system prompt rules of `agent-<host>-sudo`.
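A minimal sketch of how such a marker could be appended, inferred only from the two example descriptions above; `Describe` is a hypothetical name and the adapter's actual formatting code may differ.

```go
package main

import "fmt"

// Describe appends the device-mesh marker, plus an approval hint when the
// tool requires approval.
func Describe(description, capability string, requiresApproval bool) string {
	out := fmt.Sprintf("%s [device_mesh: %s]", description, capability)
	if requiresApproval {
		out += " (approval required)"
	}
	return out
}

func main() {
	fmt.Println(Describe("Install an OS package", "pkg.install", true))
	fmt.Println(Describe("Execute a command on the remote device", "shell.exec", false))
}
```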
### Approval flow + LLM tool-result mapping
When the device_agent returns `approval_status="queued"` and the operator does not click 👍 within the timeout (0134 §6.5), the device returns `approval_status="timeout"` or `ok=false, error="approval_required"`. The adapter does NOT silence this — it surfaces the error verbatim:
```
ToolRegistry.Call(...) → returns err = "devicemesh: shell.exec: approval_required"
tools.Result{Err: err}
runLLM → appends `role='tool'` message with `error: devicemesh: shell.exec: approval_required`
LLM next iteration → can apologize to operator and ask for retry.
```
The actual approval UX (operator clicks 👍 in `#operator-approvals`) is the device_agent's responsibility (issue 0134 §6, validated end-to-end in flow 0009). Nothing new on the agents_and_robots side.
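The error surfacing above can be sketched as follows. `Response`, `ResultOrError`, and `ToolMessage` are hypothetical names; the `devicemesh: <capability>: <error>` format comes from the trace above, while the wording used for the `approval_status="timeout"` case is an assumption of this sketch.

```go
package main

import "fmt"

// Response models only the envelope fields mentioned in this README.
type Response struct {
	OK             bool
	Error          string
	ApprovalStatus string
	Result         map[string]any
}

// ResultOrError turns a failed or timed-out response into a Go error
// instead of hiding it from the caller.
func ResultOrError(capability string, resp Response) (map[string]any, error) {
	if !resp.OK {
		return nil, fmt.Errorf("devicemesh: %s: %s", capability, resp.Error)
	}
	if resp.ApprovalStatus == "timeout" {
		// Assumption: a timed-out approval is reported as an error as well.
		return nil, fmt.Errorf("devicemesh: %s: approval timeout", capability)
	}
	return resp.Result, nil
}

// ToolMessage renders the error text placed in the role='tool' message.
func ToolMessage(err error) string {
	return "error: " + err.Error()
}

func main() {
	_, err := ResultOrError("shell.exec", Response{OK: false, Error: "approval_required"})
	fmt.Println(ToolMessage(err))
}
```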
### What this issue does NOT do
- **Matrix-side approval rendering** is 0144f — `!preapprove`, `!approve req_id`, pre-approval cache.
- **ed25519 manifest signing** is 0144h — today the wire format is correct but unsigned.
- **`call_monitor` telemetry hook** that emits `function_id = capability_<name>_<lang>_<domain>` per call is 0144 §13 (separate plumbing in the audit writer).
- **Cross-room correlation** (`delegate_sudo` posting to `#<host>-sudo` and the bot copying the reply back) is its own issue (0144 main spec §3.3 + 0144c original plan — left intentionally for the room/bus layer once approval is wired).
## shell.eval — the powerful tool
`shell.eval` is the **only** built-in tool that lets the LLM execute arbitrary free-form shell text on the device. Every other tool has a tightly-scoped JSON schema (paths, argv lists, container ids); `shell.eval` accepts a single string that the device hands to bash (Linux/WSL) or PowerShell (Windows) unmodified.
It exists because no structured tool can cover every legal shell idiom: pipes, redirects, here-docs, `$()` expansions, complex globs, environment-aware composition. Without `shell.eval`, the LLM resorts to multi-step `exec` chains that lose fidelity (no shell metacharacters allowed in `exec`'s `argv`). With it, the LLM can ask for "give me the size of every `.log` in `/var/log` sorted desc" in one round-trip.
### Guardrails (all device-side)
The flag on `ToolSpec.RequiresApproval` is metadata only. The real protections live in the `device_agent`:
1. **Hardcoded blocklist**: destructive patterns (`rm -rf /`, `dd if=/dev/...`, `mkfs`, fork-bombs `:(){:|:&};:`, `shutdown`, `reboot`, `:>/dev/sda`, ...) are always rejected, regardless of agent role or operator. There is no override.
2. **Auto-approve whitelist**: read-only / inspection patterns (`^git `, `^ls `, `^cat `, `^grep `, `^ps `, `^uptime`, `^df `, ...) execute directly without an operator prompt. The whitelist lives in the device manifest, not here.
3. **Operator approval**: anything that is neither blocked nor auto-approved returns `approval_status="queued"` in the result. The device sends an approval request to `#operator-approvals` in Element and waits up to 60s for the operator to confirm; on timeout the call returns `approval_status="timeout"` and the LLM must reword or `!retry`.

The fields the LLM gets back from `shell.eval` are: `stdout`, `stderr`, `exit_code`, `approval_status`, `cmd_executed` (post-normalization), `truncated` (true if output was capped), `duration_ms`.
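The order of the three guardrails (blocklist first, then whitelist, then operator approval) can be illustrated with a small classifier. This is a hypothetical sketch, not the `device_agent` implementation: `Classify`, `Decision`, and the two default lists are invented here, the lists are small subsets of the patterns named above, and the blocklist check is a naive substring match.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type Decision string

const (
	Rejected     Decision = "rejected"
	AutoApproved Decision = "auto_approved"
	Queued       Decision = "queued"
)

// Illustrative subsets only; the real lists live in the device_agent and
// its manifest.
var defaultBlocklist = []string{"rm -rf /", "mkfs", "shutdown", "reboot"}

var defaultAllow = []*regexp.Regexp{
	regexp.MustCompile(`^git `),
	regexp.MustCompile(`^ls `),
	regexp.MustCompile(`^cat `),
	regexp.MustCompile(`^uptime`),
	regexp.MustCompile(`^df `),
}

// Classify checks the blocklist before the whitelist, so a command that
// starts with an allowed prefix but contains a blocked pattern is rejected.
func Classify(cmd string, blocklist []string, allow []*regexp.Regexp) Decision {
	for _, bad := range blocklist {
		if strings.Contains(cmd, bad) {
			return Rejected
		}
	}
	for _, re := range allow {
		if re.MatchString(cmd) {
			return AutoApproved
		}
	}
	return Queued
}

func main() {
	for _, cmd := range []string{"ls /tmp", "cat notes.txt; rm -rf /", "apt-get install jq"} {
		fmt.Printf("%-28q %s\n", cmd, Classify(cmd, defaultBlocklist, defaultAllow))
	}
}
```

Checking the blocklist first matters: the second command begins with the whitelisted `cat ` prefix yet is still rejected.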
### When the LLM should call shell.eval
Use it as the **fallback** for cases none of the structured tools cover:
- Pipes, redirects, sub-shells, here-docs.
- One-liners that combine `find` + `xargs` + `awk`.
- Quick sanity checks (`uptime && df -h`).
- Composing CLI tools that the agent calls too rarely to warrant a dedicated tool spec.

Avoid it for things that *do* have a structured tool: `fs.read`, `fs.list`, `git.commit`, `docker.exec`, etc. Those have predictable JSON shapes, narrower attack surface, and richer result mapping.
### Designing manifests for user vs sudo agents
`RegisterBuiltins` registers `shell.eval` in **both** `ModeUser` and `ModeSudo` because the device_agent — not the registry — decides what is safe. Recommended manifest defaults:
| Agent role | `RequiresApproval` (LLM-facing metadata) | Device manifest |
|---|---|---|
| `agent-<host>` (user) | `false` | Auto-approve whitelist + operator approval for anything else. Hardcoded blocklist active. |
| `agent-<host>-sudo` (sudo) | `true` (forced via `withApprovalRequired`) | **Every** invocation requires explicit operator approval. No auto-approve whitelist. Hardcoded blocklist active. |

The `withApprovalRequired` helper clones the spec returned by `shellEvalSpec()` and flips `RequiresApproval=true` without mutating the source, so `ModeUser` registries that re-register after a `ModeSudo` run still get the unmodified spec. See `tools_builtin.go::RegisterBuiltins` for the special-case wiring.

See also: `apps/device_agent/` (where the blocklist + auto-approve whitelist + approval flow live) and issue 0144 §6.4 for the RBAC design.
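A sketch of the clone-and-flip idea and the mode filter, under stated assumptions: `ModeUser` is taken to keep tools that do not require approval and `ModeSudo` those that do, `Builtins` and `include` are hypothetical names, the two-entry catalog is illustrative, and `ToolSpec` is reduced to two fields. With a real spec, passing the struct by value gives only a shallow copy, so map-valued fields such as `InputSchema` would still be shared.

```go
package main

import "fmt"

type ToolSpec struct {
	Name             string
	RequiresApproval bool
}

type Mode int

const (
	ModeUser Mode = iota
	ModeSudo
	ModeAll
)

func shellEvalSpec() ToolSpec { return ToolSpec{Name: "shell.eval"} }

// withApprovalRequired receives the spec by value, so flipping the flag
// changes only the returned copy, never the caller's spec.
func withApprovalRequired(spec ToolSpec) ToolSpec {
	spec.RequiresApproval = true
	return spec
}

// include reports whether a spec belongs in the given mode (assumed semantics).
func include(spec ToolSpec, mode Mode) bool {
	switch mode {
	case ModeUser:
		return !spec.RequiresApproval
	case ModeSudo:
		return spec.RequiresApproval
	default:
		return true
	}
}

// Builtins filters a tiny illustrative catalog, then always adds shell.eval,
// forcing approval in ModeSudo.
func Builtins(mode Mode) []ToolSpec {
	catalog := []ToolSpec{
		{Name: "fs.read"},
		{Name: "pkg.install", RequiresApproval: true},
	}
	var out []ToolSpec
	for _, spec := range catalog {
		if include(spec, mode) {
			out = append(out, spec)
		}
	}
	eval := shellEvalSpec()
	if mode == ModeSudo {
		eval = withApprovalRequired(eval)
	}
	return append(out, eval)
}

func main() {
	fmt.Println("user:", Builtins(ModeUser))
	fmt.Println("sudo:", Builtins(ModeSudo))
}
```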
## POC limitations (intentional)
These are out of scope for 0144a and tracked in sibling issues:
- **No retry**. A single `Call` failure surfaces immediately. The spec accepts this: tool failures go back to the LLM as a `role='tool'` error message and the LLM decides what to do (issue 0144 §7.1, operating rule 2).
- **No pre-approval cache**. `RequiresApproval` is metadata only; the actual gate lives on the device_agent (0144 §3) and the pre-approvals table (0144f).
- **No streaming**. Tools are request/response. Long-running commands (`apt-get install` of a 200MB package) block until done or timeout. Streaming for logs is its own future issue.
- **No exponential backoff**. The Go HTTP client's transport defaults apply (TCP retries on connect, no per-request retry).
- **No output sanitization**. The Runner formats the result as JSON; sanitization against prompt-injection payloads is 0144g.
- **No telemetry to `call_monitor`**. The hook for `function_id = capability_<name>_<lang>_<domain>` is part of the agent runtime wiring (0144c) — this package emits no metrics on its own.
- **No manifest signing on the request side**. The Client envelope matches the 0134 §2.1 wire format but does NOT sign; manifest signing arrives in 0144h.
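To make the "no retry, no backoff" behaviour concrete, here is a sketch of a single-attempt `POST /capability` with a client-side timeout, run against an in-process fake. `CallOnce` and `newFakeAgent` are hypothetical names, and the JSON key names (`capability`, `args`, `ok`, `result`) are assumptions based on the fields this README mentions, not a confirmed wire format.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// CallOnce POSTs to <baseURL>/capability exactly once. Any transport error,
// timeout, or non-200 status is returned to the caller immediately.
func CallOnce(baseURL, capability string, args map[string]any, timeoutSeconds int) (map[string]any, error) {
	body, err := json.Marshal(map[string]any{"capability": capability, "args": args})
	if err != nil {
		return nil, err
	}
	client := &http.Client{Timeout: time.Duration(timeoutSeconds) * time.Second}
	resp, err := client.Post(baseURL+"/capability", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("device_agent returned %s", resp.Status)
	}
	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}

// newFakeAgent is an in-process stand-in for device_agent with a canned reply.
func newFakeAgent() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/capability" || r.Method != http.MethodPost {
			http.Error(w, "not found", http.StatusNotFound)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"ok":true,"result":{"stdout":"hi"}}`)
	}))
}

func main() {
	srv := newFakeAgent()
	defer srv.Close()
	out, err := CallOnce(srv.URL, "shell.exec", map[string]any{"argv": []any{"echo", "hi"}}, 60)
	fmt.Println(out, err)
}
```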
## Why these specific design choices
- `Args map[string]any` (object) NOT `[]string` (positional). The current `device_agent` POC uses `[]string` for `shell.exec` (see `apps/device_agent/capability.go`). The 0134 protocol and 0144 spec call for object-shaped args because most capabilities (`fs.read`, `git.clone`, `docker.exec`) are not naturally positional. 0144h migrates the device_agent.
- `ResultMapping` returns `any` instead of `map[string]any`. Some tools (e.g. the test's `echo` example) collapse their output to a string. The Runner JSON-encodes whatever comes back so the LLM always sees a stable representation.
- `Capability` is a field on `ToolSpec`, not derived from `Name`. The 1:1 mapping is the common case (`fs.read` → `fs.read`), but `docker.list` maps to `docker.container.list`, and `project.create` (future) composes multiple capabilities, so the indirection pays for itself.
- Pure/impure split inside one package. `ToolSpec`, schema, mappings, registry are pure data and pure functions. Only `Client.Call` and `Client.Health` do I/O. The runtime composes them; tests substitute the Client.
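The last two points can be illustrated together: a test swaps the I/O boundary for a fake, and a result mapping that collapses to a string is still JSON-encoded for the LLM. `Caller`, `fakeClient`, and `RunEcho` are hypothetical names, and the real `Client.Call` signature may differ (for example, it may take a context).

```go
package main

import (
	"encoding/json"
	"fmt"
)

type CapabilityRequest struct {
	Capability string
	Args       map[string]any
}

type CapabilityResponse struct {
	Result map[string]any
}

// Caller is the only impure boundary in this sketch.
type Caller interface {
	Call(req CapabilityRequest) (CapabilityResponse, error)
}

// fakeClient records requests and returns a canned result; no network I/O.
type fakeClient struct {
	reply map[string]any
	seen  []CapabilityRequest
}

func (f *fakeClient) Call(req CapabilityRequest) (CapabilityResponse, error) {
	f.seen = append(f.seen, req)
	return CapabilityResponse{Result: f.reply}, nil
}

// RunEcho builds the request, calls the boundary, and collapses the result
// map to its stdout value, mimicking a ResultMapping that returns any.
func RunEcho(c Caller, text string) (any, error) {
	resp, err := c.Call(CapabilityRequest{
		Capability: "shell.exec",
		Args:       map[string]any{"argv": []any{"echo", text}},
	})
	if err != nil {
		return nil, err
	}
	return resp.Result["stdout"], nil
}

func main() {
	fake := &fakeClient{reply: map[string]any{"stdout": "hello"}}
	out, err := RunEcho(fake, "hello")
	if err != nil {
		panic(err)
	}
	encoded, err := json.Marshal(out) // what the LLM would see
	if err != nil {
		panic(err)
	}
	fmt.Println(string(encoded), len(fake.seen))
}
```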