# pkg/tools/devicemesh

Tool registry framework that lets an LLM agent in `agents_and_robots` (VPS) call capabilities exposed by a remote `device_agent` over the WireGuard mesh.

Issue: [0144a](../../../dev/issues/0144-agent-per-machine-llm.md) (POC for the broader 0144 spec).

## What it does

```
LLM (Claude)
  │ tool_call exec {argv:["ls","/tmp"]}
  ▼
ToolRegistry.Call("exec", input)
  │ 1. ValidateInput against tool's InputSchema
  │ 2. ArgMapping(input) → device-facing args
  │ 3. Client.Call(CapabilityRequest{capability: "shell.exec", args})
  │ 4. ResultMapping(resp.Result) → LLM-facing output
  ▼
HTTP POST http://10.42.0.10:7474/capability   (over mesh WG)
  ▼
device_agent on home-wsl runs the binary, returns audit_hash + result
```

The LLM never sees the HTTP layer; it sees a flat list of named tools with JSON-Schema inputs.

## Pieces

| File | Purpose |
|---|---|
| `client.go` | HTTP client for `POST /capability` and `GET /health` on the remote `device_agent`. Generates `request_id` (`req_<12bytehex>`) and `nonce` (16 random bytes, base64) when they are missing. |
| `types.go` | `ToolSpec` + `ToolRegistry`. Thread-safe registry; `Call` is the single dispatch entry point. |
| `schema.go` | Mini JSON-Schema validator (object/array/string/integer/number/boolean + required + additionalProperties + enum). Enough to reject LLM mistakes without pulling in a heavy dependency. |
| `tools_builtin.go` | The standard catalog: `exec`, `shell.eval`, `fs.read`, `fs.write`, `fs.list`, `fs.stat`, `git.clone`, `git.commit`, `git.push`, `pkg.install`, `pkg.search`, `proc.list`, `proc.kill`, `docker.list`, `docker.exec`, `docker.logs`. `RegisterBuiltins(reg, mode)`, with mode one of `ModeUser`, `ModeSudo`, or `ModeAll`, filters by `RequiresApproval`. `shell.eval` is special-cased: it is registered in BOTH modes, with `RequiresApproval=true` forced in `ModeSudo` via `withApprovalRequired`. |
## How to register a new tool

```go
import "github.com/enmanuel/agents/pkg/tools/devicemesh"

// registerScreenshot adds a custom tool to an existing registry
// (the function name is illustrative).
func registerScreenshot(reg *devicemesh.ToolRegistry) {
	reg.Register(devicemesh.ToolSpec{
		Name:        "screenshot",
		Description: "Capture the display on the remote device. Returns the image as base64 (PNG or JPEG, per format).",
		Capability:  "display.capture",
		InputSchema: map[string]any{
			"type":                 "object",
			"additionalProperties": false,
			"properties": map[string]any{
				"format": map[string]any{"type": "string", "enum": []any{"png", "jpeg"}},
			},
		},
		ArgMapping: func(in map[string]any) (map[string]any, error) {
			// pure transform LLM → device
			return in, nil
		},
		ResultMapping: func(r map[string]any) (any, error) {
			// pure transform device → LLM
			return r, nil
		},
		RequiresApproval: false, // user-scope
	})
}
```

Then add the tool name to `device_mesh.tools_allowed` in the agent's `config.yaml` (`cfg.DeviceMesh.ToolsAllowed` on the Go side).

## Wiring (issue 0144c, done)

The launcher now builds the device mesh registry from `cfg.DeviceMesh` and exposes every spec as a regular `tools.Tool` consumed by the existing LLM tool-use loop. There is no special LLM path: the LLM does not know (or care) that the tool's `Exec` ends up making an HTTP call over WireGuard.

```
config.AgentConfig.DeviceMesh (yaml block)
   │
   ▼
buildDeviceMeshRegistry(cfg, logger)      ← devagents/registry_build.go
   │ 1. resolve URL (env var override wins when present + non-empty)
   │ 2. NewClient(url) + apply Timeout
   │ 3. RegisterBuiltins(reg, mode)        ← user | sudo | all
   │ 4. FilterByAllowed(reg, tools_allowed)
   │
   ▼
devicemesh.ToolsForLLM(reg)               ← pkg/tools/devicemesh/adapter.go
   │ 1 tools.Tool per spec; Def.Parameters
   │ compressed from JSON-Schema; Exec
   │ closure routes through reg.Call
   │
   ▼
tools.Registry.Register(...)              ← devagents/registry_build.go
   │
   ▼
devagents/llm.go runLLM tool-use loop     ← unchanged
```

The same `*ToolRegistry` is also passed to `effects.NewRunnerWithDeviceMesh`, so any rule that emits `decision.ActionKindDeviceMesh` (orchestrator pipelines, the `!exec` builtin command, etc.) hits the same dispatcher.
Both paths produce the same JSON envelope, so audit chains line up regardless of where the call originated.

### Config block

The agent's `config.yaml` opts in via:

```yaml
device_mesh:
  enabled: true
  device_id: home-wsl                 # logged as audit context; aliased as "host"
  mode: user                          # user | sudo | all
  device_agent_url: "http://10.42.0.10:7474"
  device_agent_url_env: AGENT_HOME_WSL_DEVICE_MESH_URL   # optional; wins when set + non-empty
  manifest_id: manifest_home-wsl_v1   # metadata only; enforcement happens on the device
  client_timeout_s: 60                # aliased as "timeout_seconds"
  tools_allowed:                      # whitelist; empty = keep everything the mode allows
    - exec
    - fs.read
    - fs.list
```

Names in `tools_allowed` that the catalog does not provide are logged with `WARN device_mesh tools_allowed lists unknown tool` and dropped. The template ships extras such as `project.create` and `memory.recall` that arrive in 0144d/e; today they degrade gracefully.

### LLM-side view of a device tool

The adapter compresses the device mesh `InputSchema` into the flatter `tools.Def.Parameters` shape (each top-level property becomes one `tools.Param`). The description is enriched with a stable marker so the model can spot remote tools at a glance:

```
exec        → "Execute a command on the remote device. argv is parsed as exec.Command (NO shell). ... [device_mesh: shell.exec]"
pkg.install → "Install an OS package ... [device_mesh: pkg.install] (approval required)"
```

When `RequiresApproval=true`, the marker also reminds the model that the call may be queued, which feeds into the system prompt rules of `agent--sudo`.

### Approval flow + LLM tool-result mapping

When the device_agent queues a call (`approval_status="queued"`) and the operator does not click 👍 within the timeout (0134 §6.5), the device returns either `approval_status="timeout"` or `ok=false, error="approval_required"`. The adapter does NOT silence this; it surfaces the error verbatim:

```
ToolRegistry.Call(...)
  → returns err = "devicemesh: shell.exec: approval_required"
tools.Result{Err: err}
runLLM
  → appends a role='tool' message with error: devicemesh: shell.exec: approval_required
LLM next iteration
  → can apologize to the operator and ask for a retry.
```

The actual approval UX (operator clicks 👍 in `#operator-approvals`) is the device_agent's responsibility (issue 0134 §6, validated end-to-end in flow 0009). Nothing new is needed on the agents_and_robots side.

### What this issue does NOT do

- **Matrix-side approval rendering** is 0144f: `!preapprove`, `!approve req_id`, pre-approval cache.
- **ed25519 manifest signing** is 0144h; today the wire format is correct but unsigned.
- **`call_monitor` telemetry hook** that emits `function_id = capability___` per call is 0144 §13 (separate plumbing in the audit writer).
- **Cross-room correlation** (`delegate_sudo` posting to `#-sudo` and the bot copying the reply back) is its own issue (0144 main spec §3.3 + the original 0144c plan), intentionally left to the room/bus layer once approval is wired.

## shell.eval: the powerful tool

`shell.eval` is the **only** built-in tool that lets the LLM execute arbitrary free-form shell text on the device. Every other tool has a tightly scoped JSON schema (paths, argv lists, container ids); `shell.eval` accepts a single string that the device hands unmodified to bash (Linux/WSL) or PowerShell (Windows).

It exists because no structured tool can cover every legal shell idiom: pipes, redirects, here-docs, `$()` expansions, complex globs, environment-aware composition. Without `shell.eval`, the LLM resorts to multi-step `exec` chains that lose fidelity (no shell metacharacters are allowed in `exec`'s `argv`). With it, a request like "give me the size of every `.log` in `/var/log`, sorted descending" takes one round-trip.

### Guardrails (all device-side)

The `ToolSpec.RequiresApproval` flag is metadata only. The real protections live in the `device_agent`:

1. **Hardcoded blocklist**: destructive patterns (`rm -rf /`, `dd if=/dev/...`, `mkfs`, fork bombs `:(){:|:&};:`, `shutdown`, `reboot`, `:>/dev/sda`, ...) are always rejected regardless of agent role or operator. There is no override.
2. **Auto-approve whitelist**: read-only / inspection patterns (`^git `, `^ls `, `^cat `, `^grep `, `^ps `, `^uptime`, `^df `, ...) execute directly without an operator prompt. The whitelist lives in the device manifest, not here.
3. **Operator approval**: anything that is neither blocked nor auto-approved returns `approval_status="queued"` in the result. The device sends an approval request to `#operator-approvals` in Element and waits up to 60s for the operator to confirm; on timeout the call returns `approval_status="timeout"` and the LLM must reword or `!retry`.

The fields the LLM gets back from `shell.eval` are `stdout`, `stderr`, `exit_code`, `approval_status`, `cmd_executed` (post-normalization), `truncated` (true if output was capped), and `duration_ms`.

### When the LLM should call shell.eval

Use it as the **fallback** for cases none of the structured tools cover:

- Pipes, redirects, sub-shells, here-docs.
- One-liners that combine `find` + `xargs` + `awk`.
- Quick sanity checks (`uptime && df -h`).
- Composing CLI tools the agent won't call often enough to warrant a dedicated tool spec.

Avoid it for things that *do* have a structured tool: `fs.read`, `fs.list`, `git.commit`, `docker.exec`, etc. Those have predictable JSON shapes, a narrower attack surface, and richer result mapping.

### Designing manifests for user vs sudo agents

`RegisterBuiltins` registers `shell.eval` in **both** `ModeUser` and `ModeSudo` because the device_agent, not the registry, decides what is safe. Recommended manifest defaults:

| Agent role | `RequiresApproval` (LLM-facing metadata) | Device manifest |
|---|---|---|
| `agent-` (user) | `false` | Auto-approve whitelist + operator approval for anything else. Hardcoded blocklist active. |
| `agent--sudo` (sudo) | `true` (forced via `withApprovalRequired`) | **Every** invocation requires explicit operator approval. No auto-approve whitelist. Hardcoded blocklist active. |

The `withApprovalRequired` helper clones the spec returned by `shellEvalSpec()` and flips `RequiresApproval=true` without mutating the source, so `ModeUser` registries that re-register after a `ModeSudo` run still get the unmodified spec. See `tools_builtin.go::RegisterBuiltins` for the special-case wiring.

See also: `apps/device_agent/` (where the blocklist, auto-approve whitelist, and approval flow live) and issue 0144 §6.4 for the RBAC design.

## POC limitations (intentional)

These are out of scope for 0144a and tracked in sibling issues:

- **No retry.** A single `Call` failure surfaces immediately. The spec accepts this: tool failures go back to the LLM as a `role='tool'` error message and the LLM decides what to do (issue 0144 §7.1, operating rule 2).
- **No pre-approval cache.** `RequiresApproval` is metadata only; the actual gate lives on the device_agent (0144 §3) and the pre-approvals table (0144f).
- **No streaming.** Tools are request/response. Long-running commands (e.g. `apt-get install` of a 200MB package) block until they finish or time out. Streaming for logs is its own future issue.
- **No exponential backoff.** The Go HTTP client's transport defaults apply (TCP retries on connect, no per-request retry).
- **No output sanitization.** The Runner formats the result as JSON; sanitization against prompt-injection payloads is 0144g.
- **No telemetry to `call_monitor`.** The hook for `function_id = capability___` is part of the agent runtime wiring (0144c); this package emits no metrics on its own.
- **No manifest signing on the request side.** The Client envelope matches the 0134 §2.1 wire format but does NOT sign; manifest signing arrives in 0144h.

## Why these specific design choices

- `Args map[string]any` (object), NOT `[]string` (positional).
  The current `device_agent` POC uses `[]string` for `shell.exec` (see `apps/device_agent/capability.go`). The 0134 protocol and the 0144 spec call for object-shaped args because most capabilities (`fs.read`, `git.clone`, `docker.exec`) are not naturally positional. 0144h migrates the device_agent.
- `ResultMapping` returns `any` instead of `map[string]any`. Some tools (e.g. the test's `echo` example) collapse their output to a string. The Runner JSON-encodes whatever comes back, so the LLM always sees a stable representation.
- `Capability` is a field on `ToolSpec`, not derived from `Name`. The 1:1 mapping is the common case (`fs.read` → `fs.read`), but `docker.list` maps to `docker.container.list`, and `project.create` (future) composes multiple capabilities, so the indirection pays for itself.
- Pure/impure split inside one package. `ToolSpec`, the schema, the mappings, and the registry are pure data and pure functions. Only `Client.Call` and `Client.Health` do I/O. The runtime composes them; tests substitute the Client.