# pkg/tools/devicemesh

Tool registry framework that lets an LLM agent in `agents_and_robots` (VPS) call capabilities exposed by a remote `device_agent` over the WireGuard mesh.

Issue: [0144a](../../../dev/issues/0144-agent-per-machine-llm.md) (POC for the broader 0144 spec).

## What it does

```
LLM (Claude)
   │  tool_call exec {argv:["ls","/tmp"]}
   ▼
ToolRegistry.Call("exec", input)
   │  1. ValidateInput against tool's InputSchema
   │  2. ArgMapping(input) → device-facing args
   │  3. Client.Call(CapabilityRequest{capability: "shell.exec", args})
   │  4. ResultMapping(resp.Result) → LLM-facing output
   ▼
HTTP POST http://10.42.0.10:7474/capability   (over the WireGuard mesh)
   ▼
device_agent on home-wsl runs the binary, returns audit_hash + result
```

The LLM never sees the HTTP layer; it sees a flat list of named tools with JSON-Schema inputs.

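The four steps in the diagram fit in a single `Call` method. This is a minimal, self-contained sketch, not the package's code: `Caller`, `fakeDevice`, `NewToolRegistry` and the `Validate` field (standing in for `InputSchema` validation) are hypothetical simplifications. Errors are prefixed `devicemesh: <capability>:` to match the error shape quoted in the approval-flow section.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Caller abstracts the device client (hypothetical; the real type is Client).
type Caller interface {
	Call(capability string, args map[string]any) (map[string]any, error)
}

// ToolSpec is a simplified version of the spec described in this README.
type ToolSpec struct {
	Name          string
	Capability    string
	Validate      func(map[string]any) error // stand-in for InputSchema validation
	ArgMapping    func(map[string]any) (map[string]any, error)
	ResultMapping func(map[string]any) (any, error)
}

// ToolRegistry is safe for concurrent use; Call is the only dispatch path.
type ToolRegistry struct {
	mu     sync.RWMutex
	specs  map[string]ToolSpec
	client Caller
}

func NewToolRegistry(c Caller) *ToolRegistry {
	return &ToolRegistry{specs: map[string]ToolSpec{}, client: c}
}

func (r *ToolRegistry) Register(s ToolSpec) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.specs[s.Name] = s
}

// Call runs the diagram's steps: validate, map args, dispatch, map result.
func (r *ToolRegistry) Call(name string, input map[string]any) (any, error) {
	r.mu.RLock()
	spec, ok := r.specs[name]
	r.mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("devicemesh: unknown tool %q", name)
	}
	if spec.Validate != nil {
		if err := spec.Validate(input); err != nil {
			return nil, fmt.Errorf("devicemesh: %s: invalid input: %w", name, err)
		}
	}
	args := input
	if spec.ArgMapping != nil {
		mapped, err := spec.ArgMapping(input)
		if err != nil {
			return nil, fmt.Errorf("devicemesh: %s: %w", name, err)
		}
		args = mapped
	}
	res, err := r.client.Call(spec.Capability, args)
	if err != nil {
		return nil, fmt.Errorf("devicemesh: %s: %w", spec.Capability, err)
	}
	if spec.ResultMapping == nil {
		return res, nil
	}
	return spec.ResultMapping(res)
}

// fakeDevice stands in for device_agent: it only knows shell.exec.
type fakeDevice struct{}

func (fakeDevice) Call(capability string, args map[string]any) (map[string]any, error) {
	if capability != "shell.exec" {
		return nil, errors.New("capability_not_found")
	}
	return map[string]any{"stdout": fmt.Sprint(args["argv"]), "exit_code": 0}, nil
}

func main() {
	reg := NewToolRegistry(fakeDevice{})
	reg.Register(ToolSpec{
		Name:       "exec",
		Capability: "shell.exec",
		Validate: func(in map[string]any) error {
			if _, ok := in["argv"].([]any); !ok {
				return errors.New("argv must be an array")
			}
			return nil
		},
		ResultMapping: func(r map[string]any) (any, error) { return r["stdout"], nil },
	})
	out, err := reg.Call("exec", map[string]any{"argv": []any{"ls", "/tmp"}})
	fmt.Println(out, err)
	_, err = reg.Call("exec", map[string]any{"argv": "ls /tmp"}) // rejected before any I/O
	fmt.Println(err)
}
```

Because the client sits behind an interface, the whole pipeline can be exercised without a network, which mirrors the "tests substitute the Client" point made later in this README.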
## Pieces

| File | Purpose |
|---|---|
| `client.go` | HTTP client for `POST /capability` and `GET /health` on the remote `device_agent`. Generates `request_id` (`req_<12bytehex>`) and `nonce` (16 random bytes, base64) when missing. |
| `types.go` | `ToolSpec` + `ToolRegistry`. Thread-safe registry; `Call` is the single dispatch entry point. |
| `schema.go` | Mini JSON-Schema validator (object/array/string/integer/number/boolean + required + additionalProperties + enum). Enough to reject LLM mistakes without pulling in a heavy dependency. |
| `tools_builtin.go` | The standard catalog of 16 tools: exec, shell.eval, fs.read, fs.write, fs.list, fs.stat, git.clone, git.commit, git.push, pkg.install, pkg.search, proc.list, proc.kill, docker.list, docker.exec, docker.logs. `RegisterBuiltins(reg, ModeUser\|ModeSudo\|ModeAll)` filters by `RequiresApproval`. `shell.eval` is special-cased: it is registered in BOTH modes, with `RequiresApproval=true` forced in `ModeSudo` via `withApprovalRequired`. |

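The `request_id`/`nonce` defaulting in `client.go` might look roughly like this. It assumes `req_<12bytehex>` means 12 random bytes hex-encoded (24 characters) and that the nonce uses standard padded base64; the `CapabilityRequest` fields and the `fillIDs` helper are illustrative, not the 0134 §2.1 wire format verbatim. `crypto/rand` is used because a nonce should come from a cryptographic source.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// CapabilityRequest is a simplified envelope (field set assumed).
type CapabilityRequest struct {
	RequestID  string         `json:"request_id"`
	Nonce      string         `json:"nonce"`
	Capability string         `json:"capability"`
	Args       map[string]any `json:"args"`
}

func randomBytes(n int) ([]byte, error) {
	b := make([]byte, n)
	_, err := rand.Read(b)
	return b, err
}

// fillIDs sets request_id and nonce only when the caller left them empty.
func fillIDs(req *CapabilityRequest) error {
	if req.RequestID == "" {
		b, err := randomBytes(12)
		if err != nil {
			return err
		}
		req.RequestID = "req_" + hex.EncodeToString(b) // "req_" + 24 hex chars
	}
	if req.Nonce == "" {
		b, err := randomBytes(16)
		if err != nil {
			return err
		}
		req.Nonce = base64.StdEncoding.EncodeToString(b) // 24 chars incl. "==" padding
	}
	return nil
}

func main() {
	req := CapabilityRequest{Capability: "shell.exec", Args: map[string]any{"argv": []string{"uptime"}}}
	if err := fillIDs(&req); err != nil {
		panic(err)
	}
	fmt.Println(req.RequestID, req.Nonce)
}
```

Caller-supplied IDs are left untouched, so a retrying caller can reuse the same `request_id` if it wants idempotent correlation on the device side.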
## How to register a new tool

```go
import "github.com/enmanuel/agents/pkg/tools/devicemesh"

// reg is an existing *devicemesh.ToolRegistry.
reg.Register(devicemesh.ToolSpec{
	Name:        "screenshot",
	Description: "Capture the display on the remote device. Returns the image as base64.",
	Capability:  "display.capture",
	InputSchema: map[string]any{
		"type":                 "object",
		"additionalProperties": false,
		"properties": map[string]any{
			"format": map[string]any{"type": "string", "enum": []any{"png", "jpeg"}},
		},
	},
	ArgMapping: func(in map[string]any) (map[string]any, error) {
		// pure transform LLM → device
		return in, nil
	},
	ResultMapping: func(r map[string]any) (any, error) {
		// pure transform device → LLM
		return r, nil
	},
	RequiresApproval: false, // user-scope
})
```

If the agent's `config.yaml` sets a non-empty `device_mesh.tools_allowed` whitelist (`cfg.DeviceMesh.ToolsAllowed`), add the tool name there too; otherwise the tool is filtered out.

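To see what `ValidateInput` buys, here is a hedged sketch of a validator for a similar small JSON-Schema subset, applied to the `screenshot` schema above. `validate` is a hypothetical stand-in for `schema.go`: it covers object, array, string, integer and boolean (not `number`), and assumes `required` and `enum` arrive as `[]any`, which is what `encoding/json` produces.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// validate checks v against: type, properties, required,
// additionalProperties:false, enum and items.
func validate(schema map[string]any, v any, path string) error {
	if enum, ok := schema["enum"].([]any); ok {
		found := false
		for _, e := range enum {
			if e == v { // enums are assumed to hold scalars only
				found = true
				break
			}
		}
		if !found {
			return fmt.Errorf("%s: %v not in enum %v", path, v, enum)
		}
	}
	switch schema["type"] {
	case "object":
		obj, ok := v.(map[string]any)
		if !ok {
			return fmt.Errorf("%s: expected object", path)
		}
		req, _ := schema["required"].([]any)
		for _, r := range req {
			name, _ := r.(string)
			if _, present := obj[name]; !present {
				return fmt.Errorf("%s: missing required %q", path, name)
			}
		}
		props, _ := schema["properties"].(map[string]any)
		keys := make([]string, 0, len(obj))
		for k := range obj {
			keys = append(keys, k)
		}
		sort.Strings(keys) // deterministic error order
		for _, k := range keys {
			sub, known := props[k].(map[string]any)
			if !known {
				if schema["additionalProperties"] == false {
					return fmt.Errorf("%s: unexpected property %q", path, k)
				}
				continue
			}
			if err := validate(sub, obj[k], path+"."+k); err != nil {
				return err
			}
		}
	case "array":
		arr, ok := v.([]any)
		if !ok {
			return fmt.Errorf("%s: expected array", path)
		}
		if items, ok := schema["items"].(map[string]any); ok {
			for i, it := range arr {
				if err := validate(items, it, fmt.Sprintf("%s[%d]", path, i)); err != nil {
					return err
				}
			}
		}
	case "string":
		if _, ok := v.(string); !ok {
			return fmt.Errorf("%s: expected string", path)
		}
	case "integer":
		// encoding/json decodes every number as float64.
		if f, ok := v.(float64); !ok || f != math.Trunc(f) {
			return fmt.Errorf("%s: expected integer", path)
		}
	case "boolean":
		if _, ok := v.(bool); !ok {
			return fmt.Errorf("%s: expected boolean", path)
		}
	}
	return nil
}

func main() {
	// The InputSchema from the screenshot example.
	schema := map[string]any{
		"type":                 "object",
		"additionalProperties": false,
		"properties": map[string]any{
			"format": map[string]any{"type": "string", "enum": []any{"png", "jpeg"}},
		},
	}
	for _, in := range []map[string]any{{"format": "png"}, {"format": "gif"}, {"fmt": "png"}} {
		fmt.Println(in, "→", validate(schema, in, "input"))
	}
}
```

A misspelled key (`fmt`) or an out-of-enum value (`gif`) is rejected before any HTTP call, and the error path tells the LLM exactly which field to fix.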
## Wiring (issue 0144c, done)

The launcher now builds the device-mesh registry from `cfg.DeviceMesh` and exposes every spec as a regular `tools.Tool` consumed by the existing LLM tool-use loop. There is no special LLM path: the LLM does not know (or care) that a tool's `Exec` ends up making an HTTP call over WireGuard.

```
config.AgentConfig.DeviceMesh  (yaml block)
   │
   ▼  buildDeviceMeshRegistry(cfg, logger)     ← devagents/registry_build.go
   │    1. resolve URL (env var override wins when present + non-empty)
   │    2. NewClient(url) + apply Timeout
   │    3. RegisterBuiltins(reg, mode)          ← user | sudo | all
   │    4. FilterByAllowed(reg, tools_allowed)
   │
   ▼  devicemesh.ToolsForLLM(reg)              ← pkg/tools/devicemesh/adapter.go
   │    1 tools.Tool per spec; Def.Parameters
   │    compressed from JSON-Schema; Exec
   │    closure routes through reg.Call
   │
   ▼  tools.Registry.Register(...)             ← devagents/registry_build.go
   │
   ▼  devagents/llm.go runLLM tool-use loop    ← unchanged
```

The same `*ToolRegistry` is also passed to `effects.NewRunnerWithDeviceMesh`, so any rule that emits `decision.ActionKindDeviceMesh` (orchestrator pipelines, the `!exec` built-in command, etc.) hits the same dispatcher. Both paths produce the same JSON envelope, so audit chains line up regardless of where the call originated.

### Config block

The agent's `config.yaml` opts in via:

```yaml
device_mesh:
  enabled: true
  device_id: home-wsl                                   # logged as audit context; aliased as "host"
  mode: user                                            # user | sudo | all
  device_agent_url: "http://10.42.0.10:7474"
  device_agent_url_env: AGENT_HOME_WSL_DEVICE_MESH_URL  # optional; wins when set + non-empty
  manifest_id: manifest_home-wsl_v1                     # metadata only; enforcement happens on the device
  client_timeout_s: 60                                  # aliased as "timeout_seconds"
  tools_allowed:                                        # whitelist; empty = keep everything the mode allows
    - exec
    - fs.read
    - fs.list
```

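Step 1 of `buildDeviceMeshRegistry` (the env override wins only when set and non-empty) is small enough to sketch. `DeviceMeshConfig`, `lookupFn` and `resolveURL` are hypothetical names, and returning an error when no URL is configured is an assumption. The environment lookup is injected, with the same signature as `os.LookupEnv`, so the rule can be tested without touching the real environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// DeviceMeshConfig holds the URL-related fields of the yaml block
// (illustrative; not the real config struct).
type DeviceMeshConfig struct {
	DeviceAgentURL    string // device_agent_url
	DeviceAgentURLEnv string // device_agent_url_env
}

// lookupFn has the same signature as os.LookupEnv, so tests can inject a fake.
type lookupFn func(key string) (string, bool)

// resolveURL: the env var named by device_agent_url_env wins only when it is
// both set and non-empty; otherwise device_agent_url is used.
func resolveURL(cfg DeviceMeshConfig, lookup lookupFn) (string, error) {
	if cfg.DeviceAgentURLEnv != "" {
		if v, ok := lookup(cfg.DeviceAgentURLEnv); ok && v != "" {
			return v, nil
		}
	}
	if cfg.DeviceAgentURL == "" {
		return "", errors.New("device_mesh: no device_agent_url configured")
	}
	return cfg.DeviceAgentURL, nil
}

func main() {
	cfg := DeviceMeshConfig{
		DeviceAgentURL:    "http://10.42.0.10:7474",
		DeviceAgentURLEnv: "AGENT_HOME_WSL_DEVICE_MESH_URL",
	}
	url, err := resolveURL(cfg, os.LookupEnv)
	fmt.Println(url, err)
}
```

Treating "set but empty" like "unset" means an exported-but-blank variable in a systemd unit or shell profile cannot silently blank out the configured URL.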
Names in `tools_allowed` that the catalog does not provide are logged as `WARN device_mesh tools_allowed lists unknown tool` and dropped. The template already lists extras such as `project.create` and `memory.recall`, which arrive in 0144d/e; until then they degrade gracefully (warning + drop).

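The whitelist rule (an empty list keeps everything the mode registered; unknown names are logged and dropped) could be sketched like this. `filterByAllowed` works on a plain set of names rather than the real `*ToolRegistry`, and using `log/slog` for the warning is an assumption about the logger.

```go
package main

import (
	"fmt"
	"log/slog"
	"os"
	"sort"
)

// filterByAllowed returns the tool names to keep. An empty whitelist keeps
// everything the mode registered; names the catalog does not provide are
// logged at WARN and dropped. A nil logger falls back to slog.Default().
func filterByAllowed(registered map[string]bool, allowed []string, logger *slog.Logger) []string {
	if logger == nil {
		logger = slog.Default()
	}
	var keep []string
	if len(allowed) == 0 {
		for name := range registered {
			keep = append(keep, name)
		}
	} else {
		seen := map[string]bool{}
		for _, name := range allowed {
			switch {
			case !registered[name]:
				logger.Warn("device_mesh tools_allowed lists unknown tool", "tool", name)
			case !seen[name]:
				seen[name] = true
				keep = append(keep, name)
			}
		}
	}
	sort.Strings(keep)
	return keep
}

func main() {
	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
	registered := map[string]bool{"exec": true, "fs.read": true, "fs.list": true, "shell.eval": true}
	allowed := []string{"exec", "fs.read", "fs.list", "project.create"}
	fmt.Println(filterByAllowed(registered, allowed, logger))
}
```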
### LLM-side view of a device tool

The adapter compresses the device-mesh `InputSchema` into the flatter `tools.Def.Parameters` shape (each top-level property becomes one `tools.Param`). The description is enriched with a stable marker so the model can spot remote tools at a glance:

```
exec        → "Execute a command on the remote device. argv is parsed as exec.Command (NO shell). ... [device_mesh: shell.exec]"
pkg.install → "Install an OS package ... [device_mesh: pkg.install] (approval required)"
```

When `RequiresApproval=true`, the description also ends with `(approval required)`, reminding the model that the call may be queued for operator approval; this complements the system-prompt rules of `agent-<host>-sudo`.

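A hedged sketch of the two adapter transformations described above: flattening top-level properties into parameters, and appending the marker. `Param` is a hypothetical stand-in for `tools.Param`, whose real fields may differ, and the `cwd` property in the demo is invented for illustration. Nested detail such as an array's `items` is dropped, which is the "compression" trade-off.

```go
package main

import (
	"fmt"
	"sort"
)

// Param is a hypothetical stand-in for tools.Param.
type Param struct {
	Name        string
	Type        string
	Description string
	Required    bool
}

// flattenParams turns each top-level property of an object schema into one Param.
func flattenParams(schema map[string]any) []Param {
	props, _ := schema["properties"].(map[string]any)
	required := map[string]bool{}
	if req, ok := schema["required"].([]any); ok {
		for _, r := range req {
			if s, ok := r.(string); ok {
				required[s] = true
			}
		}
	}
	names := make([]string, 0, len(props))
	for n := range props {
		names = append(names, n)
	}
	sort.Strings(names) // stable order for the LLM
	params := make([]Param, 0, len(names))
	for _, n := range names {
		p, _ := props[n].(map[string]any)
		typ, _ := p["type"].(string)
		desc, _ := p["description"].(string)
		params = append(params, Param{Name: n, Type: typ, Description: desc, Required: required[n]})
	}
	return params
}

// describe appends the stable "[device_mesh: <capability>]" marker and,
// for approval-gated tools, the "(approval required)" suffix.
func describe(desc, capability string, requiresApproval bool) string {
	out := fmt.Sprintf("%s [device_mesh: %s]", desc, capability)
	if requiresApproval {
		out += " (approval required)"
	}
	return out
}

func main() {
	schema := map[string]any{
		"type":     "object",
		"required": []any{"argv"},
		"properties": map[string]any{
			"argv": map[string]any{"type": "array", "items": map[string]any{"type": "string"}},
			"cwd":  map[string]any{"type": "string"}, // hypothetical property
		},
	}
	fmt.Printf("%+v\n", flattenParams(schema))
	fmt.Println(describe("Install an OS package ...", "pkg.install", true))
}
```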
### Approval flow + LLM tool-result mapping

When the device_agent queues a call (`approval_status="queued"`) and the operator does not click 👍 within the timeout (0134 §6.5), the call ends with either `approval_status="timeout"` or `ok=false, error="approval_required"`. The adapter does NOT swallow this; it surfaces the error verbatim:

```
ToolRegistry.Call(...)   → returns err = "devicemesh: shell.exec: approval_required"
tools.Result{Err: err}
runLLM                   → appends a role='tool' message with "error: devicemesh: shell.exec: approval_required"
LLM next iteration       → can apologize to the operator and ask for a retry
```

The actual approval UX (the operator clicks 👍 in `#operator-approvals`) is the device_agent's responsibility (issue 0134 §6, validated end-to-end in flow 0009). Nothing new is needed on the agents_and_robots side.

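The error path can be sketched as a pure function from the device reply to a Go error, plus the text `runLLM` would append. `CapabilityResponse`, `ErrApprovalRequired`, `responseErr`, `toolMessage` and the wording of the timeout error are assumptions; only the `devicemesh: <capability>: approval_required` shape comes from the trace above.

```go
package main

import (
	"errors"
	"fmt"
)

// CapabilityResponse is a simplified device reply (field set assumed).
type CapabilityResponse struct {
	OK             bool
	Error          string
	ApprovalStatus string // "", "queued", "timeout", ...
}

// ErrApprovalRequired is a hypothetical sentinel so callers can use errors.Is.
var ErrApprovalRequired = errors.New("approval_required")

// responseErr surfaces device-side failures verbatim; nil means success.
func responseErr(capability string, r CapabilityResponse) error {
	switch {
	case r.ApprovalStatus == "timeout":
		return fmt.Errorf("devicemesh: %s: approval timed out: %w", capability, ErrApprovalRequired)
	case !r.OK && r.Error == ErrApprovalRequired.Error():
		return fmt.Errorf("devicemesh: %s: %w", capability, ErrApprovalRequired)
	case !r.OK:
		return fmt.Errorf("devicemesh: %s: %s", capability, r.Error)
	}
	return nil
}

func isApprovalRequired(err error) bool { return errors.Is(err, ErrApprovalRequired) }

// toolMessage is the content runLLM would append as the role='tool' message.
func toolMessage(err error, result any) string {
	if err != nil {
		return "error: " + err.Error()
	}
	return fmt.Sprint(result)
}

func main() {
	err := responseErr("shell.exec", CapabilityResponse{OK: false, Error: "approval_required"})
	fmt.Println(toolMessage(err, nil))
	fmt.Println(isApprovalRequired(err))
}
```

Wrapping with `%w` keeps the message verbatim for the LLM while still letting Go callers branch on the approval case.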
### What this issue does NOT do

- **Matrix-side approval rendering** is 0144f: `!preapprove`, `!approve req_id`, pre-approval cache.
- **ed25519 manifest signing** is 0144h; today the wire format is correct but unsigned.
- **`call_monitor` telemetry hook** that emits `function_id = capability_<name>_<lang>_<domain>` per call is 0144 §13 (separate plumbing in the audit writer).
- **Cross-room correlation** (`delegate_sudo` posting to `#<host>-sudo` and the bot copying the reply back) is its own issue (0144 main spec §3.3 + the original 0144c plan), intentionally left for the room/bus layer once approval is wired.

## shell.eval: the powerful tool

`shell.eval` is the **only** built-in tool that lets the LLM execute arbitrary free-form shell text on the device. Every other tool has a tightly scoped JSON schema (paths, argv lists, container ids); `shell.eval` accepts a single string that the device hands, unmodified, to bash (Linux/WSL) or PowerShell (Windows).

It exists because no structured tool can cover every legal shell idiom: pipes, redirects, here-docs, `$()` expansions, complex globs, environment-aware composition. Without `shell.eval`, the LLM resorts to multi-step `exec` chains that lose fidelity (no shell metacharacters are allowed in `exec`'s `argv`). With it, the LLM can ask for "the size of every `.log` file in `/var/log`, sorted in descending order" in one round-trip.

### Guardrails (all device-side)

`ToolSpec.RequiresApproval` is metadata only. The real protections live in the `device_agent`:

1. **Hardcoded blocklist**: destructive patterns (`rm -rf /`, `dd if=/dev/...`, `mkfs`, the fork bomb `:(){ :|:& };:`, `shutdown`, `reboot`, `:>/dev/sda`, ...) are always rejected, regardless of agent role or operator. There is no override.
2. **Auto-approve whitelist**: read-only / inspection patterns (`^git `, `^ls `, `^cat `, `^grep `, `^ps `, `^uptime`, `^df `, ...) execute directly without an operator prompt. The whitelist lives in the device manifest, not here.
3. **Operator approval**: anything that is neither blocked nor auto-approved returns `approval_status="queued"` in the result. The device sends an approval request to `#operator-approvals` in Element and waits up to 60s for the operator to confirm; on timeout the call returns `approval_status="timeout"` and the LLM must reword or `!retry`.

The fields the LLM gets back from `shell.eval` are `stdout`, `stderr`, `exit_code`, `approval_status`, `cmd_executed` (post-normalization), `truncated` (true if output was capped), and `duration_ms`.

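The three device-side guardrails form an ordered decision: blocklist first, then whitelist, otherwise queue for the operator. The sketch below illustrates that ordering in Go; it is not `device_agent` code. The regexes are a few illustrative patterns from the list above, `classify` and the `Decision` values are hypothetical, and refusing auto-approval for commands containing shell metacharacters is an added assumption (a bare `^ls ` prefix match would otherwise also approve `ls x; rm -rf ~`).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Decision is what the device does with a shell.eval command (names assumed).
type Decision string

const (
	Blocked Decision = "blocked"       // hardcoded blocklist, no override
	Auto    Decision = "auto_approved" // manifest whitelist
	Queued  Decision = "queued"        // needs operator approval
)

// A few illustrative patterns; the real lists live in apps/device_agent/
// (blocklist) and in the device manifest (whitelist).
var blocklist = []*regexp.Regexp{
	regexp.MustCompile(`\brm\s+-rf\s+/(\s|$)`),
	regexp.MustCompile(`\bdd\s+if=/dev/`),
	regexp.MustCompile(`\bmkfs\b`),
	regexp.MustCompile(`:\(\)\s*\{\s*:\|:&\s*\};:`), // classic fork bomb
	regexp.MustCompile(`\b(shutdown|reboot)\b`),
}

var autoApprove = []*regexp.Regexp{
	regexp.MustCompile(`^git `),
	regexp.MustCompile(`^ls `),
	regexp.MustCompile(`^uptime`),
	regexp.MustCompile(`^df `),
}

// shellMeta marks commands that chain or substitute; this sketch never
// auto-approves them (assumption).
const shellMeta = ";|&$`<>\n"

// classify applies the guardrails in order: blocklist, whitelist, operator.
// whitelist=false models the sudo manifest, which has no auto-approve list.
func classify(cmd string, whitelist bool) Decision {
	for _, re := range blocklist {
		if re.MatchString(cmd) {
			return Blocked
		}
	}
	if whitelist && !strings.ContainsAny(cmd, shellMeta) {
		for _, re := range autoApprove {
			if re.MatchString(cmd) {
				return Auto
			}
		}
	}
	return Queued
}

func main() {
	for _, c := range []string{"ls -la /tmp", "rm -rf /", "ls /tmp; rm -rf ~", "apt-get install jq"} {
		fmt.Printf("%-22q user=%s sudo=%s\n", c, classify(c, true), classify(c, false))
	}
}
```

Checking the blocklist first is what makes "no override" hold: a command that matches both lists is still rejected.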
### When the LLM should call shell.eval

Use it as the **fallback** for cases none of the structured tools cover:

- Pipes, redirects, sub-shells, here-docs.
- One-liners that combine `find` + `xargs` + `awk`.
- Quick sanity checks (`uptime && df -h`).
- Composing CLI tools the agent calls too rarely to warrant a dedicated tool spec.

Avoid it for things that *do* have a structured tool: `fs.read`, `fs.list`, `git.commit`, `docker.exec`, etc. Those have predictable JSON shapes, a narrower attack surface, and richer result mapping.

### Designing manifests for user vs sudo agents

`RegisterBuiltins` registers `shell.eval` in **both** `ModeUser` and `ModeSudo` because the device_agent, not the registry, decides what is safe. Recommended manifest defaults:

| Agent role | `RequiresApproval` (LLM-facing metadata) | Device manifest |
|---|---|---|
| `agent-<host>` (user) | `false` | Auto-approve whitelist + operator approval for anything else. Hardcoded blocklist active. |
| `agent-<host>-sudo` (sudo) | `true` (forced via `withApprovalRequired`) | **Every** invocation requires explicit operator approval. No auto-approve whitelist. Hardcoded blocklist active. |

The `withApprovalRequired` helper clones the spec returned by `shellEvalSpec()` and flips `RequiresApproval=true` without mutating the source, so `ModeUser` registries that re-register after a `ModeSudo` run still get the unmodified spec. See `tools_builtin.go::RegisterBuiltins` for the special-case wiring.

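The mode filtering and the copy-on-write flag flip can be sketched as follows. The filtering rules are assumptions, not confirmed by this README: `ModeUser` keeps tools with `RequiresApproval=false`, `ModeSudo` keeps those with `RequiresApproval=true`, and `ModeAll` keeps both with `shell.eval` left unforced; check `tools_builtin.go` for the real rules. `registerBuiltins` on a plain map is a hypothetical simplification.

```go
package main

import (
	"fmt"
	"sort"
)

type Mode int

const (
	ModeUser Mode = iota
	ModeSudo
	ModeAll
)

func (m Mode) String() string { return [...]string{"user", "sudo", "all"}[m] }

// ToolSpec keeps only the fields this sketch needs.
type ToolSpec struct {
	Name             string
	RequiresApproval bool
	InputSchema      map[string]any
}

// withApprovalRequired takes the spec by value, so flipping the flag never
// touches the caller's copy. InputSchema is still shared (shallow copy).
func withApprovalRequired(s ToolSpec) ToolSpec {
	s.RequiresApproval = true
	return s
}

func shellEvalSpec() ToolSpec {
	return ToolSpec{Name: "shell.eval", InputSchema: map[string]any{"type": "object"}}
}

// registerBuiltins (assumed rules): user keeps RequiresApproval=false,
// sudo keeps RequiresApproval=true, all keeps both. shell.eval is added
// in every mode and forced to RequiresApproval=true only in sudo.
func registerBuiltins(reg map[string]ToolSpec, catalog []ToolSpec, mode Mode) {
	for _, s := range catalog {
		if (mode == ModeUser && s.RequiresApproval) || (mode == ModeSudo && !s.RequiresApproval) {
			continue
		}
		reg[s.Name] = s
	}
	se := shellEvalSpec()
	if mode == ModeSudo {
		se = withApprovalRequired(se)
	}
	reg[se.Name] = se
}

func names(reg map[string]ToolSpec) []string {
	out := make([]string, 0, len(reg))
	for n, s := range reg {
		out = append(out, fmt.Sprintf("%s(approval=%t)", n, s.RequiresApproval))
	}
	sort.Strings(out)
	return out
}

func main() {
	catalog := []ToolSpec{{Name: "exec"}, {Name: "fs.read"}, {Name: "pkg.install", RequiresApproval: true}}
	for _, m := range []Mode{ModeSudo, ModeUser} { // sudo first, then user
		reg := map[string]ToolSpec{}
		registerBuiltins(reg, catalog, m)
		fmt.Println(m, names(reg))
	}
}
```

Running sudo before user in `main` is deliberate: the user registry still sees `shell.eval` with `approval=false`, because the sudo path only ever modified its own copy.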
See also: `apps/device_agent/` (where the blocklist + auto-approve whitelist + approval flow live) and issue 0144 §6.4 for the RBAC design.

## POC limitations (intentional)

These are out of scope for 0144a and tracked in sibling issues:

- **No retry**. A single `Call` failure surfaces immediately. The spec accepts this: tool failures go back to the LLM as a `role='tool'` error message and the LLM decides what to do (issue 0144 §7.1, operating rule 2).
- **No pre-approval cache**. `RequiresApproval` is metadata only; the actual gate lives on the device_agent (0144 §3) and in the pre-approvals table (0144f).
- **No streaming**. Tools are request/response. Long-running commands (`apt-get install` of a 200MB package) block until they finish or time out. Streaming for logs is its own future issue.
- **No exponential backoff**. The Go HTTP client's transport defaults apply (TCP retries on connect, no per-request retry).
- **No output sanitization**. The Runner formats the result as JSON; sanitization against prompt-injection payloads is 0144g.
- **No telemetry to `call_monitor`**. The hook for `function_id = capability_<name>_<lang>_<domain>` is part of the agent runtime wiring (0144c); this package emits no metrics on its own.
- **No manifest signing on the request side**. The Client envelope matches the 0134 §2.1 wire format but does NOT sign; manifest signing arrives in 0144h.

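A single-shot client that honours these limits (one POST, a whole-request timeout taken from `client_timeout_s`, no retry, no backoff, no signing) might look like this. `Client`, `NewClient`, `fakeDevice` and the JSON field names are assumptions. Whether the real `device_agent` reports failures as non-200 statuses or as `ok=false` bodies is not specified here, so the sketch turns non-200 replies into errors and returns `ok=false` bodies to the caller.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// Wire shapes below are simplified assumptions, not the 0134 §2.1 format.
type CapabilityRequest struct {
	RequestID  string         `json:"request_id"`
	Nonce      string         `json:"nonce"`
	Capability string         `json:"capability"`
	Args       map[string]any `json:"args,omitempty"`
}

type CapabilityResponse struct {
	OK        bool           `json:"ok"`
	Error     string         `json:"error,omitempty"`
	AuditHash string         `json:"audit_hash,omitempty"`
	Result    map[string]any `json:"result,omitempty"`
}

type Client struct {
	baseURL string
	hc      *http.Client
}

// NewClient applies client_timeout_s as a whole-request deadline.
func NewClient(baseURL string, timeoutS int) *Client {
	return &Client{baseURL: baseURL, hc: &http.Client{Timeout: time.Duration(timeoutS) * time.Second}}
}

// Call makes exactly one POST: no retry, no backoff, no signing.
func (c *Client) Call(req CapabilityRequest) (CapabilityResponse, error) {
	var out CapabilityResponse
	body, err := json.Marshal(req)
	if err != nil {
		return out, err
	}
	resp, err := c.hc.Post(c.baseURL+"/capability", "application/json", bytes.NewReader(body))
	if err != nil {
		return out, err // surfaced immediately; the LLM decides what to do
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		msg, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
		return out, fmt.Errorf("device_agent: HTTP %d: %s", resp.StatusCode, bytes.TrimSpace(msg))
	}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out, err
}

// fakeDevice is an in-process stand-in for device_agent (demo/test only).
func fakeDevice() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req CapabilityRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.Capability == "" {
			http.Error(w, "missing capability", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(CapabilityResponse{
			OK: true, AuditHash: "h1", Result: map[string]any{"capability": req.Capability},
		})
	}))
}

func main() {
	srv := fakeDevice()
	defer srv.Close()
	c := NewClient(srv.URL, 60)
	resp, err := c.Call(CapabilityRequest{RequestID: "req_demo", Capability: "shell.exec"})
	fmt.Println(resp.OK, resp.AuditHash, resp.Result["capability"], err)
}
```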
## Why these specific design choices

- `Args map[string]any` (object), NOT `[]string` (positional). The current `device_agent` POC uses `[]string` for `shell.exec` (see `apps/device_agent/capability.go`). The 0134 protocol and 0144 spec call for object-shaped args because most capabilities (`fs.read`, `git.clone`, `docker.exec`) are not naturally positional. 0144h migrates the device_agent.
- `ResultMapping` returns `any` instead of `map[string]any`. Some tools (e.g. the `echo` example in the tests) collapse their output to a string. The Runner JSON-encodes whatever comes back, so the LLM always sees a stable representation.
- `Capability` is a field on `ToolSpec`, not derived from `Name`. The 1:1 mapping is the common case (`fs.read` → `fs.read`), but `exec` maps to `shell.exec`, `docker.list` maps to `docker.container.list`, and `project.create` (future) will compose multiple capabilities, so the indirection pays for itself.
- Pure/impure split inside one package. `ToolSpec`, schema, mappings and registry are pure data and pure functions. Only `Client.Call` and `Client.Health` do I/O. The runtime composes them; tests substitute the Client.

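The `ResultMapping`-returns-`any` choice is easiest to see next to the encoding step. `encodeForLLM` is a hypothetical stand-in for the Runner's JSON encoding, and `echoResult` only approximates the test's `echo` tool (its `stdout` key is an assumption).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeForLLM stands in for the Runner: whatever ResultMapping returns
// (string, map, slice, ...) becomes one JSON document.
func encodeForLLM(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// echoResult collapses the device result to a plain string.
func echoResult(r map[string]any) (any, error) {
	s, _ := r["stdout"].(string)
	return s, nil
}

// passthrough hands the device map through unchanged.
func passthrough(r map[string]any) (any, error) { return r, nil }

func main() {
	device := map[string]any{"stdout": "hello", "exit_code": 0}
	for _, m := range []func(map[string]any) (any, error){echoResult, passthrough} {
		out, _ := m(device)
		s, _ := encodeForLLM(out)
		fmt.Println(s)
	}
}
```

`encoding/json` sorts map keys when marshalling, so a map result always encodes in the same field order, which is part of what keeps the representation stable across calls.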