
# pkg/tools/devicemesh

Tool registry framework that lets an LLM agent in `agents_and_robots` (VPS) call capabilities exposed by a remote `device_agent` over the WireGuard mesh.

Issue: [0144a](../../../dev/issues/0144-agent-per-machine-llm.md) (POC for the broader 0144 spec).

## What it does

```
LLM (Claude)
  │  tool_call exec {argv:["ls","/tmp"]}
  ▼
ToolRegistry.Call("exec", input)
  │  1. ValidateInput against tool's InputSchema
  │  2. ArgMapping(input) → device-facing args
  │  3. Client.Call(CapabilityRequest{capability: "shell.exec", args})
  │  4. ResultMapping(resp.Result) → LLM-facing output
  ▼
HTTP POST http://10.42.0.10:7474/capability   (over mesh WG)
  ▼
device_agent on home-wsl runs the binary, returns audit_hash + result
```

The LLM never sees the HTTP layer; it sees a flat list of named tools with JSON-Schema inputs.
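
The four numbered steps in the diagram can be sketched as one function. This is a minimal illustration, not the code in `types.go`: the reduced `ToolSpec`, the `Caller` interface, the `Required` field (standing in for full `InputSchema` validation) and `fakeDevice` are all hypothetical names introduced for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolSpec is a trimmed-down, hypothetical stand-in for the type in types.go.
type ToolSpec struct {
	Name          string
	Capability    string
	Required      []string // simplified stand-in for InputSchema validation
	ArgMapping    func(map[string]any) (map[string]any, error)
	ResultMapping func(map[string]any) (any, error)
}

// Caller hides the HTTP client so the sketch runs without a network.
type Caller interface {
	Call(capability string, args map[string]any) (map[string]any, error)
}

// Dispatch walks the four numbered steps from the diagram in order.
func Dispatch(spec ToolSpec, c Caller, input map[string]any) (any, error) {
	// 1. Validate the LLM input (here: only the required keys).
	for _, key := range spec.Required {
		if _, ok := input[key]; !ok {
			return nil, fmt.Errorf("devicemesh: %s: missing required field %q", spec.Name, key)
		}
	}
	// 2. Map LLM-facing input to device-facing args.
	args, err := spec.ArgMapping(input)
	if err != nil {
		return nil, fmt.Errorf("devicemesh: %s: arg mapping: %w", spec.Name, err)
	}
	// 3. Call the remote capability.
	res, err := c.Call(spec.Capability, args)
	if err != nil {
		return nil, fmt.Errorf("devicemesh: %s: %w", spec.Capability, err)
	}
	// 4. Map the device result to LLM-facing output.
	return spec.ResultMapping(res)
}

// fakeDevice answers like a device_agent would, without any I/O.
type fakeDevice struct{}

func (fakeDevice) Call(capability string, _ map[string]any) (map[string]any, error) {
	if capability != "shell.exec" {
		return nil, errors.New("unknown capability")
	}
	return map[string]any{"stdout": "ok", "exit_code": 0}, nil
}

// ExecSpec builds a minimal "exec" spec with identity mappings.
func ExecSpec() ToolSpec {
	return ToolSpec{
		Name:          "exec",
		Capability:    "shell.exec",
		Required:      []string{"argv"},
		ArgMapping:    func(in map[string]any) (map[string]any, error) { return in, nil },
		ResultMapping: func(r map[string]any) (any, error) { return r, nil },
	}
}

func main() {
	out, err := Dispatch(ExecSpec(), fakeDevice{}, map[string]any{"argv": []any{"ls", "/tmp"}})
	fmt.Println(out, err)
}
```

Because only step 3 touches the network, swapping `fakeDevice` for the real client is the only change needed to go from a unit test to a live call.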

## Pieces

| File | Purpose |
|---|---|
| `client.go` | HTTP client for `POST /capability` and `GET /health` on the remote `device_agent`. Generates `request_id` (`req_<12bytehex>`) and `nonce` (16 random bytes, base64-encoded) when they are missing. |
| `types.go` | `ToolSpec` + `ToolRegistry`. Thread-safe registry, `Call` is the single dispatch entry point. |
| `schema.go` | Mini JSON-Schema validator (object/array/string/integer/number/boolean + required + additionalProperties + enum). Enough to reject LLM mistakes without pulling a heavy dep. |
| `tools_builtin.go` | The standard catalog: exec, shell.eval, fs.read, fs.write, fs.list, fs.stat, git.clone, git.commit, git.push, pkg.install, pkg.search, proc.list, proc.kill, docker.list, docker.exec, docker.logs. `RegisterBuiltins(reg, ModeUser|ModeSudo|ModeAll)` filters by `RequiresApproval`. `shell.eval` is special-cased to be registered in BOTH modes, with `RequiresApproval=true` forced in `ModeSudo` via `withApprovalRequired`. |
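
As an illustration of the identifiers `client.go` fills in, the sketch below generates a `request_id` and a `nonce`. It rests on two assumptions: `req_<12bytehex>` is read as 12 random bytes, hex-encoded, and the nonce uses standard (padded) base64. The function names `NewRequestID` and `NewNonce` are hypothetical.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// randomBytes returns n bytes from the OS CSPRNG.
func randomBytes(n int) ([]byte, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return nil, err
	}
	return b, nil
}

// NewRequestID returns "req_" plus 12 random bytes in hex (24 hex characters).
// Assumption: this is how "req_<12bytehex>" should be read.
func NewRequestID() (string, error) {
	b, err := randomBytes(12)
	if err != nil {
		return "", err
	}
	return "req_" + hex.EncodeToString(b), nil
}

// NewNonce returns 16 random bytes in standard base64 (24 characters, padded).
// Assumption: the real client may use a different base64 alphabet.
func NewNonce() (string, error) {
	b, err := randomBytes(16)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(b), nil
}

func main() {
	id, _ := NewRequestID()
	nonce, _ := NewNonce()
	fmt.Println(id, nonce)
}
```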

## How to register a new tool

```go
import "github.com/enmanuel/agents/pkg/tools/devicemesh"

reg.Register(devicemesh.ToolSpec{
    Name:        "screenshot",
    Description: "Capture the display on the remote device. Returns PNG base64.",
    Capability:  "display.capture",
    InputSchema: map[string]any{
        "type":                 "object",
        "additionalProperties": false,
        "properties": map[string]any{
            "format": map[string]any{"type": "string", "enum": []any{"png", "jpeg"}},
        },
    },
    ArgMapping: func(in map[string]any) (map[string]any, error) {
        // pure transform LLM → device
        return in, nil
    },
    ResultMapping: func(r map[string]any) (any, error) {
        // pure transform device → LLM
        return r, nil
    },
    RequiresApproval: false, // user-scope
})
```

Then add the tool name to `cfg.DeviceMesh.ToolsAllowed` in the agent's `config.yaml`.
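
To see what the `InputSchema` above buys, here is a heavily reduced validator run against the `screenshot` schema. It is a sketch of the idea behind `schema.go`, not its implementation: `Validate`, `contains` and `ScreenshotSchema` are hypothetical names, and only `required`, `additionalProperties`, string-typed properties and `enum` are handled.

```go
package main

import "fmt"

// Validate is a hypothetical, reduced validator. Property values are assumed
// to be scalars, so comparing them with == is safe.
func Validate(schema, input map[string]any) error {
	props, _ := schema["properties"].(map[string]any)
	if required, ok := schema["required"].([]any); ok {
		for _, r := range required {
			name, _ := r.(string)
			if _, present := input[name]; !present {
				return fmt.Errorf("missing required property %q", name)
			}
		}
	}
	for key, val := range input {
		prop, known := props[key].(map[string]any)
		if !known {
			// Unknown keys are rejected only when the schema forbids them.
			if allowed, ok := schema["additionalProperties"].(bool); ok && !allowed {
				return fmt.Errorf("unexpected property %q", key)
			}
			continue
		}
		if prop["type"] == "string" {
			if _, ok := val.(string); !ok {
				return fmt.Errorf("property %q must be a string", key)
			}
		}
		if enum, ok := prop["enum"].([]any); ok && !contains(enum, val) {
			return fmt.Errorf("property %q must be one of %v", key, enum)
		}
	}
	return nil
}

// contains reports whether v is one of the listed enum values.
func contains(list []any, v any) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// ScreenshotSchema repeats the InputSchema from the registration example.
func ScreenshotSchema() map[string]any {
	return map[string]any{
		"type":                 "object",
		"additionalProperties": false,
		"properties": map[string]any{
			"format": map[string]any{"type": "string", "enum": []any{"png", "jpeg"}},
		},
	}
}

func main() {
	schema := ScreenshotSchema()
	fmt.Println(Validate(schema, map[string]any{"format": "png"}))
	fmt.Println(Validate(schema, map[string]any{"format": "gif"}))
	fmt.Println(Validate(schema, map[string]any{"quality": 90}))
}
```

The first call passes; the second fails the `enum` check and the third is rejected because `additionalProperties` is `false`.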

## Wiring (issue 0144c — done)

The launcher now constructs the device mesh registry from `cfg.DeviceMesh` and surfaces every spec as a regular `tools.Tool` consumed by the existing LLM tool-use loop. No special LLM path; the LLM does not know (or care) that the tool's `Exec` ends up making an HTTP call over WireGuard.

```
config.AgentConfig.DeviceMesh (yaml block)
    │
    ▼  buildDeviceMeshRegistry(cfg, logger)   ← devagents/registry_build.go
    │   1. resolve URL (env var override wins when present + non-empty)
    │   2. NewClient(url) + apply Timeout
    │   3. RegisterBuiltins(reg, mode)        ← user | sudo | all
    │   4. FilterByAllowed(reg, tools_allowed)
    │
    ▼  devicemesh.ToolsForLLM(reg)            ← pkg/tools/devicemesh/adapter.go
    │   1 tools.Tool per spec; Def.Parameters
    │   compressed from JSON-Schema; Exec
    │   closure routes through reg.Call
    │
    ▼  tools.Registry.Register(...)           ← devagents/registry_build.go
    │
    ▼  devagents/llm.go runLLM tool-use loop  ← unchanged
```
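
Step 1 of `buildDeviceMeshRegistry` (the env var override) might look roughly like the following. `ResolveURL` and its injectable `lookup` parameter are hypothetical; the real code in `devagents/registry_build.go` may be structured differently.

```go
package main

import (
	"fmt"
	"os"
)

// ResolveURL returns the value of the environment variable named envName when
// it is set and non-empty, and the configured URL otherwise.
// lookup is injectable so tests do not have to touch the real environment.
func ResolveURL(configured, envName string, lookup func(string) (string, bool)) string {
	if envName != "" {
		if v, ok := lookup(envName); ok && v != "" {
			return v
		}
	}
	return configured
}

func main() {
	url := ResolveURL("http://10.42.0.10:7474", "AGENT_HOME_WSL_DEVICE_MESH_URL", os.LookupEnv)
	fmt.Println(url)
}
```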

The same `*ToolRegistry` is also passed to `effects.NewRunnerWithDeviceMesh` so any rule that emits `decision.ActionKindDeviceMesh` (orchestrator pipelines, `!exec` builtin command, etc.) hits the same dispatcher. Both paths produce the same JSON envelope, so audit chains line up regardless of where the call originated.

### Config block

The agent's `config.yaml` opts in via:

```yaml
device_mesh:
  enabled: true
  device_id: home-wsl                # logged as audit context; aliased as "host"
  mode: user                         # user | sudo | all
  device_agent_url: "http://10.42.0.10:7474"
  device_agent_url_env: AGENT_HOME_WSL_DEVICE_MESH_URL  # optional; wins when set + non-empty
  manifest_id: manifest_home-wsl_v1  # metadata only; the device enforces
  client_timeout_s: 60               # aliased as "timeout_seconds"
  tools_allowed:                     # whitelist; empty = keep everything mode allowed
    - exec
    - fs.read
    - fs.list
```

Names in `tools_allowed` that the catalog does not provide are logged with a `WARN device_mesh tools_allowed lists unknown tool` and dropped. The template ships extras like `project.create`, `memory.recall`, etc. that arrive in 0144d/e — they degrade gracefully today.
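
The whitelist behaviour described above (empty list keeps everything, unknown names are warned about and dropped) can be sketched as follows. `FilterAllowed` works on plain name slices and is a hypothetical simplification of `FilterByAllowed`, whose real signature takes the registry; the warning text printed in `main` only approximates the actual log line.

```go
package main

import "fmt"

// FilterAllowed keeps the registered tool names that appear in allowed,
// preserving registration order. An empty allowed list keeps everything.
// Names in allowed that were never registered are reported through warn and dropped.
func FilterAllowed(registered, allowed []string, warn func(name string)) []string {
	if len(allowed) == 0 {
		return append([]string(nil), registered...)
	}
	known := make(map[string]bool, len(registered))
	for _, name := range registered {
		known[name] = true
	}
	want := make(map[string]bool, len(allowed))
	for _, name := range allowed {
		if !known[name] {
			warn(name)
			continue
		}
		want[name] = true
	}
	var kept []string
	for _, name := range registered {
		if want[name] {
			kept = append(kept, name)
		}
	}
	return kept
}

func main() {
	registered := []string{"exec", "fs.read", "fs.list", "fs.write"}
	allowed := []string{"exec", "fs.read", "project.create"}
	kept := FilterAllowed(registered, allowed, func(name string) {
		fmt.Printf("WARN device_mesh tools_allowed lists unknown tool: %s\n", name)
	})
	fmt.Println(kept)
}
```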

### LLM-side view of a device tool

The adapter compresses the device-mesh `InputSchema` into the flatter `tools.Def.Parameters` shape (each top-level property becomes one `tools.Param`). The description is enriched with a stable marker so the model can spot remote tools at a glance:

```
exec  →  "Execute a command on the remote device. argv is parsed as exec.Command (NO shell). ... [device_mesh: shell.exec]"
pkg.install  →  "Install an OS package ... [device_mesh: pkg.install] (approval required)"
```

When `RequiresApproval=true`, the marker also reminds the model the call may be queued, which feeds back into the system prompt rules of `agent-<host>-sudo`.
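
A rough sketch of the two adapter transforms described in this section: appending the marker to the description, and flattening top-level schema properties into parameters. `Param` is a hypothetical stand-in for `tools.Param`, and the `argv`/`cwd` schema in `ExampleSchema` is illustrative rather than the real `exec` schema.

```go
package main

import (
	"fmt"
	"sort"
)

// Param is a hypothetical stand-in for tools.Param.
type Param struct {
	Name     string
	Type     string
	Required bool
}

// EnrichDescription appends the device-mesh marker and, when needed,
// the approval reminder.
func EnrichDescription(desc, capability string, requiresApproval bool) string {
	out := fmt.Sprintf("%s [device_mesh: %s]", desc, capability)
	if requiresApproval {
		out += " (approval required)"
	}
	return out
}

// CompressParams turns each top-level schema property into one Param,
// sorted by name so the output is stable.
func CompressParams(schema map[string]any) []Param {
	required := map[string]bool{}
	if list, ok := schema["required"].([]any); ok {
		for _, r := range list {
			if name, ok := r.(string); ok {
				required[name] = true
			}
		}
	}
	props, _ := schema["properties"].(map[string]any)
	params := make([]Param, 0, len(props))
	for name, raw := range props {
		prop, _ := raw.(map[string]any)
		typ, _ := prop["type"].(string)
		params = append(params, Param{Name: name, Type: typ, Required: required[name]})
	}
	sort.Slice(params, func(i, j int) bool { return params[i].Name < params[j].Name })
	return params
}

// ExampleSchema is an illustrative schema, not the real exec InputSchema.
func ExampleSchema() map[string]any {
	return map[string]any{
		"type":     "object",
		"required": []any{"argv"},
		"properties": map[string]any{
			"argv": map[string]any{"type": "array"},
			"cwd":  map[string]any{"type": "string"},
		},
	}
}

func main() {
	fmt.Println(EnrichDescription("Install an OS package", "pkg.install", true))
	fmt.Println(CompressParams(ExampleSchema()))
}
```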

### Approval flow + LLM tool-result mapping

When a call is queued (`approval_status="queued"`) and the operator does not click 👍 within the timeout (0134 §6.5), the device_agent returns either `approval_status="timeout"` or `ok=false, error="approval_required"`. The adapter does NOT swallow this; it surfaces the error verbatim:

```
ToolRegistry.Call(...) → returns err = "devicemesh: shell.exec: approval_required"
tools.Result{Err: err}
runLLM → appends `role='tool'` message with `error: devicemesh: shell.exec: approval_required`
LLM next iteration → can apologize to operator and ask for retry.
```

The actual approval UX (operator clicks 👍 in `#operator-approvals`) is the device_agent's responsibility (issue 0134 §6, validated end-to-end in flow 0009). Nothing new on the agents_and_robots side.
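
The error path above can be reproduced in a few lines. This sketch models only the `ok=false, error="approval_required"` branch; `Response`, `CallError` and `ToolMessage` are hypothetical names, and the real reply type carries more fields.

```go
package main

import "fmt"

// Response is a hypothetical, reduced view of the device_agent reply.
type Response struct {
	OK     bool
	Error  string
	Result map[string]any
}

// CallError wraps a failed reply so the capability name travels with the error text.
func CallError(capability string, resp Response) error {
	if resp.OK {
		return nil
	}
	return fmt.Errorf("devicemesh: %s: %s", capability, resp.Error)
}

// ToolMessage renders the content of the role='tool' message that the
// tool-use loop appends for the next LLM iteration.
func ToolMessage(result string, err error) string {
	if err != nil {
		return "error: " + err.Error()
	}
	return result
}

func main() {
	err := CallError("shell.exec", Response{OK: false, Error: "approval_required"})
	fmt.Println(ToolMessage("", err)) // error: devicemesh: shell.exec: approval_required
}
```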

### What this issue does NOT do

- **Matrix-side approval rendering** is 0144f — `!preapprove`, `!approve req_id`, pre-approval cache.
- **ed25519 manifest signing** is 0144h — today the wire format is correct but unsigned.
- **`call_monitor` telemetry hook** that emits `function_id = capability_<name>_<lang>_<domain>` per call is 0144 §13 (separate plumbing in the audit writer).
- **Cross-room correlation** (`delegate_sudo` posting to `#<host>-sudo` and the bot copying the reply back) is its own issue (0144 main spec §3.3 + 0144c original plan — left intentionally for the room/bus layer once approval is wired).

## shell.eval — the powerful tool

`shell.eval` is the **only** built-in tool that lets the LLM execute arbitrary free-form shell text on the device. Every other tool has a tightly-scoped JSON schema (paths, argv lists, container ids); `shell.eval` accepts a single string that the device hands to bash (Linux/WSL) or PowerShell (Windows) unmodified.

It exists because no structured tool can cover every legal shell idiom: pipes, redirects, here-docs, `$()` expansions, complex globs, environment-aware composition. Without `shell.eval`, the LLM resorts to multi-step `exec` chains that lose fidelity, because `exec` hands `argv` to `exec.Command` without a shell and metacharacters are never interpreted. With it, the LLM can ask for "give me the size of every `.log` in `/var/log`, sorted in descending order" in one round-trip.

### Guardrails (all device-side)

The flag on `ToolSpec.RequiresApproval` is metadata only. The real protections live in the `device_agent`:

1. **Hardcoded blocklist** — destructive patterns (`rm -rf /`, `dd if=/dev/...`, `mkfs`, fork-bombs `:(){:|:&};:`, `shutdown`, `reboot`, `:>/dev/sda`, ...) always reject regardless of agent role or operator. There is no override.
2. **Auto-approve whitelist** — read-only / inspection patterns (`^git `, `^ls `, `^cat `, `^grep `, `^ps `, `^uptime`, `^df `, ...) execute directly without operator prompt. The whitelist lives in the device manifest, not here.
3. **Operator approval** — anything that is neither blocked nor auto-approved returns `approval_status="queued"` in the result. The device sends an approval request to `#operator-approvals` in Element and waits up to 60s for the operator to confirm; on timeout the call returns `approval_status="timeout"` and the LLM must reword or `!retry`.
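
The order in which the three guardrails apply can be sketched as a single classification function. This is purely illustrative: the real lists and matching logic live in the `device_agent` and its manifest, the pattern subsets below are copied from the examples above, and plain substring/prefix matching is deliberately cruder than whatever the device actually uses.

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative subsets of the patterns named above.
var blocked = []string{"rm -rf /", "mkfs", "shutdown", "reboot"}
var autoApproved = []string{"git ", "ls ", "cat ", "grep ", "ps ", "uptime", "df "}

// Classify returns "blocked", "auto" or "queued" for a shell.eval command.
// The blocklist is checked first, so it wins over the auto-approve whitelist.
func Classify(cmd string) string {
	cmd = strings.TrimSpace(cmd)
	for _, pattern := range blocked {
		if strings.Contains(cmd, pattern) {
			return "blocked"
		}
	}
	for _, prefix := range autoApproved {
		if strings.HasPrefix(cmd, prefix) {
			return "auto"
		}
	}
	return "queued"
}

func main() {
	for _, cmd := range []string{"ls /tmp", "apt-get install jq", "cat /etc/hosts; reboot"} {
		fmt.Printf("%-26s -> %s\n", cmd, Classify(cmd))
	}
}
```

Note how `cat /etc/hosts; reboot` starts with a whitelisted prefix but is still rejected, which is why the blocklist has to run before the whitelist.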

The fields the LLM gets back from `shell.eval`: `stdout`, `stderr`, `exit_code`, `approval_status`, `cmd_executed` (post-normalization), `truncated` (true if output was capped), `duration_ms`.
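
For code that consumes this result on the Go side, the fields map naturally onto a struct. The field names come from the list above; the Go types, the `EvalResult` and `ParseEvalResult` names, and the sample payload values are assumptions made for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EvalResult mirrors the field names listed above; the Go types are assumptions.
type EvalResult struct {
	Stdout         string `json:"stdout"`
	Stderr         string `json:"stderr"`
	ExitCode       int    `json:"exit_code"`
	ApprovalStatus string `json:"approval_status"`
	CmdExecuted    string `json:"cmd_executed"`
	Truncated      bool   `json:"truncated"`
	DurationMs     int64  `json:"duration_ms"`
}

// ParseEvalResult decodes a raw shell.eval result payload.
func ParseEvalResult(raw []byte) (EvalResult, error) {
	var r EvalResult
	err := json.Unmarshal(raw, &r)
	return r, err
}

func main() {
	// Invented sample payload for a call that timed out waiting for approval.
	raw := []byte(`{"stdout":"","stderr":"","exit_code":0,"approval_status":"timeout",` +
		`"cmd_executed":"apt-get install jq","truncated":false,"duration_ms":60000}`)
	r, err := ParseEvalResult(raw)
	fmt.Println(r.ApprovalStatus, r.CmdExecuted, err)
}
```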

### When the LLM should call shell.eval

Use it as the **fallback** for cases none of the structured tools cover:

- Pipes, redirects, sub-shells, here-docs.
- One-liners that combine `find` + `xargs` + `awk`.
- Quick sanity checks (`uptime && df -h`).
- Composing CLI tools the agent isn't going to call enough to warrant a dedicated tool spec.

Avoid it for things that *do* have a structured tool: `fs.read`, `fs.list`, `git.commit`, `docker.exec`, etc. Those have predictable JSON shapes, narrower attack surface, and richer result mapping.

### Designing manifests for user vs sudo agents

`RegisterBuiltins` registers `shell.eval` in **both** `ModeUser` and `ModeSudo` because the device_agent — not the registry — decides what is safe. Recommended manifest defaults:

| Agent role | `RequiresApproval` (LLM-facing metadata) | Device manifest |
|---|---|---|
| `agent-<host>` (user) | `false` | Auto-approve whitelist + operator approval for anything else. Hardcoded blocklist active. |
| `agent-<host>-sudo` (sudo) | `true` (forced via `withApprovalRequired`) | **Every** invocation requires explicit operator approval. No auto-approve whitelist. Hardcoded blocklist active. |

The `withApprovalRequired` helper clones the spec returned by `shellEvalSpec()` and flips `RequiresApproval=true` without mutating the source, so `ModeUser` registries that re-register after a `ModeSudo` run still get the unmodified spec. See `tools_builtin.go::RegisterBuiltins` for the special-case wiring.
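
The clone-and-flip behaviour can be illustrated with Go's pass-by-value semantics. The reduced `ToolSpec` below is hypothetical, and the real helper in `tools_builtin.go` may copy the spec differently.

```go
package main

import "fmt"

// ToolSpec is a reduced, hypothetical stand-in for the real type.
type ToolSpec struct {
	Name             string
	RequiresApproval bool
}

// shellEvalSpec returns the base spec, with approval not required.
func shellEvalSpec() ToolSpec {
	return ToolSpec{Name: "shell.eval", RequiresApproval: false}
}

// withApprovalRequired receives the spec by value, so flipping the flag only
// changes the local copy that is returned to the caller.
func withApprovalRequired(spec ToolSpec) ToolSpec {
	spec.RequiresApproval = true
	return spec
}

func main() {
	base := shellEvalSpec()
	sudo := withApprovalRequired(base)
	fmt.Println(base.RequiresApproval, sudo.RequiresApproval) // false true
}
```

A struct copy like this is shallow: map and function fields such as `InputSchema` or `ArgMapping` would be shared between both specs, which is safe only as long as neither side mutates them.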

See also: `apps/device_agent/` (where the blocklist + auto-approve whitelist + approval flow live) and issue 0144 §6.4 for the RBAC design.

## POC limitations (intentional)

These are out of scope for 0144a and tracked in sibling issues:

- **No retry**. A single `Call` failure surfaces immediately. The spec accepts this: tool failures go back to the LLM as a `role='tool'` error message and the LLM decides what to do (issue 0144 §7.1, operating rule 2).
- **No pre-approval cache**. `RequiresApproval` is metadata only; the actual gate lives on the device_agent (0144 §3) and the pre-approvals table (0144f).
- **No streaming**. Tools are request/response. Long-running commands (`apt-get install` of a 200MB package) block until done or timeout. Streaming for logs is its own future issue.
- **No exponential backoff**. The Go HTTP client's transport defaults apply (TCP retries on connect, no per-request retry).
- **No output sanitization**. The Runner formats the result as JSON; sanitization against prompt-injection payloads is 0144g.
- **No telemetry to `call_monitor`**. This package emits no metrics on its own. The hook for `function_id = capability_<name>_<lang>_<domain>` was left out of the 0144c runtime wiring and is tracked under 0144 §13.
- **No manifest signing on the request side**. The Client envelope matches the 0134 §2.1 wire format but does NOT sign; manifest signing arrives in 0144h.

## Why these specific design choices

- `Args map[string]any` (object) NOT `[]string` (positional). The current `device_agent` POC uses `[]string` for `shell.exec` (see `apps/device_agent/capability.go`). The 0134 protocol and 0144 spec call for object-shaped args because most capabilities (`fs.read`, `git.clone`, `docker.exec`) are not naturally positional. 0144h migrates the device_agent.
- `ResultMapping` returns `any` instead of `map[string]any`. Some tools (e.g. the test's `echo` example) collapse their output to a string. The Runner JSON-encodes whatever comes back so the LLM always sees a stable representation.
- `Capability` is a field on `ToolSpec`, not derived from `Name`. The 1:1 mapping is the common case (`fs.read` → `fs.read`), but `docker.list` → `docker.container.list` and `project.create` (future) compose multiple capabilities, so the indirection pays for itself.
- Pure/impure split inside one package. `ToolSpec`, schema, mappings, registry are pure data and pure functions. Only `Client.Call` and `Client.Health` do I/O. The runtime composes them; tests substitute the Client.
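
To make the `ResultMapping` point concrete, the sketch below shows how JSON-encoding gives a stable representation whether the mapping returned a bare string or a map. `EncodeForLLM` is a hypothetical name; the actual Runner may add an envelope around the value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EncodeForLLM JSON-encodes whatever a ResultMapping returned.
func EncodeForLLM(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, _ := EncodeForLLM("hello")
	m, _ := EncodeForLLM(map[string]any{"stdout": "ok", "exit_code": 0})
	fmt.Println(s) // "hello"
	fmt.Println(m) // {"exit_code":0,"stdout":"ok"}
}
```

`json.Marshal` sorts map keys, so the same result always serializes to the same text regardless of Go's map iteration order.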