# browser_mcp

MCP server (Go) that exposes the registry's CDP browser-control functions
(`fn-registry/functions/browser`) as MCP tools. Drive a live Chrome/Chromium over the
Chrome DevTools Protocol: navigate, read the DOM, click, manage cookies, evaluate
JavaScript, operate iframes, and persist/restore session state.

36 tools total, grouped by domain. See `app.md` for the full per-tool reference and the
"Omitido en v1" ("Omitted in v1") section.

## Security: isolated Chrome by default (port 9333)

**By default the MCP operates on its OWN isolated Chrome, NOT the user's daily browser.**

In this ecosystem the user's daily Chromium has CDP enabled globally on port **9222** (via
`/etc/chromium.d/cdp`). If the MCP defaulted there, the agent could drive the user's own
tabs (banking, email). To prevent that:

- The default CDP port is **9333** (the MCP's dedicated Chrome), not 9222.
- `browser_launch` without `user_data_dir` uses a dedicated isolated profile
  (`<tmp>/browser_mcp_userdata`) on port 9333.
- **Port 9222 = the daily browser.** Pass `port: 9222` explicitly, with care, only when you
  deliberately want to attach to it.

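The defaulting rule above can be sketched in a few lines of Go. This is only an illustration, not the code in `main.go`: `resolvePort`, `isDailyBrowser`, and `defaultUserDataDir` are hypothetical names, and the sketch assumes that `<tmp>` means the OS temp directory and that an omitted port arrives as `0`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

const (
	defaultCDPPort = 9333 // the MCP's dedicated, isolated Chrome
	dailyCDPPort   = 9222 // the user's daily browser; opt-in only
)

// resolvePort returns the isolated default when the caller passed no port
// (assumed here to be represented as 0) and the explicit port otherwise.
func resolvePort(requested int) int {
	if requested == 0 {
		return defaultCDPPort
	}
	return requested
}

// isDailyBrowser reports whether a port points at the user's own browser.
func isDailyBrowser(port int) bool {
	return port == dailyCDPPort
}

// defaultUserDataDir builds the isolated profile path.
// Assumption: <tmp> is whatever os.TempDir() returns.
func defaultUserDataDir() string {
	return filepath.Join(os.TempDir(), "browser_mcp_userdata")
}

func main() {
	port := resolvePort(0)
	fmt.Println(port, isDailyBrowser(port)) // 9333 false

	port = resolvePort(9222)
	fmt.Println(port, isDailyBrowser(port)) // 9222 true

	fmt.Println(defaultUserDataDir())
}
```

The point of the design is that reaching the daily browser always takes an explicit argument; no omitted field can land on 9222.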
## Build

```bash
cd projects/web_scraping/apps/browser_mcp
go mod tidy   # first time only
go build -o browser_mcp .
```

`browser_mcp` only imports `fn-registry/functions/browser` (no sqlite/cgo), so a plain
`go build` works. If transitive deps ever require it, fall back to
`CGO_ENABLED=1 go build -tags fts5 -o browser_mcp .`.

## Architecture: live CDP connection pool

Unlike `registry_mcp` (one DB handle), `browser_mcp` keeps a **pool of live CDP
connections** keyed by port. A CDP connection is a live WebSocket session to a "page"
tab; reusing it avoids paying the ~50-200ms handshake on every tool call and preserves
state between calls (e.g. the persistent dialog auto-handler armed by `handle_dialog`).
The pool retries once on a dead-connection error (Chrome may have closed the tab between
calls). See `pool.go` and `deps.withConn` in `main.go`.

## Register in Claude Code

Add to a `.mcp.json` (the project's `projects/web_scraping/.mcp.json` already has it):

```json
{
  "mcpServers": {
    "browser": {
      "command": "/home/enmanuel/fn_registry/projects/web_scraping/apps/browser_mcp/browser_mcp",
      "args": []
    }
  }
}
```

For an inspection-only session that cannot mutate browser state, pass `"args": ["--read-only"]`.

## Transports

- **stdio** (default): for MCP clients.
- **HTTP**: `./browser_mcp --http :7740` (Streamable HTTP). `--bind 0.0.0.0` requires
  `REGISTRY_API_TOKEN` (bearer auth).

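One way to picture the bind rule is a startup guard plus a bearer check, sketched below. This does not reproduce the server's actual code: `checkBind` and `authorized` are hypothetical helpers, and treating every non-loopback address (not only `0.0.0.0`) as requiring a token is an assumption of the sketch.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
	"net"
	"strings"
)

// checkBind is a hypothetical startup guard: a non-loopback bind address is
// accepted only when a bearer token is configured.
func checkBind(bindHost, token string) error {
	ip := net.ParseIP(bindHost)
	loopback := bindHost == "localhost" || (ip != nil && ip.IsLoopback())
	if !loopback && token == "" {
		return errors.New("non-loopback bind requires REGISTRY_API_TOKEN")
	}
	return nil
}

// authorized checks an Authorization header value against the configured
// token using a constant-time comparison.
func authorized(header, token string) bool {
	const prefix = "Bearer "
	if token == "" || !strings.HasPrefix(header, prefix) {
		return false
	}
	got := strings.TrimPrefix(header, prefix)
	return subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
}

func main() {
	fmt.Println(checkBind("0.0.0.0", "") != nil)        // true: refused without a token
	fmt.Println(checkBind("0.0.0.0", "s3cret") == nil)  // true: allowed with a token
	fmt.Println(checkBind("127.0.0.1", "") == nil)      // true: loopback needs no token
	fmt.Println(authorized("Bearer s3cret", "s3cret")) // true
}
```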
## Example session

The default port is **9333** (the MCP's isolated Chrome). A typical LLM-readiness agent
flow (launch isolated Chrome, pick the right tab, perceive the page, act, read the result):

```
browser_launch     { "url": "https://example.com" }                # -> "launched pid=... port=9333 user_data_dir=<tmp>/browser_mcp_userdata"
tab_list           { }                                             # -> JSON list of targets (id, type, url, title)
tab_select         { "match": "example.com" }                      # -> "selected target matching: example.com" (deterministic, by id or URL substring)
page_perceive      { }                                             # -> indented accessibility outline (roles, names, #ref): the LLM "sees" the page compactly
dom_click          { "selector": "a" }                             # act on what you perceived
page_get_text      { "selector": "body", "max_bytes": 20000 }      # -> visible innerText, compact (does NOT blow up the context like page_get_html)
browser_disconnect { }
```

To attach to the daily browser instead, pass `port: 9222` explicitly in each call (with care).

Cookies, iframes (`frame_list` -> `frame_eval`/`frame_get_html`), keyboard/scroll
(`press_key`, `scroll`), JS dialogs (`handle_dialog`), and session persistence
(`storage_save` / `storage_load`) follow the same per-port pattern.
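The "by id or URL substring" selection used by `tab_select` can be illustrated with a small Go sketch. It is a guess at the matching rule rather than the server's implementation: the `target` struct mirrors the fields listed by `tab_list`, while `selectTarget` and its precedence (exact id first, then the first URL containing the string, in list order) are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// target mirrors the fields returned by tab_list (id, type, url, title).
type target struct {
	ID, Type, URL, Title string
}

// selectTarget picks a target deterministically: an exact id match wins,
// otherwise the first target (in list order) whose URL contains match.
func selectTarget(targets []target, match string) (target, bool) {
	if match == "" {
		return target{}, false // an empty pattern would match every URL
	}
	for _, t := range targets {
		if t.ID == match {
			return t, true
		}
	}
	for _, t := range targets {
		if strings.Contains(t.URL, match) {
			return t, true
		}
	}
	return target{}, false
}

func main() {
	targets := []target{
		{ID: "A1", Type: "page", URL: "https://example.com/", Title: "Example"},
		{ID: "B2", Type: "page", URL: "https://example.org/login", Title: "Login"},
	}

	t, ok := selectTarget(targets, "example.org")
	fmt.Println(t.ID, ok) // B2 true

	t, ok = selectTarget(targets, "A1")
	fmt.Println(t.ID, ok) // A1 true

	_, ok = selectTarget(targets, "nothing-here")
	fmt.Println(ok) // false
}
```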