feat: llm_cli, a self-contained CLI for chatting with Claude

A single file (llm.py) that talks directly to the Anthropic Messages API
using the OAuth token that Claude Code stores in ~/.claude/.credentials.json.
No server, no startup cost: interactive chat with memory, one-shot, pipe, and
a loop for your own tools (run_tool_loop). It packages the logic of the
registry's claude-direct group as a standalone file so it can be distributed
(llm.py + README.md is enough; the recipient only needs httpx and a logged-in
Claude Code).
agent
2026-06-05 00:15:52 +02:00
commit cc01845acf
4 changed files with 513 additions and 0 deletions
+90
@@ -0,0 +1,90 @@
# llm: chat with Claude from the terminal
A minimal, self-contained CLI for talking to Claude directly from the terminal.
It uses the token that **Claude Code already stores on your machine**, so there is
nothing to configure: install `httpx`, run it, and type.
A single file (`llm.py`), streamed responses, instant startup (it launches no
background process).
## Requirements
- Python 3.9+
- `httpx` (`pip install httpx`)
- **Claude Code installed and logged in** (so that `~/.claude/.credentials.json`
exists; the token is read from it automatically).
## Install
```bash
pip install httpx
# copy llm.py wherever you like; optionally make it executable:
chmod +x llm.py
```
## Usage
```bash
# Interactive chat (with conversation memory)
python3 llm.py
# One-shot answer to a single question
python3 llm.py "what is a pseudo-terminal, in one sentence"
# Via pipe
echo "summarize this in 2 lines: ..." | python3 llm.py
# Choose model / system prompt
python3 llm.py --model claude-opus-4-8 --system "answer concisely" "explain pointers"
```
### Interactive chat commands
| Command | Action |
|---|---|
| `/model <id>` | Switch model (`claude-opus-4-8`, `claude-haiku-4-5-20251001`, ...) |
| `/system <text>` | Set a system prompt |
| `/reset` | Start a new conversation |
| `/exit` | Quit |
Default model: `claude-haiku-4-5-20251001` (fast). Use `--model claude-opus-4-8`
for more capable responses.
## Your own tools
`llm.py` includes `run_tool_loop`, an agentic loop where you define your tools and the
code that runs them. You control what the model can do:
```python
import datetime

import llm


def get_time(inp):
    # The tool takes no arguments, so `inp` is an empty dict.
    return {"now": datetime.datetime.now().isoformat()}


tools = [{
    "name": "get_time",
    "description": "Returns the current date and time",
    "input_schema": {"type": "object", "properties": {}},
}]

res = llm.run_tool_loop(
    messages=[{"role": "user", "content": "what time is it?"}],
    tools=tools, dispatch={"get_time": get_time},
    model="claude-haiku-4-5-20251001",
    on_text=lambda d: print(d, end="", flush=True),
)
print("\n", res["final_text"])
```
The model requests the tool, your function runs it, the result goes back to the
model, and the model answers. Add whatever tools you need (querying a database,
calling an API, reading a file...).
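As a sketch of a tool that takes arguments, the definition below declares one required string parameter, and the dispatch function receives the parsed input as a dict, which is how `run_tool_loop` calls `dispatch[name](parsed)`. The `read_file` tool, its `path` parameter and the 2000-character cap are hypothetical and only illustrate the shape.

```python
import json
import os
import tempfile

# Hypothetical tool: the name, parameter and size cap are illustrative only.
MAX_CHARS = 2000


def read_file(inp):
    """Return the first MAX_CHARS characters of a text file."""
    path = inp.get("path", "")
    if not os.path.isfile(path):
        # Raising lets run_tool_loop report the failure as an error tool_result.
        raise FileNotFoundError(f"not a file: {path}")
    with open(path, encoding="utf-8", errors="replace") as fh:
        return {"path": path, "content": fh.read(MAX_CHARS)}


tools = [{
    "name": "read_file",
    "description": "Read the beginning of a local text file",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string", "description": "File path"}},
        "required": ["path"],
    },
}]
dispatch = {"read_file": read_file}

# Local check without calling the API: invoke the function as the loop would.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("hello")
result = dispatch["read_file"]({"path": tmp.name})
os.unlink(tmp.name)
print(json.dumps(result["content"]))
```

Passing `tools` and `dispatch` to `llm.run_tool_loop(...)` would then let the model request the tool; exercising the function locally first is a cheap way to catch schema and path mistakes before spending API calls.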
## Module API
| Function | Purpose |
|---|---|
| `stream_messages(messages, model, system, tools, max_tokens, token)` | Raw event stream (`text` / `tool_use_start` / `tool_input_delta` / `done` / `error`) |
| `run_tool_loop(messages, tools, dispatch, ...)` | Agentic loop with your tools |
| `load_oauth_token()` | Reads the token from `~/.claude/.credentials.json` (best-effort refresh) |
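To illustrate the event shapes that `stream_messages` yields (`{"type": "text", "text": ...}`, `{"type": "done", "stop_reason": ...}`, `{"type": "error", "message": ...}`), here is a minimal sketch that folds a stream into a summary. `collect_text` is a hypothetical helper that is not part of `llm.py`, and the hand-written event list stands in for a real API call.

```python
from typing import Iterable


def collect_text(events: Iterable[dict]) -> dict:
    """Fold a stream of events into the full text plus the stop reason.

    Hypothetical helper, not part of llm.py.
    """
    parts, stop_reason, error = [], "", ""
    for ev in events:
        kind = ev.get("type")
        if kind == "text":
            parts.append(ev["text"])
        elif kind == "done":
            stop_reason = ev.get("stop_reason", "")
        elif kind == "error":
            # Stop at the first error and keep whatever text arrived so far.
            error = ev.get("message", "")
            break
    return {"text": "".join(parts), "stop_reason": stop_reason, "error": error}


# Hand-written events standing in for llm.stream_messages(messages=[...]).
fake_events = [
    {"type": "text", "text": "Hello"},
    {"type": "text", "text": ", world"},
    {"type": "done", "stop_reason": "end_turn"},
]
summary = collect_text(fake_events)
print(summary["text"])
```

With the real module, the same helper could be fed `llm.stream_messages(messages=[{"role": "user", "content": "hi"}])` instead of the fake list.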
+78
@@ -0,0 +1,78 @@
---
name: llm_cli
lang: py
domain: core
version: 1.0.0
description: "Minimal, self-contained CLI to chat with Claude from the terminal using the OAuth token that Claude Code stores on the machine. A single file (llm.py), no server and no dependencies beyond httpx. Interactive chat with memory, one-shot, pipe, and a loop for custom tools. Designed for distribution: copy it and run it."
tags: [cli, claude, llm, anthropic, oauth, chat, standalone, distributable]
uses_functions: []
uses_types: []
framework: ""
entry_point: "llm.py"
dir_path: "apps/llm_cli"
icon:
  phosphor: "terminal-window"
  accent: "#a855f7"
e2e_checks:
  - id: imports
    cmd: "python3 -c 'import llm; assert callable(llm.stream_messages) and callable(llm.run_tool_loop)'"
    timeout_s: 15
  - id: help
    cmd: "python3 llm.py --help"
    expect_stdout_contains: "interactive chat"
    timeout_s: 15
---
# llm_cli
Self-contained CLI to chat with Claude from the terminal. It packages the logic of the registry's
`claude-direct` function group into a single file (`llm.py`) so it can be copied and distributed
without the registry.
## What it is
`llm.py` talks directly to the Anthropic Messages API using the OAuth token that Claude Code
stores in `~/.claude/.credentials.json`. It launches no claude process and keeps no server running:
instant startup, streamed responses.
It is a **standalone version** (code copied, not imported) of the registry functions
`load_claude_oauth_token_py_core`, `stream_anthropic_messages_py_core` and
`run_claude_tool_loop_py_core`. That is why `uses_functions` is empty: the app does not depend on
the registry at runtime. Everything lives in `llm.py` so that it can be distributed.
## Usage
```bash
pip install httpx
python3 llm.py                                   # interactive chat (with memory)
python3 llm.py "what is Go, in one sentence"     # one-shot
echo "summarize this: ..." | python3 llm.py      # pipe
python3 llm.py --model claude-opus-4-8 "..."     # choose model
```
Chat commands: `/model <id>`, `/system <text>`, `/reset`, `/exit`.
## Custom tools
`llm.run_tool_loop(messages, tools, dispatch, ...)` runs an agentic loop where the user defines
the tools (Anthropic tool definitions) and the `dispatch` mapping that executes them. See
`README.md` for the full example.
## Distributing
To send it to someone, `llm.py` + `README.md` is enough. The recipient installs `httpx`, has
Claude Code logged in, and runs `python3 llm.py`. Nothing else.
## Requirements
- Python 3.9+, `httpx`.
- Claude Code installed and logged in (provides `~/.claude/.credentials.json`).
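The credentials requirement can be checked before the first call. The sketch below reads the same fields that `llm.py` reads (`claudeAiOauth` and `expiresAt`, in milliseconds since the epoch) and reports whether the file is missing or the token looks expired. `token_status` is a hypothetical helper, and the demo runs against a throwaway file rather than real credentials.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def token_status(credentials_path: str = "~/.claude/.credentials.json") -> str:
    """Return 'missing', 'expired' or 'ok' for the credentials file.

    Hypothetical helper; field names follow what llm.py reads.
    """
    path = Path(credentials_path).expanduser()
    if not path.exists():
        return "missing"
    data = json.loads(path.read_text(encoding="utf-8"))
    expires_at_ms = data.get("claudeAiOauth", {}).get("expiresAt", 0)
    if expires_at_ms and expires_at_ms <= int(time.time() * 1000):
        return "expired"
    return "ok"


# Demo against a temporary file, not the real credentials.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo = os.path.join(tmp_dir, "credentials.json")
    payload = {"claudeAiOauth": {"accessToken": "x", "expiresAt": 1}}
    Path(demo).write_text(json.dumps(payload), encoding="utf-8")
    status_expired = token_status(demo)
    status_missing = token_status(os.path.join(tmp_dir, "nope.json"))
print(status_expired, status_missing)
```

A file without `expiresAt` is reported as `ok`, mirroring how `load_oauth_token` only attempts a refresh when an expiry timestamp is present.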
## Gotchas
- **Rate limits**: the plan limits request frequency; bursts of calls get `HTTP 429`. Space out
the calls or use `claude-haiku-4-5-20251001` (higher quota) for frequent use.
- **Expired token**: Claude Code normally refreshes it in the background. If Claude Code has
not been opened for a long time, open it once to refresh the token.
- **Model**: `claude-opus-4-8` is valid; ids with a nonexistent date suffix return `404`.
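One way to space out calls after a rate limit is exponential backoff. The sketch below assumes a result dict shaped like the one `run_tool_loop` returns, where failures carry an `"error"` string beginning with `HTTP <status>`. `call_with_backoff`, its retry count and its delays are hypothetical choices, and the demo uses a fake call instead of the API.

```python
import time
from typing import Callable


def call_with_backoff(call: Callable[[], dict], retries: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Retry `call` with exponential backoff while it reports HTTP 429.

    Hypothetical helper: `call` should return a dict whose "error" string
    starts with "HTTP 429" when the request was rate limited.
    """
    result = call()
    for attempt in range(retries):
        if not str(result.get("error", "")).startswith("HTTP 429"):
            break
        sleep(base_delay * (2 ** attempt))  # waits 1s, 2s, 4s, ...
        result = call()
    return result


# Demo with a fake call that is rate limited twice and then succeeds.
outcomes = iter([
    {"stop_reason": "error", "error": "HTTP 429: rate limited"},
    {"stop_reason": "error", "error": "HTTP 429: rate limited"},
    {"stop_reason": "end_turn", "final_text": "ok"},
])
waits = []
final = call_with_backoff(lambda: next(outcomes), sleep=waits.append)
print(final["final_text"], waits)
```

In real use `call` would wrap `llm.run_tool_loop(...)`. Because that function appends to the `messages` list it is given, passing a fresh copy of the messages on each attempt is the safer choice.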
Executable
+344
@@ -0,0 +1,344 @@
#!/usr/bin/env python3
"""llm — a tiny, self-contained CLI to chat with Claude from the terminal.
It talks straight to the Anthropic Messages API using the OAuth token that Claude
Code already stores on your machine (~/.claude/.credentials.json), so there is
nothing to configure: install httpx, run it, type.
Usage:
    python3 llm.py                      # interactive chat (with memory)
    python3 llm.py "your question"      # one-shot answer
    echo "summarize this" | python3 llm.py
    python3 llm.py --model claude-opus-4-8 "explain pointers"
Interactive commands:
    /model <id>      switch model (e.g. claude-opus-4-8, claude-haiku-4-5-20251001)
    /system <text>   set a system prompt
    /reset           start a fresh conversation
    /exit            quit
Requirements:
    - Python 3.9+
    - httpx  (pip install httpx)
    - Claude Code installed and logged in (so ~/.claude/.credentials.json exists)
"""
import json
import os
import sys
import time
from pathlib import Path
from typing import Callable, Iterator
API_URL = "https://api.anthropic.com/v1/messages"
ANTHROPIC_VERSION = "2023-06-01"
DEFAULT_MODEL = "claude-haiku-4-5-20251001"
CREDENTIALS_PATH = "~/.claude/.credentials.json"
# ---------------------------------------------------------------------------
# Token: read (and best-effort refresh) the Claude Code OAuth access token
# ---------------------------------------------------------------------------
def load_oauth_token(credentials_path: str = "", refresh_if_expired: bool = True) -> str:
    """Return the Claude Code OAuth access token from the local credentials file."""
    path = Path(credentials_path or CREDENTIALS_PATH).expanduser()
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found. Install Claude Code and log in "
            "(`claude`) to generate the credentials."
        )
    data = json.loads(path.read_text(encoding="utf-8"))
    oauth = data["claudeAiOauth"]
    access_token = oauth["accessToken"]
    expires_at_ms = oauth.get("expiresAt", 0)
    refresh_token = oauth.get("refreshToken", "")
    now_ms = int(time.time() * 1000)
    # Only attempt a refresh when the stored expiry is known and already past.
    if expires_at_ms and expires_at_ms <= now_ms and refresh_if_expired and refresh_token:
        new_token = _try_refresh(refresh_token)
        if new_token:
            return new_token
        print("warning: token may be expired; refresh failed.", file=sys.stderr)
    return access_token
def _try_refresh(refresh_token: str) -> str:
    """Best-effort OAuth refresh. Returns '' on failure (claude normally refreshes)."""
    try:
        import httpx
        resp = httpx.post(
            "https://auth.anthropic.com/oauth/token",
            data={"grant_type": "refresh_token", "refresh_token": refresh_token},
            headers={"content-type": "application/x-www-form-urlencoded"},
            timeout=10.0,
        )
        if resp.status_code == 200:
            return resp.json().get("access_token", "")
    except Exception as exc:  # noqa: BLE001
        print(f"warning: refresh failed: {exc}", file=sys.stderr)
    return ""
# ---------------------------------------------------------------------------
# SSE parsing (pure)
# ---------------------------------------------------------------------------
def _parse_sse_chunk(chunk: str) -> Iterator[dict]:
    event = ""
    data = ""
    for line in chunk.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
        elif line == "":
            # A blank line terminates one SSE event.
            if data and data != "[DONE]":
                try:
                    yield from _normalise(event, json.loads(data))
                except json.JSONDecodeError:
                    pass
            event = data = ""
    # Flush a trailing event that was not followed by a blank line.
    if data and data != "[DONE]":
        try:
            yield from _normalise(event, json.loads(data))
        except json.JSONDecodeError:
            pass
def _normalise(event_type: str, data: dict) -> Iterator[dict]:
    if event_type == "content_block_start":
        block = data.get("content_block", {})
        if block.get("type") == "tool_use":
            yield {"type": "tool_use_start", "id": block.get("id", ""),
                   "name": block.get("name", ""), "index": data.get("index", 0)}
    elif event_type == "content_block_delta":
        delta = data.get("delta", {})
        if delta.get("type") == "text_delta":
            yield {"type": "text", "text": delta.get("text", "")}
        elif delta.get("type") == "input_json_delta":
            yield {"type": "tool_input_delta", "index": data.get("index", 0),
                   "partial_json": delta.get("partial_json", "")}
    elif event_type == "message_delta":
        stop = data.get("delta", {}).get("stop_reason", "")
        if stop:
            yield {"type": "done", "stop_reason": stop}
# ---------------------------------------------------------------------------
# Streaming call
# ---------------------------------------------------------------------------
def stream_messages(messages: list, model: str = DEFAULT_MODEL, system: str = "",
                    tools: list = None, max_tokens: int = 4096, token: str = "") -> Iterator[dict]:
    """Stream an Anthropic Messages API call. Yields text/tool_use/done/error events."""
    if not token:
        try:
            token = load_oauth_token()
        except Exception as exc:  # noqa: BLE001
            yield {"type": "error", "message": f"failed to load token: {exc}"}
            return
    body = {"model": model, "max_tokens": max_tokens, "messages": messages, "stream": True}
    if system:
        body["system"] = system
    if tools:
        body["tools"] = tools
    headers = {"authorization": f"Bearer {token}", "anthropic-version": ANTHROPIC_VERSION,
               "content-type": "application/json"}
    try:
        import httpx
    except ImportError:
        yield {"type": "error", "message": "httpx is missing. Install it with: pip install httpx"}
        return
    try:
        with httpx.stream("POST", API_URL, json=body, headers=headers, timeout=120.0) as resp:
            if resp.status_code != 200:
                err = resp.read().decode("utf-8", errors="replace")
                yield {"type": "error", "message": f"HTTP {resp.status_code}: {err[:400]}"}
                return
            # Buffer the stream and hand off complete SSE events (blank-line delimited).
            buf = ""
            for chunk in resp.iter_text():
                buf += chunk
                while "\n\n" in buf:
                    block, buf = buf.split("\n\n", 1)
                    yield from _parse_sse_chunk(block + "\n\n")
            if buf.strip():
                yield from _parse_sse_chunk(buf + "\n\n")
    except Exception as exc:  # noqa: BLE001
        yield {"type": "error", "message": str(exc)}
# ---------------------------------------------------------------------------
# Agentic tool loop (define your own tools)
# ---------------------------------------------------------------------------
def run_tool_loop(messages: list, tools: list, dispatch: dict, model: str = DEFAULT_MODEL,
                  system: str = "", max_tokens: int = 4096, max_iters: int = 8,
                  on_text: Callable[[str], None] = None) -> dict:
    """Run a tool-use loop. `tools` are Anthropic tool defs; `dispatch` maps name->callable."""
    iterations = 0
    final_text = ""
    stop_reason = "max_iters"
    for _ in range(max_iters):
        iterations += 1
        text_parts = []
        tool_uses = []
        index_map = {}
        cur_stop = "end_turn"
        for ev in stream_messages(messages, model=model, system=system, tools=tools, max_tokens=max_tokens):
            t = ev.get("type")
            if t == "text":
                text_parts.append(ev["text"])
                if on_text:
                    on_text(ev["text"])
            elif t == "tool_use_start":
                entry = {"id": ev["id"], "name": ev["name"], "index": ev["index"], "parts": []}
                tool_uses.append(entry)
                index_map[ev["index"]] = entry
            elif t == "tool_input_delta":
                if ev["index"] in index_map:
                    index_map[ev["index"]]["parts"].append(ev["partial_json"])
            elif t == "done":
                cur_stop = ev.get("stop_reason", "end_turn")
            elif t == "error":
                return {"messages": messages, "final_text": "".join(text_parts),
                        "stop_reason": "error", "iterations": iterations,
                        "error": ev.get("message", "")}
        final_text = "".join(text_parts)
        stop_reason = cur_stop
        # Record the assistant turn (text plus any tool_use blocks) in the history.
        content = []
        if final_text:
            content.append({"type": "text", "text": final_text})
        for tu in tool_uses:
            raw = "".join(tu["parts"])
            try:
                parsed = json.loads(raw) if raw else {}
            except json.JSONDecodeError:
                parsed = {"_raw": raw}
            content.append({"type": "tool_use", "id": tu["id"], "name": tu["name"], "input": parsed})
        messages.append({"role": "assistant", "content": content})
        if stop_reason != "tool_use" or not tool_uses:
            break
        # Execute each requested tool and send the results back as a user turn.
        results = []
        for tu in tool_uses:
            raw = "".join(tu["parts"])
            try:
                parsed = json.loads(raw) if raw else {}
            except json.JSONDecodeError:
                parsed = {"_raw": raw}
            if tu["name"] not in dispatch:
                rc, is_err = f"Error: tool '{tu['name']}' not found", True
            else:
                try:
                    rv = dispatch[tu["name"]](parsed)
                    rc, is_err = (rv if isinstance(rv, str) else json.dumps(rv)), False
                except Exception as exc:  # noqa: BLE001
                    rc, is_err = f"Error executing {tu['name']}: {exc}", True
            r = {"type": "tool_result", "tool_use_id": tu["id"], "content": rc}
            if is_err:
                r["is_error"] = True
            results.append(r)
        messages.append({"role": "user", "content": results})
    else:
        # The for loop ran out of iterations without hitting `break`.
        stop_reason = "max_iters"
    return {"messages": messages, "final_text": final_text,
            "stop_reason": stop_reason, "iterations": iterations}
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def _ask_once(messages: list, model: str, system: str) -> str:
    """Stream one assistant turn to stdout, append it to messages, return the text."""
    parts = []
    for ev in stream_messages(messages, model=model, system=system):
        if ev["type"] == "text":
            parts.append(ev["text"])
            sys.stdout.write(ev["text"])
            sys.stdout.flush()
        elif ev["type"] == "error":
            sys.stderr.write("\n[error] " + str(ev.get("message", "")) + "\n")
            return ""
    sys.stdout.write("\n")
    text = "".join(parts)
    if text:
        messages.append({"role": "assistant", "content": text})
    return text
def _repl(model: str, system: str):
    print(f"llm · {model} · type your message (/model, /system, /reset, /exit)")
    messages = []
    while True:
        try:
            line = input("\n> ").strip()
        except (EOFError, KeyboardInterrupt):
            print()
            return
        if not line:
            continue
        if line in ("/exit", "/quit", "/q"):
            return
        if line == "/reset":
            messages = []
            print("-- new conversation --")
            continue
        if line.startswith("/model"):
            parts = line.split(maxsplit=1)
            if len(parts) == 2:
                model = parts[1].strip()
                print(f"model: {model}")
            else:
                print(f"current model: {model}")
            continue
        if line.startswith("/system"):
            parts = line.split(maxsplit=1)
            system = parts[1].strip() if len(parts) == 2 else ""
            print(f"system: {system or '(none)'}")
            continue
        messages.append({"role": "user", "content": line})
        _ask_once(messages, model, system)
def main(argv):
    model = DEFAULT_MODEL
    system = ""
    rest = []
    i = 0
    while i < len(argv):
        a = argv[i]
        if a in ("--model", "-m") and i + 1 < len(argv):
            model = argv[i + 1]
            i += 2
        elif a in ("--system", "-s") and i + 1 < len(argv):
            system = argv[i + 1]
            i += 2
        elif a in ("--help", "-h"):
            print(__doc__)
            return 0
        else:
            rest.append(a)
            i += 1
    prompt = " ".join(rest).strip()
    # With no prompt argument, fall back to piped stdin.
    if not prompt and not sys.stdin.isatty():
        prompt = sys.stdin.read().strip()
    if prompt:  # one-shot
        messages = [{"role": "user", "content": prompt}]
        _ask_once(messages, model, system)
        return 0
    _repl(model, system)  # interactive
    return 0
if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
+1
@@ -0,0 +1 @@
httpx>=0.24