feat(cybersecurity): auto-commit con 48 cambios
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,95 @@
|
||||
---
|
||||
name: tee_anthropic_sse
|
||||
kind: function
|
||||
lang: py
|
||||
domain: cybersecurity
|
||||
version: "1.0.0"
|
||||
purity: impure
|
||||
signature: "class AnthropicSSETee — mitmproxy addon loaded via mitmdump -s"
|
||||
description: "Addon de mitmproxy que intercepta el stream SSE de la API de Anthropic (/v1/messages) y emite cada evento significativo a stdout como NDJSON en tiempo real. Cada interaccion de la CLI claude dispara una o varias llamadas a /v1/messages; el addon las etiqueta con stream_id, model y has_tools para que el consumidor pueda distinguir la respuesta principal (claude-opus-X con tools) de las auxiliares (titulo/clasificador en haiku sin tools). Las funciones puras split_sse_events y event_to_ndjson son testeables sin mitmproxy."
|
||||
tags: [web-proxy, claude, mitmproxy, sse, streaming, anthropic, cybersecurity]
|
||||
uses_functions: []
|
||||
uses_types: []
|
||||
returns: []
|
||||
returns_optional: false
|
||||
error_type: "error_go_core"
|
||||
imports: ["json", "os", "sys", "mitmproxy"]
|
||||
tested: true
|
||||
tests:
|
||||
- "split buffer completo devuelve 8 bloques"
|
||||
- "split bloques contienen event y data"
|
||||
- "split buffer cortado preserva incompleto"
|
||||
- "split resto mas continuacion reconstruye evento"
|
||||
- "split buffer vacio"
|
||||
- "split evento unico sin separador final"
|
||||
- "text delta p"
|
||||
- "text delta ong"
|
||||
- "message stop con stop holder previo"
|
||||
- "ping devuelve lista vacia"
|
||||
- "content block start text devuelve vacio"
|
||||
- "content block start tool use"
|
||||
- "tool json delta"
|
||||
- "json invalido en data devuelve vacio"
|
||||
- "bloque sin data devuelve vacio"
|
||||
- "integracion secuencia completa produce pong y stop"
|
||||
- "integracion stream id se propaga"
|
||||
- "integracion determinismo"
|
||||
test_file_path: "python/functions/cybersecurity/tests/test_tee_anthropic_sse.py"
|
||||
file_path: "python/functions/cybersecurity/tee_anthropic_sse.py"
|
||||
params:
|
||||
- name: mitmdump_invocation
|
||||
desc: "No recibe argumentos directos. Se carga con `mitmdump -s tee_anthropic_sse.py`. El puerto del proxy se controla con el flag -p de mitmdump (ej. -p 8901). La flag -q suprime el log de mitmdump en stderr dejando solo el NDJSON en stdout."
|
||||
- name: FN_WIRE_ONLY_TOOLS
|
||||
desc: "Variable de entorno opcional. Si vale '1', suprime los streams cuyo request body no incluye el array 'tools' (llamadas auxiliares de titulo/clasificador que usan haiku). Por defecto (sin la env) emite todos los streams etiquetados con stream_id, model y has_tools para que el consumidor filtre."
|
||||
output: "NDJSON a stdout, un objeto JSON por linea (flush inmediato). Tipos de linea: message_start{stream_id,model,has_tools} al inicio de cada stream; text_delta{stream_id,text} por cada fragmento de texto del modelo; tool_use_start{stream_id,tool_name,tool_id} cuando el modelo inicia una herramienta; tool_json_delta{stream_id,partial_json} por cada fragmento de argumentos JSON de la herramienta; message_stop{stream_id,stop_reason} al finalizar el stream. stderr recibe solo mensajes de diagnóstico del addon (errores, warnings), nunca NDJSON."
|
||||
---
|
||||
|
||||
## Ejemplo
|
||||
|
||||
```bash
|
||||
# Terminal 1: lanzar mitmproxy como proxy de interceptacion
|
||||
# -q suprime el log de mitmdump; solo se ve el NDJSON en stdout
|
||||
mitmdump -p 8901 \
|
||||
-s python/functions/cybersecurity/tee_anthropic_sse.py \
|
||||
-q
|
||||
|
||||
# Terminal 2: lanzar claude por el proxy
|
||||
# NODE_EXTRA_CA_CERTS hace que el runtime Node de claude confie en la CA de mitmproxy
|
||||
export HTTPS_PROXY=http://127.0.0.1:8901
|
||||
export NODE_EXTRA_CA_CERTS="$HOME/.mitmproxy/mitmproxy-ca-cert.pem"
|
||||
claude -p "di hola"
|
||||
|
||||
# Salida en stdout de mitmdump (Terminal 1):
|
||||
# {"type": "message_start", "stream_id": 1, "model": "claude-haiku-4-5", "has_tools": false}
|
||||
# {"type": "text_delta", "stream_id": 1, "text": "H"}
|
||||
# {"type": "text_delta", "stream_id": 1, "text": "ola"}
|
||||
# {"type": "message_stop", "stream_id": 1, "stop_reason": "end_turn"}
|
||||
# ...
|
||||
# {"type": "message_start", "stream_id": 2, "model": "claude-opus-4-8", "has_tools": true}
|
||||
# {"type": "text_delta", "stream_id": 2, "text": "Hola"}
|
||||
# ...
|
||||
|
||||
# Filtrar solo la respuesta principal (has_tools=true) con jq:
|
||||
mitmdump -p 8901 -s python/functions/cybersecurity/tee_anthropic_sse.py -q \
|
||||
| jq -c 'select(.has_tools == true or .stream_id != null and (.type == "text_delta"))'
|
||||
|
||||
# O usar la variable de entorno para que el addon ya filtre en origen:
|
||||
FN_WIRE_ONLY_TOOLS=1 mitmdump -p 8901 \
|
||||
-s python/functions/cybersecurity/tee_anthropic_sse.py -q
|
||||
```
|
||||
|
||||
## Cuando usarla
|
||||
|
||||
Cuando quieras el texto exacto que el modelo genera en tiempo real desde una sesion claude (TUI interactiva o `claude -p`), interceptando la red, sin parsear el render de la terminal ni depender de warmup/idle de la TUI. Util para: capturar la salida del modelo para procesado downstream (logging estructurado, metricas de tokens, replay), observar tool_use en construccion (argumentos parciales), o depurar la diferencia entre streams principales y auxiliares en una misma sesion TUI.
|
||||
|
||||
## Gotchas
|
||||
|
||||
- **Descompresion via strip de Accept-Encoding**: el hook `request` elimina el header `Accept-Encoding` de las llamadas a `/v1/messages` para que la API responda con el SSE SIN comprimir. Esto es obligatorio: el modo streaming de mitmproxy (`flow.response.stream`) entrega al tee los bytes CRUDOS del cuerpo, que si vinieran con `Content-Encoding: gzip`/`br` nunca contendrian el delimitador `\n\n` de eventos SSE (se veria binario) y no se emitiria ningun delta. Verificado empiricamente el 2026-06-04: sin el strip, solo se emitia `message_start`; con el strip, los `text_delta` salen correctamente. La alternativa (un decompresor de streaming con estado por flujo) es mas fragil. El coste es unos bytes extra en el salto local, irrelevante.
|
||||
- **NO usar `--set stream_large_bodies`**: el modo streaming se activa con `flow.response.stream = func` en `responseheaders`, sin necesidad de ese flag. Ademas `stream_large_bodies=N` bajo rompe el acceso a `flow.request.content` (necesario para `has_tools`), porque tambien streamea el cuerpo del request y deja de bufferearlo.
|
||||
- **Requiere mitmproxy + CA confiada por claude**: la CA de mitmproxy (`~/.mitmproxy/mitmproxy-ca-cert.pem`) debe estar configurada en `NODE_EXTRA_CA_CERTS` para que el runtime Node de la CLI claude acepte el certificado MITM. Sin esto, claude rechaza la conexion con error de TLS. Instalar mitmproxy: `uv tool install mitmproxy` o `pip install mitmproxy`. claude tambien respeta `HTTPS_PROXY` para enrutar su trafico por el proxy.
|
||||
- **Una interaccion TUI dispara varias /v1/messages**: la respuesta real del usuario usa el modelo principal (p.ej. claude-opus-4-8) y su request body incluye el array `tools` con las herramientas de Claude Code. Las llamadas auxiliares (generador de titulo, clasificador) usan claude-haiku y su request NO lleva `tools`. Usa `has_tools=true` o `FN_WIRE_ONLY_TOOLS=1` para aislar la respuesta principal y no mezclar streams.
|
||||
- **Solo funciona mientras claude no haga TLS pinning**: hoy (2026-06-04) la CLI claude no hace certificate pinning, por lo que el MITM funciona con `NODE_EXTRA_CA_CERTS`. Si una version futura de claude añade pinning, el addon dejara de interceptar.
|
||||
- **Es trafico de tu propia cuenta y maquina**: el addon captura unicamente el trafico local que tu proxy intercepta. No hay acceso a otras cuentas ni sesiones remotas. El NDJSON se emite solo a stdout local.
|
||||
- **El endpoint puede cambiar**: la CLI claude hoy usa `POST /v1/messages?beta=true`. El addon filtra por prefix `/v1/messages` para tolerar variantes de query string, pero si Anthropic cambia la ruta base en versiones futuras del protocolo, actualizar el check en `responseheaders`.
|
||||
- **Chunks parciales**: el addon mantiene un buffer por stream para manejar eventos SSE partidos entre chunks TCP. Si mitmdump se mata con SIGKILL durante un stream activo, el ultimo bloque incompleto del buffer se descarta (no se emite un message_stop artificial).
|
||||
- **stdout debe ser exclusivamente NDJSON**: no añadir prints de debug a stdout; redirigir diagnosticos a stderr. Si se canaliza la salida a `jq` u otro parser, cualquier linea no-JSON rompe el pipeline.
|
||||
@@ -0,0 +1,278 @@
|
||||
"""mitmproxy addon that tees Anthropic SSE streams to stdout as NDJSON.
|
||||
|
||||
Load with: mitmdump -p 8901 -s tee_anthropic_sse.py -q
|
||||
|
||||
For each POST /v1/messages response that streams text/event-stream, the addon
|
||||
emits one NDJSON line per meaningful SSE event to stdout:
|
||||
|
||||
{"type":"message_start","stream_id":1,"model":"claude-opus-4-8","has_tools":true}
|
||||
{"type":"text_delta","stream_id":1,"text":"Hello"}
|
||||
{"type":"tool_use_start","stream_id":1,"tool_name":"Bash","tool_id":"toolu_01..."}
|
||||
{"type":"tool_json_delta","stream_id":1,"partial_json":"{\"command\":\"ls"}
|
||||
{"type":"message_stop","stream_id":1,"stop_reason":"end_turn"}
|
||||
|
||||
stdout is EXCLUSIVELY NDJSON — suitable for piping. All addon diagnostics go
|
||||
to stderr.
|
||||
|
||||
Set FN_WIRE_ONLY_TOOLS=1 to suppress streams whose request body has no "tools"
|
||||
array (title generators, classifiers, etc.).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
from typing import Iterator
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Pure helpers — testable without mitmproxy
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def split_sse_events(buf: bytes) -> tuple[list[str], bytes]:
|
||||
"""Split a byte buffer into complete SSE event blocks and a leftover tail.
|
||||
|
||||
SSE events are separated by a blank line (``\\n\\n``). Any bytes after
|
||||
the last complete event are returned unchanged as *leftover* so they can
|
||||
be prepended to the next chunk.
|
||||
|
||||
Args:
|
||||
buf: Raw bytes accumulated from one or more SSE chunks.
|
||||
|
||||
Returns:
|
||||
A 2-tuple ``(events, leftover)`` where *events* is a list of complete
|
||||
event block strings (without the trailing ``\\n\\n``) and *leftover*
|
||||
is the remaining bytes that do not yet form a complete event.
|
||||
"""
|
||||
text = buf.decode("utf-8", errors="replace")
|
||||
# Split on the blank-line delimiter that separates SSE events.
|
||||
parts = text.split("\n\n")
|
||||
# The last element is either empty (buffer ended exactly on \n\n) or an
|
||||
# incomplete event that must be carried forward.
|
||||
complete = [p for p in parts[:-1] if p.strip()]
|
||||
leftover_str = parts[-1]
|
||||
return complete, leftover_str.encode("utf-8")
|
||||
|
||||
|
||||
def event_to_ndjson(
|
||||
event_block: str,
|
||||
stream_id: int,
|
||||
stop_holder: dict,
|
||||
) -> list[dict]:
|
||||
"""Parse one SSE event block and return zero or more NDJSON dicts.
|
||||
|
||||
Args:
|
||||
event_block: A single SSE event block string (the content between two
|
||||
``\\n\\n`` separators), e.g. ``"event: content_block_delta\\ndata: {...}"``.
|
||||
stream_id: Monotonic integer that identifies the current stream.
|
||||
stop_holder: A mutable dict used to carry ``stop_reason`` across calls.
|
||||
The caller passes the same dict for all events of one stream; this
|
||||
function writes ``stop_holder["stop_reason"]`` on ``message_delta``
|
||||
events and reads it on ``message_stop``.
|
||||
|
||||
Returns:
|
||||
A (possibly empty) list of dicts ready to be JSON-serialised as NDJSON.
|
||||
"""
|
||||
event_type = ""
|
||||
data_str = ""
|
||||
|
||||
for line in event_block.splitlines():
|
||||
if line.startswith("event:"):
|
||||
event_type = line[len("event:"):].strip()
|
||||
elif line.startswith("data:"):
|
||||
data_str = line[len("data:"):].strip()
|
||||
|
||||
if not data_str:
|
||||
return []
|
||||
|
||||
try:
|
||||
data = json.loads(data_str)
|
||||
except json.JSONDecodeError:
|
||||
return []
|
||||
|
||||
results: list[dict] = []
|
||||
|
||||
if event_type == "content_block_delta":
|
||||
delta = data.get("delta", {})
|
||||
delta_type = delta.get("type", "")
|
||||
if delta_type == "text_delta":
|
||||
results.append(
|
||||
{
|
||||
"type": "text_delta",
|
||||
"stream_id": stream_id,
|
||||
"text": delta.get("text", ""),
|
||||
}
|
||||
)
|
||||
elif delta_type == "input_json_delta":
|
||||
results.append(
|
||||
{
|
||||
"type": "tool_json_delta",
|
||||
"stream_id": stream_id,
|
||||
"partial_json": delta.get("partial_json", ""),
|
||||
}
|
||||
)
|
||||
|
||||
elif event_type == "content_block_start":
|
||||
cb = data.get("content_block", {})
|
||||
if cb.get("type") == "tool_use":
|
||||
results.append(
|
||||
{
|
||||
"type": "tool_use_start",
|
||||
"stream_id": stream_id,
|
||||
"tool_name": cb.get("name", ""),
|
||||
"tool_id": cb.get("id", ""),
|
||||
}
|
||||
)
|
||||
# content_block_start for text blocks → nothing to emit
|
||||
|
||||
elif event_type == "message_delta":
|
||||
delta = data.get("delta", {})
|
||||
reason = delta.get("stop_reason")
|
||||
if reason:
|
||||
stop_holder["stop_reason"] = reason
|
||||
|
||||
elif event_type == "message_stop":
|
||||
results.append(
|
||||
{
|
||||
"type": "message_stop",
|
||||
"stream_id": stream_id,
|
||||
"stop_reason": stop_holder.get("stop_reason", "end_turn"),
|
||||
}
|
||||
)
|
||||
|
||||
return results
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# mitmproxy addon
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class AnthropicSSETee:
|
||||
"""mitmproxy addon: tee Anthropic /v1/messages SSE streams to stdout.
|
||||
|
||||
One instance is shared across all intercepted flows. Each SSE stream gets
|
||||
a monotonically increasing ``stream_id`` so the consumer can correlate
|
||||
lines from concurrent or sequential streams.
|
||||
"""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self._stream_counter: int = 0
|
||||
self._wire_only_tools: bool = os.environ.get("FN_WIRE_ONLY_TOOLS", "") == "1"
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# mitmproxy hooks
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def request(self, flow) -> None: # noqa: ANN001
|
||||
"""Called when a request is received (before it is sent upstream).
|
||||
|
||||
For the target /v1/messages endpoint, strip the Accept-Encoding header so
|
||||
the API responds with an uncompressed SSE stream. Otherwise the streaming
|
||||
tee would see gzip/brotli bytes (which never contain the ``\\n\\n`` event
|
||||
delimiter) and a stateful streaming decompressor would be required. The
|
||||
extra bytes on the local hop are irrelevant; claude still parses the SSE
|
||||
normally.
|
||||
"""
|
||||
req = flow.request
|
||||
if req.method == "POST" and req.path.startswith("/v1/messages"):
|
||||
req.headers.pop("accept-encoding", None)
|
||||
|
||||
def responseheaders(self, flow) -> None: # noqa: ANN001
|
||||
"""Called when response headers are received (before body).
|
||||
|
||||
If the flow is a streaming Anthropic messages endpoint, activate
|
||||
mitmproxy's streaming mode and attach the tee function so the response
|
||||
body is forwarded to claude in real time while we parse it.
|
||||
"""
|
||||
req = flow.request
|
||||
resp = flow.response
|
||||
|
||||
# Filter: must be POST /v1/messages (with or without query params)
|
||||
if req.method != "POST":
|
||||
return
|
||||
if not req.path.startswith("/v1/messages"):
|
||||
return
|
||||
|
||||
# Filter: response must be SSE
|
||||
ct = resp.headers.get("content-type", "")
|
||||
if "event-stream" not in ct:
|
||||
return
|
||||
|
||||
# Parse request body for metadata
|
||||
try:
|
||||
body = json.loads(req.content or b"{}")
|
||||
except (json.JSONDecodeError, Exception):
|
||||
# Cannot parse body — skip this flow without breaking the proxy
|
||||
print(
|
||||
f"[tee_anthropic_sse] WARN: could not parse request body for {req.path}",
|
||||
file=sys.stderr,
|
||||
flush=True,
|
||||
)
|
||||
return
|
||||
|
||||
model: str = body.get("model", "unknown")
|
||||
has_tools: bool = bool(body.get("tools"))
|
||||
|
||||
# Optionally suppress non-tool streams (title/classifier calls)
|
||||
if self._wire_only_tools and not has_tools:
|
||||
return
|
||||
|
||||
self._stream_counter += 1
|
||||
stream_id = self._stream_counter
|
||||
|
||||
# Emit the stream-start event so the consumer knows what is coming
|
||||
_emit({"type": "message_start", "stream_id": stream_id, "model": model, "has_tools": has_tools})
|
||||
|
||||
# Build the per-stream tee closure and hand it to mitmproxy
|
||||
flow.response.stream = _make_tee(stream_id)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Private helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _emit(obj: dict) -> None:
|
||||
"""Write one NDJSON line to stdout (flush immediately)."""
|
||||
print(json.dumps(obj), flush=True)
|
||||
|
||||
|
||||
def _make_tee(stream_id: int):
|
||||
"""Return a mitmproxy streaming function for a single SSE stream.
|
||||
|
||||
The returned callable is assigned to ``flow.response.stream`` and will be
|
||||
called by mitmproxy for each chunk of the response body. It MUST return
|
||||
the chunk unchanged so claude receives the full stream.
|
||||
"""
|
||||
buf: bytearray = bytearray()
|
||||
stop_holder: dict = {}
|
||||
|
||||
def tee(chunk: bytes) -> bytes:
|
||||
nonlocal buf
|
||||
buf.extend(chunk)
|
||||
try:
|
||||
events, leftover = split_sse_events(bytes(buf))
|
||||
buf = bytearray(leftover)
|
||||
for block in events:
|
||||
for obj in event_to_ndjson(block, stream_id, stop_holder):
|
||||
_emit(obj)
|
||||
except Exception as exc: # noqa: BLE001
|
||||
print(
|
||||
f"[tee_anthropic_sse] ERROR in tee for stream {stream_id}: {exc}",
|
||||
file=sys.stderr,
|
||||
flush=True,
|
||||
)
|
||||
# Always return the original chunk — claude must receive its stream
|
||||
return chunk
|
||||
|
||||
return tee
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# mitmproxy entrypoint
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
addons = [AnthropicSSETee()]
|
||||
@@ -0,0 +1,283 @@
|
||||
"""Tests para las funciones puras de tee_anthropic_sse.
|
||||
|
||||
Cubre split_sse_events y event_to_ndjson sin necesitar mitmproxy.
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
|
||||
|
||||
from tee_anthropic_sse import split_sse_events, event_to_ndjson
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# SSE fixture — captura real de la API de Anthropic
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_RAW_SSE = (
|
||||
b"event: message_start\n"
|
||||
b'data: {"type":"message_start","message":{"model":"claude-opus-4-8","id":"msg_x",'
|
||||
b'"type":"message","role":"assistant","content":[],"stop_reason":null}}\n'
|
||||
b"\n"
|
||||
b"event: content_block_start\n"
|
||||
b'data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}\n'
|
||||
b"\n"
|
||||
b"event: ping\n"
|
||||
b'data: {"type": "ping"}\n'
|
||||
b"\n"
|
||||
b"event: content_block_delta\n"
|
||||
b'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"P"}}\n'
|
||||
b"\n"
|
||||
b"event: content_block_delta\n"
|
||||
b'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"ONG"}}\n'
|
||||
b"\n"
|
||||
b"event: content_block_stop\n"
|
||||
b'data: {"type":"content_block_stop","index":0}\n'
|
||||
b"\n"
|
||||
b"event: message_delta\n"
|
||||
b'data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},'
|
||||
b'"usage":{"output_tokens":5}}\n'
|
||||
b"\n"
|
||||
b"event: message_stop\n"
|
||||
b'data: {"type":"message_stop"}\n'
|
||||
b"\n"
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# split_sse_events
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_split_buffer_completo_devuelve_8_bloques():
|
||||
"""Con el buffer completo devuelve los 8 bloques y leftover vacio."""
|
||||
events, leftover = split_sse_events(_RAW_SSE)
|
||||
assert len(events) == 8
|
||||
assert leftover == b""
|
||||
|
||||
|
||||
def test_split_bloques_contienen_event_y_data():
|
||||
"""Cada bloque contiene las lineas event: y data: esperadas."""
|
||||
events, _ = split_sse_events(_RAW_SSE)
|
||||
assert "event: message_start" in events[0]
|
||||
assert "event: ping" in events[2]
|
||||
assert "event: message_stop" in events[7]
|
||||
|
||||
|
||||
def test_split_buffer_cortado_preserva_incompleto():
|
||||
"""Con un buffer cortado a la mitad de un evento, devuelve solo los completos."""
|
||||
# Encontrar la SEGUNDA aparicion de content_block_delta (quinto evento en total)
|
||||
first_occ = _RAW_SSE.find(b"event: content_block_delta\ndata:")
|
||||
second_occ = _RAW_SSE.find(b"event: content_block_delta\ndata:", first_occ + 1)
|
||||
# Cortar en medio del data: del segundo content_block_delta
|
||||
cut_buf = _RAW_SSE[:second_occ + 20]
|
||||
|
||||
events, leftover = split_sse_events(cut_buf)
|
||||
# Debe haber exactamente 4 eventos completos:
|
||||
# message_start, content_block_start, ping, primer content_block_delta
|
||||
assert len(events) == 4
|
||||
# El leftover no debe estar vacio (el segundo delta queda a medias)
|
||||
assert len(leftover) > 0
|
||||
|
||||
|
||||
def test_split_resto_mas_continuacion_reconstruye_evento():
|
||||
"""Concatenar leftover + continuacion reconstituye el evento cortado."""
|
||||
# Cortar justo antes del \n\n que cierra el primer delta
|
||||
cut_point = _RAW_SSE.find(b"\n\nevent: content_block_delta\n", 100)
|
||||
first_half = _RAW_SSE[:cut_point + 1] # termina dentro del separador
|
||||
second_half = _RAW_SSE[cut_point + 1:]
|
||||
|
||||
events1, leftover1 = split_sse_events(first_half)
|
||||
combined = leftover1 + second_half
|
||||
events2, leftover2 = split_sse_events(combined)
|
||||
|
||||
# La union debe cubrir todos los bloques del segundo tramo
|
||||
all_events = events1 + events2
|
||||
assert len(all_events) == 8
|
||||
assert leftover2 == b""
|
||||
|
||||
|
||||
def test_split_buffer_vacio():
|
||||
"""Buffer vacio devuelve lista vacia y leftover vacio."""
|
||||
events, leftover = split_sse_events(b"")
|
||||
assert events == []
|
||||
assert leftover == b""
|
||||
|
||||
|
||||
def test_split_evento_unico_sin_separador_final():
|
||||
"""Un evento sin separador final queda como leftover."""
|
||||
chunk = b"event: ping\ndata: {\"type\":\"ping\"}"
|
||||
events, leftover = split_sse_events(chunk)
|
||||
assert events == []
|
||||
assert b"ping" in leftover
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# event_to_ndjson
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_text_delta_p():
|
||||
"""content_block_delta text_delta 'P' -> [{type:text_delta, stream_id:1, text:'P'}]."""
|
||||
block = (
|
||||
"event: content_block_delta\n"
|
||||
'data: {"type":"content_block_delta","index":0,'
|
||||
'"delta":{"type":"text_delta","text":"P"}}'
|
||||
)
|
||||
result = event_to_ndjson(block, 1, {})
|
||||
assert result == [{"type": "text_delta", "stream_id": 1, "text": "P"}]
|
||||
|
||||
|
||||
def test_text_delta_ong():
|
||||
"""content_block_delta text_delta 'ONG' -> text 'ONG'."""
|
||||
block = (
|
||||
"event: content_block_delta\n"
|
||||
'data: {"type":"content_block_delta","index":0,'
|
||||
'"delta":{"type":"text_delta","text":"ONG"}}'
|
||||
)
|
||||
result = event_to_ndjson(block, 1, {})
|
||||
assert result == [{"type": "text_delta", "stream_id": 1, "text": "ONG"}]
|
||||
|
||||
|
||||
def test_message_stop_con_stop_holder_previo():
|
||||
"""message_stop con stop_holder ya cargado -> stop_reason end_turn."""
|
||||
stop_holder: dict = {}
|
||||
|
||||
# Primero simular message_delta para poblar el holder
|
||||
delta_block = (
|
||||
"event: message_delta\n"
|
||||
'data: {"type":"message_delta","delta":{"stop_reason":"end_turn",'
|
||||
'"stop_sequence":null},"usage":{"output_tokens":5}}'
|
||||
)
|
||||
event_to_ndjson(delta_block, 1, stop_holder)
|
||||
assert stop_holder.get("stop_reason") == "end_turn"
|
||||
|
||||
# Ahora message_stop
|
||||
stop_block = (
|
||||
"event: message_stop\n"
|
||||
'data: {"type":"message_stop"}'
|
||||
)
|
||||
result = event_to_ndjson(stop_block, 1, stop_holder)
|
||||
assert result == [{"type": "message_stop", "stream_id": 1, "stop_reason": "end_turn"}]
|
||||
|
||||
|
||||
def test_ping_devuelve_lista_vacia():
|
||||
"""ping -> []."""
|
||||
block = "event: ping\ndata: {\"type\": \"ping\"}"
|
||||
result = event_to_ndjson(block, 1, {})
|
||||
assert result == []
|
||||
|
||||
|
||||
def test_content_block_start_text_devuelve_vacio():
|
||||
"""content_block_start para un bloque de texto -> []."""
|
||||
block = (
|
||||
"event: content_block_start\n"
|
||||
'data: {"type":"content_block_start","index":0,'
|
||||
'"content_block":{"type":"text","text":""}}'
|
||||
)
|
||||
result = event_to_ndjson(block, 1, {})
|
||||
assert result == []
|
||||
|
||||
|
||||
def test_content_block_start_tool_use():
|
||||
"""content_block_start tool_use -> tool_use_start con name e id."""
|
||||
block = (
|
||||
"event: content_block_start\n"
|
||||
'data: {"type":"content_block_start","index":1,'
|
||||
'"content_block":{"type":"tool_use","id":"toolu_01abc","name":"Bash"}}'
|
||||
)
|
||||
result = event_to_ndjson(block, 2, {})
|
||||
assert result == [
|
||||
{
|
||||
"type": "tool_use_start",
|
||||
"stream_id": 2,
|
||||
"tool_name": "Bash",
|
||||
"tool_id": "toolu_01abc",
|
||||
}
|
||||
]
|
||||
|
||||
|
||||
def test_tool_json_delta():
|
||||
"""content_block_delta input_json_delta -> tool_json_delta."""
|
||||
# Construir el bloque SSE con json.dumps para que el partial_json quede
|
||||
# correctamente escapado dentro del JSON del campo data:
|
||||
import json as _json
|
||||
data_payload = {
|
||||
"type": "content_block_delta",
|
||||
"index": 1,
|
||||
"delta": {
|
||||
"type": "input_json_delta",
|
||||
"partial_json": '{"command":"ls',
|
||||
},
|
||||
}
|
||||
block = "event: content_block_delta\ndata: " + _json.dumps(data_payload)
|
||||
result = event_to_ndjson(block, 3, {})
|
||||
assert result == [
|
||||
{
|
||||
"type": "tool_json_delta",
|
||||
"stream_id": 3,
|
||||
"partial_json": '{"command":"ls',
|
||||
}
|
||||
]
|
||||
|
||||
|
||||
def test_json_invalido_en_data_devuelve_vacio():
|
||||
"""Linea data: con JSON invalido -> [] (sin excepcion)."""
|
||||
block = "event: content_block_delta\ndata: {esto no es json"
|
||||
result = event_to_ndjson(block, 1, {})
|
||||
assert result == []
|
||||
|
||||
|
||||
def test_bloque_sin_data_devuelve_vacio():
|
||||
"""Bloque sin linea data: -> []."""
|
||||
block = "event: content_block_stop\n"
|
||||
result = event_to_ndjson(block, 1, {})
|
||||
assert result == []
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Integración del parseo: secuencia completa produce PONG + message_stop
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_integracion_secuencia_completa_produce_pong_y_stop():
|
||||
"""Los 8 bloques en orden producen text_delta 'P'+'ONG' y un message_stop end_turn."""
|
||||
events, leftover = split_sse_events(_RAW_SSE)
|
||||
assert leftover == b""
|
||||
|
||||
stop_holder: dict = {}
|
||||
all_ndjson: list[dict] = []
|
||||
for block in events:
|
||||
all_ndjson.extend(event_to_ndjson(block, 1, stop_holder))
|
||||
|
||||
text_deltas = [o for o in all_ndjson if o["type"] == "text_delta"]
|
||||
message_stops = [o for o in all_ndjson if o["type"] == "message_stop"]
|
||||
|
||||
concatenated = "".join(d["text"] for d in text_deltas)
|
||||
assert concatenated == "PONG"
|
||||
|
||||
assert len(message_stops) == 1
|
||||
assert message_stops[0]["stop_reason"] == "end_turn"
|
||||
assert message_stops[0]["stream_id"] == 1
|
||||
|
||||
|
||||
def test_integracion_stream_id_se_propaga():
|
||||
"""stream_id se propaga correctamente a todos los eventos emitidos."""
|
||||
events, _ = split_sse_events(_RAW_SSE)
|
||||
stop_holder: dict = {}
|
||||
for block in events:
|
||||
for obj in event_to_ndjson(block, 42, stop_holder):
|
||||
assert obj["stream_id"] == 42
|
||||
|
||||
|
||||
def test_integracion_determinismo():
|
||||
"""Parsear el mismo buffer dos veces produce exactamente el mismo resultado."""
|
||||
def parse_all(stream_id: int) -> list[dict]:
|
||||
evs, _ = split_sse_events(_RAW_SSE)
|
||||
holder: dict = {}
|
||||
result: list[dict] = []
|
||||
for b in evs:
|
||||
result.extend(event_to_ndjson(b, stream_id, holder))
|
||||
return result
|
||||
|
||||
assert parse_all(1) == parse_all(1)
|
||||
Reference in New Issue
Block a user