---
name: detect_drift
kind: function
lang: py
domain: datascience
version: "1.0.0"
purity: pure
signature: "def detect_drift(history: list[dict], current: dict, fields: list[str], threshold: float = 2.0) -> list[dict]"
description: "Detects statistical drift by comparing the metrics of the current execution against the history using z-scores. If |z| > threshold, the field has drifted. Useful for monitoring executions in operations.db."
tags: [drift, statistics, z-score, monitoring, executions, operations, datascience, pendiente-usar]
uses_functions: []
uses_types: []
returns: []
returns_optional: false
error_type: ""
imports: [math]
params:
  - name: history
    desc: "list of dicts with historical metrics (e.g. [{'records_out': 100, 'duration_ms': 500}, ...])"
  - name: current
    desc: "dict with the current metrics to compare (e.g. {'records_out': 50, 'duration_ms': 2000})"
  - name: fields
    desc: "list of field names to monitor in current vs. history (e.g. ['records_out', 'duration_ms'])"
  - name: threshold
    desc: "z-score threshold for declaring drift (typically 2.0-3.0). Higher = less sensitive."
output: "list of dicts with {field, current, mean, std, z_score, drifted} for each monitored field"
tested: true
tests:
- "field with clear drift (|z| > threshold)"
- "stable field (|z| < threshold)"
- "history with a single point → std=0, z cannot be computed → drifted=False with a note"
- "empty history → all drifted=False"
- "custom threshold"
test_file_path: "python/functions/datascience/detect_drift_test.py"
file_path: "python/functions/datascience/detect_drift.py"
---
## Example
```python
history = [
    {"records_out": 100, "duration_ms": 500},
    {"records_out": 105, "duration_ms": 480},
    {"records_out": 98, "duration_ms": 510},
]
current = {"records_out": 50, "duration_ms": 2000}
results = detect_drift(history, current, ["records_out", "duration_ms"])
# Values rounded for readability (population std, divide by N):
# [
#     {"field": "records_out", "current": 50, "mean": 101.0, "std": 2.94, "z_score": -17.32, "drifted": True},
#     {"field": "duration_ms", "current": 2000, "mean": 496.67, "std": 12.47, "z_score": 120.53, "drifted": True},
# ]
```
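## Notes
Pure function. Standard library only (`math`).
The z-score, `(current - mean) / std`, uses the population standard deviation (divide by N, not N-1). This keeps it defined for a history of any size, including a single point, where the sample formula would divide by zero.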
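
The standard library's `statistics` module makes the difference between the two denominators easy to see. It is used here only for illustration; the notes state that the function itself relies on `math`.

The sketch compares population and sample standard deviations in two cases:

- the `records_out` history from the example;
- a single-point history, where `statistics.stdev` raises `StatisticsError`.

The helper name `compare_std` is hypothetical.

```python
import statistics
from typing import Optional


def compare_std(values: list[float]) -> tuple[float, Optional[float]]:
    """Return (population std, sample std) for a non-empty list of values.

    Hypothetical helper: the sample std is None when it is undefined
    (fewer than two data points).
    """
    pop_std = statistics.pstdev(values)  # divides by N
    try:
        sample_std: Optional[float] = statistics.stdev(values)  # divides by N - 1
    except statistics.StatisticsError:
        sample_std = None  # stdev needs at least two points
    return pop_std, sample_std


if __name__ == "__main__":
    print(compare_std([100, 105, 98]))  # population ≈ 2.94, sample ≈ 3.61
    print(compare_std([100]))           # (0.0, None)
```

With only a handful of historical runs, the choice of denominator noticeably changes the z-score. It is worth keeping in mind when picking `threshold`.
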
Special cases:
- **Empty history**: z_score=0.0 and drifted=False for every field.
- **Single point in history**: std=0.0, z_score=0.0, drifted=False. There is not enough history to estimate variability.
- **Std=0 with N>=2**: all historical values are identical. z_score=0.0, drifted=False (any nonzero deviation would technically give an infinite z-score, but the system is assumed to be very stable).
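
The contract above can be sketched in plain Python: population std, `|z| > threshold`, and the three special cases. This is a minimal sketch written from this description, not the registered implementation in `python/functions/datascience/detect_drift.py`. It rests on three assumptions:

- history rows missing a field are skipped;
- `current` contains every monitored field;
- `mean` and `std` are reported as 0.0 for an empty history.

The real function may differ in these details; the tests, for instance, mention a note for the single-point case.

```python
import math


def detect_drift(history: list[dict], current: dict, fields: list[str],
                 threshold: float = 2.0) -> list[dict]:
    """Sketch of z-score drift detection following the special cases above."""
    results = []
    for field in fields:
        # Assumption: history rows that lack the field are ignored.
        values = [row[field] for row in history if field in row]
        value = current[field]  # Assumption: every monitored field is present.
        n = len(values)
        if n == 0:
            # Empty history: nothing to compare against (0.0 placeholders are an assumption).
            mean, std = 0.0, 0.0
        else:
            mean = sum(values) / n
            if all(v == values[0] for v in values):
                # One point or identical values: variability is zero by definition,
                # which also avoids tiny float residues producing huge z-scores.
                std = 0.0
            else:
                # Population standard deviation: divide by N, not N - 1.
                std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        # With std == 0 the z-score is reported as 0.0 and no drift is declared.
        z_score = (value - mean) / std if std > 0 else 0.0
        results.append({
            "field": field,
            "current": value,
            "mean": mean,
            "std": std,
            "z_score": z_score,
            "drifted": abs(z_score) > threshold,
        })
    return results
```
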
Intended for the ANALIZAR (analyze) step of the reactive loop: compare the `metrics` of the current execution against the historical executions stored in `operations.db`.
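
One way to build `history` for that step is to read past execution metrics from SQLite with the standard `sqlite3` and `json` modules. The real layout of `operations.db` is not documented here, so everything below is illustrative:

- the schema is hypothetical: `executions(id INTEGER PRIMARY KEY, metrics TEXT)`, with `metrics` stored as a JSON object;
- the table and column names are assumptions;
- the `load_metric_history` helper is hypothetical.

```python
import json
import sqlite3


def load_metric_history(conn: sqlite3.Connection, limit: int = 50) -> list[dict]:
    """Return the metrics of the `limit` most recent executions, oldest first.

    Hypothetical schema: executions(id INTEGER PRIMARY KEY, metrics TEXT),
    where `metrics` holds a JSON object such as {"records_out": 100}.
    """
    rows = conn.execute(
        "SELECT metrics FROM executions ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    # Rows come back newest first; reverse them so the history reads oldest first.
    return [json.loads(metrics) for (metrics,) in reversed(rows)]


if __name__ == "__main__":
    # In-memory stand-in for operations.db, using the hypothetical schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE executions (id INTEGER PRIMARY KEY, metrics TEXT)")
    for metrics in ({"records_out": 100}, {"records_out": 105}, {"records_out": 98}):
        conn.execute("INSERT INTO executions (metrics) VALUES (?)", (json.dumps(metrics),))
    print(load_metric_history(conn))
    # [{'records_out': 100}, {'records_out': 105}, {'records_out': 98}]
```

If the current execution is already stored in the table, exclude it from the history before calling `detect_drift`. Otherwise it shifts the mean and std toward itself and lowers its own z-score.
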