fn_registry/python/functions/datascience/detect_drift.md at 2f7fdd407be5873dabe2c0c68dc1acaee4030d9b

egutierrez a03675113a chore: auto-commit (286 archivos)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-16 16:33:22 +02:00

2.8 KiB


name, kind, lang, domain, version, purity, signature, description, tags, uses_functions, uses_types, returns, returns_optional, error_type, imports, params, output, tested, tests, test_file_path, file_path

detect_drift

function

datascience

1.0.0

pure

def detect_drift(history: list[dict], current: dict, fields: list[str], threshold: float = 2.0) -> list[dict]

Detects statistical drift by comparing the metrics of the current execution against the history using a z-score. If |z| > threshold, the field is flagged as drifted. Useful for monitoring executions in operations.db.

drift

statistics

z-score

monitoring

executions

operations

datascience

pendiente-usar

validator

false

math

name	desc
history	list of dicts with historical metrics (e.g. [{'records_out': 100, 'duration_ms': 500}, ...])
current	dict with the current metrics to compare (e.g. {'records_out': 50, 'duration_ms': 2000})
fields	list of field names to monitor in current vs history (e.g. ['records_out', 'duration_ms'])
threshold	z-score threshold for declaring drift (typical: 2.0-3.0). Higher = less sensitive.

list of dicts with {field, current, mean, std, z_score, drifted} for each monitored field

true

field with clear drift (|z| > threshold)

stable field (|z| < threshold)

history with a single point → std=0, cannot be computed → drifted=False with a note

empty history → drifted=False for all fields

custom threshold

python/functions/datascience/detect_drift_test.py

python/functions/datascience/detect_drift.py

Example

history = [
    {"records_out": 100, "duration_ms": 500},
    {"records_out": 105, "duration_ms": 480},
    {"records_out": 98,  "duration_ms": 510},
]
current = {"records_out": 50, "duration_ms": 2000}

results = detect_drift(history, current, ["records_out", "duration_ms"])
# [
#   {"field": "records_out", "current": 50, "mean": 101.0, "std": 2.9, "z_score": -17.3, "drifted": True},
#   {"field": "duration_ms", "current": 2000, "mean": 496.7, "std": 12.5, "z_score": 120.5, "drifted": True},
# ]

Notes

Pure function. Standard library only (math).

The z-score uses the population standard deviation (dividing by N, not N-1) so that it stays defined and consistent for a history of any size, including a single point.
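To make the N versus N-1 distinction concrete, here is a minimal sketch of the population standard deviation computed with `math` only. The helper name `population_std` is hypothetical and not part of the registry; `statistics.stdev` appears purely for contrast.

```python
import math
import statistics


def population_std(values: list[float]) -> float:
    """Population standard deviation: squared deviations divided by N."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


values = [100, 105, 98]             # records_out history from the example
pop = population_std(values)        # divides by N
sample = statistics.stdev(values)   # divides by N-1, shown only for contrast

print(round(pop, 3), round(sample, 3))  # → 2.944 3.606

# With a single point the population std is defined (0.0),
# whereas statistics.stdev raises StatisticsError.
print(population_std([100]))  # → 0.0
```

Dividing by N always yields the smaller of the two values, so z-scores computed this way are somewhat larger in magnitude than their sample-based counterparts.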

Special cases:

- Empty history: z_score=0.0, drifted=False for all fields.
- Single point in history: std=0.0, z_score=0.0, drifted=False. There is not enough history to estimate variability.
- std=0 with N>=2: all historical values are identical. z_score=0.0, drifted=False (any deviation would technically be infinite, but the system is assumed to be very stable).
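The special cases above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code in detect_drift.py: the name `detect_drift_sketch` is hypothetical, every field is assumed to be present in `current` and in each history row, and reporting `mean=0.0` for an empty history is an assumption the notes do not specify.

```python
import math


def detect_drift_sketch(history, current, fields, threshold=2.0):
    """Flag fields whose |z-score| against the history exceeds threshold."""
    results = []
    for field in fields:
        values = [row[field] for row in history]  # assumes field is in every row
        n = len(values)
        mean = sum(values) / n if n else 0.0      # assumption: 0.0 when empty
        # Population std (divide by N); 0.0 for an empty history.
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / n) if n else 0.0
        # Empty history, a single point, and identical values all give std == 0:
        # the z-score is then reported as 0.0 instead of dividing by zero.
        z_score = (current[field] - mean) / std if std > 0 else 0.0
        results.append({
            "field": field,
            "current": current[field],
            "mean": mean,
            "std": std,
            "z_score": z_score,
            "drifted": abs(z_score) > threshold,
        })
    return results


# The three special cases all yield drifted=False.
print(detect_drift_sketch([], {"records_out": 50}, ["records_out"]))
print(detect_drift_sketch([{"records_out": 100}], {"records_out": 50}, ["records_out"]))
print(detect_drift_sketch([{"records_out": 100}] * 3, {"records_out": 50}, ["records_out"]))
```

Treating std=0 as "no drift" trades sensitivity for safety: a perfectly constant history never raises an alert, even when the current value differs.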

Intended for the ANALIZAR (analyze) step of the reactive loop: comparing the metrics of the current execution against historical executions in operations.db.
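As a rough illustration of that step, the sketch below builds the `history` and `current` inputs from a SQLite table. The schema is entirely hypothetical: the `executions` table, its columns, and the in-memory database standing in for operations.db are assumptions made for this example, since the actual layout of operations.db is not described here.

```python
import sqlite3

# In-memory stand-in for operations.db; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be converted to dicts
conn.execute(
    "CREATE TABLE executions ("
    "id INTEGER PRIMARY KEY, records_out INTEGER, duration_ms INTEGER)"
)
conn.executemany(
    "INSERT INTO executions (records_out, duration_ms) VALUES (?, ?)",
    [(100, 500), (105, 480), (98, 510), (50, 2000)],
)

# Oldest first: the last row is the current execution, the rest is history.
rows = [
    dict(row)
    for row in conn.execute(
        "SELECT records_out, duration_ms FROM executions ORDER BY id"
    )
]
conn.close()

history, current = rows[:-1], rows[-1]
fields = ["records_out", "duration_ms"]
print(len(history), current)
```

The resulting `history` and `current` have the shapes that `detect_drift(history, current, fields)` expects.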
