feat: Python functions for datascience, finance, cybersecurity, and pipelines

Datascience: aggregate_by_group, deduplicate_entities/relations, detect_drift, diff_entities/relations, extract_entities/relations_llm, hotness_score, melt, merge_graphs, pivot, build_entity/relation_schema_prompt. Finance: avellaneda_stoikov_quotes, generate_gbm_prices, generate_taker_order, hawkes_intensity + finance.py module. Cybersecurity: envelope_encrypt/decrypt + cybersecurity.py module. Pipelines: extraction_pipeline, monte_carlo_market, run_market_sim. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 17:11:32 +02:00
parent 25a392df48
commit 63a9cb5273
62 changed files with 5376 additions and 0 deletions
@@ -0,0 +1,56 @@
+---
+name: detect_drift
+kind: function
+lang: py
+domain: datascience
+version: "1.0.0"
+purity: pure
+signature: "def detect_drift(history: list[dict], current: dict, fields: list[str], threshold: float = 2.0) -> list[dict]"
+description: "Detects statistical drift by comparing the current execution's metrics against the history using a z-score. A field has drifted if |z| > threshold. Useful for monitoring executions in operations.db."
+tags: [drift, statistics, z-score, monitoring, executions, operations, datascience]
+uses_functions: []
+uses_types: []
+returns: []
+returns_optional: false
+error_type: ""
+imports: [math]
+tested: true
+tests:
+  - "field with clear drift (z > threshold)"
+  - "stable field (z < threshold)"
+  - "single-point history → std=0, cannot be computed → drifted=False with a note"
+  - "empty history → drifted=False for every field"
+  - "custom threshold"
+test_file_path: "python/functions/datascience/detect_drift_test.py"
+file_path: "python/functions/datascience/detect_drift.py"
+---
+
+## Example
+
+```python
+history = [
+    {"records_out": 100, "duration_ms": 500},
+    {"records_out": 105, "duration_ms": 480},
+    {"records_out": 98,  "duration_ms": 510},
+]
+current = {"records_out": 50, "duration_ms": 2000}
+
+results = detect_drift(history, current, ["records_out", "duration_ms"])
+# [
+#   {"field": "records_out", "current": 50, "mean": 101.0, "std": 2.9, "z_score": -17.3, "drifted": True},
+#   {"field": "duration_ms", "current": 2000, "mean": 496.7, "std": 12.5, "z_score": 120.5, "drifted": True},
+# ]
+```
+
+## Notes
+
+Pure function. Uses only the standard library (`math`).
+
+The z-score uses the population standard deviation (dividing by N, not N-1) so that it behaves consistently for a history of any size; the sample formula is undefined for N=1.
+
+Special cases:
+- **Empty history**: z_score=0.0, drifted=False for every field.
+- **Single point in history**: std=0.0, z_score=0.0, drifted=False. There is not enough history to estimate variability.
+- **Std=0 with N>=2**: all historical values are identical. z_score=0.0, drifted=False (any deviation would technically give an infinite z-score, but the system is assumed to be very stable).
+
+Intended for the ANALIZAR (analyze) step of the reactive loop: comparing the current execution's `metrics` against historical executions from `operations.db`.
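
The notes above describe the computation closely enough to sketch it. What follows is a minimal reconstruction from those notes only, not the committed `detect_drift.py`. The name `detect_drift_sketch`, the use of `mean=0.0` for an empty history, and the assumption that every requested field is present in `current` and in each history row are illustrative choices.

```python
import math


def detect_drift_sketch(history, current, fields, threshold=2.0):
    """Hypothetical reconstruction of the z-score check described in the notes."""
    results = []
    for field in fields:
        # Assumption: every history row contains the field.
        values = [row[field] for row in history]
        n = len(values)
        # Assumption: report mean=0.0 when there is no history at all.
        mean = sum(values) / n if n else 0.0
        # Population standard deviation: divide by N, not N-1.
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / n) if n else 0.0
        # Empty history, a single point, or identical values leave std at 0
        # (exactly 0 for integer metrics); the notes map that to z_score=0.0.
        z_score = (current[field] - mean) / std if std > 0 else 0.0
        results.append({
            "field": field,
            "current": current[field],
            "mean": mean,
            "std": std,
            "z_score": z_score,
            "drifted": abs(z_score) > threshold,
        })
    return results
```

Folding all three special cases into the single `std > 0` guard keeps the sketch short. A real implementation might prefer to flag the zero-variance cases explicitly, as the "with a note" test description suggests.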