feat: offline enrichers split_sentences + extract_iocs_text
Lets the app be tested without network access (DDG blocks requests with a captcha from certain IPs). Both enrichers apply grouping (threshold 50, preview K=10), replicating the web_search pattern.

- split_sentences: splits text into sentences (regex) and creates Sentence nodes connected via SENTENCE_OF.
- extract_iocs_text: a variant of extract_text_entities that reads metadata.text/description/name directly, without requiring a prior fetch. Vendors extract_iocs_py_cybersecurity. Multi-type: results are grouped into a single heterogeneous Group (decision 6, one Group per type, is deferred to phase 2).
- Sentence type added to types.yaml.

pytest tests cover the below-threshold and above-threshold cases for both enrichers.
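The grouping rule described above (threshold 50, preview K=10) and the regex-based sentence split can be sketched in Python roughly as follows. This is a minimal, hypothetical illustration, not the project's code: the `Node`/`Edge` classes, the splitting regex, the names `apply_grouping` and `enrich_split_sentences`, linking a Group to the source with SENTENCE_OF, and the use of `>` (rather than `>=`) for the threshold are all assumptions.

```python
import re
from dataclasses import dataclass, field

# Values from the commit message; grouping only when the count is strictly
# greater than the threshold is an assumption.
GROUP_THRESHOLD = 50
PREVIEW_K = 10

# Hypothetical splitter: break after ".", "!" or "?" followed by whitespace.
# The commit only says the split is regex-based; the real pattern may differ.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


@dataclass
class Node:
    type: str
    value: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: Node
    dst: Node
    relation: str


def split_sentences(text: str) -> list[str]:
    """Split text on sentence boundaries, dropping empty fragments."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]


def apply_grouping(nodes: list[Node], label: str) -> list[Node]:
    """Return nodes unchanged up to the threshold; above it, collapse them
    into one Group node holding every member plus a PREVIEW_K-item preview."""
    if len(nodes) <= GROUP_THRESHOLD:
        return nodes
    group = Node(
        type="Group",
        value=f"{len(nodes)} {label}",
        metadata={"members": nodes, "preview": nodes[:PREVIEW_K]},
    )
    return [group]


def enrich_split_sentences(source: Node) -> tuple[list[Node], list[Edge]]:
    """Create Sentence nodes from source.metadata['text'] and link each
    emitted node (Sentence or Group) back to the source with SENTENCE_OF."""
    text = source.metadata.get("text", "")
    sentences = [Node("Sentence", s) for s in split_sentences(text)]
    emitted = apply_grouping(sentences, "sentences")
    edges = [Edge(n, source, "SENTENCE_OF") for n in emitted]
    return emitted, edges
```

Keeping the grouping in a separate helper mirrors the commit's statement that both enrichers share the same pattern: extract_iocs_text could pass its mixed-type results through the same function to get a single heterogeneous Group.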
@@ -0,0 +1,11 @@
id: extract_iocs_text
name: "Extract IoCs from text"
description: "Extracts IoCs (IPs, emails, domains, hashes, crypto wallets, CVEs, MAC addresses, phone numbers) directly from the node's text. No prior fetch required. No network access."
applies_to: [text, Text]
emits: [Email, IPAddress, Domain, FileHash, CryptoWallet, CVE, MACAddress, Phone]
relations: [EXTRACTED_FROM]
uses_functions:
- extract_iocs_py_cybersecurity
params:
- { name: types, type: string, default: "", description: "Comma-separated list of types to extract; empty = all" }
- { name: max_entities, type: int, default: 500 }
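The two params declared above, together with the text fields named in the commit message (metadata.text/description/name), suggest handling along these lines. This is a hypothetical sketch: the vendored extract_iocs_py_cybersecurity is represented by an injected `extractor` callable because its real interface is not shown in this diff, `toy_extractor` is only an illustrative stand-in, and treating `max_entities` as a cap on the total across all types is an assumption.

```python
from __future__ import annotations

import re
from typing import Callable, Iterable, Iterator

# Text fields named in the commit message, read in this (assumed) order.
TEXT_FIELDS = ("text", "description", "name")

# Hypothetical interface for the vendored extractor: text -> (type, value) pairs.
Extractor = Callable[[str], Iterable[tuple[str, str]]]


def parse_types(csv: str) -> set[str] | None:
    """Parse the `types` param; an empty or blank CSV means all types (None)."""
    wanted = {t.strip() for t in csv.split(",") if t.strip()}
    return wanted or None


def collect_text(metadata: dict) -> str:
    """Join whichever known text fields are present and non-empty."""
    return "\n".join(str(metadata[k]) for k in TEXT_FIELDS if metadata.get(k))


def extract_iocs_text(metadata: dict, extractor: Extractor,
                      types: str = "", max_entities: int = 500) -> list[tuple[str, str]]:
    """Run the extractor on the node's own text (no fetch), keep only the
    requested types, and stop once max_entities results are collected."""
    wanted = parse_types(types)
    results: list[tuple[str, str]] = []
    for ioc_type, value in extractor(collect_text(metadata)):
        if len(results) >= max_entities:
            break
        if wanted is None or ioc_type in wanted:
            results.append((ioc_type, value))
    return results


def toy_extractor(text: str) -> Iterator[tuple[str, str]]:
    """Illustration only: finds IPv4-looking strings and simple emails."""
    for ip in re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", text):
        yield ("IPAddress", ip)
    for email in re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text):
        yield ("Email", email)
```

For example, `extract_iocs_text({"text": "Contact admin@example.com from 10.0.0.1"}, toy_extractor, types="Email")` keeps only the email. Injecting the extractor keeps the param handling testable offline, which matches the commit's goal of exercising the app without network access.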