---
name: extract_file_hashes
kind: function
lang: py
domain: cybersecurity
version: "1.0.0"
purity: pure
signature: "def extract_file_hashes(text: str) -> list[dict]"
description: "Extracts MD5/SHA1/SHA256/SHA512 hashes from a text, with offsets and the algorithm inferred from length (32, 40, 64 or 128 hex characters). Useful for extracting IoCs from threat intelligence reports."
tags: [ioc, hash, md5, sha1, sha256, sha512, regex, extract, cybersecurity, python]
uses_functions: []
uses_types: []
returns: []
returns_optional: false
error_type: ""
imports: [re]
params:
  - name: text
    desc: "text string from which to extract hex hashes"
output: "list of dicts with {value, start, end, type='file_hash', algorithm} for each hash found"
tested: true
tests:
  - "MD5 (32 hex), SHA1 (40), SHA256 (64), SHA512 (128)"
  - "Intermediate lengths are ignored"
  - "Case-insensitive for hex digits"
test_file_path: "python/functions/cybersecurity/tests/test_extract_iocs.py"
file_path: "python/functions/cybersecurity/extract_file_hashes.py"
---

## Example

```python
extract_file_hashes("MD5: 5d41402abc4b2a76b9719d911017c592 SHA1: aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d")
# [{"value": "5d41402abc4b2a76b9719d911017c592", "start": 5, "end": 37,
#   "type": "file_hash", "algorithm": "md5"},
#  {"value": "aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d", "start": 44, "end": 84,
#   "type": "file_hash", "algorithm": "sha1"}]
```

## Notes

Only the canonical lengths are detected (32/40/64/128 hex characters). A 50-character hex sequence is ignored. The `\b` word boundary prevents matching substrings of a longer hex run.

ETH wallet addresses (`0x` + 40 hex = 42 characters in total) do NOT match this extractor: the pattern has no `0x` prefix, and because `x` is a word character there is no `\b` boundary between the prefix and the first hex digit. The `extract_iocs` pipeline deduplicates overlaps should any occur.
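
The behavior described in the notes above can be sketched as follows. This is a minimal illustration, not the actual contents of `extract_file_hashes.py`: the name `extract_file_hashes_sketch`, the pattern `_HASH_RE`, and the lookup table `_ALGO_BY_LEN` are hypothetical, and the sketch assumes the matched value is returned with its original casing.

```python
import re

# Hypothetical pattern: hex runs of exactly 128, 64, 40 or 32 characters.
# The \b anchors on both sides reject substrings of longer hex runs and
# anything glued to another word character (such as the "x" in "0x...").
_HASH_RE = re.compile(
    r"\b(?:[0-9a-fA-F]{128}|[0-9a-fA-F]{64}|[0-9a-fA-F]{40}|[0-9a-fA-F]{32})\b"
)

# Algorithm inferred purely from the length of the hex string.
_ALGO_BY_LEN = {32: "md5", 40: "sha1", 64: "sha256", 128: "sha512"}


def extract_file_hashes_sketch(text: str) -> list[dict]:
    """Return one dict per hex run of canonical hash length found in text."""
    results = []
    for match in _HASH_RE.finditer(text):
        value = match.group(0)
        results.append(
            {
                "value": value,  # assumption: casing is left as found
                "start": match.start(),
                "end": match.end(),
                "type": "file_hash",
                "algorithm": _ALGO_BY_LEN[len(value)],
            }
        )
    return results
```

Under these assumptions, a 50-character hex run and a `0x`-prefixed 40-character address both yield an empty list, while a bare 40-character hex run is reported as `sha1`. Length alone cannot distinguish a SHA1 digest from any other 160-bit hex value.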