# Functions for FuzzyGraph: Automatic entity and relation extraction

Original design for `apps/fuzzygraph`.

Full pipeline:

```
document → extract_text → preprocess → split_chunks
  → extract_entities_llm (per chunk)
  → deduplicate_entities
  → extract_relations_llm (per chunk + entities)
  → deduplicate_relations
  → insert into operations.db
```

## Existing registry dependencies

| Existing function | ID | Purpose |
|---|---|---|
| levenshtein_distance | `levenshtein_distance_py_cybersecurity` | Fuzzy matching of names |
| jaccard_similarity | `jaccard_similarity_py_cybersecurity` | Fuzzy matching of tokens |
| extract_urls | `extract_urls_py_cybersecurity` | Pre-extract URLs as Domain entities |
| normalize_url | `normalize_url_py_cybersecurity` | Normalize URLs before dedup |

## Pending spec dependencies (OpenViking/MiroFish)

| Spec | Purpose |
|---|---|
| mf_09 extract_text_from_file | Extract text from PDF/MD/TXT |
| mf_01 split_text_into_chunks | Split text into overlapping chunks |
| mf_02 preprocess_text | Normalize whitespace |
| mf_04 parse_llm_json | Clean up the JSON returned by the LLM |
| mf_06 retry_with_backoff | Retry LLM calls |
| mf_11 call_batch_with_retry | Process chunks in batches |

## New functions (this directory)

| # | File | Domain | Function |
|---|---------|---------|---------|
| 01 | fg_extract_entities_llm.md | datascience | extract_entities_llm |
| 02 | fg_extract_relations_llm.md | datascience | extract_relations_llm |
| 03 | fg_deduplicate_entities.md | datascience | deduplicate_entities |
| 04 | fg_deduplicate_relations.md | datascience | deduplicate_relations |
| 05 | fg_build_entity_schema_prompt.md | datascience | build_entity_schema_prompt, build_relation_schema_prompt |
| 06 | fg_normalize_entity_name.md | core | normalize_entity_name |
| 07 | fg_merge_entity_attributes.md | core | merge_entity_attributes |
| 08 | fg_extraction_pipeline.md | pipelines | extraction_pipeline (full orchestrator) |

## New types

| # | File | Domain | Types |
|---|---------|---------|-------|
| 09 | fg_type_extraction.md | datascience | EntityCandidate, RelationCandidate, ExtractionResult, DeduplicationResult |
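
To make the deduplication stage of the pipeline more concrete, here is a minimal sketch of how `normalize_entity_name`, `levenshtein_distance`, and `deduplicate_entities` could fit together. Everything in it is an assumption for illustration: the fields of `EntityCandidate`, the normalization rules, the `max_distance` threshold, and the "keep the first value" merge policy are hypothetical. The actual contracts are defined in the spec files listed above, and the real `levenshtein_distance` comes from the registry rather than being reimplemented.

```python
from dataclasses import dataclass, field
import unicodedata


@dataclass
class EntityCandidate:
    # Hypothetical shape; the real type is specified in fg_type_extraction.md.
    name: str
    type: str
    attributes: dict = field(default_factory=dict)


def normalize_entity_name(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace (assumed rules)."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def levenshtein_distance(a: str, b: str) -> int:
    """Dynamic-programming edit distance; a stand-in for the registry function."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def deduplicate_entities(candidates, max_distance=1):
    """Merge candidates of the same type whose normalized names are close.

    The first occurrence is kept as the canonical entity; attributes from
    later duplicates are added only when the key is not already present.
    """
    merged = []
    for cand in candidates:
        key = normalize_entity_name(cand.name)
        for kept in merged:
            same_type = kept.type == cand.type
            close = levenshtein_distance(normalize_entity_name(kept.name), key) <= max_distance
            if same_type and close:
                for attr, value in cand.attributes.items():
                    kept.attributes.setdefault(attr, value)
                break
        else:
            # No match found: copy the candidate so the input is not mutated.
            merged.append(EntityCandidate(cand.name, cand.type, dict(cand.attributes)))
    return merged
```

This pairwise comparison is quadratic in the number of candidates, which is acceptable for a sketch but would likely need blocking or indexing on larger documents. A token-based measure such as `jaccard_similarity` could be combined with the edit distance for multi-word names.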