Tracking of completed issues (jupyter tools) and implemented functions (design specs already resolved).
Functions for FuzzyGraph: automatic extraction of entities and relations
Original design for apps/fuzzygraph. Full pipeline:
document → extract_text → preprocess → split_chunks
→ extract_entities_llm (por chunk) → deduplicate_entities
→ extract_relations_llm (por chunk + entities) → deduplicate_relations
→ insert operations.db
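The middle of the pipeline above (chunking through relation deduplication) can be sketched as a plain composition of steps. This is only an illustration: `run_extraction` and every callable signature below are assumptions, not the interfaces defined in the specs, and text extraction, preprocessing, and the final insert into operations.db are left out.

```python
from typing import Callable


def run_extraction(
    text: str,
    split_chunks: Callable[[str], list[str]],
    extract_entities: Callable[[str], list[dict]],
    dedupe_entities: Callable[[list[dict]], list[dict]],
    extract_relations: Callable[[str, list[dict]], list[dict]],
    dedupe_relations: Callable[[list[dict]], list[dict]],
) -> dict:
    """Hypothetical orchestrator: every step is injected as a callable."""
    chunks = split_chunks(text)

    # Entities are extracted per chunk, then deduplicated across all chunks.
    raw_entities = [e for chunk in chunks for e in extract_entities(chunk)]
    entities = dedupe_entities(raw_entities)

    # Relations are extracted per chunk, with the deduplicated entities as context.
    raw_relations = [
        r for chunk in chunks for r in extract_relations(chunk, entities)
    ]
    relations = dedupe_relations(raw_relations)

    return {"entities": entities, "relations": relations}
```

Injecting the steps as callables keeps the sketch testable without an LLM: each stage can be replaced by a stub.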
Existing registry dependencies
| Existing function | ID | Purpose |
|---|---|---|
| levenshtein_distance | levenshtein_distance_py_cybersecurity | Fuzzy matching of names |
| jaccard_similarity | jaccard_similarity_py_cybersecurity | Fuzzy matching of tokens |
| extract_urls | extract_urls_py_cybersecurity | Pre-extract URLs as Domain entities |
| normalize_url | normalize_url_py_cybersecurity | Normalize URLs before dedup |
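To illustrate how the two fuzzy-matching dependencies might work together during deduplication, here is a minimal standalone sketch. The functions below are local re-implementations written for illustration; they are not the registry functions, whose signatures are not shown here, and the thresholds in `looks_like_duplicate` are arbitrary assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def looks_like_duplicate(
    a: str, b: str, max_edit_ratio: float = 0.2, min_jaccard: float = 0.5
) -> bool:
    """Hypothetical rule: close spelling OR enough shared tokens."""
    la, lb = a.lower(), b.lower()
    longest = max(len(la), len(lb)) or 1
    close_spelling = levenshtein(la, lb) / longest <= max_edit_ratio
    shared_tokens = jaccard(la, lb) >= min_jaccard
    return close_spelling or shared_tokens
```

Character-level distance catches typos and punctuation variants ("Acme Corp" vs "Acme Corp."), while token overlap catches reordered or partially shared names.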
Dependencies on pending specs (OpenViking/MiroFish)
| Spec | Purpose |
|---|---|
| mf_09 extract_text_from_file | Extract text from PDF/MD/TXT |
| mf_01 split_text_into_chunks | Chunks with overlap |
| mf_02 preprocess_text | Normalize whitespace |
| mf_04 parse_llm_json | Clean up the LLM's JSON |
| mf_06 retry_with_backoff | Retry LLM calls |
| mf_11 call_batch_with_retry | Process chunks in batches |
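Two of these specs are easy to picture in code. The sketch below shows one possible reading of "chunks with overlap" and of cleaning up an LLM's JSON reply. The names `split_with_overlap` and `parse_json_reply`, their parameters, and the fallback strategy are assumptions made for illustration, not the contents of mf_01 or mf_04.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built indirectly to keep this block readable
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Fixed-size character chunks where each chunk repeats the tail of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def parse_json_reply(reply: str):
    """Parse JSON from a reply that may be fenced or surrounded by chatter."""
    match = _FENCED.search(reply)
    candidate = (match.group(1) if match else reply).strip()
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Fallback: keep only the span from the first opening bracket
        # to the last closing bracket and try again.
        starts = [i for i in (candidate.find("{"), candidate.find("[")) if i != -1]
        end = max(candidate.rfind("}"), candidate.rfind("]"))
        if not starts or end <= min(starts):
            raise
        return json.loads(candidate[min(starts):end + 1])
```

The overlap means an entity mention that straddles a chunk boundary still appears whole in at least one chunk, at the cost of extracting some entities twice, which is one reason the deduplication steps exist.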
New functions (this directory)
| # | File | Domain | Function |
|---|---|---|---|
| 01 | fg_extract_entities_llm.md | datascience | extract_entities_llm |
| 02 | fg_extract_relations_llm.md | datascience | extract_relations_llm |
| 03 | fg_deduplicate_entities.md | datascience | deduplicate_entities |
| 04 | fg_deduplicate_relations.md | datascience | deduplicate_relations |
| 05 | fg_build_entity_schema_prompt.md | datascience | build_entity_schema_prompt, build_relation_schema_prompt |
| 06 | fg_normalize_entity_name.md | core | normalize_entity_name |
| 07 | fg_merge_entity_attributes.md | core | merge_entity_attributes |
| 08 | fg_extraction_pipeline.md | pipelines | extraction_pipeline (full orchestrator) |
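As a rough idea of what the two core helpers could look like, here is a minimal sketch. The behavior shown (accent stripping, case folding, "first non-empty value wins" merging) is an assumption made for illustration; the actual rules live in fg_normalize_entity_name.md and fg_merge_entity_attributes.md, and the names `normalize_name` and `merge_attributes` are hypothetical.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Strip accents, collapse whitespace, and case-fold a name for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    collapsed = re.sub(r"\s+", " ", no_accents).strip()
    return collapsed.casefold()


def merge_attributes(a: dict, b: dict) -> dict:
    """Merge b into a copy of a without overwriting existing non-empty values."""
    merged = dict(a)
    for key, value in b.items():
        if key not in merged or merged[key] in (None, "", [], {}):
            # Missing or empty on the left side: take the right side's value.
            merged[key] = value
        elif isinstance(merged[key], list) and isinstance(value, list):
            # Lists are unioned, preserving order and dropping repeats.
            merged[key] = merged[key] + [v for v in value if v not in merged[key]]
    return merged
```

Normalized names serve only as comparison keys here; the original spelling would normally be kept for display.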
New types
| # | File | Domain | Types |
|---|---|---|---|
| 09 | fg_type_extraction.md | datascience | EntityCandidate, RelationCandidate, ExtractionResult, DeduplicationResult |
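The four types could plausibly be modeled as dataclasses. Only the type names come from the table above; every field below is an assumption chosen for illustration and may not match fg_type_extraction.md.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityCandidate:
    # All fields are hypothetical.
    name: str
    entity_type: str
    chunk_index: int
    confidence: float = 1.0


@dataclass(frozen=True)
class RelationCandidate:
    # All fields are hypothetical.
    source: str
    target: str
    relation_type: str
    chunk_index: int
    confidence: float = 1.0


@dataclass
class ExtractionResult:
    entities: list[EntityCandidate] = field(default_factory=list)
    relations: list[RelationCandidate] = field(default_factory=list)


@dataclass
class DeduplicationResult:
    kept: list[EntityCandidate] = field(default_factory=list)
    # Maps a dropped candidate's name to the name of the candidate it was merged into.
    merged_into: dict[str, str] = field(default_factory=dict)
```

Making the candidates frozen keeps them hashable, which is convenient when deduplication uses sets or dictionary keys.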