chore: add dev/ directory with issues and implemented functions

Tracking of completed issues (jupyter tools) and implemented functions (design specs already resolved).
2026-04-05 18:19:36 +02:00
parent 806c819cf7
commit a9cd28b010
64 changed files with 3680 additions and 0 deletions
@@ -0,0 +1,49 @@
# Functions for FuzzyGraph: automatic extraction of entities and relations
Original design for `apps/fuzzygraph`. Full pipeline:
```
documento → extract_text → preprocess → split_chunks
→ extract_entities_llm (per chunk) → deduplicate_entities
→ extract_relations_llm (per chunk + entities) → deduplicate_relations
→ insert operations.db
```
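The stage order in the diagram can be sketched as a small orchestrator. This is only an illustration under stated assumptions: `run_pipeline`, the `Entity` and `Relation` shapes, and the injected callables are hypothetical names, the LLM calls are passed in as plain functions, deduplication is reduced to exact lowercase-name matching, and the final insert into `operations.db` is left out.

```python
from typing import Callable, Dict, List, Tuple

Entity = Dict[str, str]          # hypothetical shape: {"name": ..., "type": ...}
Relation = Tuple[str, str, str]  # hypothetical shape: (source, relation, target)


def run_pipeline(
    text: str,
    split_chunks: Callable[[str], List[str]],
    extract_entities: Callable[[str], List[Entity]],
    extract_relations: Callable[[str, List[Entity]], List[Relation]],
) -> Tuple[List[Entity], List[Relation]]:
    """Run the stages in the order of the diagram (database insert omitted)."""
    cleaned = " ".join(text.split())  # stand-in for preprocess: collapse whitespace
    chunks = split_chunks(cleaned)

    # Entities: extract per chunk, then deduplicate by lowercase name.
    seen: Dict[str, Entity] = {}
    for chunk in chunks:
        for ent in extract_entities(chunk):
            seen.setdefault(ent["name"].lower(), ent)
    entities = list(seen.values())

    # Relations: extract per chunk with the deduplicated entities as context,
    # then drop exact duplicates while keeping first-seen order.
    relations: List[Relation] = []
    for chunk in chunks:
        for rel in extract_relations(chunk, entities):
            if rel not in relations:
                relations.append(rel)
    return entities, relations
```

Passing the extractors as arguments keeps the sketch testable without an LLM: any function with the same signature, including a deterministic fake, can stand in for the real call.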
## Existing registry dependencies
| Existing function | ID | Purpose |
|---|---|---|
| levenshtein_distance | `levenshtein_distance_py_cybersecurity` | Fuzzy matching of names |
| jaccard_similarity | `jaccard_similarity_py_cybersecurity` | Fuzzy matching of tokens |
| extract_urls | `extract_urls_py_cybersecurity` | Pre-extract URLs as Domain entities |
| normalize_url | `normalize_url_py_cybersecurity` | Normalize URLs before deduplication |
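To illustrate how the two similarity functions could support fuzzy matching of names, here is a minimal sketch. The functions below are local stand-ins written for this example, not the registry implementations, whose signatures are not shown here. The `names_match` helper and its thresholds are assumptions.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Dynamic-programming edit distance (stand-in for the registry function)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index over lowercase whitespace tokens (stand-in)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def names_match(a: str, b: str, max_edits: int = 2, min_jaccard: float = 0.5) -> bool:
    """Hypothetical rule: few character edits OR enough shared tokens."""
    return (
        levenshtein_distance(a.lower(), b.lower()) <= max_edits
        or jaccard_similarity(a, b) >= min_jaccard
    )
```

The two measures cover different cases: edit distance catches typos such as "Microsft", while token overlap catches reordered or extended names such as "Acme Corp" and "Acme Corp Inc". The thresholds would need tuning on real data.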
## Dependencies on pending specs (OpenViking/MiroFish)
| Spec | Purpose |
|---|---|
| mf_09 extract_text_from_file | Extract text from PDF/MD/TXT |
| mf_01 split_text_into_chunks | Chunks with overlap |
| mf_02 preprocess_text | Normalize whitespace |
| mf_04 parse_llm_json | Clean up the LLM's JSON |
| mf_06 retry_with_backoff | Retry LLM calls |
| mf_11 call_batch_with_retry | Process chunks in batch |
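Since these specs are still pending, the following is only a sketch of what three of them might look like, not their actual design. The signatures, defaults, character-based chunking, and the injectable `sleep` parameter are all assumptions made for illustration.

```python
import json
import re
import time
from typing import Any, Callable, List

FENCE = "`" * 3  # Markdown code fence that LLMs often wrap around JSON


def split_text_into_chunks(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    """Fixed-size character windows; consecutive chunks share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def parse_llm_json(raw: str) -> Any:
    """Strip a Markdown code fence, if present, then parse the JSON payload."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, raw, flags=re.DOTALL)
    payload = match.group(1) if match else raw
    return json.loads(payload.strip())


def retry_with_backoff(
    fn: Callable[[], Any],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `fn`; on failure wait base_delay * 2**attempt, re-raise after the last try."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Making `sleep` a parameter lets tests record the delays instead of actually waiting. A real implementation would probably catch only the exception types the LLM client raises for transient errors rather than `Exception`.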
## New functions (this directory)
| # | File | Domain | Function |
|---|---------|---------|---------|
| 01 | fg_extract_entities_llm.md | datascience | extract_entities_llm |
| 02 | fg_extract_relations_llm.md | datascience | extract_relations_llm |
| 03 | fg_deduplicate_entities.md | datascience | deduplicate_entities |
| 04 | fg_deduplicate_relations.md | datascience | deduplicate_relations |
| 05 | fg_build_entity_schema_prompt.md | datascience | build_entity_schema_prompt, build_relation_schema_prompt |
| 06 | fg_normalize_entity_name.md | core | normalize_entity_name |
| 07 | fg_merge_entity_attributes.md | core | merge_entity_attributes |
| 08 | fg_extraction_pipeline.md | pipelines | extraction_pipeline (full orchestrator) |
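As a rough idea of how the two `core` helpers could feed `deduplicate_entities`, here is a minimal sketch. The function bodies are assumptions, since the specs themselves live in the listed files: entities are modeled as plain dicts with `name` and `type` keys, and matching is exact on the normalized name rather than fuzzy.

```python
import re
import unicodedata
from typing import Any, Dict, List


def normalize_entity_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(no_punct.split())


def merge_entity_attributes(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Any]:
    """Keep values from `a`; fill gaps (missing, None, or empty) from `b`."""
    merged = dict(a)
    for key, value in b.items():
        if merged.get(key) in (None, "", []):
            merged[key] = value
    return merged


def deduplicate_entities(entities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group by (type, normalized name) and merge attributes in first-seen order."""
    by_key: Dict[tuple, Dict[str, Any]] = {}
    for ent in entities:
        key = (ent.get("type"), normalize_entity_name(ent["name"]))
        if key in by_key:
            by_key[key] = merge_entity_attributes(by_key[key], ent)
        else:
            by_key[key] = dict(ent)
    return list(by_key.values())
```

Including the type in the grouping key keeps a person and an organization with the same name apart. The first occurrence wins on conflicting values, which is one possible policy among several.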
## New types
| # | File | Domain | Types |
|---|---------|---------|-------|
| 09 | fg_type_extraction.md | datascience | EntityCandidate, RelationCandidate, ExtractionResult, DeduplicationResult |
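The four type names come from the table above. Their fields are not given here, so the dataclasses below are a guess at a plausible shape, with every field name hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class EntityCandidate:
    name: str
    type: str
    attributes: Dict[str, Any] = field(default_factory=dict)  # hypothetical field
    chunk_index: int = 0  # hypothetical: chunk the candidate was extracted from


@dataclass
class RelationCandidate:
    source: str  # hypothetical: name of the source entity
    target: str  # hypothetical: name of the target entity
    type: str
    chunk_index: int = 0


@dataclass
class ExtractionResult:
    entities: List[EntityCandidate] = field(default_factory=list)
    relations: List[RelationCandidate] = field(default_factory=list)


@dataclass
class DeduplicationResult:
    kept: List[EntityCandidate] = field(default_factory=list)
    merged_count: int = 0  # hypothetical: how many candidates were folded into others
```

Using `field(default_factory=...)` for the list and dict defaults gives each instance its own container, which avoids the shared-mutable-default pitfall.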