chore: auto-commit (286 files)

- .claude/agents/fn-orquestador/SKILL.md
- .claude/commands/fn_claude.md
- .claude/rules/INDEX.md
- .claude/rules/cpp_apps.md
- .claude/rules/ids_naming.md
- CHANGELOG.md
- apps/dag_engine/README.md
- apps/dag_engine/api.go
- apps/dag_engine/dags_migrated/example.yaml
- apps/dag_engine/dags_migrated/example_lineage_tracking.yaml
- ...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 16:33:22 +02:00
parent 0b9af8f1bb
commit a03675113a
281 changed files with 12596 additions and 19526 deletions
+32 -3
@@ -1,6 +1,6 @@
# Capability: nlp
_(Group description: edit by hand)_
NLP pipeline for entity and relation extraction over Spanish-language documents (mainly the OSINT project). It covers: PDF reading (`extract_pdf_text`, `clean_pdf_text`), OCR fallback, chunking (`chunk_with_overlap`), GLiNER (NER) + GLiREL (relation extraction) inference, dedup (`dedup_entities`, `dedup_relations`), aggregation (`aggregate_extraction_results`), and extraction of specific elements (URLs, crypto wallets, IPs, domains).
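As a rough illustration of the "specific elements" extraction mentioned above, the sketch below pulls URLs and IPv4 addresses out of free text using only the standard library. The helper names `find_urls` and `find_ipv4` and both regular expressions are assumptions made for this example; the registry's actual extractors are not shown in this document and may behave differently.

```python
import ipaddress
import re

# Hypothetical patterns for illustration; not the registry's real extractors.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
IPV4_CANDIDATE_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_urls(text: str) -> list[str]:
    """Return http(s) URLs in order of appearance, trailing punctuation stripped."""
    return [m.group(0).rstrip(".,;:") for m in URL_RE.finditer(text)]


def find_ipv4(text: str) -> list[str]:
    """Return dotted-quad candidates that parse as valid IPv4 addresses."""
    found = []
    for m in IPV4_CANDIDATE_RE.finditer(text):
        try:
            ipaddress.IPv4Address(m.group(0))
        except ValueError:
            continue  # e.g. 999.1.1.1 looks like an IP but is out of range
        found.append(m.group(0))
    return found
```

Validating candidates with `ipaddress` rather than a stricter regex keeps the pattern readable and leaves range checks to the standard library.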
## Functions
@@ -43,8 +43,37 @@ _(Group description: edit by hand)_
## Canonical example
_(Add 1-2 end-to-end code blocks)_
### Full pipeline PDF -> entities + relations
```python
import os
import sys

# Make the registry's Python functions importable.
sys.path.insert(0, os.path.join(os.environ["FN_REGISTRY_ROOT"], "python", "functions"))

from core import extract_pdf_text, clean_pdf_text, chunk_with_overlap
from datascience import (
    gliner_extract_entities, glirel_extract_relations,
    dedup_entities, dedup_relations, aggregate_extraction_results,
)

# Read and normalize the PDF text, then split it into overlapping chunks.
raw = extract_pdf_text("/path/to/doc.pdf")
text = clean_pdf_text(raw)
chunks = chunk_with_overlap(text, size=512, overlap=64)

entities = []
relations = []
for ch in chunks:
    # NER first, then relation extraction over the entities collected so far.
    entities.extend(gliner_extract_entities(ch, labels=["PERSON", "ORG", "ACCOUNT"]))
    relations.extend(glirel_extract_relations(ch, entity_pairs=entities))

# Deduplicate across chunks before aggregating.
result = aggregate_extraction_results(
    entities=dedup_entities(entities),
    relations=dedup_relations(relations),
)
```
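Overlapping chunks mean the same mention can be extracted twice, which is why the pipeline deduplicates before aggregating. The sketch below shows one way both steps could work. `chunk_chars` and `dedup_by_key` are hypothetical stand-ins, not the registry's `chunk_with_overlap` or `dedup_entities`: it is assumed here that sizes are measured in characters and that entities are dicts with `text` and `label` keys, neither of which the source confirms.

```python
def chunk_chars(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into windows of `size` characters; neighbours share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def dedup_by_key(items: list[dict], keys: tuple[str, ...]) -> list[dict]:
    """Keep the first item for each distinct combination of `keys`, preserving order."""
    seen = set()
    unique = []
    for item in items:
        marker = tuple(item[k] for k in keys)
        if marker not in seen:
            seen.add(marker)
            unique.append(item)
    return unique
```

With `size=4, overlap=1`, the string `"abcdefghij"` yields three windows in which each one starts with the last character of the previous one.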
## Boundaries
_(What this group does NOT cover)_
- **Does NOT train models**. Inference only, with pre-trained GLiNER/GLiREL models (HuggingFace).
- **Does NOT handle dense embeddings** (sentence-transformers / e5). For vectors, use functions from the `ml` group.
- **Does NOT do translation or LLM summarization**. NER + RE only. For LLM, see the `llm` tag.
- **Does NOT write to a DB** automatically. Persistence (vault, sqlite, parquet) is handled by the caller via functions from `infra`/`datascience`.
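
Since persistence is left to the caller, a minimal sketch of storing extracted entities in SQLite might look like the following. It uses only the standard-library `sqlite3` module instead of the registry's `infra`/`datascience` helpers, whose signatures are not shown here. The function name `save_entities`, the table layout, and the `text`/`label` entity keys are all assumptions for illustration.

```python
import sqlite3


def save_entities(conn: sqlite3.Connection, doc_id: str, entities: list[dict]) -> int:
    """Insert entities for one document; return the number of rows stored for it."""
    # Hypothetical schema: one row per (document, entity text, label).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entities ("
        "doc_id TEXT NOT NULL, text TEXT NOT NULL, label TEXT NOT NULL, "
        "UNIQUE (doc_id, text, label))"
    )
    with conn:  # commits on success, rolls back on exception
        conn.executemany(
            "INSERT OR IGNORE INTO entities (doc_id, text, label) VALUES (?, ?, ?)",
            [(doc_id, e["text"], e["label"]) for e in entities],
        )
    row = conn.execute(
        "SELECT COUNT(*) FROM entities WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return row[0]
```

The `UNIQUE` constraint combined with `INSERT OR IGNORE` makes repeated runs over the same document idempotent, so re-processing a PDF does not multiply rows.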