---
name: extract_graph_gliner2
kind: function
lang: py
domain: datascience
version: "1.0.0"
purity: impure
signature: "def extract_graph_gliner2(text: str, entity_labels: list[str], relation_labels: list | dict, model: Any, threshold: float = 0.3, include_confidence: bool = False) -> dict"
description: "Extracts entities and relations in a single pass with GLiNER2. High-level wrapper: builds the schema, runs the extraction, and normalizes the result to a flat dict. Applies no post-filtering or coreference resolution."
tags: [gliner2, ner, relation-extraction, nlp, extraction, graph, zero-shot, datascience, python, apache2]
uses_functions:
  - gliner2_load_model_py_datascience
uses_types: []
returns: []
returns_optional: false
error_type: "error_go_core"
imports: [time, typing.Any]
params:
  - name: text
    desc: "Text to analyze. Up to 1500 characters recommended (pre-chunked with chunk_with_overlap). Longer texts degrade GLiNER2 recall."
  - name: entity_labels
    desc: "List of strings with the entity types in lowercase snake_case, e.g. ['person', 'organization', 'location']. snake_case labels improve recall according to notebook 08."
  - name: relation_labels
    desc: "List of strings or dict {label: description} with the relation types, e.g. ['works_at', 'ceo_of'] or {'works_at': 'person works at an organization'}."
  - name: model
    desc: "GLiNER2 instance loaded with gliner2_load_model. Injected by the caller (not loaded here)."
  - name: threshold
    desc: "Confidence threshold between 0 and 1. 0.3 was validated empirically in notebook 04 (gliner_glirel_tuning). Lower values give more recall and more noise."
  - name: include_confidence
    desc: "If True, GLiNER2 returns its internal scores per entity and relation. False by default for cleaner output."
output: "Dict with three fields: 'entities' -> {type: [name, ...]}, 'relation_extraction' -> {rel_type: [(head, tail), ...]}, 'elapsed_s' -> float. Compatible with aggregate_extraction_results."
tested: true
tests:
  - "output has the keys entities, relation_extraction, elapsed_s"
  - "stub model returns the correct shape"
test_file_path: "python/functions/datascience/tests/test_extract_graph_gliner2.py"
file_path: "python/functions/datascience/extract_graph_gliner2.py"
notes: |
  LICENSE: GLiNER2 (fastino/gliner2-large-v1) is Apache 2.0; commercial use is OK.
  impure: invokes model inference (computational side effect and variable run time).
  The model is injected externally to allow caching and reuse across calls.
  For long texts, run chunk_with_overlap first, call this function once per chunk,
  then combine the results with aggregate_extraction_results.
---

## Example

```python
from datascience.gliner2_load_model import gliner2_load_model
from datascience.extract_graph_gliner2 import extract_graph_gliner2

# Load the model once and reuse it across calls.
model = gliner2_load_model(device="auto")

result = extract_graph_gliner2(
    text="Carlos Torres es presidente de BBVA, con sede en Bilbao.",
    entity_labels=["person", "organization", "location"],
    relation_labels=["president_of", "headquartered_in"],
    model=model,
    threshold=0.3,
)

# result["entities"]            -> {"person": ["Carlos Torres"], ...}
# result["relation_extraction"] -> {"president_of": [("Carlos Torres", "BBVA")]}
# result["elapsed_s"]           -> e.g. 0.234 (varies per run)
```
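The notes recommend chunking long texts, extracting per chunk, and then aggregating. The sketch below illustrates that flow under stated assumptions: `simple_chunks` and `merge_results` are hypothetical stand-ins for `chunk_with_overlap` and `aggregate_extraction_results` (whose real signatures and merge rules are not shown here), and `extract_chunk` is any callable that returns the output shape documented above.

```python
from typing import Callable


def simple_chunks(text: str, size: int = 1500, overlap: int = 200) -> list[str]:
    """Hypothetical stand-in for chunk_with_overlap: fixed-size character windows."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def merge_results(results: list[dict]) -> dict:
    """Hypothetical stand-in for aggregate_extraction_results.

    Unions entities and relations across chunks, drops exact duplicates
    (overlapping windows tend to repeat them), and sums elapsed time.
    """
    entities: dict[str, list[str]] = {}
    relations: dict[str, list[tuple[str, str]]] = {}
    elapsed = 0.0
    for res in results:
        for etype, names in res["entities"].items():
            bucket = entities.setdefault(etype, [])
            for name in names:
                if name not in bucket:
                    bucket.append(name)
        for rtype, pairs in res["relation_extraction"].items():
            rel_bucket = relations.setdefault(rtype, [])
            for pair in pairs:
                key = tuple(pair)
                if key not in rel_bucket:
                    rel_bucket.append(key)
        elapsed += res["elapsed_s"]
    return {
        "entities": entities,
        "relation_extraction": relations,
        "elapsed_s": elapsed,
    }


def extract_long_text(
    text: str,
    extract_chunk: Callable[[str], dict],
    size: int = 1500,
    overlap: int = 200,
) -> dict:
    """Run extract_chunk on every window and merge the per-chunk dicts."""
    return merge_results([extract_chunk(c) for c in simple_chunks(text, size, overlap)])
```

With the real function, `extract_chunk` could be a closure such as `lambda chunk: extract_graph_gliner2(text=chunk, entity_labels=labels, relation_labels=rels, model=model)`, so the model is still loaded once and reused for every chunk.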