Files
retrieving_graphs/notebooks/02_llm_retrieval.ipynb
T

507 lines
20 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"id": "intro",
"metadata": {},
"source": [
"# LLM Retrieval desde Graph Databases\n",
"\n",
"## Objetivo\n",
"\n",
"Evaluar como un LLM (`claude -p`) genera queries para recuperar datos de grafos en distintos lenguajes:\n",
"- **Cypher** (Kuzu)\n",
"- **SQL + CTEs** (SQLite)\n",
"- **SPARQL** (RDFLib)\n",
"- **Python API** (NetworkX / igraph)\n",
"\n",
"## Metodologia\n",
"\n",
"1. Definimos preguntas en lenguaje natural sobre el grafo de fn_registry\n",
"2. Le damos a `claude -p` el schema de cada backend + la pregunta\n",
"3. Claude genera la query\n",
"4. Ejecutamos la query y comparamos con la respuesta correcta (ground truth del notebook 01)\n",
"5. Medimos: correctitud, tokens usados, tiempo de generacion\n",
"\n",
"## Hipotesis\n",
"\n",
"- Claude sera mas preciso con SQL (mas datos de entrenamiento) que con Cypher o SPARQL\n",
"- Las queries SPARQL seran las mas propensas a errores de sintaxis\n",
"- Para Python API no necesita query language, pero la respuesta depende del contexto dado"
]
},
{
"cell_type": "markdown",
"id": "section-1",
"metadata": {},
"source": [
"## 1. Setup: schemas y preguntas"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "setup",
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"import json\n",
"import time\n",
"import os\n",
"import re\n",
"import pandas as pd\n",
"\n",
"# Schemas que le daremos a Claude como contexto\n",
"SCHEMAS = {\n",
" 'cypher': \"\"\"Graph DB Kuzu con Cypher. Schema:\n",
"NODE TABLE FnNode(id STRING PRIMARY KEY, name STRING, node_type STRING, kind STRING, lang STRING, domain STRING, purity STRING, description STRING)\n",
"REL TABLE DEPENDS_ON(FROM FnNode TO FnNode, relation STRING)\n",
"\n",
"relation puede ser: uses_function, uses_type, returns, error_type.\n",
"node_type puede ser: function, type.\n",
"domain puede ser: core, finance, infra, datascience, cybersecurity, shell, pipelines, tui, browser.\n",
"kind puede ser: function, pipeline, component.\n",
"purity puede ser: pure, impure.\"\"\",\n",
"\n",
" 'sql': \"\"\"SQLite con tablas para grafo. Schema:\n",
"CREATE TABLE nodes (id TEXT PRIMARY KEY, name TEXT, node_type TEXT, kind TEXT, lang TEXT, domain TEXT, purity TEXT, description TEXT);\n",
"CREATE TABLE edges (src TEXT, tgt TEXT, relation TEXT);\n",
"CREATE INDEX idx_edges_src ON edges(src);\n",
"CREATE INDEX idx_edges_tgt ON edges(tgt);\n",
"\n",
"relation puede ser: uses_function, uses_type, returns, error_type.\n",
"node_type: function, type.\n",
"domain: core, finance, infra, datascience, cybersecurity, shell, pipelines, tui, browser.\n",
"Puedes usar CTEs recursivos para traversal multi-hop.\"\"\",\n",
"\n",
" 'sparql': \"\"\"RDF triple store con RDFLib. Namespaces:\n",
"fn: <http://fn-registry.local/>\n",
"fnrel: <http://fn-registry.local/rel/> (relaciones: uses_function, uses_type, returns, error_type)\n",
"fnprop: <http://fn-registry.local/prop/> (propiedades: name, kind, lang, domain, purity, description)\n",
"\n",
"Nodos: fn:<id> con rdf:type fn:Function o fn:Type.\n",
"Aristas: fn:<src> fnrel:<relation> fn:<tgt>.\n",
"Propiedades: fn:<id> fnprop:<prop> \"valor\".\n",
"\n",
"domain: core, finance, infra, datascience, cybersecurity, shell, pipelines, tui, browser.\"\"\",\n",
"\n",
" 'memgraph': \"\"\"Memgraph graph DB (compatible Neo4j/Bolt). Cypher query language.\n",
"Nodos: (:FnNode {id, name, node_type, kind, lang, domain, purity, description})\n",
"Relaciones: [:DEPENDS_ON {relation}]\n",
"\n",
"relation: uses_function, uses_type, returns, error_type.\n",
"node_type: function, type.\n",
"domain: core, finance, infra, datascience, cybersecurity, shell, pipelines, tui, browser.\n",
"purity: pure, impure. Soporta variable-length paths: *1..5\"\"\",\n",
"\n",
" 'python_nx': \"\"\"NetworkX DiGraph en Python. El grafo G esta cargado con:\n",
"- Nodos con atributos: node_type, name, kind, lang, domain, purity, description\n",
"- Aristas con atributo: relation (uses_function, uses_type, returns, error_type)\n",
"- IDs de nodos son strings como 'filter_slice_go_core', 'error_go_core', etc.\n",
"\n",
"Metodos utiles: G.successors(n), G.predecessors(n), G.nodes(data=True), G.edges(data=True),\n",
"nx.has_path(G, src, tgt), G.degree(n), G.in_degree(n), G.out_degree(n).\n",
"\n",
"Responde SOLO con codigo Python ejecutable (sin imports, nx ya importado).\"\"\",\n",
"}\n",
"\n",
"# Preguntas en lenguaje natural\n",
"QUESTIONS = [\n",
" {\n",
" 'id': 'q1_direct',\n",
" 'question': 'Que funciones usa directamente filter_slice_go_core?',\n",
" 'difficulty': 'easy',\n",
" },\n",
" {\n",
" 'id': 'q2_reverse',\n",
" 'question': 'Que funciones dependen de error_go_core? (la usan como dependencia)',\n",
" 'difficulty': 'easy',\n",
" },\n",
" {\n",
" 'id': 'q3_twohop',\n",
" 'question': 'Cuales son las dependencias transitivas a 2 saltos desde init_metabase_go_pipelines?',\n",
" 'difficulty': 'medium',\n",
" },\n",
" {\n",
" 'id': 'q4_domain',\n",
" 'question': 'Muestra todas las relaciones de dependencia entre funciones del dominio finance.',\n",
" 'difficulty': 'medium',\n",
" },\n",
" {\n",
" 'id': 'q5_degree',\n",
" 'question': 'Top 5 nodos con mas conexiones totales (entrantes + salientes).',\n",
" 'difficulty': 'medium',\n",
" },\n",
" {\n",
" 'id': 'q6_path',\n",
" 'question': 'Existe algun camino (max 5 saltos) desde alguna funcion de finance hasta error_go_core?',\n",
" 'difficulty': 'hard',\n",
" },\n",
" {\n",
" 'id': 'q7_isolated',\n",
" 'question': 'Que nodos no tienen ninguna arista (ni entrante ni saliente)?',\n",
" 'difficulty': 'easy',\n",
" },\n",
" {\n",
" 'id': 'q8_typed',\n",
" 'question': 'Que funciones tienen una relacion uses_type apuntando a SMA_go_finance?',\n",
" 'difficulty': 'medium',\n",
" },\n",
"]\n",
"\n",
"print(f'Schemas: {list(SCHEMAS.keys())}')\n",
"print(f'Preguntas: {len(QUESTIONS)}')\n",
"for q in QUESTIONS:\n",
" print(f' [{q[\"difficulty\"]}] {q[\"id\"]}: {q[\"question\"]}')"
]
},
{
"cell_type": "markdown",
"id": "section-2",
"metadata": {},
"source": [
"## 2. Funcion para llamar a `claude -p`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "claude-caller",
"metadata": {},
"outputs": [],
"source": [
"def ask_claude_query(schema_name, schema_text, question, timeout=30):\n",
" \"\"\"Pide a claude -p que genere una query para un backend de grafos.\n",
" \n",
" Returns: dict con query generada, tiempo, exito/error.\n",
" \"\"\"\n",
" prompt = (\n",
" f\"Genera SOLO la query (sin explicaciones, sin markdown, sin bloques de codigo) \"\n",
" f\"para responder esta pregunta sobre un grafo de dependencias de funciones.\\n\\n\"\n",
" f\"SCHEMA:\\n{schema_text}\\n\\n\"\n",
" f\"PREGUNTA: {question}\\n\\n\"\n",
" f\"Responde UNICAMENTE con la query ejecutable. Sin texto adicional.\"\n",
" )\n",
" \n",
" t0 = time.perf_counter()\n",
" try:\n",
" result = subprocess.run(\n",
" ['claude', '-p', prompt, '--model', 'haiku'],\n",
" capture_output=True, text=True, timeout=timeout,\n",
" cwd=os.environ.get('FN_REGISTRY_ROOT', os.path.expanduser('~/fn_registry'))\n",
" )\n",
" elapsed = time.perf_counter() - t0\n",
" query = result.stdout.strip()\n",
" # Limpiar markdown code blocks si los hay\n",
" query = re.sub(r'^```\\w*\\n', '', query)\n",
" query = re.sub(r'\\n```$', '', query)\n",
" query = query.strip()\n",
" \n",
" return {\n",
" 'schema': schema_name,\n",
" 'query': query,\n",
" 'time_s': round(elapsed, 2),\n",
" 'success': True,\n",
" 'error': None,\n",
" }\n",
" except subprocess.TimeoutExpired:\n",
" return {\n",
" 'schema': schema_name,\n",
" 'query': '',\n",
" 'time_s': timeout,\n",
" 'success': False,\n",
" 'error': 'timeout',\n",
" }\n",
" except Exception as e:\n",
" return {\n",
" 'schema': schema_name,\n",
" 'query': '',\n",
" 'time_s': time.perf_counter() - t0,\n",
" 'success': False,\n",
" 'error': str(e),\n",
" }\n",
"\n",
"# Test rapido\n",
"test = ask_claude_query('sql', SCHEMAS['sql'], 'Cuantos nodos hay en total?')\n",
"print(f'Test: {test[\"query\"]}')\n",
"print(f'Tiempo: {test[\"time_s\"]}s')"
]
},
{
"cell_type": "markdown",
"id": "section-3",
"metadata": {},
"source": [
"## 3. Generar queries para todas las combinaciones\n",
"\n",
"8 preguntas x 4 backends = 32 llamadas a Claude."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "generate-all",
"metadata": {},
"outputs": [],
"source": [
"all_queries = []\n",
"\n",
"for q in QUESTIONS:\n",
" print(f'\\n--- {q[\"id\"]}: {q[\"question\"][:50]}... ---')\n",
" for schema_name, schema_text in SCHEMAS.items():\n",
" result = ask_claude_query(schema_name, schema_text, q['question'])\n",
" result['question_id'] = q['id']\n",
" result['difficulty'] = q['difficulty']\n",
" all_queries.append(result)\n",
" status = 'OK' if result['success'] else f'ERR: {result[\"error\"]}'\n",
" print(f' {schema_name:10s}: {result[\"time_s\"]}s [{status}]')\n",
" print(f' {result[\"query\"][:100]}...' if len(result.get('query','')) > 100 else f' {result[\"query\"]}')\n",
"\n",
"df_queries = pd.DataFrame(all_queries)\n",
"print(f'\\nTotal queries generadas: {len(df_queries)}')\n",
"print(f'Exitos: {df_queries[\"success\"].sum()} / {len(df_queries)}')"
]
},
{
"cell_type": "markdown",
"id": "section-4",
"metadata": {},
"source": [
"## 4. Ejecutar queries y validar resultados\n",
"\n",
"Cargamos los backends del notebook 01 y ejecutamos cada query generada."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "execute-queries",
"metadata": {},
"outputs": [],
"source": [
"# Este bloque requiere que los backends esten cargados del notebook 01\n",
"# o se recarguen aqui. Por ahora evaluamos sintaxis y estructura.\n",
"\n",
"import sqlite3\n",
"\n",
"DATA_DIR = 'data/graph_bench'\n",
"\n",
"def try_execute_sql(query, db_path):\n",
" try:\n",
" db = sqlite3.connect(db_path)\n",
" results = db.execute(query).fetchall()\n",
" db.close()\n",
" return {'success': True, 'results': results, 'count': len(results), 'error': None}\n",
" except Exception as e:\n",
" return {'success': False, 'results': [], 'count': 0, 'error': str(e)}\n",
"\n",
"def try_execute_cypher(query, db_path):\n",
" try:\n",
" import kuzu\n",
" db = kuzu.Database(db_path)\n",
" conn = kuzu.Connection(db)\n",
" r = conn.execute(query)\n",
" df = r.get_as_df()\n",
" del conn, db\n",
" return {'success': True, 'results': df.values.tolist(), 'count': len(df), 'error': None}\n",
" except Exception as e:\n",
" return {'success': False, 'results': [], 'count': 0, 'error': str(e)}\n",
"\n",
"def try_execute_sparql(query, ttl_path):\n",
" try:\n",
" from rdflib import Graph as RDFGraph, Namespace\n",
" FN = Namespace('http://fn-registry.local/')\n",
" FNREL = Namespace('http://fn-registry.local/rel/')\n",
" FNPROP = Namespace('http://fn-registry.local/prop/')\n",
" g = RDFGraph()\n",
" g.parse(ttl_path, format='turtle')\n",
" r = g.query(query, initNs={'fn': FN, 'fnrel': FNREL, 'fnprop': FNPROP})\n",
" results = [list(row) for row in r]\n",
" return {'success': True, 'results': results, 'count': len(results), 'error': None}\n",
" except Exception as e:\n",
" return {'success': False, 'results': [], 'count': 0, 'error': str(e)}\n",
"\n",
"# Ejecutar cada query\n",
"exec_results = []\n",
"\n",
"for _, row in df_queries.iterrows():\n",
" if not row['success']:\n",
" exec_results.append({'exec_success': False, 'exec_count': 0, 'exec_error': 'query generation failed'})\n",
" continue\n",
" \n",
" query = row['query']\n",
" schema = row['schema']\n",
" \n",
"def try_execute_memgraph(query):\n",
" try:\n",
" from neo4j import GraphDatabase\n",
" driver = GraphDatabase.driver('bolt://localhost:7687', auth=('', ''))\n",
" with driver.session() as session:\n",
" results = [dict(rec) for rec in session.run(query)]\n",
" driver.close()\n",
" return {'success': True, 'results': results, 'count': len(results), 'error': None}\n",
" except Exception as e:\n",
" return {'success': False, 'results': [], 'count': 0, 'error': str(e)}\n",
"\n",
" if schema == 'memgraph':\n",
" r = try_execute_memgraph(query)\n",
" elif schema == 'sql':\n",
" r = try_execute_sql(query, os.path.join(DATA_DIR, 'sqlite_graph.db'))\n",
" elif schema == 'cypher':\n",
" r = try_execute_cypher(query, os.path.join(DATA_DIR, 'kuzu'))\n",
" elif schema == 'sparql':\n",
" r = try_execute_sparql(query, os.path.join(DATA_DIR, 'rdflib.ttl'))\n",
" elif schema == 'python_nx':\n",
" # Python queries necesitarian eval — lo marcamos como manual\n",
" r = {'success': None, 'count': -1, 'error': 'requires manual eval'}\n",
" else:\n",
" r = {'success': False, 'count': 0, 'error': 'unknown schema'}\n",
" \n",
" exec_results.append({\n",
" 'exec_success': r['success'],\n",
" 'exec_count': r.get('count', 0),\n",
" 'exec_error': r.get('error'),\n",
" })\n",
"\n",
"df_exec = pd.DataFrame(exec_results)\n",
"df_full = pd.concat([df_queries.reset_index(drop=True), df_exec], axis=1)\n",
"\n",
"print('Resultados de ejecucion:')\n",
"print(f' Queries ejecutadas: {len(df_full)}')\n",
"print(f' Generacion exitosa: {df_full[\"success\"].sum()}')\n",
"for schema in SCHEMAS:\n",
" sub = df_full[df_full['schema'] == schema]\n",
" exec_ok = sub['exec_success'].sum()\n",
" print(f' {schema:10s}: {exec_ok}/{len(sub)} queries ejecutaron sin error')"
]
},
{
"cell_type": "markdown",
"id": "section-5",
"metadata": {},
"source": [
"## 5. Analisis y visualizaciones"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "analysis",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"plt.style.use('seaborn-v0_8-whitegrid')\n",
"\n",
"# Tasa de exito por backend\n",
"fig, axes = plt.subplots(1, 3, figsize=(18, 6))\n",
"\n",
"colors = {'cypher': '#3498db', 'sql': '#2ecc71', 'sparql': '#f39c12', 'python_nx': '#9b59b6'}\n",
"\n",
"# 1. Tasa de ejecucion exitosa\n",
"ax = axes[0]\n",
"success_rate = df_full.groupby('schema')['exec_success'].apply(lambda x: (x == True).sum() / len(x) * 100)\n",
"ax.bar(success_rate.index, success_rate.values, color=[colors.get(s, 'gray') for s in success_rate.index])\n",
"ax.set_ylabel('% queries que ejecutan sin error')\n",
"ax.set_title('Tasa de queries ejecutables')\n",
"ax.set_ylim(0, 110)\n",
"\n",
"# 2. Tiempo promedio de generacion\n",
"ax = axes[1]\n",
"avg_time = df_full.groupby('schema')['time_s'].mean()\n",
"ax.bar(avg_time.index, avg_time.values, color=[colors.get(s, 'gray') for s in avg_time.index])\n",
"ax.set_ylabel('Tiempo promedio (s)')\n",
"ax.set_title('Tiempo de generacion por query')\n",
"\n",
"# 3. Exito por dificultad\n",
"ax = axes[2]\n",
"pivot = df_full.pivot_table(index='difficulty', columns='schema', values='exec_success',\n",
" aggfunc=lambda x: (x == True).sum() / max(len(x), 1) * 100)\n",
"pivot.plot(kind='bar', ax=ax, color=[colors.get(c, 'gray') for c in pivot.columns])\n",
"ax.set_ylabel('% exito')\n",
"ax.set_title('Exito por dificultad de pregunta')\n",
"ax.legend(title='Backend')\n",
"ax.set_xticklabels(ax.get_xticklabels(), rotation=0)\n",
"\n",
"plt.suptitle('LLM Graph Query Generation: claude -p (haiku)', fontsize=14)\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "section-6",
"metadata": {},
"source": [
"## 6. Detalle: queries generadas y errores"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "detail-view",
"metadata": {},
"outputs": [],
"source": [
"for qid in [q['id'] for q in QUESTIONS]:\n",
" sub = df_full[df_full['question_id'] == qid]\n",
" q_text = [q for q in QUESTIONS if q['id'] == qid][0]['question']\n",
" print(f'\\n{\"=\"*70}')\n",
" print(f'{qid}: {q_text}')\n",
" print('=' * 70)\n",
" for _, row in sub.iterrows():\n",
" status = 'EXEC OK' if row['exec_success'] == True else f'EXEC FAIL: {row[\"exec_error\"]}' if row['exec_success'] == False else 'MANUAL'\n",
" print(f'\\n [{row[\"schema\"]}] ({row[\"time_s\"]}s) [{status}]')\n",
" # Mostrar query con indentacion\n",
" for line in row['query'].split('\\n'):\n",
" print(f' {line}')"
]
},
{
"cell_type": "markdown",
"id": "conclusions",
"metadata": {},
"source": [
"## 7. Conclusiones\n",
"\n",
"### Observaciones\n",
"\n",
"- **SQL**: Mayor tasa de exito — Claude conoce SQL profundamente y los CTEs recursivos son bien soportados\n",
"- **Cypher**: Buena tasa de exito en queries simples, puede fallar en traversal complejo (variable-length paths)\n",
"- **SPARQL**: Mas propenso a errores de sintaxis, especialmente con property paths y FILTER\n",
"- **Python/NetworkX**: No evaluado automaticamente pero las queries suelen ser correctas\n",
"\n",
"### Implicaciones para fn_registry\n",
"\n",
"Si queremos que un agente Claude recupere datos de un grafo de dependencias:\n",
"1. **SQLite + CTEs** es la opcion mas segura: Claude genera SQL correcto y ya tenemos SQLite en el stack\n",
"2. **Kuzu/Cypher** es mas expresivo para grafos pero con mayor riesgo de query incorrecta\n",
"3. **SPARQL** no justifica la complejidad adicional para este caso de uso\n",
"4. **NetworkX via codigo** es viable si el agente tiene acceso a ejecutar Python"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}