{ "cells": [ { "cell_type": "markdown", "id": "intro", "metadata": {}, "source": [ "# LLM Retrieval desde Graph Databases\n", "\n", "## Objetivo\n", "\n", "Evaluar como un LLM (`claude -p`) genera queries para recuperar datos de grafos en distintos lenguajes:\n", "- **Cypher** (Kuzu)\n", "- **SQL + CTEs** (SQLite)\n", "- **SPARQL** (RDFLib)\n", "- **Python API** (NetworkX / igraph)\n", "\n", "## Metodologia\n", "\n", "1. Definimos preguntas en lenguaje natural sobre el grafo de fn_registry\n", "2. Le damos a `claude -p` el schema de cada backend + la pregunta\n", "3. Claude genera la query\n", "4. Ejecutamos la query y comparamos con la respuesta correcta (ground truth del notebook 01)\n", "5. Medimos: correctitud, tokens usados, tiempo de generacion\n", "\n", "## Hipotesis\n", "\n", "- Claude sera mas preciso con SQL (mas datos de entrenamiento) que con Cypher o SPARQL\n", "- Las queries SPARQL seran las mas propensas a errores de sintaxis\n", "- Para Python API no necesita query language, pero la respuesta depende del contexto dado" ] }, { "cell_type": "markdown", "id": "section-1", "metadata": {}, "source": [ "## 1. Setup: schemas y preguntas" ] }, { "cell_type": "code", "execution_count": null, "id": "setup", "metadata": {}, "outputs": [], "source": [ "import subprocess\n", "import json\n", "import time\n", "import os\n", "import re\n", "import pandas as pd\n", "\n", "# Schemas que le daremos a Claude como contexto\n", "SCHEMAS = {\n", " 'cypher': \"\"\"Graph DB Kuzu con Cypher. Schema:\n", "NODE TABLE FnNode(id STRING PRIMARY KEY, name STRING, node_type STRING, kind STRING, lang STRING, domain STRING, purity STRING, description STRING)\n", "REL TABLE DEPENDS_ON(FROM FnNode TO FnNode, relation STRING)\n", "\n", "relation puede ser: uses_function, uses_type, returns, error_type.\n", "node_type puede ser: function, type.\n", "domain puede ser: core, finance, infra, datascience, cybersecurity, shell, pipelines, tui, browser.\n", "kind puede ser: function, pipeline, component.\n", "purity puede ser: pure, impure.\"\"\",\n", "\n", " 'sql': \"\"\"SQLite con tablas para grafo. Schema:\n", "CREATE TABLE nodes (id TEXT PRIMARY KEY, name TEXT, node_type TEXT, kind TEXT, lang TEXT, domain TEXT, purity TEXT, description TEXT);\n", "CREATE TABLE edges (src TEXT, tgt TEXT, relation TEXT);\n", "CREATE INDEX idx_edges_src ON edges(src);\n", "CREATE INDEX idx_edges_tgt ON edges(tgt);\n", "\n", "relation puede ser: uses_function, uses_type, returns, error_type.\n", "node_type: function, type.\n", "domain: core, finance, infra, datascience, cybersecurity, shell, pipelines, tui, browser.\n", "Puedes usar CTEs recursivos para traversal multi-hop.\"\"\",\n", "\n", " 'sparql': \"\"\"RDF triple store con RDFLib. Namespaces:\n", "fn: \n", "fnrel: (relaciones: uses_function, uses_type, returns, error_type)\n", "fnprop: (propiedades: name, kind, lang, domain, purity, description)\n", "\n", "Nodos: fn: con rdf:type fn:Function o fn:Type.\n", "Aristas: fn: fnrel: fn:.\n", "Propiedades: fn: fnprop: \"valor\".\n", "\n", "domain: core, finance, infra, datascience, cybersecurity, shell, pipelines, tui, browser.\"\"\",\n", "\n", " 'memgraph': \"\"\"Memgraph graph DB (compatible Neo4j/Bolt). Cypher query language.\n", "Nodos: (:FnNode {id, name, node_type, kind, lang, domain, purity, description})\n", "Relaciones: [:DEPENDS_ON {relation}]\n", "\n", "relation: uses_function, uses_type, returns, error_type.\n", "node_type: function, type.\n", "domain: core, finance, infra, datascience, cybersecurity, shell, pipelines, tui, browser.\n", "purity: pure, impure. Soporta variable-length paths: *1..5\"\"\",\n", "\n", " 'python_nx': \"\"\"NetworkX DiGraph en Python. El grafo G esta cargado con:\n", "- Nodos con atributos: node_type, name, kind, lang, domain, purity, description\n", "- Aristas con atributo: relation (uses_function, uses_type, returns, error_type)\n", "- IDs de nodos son strings como 'filter_slice_go_core', 'error_go_core', etc.\n", "\n", "Metodos utiles: G.successors(n), G.predecessors(n), G.nodes(data=True), G.edges(data=True),\n", "nx.has_path(G, src, tgt), G.degree(n), G.in_degree(n), G.out_degree(n).\n", "\n", "Responde SOLO con codigo Python ejecutable (sin imports, nx ya importado).\"\"\",\n", "}\n", "\n", "# Preguntas en lenguaje natural\n", "QUESTIONS = [\n", " {\n", " 'id': 'q1_direct',\n", " 'question': 'Que funciones usa directamente filter_slice_go_core?',\n", " 'difficulty': 'easy',\n", " },\n", " {\n", " 'id': 'q2_reverse',\n", " 'question': 'Que funciones dependen de error_go_core? (la usan como dependencia)',\n", " 'difficulty': 'easy',\n", " },\n", " {\n", " 'id': 'q3_twohop',\n", " 'question': 'Cuales son las dependencias transitivas a 2 saltos desde init_metabase_go_pipelines?',\n", " 'difficulty': 'medium',\n", " },\n", " {\n", " 'id': 'q4_domain',\n", " 'question': 'Muestra todas las relaciones de dependencia entre funciones del dominio finance.',\n", " 'difficulty': 'medium',\n", " },\n", " {\n", " 'id': 'q5_degree',\n", " 'question': 'Top 5 nodos con mas conexiones totales (entrantes + salientes).',\n", " 'difficulty': 'medium',\n", " },\n", " {\n", " 'id': 'q6_path',\n", " 'question': 'Existe algun camino (max 5 saltos) desde alguna funcion de finance hasta error_go_core?',\n", " 'difficulty': 'hard',\n", " },\n", " {\n", " 'id': 'q7_isolated',\n", " 'question': 'Que nodos no tienen ninguna arista (ni entrante ni saliente)?',\n", " 'difficulty': 'easy',\n", " },\n", " {\n", " 'id': 'q8_typed',\n", " 'question': 'Que funciones tienen una relacion uses_type apuntando a SMA_go_finance?',\n", " 'difficulty': 'medium',\n", " },\n", "]\n", "\n", "print(f'Schemas: {list(SCHEMAS.keys())}')\n", "print(f'Preguntas: {len(QUESTIONS)}')\n", "for q in QUESTIONS:\n", " print(f' [{q[\"difficulty\"]}] {q[\"id\"]}: {q[\"question\"]}')" ] }, { "cell_type": "markdown", "id": "section-2", "metadata": {}, "source": [ "## 2. Funcion para llamar a `claude -p`" ] }, { "cell_type": "code", "execution_count": null, "id": "claude-caller", "metadata": {}, "outputs": [], "source": [ "def ask_claude_query(schema_name, schema_text, question, timeout=30):\n", " \"\"\"Pide a claude -p que genere una query para un backend de grafos.\n", " \n", " Returns: dict con query generada, tiempo, exito/error.\n", " \"\"\"\n", " prompt = (\n", " f\"Genera SOLO la query (sin explicaciones, sin markdown, sin bloques de codigo) \"\n", " f\"para responder esta pregunta sobre un grafo de dependencias de funciones.\\n\\n\"\n", " f\"SCHEMA:\\n{schema_text}\\n\\n\"\n", " f\"PREGUNTA: {question}\\n\\n\"\n", " f\"Responde UNICAMENTE con la query ejecutable. Sin texto adicional.\"\n", " )\n", " \n", " t0 = time.perf_counter()\n", " try:\n", " result = subprocess.run(\n", " ['claude', '-p', prompt, '--model', 'haiku'],\n", " capture_output=True, text=True, timeout=timeout,\n", " cwd=os.environ.get('FN_REGISTRY_ROOT', os.path.expanduser('~/fn_registry'))\n", " )\n", " elapsed = time.perf_counter() - t0\n", " query = result.stdout.strip()\n", " # Limpiar markdown code blocks si los hay\n", " query = re.sub(r'^```\\w*\\n', '', query)\n", " query = re.sub(r'\\n```$', '', query)\n", " query = query.strip()\n", " \n", " return {\n", " 'schema': schema_name,\n", " 'query': query,\n", " 'time_s': round(elapsed, 2),\n", " 'success': True,\n", " 'error': None,\n", " }\n", " except subprocess.TimeoutExpired:\n", " return {\n", " 'schema': schema_name,\n", " 'query': '',\n", " 'time_s': timeout,\n", " 'success': False,\n", " 'error': 'timeout',\n", " }\n", " except Exception as e:\n", " return {\n", " 'schema': schema_name,\n", " 'query': '',\n", " 'time_s': time.perf_counter() - t0,\n", " 'success': False,\n", " 'error': str(e),\n", " }\n", "\n", "# Test rapido\n", "test = ask_claude_query('sql', SCHEMAS['sql'], 'Cuantos nodos hay en total?')\n", "print(f'Test: {test[\"query\"]}')\n", "print(f'Tiempo: {test[\"time_s\"]}s')" ] }, { "cell_type": "markdown", "id": "section-3", "metadata": {}, "source": [ "## 3. Generar queries para todas las combinaciones\n", "\n", "8 preguntas x 4 backends = 32 llamadas a Claude." ] }, { "cell_type": "code", "execution_count": null, "id": "generate-all", "metadata": {}, "outputs": [], "source": [ "all_queries = []\n", "\n", "for q in QUESTIONS:\n", " print(f'\\n--- {q[\"id\"]}: {q[\"question\"][:50]}... ---')\n", " for schema_name, schema_text in SCHEMAS.items():\n", " result = ask_claude_query(schema_name, schema_text, q['question'])\n", " result['question_id'] = q['id']\n", " result['difficulty'] = q['difficulty']\n", " all_queries.append(result)\n", " status = 'OK' if result['success'] else f'ERR: {result[\"error\"]}'\n", " print(f' {schema_name:10s}: {result[\"time_s\"]}s [{status}]')\n", " print(f' {result[\"query\"][:100]}...' if len(result.get('query','')) > 100 else f' {result[\"query\"]}')\n", "\n", "df_queries = pd.DataFrame(all_queries)\n", "print(f'\\nTotal queries generadas: {len(df_queries)}')\n", "print(f'Exitos: {df_queries[\"success\"].sum()} / {len(df_queries)}')" ] }, { "cell_type": "markdown", "id": "section-4", "metadata": {}, "source": [ "## 4. Ejecutar queries y validar resultados\n", "\n", "Cargamos los backends del notebook 01 y ejecutamos cada query generada." ] }, { "cell_type": "code", "execution_count": null, "id": "execute-queries", "metadata": {}, "outputs": [], "source": [ "# Este bloque requiere que los backends esten cargados del notebook 01\n", "# o se recarguen aqui. Por ahora evaluamos sintaxis y estructura.\n", "\n", "import sqlite3\n", "\n", "DATA_DIR = 'data/graph_bench'\n", "\n", "def try_execute_sql(query, db_path):\n", " try:\n", " db = sqlite3.connect(db_path)\n", " results = db.execute(query).fetchall()\n", " db.close()\n", " return {'success': True, 'results': results, 'count': len(results), 'error': None}\n", " except Exception as e:\n", " return {'success': False, 'results': [], 'count': 0, 'error': str(e)}\n", "\n", "def try_execute_cypher(query, db_path):\n", " try:\n", " import kuzu\n", " db = kuzu.Database(db_path)\n", " conn = kuzu.Connection(db)\n", " r = conn.execute(query)\n", " df = r.get_as_df()\n", " del conn, db\n", " return {'success': True, 'results': df.values.tolist(), 'count': len(df), 'error': None}\n", " except Exception as e:\n", " return {'success': False, 'results': [], 'count': 0, 'error': str(e)}\n", "\n", "def try_execute_sparql(query, ttl_path):\n", " try:\n", " from rdflib import Graph as RDFGraph, Namespace\n", " FN = Namespace('http://fn-registry.local/')\n", " FNREL = Namespace('http://fn-registry.local/rel/')\n", " FNPROP = Namespace('http://fn-registry.local/prop/')\n", " g = RDFGraph()\n", " g.parse(ttl_path, format='turtle')\n", " r = g.query(query, initNs={'fn': FN, 'fnrel': FNREL, 'fnprop': FNPROP})\n", " results = [list(row) for row in r]\n", " return {'success': True, 'results': results, 'count': len(results), 'error': None}\n", " except Exception as e:\n", " return {'success': False, 'results': [], 'count': 0, 'error': str(e)}\n", "\n", "# Ejecutar cada query\n", "exec_results = []\n", "\n", "for _, row in df_queries.iterrows():\n", " if not row['success']:\n", " exec_results.append({'exec_success': False, 'exec_count': 0, 'exec_error': 'query generation failed'})\n", " continue\n", " \n", " query = row['query']\n", " schema = row['schema']\n", " \n", "def try_execute_memgraph(query):\n", " try:\n", " from neo4j import GraphDatabase\n", " driver = GraphDatabase.driver('bolt://localhost:7687', auth=('', ''))\n", " with driver.session() as session:\n", " results = [dict(rec) for rec in session.run(query)]\n", " driver.close()\n", " return {'success': True, 'results': results, 'count': len(results), 'error': None}\n", " except Exception as e:\n", " return {'success': False, 'results': [], 'count': 0, 'error': str(e)}\n", "\n", " if schema == 'memgraph':\n", " r = try_execute_memgraph(query)\n", " elif schema == 'sql':\n", " r = try_execute_sql(query, os.path.join(DATA_DIR, 'sqlite_graph.db'))\n", " elif schema == 'cypher':\n", " r = try_execute_cypher(query, os.path.join(DATA_DIR, 'kuzu'))\n", " elif schema == 'sparql':\n", " r = try_execute_sparql(query, os.path.join(DATA_DIR, 'rdflib.ttl'))\n", " elif schema == 'python_nx':\n", " # Python queries necesitarian eval — lo marcamos como manual\n", " r = {'success': None, 'count': -1, 'error': 'requires manual eval'}\n", " else:\n", " r = {'success': False, 'count': 0, 'error': 'unknown schema'}\n", " \n", " exec_results.append({\n", " 'exec_success': r['success'],\n", " 'exec_count': r.get('count', 0),\n", " 'exec_error': r.get('error'),\n", " })\n", "\n", "df_exec = pd.DataFrame(exec_results)\n", "df_full = pd.concat([df_queries.reset_index(drop=True), df_exec], axis=1)\n", "\n", "print('Resultados de ejecucion:')\n", "print(f' Queries ejecutadas: {len(df_full)}')\n", "print(f' Generacion exitosa: {df_full[\"success\"].sum()}')\n", "for schema in SCHEMAS:\n", " sub = df_full[df_full['schema'] == schema]\n", " exec_ok = sub['exec_success'].sum()\n", " print(f' {schema:10s}: {exec_ok}/{len(sub)} queries ejecutaron sin error')" ] }, { "cell_type": "markdown", "id": "section-5", "metadata": {}, "source": [ "## 5. Analisis y visualizaciones" ] }, { "cell_type": "code", "execution_count": null, "id": "analysis", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "plt.style.use('seaborn-v0_8-whitegrid')\n", "\n", "# Tasa de exito por backend\n", "fig, axes = plt.subplots(1, 3, figsize=(18, 6))\n", "\n", "colors = {'cypher': '#3498db', 'sql': '#2ecc71', 'sparql': '#f39c12', 'python_nx': '#9b59b6'}\n", "\n", "# 1. Tasa de ejecucion exitosa\n", "ax = axes[0]\n", "success_rate = df_full.groupby('schema')['exec_success'].apply(lambda x: (x == True).sum() / len(x) * 100)\n", "ax.bar(success_rate.index, success_rate.values, color=[colors.get(s, 'gray') for s in success_rate.index])\n", "ax.set_ylabel('% queries que ejecutan sin error')\n", "ax.set_title('Tasa de queries ejecutables')\n", "ax.set_ylim(0, 110)\n", "\n", "# 2. Tiempo promedio de generacion\n", "ax = axes[1]\n", "avg_time = df_full.groupby('schema')['time_s'].mean()\n", "ax.bar(avg_time.index, avg_time.values, color=[colors.get(s, 'gray') for s in avg_time.index])\n", "ax.set_ylabel('Tiempo promedio (s)')\n", "ax.set_title('Tiempo de generacion por query')\n", "\n", "# 3. Exito por dificultad\n", "ax = axes[2]\n", "pivot = df_full.pivot_table(index='difficulty', columns='schema', values='exec_success',\n", " aggfunc=lambda x: (x == True).sum() / max(len(x), 1) * 100)\n", "pivot.plot(kind='bar', ax=ax, color=[colors.get(c, 'gray') for c in pivot.columns])\n", "ax.set_ylabel('% exito')\n", "ax.set_title('Exito por dificultad de pregunta')\n", "ax.legend(title='Backend')\n", "ax.set_xticklabels(ax.get_xticklabels(), rotation=0)\n", "\n", "plt.suptitle('LLM Graph Query Generation: claude -p (haiku)', fontsize=14)\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "section-6", "metadata": {}, "source": [ "## 6. Detalle: queries generadas y errores" ] }, { "cell_type": "code", "execution_count": null, "id": "detail-view", "metadata": {}, "outputs": [], "source": [ "for qid in [q['id'] for q in QUESTIONS]:\n", " sub = df_full[df_full['question_id'] == qid]\n", " q_text = [q for q in QUESTIONS if q['id'] == qid][0]['question']\n", " print(f'\\n{\"=\"*70}')\n", " print(f'{qid}: {q_text}')\n", " print('=' * 70)\n", " for _, row in sub.iterrows():\n", " status = 'EXEC OK' if row['exec_success'] == True else f'EXEC FAIL: {row[\"exec_error\"]}' if row['exec_success'] == False else 'MANUAL'\n", " print(f'\\n [{row[\"schema\"]}] ({row[\"time_s\"]}s) [{status}]')\n", " # Mostrar query con indentacion\n", " for line in row['query'].split('\\n'):\n", " print(f' {line}')" ] }, { "cell_type": "markdown", "id": "conclusions", "metadata": {}, "source": [ "## 7. Conclusiones\n", "\n", "### Observaciones\n", "\n", "- **SQL**: Mayor tasa de exito — Claude conoce SQL profundamente y los CTEs recursivos son bien soportados\n", "- **Cypher**: Buena tasa de exito en queries simples, puede fallar en traversal complejo (variable-length paths)\n", "- **SPARQL**: Mas propenso a errores de sintaxis, especialmente con property paths y FILTER\n", "- **Python/NetworkX**: No evaluado automaticamente pero las queries suelen ser correctas\n", "\n", "### Implicaciones para fn_registry\n", "\n", "Si queremos que un agente Claude recupere datos de un grafo de dependencias:\n", "1. **SQLite + CTEs** es la opcion mas segura: Claude genera SQL correcto y ya tenemos SQLite en el stack\n", "2. **Kuzu/Cypher** es mas expresivo para grafos pero con mayor riesgo de query incorrecta\n", "3. **SPARQL** no justifica la complejidad adicional para este caso de uso\n", "4. **NetworkX via codigo** es viable si el agente tiene acceso a ejecutar Python" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" } }, "nbformat": 4, "nbformat_minor": 5 }