{
"cells": [
{"cell_type": "markdown", "metadata": {}, "source": [
"# 01: Registry overview by domain\n",
"\n",
"**Goal**: see how many functions we have per domain, broken down by purity, test coverage and pipelines, and list the **most interesting** ones (by reusability and signature) in each domain.\n",
"\n",
"**Source**: `registry.db` (FTS5, indexed by `fn index`).\n",
"\n",
"**Sections**\n",
"1. Count per domain + share of pure and tested functions\n",
"2. Top functions per domain (hand-curated after reviewing names and descriptions)\n",
"3. Conclusions"
]},
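{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import os, sqlite3\n",
"import pandas as pd\n",
"# FN_REGISTRY_ROOT must point at the directory that contains registry.db\n",
"ROOT = os.environ['FN_REGISTRY_ROOT']\n",
"# mode=ro opens the database read-only, so the notebook cannot modify the registry\n",
"conn = sqlite3.connect(f'file:{ROOT}/registry.db?mode=ro', uri=True)\n",
"pd.set_option('display.max_colwidth', 110)"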
]},
{"cell_type": "markdown", "metadata": {}, "source": ["## 1. Count per domain"]},
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"q = '''\n",
"SELECT domain,\n",
" COUNT(*) AS total,\n",
" SUM(CASE WHEN purity='pure' THEN 1 ELSE 0 END) AS pure,\n",
" SUM(CASE WHEN tested=1 THEN 1 ELSE 0 END) AS tested,\n",
" SUM(CASE WHEN kind='pipeline' THEN 1 ELSE 0 END) AS pipelines\n",
" FROM functions\n",
" GROUP BY domain\n",
" ORDER BY total DESC;\n",
"'''\n",
"df = pd.read_sql_query(q, conn)\n",
"df['pure_pct'] = (100*df['pure']/df['total']).round(1)\n",
"df['tested_pct'] = (100*df['tested']/df['total']).round(1)\n",
"df"
]},
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import matplotlib.pyplot as plt\n",
"ax = df.set_index('domain')[['pure','total']].plot.bar(figsize=(11,4))\n",
"ax.set_title('Functions per domain (pure and total)')\n",
"plt.tight_layout()\n",
"plt.show()"
]},
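{"cell_type": "markdown", "metadata": {}, "source": [
"## 2. Top interesting functions per domain\n",
"\n",
"For finance, browser, infra and gamedev the lists are hand-picked as the most reusable and expressive functions of each block. This is not an automatic ranking, since FTS cannot capture 'interesting'. The remaining domains simply list the first rows by name (datascience restricted to pure functions, cybersecurity with tested functions first)."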
]},
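{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"def top(domain, ids):\n",
"    \"\"\"Return the registry rows for `ids`, printing a header with the match count.\"\"\"\n",
"    qmarks = ','.join('?'*len(ids))\n",
"    # IN (...) does not preserve the order of `ids`, and ids missing from the registry are silently dropped\n",
"    rows = pd.read_sql_query(\n",
"        f\"SELECT id, lang, purity, signature, description FROM functions WHERE id IN ({qmarks})\",\n",
"        conn, params=ids)\n",
"    print(f'=== {domain} ({len(rows)}) ===')\n",
"    return rows"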
]},
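{"cell_type": "markdown", "metadata": {}, "source": [
"Because `top` only reports how many rows matched, a renamed or deleted id in one of the hand-picked lists would go unnoticed. The cell below is a minimal sketch of a check for that, assuming the same `functions.id` column queried above. `missing_ids` is a hypothetical helper introduced here for illustration, not part of the registry tooling, and the second id in the usage line is a made-up placeholder."
]},
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"def missing_ids(conn, ids):\n",
"    \"\"\"Return the entries of `ids` that have no row in `functions` (hypothetical helper).\"\"\"\n",
"    ids = list(ids)\n",
"    if not ids:\n",
"        return []\n",
"    qmarks = ','.join('?' * len(ids))\n",
"    # Collect the ids that do exist, then keep the requested ones that are absent, in input order\n",
"    found = {r[0] for r in conn.execute(\n",
"        f\"SELECT id FROM functions WHERE id IN ({qmarks})\", ids)}\n",
"    return [i for i in ids if i not in found]\n",
"\n",
"# Usage: an empty list means every requested id is present in the registry\n",
"missing_ids(conn, ['sma_go_finance', 'placeholder_id_that_should_not_exist'])"
]},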
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"top('finance', [\n",
" 'fetch_ohlcv_go_finance','tick_to_ohlcv_go_finance','stream_ticks_go_finance',\n",
" 'sma_go_finance','ema_go_finance','rsi_go_finance','vwap_go_finance',\n",
" 'bollinger_bands_go_finance','sharpe_ratio_go_finance','max_drawdown_go_finance',\n",
" 'avellaneda_stoikov_quotes_py_finance','hawkes_intensity_py_finance','generate_gbm_prices_py_finance',\n",
" 'write_ohlcv_to_parquet_go_finance','load_ohlcv_from_duckdb_go_finance'])"
]},
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"top('browser (CDP)', [\n",
" 'chrome_launch_go_browser','cdp_connect_go_browser','cdp_navigate_go_browser',\n",
" 'cdp_evaluate_go_browser','cdp_get_html_go_browser','cdp_screenshot_go_browser',\n",
" 'cdp_click_go_browser','cdp_click_text_go_browser','cdp_find_by_text_go_browser',\n",
" 'cdp_type_text_go_browser','cdp_wait_element_go_browser','cdp_wait_load_go_browser',\n",
" 'cdp_har_record_go_browser','cdp_set_cookie_go_browser','cdp_new_tab_go_browser'])"
]},
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"top('infra (HTTP/WS/SSE)', [\n",
" 'http_get_json_go_infra','http_post_json_go_infra','http_router_go_infra','http_serve_go_infra',\n",
" 'http_download_file_go_infra','http_cors_middleware_go_infra','http_logger_middleware_go_infra',\n",
" 'rate_limit_middleware_go_infra','jwt_middleware_go_infra','sse_handler_go_infra',\n",
" 'sse_send_go_infra','sse_keepalive_go_infra','ws_handler_go_infra','ws_upgrader_go_infra',\n",
" 'health_check_http_go_infra'])"
]},
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"top('datascience', [r[0] for r in conn.execute(\n",
" \"SELECT id FROM functions WHERE domain='datascience' AND purity='pure' ORDER BY name LIMIT 15\")])"
]},
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"top('cybersecurity', [r[0] for r in conn.execute(\n",
" \"SELECT id FROM functions WHERE domain='cybersecurity' ORDER BY tested DESC, name LIMIT 12\")])"
]},
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"top('ml', [r[0] for r in conn.execute(\n",
" \"SELECT id FROM functions WHERE domain='ml' ORDER BY name LIMIT 15\")])"
]},
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"top('pipelines', [r[0] for r in conn.execute(\n",
" \"SELECT id FROM functions WHERE domain='pipelines' ORDER BY name LIMIT 15\")])"
]},
{"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"top('gamedev (audio)', ['audio_engine_cpp_gamedev','audio_play_cpp_gamedev'])"
]},
{"cell_type": "markdown", "metadata": {}, "source": [
"## 3. Conclusions\n",
"\n",
"- **infra (496)** and **core (240)** dominate: HTTP middleware, SSE/WS, SQLite, pure Go helpers.\n",
"- **finance (28)** already has a mini trading + market-making stack: indicators, OHLCV, and an Avellaneda-Stoikov + Hawkes + GBM simulator.\n",
"- **browser (17)** = full CDP in pure Go (no chromedp). A solid base for scraping and RPA.\n",
"- **ml (25)** is almost entirely **types** (`image_generator`, `model_ref`, `lora_ref`, `generation_config`): the contract is defined, but the **execution functions are empty**.\n",
"- **audio**: playback only (miniaudio in `gamedev`). 0 generation, 0 STT/TTS, 0 voice conversion.\n",
"- **LLM/text**: 0 clients, only the `message/part/tool_part` types in core."
]}
],
"metadata": {
"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"},
"language_info": {"name": "python", "version": "3.12"}
},
"nbformat": 4, "nbformat_minor": 5
}