estudio_mercados/notebooks/binance/01_descarga_historica.ipynb

{
"cells": [
{
"cell_type": "markdown",
"id": "8b2334dd",
"metadata": {},
"source": [
"# 01: Bulk Historical Data Download (Binance)\n",
"\n",
"Two ways to obtain historical data:\n",
"\n",
"1. **REST API** (`/api/v3/klines`): paginated, max 1000 candles per request. Well suited to short ranges such as 7 days.\n",
"2. **Data Vision** (`data.binance.vision`): compressed daily/monthly CSVs. Well suited to months or years of data.\n",
"\n",
"**REST rate limits:** 6000 weight/min; a klines request costs 2 weight.\n",
"\n",
"| Endpoint | Max per request | Weight | Use |\n",
"|---|---|---|---|\n",
"| `/api/v3/klines` | 1000 candles | 2 | OHLCV candlesticks |\n",
"| `/api/v3/aggTrades` | 1000 trades | 2 | Aggregate trades (max 1h window) |\n",
"| `/api/v3/historicalTrades` | 1000 trades | 25 | Individual trades (requires API key) |\n",
"| `data.binance.vision` | No limit | 0 | Bulk daily/monthly CSVs |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3fe45a0a",
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import time\n",
"import datetime\n",
"import io\n",
"import zipfile\n",
"import pandas as pd\n",
"\n",
"BASE = \"https://api.binance.com\"\n",
"DATA_VISION = \"https://data.binance.vision\""
]
},
{
"cell_type": "markdown",
"id": "721f68a0",
"metadata": {},
"source": [
"## Method 1: REST API, klines with automatic pagination\n",
"\n",
"`GET /api/v3/klines` returns at most 1000 candles per request, so we paginate with `startTime`/`endTime`.\n",
"\n",
"For 7 days of 1m candles: ceil(7*24*60/1000) = **11 requests** (22 weight total, trivial)."
]
},
{
"cell_type": "code",
"id": "24c183b7",
"source": "def parse_kline(k: list) -> dict:\n    \"\"\"Parse a raw Binance kline into a dict with proper types.\"\"\"\n    return {\n        \"open_time\": pd.Timestamp(k[0], unit=\"ms\", tz=\"UTC\"),\n        \"open\": float(k[1]),\n        \"high\": float(k[2]),\n        \"low\": float(k[3]),\n        \"close\": float(k[4]),\n        \"volume\": float(k[5]),\n        \"close_time\": pd.Timestamp(k[6], unit=\"ms\", tz=\"UTC\"),\n        \"quote_volume\": float(k[7]),\n        \"trades\": int(k[8]),\n        \"taker_buy_base_vol\": float(k[9]),\n        \"taker_buy_quote_vol\": float(k[10]),\n    }\n\n\ndef fetch_klines(symbol: str, interval: str, start_ms: int, end_ms: int, limit: int = 1000) -> list[dict]:\n    \"\"\"Download klines between start_ms and end_ms with automatic pagination.\"\"\"\n    all_klines = []\n    current = start_ms\n\n    while current < end_ms:\n        resp = requests.get(f\"{BASE}/api/v3/klines\", params={\n            \"symbol\": symbol, \"interval\": interval,\n            \"startTime\": current, \"endTime\": end_ms, \"limit\": limit,\n        }, timeout=30)\n        resp.raise_for_status()\n        data = resp.json()\n        if not data:\n            break\n\n        all_klines.extend(parse_kline(k) for k in data)\n        current = data[-1][6] + 1  # resume 1 ms after the last close_time\n\n        if len(data) < limit:\n            break  # short page: no more candles in the range\n        time.sleep(0.1)  # brief pause between pages\n\n    return all_klines",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "e3280dc5",
"source": "### Example: 7 days of 1m candles for BTCUSDT",
"metadata": {}
},
{
"cell_type": "code",
"id": "b0b389b8",
"source": "now_ms = int(datetime.datetime.now(datetime.timezone.utc).timestamp() * 1000)\nseven_days_ago = now_ms - 7 * 24 * 60 * 60 * 1000\n\nklines = fetch_klines(\"BTCUSDT\", \"1m\", seven_days_ago, now_ms)\ndf_klines = pd.DataFrame(klines)\nprint(f\"Downloaded {len(df_klines)} 1m candles ({len(df_klines)/1440:.1f} days)\")\ndf_klines.head()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "e52d66d1",
"source": "## Method 2: Data Vision, bulk CSVs (months/years of data)\n\n`data.binance.vision` publishes compressed daily and monthly CSVs. No API key, no rate limits.\n\nWell suited to backtesting over long histories. Available for spot and futures.",
"metadata": {}
},
{
"cell_type": "code",
"id": "a42f986c",
"source": "def download_data_vision(symbol: str, data_type: str, date_str: str,\n                         interval: str = \"1m\", market: str = \"spot\") -> str | None:\n    \"\"\"\n    Download a daily or monthly CSV from data.binance.vision.\n\n    date_str: \"2024-01-15\" (daily) or \"2024-01\" (monthly)\n    data_type: \"klines\", \"trades\", \"aggTrades\"\n    Returns the CSV text, or None if the file does not exist (HTTP 404).\n    \"\"\"\n    granularity = \"daily\" if len(date_str) > 7 else \"monthly\"\n\n    if data_type == \"klines\":\n        path = f\"data/{market}/{granularity}/{data_type}/{symbol}/{interval}/{symbol}-{interval}-{date_str}.zip\"\n    else:\n        path = f\"data/{market}/{granularity}/{data_type}/{symbol}/{symbol}-{data_type}-{date_str}.zip\"\n\n    resp = requests.get(f\"{DATA_VISION}/{path}\", timeout=60)\n    if resp.status_code == 404:\n        return None\n    resp.raise_for_status()\n\n    with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:\n        return zf.read(zf.namelist()[0]).decode(\"utf-8\")\n\n\ndef download_klines_bulk(symbol: str, days: int = 7, interval: str = \"1m\") -> pd.DataFrame:\n    \"\"\"Download N days of klines from Data Vision and return a DataFrame.\"\"\"\n    from datetime import date, timedelta\n\n    cols = [\"open_time\", \"open\", \"high\", \"low\", \"close\", \"volume\",\n            \"close_time\", \"quote_volume\", \"trades\", \"taker_buy_base_vol\",\n            \"taker_buy_quote_vol\", \"ignore\"]\n    all_rows = []\n    today = date.today()\n\n    for i in range(2, days + 2):  # start 2 days back (today's file may not be available yet)\n        d = today - timedelta(days=i)\n        csv_text = download_data_vision(symbol, \"klines\", d.isoformat(), interval)\n        if csv_text:\n            for line in csv_text.strip().split(\"\\n\"):\n                all_rows.append(line.split(\",\"))\n\n    df = pd.DataFrame(all_rows, columns=cols)\n    for col in [\"open\", \"high\", \"low\", \"close\", \"volume\", \"quote_volume\",\n                \"taker_buy_base_vol\", \"taker_buy_quote_vol\"]:\n        df[col] = df[col].astype(float)\n    for col in [\"open_time\", \"close_time\"]:\n        ts = df[col].astype(\"int64\")\n        # Assumption: some newer files use microsecond timestamps; scale those down to ms.\n        ts = ts.where(ts < 10**14, ts // 1000)\n        df[col] = pd.to_datetime(ts, unit=\"ms\", utc=True)\n    df[\"trades\"] = df[\"trades\"].astype(int)\n    return df.drop(columns=[\"ignore\"]).sort_values(\"open_time\").reset_index(drop=True)",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "89438f51",
"source": "### Example: 7 days of BTCUSDT via Data Vision (faster, no rate limits)",
"metadata": {}
},
{
"cell_type": "code",
"id": "5368c4b7",
"source": "df_bulk = download_klines_bulk(\"BTCUSDT\", days=7, interval=\"1m\")\nprint(f\"Data Vision: {len(df_bulk)} candles ({len(df_bulk)/1440:.1f} days)\")\nprint(f\"Range: {df_bulk['open_time'].min()} -> {df_bulk['open_time'].max()}\")\nprint(f\"Price: {df_bulk['close'].min():.2f} - {df_bulk['close'].max():.2f}\")\ndf_bulk.describe()",
"metadata": {},
"execution_count": null,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
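
The pagination arithmetic in Method 1 (11 requests and 22 weight for 7 days of 1m candles) can be sketched as a small planner. This is a minimal sketch and not part of the notebook: `INTERVAL_MINUTES` and `plan_klines_requests` are hypothetical names, the interval table covers only a few common values, and the default weight of 2 per request is taken from the notebook's rate-limit table, which may not match current Binance limits.

```python
import math

# Hypothetical lookup (not from the notebook): minutes per candle for a few intervals.
INTERVAL_MINUTES = {"1m": 1, "5m": 5, "15m": 15, "1h": 60, "1d": 1440}


def plan_klines_requests(days: int, interval: str = "1m",
                         limit: int = 1000, weight_per_request: int = 2) -> dict:
    """Estimate candles, paginated requests, and total weight for a date range."""
    candles = days * 24 * 60 // INTERVAL_MINUTES[interval]
    # Each request returns at most `limit` candles, so round up.
    requests_needed = math.ceil(candles / limit)
    return {
        "candles": candles,
        "requests": requests_needed,
        "weight": requests_needed * weight_per_request,
    }


plan = plan_klines_requests(7, "1m")
print(plan)  # {'candles': 10080, 'requests': 11, 'weight': 22}
```

Comparing the estimated weight against the 6000 weight/min budget gives a rough idea of when the REST route stops being practical and the Data Vision CSVs become the better choice.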