{ "cells": [ { "cell_type": "markdown", "id": "fdd06db5", "metadata": {}, "source": [ "# RAM benchmark: BGE-m3 vs multilingual-e5-small\n", "\n", "We measure actual memory consumption in each phase:\n", "1. **Baseline**: RAM before loading anything\n", "2. **Model load**: RAM needed to hold the model in memory\n", "3. **Encoding**: peak RAM while encoding corpora of different sizes\n", "4. **Scaling**: how RAM grows with corpora of 100, 500, 1000 and 5000 documents" ] },
{ "cell_type": "markdown", "id": "d3d149af", "metadata": {}, "source": [ "## 1. Setup" ] },
{ "cell_type": "code", "execution_count": 1, "id": "9ee663ae", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "System: 31970 MB total, 21550 MB available (32.6% used)\n", "Current process: 179 MB\n", "Models to test: ['multi-e5-small', 'BGE-m3']\n" ] } ], "source": [ "import os, gc, time, tracemalloc\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import psutil\n", "\n", "plt.style.use('seaborn-v0_8-whitegrid')\n", "plt.rcParams['figure.figsize'] = (14, 6)\n", "\n", "MODELS = {\n", "    'multi-e5-small': {\n", "        'id': 'intfloat/multilingual-e5-small',\n", "        'q_prefix': 'query: ',\n", "        'd_prefix': 'passage: ',\n", "    },\n", "    'BGE-m3': {\n", "        'id': 'BAAI/bge-m3',\n", "        'q_prefix': None,\n", "        'd_prefix': None,\n", "    },\n", "}\n", "\n", "def get_process_ram_mb():\n", "    \"\"\"RAM of the current process in MB (RSS).\"\"\"\n", "    return psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)\n", "\n", "def get_system_ram():\n", "    \"\"\"System RAM: total, used and available in MB, plus percent used.\"\"\"\n", "    m = psutil.virtual_memory()\n", "    return {'total_mb': m.total / (1024**2), 'used_mb': m.used / (1024**2), 'available_mb': m.available / (1024**2), 'percent': m.percent}\n", "\n", "sys_ram = get_system_ram()\n", "print(f'System: {sys_ram[\"total_mb\"]:.0f} MB total, {sys_ram[\"available_mb\"]:.0f} MB available ({sys_ram[\"percent\"]}% used)')\n", "print(f'Current process: {get_process_ram_mb():.0f} MB')\n", "print(f'Models to test: {list(MODELS.keys())}')" ] },
{ "cell_type": "markdown", "id": "9f1571c1", "metadata": {}, "source": [ "## 2. Scalable synthetic corpus\n", "\n", "We generate corpora of different sizes from a set of Spanish seed sentences." ] },
{ "cell_type": "code", "execution_count": 2, "id": "f5a78c0b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Corpus   100 docs → example: \"Python es un lenguaje de programación de alto nivel conocido por su le...\"\n", "Corpus   500 docs → example: \"Python es un lenguaje de programación de alto nivel conocido por su le...\"\n", "Corpus  1000 docs → example: \"Python es un lenguaje de programación de alto nivel conocido por su le...\"\n", "Corpus  5000 docs → example: \"Python es un lenguaje de programación de alto nivel conocido por su le...\"\n" ] } ], "source": [ "# Seed sentences used to build corpora of any size\n", "SEEDS = [\n", "    'Python es un lenguaje de programación de alto nivel conocido por su legibilidad',\n", "    'La fotosíntesis convierte la luz solar en energía química en las células vegetales',\n", "    'El pan de masa madre requiere un cultivo fermentado de harina y agua',\n", "    'El interés compuesto hace crecer los ahorros exponencialmente a largo plazo',\n", "    'La selva amazónica produce aproximadamente el 20 por ciento del oxígeno mundial',\n", "    'Las redes neuronales profundas aprenden representaciones jerárquicas de los datos',\n", "    'La criptografía asimétrica usa pares de claves pública y privada',\n", "    'Los contenedores Docker encapsulan aplicaciones con todas sus dependencias',\n", "    'La teoría de grafos modela relaciones entre entidades como nodos y aristas',\n", "    'Las bases de datos columnares optimizan consultas analíticas sobre grandes volúmenes',\n", "    'El protocolo TCP garantiza la entrega ordenada y confiable de paquetes',\n", "    'La transformada de Fourier descompone señales en sus frecuencias componentes',\n", "    'Los microservicios dividen una aplicación en servicios pequeños e independientes',\n", "    'El álgebra lineal es fundamental para machine learning y procesamiento de señales',\n", "    'La reacción de Maillard crea el dorado y sabor al cocinar proteínas a alta temperatura',\n", "    'Los índices invertidos permiten búsqueda full-text eficiente en documentos',\n", "    'Las corrientes oceánicas regulan el clima global y los patrones meteorológicos',\n", "    'El garbage collector libera memoria de objetos que ya no son referenciados',\n", "    'La diversificación reduce el riesgo del portafolio distribuyendo inversiones',\n", "    'Los embeddings representan texto como vectores densos en un espacio semántico',\n", "]\n", "\n", "def generate_corpus(n):\n", "    \"\"\"Build an n-document corpus by cycling through SEEDS and the suffixes.\n", "\n", "    Note: len(suffixes) (10) divides len(SEEDS) (20), so each seed always gets the\n", "    same suffix and only 20 distinct texts exist; larger corpora just repeat them.\n", "    \"\"\"\n", "    suffixes = [\n", "        '', ' en sistemas modernos', ' según investigaciones recientes',\n", "        ' de manera eficiente', ' con aplicaciones prácticas',\n", "        ' en el contexto actual', ' para usuarios avanzados',\n", "        ' combinando múltiples enfoques', ' optimizando recursos',\n", "        ' a gran escala',\n", "    ]\n", "    corpus = []\n", "    for i in range(n):\n", "        base = SEEDS[i % len(SEEDS)]\n", "        suffix = suffixes[i % len(suffixes)]\n", "        corpus.append(f'{base}{suffix}')\n", "    return corpus\n", "\n", "CORPUS_SIZES = [100, 500, 1000, 5000]\n", "corpora = {n: generate_corpus(n) for n in CORPUS_SIZES}\n", "for n, c in corpora.items():\n", "    print(f'Corpus {n:5d} docs → example: 
\"{c[0][:70]}...\"')" ] }, { "cell_type": "markdown", "id": "aca2f5ce", "metadata": {}, "source": [ "## 3. RAM benchmark per model\n", "\n", "For each model we measure:\n", "- **Load RAM**: RSS delta after instantiating the model\n", "- **Peak encoding RAM**: peak reported by `tracemalloc` during `encode`\n", "- **Encoding time**: for each corpus size\n", "- **Embedding RAM**: size of the resulting NumPy arrays\n", "\n", "Caveats for reading the numbers: `tracemalloc` only traces allocations made through Python's allocators (NumPy arrays included), not PyTorch's native tensor buffers, so its peak mostly reflects the output array. RSS only covers host memory; if a GPU is available, `SentenceTransformer` places the model on it by default, and the weights then largely live in GPU memory rather than in RSS. Finally, both models run in the same process, so the BGE-m3 baseline includes memory left over from multilingual-e5-small." ] }, { "cell_type": "code", "execution_count": 3, "id": "a09ef594", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "abe7fa6799534cb89e3cc572e00969fd", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading weights: 0%| | 0/199 [00:00<?, ?it/s]" ] }, "metadata": {}, "output_type": "display_data", "transient": {} }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1mBertModel LOAD REPORT\u001b[0m from: intfloat/multilingual-e5-small\n", "Key | Status | | \n", "------------------------+------------+--+-\n", "embeddings.position_ids | UNEXPECTED | | \n", "\n", "Notes:\n", "- UNEXPECTED:\tcan be ignored when loading from different task/architecture; not ok if you expect identical arch.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "======================================================================\n", "multi-e5-small (117.7M params, dim=384)\n", "======================================================================\n", "  RAM baseline:   881 MB\n", "  RAM after load: 1565 MB (model = +684 MB)\n", "  Load time:      5.5s\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "    100 docs | encode=  0.28s (  356.2 docs/s) | RAM delta= +191 MB | peak(tracemalloc)=   0.2 MB | embeddings=0.15 MB\n" ] }, { "name": "stdout", 
"output_type": "stream", "text": [ "    500 docs | encode=  0.28s ( 1791.3 docs/s) | RAM delta=   +2 MB | peak(tracemalloc)=   0.9 MB | embeddings=0.73 MB\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "   1000 docs | encode=  0.55s ( 1807.4 docs/s) | RAM delta=   +2 MB | peak(tracemalloc)=   1.8 MB | embeddings=1.46 MB\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "   5000 docs | encode=  2.61s ( 1915.6 docs/s) | RAM delta=  +15 MB | peak(tracemalloc)=   9.1 MB | embeddings=7.32 MB\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "  RAM after cleanup: 1757 MB (vs. after load: +192 MB)\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9fe30614e002401f94625c0b152e3696", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading weights: 0%| | 0/391 [00:00<?, ?it/s]" ] }, "metadata": {}, "output_type": "display_data", "transient": {} }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "======================================================================\n", "BGE-m3 (567.8M params, dim=1024)\n", "======================================================================\n", "  RAM baseline:   1756 MB\n", "  RAM after load: 1866 MB (model = +110 MB)\n", "  Load time:      10.0s\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "    100 docs | encode=  0.52s (  192.5 docs/s) | RAM delta=   +0 MB | peak(tracemalloc)=   0.5 MB | embeddings=0.39 MB\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "    500 docs | encode=  1.00s (  501.7 docs/s) | RAM delta=   +0 MB | peak(tracemalloc)=   2.2 MB | embeddings=1.95 MB\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "   1000 docs | encode=  1.87s (  535.7 docs/s) | RAM delta=   +0 MB | peak(tracemalloc)=   4.3 MB | embeddings=3.91 MB\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "   5000 docs | encode=  9.01s (  554.9 docs/s) | RAM delta=  +24 MB | peak(tracemalloc)=  21.3 MB | embeddings=19.53 MB\n" ] }, { "name": "stdout", 
"output_type": "stream", "text": [ "  RAM after cleanup: 1833 MB (vs. after load: -33 MB)\n" ] } ], "source": [ "from sentence_transformers import SentenceTransformer\n", "\n", "def benchmark_model(name, config, corpora):\n", "    \"\"\"Full RAM benchmark for one model.\"\"\"\n", "    gc.collect()\n", "    ram_before = get_process_ram_mb()\n", "\n", "    # Load the model. On a first run this may include downloading the weights\n", "    # from the HF Hub, which then counts towards load_time.\n", "    # With no `device` argument, SentenceTransformer uses a GPU when one is\n", "    # available; the weights then mostly live in GPU memory, not in RSS.\n", "    # Pass device='cpu' to measure host RAM only.\n", "    t0 = time.perf_counter()\n", "    model = SentenceTransformer(config['id'], trust_remote_code=True)\n", "    load_time = time.perf_counter() - t0\n", "    ram_after_load = get_process_ram_mb()\n", "    ram_model = ram_after_load - ram_before\n", "\n", "    # Model size\n", "    n_params = sum(p.nelement() for p in model[0].auto_model.parameters()) / 1e6\n", "    dim = model.get_sentence_embedding_dimension()\n", "\n", "    print(f'\\n{\"=\" * 70}')\n", "    print(f'{name} ({n_params:.1f}M params, dim={dim})')\n", "    print(f'{\"=\" * 70}')\n", "    print(f'  RAM baseline:   {ram_before:.0f} MB')\n", "    print(f'  RAM after load: {ram_after_load:.0f} MB (model = {ram_model:+.0f} MB)')\n", "    print(f'  Load time:      {load_time:.1f}s')\n", "\n", "    results = []\n", "    prefix = config['d_prefix']\n", "\n", "    for corpus_size, corpus in corpora.items():\n", "        gc.collect()\n", "        texts = [f'{prefix}{doc}' for doc in corpus] if prefix else corpus\n", "\n", "        # tracemalloc only traces Python-level allocations (NumPy arrays included);\n", "        # PyTorch tensor buffers are invisible to it, so `peak` mostly reflects the output array.\n", "        tracemalloc.start()\n", "        ram_pre_encode = get_process_ram_mb()\n", "        t0 = time.perf_counter()\n", "\n", "        embeddings = model.encode(texts, normalize_embeddings=True,\n", "                                  show_progress_bar=False, batch_size=32)\n", "\n", "        encode_time = time.perf_counter() - t0\n", "        ram_post_encode = get_process_ram_mb()\n", "        current, peak = tracemalloc.get_traced_memory()\n", "        tracemalloc.stop()\n", "\n", "        # Size of the embedding array in MB\n", "        emb_size_mb = embeddings.nbytes / (1024 * 1024)\n", "\n", "        result = {\n", "            'model': name,\n", "            'corpus_size': corpus_size,\n", "            'dim': dim,\n", "            'ram_model_mb': round(ram_model),\n", "            'ram_pre_encode_mb': round(ram_pre_encode),\n", "            'ram_post_encode_mb': round(ram_post_encode),\n", "            'ram_delta_encode_mb': round(ram_post_encode - ram_pre_encode),\n", "            'tracemalloc_peak_mb': round(peak / (1024 * 1024), 1),\n", "            'emb_size_mb': round(emb_size_mb, 2),\n", "            'encode_time_s': round(encode_time, 3),\n", "            'docs_per_sec': round(corpus_size / encode_time, 1),\n", "        }\n", "        results.append(result)\n", "\n", "        # The first encode of each model likely includes one-off warm-up costs,\n", "        # which inflates its time and RAM delta relative to the later corpus sizes.\n", "        print(f'  {corpus_size:5d} docs | encode={encode_time:6.2f}s ({result[\"docs_per_sec\"]:7.1f} docs/s) | '\n", "              f'RAM delta={result[\"ram_delta_encode_mb\"]:+5d} MB | peak(tracemalloc)={result[\"tracemalloc_peak_mb\"]:6.1f} MB | '\n", "              f'embeddings={emb_size_mb:.2f} MB')\n", "\n", "    # Release the model. RSS may not shrink after `del`: freed memory can stay\n", "    # reserved by the allocator instead of being returned to the OS.\n", "    del model\n", "    gc.collect()\n", "    ram_after_cleanup = get_process_ram_mb()\n", "    print(f'  RAM after cleanup: {ram_after_cleanup:.0f} MB (vs. after load: {ram_after_cleanup - ram_after_load:+.0f} MB)')\n", "\n", "    return results\n", "\n", "# Run the benchmarks one after the other in the same process: the second model's\n", "# baseline includes whatever the first one left behind. For isolated numbers,\n", "# run each model in a fresh process.\n", "all_results = []\n", "for name, config in MODELS.items():\n", "    all_results.extend(benchmark_model(name, config, corpora))" ] }, { "cell_type": "markdown", "id": "e6d6679a", "metadata": {}, "source": [ "## 4. Summary table" ] }, { "cell_type": "code", "execution_count": 4, "id": "7b447fa1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
|   | Model | Docs | Dim | Model RAM (MB) | Encode RAM delta (MB) | tracemalloc peak (MB) | Embeddings (MB) | Time (s) | Docs/s |\n", "
|---|---|---|---|---|---|---|---|---|---|\n", "
| 0 | multi-e5-small | 100 | 384 | 684 | 191 | 0.2 | 0.15 | 0.281 | 356.2 |\n", "
| 1 | multi-e5-small | 500 | 384 | 684 | 2 | 0.9 | 0.73 | 0.279 | 1791.3 |\n", "
| 2 | multi-e5-small | 1000 | 384 | 684 | 2 | 1.8 | 1.46 | 0.553 | 1807.4 |\n", "
| 3 | multi-e5-small | 5000 | 384 | 684 | 15 | 9.1 | 7.32 | 2.610 | 1915.6 |\n", "
| 4 | BGE-m3 | 100 | 1024 | 110 | 0 | 0.5 | 0.39 | 0.520 | 192.5 |\n", "
| 5 | BGE-m3 | 500 | 1024 | 110 | 0 | 2.2 | 1.95 | 0.997 | 501.7 |\n", "
| 6 | BGE-m3 | 1000 | 1024 | 110 | 0 | 4.3 | 3.91 | 1.867 | 535.7 |\n", "
| 7 | BGE-m3 | 5000 | 1024 | 110 | 24 | 21.3 | 19.53 | 9.011 | 554.9 |\n", "