Update tutorial on using vLLM with improved installation instructions and model compatibility details

2024-12-10 21:03:46 +01:00
parent 13d014c24c
commit 8d1274dbec
2 changed files with 812 additions and 304 deletions
+153 -154
@@ -1,155 +1,154 @@
{ {
"cells": [ "cells": [
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Tutorial: Using LLMs with vLLM\n", "# Tutorial: Using LLMs with vLLM\n",
"In this tutorial you will learn:\n", "In this tutorial you will learn:\n",
"- What vLLM is and what advantages it offers.\n", "- What vLLM is and what advantages it offers.\n",
"- Which types of models are compatible with vLLM.\n", "- Which types of models are compatible with vLLM.\n",
"- How to download models and use them with vLLM for tasks such as chat." "- How to download models and use them with vLLM for tasks such as chat."
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## What is vLLM?\n", "## What is vLLM?\n",
"**vLLM** is a library optimized for running LLMs efficiently, using advanced memory-management techniques (notably PagedAttention for the attention key-value cache) and parallelization. It is designed for tasks such as text generation with large models at high throughput.\n", "**vLLM** is a library optimized for running LLMs efficiently, using advanced memory-management techniques (notably PagedAttention for the attention key-value cache) and parallelization. It is designed for tasks such as text generation with large models at high throughput.\n",
"\n", "\n",
"**Advantages of vLLM:**\n", "**Advantages of vLLM:**\n",
"- More efficient use of GPU compute and memory.\n", "- More efficient use of GPU compute and memory.\n",
"- Shorter inference times.\n", "- Shorter inference times.\n",
"- Compatibility with Hugging Face models and other popular platforms." "- Compatibility with Hugging Face models and other popular platforms."
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Types of Models Compatible with vLLM\n", "## Types of Models Compatible with vLLM\n",
"The most common models that can be used with vLLM include:\n", "The most common models that can be used with vLLM include:\n",
"- **Open models:** For example, GPT-NeoX, OPT, LLaMA, and Falcon.\n", "- **Open models:** For example, GPT-NeoX, OPT, LLaMA, and Falcon.\n",
"- **Commercial models:** Only those whose weights are published, for example on Hugging Face under a commercial license. API-only models such as GPT-3 or GPT-4 cannot be loaded with vLLM, because their weights are not available.\n", "- **Commercial models:** Only those whose weights are published, for example on Hugging Face under a commercial license. API-only models such as GPT-3 or GPT-4 cannot be loaded with vLLM, because their weights are not available.\n",
"- **Task-specific models:** Designed for specific tasks such as chat or code generation (for example, StarCoder; encoder-decoder models such as CodeT5 may not be supported, so check the vLLM supported-models list)." "- **Task-specific models:** Designed for specific tasks such as chat or code generation (for example, StarCoder; encoder-decoder models such as CodeT5 may not be supported, so check the vLLM supported-models list)."
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Installing vLLM\n", "## Installing vLLM\n",
"First, we need to install the vLLM library. This can be done with `pip`." "First, we need to install the vLLM library. This can be done with `pip`."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"!pip install vllm" "!pip install vllm"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
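"## Downloading Models for vLLM\n", "## Downloading Models for vLLM\n",
"We will use an open model such as LLaMA 2 or GPT-NeoX from the Hugging Face Hub. vLLM can load models directly from Hugging Face. Note that the `meta-llama` repositories are gated: you must accept the license on Hugging Face and authenticate before the weights can be downloaded." "We will use an open model such as LLaMA 2 or GPT-NeoX from the Hugging Face Hub. vLLM can load models directly from Hugging Face. Note that the `meta-llama` repositories are gated: you must accept the license on Hugging Face and authenticate before the weights can be downloaded."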
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# Import the required libraries\n", "# Import the required libraries\n",
"from vllm import LLM, SamplingParams\n", "from vllm import LLM, SamplingParams\n",
"\n", "\n",
"# Model name (change it to the model you prefer)\n", "# Model name (change it to the model you prefer)\n",
"model_name = 'meta-llama/Llama-2-7b-chat-hf'\n", "model_name = 'meta-llama/Llama-2-7b-chat-hf'\n",
"\n", "\n",
"# Initialize the model\n", "# Initialize the model\n",
"llm = LLM(model=model_name)" "llm = LLM(model=model_name)"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Using vLLM for Chat\n", "## Using vLLM for Chat\n",
"We can use vLLM to interact with the model in chat applications. This means sending a prompt and receiving a response generated by the model." "We can use vLLM to interact with the model in chat applications. This means sending a prompt and receiving a response generated by the model."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# Input prompt for generation\n", "# Input prompt for generation\n",
"prompt = \"Hello, how can I help you today?\"\n", "prompt = \"Hello, how can I help you today?\"\n",
"\n", "\n",
"# Sampling parameters\n", "# Sampling parameters\n",
"sampling_params = SamplingParams(\n", "sampling_params = SamplingParams(\n",
"    temperature=0.7,  # Controls randomness (creativity)\n", "    temperature=0.7,  # Controls randomness (creativity)\n",
"    max_tokens=100    # Maximum number of generated tokens\n", "    max_tokens=100    # Maximum number of generated tokens\n",
")\n", ")\n",
"\n", "\n",
"# Generate text\n", "# Generate text\n",
"outputs = llm.generate(prompt, sampling_params)\n", "outputs = llm.generate(prompt, sampling_params)\n",
"\n", "\n",
"# Show the response (each RequestOutput holds a list of completions)\n", "# Show the response (each RequestOutput holds a list of completions)\n",
"print(outputs[0].outputs[0].text)" "print(outputs[0].outputs[0].text)"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Other Compatible Models\n", "## Other Compatible Models\n",
"Besides LLaMA 2, you can use other models such as:\n", "Besides LLaMA 2, you can use other models such as:\n",
"- **Falcon:** A model optimized for NLP tasks, available in sizes such as 7B and 40B.\n", "- **Falcon:** A model optimized for NLP tasks, available in sizes such as 7B and 40B.\n",
"- **OPT (Meta):** Designed for natural-language tasks.\n", "- **OPT (Meta):** Designed for natural-language tasks.\n",
"- **GPT-NeoX:** A model from EleutherAI designed as an alternative to GPT-3.\n", "- **GPT-NeoX:** A model from EleutherAI designed as an alternative to GPT-3.\n",
"\n", "\n",
"These models can be loaded in the same way by changing the model name in the code." "These models can be loaded in the same way by changing the model name in the code."
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Conclusion\n", "## Conclusion\n",
"This tutorial showed how to install vLLM and use it to run LLMs efficiently. Models such as LLaMA, Falcon, and GPT-NeoX are compatible and can be used for tasks such as chat and text generation.\n", "This tutorial showed how to install vLLM and use it to run LLMs efficiently. Models such as LLaMA, Falcon, and GPT-NeoX are compatible and can be used for tasks such as chat and text generation.\n",
"\n", "\n",
"With vLLM, you can run large models with optimized performance, which suits AI applications." "With vLLM, you can run large models with optimized performance, which suits AI applications."
] ]
} }
], ],
"metadata": { "metadata": {
"kernelspec": { "kernelspec": {
"display_name": "Python 3", "display_name": "venv",
"language": "python", "language": "python",
"name": "python3" "name": "python3"
}, },
"language_info": { "language_info": {
"codemirror_mode": { "codemirror_mode": {
"name": "ipython", "name": "ipython",
"version": 3 "version": 3
}, },
"file_extension": ".py", "file_extension": ".py",
"mimetype": "text/x-python", "mimetype": "text/x-python",
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.8.5" "version": "3.10.11"
} }
}, },
"nbformat": 4, "nbformat": 4,
"nbformat_minor": 4 "nbformat_minor": 4
} }
+659 -150
@@ -1,152 +1,661 @@
{ {
"cells": [ "cells": [
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Tutorial: Using vLLM with a Local Model\n", "# Tutorial: Using vLLM with a Local Model\n",
"In this tutorial you will learn how to use **vLLM** to load a locally downloaded model and run inference.\n", "In this tutorial you will learn how to use **vLLM** to load a locally downloaded model and run inference.\n",
"\n", "\n",
"## What is vLLM?\n", "## What is vLLM?\n",
"vLLM is a high-performance runtime for large language models (LLMs). It offers advanced features such as efficient memory management and parallel processing.\n", "vLLM is a high-performance runtime for large language models (LLMs). It offers advanced features such as efficient memory management and parallel processing.\n",
"\n", "\n",
"This tutorial assumes that you already have a model downloaded locally, for example a model from Hugging Face." "This tutorial assumes that you already have a model downloaded locally, for example a model from Hugging Face."
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Step 1: Installing vLLM\n", "## Step 1: Installing vLLM\n",
"First, we need to install the **vLLM** library. You can do so with the following command." "First, we need to install the **vLLM** library. You can do so with the following command."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 1,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [
"source": [ {
"!pip install vllm" "name": "stdout",
] "output_type": "stream",
}, "text": [
{ "Collecting vllm"
"cell_type": "markdown", ]
"metadata": {},
"source": [
"## Step 2: Checking Dependencies\n",
"Make sure the required dependencies, such as PyTorch, are installed. If you do not have PyTorch, install it with the command appropriate for your environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Loading the Local Model\n",
"We will use vLLM's `LLM` class to load the model from a local folder. In this example, we assume the model is stored in a directory named `./modelo_local`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from vllm import LLM\n",
"\n",
"# Path to the local model\n",
"ruta_modelo = \"./modelo_local\"\n",
"\n",
"# Initialize the model\n",
"llm = LLM(model=ruta_modelo)\n",
"print(\"Model loaded successfully.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Running Inference\n",
"With the model loaded, we can run inference using the `generate` method. Provide an input text and the model will generate a response."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Input text\n",
"texto_entrada = \"What is the capital of France?\"\n",
"\n",
"# Generate a response (generate returns a list of RequestOutput objects)\n",
"respuesta = llm.generate(texto_entrada)\n",
"print(\"Generated response:\", respuesta[0].outputs[0].text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5: Tuning Parameters (Optional)\n",
"You can adjust parameters such as the temperature or the maximum number of generated tokens to customize the model's behavior."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
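"from vllm import SamplingParams\n",
"\n",
"# Tuned parameters: generate takes them as a SamplingParams object, not as keyword arguments\n",
"parametros = SamplingParams(\n",
"    temperature=0.7,  # Controls randomness\n",
"    max_tokens=100    # Limit on generated tokens\n",
")\n",
"respuesta_ajustada = llm.generate(texto_entrada, parametros)\n",
"print(\"Tuned response:\", respuesta_ajustada[0].outputs[0].text)"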
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 6: Conclusion\n",
"You have learned how to install vLLM and use it to load a local model and run inference. You can explore more features in the [official vLLM documentation](https://vllm.org).\n",
"\n",
"You are now ready to use language models efficiently with vLLM!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
}, },
"nbformat": 4, {
"nbformat_minor": 4 "name": "stderr",
} "output_type": "stream",
"text": [
"\n",
"[notice] A new release of pip is available: 23.0.1 -> 24.3.1\n",
"[notice] To update, run: python.exe -m pip install --upgrade pip\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" Downloading vllm-0.6.4.post1.tar.gz (3.1 MB)\n",
" ---------------------------------------- 0.0/3.1 MB ? eta -:--:--\n",
" ---------------------------------- ----- 2.7/3.1 MB 56.6 MB/s eta 0:00:01\n",
" ---------------------------------------- 3.1/3.1 MB 49.2 MB/s eta 0:00:00\n",
" Installing build dependencies: started\n",
" Installing build dependencies: still running...\n",
" Installing build dependencies: finished with status 'done'\n",
" Getting requirements to build wheel: started\n",
" Getting requirements to build wheel: finished with status 'done'\n",
" Preparing metadata (pyproject.toml): started\n",
" Preparing metadata (pyproject.toml): finished with status 'done'\n",
"Collecting ray>=2.9\n",
" Downloading ray-2.40.0-cp310-cp310-win_amd64.whl (25.3 MB)\n",
" ---------------------------------------- 0.0/25.3 MB ? eta -:--:--\n",
" ---- ----------------------------------- 3.1/25.3 MB 66.7 MB/s eta 0:00:01\n",
" ------- -------------------------------- 4.8/25.3 MB 34.2 MB/s eta 0:00:01\n",
" ------------- -------------------------- 8.6/25.3 MB 45.9 MB/s eta 0:00:01\n",
" ------------------- ------------------- 12.4/25.3 MB 50.4 MB/s eta 0:00:01\n",
" ------------------------ -------------- 15.9/25.3 MB 81.8 MB/s eta 0:00:01\n",
" ------------------------------ -------- 20.0/25.3 MB 81.8 MB/s eta 0:00:01\n",
" ------------------------------------ -- 23.8/25.3 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 25.3/25.3 MB 93.9 MB/s eta 0:00:01\n",
" --------------------------------------- 25.3/25.3 MB 21.8 MB/s eta 0:00:00\n",
"Collecting lm-format-enforcer<0.11,>=0.10.9\n",
" Downloading lm_format_enforcer-0.10.9-py3-none-any.whl (43 kB)\n",
" ---------------------------------------- 0.0/43.9 kB ? eta -:--:--\n",
" ---------------------------------------- 43.9/43.9 kB 2.2 MB/s eta 0:00:00\n",
"Collecting outlines<0.1,>=0.0.43\n",
" Downloading outlines-0.0.46-py3-none-any.whl (101 kB)\n",
" ---------------------------------------- 0.0/101.9 kB ? eta -:--:--\n",
" ---------------------------------------- 101.9/101.9 kB ? eta 0:00:00\n",
"Collecting aiohttp\n",
" Downloading aiohttp-3.11.10-cp310-cp310-win_amd64.whl (441 kB)\n",
" ---------------------------------------- 0.0/441.8 kB ? eta -:--:--\n",
" ------------------------------------- 441.8/441.8 kB 28.8 MB/s eta 0:00:00\n",
"Collecting tokenizers>=0.19.1\n",
" Downloading tokenizers-0.21.0-cp39-abi3-win_amd64.whl (2.4 MB)\n",
" ---------------------------------------- 0.0/2.4 MB ? eta -:--:--\n",
" --------------------------------------- 2.4/2.4 MB 76.6 MB/s eta 0:00:01\n",
" ---------------------------------------- 2.4/2.4 MB 21.6 MB/s eta 0:00:00\n",
"Collecting py-cpuinfo\n",
" Downloading py_cpuinfo-9.0.0-py3-none-any.whl (22 kB)\n",
"Collecting tqdm\n",
" Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)\n",
" ---------------------------------------- 0.0/78.5 kB ? eta -:--:--\n",
" ---------------------------------------- 78.5/78.5 kB 4.6 MB/s eta 0:00:00\n",
"Collecting protobuf\n",
" Downloading protobuf-5.29.1-cp310-abi3-win_amd64.whl (434 kB)\n",
" ---------------------------------------- 0.0/434.5 kB ? eta -:--:--\n",
" ------------------------------------- 434.5/434.5 kB 26.5 MB/s eta 0:00:00\n",
"Collecting uvicorn[standard]\n",
" Downloading uvicorn-0.32.1-py3-none-any.whl (63 kB)\n",
" ---------------------------------------- 0.0/63.8 kB ? eta -:--:--\n",
" ---------------------------------------- 63.8/63.8 kB ? eta 0:00:00\n",
"Collecting tiktoken>=0.6.0\n",
" Downloading tiktoken-0.8.0-cp310-cp310-win_amd64.whl (884 kB)\n",
" ---------------------------------------- 0.0/884.2 kB ? eta -:--:--\n",
" --------------------------------------- 880.6/884.2 kB ? eta -:--:--\n",
" ------------------------------------- 884.2/884.2 kB 54.6 MB/s eta 0:00:00\n",
"Collecting requests>=2.26.0\n",
" Downloading requests-2.32.3-py3-none-any.whl (64 kB)\n",
" ---------------------------------------- 0.0/64.9 kB ? eta -:--:--\n",
" ---------------------------------------- 64.9/64.9 kB 3.6 MB/s eta 0:00:00\n",
"Collecting torch==2.5.1\n",
" Using cached torch-2.5.1-cp310-cp310-win_amd64.whl (203.1 MB)\n",
"Collecting pyyaml\n",
" Downloading PyYAML-6.0.2-cp310-cp310-win_amd64.whl (161 kB)\n",
" ---------------------------------------- 0.0/161.8 kB ? eta -:--:--\n",
" ---------------------------------------- 161.8/161.8 kB ? eta 0:00:00\n",
"Collecting partial-json-parser\n",
" Downloading partial_json_parser-0.2.1.1.post4-py3-none-any.whl (9.9 kB)\n",
"Collecting einops\n",
" Downloading einops-0.8.0-py3-none-any.whl (43 kB)\n",
" ---------------------------------------- 0.0/43.2 kB ? eta -:--:--\n",
" ------------------------------------- -- 41.0/43.2 kB ? eta -:--:--\n",
" -------------------------------------- 43.2/43.2 kB 131.9 kB/s eta 0:00:00\n",
"Collecting mistral_common[opencv]>=1.5.0\n",
" Downloading mistral_common-1.5.1-py3-none-any.whl (6.5 MB)\n",
" ---------------------------------------- 0.0/6.5 MB ? eta -:--:--\n",
" --------------- ------------------------ 2.6/6.5 MB 82.9 MB/s eta 0:00:01\n",
" ------------------- -------------------- 3.1/6.5 MB 66.7 MB/s eta 0:00:01\n",
" ---------------------- ----------------- 3.6/6.5 MB 29.1 MB/s eta 0:00:01\n",
" --------------------------------------- 6.5/6.5 MB 37.6 MB/s eta 0:00:01\n",
" ---------------------------------------- 6.5/6.5 MB 27.6 MB/s eta 0:00:00\n",
"Collecting pydantic>=2.9\n",
" Downloading pydantic-2.10.3-py3-none-any.whl (456 kB)\n",
" ---------------------------------------- 0.0/457.0 kB ? eta -:--:--\n",
" ------------------------------------- 457.0/457.0 kB 29.8 MB/s eta 0:00:00\n",
"Collecting gguf==0.10.0\n",
" Downloading gguf-0.10.0-py3-none-any.whl (71 kB)\n",
" ---------------------------------------- 0.0/71.6 kB ? eta -:--:--\n",
" ---------------------------------------- 71.6/71.6 kB 3.8 MB/s eta 0:00:00\n",
"Requirement already satisfied: psutil in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from vllm) (6.1.0)\n",
"Collecting compressed-tensors==0.8.0\n",
" Downloading compressed_tensors-0.8.0-py3-none-any.whl (86 kB)\n",
" ---------------------------------------- 0.0/87.0 kB ? eta -:--:--\n",
" ---------------------------------------- 87.0/87.0 kB 5.1 MB/s eta 0:00:00\n",
"Collecting openai>=1.45.0\n",
" Downloading openai-1.57.2-py3-none-any.whl (389 kB)\n",
" ---------------------------------------- 0.0/389.9 kB ? eta -:--:--\n",
" ---------------------------------------- 389.9/389.9 kB ? eta 0:00:00\n",
"Requirement already satisfied: pyzmq in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from vllm) (26.2.0)\n",
"Collecting filelock>=3.10.4\n",
" Using cached filelock-3.16.1-py3-none-any.whl (16 kB)\n",
"Collecting prometheus_client>=0.18.0\n",
" Downloading prometheus_client-0.21.1-py3-none-any.whl (54 kB)\n",
" ---------------------------------------- 0.0/54.7 kB ? eta -:--:--\n",
" ---------------------------------------- 54.7/54.7 kB ? eta 0:00:00\n",
"Collecting msgspec\n",
" Downloading msgspec-0.18.6-cp310-cp310-win_amd64.whl (185 kB)\n",
" ---------------------------------------- 0.0/185.7 kB ? eta -:--:--\n",
" ------------------------------------- 185.7/185.7 kB 11.0 MB/s eta 0:00:00\n",
"Collecting numpy<2.0.0\n",
" Downloading numpy-1.26.4-cp310-cp310-win_amd64.whl (15.8 MB)\n",
" ---------------------------------------- 0.0/15.8 MB ? eta -:--:--\n",
" --------- ------------------------------ 3.6/15.8 MB 77.1 MB/s eta 0:00:01\n",
" ------------------ --------------------- 7.4/15.8 MB 79.5 MB/s eta 0:00:01\n",
" --------------------------- ----------- 11.0/15.8 MB 81.8 MB/s eta 0:00:01\n",
" ------------------------------------ -- 14.6/15.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 15.8/15.8 MB 81.8 MB/s eta 0:00:01\n",
" --------------------------------------- 15.8/15.8 MB 23.4 MB/s eta 0:00:00\n",
"Collecting importlib_metadata\n",
" Downloading importlib_metadata-8.5.0-py3-none-any.whl (26 kB)\n",
"Collecting nvidia-ml-py>=12.560.30\n",
" Downloading nvidia_ml_py-12.560.30-py3-none-any.whl (40 kB)\n",
" ---------------------------------------- 0.0/40.5 kB ? eta -:--:--\n",
" ---------------------------------------- 40.5/40.5 kB ? eta 0:00:00\n",
"Collecting torchvision==0.20.1\n",
" Downloading torchvision-0.20.1-cp310-cp310-win_amd64.whl (1.6 MB)\n",
" ---------------------------------------- 0.0/1.6 MB ? eta -:--:--\n",
" ---------------------------------------- 1.6/1.6 MB 50.1 MB/s eta 0:00:00\n",
"Collecting fastapi!=0.113.*,!=0.114.0,>=0.107.0\n",
" Downloading fastapi-0.115.6-py3-none-any.whl (94 kB)\n",
" ---------------------------------------- 0.0/94.8 kB ? eta -:--:--\n",
" ---------------------------------------- 94.8/94.8 kB ? eta 0:00:00\n",
"Collecting sentencepiece\n",
" Downloading sentencepiece-0.2.0-cp310-cp310-win_amd64.whl (991 kB)\n",
" ---------------------------------------- 0.0/991.5 kB ? eta -:--:--\n",
" ------------------------------------- 991.5/991.5 kB 61.3 MB/s eta 0:00:00\n",
"Collecting pillow\n",
" Downloading pillow-11.0.0-cp310-cp310-win_amd64.whl (2.6 MB)\n",
" ---------------------------------------- 0.0/2.6 MB ? eta -:--:--\n",
" --------------------------------------- 2.6/2.6 MB 170.0 MB/s eta 0:00:01\n",
" ---------------------------------------- 2.6/2.6 MB 54.4 MB/s eta 0:00:00\n",
"Collecting prometheus-fastapi-instrumentator>=7.0.0\n",
" Downloading prometheus_fastapi_instrumentator-7.0.0-py3-none-any.whl (19 kB)\n",
"Requirement already satisfied: typing_extensions>=4.10 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from vllm) (4.12.2)\n",
"Collecting transformers>=4.45.2\n",
" Downloading transformers-4.47.0-py3-none-any.whl (10.1 MB)\n",
" ---------------------------------------- 0.0/10.1 MB ? eta -:--:--\n",
" --------------- ------------------------ 4.0/10.1 MB 84.3 MB/s eta 0:00:01\n",
" ------------------------------ --------- 7.8/10.1 MB 83.0 MB/s eta 0:00:01\n",
" -------------------------------------- 10.1/10.1 MB 80.9 MB/s eta 0:00:01\n",
" --------------------------------------- 10.1/10.1 MB 58.9 MB/s eta 0:00:00\n",
"Collecting jinja2\n",
" Using cached jinja2-3.1.4-py3-none-any.whl (133 kB)\n",
"Collecting sympy==1.13.1\n",
" Using cached sympy-1.13.1-py3-none-any.whl (6.2 MB)\n",
"Collecting fsspec\n",
" Using cached fsspec-2024.10.0-py3-none-any.whl (179 kB)\n",
"Collecting networkx\n",
" Using cached networkx-3.4.2-py3-none-any.whl (1.7 MB)\n",
"Collecting mpmath<1.4,>=1.1.0\n",
" Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)\n",
"Collecting starlette<0.42.0,>=0.40.0\n",
" Downloading starlette-0.41.3-py3-none-any.whl (73 kB)\n",
" ---------------------------------------- 0.0/73.2 kB ? eta -:--:--\n",
" ---------------------------------------- 73.2/73.2 kB ? eta 0:00:00\n",
"Requirement already satisfied: packaging in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from lm-format-enforcer<0.11,>=0.10.9->vllm) (24.2)\n",
"Collecting interegular>=0.3.2\n",
" Downloading interegular-0.3.3-py37-none-any.whl (23 kB)\n",
"Collecting jsonschema<5.0.0,>=4.21.1\n",
" Downloading jsonschema-4.23.0-py3-none-any.whl (88 kB)\n",
" ---------------------------------------- 0.0/88.5 kB ? eta -:--:--\n",
" ---------------------------------------- 88.5/88.5 kB ? eta 0:00:00\n",
"Collecting tiktoken>=0.6.0\n",
" Downloading tiktoken-0.7.0-cp310-cp310-win_amd64.whl (798 kB)\n",
" ---------------------------------------- 0.0/798.9 kB ? eta -:--:--\n",
" ------------------------------------- 798.9/798.9 kB 49.3 MB/s eta 0:00:00\n",
"Collecting pillow\n",
" Downloading pillow-10.4.0-cp310-cp310-win_amd64.whl (2.6 MB)\n",
" ---------------------------------------- 0.0/2.6 MB ? eta -:--:--\n",
" --------------------------------------- 2.5/2.6 MB 79.4 MB/s eta 0:00:01\n",
" ---------------------------------------- 2.6/2.6 MB 54.1 MB/s eta 0:00:00\n",
"Collecting opencv-python-headless<5.0.0,>=4.0.0\n",
" Downloading opencv_python_headless-4.10.0.84-cp37-abi3-win_amd64.whl (38.8 MB)\n",
" ---------------------------------------- 0.0/38.8 MB ? eta -:--:--\n",
" --- ------------------------------------ 3.8/38.8 MB 80.8 MB/s eta 0:00:01\n",
" ------- -------------------------------- 7.6/38.8 MB 81.0 MB/s eta 0:00:01\n",
" ----------- --------------------------- 11.5/38.8 MB 81.8 MB/s eta 0:00:01\n",
" --------------- ----------------------- 15.3/38.8 MB 81.8 MB/s eta 0:00:01\n",
" ------------------- ------------------- 19.2/38.8 MB 81.8 MB/s eta 0:00:01\n",
" ----------------------- --------------- 23.0/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------- ------------ 26.8/38.8 MB 81.8 MB/s eta 0:00:01\n",
" ------------------------------ -------- 30.7/38.8 MB 81.8 MB/s eta 0:00:01\n",
" ---------------------------------- ---- 34.4/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 38.3/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n",
" --------------------------------------- 38.8/38.8 MB 15.2 MB/s eta 0:00:00\n",
"Collecting anyio<5,>=3.5.0\n",
" Downloading anyio-4.7.0-py3-none-any.whl (93 kB)\n",
" ---------------------------------------- 0.0/93.1 kB ? eta -:--:--\n",
" ---------------------------------------- 93.1/93.1 kB 2.7 MB/s eta 0:00:00\n",
"Collecting sniffio\n",
" Downloading sniffio-1.3.1-py3-none-any.whl (10 kB)\n",
"Collecting httpx<1,>=0.23.0\n",
" Downloading httpx-0.28.1-py3-none-any.whl (73 kB)\n",
" ---------------------------------------- 0.0/73.5 kB ? eta -:--:--\n",
" ---------------------------------------- 73.5/73.5 kB ? eta 0:00:00\n",
"Collecting distro<2,>=1.7.0\n",
" Downloading distro-1.9.0-py3-none-any.whl (20 kB)\n",
"Collecting jiter<1,>=0.4.0\n",
" Downloading jiter-0.8.2-cp310-cp310-win_amd64.whl (204 kB)\n",
" ---------------------------------------- 0.0/204.4 kB ? eta -:--:--\n",
" ---------------------------------------- 204.4/204.4 kB ? eta 0:00:00\n",
"Collecting lark\n",
" Downloading lark-1.2.2-py3-none-any.whl (111 kB)\n",
" ---------------------------------------- 0.0/111.0 kB ? eta -:--:--\n",
" ---------------------------------------- 111.0/111.0 kB ? eta 0:00:00\n",
"Collecting diskcache\n",
" Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)\n",
" ---------------------------------------- 0.0/45.5 kB ? eta -:--:--\n",
" ---------------------------------------- 45.5/45.5 kB ? eta 0:00:00\n",
"Collecting referencing\n",
" Downloading referencing-0.35.1-py3-none-any.whl (26 kB)\n",
"Collecting datasets\n",
" Downloading datasets-3.2.0-py3-none-any.whl (480 kB)\n",
" ---------------------------------------- 0.0/480.6 kB ? eta -:--:--\n",
" ------------------------------------- 480.6/480.6 kB 31.4 MB/s eta 0:00:00\n",
"Collecting numba\n",
" Downloading numba-0.60.0-cp310-cp310-win_amd64.whl (2.7 MB)\n",
" ---------------------------------------- 0.0/2.7 MB ? eta -:--:--\n",
" --------------------------------------- 2.7/2.7 MB 86.2 MB/s eta 0:00:01\n",
" ---------------------------------------- 2.7/2.7 MB 43.2 MB/s eta 0:00:00\n",
"Collecting pycountry\n",
" Downloading pycountry-24.6.1-py3-none-any.whl (6.3 MB)\n",
" ---------------------------------------- 0.0/6.3 MB ? eta -:--:--\n",
" ------------------------- -------------- 4.0/6.3 MB 84.8 MB/s eta 0:00:01\n",
" --------------------------------------- 6.3/6.3 MB 81.0 MB/s eta 0:00:01\n",
" ---------------------------------------- 6.3/6.3 MB 58.0 MB/s eta 0:00:00\n",
"Collecting pyairports\n",
" Downloading pyairports-2.1.1-py3-none-any.whl (371 kB)\n",
" ---------------------------------------- 0.0/371.7 kB ? eta -:--:--\n",
" ---------------------------------------- 371.7/371.7 kB ? eta 0:00:00\n",
"Requirement already satisfied: nest-asyncio in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from outlines<0.1,>=0.0.43->vllm) (1.6.0)\n",
"Collecting cloudpickle\n",
" Downloading cloudpickle-3.1.0-py3-none-any.whl (22 kB)\n",
"Collecting pydantic-core==2.27.1\n",
" Downloading pydantic_core-2.27.1-cp310-none-win_amd64.whl (2.0 MB)\n",
" ---------------------------------------- 0.0/2.0 MB ? eta -:--:--\n",
" ---------------------------------------- 2.0/2.0 MB 42.8 MB/s eta 0:00:00\n",
"Collecting annotated-types>=0.6.0\n",
" Downloading annotated_types-0.7.0-py3-none-any.whl (13 kB)\n",
"Collecting msgpack<2.0.0,>=1.0.0\n",
" Downloading msgpack-1.1.0-cp310-cp310-win_amd64.whl (74 kB)\n",
" ---------------------------------------- 0.0/74.7 kB ? eta -:--:--\n",
" ---------------------------------------- 74.7/74.7 kB ? eta 0:00:00\n",
"Collecting frozenlist\n",
" Downloading frozenlist-1.5.0-cp310-cp310-win_amd64.whl (51 kB)\n",
" ---------------------------------------- 0.0/51.6 kB ? eta -:--:--\n",
" ---------------------------------------- 51.6/51.6 kB 2.8 MB/s eta 0:00:00\n",
"Collecting click>=7.0\n",
" Downloading click-8.1.7-py3-none-any.whl (97 kB)\n",
" ---------------------------------------- 0.0/97.9 kB ? eta -:--:--\n",
" ---------------------------------------- 97.9/97.9 kB ? eta 0:00:00\n",
"Collecting aiosignal\n",
" Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)\n",
"Collecting urllib3<3,>=1.21.1\n",
" Downloading urllib3-2.2.3-py3-none-any.whl (126 kB)\n",
" ---------------------------------------- 0.0/126.3 kB ? eta -:--:--\n",
" -------------------------------------- 126.3/126.3 kB 7.3 MB/s eta 0:00:00\n",
"Collecting certifi>=2017.4.17\n",
" Downloading certifi-2024.8.30-py3-none-any.whl (167 kB)\n",
" ---------------------------------------- 0.0/167.3 kB ? eta -:--:--\n",
" ---------------------------------------- 167.3/167.3 kB ? eta 0:00:00\n",
"Collecting charset-normalizer<4,>=2\n",
" Downloading charset_normalizer-3.4.0-cp310-cp310-win_amd64.whl (102 kB)\n",
" ---------------------------------------- 0.0/102.2 kB ? eta -:--:--\n",
" -------------------------------------- 102.2/102.2 kB 6.1 MB/s eta 0:00:00\n",
"Collecting idna<4,>=2.5\n",
" Downloading idna-3.10-py3-none-any.whl (70 kB)\n",
" ---------------------------------------- 0.0/70.4 kB ? eta -:--:--\n",
" ---------------------------------------- 70.4/70.4 kB 3.8 MB/s eta 0:00:00\n",
"Collecting regex>=2022.1.18\n",
" Downloading regex-2024.11.6-cp310-cp310-win_amd64.whl (274 kB)\n",
" ---------------------------------------- 0.0/274.0 kB ? eta -:--:--\n",
" ------------------------------------- 274.0/274.0 kB 16.5 MB/s eta 0:00:00\n",
"Collecting huggingface-hub<1.0,>=0.16.4\n",
" Downloading huggingface_hub-0.26.5-py3-none-any.whl (447 kB)\n",
" ---------------------------------------- 0.0/447.8 kB ? eta -:--:--\n",
" ------------------------------------- 447.8/447.8 kB 29.2 MB/s eta 0:00:00\n",
"Requirement already satisfied: colorama in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from tqdm->vllm) (0.4.6)\n",
"Collecting safetensors>=0.4.1\n",
" Downloading safetensors-0.4.5-cp310-none-win_amd64.whl (285 kB)\n",
" ---------------------------------------- 0.0/285.9 kB ? eta -:--:--\n",
" ------------------------------------- 285.9/285.9 kB 18.4 MB/s eta 0:00:00\n",
"Collecting multidict<7.0,>=4.5\n",
" Downloading multidict-6.1.0-cp310-cp310-win_amd64.whl (28 kB)\n",
"Collecting yarl<2.0,>=1.17.0\n",
" Downloading yarl-1.18.3-cp310-cp310-win_amd64.whl (90 kB)\n",
" ---------------------------------------- 0.0/90.5 kB ? eta -:--:--\n",
" ---------------------------------------- 90.5/90.5 kB 5.0 MB/s eta 0:00:00\n",
"Collecting propcache>=0.2.0\n",
" Downloading propcache-0.2.1-cp310-cp310-win_amd64.whl (44 kB)\n",
" ---------------------------------------- 0.0/44.4 kB ? eta -:--:--\n",
" ---------------------------------------- 44.4/44.4 kB 2.1 MB/s eta 0:00:00\n",
"Collecting aiohappyeyeballs>=2.3.0\n",
" Downloading aiohappyeyeballs-2.4.4-py3-none-any.whl (14 kB)\n",
"Collecting async-timeout<6.0,>=4.0\n",
" Downloading async_timeout-5.0.1-py3-none-any.whl (6.2 kB)\n",
"Collecting attrs>=17.3.0\n",
" Downloading attrs-24.2.0-py3-none-any.whl (63 kB)\n",
" ---------------------------------------- 0.0/63.0 kB ? eta -:--:--\n",
" ---------------------------------------- 63.0/63.0 kB ? eta 0:00:00\n",
"Collecting zipp>=3.20\n",
" Downloading zipp-3.21.0-py3-none-any.whl (9.6 kB)\n",
"Collecting h11>=0.8\n",
" Downloading h11-0.14.0-py3-none-any.whl (58 kB)\n",
" ---------------------------------------- 0.0/58.3 kB ? eta -:--:--\n",
" ---------------------------------------- 58.3/58.3 kB 3.0 MB/s eta 0:00:00\n",
"Collecting httptools>=0.6.3\n",
" Downloading httptools-0.6.4-cp310-cp310-win_amd64.whl (88 kB)\n",
" ---------------------------------------- 0.0/88.3 kB ? eta -:--:--\n",
" ---------------------------------------- 88.3/88.3 kB 4.9 MB/s eta 0:00:00\n",
"Collecting watchfiles>=0.13\n",
" Downloading watchfiles-1.0.1-cp310-cp310-win_amd64.whl (284 kB)\n",
" ---------------------------------------- 0.0/284.3 kB ? eta -:--:--\n",
" ---------------------------------------- 284.3/284.3 kB ? eta 0:00:00\n",
"Collecting python-dotenv>=0.13\n",
" Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)\n",
"Collecting websockets>=10.4\n",
" Downloading websockets-14.1-cp310-cp310-win_amd64.whl (163 kB)\n",
" ---------------------------------------- 0.0/163.3 kB ? eta -:--:--\n",
" ---------------------------------------- 163.3/163.3 kB ? eta 0:00:00\n",
"Requirement already satisfied: exceptiongroup>=1.0.2 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from anyio<5,>=3.5.0->openai>=1.45.0->vllm) (1.2.2)\n",
"Collecting httpcore==1.*\n",
" Downloading httpcore-1.0.7-py3-none-any.whl (78 kB)\n",
" ---------------------------------------- 0.0/78.6 kB ? eta -:--:--\n",
" ---------------------------------------- 78.6/78.6 kB ? eta 0:00:00\n",
"Collecting jsonschema-specifications>=2023.03.6\n",
" Downloading jsonschema_specifications-2024.10.1-py3-none-any.whl (18 kB)\n",
"Collecting rpds-py>=0.7.1\n",
" Downloading rpds_py-0.22.3-cp310-cp310-win_amd64.whl (231 kB)\n",
" ---------------------------------------- 0.0/231.7 kB ? eta -:--:--\n",
" ------------------------------------- 231.7/231.7 kB 14.8 MB/s eta 0:00:00\n",
"Collecting fsspec[http]<=2024.9.0,>=2023.1.0\n",
" Downloading fsspec-2024.9.0-py3-none-any.whl (179 kB)\n",
" ---------------------------------------- 0.0/179.3 kB ? eta -:--:--\n",
" ------------------------------------- 179.3/179.3 kB 11.3 MB/s eta 0:00:00\n",
"Collecting xxhash\n",
" Downloading xxhash-3.5.0-cp310-cp310-win_amd64.whl (30 kB)\n",
"Collecting pyarrow>=15.0.0\n",
" Downloading pyarrow-18.1.0-cp310-cp310-win_amd64.whl (25.1 MB)\n",
" ---------------------------------------- 0.0/25.1 MB ? eta -:--:--\n",
" ----- --------------------------------- 3.7/25.1 MB 119.9 MB/s eta 0:00:01\n",
" ----------- ---------------------------- 7.5/25.1 MB 95.7 MB/s eta 0:00:01\n",
" ----------------- --------------------- 11.2/25.1 MB 81.8 MB/s eta 0:00:01\n",
" ----------------------- --------------- 15.0/25.1 MB 81.8 MB/s eta 0:00:01\n",
" ----------------------------- --------- 18.9/25.1 MB 81.8 MB/s eta 0:00:01\n",
" ----------------------------------- --- 22.7/25.1 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 24.5/25.1 MB 72.6 MB/s eta 0:00:01\n",
" -------------------------------------- 25.1/25.1 MB 54.4 MB/s eta 0:00:01\n",
" -------------------------------------- 25.1/25.1 MB 54.4 MB/s eta 0:00:01\n",
" -------------------------------------- 25.1/25.1 MB 54.4 MB/s eta 0:00:01\n",
" --------------------------------------- 25.1/25.1 MB 32.7 MB/s eta 0:00:00\n",
"Collecting pandas\n",
" Downloading pandas-2.2.3-cp310-cp310-win_amd64.whl (11.6 MB)\n",
" ---------------------------------------- 0.0/11.6 MB ? eta -:--:--\n",
" -------------- ------------------------- 4.2/11.6 MB 90.4 MB/s eta 0:00:01\n",
" --------------------------- ------------ 8.0/11.6 MB 85.7 MB/s eta 0:00:01\n",
" -------------------------------------- 11.6/11.6 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 11.6/11.6 MB 81.8 MB/s eta 0:00:01\n",
" --------------------------------------- 11.6/11.6 MB 54.4 MB/s eta 0:00:00\n",
"Collecting dill<0.3.9,>=0.3.0\n",
" Downloading dill-0.3.8-py3-none-any.whl (116 kB)\n",
" ---------------------------------------- 0.0/116.3 kB ? eta -:--:--\n",
" ---------------------------------------- 116.3/116.3 kB ? eta 0:00:00\n",
"Collecting multiprocess<0.70.17\n",
" Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)\n",
" ---------------------------------------- 0.0/134.8 kB ? eta -:--:--\n",
" ---------------------------------------- 134.8/134.8 kB ? eta 0:00:00\n",
"Collecting MarkupSafe>=2.0\n",
" Using cached MarkupSafe-3.0.2-cp310-cp310-win_amd64.whl (15 kB)\n",
"Collecting llvmlite<0.44,>=0.43.0dev0\n",
" Downloading llvmlite-0.43.0-cp310-cp310-win_amd64.whl (28.1 MB)\n",
" ---------------------------------------- 0.0/28.1 MB ? eta -:--:--\n",
" ----- --------------------------------- 3.7/28.1 MB 119.6 MB/s eta 0:00:01\n",
" ---------- ----------------------------- 7.6/28.1 MB 96.8 MB/s eta 0:00:01\n",
" ----------- ---------------------------- 8.1/28.1 MB 85.6 MB/s eta 0:00:01\n",
" ----------- ---------------------------- 8.1/28.1 MB 85.6 MB/s eta 0:00:01\n",
" ----------- ---------------------------- 8.1/28.1 MB 85.6 MB/s eta 0:00:01\n",
" ----------- ---------------------------- 8.1/28.1 MB 85.6 MB/s eta 0:00:01\n",
" ------------ --------------------------- 8.6/28.1 MB 27.6 MB/s eta 0:00:01\n",
" --------------------------------- ---- 24.8/28.1 MB 330.0 MB/s eta 0:00:01\n",
" ------------------------------------- 28.1/28.1 MB 165.0 MB/s eta 0:00:01\n",
" ------------------------------------- 28.1/28.1 MB 165.0 MB/s eta 0:00:01\n",
" ------------------------------------- 28.1/28.1 MB 165.0 MB/s eta 0:00:01\n",
" ------------------------------------- 28.1/28.1 MB 165.0 MB/s eta 0:00:01\n",
" --------------------------------------- 28.1/28.1 MB 46.7 MB/s eta 0:00:00\n",
"Collecting tzdata>=2022.7\n",
" Downloading tzdata-2024.2-py2.py3-none-any.whl (346 kB)\n",
" ---------------------------------------- 0.0/346.6 kB ? eta -:--:--\n",
" ---------------------------------------- 346.6/346.6 kB ? eta 0:00:00\n",
"Collecting pytz>=2020.1\n",
" Downloading pytz-2024.2-py2.py3-none-any.whl (508 kB)\n",
" ---------------------------------------- 0.0/508.0 kB ? eta -:--:--\n",
" ------------------------------------- 508.0/508.0 kB 31.1 MB/s eta 0:00:00\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from pandas->datasets->outlines<0.1,>=0.0.43->vllm) (2.9.0.post0)\n",
"Requirement already satisfied: six>=1.5 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from python-dateutil>=2.8.2->pandas->datasets->outlines<0.1,>=0.0.43->vllm) (1.17.0)\n",
"Building wheels for collected packages: vllm\n",
" Building wheel for vllm (pyproject.toml): started\n",
" Building wheel for vllm (pyproject.toml): finished with status 'done'\n",
" Created wheel for vllm: filename=vllm-0.6.4.post1-py3-none-any.whl size=1569302 sha256=beb78c80329b7cc624604a7f488de8ad3980fb2ede5ec19d2760688e46953955\n",
" Stored in directory: c:\\users\\tr4shhh\\appdata\\local\\pip\\cache\\wheels\\13\\ca\\7b\\d22b97dfa0a4deb1a6f7a26e495d459bc71b0b4bbf6daa1c06\n",
"Successfully built vllm\n",
"Installing collected packages: sentencepiece, pytz, pyairports, py-cpuinfo, nvidia-ml-py, mpmath, zipp, xxhash, websockets, urllib3, tzdata, tqdm, sympy, sniffio, safetensors, rpds-py, regex, pyyaml, python-dotenv, pydantic-core, pycountry, pyarrow, protobuf, propcache, prometheus_client, pillow, partial-json-parser, numpy, networkx, multidict, msgspec, msgpack, MarkupSafe, llvmlite, lark, jiter, interegular, idna, httptools, h11, fsspec, frozenlist, filelock, einops, distro, diskcache, dill, cloudpickle, click, charset-normalizer, certifi, attrs, async-timeout, annotated-types, aiohappyeyeballs, yarl, uvicorn, requests, referencing, pydantic, pandas, opencv-python-headless, numba, multiprocess, jinja2, importlib_metadata, httpcore, gguf, anyio, aiosignal, watchfiles, torch, tiktoken, starlette, lm-format-enforcer, jsonschema-specifications, huggingface-hub, httpx, aiohttp, torchvision, tokenizers, prometheus-fastapi-instrumentator, openai, jsonschema, fastapi, transformers, ray, mistral_common, datasets, outlines, compressed-tensors, vllm\n",
"Successfully installed MarkupSafe-3.0.2 aiohappyeyeballs-2.4.4 aiohttp-3.11.10 aiosignal-1.3.1 annotated-types-0.7.0 anyio-4.7.0 async-timeout-5.0.1 attrs-24.2.0 certifi-2024.8.30 charset-normalizer-3.4.0 click-8.1.7 cloudpickle-3.1.0 compressed-tensors-0.8.0 datasets-3.2.0 dill-0.3.8 diskcache-5.6.3 distro-1.9.0 einops-0.8.0 fastapi-0.115.6 filelock-3.16.1 frozenlist-1.5.0 fsspec-2024.9.0 gguf-0.10.0 h11-0.14.0 httpcore-1.0.7 httptools-0.6.4 httpx-0.28.1 huggingface-hub-0.26.5 idna-3.10 importlib_metadata-8.5.0 interegular-0.3.3 jinja2-3.1.4 jiter-0.8.2 jsonschema-4.23.0 jsonschema-specifications-2024.10.1 lark-1.2.2 llvmlite-0.43.0 lm-format-enforcer-0.10.9 mistral_common-1.5.1 mpmath-1.3.0 msgpack-1.1.0 msgspec-0.18.6 multidict-6.1.0 multiprocess-0.70.16 networkx-3.4.2 numba-0.60.0 numpy-1.26.4 nvidia-ml-py-12.560.30 openai-1.57.2 opencv-python-headless-4.10.0.84 outlines-0.0.46 pandas-2.2.3 partial-json-parser-0.2.1.1.post4 pillow-10.4.0 prometheus-fastapi-instrumentator-7.0.0 prometheus_client-0.21.1 propcache-0.2.1 protobuf-5.29.1 py-cpuinfo-9.0.0 pyairports-2.1.1 pyarrow-18.1.0 pycountry-24.6.1 pydantic-2.10.3 pydantic-core-2.27.1 python-dotenv-1.0.1 pytz-2024.2 pyyaml-6.0.2 ray-2.40.0 referencing-0.35.1 regex-2024.11.6 requests-2.32.3 rpds-py-0.22.3 safetensors-0.4.5 sentencepiece-0.2.0 sniffio-1.3.1 starlette-0.41.3 sympy-1.13.1 tiktoken-0.7.0 tokenizers-0.21.0 torch-2.5.1 torchvision-0.20.1 tqdm-4.67.1 transformers-4.47.0 tzdata-2024.2 urllib3-2.2.3 uvicorn-0.32.1 vllm-0.6.4.post1 watchfiles-1.0.1 websockets-14.1 xxhash-3.5.0 yarl-1.18.3 zipp-3.21.0\n"
]
}
],
"source": [
"!pip install vllm"
]
},
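 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Step 2: Verify dependencies\n",
 "vLLM depends on PyTorch, and `pip install vllm` already installs it (the log above shows `torch-2.5.1`). Run the command below only if PyTorch is missing or you need a different build. Note that the `https://download.pytorch.org/whl/cpu` index serves CPU-only builds; to run on an NVIDIA GPU, use the index URL that matches your CUDA version instead."
 ]
 },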
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Looking in indexes: https://download.pytorch.org/whl/cpu\n",
"Requirement already satisfied: torch in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (2.5.1)\n",
"Requirement already satisfied: torchvision in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (0.20.1)\n",
"Collecting torchaudio\n",
" Downloading https://download.pytorch.org/whl/cpu/torchaudio-2.5.1%2Bcpu-cp310-cp310-win_amd64.whl (2.4 MB)\n",
" ---------------------------------------- 0.0/2.4 MB ? eta -:--:--\n",
" ----- ---------------------------------- 0.3/2.4 MB 19.8 MB/s eta 0:00:01\n",
" ------------- -------------------------- 0.8/2.4 MB 12.4 MB/s eta 0:00:01\n",
" --------------------- ------------------ 1.3/2.4 MB 11.5 MB/s eta 0:00:01\n",
" ------------------------------ --------- 1.8/2.4 MB 11.5 MB/s eta 0:00:01\n",
" --------------------------------------- 2.4/2.4 MB 11.9 MB/s eta 0:00:01\n",
" ---------------------------------------- 2.4/2.4 MB 11.0 MB/s eta 0:00:00\n",
"Requirement already satisfied: jinja2 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (3.1.4)\n",
"Requirement already satisfied: filelock in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (3.16.1)\n",
"Requirement already satisfied: fsspec in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (2024.9.0)\n",
"Requirement already satisfied: networkx in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (3.4.2)\n",
"Requirement already satisfied: sympy==1.13.1 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (1.13.1)\n",
"Requirement already satisfied: typing-extensions>=4.8.0 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (4.12.2)\n",
"Requirement already satisfied: mpmath<1.4,>=1.1.0 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from sympy==1.13.1->torch) (1.3.0)\n",
"Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torchvision) (10.4.0)\n",
"Requirement already satisfied: numpy in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torchvision) (1.26.4)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from jinja2->torch) (3.0.2)\n",
"Installing collected packages: torchaudio\n",
"Successfully installed torchaudio-2.5.1+cpu\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"[notice] A new release of pip is available: 23.0.1 -> 24.3.1\n",
"[notice] To update, run: python.exe -m pip install --upgrade pip\n"
]
}
],
"source": [
"!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu"
]
},
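 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "As a quick sanity check, the cell below is a minimal sketch that prints the installed PyTorch version and whether PyTorch can see a CUDA device. It uses only standard `torch` calls. With the CPU-only build installed above, `CUDA available: False` is the expected result."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "import torch\n",
 "\n",
 "# Report the installed build and whether a CUDA device is visible\n",
 "print(\"PyTorch version:\", torch.__version__)\n",
 "print(\"CUDA available:\", torch.cuda.is_available())\n",
 "\n",
 "# Only query the device name when CUDA is actually usable\n",
 "if torch.cuda.is_available():\n",
 "    print(\"GPU:\", torch.cuda.get_device_name(0))"
 ]
 },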
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Step 3: Load the local model\n",
 "We use vLLM's `LLM` class to load a model from a local folder. This example assumes the model is stored in a directory named `./modelo_local`."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "from vllm import LLM\n",
 "\n",
 "# Path to the local model directory\n",
 "ruta_modelo = \"./modelo_local\"\n",
 "\n",
 "# Initialize the engine; this loads the model weights\n",
 "llm = LLM(model=ruta_modelo)\n",
 "print(\"Model loaded successfully.\")"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Step 4: Run inference\n",
 "With the model loaded, we can run inference with the `generate` method. Pass it an input prompt and the model generates a completion. The method returns a list with one result object per prompt, and the generated text is stored inside that object."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "# Input prompt\n",
 "texto_entrada = \"What is the capital of France?\"\n",
 "\n",
 "# generate() returns a list with one RequestOutput per prompt;\n",
 "# the generated text is in .outputs[0].text\n",
 "respuesta = llm.generate(texto_entrada)\n",
 "print(\"Generated response:\", respuesta[0].outputs[0].text)"
 ]
 },
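 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "`generate` also accepts a list of prompts and processes them as a batch. The cell below is a minimal sketch that assumes the `llm` object from Step 3 is already loaded; the prompts are made-up examples. Each returned item carries the original prompt and its completions."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "# Illustrative prompts (not from the original tutorial)\n",
 "prompts = [\n",
 "    \"What is the capital of France?\",\n",
 "    \"Name three prime numbers.\",\n",
 "]\n",
 "\n",
 "# One RequestOutput is returned per prompt\n",
 "resultados = llm.generate(prompts)\n",
 "for resultado in resultados:\n",
 "    print(\"Prompt:\", resultado.prompt)\n",
 "    print(\"Answer:\", resultado.outputs[0].text.strip())"
 ]
 },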
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Step 5: Adjust parameters (optional)\n",
 "You can adjust parameters such as the temperature or the maximum number of generated tokens to customize the model's behavior."
 ]
 },
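 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "from vllm import SamplingParams\n",
 "\n",
 "# generate() takes sampling options through a SamplingParams object,\n",
 "# not as direct keyword arguments\n",
 "parametros = SamplingParams(\n",
 "    temperature=0.7,  # Controls randomness\n",
 "    max_tokens=100    # Maximum number of generated tokens\n",
 ")\n",
 "respuesta_ajustada = llm.generate(texto_entrada, sampling_params=parametros)\n",
 "print(\"Adjusted response:\", respuesta_ajustada[0].outputs[0].text)"
 ]
 },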
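 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "To build intuition for what the temperature does, the cell below is a small plain-Python sketch. It is not vLLM's internal code; it assumes the usual definition in which the model's scores (logits) are divided by the temperature before a softmax. The logits are made-up values for three candidate tokens. Lower temperatures concentrate probability on the top token, and higher ones flatten the distribution."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "import math\n",
 "\n",
 "def softmax_with_temperature(logits, temperature):\n",
 "    \"\"\"Turn raw scores into probabilities after scaling by 1/temperature.\"\"\"\n",
 "    scaled = [x / temperature for x in logits]\n",
 "    highest = max(scaled)  # subtract the max for numerical stability\n",
 "    exps = [math.exp(s - highest) for s in scaled]\n",
 "    total = sum(exps)\n",
 "    return [e / total for e in exps]\n",
 "\n",
 "# Hypothetical scores for three candidate tokens\n",
 "logits = [2.0, 1.0, 0.1]\n",
 "for t in (0.2, 0.7, 1.5):\n",
 "    probs = softmax_with_temperature(logits, t)\n",
 "    print(f\"temperature={t}:\", [round(p, 3) for p in probs])"
 ]
 },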
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Step 6: Conclusion\n",
 "You have learned how to install vLLM, load a local model, and run inference with it. You can explore more features in the [official vLLM documentation](https://docs.vllm.ai/en/latest/).\n",
 "\n",
 "You are now ready to run language models efficiently with vLLM!"
 ]
 }
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 4
}