diff --git a/descargar_modelos.ipynb b/descargar_modelos.ipynb index 4b60a0e..d3cda47 100644 --- a/descargar_modelos.ipynb +++ b/descargar_modelos.ipynb @@ -1,155 +1,154 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Tutorial: Uso de Modelos LLM con vLLM\n", - "En este tutorial aprenderás:\n", - "- Qué es vLLM y sus ventajas.\n", - "- Los tipos de modelos compatibles con vLLM.\n", - "- Cómo descargar y usar modelos con vLLM para tareas como chat." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## ¿Qué es vLLM?\n", - "**vLLM** es una biblioteca optimizada para ejecutar modelos LLM de manera eficiente, aprovechando técnicas avanzadas de gestión de memoria y paralelización. Está diseñada para manejar tareas como generación de texto en grandes modelos con un rendimiento mejorado.\n", - "\n", - "**Ventajas de vLLM:**\n", - "- Mayor eficiencia en el uso de GPU y memoria.\n", - "- Reducción en los tiempos de inferencia.\n", - "- Compatibilidad con modelos de Hugging Face y otras plataformas populares." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Tipos de Modelos Compatibles con vLLM\n", - "Los modelos más comunes que se pueden usar con vLLM incluyen:\n", - "- **Modelos abiertos:** Por ejemplo, GPT-NeoX, OPT, LLaMA, y Falcon.\n", - "- **Modelos comerciales:** Modelos compatibles disponibles en Hugging Face o APIs comerciales como GPT-3 o GPT-4.\n", - "- **Modelos específicos:** Diseñados para tareas específicas como chat o generación de código (por ejemplo, CodeT5, StarCoder)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Instalación de vLLM\n", - "Primero, necesitamos instalar la biblioteca vLLM. Esto se puede hacer con `pip`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install vllm" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Descargar Modelos para vLLM\n", - "Usaremos un modelo abierto como LLaMA 2 o GPT-NeoX desde el hub de Hugging Face. La biblioteca vLLM puede cargar modelos directamente desde Hugging Face." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Importar las bibliotecas necesarias\n", - "from vllm import LLM, SamplingParams\n", - "\n", - "# Nombre del modelo (puedes cambiarlo según tu preferencia)\n", - "model_name = 'meta-llama/Llama-2-7b-chat-hf'\n", - "\n", - "# Inicializar el modelo\n", - "llm = LLM(model=model_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Uso de vLLM para Chat\n", - "Podemos usar vLLM para interactuar con el modelo en aplicaciones de chat. Esto implica enviar un prompt y recibir una respuesta generada por el modelo." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Parámetros de entrada para la generación\n", - "prompt = \"Hola, ¿cómo puedo ayudarte hoy?\"\n", - "\n", - "# Parámetros de muestreo\n", - "sampling_params = SamplingParams(\n", - " temperature=0.7, # Controla la creatividad\n", - " max_tokens=100 # Número máximo de tokens generados\n", - ")\n", - "\n", - "# Generar texto\n", - "outputs = llm.generate(prompt, sampling_params)\n", - "\n", - "# Mostrar la respuesta\n", - "print(outputs[0].text)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Otros Modelos Compatibles\n", - "Además de LLaMA 2, puedes usar otros modelos como:\n", - "- **Falcon:** Un modelo optimizado para tareas de NLP, disponible en tamaños como 7B y 40B.\n", - "- **OPT (Meta):** Diseñado para tareas de lenguaje natural.\n", - "- **GPT-NeoX:** Modelo de EleutherAI diseñado como una alternativa a GPT-3.\n", - "\n", - "Estos modelos se pueden cargar de manera similar cambiando el nombre del modelo en el código." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Conclusión\n", - "Este tutorial mostró cómo instalar y usar vLLM para ejecutar modelos LLM de manera eficiente. Los modelos como LLaMA, Falcon, y GPT-NeoX son compatibles y pueden ser utilizados para tareas como chat y generación de texto.\n", - "\n", - "Con vLLM, puedes aprovechar grandes modelos con un rendimiento optimizado, ideal para aplicaciones de inteligencia artificial." 
- ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.5" - } - }, - "nbformat": 4, - "nbformat_minor": 4 - } - \ No newline at end of file + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Tutorial: Uso de Modelos LLM con vLLM\n", + "En este tutorial aprenderás:\n", + "- Qué es vLLM y sus ventajas.\n", + "- Los tipos de modelos compatibles con vLLM.\n", + "- Cómo descargar y usar modelos con vLLM para tareas como chat." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## ¿Qué es vLLM?\n", + "**vLLM** es una biblioteca optimizada para ejecutar modelos LLM de manera eficiente, aprovechando técnicas avanzadas de gestión de memoria y paralelización. Está diseñada para manejar tareas como generación de texto en grandes modelos con un rendimiento mejorado.\n", + "\n", + "**Ventajas de vLLM:**\n", + "- Mayor eficiencia en el uso de GPU y memoria.\n", + "- Reducción en los tiempos de inferencia.\n", + "- Compatibilidad con modelos de Hugging Face y otras plataformas populares." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Tipos de Modelos Compatibles con vLLM\n", + "Los modelos más comunes que se pueden usar con vLLM incluyen:\n", + "- **Modelos abiertos:** Por ejemplo, GPT-NeoX, OPT, LLaMA, y Falcon.\n", + "- **Modelos comerciales:** Los modelos cerrados como GPT-3 o GPT-4 solo se ofrecen mediante API y no se pueden cargar en vLLM; vLLM ejecuta modelos cuyos pesos están disponibles, por ejemplo en Hugging Face.\n", + "- **Modelos específicos:** Diseñados para tareas específicas como chat o generación de código (por ejemplo, CodeT5, StarCoder)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Instalación de vLLM\n", + "Primero, necesitamos instalar la biblioteca vLLM. Esto se puede hacer con `pip`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install vllm" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Descargar Modelos para vLLM\n", + "Usaremos un modelo abierto como LLaMA 2 o GPT-NeoX desde el hub de Hugging Face. La biblioteca vLLM puede cargar modelos directamente desde Hugging Face." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Importar las bibliotecas necesarias\n", + "from vllm import LLM, SamplingParams\n", + "\n", + "# Nombre del modelo (puedes cambiarlo según tu preferencia)\n", + "model_name = 'meta-llama/Llama-2-7b-chat-hf'\n", + "\n", + "# Inicializar el modelo\n", + "llm = LLM(model=model_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Uso de vLLM para Chat\n", + "Podemos usar vLLM para interactuar con el modelo en aplicaciones de chat. Esto implica enviar un prompt y recibir una respuesta generada por el modelo." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Parámetros de entrada para la generación\n", + "prompt = \"Hola, ¿cómo puedo ayudarte hoy?\"\n", + "\n", + "# Parámetros de muestreo\n", + "sampling_params = SamplingParams(\n", + " temperature=0.7, # Controla la creatividad\n", + " max_tokens=100 # Número máximo de tokens generados\n", + ")\n", + "\n", + "# Generar texto\n", + "outputs = llm.generate(prompt, sampling_params)\n", + "\n", + "# Mostrar la respuesta (cada RequestOutput contiene una lista de completions en .outputs)\n", + "print(outputs[0].outputs[0].text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Otros Modelos Compatibles\n", + "Además de LLaMA 2, puedes usar otros modelos como:\n", + "- **Falcon:** Un modelo optimizado para tareas de NLP, disponible en tamaños como 7B y 40B.\n", + "- **OPT (Meta):** Diseñado para tareas de lenguaje natural.\n", + "- **GPT-NeoX:** Modelo de EleutherAI diseñado como una alternativa a GPT-3.\n", + "\n", + "Estos modelos se pueden cargar de manera similar cambiando el nombre del modelo en el código." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusión\n", + "Este tutorial mostró cómo instalar y usar vLLM para ejecutar modelos LLM de manera eficiente. Los modelos como LLaMA, Falcon, y GPT-NeoX son compatibles y pueden ser utilizados para tareas como chat y generación de texto.\n", + "\n", + "Con vLLM, puedes aprovechar grandes modelos con un rendimiento optimizado, ideal para aplicaciones de inteligencia artificial." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/tutorial_vllm.ipynb b/tutorial_vllm.ipynb index e8d8d67..cb1ba96 100644 --- a/tutorial_vllm.ipynb +++ b/tutorial_vllm.ipynb @@ -1,152 +1,661 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Tutorial: Uso de vLLM con un modelo local\n", - "En este tutorial aprenderás a utilizar **vLLM** para cargar un modelo descargado localmente y realizar inferencias.\n", - "\n", - "## ¿Qué es vLLM?\n", - "vLLM es un runtime de alto rendimiento para modelos de lenguaje grande (LLMs). Ofrece características avanzadas como el manejo eficiente de memoria y procesamiento paralelo.\n", - "\n", - "Este tutorial asumirá que ya tienes un modelo descargado localmente, por ejemplo, un modelo de Hugging Face." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Paso 1: Instalación de vLLM\n", - "Primero, necesitamos instalar la biblioteca **vLLM**. Puedes hacerlo con el siguiente comando." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install vllm" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Paso 2: Verificación de dependencias\n", - "Asegúrate de que las dependencias requeridas, como PyTorch, estén instaladas. Si no tienes PyTorch, instálalo con el comando apropiado para tu entorno." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Paso 3: Carga del modelo local\n", - "Usaremos la clase `LLM` de vLLM para cargar el modelo desde una carpeta local. En este ejemplo, asumimos que el modelo está almacenado en un directorio llamado `./modelo_local`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from vllm import LLM\n", - "\n", - "# Ruta al modelo local\n", - "ruta_modelo = \"./modelo_local\"\n", - "\n", - "# Inicialización del modelo\n", - "llm = LLM(model=ruta_modelo)\n", - "print(\"Modelo cargado exitosamente.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Paso 4: Realización de inferencias\n", - "Con el modelo cargado, podemos realizar inferencias utilizando la función `generate`. Proporciona un texto de entrada y el modelo generará una respuesta." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Texto de entrada\n", - "texto_entrada = \"¿Cuál es la capital de Francia?\"\n", - "\n", - "# Generar respuesta\n", - "respuesta = llm.generate(texto_entrada)\n", - "print(\"Respuesta generada:\", respuesta)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Paso 5: Ajuste de parámetros (opcional)\n", - "Puedes ajustar parámetros como la temperatura o el número máximo de tokens generados para personalizar el comportamiento del modelo." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Ajuste de parámetros\n", - "respuesta_ajustada = llm.generate(\n", - " texto_entrada,\n", - " temperature=0.7, # Controla la aleatoriedad\n", - " max_tokens=100 # Límite de tokens generados\n", - ")\n", - "print(\"Respuesta ajustada:\", respuesta_ajustada)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Paso 6: Conclusión\n", - "Has aprendido a instalar y usar vLLM para cargar un modelo local y realizar inferencias. Puedes explorar más características en la [documentación oficial de vLLM](https://vllm.org).\n", - "\n", - "¡Ahora estás listo para usar modelos de lenguaje eficientemente con vLLM!" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.5" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Tutorial: Uso de vLLM con un modelo local\n", + "En este tutorial aprenderás a utilizar **vLLM** para cargar un modelo descargado localmente y realizar inferencias.\n", + "\n", + "## ¿Qué es vLLM?\n", + "vLLM es un runtime de alto rendimiento para modelos de lenguaje grande (LLMs). Ofrece características avanzadas como el manejo eficiente de memoria y procesamiento paralelo.\n", + "\n", + "Este tutorial asumirá que ya tienes un modelo descargado localmente, por ejemplo, un modelo de Hugging Face." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Paso 1: Instalación de vLLM\n", + "Primero, necesitamos instalar la biblioteca **vLLM**. Puedes hacerlo con el siguiente comando." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Collecting vllm" + ] }, - "nbformat": 4, - "nbformat_minor": 4 - } - \ No newline at end of file + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "[notice] A new release of pip is available: 23.0.1 -> 24.3.1\n", + "[notice] To update, run: python.exe -m pip install --upgrade pip\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " Downloading vllm-0.6.4.post1.tar.gz (3.1 MB)\n", + " ---------------------------------------- 0.0/3.1 MB ? eta -:--:--\n", + " ---------------------------------- ----- 2.7/3.1 MB 56.6 MB/s eta 0:00:01\n", + " ---------------------------------------- 3.1/3.1 MB 49.2 MB/s eta 0:00:00\n", + " Installing build dependencies: started\n", + " Installing build dependencies: still running...\n", + " Installing build dependencies: finished with status 'done'\n", + " Getting requirements to build wheel: started\n", + " Getting requirements to build wheel: finished with status 'done'\n", + " Preparing metadata (pyproject.toml): started\n", + " Preparing metadata (pyproject.toml): finished with status 'done'\n", + "Collecting ray>=2.9\n", + " Downloading ray-2.40.0-cp310-cp310-win_amd64.whl (25.3 MB)\n", + " ---------------------------------------- 0.0/25.3 MB ? 
eta -:--:--\n", + " ---- ----------------------------------- 3.1/25.3 MB 66.7 MB/s eta 0:00:01\n", + " ---- ----------------------------------- 3.1/25.3 MB 66.7 MB/s eta 0:00:01\n", + " ------- -------------------------------- 4.8/25.3 MB 34.2 MB/s eta 0:00:01\n", + " ------------- -------------------------- 8.6/25.3 MB 45.9 MB/s eta 0:00:01\n", + " ------------------- ------------------- 12.4/25.3 MB 50.4 MB/s eta 0:00:01\n", + " ------------------------ -------------- 15.9/25.3 MB 81.8 MB/s eta 0:00:01\n", + " ------------------------------ -------- 20.0/25.3 MB 81.8 MB/s eta 0:00:01\n", + " ------------------------------------ -- 23.8/25.3 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 25.3/25.3 MB 93.9 MB/s eta 0:00:01\n", + " -------------------------------------- 25.3/25.3 MB 93.9 MB/s eta 0:00:01\n", + " -------------------------------------- 25.3/25.3 MB 93.9 MB/s eta 0:00:01\n", + " -------------------------------------- 25.3/25.3 MB 93.9 MB/s eta 0:00:01\n", + " -------------------------------------- 25.3/25.3 MB 93.9 MB/s eta 0:00:01\n", + " -------------------------------------- 25.3/25.3 MB 93.9 MB/s eta 0:00:01\n", + " -------------------------------------- 25.3/25.3 MB 93.9 MB/s eta 0:00:01\n", + " --------------------------------------- 25.3/25.3 MB 21.8 MB/s eta 0:00:00\n", + "Collecting lm-format-enforcer<0.11,>=0.10.9\n", + " Downloading lm_format_enforcer-0.10.9-py3-none-any.whl (43 kB)\n", + " ---------------------------------------- 0.0/43.9 kB ? eta -:--:--\n", + " ---------------------------------------- 43.9/43.9 kB 2.2 MB/s eta 0:00:00\n", + "Collecting outlines<0.1,>=0.0.43\n", + " Downloading outlines-0.0.46-py3-none-any.whl (101 kB)\n", + " ---------------------------------------- 0.0/101.9 kB ? eta -:--:--\n", + " ---------------------------------------- 101.9/101.9 kB ? 
eta 0:00:00\n", + "Collecting aiohttp\n", + " Downloading aiohttp-3.11.10-cp310-cp310-win_amd64.whl (441 kB)\n", + " ---------------------------------------- 0.0/441.8 kB ? eta -:--:--\n", + " ------------------------------------- 441.8/441.8 kB 28.8 MB/s eta 0:00:00\n", + "Collecting tokenizers>=0.19.1\n", + " Downloading tokenizers-0.21.0-cp39-abi3-win_amd64.whl (2.4 MB)\n", + " ---------------------------------------- 0.0/2.4 MB ? eta -:--:--\n", + " --------------------------------------- 2.4/2.4 MB 76.6 MB/s eta 0:00:01\n", + " --------------------------------------- 2.4/2.4 MB 76.6 MB/s eta 0:00:01\n", + " ---------------------------------------- 2.4/2.4 MB 21.6 MB/s eta 0:00:00\n", + "Collecting py-cpuinfo\n", + " Downloading py_cpuinfo-9.0.0-py3-none-any.whl (22 kB)\n", + "Collecting tqdm\n", + " Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)\n", + " ---------------------------------------- 0.0/78.5 kB ? eta -:--:--\n", + " ---------------------------------------- 78.5/78.5 kB 4.6 MB/s eta 0:00:00\n", + "Collecting protobuf\n", + " Downloading protobuf-5.29.1-cp310-abi3-win_amd64.whl (434 kB)\n", + " ---------------------------------------- 0.0/434.5 kB ? eta -:--:--\n", + " ------------------------------------- 434.5/434.5 kB 26.5 MB/s eta 0:00:00\n", + "Collecting uvicorn[standard]\n", + " Downloading uvicorn-0.32.1-py3-none-any.whl (63 kB)\n", + " ---------------------------------------- 0.0/63.8 kB ? eta -:--:--\n", + " ---------------------------------------- 63.8/63.8 kB ? eta 0:00:00\n", + "Collecting tiktoken>=0.6.0\n", + " Downloading tiktoken-0.8.0-cp310-cp310-win_amd64.whl (884 kB)\n", + " ---------------------------------------- 0.0/884.2 kB ? eta -:--:--\n", + " --------------------------------------- 880.6/884.2 kB ? 
eta -:--:--\n", + " ------------------------------------- 884.2/884.2 kB 54.6 MB/s eta 0:00:00\n", + "Collecting requests>=2.26.0\n", + " Downloading requests-2.32.3-py3-none-any.whl (64 kB)\n", + " ---------------------------------------- 0.0/64.9 kB ? eta -:--:--\n", + " ---------------------------------------- 64.9/64.9 kB 3.6 MB/s eta 0:00:00\n", + "Collecting torch==2.5.1\n", + " Using cached torch-2.5.1-cp310-cp310-win_amd64.whl (203.1 MB)\n", + "Collecting pyyaml\n", + " Downloading PyYAML-6.0.2-cp310-cp310-win_amd64.whl (161 kB)\n", + " ---------------------------------------- 0.0/161.8 kB ? eta -:--:--\n", + " ---------------------------------------- 161.8/161.8 kB ? eta 0:00:00\n", + "Collecting partial-json-parser\n", + " Downloading partial_json_parser-0.2.1.1.post4-py3-none-any.whl (9.9 kB)\n", + "Collecting einops\n", + " Downloading einops-0.8.0-py3-none-any.whl (43 kB)\n", + " ---------------------------------------- 0.0/43.2 kB ? eta -:--:--\n", + " ------------------------------------- -- 41.0/43.2 kB ? eta -:--:--\n", + " ------------------------------------- -- 41.0/43.2 kB ? eta -:--:--\n", + " ------------------------------------- -- 41.0/43.2 kB ? eta -:--:--\n", + " ------------------------------------- -- 41.0/43.2 kB ? eta -:--:--\n", + " ------------------------------------- -- 41.0/43.2 kB ? eta -:--:--\n", + " -------------------------------------- 43.2/43.2 kB 131.9 kB/s eta 0:00:00\n", + "Collecting mistral_common[opencv]>=1.5.0\n", + " Downloading mistral_common-1.5.1-py3-none-any.whl (6.5 MB)\n", + " ---------------------------------------- 0.0/6.5 MB ? 
eta -:--:--\n", + " --------------- ------------------------ 2.6/6.5 MB 82.9 MB/s eta 0:00:01\n", + " ------------------- -------------------- 3.1/6.5 MB 66.7 MB/s eta 0:00:01\n", + " ---------------------- ----------------- 3.6/6.5 MB 29.1 MB/s eta 0:00:01\n", + " --------------------------------------- 6.5/6.5 MB 37.6 MB/s eta 0:00:01\n", + " --------------------------------------- 6.5/6.5 MB 37.6 MB/s eta 0:00:01\n", + " ---------------------------------------- 6.5/6.5 MB 27.6 MB/s eta 0:00:00\n", + "Collecting pydantic>=2.9\n", + " Downloading pydantic-2.10.3-py3-none-any.whl (456 kB)\n", + " ---------------------------------------- 0.0/457.0 kB ? eta -:--:--\n", + " ------------------------------------- 457.0/457.0 kB 29.8 MB/s eta 0:00:00\n", + "Collecting gguf==0.10.0\n", + " Downloading gguf-0.10.0-py3-none-any.whl (71 kB)\n", + " ---------------------------------------- 0.0/71.6 kB ? eta -:--:--\n", + " ---------------------------------------- 71.6/71.6 kB 3.8 MB/s eta 0:00:00\n", + "Requirement already satisfied: psutil in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from vllm) (6.1.0)\n", + "Collecting compressed-tensors==0.8.0\n", + " Downloading compressed_tensors-0.8.0-py3-none-any.whl (86 kB)\n", + " ---------------------------------------- 0.0/87.0 kB ? eta -:--:--\n", + " ---------------------------------------- 87.0/87.0 kB 5.1 MB/s eta 0:00:00\n", + "Collecting openai>=1.45.0\n", + " Downloading openai-1.57.2-py3-none-any.whl (389 kB)\n", + " ---------------------------------------- 0.0/389.9 kB ? eta -:--:--\n", + " ---------------------------------------- 389.9/389.9 kB ? 
eta 0:00:00\n", + "Requirement already satisfied: pyzmq in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from vllm) (26.2.0)\n", + "Collecting filelock>=3.10.4\n", + " Using cached filelock-3.16.1-py3-none-any.whl (16 kB)\n", + "Collecting prometheus_client>=0.18.0\n", + " Downloading prometheus_client-0.21.1-py3-none-any.whl (54 kB)\n", + " ---------------------------------------- 0.0/54.7 kB ? eta -:--:--\n", + " ---------------------------------------- 54.7/54.7 kB ? eta 0:00:00\n", + "Collecting msgspec\n", + " Downloading msgspec-0.18.6-cp310-cp310-win_amd64.whl (185 kB)\n", + " ---------------------------------------- 0.0/185.7 kB ? eta -:--:--\n", + " ------------------------------------- 185.7/185.7 kB 11.0 MB/s eta 0:00:00\n", + "Collecting numpy<2.0.0\n", + " Downloading numpy-1.26.4-cp310-cp310-win_amd64.whl (15.8 MB)\n", + " ---------------------------------------- 0.0/15.8 MB ? eta -:--:--\n", + " --------- ------------------------------ 3.6/15.8 MB 77.1 MB/s eta 0:00:01\n", + " ------------------ --------------------- 7.4/15.8 MB 79.5 MB/s eta 0:00:01\n", + " --------------------------- ----------- 11.0/15.8 MB 81.8 MB/s eta 0:00:01\n", + " ------------------------------------ -- 14.6/15.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 15.8/15.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 15.8/15.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 15.8/15.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 15.8/15.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 15.8/15.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 15.8/15.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 15.8/15.8 MB 81.8 MB/s eta 0:00:01\n", + " --------------------------------------- 15.8/15.8 MB 23.4 MB/s eta 0:00:00\n", + "Collecting importlib_metadata\n", + " Downloading 
importlib_metadata-8.5.0-py3-none-any.whl (26 kB)\n", + "Collecting nvidia-ml-py>=12.560.30\n", + " Downloading nvidia_ml_py-12.560.30-py3-none-any.whl (40 kB)\n", + " ---------------------------------------- 0.0/40.5 kB ? eta -:--:--\n", + " ---------------------------------------- 40.5/40.5 kB ? eta 0:00:00\n", + "Collecting torchvision==0.20.1\n", + " Downloading torchvision-0.20.1-cp310-cp310-win_amd64.whl (1.6 MB)\n", + " ---------------------------------------- 0.0/1.6 MB ? eta -:--:--\n", + " ---------------------------------------- 1.6/1.6 MB 50.1 MB/s eta 0:00:00\n", + "Collecting fastapi!=0.113.*,!=0.114.0,>=0.107.0\n", + " Downloading fastapi-0.115.6-py3-none-any.whl (94 kB)\n", + " ---------------------------------------- 0.0/94.8 kB ? eta -:--:--\n", + " ---------------------------------------- 94.8/94.8 kB ? eta 0:00:00\n", + "Collecting sentencepiece\n", + " Downloading sentencepiece-0.2.0-cp310-cp310-win_amd64.whl (991 kB)\n", + " ---------------------------------------- 0.0/991.5 kB ? eta -:--:--\n", + " ------------------------------------- 991.5/991.5 kB 61.3 MB/s eta 0:00:00\n", + "Collecting pillow\n", + " Downloading pillow-11.0.0-cp310-cp310-win_amd64.whl (2.6 MB)\n", + " ---------------------------------------- 0.0/2.6 MB ? eta -:--:--\n", + " --------------------------------------- 2.6/2.6 MB 170.0 MB/s eta 0:00:01\n", + " ---------------------------------------- 2.6/2.6 MB 54.4 MB/s eta 0:00:00\n", + "Collecting prometheus-fastapi-instrumentator>=7.0.0\n", + " Downloading prometheus_fastapi_instrumentator-7.0.0-py3-none-any.whl (19 kB)\n", + "Requirement already satisfied: typing_extensions>=4.10 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from vllm) (4.12.2)\n", + "Collecting transformers>=4.45.2\n", + " Downloading transformers-4.47.0-py3-none-any.whl (10.1 MB)\n", + " ---------------------------------------- 0.0/10.1 MB ? 
eta -:--:--\n", + " --------------- ------------------------ 4.0/10.1 MB 84.3 MB/s eta 0:00:01\n", + " ------------------------------ --------- 7.8/10.1 MB 83.0 MB/s eta 0:00:01\n", + " -------------------------------------- 10.1/10.1 MB 80.9 MB/s eta 0:00:01\n", + " --------------------------------------- 10.1/10.1 MB 58.9 MB/s eta 0:00:00\n", + "Collecting jinja2\n", + " Using cached jinja2-3.1.4-py3-none-any.whl (133 kB)\n", + "Collecting sympy==1.13.1\n", + " Using cached sympy-1.13.1-py3-none-any.whl (6.2 MB)\n", + "Collecting fsspec\n", + " Using cached fsspec-2024.10.0-py3-none-any.whl (179 kB)\n", + "Collecting networkx\n", + " Using cached networkx-3.4.2-py3-none-any.whl (1.7 MB)\n", + "Collecting mpmath<1.4,>=1.1.0\n", + " Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)\n", + "Collecting starlette<0.42.0,>=0.40.0\n", + " Downloading starlette-0.41.3-py3-none-any.whl (73 kB)\n", + " ---------------------------------------- 0.0/73.2 kB ? eta -:--:--\n", + " ---------------------------------------- 73.2/73.2 kB ? eta 0:00:00\n", + "Requirement already satisfied: packaging in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from lm-format-enforcer<0.11,>=0.10.9->vllm) (24.2)\n", + "Collecting interegular>=0.3.2\n", + " Downloading interegular-0.3.3-py37-none-any.whl (23 kB)\n", + "Collecting jsonschema<5.0.0,>=4.21.1\n", + " Downloading jsonschema-4.23.0-py3-none-any.whl (88 kB)\n", + " ---------------------------------------- 0.0/88.5 kB ? eta -:--:--\n", + " ---------------------------------------- 88.5/88.5 kB ? eta 0:00:00\n", + "Collecting tiktoken>=0.6.0\n", + " Downloading tiktoken-0.7.0-cp310-cp310-win_amd64.whl (798 kB)\n", + " ---------------------------------------- 0.0/798.9 kB ? 
eta -:--:--\n", + " ------------------------------------- 798.9/798.9 kB 49.3 MB/s eta 0:00:00\n", + "Collecting pillow\n", + " Downloading pillow-10.4.0-cp310-cp310-win_amd64.whl (2.6 MB)\n", + " ---------------------------------------- 0.0/2.6 MB ? eta -:--:--\n", + " --------------------------------------- 2.5/2.6 MB 79.4 MB/s eta 0:00:01\n", + " ---------------------------------------- 2.6/2.6 MB 54.1 MB/s eta 0:00:00\n", + "Collecting opencv-python-headless<5.0.0,>=4.0.0\n", + " Downloading opencv_python_headless-4.10.0.84-cp37-abi3-win_amd64.whl (38.8 MB)\n", + " ---------------------------------------- 0.0/38.8 MB ? eta -:--:--\n", + " --- ------------------------------------ 3.8/38.8 MB 80.8 MB/s eta 0:00:01\n", + " ------- -------------------------------- 7.6/38.8 MB 81.0 MB/s eta 0:00:01\n", + " ----------- --------------------------- 11.5/38.8 MB 81.8 MB/s eta 0:00:01\n", + " --------------- ----------------------- 15.3/38.8 MB 81.8 MB/s eta 0:00:01\n", + " ------------------- ------------------- 19.2/38.8 MB 81.8 MB/s eta 0:00:01\n", + " ----------------------- --------------- 23.0/38.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------- ------------ 26.8/38.8 MB 81.8 MB/s eta 0:00:01\n", + " ------------------------------ -------- 30.7/38.8 MB 81.8 MB/s eta 0:00:01\n", + " ---------------------------------- ---- 34.4/38.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 38.3/38.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n", + " 
-------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n", + " --------------------------------------- 38.8/38.8 MB 15.2 MB/s eta 0:00:00\n", + "Collecting anyio<5,>=3.5.0\n", + " Downloading anyio-4.7.0-py3-none-any.whl (93 kB)\n", + " ---------------------------------------- 0.0/93.1 kB ? eta -:--:--\n", + " ---------------------------------------- 93.1/93.1 kB 2.7 MB/s eta 0:00:00\n", + "Collecting sniffio\n", + " Downloading sniffio-1.3.1-py3-none-any.whl (10 kB)\n", + "Collecting httpx<1,>=0.23.0\n", + " Downloading httpx-0.28.1-py3-none-any.whl (73 kB)\n", + " ---------------------------------------- 0.0/73.5 kB ? eta -:--:--\n", + " ---------------------------------------- 73.5/73.5 kB ? eta 0:00:00\n", + "Collecting distro<2,>=1.7.0\n", + " Downloading distro-1.9.0-py3-none-any.whl (20 kB)\n", + "Collecting jiter<1,>=0.4.0\n", + " Downloading jiter-0.8.2-cp310-cp310-win_amd64.whl (204 kB)\n", + " ---------------------------------------- 0.0/204.4 kB ? eta -:--:--\n", + " ---------------------------------------- 204.4/204.4 kB ? eta 0:00:00\n", + "Collecting lark\n", + " Downloading lark-1.2.2-py3-none-any.whl (111 kB)\n", + " ---------------------------------------- 0.0/111.0 kB ? eta -:--:--\n", + " ---------------------------------------- 111.0/111.0 kB ? eta 0:00:00\n", + "Collecting diskcache\n", + " Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)\n", + " ---------------------------------------- 0.0/45.5 kB ? eta -:--:--\n", + " ---------------------------------------- 45.5/45.5 kB ? 
eta 0:00:00\n", + "Collecting referencing\n", + " Downloading referencing-0.35.1-py3-none-any.whl (26 kB)\n", + "Collecting datasets\n", + " Downloading datasets-3.2.0-py3-none-any.whl (480 kB)\n", + " ---------------------------------------- 0.0/480.6 kB ? eta -:--:--\n", + " ------------------------------------- 480.6/480.6 kB 31.4 MB/s eta 0:00:00\n", + "Collecting numba\n", + " Downloading numba-0.60.0-cp310-cp310-win_amd64.whl (2.7 MB)\n", + " ---------------------------------------- 0.0/2.7 MB ? eta -:--:--\n", + " --------------------------------------- 2.7/2.7 MB 86.2 MB/s eta 0:00:01\n", + " ---------------------------------------- 2.7/2.7 MB 43.2 MB/s eta 0:00:00\n", + "Collecting pycountry\n", + " Downloading pycountry-24.6.1-py3-none-any.whl (6.3 MB)\n", + " ---------------------------------------- 0.0/6.3 MB ? eta -:--:--\n", + " ------------------------- -------------- 4.0/6.3 MB 84.8 MB/s eta 0:00:01\n", + " --------------------------------------- 6.3/6.3 MB 81.0 MB/s eta 0:00:01\n", + " ---------------------------------------- 6.3/6.3 MB 58.0 MB/s eta 0:00:00\n", + "Collecting pyairports\n", + " Downloading pyairports-2.1.1-py3-none-any.whl (371 kB)\n", + " ---------------------------------------- 0.0/371.7 kB ? eta -:--:--\n", + " ---------------------------------------- 371.7/371.7 kB ? eta 0:00:00\n", + "Requirement already satisfied: nest-asyncio in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from outlines<0.1,>=0.0.43->vllm) (1.6.0)\n", + "Collecting cloudpickle\n", + " Downloading cloudpickle-3.1.0-py3-none-any.whl (22 kB)\n", + "Collecting pydantic-core==2.27.1\n", + " Downloading pydantic_core-2.27.1-cp310-none-win_amd64.whl (2.0 MB)\n", + " ---------------------------------------- 0.0/2.0 MB ? 
eta -:--:--\n", + " ---------------------------------------- 2.0/2.0 MB 42.8 MB/s eta 0:00:00\n", + "Collecting annotated-types>=0.6.0\n", + " Downloading annotated_types-0.7.0-py3-none-any.whl (13 kB)\n", + "Collecting msgpack<2.0.0,>=1.0.0\n", + " Downloading msgpack-1.1.0-cp310-cp310-win_amd64.whl (74 kB)\n", + " ---------------------------------------- 0.0/74.7 kB ? eta -:--:--\n", + " ---------------------------------------- 74.7/74.7 kB ? eta 0:00:00\n", + "Collecting frozenlist\n", + " Downloading frozenlist-1.5.0-cp310-cp310-win_amd64.whl (51 kB)\n", + " ---------------------------------------- 0.0/51.6 kB ? eta -:--:--\n", + " ---------------------------------------- 51.6/51.6 kB 2.8 MB/s eta 0:00:00\n", + "Collecting click>=7.0\n", + " Downloading click-8.1.7-py3-none-any.whl (97 kB)\n", + " ---------------------------------------- 0.0/97.9 kB ? eta -:--:--\n", + " ---------------------------------------- 97.9/97.9 kB ? eta 0:00:00\n", + "Collecting aiosignal\n", + " Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)\n", + "Collecting urllib3<3,>=1.21.1\n", + " Downloading urllib3-2.2.3-py3-none-any.whl (126 kB)\n", + " ---------------------------------------- 0.0/126.3 kB ? eta -:--:--\n", + " -------------------------------------- 126.3/126.3 kB 7.3 MB/s eta 0:00:00\n", + "Collecting certifi>=2017.4.17\n", + " Downloading certifi-2024.8.30-py3-none-any.whl (167 kB)\n", + " ---------------------------------------- 0.0/167.3 kB ? eta -:--:--\n", + " ---------------------------------------- 167.3/167.3 kB ? eta 0:00:00\n", + "Collecting charset-normalizer<4,>=2\n", + " Downloading charset_normalizer-3.4.0-cp310-cp310-win_amd64.whl (102 kB)\n", + " ---------------------------------------- 0.0/102.2 kB ? eta -:--:--\n", + " -------------------------------------- 102.2/102.2 kB 6.1 MB/s eta 0:00:00\n", + "Collecting idna<4,>=2.5\n", + " Downloading idna-3.10-py3-none-any.whl (70 kB)\n", + " ---------------------------------------- 0.0/70.4 kB ? 
eta -:--:--\n", + " ---------------------------------------- 70.4/70.4 kB 3.8 MB/s eta 0:00:00\n", + "Collecting regex>=2022.1.18\n", + " Downloading regex-2024.11.6-cp310-cp310-win_amd64.whl (274 kB)\n", + " ---------------------------------------- 0.0/274.0 kB ? eta -:--:--\n", + " ------------------------------------- 274.0/274.0 kB 16.5 MB/s eta 0:00:00\n", + "Collecting huggingface-hub<1.0,>=0.16.4\n", + " Downloading huggingface_hub-0.26.5-py3-none-any.whl (447 kB)\n", + " ---------------------------------------- 0.0/447.8 kB ? eta -:--:--\n", + " ------------------------------------- 447.8/447.8 kB 29.2 MB/s eta 0:00:00\n", + "Requirement already satisfied: colorama in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from tqdm->vllm) (0.4.6)\n", + "Collecting safetensors>=0.4.1\n", + " Downloading safetensors-0.4.5-cp310-none-win_amd64.whl (285 kB)\n", + " ---------------------------------------- 0.0/285.9 kB ? eta -:--:--\n", + " ------------------------------------- 285.9/285.9 kB 18.4 MB/s eta 0:00:00\n", + "Collecting multidict<7.0,>=4.5\n", + " Downloading multidict-6.1.0-cp310-cp310-win_amd64.whl (28 kB)\n", + "Collecting yarl<2.0,>=1.17.0\n", + " Downloading yarl-1.18.3-cp310-cp310-win_amd64.whl (90 kB)\n", + " ---------------------------------------- 0.0/90.5 kB ? eta -:--:--\n", + " ---------------------------------------- 90.5/90.5 kB 5.0 MB/s eta 0:00:00\n", + "Collecting propcache>=0.2.0\n", + " Downloading propcache-0.2.1-cp310-cp310-win_amd64.whl (44 kB)\n", + " ---------------------------------------- 0.0/44.4 kB ? 
eta -:--:--\n", + " ---------------------------------------- 44.4/44.4 kB 2.1 MB/s eta 0:00:00\n", + "Collecting aiohappyeyeballs>=2.3.0\n", + " Downloading aiohappyeyeballs-2.4.4-py3-none-any.whl (14 kB)\n", + "Collecting async-timeout<6.0,>=4.0\n", + " Downloading async_timeout-5.0.1-py3-none-any.whl (6.2 kB)\n", + "Collecting attrs>=17.3.0\n", + " Downloading attrs-24.2.0-py3-none-any.whl (63 kB)\n", + " ---------------------------------------- 0.0/63.0 kB ? eta -:--:--\n", + " ---------------------------------------- 63.0/63.0 kB ? eta 0:00:00\n", + "Collecting zipp>=3.20\n", + " Downloading zipp-3.21.0-py3-none-any.whl (9.6 kB)\n", + "Collecting h11>=0.8\n", + " Downloading h11-0.14.0-py3-none-any.whl (58 kB)\n", + " ---------------------------------------- 0.0/58.3 kB ? eta -:--:--\n", + " ---------------------------------------- 58.3/58.3 kB 3.0 MB/s eta 0:00:00\n", + "Collecting httptools>=0.6.3\n", + " Downloading httptools-0.6.4-cp310-cp310-win_amd64.whl (88 kB)\n", + " ---------------------------------------- 0.0/88.3 kB ? eta -:--:--\n", + " ---------------------------------------- 88.3/88.3 kB 4.9 MB/s eta 0:00:00\n", + "Collecting watchfiles>=0.13\n", + " Downloading watchfiles-1.0.1-cp310-cp310-win_amd64.whl (284 kB)\n", + " ---------------------------------------- 0.0/284.3 kB ? eta -:--:--\n", + " ---------------------------------------- 284.3/284.3 kB ? eta 0:00:00\n", + "Collecting python-dotenv>=0.13\n", + " Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)\n", + "Collecting websockets>=10.4\n", + " Downloading websockets-14.1-cp310-cp310-win_amd64.whl (163 kB)\n", + " ---------------------------------------- 0.0/163.3 kB ? eta -:--:--\n", + " ---------------------------------------- 163.3/163.3 kB ? 
eta 0:00:00\n", + "Requirement already satisfied: exceptiongroup>=1.0.2 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from anyio<5,>=3.5.0->openai>=1.45.0->vllm) (1.2.2)\n", + "Collecting httpcore==1.*\n", + " Downloading httpcore-1.0.7-py3-none-any.whl (78 kB)\n", + " ---------------------------------------- 0.0/78.6 kB ? eta -:--:--\n", + " ---------------------------------------- 78.6/78.6 kB ? eta 0:00:00\n", + "Collecting jsonschema-specifications>=2023.03.6\n", + " Downloading jsonschema_specifications-2024.10.1-py3-none-any.whl (18 kB)\n", + "Collecting rpds-py>=0.7.1\n", + " Downloading rpds_py-0.22.3-cp310-cp310-win_amd64.whl (231 kB)\n", + " ---------------------------------------- 0.0/231.7 kB ? eta -:--:--\n", + " ------------------------------------- 231.7/231.7 kB 14.8 MB/s eta 0:00:00\n", + "Collecting fsspec[http]<=2024.9.0,>=2023.1.0\n", + " Downloading fsspec-2024.9.0-py3-none-any.whl (179 kB)\n", + " ---------------------------------------- 0.0/179.3 kB ? eta -:--:--\n", + " ------------------------------------- 179.3/179.3 kB 11.3 MB/s eta 0:00:00\n", + "Collecting xxhash\n", + " Downloading xxhash-3.5.0-cp310-cp310-win_amd64.whl (30 kB)\n", + "Collecting pyarrow>=15.0.0\n", + " Downloading pyarrow-18.1.0-cp310-cp310-win_amd64.whl (25.1 MB)\n", + " ---------------------------------------- 0.0/25.1 MB ? 
eta -:--:--\n", + " ----- --------------------------------- 3.7/25.1 MB 119.9 MB/s eta 0:00:01\n", + " ----------- ---------------------------- 7.5/25.1 MB 95.7 MB/s eta 0:00:01\n", + " ----------------- --------------------- 11.2/25.1 MB 81.8 MB/s eta 0:00:01\n", + " ----------------------- --------------- 15.0/25.1 MB 81.8 MB/s eta 0:00:01\n", + " ----------------------------- --------- 18.9/25.1 MB 81.8 MB/s eta 0:00:01\n", + " ----------------------------------- --- 22.7/25.1 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 24.5/25.1 MB 72.6 MB/s eta 0:00:01\n", + " -------------------------------------- 25.1/25.1 MB 54.4 MB/s eta 0:00:01\n", + " -------------------------------------- 25.1/25.1 MB 54.4 MB/s eta 0:00:01\n", + " -------------------------------------- 25.1/25.1 MB 54.4 MB/s eta 0:00:01\n", + " --------------------------------------- 25.1/25.1 MB 32.7 MB/s eta 0:00:00\n", + "Collecting pandas\n", + " Downloading pandas-2.2.3-cp310-cp310-win_amd64.whl (11.6 MB)\n", + " ---------------------------------------- 0.0/11.6 MB ? eta -:--:--\n", + " -------------- ------------------------- 4.2/11.6 MB 90.4 MB/s eta 0:00:01\n", + " --------------------------- ------------ 8.0/11.6 MB 85.7 MB/s eta 0:00:01\n", + " -------------------------------------- 11.6/11.6 MB 81.8 MB/s eta 0:00:01\n", + " -------------------------------------- 11.6/11.6 MB 81.8 MB/s eta 0:00:01\n", + " --------------------------------------- 11.6/11.6 MB 54.4 MB/s eta 0:00:00\n", + "Collecting dill<0.3.9,>=0.3.0\n", + " Downloading dill-0.3.8-py3-none-any.whl (116 kB)\n", + " ---------------------------------------- 0.0/116.3 kB ? eta -:--:--\n", + " ---------------------------------------- 116.3/116.3 kB ? eta 0:00:00\n", + "Collecting multiprocess<0.70.17\n", + " Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)\n", + " ---------------------------------------- 0.0/134.8 kB ? 
eta -:--:--\n", + " ---------------------------------------- 134.8/134.8 kB ? eta 0:00:00\n", + "Collecting MarkupSafe>=2.0\n", + " Using cached MarkupSafe-3.0.2-cp310-cp310-win_amd64.whl (15 kB)\n", + "Collecting llvmlite<0.44,>=0.43.0dev0\n", + " Downloading llvmlite-0.43.0-cp310-cp310-win_amd64.whl (28.1 MB)\n", + " ---------------------------------------- 0.0/28.1 MB ? eta -:--:--\n", + " ----- --------------------------------- 3.7/28.1 MB 119.6 MB/s eta 0:00:01\n", + " ---------- ----------------------------- 7.6/28.1 MB 96.8 MB/s eta 0:00:01\n", + " ----------- ---------------------------- 8.1/28.1 MB 85.6 MB/s eta 0:00:01\n", + " ----------- ---------------------------- 8.1/28.1 MB 85.6 MB/s eta 0:00:01\n", + " ----------- ---------------------------- 8.1/28.1 MB 85.6 MB/s eta 0:00:01\n", + " ----------- ---------------------------- 8.1/28.1 MB 85.6 MB/s eta 0:00:01\n", + " ------------ --------------------------- 8.6/28.1 MB 27.6 MB/s eta 0:00:01\n", + " --------------------------------- ---- 24.8/28.1 MB 330.0 MB/s eta 0:00:01\n", + " ------------------------------------- 28.1/28.1 MB 165.0 MB/s eta 0:00:01\n", + " ------------------------------------- 28.1/28.1 MB 165.0 MB/s eta 0:00:01\n", + " ------------------------------------- 28.1/28.1 MB 165.0 MB/s eta 0:00:01\n", + " ------------------------------------- 28.1/28.1 MB 165.0 MB/s eta 0:00:01\n", + " --------------------------------------- 28.1/28.1 MB 46.7 MB/s eta 0:00:00\n", + "Collecting tzdata>=2022.7\n", + " Downloading tzdata-2024.2-py2.py3-none-any.whl (346 kB)\n", + " ---------------------------------------- 0.0/346.6 kB ? eta -:--:--\n", + " ---------------------------------------- 346.6/346.6 kB ? eta 0:00:00\n", + "Collecting pytz>=2020.1\n", + " Downloading pytz-2024.2-py2.py3-none-any.whl (508 kB)\n", + " ---------------------------------------- 0.0/508.0 kB ? 
eta -:--:--\n", + " ------------------------------------- 508.0/508.0 kB 31.1 MB/s eta 0:00:00\n", + "Requirement already satisfied: python-dateutil>=2.8.2 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from pandas->datasets->outlines<0.1,>=0.0.43->vllm) (2.9.0.post0)\n", + "Requirement already satisfied: six>=1.5 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from python-dateutil>=2.8.2->pandas->datasets->outlines<0.1,>=0.0.43->vllm) (1.17.0)\n", + "Building wheels for collected packages: vllm\n", + " Building wheel for vllm (pyproject.toml): started\n", + " Building wheel for vllm (pyproject.toml): finished with status 'done'\n", + " Created wheel for vllm: filename=vllm-0.6.4.post1-py3-none-any.whl size=1569302 sha256=beb78c80329b7cc624604a7f488de8ad3980fb2ede5ec19d2760688e46953955\n", + " Stored in directory: c:\\users\\tr4shhh\\appdata\\local\\pip\\cache\\wheels\\13\\ca\\7b\\d22b97dfa0a4deb1a6f7a26e495d459bc71b0b4bbf6daa1c06\n", + "Successfully built vllm\n", + "Installing collected packages: sentencepiece, pytz, pyairports, py-cpuinfo, nvidia-ml-py, mpmath, zipp, xxhash, websockets, urllib3, tzdata, tqdm, sympy, sniffio, safetensors, rpds-py, regex, pyyaml, python-dotenv, pydantic-core, pycountry, pyarrow, protobuf, propcache, prometheus_client, pillow, partial-json-parser, numpy, networkx, multidict, msgspec, msgpack, MarkupSafe, llvmlite, lark, jiter, interegular, idna, httptools, h11, fsspec, frozenlist, filelock, einops, distro, diskcache, dill, cloudpickle, click, charset-normalizer, certifi, attrs, async-timeout, annotated-types, aiohappyeyeballs, yarl, uvicorn, requests, referencing, pydantic, pandas, opencv-python-headless, numba, multiprocess, jinja2, importlib_metadata, httpcore, gguf, anyio, aiosignal, watchfiles, torch, tiktoken, starlette, lm-format-enforcer, jsonschema-specifications, huggingface-hub, httpx, aiohttp, torchvision, tokenizers, prometheus-fastapi-instrumentator, openai, 
jsonschema, fastapi, transformers, ray, mistral_common, datasets, outlines, compressed-tensors, vllm\n", + "Successfully installed MarkupSafe-3.0.2 aiohappyeyeballs-2.4.4 aiohttp-3.11.10 aiosignal-1.3.1 annotated-types-0.7.0 anyio-4.7.0 async-timeout-5.0.1 attrs-24.2.0 certifi-2024.8.30 charset-normalizer-3.4.0 click-8.1.7 cloudpickle-3.1.0 compressed-tensors-0.8.0 datasets-3.2.0 dill-0.3.8 diskcache-5.6.3 distro-1.9.0 einops-0.8.0 fastapi-0.115.6 filelock-3.16.1 frozenlist-1.5.0 fsspec-2024.9.0 gguf-0.10.0 h11-0.14.0 httpcore-1.0.7 httptools-0.6.4 httpx-0.28.1 huggingface-hub-0.26.5 idna-3.10 importlib_metadata-8.5.0 interegular-0.3.3 jinja2-3.1.4 jiter-0.8.2 jsonschema-4.23.0 jsonschema-specifications-2024.10.1 lark-1.2.2 llvmlite-0.43.0 lm-format-enforcer-0.10.9 mistral_common-1.5.1 mpmath-1.3.0 msgpack-1.1.0 msgspec-0.18.6 multidict-6.1.0 multiprocess-0.70.16 networkx-3.4.2 numba-0.60.0 numpy-1.26.4 nvidia-ml-py-12.560.30 openai-1.57.2 opencv-python-headless-4.10.0.84 outlines-0.0.46 pandas-2.2.3 partial-json-parser-0.2.1.1.post4 pillow-10.4.0 prometheus-fastapi-instrumentator-7.0.0 prometheus_client-0.21.1 propcache-0.2.1 protobuf-5.29.1 py-cpuinfo-9.0.0 pyairports-2.1.1 pyarrow-18.1.0 pycountry-24.6.1 pydantic-2.10.3 pydantic-core-2.27.1 python-dotenv-1.0.1 pytz-2024.2 pyyaml-6.0.2 ray-2.40.0 referencing-0.35.1 regex-2024.11.6 requests-2.32.3 rpds-py-0.22.3 safetensors-0.4.5 sentencepiece-0.2.0 sniffio-1.3.1 starlette-0.41.3 sympy-1.13.1 tiktoken-0.7.0 tokenizers-0.21.0 torch-2.5.1 torchvision-0.20.1 tqdm-4.67.1 transformers-4.47.0 tzdata-2024.2 urllib3-2.2.3 uvicorn-0.32.1 vllm-0.6.4.post1 watchfiles-1.0.1 websockets-14.1 xxhash-3.5.0 yarl-1.18.3 zipp-3.21.0\n" + ] + } + ], + "source": [ + "!pip install vllm" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Checking dependencies\n", + "Make sure the required dependencies, such as PyTorch, are installed. 
If you do not have PyTorch, install it with the command appropriate for your environment; the cell below uses the CPU-only wheel index." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Looking in indexes: https://download.pytorch.org/whl/cpu\n", + "Requirement already satisfied: torch in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (2.5.1)\n", + "Requirement already satisfied: torchvision in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (0.20.1)\n", + "Collecting torchaudio\n", + " Downloading https://download.pytorch.org/whl/cpu/torchaudio-2.5.1%2Bcpu-cp310-cp310-win_amd64.whl (2.4 MB)\n", + " ---------------------------------------- 0.0/2.4 MB ? eta -:--:--\n", + " ----- ---------------------------------- 0.3/2.4 MB 19.8 MB/s eta 0:00:01\n", + " ------------- -------------------------- 0.8/2.4 MB 12.4 MB/s eta 0:00:01\n", + " --------------------- ------------------ 1.3/2.4 MB 11.5 MB/s eta 0:00:01\n", + " ------------------------------ --------- 1.8/2.4 MB 11.5 MB/s eta 0:00:01\n", + " --------------------------------------- 2.4/2.4 MB 11.9 MB/s eta 0:00:01\n", + " ---------------------------------------- 2.4/2.4 MB 11.0 MB/s eta 0:00:00\n", + "Requirement already satisfied: jinja2 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (3.1.4)\n", + "Requirement already satisfied: filelock in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (3.16.1)\n", + "Requirement already satisfied: fsspec in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (2024.9.0)\n", + "Requirement already satisfied: networkx in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (3.4.2)\n", + "Requirement already satisfied: sympy==1.13.1 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) 
(1.13.1)\n", + "Requirement already satisfied: 
typing-extensions>=4.8.0 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (4.12.2)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from sympy==1.13.1->torch) (1.3.0)\n", + "Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torchvision) (10.4.0)\n", + "Requirement already satisfied: numpy in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torchvision) (1.26.4)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from jinja2->torch) (3.0.2)\n", + "Installing collected packages: torchaudio\n", + "Successfully installed torchaudio-2.5.1+cpu\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "[notice] A new release of pip is available: 23.0.1 -> 24.3.1\n", + "[notice] To update, run: python.exe -m pip install --upgrade pip\n" + ] + } + ], + "source": [ + "!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3: Loading the local model\n", + "We will use vLLM's `LLM` class to load the model from a local folder. This example assumes the model is stored in a directory named `./modelo_local`."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from vllm import LLM\n", + "\n", + "# Path to the local model directory\n", + "ruta_modelo = \"./modelo_local\"\n", + "\n", + "# Initialize the model from the local path\n", + "llm = LLM(model=ruta_modelo)\n", + "print(\"Model loaded successfully.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 4: Running inference\n", + "With the model loaded, we can run inference with the `generate` method. Provide an input prompt and the model generates a completion. `generate` returns a list of `RequestOutput` objects; the generated text of each one is in `outputs[0].text`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Input prompt\n", + "texto_entrada = \"What is the capital of France?\"\n", + "\n", + "# Generate a response; generate() returns a list of RequestOutput objects\n", + "respuesta = llm.generate(texto_entrada)\n", + "print(\"Generated response:\", respuesta[0].outputs[0].text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 5: Tuning parameters (optional)\n", + "You can adjust parameters such as the temperature or the maximum number of generated tokens to customize the model's behavior. In vLLM these are passed through a `SamplingParams` object rather than as keyword arguments to `generate`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from vllm import SamplingParams\n", + "\n", + "# Sampling parameters are passed via a SamplingParams object\n", + "parametros = SamplingParams(\n", + "    temperature=0.7,  # Controls randomness\n", + "    max_tokens=100,  # Maximum number of generated tokens\n", + ")\n", + "respuesta_ajustada = llm.generate(texto_entrada, sampling_params=parametros)\n", + "print(\"Adjusted response:\", respuesta_ajustada[0].outputs[0].text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 6: Conclusion\n", + "You have learned how to install vLLM, load a local model, and run inference. You can explore more features in the [official vLLM documentation](https://docs.vllm.ai/en/latest/).\n", + "\n", + "You are now ready to run language models efficiently with vLLM!"
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}