Update tutorial on using vLLM with improved installation instructions and model compatibility details

This commit is contained in:
2024-12-10 21:03:46 +01:00
parent 13d014c24c
commit 8d1274dbec
2 changed files with 812 additions and 304 deletions
+153 -154
View File
@@ -1,155 +1,154 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tutorial: Uso de Modelos LLM con vLLM\n",
"En este tutorial aprenderás:\n",
"- Qué es vLLM y sus ventajas.\n",
"- Los tipos de modelos compatibles con vLLM.\n",
"- Cómo descargar y usar modelos con vLLM para tareas como chat."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ¿Qué es vLLM?\n",
"**vLLM** es una biblioteca optimizada para ejecutar modelos LLM de manera eficiente, aprovechando técnicas avanzadas de gestión de memoria y paralelización. Está diseñada para manejar tareas como generación de texto en grandes modelos con un rendimiento mejorado.\n",
"\n",
"**Ventajas de vLLM:**\n",
"- Mayor eficiencia en el uso de GPU y memoria.\n",
"- Reducción en los tiempos de inferencia.\n",
"- Compatibilidad con modelos de Hugging Face y otras plataformas populares."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tipos de Modelos Compatibles con vLLM\n",
"Los modelos más comunes que se pueden usar con vLLM incluyen:\n",
"- **Modelos abiertos:** Por ejemplo, GPT-NeoX, OPT, LLaMA, y Falcon.\n",
"- **Modelos comerciales:** Modelos compatibles disponibles en Hugging Face o APIs comerciales como GPT-3 o GPT-4.\n",
"- **Modelos específicos:** Diseñados para tareas específicas como chat o generación de código (por ejemplo, CodeT5, StarCoder)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instalación de vLLM\n",
"Primero, necesitamos instalar la biblioteca vLLM. Esto se puede hacer con `pip`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install vllm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Descargar Modelos para vLLM\n",
"Usaremos un modelo abierto como LLaMA 2 o GPT-NeoX desde el hub de Hugging Face. La biblioteca vLLM puede cargar modelos directamente desde Hugging Face."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Importar las bibliotecas necesarias\n",
"from vllm import LLM, SamplingParams\n",
"\n",
"# Nombre del modelo (puedes cambiarlo según tu preferencia)\n",
"model_name = 'meta-llama/Llama-2-7b-chat-hf'\n",
"\n",
"# Inicializar el modelo\n",
"llm = LLM(model=model_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Uso de vLLM para Chat\n",
"Podemos usar vLLM para interactuar con el modelo en aplicaciones de chat. Esto implica enviar un prompt y recibir una respuesta generada por el modelo."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Parámetros de entrada para la generación\n",
"prompt = \"Hola, ¿cómo puedo ayudarte hoy?\"\n",
"\n",
"# Parámetros de muestreo\n",
"sampling_params = SamplingParams(\n",
" temperature=0.7, # Controla la creatividad\n",
" max_tokens=100 # Número máximo de tokens generados\n",
")\n",
"\n",
"# Generar texto\n",
"outputs = llm.generate(prompt, sampling_params)\n",
"\n",
"# Mostrar la respuesta\n",
"print(outputs[0].text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Otros Modelos Compatibles\n",
"Además de LLaMA 2, puedes usar otros modelos como:\n",
"- **Falcon:** Un modelo optimizado para tareas de NLP, disponible en tamaños como 7B y 40B.\n",
"- **OPT (Meta):** Diseñado para tareas de lenguaje natural.\n",
"- **GPT-NeoX:** Modelo de EleutherAI diseñado como una alternativa a GPT-3.\n",
"\n",
"Estos modelos se pueden cargar de manera similar cambiando el nombre del modelo en el código."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusión\n",
"Este tutorial mostró cómo instalar y usar vLLM para ejecutar modelos LLM de manera eficiente. Los modelos como LLaMA, Falcon, y GPT-NeoX son compatibles y pueden ser utilizados para tareas como chat y generación de texto.\n",
"\n",
"Con vLLM, puedes aprovechar grandes modelos con un rendimiento optimizado, ideal para aplicaciones de inteligencia artificial."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tutorial: Uso de Modelos LLM con vLLM\n",
"En este tutorial aprenderás:\n",
"- Qué es vLLM y sus ventajas.\n",
"- Los tipos de modelos compatibles con vLLM.\n",
"- Cómo descargar y usar modelos con vLLM para tareas como chat."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ¿Qué es vLLM?\n",
"**vLLM** es una biblioteca optimizada para ejecutar modelos LLM de manera eficiente, aprovechando técnicas avanzadas de gestión de memoria y paralelización. Está diseñada para manejar tareas como generación de texto en grandes modelos con un rendimiento mejorado.\n",
"\n",
"**Ventajas de vLLM:**\n",
"- Mayor eficiencia en el uso de GPU y memoria.\n",
"- Reducción en los tiempos de inferencia.\n",
"- Compatibilidad con modelos de Hugging Face y otras plataformas populares."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tipos de Modelos Compatibles con vLLM\n",
"Los modelos más comunes que se pueden usar con vLLM incluyen:\n",
"- **Modelos abiertos:** Por ejemplo, GPT-NeoX, OPT, LLaMA, y Falcon.\n",
"- **Modelos comerciales:** Modelos compatibles disponibles en Hugging Face o APIs comerciales como GPT-3 o GPT-4.\n",
"- **Modelos específicos:** Diseñados para tareas específicas como chat o generación de código (por ejemplo, CodeT5, StarCoder)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instalación de vLLM\n",
"Primero, necesitamos instalar la biblioteca vLLM. Esto se puede hacer con `pip`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install vllm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Descargar Modelos para vLLM\n",
"Usaremos un modelo abierto como LLaMA 2 o GPT-NeoX desde el hub de Hugging Face. La biblioteca vLLM puede cargar modelos directamente desde Hugging Face."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Importar las bibliotecas necesarias\n",
"from vllm import LLM, SamplingParams\n",
"\n",
"# Nombre del modelo (puedes cambiarlo según tu preferencia)\n",
"model_name = 'meta-llama/Llama-2-7b-chat-hf'\n",
"\n",
"# Inicializar el modelo\n",
"llm = LLM(model=model_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Uso de vLLM para Chat\n",
"Podemos usar vLLM para interactuar con el modelo en aplicaciones de chat. Esto implica enviar un prompt y recibir una respuesta generada por el modelo."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Parámetros de entrada para la generación\n",
"prompt = \"Hola, ¿cómo puedo ayudarte hoy?\"\n",
"\n",
"# Parámetros de muestreo\n",
"sampling_params = SamplingParams(\n",
" temperature=0.7, # Controla la creatividad\n",
" max_tokens=100 # Número máximo de tokens generados\n",
")\n",
"\n",
"# Generar texto\n",
"outputs = llm.generate(prompt, sampling_params)\n",
"\n",
"# Mostrar la respuesta\n",
"print(outputs[0].text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Otros Modelos Compatibles\n",
"Además de LLaMA 2, puedes usar otros modelos como:\n",
"- **Falcon:** Un modelo optimizado para tareas de NLP, disponible en tamaños como 7B y 40B.\n",
"- **OPT (Meta):** Diseñado para tareas de lenguaje natural.\n",
"- **GPT-NeoX:** Modelo de EleutherAI diseñado como una alternativa a GPT-3.\n",
"\n",
"Estos modelos se pueden cargar de manera similar cambiando el nombre del modelo en el código."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusión\n",
"Este tutorial mostró cómo instalar y usar vLLM para ejecutar modelos LLM de manera eficiente. Los modelos como LLaMA, Falcon, y GPT-NeoX son compatibles y pueden ser utilizados para tareas como chat y generación de texto.\n",
"\n",
"Con vLLM, puedes aprovechar grandes modelos con un rendimiento optimizado, ideal para aplicaciones de inteligencia artificial."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
+659 -150
View File
@@ -1,152 +1,661 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tutorial: Uso de vLLM con un modelo local\n",
"En este tutorial aprenderás a utilizar **vLLM** para cargar un modelo descargado localmente y realizar inferencias.\n",
"\n",
"## ¿Qué es vLLM?\n",
"vLLM es un runtime de alto rendimiento para modelos de lenguaje grande (LLMs). Ofrece características avanzadas como el manejo eficiente de memoria y procesamiento paralelo.\n",
"\n",
"Este tutorial asumirá que ya tienes un modelo descargado localmente, por ejemplo, un modelo de Hugging Face."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Paso 1: Instalación de vLLM\n",
"Primero, necesitamos instalar la biblioteca **vLLM**. Puedes hacerlo con el siguiente comando."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install vllm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Paso 2: Verificación de dependencias\n",
"Asegúrate de que las dependencias requeridas, como PyTorch, estén instaladas. Si no tienes PyTorch, instálalo con el comando apropiado para tu entorno."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Paso 3: Carga del modelo local\n",
"Usaremos la clase `LLM` de vLLM para cargar el modelo desde una carpeta local. En este ejemplo, asumimos que el modelo está almacenado en un directorio llamado `./modelo_local`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from vllm import LLM\n",
"\n",
"# Ruta al modelo local\n",
"ruta_modelo = \"./modelo_local\"\n",
"\n",
"# Inicialización del modelo\n",
"llm = LLM(model=ruta_modelo)\n",
"print(\"Modelo cargado exitosamente.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Paso 4: Realización de inferencias\n",
"Con el modelo cargado, podemos realizar inferencias utilizando la función `generate`. Proporciona un texto de entrada y el modelo generará una respuesta."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Texto de entrada\n",
"texto_entrada = \"¿Cuál es la capital de Francia?\"\n",
"\n",
"# Generar respuesta\n",
"respuesta = llm.generate(texto_entrada)\n",
"print(\"Respuesta generada:\", respuesta)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Paso 5: Ajuste de parámetros (opcional)\n",
"Puedes ajustar parámetros como la temperatura o el número máximo de tokens generados para personalizar el comportamiento del modelo."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Ajuste de parámetros\n",
"respuesta_ajustada = llm.generate(\n",
" texto_entrada,\n",
" temperature=0.7, # Controla la aleatoriedad\n",
" max_tokens=100 # Límite de tokens generados\n",
")\n",
"print(\"Respuesta ajustada:\", respuesta_ajustada)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Paso 6: Conclusión\n",
"Has aprendido a instalar y usar vLLM para cargar un modelo local y realizar inferencias. Puedes explorar más características en la [documentación oficial de vLLM](https://vllm.org).\n",
"\n",
"¡Ahora estás listo para usar modelos de lenguaje eficientemente con vLLM!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tutorial: Uso de vLLM con un modelo local\n",
"En este tutorial aprenderás a utilizar **vLLM** para cargar un modelo descargado localmente y realizar inferencias.\n",
"\n",
"## ¿Qué es vLLM?\n",
"vLLM es un runtime de alto rendimiento para modelos de lenguaje grande (LLMs). Ofrece características avanzadas como el manejo eficiente de memoria y procesamiento paralelo.\n",
"\n",
"Este tutorial asumirá que ya tienes un modelo descargado localmente, por ejemplo, un modelo de Hugging Face."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Paso 1: Instalación de vLLM\n",
"Primero, necesitamos instalar la biblioteca **vLLM**. Puedes hacerlo con el siguiente comando."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting vllm"
]
},
"nbformat": 4,
"nbformat_minor": 4
}
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"[notice] A new release of pip is available: 23.0.1 -> 24.3.1\n",
"[notice] To update, run: python.exe -m pip install --upgrade pip\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" Downloading vllm-0.6.4.post1.tar.gz (3.1 MB)\n",
" ---------------------------------------- 0.0/3.1 MB ? eta -:--:--\n",
" ---------------------------------- ----- 2.7/3.1 MB 56.6 MB/s eta 0:00:01\n",
" ---------------------------------------- 3.1/3.1 MB 49.2 MB/s eta 0:00:00\n",
" Installing build dependencies: started\n",
" Installing build dependencies: still running...\n",
" Installing build dependencies: finished with status 'done'\n",
" Getting requirements to build wheel: started\n",
" Getting requirements to build wheel: finished with status 'done'\n",
" Preparing metadata (pyproject.toml): started\n",
" Preparing metadata (pyproject.toml): finished with status 'done'\n",
"Collecting ray>=2.9\n",
" Downloading ray-2.40.0-cp310-cp310-win_amd64.whl (25.3 MB)\n",
" ---------------------------------------- 0.0/25.3 MB ? eta -:--:--\n",
" ---- ----------------------------------- 3.1/25.3 MB 66.7 MB/s eta 0:00:01\n",
" ---- ----------------------------------- 3.1/25.3 MB 66.7 MB/s eta 0:00:01\n",
" ------- -------------------------------- 4.8/25.3 MB 34.2 MB/s eta 0:00:01\n",
" ------------- -------------------------- 8.6/25.3 MB 45.9 MB/s eta 0:00:01\n",
" ------------------- ------------------- 12.4/25.3 MB 50.4 MB/s eta 0:00:01\n",
" ------------------------ -------------- 15.9/25.3 MB 81.8 MB/s eta 0:00:01\n",
" ------------------------------ -------- 20.0/25.3 MB 81.8 MB/s eta 0:00:01\n",
" ------------------------------------ -- 23.8/25.3 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 25.3/25.3 MB 93.9 MB/s eta 0:00:01\n",
" -------------------------------------- 25.3/25.3 MB 93.9 MB/s eta 0:00:01\n",
" -------------------------------------- 25.3/25.3 MB 93.9 MB/s eta 0:00:01\n",
" -------------------------------------- 25.3/25.3 MB 93.9 MB/s eta 0:00:01\n",
" -------------------------------------- 25.3/25.3 MB 93.9 MB/s eta 0:00:01\n",
" -------------------------------------- 25.3/25.3 MB 93.9 MB/s eta 0:00:01\n",
" -------------------------------------- 25.3/25.3 MB 93.9 MB/s eta 0:00:01\n",
" --------------------------------------- 25.3/25.3 MB 21.8 MB/s eta 0:00:00\n",
"Collecting lm-format-enforcer<0.11,>=0.10.9\n",
" Downloading lm_format_enforcer-0.10.9-py3-none-any.whl (43 kB)\n",
" ---------------------------------------- 0.0/43.9 kB ? eta -:--:--\n",
" ---------------------------------------- 43.9/43.9 kB 2.2 MB/s eta 0:00:00\n",
"Collecting outlines<0.1,>=0.0.43\n",
" Downloading outlines-0.0.46-py3-none-any.whl (101 kB)\n",
" ---------------------------------------- 0.0/101.9 kB ? eta -:--:--\n",
" ---------------------------------------- 101.9/101.9 kB ? eta 0:00:00\n",
"Collecting aiohttp\n",
" Downloading aiohttp-3.11.10-cp310-cp310-win_amd64.whl (441 kB)\n",
" ---------------------------------------- 0.0/441.8 kB ? eta -:--:--\n",
" ------------------------------------- 441.8/441.8 kB 28.8 MB/s eta 0:00:00\n",
"Collecting tokenizers>=0.19.1\n",
" Downloading tokenizers-0.21.0-cp39-abi3-win_amd64.whl (2.4 MB)\n",
" ---------------------------------------- 0.0/2.4 MB ? eta -:--:--\n",
" --------------------------------------- 2.4/2.4 MB 76.6 MB/s eta 0:00:01\n",
" --------------------------------------- 2.4/2.4 MB 76.6 MB/s eta 0:00:01\n",
" ---------------------------------------- 2.4/2.4 MB 21.6 MB/s eta 0:00:00\n",
"Collecting py-cpuinfo\n",
" Downloading py_cpuinfo-9.0.0-py3-none-any.whl (22 kB)\n",
"Collecting tqdm\n",
" Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)\n",
" ---------------------------------------- 0.0/78.5 kB ? eta -:--:--\n",
" ---------------------------------------- 78.5/78.5 kB 4.6 MB/s eta 0:00:00\n",
"Collecting protobuf\n",
" Downloading protobuf-5.29.1-cp310-abi3-win_amd64.whl (434 kB)\n",
" ---------------------------------------- 0.0/434.5 kB ? eta -:--:--\n",
" ------------------------------------- 434.5/434.5 kB 26.5 MB/s eta 0:00:00\n",
"Collecting uvicorn[standard]\n",
" Downloading uvicorn-0.32.1-py3-none-any.whl (63 kB)\n",
" ---------------------------------------- 0.0/63.8 kB ? eta -:--:--\n",
" ---------------------------------------- 63.8/63.8 kB ? eta 0:00:00\n",
"Collecting tiktoken>=0.6.0\n",
" Downloading tiktoken-0.8.0-cp310-cp310-win_amd64.whl (884 kB)\n",
" ---------------------------------------- 0.0/884.2 kB ? eta -:--:--\n",
" --------------------------------------- 880.6/884.2 kB ? eta -:--:--\n",
" ------------------------------------- 884.2/884.2 kB 54.6 MB/s eta 0:00:00\n",
"Collecting requests>=2.26.0\n",
" Downloading requests-2.32.3-py3-none-any.whl (64 kB)\n",
" ---------------------------------------- 0.0/64.9 kB ? eta -:--:--\n",
" ---------------------------------------- 64.9/64.9 kB 3.6 MB/s eta 0:00:00\n",
"Collecting torch==2.5.1\n",
" Using cached torch-2.5.1-cp310-cp310-win_amd64.whl (203.1 MB)\n",
"Collecting pyyaml\n",
" Downloading PyYAML-6.0.2-cp310-cp310-win_amd64.whl (161 kB)\n",
" ---------------------------------------- 0.0/161.8 kB ? eta -:--:--\n",
" ---------------------------------------- 161.8/161.8 kB ? eta 0:00:00\n",
"Collecting partial-json-parser\n",
" Downloading partial_json_parser-0.2.1.1.post4-py3-none-any.whl (9.9 kB)\n",
"Collecting einops\n",
" Downloading einops-0.8.0-py3-none-any.whl (43 kB)\n",
" ---------------------------------------- 0.0/43.2 kB ? eta -:--:--\n",
" ------------------------------------- -- 41.0/43.2 kB ? eta -:--:--\n",
" ------------------------------------- -- 41.0/43.2 kB ? eta -:--:--\n",
" ------------------------------------- -- 41.0/43.2 kB ? eta -:--:--\n",
" ------------------------------------- -- 41.0/43.2 kB ? eta -:--:--\n",
" ------------------------------------- -- 41.0/43.2 kB ? eta -:--:--\n",
" -------------------------------------- 43.2/43.2 kB 131.9 kB/s eta 0:00:00\n",
"Collecting mistral_common[opencv]>=1.5.0\n",
" Downloading mistral_common-1.5.1-py3-none-any.whl (6.5 MB)\n",
" ---------------------------------------- 0.0/6.5 MB ? eta -:--:--\n",
" --------------- ------------------------ 2.6/6.5 MB 82.9 MB/s eta 0:00:01\n",
" ------------------- -------------------- 3.1/6.5 MB 66.7 MB/s eta 0:00:01\n",
" ---------------------- ----------------- 3.6/6.5 MB 29.1 MB/s eta 0:00:01\n",
" --------------------------------------- 6.5/6.5 MB 37.6 MB/s eta 0:00:01\n",
" --------------------------------------- 6.5/6.5 MB 37.6 MB/s eta 0:00:01\n",
" ---------------------------------------- 6.5/6.5 MB 27.6 MB/s eta 0:00:00\n",
"Collecting pydantic>=2.9\n",
" Downloading pydantic-2.10.3-py3-none-any.whl (456 kB)\n",
" ---------------------------------------- 0.0/457.0 kB ? eta -:--:--\n",
" ------------------------------------- 457.0/457.0 kB 29.8 MB/s eta 0:00:00\n",
"Collecting gguf==0.10.0\n",
" Downloading gguf-0.10.0-py3-none-any.whl (71 kB)\n",
" ---------------------------------------- 0.0/71.6 kB ? eta -:--:--\n",
" ---------------------------------------- 71.6/71.6 kB 3.8 MB/s eta 0:00:00\n",
"Requirement already satisfied: psutil in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from vllm) (6.1.0)\n",
"Collecting compressed-tensors==0.8.0\n",
" Downloading compressed_tensors-0.8.0-py3-none-any.whl (86 kB)\n",
" ---------------------------------------- 0.0/87.0 kB ? eta -:--:--\n",
" ---------------------------------------- 87.0/87.0 kB 5.1 MB/s eta 0:00:00\n",
"Collecting openai>=1.45.0\n",
" Downloading openai-1.57.2-py3-none-any.whl (389 kB)\n",
" ---------------------------------------- 0.0/389.9 kB ? eta -:--:--\n",
" ---------------------------------------- 389.9/389.9 kB ? eta 0:00:00\n",
"Requirement already satisfied: pyzmq in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from vllm) (26.2.0)\n",
"Collecting filelock>=3.10.4\n",
" Using cached filelock-3.16.1-py3-none-any.whl (16 kB)\n",
"Collecting prometheus_client>=0.18.0\n",
" Downloading prometheus_client-0.21.1-py3-none-any.whl (54 kB)\n",
" ---------------------------------------- 0.0/54.7 kB ? eta -:--:--\n",
" ---------------------------------------- 54.7/54.7 kB ? eta 0:00:00\n",
"Collecting msgspec\n",
" Downloading msgspec-0.18.6-cp310-cp310-win_amd64.whl (185 kB)\n",
" ---------------------------------------- 0.0/185.7 kB ? eta -:--:--\n",
" ------------------------------------- 185.7/185.7 kB 11.0 MB/s eta 0:00:00\n",
"Collecting numpy<2.0.0\n",
" Downloading numpy-1.26.4-cp310-cp310-win_amd64.whl (15.8 MB)\n",
" ---------------------------------------- 0.0/15.8 MB ? eta -:--:--\n",
" --------- ------------------------------ 3.6/15.8 MB 77.1 MB/s eta 0:00:01\n",
" ------------------ --------------------- 7.4/15.8 MB 79.5 MB/s eta 0:00:01\n",
" --------------------------- ----------- 11.0/15.8 MB 81.8 MB/s eta 0:00:01\n",
" ------------------------------------ -- 14.6/15.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 15.8/15.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 15.8/15.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 15.8/15.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 15.8/15.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 15.8/15.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 15.8/15.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 15.8/15.8 MB 81.8 MB/s eta 0:00:01\n",
" --------------------------------------- 15.8/15.8 MB 23.4 MB/s eta 0:00:00\n",
"Collecting importlib_metadata\n",
" Downloading importlib_metadata-8.5.0-py3-none-any.whl (26 kB)\n",
"Collecting nvidia-ml-py>=12.560.30\n",
" Downloading nvidia_ml_py-12.560.30-py3-none-any.whl (40 kB)\n",
" ---------------------------------------- 0.0/40.5 kB ? eta -:--:--\n",
" ---------------------------------------- 40.5/40.5 kB ? eta 0:00:00\n",
"Collecting torchvision==0.20.1\n",
" Downloading torchvision-0.20.1-cp310-cp310-win_amd64.whl (1.6 MB)\n",
" ---------------------------------------- 0.0/1.6 MB ? eta -:--:--\n",
" ---------------------------------------- 1.6/1.6 MB 50.1 MB/s eta 0:00:00\n",
"Collecting fastapi!=0.113.*,!=0.114.0,>=0.107.0\n",
" Downloading fastapi-0.115.6-py3-none-any.whl (94 kB)\n",
" ---------------------------------------- 0.0/94.8 kB ? eta -:--:--\n",
" ---------------------------------------- 94.8/94.8 kB ? eta 0:00:00\n",
"Collecting sentencepiece\n",
" Downloading sentencepiece-0.2.0-cp310-cp310-win_amd64.whl (991 kB)\n",
" ---------------------------------------- 0.0/991.5 kB ? eta -:--:--\n",
" ------------------------------------- 991.5/991.5 kB 61.3 MB/s eta 0:00:00\n",
"Collecting pillow\n",
" Downloading pillow-11.0.0-cp310-cp310-win_amd64.whl (2.6 MB)\n",
" ---------------------------------------- 0.0/2.6 MB ? eta -:--:--\n",
" --------------------------------------- 2.6/2.6 MB 170.0 MB/s eta 0:00:01\n",
" ---------------------------------------- 2.6/2.6 MB 54.4 MB/s eta 0:00:00\n",
"Collecting prometheus-fastapi-instrumentator>=7.0.0\n",
" Downloading prometheus_fastapi_instrumentator-7.0.0-py3-none-any.whl (19 kB)\n",
"Requirement already satisfied: typing_extensions>=4.10 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from vllm) (4.12.2)\n",
"Collecting transformers>=4.45.2\n",
" Downloading transformers-4.47.0-py3-none-any.whl (10.1 MB)\n",
" ---------------------------------------- 0.0/10.1 MB ? eta -:--:--\n",
" --------------- ------------------------ 4.0/10.1 MB 84.3 MB/s eta 0:00:01\n",
" ------------------------------ --------- 7.8/10.1 MB 83.0 MB/s eta 0:00:01\n",
" -------------------------------------- 10.1/10.1 MB 80.9 MB/s eta 0:00:01\n",
" --------------------------------------- 10.1/10.1 MB 58.9 MB/s eta 0:00:00\n",
"Collecting jinja2\n",
" Using cached jinja2-3.1.4-py3-none-any.whl (133 kB)\n",
"Collecting sympy==1.13.1\n",
" Using cached sympy-1.13.1-py3-none-any.whl (6.2 MB)\n",
"Collecting fsspec\n",
" Using cached fsspec-2024.10.0-py3-none-any.whl (179 kB)\n",
"Collecting networkx\n",
" Using cached networkx-3.4.2-py3-none-any.whl (1.7 MB)\n",
"Collecting mpmath<1.4,>=1.1.0\n",
" Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)\n",
"Collecting starlette<0.42.0,>=0.40.0\n",
" Downloading starlette-0.41.3-py3-none-any.whl (73 kB)\n",
" ---------------------------------------- 0.0/73.2 kB ? eta -:--:--\n",
" ---------------------------------------- 73.2/73.2 kB ? eta 0:00:00\n",
"Requirement already satisfied: packaging in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from lm-format-enforcer<0.11,>=0.10.9->vllm) (24.2)\n",
"Collecting interegular>=0.3.2\n",
" Downloading interegular-0.3.3-py37-none-any.whl (23 kB)\n",
"Collecting jsonschema<5.0.0,>=4.21.1\n",
" Downloading jsonschema-4.23.0-py3-none-any.whl (88 kB)\n",
" ---------------------------------------- 0.0/88.5 kB ? eta -:--:--\n",
" ---------------------------------------- 88.5/88.5 kB ? eta 0:00:00\n",
"Collecting tiktoken>=0.6.0\n",
" Downloading tiktoken-0.7.0-cp310-cp310-win_amd64.whl (798 kB)\n",
" ---------------------------------------- 0.0/798.9 kB ? eta -:--:--\n",
" ------------------------------------- 798.9/798.9 kB 49.3 MB/s eta 0:00:00\n",
"Collecting pillow\n",
" Downloading pillow-10.4.0-cp310-cp310-win_amd64.whl (2.6 MB)\n",
" ---------------------------------------- 0.0/2.6 MB ? eta -:--:--\n",
" --------------------------------------- 2.5/2.6 MB 79.4 MB/s eta 0:00:01\n",
" ---------------------------------------- 2.6/2.6 MB 54.1 MB/s eta 0:00:00\n",
"Collecting opencv-python-headless<5.0.0,>=4.0.0\n",
" Downloading opencv_python_headless-4.10.0.84-cp37-abi3-win_amd64.whl (38.8 MB)\n",
" ---------------------------------------- 0.0/38.8 MB ? eta -:--:--\n",
" --- ------------------------------------ 3.8/38.8 MB 80.8 MB/s eta 0:00:01\n",
" ------- -------------------------------- 7.6/38.8 MB 81.0 MB/s eta 0:00:01\n",
" ----------- --------------------------- 11.5/38.8 MB 81.8 MB/s eta 0:00:01\n",
" --------------- ----------------------- 15.3/38.8 MB 81.8 MB/s eta 0:00:01\n",
" ------------------- ------------------- 19.2/38.8 MB 81.8 MB/s eta 0:00:01\n",
" ----------------------- --------------- 23.0/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------- ------------ 26.8/38.8 MB 81.8 MB/s eta 0:00:01\n",
" ------------------------------ -------- 30.7/38.8 MB 81.8 MB/s eta 0:00:01\n",
" ---------------------------------- ---- 34.4/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 38.3/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 38.7/38.8 MB 81.8 MB/s eta 0:00:01\n",
" --------------------------------------- 38.8/38.8 MB 15.2 MB/s eta 0:00:00\n",
"Collecting anyio<5,>=3.5.0\n",
" Downloading anyio-4.7.0-py3-none-any.whl (93 kB)\n",
" ---------------------------------------- 0.0/93.1 kB ? eta -:--:--\n",
" ---------------------------------------- 93.1/93.1 kB 2.7 MB/s eta 0:00:00\n",
"Collecting sniffio\n",
" Downloading sniffio-1.3.1-py3-none-any.whl (10 kB)\n",
"Collecting httpx<1,>=0.23.0\n",
" Downloading httpx-0.28.1-py3-none-any.whl (73 kB)\n",
" ---------------------------------------- 0.0/73.5 kB ? eta -:--:--\n",
" ---------------------------------------- 73.5/73.5 kB ? eta 0:00:00\n",
"Collecting distro<2,>=1.7.0\n",
" Downloading distro-1.9.0-py3-none-any.whl (20 kB)\n",
"Collecting jiter<1,>=0.4.0\n",
" Downloading jiter-0.8.2-cp310-cp310-win_amd64.whl (204 kB)\n",
" ---------------------------------------- 0.0/204.4 kB ? eta -:--:--\n",
" ---------------------------------------- 204.4/204.4 kB ? eta 0:00:00\n",
"Collecting lark\n",
" Downloading lark-1.2.2-py3-none-any.whl (111 kB)\n",
" ---------------------------------------- 0.0/111.0 kB ? eta -:--:--\n",
" ---------------------------------------- 111.0/111.0 kB ? eta 0:00:00\n",
"Collecting diskcache\n",
" Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)\n",
" ---------------------------------------- 0.0/45.5 kB ? eta -:--:--\n",
" ---------------------------------------- 45.5/45.5 kB ? eta 0:00:00\n",
"Collecting referencing\n",
" Downloading referencing-0.35.1-py3-none-any.whl (26 kB)\n",
"Collecting datasets\n",
" Downloading datasets-3.2.0-py3-none-any.whl (480 kB)\n",
" ---------------------------------------- 0.0/480.6 kB ? eta -:--:--\n",
" ------------------------------------- 480.6/480.6 kB 31.4 MB/s eta 0:00:00\n",
"Collecting numba\n",
" Downloading numba-0.60.0-cp310-cp310-win_amd64.whl (2.7 MB)\n",
" ---------------------------------------- 0.0/2.7 MB ? eta -:--:--\n",
" --------------------------------------- 2.7/2.7 MB 86.2 MB/s eta 0:00:01\n",
" ---------------------------------------- 2.7/2.7 MB 43.2 MB/s eta 0:00:00\n",
"Collecting pycountry\n",
" Downloading pycountry-24.6.1-py3-none-any.whl (6.3 MB)\n",
" ---------------------------------------- 0.0/6.3 MB ? eta -:--:--\n",
" ------------------------- -------------- 4.0/6.3 MB 84.8 MB/s eta 0:00:01\n",
" --------------------------------------- 6.3/6.3 MB 81.0 MB/s eta 0:00:01\n",
" ---------------------------------------- 6.3/6.3 MB 58.0 MB/s eta 0:00:00\n",
"Collecting pyairports\n",
" Downloading pyairports-2.1.1-py3-none-any.whl (371 kB)\n",
" ---------------------------------------- 0.0/371.7 kB ? eta -:--:--\n",
" ---------------------------------------- 371.7/371.7 kB ? eta 0:00:00\n",
"Requirement already satisfied: nest-asyncio in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from outlines<0.1,>=0.0.43->vllm) (1.6.0)\n",
"Collecting cloudpickle\n",
" Downloading cloudpickle-3.1.0-py3-none-any.whl (22 kB)\n",
"Collecting pydantic-core==2.27.1\n",
" Downloading pydantic_core-2.27.1-cp310-none-win_amd64.whl (2.0 MB)\n",
" ---------------------------------------- 0.0/2.0 MB ? eta -:--:--\n",
" ---------------------------------------- 2.0/2.0 MB 42.8 MB/s eta 0:00:00\n",
"Collecting annotated-types>=0.6.0\n",
" Downloading annotated_types-0.7.0-py3-none-any.whl (13 kB)\n",
"Collecting msgpack<2.0.0,>=1.0.0\n",
" Downloading msgpack-1.1.0-cp310-cp310-win_amd64.whl (74 kB)\n",
" ---------------------------------------- 0.0/74.7 kB ? eta -:--:--\n",
" ---------------------------------------- 74.7/74.7 kB ? eta 0:00:00\n",
"Collecting frozenlist\n",
" Downloading frozenlist-1.5.0-cp310-cp310-win_amd64.whl (51 kB)\n",
" ---------------------------------------- 0.0/51.6 kB ? eta -:--:--\n",
" ---------------------------------------- 51.6/51.6 kB 2.8 MB/s eta 0:00:00\n",
"Collecting click>=7.0\n",
" Downloading click-8.1.7-py3-none-any.whl (97 kB)\n",
" ---------------------------------------- 0.0/97.9 kB ? eta -:--:--\n",
" ---------------------------------------- 97.9/97.9 kB ? eta 0:00:00\n",
"Collecting aiosignal\n",
" Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)\n",
"Collecting urllib3<3,>=1.21.1\n",
" Downloading urllib3-2.2.3-py3-none-any.whl (126 kB)\n",
" ---------------------------------------- 0.0/126.3 kB ? eta -:--:--\n",
" -------------------------------------- 126.3/126.3 kB 7.3 MB/s eta 0:00:00\n",
"Collecting certifi>=2017.4.17\n",
" Downloading certifi-2024.8.30-py3-none-any.whl (167 kB)\n",
" ---------------------------------------- 0.0/167.3 kB ? eta -:--:--\n",
" ---------------------------------------- 167.3/167.3 kB ? eta 0:00:00\n",
"Collecting charset-normalizer<4,>=2\n",
" Downloading charset_normalizer-3.4.0-cp310-cp310-win_amd64.whl (102 kB)\n",
" ---------------------------------------- 0.0/102.2 kB ? eta -:--:--\n",
" -------------------------------------- 102.2/102.2 kB 6.1 MB/s eta 0:00:00\n",
"Collecting idna<4,>=2.5\n",
" Downloading idna-3.10-py3-none-any.whl (70 kB)\n",
" ---------------------------------------- 0.0/70.4 kB ? eta -:--:--\n",
" ---------------------------------------- 70.4/70.4 kB 3.8 MB/s eta 0:00:00\n",
"Collecting regex>=2022.1.18\n",
" Downloading regex-2024.11.6-cp310-cp310-win_amd64.whl (274 kB)\n",
" ---------------------------------------- 0.0/274.0 kB ? eta -:--:--\n",
" ------------------------------------- 274.0/274.0 kB 16.5 MB/s eta 0:00:00\n",
"Collecting huggingface-hub<1.0,>=0.16.4\n",
" Downloading huggingface_hub-0.26.5-py3-none-any.whl (447 kB)\n",
" ---------------------------------------- 0.0/447.8 kB ? eta -:--:--\n",
" ------------------------------------- 447.8/447.8 kB 29.2 MB/s eta 0:00:00\n",
"Requirement already satisfied: colorama in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from tqdm->vllm) (0.4.6)\n",
"Collecting safetensors>=0.4.1\n",
" Downloading safetensors-0.4.5-cp310-none-win_amd64.whl (285 kB)\n",
" ---------------------------------------- 0.0/285.9 kB ? eta -:--:--\n",
" ------------------------------------- 285.9/285.9 kB 18.4 MB/s eta 0:00:00\n",
"Collecting multidict<7.0,>=4.5\n",
" Downloading multidict-6.1.0-cp310-cp310-win_amd64.whl (28 kB)\n",
"Collecting yarl<2.0,>=1.17.0\n",
" Downloading yarl-1.18.3-cp310-cp310-win_amd64.whl (90 kB)\n",
" ---------------------------------------- 0.0/90.5 kB ? eta -:--:--\n",
" ---------------------------------------- 90.5/90.5 kB 5.0 MB/s eta 0:00:00\n",
"Collecting propcache>=0.2.0\n",
" Downloading propcache-0.2.1-cp310-cp310-win_amd64.whl (44 kB)\n",
" ---------------------------------------- 0.0/44.4 kB ? eta -:--:--\n",
" ---------------------------------------- 44.4/44.4 kB 2.1 MB/s eta 0:00:00\n",
"Collecting aiohappyeyeballs>=2.3.0\n",
" Downloading aiohappyeyeballs-2.4.4-py3-none-any.whl (14 kB)\n",
"Collecting async-timeout<6.0,>=4.0\n",
" Downloading async_timeout-5.0.1-py3-none-any.whl (6.2 kB)\n",
"Collecting attrs>=17.3.0\n",
" Downloading attrs-24.2.0-py3-none-any.whl (63 kB)\n",
" ---------------------------------------- 0.0/63.0 kB ? eta -:--:--\n",
" ---------------------------------------- 63.0/63.0 kB ? eta 0:00:00\n",
"Collecting zipp>=3.20\n",
" Downloading zipp-3.21.0-py3-none-any.whl (9.6 kB)\n",
"Collecting h11>=0.8\n",
" Downloading h11-0.14.0-py3-none-any.whl (58 kB)\n",
" ---------------------------------------- 0.0/58.3 kB ? eta -:--:--\n",
" ---------------------------------------- 58.3/58.3 kB 3.0 MB/s eta 0:00:00\n",
"Collecting httptools>=0.6.3\n",
" Downloading httptools-0.6.4-cp310-cp310-win_amd64.whl (88 kB)\n",
" ---------------------------------------- 0.0/88.3 kB ? eta -:--:--\n",
" ---------------------------------------- 88.3/88.3 kB 4.9 MB/s eta 0:00:00\n",
"Collecting watchfiles>=0.13\n",
" Downloading watchfiles-1.0.1-cp310-cp310-win_amd64.whl (284 kB)\n",
" ---------------------------------------- 0.0/284.3 kB ? eta -:--:--\n",
" ---------------------------------------- 284.3/284.3 kB ? eta 0:00:00\n",
"Collecting python-dotenv>=0.13\n",
" Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)\n",
"Collecting websockets>=10.4\n",
" Downloading websockets-14.1-cp310-cp310-win_amd64.whl (163 kB)\n",
" ---------------------------------------- 0.0/163.3 kB ? eta -:--:--\n",
" ---------------------------------------- 163.3/163.3 kB ? eta 0:00:00\n",
"Requirement already satisfied: exceptiongroup>=1.0.2 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from anyio<5,>=3.5.0->openai>=1.45.0->vllm) (1.2.2)\n",
"Collecting httpcore==1.*\n",
" Downloading httpcore-1.0.7-py3-none-any.whl (78 kB)\n",
" ---------------------------------------- 0.0/78.6 kB ? eta -:--:--\n",
" ---------------------------------------- 78.6/78.6 kB ? eta 0:00:00\n",
"Collecting jsonschema-specifications>=2023.03.6\n",
" Downloading jsonschema_specifications-2024.10.1-py3-none-any.whl (18 kB)\n",
"Collecting rpds-py>=0.7.1\n",
" Downloading rpds_py-0.22.3-cp310-cp310-win_amd64.whl (231 kB)\n",
" ---------------------------------------- 0.0/231.7 kB ? eta -:--:--\n",
" ------------------------------------- 231.7/231.7 kB 14.8 MB/s eta 0:00:00\n",
"Collecting fsspec[http]<=2024.9.0,>=2023.1.0\n",
" Downloading fsspec-2024.9.0-py3-none-any.whl (179 kB)\n",
" ---------------------------------------- 0.0/179.3 kB ? eta -:--:--\n",
" ------------------------------------- 179.3/179.3 kB 11.3 MB/s eta 0:00:00\n",
"Collecting xxhash\n",
" Downloading xxhash-3.5.0-cp310-cp310-win_amd64.whl (30 kB)\n",
"Collecting pyarrow>=15.0.0\n",
" Downloading pyarrow-18.1.0-cp310-cp310-win_amd64.whl (25.1 MB)\n",
" ---------------------------------------- 0.0/25.1 MB ? eta -:--:--\n",
" ----- --------------------------------- 3.7/25.1 MB 119.9 MB/s eta 0:00:01\n",
" ----------- ---------------------------- 7.5/25.1 MB 95.7 MB/s eta 0:00:01\n",
" ----------------- --------------------- 11.2/25.1 MB 81.8 MB/s eta 0:00:01\n",
" ----------------------- --------------- 15.0/25.1 MB 81.8 MB/s eta 0:00:01\n",
" ----------------------------- --------- 18.9/25.1 MB 81.8 MB/s eta 0:00:01\n",
" ----------------------------------- --- 22.7/25.1 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 24.5/25.1 MB 72.6 MB/s eta 0:00:01\n",
" -------------------------------------- 25.1/25.1 MB 54.4 MB/s eta 0:00:01\n",
" -------------------------------------- 25.1/25.1 MB 54.4 MB/s eta 0:00:01\n",
" -------------------------------------- 25.1/25.1 MB 54.4 MB/s eta 0:00:01\n",
" --------------------------------------- 25.1/25.1 MB 32.7 MB/s eta 0:00:00\n",
"Collecting pandas\n",
" Downloading pandas-2.2.3-cp310-cp310-win_amd64.whl (11.6 MB)\n",
" ---------------------------------------- 0.0/11.6 MB ? eta -:--:--\n",
" -------------- ------------------------- 4.2/11.6 MB 90.4 MB/s eta 0:00:01\n",
" --------------------------- ------------ 8.0/11.6 MB 85.7 MB/s eta 0:00:01\n",
" -------------------------------------- 11.6/11.6 MB 81.8 MB/s eta 0:00:01\n",
" -------------------------------------- 11.6/11.6 MB 81.8 MB/s eta 0:00:01\n",
" --------------------------------------- 11.6/11.6 MB 54.4 MB/s eta 0:00:00\n",
"Collecting dill<0.3.9,>=0.3.0\n",
" Downloading dill-0.3.8-py3-none-any.whl (116 kB)\n",
" ---------------------------------------- 0.0/116.3 kB ? eta -:--:--\n",
" ---------------------------------------- 116.3/116.3 kB ? eta 0:00:00\n",
"Collecting multiprocess<0.70.17\n",
" Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)\n",
" ---------------------------------------- 0.0/134.8 kB ? eta -:--:--\n",
" ---------------------------------------- 134.8/134.8 kB ? eta 0:00:00\n",
"Collecting MarkupSafe>=2.0\n",
" Using cached MarkupSafe-3.0.2-cp310-cp310-win_amd64.whl (15 kB)\n",
"Collecting llvmlite<0.44,>=0.43.0dev0\n",
" Downloading llvmlite-0.43.0-cp310-cp310-win_amd64.whl (28.1 MB)\n",
" ---------------------------------------- 0.0/28.1 MB ? eta -:--:--\n",
" ----- --------------------------------- 3.7/28.1 MB 119.6 MB/s eta 0:00:01\n",
" ---------- ----------------------------- 7.6/28.1 MB 96.8 MB/s eta 0:00:01\n",
" ----------- ---------------------------- 8.1/28.1 MB 85.6 MB/s eta 0:00:01\n",
" ----------- ---------------------------- 8.1/28.1 MB 85.6 MB/s eta 0:00:01\n",
" ----------- ---------------------------- 8.1/28.1 MB 85.6 MB/s eta 0:00:01\n",
" ----------- ---------------------------- 8.1/28.1 MB 85.6 MB/s eta 0:00:01\n",
" ------------ --------------------------- 8.6/28.1 MB 27.6 MB/s eta 0:00:01\n",
" --------------------------------- ---- 24.8/28.1 MB 330.0 MB/s eta 0:00:01\n",
" ------------------------------------- 28.1/28.1 MB 165.0 MB/s eta 0:00:01\n",
" ------------------------------------- 28.1/28.1 MB 165.0 MB/s eta 0:00:01\n",
" ------------------------------------- 28.1/28.1 MB 165.0 MB/s eta 0:00:01\n",
" ------------------------------------- 28.1/28.1 MB 165.0 MB/s eta 0:00:01\n",
" --------------------------------------- 28.1/28.1 MB 46.7 MB/s eta 0:00:00\n",
"Collecting tzdata>=2022.7\n",
" Downloading tzdata-2024.2-py2.py3-none-any.whl (346 kB)\n",
" ---------------------------------------- 0.0/346.6 kB ? eta -:--:--\n",
" ---------------------------------------- 346.6/346.6 kB ? eta 0:00:00\n",
"Collecting pytz>=2020.1\n",
" Downloading pytz-2024.2-py2.py3-none-any.whl (508 kB)\n",
" ---------------------------------------- 0.0/508.0 kB ? eta -:--:--\n",
" ------------------------------------- 508.0/508.0 kB 31.1 MB/s eta 0:00:00\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from pandas->datasets->outlines<0.1,>=0.0.43->vllm) (2.9.0.post0)\n",
"Requirement already satisfied: six>=1.5 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from python-dateutil>=2.8.2->pandas->datasets->outlines<0.1,>=0.0.43->vllm) (1.17.0)\n",
"Building wheels for collected packages: vllm\n",
" Building wheel for vllm (pyproject.toml): started\n",
" Building wheel for vllm (pyproject.toml): finished with status 'done'\n",
" Created wheel for vllm: filename=vllm-0.6.4.post1-py3-none-any.whl size=1569302 sha256=beb78c80329b7cc624604a7f488de8ad3980fb2ede5ec19d2760688e46953955\n",
" Stored in directory: c:\\users\\tr4shhh\\appdata\\local\\pip\\cache\\wheels\\13\\ca\\7b\\d22b97dfa0a4deb1a6f7a26e495d459bc71b0b4bbf6daa1c06\n",
"Successfully built vllm\n",
"Installing collected packages: sentencepiece, pytz, pyairports, py-cpuinfo, nvidia-ml-py, mpmath, zipp, xxhash, websockets, urllib3, tzdata, tqdm, sympy, sniffio, safetensors, rpds-py, regex, pyyaml, python-dotenv, pydantic-core, pycountry, pyarrow, protobuf, propcache, prometheus_client, pillow, partial-json-parser, numpy, networkx, multidict, msgspec, msgpack, MarkupSafe, llvmlite, lark, jiter, interegular, idna, httptools, h11, fsspec, frozenlist, filelock, einops, distro, diskcache, dill, cloudpickle, click, charset-normalizer, certifi, attrs, async-timeout, annotated-types, aiohappyeyeballs, yarl, uvicorn, requests, referencing, pydantic, pandas, opencv-python-headless, numba, multiprocess, jinja2, importlib_metadata, httpcore, gguf, anyio, aiosignal, watchfiles, torch, tiktoken, starlette, lm-format-enforcer, jsonschema-specifications, huggingface-hub, httpx, aiohttp, torchvision, tokenizers, prometheus-fastapi-instrumentator, openai, jsonschema, fastapi, transformers, ray, mistral_common, datasets, outlines, compressed-tensors, vllm\n",
"Successfully installed MarkupSafe-3.0.2 aiohappyeyeballs-2.4.4 aiohttp-3.11.10 aiosignal-1.3.1 annotated-types-0.7.0 anyio-4.7.0 async-timeout-5.0.1 attrs-24.2.0 certifi-2024.8.30 charset-normalizer-3.4.0 click-8.1.7 cloudpickle-3.1.0 compressed-tensors-0.8.0 datasets-3.2.0 dill-0.3.8 diskcache-5.6.3 distro-1.9.0 einops-0.8.0 fastapi-0.115.6 filelock-3.16.1 frozenlist-1.5.0 fsspec-2024.9.0 gguf-0.10.0 h11-0.14.0 httpcore-1.0.7 httptools-0.6.4 httpx-0.28.1 huggingface-hub-0.26.5 idna-3.10 importlib_metadata-8.5.0 interegular-0.3.3 jinja2-3.1.4 jiter-0.8.2 jsonschema-4.23.0 jsonschema-specifications-2024.10.1 lark-1.2.2 llvmlite-0.43.0 lm-format-enforcer-0.10.9 mistral_common-1.5.1 mpmath-1.3.0 msgpack-1.1.0 msgspec-0.18.6 multidict-6.1.0 multiprocess-0.70.16 networkx-3.4.2 numba-0.60.0 numpy-1.26.4 nvidia-ml-py-12.560.30 openai-1.57.2 opencv-python-headless-4.10.0.84 outlines-0.0.46 pandas-2.2.3 partial-json-parser-0.2.1.1.post4 pillow-10.4.0 prometheus-fastapi-instrumentator-7.0.0 prometheus_client-0.21.1 propcache-0.2.1 protobuf-5.29.1 py-cpuinfo-9.0.0 pyairports-2.1.1 pyarrow-18.1.0 pycountry-24.6.1 pydantic-2.10.3 pydantic-core-2.27.1 python-dotenv-1.0.1 pytz-2024.2 pyyaml-6.0.2 ray-2.40.0 referencing-0.35.1 regex-2024.11.6 requests-2.32.3 rpds-py-0.22.3 safetensors-0.4.5 sentencepiece-0.2.0 sniffio-1.3.1 starlette-0.41.3 sympy-1.13.1 tiktoken-0.7.0 tokenizers-0.21.0 torch-2.5.1 torchvision-0.20.1 tqdm-4.67.1 transformers-4.47.0 tzdata-2024.2 urllib3-2.2.3 uvicorn-0.32.1 vllm-0.6.4.post1 watchfiles-1.0.1 websockets-14.1 xxhash-3.5.0 yarl-1.18.3 zipp-3.21.0\n"
]
}
],
"source": [
"!pip install vllm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Paso 2: Verificación de dependencias\n",
"Asegúrate de que las dependencias requeridas, como PyTorch, estén instaladas. Si no tienes PyTorch, instálalo con el comando apropiado para tu entorno."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Looking in indexes: https://download.pytorch.org/whl/cpu\n",
"Requirement already satisfied: torch in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (2.5.1)\n",
"Requirement already satisfied: torchvision in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (0.20.1)\n",
"Collecting torchaudio\n",
" Downloading https://download.pytorch.org/whl/cpu/torchaudio-2.5.1%2Bcpu-cp310-cp310-win_amd64.whl (2.4 MB)\n",
" ---------------------------------------- 0.0/2.4 MB ? eta -:--:--\n",
" ----- ---------------------------------- 0.3/2.4 MB 19.8 MB/s eta 0:00:01\n",
" ------------- -------------------------- 0.8/2.4 MB 12.4 MB/s eta 0:00:01\n",
" --------------------- ------------------ 1.3/2.4 MB 11.5 MB/s eta 0:00:01\n",
" ------------------------------ --------- 1.8/2.4 MB 11.5 MB/s eta 0:00:01\n",
" --------------------------------------- 2.4/2.4 MB 11.9 MB/s eta 0:00:01\n",
" ---------------------------------------- 2.4/2.4 MB 11.0 MB/s eta 0:00:00\n",
"Requirement already satisfied: jinja2 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (3.1.4)\n",
"Requirement already satisfied: filelock in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (3.16.1)\n",
"Requirement already satisfied: fsspec in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (2024.9.0)\n",
"Requirement already satisfied: networkx in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (3.4.2)\n",
"Requirement already satisfied: sympy==1.13.1 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (1.13.1)\n",
"Requirement already satisfied: typing-extensions>=4.8.0 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torch) (4.12.2)\n",
"Requirement already satisfied: mpmath<1.4,>=1.1.0 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from sympy==1.13.1->torch) (1.3.0)\n",
"Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torchvision) (10.4.0)\n",
"Requirement already satisfied: numpy in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from torchvision) (1.26.4)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in e:\\proyects\\llm_tests_and_code_execution\\venv\\lib\\site-packages (from jinja2->torch) (3.0.2)\n",
"Installing collected packages: torchaudio\n",
"Successfully installed torchaudio-2.5.1+cpu\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"[notice] A new release of pip is available: 23.0.1 -> 24.3.1\n",
"[notice] To update, run: python.exe -m pip install --upgrade pip\n"
]
}
],
"source": [
"!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Paso 3: Carga del modelo local\n",
"Usaremos la clase `LLM` de vLLM para cargar el modelo desde una carpeta local. En este ejemplo, asumimos que el modelo está almacenado en un directorio llamado `./modelo_local`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from vllm import LLM\n",
"\n",
"# Ruta al modelo local\n",
"ruta_modelo = \"./modelo_local\"\n",
"\n",
"# Inicialización del modelo\n",
"llm = LLM(model=ruta_modelo)\n",
"print(\"Modelo cargado exitosamente.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Paso 4: Realización de inferencias\n",
"Con el modelo cargado, podemos realizar inferencias utilizando la función `generate`. Proporciona un texto de entrada y el modelo generará una respuesta."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Texto de entrada\n",
"texto_entrada = \"¿Cuál es la capital de Francia?\"\n",
"\n",
"# Generar respuesta\n",
"respuesta = llm.generate(texto_entrada)\n",
"print(\"Respuesta generada:\", respuesta)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Paso 5: Ajuste de parámetros (opcional)\n",
"Puedes ajustar parámetros como la temperatura o el número máximo de tokens generados para personalizar el comportamiento del modelo."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Ajuste de parámetros\n",
"respuesta_ajustada = llm.generate(\n",
" texto_entrada,\n",
" temperature=0.7, # Controla la aleatoriedad\n",
" max_tokens=100 # Límite de tokens generados\n",
")\n",
"print(\"Respuesta ajustada:\", respuesta_ajustada)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Paso 6: Conclusión\n",
"Has aprendido a instalar y usar vLLM para cargar un modelo local y realizar inferencias. Puedes explorar más características en la [documentación oficial de vLLM](https://docs.vllm.ai/en/latest/).\n",
"\n",
"¡Ahora estás listo para usar modelos de lenguaje eficientemente con vLLM!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 4
}