fn_registry/python/functions/ml/vram_budget.md at ea6a3ec8a5efdf291c61313ca536707eaa7072db

egutierrez 47fac22230 chore: auto-commit (799 archivos)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-14 00:28:20 +02:00

3.8 KiB

name, kind, lang, domain, version, purity, signature, description, tags, uses_functions, uses_types, returns, returns_optional, error_type, imports, params, output, tested, tests, test_file_path, file_path

name: vram_budget

kind: function

version: 1.0.0

purity: pure

signature: def vram_budget(gpu_vram_total_mb: int, model_type: str, quantization: str, n_loras: int = 0, width: int = 1024, height: int = 1024, batch_size: int = 1) -> dict

description: Estimates the VRAM required to run an image-generation model using heuristics tabulated by (model_type, quantization). Returns the estimated VRAM, whether it fits on the given GPU, the available headroom, and warnings for incompatibilities (lora+quant) or insufficient VRAM. Pure function: lookup and arithmetic only, with no GPU or runtime access.

tags: vram, gpu, budget, stable-diffusion, flux, sdxl, quantization, lora, estimation, pure, pendiente-usar

returns_optional: false

params:

name	desc
gpu_vram_total_mb	Total VRAM of the target GPU in MB. Obtain it with gpu_info() or torch.cuda.get_device_properties().
model_type	Model type. Supported values: sd15, sdxl, flux_dev, flux_schnell, sd3, qwen_image. Combinations outside the table return required_mb=-1.
quantization	Quantization scheme. Values: fp16, q8_0, q4_0 (and the variants q4_k_m, q5_k_m, q6_k). Affects both the base size and LoRA compatibility.
n_loras	Number of LoRAs loaded simultaneously in VRAM. Each LoRA adds ~300 MB. With quantization != fp16, an incompatibility warning is emitted.
width	Width in pixels of the generated image. Affects the latent overhead (higher resolution means more VRAM for activations).
height	Height in pixels of the generated image.
batch_size	Number of images generated in parallel. The latent overhead scales linearly with batch_size.

output: dict with: required_mb (int, -1 if the combination is unknown), fits (bool, True if it fits in gpu_vram_total_mb), headroom_mb (int, negative if it does not fit, 0 if the combination is unknown), warning (str or None, describing a lora+quant incompatibility or a VRAM deficit).

tested: true

tests:

sdxl fp16 fits in 24gb with positive headroom

flux fp16 does not fit in 8gb, warning with deficit

lora with incompatible quantization emits warning

unknown combo returns required_mb -1 and a warning

test_file_path: python/functions/ml/tests/test_vram_budget.py

file_path: python/functions/ml/vram_budget.py
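
The description of `gpu_vram_total_mb` points to `torch.cuda.get_device_properties()` as one way to obtain the input value. A minimal sketch of that step follows, assuming PyTorch may or may not be installed; the helper names `bytes_to_mib` and `detect_total_vram_mb` are hypothetical and not part of the registry. It treats "MB" as MiB, which matches the 24576 used for a 24 GB card in the example.

```python
from typing import Optional


def bytes_to_mib(n_bytes: int) -> int:
    """Convert a byte count to whole MiB (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes // (1024 * 1024)


def detect_total_vram_mb(device_index: int = 0) -> Optional[int]:
    """Return total VRAM of a CUDA device in MiB, or None if unavailable.

    Hypothetical helper: it performs hardware queries, so unlike
    vram_budget it is not a pure function.
    """
    try:
        import torch  # optional dependency, imported lazily
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(device_index)
    # total_memory is reported in bytes.
    return bytes_to_mib(props.total_memory)
```

Keeping the hardware query in a separate helper leaves `vram_budget` itself free of GPU access, so it can still be tested on machines without a GPU.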

Example

from ml.vram_budget import vram_budget

# SDXL fp16 on 24 GB: fits
r = vram_budget(24576, "sdxl", "fp16")
# {"required_mb": 6960, "fits": True, "headroom_mb": 17616, "warning": None}

# Flux dev fp16 on 8 GB: does not fit
r = vram_budget(8192, "flux_dev", "fp16")
# {"required_mb": 23512, "fits": False, "headroom_mb": -15320, "warning": "needs +15320 MB ..."}

# Flux dev q4_0 on 8 GB with 1 LoRA: incompatible
r = vram_budget(8192, "flux_dev", "q4_0", n_loras=1)
# {"required_mb": 7300, "fits": True, "headroom_mb": 892,
#  "warning": "lora+quantization incompatible — usa fp16 para cargar LoRAs con flux_dev"}

# Unknown combination
r = vram_budget(24576, "mi_modelo", "q99_k")
# {"required_mb": -1, "fits": False, "headroom_mb": 0,
#  "warning": "unknown model/quant combo: ('mi_modelo', 'q99_k')"}
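
The result dicts above share one shape, and the third case shows that `fits` can be True while `warning` is still set. The sketch below is one possible way for a caller to consume that shape; `VramBudgetResult` and `describe` are hypothetical names introduced for illustration, not part of the registry.

```python
from typing import Optional, TypedDict


class VramBudgetResult(TypedDict):
    """Shape of the dict documented under `output` (hypothetical type name)."""
    required_mb: int
    fits: bool
    headroom_mb: int
    warning: Optional[str]


def describe(result: VramBudgetResult) -> str:
    """Render a one-line summary of a vram_budget result."""
    # required_mb == -1 is the documented sentinel for an unknown combination.
    if result["required_mb"] == -1:
        return "unknown model/quantization combination"
    status = "fits" if result["fits"] else "does not fit"
    line = (
        f"{status}: needs {result['required_mb']} MB, "
        f"headroom {result['headroom_mb']} MB"
    )
    # A warning can accompany a fitting configuration, so check it separately.
    if result["warning"]:
        line += f" (warning: {result['warning']})"
    return line
```

Checking the sentinel before reading `fits` avoids treating an unknown combination as an ordinary shortfall.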

Notes

The _MODEL_VRAM_MB table is an initial estimate; users should calibrate it against real measurements (nvidia-smi during inference).
The latent overhead is computed as w*h/64 MB for SD/SDXL/SD3 and w*h/32 MB for Flux models (whose latent space has more channels).
The LoRA warning takes priority over the does-not-fit warning: if there is a lora+quant incompatibility, that warning is emitted even when the model does not fit.
To obtain gpu_vram_total_mb at runtime, use gpu_info_py_ml (impure).
Pure function: same input, same output. No I/O or external dependencies.
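
The lookup, the per-LoRA cost, and the warning precedence described in these notes can be sketched roughly as follows. This is not the registry implementation: the base sizes in `EXAMPLE_TABLE` are placeholders rather than the real `_MODEL_VRAM_MB` values, the warning strings are simplified, the latent overhead is taken as a precomputed argument instead of being derived from width, height, and batch size, and treating an exact fit as fitting is an assumption.

```python
from typing import Dict, Optional, Tuple

LORA_MB = 300  # approximate per-LoRA cost quoted in the n_loras description

# Placeholder base sizes in MB. Illustrative values only,
# not the contents of the real _MODEL_VRAM_MB table.
EXAMPLE_TABLE: Dict[Tuple[str, str], int] = {
    ("sdxl", "fp16"): 6500,
    ("flux_dev", "fp16"): 23000,
    ("flux_dev", "q4_0"): 6500,
}


def budget_sketch(
    gpu_vram_total_mb: int,
    model_type: str,
    quantization: str,
    n_loras: int = 0,
    latent_overhead_mb: int = 0,  # assumed precomputed by the caller
) -> dict:
    key = (model_type, quantization)
    base_mb = EXAMPLE_TABLE.get(key)
    if base_mb is None:
        # Unknown combination: sentinel values as documented under `output`.
        return {
            "required_mb": -1,
            "fits": False,
            "headroom_mb": 0,
            "warning": f"unknown model/quant combo: {key}",
        }

    required_mb = base_mb + n_loras * LORA_MB + latent_overhead_mb
    headroom_mb = gpu_vram_total_mb - required_mb
    fits = headroom_mb >= 0  # assumption: an exact fit counts as fitting

    warning: Optional[str] = None
    if not fits:
        warning = f"needs +{-headroom_mb} MB"
    if n_loras > 0 and quantization != "fp16":
        # Assigned last so it overrides any deficit warning.
        warning = "lora+quantization incompatible"

    return {
        "required_mb": required_mb,
        "fits": fits,
        "headroom_mb": headroom_mb,
        "warning": warning,
    }
```

Assigning the LoRA warning last is one simple way to get the stated precedence: a quantized model with LoRAs reports the incompatibility whether or not it fits.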
