fn_registry/python/functions/ml/safetensors_inspect.md at master


egutierrez 47fac22230 chore: auto-commit (799 archivos)


Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-14 00:28:20 +02:00

3.3 KiB


name: safetensors_inspect
kind: function
lang:
domain:
version: 1.0.0
purity: impure
signature: "def safetensors_inspect(path: str) -> dict"
description: Reads ONLY the header of a .safetensors file without loading the tensors into RAM. Returns the model metadata, the list of tensors with dtype/shape/offsets, the total size, and the tensor count. Useful for inspecting multi-GB checkpoints without exhausting memory.
tags: [safetensors, model, inspect, header, huggingface, checkpoint, dtype, shape, pendiente-usar]
uses_functions:
uses_types:
returns:
returns_optional: false
error_type: error_go_core
imports:
params:
- name: path
  desc: path to the .safetensors file to inspect (absolute or relative)
output: dict with keys path (str, absolute path), metadata (dict with the header's __metadata__), tensors (list[dict] with name/dtype/shape/data_offsets per tensor), total_size_bytes (int), n_tensors (int)
tested: true
tests:
- n_tensors reflects the number of tensors in the header
- total_size_bytes reflects the actual size of the file
- metadata returns the contents of __metadata__
- tensors is a list with one entry per tensor in the header
- each tensor has dtype, shape and data_offsets
- result path is the absolute path of the file
- FileNotFoundError if the file does not exist
- ValueError if the header is not valid JSON
- ValueError if the file is empty
- if there is no __metadata__, metadata returns an empty dict
test_file_path: python/functions/ml/tests/test_safetensors_inspect.py
file_path: python/functions/ml/safetensors_inspect.py
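
The test cases listed above need small .safetensors files to run against. As a rough sketch (not the contents of the repository's test file), a fixture could be built with the standard library alone. The helper name `build_safetensors_bytes` and its argument layout are hypothetical, and the sketch skips the header padding that real writers may add.

```python
import json
import struct
from typing import Dict, List, Optional, Tuple


def build_safetensors_bytes(
    tensors: Dict[str, Tuple[str, List[int], bytes]],
    metadata: Optional[Dict[str, str]] = None,
) -> bytes:
    """Serialize raw payloads into the safetensors layout (hypothetical test helper).

    `tensors` maps a tensor name to (dtype string, shape, raw bytes).
    """
    header = {}
    if metadata is not None:
        header["__metadata__"] = metadata
    payload = b""
    for name, (dtype, shape, raw) in tensors.items():
        begin = len(payload)
        payload += raw
        # Offsets are relative to the start of the data block, end exclusive.
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            "data_offsets": [begin, len(payload)],
        }
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    # 8-byte little-endian length prefix, then the JSON header, then the data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload
```

Writing the returned bytes to a temporary file gives a checkpoint of a few dozen bytes whose expected n_tensors, metadata and total_size_bytes are known in advance.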

Example

from ml.safetensors_inspect import safetensors_inspect

info = safetensors_inspect("/models/sd-v1-5/model.safetensors")
print(info["n_tensors"])          # 1344
print(info["total_size_bytes"])   # 3_975_733_952  (~3.7 GiB)
print(info["metadata"])           # {"format": "pt", "model_type": "stable_diffusion"}

# Show the first 5 tensors
for t in info["tensors"][:5]:
    print(t["name"], t["dtype"], t["shape"])
# model.diffusion_model.input_blocks.0.0.weight F16 [320, 4, 3, 3]
# model.diffusion_model.input_blocks.0.0.bias   F16 [320]
# ...
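
Since each entry in `tensors` carries its `data_offsets`, the result can also be summarized without touching the tensor data. The following is a small sketch that totals bytes per dtype. The helper name `bytes_by_dtype` and the sample entries are made up for illustration and are not part of the function's API.

```python
from collections import defaultdict
from typing import Dict, List


def bytes_by_dtype(tensors: List[dict]) -> Dict[str, int]:
    """Sum the stored byte size of every tensor, grouped by dtype string."""
    totals: Dict[str, int] = defaultdict(int)
    for t in tensors:
        begin, end = t["data_offsets"]
        totals[t["dtype"]] += end - begin
    return dict(totals)


# Illustrative entries shaped like info["tensors"] (values are invented).
sample = [
    {"name": "a.weight", "dtype": "F16", "shape": [4, 2], "data_offsets": [0, 16]},
    {"name": "a.bias", "dtype": "F16", "shape": [4], "data_offsets": [16, 24]},
    {"name": "b.index", "dtype": "I64", "shape": [2], "data_offsets": [24, 40]},
]
print(bytes_by_dtype(sample))
```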

Notes

safetensors format

[8 bytes: uint64 LE = N (length of the JSON header)]
[N bytes: JSON with metadata and tensor descriptors]
[binary tensor data (not read)]

The JSON has this structure:

{
  "__metadata__": {"format": "pt", ...},
  "tensor_name": {
    "dtype": "F32",
    "shape": [1024, 768],
    "data_offsets": [0, 3145728]
  },
  ...
}
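
The layout above suggests how a header-only reader could work. The sketch below is one possible shape for it, assuming the result keys and error cases listed in this document. It is not the code in safetensors_inspect.py, and the name `read_header_sketch` is hypothetical.

```python
import json
import os
import struct


def read_header_sketch(path: str) -> dict:
    """Parse only the length prefix and the JSON header of a .safetensors file."""
    abs_path = os.path.abspath(path)
    total_size = os.path.getsize(abs_path)  # raises FileNotFoundError if missing
    with open(abs_path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            raise ValueError("file is empty or too short to hold a header length")
        (header_len,) = struct.unpack("<Q", prefix)  # uint64, little-endian
        if header_len > total_size - 8:
            raise ValueError("declared header length exceeds the file size")
        raw = f.read(header_len)  # the tensor data after this point is never read
    try:
        header = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"header is not valid JSON: {exc}") from exc
    if not isinstance(header, dict):
        raise ValueError("header JSON must be an object")
    # __metadata__ is optional; every other key describes one tensor.
    metadata = header.pop("__metadata__", None) or {}
    tensors = [
        {
            "name": name,
            "dtype": desc["dtype"],
            "shape": desc["shape"],
            "data_offsets": desc["data_offsets"],
        }
        for name, desc in header.items()
    ]
    return {
        "path": abs_path,
        "metadata": metadata,
        "tensors": tensors,
        "total_size_bytes": total_size,
        "n_tensors": len(tensors),
    }
```

Only 8 + N bytes are read regardless of how large the data block is, which is why the approach scales to multi-GB checkpoints.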

data_offsets is a [begin, end) pair relative to the start of the data block (after the header), not to the start of the file. For lazy access to a specific tensor: file_offset = 8 + header_len + data_offsets[0].
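
As an illustration of that offset formula, the sketch below seeks straight to one tensor and reads only its bytes. The function name `read_tensor_bytes` is hypothetical and is not part of safetensors_inspect, which never reads tensor data.

```python
import json
import struct


def read_tensor_bytes(path: str, tensor_name: str) -> bytes:
    """Return the raw bytes of a single tensor without reading the others."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        begin, end = header[tensor_name]["data_offsets"]
        # Offsets are relative to the data block, which starts after the
        # 8-byte length prefix and the header_len-byte JSON header.
        f.seek(8 + header_len + begin)
        return f.read(end - begin)
```

The returned bytes are still undecoded. Interpreting them requires the entry's dtype and shape.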

Why not use the `safetensors` library

This function uses only the standard library (struct, json, os), so it needs no extra installs and can run during fn index. The official Hugging Face safetensors library would be an additional dependency, and its loaders such as load_file materialize the tensors in RAM; its safe_open reader can fetch tensors lazily but requires choosing a framework backend. This implementation only reads the header and guarantees that no tensor data is loaded.

Common dtypes

dtype	description
F32	float32 (full precision)
BF16	bfloat16 (training, Ampere and newer GPUs)
F16	float16 (inference)
I32	int32
I64	int64
U8	uint8
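
The element widths of these dtypes allow a quick consistency check on a header entry: the span of data_offsets should equal the number of elements times the bytes per element. The sketch below covers only the dtypes in the table. The names `DTYPE_BYTES`, `expected_nbytes` and `offsets_match` are illustrative.

```python
from math import prod
from typing import List

# Bytes per element for the dtypes listed in the table above.
DTYPE_BYTES = {"F32": 4, "BF16": 2, "F16": 2, "I32": 4, "I64": 8, "U8": 1}


def expected_nbytes(dtype: str, shape: List[int]) -> int:
    """Bytes a dense tensor of this dtype and shape occupies (a scalar has shape [])."""
    return prod(shape) * DTYPE_BYTES[dtype]


def offsets_match(entry: dict) -> bool:
    """Check that the data_offsets span agrees with dtype and shape."""
    begin, end = entry["data_offsets"]
    return end - begin == expected_nbytes(entry["dtype"], entry["shape"])


# The entry from the JSON structure shown earlier: 1024 * 768 * 4 bytes.
entry = {"dtype": "F32", "shape": [1024, 768], "data_offsets": [0, 3145728]}
print(expected_nbytes(entry["dtype"], entry["shape"]))  # 3145728
print(offsets_match(entry))  # True
```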
