chore: sync from fn-registry agent

fn-registry agent
2026-05-30 17:28:44 +02:00
commit 655db6839b
19 changed files with 5553 additions and 0 deletions
@@ -0,0 +1,230 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "aabd4af1",
"metadata": {},
"source": [
"# 01 — Panorama single-image-to-3D (2024-2026)\n",
"\n",
"**Objetivo:** mapear modelos open source serios para generar 3D (mesh / gaussians / NeRF) desde **una sola imagen**. Sirve de base para:\n",
"\n",
"1. Decidir cual integramos primero en el backend Python de `image_to_3d_studio`.\n",
"2. Documentar trade-offs (latencia, VRAM, calidad, licencia) que el bucle reactivo del registry pueda referenciar.\n",
"3. Generar contratos / wrappers comunes en `projects/imagegen/`.\n",
"\n",
"**Hardware target:** WSL2 Linux + CUDA (RTX local).\n",
"\n",
"**Vault de outputs:** `projects/imagegen/vaults/imagegen_models/` (mismo que diffusion 2D)."
]
},
{
"cell_type": "markdown",
"id": "5f7f4a7c",
"metadata": {},
"source": [
"## Taxonomia\n",
"\n",
"Los modelos varian en 3 ejes:\n",
"\n",
"| Eje | Opciones | Notas |\n",
"|---|---|---|\n",
"| **Representacion 3D** | mesh (GLB/OBJ), 3D gaussians (PLY), NeRF, SDF/voxel, multi-view RGB | Mesh = mas portable. Gaussians = render rapido pero pipeline raro. |\n",
"| **Pipeline** | feed-forward 1-pass, multi-view diffusion + reconstruction, optimization per-image | 1-pass = segundos. Multi-view = decenas de segundos. Optim = minutos. |\n",
"| **Texturizado** | albedo baked, PBR (albedo+normal+roughness), sin textura | Hunyuan3D-2 y Trellis = textura aparte. TripoSR = baked vertex color basico. |\n",
"\n",
"**Familias principales:**\n",
"\n",
"1. **LRM-likes** (feed-forward triplane→mesh): TripoSR, SF3D, OpenLRM, InstantMesh-recon. Rapidos.\n",
"2. **Multi-view diffusion + recon**: Zero123++, Wonder3D, SyncDreamer, InstantMesh. Mejor calidad geometrica con vistas multiples consistentes.\n",
"3. **Latent 3D structures** (Microsoft Trellis): structured latents (sparse voxels + features) → decoded a mesh/gaussian/NeRF.\n",
"4. **3D-native diffusion**: Hunyuan3D-2, Shap-E, Direct3D. Diffusion entrenado directamente en representacion 3D.\n",
"5. **3D Gaussian feed-forward**: LGM, GRM, GS-LRM. Bueno para gaussians, conversion a mesh pierde calidad."
]
},
{
"cell_type": "markdown",
"id": "6112267f",
"metadata": {},
"source": [
"## Tabla comparativa\n",
"\n",
"| Modelo | Autor / año | Repr. | Latencia GPU* | VRAM | Licencia | Repo | Output nativo |\n",
"|---|---|---|---|---|---|---|---|\n",
"| **TripoSR** | Stability+Tripo 2024-03 | mesh | ~0.5s | ~6 GB | MIT | github.com/VAST-AI-Research/TripoSR | OBJ (vertex color) |\n",
"| **SF3D (StableFast3D)** | Stability 2024-08 | mesh PBR | ~0.5s | ~7 GB | SAIL-1.0 (no comercial) | github.com/Stability-AI/stable-fast-3d | GLB (PBR baked) |\n",
"| **InstantMesh** | Tencent 2024-04 | mesh | ~10s | ~12 GB | Apache 2.0 | github.com/TencentARC/InstantMesh | OBJ + textura |\n",
"| **Wonder3D** | HKU 2024-01 | mesh + normals | ~3min (optim) | ~16 GB | AGPL-3.0 | github.com/xxlong0/Wonder3D | OBJ |\n",
"| **Trellis** | Microsoft 2024-12 | mesh / 3DGS / RF | ~30-60s | ~16 GB | MIT (modelo: research-only) | github.com/microsoft/TRELLIS | GLB / PLY |\n",
"| **Hunyuan3D-2** | Tencent 2025-01 | mesh + textura | ~25s mesh + ~20s tex | ~12 GB mesh / ~24 GB full | Tencent Community (uso libre <100M MAU) | github.com/Tencent/Hunyuan3D-2 | GLB con PBR |\n",
"\n",
"*Latencia aproximada en RTX 4090, batch=1, sin optimizaciones extra (FP16 por defecto).\n",
"\n",
"**Lectura rapida:**\n",
"\n",
"- **Velocidad** → TripoSR / SF3D (sub-segundo).\n",
"- **Calidad mesh** → Hunyuan3D-2 (mejor textura), Trellis (mejor geometria limpia).\n",
"- **Mas balance / variedad de outputs** → Trellis (mesh+gaussian+NeRF del mismo modelo).\n",
"- **Mas restrictivo legalmente** → SF3D (no comercial), Wonder3D (AGPL contagia).\n",
"- **Mas seguro para producto** → TripoSR (MIT), InstantMesh (Apache 2.0), Trellis (MIT codigo).\n",
"- **Mejor textura PBR** → Hunyuan3D-2, SF3D."
]
},
{
"cell_type": "markdown",
"id": "9d5dfabe",
"metadata": {},
"source": [
"## Decision de roadmap (propuesta)\n",
"\n",
"**Fase A — backend Python con TripoSR.** Razones:\n",
"\n",
"- MIT, sin friccion legal.\n",
"- Sub-segundo en GPU, perfecto para iterar UI C++.\n",
"- VRAM baja, libera GPU para diffusion 2D paralelo.\n",
"- Output OBJ trivial de convertir a GLB con `trimesh`.\n",
"\n",
"**Fase B — anadir Hunyuan3D-2 (calidad).** Razones:\n",
"\n",
"- SOTA open en textura.\n",
"- Licencia community OK para uso personal.\n",
"- Necesita VRAM, pero el RTX lo aguanta.\n",
"\n",
"**Fase C — anadir Trellis (variedad output).** Razones:\n",
"\n",
"- MIT codigo (modelo solo research, OK para nosotros).\n",
"- Mismo modelo da mesh + 3DGS + NeRF → util para experimentar con gaussians sin re-instalar nada.\n",
"- Pesado pero VRAM razonable.\n",
"\n",
"**Aplazado:**\n",
"\n",
"- **SF3D** — licencia no comercial nos bloquearia despues.\n",
"- **Wonder3D** — AGPL contagia el repo entero.\n",
"- **InstantMesh** — buen Apache pero queda eclipsado por Trellis y Hunyuan3D-2."
]
},
{
"cell_type": "markdown",
"id": "79b13fbf",
"metadata": {},
"source": [
"## Contrato compartido (propuesta tipo del registry)\n",
"\n",
"Espejo del `GenerationConfig_py_ml` de diffusion 2D:\n",
"\n",
"```python\n",
"@dataclass\n",
"class ImageTo3DConfig:\n",
" model_id: str # 'triposr' | 'hunyuan3d-2' | 'trellis'\n",
" image_path: str # o bytes\n",
" seed: int = 0\n",
" foreground_ratio: float = 0.85 # bbox crop antes de inferir\n",
" mc_resolution: int = 256 # marching cubes (TripoSR / Trellis)\n",
" texture: bool = True # Hunyuan3D-2: skip texture pass si False\n",
" output_format: str = 'glb' # 'glb' | 'obj' | 'ply'\n",
" output_path: str = '' # vault destino\n",
"```\n",
"\n",
"Cuando estabilice, se promueve a `python/types/ml/image_to_3d_config.py` + `.md` y se anade `tags: [image-to-3d, imagegen]`."
]
},
{
"cell_type": "markdown",
"id": "ac91228a",
"metadata": {},
"source": [
"## Integracion con app C++\n",
"\n",
"Arquitectura:\n",
"\n",
"```\n",
"image_to_3d_studio (C++/ImGui)\n",
" POST /generate (model_id, image bytes, config) →\n",
" backend FastAPI (uvicorn, 127.0.0.1:8600)\n",
" dispatcher(model_id)\n",
" → triposr_backend() / triposr / 3.10+\n",
" → hunyuan3d_backend() / hunyuan3d-2\n",
" → trellis_backend() / TRELLIS\n",
" ← bytes GLB (Content-Type: model/gltf-binary)\n",
" C++ guarda en local_files/cache/{hash}.glb\n",
" Viewer GLB (tinygltf + OpenGL) en panel ImGui via FBO\n",
"```\n",
"\n",
"Aislamiento: cada modelo en su `.venv` separado dentro de `backend/envs/<model>/` para evitar conflictos de torch/CUDA. El dispatcher arranca el subproceso correcto bajo demanda."
]
},
{
"cell_type": "markdown",
"id": "e9949f01",
"metadata": {},
"source": [
"## Verificacion entorno\n",
"\n",
"Comprobar que torch + CUDA funcionan antes de cargar modelos pesados."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c506c304",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"print(f'torch: {torch.__version__}')\n",
"print(f'cuda available: {torch.cuda.is_available()}')\n",
"if torch.cuda.is_available():\n",
" print(f'device: {torch.cuda.get_device_name(0)}')\n",
" print(f'vram total: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')\n",
" print(f'cuda runtime: {torch.version.cuda}')\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "518eb741",
"metadata": {},
"outputs": [],
"source": [
"# Acceso al registry desde el kernel\n",
"fn_search('imagegen')[:5]\n"
]
},
{
"cell_type": "markdown",
"id": "bcbd7d5d",
"metadata": {},
"source": [
"## Pasos siguientes\n",
"\n",
"- **02_smoke_triposr.ipynb** — clonar TripoSR, generar 1 mesh, medir latencia/VRAM, dump GLB en vault.\n",
"- **03_smoke_hunyuan3d.ipynb** — repetir con Hunyuan3D-2.\n",
"- **04_smoke_trellis.ipynb** — repetir con Trellis (probar los 3 outputs).\n",
"- **05_benchmark.ipynb** — set fijo de 10 imagenes, cruzar metricas (latencia, VRAM peak, mesh stats: #vertices, manifoldness, area, watertight).\n",
"- Promover contrato `ImageTo3DConfig` a `python/types/ml/`.\n",
"- Refactor backend `image_to_3d_studio/backend/` con dispatcher final."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
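The shared contract and the dispatcher described in notebook 01 can be sketched together as follows. This is a minimal sketch under stated assumptions, not the registry's implementation: `BACKENDS`, `register_backend`, `dispatch` and `cache_key` are hypothetical names, the TripoSR backend is a stub that returns placeholder bytes, and hashing the image plus config with SHA-256 is only one assumed way to derive the `{hash}` in `local_files/cache/{hash}.glb`.

```python
import hashlib
from dataclasses import asdict, dataclass
from typing import Callable, Dict

SUPPORTED_FORMATS = ('glb', 'obj', 'ply')


@dataclass(frozen=True)
class ImageTo3DConfig:
    """Same fields as the proposed contract, plus basic validation (an addition)."""
    model_id: str
    image_path: str
    seed: int = 0
    foreground_ratio: float = 0.85
    mc_resolution: int = 256
    texture: bool = True
    output_format: str = 'glb'
    output_path: str = ''

    def __post_init__(self):
        if not 0.0 < self.foreground_ratio <= 1.0:
            raise ValueError('foreground_ratio must be in (0, 1]')
        if self.output_format not in SUPPORTED_FORMATS:
            raise ValueError(f'unsupported output_format: {self.output_format}')


Backend = Callable[[bytes, ImageTo3DConfig], bytes]
BACKENDS: Dict[str, Backend] = {}  # hypothetical registry: model_id -> backend


def register_backend(model_id: str):
    """Decorator that records a backend under its model_id."""
    def decorator(fn: Backend) -> Backend:
        BACKENDS[model_id] = fn
        return fn
    return decorator


@register_backend('triposr')
def triposr_backend(image: bytes, cfg: ImageTo3DConfig) -> bytes:
    # Stub only: a real backend would run inference inside the model's own venv.
    return b'glTF' + image[:4]


def dispatch(image: bytes, cfg: ImageTo3DConfig) -> bytes:
    """Route a request to the backend registered for cfg.model_id."""
    try:
        backend = BACKENDS[cfg.model_id]
    except KeyError:
        raise ValueError(f'unknown model_id: {cfg.model_id}') from None
    return backend(image, cfg)


def cache_key(image: bytes, cfg: ImageTo3DConfig) -> str:
    """Derive a cache key from the image bytes and the settings that affect the mesh."""
    digest = hashlib.sha256(image)
    # Paths are excluded: they say where files live, not what gets generated.
    fields = {k: v for k, v in asdict(cfg).items()
              if k not in ('image_path', 'output_path')}
    digest.update(repr(sorted(fields.items())).encode())
    return digest.hexdigest()[:16]
```

Excluding the two path fields from the key means the same image and settings map to the same cache entry wherever the files live; whether that is the right policy is a design choice for the backend.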
@@ -0,0 +1,304 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "c9b49319",
"metadata": {},
"source": [
"# 02 — Smoke TripoSR\n",
"\n",
"**Objetivo:** validar pipeline single-image-to-3D end-to-end con el modelo mas barato (MIT, sub-segundo, ~6 GB VRAM). Antes de pelearnos con Hunyuan3D-2 y Trellis, confirmar:\n",
"\n",
"1. Repo upstream clona y compila sin drama.\n",
"2. Modelo se descarga (HF hub) y carga en GPU.\n",
"3. Inferencia genera mesh valido (watertight, manifold-ish).\n",
"4. Latencia + VRAM peak coinciden con lo prometido (~0.5s / ~6 GB).\n",
"5. Export GLB usable por el viewer C++.\n",
"\n",
"**Output:** GLB en `vaults/imagegen_models/image_to_3d/triposr/smoke_<seed>.glb`.\n",
"\n",
"## Plan\n",
"\n",
"- Seccion A: setup (clone repo + install deps + descarga checkpoint).\n",
"- Seccion B: imagen sample (de cualquier vault o descargada).\n",
"- Seccion C: inferencia + medicion (warm-up + 3 runs).\n",
"- Seccion D: mesh stats (vertices, faces, watertight, bounds).\n",
"- Seccion E: export GLB con trimesh + verificacion."
]
},
{
"cell_type": "markdown",
"id": "a943b390",
"metadata": {},
"source": [
"## A. Setup\n",
"\n",
"TripoSR vive en `github.com/VAST-AI-Research/TripoSR`. Clonamos a `sources/` (gitignored, fuera del repo del analysis). Deps extra: `rembg` (segmentacion background), `omegaconf`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4e94d4d4",
"metadata": {},
"outputs": [],
"source": [
"import os, sys, subprocess, pathlib\n",
"\n",
"REGISTRY_ROOT = pathlib.Path(os.environ['FN_REGISTRY_ROOT'])\n",
"SOURCES_DIR = REGISTRY_ROOT / 'sources' / 'TripoSR'\n",
"VAULT_DIR = pathlib.Path.home() / 'vaults' / 'imagegen_models' / 'image_to_3d' / 'triposr'\n",
"VAULT_DIR.mkdir(parents=True, exist_ok=True)\n",
"\n",
"if not SOURCES_DIR.exists():\n",
" SOURCES_DIR.parent.mkdir(parents=True, exist_ok=True)\n",
" subprocess.run(['git', 'clone', '--depth=1',\n",
" 'https://github.com/VAST-AI-Research/TripoSR.git',\n",
" str(SOURCES_DIR)], check=True)\n",
"\n",
"# Anadir al sys.path para importar el paquete tsr/\n",
"if str(SOURCES_DIR) not in sys.path:\n",
" sys.path.insert(0, str(SOURCES_DIR))\n",
"\n",
"print('SOURCES_DIR:', SOURCES_DIR)\n",
"print('VAULT_DIR: ', VAULT_DIR)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "66dbea10",
"metadata": {},
"outputs": [],
"source": [
"# Deps extra del repo. Las instalamos en el .venv del analysis con uv.\n",
"# omegaconf, einops, rembg vienen en su requirements.txt. xatlas para UV unwrap (opcional).\n",
"import subprocess\n",
"subprocess.run(['uv', 'add', 'omegaconf', 'einops', 'rembg[gpu]', 'xatlas', 'onnxruntime-gpu'],\n",
" cwd=str(REGISTRY_ROOT / 'projects' / 'imagegen' / 'analysis' / 'spike_image_to_3d'),\n",
" check=True)\n",
"print('deps OK')\n"
]
},
{
"cell_type": "markdown",
"id": "776fe837",
"metadata": {},
"source": [
"## B. Imagen sample\n",
"\n",
"Usamos `examples/chair.png` del propio repo (incluido en el clone)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3ab0f81e",
"metadata": {},
"outputs": [],
"source": [
"from PIL import Image\n",
"sample_path = SOURCES_DIR / 'examples' / 'chair.png'\n",
"assert sample_path.exists(), f'sample faltante: {sample_path}'\n",
"img = Image.open(sample_path).convert('RGB')\n",
"print('imagen:', img.size, img.mode)\n",
"img.thumbnail((256, 256))\n",
"img\n"
]
},
{
"cell_type": "markdown",
"id": "1aeb1dda",
"metadata": {},
"source": [
"## C. Inferencia + medicion\n",
"\n",
"TripoSR API minimo: `TSR.from_pretrained(repo, ...)` → `model(image)` → mesh. El repo guia con `run.py`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eda8e7d2",
"metadata": {},
"outputs": [],
"source": [
"import time, torch\n",
"from tsr.system import TSR\n",
"from tsr.utils import remove_background, resize_foreground\n",
"from rembg import new_session\n",
"\n",
"DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
"print('device:', DEVICE)\n",
"\n",
"# Carga (la primera vez descarga ~1.2 GB desde HF)\n",
"t0 = time.perf_counter()\n",
"model = TSR.from_pretrained(\n",
" 'stabilityai/TripoSR',\n",
" config_name='config.yaml',\n",
" weight_name='model.ckpt',\n",
")\n",
"model.renderer.set_chunk_size(8192)\n",
"model.to(DEVICE)\n",
"print(f'load: {time.perf_counter() - t0:.2f}s')\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "34911de7",
"metadata": {},
"outputs": [],
"source": [
"# Pre-procesado: quitar fondo (necesario para TripoSR — espera foreground sobre canvas vacio)\n",
"rembg_session = new_session()\n",
"raw = Image.open(sample_path)\n",
"fg = remove_background(raw, rembg_session)\n",
"fg = resize_foreground(fg, 0.85)\n",
"fg.thumbnail((384, 384))\n",
"fg\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f646c3c5",
"metadata": {},
"outputs": [],
"source": [
"# Warm-up (compilacion CUDA + cache) + 3 runs medidos\n",
"torch.cuda.reset_peak_memory_stats()\n",
"_ = model([fg], device=DEVICE)\n",
"torch.cuda.synchronize()\n",
"\n",
"runs = []\n",
"for i in range(3):\n",
" torch.cuda.reset_peak_memory_stats()\n",
" t0 = time.perf_counter()\n",
" scene_codes = model([fg], device=DEVICE)\n",
" torch.cuda.synchronize()\n",
" dt = time.perf_counter() - t0\n",
" peak = torch.cuda.max_memory_allocated() / 1e9\n",
" runs.append({'idx': i, 'inference_s': dt, 'vram_peak_gb': peak})\n",
"\n",
"import pandas as pd\n",
"pd.DataFrame(runs)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8150499f",
"metadata": {},
"outputs": [],
"source": [
"# Extraer mesh (marching cubes en la representacion triplane)\n",
"t0 = time.perf_counter()\n",
"meshes = model.extract_mesh(scene_codes, resolution=256)\n",
"dt_mesh = time.perf_counter() - t0\n",
"mesh = meshes[0]\n",
"print(f'extract_mesh: {dt_mesh:.2f}s')\n",
"print(f'vertices: {len(mesh.vertices):,}')\n",
"print(f'faces: {len(mesh.faces):,}')\n"
]
},
{
"cell_type": "markdown",
"id": "339bc8f9",
"metadata": {},
"source": [
"## D. Mesh stats\n",
"\n",
"Pasamos por `trimesh` para validar manifoldness y exportar a GLB con normales."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "daffd271",
"metadata": {},
"outputs": [],
"source": [
"import trimesh, numpy as np\n",
"tm = trimesh.Trimesh(vertices=np.asarray(mesh.vertices),\n",
" faces=np.asarray(mesh.faces),\n",
" process=True)\n",
"stats = {\n",
" 'n_vertices': len(tm.vertices),\n",
" 'n_faces': len(tm.faces),\n",
" 'watertight': bool(tm.is_watertight),\n",
" 'volume': float(tm.volume) if tm.is_watertight else None,\n",
" 'area': float(tm.area),\n",
" 'bounds_min': tm.bounds[0].tolist(),\n",
" 'bounds_max': tm.bounds[1].tolist(),\n",
" 'euler': int(tm.euler_number),\n",
"}\n",
"stats\n"
]
},
{
"cell_type": "markdown",
"id": "9a21c430",
"metadata": {},
"source": [
"## E. Export GLB\n",
"\n",
"Convencion: `<vault>/triposr/smoke_<seed>.glb`. Verificamos que `trimesh.load` lo lee de vuelta sin warnings."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "78614388",
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime\n",
"out_path = VAULT_DIR / f'smoke_{datetime.now():%Y%m%d_%H%M%S}.glb'\n",
"tm.export(out_path)\n",
"print('saved:', out_path, '-', out_path.stat().st_size, 'bytes')\n",
"\n",
"# Round-trip verification\n",
"roundtrip = trimesh.load(out_path, force='mesh')\n",
"print('roundtrip:', len(roundtrip.vertices), 'verts /', len(roundtrip.faces), 'faces')\n"
]
},
{
"cell_type": "markdown",
"id": "0b2f2d9f",
"metadata": {},
"source": [
"## Conclusion + handoff a backend\n",
"\n",
"Si las metricas confirman lo prometido (latencia <2s end-to-end, VRAM peak <8 GB, mesh manifold):\n",
"\n",
"1. Funcion del registry candidata: `triposr_infer_py_ml(image_bytes, seed, mc_resolution) -> bytes (GLB)`. Stub si no quieres atar deps al registry.\n",
"2. Backend dispatcher en `apps/image_to_3d_studio/backend/` puede importar directamente desde `sources/TripoSR` (no atado al registry).\n",
"3. Cliente C++ envia POST `/generate?model=triposr` con la imagen, recibe bytes GLB y los guarda en `local_files/cache/`.\n",
"\n",
"Si no cumple (lento / VRAM mayor / mesh roto): debug pre-procesado (rembg buena? foreground bien cropeado?), probar `chunk_size` mas grande, validar checkpoint."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
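The warm-up plus measured-runs pattern from section C of notebook 02 can be factored into a small helper so the later smoke notebooks measure the same way. This is a minimal sketch using only the standard library: `benchmark` and `meets_budget` are hypothetical names, and the `sync` hook is an assumption standing in for something like `torch.cuda.synchronize`, which has to be called before stopping the clock because GPU work is asynchronous.

```python
import statistics
import time
from typing import Callable, Dict, Optional


def benchmark(fn: Callable[[], object], *, warmup: int = 1, runs: int = 3,
              sync: Optional[Callable[[], None]] = None) -> Dict[str, float]:
    """Time fn() over `runs` calls after `warmup` untimed-in-the-summary calls."""
    def timed_call() -> float:
        t0 = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # e.g. torch.cuda.synchronize, so pending GPU work is counted
        return time.perf_counter() - t0

    # Warm-up calls absorb one-off costs (kernel compilation, caches).
    for _ in range(warmup):
        timed_call()

    samples = [timed_call() for _ in range(runs)]
    return {
        'runs': runs,
        'mean_s': statistics.fmean(samples),
        'min_s': min(samples),
        'max_s': max(samples),
        'stdev_s': statistics.stdev(samples) if runs > 1 else 0.0,
    }


def meets_budget(summary: Dict[str, float], max_latency_s: float) -> bool:
    """Compare the mean latency against a budget such as the <2s target."""
    return summary['mean_s'] <= max_latency_s
```

In the notebook this would be called along the lines of `benchmark(lambda: model([fg], device=DEVICE), sync=torch.cuda.synchronize)`; VRAM peak tracking is left out of the helper on purpose, since it depends on torch.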