Multi-View / Multi-Image Image-to-3D: Research (May 2026)

Research on extending image_to_3d_studio (currently single-image TripoSR) to condition on several real photos of the object (front/back/sides).

Key distinction

Two different things are called "multi-view 3D", and only ONE of them accepts your real photos:

Class	Input	What it does	Examples
A. Multi-view conditioning	Your N real photos	Reconstructs from the photos you provide	Hunyuan3D-2mv, TRELLIS multi-image, EscherNet, GS-LRM/MeshLRM (posed)
B. Single-image with internal NVS (novel view synthesis)	1 photo	Hallucinates the missing sides, then reconstructs. You CANNOT supply real photos of the back/sides	InstantMesh, Unique3D, CRM, Wonder3D, Era3D, SV3D, Zero123++, LGM, GRM

Main finding: most of the well-known "multi-view" repos (InstantMesh, CRM, Unique3D, LGM, GRM) are Class B: the "multi-view" part is internal scaffolding, not user input. To "pass in details from several sides" you need Class A.

Comparison table

Model	Author/Year	Class	Real photos?	# views	Poses	Output	License	VRAM	Repo
Hunyuan3D-2mv	Tencent 2025	A ✅	Yes	3 (front/left/back, configurable)	Fixed canonical	Mesh	tencent-hunyuan-community (restrictive)	~6 GB shape / ~12 GB with texture	GH · HF
TRELLIS multi-image	Microsoft 2024/25	A ✅	Yes	Arbitrary N	Free/unposed	3DGS+mesh+RF	MIT (code)	~16-24 GB	GH
EscherNet	CVPR'24 oral	A ✅	Yes	Any (1..10)	Relative 6-DoF	NVS → NeuS mesh	see repo	high (A100)	GH
GS-LRM	Adobe 2024	A	Yes (posed)	2-4	Known poses	3DGS	research, no weights	A100	paper
MeshLRM/Long-LRM	Adobe 2024	A	Yes (posed)	4 / 32	Known	Mesh/3DGS	research, few weights released	A100	Long-LRM
MV-Adapter	ICCV'25	helper	n/a	-	-	Multi-view images only (no mesh)	Apache-2.0	~14 GB	GH
InstantMesh	TencentARC 2024	B	No	6 internal	-	Mesh+NeRF	Apache-2.0	~12-16 GB	GH
Unique3D	NeurIPS'24	B	No	4 internal	-	Textured mesh	MIT	~10-12 GB (problems on 8 GB)	GH
CRM	ECCV'24	B	No	6 internal	-	Textured mesh	MIT	~9-16 GB	GH
Wonder3D/++	-	B	No	6 internal	-	Mesh via NeuS	MIT	~12+ GB	GH
SV3D	Stability 2024	B (NVS)	No	21 orbit	optional	Orbit video → 3D	non-commercial	high	HF
Zero123++	SUDO-AI 2023/24	B (NVS)	No	6 fixed	fixed	Multi-view images	see repo	~12 GB	GH
LGM	ECCV'24	B	No	4 internal	-	3DGS → mesh	see repo	~10-16 GB	project
GRM	ECCV'24	B	(posed)	4	known	3DGS, 0.1 s	see repo	A100	paper

Baseline: TripoSR (MIT), 1 image, ~6-8 GB VRAM, <0.5 s on an A100 / 5-10 s on a 3060-class GPU.

Recommendation for image_to_3d_studio

Primary integration: Hunyuan3D-2mv.

Built for multi-side input: the dict {"front":..,"left":..,"back":..} IS the UX. No NeuS, no pose estimation, no internal NVS.
VRAM fits: shape generation needs ~6 GB (vs TRELLIS at 16-24 GB, which would OOM on a 3070 8GB/WSL2). The Hunyuan3D-2mini variant needs ~5 GB.
Direct mesh output (trimesh), the same return type as TripoSR → trivial to wire into the dispatcher.
Mature: ComfyUI + diffusers, HF weights, clear examples.
SOTA quality among open multi-view models as of May 2026; improves back/side geometry vs single-image.

Watch-out: the tencent-hunyuan-community license is NOT OSI-permissive (usage/scale restrictions). Fine for personal use; review it before any commercial use.

Secondary, "max flexibility": TRELLIS multi-image behind a "high VRAM" flag. MIT, N unposed views, 3DGS+mesh. Only practical with CPU offload or a larger GPU.

Avoid as a multi-view backend: InstantMesh, Unique3D, CRM, LGM, GRM, SV3D, Zero123++ (they are Class B and do not consume real photos of the sides). Unique3D/CRM are still good single-image alternatives to TripoSR (better texture, MIT).

Dispatcher shape

backends = {
  "triposr":      single_image  -> mesh   # existing
  "hunyuan3d_mv": multi_view     -> mesh   # NEW multi-side default
  "trellis_mv":   multi_view(N)  -> mesh+3DGS  # optional, behind high-VRAM flag
}
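
The mapping above is schematic. A minimal Python sketch of such a registry could look like the following, assuming each backend is exposed as a callable that takes a dict of labeled image paths. `BackendSpec`, `_stub`, `BACKENDS`, and `available_backends` are hypothetical names introduced for illustration, and the runners are placeholders that return a string instead of a real mesh.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a "view set" maps a view label to an image path.
Views = Dict[str, str]


@dataclass(frozen=True)
class BackendSpec:
    """Describes one reconstruction backend in the dispatcher (illustrative)."""
    name: str
    multi_view: bool             # True if the backend consumes several real photos
    outputs: Tuple[str, ...]     # e.g. ("mesh",) or ("mesh", "3dgs")
    high_vram: bool              # True if it sits behind the high-VRAM flag
    run: Callable[[Views], str]  # stub signature; a real backend would return a mesh


def _stub(name: str) -> Callable[[Views], str]:
    # Placeholder runner that stands in for the real model call.
    def run(views: Views) -> str:
        return f"{name}:{','.join(sorted(views))}"
    return run


BACKENDS: Dict[str, BackendSpec] = {
    "triposr": BackendSpec("triposr", False, ("mesh",), False, _stub("triposr")),
    "hunyuan3d_mv": BackendSpec("hunyuan3d_mv", True, ("mesh",), False, _stub("hunyuan3d_mv")),
    "trellis_mv": BackendSpec("trellis_mv", True, ("mesh", "3dgs"), True, _stub("trellis_mv")),
}


def available_backends(allow_high_vram: bool) -> List[str]:
    """Names of the backends the UI may offer, given the high-VRAM flag."""
    return [n for n, spec in BACKENDS.items() if allow_high_vram or not spec.high_vram]
```

Keeping the high-VRAM requirement as data on each entry, rather than as a special case in the routing code, means the UI can hide TRELLIS simply by calling `available_backends(False)`.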

Proposed UI pipeline

Mode toggle: [ Single image ] [ Multi-side ]

Multi-side: labeled drop zones
  [ FRONT* ] [ BACK ] [ LEFT ] [ RIGHT ] (+TOP)
  * front required; the rest optional
  [ Backend: Hunyuan3D-2mv v ] [ Generate ]
       |  POST {front, back?, left?, right?, top?}
       v
Backend dispatcher:
  1. Validate: >=1 image, front present
  2. Preprocess each view: bg removal + center + square pad (reuse the TripoSR step)
  3. Route: 1 img -> triposr ; >=2 labeled -> hunyuan3d_mv(image={front,left,back})
  4. (opt) texture pass with Hunyuan3D-Paint
  5. Export .glb -> path + thumbnail
       v
  Existing 3D viewer + download
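
Steps 1 and 3 of the dispatcher could be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `validate`, `route`, `UI_LABELS`, and `CANONICAL_VIEWS` are hypothetical names, views are represented as image paths, and the sketch assumes the multi-view backend only accepts the front/left/back keys, so a right or top photo is dropped before routing.

```python
from typing import Dict, Optional, Tuple

# Canonical view keys the multi-view backend is assumed to accept.
CANONICAL_VIEWS = ("front", "left", "back")
# Labels offered by the UI drop zones.
UI_LABELS = ("front", "back", "left", "right", "top")


def validate(views: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Step 1: drop empty slots, reject unknown labels, require a front view."""
    unknown = set(views) - set(UI_LABELS)
    if unknown:
        raise ValueError(f"unknown view labels: {sorted(unknown)}")
    present = {label: path for label, path in views.items() if path}
    if "front" not in present:
        raise ValueError("a front image is required")
    return present


def route(views: Dict[str, Optional[str]]) -> Tuple[str, Dict[str, str]]:
    """Step 3: pick a backend name and the views that will be passed to it."""
    present = validate(views)
    # Assumption: only canonical keys are forwarded to the multi-view backend.
    canonical = {k: present[k] for k in CANONICAL_VIEWS if k in present}
    if len(canonical) >= 2:
        return "hunyuan3d_mv", canonical
    # Only the front view is usable: fall back to the single-image backend.
    return "triposr", {"front": present["front"]}
```

For example, `route({"front": "f.png", "back": "b.png"})` selects `hunyuan3d_mv`, while `route({"front": "f.png"})` falls back to `triposr`. Under the assumption above, a front + right upload also falls back to `triposr`; a real integration would need to decide how right and top photos are used.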

Notes:

Reuse TripoSR's bg-removal/centering for each view: Hunyuan3D-2mv expects clean, centered inputs in a canonical orientation.
Map UI labels -> canonical keys (front/left/back). Poses are fixed → the labeling IS the pose, with no camera estimation. That is why it beats EscherNet/TRELLIS (which need poses or more compute).
Fallback: front-only input → TripoSR.
Gate TRELLIS on the free VRAM reported by torch.cuda.mem_get_info.
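
The VRAM gate from the last note could be sketched like this. `torch.cuda.mem_get_info()` returns a `(free_bytes, total_bytes)` tuple for the current device; the function names and the 16 GiB threshold are assumptions (the threshold is simply the lower end of the ~16-24 GB estimate in the comparison table) and should be tuned against real measurements.

```python
from typing import Optional, Tuple

GIB = 1024 ** 3
# Assumed threshold: lower end of the ~16-24 GB estimate for TRELLIS.
TRELLIS_MIN_FREE_GIB = 16.0


def query_vram() -> Optional[Tuple[int, int]]:
    """Return (free_bytes, total_bytes) for the current CUDA device, or None."""
    try:
        import torch  # imported lazily so the gate also works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.mem_get_info()


def trellis_allowed(mem_info: Optional[Tuple[int, int]],
                    min_free_gib: float = TRELLIS_MIN_FREE_GIB) -> bool:
    """True if the reported free VRAM meets the (assumed) threshold."""
    if mem_info is None:
        return False
    free_bytes, _total_bytes = mem_info
    return free_bytes / GIB >= min_free_gib
```

At startup the dispatcher would call `trellis_allowed(query_vram())` and only expose the TRELLIS backend when it returns True. Separating the query from the decision keeps the threshold logic testable on machines without a GPU.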

Sources

Hunyuan3D-2/2mv: github.com/Tencent-Hunyuan/Hunyuan3D-2 · huggingface.co/tencent/Hunyuan3D-2mv · arxiv 2501.12202
TRELLIS: github.com/microsoft/TRELLIS
EscherNet: github.com/kxhit/EscherNet · kxhit.github.io/EscherNet
InstantMesh: github.com/TencentARC/InstantMesh · arxiv 2404.07191
Unique3D: github.com/AiuniAI/Unique3D
CRM: github.com/thu-ml/CRM · arxiv 2403.05034
Wonder3D: github.com/xxlong0/Wonder3D
SV3D: huggingface.co/stabilityai/sv3d
Zero123++: github.com/SUDO-AI-3D/zero123plus
LGM: me.kiui.moe/lgm · arxiv 2402.05054
GRM: arxiv 2403.14621
GS-LRM: arxiv 2404.19702
Long-LRM: arxiv 2410.12781
MV-Adapter: github.com/huanngzh/MV-Adapter · arxiv 2412.03632
TripoSR: github.com/VAST-AI-Research/TripoSR · arxiv 2403.02151
