# Multi-View / Multi-Image Image-to-3D: Research (May 2026)

Research into extending `image_to_3d_studio` (currently single-image TripoSR) so it can be conditioned on **several real photos** of the object (front/back/sides).

## Key distinction

Two different things are called "multi-view 3D", and only ONE of them accepts your real photos:

| Class | Input | What it does | Examples |
|---|---|---|---|
| **A. Multi-view *conditioning*** | Your N real photos | Reconstructs from the photos you provide | **Hunyuan3D-2mv**, **TRELLIS multi-image**, **EscherNet**, GS-LRM/MeshLRM (posed) |
| **B. Single-image with internal NVS (novel-view synthesis)** | 1 photo | Hallucinates the missing sides, then reconstructs. You CANNOT feed in real back/side photos | InstantMesh, Unique3D, CRM, Wonder3D, Era3D, SV3D, Zero123++, LGM, GRM |

**Main finding**: most of the well-known "multi-view" repos (InstantMesh, CRM, Unique3D, LGM, GRM) are **Class B**. Their "multi-view" part is internal scaffolding, not user input. To pass in details from several sides you need **Class A**.

## Comparison table

| Model | Author/Year | Class | Real photos? | # views | Poses | Output | License | VRAM | Repo |
|---|---|---|---|---|---|---|---|---|---|
| **Hunyuan3D-2mv** | Tencent 2025 | **A** ✅ | Yes | 3 (front/left/back, configurable) | Fixed, canonical | Mesh | tencent-hunyuan-community (restrictive) | ~6 GB shape / ~12 GB with texture | [GH](https://github.com/Tencent-Hunyuan/Hunyuan3D-2) · [HF](https://huggingface.co/tencent/Hunyuan3D-2mv) |
| **TRELLIS multi-image** | Microsoft 2024/25 | **A** ✅ | Yes | Arbitrary N | Free/unposed | 3DGS+mesh+RF | MIT (code) | ~16-24 GB | [GH](https://github.com/microsoft/TRELLIS) |
| **EscherNet** | CVPR'24 oral | **A** ✅ | Yes | Any (1..10) | Relative 6-DoF | NVS → NeuS mesh | see repo | high (A100) | [GH](https://github.com/kxhit/EscherNet) |
| **GS-LRM** | Adobe 2024 | A | Yes (posed) | 2-4 | **Known poses** | 3DGS | research, **no weights** | A100 | [paper](https://arxiv.org/abs/2404.19702) |
| **MeshLRM/Long-LRM** | Adobe 2024 | A | Yes (posed) | 4 / 32 | Known | Mesh/3DGS | research, few weights released | A100 | [Long-LRM](https://arxiv.org/pdf/2410.12781) |
| **MV-Adapter** | ICCV'25 | helper | n/a | n/a | n/a | Multi-view images only (no mesh) | Apache-2.0 | ~14 GB | [GH](https://github.com/huanngzh/MV-Adapter) |
| InstantMesh | TencentARC 2024 | B | No | 6 internal | n/a | Mesh+NeRF | Apache-2.0 | ~12-16 GB | [GH](https://github.com/TencentARC/InstantMesh) |
| Unique3D | NeurIPS'24 | B | No | 4 internal | n/a | Textured mesh | MIT | ~10-12 GB (issues on 8 GB) | [GH](https://github.com/AiuniAI/Unique3D) |
| CRM | ECCV'24 | B | No | 6 internal | n/a | Textured mesh | MIT | ~9-16 GB | [GH](https://github.com/thu-ml/CRM) |
| Wonder3D/++ | n/a | B | No | 6 internal | n/a | Mesh via NeuS | MIT | ~12+ GB | [GH](https://github.com/xxlong0/Wonder3D) |
| SV3D | Stability 2024 | B (NVS) | No | 21 (orbit) | optional | Orbit video → 3D | non-commercial | high | [HF](https://huggingface.co/stabilityai/sv3d) |
| Zero123++ | SUDO-AI 2023/24 | B (NVS) | No | 6 fixed | fixed | Multi-view images | see repo | ~12 GB | [GH](https://github.com/SUDO-AI-3D/zero123plus) |
| LGM | ECCV'24 | B | No | 4 internal | n/a | 3DGS → mesh | see repo | ~10-16 GB | [project](https://me.kiui.moe/lgm/) |
| GRM | ECCV'24 | B | (posed) | 4 | known | 3DGS, 0.1 s | see repo | A100 | [paper](https://arxiv.org/abs/2403.14621) |

Baseline: **TripoSR**, MIT, 1 image, ~6-8 GB VRAM, <0.5 s on an A100 / 5-10 s on a 3060-class GPU.

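The shortlist reasoning used in the recommendation can be made explicit as a filter over the table. This is a minimal sketch, assuming only a subset of rows and taking the lower end of each VRAM range as `min_vram_gb`; `Candidate` and `usable_multi_view_backends` are hypothetical names, and models described only as "A100" or "high" are left unquantified (`None`) and skipped.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    model_class: str               # "A" = consumes real photos, "B" = internal NVS
    weights_public: bool           # GS-LRM: research only, no weights
    min_vram_gb: Optional[float]   # None = only described as "A100"/"high"


# Subset of the comparison table; VRAM values are the lower bounds listed there.
CANDIDATES = [
    Candidate("Hunyuan3D-2mv", "A", True, 6.0),
    Candidate("TRELLIS multi-image", "A", True, 16.0),
    Candidate("GS-LRM", "A", False, None),
    Candidate("InstantMesh", "B", True, 12.0),
    Candidate("CRM", "B", True, 9.0),
]


def usable_multi_view_backends(vram_budget_gb: float) -> List[str]:
    """Class A models with public weights whose listed VRAM fits the budget."""
    return [
        c.name
        for c in CANDIDATES
        if c.model_class == "A"
        and c.weights_public
        and c.min_vram_gb is not None  # skip models not quantified in GB
        and c.min_vram_gb <= vram_budget_gb
    ]
```

On an 8 GB card only Hunyuan3D-2mv survives the filter, which matches the recommendation; raising the budget to 24 GB adds TRELLIS multi-image.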
## Recommendation for image_to_3d_studio

**Primary integration: Hunyuan3D-2mv.**

1. **Built for multi-side input**: the `{"front": ..., "left": ..., "back": ...}` dict IS the UX. No NeuS, no pose estimation, no internal NVS.
2. **Fits in VRAM**: shape generation needs ~6 GB (vs. 16-24 GB for TRELLIS, which would OOM on an 8 GB 3070 under WSL2). The `Hunyuan3D-2mini` variant needs ~5 GB.
3. **Direct mesh output** (trimesh), the same output type as TripoSR, so wiring it into the dispatcher is trivial.
4. **Mature**: ComfyUI and diffusers support, weights on HF, clear examples.
5. **Quality**: state of the art among open multi-view models as of May 2026; better back/side geometry than single-image.

Watch out: the `tencent-hunyuan-community` license is NOT OSI-permissive (usage/scale restrictions). Fine for personal use; review it before any commercial use.

**Secondary, "max flexibility": TRELLIS multi-image**, behind a "high VRAM" flag. MIT, N unposed views, 3DGS+mesh. Only practical with CPU offload or a larger GPU.

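One way to implement the "high VRAM" flag is to require both an explicit opt-in and enough free memory. This sketch assumes a 16 GiB threshold (the lower end of the TRELLIS estimate in the table, not a measured requirement); `allow_trellis`, `free_vram_bytes` and `TRELLIS_MIN_FREE_BYTES` are hypothetical names. `torch.cuda.mem_get_info()` returns `(free, total)` in bytes for the current device; the gate itself takes the free byte count as an argument so it can be tested without a GPU.

```python
from typing import Optional

GIB = 1024 ** 3
# Assumed threshold: lower end of the ~16-24 GB range listed for TRELLIS.
TRELLIS_MIN_FREE_BYTES = 16 * GIB


def free_vram_bytes() -> Optional[int]:
    """Free VRAM on the current CUDA device, or None without torch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free, _total = torch.cuda.mem_get_info()
    return free


def allow_trellis(high_vram_flag: bool, free_bytes: Optional[int]) -> bool:
    """Enable trellis_mv only if the user opted in AND enough VRAM is free."""
    if not high_vram_flag or free_bytes is None:
        return False
    return free_bytes >= TRELLIS_MIN_FREE_BYTES
```

Typical use would be `allow_trellis(user_flag, free_vram_bytes())`. The gate ignores CPU offload; if offload is enabled, the threshold would need to be lowered accordingly.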
**Avoid as a multi-view backend**: InstantMesh, Unique3D, CRM, LGM, GRM, SV3D, Zero123++ (they are Class B and do not consume real side photos). Unique3D and CRM remain good *single-image* alternatives to TripoSR (better texture, MIT).

### Dispatcher shape

```python
# Backend registry: name -> accepted input and produced output.
BACKENDS = {
    "triposr": {"input": "single_image", "output": "mesh"},  # existing
    "hunyuan3d_mv": {"input": "multi_view", "output": "mesh"},  # NEW multi-side default
    "trellis_mv": {"input": "multi_view_n", "output": "mesh+3dgs"},  # N views; optional, high-VRAM flag
}
```

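The dispatcher shape above can also be expressed with one shared callable signature per backend. This is a minimal sketch under assumptions: every runner takes labeled views (canonical key -> image path) and returns the path of an exported `.glb`; `run_triposr`, `run_hunyuan3d_mv`, `RUNNERS` and `dispatch` are hypothetical placeholders, not the real TripoSR or Hunyuan3D-2 APIs.

```python
from typing import Callable, Dict

# Hypothetical shared signature: labeled views -> path of the exported .glb.
Runner = Callable[[Dict[str, str]], str]


def run_triposr(views: Dict[str, str]) -> str:
    # Placeholder: a real implementation would run TripoSR on views["front"].
    if "front" not in views:
        raise ValueError("triposr needs a front view")
    return "triposr.glb"


def run_hunyuan3d_mv(views: Dict[str, str]) -> str:
    # Placeholder: a real implementation would pass the front/left/back
    # views to Hunyuan3D-2mv and export the resulting mesh.
    return "hunyuan3d_mv.glb"


RUNNERS: Dict[str, Runner] = {
    "triposr": run_triposr,
    "hunyuan3d_mv": run_hunyuan3d_mv,
    # "trellis_mv" would be registered only when the high-VRAM gate passes.
}


def dispatch(name: str, views: Dict[str, str]) -> str:
    """Look up a backend by name and run it; unknown names fail loudly."""
    try:
        runner = RUNNERS[name]
    except KeyError:
        raise ValueError(
            f"unknown backend {name!r}; available: {sorted(RUNNERS)}"
        ) from None
    return runner(views)
```

Keeping a single signature means the UI and export code never branch on which backend ran; only the registry changes when a backend is added.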
## Proposed UI pipeline

```
Mode toggle: [ Single image ] [ Multi-side ]

Multi-side: labeled drop zones
  [ FRONT* ] [ BACK ] [ LEFT ] [ RIGHT ] (+TOP)
  * front required; the rest optional
  [ Backend: Hunyuan3D-2mv v ] [ Generate ]
        | POST {front, back?, left?, right?, top?}
        v
Backend dispatcher:
  1. Validate: >=1 image, front present
  2. Preprocess each view: bg removal + center + square pad (reuse the TripoSR step)
  3. Route: 1 img -> triposr ; >=2 labeled -> hunyuan3d_mv(image={front,left,back})
  4. (opt) texture pass with Hunyuan3D-Paint
  5. Export .glb -> path + thumbnail
        v
Existing 3D viewer + download
```

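Step 2's "center + square pad" reduces to a little arithmetic once background removal has produced a foreground bounding box. This sketch assumes the crop is centered on a square canvas with its longer edge covering `fg_ratio` of the side; the 0.85 default is an assumed value, and `square_pad_layout` is a hypothetical helper rather than TripoSR's own preprocessing function.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def square_pad_layout(bbox: Box, fg_ratio: float = 0.85) -> Tuple[int, int, int]:
    """Return (side, offset_x, offset_y).

    Pasting the bbox crop at (offset_x, offset_y) on a side x side canvas
    centers the object, with its longer edge spanning fg_ratio of the side.
    """
    left, top, right, bottom = bbox
    w, h = right - left, bottom - top
    if w <= 0 or h <= 0:
        raise ValueError("empty foreground bounding box")
    if not 0 < fg_ratio <= 1:
        raise ValueError("fg_ratio must be in (0, 1]")
    # Canvas side grows so the longer edge covers fg_ratio of it.
    side = max(1, round(max(w, h) / fg_ratio))
    # Integer centering; any odd leftover pixel goes to the right/bottom margin.
    offset_x = (side - w) // 2
    offset_y = (side - h) // 2
    return side, offset_x, offset_y
```

With Pillow, the result would feed `Image.new("RGBA", (side, side), (0, 0, 0, 0))` followed by `canvas.paste(crop, (offset_x, offset_y))`. Applying the same helper to every view keeps all inputs framed consistently, which matters because Hunyuan3D-2mv expects clean, centered views.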
Notes:

- **Reuse the TripoSR bg-removal/centering step** for each view: Hunyuan3D-2mv expects clean, centered inputs in canonical orientation.
- **Map UI labels to canonical keys** (`front`/`left`/`back`). Because the poses are fixed, the label IS the pose, so no camera estimation is needed. That is why it beats EscherNet/TRELLIS here (they need poses or more compute).
- Front-only fallback → TripoSR.
- Gate TRELLIS on free VRAM via `torch.cuda.mem_get_info`.

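Validation (step 1), label mapping, and the front-only fallback can live in one small routing function. This sketch assumes uploads arrive as a dict of UI label -> image path and that only `front`/`left`/`back` are canonical Hunyuan3D-2mv keys, as stated in this note, so RIGHT and TOP are ignored here; `route` and `UI_TO_CANONICAL` are hypothetical names.

```python
from typing import Dict, Tuple

# UI drop-zone labels -> canonical view keys used in this note.
# Assumption: only front/left/back are canonical; RIGHT/TOP are dropped.
UI_TO_CANONICAL = {"FRONT": "front", "LEFT": "left", "BACK": "back"}


def route(uploads: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Validate uploads (UI label -> image path) and pick a backend.

    Returns (backend_name, views). Raises ValueError if FRONT is missing.
    """
    # Normalize label case and drop empty drop zones.
    labels = {k.upper(): v for k, v in uploads.items() if v}
    if "FRONT" not in labels:
        raise ValueError("front view is required")
    views = {
        UI_TO_CANONICAL[label]: path
        for label, path in labels.items()
        if label in UI_TO_CANONICAL
    }
    if len(views) >= 2:
        return "hunyuan3d_mv", views
    # Front-only (or front + non-canonical views): single-image fallback.
    return "triposr", {"front": views["front"]}
```

A FRONT + TOP upload therefore falls back to TripoSR, because TOP has no canonical key in this sketch.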
## Sources

- Hunyuan3D-2/2mv: github.com/Tencent-Hunyuan/Hunyuan3D-2 · huggingface.co/tencent/Hunyuan3D-2mv · arXiv 2501.12202
- TRELLIS: github.com/microsoft/TRELLIS
- EscherNet: github.com/kxhit/EscherNet · kxhit.github.io/EscherNet
- InstantMesh: github.com/TencentARC/InstantMesh · arXiv 2404.07191
- Unique3D: github.com/AiuniAI/Unique3D
- CRM: github.com/thu-ml/CRM · arXiv 2403.05034
- Wonder3D: github.com/xxlong0/Wonder3D
- SV3D: huggingface.co/stabilityai/sv3d
- Zero123++: github.com/SUDO-AI-3D/zero123plus
- LGM: me.kiui.moe/lgm · arXiv 2402.05054
- GRM: arXiv 2403.14621
- GS-LRM: arXiv 2404.19702
- Long-LRM: arXiv 2410.12781
- MV-Adapter: github.com/huanngzh/MV-Adapter · arXiv 2412.03632
- TripoSR: github.com/VAST-AI-Research/TripoSR · arXiv 2403.02151