HY-World 2.0 is Tencent Hunyuan’s open multi-modal world model stack. It ingests text, single-view images, multi-view images, and video, and targets persistent 3D outputs: meshes, 3D Gaussian Splatting (3DGS) scenes, and point clouds, not just another MP4. The team positions it as “building a playable world” versus “watching a movie that ends.”
This post summarizes the public GitHub README and docs as of early May 2026; weights, APIs, and benchmarks should be re-checked on the repo and DOCUMENTATION.md before you freeze a reproduction.
Try the product (vendor-hosted): 3d-models.hunyuan.tencent.com/world. The README notes that demand can be high.
TL;DR
| Topic | Takeaway |
|---|---|
| Core pitch | 3D assets (3DGS / mesh / points) with engine import, vs non-editable video world models |
| Reconstruction (shipping) | WorldMirror 2.0 — multi-view / video → 3D, ~1.2B params, HF weights, Python API + CLI + Gradio |
| Generation (roadmap) | Four-stage pipeline: HY-Pano 2.0 (panorama) → WorldNav (trajectory) → WorldStereo 2.0 (expansion) → WorldMirror 2.0 + 3DGS learning |
| Open today | Technical report, WorldMirror 2.0 code & checkpoints (per the README’s April 16, 2026 news block) |
| Not open yet | Full world generation inference, HY-Pano 2.0, WorldStereo 2.0, WorldNav (all listed coming soon) |
Two capabilities: generation vs reconstruction
World generation (per README): turn text or a single image into a navigable scene via the staged pipeline above—panorama, planning, stereo expansion, then composition with WorldMirror 2.0 and 3DGS training.
World reconstruction: WorldMirror 2.0 is the feed-forward workhorse—one forward pass estimates depth, surface normals, camera parameters, point clouds, and 3DGS-style attributes from multi-view stills or casual video, with flexible resolution (README cites 50K–500K pixels).
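The cited resolution range can be turned into a simple preprocessing check. This is a minimal sketch, assuming the 50K–500K figure means total pixels per image (width × height) and that aspect ratio should be preserved. `fit_pixel_budget` is a hypothetical helper, not part of `hyworld2`, and the pipeline may already resize inputs on its own.

```python
import math

MIN_PIXELS = 50_000   # lower bound cited by the README
MAX_PIXELS = 500_000  # upper bound cited by the README


def fit_pixel_budget(width: int, height: int,
                     min_pixels: int = MIN_PIXELS,
                     max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Return (width, height) rescaled so width * height fits the budget.

    Hypothetical helper: aspect ratio is preserved up to integer rounding.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    pixels = width * height
    if min_pixels <= pixels <= max_pixels:
        return width, height  # already inside the assumed range
    if pixels > max_pixels:
        scale = math.sqrt(max_pixels / pixels)
        # Round down so the result does not exceed the upper bound.
        return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
    scale = math.sqrt(min_pixels / pixels)
    # Round up so the result does not fall below the lower bound.
    return math.ceil(width * scale), math.ceil(height * scale)
```

For example, `fit_pixel_budget(1920, 1080)` shrinks a full-HD frame to fit under the upper bound, while a 640×480 frame is returned unchanged.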
Architecture (high level)
The README diagrams generation as a staged pipeline: HY-Pano 2.0 → WorldNav → WorldStereo 2.0 → WorldMirror 2.0 + splatting, which turns text or a single RGB image into a composed 3D world. Technical details live in the technical report (linked from the repo); this article does not reproduce proprietary figures.
Open-source plan (checklist from README)
| Item | Status in README |
|---|---|
| Technical report | Released |
| WorldMirror 2.0 code & checkpoints | Released |
| Full world generation inference (WorldNav + composition) | Planned |
| HY-Pano 2.0 weights & code | Planned (HunyuanWorld 1.0 noted as interim) |
| WorldStereo 2.0 weights & code | Planned (WorldStereo as interim) |
| WorldNav | Planned |
Treat the checklist as a statement of intent; license terms, export rules, and GPU support still gate real adoption.
Getting started with WorldMirror 2.0
The README’s minimal Python shape:
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline
pipeline = WorldMirrorPipeline.from_pretrained('tencent/HY-World-2.0')
result = pipeline('path/to/images')
Optional priors (camera / depth) are passed as paths; the repo points to a prior preparation guide in DOCUMENTATION.md.
CLI (single GPU):
python -m hyworld2.worldrecon.pipeline --input_path path/to/images
Multi-GPU runs use torchrun with --use_fsdp --enable_bf16. Important operational constraint: the number of input images must be at least the number of GPUs (e.g. at least 8 images for 8 processes).
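The image-count constraint is easy to check before launching a multi-GPU job. The sketch below is a hypothetical pre-flight helper, not part of the repo. The set of image extensions is an assumption, and the GPU count is passed in explicitly rather than detected.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever the pipeline actually accepts.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(input_dir: str) -> int:
    """Count files directly inside input_dir whose suffix looks like an image."""
    return sum(
        1 for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def check_multi_gpu_inputs(num_images: int, num_gpus: int) -> None:
    """Raise ValueError if there are fewer input images than GPU processes."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_images < num_gpus:
        raise ValueError(
            f"{num_images} image(s) for {num_gpus} GPU process(es): "
            "the README requires image count >= GPU count"
        )
```

Calling `check_multi_gpu_inputs(count_images("path/to/images"), num_gpus=8)` before the torchrun launch fails fast instead of partway through model loading.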
Gradio:
python -m hyworld2.worldrecon.gradio_app
Environment: conda with Python 3.10, CUDA 12.4, torch 2.4.0 with cu124 wheels, pip install -r requirements.txt, and FlashAttention (either a v3 build or the pip install flash-attn route).
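A quick way to confirm an environment matches these pins is to compare detected versions against them. This is a minimal sketch: the helper names are hypothetical, the pins are copied from the line above, and FlashAttention is deliberately not checked because its install route varies.

```python
import sys
from typing import Dict, List, Optional

# Version pins taken from the environment description above.
EXPECTED = {"python": "3.10", "torch": "2.4.0", "cuda": "12.4"}


def version_matches(found: Optional[str], expected: str) -> bool:
    """True if found equals expected or extends it (e.g. '2.4.0+cu124')."""
    if found is None:
        return False
    return found == expected or found.startswith((expected + ".", expected + "+"))


def detect_versions() -> Dict[str, Optional[str]]:
    """Collect Python, torch, and CUDA versions; None where unavailable."""
    found: Dict[str, Optional[str]] = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,
        "cuda": None,
    }
    try:
        import torch  # only importable once the environment is set up
    except ImportError:
        return found
    found["torch"] = torch.__version__
    found["cuda"] = torch.version.cuda  # None on CPU-only builds
    return found


def report_mismatches(found: Dict[str, Optional[str]]) -> List[str]:
    """Return one message per component that differs from EXPECTED."""
    return [
        f"{name}: expected {want}, found {found.get(name)}"
        for name, want in EXPECTED.items()
        if not version_matches(found.get(name), want)
    ]
```

Running `report_mismatches(detect_versions())` in a fresh environment returns an empty list when all three pins line up.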
Benchmarks (as reported—verify in the report)
The README includes tables for:
- WorldStereo 2.0 — camera metrics and single-view-generated reconstruction on Tanks-and-Temples / MipNeRF360 vs baselines such as SEVA, Gen3C, Lyra, FlashWorld.
- WorldMirror 2.0 — point map accuracy / completeness on 7-Scenes, NRGBD, DTU at low / medium / high inference resolutions, with and without prior injection; comparisons include Pow3R and MapAnything under varying prior conditions.
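For readers new to point-map metrics, accuracy and completeness are commonly computed as one-directional nearest-neighbour distances between predicted and ground-truth point sets. The sketch below follows that common definition as an assumption. The report's exact protocol (alignment, thresholds, mean versus median) may differ, and the brute-force search is for illustration only.

```python
import math
from typing import Sequence

Point = Sequence[float]


def mean_nearest_distance(source: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean distance from each source point to its nearest target point."""
    if not source or not target:
        raise ValueError("both point sets must be non-empty")
    total = sum(min(math.dist(s, t) for t in target) for s in source)
    return total / len(source)


def accuracy(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """How far predicted points lie from the ground truth (lower is better)."""
    return mean_nearest_distance(pred, gt)


def completeness(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """How far ground-truth points lie from the prediction (lower is better)."""
    return mean_nearest_distance(gt, pred)
```

A prediction that covers only part of a scene can score well on accuracy and poorly on completeness, which is why benchmark tables usually report both.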
Rule of thumb: read the technical report for protocol detail—leaderboard numbers without split / preprocessing context mislead buyers and paper reviewers alike.
Why teams care (strategic, not hype)
Game / sim / robotics: Persistent 3D fits Unreal / Unity / Isaac pipelines better than frame dumps. One-time reconstruction cost plus cheap real-time rendering matches interactive RL and digital-twin workflows—if export and license terms align.
Caution: World generation end-to-end is not fully open yet; most hackers will live in WorldMirror reconstruction until WorldNav / HY-Pano 2.0 / WorldStereo 2.0 ship.
Related on ExplainX
- WebGPU complete guide (2026) — browser-side 3D/GPU context
- How diffusion image generation works — complementary generative-media primer
- AI tools directory — discover utilities by task
- Agent skills registry — repo-native agent playbooks
Primary sources
- Repository: github.com/Tencent-Hunyuan/HY-World-2.0
- Documentation: DOCUMENTATION.md (English) · DOCUMENTATION_zh.md (中文)
- Model hub: the README cites WorldMirrorPipeline.from_pretrained('tencent/HY-World-2.0'); confirm the exact Hugging Face card from the repo’s Model Zoo table
- Product page: 3d-models.hunyuan.tencent.com/world
Citation (from README)
@article{hyworld22026,
title={HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds},
author={Team HY-World},
journal={arXiv preprint},
year={2026}
}
HY-World 2.0 is a fast-moving research release. Treat this ExplainX article as May 6, 2026 orientation text—validate LICENSE, weights, and CLI flags on the official repository before production use.