Tencent Hunyuan HY-World 2.0: 3D world models, WorldMirror 2.0, and open-source plan

HY-World 2.0 from Tencent Hunyuan: multi-modal 3D worlds (3DGS/meshes) vs pixel-only video world models, WorldMirror 2.0 reconstruction, pipeline roadmap—GitHub, Hugging Face, install notes.

4 min read · Yash Thakker
Tencent, Hunyuan, World models, 3D, Gaussian Splatting, WorldMirror

HY-World 2.0 is Tencent Hunyuan’s open multi-modal world model stack. It ingests text, single-view images, multi-view images, and video, and targets persistent 3D outputs: meshes, 3D Gaussian Splatting (3DGS) representations, and point clouds, not just another MP4. The team positions it as “building a playable world” versus “watching a movie that ends.”

This post summarizes the public GitHub README and docs as of early May 2026; weights, APIs, and benchmarks should be re-checked on the repo and DOCUMENTATION.md before you freeze a reproduction.

Hosted product (vendor): 3d-models.hunyuan.tencent.com/world. The README notes that demand can be high.


TL;DR

| Topic | Takeaway |
| --- | --- |
| Core pitch | 3D assets (3DGS / mesh / points) with engine import, vs non-editable video world models |
| Reconstruction (shipping) | WorldMirror 2.0: multi-view / video → 3D, ~1.2B params, HF weights, Python API + CLI + Gradio |
| Generation (roadmap) | Four-stage pipeline: HY-Pano 2.0 (panorama) → WorldNav (trajectory) → WorldStereo 2.0 (expansion) → WorldMirror 2.0 + 3DGS learning |
| Open today | Technical report, WorldMirror 2.0 code & checkpoints, per the README's April 16, 2026 news block |
| Not open yet | Full world generation inference, HY-Pano 2.0, WorldStereo 2.0, WorldNav (all listed as coming soon) |

Two capabilities: generation vs reconstruction

World generation (per README): turn text or a single image into a navigable scene via the staged pipeline above—panorama, planning, stereo expansion, then composition with WorldMirror 2.0 and 3DGS training.

World reconstruction: WorldMirror 2.0 is the feed-forward workhorse—one forward pass estimates depth, surface normals, camera parameters, point clouds, and 3DGS-style attributes from multi-view stills or casual video, with flexible resolution (README cites 50K–500K pixels).
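To make the resolution range concrete, here is a minimal sketch of picking a resize target that lands inside a pixel budget while keeping the aspect ratio. The 50K and 500K bounds are the README figures quoted above; the helper name `fit_to_pixel_budget` and the idea of resizing inputs yourself are assumptions for illustration, not the pipeline's documented preprocessing.

```python
import math

MIN_PIXELS = 50_000   # lower bound quoted from the README
MAX_PIXELS = 500_000  # upper bound quoted from the README


def fit_to_pixel_budget(width, height, min_pixels=MIN_PIXELS, max_pixels=MAX_PIXELS):
    """Return (new_width, new_height) with roughly the same aspect ratio and
    an area inside [min_pixels, max_pixels]; unchanged if already inside."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    area = width * height
    if min_pixels <= area <= max_pixels:
        return width, height
    if area > max_pixels:
        scale = math.sqrt(max_pixels / area)
        # Round down so the result never exceeds the upper bound.
        return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
    scale = math.sqrt(min_pixels / area)
    # Round up so the result never falls below the lower bound.
    return math.ceil(width * scale), math.ceil(height * scale)
```

Under these assumptions a 1920×1080 frame would be scaled down to about 942×530, while a 640×480 frame already sits inside the range and is left alone.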


Architecture (high level)

The README diagrams a systematic pipeline for generation: HY-Pano 2.0 → WorldNav → WorldStereo 2.0 → WorldMirror 2.0 + splatting, turning language or a single RGB input into a composed 3D world. Technical details live in their report (linked from the repo); this article does not reproduce proprietary figures.


Open-source plan (checklist from README)

| Item | Status in README |
| --- | --- |
| Technical report | Released |
| WorldMirror 2.0 code & checkpoints | Released |
| Full world generation inference (WorldNav + composition) | Planned |
| HY-Pano 2.0 weights & code | Planned (HunyuanWorld 1.0 noted as interim) |
| WorldStereo 2.0 weights & code | Planned (WorldStereo as interim) |
| WorldNav | Planned |

Treat checkboxes as intent; license, export rules, and GPU support still gate real adoption.
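If you track this release in a planning script, the checklist can be kept as data. The sketch below is only an illustration: the `Component` class and helper names are hypothetical, the role labels paraphrase this post, and the `released` flags mirror the README status as summarized here, so they may be stale by the time you read this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    role: str
    released: bool


# Order follows the generation pipeline described in this post; status
# mirrors the README checklist as summarized here and may be out of date.
PIPELINE = [
    Component("HY-Pano 2.0", "panorama generation", released=False),
    Component("WorldNav", "trajectory planning", released=False),
    Component("WorldStereo 2.0", "world expansion", released=False),
    Component("WorldMirror 2.0", "reconstruction and composition", released=True),
]


def missing_for_generation(pipeline=PIPELINE):
    """Names of stages that still block end-to-end world generation."""
    return [c.name for c in pipeline if not c.released]


def can_run_reconstruction(pipeline=PIPELINE):
    """Reconstruction only needs WorldMirror 2.0 to be released."""
    return any(c.name == "WorldMirror 2.0" and c.released for c in pipeline)
```

Flipping a single `released` flag when a component ships keeps downstream checks honest without rewriting prose notes.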


Getting started with WorldMirror 2.0

The README’s minimal Python shape:

from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

pipeline = WorldMirrorPipeline.from_pretrained('tencent/HY-World-2.0')
result = pipeline('path/to/images')

Optional priors (camera / depth) are passed as paths; the repo points to a prior preparation guide in DOCUMENTATION.md.
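Loading a ~1.2B-parameter checkpoint only to discover an empty folder is wasteful, so a small guard around the call can help. This is a sketch under assumptions: `list_input_images`, `reconstruct`, and the accepted suffix set are hypothetical helpers, the structure of the returned result is not documented here, and only the import, `from_pretrained`, and the pipeline call come from the README snippet above.

```python
from pathlib import Path

# Assumed set of extensions; check DOCUMENTATION.md for what is really accepted.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_input_images(input_dir):
    """Return sorted image paths in input_dir, or raise if there are none."""
    root = Path(input_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    images = sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if not images:
        raise FileNotFoundError(f"no images found in {root}")
    return images


def reconstruct(input_dir, model_id="tencent/HY-World-2.0"):
    """Validate the folder, then run the README's two-line call."""
    list_input_images(input_dir)  # fail early, before loading weights
    # Imported lazily so the helper above works without the package installed.
    from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

    pipeline = WorldMirrorPipeline.from_pretrained(model_id)
    return pipeline(str(input_dir))
```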

CLI (single GPU):

python -m hyworld2.worldrecon.pipeline --input_path path/to/images

Multi-GPU runs use torchrun with --use_fsdp --enable_bf16. One operational constraint matters: the input image count must be at least the GPU count (e.g. at least 8 images for 8 processes).
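The image-count constraint is easy to check before launching. The sketch below builds a launch command and refuses to proceed when there are fewer images than GPUs. `build_torchrun_command` is a hypothetical helper; `--nproc_per_node` and `-m` are standard torchrun options, and the remaining flags are the ones quoted above. The README's exact multi-GPU invocation may differ, so treat the argument layout as an assumption.

```python
def build_torchrun_command(input_path, num_images, num_gpus):
    """Return an argv list for a multi-GPU run, or raise if the image
    count is below the GPU count."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_images < num_gpus:
        raise ValueError(
            f"{num_images} images for {num_gpus} GPUs: "
            "the README requires image count >= GPU count"
        )
    return [
        "torchrun",
        f"--nproc_per_node={num_gpus}",  # standard torchrun option
        "-m", "hyworld2.worldrecon.pipeline",
        "--input_path", str(input_path),
        "--use_fsdp", "--enable_bf16",   # flags quoted from the README
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the flags have been confirmed against the repository.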

Gradio:

python -m hyworld2.worldrecon.gradio_app

Environment: a conda environment with Python 3.10, CUDA 12.4, torch 2.4.0 with cu124 wheels, dependencies installed via pip install -r requirements.txt, and FlashAttention (either a v3 build or the pip install flash-attn path).
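A quick way to compare a machine against those pins is a small pre-flight script. This is a sketch, not part of the repository: `check_environment` is a hypothetical helper, the expected versions are copied from the notes above, and the `flash_attn` module name is the one installed by the pip package (a FlashAttention v3 build may expose a different module, so a warning there is not conclusive).

```python
import importlib.util
import sys

EXPECTED_PYTHON = (3, 10)      # from the environment notes above
EXPECTED_TORCH_PREFIX = "2.4"  # torch 2.4.0 with cu124 wheels
EXPECTED_CUDA = "12.4"


def check_environment():
    """Return a list of human-readable mismatches (empty list means OK)."""
    problems = []
    found = sys.version_info[:2]
    if found != EXPECTED_PYTHON:
        problems.append(f"Python {found[0]}.{found[1]} found, expected 3.10")
    if importlib.util.find_spec("torch") is None:
        problems.append("torch is not installed")
    else:
        import torch

        if not torch.__version__.startswith(EXPECTED_TORCH_PREFIX):
            problems.append(f"torch {torch.__version__} found, expected 2.4.x")
        if torch.version.cuda != EXPECTED_CUDA:
            problems.append(f"torch built for CUDA {torch.version.cuda}, expected 12.4")
        if not torch.cuda.is_available():
            problems.append("no CUDA device visible to torch")
    # Module name of the pip package; a v3 build may use a different one.
    if importlib.util.find_spec("flash_attn") is None:
        problems.append("flash_attn is not importable")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the pinned versions"]:
        print(line)
```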


Benchmarks (as reported—verify in the report)

The README includes tables for:

  • WorldStereo 2.0: camera metrics and single-view-generated reconstruction on Tanks-and-Temples / MipNeRF360 vs baselines such as SEVA, Gen3C, Lyra, FlashWorld.
  • WorldMirror 2.0: point map accuracy / completeness on 7-Scenes, NRGBD, DTU at low / medium / high inference resolutions, with and without prior injection; comparisons include Pow3R and MapAnything under varying prior conditions.

Rule of thumb: read the technical report for protocol detail; leaderboard numbers without split / preprocessing context mislead buyers and paper reviewers alike.
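For readers unfamiliar with the accuracy / completeness pair, the sketch below shows one common definition from multi-view reconstruction benchmarks: accuracy is the mean distance from each predicted point to its nearest ground-truth point, and completeness is the same measure in the other direction. This is an assumption about the general technique, not the report's protocol, which may use alignment steps, medians, or thresholds not shown here. The brute-force search is only suitable for tiny point sets.

```python
import math


def _mean_nearest_distance(source, target):
    """Mean distance from each source point to its nearest target point."""
    if not source or not target:
        raise ValueError("both point sets must be non-empty")
    return sum(min(math.dist(s, t) for t in target) for s in source) / len(source)


def accuracy(predicted, ground_truth):
    """How far predicted points sit from the ground truth (lower is better)."""
    return _mean_nearest_distance(predicted, ground_truth)


def completeness(predicted, ground_truth):
    """How well predictions cover the ground truth (lower is better)."""
    return _mean_nearest_distance(ground_truth, predicted)
```

A prediction that places one perfect point scores well on accuracy but poorly on completeness, which is why the two numbers are reported together.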


Why teams care (strategic, not hype)

Game / sim / robotics: Persistent 3D fits Unreal / Unity / Isaac pipelines better than frame dumps. One-time reconstruction cost plus cheap real-time rendering matches interactive RL and digital-twin workflows—if export and license terms align.

Caution: World generation end-to-end is not fully open yet; most hackers will live in WorldMirror reconstruction until WorldNav / HY-Pano 2.0 / WorldStereo 2.0 ship.


Primary sources

Citation (from README)

@article{hyworld22026,
  title={HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds},
  author={Team HY-World},
  journal={arXiv preprint},
  year={2026}
}

HY-World 2.0 is a fast-moving research release. Treat this ExplainX article as May 6, 2026 orientation text—validate LICENSE, weights, and CLI flags on the official repository before production use.
