Abstract
ArtLLM generates articulated 3D assets from complete meshes: a 3D multimodal large language model predicts part layouts and joints, and the resulting layout conditions a generative model that synthesizes high-fidelity part geometries.
Creating interactive digital environments for gaming, robotics, and simulation relies on articulated 3D objects whose functionality emerges from their part geometry and kinematic structure. However, existing approaches remain fundamentally limited: optimization-based reconstruction methods require slow, per-object joint fitting and typically handle only simple, single-joint objects, while retrieval-based methods assemble parts from a fixed library, leading to repetitive geometry and poor generalization. To address these challenges, we introduce ArtLLM, a novel framework for generating high-quality articulated assets directly from complete 3D meshes. At its core is a 3D multimodal large language model trained on a large-scale articulation dataset curated from existing articulation benchmarks and procedurally generated objects. Unlike prior work, ArtLLM autoregressively predicts a variable number of parts and joints, inferring the object's kinematic structure in a unified manner from its point cloud. The resulting articulation-aware layout then conditions a 3D generative model that synthesizes high-fidelity part geometries. Experiments on the PartNet-Mobility dataset show that ArtLLM significantly outperforms state-of-the-art methods in both part layout accuracy and joint prediction, while generalizing robustly to real-world objects. Finally, we demonstrate its utility in constructing digital twins, highlighting its potential for scalable robot learning.
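To make the two-stage pipeline concrete, the sketch below illustrates one plausible shape for the intermediate articulation layout and the inference flow the abstract describes. This is a minimal sketch under assumed names: `Joint`, `ArticulatedLayout`, `predict_layout`, and `generate` are hypothetical placeholders, not ArtLLM's actual interface.

```python
# Hypothetical sketch of the two-stage pipeline described above.
# All class and method names are illustrative assumptions,
# not the paper's actual API.
from dataclasses import dataclass

Vec3 = tuple[float, float, float]

@dataclass
class Joint:
    """One kinematic joint connecting a child part to its parent."""
    parent: int                  # index of the parent part
    child: int                   # index of the child part
    joint_type: str              # e.g. "revolute", "prismatic", or "fixed"
    axis: Vec3                   # joint axis direction
    origin: Vec3                 # a point on the axis (pivot)
    limits: tuple[float, float]  # motion range (radians or meters)

@dataclass
class ArticulatedLayout:
    """Variable-length part layout plus kinematic structure,
    as autoregressively predicted by the multimodal LLM."""
    part_bboxes: list[tuple[Vec3, Vec3]]  # (center, extents) per part
    joints: list[Joint]

def build_articulated_asset(point_cloud, layout_llm, geometry_model):
    # Stage 1: the 3D multimodal LLM predicts a variable number of
    # parts and their joints in one pass over the object's point cloud.
    layout: ArticulatedLayout = layout_llm.predict_layout(point_cloud)

    # Stage 2: the articulation-aware layout conditions a 3D generative
    # model, which synthesizes one high-fidelity mesh per part.
    part_meshes = [geometry_model.generate(point_cloud, bbox)
                   for bbox in layout.part_bboxes]
    return part_meshes, layout.joints
```

Representing the layout explicitly as part bounding boxes plus a joint list mirrors the separation the abstract draws between kinematic prediction and geometry synthesis; the exact encoding in the paper may differ.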
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- PAct: Part-Decomposed Single-View Articulated Object Generation (2026)
- ArtPro: Self-Supervised Articulated Object Reconstruction with Adaptive Integration of Mobility Proposals (2026)
- AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation (2026)
- PokeNet: Learning Kinematic Models of Articulated Objects from Human Observations (2026)
- SceneFoundry: Generating Interactive Infinite 3D Worlds (2026)
- HY3D-Bench: Generation of 3D Assets (2026)
- ShapeR: Robust Conditional 3D Shape Generation from Casual Captures (2026)
