EnsembleVLA: Released Checkpoints
Released checkpoints for EnsembleVLA: Ensemble Learning for Vision-Language Action Models.
- Code: https://github.com/MingC715/EnsembleVLA
- Paper: ICML 2026 (coming soon)
EnsembleVLA is an energy-based framework for principled composition of diverse Vision-Language-Action (VLA) policies. It formulates diffusion-based and flow-based VLA models under a unified energy perspective, where additive energy aggregation induces policy composition at the distribution level. Multiple pretrained policies stay frozen while a lightweight ensemble head with learnable composition weights and confidence-aware gating aggregates them into a stronger policy, evaluated on the RoboTwin2 rollout interface.
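To make the idea of additive energy aggregation concrete, here is a toy sketch. It is not the released ensemble head: it assumes two hypothetical 1-D Gaussian policies, fixed composition weights, and no gating. Under the energy view, each policy has density proportional to exp(-E_i(a)), so a weighted sum of energies corresponds to a weighted product of densities. The function names `composed_energy` and `compose_gaussians` are illustrative only.

```python
def composed_energy(a, means, variances, weights):
    """Weighted sum of per-policy quadratic energies at action a.

    Assumption: policy i is Gaussian, so E_i(a) = (a - mu_i)^2 / (2 * var_i).
    """
    return sum(
        w * (a - m) ** 2 / (2.0 * v)
        for w, m, v in zip(weights, means, variances)
    )


def compose_gaussians(means, variances, weights):
    """Closed-form mean and variance of the density exp(-composed_energy).

    A weighted sum of quadratic energies is itself quadratic, so the
    composed distribution is Gaussian with precision-weighted parameters.
    """
    precision = sum(w / v for w, v in zip(weights, variances))
    mean = sum(w * m / v for w, m, v in zip(weights, means, variances)) / precision
    return mean, 1.0 / precision


if __name__ == "__main__":
    # Two toy policies: a broad one centred at 0 and a sharper one centred at 1.
    mean, var = compose_gaussians([0.0, 1.0], [1.0, 0.25], [1.0, 1.0])
    print(mean, var)
```

In this toy case the composed mean is pulled toward the sharper (lower-variance) policy, which is the distribution-level effect the energy formulation describes. The learnable weights and confidence-aware gating in EnsembleVLA would replace the fixed `weights` used here.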
What's in this repository
This repository contains two composition families, each covering 8 RoboTwin2 tasks. For every task we release the lightweight ensemble head plus the two frozen base policies:
| Family | Base policy 1 | Base policy 2 |
|---|---|---|
| dp+dp3 | Diffusion Policy (DP) | 3D Diffusion Policy (DP3) |
| dp+pi0.5 | Diffusion Policy (DP) | pi0.5 / openpi |
Tasks: beat_block_hammer, click_alarmclock, dump_bin_bigbin,
handover_block, move_playingcard_away, open_laptop, place_bread_skillet,
stack_bowls_three.
Repository layout
Files live at the repository root and mirror the code's best_checkpoint/ layout:
    dp+dp3/<task>/
        ensemble_checkpoint/best.pt        # lightweight EnsembleVLA head
        base_dp/<ckpt>.ckpt                # frozen DP base policy
        base_dp3/<ckpt>.ckpt               # frozen DP3 base policy
    dp+pi0.5/<task>/
        ensemble_checkpoint/best.pt        # lightweight EnsembleVLA head
        base_dp/<ckpt>.ckpt                # frozen DP base policy
        base_pi05_checkpoint_dir/
            model.safetensors              # frozen pi0.5 base policy weights
            metadata.pt
            assets/<task>/norm_stats.json
The pi0.5 base needs all three of model.safetensors, metadata.pt, and assets/ from the same base_pi05_checkpoint_dir/.
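Because a missing pi0.5 file is easy to overlook, a small check of a local copy against the layout above may help. The following is a sketch only: the helper `missing_files` and the `EXPECTED` table are hypothetical names, not part of the EnsembleVLA code, and base-policy checkpoints are matched by glob because the `<ckpt>` file names vary per task.

```python
from pathlib import Path

# Expected entries per family, transcribed from the layout listing.
# Glob patterns stand in for the per-task "<ckpt>" and "<task>" names.
EXPECTED = {
    "dp+dp3": [
        "ensemble_checkpoint/best.pt",
        "base_dp/*.ckpt",
        "base_dp3/*.ckpt",
    ],
    "dp+pi0.5": [
        "ensemble_checkpoint/best.pt",
        "base_dp/*.ckpt",
        "base_pi05_checkpoint_dir/model.safetensors",
        "base_pi05_checkpoint_dir/metadata.pt",
        "base_pi05_checkpoint_dir/assets/*/norm_stats.json",
    ],
}


def missing_files(root, family, task):
    """Return the expected patterns that match nothing under root/family/task."""
    task_dir = Path(root) / family / task
    return [pattern for pattern in EXPECTED[family] if not any(task_dir.glob(pattern))]


if __name__ == "__main__":
    # Example: report gaps for one task in a local best_checkpoint/ directory.
    print(missing_files("best_checkpoint", "dp+pi0.5", "open_laptop"))
```

An empty list means every expected entry was found for that family and task.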
Download
Download everything straight into the code repo's best_checkpoint/ directory:
pip install -U huggingface_hub
huggingface-cli download mingchens/EnsembleVLA --repo-type model --local-dir best_checkpoint
Then follow the Environment Setup and Evaluation instructions in the
GitHub README. Only inference
checkpoints are required for evaluation; optimizer states and training/rollout
logs are not included. The full checkpoint manifest (per-task base checkpoints
and results) is in
docs/checkpoints.md.
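If only one family and task are needed, the full download can likely be narrowed with the `allow_patterns` argument of `huggingface_hub.snapshot_download`. This is a sketch under the assumption that the Hub repository follows the layout described earlier. The helper names `task_patterns` and `download_task` are hypothetical and not part of the EnsembleVLA code.

```python
def task_patterns(family, task):
    """Glob patterns selecting every file of one family/task.

    huggingface_hub matches these fnmatch-style, so "*" also spans
    nested directories such as base_pi05_checkpoint_dir/assets/.
    """
    return [f"{family}/{task}/*"]


def download_task(family, task, local_dir="best_checkpoint"):
    """Download one family/task subset into local_dir and return its path."""
    # Imported here so the pattern helper stays usable without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="mingchens/EnsembleVLA",
        repo_type="model",
        local_dir=local_dir,
        allow_patterns=task_patterns(family, task),
    )


if __name__ == "__main__":
    # Example: fetch only the dp+dp3 checkpoints for open_laptop.
    print(download_task("dp+dp3", "open_laptop"))
```

The files land under the same relative paths as in a full download, so the evaluation instructions in the GitHub README should apply unchanged for the selected task.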
License
Released under the MIT License.
Citation
@inproceedings{song2026ensemblevla,
title={EnsembleVLA: Ensemble Learning for Vision-Language Action Models},
author={Song, Mingchen and Deng, Xiang and Wei, Jie and Jiang, Dongmei and Nie, Liqiang and Guan, Weili},
booktitle={International Conference on Machine Learning},
year={2026}
}