# SA-VLA: Spatially-Aware Reinforcement Learning for Flow-Matching VLA Models
SA-VLA is a spatially-aware reinforcement learning approach for flow-matching Vision-Language-Action (VLA) models.
It is developed on top of the RLinf framework and targets robust embodied manipulation with stronger spatial generalization.
- 📄 Paper: https://arxiv.org/abs/2602.00743
- 🌐 Project Page: https://xupan.top/Projects/savla
- 🧩 Codebase: https://github.com/TwSphinx54/SA-VLA
- 🏗️ RL Framework: https://github.com/RLinf/RLinf
## Model Summary
SA-VLA fuses visual tokens and spatial tokens into geometry-aware embeddings, then optimizes the policy via:
- Step-level dense rewards
- Spatially-conditioned exploration (SCAN)
- RL fine-tuning on embodied benchmarks
This repository provides model weights used in SA-VLA experiments.
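The token-fusion idea above can be sketched in a few lines. This is a minimal, illustrative example only, assuming visual and spatial tokens are pre-extracted feature arrays and using a hypothetical learned projection; the actual SA-VLA fusion module lives in the code repository:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_tokens(visual, spatial, w_proj, b_proj):
    """Concatenate per-token visual and spatial features, then project
    them into a shared geometry-aware embedding space (illustrative only)."""
    assert visual.shape[0] == spatial.shape[0], "one spatial token per visual token"
    fused = np.concatenate([visual, spatial], axis=-1)  # (T, Dv + Ds)
    return fused @ w_proj + b_proj                      # (T, De)

# Hypothetical dimensions: 16 tokens, 64-d visual, 32-d spatial, 128-d embedding
T, Dv, Ds, De = 16, 64, 32, 128
visual  = rng.normal(size=(T, Dv))
spatial = rng.normal(size=(T, Ds))
w_proj  = rng.normal(size=(Dv + Ds, De)) * 0.02
b_proj  = np.zeros(De)

emb = fuse_tokens(visual, spatial, w_proj, b_proj)
print(emb.shape)  # (16, 128)
```

In the real model the projection is learned end-to-end and the resulting geometry-aware embeddings feed the flow-matching policy; the sketch only shows the fusion shape contract.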
## Intended Use
- RL fine-tuning and evaluation for embodied manipulation tasks
- Experiments on LIBERO / LIBERO-PLUS style benchmarks
- Research on spatial reasoning in VLA post-training
For complete environment setup, training scripts, and benchmark integration, use the full code repository: https://github.com/TwSphinx54/SA-VLA
## Quick Start (with SA-VLA codebase)

1) Clone the project

```bash
git clone https://github.com/TwSphinx54/SA-VLA.git
cd SA-VLA
```
2) Set up the environment

Follow the RLinf setup in:

- `README.RLinf.md` (framework/environment setup)
- `scripts/setup_container.sh` (extra container setup)
3) Place the weights

Put the downloaded checkpoints under `weights/` (see the recommended layout below).
4) Run training / evaluation

```bash
# RL training
bash examples/embodiment/run_embodiment.sh libero_spatial_ppo_openpi_pi05

# Evaluation
bash examples/embodiment/eval_embodiment.sh libero_spatial_ppo_openpi_pi05_eval
```
## Recommended Weight Layout

```text
weights
|-- Pi05-LIBERO
|-- Pi05-VGGT-LIBERO-FUSER-SFT_BF16
`-- RLinf-Pi05-SFT
```
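Before launching training, it can help to check that the expected checkpoint directories are in place. The helper below is a small sketch (not part of the repository), assuming the layout shown above:

```python
from pathlib import Path

# Checkpoint directory names from the recommended layout above
EXPECTED = [
    "Pi05-LIBERO",
    "Pi05-VGGT-LIBERO-FUSER-SFT_BF16",
    "RLinf-Pi05-SFT",
]

def missing_checkpoints(weights_dir="weights"):
    """Return the names of expected checkpoint directories that are absent."""
    root = Path(weights_dir)
    return [name for name in EXPECTED if not (root / name).is_dir()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```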
## Dataset Notes

The SA-VLA experiments rely on LIBERO-family data and benchmark configurations.
To switch between subset and full-set evaluation, modify the benchmark mapping in your OpenPi LIBERO installation, as documented in the main repository.
## Limitations
- Requires non-trivial robotics simulation setup
- Performance depends on environment/version consistency
- Not intended for safety-critical real-world deployment without additional validation
## Citation

```bibtex
@misc{pan2026savlaspatiallyawareflowmatchingvisionlanguageaction,
  title={SA-VLA: Spatially-Aware Flow-Matching for Vision-Language-Action Reinforcement Learning},
  author={Xu Pan and Zhenglin Wan and Xingrui Yu and Xianwei Zheng and Youkai Ke and Ming Sun and Rui Wang and Ziwei Wang and Ivor Tsang},
  year={2026},
  eprint={2602.00743},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2602.00743}
}
```
## License
Apache-2.0
## Acknowledgments
Built upon: