---
license: mit
pipeline_tag: robotics
---

<div align="center">

<p align="center">
  <img src="https://raw.githubusercontent.com/zjunlp/LabVLA/main/assets/logo/labvla-symbol.png" width="88" alt="LabVLA symbol" /><img src="https://raw.githubusercontent.com/zjunlp/LabVLA/main/assets/logo/labvla-wordmark.png" height="56" alt="LabVLA" />
</p>

<h3 align="center"> Grounding Vision–Language–Action Models in Scientific Laboratories </h3>

<p align="center">
  <a href="https://huggingface.co/papers/2606.13578">📰HF Paper</a><a href="https://zjunlp.github.io/LabVLA/">🔥Project Page</a><a href="https://github.com/zjunlp/LabVLA">💻GitHub Repo</a><a href="https://huggingface.co/zjunlp/LabVLA">🤗Model</a>
</p>
</div>

---

## Model Description

**LabVLA** is the first vision–language–action (VLA) model designed specifically for scientific laboratory environments. It was introduced in [LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories](https://huggingface.co/papers/2606.13578).

It combines a **Qwen3-VL-4B-Instruct** vision–language backbone with a **DiT flow-matching action expert**. Training follows a two-stage recipe:
1. **FAST action token pretraining**: Makes the backbone action-aware.
2. **Flow matching post-training**: Attaches the DiT action expert under knowledge insulation to enable continuous control.

Existing VLA policies are trained mostly on household data. LabVLA targets this gap, aiming to enable autonomous execution of scientific protocols that involve laboratory instruments and transparent liquids.
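
To make the flow matching stage of the recipe above more concrete, here is a minimal, generic sketch of how a flow-matching policy can turn noise into a continuous action. It is not LabVLA's implementation: the linear interpolation path, the time convention (t = 0 is noise, t = 1 is the action), the Euler integrator, and every function name are assumptions chosen for illustration, and a toy closed-form velocity field stands in for the DiT action expert.

```python
import random


def flow_matching_target(noise, action, t):
    """Build one training pair on a linear path (assumed convention).

    x_t interpolates from noise (t=0) to the action (t=1); the regression
    target for the velocity network is the constant direction action - noise.
    """
    x_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
    v_target = [a - n for n, a in zip(noise, action)]
    return x_t, v_target


def sample_action(velocity_fn, dim, num_steps=10, seed=0):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1 inside the loop
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for a learned action expert.

    For a single known target action, the ideal velocity on the linear path
    is (target - x) / (1 - t), which equals target - noise along the path.
    """
    def velocity(x, t):
        return [(a - xi) / (1.0 - t) for a, xi in zip(target, x)]
    return velocity


if __name__ == "__main__":
    target = [0.25, -0.5, 1.0]  # made-up 3-D action, for illustration only
    action = sample_action(make_toy_velocity(target), dim=len(target))
    print(action)  # close to the target, up to floating-point error
```

In a real policy the velocity function would be a network conditioned on images, language, and robot state, and the sample would typically be a chunk of future actions rather than a single vector.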

## How to Use

### Download

```bash
huggingface-cli download zjunlp/LabVLA --local-dir LabVLA
```

### Deployment

Serve the model over the OpenPI msgpack WebSocket protocol:

```bash
git clone https://github.com/zjunlp/LabVLA.git
cd LabVLA
bash deployment/deploy.sh
```

For training, data preparation, and more details, please refer to the [GitHub repository](https://github.com/zjunlp/LabVLA).

## Citation

```bibtex
@article{ren2026labvla,
  title   = {LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories},
  author  = {Ren, Baochang and Liu, Xinjie and Chen, Xi and Liu, Yanshuo and
             Li, Chenxi and Gao, Daqi and Su, Zeqin and Xing, Jintao and
             Xue, Zirui and Li, Rui and Zhao, Xiangyu and Qiao, Shuofei and
             Pan, Minting and Zuo, Wangmeng and Bai, Lei and Zhou, Dongzhan and
             Zhang, Ningyu and Chen, Huajun},
  journal = {arXiv preprint arXiv:2606.13578},
  year    = {2026}
}
```