	---
	license: apache-2.0
	pipeline_tag: image-segmentation
	library_name: onnx
	tags:
	- onnxruntime
	- onnx
	- segment-anything
	- segment-anything-2
	- image-segmentation
	- edge-ai
	- anylabeling
	authors:
	- Viet-Anh Nguyen
	---

	# Segment Anything 2 (SAM 2) — ONNX Models

	ONNX exports of Meta's [SAM 2](https://github.com/facebookresearch/sam2) image-segmentation backbones, packaged for direct use with [`onnxruntime`](https://onnxruntime.ai) and [AnyLabeling](https://github.com/vietanhdev/anylabeling).

	## Why this repo exists

	SAM 2 improves on SAM 1 in both speed and segmentation quality, but the official release ships only PyTorch checkpoints. ONNX gives you a portable runtime with few dependencies that works from Python, C++, JavaScript, and many embedded targets. These exports are the ones AnyLabeling consumes for its smart-labeling features.

	## Variants

	Each backbone is provided in two equivalent forms; pick whichever fits your loader:

	- Raw ONNX pair: `<backbone>.encoder.onnx` + `<backbone>.decoder.onnx`
	- Bundled zip: `<backbone>.zip` containing both files

	| Backbone | Encoder size | Decoder size | Bundle |
	|---|---|---|---|
	| `sam2_hiera_tiny` | 128 MB | 19.7 MB | `sam2_hiera_tiny.zip` (148 MB) |
	| `sam2_hiera_small` | 155 MB | 19.7 MB | `sam2_hiera_small.zip` (175 MB) |
	| `sam2_hiera_base_plus` | 324 MB | 19.7 MB | `sam2_hiera_base_plus.zip` (344 MB) |
	| `sam2_hiera_large` | 848 MB | 19.7 MB | `sam2_hiera_large.zip` (868 MB) |
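
The naming scheme and sizes above can be captured in a small lookup, for example to choose the largest backbone that fits a download budget. This is only a sketch: `SIZES_MB`, `files_for`, and `largest_within` are hypothetical helper names, and the sizes are copied from the table rather than queried from the Hub.

```python
# Sizes in MB as (encoder, decoder), copied from the variants table.
SIZES_MB = {
    "sam2_hiera_tiny": (128.0, 19.7),
    "sam2_hiera_small": (155.0, 19.7),
    "sam2_hiera_base_plus": (324.0, 19.7),
    "sam2_hiera_large": (848.0, 19.7),
}


def files_for(backbone):
    """Return the raw ONNX pair and bundle filenames for a backbone."""
    if backbone not in SIZES_MB:
        raise KeyError(f"unknown backbone: {backbone}")
    return {
        "encoder": f"{backbone}.encoder.onnx",
        "decoder": f"{backbone}.decoder.onnx",
        "bundle": f"{backbone}.zip",
    }


def largest_within(budget_mb):
    """Pick the largest backbone whose encoder + decoder fit the budget, else None."""
    fitting = [
        (sum(sizes), name)
        for name, sizes in SIZES_MB.items()
        if sum(sizes) <= budget_mb
    ]
    return max(fitting)[1] if fitting else None
```

With a 200 MB budget, `largest_within(200)` selects `sam2_hiera_small`, and `files_for` then gives the filenames to request.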

	`zip_models.py` is the helper script used to produce the bundled zips from the encoder/decoder pairs.
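
Loading from a bundle means unpacking it first. The sketch below shows one way to do that with the standard library. It assumes the archive holds exactly one file ending in `.encoder.onnx` and one ending in `.decoder.onnx`, which is inferred from the naming scheme above and not checked against the actual zips. `extract_pair` is a hypothetical helper and is not part of `zip_models.py`.

```python
import zipfile
from pathlib import Path


def extract_pair(zip_path, out_dir):
    """Unpack a bundle and return (encoder_path, decoder_path)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        zf.extractall(out_dir)

    def find(suffix):
        # Assumption: member names end with the same suffixes as the raw files.
        matches = [n for n in names if n.endswith(suffix)]
        if len(matches) != 1:
            raise ValueError(
                f"expected one *{suffix} in {zip_path}, found {len(matches)}"
            )
        return out_dir / matches[0]

    return find(".encoder.onnx"), find(".decoder.onnx")
```

The `zip_path` argument can be the local path returned by `hf_hub_download(repo_id=repo, filename="sam2_hiera_tiny.zip")`.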

	## Quick start

	```bash
	pip install huggingface_hub onnxruntime
	```

	```python
	from huggingface_hub import hf_hub_download
	import onnxruntime as ort

	repo = "vietanhdev/segment-anything-2-onnx-models"
	encoder = hf_hub_download(repo_id=repo, filename="sam2_hiera_tiny.encoder.onnx")
	decoder = hf_hub_download(repo_id=repo, filename="sam2_hiera_tiny.decoder.onnx")

	enc = ort.InferenceSession(encoder, providers=["CPUExecutionProvider"])
	dec = ort.InferenceSession(decoder, providers=["CPUExecutionProvider"])

	# Inspect expected inputs:
	print("Encoder:", [(i.name, i.shape, i.type) for i in enc.get_inputs()])
	print("Decoder:", [(i.name, i.shape, i.type) for i in dec.get_inputs()])
	```
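
The snippet above pins both sessions to the CPU. If an accelerated build of `onnxruntime` is installed, one option is to prefer its execution provider and fall back to the CPU. A minimal sketch: `pick_providers` and `make_session` are hypothetical helpers, and the preference order is an assumption to adjust for your hardware.

```python
def pick_providers(
    available,
    preferred=("CUDAExecutionProvider", "CoreMLExecutionProvider"),
):
    """Return the preferred providers that are available, ending with CPU."""
    chosen = [
        p for p in preferred
        if p in available and p != "CPUExecutionProvider"
    ]
    chosen.append("CPUExecutionProvider")  # always keep a CPU fallback
    return chosen


def make_session(model_path):
    """Create an InferenceSession using the providers chosen above."""
    # Imported lazily so pick_providers stays usable without onnxruntime.
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

For example, `make_session(encoder)` can replace the explicit `ort.InferenceSession(...)` call for the encoder.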

	For the full image → mask pipeline (encoder + decoder + prompt handling), see how AnyLabeling wires it: <https://github.com/vietanhdev/anylabeling>
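
Prompt handling usually includes mapping click coordinates from the original image into the resolution the encoder was run at. The sketch below shows that rescaling in isolation. It assumes the image is resized to a square input without preserving aspect ratio, and it uses 1024 as the default side length. Both are assumptions, so check the encoder's reported input shape and AnyLabeling's code before relying on them. `scale_point` is a hypothetical helper.

```python
def scale_point(x, y, orig_w, orig_h, input_size=1024):
    """Map a pixel coordinate in the original image to model-input space.

    Assumes a plain resize to input_size x input_size (no letterboxing).
    """
    if orig_w <= 0 or orig_h <= 0:
        raise ValueError("image dimensions must be positive")
    return x / orig_w * input_size, y / orig_h * input_size
```

A click at the center of a 640x480 image, `scale_point(320, 240, 640, 480)`, maps to the center of the model input.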

	## Use with AnyLabeling

	These models drop into AnyLabeling's auto-labeling backend without conversion. See the [AnyLabeling repository](https://github.com/vietanhdev/anylabeling) for how to wire up the model config.

	## Source weights

	Original SAM 2 weights and license: <https://github.com/facebookresearch/sam2>

	This repo redistributes the same weights in ONNX format. The license is unchanged from upstream (Apache 2.0).

	## Citation

	```bibtex
	@misc{nguyen2026sam2_onnx,
	  author = {Nguyen, Viet-Anh and {Neural Research Lab}},
	  title = {SAM 2 ONNX Models},
	  year = {2026},
	  url = {https://huggingface.co/vietanhdev/segment-anything-2-onnx-models}
	}
	```

	For the underlying model, cite Meta's SAM 2 paper:

	```bibtex
	@article{ravi2024sam2,
	  title = {SAM 2: Segment Anything in Images and Videos},
	  author = {Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
	  journal = {arXiv preprint arXiv:2408.00714},
	  year = {2024}
	}
	```

	## Acknowledgments

	Thanks to Meta AI Research for releasing SAM 2. This repo packages their work for edge inference.