SLIP-Llama: Sensor Language-Informed Pretraining (Llama-3.2-1B backbone)
Learning Transferable Sensor Models via Language-Informed Pretraining
Yuliang Chen, Arvind Pillai, Yu Yvonne Wu, Tess Z. Griffin, Lisa Marsch, Michael V. Heinz, Nicholas C. Jacobson, Andrew Campbell
Dartmouth College
[Code] [Gemma checkpoint] [Dataset] [SFT Dataset]
Backbone variants: Gemma-3-270M · Llama-3.2-1B (this repo)
Overview
This is the Llama-3.2-1B backbone variant of SLIP: the pretrained base checkpoint (`post_train: false`), before any task-specific SFT.
SLIP is a multimodal pretraining framework that learns language-aligned sensor representations transferable
across diverse sensor setups. It pairs CLIP-style contrastive alignment with sensor-conditioned captioning,
giving both discriminative understanding and generative reasoning over multivariate time series from
heterogeneous sensors. This variant swaps the original Gemma-3-270M backbone for meta-llama/Llama-3.2-1B,
repurposed into a unimodal text encoder (the first `split_layer` layers) and a multimodal decoder (the remaining
layers, extended with cross-attention to the sensor stream).
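The card describes the contrastive half of SLIP only as "CLIP-style". As a rough illustration, the standard CLIP objective is a symmetric InfoNCE loss over a batch of paired embeddings: each matching sensor/caption pair sits on the diagonal of a similarity matrix, and every other entry in its row or column acts as a negative. The plain-Python sketch below assumes each sample has already been pooled into a single vector. The function name, the fixed `temperature=0.07` (CLIP's initial value, which CLIP then learns), and the pooling are assumptions, not SLIP's actual training code, and the captioning loss is not shown.

```python
import math


def _l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _log_softmax(row):
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_contrastive_loss(sensor_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (sensor, caption) embeddings.

    Assumption: row i of sensor_emb is paired with row i of text_emb;
    all other rows in the batch serve as negatives.
    """
    if len(sensor_emb) != len(text_emb):
        raise ValueError("sensor and text batches must have the same size")
    s = [_l2_normalize(v) for v in sensor_emb]
    t = [_l2_normalize(v) for v in text_emb]
    # logits[i][j] = cosine(sensor_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(si, tj)) / temperature for tj in t] for si in s]
    n = len(logits)
    # Sensor -> text: the correct caption for sensor i is column i of row i.
    loss_s2t = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> sensor: read the same matrix column-wise.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2s = -sum(_log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_s2t + loss_t2s) / 2
```

Correctly paired batches give a loss near zero, while mismatched pairings are penalized heavily. That gap is the signal that pulls each sensor window toward its own caption in the shared space.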
Architecture
| Component | Setting |
|---|---|
| LLM backbone | meta-llama/Llama-3.2-1B |
| Hidden size | 2048 |
| Vocab size | 128256 |
| Split layer (text-encoder / multimodal-decoder boundary) | 12 |
| Cross-attention heads (decoder) | 32 |
| Sensor pooler queries | 64 (num_img_queries) |
| Sensor pooler heads | 8 (img_attn_pool_num_heads) |
| Sensor encoder | Transformer, embed_dim=768, depth=12, heads=12, FlexMLP patch embedding + 2D RoPE |
| Total parameters | ~1.74B |
| Dtype | mixed — Llama backbone bfloat16, sensor encoder / pooler float32 |
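The split in the table can be worked out directly. Llama-3.2-1B has 16 transformer layers, so `split_layer = 12` leaves 12 layers in the text encoder and 4 in the multimodal decoder. The sketch below assumes layers are indexed from 0 and the boundary is a plain slice, which the card does not spell out. It also checks per-head widths, assuming the decoder cross-attention runs at the 2048-dim hidden size. The constant and helper names are hypothetical.

```python
# Assumption: meta-llama/Llama-3.2-1B has 16 decoder layers (num_hidden_layers=16).
LLAMA_32_1B_NUM_LAYERS = 16
HIDDEN_SIZE = 2048        # from the table above
SPLIT_LAYER = 12          # text-encoder / multimodal-decoder boundary
CROSS_ATTN_HEADS = 32     # decoder cross-attention heads
SENSOR_EMBED_DIM = 768    # sensor encoder width
SENSOR_HEADS = 12         # sensor encoder attention heads


def partition_layers(num_layers, split_layer):
    """Return (text_encoder_layers, multimodal_decoder_layers) as index lists."""
    if not 0 < split_layer < num_layers:
        raise ValueError("split_layer must leave at least one layer on each side")
    layers = list(range(num_layers))
    return layers[:split_layer], layers[split_layer:]


def head_dim(width, num_heads):
    """Per-head width; attention requires width to divide evenly across heads."""
    if width % num_heads:
        raise ValueError(f"{width} is not divisible by {num_heads} heads")
    return width // num_heads


encoder_layers, decoder_layers = partition_layers(LLAMA_32_1B_NUM_LAYERS, SPLIT_LAYER)
print(len(encoder_layers), len(decoder_layers))   # 12 4
print(head_dim(HIDDEN_SIZE, CROSS_ATTN_HEADS))    # 64
print(head_dim(SENSOR_EMBED_DIM, SENSOR_HEADS))   # 64
```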
Files
| File | Description |
|---|---|
| `model.safetensors` | Pretrained SLIP-Llama base weights (LoRA merged into the backbone) |
| `config.json` | `SlipConfig` for the Llama backbone |
| `configuration_slip.py`, `modeling_slip.py` | Custom model code (`trust_remote_code`) |
| `multimodal_gemma.py`, `multimodal_llama.py`, `ts_transformer.py` | Backbone wrappers + sensor encoder |
| `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json` | Llama-3.2 tokenizer |
Datasets
SLIP-Llama uses the same pretraining and SFT data as the Gemma checkpoint (this repository holds the pretrained base checkpoint only):
- LeoChen085/SlipDataset — 600K+ sensor-caption pretraining pairs
- LeoChen085/SlipSFTDataset — task-specific SFT (HAR / sleep / ECG / TSQA / captioning)
Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "LeoChen085/SLIP-Llama", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("LeoChen085/SLIP-Llama")
model.eval()
```
The sensor-conditioning API (flexi-patch sensor inputs, `get_embedding`, `get_sensor_embedding`, and sensor-conditioned `generate`) is identical to that of the Gemma checkpoint; see the usage examples and sensor input format documented at LeoChen085/SLIP and in the GitHub repo. The only difference is the backbone: embedding and projection dimensions follow the Llama hidden size (2048) rather than Gemma's 640.
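For discriminative use, a CLIP-style model is typically applied zero-shot by comparing a sensor embedding against text embeddings of one prompt per class. The sketch below assumes you have already obtained those vectors (for example via `get_sensor_embedding` and `get_embedding`, whose exact inputs and outputs are documented at LeoChen085/SLIP and not reproduced here) and converted them to Python lists. The `zero_shot_classify` helper and the label names are hypothetical. The dimension check reflects the backbone difference noted above: vectors from this checkpoint are 2048-dimensional and cannot be compared with 640-dimensional ones from the Gemma checkpoint.

```python
import math

# Llama hidden size per this card; the Gemma checkpoint's embeddings would be 640-d.
EXPECTED_DIM = 2048


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("cannot compute cosine similarity with a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def zero_shot_classify(sensor_vec, label_vecs, expected_dim=EXPECTED_DIM):
    """Return (best_label, scores) by cosine similarity to each label's text embedding.

    sensor_vec: one sensor embedding as a list of floats.
    label_vecs: dict mapping a (hypothetical) label name to the text embedding
                of a prompt describing that label.
    """
    # Guard against mixing embeddings from different backbones.
    for name, vec in [("sensor", sensor_vec), *label_vecs.items()]:
        if len(vec) != expected_dim:
            raise ValueError(f"{name} embedding has dim {len(vec)}, expected {expected_dim}")
    scores = {label: cosine(sensor_vec, vec) for label, vec in label_vecs.items()}
    return max(scores, key=scores.get), scores
```

With toy 2-D vectors and `expected_dim=2`, the helper returns the label whose vector points in the most similar direction. With real outputs from this checkpoint, the default `expected_dim` applies.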
Citation
```bibtex
@article{chen2026slip,
  title={Learning Transferable Sensor Models via Language-Informed Pretraining},
  author={Chen, Yuliang and Pillai, Arvind and Wu, Yu Yvonne and Griffin, Tess Z. and Marsch, Lisa and Heinz, Michael V. and Jacobson, Nicholas C. and Campbell, Andrew},
  journal={Preprint},
  year={2026}
}
```
License
The SLIP model code is released under the MIT License. This checkpoint embeds (LoRA-merged) weights derived from Llama-3.2-1B and is therefore additionally governed by the Llama 3.2 Community License. Built with Llama.