Light-Omni

Light-Omni is a multimodal agent framework for reflexive video understanding with long-term memory. It replaces costly detective-style iterative reasoning with dual contextual states: a compact global state consolidated from episodic memory, and a latent state that drives action control and semantically aligned retrieval.
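As a rough intuition for the two states described above, the toy Python sketch below keeps an episodic memory of (caption, embedding) pairs. It consolidates them into one compact vector and retrieves entries by similarity to a query vector. This is not the Light-Omni implementation. The `EpisodicMemory` class, the mean-pooling "consolidation", and cosine-similarity retrieval are hypothetical stand-ins chosen only to illustrate the general pattern.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class EpisodicMemory:
    # Hypothetical store: each entry is (caption, embedding) for one video segment.
    entries: list = field(default_factory=list)

    def add(self, caption, embedding):
        self.entries.append((caption, list(embedding)))

    def global_state(self):
        # Hypothetical consolidation: element-wise mean of all stored embeddings.
        if not self.entries:
            return []
        dim = len(self.entries[0][1])
        n = len(self.entries)
        return [sum(emb[i] for _, emb in self.entries) / n for i in range(dim)]

    def retrieve(self, latent_query, k=1):
        # Rank stored entries by cosine similarity to the query vector.
        ranked = sorted(self.entries, key=lambda e: cosine(latent_query, e[1]), reverse=True)
        return [caption for caption, _ in ranked[:k]]


mem = EpisodicMemory()
mem.add("person enters kitchen", [1.0, 0.0, 0.0])
mem.add("dog runs outside", [0.0, 1.0, 0.0])
mem.add("person opens fridge", [0.9, 0.1, 0.0])
print(mem.retrieve([1.0, 0.0, 0.0], k=2))  # ['person enters kitchen', 'person opens fridge']
```

In this toy version, the compact vector summarizes the memory cheaply. The query-driven lookup pulls back only the entries relevant to the current step and does not re-read the whole history.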

This repository hosts the Light-Omni model checkpoint for inference. It contains the safetensors weight shards, tokenizer files, model configuration, and multimodal preprocessor configuration files.
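After downloading the checkpoint, you may want to check that a local copy is complete before loading it. The sketch below relies on the standard Hugging Face convention for sharded safetensors checkpoints: a `model.safetensors.index.json` file whose `weight_map` maps each tensor name to the shard file that stores it. The auxiliary file names in `EXPECTED_FILES` are common Hugging Face defaults and are assumptions. They are not confirmed for this repository, so adjust them to the files actually present.

```python
import json
from collections import Counter
from pathlib import Path

# Assumed file names (common Hugging Face defaults); verify against the repository listing.
EXPECTED_FILES = ("config.json", "preprocessor_config.json", "tokenizer_config.json")


def summarize_checkpoint(checkpoint_dir):
    """Count tensors per safetensors shard and report any expected files that are missing."""
    root = Path(checkpoint_dir)
    index_path = root / "model.safetensors.index.json"
    shards = Counter()
    if index_path.exists():
        index = json.loads(index_path.read_text())
        # weight_map: tensor name -> shard file name
        shards = Counter(index["weight_map"].values())
    missing = [name for name in EXPECTED_FILES if not (root / name).exists()]
    return {"tensors_per_shard": dict(shards), "missing": missing}
```

An empty `missing` list and one entry per downloaded shard in `tensors_per_shard` suggest that the download is complete.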

Links

Citation

@inproceedings{nie2026lightomni,
  title={Light-Omni: Reflex over Reasoning in Agentic Video Understanding with Long-Term Memory},
  author={Nie, Chang and Wei, Jiaju and Feng, Junlan and Fu, Chaoyou and Shan, Caifeng},
  year={2026},
  url={http://arxiv.org/abs/xxxx.xxxx}
}
Format: Safetensors · Model size: 12B params · Tensor type: BF16
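The model size and tensor type above allow a quick estimate of how much memory the weights alone occupy. BF16 stores each parameter in 2 bytes, so about 12 × 10⁹ parameters need about 24 × 10⁹ bytes, or roughly 22.4 GiB. The helper below is a minimal sketch of that arithmetic. The "12B" figure is rounded, and activations, caches, and video inputs need additional memory on top of this estimate.

```python
# Bytes used to store one parameter in each format.
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Memory needed to hold the raw weights, in GiB (2**30 bytes). Excludes activations and caches."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


print(f"{weight_memory_gib(12e9):.1f} GiB")  # 22.4 GiB
```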