RoadQAQ/ReLIFT-Qwen2.5-7B-Zero

Tags: Question Answering, text-generation, text-generation-inference


ReLIFT is a training method that interleaves reinforcement learning (RL) with online fine-tuning (FT). It achieves superior performance and efficiency compared to using RL or supervised fine-tuning (SFT) alone, as described in the paper "Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions".

Code: https://github.com/TheRoadQaQ/ReLIFT

Project page: https://github.com/TheRoadQaQ/ReLIFT
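The card does not include a usage snippet. A minimal sketch of loading the checkpoint with the Hugging Face `transformers` library might look like the following. The repository name comes from this page; the `build_prompt` helper, its plain "Question/Answer" format, the bfloat16 dtype, and the generation length are assumptions for illustration, not settings documented by the authors. Check the linked repository for the prompt format used in training and evaluation.

```python
MODEL_ID = "RoadQAQ/ReLIFT-Qwen2.5-7B-Zero"  # repository name taken from this page


def build_prompt(question: str) -> str:
    """Hypothetical plain-text prompt; the card does not document a format."""
    return f"Question: {question}\nAnswer:"


def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate a completion for one question."""
    # Heavy dependencies are imported lazily so build_prompt works without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: halves memory versus the stored F32 weights
        device_map="auto",           # requires the `accelerate` package
    )

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0, inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_answer("What is the sum of the first 10 positive integers?"))
```

Running the script downloads the full checkpoint on first use, so it needs enough disk space and memory for a model of this size.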

Downloads last month: 8

Safetensors

Model size: 8B params

Tensor type: F32
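As a rough back-of-the-envelope check of what these figures imply, the sketch below estimates the memory needed for the weights alone. It assumes the rounded 8B parameter count shown on this page (the exact count may differ) and ignores activations, the KV cache, and framework overhead; the helper name `weight_memory_gib` is made up for this example.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the model weights in GiB."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 8e9  # rounded figure from the page, not an exact count

f32_gib = weight_memory_gib(NUM_PARAMS, 4)   # F32 stores 4 bytes per parameter
bf16_gib = weight_memory_gib(NUM_PARAMS, 2)  # bfloat16 stores 2 bytes per parameter

print(f"F32 weights:  ~{f32_gib:.1f} GiB")
print(f"BF16 weights: ~{bf16_gib:.1f} GiB")
```

Under these assumptions the F32 weights come to roughly 30 GiB, and casting to a 16-bit type at load time halves that.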


Collection including RoadQAQ/ReLIFT-Qwen2.5-7B-Zero

ReLIFT • 8 items • Updated Jun 10, 2025

Paper for RoadQAQ/ReLIFT-Qwen2.5-7B-Zero

Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions

Paper • 2506.07527 • Published Jun 9, 2025