ReLIFT
How to use RoadQAQ/ReLIFT-Qwen2.5-7B-Zero with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="RoadQAQ/ReLIFT-Qwen2.5-7B-Zero")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("RoadQAQ/ReLIFT-Qwen2.5-7B-Zero")
model = AutoModelForCausalLM.from_pretrained("RoadQAQ/ReLIFT-Qwen2.5-7B-Zero")
ReLIFT is a training method that interleaves reinforcement learning (RL) with online fine-tuning (FT), achieving superior performance and efficiency compared to using RL or supervised fine-tuning (SFT) alone, as described in the paper "Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions".
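The loading snippet above stops once the tokenizer and model are constructed. The following is a minimal sketch of how generation might then be run. The model ID comes from this card. The `build_prompt` helper and its "Question: ... Answer:" format are hypothetical, since the card does not specify a prompt template, and `max_new_tokens=512` with greedy decoding is an arbitrary illustrative choice rather than a recommended setting.

```python
MODEL_ID = "RoadQAQ/ReLIFT-Qwen2.5-7B-Zero"


def build_prompt(question: str) -> str:
    """Hypothetical plain-text prompt; the model card does not document a template."""
    return f"Question: {question.strip()}\nAnswer:"


def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    """Load the checkpoint and greedily decode a completion for one question."""
    # Imported lazily so build_prompt can be used without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # torch_dtype="auto" keeps the dtype stored in the checkpoint config.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    model.eval()

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,  # assumed budget, not from the card
            do_sample=False,                # greedy decoding for reproducibility
        )

    # Drop the prompt tokens so only the newly generated text is returned.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `generate_answer("What is 12 * 13?")` downloads the weights on first use and runs on whatever device the model is loaded onto (CPU by default here), so a 7B checkpoint may need to be moved to a GPU for practical latency.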
Code: https://github.com/TheRoadQaQ/ReLIFT
Project page: https://github.com/TheRoadQaQ/ReLIFT