Off policy/entropy - a chang2394 Collection

Off policy/entropy

updated Apr 15

Efficient RL Training for LLMs with Experience Replay

Paper • 2604.08706 • Published Apr 9