SFT & Reward Models used in the experiments of the ICML 2024 paper "Towards Efficient Exact Optimization of Language Model Alignment"
Haozhe Ji
ehzoah
AI & ML interests
language modeling, text generation
Models (7)
ehzoah/RM-Llama-3.2-1B_UltraFeedback-ArmoRM • 1B
ehzoah/Llama-3.2-1B-sft-full • Text Generation • 1B
ehzoah/pythia-1.4b-sft-full
ehzoah/exo-hh-reward-model
ehzoah/exo-imdb-sft-model • Text Generation
ehzoah/exo-imdb-reward-model • Text Generation
ehzoah/exo-hh-sft-model