foundational
openai-community/gpt2 • Text Generation • 0.1B • Updated Feb 19, 2024 • 14.2M • 3.22k
google-bert/bert-base-uncased • Fill-Mask • 0.1B • Updated Feb 19, 2024 • 60.7M • 2.63k
facebook/bart-large-mnli • Zero-Shot Classification • 0.4B • Updated Sep 5, 2023 • 2.78M • 1.56k
y25_w19
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning • Paper • 2505.03318 • Published May 6, 2025 • 94
cognition-ai/Kevin-32B • 33B • Updated May 6, 2025 • 134 • 164
PrimeIntellect/INTELLECT-2 • 33B • Updated May 13, 2025 • 36 • 205