Think-J: Learning to Think for Generative LLM-as-a-Judge Paper • 2505.14268 • Published May 20, 2025
Mitigating the Bias of Large Language Model Evaluation Paper • 2409.16788 • Published Sep 25, 2024
RM-Distiller: Exploiting Generative LLM for Reward Model Distillation Paper • 2601.14032 • Published Jan 20, 2026
Toward Robust LLM-Based Judges: Taxonomic Bias Evaluation and Debiasing Optimization Paper • 2603.08091 • Published 20 days ago
Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory Paper • 2505.15055 • Published May 21, 2025
An Empirical Study of LLM-as-a-Judge for LLM Evaluation: Fine-tuned Judge Models are Task-specific Classifiers Paper • 2403.02839 • Published Mar 5, 2024
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs Paper • 2406.10216 • Published Jun 14, 2024