AlephBERT: A Hebrew Large Pre-Trained Language Model to Start-off your Hebrew NLP Application With Paper • 2104.04052 • Published Apr 8, 2021
Lexical Generalization Improves with Larger Models and Longer Training Paper • 2210.12673 • Published Oct 23, 2022
Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation Paper • 2407.13696 • Published Jul 18, 2024
DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation Paper • 2503.01622 • Published Mar 3, 2025
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation Paper • 2504.17502 • Published Apr 24, 2025