UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding Paper • 2606.07167 • Published 29 days ago
TABVERSE: Benchmarking Cross-Format Table Understanding in LLMs and VLMs Paper • 2606.09578 • Published 26 days ago
Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues Paper • 2605.00119 • Published Apr 30
SAHM: A Benchmark for Arabic Financial and Shari'ah-Compliant Reasoning Paper • 2604.19098 • Published Apr 30
NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors Paper • 2506.10627 • Published Jun 12, 2025
A Parallel Cross-Lingual Benchmark for Multimodal Idiomaticity Understanding Paper • 2601.08645 • Published Feb 24
Jais-2-Family Collection The 2nd generation of the Jais Large Language Models Family • 4 items • Updated Feb 20
iBitter-Stack: A Multi-Representation Ensemble Learning Model for Accurate Bitter Peptide Identification Paper • 2505.15730 • Published May 21, 2025
UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking Paper • 2505.15063 • Published May 21, 2025
Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD Paper • 2508.17450 • Published Aug 24, 2025