arxiv:2603.24080

LLMpedia: A Transparent Framework to Materialize an LLM's Encyclopedic Knowledge at Scale

Published on Mar 25

Authors:

Abstract

Analysis of language model factual accuracy reveals significant gaps between benchmark scores and real-world verification, with substantial unverifiability issues and limited subject overlap across models.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Benchmarks like MMLU suggest flagship language models approach factuality saturation above 90%. LLMpedia shows this picture is incomplete. We materialize ~1.3M encyclopedia articles entirely from parametric memory across three model families, then audit every claim against Wikipedia and curated web evidence. For gpt-5-mini, the verifiable true rate is 68.4% on Wikipedia-covered subjects, more than 21 pp below MMLU, and the gap is driven by unverifiability (30.5%), not refutation (1.2%). Beyond Wikipedia, frontier articles audited against curated web evidence reach 57.6%; Wikipedia covers only 56.7% of model-surfaced subjects, and three model families overlap in just 7.3% of subject choices. In a retrieval-trap benchmark inspired by prior analysis of Grokipedia, LLMpedia is more factual at roughly half the textual similarity to Wikipedia. Every prompt, article, and verdict is released. Data, code, interface: https://llmpedia.net.
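
The verdict shares quoted in the abstract can be read as simple proportions over audited claims. The following is a minimal sketch of that arithmetic, not the paper's audit pipeline: the label names (`supported`, `unverifiable`, `refuted`), the function `verdict_rates`, and the toy counts are all assumptions made for illustration, and the 90% benchmark reference is the rounded figure from the abstract.

```python
from collections import Counter


def verdict_rates(verdicts):
    """Return the percentage share of each verdict label in a list of claim verdicts."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical toy audit of 1000 claims, chosen only to roughly mirror the
# shares reported in the abstract (these are not the released data).
toy_verdicts = ["supported"] * 684 + ["unverifiable"] * 304 + ["refuted"] * 12

rates = verdict_rates(toy_verdicts)

# Gap in percentage points between an assumed 90% benchmark score and the
# verifiable true rate.
gap_pp = 90.0 - rates["supported"]

for label, share in sorted(rates.items()):
    print(f"{label}: {share:.1f}%")
print(f"gap vs. 90% benchmark: {gap_pp:.1f} pp")
```

In this toy breakdown almost all of the gap comes from the `unverifiable` share rather than the `refuted` share, which is the pattern the abstract describes.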


