GGUF OOM PoC

Issue

The GGUF parser in ggml/src/gguf.cpp allows a crafted GGUF file of ~200 bytes to force multi-gigabyte memory allocations, causing an out-of-memory crash on any system that loads the file.

Root Cause

The general.alignment KV pair is validated only as a power of 2, so values up to 2^31 (2 GB) are accepted. The size of each tensor is then padded to this alignment via GGML_PAD(), so a 4-byte tensor occupies 2 GB. With N tensors, the total allocation is N * 2^31 bytes.
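The padding arithmetic can be checked with a short Python sketch. It assumes GGML_PAD rounds a size up to the next multiple of a power-of-two alignment; the helper names ggml_pad and total_padded_size are illustrative and not part of ggml.

```python
# Sketch of the size calculation described above (helper names are hypothetical).

def ggml_pad(x: int, n: int) -> int:
    """Round x up to the next multiple of n, assuming n is a power of two."""
    return (x + n - 1) & ~(n - 1)

ALIGNMENT = 1 << 31   # 2^31, the largest alignment value described above
TENSOR_BYTES = 4      # payload size of each tensor in the PoC files

def total_padded_size(n_tensors: int,
                      tensor_bytes: int = TENSOR_BYTES,
                      alignment: int = ALIGNMENT) -> int:
    """Total bytes requested once every tensor is padded to the alignment."""
    return sum(ggml_pad(tensor_bytes, alignment) for _ in range(n_tensors))

print(total_padded_size(4) / 2**20)    # 8192.0 (MiB), matching the 8 GB PoC
print(total_padded_size(10) / 2**30)   # 20.0 (GiB), matching the 20 GB PoC
```

With the default alignment of 32, the same 4-byte tensor would be padded to only 32 bytes, which is why the attack depends on the alignment value and not on the tensor data.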

Files

crash_oom_8gb.gguf - 197 bytes, 4 tensors, forces 8 GB allocation
crash_oom_20gb.gguf - 407 bytes, 10 tensors, forces 20 GB allocation

Reproduction

git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
cmake -B build && cmake --build build --target llama-gguf
./build/bin/llama-gguf crash_oom_8gb.gguf r
# ggml_aligned_malloc: insufficient memory (attempted to allocate 8192.00 MB)
# GGML_ASSERT(ctx->mem_buffer != NULL) failed
# Aborted (core dumped)

Impact

Amplification factor: roughly 44,000,000x (8 GB file) to 53,000,000x (20 GB file), measured as allocation size divided by file size
Crashes ANY application that loads the file
Affects: llama.cpp, all ggml-based tools, model pipelines
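The amplification factor follows directly from the file sizes and tensor counts listed under Files. A minimal sketch, assuming each tensor is padded to exactly 2^31 bytes (the amplification function is illustrative only):

```python
# File name -> (file size in bytes, tensor count), taken from the Files section.
POC_FILES = {
    "crash_oom_8gb.gguf": (197, 4),
    "crash_oom_20gb.gguf": (407, 10),
}

def amplification(file_bytes: int, n_tensors: int, alignment: int = 1 << 31) -> float:
    """Ratio of requested allocation to on-disk file size."""
    return n_tensors * alignment / file_bytes

for name, (file_bytes, n_tensors) in POC_FILES.items():
    print(f"{name}: {amplification(file_bytes, n_tensors):,.0f}x")
```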

Suggested Fix

Cap alignment to a reasonable maximum (e.g., 1 MB).
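The check could look like the following sketch. It is written in Python to show the logic only; the actual fix would live in the C++ parser, and MAX_ALIGNMENT and is_valid_alignment are hypothetical names, with the 1 MB cap taken from the example value above.

```python
MAX_ALIGNMENT = 1 << 20  # 1 MB cap; an example value, not an upstream constant

def is_valid_alignment(alignment: int, max_alignment: int = MAX_ALIGNMENT) -> bool:
    """Accept only non-zero powers of two that do not exceed the cap."""
    is_power_of_two = alignment > 0 and (alignment & (alignment - 1)) == 0
    return is_power_of_two and alignment <= max_alignment

print(is_valid_alignment(32))        # True: the usual default alignment
print(is_valid_alignment(1 << 31))   # False: the value used by the PoC files
```

A cap alone bounds the padding per tensor; a parser may also want to compare the declared total size against the actual file size before allocating.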
