
Benchmarks

Benchmarks are grouped into measured OTQ runtime frontiers, release-gate checks, allocation transparency, the official Qwen reference table, and a paired BF16-vs-GGUF mini-subset.

Measured OTQ Runtime

runtime-frontier

Vector: SVG | PDF

prefill-decode-tradeoff

Vector: SVG | PDF

release-scorecard

Vector: SVG | PDF

tensor-allocation

Vector: SVG | PDF

allocation-policy

Vector: SVG | PDF

Release Gates

release-gate-latency

Vector: SVG | PDF

release-gate-coverage

Vector: SVG | PDF

These release suites are deterministic guardrails, not substitutes for full academic benchmarks.

Official Qwen Baseline

Official Qwen3.6-27B language benchmark scores are imported as an external reference table in benchmarks/official_qwen36_baseline.csv. They are not plotted against OTQ until matching benchmark tasks are run on these GGUF files.

Deltas versus the official Qwen baseline must be reported only for tasks that were actually run on the OTQ artifacts, using the same task definition and scoring rule as the official score.
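The matching rule above can be sketched as a small filter. This is a minimal illustration, not the repository's tooling: the `(task, scoring_rule)` key shape, the function name `reportable_deltas`, and all numbers are hypothetical placeholders and do not reflect the actual columns of benchmarks/official_qwen36_baseline.csv.

```python
from typing import Dict, Tuple

# Hypothetical key: (task name, scoring rule). Neither the key shape nor the
# example values come from the repository's CSV files.
TaskKey = Tuple[str, str]


def reportable_deltas(
    official: Dict[TaskKey, float],
    otq_runs: Dict[TaskKey, float],
) -> Dict[TaskKey, float]:
    """Return OTQ-minus-official deltas for exactly matching task keys only."""
    return {
        key: otq_runs[key] - official[key]
        for key in otq_runs
        if key in official  # same task AND same scoring rule, or no delta
    }


# Placeholder numbers for illustration only, not measured results.
official = {("task_a", "exact_match"): 80.0, ("task_b", "pass@1"): 60.0}
otq_runs = {("task_a", "exact_match"): 78.5, ("task_b", "exact_match"): 55.0}

deltas = reportable_deltas(official, otq_runs)
print(deltas)
```

Here `task_b` is dropped because its scoring rule differs between the two tables, so only `task_a` yields a reportable delta.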

OTQ Task Runs

No separate OTQ-only benchmark subset is attached beyond the paired BF16-vs-GGUF mini-subset.

Paired BF16-vs-GGUF Mini-Subset

The staged repo includes a same-task, same-prompt, deterministic 232-sample practical subset comparing the BF16 sidecar against Q3_K_M, Q4_K_M, and Q5_K_M.
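A paired comparison of this kind can be summarized per sample rather than only by aggregate rate. The sketch below assumes each run is reduced to one pass/fail flag per prompt, in the same prompt order; the function name `paired_delta` and the example flags are hypothetical and are not taken from the repository's report files.

```python
from typing import Dict, Sequence, Union


def paired_delta(
    baseline: Sequence[bool], quant: Sequence[bool]
) -> Dict[str, Union[int, float]]:
    """Compare per-sample pass/fail flags for the same prompts in the same order."""
    if len(baseline) != len(quant):
        raise ValueError("paired comparison needs the same samples in the same order")
    if not baseline:
        raise ValueError("need at least one sample")
    n = len(baseline)
    # Regression: baseline passed, quantized model failed. Recovery: the reverse.
    regressions = sum(b and not q for b, q in zip(baseline, quant))
    recoveries = sum(q and not b for b, q in zip(baseline, quant))
    return {
        "n": n,
        "baseline_rate": sum(baseline) / n,
        "quant_rate": sum(quant) / n,
        "regressions": regressions,
        "recoveries": recoveries,
        "delta": (recoveries - regressions) / n,
    }


# Placeholder flags for illustration only, not measured results.
bf16 = [True, True, False, True]
q4 = [True, False, False, True]
result = paired_delta(bf16, q4)
print(result)
```

Counting regressions and recoveries separately shows whether a small net delta hides many per-sample flips in both directions.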

paired-bf16-quant-delta

Vector: SVG | PDF

Files:

  • benchmarks/paired_bf16_quant_summary.csv
  • benchmarks/paired_bf16_quant_summary.json
  • benchmarks/paired_bf16_quant_report.md

This is a quantization-regression signal, not a full official benchmark replacement. Do not compare its small-subset mmlu_pro or gpqa rates directly to the Qwen model-card full-harness scores.
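One way to see why small-subset rates should not be read against full-harness scores is the width of a confidence interval at small sample counts. The sketch below uses the standard Wilson score interval; the sample counts are hypothetical, since the per-task split of the 232 samples is not stated here.

```python
import math
from typing import Tuple


def wilson_interval(passes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 is roughly 95% coverage)."""
    if n <= 0 or not 0 <= passes <= n:
        raise ValueError("need n > 0 and 0 <= passes <= n")
    p = passes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical counts: the same 75% pass rate at two very different sample sizes.
small = wilson_interval(15, 20)
large = wilson_interval(750, 1000)
print(small, large)
```

The interval for the small sample is several times wider than for the large one, which is why the mini-subset is better read as a regression check between quantization levels than as an absolute score.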

CSV Data

  • benchmarks/throughput.csv
  • benchmarks/eval_summary.csv
  • benchmarks/category_pass_rate.csv
  • benchmarks/artifacts.csv
  • benchmarks/tensor_allocation.csv
  • benchmarks/category_tensor_allocation.csv
  • benchmarks/official_qwen36_baseline.csv
  • benchmarks/paired_bf16_quant_summary.csv
  • benchmarks/paired_bf16_quant_summary.json
  • benchmarks/paired_bf16_quant_report.md
  • benchmarks/quant_eval.csv (present only when separate OTQ-only task runs exist)