nm-testing's collection: Models in CI
nm-testing/Meta-Llama-3-8B-Instruct-W8A8-FP8-Channelwise-compressed-tensors • Text Generation • 8B • 8 downloads • 2 likes
nm-testing/Meta-Llama-3-8B-Instruct-FBGEMM-nonuniform • Text Generation • 8B • 15 downloads
nm-testing/Meta-Llama-3-8B-FP8-compressed-tensors-test • Text Generation • 8B • 15.2k downloads
nm-testing/Meta-Llama-3-8B-Instruct-W8-Channel-A8-Dynamic-Asym-Per-Token-Test • 8B • 6.34k downloads • 1 like
nm-testing/Meta-Llama-3-8B-Instruct-W8-Channel-A8-Dynamic-Per-Token-Test • Text Generation • 8B • 14 downloads
nm-testing/Meta-Llama-3-8B-Instruct-nonuniform-test • Text Generation • 8B • 23.3k downloads
nm-testing/Meta-Llama-3-70B-Instruct-FBGEMM-nonuniform • Text Generation • 71B • 767 downloads • 1 like
nm-testing/Qwen1.5-MoE-A2.7B-Chat-quantized.w4a16 • 14B • 106k downloads • 1 like
nm-testing/Qwen2-1.5B-Instruct-FP8W8 • Text Generation • 2B • 13 downloads
nm-testing/SparseLlama-3.1-8B-gsm8k-pruned.2of4-chnl_wts_per_tok_dyn_act_fp8-BitM • 5B • 2 downloads
nm-testing/tinyllama-oneshot-w8w8-test-static-shape-change • Text Generation • 1B • 59.9k downloads
nm-testing/pixtral-12b-FP8-dynamic • Image-Text-to-Text • 201 downloads • 1 like
RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic • Image-Text-to-Text • 24B • 4.81k downloads • 9 likes
nm-testing/Llama-3.2-1B-Instruct-FP8-KV • 1B • 11.1k downloads
nm-testing/tinyllama-oneshot-w8a8-channel-dynamic-token-v2 • Text Generation • 1B • 20.2k downloads
nm-testing/tinyllama-oneshot-w8-channel-a8-tensor • Text Generation • 1B • 822 downloads
RedHatAI/Llama-3.2-1B-quantized.w8a8 • 1B • 59.1k downloads • 1 like
nm-testing/tinyllama-oneshot-w8a8-dynamic-token-v2 • Text Generation • 1B • 15.3k downloads
nm-testing/asym-w8w8-int8-static-per-tensor-tiny-llama • 1B • 7.96k downloads
nm-testing/Meta-Llama-3-8B-Instruct-W8A8-Static-Per-Tensor-Sym • 8B • 37 downloads
nm-testing/Meta-Llama-3-8B-Instruct-W8A8-Static-Per-Tensor-Asym • 8B • 38 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-gsm8k-pruned.2of4-chnl_wts_per_tok_dyn_act_int8-BitM • 0.7B • 11 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-gsm8k-pruned.2of4-chnl_wts_tensor_act_int8-BitM • 0.7B • 8 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-gsm8k-pruned.2of4-tensor_wts_per_tok_dyn_act_int8-BitM • 0.7B • 11 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-gsm8k-pruned.2of4-tensor_wts_tensor_act_int8-BitM • 0.7B • 8 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-INT8-Dynamic-IA-Per-Channel-Weight-testing • 1B • 9 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-INT8-Static-testing • 1B • 8 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-INT8-Dynamic-IA-Per-Tensor-Weight-testing • 1B • 8 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-2of4-Sparse-Dense-Compressor • 1B • 11 downloads
nm-testing/llama2.c-stories42M-pruned2.4-compressed • 48.6M • 8 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-NVFP4 • 0.7B • 24.3k downloads
nm-testing/Llama-3.2-1B-Instruct-spinquantR1R2R4-w4a16 • 0.7B • 7.69k downloads
nm-testing/Llama-3.2-1B-Instruct-quip-w4a16 • 0.8B • 7.67k downloads
nm-testing/tinyllama-oneshot-w4a16-channel-v2 • Text Generation • 0.3B • 17k downloads • 1 like
nm-testing/test-w4a16-mixtral-actorder-group • 6B • 1.29k downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-kvcache-fp8-attn_head
nm-testing/TinyLlama-1.1B-Chat-v1.0-kvcache-fp8-tensor • 1B • 7.69k downloads
nm-testing/Qwen3-30B-A3B-MXFP4A16 • 17B • 10.5k downloads
nm-testing/Qwen3-0.6B-MXFP8 • 0.8B • 3 downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-MXFP8
nm-testing/dflash-qwen3-8b-speculators • 2B • 12.4k downloads
nm-testing/TinyLlama-1.1B-Chat-v1.0-MXFP4 • 0.6B • 86 downloads