Qwen3.5-0.8B-2bit

2-bit affine quantization of Qwen/Qwen3.5-0.8B, produced with FFAI's ffai convert (mlx-affine format, group_size=64).

Scope — kernel testing, not coherent inference

Pure 2-bit quantization at 0.8B parameters is below the threshold where this architecture retains coherent decode. mlx-community does not publish a pure 2-bit Qwen3.5-0.8B for the same reason (their offering at this size is Qwen3.5-0.8B-mixed_2_6).

This checkpoint exists to exercise FFAI's int2 kernel surface end-to-end (dequant_gemv_int2, dequant_gather_int2, mt_qmm_mma_int2) at production-realistic shapes, plus the Qwen3.5-VL loader + centered-RMSNorm fold path. Outputs will be incoherent — use the 3/4/5/6/8-bit conversions for actual generation.

Conversion

ffai convert Qwen/Qwen3.5-0.8B --bits 2 --quantize-embeddings \
    --upload-repo ekryski/Qwen3.5-0.8B-2bit

Default --no-quantize-vision keeps the vision tower at bf16 (FFAI's VL towers use plain Linear, not QuantizedLinear).

Downloads last month
37
Safetensors
Model size
0.4B params
Tensor type
BF16
·
F32
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

2-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ekryski/Qwen3.5-0.8B-2bit

Quantized
(125)
this model