To download ekryski/Qwen3.5-0.8B-2bit from the Hub for use with MLX:
# Download the model from the Hub (quote the extra so shells like zsh don't glob the brackets)
pip install 'huggingface_hub[hf_xet]'
huggingface-cli download --local-dir Qwen3.5-0.8B-2bit ekryski/Qwen3.5-0.8B-2bit
Qwen3.5-0.8B-2bit
2-bit affine quantization of Qwen/Qwen3.5-0.8B, produced with FFAI's ffai convert (mlx-affine format, group_size=64).
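The pure-Python sketch below illustrates what group-wise affine 2-bit quantization with group_size=64 means. Each group of 64 weights gets its own scale and bias, each weight becomes a 2-bit code in 0..3, and the codes are packed 16 per 32-bit word. The min/max rule for choosing scale and bias, the low-bits-first packing order, the 16-bit storage of scales and biases, and all helper names are assumptions made for illustration. This is not FFAI's or MLX's exact implementation.

```python
# Minimal sketch of group-wise affine 2-bit quantization (illustrative only).
# Assumptions: per-group min/max scaling, 16 codes per 32-bit word packed
# low bits first, scales/biases stored as 16-bit floats. MLX's exact
# scale/bias selection and packing may differ.

BITS = 2
GROUP_SIZE = 64
LEVELS = (1 << BITS) - 1  # codes range over 0..3
PER_WORD = 32 // BITS     # 16 two-bit codes fit in one 32-bit word


def quantize_group(values):
    """Return (codes, scale, bias) such that value ~= scale * code + bias."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / LEVELS if hi > lo else 1.0
    codes = [min(LEVELS, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def pack_codes(codes):
    """Pack 2-bit codes into 32-bit integers, lowest bits first."""
    words = []
    for start in range(0, len(codes), PER_WORD):
        word = 0
        for i, code in enumerate(codes[start:start + PER_WORD]):
            word |= code << (BITS * i)
        words.append(word)
    return words


def unpack_codes(words, count):
    """Inverse of pack_codes for the first `count` codes."""
    return [(words[i // PER_WORD] >> (BITS * (i % PER_WORD))) & LEVELS
            for i in range(count)]


def quantize_row(row):
    """Quantize one weight row whose length is a multiple of GROUP_SIZE."""
    assert len(row) % GROUP_SIZE == 0
    packed, scales, biases = [], [], []
    for g in range(0, len(row), GROUP_SIZE):
        codes, scale, bias = quantize_group(row[g:g + GROUP_SIZE])
        packed.extend(pack_codes(codes))  # exactly 4 words per 64-weight group
        scales.append(scale)
        biases.append(bias)
    return packed, scales, biases


def dequantize_row(packed, scales, biases, length):
    """Reconstruct approximate weights: w = scale[group] * code + bias[group]."""
    codes = unpack_codes(packed, length)
    return [scales[i // GROUP_SIZE] * c + biases[i // GROUP_SIZE]
            for i, c in enumerate(codes)]


def bits_per_weight(scale_bits=16):
    """Effective storage cost: packed codes plus one scale and one bias per group."""
    return (GROUP_SIZE * BITS + 2 * scale_bits) / GROUP_SIZE
```

A 128-weight row quantizes to 8 packed words plus two (scale, bias) pairs. Each reconstructed weight is off by at most half its group's scale. Under the 16-bit scale/bias assumption, the effective cost is (64 × 2 + 32) / 64 = 2.5 bits per weight.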
Scope: kernel testing, not coherent inference
At 0.8B parameters, pure 2-bit quantization falls below the threshold at which this architecture still decodes coherently. mlx-community does not publish a pure 2-bit Qwen3.5-0.8B for the same reason; their offering at this size is Qwen3.5-0.8B-mixed_2_6.
This checkpoint exists to exercise FFAI's int2 kernel surface end-to-end (dequant_gemv_int2, dequant_gather_int2, mt_qmm_mma_int2) at production-realistic shapes, along with the Qwen3.5-VL loader and the centered-RMSNorm fold path. Outputs will be incoherent; use the 3/4/5/6/8-bit conversions for actual generation.
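A dequant-GEMV computes y = W·x directly from packed 2-bit codes, without first materializing W in full precision. The naive pure-Python reference below shows those semantics. It assumes a layout of 16 codes per 32-bit word (low bits first) with one scale and one bias per 64-weight group. `dequant_gemv_ref` is a hypothetical name, and this is not FFAI's dequant_gemv_int2 kernel.

```python
# Naive reference for a 2-bit dequant-GEMV: y[r] = sum_i W[r][i] * x[i],
# where W[r][i] = scale[r][g] * q[r][i] + bias[r][g] and g = i // GROUP_SIZE.
# Assumed layout: 16 codes per 32-bit word, low bits first, group_size=64.
# Illustrative sketch only; not FFAI's dequant_gemv_int2.

BITS = 2
GROUP_SIZE = 64
PER_WORD = 32 // BITS
MASK = (1 << BITS) - 1


def dequant_gemv_ref(packed_rows, scales, biases, x):
    """packed_rows[r]: packed words for row r; scales[r][g], biases[r][g]: per-group floats."""
    y = []
    for words, row_scales, row_biases in zip(packed_rows, scales, biases):
        acc = 0.0
        for g in range(len(x) // GROUP_SIZE):
            start = g * GROUP_SIZE
            qx = 0.0    # running sum of code * x over the group
            xsum = 0.0  # running sum of x over the group
            for i in range(start, start + GROUP_SIZE):
                q = (words[i // PER_WORD] >> (BITS * (i % PER_WORD))) & MASK
                qx += q * x[i]
                xsum += x[i]
            # sum((s*q + b) * x) == s * sum(q*x) + b * sum(x)
            acc += row_scales[g] * qx + row_biases[g] * xsum
        y.append(acc)
    return y
```

The bias is factored out of the inner sum as s·Σ(q·x) + b·Σx. Each group then needs only one extra running sum of x, rather than a bias addition per weight. This is a common trick in quantized matrix-vector kernels.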
Conversion
ffai convert Qwen/Qwen3.5-0.8B --bits 2 --quantize-embeddings \
--upload-repo ekryski/Qwen3.5-0.8B-2bit
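Because --quantize-embeddings also quantizes the embedding table, a token lookup becomes a gather over packed rows followed by dequantization of only the selected rows. The assumption here is that this is the path dequant_gather_int2 exercises. The sketch below uses the same assumed layout as above (16 codes per 32-bit word, low bits first, group_size=64), and `dequant_gather_ref` is a hypothetical name.

```python
# Reference sketch of a quantized embedding lookup: gather the packed rows for
# the requested token ids, then dequantize only those rows.
# Assumed layout: 16 codes per 32-bit word, low bits first, group_size=64.
# Illustrative only; not FFAI's dequant_gather_int2.

BITS = 2
GROUP_SIZE = 64
PER_WORD = 32 // BITS
MASK = (1 << BITS) - 1


def dequant_gather_ref(packed_table, scales, biases, token_ids, hidden):
    """Return one dequantized row of length `hidden` per token id."""
    out = []
    for t in token_ids:
        words, row_s, row_b = packed_table[t], scales[t], biases[t]
        row = []
        for i in range(hidden):
            q = (words[i // PER_WORD] >> (BITS * (i % PER_WORD))) & MASK
            g = i // GROUP_SIZE
            row.append(row_s[g] * q + row_b[g])
        out.append(row)
    return out
```

Only the gathered rows are ever decoded, so the full-precision table is never materialized. This is the main reason to quantize embeddings in the first place.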
By default (--no-quantize-vision), the vision tower stays at bf16, because FFAI's VL towers use plain Linear layers rather than QuantizedLinear.