GLM-5-abliterated_dq3

This model is a DQ3-quantized version of GLM-5-abliterated. It was converted locally, from a local copy of the original model, using the mlx_lm library.

Quantization Methodology (DQ3)

This model was quantized using the dynamic DQ3 (3-bit / 4-bit / 8-bit mixed) approach, inspired by the methodology described in the mlx-community/Kimi-K2.5-mlx-DQ3_K_M-q8 repository.

Bit widths are assigned per MLX layer as follows:

  • Expert layers (switch_mlp / mlp) are quantized to 3-bit.
  • The first 5 layers are kept at higher quality (5-bit).
  • Every 5th layer is medium quality (4-bit).
  • All other layers (e.g. attention, normalization) remain at 8-bit to serve as the "8-bit brain".
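
The rules above can be sketched as a small function that maps a weight path to a bit width. This is an illustrative reading of the list, not the script that produced this model. It assumes that the "first 5 layers" and "every 5th layer" rules apply only to expert weights and take precedence over the 3-bit default, that layer indices start at 0, and that "every 5th" means an index divisible by 5. The function name `dq3_bits` and the example paths are hypothetical.

```python
import re

def dq3_bits(path: str, first_n: int = 5, every: int = 5) -> int:
    """Return an assumed bit width for the weight at `path`.

    Assumed precedence (not confirmed by the model card):
    non-expert -> 8, expert in first `first_n` layers -> 5,
    expert in every `every`-th layer -> 4, other experts -> 3.
    """
    # Expert weights are identified by "switch_mlp" or an ".mlp." path segment.
    is_expert = "switch_mlp" in path or ".mlp." in path
    if not is_expert:
        return 8  # attention and everything else stays at 8-bit

    # Extract the transformer layer index, e.g. "model.layers.12.mlp..." -> 12.
    match = re.search(r"layers\.(\d+)\.", path)
    if match is None:
        return 3  # expert weight without a layer index: use the default
    index = int(match.group(1))

    if index < first_n:
        return 5  # first layers kept at higher quality
    if index % every == 0:
        return 4  # every 5th layer at medium quality
    return 3      # remaining expert weights


# Hypothetical weight paths, for illustration only.
for p in [
    "model.layers.0.mlp.gate_proj",
    "model.layers.7.mlp.switch_mlp.down_proj",
    "model.layers.10.mlp.switch_mlp.up_proj",
    "model.layers.7.self_attn.q_proj",
]:
    print(p, dq3_bits(p))
```

A function like this could be adapted into the per-layer quantization hook that mlx_lm's conversion step accepts. The exact hook signature depends on the installed mlx_lm version and should be checked against its documentation.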

Model size: 744B params
Tensor types: BF16, U32, F32
Format: Safetensors (MLX)