Phi-2 MLX 5-bit
This repository provides a 5-bit MLX-quantized version of Microsoft Phi-2, optimized for higher output quality while remaining suitable for local, offline inference on Apple Silicon.
This variant follows instructions better and produces more coherent output than the 4-bit version, at the cost of a modest increase in memory usage.
Model Details
- Base model: microsoft/phi-2
- Architecture: Decoder-only Transformer
- License: MIT
- Quantization: MLX static quantization (≈5.5 bits per weight)
- Target hardware: Apple Silicon (M1 / M2 / M3)
Performance Characteristics
| Metric | Value |
|---|---|
| Disk size | ~1.9–2.1 GB |
| Peak RAM usage | ~2.0–2.2 GB |
| Inference speed | Moderate (slower than the 4-bit variant) |
| Instruction quality | Higher than the 4-bit variant |
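The ≈5.5 bits per weight and the disk size can be related with a little arithmetic. The sketch below is only a rough sanity check and rests on assumptions this repository does not state: a quantization group size of 64, one 16-bit scale and one 16-bit bias stored per group, and roughly 2.7 billion parameters for Phi-2. The function names are illustrative.

```python
def effective_bits_per_weight(bits: int, group_size: int = 64,
                              overhead_bits: int = 32) -> float:
    """Bits stored per weight once per-group metadata is amortized.

    Assumption: each group of `group_size` weights carries a 16-bit scale
    and a 16-bit bias (32 bits of overhead in total).
    """
    return bits + overhead_bits / group_size


def estimated_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PHI2_PARAMS = 2.7e9  # assumption: approximate parameter count of Phi-2

bpw = effective_bits_per_weight(5)
size = estimated_size_gb(PHI2_PARAMS, bpw)
print(f"{bpw:.1f} bits/weight -> ~{size:.2f} GB")
```

Under these assumptions the estimate comes out slightly below the ~1.9–2.1 GB listed in the table. Tensors that are left unquantized and a parameter count somewhat above 2.7 billion would plausibly explain the gap.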
Usage
mlx_lm.generate \
  --model /path/to/Phi-2-MLX-5bit \
  --prompt "Explain the FFT in simple terms." \
  --max-tokens 120
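The same generation can be driven from Python through the `load` and `generate` functions of `mlx_lm`. The following is a minimal sketch, not a tested recipe for this repository. The helpers `format_instruct` and `generate_answer` are hypothetical names, and wrapping the question in the `Instruct: ... Output:` format documented for the base Phi-2 model is an assumption: this conversion does not state which prompt format works best.

```python
def format_instruct(question: str) -> str:
    """Wrap a question in the instruction format documented for base Phi-2.

    Assumption: the quantized conversion responds to the same format.
    """
    return f"Instruct: {question.strip()}\nOutput:"


def generate_answer(question: str,
                    model_path: str = "Irfanuruchi/Phi-2-MLX-5bit",
                    max_tokens: int = 120) -> str:
    """Load the model and generate a completion for the formatted prompt."""
    # Imported lazily so the prompt helper is usable without mlx-lm installed.
    from mlx_lm import load, generate

    model, tokenizer = load(model_path)
    return generate(model, tokenizer,
                    prompt=format_instruct(question),
                    max_tokens=max_tokens)


# Example call (requires mlx-lm and downloads or loads the model weights):
# print(generate_answer("Explain the FFT in simple terms."))
```

Keeping the prompt formatting in its own function makes it easy to swap in a different template if plain prompts turn out to work better with this conversion.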
Notes
- This is a quantized conversion, not a fine-tuned model.
- The 5-bit version is recommended for:
  - better reasoning consistency
  - fewer repetitions
  - improved instruction adherence
- For maximum speed and lower memory usage, see the 4-bit variant.
License
This repository redistributes a quantized MLX conversion of Microsoft Phi-2.
- Original model license: MIT
- MLX conversion: MIT
See LICENSE for details.