# Qwen3.6-27B-Omnimerge-v4 — MLX 4-bit (Vision-Language)

Full multimodal 4-bit MLX quantization of ManniX-ITA/Qwen3.6-27B-Omnimerge-v4: text + image + video, runnable natively on Apple Silicon via `mlx-vlm`.

This is the VL build. There is also a text-only MLX 4-bit release for slightly smaller-footprint, language-only inference via `mlx-lm`.
The base model is a same-base DARE-TIES (Omnimerge_v2 method) merge of Qwen/Qwen3.6-27B with three Qwen3.6 fine-tunes (rico03, Esper3.1, kai-os Opus-Reasoning-anchor), plus an MLP-passthrough surgery that fixes Qwen3.6's reasoning-tag-emission fragility. Method, benchmark numbers, and forensic write-up live on the base model card.
## Quantization
- Type: MLX 4-bit (`-q --q-bits 4 --q-group-size 64`) via `mlx_vlm.convert`
- Group size: 64
- Effective bits/weight: 4.695 (slightly higher than the text-only build's 4.501 — `mlx_vlm` keeps the vision tower in higher precision by default; only the LM weights are 4-bit quantized)
- Shape on disk: 3 safetensors shards, ~16 GB total
- What is preserved:
  - `vision_tower.*` weights — full vision encoder
  - `multi_modal_projector.*` weights — vision → LM connector
  - `preprocessor_config.json` — image preprocessing
  - `video_preprocessor_config.json` — video preprocessing
  - `processor_config.json` — chat-time processor wiring
  - `chat_template.jinja` — Qwen3.5/3.6 chat template with image/video roles
- Build env (verified 2026-05-11 on Linux + RTX 3090 + CUDA 12.1):

  ```
  mlx==0.30.0
  mlx-cuda==0.30.0             ← ABI-coupled, must match mlx
  mlx-lm==0.30.7               ← Qwen3.5/3.6 model_type support
  mlx-vlm==0.3.12 (--no-deps)  ← last version that doesn't transitively bump mlx
  torch==2.11.0+cpu            ← satisfies Qwen3VLVideoProcessor's torchvision dep
                                 without disturbing mlx-cuda's nvidia-cublas pin
  ```

  The CUDA backend is used only for the conversion step on this Linux box; end users on Apple Silicon use the native `mlx` runtime, which has no CUDA dependency.
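A sketch of reproducing those pins (assumed commands; the PyTorch `--extra-index-url` is only needed to get the `+cpu` wheel, and `mlx-vlm` goes in last with `--no-deps` so it cannot bump `mlx`):

```bash
# Conversion env only (Linux box); not needed for inference on macOS.
pip install mlx==0.30.0 mlx-cuda==0.30.0 mlx-lm==0.30.7
pip install torch==2.11.0+cpu --extra-index-url https://download.pytorch.org/whl/cpu
pip install --no-deps mlx-vlm==0.3.12  # --no-deps so it can't bump mlx
```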
Conversion recipe: `omnimergekit/scripts/mlx_convert.sh` (auto-detects `vision_config` and routes through `mlx_vlm.convert`). See `MLX_CONVERT.md` for the full pin rationale.
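For reference, the script boils down to something like this (a sketch; the output directory name is illustrative, and the quantization flags are the ones listed above):

```bash
# Output dir name is illustrative; mlx_convert.sh detects vision_config itself.
python -m mlx_vlm.convert \
    --hf-path ManniX-ITA/Qwen3.6-27B-Omnimerge-v4 \
    --mlx-path Qwen3.6-27B-Omnimerge-v4-MLX-VL-4bit \
    -q --q-bits 4 --q-group-size 64
```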
## Usage
```bash
pip install -U mlx-vlm
```

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

repo = "ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MLX-VL-4bit"
model, processor = load(repo)
config = load_config(repo)

# Pure-text generation
prompt = apply_chat_template(
    processor, config,
    "Write a Rust function that returns the n-th Fibonacci number iteratively.")
print(generate(model, processor, prompt, max_tokens=512, verbose=True))

# Vision (with an image)
prompt = apply_chat_template(
    processor, config,
    "Describe the image in detail, then state what's likely happening.",
    num_images=1)
print(generate(model, processor, prompt,
               max_tokens=512, verbose=True, image=["path/to/image.png"]))

# Video (with a clip)
prompt = apply_chat_template(
    processor, config,
    "Summarize what happens in this video.", num_videos=1)
print(generate(model, processor, prompt,
               max_tokens=512, verbose=True, video=["path/to/clip.mp4"]))
```
The base model emits Qwen3.6 reasoning tags (`<think>...</think>`). Strip them in post-processing or use a chat template wrapper that handles them appropriately.
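A minimal stripping sketch (plain Python `re`; it assumes the tags appear literally in the decoded text and also drops a truncated, unclosed `<think>`):

```python
import re

def strip_think(text: str) -> str:
    """Remove <think>...</think> spans; also drop an unclosed trailing <think>."""
    text = re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)
    return re.sub(r"<think>.*\Z", "", text, flags=re.DOTALL).strip()

raw = generate(model, processor, prompt, max_tokens=512)  # as in the snippet above
print(strip_think(raw))
```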
## Memory & speed
Empirically (M-series, 32 GB+ recommended):
- Resident memory: ~17–18 GB (vs ~16–17 GB for the text-only build — the vision tower adds ~1 GB at higher precision; a quick self-check follows this list)
- Speed: comparable to other Qwen3-VL 27B 4-bit MLX builds; depends on chip generation
- Context length: inherits the base model's 256k context (RAM permitting)
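To run that self-check on your own chip, something like this works (a sketch; `mx.get_active_memory()` is the top-level memory API in recent mlx releases, older ones expose it as `mx.metal.get_active_memory()`):

```python
import mlx.core as mx
from mlx_vlm import load

model, processor = load("ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MLX-VL-4bit")
mx.eval(model.parameters())  # force the lazy weights to materialize
print(f"active memory: {mx.get_active_memory() / 2**30:.1f} GiB")
```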
## Choosing between the two MLX builds
| | text-only | VL (this build) |
|---|---|---|
| Loader | `mlx_lm.load` | `mlx_vlm.load` |
| Image / video input | ❌ | ✅ |
| Disk size | ~15 GB | ~16 GB |
| Resident RAM | ~16–17 GB | ~17–18 GB |
| Quality on text-only tasks | identical LM weights | identical LM weights |
Pick text-only if you don't need vision and want a marginally smaller download. Pick VL for anything multimodal — same language model behind it.
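For comparison with the Usage snippet above, a sketch of loading the text-only build through `mlx_lm` (this mirrors the standard `mlx-lm` README pattern):

```python
from mlx_lm import load, generate

model, tokenizer = load("ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MLX-4bit")
messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(generate(model, tokenizer, prompt=prompt, max_tokens=128))
```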
## Related

- Base merge: `ManniX-ITA/Qwen3.6-27B-Omnimerge-v4`
- MLX text-only: `ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MLX-4bit`
- GGUF release: `ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF`
- Ollama tags: `mannix/omnimerge-v4`
- Methodology + scripts: `mann1x/omnimergekit`
## License
Apache 2.0 — inherits from Qwen3.6 base. See the base model card for the full attribution list (Qwen team, rico03, ValiantLabs, kai-os, mergekit community).