Content
This model area links to models and tools around Mägic. The development milestones are called Skipper (T3) and Mate (M8).
The Mägic project is a Proto Open Source project (OpenSoars) that does NOT publish its model conversion code but applies the benefits preferably to OSI models and some select Open Weights models, depending on community feedback. The goal is to strengthen the True Open Source model family while giving everyone the choice to run efficient, high-quality inference on-device.
Converted GGUF models will be provided for:
- all OSI compliant language models (Olmo, Apertus, Smol, ...)
- select Open Weights language models (Mistral, Granite, ...)
Inference software based on llama.cpp will be provided as open source under the Apache 2.0 license.
- Initial versions target 2bpw at fp16 quality with a 4x speedup: a single RTX 4090 will be able to serve a 70b model as fast as an H200 does today.
- T3 experiments showed that 1.4bpw and a 10x speedup are possible. That is Mägic.
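The memory side of these targets can be checked with simple arithmetic. The sketch below is only an estimate under stated assumptions: it counts weight storage alone (no KV cache, activations or file metadata), treats every tensor as stored at the same bits per weight, and uses the 24 GB of VRAM on an RTX 4090. The helper names are hypothetical, and nothing here models or verifies the speedup figures.

```python
RTX_4090_VRAM_GB = 24.0  # VRAM of a single RTX 4090


def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_vram(n_params: float, bits_per_weight: float,
                 vram_gb: float = RTX_4090_VRAM_GB) -> bool:
    """True if the weights alone fit; real usage also needs KV cache etc."""
    return weight_size_gb(n_params, bits_per_weight) < vram_gb


if __name__ == "__main__":
    n_params = 70e9  # a 70b model
    for bpw in (16, 2, 1.4):
        size = weight_size_gb(n_params, bpw)
        print(f"{bpw:>4} bpw: {size:7.2f} GB, "
              f"fits in 24 GB: {fits_in_vram(n_params, bpw)}")
```

Under these assumptions, a 70b model needs roughly 140 GB of weights at fp16 and roughly 17.5 GB at 2bpw, which is why the 2bpw target matters for a single 24 GB card.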
Mägic Technology
Mägic splits the work into two parts:
- secret code for lossless model conversion
  - The secret transformation is based on prior work I did from 2001 to 2005 on video codec quantization. Back then I experimented with higher-dimensional transformations (JAVC) as well as adaptive resolution (HeiDi). Those transformations and ideas fit very well into the age of neural network quantization.
- public code for efficient model inference (Apache 2.0)
  - binaries and source code
  - future extensions will debut as Mägic+ and trickle down into Mägic over time
Mägic+
Mägic+ will be the paid tier and will offer improved compression, performance and model support.
Demo Spaces
- Regular compression
- Granite4family: All Granite4 models (small, tiny, micro, nano 1b and nano 350m)
- SmartQuant ikllama.cpp
- SmartQuant llama.cpp
- SmartQuant-My-GGUF: Public space to convert models efficiently to GGUF for llama and ikllama
- T3 OSI compression
- TOM@zero Demo of next generation 2bpw compression (Skipper aka T3) with high quality open source models (OSI)
- Olmo 3.1 32b
- T3 Open Weights compression
- Granite4extreme: Granite 4 small hybrid 32b compressed to below 9GB at fp16 quality
- Mägic compression
- Mägic Olmo-3.1-32B-Instruct
- Mägic SmolLM2-135M-Instruct
- more converted models: Smol3 3b, Olmo 3.1 7b, Mistral 3.1 24b, Gemma4 9b, Gemma4 31b, Apertus 70b
- GitHub links to source code for Mägic inference
- maegic space with compiled inference binaries
SmartQuant
SmartQuant is not Mägic. It provides a baseline with improved default compression.