
TobDeBer/maegic



Content

This model area links to models and tools around Mägic. The development milestones are called Skipper (T3) and Mate (M8).

The Mägic project is a Proto Open Source project (OpenSoars): it does NOT publish its conversion code, but applies the benefits preferably to OSI models and to select Open Weights models, depending on community feedback. The goal is to strengthen the True Open Source model family while giving everyone the choice to run efficient, high-quality inference on-device.

Converted GGUF models will be provided for:

- all OSI compliant language models (Olmo, Apertus, Smol, ...)
- select Open Weights language models (Mistral, Granite, ...)

Inference software based on llama.cpp will be provided as open source under the Apache 2.0 license.

Initial versions target 2bpw at fp16 quality with a 4x speedup: a single RTX 4090 will be able to serve a 70b model as fast as an H200 does today.
T3 experiments showed that 1.4bpw and a 10x speedup are possible. That is Mägic.
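
The arithmetic behind the RTX 4090 claim can be sketched as follows. This is a rough illustration, not part of Mägic: the function name `weight_footprint_gb` is hypothetical, the estimate covers the weights only (no KV cache, activations or runtime overhead), and it uses decimal gigabytes.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes.

    Hypothetical helper for illustration; ignores KV cache and runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Compare fp16 (16 bpw) with the 2bpw and 1.4bpw targets for a 70b model.
    for bpw in (16, 2, 1.4):
        size = weight_footprint_gb(70e9, bpw)
        print(f"70b at {bpw} bpw: {size:.2f} GB")
```

Under these assumptions a 70b model needs about 140 GB of weights at fp16, 17.5 GB at 2bpw and 12.25 GB at 1.4bpw. Only the latter two fit into the 24 GB of VRAM on an RTX 4090.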

Mägic Technology

Mägic splits the work into two parts:

secret code for lossless model conversion
- The secret transformation is based on prior work I did from 2001 to 2005 on video codec quantization. Back then I experimented with higher-dimensional transformations (JAVC) as well as adaptive resolution (HeiDi). Those transformations and ideas fit very well into the age of neural network quantization.
public code for efficient model inference (Apache 2.0)
- binaries and source code
- future extensions will debut as Mägic+ and trickle down into Mägic over time

Mägic+

Mägic+ will be the paid tier and will offer improved compression, performance and model support.

Demo Spaces

Regular compression
- Granite4family: all Granite4 models (small, tiny, micro, nano 1b and nano 350m)
- SmartQuant ikllama.cpp
- SmartQuant llama.cpp
- SmartQuant-My-GGUF: Public space to convert models efficiently to GGUF for llama and ikllama
T3 OSI compression
- TOM@zero: demo of next-generation 2bpw compression (Skipper aka T3) with high-quality open source models (OSI)
  - Olmo 3.1 32b
T3 Open Weights compression
- Granite4extreme: Granite 4 small hybrid 32b compressed to below 9GB at fp16 quality
Mägic compression
- Mägic Olmo-3.1-32B-Instruct
- Mägic SmolLM2-135M-Instruct
- more converted models: Smol3 3b, Olmo 3.1 7b, Mistral 3.1 24b, Gemma4 9b, Gemma4 31b, Apertus 70b
- GitHub links to source code for Mägic inference
- maegic space with compiled inference binaries

SmartQuant

SmartQuant is not Mägic. It provides a baseline with improved default compression.
