maia3-79m-fp16
This repository contains an fp16 reduced-precision release derived from the original UofTCSSLab/Maia3-79M checkpoint.
Base model relationship
Repository metadata is configured to establish a derived-model relationship on the Hugging Face Hub.
- base_model: UofTCSSLab/Maia3-79M
- base_model_relation: quantized
Although Hugging Face uses quantized as the repository relation type, this release performs precision reduction rather than integer quantization.
Quantization details
The original checkpoint weights were converted from float32 (fp32) to float16 (fp16).
Conversion:
float32 → float16
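The exact script used for this release is not reproduced here. As a minimal sketch of what such a conversion typically looks like in PyTorch, the snippet below casts every floating-point tensor in a state dict to float16 and leaves everything else untouched. The helper name `to_fp16` and the parameter names in the toy state dict are hypothetical and are not taken from the Maia3 code.

```python
import torch

def to_fp16(state_dict):
    """Return a copy of state_dict with floating-point tensors cast to float16."""
    converted = {}
    for name, value in state_dict.items():
        if torch.is_tensor(value) and value.is_floating_point():
            converted[name] = value.to(torch.float16)
        else:
            # Integer tensors (e.g. counters) and non-tensor entries are kept as-is.
            converted[name] = value
    return converted

# Tiny stand-in for a real checkpoint (hypothetical parameter names).
fp32_state = {
    "linear.weight": torch.randn(4, 4, dtype=torch.float32),
    "num_batches_tracked": torch.tensor(3, dtype=torch.int64),
}
fp16_state = to_fp16(fp32_state)
```

With a real checkpoint, the input would come from `torch.load` on the fp32 file and the result could be written out with `torch.save`.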
Characteristics:
- Reduced model size (~50% relative to fp32)
- Lower memory usage during loading and inference
- Increased throughput on hardware with efficient half-precision support
- Minimal expected quality degradation compared to the original checkpoint
- Weights remain floating-point values rather than integer-quantized representations
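The ~50% figure follows directly from the storage width of each weight: 4 bytes in fp32 versus 2 bytes in fp16. The back-of-the-envelope calculation below illustrates this. It assumes that "79M" in the model name means roughly 79 million parameters and it ignores checkpoint file overhead, so the numbers are estimates rather than measured file sizes.

```python
# Assumption: "79M" in the model name means about 79 million parameters.
NUM_PARAMS = 79_000_000
BYTES_FP32 = 4  # float32 uses 32 bits per value
BYTES_FP16 = 2  # float16 uses 16 bits per value

def weight_megabytes(num_params, bytes_per_param):
    """Approximate raw weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6

fp32_mb = weight_megabytes(NUM_PARAMS, BYTES_FP32)
fp16_mb = weight_megabytes(NUM_PARAMS, BYTES_FP16)
print(f"fp32: {fp32_mb:.0f} MB, fp16: {fp16_mb:.0f} MB, ratio: {fp16_mb / fp32_mb:.0%}")
# prints "fp32: 316 MB, fp16: 158 MB, ratio: 50%"
```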
This release does not use:
- int8
- int4
- GPTQ
- AWQ
- GGUF quantization schemes
- other integer weight compression methods
Repository contents
- maia3-79m-fp16.pt: converted fp16 checkpoint
- model definition or loading utilities required for inference
Loading example
import torch

# Load the converted checkpoint onto the CPU; tensors keep their float16 dtype.
state_dict = torch.load(
    "maia3-79m-fp16.pt",
    map_location="cpu",
)
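After loading, it can be useful to confirm that the weights really are half precision, and to upcast them when running on hardware without good fp16 support. The sketch below assumes the checkpoint is a flat mapping from parameter names to tensors, which the variable name `state_dict` suggests but this card does not confirm. The helpers `summarize_dtypes` and `upcast_to_fp32` are hypothetical names, and a small stand-in dictionary replaces the real file.

```python
import torch

def summarize_dtypes(state_dict):
    """Count tensors per dtype so a half-precision checkpoint can be sanity-checked."""
    counts = {}
    for value in state_dict.values():
        if torch.is_tensor(value):
            counts[value.dtype] = counts.get(value.dtype, 0) + 1
    return counts

def upcast_to_fp32(state_dict):
    """Cast float16 tensors back to float32; other entries are returned unchanged."""
    return {
        name: value.float()
        if torch.is_tensor(value) and value.dtype == torch.float16
        else value
        for name, value in state_dict.items()
    }

# Stand-in for the loaded checkpoint (hypothetical parameter names).
demo_state = {
    "w": torch.ones(2, 2, dtype=torch.float16),
    "b": torch.zeros(2, dtype=torch.float16),
}
dtype_counts = summarize_dtypes(demo_state)
restored = upcast_to_fp32(demo_state)
```

Upcasting restores the fp32 storage format but not the precision lost in the original conversion, so the restored values equal the fp16 values exactly.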
Provenance
- Original checkpoint: UofTCSSLab/Maia3-79M
- Converted by: bqrio
- Conversion method: float32 → float16