North Code Quant

GGUF Code Generation

High-performance quantized GGUF builds of Cohere's North Code model.
Optimized for local inference via llama.cpp, LM Studio, and Ollama.

Base Model: Cohere North Code
Architecture: Cohere / Command-R
Context Length: 128K tokens
License: CC-BY-NC / Custom

⚡ Quick Start

LM Studio

Search for "North Code Quant" in the LM Studio search bar, select your preferred quantization level from the sidebar, and click Download.

llama.cpp

./llama-cli -m north-code-quant-Q4_K_M.gguf \
  --ctx-size 8192 \
  --threads $(nproc) \
  --prompt "def fibonacci(n):"
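
The `$(nproc)` call in the command above comes from GNU coreutils and may be missing on macOS. A minimal sketch of a more portable launcher, assuming a POSIX shell: the `cpu_threads` helper and the fallback of 4 threads are illustrative choices, not part of llama.cpp, and the model path is the same one used in the command above.

```shell
#!/bin/sh
# Print the number of online CPU cores, trying common tools in turn.
cpu_threads() {
  if command -v nproc >/dev/null 2>&1; then
    nproc                                # GNU coreutils (most Linux systems)
  elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
    getconf _NPROCESSORS_ONLN            # macOS and most BSDs
  else
    echo 4                               # assumed fallback; adjust for your machine
  fi
}

MODEL="north-code-quant-Q4_K_M.gguf"     # same file name as in the command above
THREADS="$(cpu_threads)"

# Only launch when both the binary and the model file are present.
if [ -x ./llama-cli ] && [ -f "$MODEL" ]; then
  ./llama-cli -m "$MODEL" \
    --ctx-size 8192 \
    --threads "$THREADS" \
    --prompt "def fibonacci(n):"
else
  echo "llama-cli or $MODEL not found; would have used $THREADS threads"
fi
```

Checking for the binary and the model first turns a missing file into a readable message rather than a loader error.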

📦 Available Quants

Files are sorted by size and quality. Q4_K_M is recommended for most users as the best balance of file size, inference speed, and output quality (measured by perplexity).

File Name             | Quant Type | Size  | Description
North-Code-Quant.gguf | Q8_0       | -- GB | Near-lossless. Best quality, higher VRAM/RAM requirement.
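
The size column above is not filled in, but a rough figure can be estimated from the parameter count and the bits per weight of the quant type. The sketch below is a back-of-envelope calculation, not a measured file size: `estimate_gb` is a hypothetical helper, the 0.8 (billion parameters) input is taken from the model-size figure shown on this page, and 8.5 bits per weight for Q8_0 follows from its block layout (32 8-bit weights plus one 16-bit scale). Real files also contain metadata and tensors kept at other precisions.

```shell
#!/bin/sh
# Estimate file size in decimal GB.
#   $1 = parameter count in billions
#   $2 = average bits per weight for the quant type
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.2f\n", p * b / 8 }'
}

# Q8_0: 32 weights * 8 bits + one 16-bit scale per block = 8.5 bits per weight
estimate_gb 0.8 8.5    # prints 0.85
```

For K-quants such as Q4_K_M the average bits per weight varies by model (a little under 5 is typical), so treat any estimate for those types as approximate.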

📝 About This Quantization

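These GGUF files were converted from the official Cohere North Code weights using llama.cpp. An importance matrix (imatrix) computed from calibration text was used during quantization, so that the weights with the largest effect on the model's outputs are quantized with less error.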
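
For readers who want to reproduce this kind of build, the usual llama.cpp workflow is convert, compute an importance matrix, then quantize. The sketch below shows that workflow under stated assumptions: the exact commands and calibration data used for these files are not documented here, and every path (`./north-code-hf`, `./calibration.txt`, the output names) is hypothetical. It runs in dry-run mode by default and only prints the commands.

```shell
#!/bin/sh
# Dry-run by default: print each command. Set DRY_RUN=0 to actually execute.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

SRC_DIR="./north-code-hf"              # hypothetical: local copy of the original weights
CALIB="./calibration.txt"              # hypothetical: plain-text calibration data
F16="north-code-f16.gguf"              # hypothetical: intermediate full-precision GGUF
IMATRIX="imatrix.dat"                  # hypothetical: importance matrix output
OUT="north-code-quant-Q4_K_M.gguf"

# 1. Convert the original checkpoint to an unquantized GGUF file.
run python convert_hf_to_gguf.py "$SRC_DIR" --outfile "$F16" --outtype f16

# 2. Compute the importance matrix from the calibration text.
run ./llama-imatrix -m "$F16" -f "$CALIB" -o "$IMATRIX"

# 3. Quantize, using the importance matrix to guide precision.
run ./llama-quantize --imatrix "$IMATRIX" "$F16" "$OUT" Q4_K_M
```

The choice of calibration text affects the result, so for a code model it is reasonable to calibrate on text that resembles the intended workload.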

⚠️ Disclaimer: This is a quantized derivative model. While quants retain most of the base model's capabilities, lower-bit quantizations may exhibit degraded performance in edge-case code generation or multilingual tasks. Always verify generated code before execution. This model inherits the license terms of the original Cohere North Code model.
GGUF
Model size: 0.8B params
Architecture: sd-lora