North Code Quant

GGUF Code Generation

High-performance quantized GGUF builds of Cohere's North Code model.
Optimized for local inference via llama.cpp, LM Studio, and Ollama.

Base Model: Cohere North Code
Architecture: Cohere / Command-R
Context Length: 128K tokens
License: CC-BY-NC / Custom

⚡ Quick Start

LM Studio

Search for "North Code Quant" in the LM Studio search bar, select your preferred quantization level from the sidebar, and click Download.

llama.cpp

```bash
./llama-cli -m north-code-quant-Q4_K_M.gguf \
  --ctx-size 8192 \
  --threads "$(nproc)" \
  --prompt "def fibonacci(n):"
```
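For Ollama, which the introduction lists as a supported runtime, a local GGUF file is normally imported through a Modelfile. The sketch below is a minimal illustration under assumptions, not a tested recipe for this model: the helper name `write_modelfile`, the model name `north-code-quant`, and the 8192-token context value are placeholders chosen to mirror the llama.cpp command above.

```shell
# Hypothetical helper: write an Ollama Modelfile that points at a local GGUF file.
# Arguments: GGUF path, context size (default 8192), output file (default ./Modelfile).
write_modelfile() {
  local gguf_path="$1" ctx_size="${2:-8192}" out="${3:-Modelfile}"
  {
    printf 'FROM %s\n' "$gguf_path"
    printf 'PARAMETER num_ctx %s\n' "$ctx_size"
  } > "$out"
}

# Usage (requires Ollama to be installed; the model name is a placeholder):
#   write_modelfile ./north-code-quant-Q4_K_M.gguf 8192
#   ollama create north-code-quant -f Modelfile
#   ollama run north-code-quant "def fibonacci(n):"
```

Keeping the Modelfile generation in a small function makes it easy to regenerate the file for a different quantization or context size without editing it by hand.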

📦 Available Quants

Files are listed in order of size and quality. Q4_K_M is recommended for most users as the best trade-off between inference speed, memory use, and quality (measured by perplexity).

| File Name | Quant Type | Size | Description |
| --- | --- | --- | --- |
| `North-Code-Quant.gguf` | Q8_0 | -- GB | Near-lossless. Best quality, higher VRAM/RAM requirement. |
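Since the table does not state a file size, a rough estimate can be derived from the parameter count and the bits stored per weight. The sketch below assumes the simple relation size = parameters × bits-per-weight / 8. The function name `estimate_gguf_gib` is hypothetical, and the result ignores file metadata and any tensors kept at a different precision, so treat it as an approximation only. Q8_0 stores 32 8-bit weights plus one 16-bit scale per block, which works out to 8.5 bits per weight.

```shell
# Hypothetical helper: approximate GGUF file size in GiB.
# Arguments: parameter count, average bits per weight (8.5 for Q8_0).
estimate_gguf_gib() {
  local params="$1" bits_per_weight="$2"
  awk -v p="$params" -v b="$bits_per_weight" \
    'BEGIN { printf "%.2f\n", p * b / 8 / 1073741824 }'
}

# Usage (the 7-billion parameter count is an illustrative assumption,
# not a figure taken from this model card):
#   estimate_gguf_gib 7000000000 8.5
```

Runtime memory use will be higher than the file size, because the KV cache grows with the context length passed via `--ctx-size`.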

📝 About This Quantization

These GGUF files were converted from the official Cohere North Code weights using llama.cpp. An importance matrix (imatrix) was used during quantization to help preserve the weights that most affect output quality.
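The exact commands used to produce these files are not given here. As a rough sketch, a typical llama.cpp importance-matrix workflow has three steps: convert the original weights to a 16-bit GGUF file, compute an importance matrix from a calibration text, then quantize with that matrix. The function below only prints those commands so they can be reviewed first. The function name, all paths, the intermediate file names, and the calibration file are hypothetical placeholders, and it is an assumption that the converter supports this model.

```shell
# Hypothetical helper: print (do not run) a typical llama.cpp imatrix pipeline.
# Arguments: source weights directory, calibration text file, quant type (default Q4_K_M).
print_quant_pipeline() {
  local hf_dir="$1" calib_txt="$2" quant_type="${3:-Q4_K_M}"
  local f16="north-code-f16.gguf"                   # placeholder intermediate file
  local out="north-code-quant-${quant_type}.gguf"   # placeholder output file
  cat <<EOF
python convert_hf_to_gguf.py ${hf_dir} --outfile ${f16} --outtype f16
./llama-imatrix -m ${f16} -f ${calib_txt} -o imatrix.dat
./llama-quantize --imatrix imatrix.dat ${f16} ${out} ${quant_type}
EOF
}

# Usage: review the printed commands, then run them from a llama.cpp checkout.
#   print_quant_pipeline ./north-code-hf calibration.txt Q4_K_M
```

Printing rather than executing keeps the sketch safe to paste into a shell, since each step can take a long time and needs the full-precision weights on disk.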

⚠️ Disclaimer: This is a quantized derivative model. While quants retain most of the base model's capabilities, lower-bit quantizations may exhibit degraded performance in edge-case code generation or multilingual tasks. Always verify generated code before execution. This model inherits the license terms of the original Cohere North Code model.

