Qwen3.6-27B-OTQ-GGUF
OpenTQ TurboQuant dynamic-compatible GGUFs for Qwen/Qwen3.6-27B.
This is the stock llama.cpp release track. OpenTQ chooses the tensor-level allocation policy, but the files themselves use standard GGUF tensor types (Q3_K, Q4_K, Q5_K, Q6_K, Q8_0, F16). No custom OpenTQ runtime is required for these GGUF files.
The Hugging Face `pipeline_tag` follows the official Qwen3.6-27B card (image-text-to-text). These GGUF artifacts are validated here for local text inference with stock llama.cpp; vision tensors are not part of this text-focused release track.
Why This Release Exists
These builds target MacBook-class Apple Silicon where wall-clock time matters, especially with long prompts, large system messages and agent/tool context. The goal is not to publish another uniform quant; it is to provide a stock-compatible GGUF family where OpenTQ spends precision on the tensors that matter more for local inference.
What Is OpenTQ?
OpenTQ is an open quantization toolchain for TurboQuant-style low-bit model releases. For this GGUF track, OpenTQ does not introduce a custom file format: it audits the model tensor map, assigns standard GGUF tensor types per tensor family, validates the resulting files in stock llama.cpp, and publishes the allocation/evaluation evidence next to the model.
| Field | Value |
|---|---|
| Release track | Qwen3.6-27B-OTQ-GGUF |
| Method | OpenTQ / TurboQuant-inspired dynamic tensor allocation |
| Runtime | stock llama.cpp with Metal and FlashAttention |
| Compatibility boundary | standard GGUF only; no native OpenTQ kernel required |
| Current public variants | Q3_K_M compact, Q4_K_M balanced, and Q5_K_M quality-first |
| Validation machine | M1 Max, 8K prefill gate, bounded generation, deterministic release suites |
Paired BF16-vs-GGUF Quality Signal
These are small paired release signals, not full benchmark replacements. They use the same pinned task IDs, the same qwen3-no-think prompt format, deterministic decoding, and the same local scoring rules for both BF16 and the GGUF artifacts.
BF16 sidecar: Hugging Face Jobs H200 run 69f235d2d2c8bd8662bd320e, model Qwen/Qwen3.6-27B. Reproducibility data is published in zlaabsi/Qwen3.6-27B-OTQ-GGUF-benchmarks.
| Benchmark | BF16 | Q3_K_M | Delta Q3 | Q4_K_M | Delta Q4 | Q5_K_M | Delta Q5 |
|---|---|---|---|---|---|---|---|
| mmlu | 15/16 (93.8%) | 15/16 (93.8%) | +0.0% | 15/16 (93.8%) | +0.0% | 15/16 (93.8%) | +0.0% |
| mmlu_pro | 13/24 (54.2%) | 13/24 (54.2%) | +0.0% | 13/24 (54.2%) | +0.0% | 13/24 (54.2%) | +0.0% |
| arc | 15/16 (93.8%) | 15/16 (93.8%) | +0.0% | 15/16 (93.8%) | +0.0% | 15/16 (93.8%) | +0.0% |
| hellaswag | 15/16 (93.8%) | 15/16 (93.8%) | +0.0% | 14/16 (87.5%) | -6.2% | 15/16 (93.8%) | +0.0% |
| gsm8k | 6/16 (37.5%) | 5/16 (31.2%) | -6.2% | 6/16 (37.5%) | +0.0% | 6/16 (37.5%) | +0.0% |
| math | 6/16 (37.5%) | 5/16 (31.2%) | -6.2% | 7/16 (43.8%) | +6.2% | 6/16 (37.5%) | +0.0% |
| bbh | 18/24 (75.0%) | 18/24 (75.0%) | +0.0% | 18/24 (75.0%) | +0.0% | 19/24 (79.2%) | +4.2% |
| gpqa | 0/24 (0.0%) | 0/24 (0.0%) | +0.0% | 0/24 (0.0%) | +0.0% | 0/24 (0.0%) | +0.0% |
| truthfulqa | 14/16 (87.5%) | 13/16 (81.2%) | -6.2% | 13/16 (81.2%) | -6.2% | 14/16 (87.5%) | +0.0% |
| winogrande | 14/16 (87.5%) | 14/16 (87.5%) | +0.0% | 14/16 (87.5%) | +0.0% | 14/16 (87.5%) | +0.0% |
| drop | 13/16 (81.2%) | 13/16 (81.2%) | +0.0% | 12/16 (75.0%) | -6.2% | 11/16 (68.8%) | -12.5% |
| piqa | 15/16 (93.8%) | 15/16 (93.8%) | +0.0% | 15/16 (93.8%) | +0.0% | 15/16 (93.8%) | +0.0% |
| commonsenseqa | 13/16 (81.2%) | 13/16 (81.2%) | +0.0% | 13/16 (81.2%) | +0.0% | 12/16 (75.0%) | -6.2% |
| TOTAL | 157/232 (67.7%) | 154/232 (66.4%) | -1.3% | 155/232 (66.8%) | -0.9% | 155/232 (66.8%) | -0.9% |
Aggregate deltas on this practical subset are small: Q3 is -1.3 points, Q4 is -0.9 points, and Q5 is -0.9 points vs BF16. Per-benchmark rows still have small-N variance and should not be used as leaderboard claims.
Official Qwen3.6-27B full-harness scores remain the baseline for model capability claims. This table measures same-subset quantization regression only.
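For anyone reproducing the Delta columns: each entry is the percentage-point difference between a quant's subset accuracy and the BF16 accuracy on the same pinned tasks. A minimal sketch, using the TOTAL-row counts for Q3_K_M vs BF16:

```shell
# Percentage-point delta between a quant and BF16 on the same pinned subset.
# Counts are the TOTAL-row values for BF16 (157/232) and Q3_K_M (154/232).
bf16_correct=157
quant_correct=154
total=232
awk -v b="$bf16_correct" -v q="$quant_correct" -v n="$total" \
  'BEGIN { printf "%+.1f points\n", (q - b) * 100 / n }'
# -> -1.3 points
```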
Allocation Transparency
| Variant | Mapped tensors | F16 | Q3_K | Q4_K | Q5_K | Q6_K | Q8_0 |
|---|---|---|---|---|---|---|---|
| Q3_K_M | 851 | 353 | 180 | 252 | 65 | 1 | 0 |
| Q4_K_M | 851 | 353 | 0 | 180 | 237 | 80 | 1 |
| Q5_K_M | 851 | 353 | 0 | 0 | 180 | 237 | 81 |
The allocation plots show where OpenTQ spends precision. For example, the compact profile pushes bulk MLP tensors lower while preserving attention anchors and output-sensitive tensors at higher precision.
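You can tally an allocation yourself from the per-variant evidence files with standard tools. A hedged sketch, assuming the published `tensor-types.txt` holds one `tensor_name=TYPE` entry per line and an `evidence/q4_k_m/` directory name (adjust the separator and path to match the actual files):

```shell
# Count how many tensors each GGUF type receives in a variant's plan.
# Assumes one "tensor_name=TYPE" line per tensor; tweak -F if the file
# uses whitespace or another separator instead of "=".
awk -F'=' '{ count[$NF]++ } END { for (t in count) print count[t], t }' \
  evidence/q4_k_m/tensor-types.txt | sort -rn
```

The per-type counts should match the Allocation Transparency table row for that variant.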
Custom Allocation Policies
OpenTQ can also generate a dynamic GGUF plan from a user-defined YAML/JSON policy. This lets you customize where precision is spent without editing OpenTQ source code.
```yaml
name: MY-DYN-Q4
base_ftype: Q4_K_M
target: custom 32GB Apple Silicon profile
requires_imatrix: false
category_types:
  embeddings: Q6_K
  lm_head: Q8_0
  self_attn_proj: Q6_K
  linear_attn_proj: Q5_K
  linear_attn_conv: F16
  mlp_proj: Q3_K
edge_layers: 2
edge_overrides:
  mlp_proj: Q5_K
  self_attn_proj: Q8_0
periodic_stride: 4
periodic_overrides:
  self_attn_proj: Q6_K
```
Generate the plan:
```shell
git clone https://github.com/zlaabsi/opentq
cd opentq
uv sync
uv run opentq dynamic-gguf-plan \
  --policy-file policies/qwen36-custom-dyn-q4.yaml \
  --output artifacts/qwen36-my-dyn-q4 \
  --llama-cpp /path/to/llama.cpp \
  --source-gguf artifacts/qwen36-bf16/Qwen3.6-27B-BF16.gguf \
  --target-gguf artifacts/qwen36-my-dyn-q4/Qwen3.6-27B-MY-DYN-Q4.gguf
```
The output directory contains plan.json, tensor-types.txt, tensor-types.annotated.tsv, and a runnable quantize.sh. The stock-compatible track supports custom allocation across standard GGUF tensor types; arbitrary new quantization kernels belong to the native OpenTQ runtime track.
See the full OpenTQ cookbook for built-in profiles, external policies, release evidence, and validation workflows.
Quantization Monitor
OpenTQ includes a terminal dashboard for long quantization batches:
```shell
uv run opentq monitor \
  --root artifacts/qwen3.6-27b \
  --watch \
  --interval 5
```
Use `uv run opentq status --root artifacts/qwen3.6-27b` when you need machine-readable status output for automation.
Files
| File | Quant | Size | SHA256 | Target |
|---|---|---|---|---|
| Qwen3.6-27B-OTQ-DYN-Q3_K_M.gguf | Q3_K_M | 13.48 GiB | 0088e8884a0593b6720a58e2e0ab91a1dd216dfb80942b698f9ddee5dc8b3192 | 32 GB Apple Silicon first pick |
| Qwen3.6-27B-OTQ-DYN-Q4_K_M.gguf | Q4_K_M | 16.82 GiB | 6b1b9bcbb987e8861c9727488b320e90446d1610a6d3341e3c2185e7388bc2e9 | 32 GB moderate context; 48 GB+ preferred |
| Qwen3.6-27B-OTQ-DYN-Q5_K_M.gguf | Q5_K_M | 19.92 GiB | aaf270a91d943e9f26692f267aa9ccaa5359ae2084abb8ba76d84d56b660ab16 | 48 GB+ preferred; measured on M1 Max 32 GB with tight headroom |
Variant Family
| File | Quant | Size | Apple Silicon target | Role |
|---|---|---|---|---|
| Qwen3.6-27B-OTQ-DYN-Q3_K_M.gguf | Q3_K_M | 13.48 GiB | 32 GB Apple Silicon first pick | smallest public OpenTQ dynamic-compatible release |
| Qwen3.6-27B-OTQ-DYN-Q4_K_M.gguf | Q4_K_M | 16.82 GiB | 32 GB moderate context; 48 GB+ preferred | quality-balanced public release |
| Qwen3.6-27B-OTQ-DYN-Q5_K_M.gguf | Q5_K_M | 19.92 GiB | 48 GB+ preferred; measured on M1 Max 32 GB with tight headroom | quality-first public release for larger unified-memory Macs |
Naming
- OTQ: OpenTQ, the release/tooling brand.
- TurboQuant: the quantization family and design direction.
- DYN: dynamic tensor-level allocation; different tensor families receive different GGUF quant types.
- Q3_K_M / Q4_K_M / Q5_K_M: standard GGUF quant names recognized by Hugging Face and stock llama.cpp.
Which File Should I Use?
- Q3_K_M: first pick for 32 GB Apple Silicon and larger app/tool contexts.
- Q4_K_M: quality-balanced pick; usable on 32 GB at moderate context, more comfortable on 48 GB+.
- Q5_K_M: quality-first pick; measured on M1 Max 32 GB, but 48 GB+ is the practical target.
Hardware Compatibility
| Hardware | Status | Recommended artifact | Notes |
|---|---|---|---|
| M1 Max 32 GB | Measured | Q3_K_M; Q4_K_M; Q5_K_M tight | Q5_K_M passed 8K gates but leaves limited app headroom. |
| 32 GB Apple Silicon | Expected | Q3_K_M; Q4_K_M only with care | Capacity guidance for M-series systems with similar usable unified memory. |
| 48 GB Apple Silicon | Expected | Q4_K_M; Q5_K_M | Recommended floor for comfortable Q5 use. |
| 64 GB+ Apple Silicon | Expected | Q5_K_M quality-first | Best local target for Q5 plus larger contexts and other apps. |
| 16 GB Apple Silicon | Not recommended | None | Current 27B artifacts leave too little memory headroom. |
Expected rows are capacity guidance, not measured benchmark claims.
Q5_K_M is measured on M1 Max 32 GB, but 48 GB+ is the practical recommendation for comfortable use.
Model Overview
| Base model field | Value |
|---|---|
| Base model | Qwen/Qwen3.6-27B |
| Parameter class | 27B dense model |
| HF architecture | Qwen3_5ForConditionalGeneration |
| Layer count | 64 language layers |
| Hidden size | 5120 |
| Native context | 262,144 tokens in the base model; practical local context depends on RAM, KV-cache settings, and other running apps |
| Public GGUF modality | text inference release track |
| Runtime target | Apple Silicon Metal through stock llama.cpp |
Runtime Compatibility
- llama.cpp, llama-cli, llama-server: supported.
- LM Studio and Ollama local GGUF import: expected to work as standard GGUF loaders.
- OpenTQ custom runtime: not required for this repo.
- Native TurboQuant/OpenTQ tensor formats: separate release track, not mixed into this GGUF repo.
- MLX: not the target runtime for this GGUF track.
Quick Start
1. Download a GGUF
```shell
hf download zlaabsi/Qwen3.6-27B-OTQ-GGUF Qwen3.6-27B-OTQ-DYN-Q3_K_M.gguf --local-dir models/Qwen3.6-27B-OTQ-GGUF
```
Use Q3_K_M first on 32 GB Macs. Use Q4_K_M when you can afford the extra memory. Use Q5_K_M for quality-first local inference when headroom matters less than fidelity.
2. Build llama.cpp With Metal
```shell
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_METAL=ON -DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_EXAMPLES=ON
cmake --build build -j
```
3. Run Locally
```shell
./build/bin/llama-cli \
  -m models/Qwen3.6-27B-OTQ-GGUF/Qwen3.6-27B-OTQ-DYN-Q3_K_M.gguf \
  -ngl 99 \
  -fa \
  -c 8192 \
  --temp 0.6 \
  --top-p 0.95 \
  -p "<|im_start|>user\nExplain the tradeoff between prefill and decode throughput.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
```
4. Serve An OpenAI-Compatible API
```shell
./build/bin/llama-server \
  -m models/Qwen3.6-27B-OTQ-GGUF/Qwen3.6-27B-OTQ-DYN-Q3_K_M.gguf \
  -ngl 99 \
  -fa \
  -c 8192 \
  --host 0.0.0.0 \
  --port 8080
```
```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.6-27b-otq","messages":[{"role":"user","content":"Give me a 3-bullet summary of OpenTQ."}],"temperature":0.6}'
```
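When scripting against the server, the standard OpenAI chat-completion response shape can be unwrapped with `jq`. A hedged sketch, assuming the server from the previous step is listening on port 8080:

```shell
# Pull just the assistant text out of the chat completion response.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.6-27b-otq","messages":[{"role":"user","content":"ping"}]}' \
  | jq -r '.choices[0].message.content'
```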
llama.cpp Settings
| Setting | Recommended value | Why |
|---|---|---|
| GPU layers | -ngl 99 | Offload all supported layers to Metal on Apple Silicon |
| FlashAttention | -fa / -fa on | Critical for long-context prefill wall-clock |
| Context | -c 8192 first | Validated release gate; increase only after checking memory headroom |
| Prompt format | Qwen chat template | Keep the `<\|im_start\|>` / `<\|im_end\|>` markers intact |
| Sampling | --temp 0.6 --top-p 0.95 | Good default for general chat; tighten for deterministic evals |
| Server | llama-server | Use for OpenAI-compatible local apps and agents |
Apple Silicon Guide
| Machine class | Recommendation |
|---|---|
| 32 GB MacBook Pro / Mac Studio | Prefer Q3_K_M for headroom, especially with agentic prompts and other apps open. |
| 48-64 GB Apple Silicon | Prefer Q4_K_M for balance; use Q5_K_M for quality-first local inference. |
| 96 GB+ Apple Silicon | Prefer Q5_K_M; larger native/custom candidates remain separate until runtime gates pass. |
| Agent workloads with large tool context | Measure total wall-clock time. Decode-only tok/s hides prefill cost. |
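One way to follow the last row's advice is to time a full request end to end rather than reading decode-only tok/s. A sketch, assuming the Quick Start layout and a hypothetical `prompts/agent-context.txt` file holding a long agent/tool prompt:

```shell
# Wall-clock a complete run: prefill of the long prompt plus 256 decode tokens.
# `time` reports total elapsed seconds, which is what agent latency feels like.
time ./build/bin/llama-cli \
  -m models/Qwen3.6-27B-OTQ-GGUF/Qwen3.6-27B-OTQ-DYN-Q3_K_M.gguf \
  -ngl 99 -fa -c 8192 \
  -f prompts/agent-context.txt \
  -n 256
```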
Benchmarks
| Variant | Test | Throughput (t/s) | Backend | Size |
|---|---|---|---|---|
| Q3_K_M | pp8192 | 107.09 +/- 0.00 | MTL,BLAS | 13.47 GiB |
| Q3_K_M | tg128 | 10.19 +/- 0.00 | MTL,BLAS | 13.47 GiB |
| Q4_K_M | pp8192 | 106.98 +/- 0.00 | MTL,BLAS | 16.81 GiB |
| Q4_K_M | tg128 | 9.62 +/- 0.00 | MTL,BLAS | 16.81 GiB |
| Q5_K_M | pp8192 | 93.94 +/- 0.00 | MTL,BLAS | 19.91 GiB |
| Q5_K_M | tg128 | 8.87 +/- 0.00 | MTL,BLAS | 19.91 GiB |
The plots compare the quantized OTQ artifacts against each other on measured release data. Official Qwen scores are kept as a reference table, not plotted as a fake delta.
Practical Mini-Subset Quality Signals
See Paired BF16-vs-GGUF Quality Signal. The table and chart are placed near the top of this card because they are the main same-subset quantization-regression evidence.
Release Evaluation
| Variant | Suite | Passed | Pass rate | Mean latency | p95 latency |
|---|---|---|---|---|---|
| Q3_K_M | smoke | 5/5 | 1.0 | 7.605s | 22.371s |
| Q3_K_M | release | 10/10 | 1.0 | 9.325s | 26.905s |
| Q4_K_M | smoke | 5/5 | 1.0 | 8.333s | 23.826s |
| Q4_K_M | release | 10/10 | 1.0 | 9.907s | 21.395s |
| Q5_K_M | smoke | 5/5 | 1.0 | 16.046s | 34.387s |
| Q5_K_M | release | 10/10 | 1.0 | 16.955s | 34.580s |
Release Gate
| Variant | Metadata | Bounded generation | 8K llama-bench | Smoke gate | Release gate | Timestamp |
|---|---|---|---|---|---|---|
| Q3_K_M | passed | passed (24.246s) | passed (91.371s) | 5/5 | 10/10 | 2026-04-27T19:38:50.320253+00:00 |
| Q4_K_M | passed | passed (22.348s) | passed (93.163s) | 5/5 | 10/10 | 2026-04-27T19:43:25.174228+00:00 |
| Q5_K_M | passed | passed (44.272s) | passed (119.964s) | 5/5 | 10/10 | 2026-04-28T23:18:17.700281+00:00 |
Official Baseline vs OTQ Claims
| Item | Status |
|---|---|
| Official Qwen3.6-27B source scores | Imported from the official model card into benchmarks/official_qwen36_baseline.csv |
| OTQ Q3_K_M / Q4_K_M / Q5_K_M runtime | Measured with llama-bench on M1 Max |
| OTQ functional release gates | Measured with deterministic smoke and extended suites |
| Official benchmark deltas | Not claimed yet; requires running the same tasks/scoring on the GGUF artifacts |
Transparency Files
Each variant has full release evidence under evidence/<quant>/:
- validation.json
- quality-eval.json
- release-eval.json
- opentq-plan.json
- tensor-types.txt
- tensor-types.annotated.tsv
- quantize-dry-run.log
Reproduce Release Evidence
```shell
git clone https://github.com/zlaabsi/opentq
cd opentq
uv sync
uv run python scripts/stage_qwen36_otq_gguf_repo.py
uv run python scripts/build_qwen36_release_report.py --repo artifacts/hf-gguf-canonical/Qwen3.6-27B-OTQ-GGUF
```
Run the same style of OTQ release evaluation:
```shell
LLAMA_CPP_DIR=/path/to/llama.cpp ./scripts/run_qwen36_otq_eval.sh
```
Run the long-context benchmark directly:
```shell
./build/bin/llama-bench \
  -m models/Qwen3.6-27B-OTQ-GGUF/Qwen3.6-27B-OTQ-DYN-Q3_K_M.gguf \
  -ngl 99 \
  -fa on \
  -p 8192 \
  -n 128 \
  -r 1 \
  --no-warmup
```