Use with the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="OpenYourMind/OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-GGUF",
	filename="OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-I-Quality.gguf",  # pick any tier from the Files list below
)
llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)
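The create_chat_completion call returns an OpenAI-style completion dict. A minimal sketch of pulling out the reply text, using a placeholder response of the same shape (the content string and token counts are invented for illustration; actual image input also requires a llama-cpp-python build with multimodal support):

```python
# Placeholder response mirroring the OpenAI-style dict that
# create_chat_completion returns (values are illustrative, not real output).
sample_response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "A copper statue stands on an island in New York Bay.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 1024, "completion_tokens": 14, "total_tokens": 1038},
}

# Extract the assistant's reply text the same way for a real response:
reply = sample_response["choices"][0]["message"]["content"]
print(reply)
```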

OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-GGUF

I do this work independently and release it for free. Donations are welcome and go toward compute for more and larger abliterations.

Bitcoin: bc1qsvfduzj9fjs9fugpc52yver3f2g8fp7xjxecdv

Community discussion: https://discord.gg/rhUZY5GEZr

Overview

This repository contains APEX GGUF quantizations of OpenYourMind/OpenYourMind-Qwen3.6-35B-A3B-kuato-DPO-abliterated-uncensored.

The source model is an abliterated and DPO-retrained version of Qwen/Qwen3.6-35B-A3B. After ablation and DPO, the original Qwen3.6 vision layers were re-added to retain multimodal functionality. These GGUF files preserve that vision support through the included multimodal projector.

Five APEX tiers are included:

  • APEX Quality: baseline tensor policy, quantized without an imatrix.
  • APEX I-Quality: same tensor policy as Quality, quantized with a diverse imatrix calibration set.
  • APEX I-Balanced: two-tier Q6_K/Q5_K expert gradient with imatrix.
  • APEX I-Compact: Q4_K/Q3_K compact expert layout with imatrix.
  • APEX Mini: smallest included APEX tier, using IQ2_S middle experts with imatrix.

The filenames follow the upstream APEX naming style used by mudler/apex-quant.
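For a rough sense of the size/quality tradeoff between tiers, file size scales with the average bits-per-weight of the quant mix. A back-of-envelope sketch, where the bpw figures are approximate k-quant averages assumed for illustration, not exact values for these APEX mixes:

```python
# Back-of-envelope file sizes for a 35B-parameter model at a given average
# bits-per-weight (bpw). The bpw figures are rough assumptions for
# illustration, not measured values for these specific APEX tiers.
PARAMS = 35e9  # total parameter count

def approx_size_gib(bpw: float) -> float:
    """Approximate GGUF size in GiB at the given average bits per weight."""
    return PARAMS * bpw / 8 / 2**30

for tier, bpw in [
    ("Quality / I-Quality (Q6_K-heavy)", 6.6),
    ("I-Balanced (Q6_K/Q5_K)", 5.6),
    ("I-Compact (Q4_K/Q3_K)", 4.0),
    ("Mini (IQ2_S-heavy)", 2.8),
]:
    print(f"{tier}: ~{approx_size_gib(bpw):.1f} GiB")
```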

Model Details

| Attribute | Value |
| --- | --- |
| Base model | OpenYourMind/OpenYourMind-Qwen3.6-35B-A3B-kuato-DPO-abliterated-uncensored |
| Original base | Qwen/Qwen3.6-35B-A3B |
| Method | Refusal ablation plus DPO retraining, then GGUF quantization |
| Quantization | APEX Quality, APEX I-Quality, APEX I-Balanced, APEX I-Compact, APEX Mini |
| Format | GGUF |
| Runtime | llama.cpp |
| Architecture | Qwen3.6 MoE vision-language model |
| Vision support | Yes, through the included BF16 multimodal projector |
| Projector | mmproj-OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-BF16.gguf |

Files

| File | Description |
| --- | --- |
| OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-I-Quality.gguf | APEX I-Quality GGUF quant with imatrix metadata |
| OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-Quality.gguf | APEX Quality GGUF quant without imatrix |
| OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-I-Balanced.gguf | APEX I-Balanced GGUF quant with imatrix metadata |
| OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-I-Compact.gguf | APEX I-Compact GGUF quant with imatrix metadata |
| OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-Mini.gguf | APEX Mini GGUF quant with imatrix metadata |
| mmproj-OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-BF16.gguf | BF16 vision projector required for multimodal/image inputs |
| OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-I-Quality.tensor-types.txt | Tensor-type policy used for the I-Quality quant |
| OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-Quality.tensor-types.txt | Tensor-type policy used for the Quality quant |
| OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-I-Balanced.tensor-types.txt | Tensor-type policy used for the I-Balanced quant |
| OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-I-Compact.tensor-types.txt | Tensor-type policy used for the I-Compact quant |
| OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-Mini.tensor-types.txt | Tensor-type policy used for the Mini quant |

The FP16 safetensors source model is published separately:

https://huggingface.co/OpenYourMind/OpenYourMind-Qwen3.6-35B-A3B-kuato-DPO-abliterated-uncensored

The earlier non-APEX GGUF release is published separately:

https://huggingface.co/OpenYourMind/OpenYourMind-Qwen3.6-35B-A3B-kuato-DPO-abliterated-uncensored-GGUF

llama.cpp

Download the I-Quality quant:

huggingface-cli download OpenYourMind/OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-GGUF \
  OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-I-Quality.gguf \
  mmproj-OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-BF16.gguf \
  --local-dir ./model

Swap the GGUF filename for APEX-I-Balanced, APEX-I-Compact, APEX-Mini, or APEX-Quality if you want a different size/quality tradeoff. The same mmproj file is used for all tiers.

Run server mode:

llama-server \
  -m ./model/OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-I-Quality.gguf \
  --mmproj ./model/mmproj-OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-BF16.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  -ngl 99 \
  --jinja
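Once running, llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint. A minimal sketch of building a text-plus-image request body in Python (the temperature value and curl invocation in the comment are illustrative):

```python
import json

# Request body for llama-server's OpenAI-compatible /v1/chat/completions
# endpoint. POST it with any HTTP client, e.g.:
#   curl http://localhost:8080/v1/chat/completions \
#     -H "Content-Type: application/json" -d @body.json
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ],
    "temperature": 0.7,
}

body = json.dumps(payload)
print(body[:72])
```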

Run interactive text chat:

llama-cli \
  -m ./model/OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-I-Quality.gguf \
  --conversation \
  -ngl 99

For image input, load the mmproj file together with the GGUF model.
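As an illustration, recent llama.cpp builds ship a multimodal CLI (llama-mtmd-cli) that takes the model, projector, and an image together; flag names can differ between versions, so check llama-mtmd-cli --help (the image path here is a placeholder):

```shell
llama-mtmd-cli \
  -m ./model/OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-APEX-I-Quality.gguf \
  --mmproj ./model/mmproj-OpenYourMind-Qwen3.6-35B-A3B-abliterated-uncensored-BF16.gguf \
  --image ./statue.jpg \
  -p "Describe this image in one sentence."
```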

Quantization Notes

All GGUFs use APEX tensor layouts adapted conservatively for this MoE vision architecture:

  • APEX Quality and APEX I-Quality: Q6_K edge routed experts, Q5_K near-edge routed experts, IQ4_XS middle routed experts, Q8_0 shared experts, Q6_K attention/SSM projections.
  • APEX I-Balanced: Q6_K edge routed experts, Q5_K middle routed experts, Q8_0 shared experts, Q6_K attention/SSM projections.
  • APEX I-Compact: Q4_K edge routed experts, Q3_K middle routed experts, Q6_K shared experts, Q4_K attention/SSM projections.
  • APEX Mini: Q3_K edge routed experts, IQ2_S middle routed experts, Q5_K/Q4_K shared experts, Q4_K/Q3_K attention/SSM projections.
  • Router, norms, embeddings, output, state tensors, conv tensors, and bias tensors: preserved as BF16/F32 where appropriate.
  • Vision projector: kept separate as BF16.

The I-Quality, I-Balanced, I-Compact, and Mini files were quantized with an imatrix generated from a Hugging Face-hosted calibration corpus containing chat, code, multilingual, terminal, and agentic examples.
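For reference, imatrix quants of this kind are typically produced with llama.cpp's own tools along these lines (a sketch, not the author's exact commands; the calibration.txt and F16 input paths are placeholders):

```shell
# 1. Generate the importance matrix from a calibration text file
llama-imatrix -m model-F16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize using the imatrix (quant type shown is an example)
llama-quantize --imatrix imatrix.dat model-F16.gguf model-Q4_K_M.gguf Q4_K_M
```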

Notes

Use is the responsibility of the user. Make sure your usage complies with applicable laws, platform rules, and deployment requirements.

Downloads last month: 3,853
Format: GGUF
Model size: 35B params
Architecture: qwen35moe