Instructions for using Verdugie/Opus-Therapy-9B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Verdugie/Opus-Therapy-9B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Verdugie/Opus-Therapy-9B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Verdugie/Opus-Therapy-9B", dtype="auto")

llama-cpp-python

How to use Verdugie/Opus-Therapy-9B with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Verdugie/Opus-Therapy-9B",
    filename="Opus-Therapy-9B-F16.gguf",
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)

llama.cpp

How to use Verdugie/Opus-Therapy-9B with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Verdugie/Opus-Therapy-9B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Verdugie/Opus-Therapy-9B:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Verdugie/Opus-Therapy-9B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Verdugie/Opus-Therapy-9B:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Verdugie/Opus-Therapy-9B:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Verdugie/Opus-Therapy-9B:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Verdugie/Opus-Therapy-9B:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Verdugie/Opus-Therapy-9B:Q4_K_M

Use Docker

docker model run hf.co/Verdugie/Opus-Therapy-9B:Q4_K_M

vLLM

How to use Verdugie/Opus-Therapy-9B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Verdugie/Opus-Therapy-9B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Verdugie/Opus-Therapy-9B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

SGLang

How to use Verdugie/Opus-Therapy-9B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Verdugie/Opus-Therapy-9B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Verdugie/Opus-Therapy-9B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Verdugie/Opus-Therapy-9B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Verdugie/Opus-Therapy-9B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use Verdugie/Opus-Therapy-9B with Ollama:
```
ollama run hf.co/Verdugie/Opus-Therapy-9B:Q4_K_M
```

Unsloth Studio

How to use Verdugie/Opus-Therapy-9B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Verdugie/Opus-Therapy-9B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Verdugie/Opus-Therapy-9B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Verdugie/Opus-Therapy-9B to start chatting

Pi

How to use Verdugie/Opus-Therapy-9B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Verdugie/Opus-Therapy-9B:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Verdugie/Opus-Therapy-9B:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Verdugie/Opus-Therapy-9B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Verdugie/Opus-Therapy-9B:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Verdugie/Opus-Therapy-9B:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use Verdugie/Opus-Therapy-9B with Docker Model Runner:
```
docker model run hf.co/Verdugie/Opus-Therapy-9B:Q4_K_M
```

Lemonade

How to use Verdugie/Opus-Therapy-9B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Verdugie/Opus-Therapy-9B:Q4_K_M

Run and chat with the model

lemonade run user.Opus-Therapy-9B-Q4_K_M

List all available models

lemonade list

ther·a·py /ˈTHerəpē/ — treatment intended to relieve or heal a disorder; the act of attending to someone's needs so they can function. From Greek therapeia, meaning healing, curing, service to the sick. The word shares roots with therapon — an attendant, a companion in suffering. Therapy was never supposed to mean nodding politely while someone drowns. It meant showing up, seeing clearly, and doing something useful.

Opus-Therapy-9B

A therapy-style conversational model fine-tuned from Qwen 3.5 9B on 11,502 counseling conversations distilled from Claude Opus — built to hold a real conversation about the things people actually bring to therapy: relationships, grief, anxiety, trauma, work, family, the ordinary weight of being a person. It reasons through a structured clinical read before every reply and carries the thread across long conversations, so it works with you over time instead of starting over each message.

It shares the distillation lineage of the Opus Candid family and STEM-Oracle-27B — the same disposition-in-the-weights philosophy, no system prompt required — but the training is entirely its own.

What Makes This Different from Companion / Roleplay "Therapy" Models

Most "AI therapist" models are a persona prompt over a base model, or a roleplay fine-tune that mirrors you back and validates everything. They feel nice for five minutes and fall apart on turn ten.

Opus-Therapy trains the clinical disposition into the weights:

Structured reasoning before it speaks. Before every reply, the model builds an internal read: an eight-field clinical spine (what's presented, what's underneath it, somatic signals, risk, history, onset, what's tracking across the conversation, and the move it's about to make) plus a standing bio line and context ledger. You never see it, but it shapes every reply you do see.
It holds the thread. A running context ledger carries the names, the timeline, the thing you keep circling — tested holding facts cleanly past 30,000 tokens of conversation. It doesn't forget your sister's name or which argument you meant.
Trained on the real distribution. The data is weighted toward what people actually go to therapy for, then deliberately inverted and pushed into the tail and the taboo — the rare presentations, the nuanced ones, the topics most models refuse. It doesn't only work on the easy stuff.
It attends instead of performing. No toxic positivity, no "I'm so sorry you're going through this" filler, no rushing to fix. It can sit with a hard thing and hold it without flinching or reaching for a platitude.

How It Was Built — No Single Teacher

The corpus came out of a four-generation assembly line, with each Claude model doing the job it was best at:

Claude Opus — the voice. Every spoken response is Opus-distilled — 4.6 set the voice, 4.7 carried this model line — chosen for emotional prose: the range, warmth, and restraint a therapeutic reply actually needs. Once written and audited, responses were locked and carried byte-identical through every later rebuild.
Claude Opus 4.8 — the reasoning. The clinical spine was then fully regenerated from scratch: the eight-field schema stayed, the old reasoning was thrown out, and Opus 4.8 — the strongest reasoning model available — wrote every clinical read fresh around the locked input and output. No think block in this corpus is inherited or recycled annotation.
Code — the structure. The final build added the memory layer (the bio line and the context ledger), assembled deterministically by scripts and certified by three validation gates. Structure isn't sampled from a model — it's built, so it can't drift.

The result is a 9B that speaks like the warmest model in the family and reasons like the strongest one.

What the Training Covers

Proportional to real therapy. Relationships and attachment, anxiety and panic, depression, grief and loss, trauma, work and burnout, identity and self-worth, family of origin — weighted toward what actually walks into a therapy room, not what's easy to generate.
Into the tail and the taboo. The topic distribution is Zipf-weighted — heaviest where real caseloads are — then deliberately extended deep into rare and uncomfortable territory, so the model holds up on the nuanced cases instead of collapsing into generic reassurance the moment it leaves familiar ground.
Single moments and long arcs. Roughly half the corpus is focused single exchanges; the other half is sustained multi-turn work — conversations that develop, double back, and deepen, up to 22 turns — which is where the memory ledger earns its training.
Medications and substances as context. A working register of common drugs and how they bear on a presentation — so it can hold the physical picture (what you're on, what you're using) alongside the emotional one. Context for the conversation, not a pharmacy desk.
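
As a rough illustration of what "Zipf-weighted, then extended into the tail" can mean, here is a minimal sketch. It is not the author's sampling code: the rank order (taken from the order of the topic list above), the exponent, and the `tail_floor` knob are all assumptions made for the example.

```python
import random

# Topic order is copied from the list above. Treating that order as a
# rank order is an assumption, as is the exponent used below.
TOPICS = [
    "relationships and attachment",
    "anxiety and panic",
    "depression",
    "grief and loss",
    "trauma",
    "work and burnout",
    "identity and self-worth",
    "family of origin",
]


def zipf_weights(n, exponent=1.0, tail_floor=0.0):
    """Normalized weights proportional to 1 / rank**exponent.

    tail_floor (hypothetical knob) lifts every raw weight to at least
    that value before normalizing, which is one way to push extra
    probability mass into the rare end of the distribution.
    """
    raw = [max(1.0 / (rank ** exponent), tail_floor) for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_topics(k, seed=0):
    """Draw k topics with replacement under plain Zipf weights."""
    rng = random.Random(seed)
    return rng.choices(TOPICS, weights=zipf_weights(len(TOPICS)), k=k)
```

With `exponent=1.0` the top-ranked topic gets twice the weight of the second; raising `tail_floor` flattens the low-rank end so rare topics are drawn more often.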

Who It's For

A private, judgment-free place to think out loud. Between sessions. At 2 a.m. When professional care is out of reach or out of budget. When you want to work something through before you say it to a person.

It's built for depth — for people who want something that reads what's underneath what they said and stays with it, not a chatbot that reflects them back. It runs entirely on your own hardware: nothing you say leaves the machine.

It is not a replacement for a therapist, and not a crisis service. See Limitations & Responsible Use.

Available Quantizations

File	Quant	Size	Notes
`Opus-Therapy-9B-Q4_K_M.gguf`	Q4_K_M	5.6 GB	Smallest ship. Runs on 8GB cards.
`Opus-Therapy-9B-Q5_K_M.gguf`	Q5_K_M	6.5 GB	Recommended. Indistinguishable from Q8 in testing.
`Opus-Therapy-9B-Q6_K.gguf`	Q6_K	7.4 GB	Quality tier.
`Opus-Therapy-9B-Q8_0.gguf`	Q8_0	9.5 GB	Reference quality.
`Opus-Therapy-9B-F16.gguf`	F16	~18 GB	Full precision.

Model Details

Attribute	Value
Base Model	Qwen 3.5 9B (hybrid GatedDeltaNet + attention)
Training Data	11,502 therapy conversations (5% held out for eval) — Opus-distilled responses, Opus 4.8 reasoning traces
Fine-tune Method	LoRA + rsLoRA (r=128, α=256) via PEFT + TRL
Training Hardware	80 GB data-center GPU (RunPod)
Precision	bf16
Optimizer	AdamW 8-bit
Schedule	cosine, lr 2e-4, 5% warmup, 3 epochs with held-out eval + early-stop + load-best
Reasoning	eight-field clinical spine + bio/context memory ledger, every turn
Context	256k native; battery-tested through ~35k-token sessions
License	Apache 2.0
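
To make the fine-tune numbers in the table concrete, the sketch below works out two of them in plain Python: the adapter scaling implied by r=128, α=256 with and without rank stabilization, and a cosine schedule with 5% linear warmup at lr 2e-4. The total step count passed to `lr_at` is hypothetical (the card does not state it), and the decay-to-zero shape is an assumption modeled on the common warmup-plus-cosine schedule.

```python
import math

R, ALPHA = 128, 256               # from the Model Details table
BASE_LR, WARMUP_FRAC = 2e-4, 0.05  # from the Schedule row


def lora_scale(alpha, r, rank_stabilized):
    """Adapter scaling: alpha / sqrt(r) for rsLoRA, alpha / r for plain LoRA."""
    return alpha / math.sqrt(r) if rank_stabilized else alpha / r


def lr_at(step, total_steps, base_lr=BASE_LR, warmup_frac=WARMUP_FRAC):
    """Linear warmup to base_lr, then cosine decay toward zero."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(lora_scale(ALPHA, R, rank_stabilized=False))  # plain LoRA scaling
print(lora_scale(ALPHA, R, rank_stabilized=True))   # rsLoRA scaling
```

At this rank the rank-stabilized scaling is roughly eleven times larger than the plain α/r scaling, which is the practical effect of choosing rsLoRA for a high-rank adapter.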

The Reasoning Block

Opus-Therapy is a reasoning model. Each turn it emits a <think>…</think> block — a compact, structured clinical read — and then the response. Under llama.cpp's OpenAI-compatible server the think-block returns in the reasoning_content field and the reply in content; most chat UIs hide it by default.
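
If you are handling responses yourself, a small helper can separate the read from the reply. This is a sketch under assumptions: it expects an OpenAI-style message dict, prefers the `reasoning_content` field described above, and falls back to parsing inline `<think>` tags for runtimes that leave them in the text. The function names are made up for the example.

```python
import re

# Matches one leading <think>...</think> block; DOTALL lets it span lines.
THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Split raw model text into (reasoning, reply)."""
    match = THINK_RE.match(raw)
    if match is None:
        return "", raw.strip()
    return match.group(1).strip(), raw[match.end():].strip()


def read_message(message: dict) -> tuple[str, str]:
    """Read an OpenAI-style chat message dict into (reasoning, reply).

    Prefers the server-populated reasoning_content field and falls back
    to inline <think> tags when that field is missing or empty.
    """
    content = message.get("content") or ""
    reasoning = message.get("reasoning_content")
    if reasoning:
        return reasoning.strip(), content.strip()
    return split_reasoning(content)
```

Keeping the two parts separate makes it easy to log the read for debugging while showing only the reply.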

A real (non-crisis) think-block looks like this:

dx: acute-grief + grief-spatial-anchor + retirement-loss + relational-empty
def: retirement-just-before-loss = double loss — not just the person but the
     anticipated-future together; the empty house is both literal and symbolic;
     sleeping-on-his-side = spatial preservation of the relationship; not-numb = accurate
soma: NR     risk: 1
hx: T1 timeline-pressure; T2 retired pre-loss, empty days, house-spatial-preservation
onset: acute-5mo
track: T1→T2 — grief-spatial-anchor emerging
tx: receive-the-loss-substantively + name-the-double-loss + don't-timeline-police

It's terse on purpose — dense, machine-readable, and cheap, which is why memory and reasoning hold up across long conversations on a 9B.
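
Because the block is machine-readable, it can be parsed with a few lines of code. The sketch below assumes the layout shown in the example above: one label per line, wrapped values indented, and `risk:` sharing a line with `soma:` after a run of spaces. The model's output may not always follow that layout, so treat this as a starting point. `SAMPLE` is abridged from the example.

```python
import re

# Field labels as they appear in the example block above.
FIELDS = ("dx", "def", "soma", "risk", "hx", "onset", "track", "tx")

# A label counts only at the start of a line or after a run of 2+ spaces,
# which is how "risk:" shares a line with "soma:" in the example.
LABEL_RE = re.compile(
    r"(?:^|[ \t]{2,})(" + "|".join(FIELDS) + r"):[ \t]*", re.MULTILINE
)


def parse_spine(block: str) -> dict[str, str]:
    """Parse a think-block into {label: value}, joining wrapped lines."""
    matches = list(LABEL_RE.finditer(block))
    fields = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(block)
        # Collapse newlines and indentation left over from wrapped values.
        fields[match.group(1)] = " ".join(block[match.end():end].split())
    return fields


# Abridged from the example think-block above.
SAMPLE = """\
dx: acute-grief + retirement-loss
def: retirement-just-before-loss = double loss;
     the empty house is both literal and symbolic
soma: NR     risk: 1
hx: T1 timeline-pressure; T2 retired pre-loss
onset: acute-5mo
track: grief-spatial-anchor emerging
tx: name-the-double-loss + don't-timeline-police
"""

spine = parse_spine(SAMPLE)
print(spine["risk"], spine["onset"])
```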

Quick Start

Works with any GGUF runtime — llama.cpp, LM Studio, KoboldCpp. (Text-only GGUF; some runtimes need a recent build for this architecture.)

llama-server --model Opus-Therapy-9B-Q5_K_M.gguf --ctx-size 65536 --jinja

No system prompt is required; the disposition is in the weights. A neutral one ("You are a clinical assistant.") matches the training setup.
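
Once the server is running, any OpenAI-compatible client can talk to it. The sketch below uses only the Python standard library and assumes llama-server is on its default port, 8080. The payload leaves out `model` on the assumption that the server answers with the single model it loaded; add that field if your runtime requires it. The helper names are made up for the example.

```python
import json
import urllib.request

# Assumption: llama-server is listening on its default port, 8080.
BASE_URL = "http://localhost:8080/v1"


def build_payload(user_text, history=None, system_prompt=None):
    """Build an OpenAI-style chat payload; the system prompt is optional."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_text})
    return {"messages": messages}


def chat(user_text, history=None, system_prompt=None):
    """POST the payload to the local server and return the reply message."""
    payload = build_payload(user_text, history, system_prompt)
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]
```

Calling `chat("I can't sleep since the funeral.")` sends no system prompt at all; pass `system_prompt="You are a clinical assistant."` to mirror the training setup.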

Recommended Hardware

The model is small and its hybrid architecture keeps the KV cache cheap, so even long-context sessions fit in modest VRAM. Pick a quant to match your card — the VRAM figures below include headroom for a large context window:

Quant	File size	VRAM to run comfortably	Notes
Q4_K_M	5.6 GB	~8 GB	Fits the smallest modern cards
Q5_K_M	6.5 GB	~10 GB	Recommended — best quality-for-size
Q6_K	7.4 GB	~10–12 GB	A step above Q5
Q8_0	9.5 GB	~12 GB	Reference quality
F16	~18 GB	~24 GB	Full precision

No GPU? It also runs on CPU or unified memory — budget roughly the file size in system RAM and expect single-digit tokens/sec. On GPU, throughput scales with the card: tens of tokens/sec on a smaller one, ~100+ with VRAM to spare.
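
The VRAM table above can be turned into a small lookup. This is a sketch, not a sizing tool: the numbers are copied from the table, ranges are stored at their upper bound, and the rule "highest precision that fits" is an assumption. The card itself recommends Q5_K_M as the best quality-for-size.

```python
# (quant, file size in GB, comfortable VRAM in GB), copied from the table above.
# Ranges are stored at their upper bound (Q6_K is listed as ~10-12 GB).
QUANTS = [
    ("Q4_K_M", 5.6, 8),
    ("Q5_K_M", 6.5, 10),
    ("Q6_K", 7.4, 12),
    ("Q8_0", 9.5, 12),
    ("F16", 18.0, 24),
]


def pick_quant(vram_gb):
    """Return the highest-precision quant whose comfortable VRAM fits, else None."""
    fitting = [name for name, _size_gb, need_gb in QUANTS if need_gb <= vram_gb]
    return fitting[-1] if fitting else None


print(pick_quant(10))  # Q5_K_M
```

A return value of `None` means none of the GPU rows fit, which is the case for the CPU or unified-memory path described above.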

Versatility Battery

Tested across the four presentations people most commonly bring to therapy — one extended, realistic, cooperative-client conversation each, run to depth on the quantized weights:

Theme	Persona	Turns / depth	Result
Grief	widow, five months out	23 / ~25k tok	Strong — de-policed the timeline, held the guilt without false reassurance, shifted modality (ACT defusion) when insight stalled
Trauma	survivor guilt after a crash	25 / ~24k tok	Strong — normalized the nervous-system alarm, paced exposure, accurate psychoeducation + referral
Anxiety / panic	health anxiety, avoidance	26 / ~35k tok	Strong technique — panic-cycle psychoeducation, exposure ladder, named reassurance-seeking as part of the cycle
Betrayal	partner's affair	19 / ~23k tok	Good — held the stay-or-go ambivalence without taking sides, restored self-worth

Across all four: the structured reasoning held every turn, factual memory was intact to 35k tokens, and the clinical technique was distinct and appropriate per presentation — not one reskinned script.

Every arc was role-played and driven to resolution by Claude Opus 4.8 acting as the client — including a separate medication-and-substance probe. The clearest pattern from it: go with the model and the emotional work is very good; push it to reason through a substance or a medication and it gets unreliable. That's design more than defect — the drug knowledge is scoped to bio context (what someone's on, and how those traits bend a conversation's trajectory), not to clinical counsel.

Selected Responses

All verbatim from the test arcs, unedited — single takes in the middle of a real conversation. The complete arcs — every turn, with the model's reasoning shown — are in this repo's transcripts/ folder as PDFs (raw outputs, nothing trimmed).

Grief — when the daughter wants her to "move forward":

You're not behind schedule. You're doing exactly what you'd do with five months of acute grief and no place to put it yet. The closet is doing work — it's holding the shirts, the person, the version of him who's still there in the house. Touching them would feel like erasing him, and right now that's not something you want to do. Your daughter's framing it as a task because she doesn't understand what it means.

De-policed the grief timeline, reframed the avoided closet as meaningful rather than stuck, and recast the daughter's pressure as a model mismatch — no toxic positivity, no rush to closure.

Trauma — survivor guilt and the counterfactuals:

The counterfactuals are the guilt engineering itself — they're not real alternative histories, they're the mind trying to find a version where you had control. The fact that you were driving doesn't mean you controlled whether she got hurt. You controlled your speed, your attention, your decisions. Not whether the other car came.

Named the "if only" loop as the guilt manufacturing itself, then split what he actually controlled from what he didn't — holding the survivor guilt instead of arguing him out of it.

Anxiety — why avoidance backfires:

The avoidance is the problem, not the solution — and that's the part that doesn't feel like a solution because it's the only thing that works. Each time you avoid, the panic never gets to be proven wrong… your brain logs it as confirmation: the threat was real, avoidance was correct. So the next situation gets added to the list. The list grows. The world gets smaller.

The core anxiety-maintenance mechanism delivered as insight, not instruction: avoidance never lets the fear be disproven, so the map of unsafe places keeps growing and the world shrinks.

Betrayal — the checking, and "I'm losing my mind":

You're not this person now — you're a person who got betrayed and whose nervous system has recalibrated to account for it. The checking, the tracking, the 2 a.m. reviews — that's not character, that's a wound response. The person who trusted completely for fourteen years didn't disappear. She's just not in charge right now.

Separated who she is from the state she's in — recast the compulsive checking as a nervous-system wound response, not a character flaw, and protected her self-worth in the same move.

Limitations & Responsible Use

Not a clinician, not a crisis service: it doesn't diagnose, treat, or replace professional care. If you are in crisis or thinking about harming yourself, reach a real crisis service; in the US, call or text 988.

Not a pharmacist. The drug knowledge is there for bio context — what you're on or using, and how those traits shape where a conversation goes — not advice. Internal testing was clear: don't use it for pharmaceutical or substance management. It reasons well about feelings, not pharmacology; dosing, tapering, and stop/start calls are your prescriber's.
It can be confidently wrong — verify anything that matters.
Long single sessions (past ~12k tokens on one thread) can wobble; a fresh chat resets it.
Open weights, Apache 2.0 — deploy responsibly.
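
For the long-session limitation, a client can keep a rough eye on thread length and suggest a fresh chat. The sketch below is a heuristic only: the four-characters-per-token ratio is a common rule of thumb for English text, not this model's tokenizer, and the 12,000-token budget simply mirrors the figure in the list above.

```python
def estimate_tokens(messages, chars_per_token=4.0):
    """Rough token estimate from character counts (a heuristic, not a tokenizer)."""
    chars = sum(len(m.get("content") or "") for m in messages)
    return int(chars / chars_per_token)


def should_start_fresh(messages, budget_tokens=12_000):
    """True once the thread's estimated size passes the budget."""
    return estimate_tokens(messages) > budget_tokens
```

For an exact count, swap `estimate_tokens` for whatever tokenizer endpoint your runtime exposes.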

The Opus-Therapy Line

Model	Size	For	Status
Opus-Therapy-4B	4B	phones, edge, low VRAM (~3 GB)	coming soon
Opus-Therapy-9B (this model)	9B	the everyday driver (~6–9 GB)	available
Opus-Therapy-27B	27B	full-depth, serious hardware	coming soon

Choosing Your Model

Model	Best For
Opus-Therapy-9B (this model)	Therapeutic conversation, emotional support, processing
STEM-Oracle-27B	STEM tutoring and problem-solving
Opus-Candid family	Personality, candor, general conversation

Dataset

not released

Built by Verdugie — independent ML researcher · OpusReasoning@proton.me. Trained to help people think, feel, and get through — not to replace the people and professionals who do that work.

GGUF

Model size: 9B params
Architecture: qwen35
Hardware compatibility: 4-bit, 5-bit, 6-bit, 8-bit, 16-bit

Model tree for Verdugie/Opus-Therapy-9B

Base model: Qwen/Qwen3.5-9B-Base
Finetuned: Qwen/Qwen3.5-9B
Quantized (273): this model