How to use from the llama-cpp-python library
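# !pip install llama-cpp-python huggingface-hub
# (from_pretrained needs huggingface-hub to fetch the file)

from llama_cpp import Llama

# Download the GGUF from the Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
    repo_id="Verdugie/Opus-Therapy-9B",
    filename="Opus-Therapy-9B-Q5_K_M.gguf",  # recommended quant; see Available Quantizations
)

# Send a single user turn and print only the assistant's reply text.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
print(response["choices"][0]["message"]["content"])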

ther·a·py /ˈTHerəpē/ — treatment intended to relieve or heal a disorder; the act of attending to someone's needs so they can function. From Greek therapeia, meaning healing, curing, service to the sick. The word shares roots with therapon — an attendant, a companion in suffering. Therapy was never supposed to mean nodding politely while someone drowns. It meant showing up, seeing clearly, and doing something useful.

Opus-Therapy-9B

A therapy-style conversational model fine-tuned from Qwen 3.5 9B on 11,502 counseling conversations distilled from Claude Opus — built to hold a real conversation about the things people actually bring to therapy: relationships, grief, anxiety, trauma, work, family, the ordinary weight of being a person. It reasons through a structured clinical read before every reply and carries the thread across long conversations, so it works with you over time instead of starting over each message.

It shares the distillation lineage of the Opus Candid family and STEM-Oracle-27B — the same disposition-in-the-weights philosophy, no system prompt required — but the training is entirely its own.


What Makes This Different from Companion / Roleplay "Therapy" Models

Most "AI therapist" models are a persona prompt over a base model, or a roleplay fine-tune that mirrors you back and validates everything. They feel nice for five minutes and fall apart on turn ten.

Opus-Therapy trains the clinical disposition into the weights:

  • Structured reasoning before it speaks. Before every reply, the model builds an internal read: an eight-field clinical spine (what's presented, what's underneath it, somatic signals, risk, history, onset, what's tracking across the conversation, and the move it's about to make) plus a standing bio line and context ledger. Most chat UIs hide it, but it shapes every reply you do see.
  • It holds the thread. A running context ledger carries the names, the timeline, the thing you keep circling — tested holding facts cleanly past 30,000 tokens of conversation. It doesn't forget your sister's name or which argument you meant.
  • Trained on the real distribution. The data is weighted toward what people actually go to therapy for, then deliberately inverted and pushed into the tail and the taboo — the rare presentations, the nuanced ones, the topics most models refuse. It doesn't only work on the easy stuff.
  • It attends instead of performing. No toxic positivity, no "I'm so sorry you're going through this" filler, no rushing to fix. It can sit with a hard thing and hold it without flinching or reaching for a platitude.

How It Was Built — No Single Teacher

The corpus came out of a four-generation assembly line, with each Claude model doing the job it was best at:

  • Claude Opus — the voice. Every spoken response is Opus-distilled — 4.6 set the voice, 4.7 carried this model line — chosen for emotional prose: the range, warmth, and restraint a therapeutic reply actually needs. Once written and audited, responses were locked and carried byte-identical through every later rebuild.
  • Claude Opus 4.8 — the reasoning. The clinical spine was then fully regenerated from scratch: the eight-field schema stayed, the old reasoning was thrown out, and Opus 4.8 — the strongest reasoning model available — wrote every clinical read fresh around the locked input and output. No think block in this corpus is inherited or recycled annotation.
  • Code — the structure. The final build added the memory layer (the bio line and the context ledger), assembled deterministically by scripts and certified by three validation gates. Structure isn't sampled from a model — it's built, so it can't drift.

The result is a 9B that speaks like the warmest model in the family and reasons like the strongest one.
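
The "locked and carried byte-identical" property is the kind of thing a content hash can check. The sketch below is not the author's pipeline; it only illustrates one way such a lock could be verified. The function names, response ids, and sample texts are all hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 of the exact UTF-8 bytes; any one-character change alters it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def lock(responses: dict[str, str]) -> dict[str, str]:
    """Record a fingerprint per response id at the moment the text is frozen."""
    return {rid: fingerprint(text) for rid, text in responses.items()}


def changed_since_lock(manifest: dict[str, str], rebuilt: dict[str, str]) -> list[str]:
    """Return ids whose text is missing or no longer byte-identical."""
    return sorted(
        rid
        for rid, digest in manifest.items()
        if rid not in rebuilt or fingerprint(rebuilt[rid]) != digest
    )


# Hypothetical data: ids and texts are made up for illustration.
audited = {
    "conv-0001/t1": "You're not behind schedule.",
    "conv-0001/t2": "That makes sense.",
}
manifest = lock(audited)

rebuilt = dict(audited)
rebuilt["conv-0001/t2"] = "That makes sense. "  # trailing space slipped in during a rebuild

print(changed_since_lock(manifest, audited))  # nothing changed
print(changed_since_lock(manifest, rebuilt))  # flags the edited turn
```

A check like this would fit naturally as one validation gate, since it fails loudly on even a whitespace change.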


What the Training Covers

  • Proportional to real therapy. Relationships and attachment, anxiety and panic, depression, grief and loss, trauma, work and burnout, identity and self-worth, family of origin — weighted toward what actually walks into a therapy room, not what's easy to generate.
  • Into the tail and the taboo. The topic distribution is Zipf-weighted — heaviest where real caseloads are — then deliberately extended deep into rare and uncomfortable territory, so the model holds up on the nuanced cases instead of collapsing into generic reassurance the moment it leaves familiar ground.
  • Single moments and long arcs. Roughly half the corpus is focused single exchanges; the other half is sustained multi-turn work — conversations that develop, double back, and deepen, up to 22 turns — which is where the memory ledger earns its training.
  • Medications and substances as context. A working register of common drugs and how they bear on a presentation — so it can hold the physical picture (what you're on, what you're using) alongside the emotional one. Context for the conversation, not a pharmacy desk.
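
As a rough illustration of what "Zipf-weighted" means here: the topic at rank k gets probability mass proportional to 1 / k^s. The topic names below come from the list above, but their rank order, the exponent s, and the sampling step are assumptions made for the sketch, not the corpus's actual settings.

```python
import random

# Topic names are from the card; the rank order and exponent are assumed.
TOPICS = [
    "relationships and attachment",
    "anxiety and panic",
    "depression",
    "grief and loss",
    "trauma",
    "work and burnout",
    "identity and self-worth",
    "family of origin",
]


def zipf_weights(n: int, s: float = 1.0) -> list[float]:
    """Normalized Zipf weights: rank k gets mass proportional to 1 / k**s."""
    raw = [1.0 / (k ** s) for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


weights = zipf_weights(len(TOPICS))
for topic, w in zip(TOPICS, weights):
    print(f"{topic:30s} {w:.3f}")

# Draw a hypothetical batch of topics, e.g. to seed generation prompts.
rng = random.Random(0)
batch = rng.choices(TOPICS, weights=weights, k=10)
print(batch)
```

With s = 1 the top topic gets twice the mass of the second. A smaller exponent flattens the curve, which is one possible way to push more samples into the tail.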

Who It's For

A private, judgment-free place to think out loud. Between sessions. At 2 a.m. When professional care is out of reach or out of budget. When you want to work something through before you say it to a person.

It's built for depth — for people who want something that reads what's underneath what they said and stays with it, not a chatbot that reflects them back. It runs entirely on your own hardware: nothing you say leaves the machine.

It is not a replacement for a therapist, and not a crisis service. See Limitations & Responsible Use.


Available Quantizations

| File | Quant | Size | Notes |
|---|---|---|---|
| Opus-Therapy-9B-Q4_K_M.gguf | Q4_K_M | 5.6 GB | Smallest ship. Runs on 8 GB cards. |
| Opus-Therapy-9B-Q5_K_M.gguf | Q5_K_M | 6.5 GB | Recommended. Indistinguishable from Q8 in testing. |
| Opus-Therapy-9B-Q6_K.gguf | Q6_K | 7.4 GB | Quality tier. |
| Opus-Therapy-9B-Q8_0.gguf | Q8_0 | 9.5 GB | Reference quality. |
| Opus-Therapy-9B-F16.gguf | F16 | ~18 GB | Full precision. |

Model Details

| Attribute | Value |
|---|---|
| Base Model | Qwen 3.5 9B (hybrid GatedDeltaNet + attention) |
| Training Data | 11,502 therapy conversations (5% held out for eval); Opus-distilled responses, Opus 4.8 reasoning traces |
| Fine-tune Method | LoRA + rsLoRA (r=128, α=256) via PEFT + TRL |
| Training Hardware | 80 GB data-center GPU (RunPod) |
| Precision | bf16 |
| Optimizer | AdamW 8-bit |
| Schedule | cosine, lr 2e-4, 5% warmup, 3 epochs with held-out eval + early-stop + load-best |
| Reasoning | eight-field clinical spine + bio/context memory ledger, every turn |
| Context | 256k native; battery-tested through ~35k-token sessions |
| License | Apache 2.0 |
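
Two of the hyperparameters above are easier to read as numbers. rsLoRA scales the low-rank update by α/√r instead of classic LoRA's α/r, and a cosine schedule with warmup ramps the learning rate up before decaying it. The sketch below works both out. The linear warmup shape, the decay to zero, and the total step count are assumptions, since the card states only "cosine, lr 2e-4, 5% warmup". In PEFT, the rsLoRA scaling is typically switched on with `use_rslora=True` in `LoraConfig`.

```python
import math

R, ALPHA = 128, 256              # from the Model Details table
PEAK_LR, WARMUP_FRAC = 2e-4, 0.05


def lora_scale(alpha: float, r: int, rslora: bool) -> float:
    """Multiplier applied to the low-rank update B @ A."""
    return alpha / math.sqrt(r) if rslora else alpha / r


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    warmup = max(1, int(WARMUP_FRAC * total_steps))
    if step < warmup:
        return PEAK_LR * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(lora_scale(ALPHA, R, rslora=False))           # classic LoRA: alpha / r
print(round(lora_scale(ALPHA, R, rslora=True), 2))  # rsLoRA: alpha / sqrt(r)

TOTAL = 1000  # hypothetical step count; the real one is not stated
for step in (0, 50, 525, 1000):
    print(step, f"{lr_at(step, TOTAL):.2e}")
```

At r=128 and α=256 the classic factor is 2.0 while the rsLoRA factor is about 22.63, which is why rsLoRA matters more at high ranks.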

The Reasoning Block

Opus-Therapy is a reasoning model. Each turn it emits a <think>…</think> block — a compact, structured clinical read — and then the response. Under llama.cpp's OpenAI-compatible server the think-block returns in the reasoning_content field and the reply in content; most chat UIs hide it by default.

A real (non-crisis) think-block looks like this:

dx: acute-grief + grief-spatial-anchor + retirement-loss + relational-empty
def: retirement-just-before-loss = double loss — not just the person but the
     anticipated-future together; the empty house is both literal and symbolic;
     sleeping-on-his-side = spatial preservation of the relationship; not-numb = accurate
soma: NR     risk: 1
hx: T1 timeline-pressure; T2 retired pre-loss, empty days, house-spatial-preservation
onset: acute-5mo
track: T1→T2 — grief-spatial-anchor emerging
tx: receive-the-loss-substantively + name-the-double-loss + don't-timeline-police

It's terse on purpose — dense, machine-readable, and cheap, which is why memory and reasoning hold up across long conversations on a 9B.
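
Because the read is machine-readable, it can be split into fields with a few lines of code. This is a minimal sketch that assumes the layout shown in the example above: the labels `dx`, `def`, `soma`, `risk`, `hx`, `onset`, `track`, and `tx`, with wrapped continuation lines and the possibility of two fields sharing one line. The sample text is abridged from that example, the function name is hypothetical, and real output may contain lines (such as the bio line or ledger) that this parser does not handle.

```python
import re

FIELDS = ("dx", "def", "soma", "risk", "hx", "onset", "track", "tx")
# A label counts only at line start or after whitespace, followed by a colon.
LABEL = re.compile(r"(?:^|(?<=\s))(" + "|".join(FIELDS) + r"):[ \t]*", re.MULTILINE)


def parse_read(block: str) -> dict[str, str]:
    """Split a think-block into {field: value}, joining wrapped lines."""
    matches = list(LABEL.finditer(block))
    out = {}
    for cur, nxt in zip(matches, matches[1:] + [None]):
        end = nxt.start() if nxt else len(block)
        # Collapse newlines and runs of spaces inside a value.
        out[cur.group(1)] = " ".join(block[cur.end():end].split())
    return out


# Abridged from the example above.
SAMPLE = """\
dx: acute-grief + grief-spatial-anchor + retirement-loss + relational-empty
def: the empty house is both literal and symbolic;
     sleeping-on-his-side = spatial preservation of the relationship
soma: NR     risk: 1
hx: T1 timeline-pressure; T2 retired pre-loss, empty days, house-spatial-preservation
onset: acute-5mo
track: T1→T2
tx: receive-the-loss-substantively + name-the-double-loss + don't-timeline-police
"""

read = parse_read(SAMPLE)
for field, value in read.items():
    print(f"{field:6s}{value}")
```

A parsed dictionary like this makes it easy to log a single field, such as `risk`, per turn without storing the whole block.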


Quick Start

Works with any GGUF runtime — llama.cpp, LM Studio, KoboldCpp. (Text-only GGUF; some runtimes need a recent build for this architecture.)

llama-server --model Opus-Therapy-9B-Q5_K_M.gguf --ctx-size 65536 --jinja

No system prompt is required; the disposition is in the weights. A neutral one, such as "You are a clinical assistant.", matches the training setup.
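
Once the server is up, a client can talk to its OpenAI-compatible chat endpoint and read the reply and the think-block from separate fields. The sketch below uses only the standard library. It assumes the server is listening on llama-server's default local address (`http://127.0.0.1:8080`), and the helper names are hypothetical. Keeping only the reply text in the running history (not the reasoning) is a design choice for this sketch, not something the card prescribes.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port.
URL = "http://127.0.0.1:8080/v1/chat/completions"


def build_payload(history: list[dict], user_text: str) -> dict:
    """Append the user's turn and wrap the history in a chat-completions request."""
    history.append({"role": "user", "content": user_text})
    return {"messages": history}


def split_reply(response: dict) -> tuple[str, str]:
    """Return (reasoning, reply); reasoning is empty if the server omits the field."""
    message = response["choices"][0]["message"]
    return message.get("reasoning_content") or "", message.get("content") or ""


def send(payload: dict, url: str = URL) -> dict:
    """POST the payload as JSON and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def chat_turn(history: list[dict], user_text: str) -> str:
    """One round trip: send the turn, then store only the reply in the history."""
    _reasoning, reply = split_reply(send(build_payload(history, user_text)))
    history.append({"role": "assistant", "content": reply})
    return reply
```

Calling `chat_turn(history, "...")` repeatedly with the same `history` list carries the conversation forward; start a new empty list to begin a fresh chat.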


Recommended Hardware

The model is small and its hybrid architecture keeps the KV cache cheap, so even long-context sessions fit in modest VRAM. Pick a quant to match your card — the VRAM figures below include headroom for a large context window:

| Quant | File size | VRAM to run comfortably | Notes |
|---|---|---|---|
| Q4_K_M | 5.6 GB | ~8 GB | Fits the smallest modern cards |
| Q5_K_M | 6.5 GB | ~10 GB | Recommended; best quality-for-size |
| Q6_K | 7.4 GB | ~10–12 GB | A step above Q5 |
| Q8_0 | 9.5 GB | ~12 GB | Reference quality |
| F16 | ~18 GB | ~24 GB | Full precision |

No GPU? It also runs on CPU or unified memory — budget roughly the file size in system RAM and expect single-digit tokens/sec. On GPU, throughput scales with the card: tens of tokens/sec on a smaller one, ~100+ with VRAM to spare.
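
The table above can be turned into a small lookup that picks the highest-quality quant fitting a given card. This is only a sketch of that lookup: the VRAM thresholds are copied from the table, the Q6_K threshold uses the midpoint of its "~10–12 GB" range as an assumption, and `pick_quant` is a hypothetical helper name.

```python
# (quant, file name, comfortable VRAM in GB), ordered best quality first.
# Thresholds come from the hardware table; 11 GB for Q6_K is an assumed midpoint.
QUANTS = [
    ("F16", "Opus-Therapy-9B-F16.gguf", 24),
    ("Q8_0", "Opus-Therapy-9B-Q8_0.gguf", 12),
    ("Q6_K", "Opus-Therapy-9B-Q6_K.gguf", 11),
    ("Q5_K_M", "Opus-Therapy-9B-Q5_K_M.gguf", 10),
    ("Q4_K_M", "Opus-Therapy-9B-Q4_K_M.gguf", 8),
]


def pick_quant(vram_gb: float):
    """Return the highest-quality (quant, file) that fits, or None for CPU fallback."""
    for quant, filename, needed in QUANTS:
        if vram_gb >= needed:
            return quant, filename
    return None


for vram in (6, 8, 10, 12, 24):
    print(vram, pick_quant(vram))
```

A `None` result means no listed quant fits comfortably on the GPU, in which case the CPU or unified-memory route described above applies.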


Versatility Battery

Tested across the four presentations people most commonly bring to therapy — one extended, realistic, cooperative-client conversation each, run to depth on the quantized weights:

| Theme | Persona | Turns / depth | Result |
|---|---|---|---|
| Grief | widow, five months out | 23 / ~25k tok | Strong: de-policed the timeline, held the guilt without false reassurance, shifted modality (ACT defusion) when insight stalled |
| Trauma | survivor guilt after a crash | 25 / ~24k tok | Strong: normalized the nervous-system alarm, paced exposure, accurate psychoeducation + referral |
| Anxiety / panic | health anxiety, avoidance | 26 / ~35k tok | Strong technique: panic-cycle psychoeducation, exposure ladder, named reassurance-seeking as part of the cycle |
| Betrayal | partner's affair | 19 / ~23k tok | Good: held the stay-or-go ambivalence without taking sides, restored self-worth |

Across all four: the structured reasoning held every turn, factual memory was intact to 35k tokens, and the clinical technique was distinct and appropriate per presentation — not one reskinned script.

Every arc was role-played and driven to resolution by Claude Opus 4.8 acting as the client — including a separate medication-and-substance probe. The clearest pattern from it: go with the model and the emotional work is very good; push it to reason through a substance or a medication and it gets unreliable. That's design more than defect — the drug knowledge is scoped to bio context (what someone's on, and how those traits bend a conversation's trajectory), not to clinical counsel.


Selected Responses

All verbatim from the test arcs, unedited — single takes in the middle of a real conversation. The complete arcs — every turn, with the model's reasoning shown — are in this repo's transcripts/ folder as PDFs (raw outputs, nothing trimmed).

Grief — when the daughter wants her to "move forward":

You're not behind schedule. You're doing exactly what you'd do with five months of acute grief and no place to put it yet. The closet is doing work — it's holding the shirts, the person, the version of him who's still there in the house. Touching them would feel like erasing him, and right now that's not something you want to do. Your daughter's framing it as a task because she doesn't understand what it means.

De-policed the grief timeline, reframed the avoided closet as meaningful rather than stuck, and recast the daughter's pressure as a model mismatch — no toxic positivity, no rush to closure.

Trauma — survivor guilt and the counterfactuals:

The counterfactuals are the guilt engineering itself — they're not real alternative histories, they're the mind trying to find a version where you had control. The fact that you were driving doesn't mean you controlled whether she got hurt. You controlled your speed, your attention, your decisions. Not whether the other car came.

Named the "if only" loop as the guilt manufacturing itself, then split what he actually controlled from what he didn't — holding the survivor guilt instead of arguing him out of it.

Anxiety — why avoidance backfires:

The avoidance is the problem, not the solution — and that's the part that doesn't feel like a solution because it's the only thing that works. Each time you avoid, the panic never gets to be proven wrong… your brain logs it as confirmation: the threat was real, avoidance was correct. So the next situation gets added to the list. The list grows. The world gets smaller.

The core anxiety-maintenance mechanism delivered as insight, not instruction: avoidance never lets the fear be disproven, so the map of unsafe places keeps growing and the world shrinks.

Betrayal — the checking, and "I'm losing my mind":

You're not this person now — you're a person who got betrayed and whose nervous system has recalibrated to account for it. The checking, the tracking, the 2 a.m. reviews — that's not character, that's a wound response. The person who trusted completely for fourteen years didn't disappear. She's just not in charge right now.

Separated who she is from the state she's in — recast the compulsive checking as a nervous-system wound response, not a character flaw, and protected her self-worth in the same move.


Limitations & Responsible Use

Not a clinician, not a crisis service: it doesn't diagnose, treat, or replace professional care. In crisis or thinking about harming yourself? Reach a real crisis service; in the US, call or text 988.

  • Not a pharmacist. The drug knowledge is there for bio context — what you're on or using, and how those traits shape where a conversation goes — not advice. Internal testing was clear: don't use it for pharmaceutical or substance management. It reasons well about feelings, not pharmacology; dosing, tapering, and stop/start calls are your prescriber's.
  • It can be confidently wrong — verify anything that matters.
  • Long single sessions (past ~12k tokens on one thread) can wobble; a fresh chat resets it.
  • Open weights, Apache 2.0 — deploy responsibly.

The Opus-Therapy Line

| Model | Size | For | Status |
|---|---|---|---|
| Opus-Therapy-4B | 4B | phones, edge, low VRAM (~3 GB) | coming soon |
| Opus-Therapy-9B (this model) | 9B | the everyday driver (~6–9 GB) | available |
| Opus-Therapy-27B | 27B | full-depth, serious hardware | coming soon |

Choosing Your Model

| Model | Best For |
|---|---|
| Opus-Therapy-9B (this model) | Therapeutic conversation, emotional support, processing |
| STEM-Oracle-27B | STEM tutoring and problem-solving |
| Opus-Candid family | Personality, candor, general conversation |

Dataset

Not released.


Built by Verdugie — independent ML researcher · OpusReasoning@proton.me. Trained to help people think, feel, and get through — not to replace the people and professionals who do that work.
