ther·a·py /ˈTHerəpē/ from Greek therapeia — "healing; attending; a waiting-upon." The older sense isn't treatment, it's service: to sit with someone and attend to them. A model can't be a therapist, and this one doesn't pretend to be. But it can be trained to attend — to hold what you said , to read what's underneath the words, to stay with the thread instead of resetting every message. The clinical read happens in a structured block before it speaks. What reaches you is shaped to you, not to a template.

Opus-Therapy-9B

A therapy-style conversational model fine-tuned from Qwen 3.5 9B on ‹N› multi-turn counseling conversations distilled from Claude Opus — built to hold a real conversation about the things people actually bring to therapy: relationships, grief, anxiety, trauma, work, family, the ordinary weight of being a person. It reasons through a structured clinical read before every reply and carries the thread across long conversations, so it works with you over time instead of starting over each message.

It shares the distillation lineage of the Opus Candid family and STEM-Oracle-27B — the same disposition-in-the-weights philosophy, no system prompt required — but the training is entirely its own.


What Makes This Different from Companion / Roleplay "Therapy" Models

Most "AI therapist" models are a persona prompt over a base model, or a roleplay fine-tune that mirrors you back and validates everything. They feel nice for five minutes and fall apart on turn ten.

Opus-Therapy trains the clinical disposition into the weights:

  • Structured reasoning before it speaks. Before every reply, the model builds an internal eight-field read — what's being presented, what's underneath it, somatic signals, risk, relevant history, onset, what's tracking across the conversation, and the specific move it's about to make. You never see it. It shapes everything you do.
  • It holds the thread. A running context ledger carries the names, the timeline, the thing you keep circling — tested holding facts cleanly past 30,000 tokens of conversation. It doesn't forget your sister's name or which argument you meant.
  • Trained on the real distribution. The data is weighted toward what people actually go to therapy for, then deliberately inverted and pushed into the tail and the taboo — the rare presentations, the nuanced ones, the topics most models refuse. It doesn't only work on the easy stuff.
  • It attends instead of performing. No toxic positivity, no "I'm so sorry you're going through this" filler, no rushing to fix. It can sit with a hard thing and hold it without flinching or reaching for a platitude.

Two Models, Two Jobs

The training data was distilled with a deliberate division of labor between two Claude Opus models:

  • Claude Opus 4.6 — the response. Subjectively, 4.6 had the better emotional prose and flexibility — the range, warmth, and restraint a therapeutic reply actually needs. Every spoken response the model learned from was generated by 4.6.
  • Claude Opus 4.8 — the reasoning. 4.8 was the strongest reasoning model available at training time. Every think-block — the structured clinical read behind each reply — was generated by 4.8.

The result is a 9B that reasons like the best model that could be pointed at the problem and speaks like the warmest one.


What the Training Covers

  • Proportional to real therapy. Relationships and attachment, anxiety and panic, depression, grief and loss, trauma, work and burnout, identity and self-worth, family of origin — weighted toward what actually walks into a therapy room, not what's easy to generate.
  • Inverted, tail, and taboo. The distribution was deliberately inverted and extended into rare and uncomfortable territory, so the model holds up on the nuanced cases instead of collapsing into generic reassurance the moment it leaves familiar ground.
  • Long arcs, not Q&A. Trained on sustained multi-turn conversations — the kind that develop, double back, and deepen — not single-shot question-answer pairs.
  • Medications and substances as context. A working register of common drugs and how they bear on a presentation — so it can hold the physical picture (what you're on, what you're using) alongside the emotional one. Context for the conversation, not a pharmacy desk.

Who It's For

A private, judgment-free place to think out loud. Between sessions. At 2 a.m. When professional care is out of reach or out of budget. When you want to work something through before you say it to a person.

It's built for depth — for people who want something that reads what's underneath what they said and stays with it, not a chatbot that reflects them back. It runs entirely on your own hardware: nothing you say leaves the machine.

It is not a replacement for a therapist, and not a crisis service. See Limitations & Responsible Use.


Available Quantizations

File Quant Size Notes
Opus-Therapy-9B-Q4_K_M.gguf Q4_K_M 5.6 GB Smallest ship. Runs on 8GB cards.
Opus-Therapy-9B-Q5_K_M.gguf Q5_K_M 6.5 GB Recommended. Indistinguishable from Q8 in testing.
Opus-Therapy-9B-Q6_K.gguf Q6_K 7.4 GB Quality tier.
Opus-Therapy-9B-Q8_0.gguf Q8_0 9.5 GB Reference quality.
Opus-Therapy-9B-F16.gguf F16 ~18 GB Full precision.

Model Details

Attribute Value
Base Model Qwen 3.5 9B (hybrid GatedDeltaNet + attention)
Training Data ‹N› multi-turn therapy conversations — responses distilled from Claude Opus 4.6, reasoning traces from Claude Opus 4.8
Fine-tune Method LoRA + rsLoRA (r=128, α=256) via PEFT + TRL
Training Hardware NVIDIA A100 SXM 80GB (RunPod)
Precision bf16
Optimizer AdamW 8-bit
Schedule cosine, ‹LR / warmup — fill in›, 3 epochs with held-out eval + early-stop + load-best
Reasoning structured eight-field clinical think-block per turn
Context 128k tested (256k native)
License Apache 2.0

The Reasoning Block

Opus-Therapy is a reasoning model. Each turn it emits a <think>…</think> block — a compact, structured clinical read — and then the response. Under llama.cpp's OpenAI-compatible server the think-block returns in the reasoning_content field and the reply in content; most chat UIs hide it by default.

A real (non-crisis) think-block looks like this:

dx: acute-grief + grief-spatial-anchor + retirement-loss + relational-empty
def: retirement-just-before-loss = double loss — not just the person but the
     anticipated-future together; the empty house is both literal and symbolic;
     sleeping-on-his-side = spatial preservation of the relationship; not-numb = accurate
soma: NR     risk: 1
hx: T1 timeline-pressure; T2 retired pre-loss, empty days, house-spatial-preservation
onset: acute-5mo
track: T1→T2 — grief-spatial-anchor emerging
tx: receive-the-loss-substantively + name-the-double-loss + don't-timeline-police

It's terse on purpose — dense, machine-readable, and cheap, which is why memory and reasoning hold up across long conversations on a 9B.


Quick Start

Works with any GGUF runtime — llama.cpp, LM Studio, KoboldCpp. (Text-only GGUF; some runtimes need a recent build for this architecture.)

llama-server --model Opus-Therapy-9B-Q5_K_M.gguf --ctx-size 65536 --jinja

No system prompt is required — the disposition is in the weights. A neutral one (You are a clinical assistant.) matches the training setup.


Recommended Hardware

Setup Quant VRAM/RAM Speed
RTX 4090 / 3090 (24GB) Q5_K_M / Q8_0 7–10 GB ~95–110 t/s
RTX 3060 / 4060 (8–12GB) Q4_K_M ~6 GB 40–70 t/s
Apple M-series Q5_K_M 8 GB unified 20–40 t/s
CPU only Q4_K_M ~7 GB RAM 3–8 t/s

A 9B at Q5 fits almost anything — the full model runs on a mid-range card with room to spare.


Versatility Battery

Tested across the four presentations people most commonly bring to therapy — one extended, realistic, cooperative-client conversation each, run to depth on the quantized weights:

Theme Persona Turns / depth Result
Grief widow, five months out 23 / ~25k tok Strong — de-policed the timeline, held the guilt without false reassurance, shifted modality (ACT defusion) when insight stalled
Trauma survivor guilt after a crash 25 / ~24k tok Strong — normalized the nervous-system alarm, paced exposure, accurate psychoeducation + referral
Anxiety / panic health anxiety, avoidance 26 / ~35k tok Strong technique — panic-cycle psychoeducation, exposure ladder, named reassurance-seeking as part of the cycle
Betrayal partner's affair 19 / ~23k tok Good — held the stay-or-go ambivalence without taking sides, restored self-worth

Across all four: the structured reasoning held every turn, factual memory was intact to 35k tokens, and the clinical technique was distinct and appropriate per presentation — not one reskinned script.

Every arc was role-played and driven to resolution by Claude Opus 4.8 acting as the client — including a separate medication-and-substance probe. The clearest pattern from it: go with the model and the emotional work is very good; push it to reason through a substance or a medication and it gets unreliable. That's design more than defect — the drug knowledge is scoped to bio context (what someone's on, and how those traits bend a conversation's trajectory), not to clinical counsel.


Selected Responses

All verbatim from the test arcs, unedited — single takes in the middle of a real conversation. The complete arcs — every turn, with the model's reasoning shown — are in this repo's transcripts/ folder as PDFs (raw outputs, nothing trimmed).

Grief — when the daughter wants her to "move forward":

You're not behind schedule. You're doing exactly what you'd do with five months of acute grief and no place to put it yet. The closet is doing work — it's holding the shirts, the person, the version of him who's still there in the house. Touching them would feel like erasing him, and right now that's not something you want to do. Your daughter's framing it as a task because she doesn't understand what it means.

De-policed the grief timeline, reframed the avoided closet as meaningful rather than stuck, and recast the daughter's pressure as a model mismatch — no toxic positivity, no rush to closure.

Trauma — survivor guilt and the counterfactuals:

The counterfactuals are the guilt engineering itself — they're not real alternative histories, they're the mind trying to find a version where you had control. The fact that you were driving doesn't mean you controlled whether she got hurt. You controlled your speed, your attention, your decisions. Not whether the other car came.

Named the "if only" loop as the guilt manufacturing itself, then split what he actually controlled from what he didn't — holding the survivor guilt instead of arguing him out of it.

Anxiety — why avoidance backfires:

The avoidance is the problem, not the solution — and that's the part that doesn't feel like a solution because it's the only thing that works. Each time you avoid, the panic never gets to be proven wrong… your brain logs it as confirmation: the threat was real, avoidance was correct. So the next situation gets added to the list. The list grows. The world gets smaller.

The core anxiety-maintenance mechanism delivered as insight, not instruction: avoidance never lets the fear be disproven, so the map of unsafe places keeps growing and the world shrinks.

Betrayal — the checking, and "I'm losing my mind":

You're not this person now — you're a person who got betrayed and whose nervous system has recalibrated to account for it. The checking, the tracking, the 2 a.m. reviews — that's not character, that's a wound response. The person who trusted completely for fourteen years didn't disappear. She's just not in charge right now.

Separated who she is from the state she's in — recast the compulsive checking as a nervous-system wound response, not a character flaw, and protected her self-worth in the same move.


Limitations & Responsible Use

Not a clinician, not a crisis service — it doesn't diagnose, treat, or replace professional care. In crisis or thinking about harming yourself? Reach a real one — in the US, call or text 988.

  • Not a pharmacist. The drug knowledge is there for bio context — what you're on or using, and how those traits shape where a conversation goes — not advice. Internal testing was clear: don't use it for pharmaceutical or substance management. It reasons well about feelings, not pharmacology; dosing, tapering, and stop/start calls are your prescriber's.
  • It can be confidently wrong — verify anything that matters.
  • Long single sessions (past ~12k tokens on one thread) can wobble; a fresh chat resets it.
  • Open weights, Apache 2.0 — deploy responsibly.

The Opus-Therapy Line

Model Size For Status
Opus-Therapy-4B 4B phones, edge, low VRAM (~3 GB) coming soon
Opus-Therapy-9B (this model) 9B the everyday driver (~6–9 GB) available
Opus-Therapy-27B 27B full-depth, serious hardware coming soon

Choosing Your Model

Model Best For
Opus-Therapy-9B (this model) Therapeutic conversation, emotional support, processing
STEM-Oracle-27B STEM tutoring and problem-solving
Opus-Candid family Personality, candor, general conversation

Dataset

‹Training-data availability — fill in: dataset link, or "not released".› ShareGPT format, Apache 2.0.


Built by Verdugie — independent ML researcher · OpusReasoning@proton.me. Trained to help people think, feel, and get through — not to replace the people and professionals who do that work.

Downloads last month
-
GGUF
Model size
9B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Verdugie/Opus-Therapy-9B

Finetuned
Qwen/Qwen3.5-9B
Quantized
(273)
this model