Great Model!
It's definitely an improvement in natural English writing, with better rendering of some lines compared to both the base model and the finetune this is based on. I tested many 26B-35B models, and the only model that beats it in this department is Gemma-4-Garnet-31B-it, and even that isn't an all-out win: Omega-Evolution-27B-v2.1 sometimes manages the best rendering of certain lines out of all the candidates. Any plans for an Omega-Evolution-27B-v2.3, an Omega-Evolution-27B-v3.0, or something else like an Omega-Evolution-31B?
Nice job!
Thanks for the feedback!
I'm done with the Qwen 3.5 series, but may do Qwen 3.6. I'm considering Gemma 4 31B as well. I want to do it, but I've seen how much of a struggle it is for other finetuners to do Gemma 4 and get it right.
The 26B-A4B would be especially interesting because it's many times faster, which is critical on low-end hardware. But from what I've heard, it's even harder to finetune.
I've been using Gemma 4 26B-A4B for a few weeks (and the ARA version for a week or so, which is my current default). It's really fast and pretty darn good (or maybe that's Kobold's internal optimization for which layers to put on the GPU). That makes me hesitant to use a non-MoE model, even though I've used 70B models before (usually Q2/Q3 of some sort).
While Kobold doesn't quite seem to work with Qwen 3.6 yet (the toolset probably needs updating), I think it works with 3/3.5, so I'll download this one and give it a try.
This model feels more lax in its output (characters come across as kind of lackadaisical). It felt like it struggled with multiple characters, demanded thinking, and used words in ways that changed the contextual meaning. When not strictly RPing, it does have really natural language output.
Some of this might be the result of quantization mixed with an uncensor patch, so take my comments with a grain of salt.
Regardless, this one isn't one I'd put at the top of my list for RPing. Not bad, it just didn't do as well as I expected.
Been testing Qwen 3.6 27B @ Q8, and it writes better than Gemma 4 31B. But it thinks for 2.5k-5k tokens every reply... I'm not even kidding.
On the other hand, I got half a million tokens of context on llama.cpp with VRAM to spare (96GB total). I could probably push 750k or higher.
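A rough back-of-the-envelope for why half a million tokens of context can fit: KV-cache memory grows linearly with context length. The layer/head numbers below are made up for illustration, not Qwen 3.6 27B's actual config:

```python
# Back-of-the-envelope KV-cache size for a long context window.
# The architecture numbers (layers, KV heads, head dim) are assumptions
# for illustration only, not Qwen 3.6 27B's published specs.
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    # 2x accounts for storing both keys and values at every layer
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

ctx = 512_000
fp16 = kv_cache_bytes(48, 8, 128, ctx) / 2**30                   # 16-bit cache
q8 = kv_cache_bytes(48, 8, 128, ctx, bytes_per_elem=1) / 2**30   # 8-bit cache
print(f"fp16: {fp16:.1f} GiB, q8: {q8:.1f} GiB")
```

With an 8-bit quantized KV cache the figure roughly halves, which is one way a half-million-token context plus the Q8 weights could squeeze into ~96GB of VRAM.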
So what I'm trying to say is I'm definitely considering tuning this model.
> Been testing Qwen 3.6 27B @ Q8, it writes better than Gemma 4 31B. But it thinks for 2.5k-5k tokens every reply... I'm not even kidding.
For an agent, that might be important. But for non-agents...
Recently KoboldCPP added an option to specify how much quota can be used for thinking, including 'none'. With 'none' you still get a thinking block, but it just contains '(exceeded quota for thinking)' and closes, taking about 10 tokens. Setting a smaller thinking quota often results in a short 'user wants X, I should write like Y' that then quits out before any real thinking is done, so it's ultimately not very useful.
Alas, when I try Qwen 3.6 I only get 'kobold' as a reply. Maybe the chat template needs updating, or the toolkit does; not sure yet.
If replies didn't take so damn long, I wouldn't mind thinking models as much. It makes me lean more towards MoE models.