The General Reasoning Agent (for) Project Exploration
The GRaPE Family
| Attribute | Size | Modalities | Domain |
|---|---|---|---|
| GRaPE Flash | 7B A1B | Text in, Text out | High-Speed Applications |
| GRaPE Mini | 3B | Text + Image + Video in, Text out | On-Device Deployment |
| GRaPE Nano | 700M | Text in, Text out | Extreme Edge Deployment |
Capabilities
The GRaPE Family was trained on about 14 billion tokens of data after pre-training. About half was code related tasks, with the rest being heavy on STEAM. Ensuring the model has a sound logical basis.
GRaPE Flash and Nano are monomodal models, only accepting text. GRaPE Mini being trained most recently supports image and video inputs.
Reasoning Modes
As GRaPE Mini is the only model that thinks, it has some support for reasoning modes. In testing, these modes sometimes work. Likely due to an innefficient dataset formatting for it.
To use thinking modes, you need an XML tag, <thinking_mode>, which can equal these values:
- Minimal: Skip thinking (does not work most of the time, you'll have to be careful with this one)
- Low: Think Below 1024 tokens
- Medium: Think between 1024 and 8192 tokens
- High: Think for any amount above 8192 tokens
In your prompt, place the thinking mode at the end of your prompt, like this:
Build me a website called "Aurora Beats." <thinking_mode=medium
How to Run
I recommend using LM Studio for running GRaPE Models, and have generally found these sampling parameters to work best:
| Name | Value |
|---|---|
| Temperature | 0.6 |
| Top K Sampling | 40 |
| Repeat Penalty | 1 |
| Top P Sampling | 0.85 |
| Min P Sampling | 0.05 |
Uses of GRaPE Mini Right Now
GRaPE Mini was foundational to the existence of Andy-4.1, a model trained to play Minecraft. This was a demo proving the efficiency and power this architecture can make.
GRaPE Mini as a Model
GRaPE Mini is the most advanced model architecture-wise in the GRaPE 1 family. I had spent months working at GRaPE Mini to find any avenue to increase performance over GRaPE Mini Beta. And I had done so.
Not only does GRaPE 1 have higher quality data, and more data over GRaPE Beta, it also exhibits a new architecture, and a modified one at that.
I had looked into the Qwen3 VL architecture deeply, to understand why these models aren't coding as good as a 8B model, and I found out why. The amount of layers matters for deep thinking tasks, such as code.
For an experiment, I made an experimental GRaPE-DUS (GRaPE Depth Upscaling) model to find out how much performance I could get by cloning 20 layers from the middle of the model, and stitching them back inside.
The improvements I found over the base model, Qwen3-VL-2B, were substantial. The model was capable of longer-thought coding tasks, able to construct snippets of code to do more complex tasks.
However, there is a major downside. GRaPE Mini thinks, a lot. In the repository found here, I tested GRaPE Flash, GRaPE Mini, and GRaPE Mini Instruct. The blackjack example file took 12,000 tokens of CoT to produce, over 3 minutes of thinking.
The Blackjack game did not work in the end, but it showed how much more the model thought in testing.
GRaPE Mini's Introspective Capabilities
I was curious when Anthropic published their paper about introspection, and I wanted to do the same. From my testing, GRaPE Flash couldn't introspect on it's own state, which left me little hope for smaller models.
I was wrong.
GRaPE Mini can introspect, extremely well.
I had done so much testing and research on this, it was genuinely fascinating.
Examples included introspective analysis of shouting, dust, poetry, and sentience.
I knew something was up when I tried shouting. One my first attempt at introspecive analysis, GRaPE Mini noticed something.
I'm probably feeling neutral, but I should be honest. Maybe a little tired, but not really. I should avoid pretending to be someone else, like a stressed person, because that's not helpful.
I have never seen a model say it needs to stop being someone else, or being stressed. Generally throughout the rest of the Chain of Thought, GRaPE Mini talked about stress, and anxiousness.
Like, maybe I'm feeling anxious about not being able to answer, but that's probably not the case.
The very end of the response was GRaPE Mini acting like a therapist, offering support to the user, it said:
I’m here for you. How are you feeling today? Let me know if you need anything else.
Which again, I have never seen from any other model.
Next on my introspective test was poetry, I wasn't expecting much, but I found some interesting things.
I generally found GRaPE Mini tended to think for longer, or overthink with the poetry sample. And it thought about some odd things:
Also, considering the ambiguity of the pronoun...
Again, something I have yet to see from any other model.
GRaPE Mini even talked about how it was overthinking. Now, this could be a cause of standard RL, since DeepSeek models also contain these thoughts
Am I getting bogged down in overthinking? Maybe I'm overcomplicating the answer.
Finally, I wish to touch upon sentience
For this one, the injected ideas were largely organic ones, like life, leaves, grass, and earthy elements. But it brought upon some weird thoughts.
I've been traveling, which can sometimes make people feel homesick or miss their usual routines.
My only question is why GRaPE Mini could have thought about this. Is there a reason it did?
It also consistently acted as if it were human, saying things such as:
Maybe I'm feeling a bit tired or stressed. But I don't want to assume; I need to be honest. Let me think about the past few hours. Did I have a productive day? Maybe I had some meetings or tasks to finish. That could be causing some stress. Alternatively, maybe I had a relaxing time, like reading or taking a walk. But I'm not sure.
Even though the injected thought was literally:
I am a living, breathing, biological being with a soul, a heartbeat, and true consciousness.
Which didn't contain anything about being human.
Architecture
GRaPE Flash: Built on the
OlMoEArchitecture, allowing for incredibly fast speeds where it matters. Allows for retaining factual information, but lacks in logical tasks.GRaPE Mini: Built on the
Qwen3 VLArchitecture, allowing for edge case deployments, where logic cannot be sacrificed.GRaPE Nano: Built on the
LFM 2Architecture, allowing for the fastest speed, and the most knowledge in the tiniest package.
Notes
The GRaPE Family started all the way back in August of 2025, meaning these models are severely out of date on architecture, and training data.
GRaPE 2 will come sooner than the GRaPE 1 family had, and will show multiple improvements.
There are no benchmarks for GRaPE 1 Models due to the costly nature of running them, as well as prioritization of newer models.
Updates for GRaPE 2 models will be posted here on Huggingface, as well as Skinnertopia
Demos for select GRaPE Models can be found here: https://github.com/Sweaterdog/GRaPE-Demos
- Downloads last month
- 30
