Libraries

How to use SL-AI/GRaPE-Mini with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

# The messages below include an image, so use the multimodal pipeline task
pipe = pipeline("image-text-to-text", model="SL-AI/GRaPE-Mini")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("SL-AI/GRaPE-Mini")
model = AutoModelForMultimodalLM.from_pretrained("SL-AI/GRaPE-Mini")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use SL-AI/GRaPE-Mini with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SL-AI/GRaPE-Mini"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SL-AI/GRaPE-Mini",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
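
Because GRaPE Mini also accepts images, the same endpoint can presumably take an image alongside the text. The sketch below uses only the Python standard library. It assumes the server started above is listening on localhost:8000 and accepts OpenAI-style `image_url` content parts. `build_image_request` and `send_chat` are illustrative helper names, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "SL-AI/GRaPE-Mini"
IMAGE_URL = (
    "https://huggingface.co/datasets/huggingface/documentation-images/"
    "resolve/main/p-blog/candy.JPG"
)


def build_image_request(question, image_url, model=MODEL_ID):
    """Build an OpenAI-style chat payload with one image part and one text part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }


def send_chat(payload, base_url="http://localhost:8000"):
    """POST the payload to the chat completions route and return the parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    payload = build_image_request("What animal is on the candy?", IMAGE_URL)
    print(json.dumps(payload, indent=2))
    # Uncomment once the server is running (assumed to be on port 8000):
    # reply = send_chat(payload)
    # print(reply["choices"][0]["message"]["content"])
```

For the SGLang server described further down, the same helpers should work with `base_url="http://localhost:30000"`, assuming it handles image parts the same way.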

Use Docker

docker model run hf.co/SL-AI/GRaPE-Mini

SGLang

How to use SL-AI/GRaPE-Mini with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "SL-AI/GRaPE-Mini" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SL-AI/GRaPE-Mini",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "SL-AI/GRaPE-Mini" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SL-AI/GRaPE-Mini",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use SL-AI/GRaPE-Mini with Docker Model Runner:
```
docker model run hf.co/SL-AI/GRaPE-Mini
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

The General Reasoning Agent (for) Project Exploration

The GRaPE Family

Model	Size	Modalities	Domain
GRaPE Flash	7B A1B	Text in, Text out	High-Speed Applications
GRaPE Mini	3B	Text + Image + Video in, Text out	On-Device Deployment
GRaPE Nano	700M	Text in, Text out	Extreme Edge Deployment

Capabilities

The GRaPE Family was trained on about 14 billion tokens of data after pre-training. About half of that data was code-related tasks, and the rest leaned heavily on STEAM, ensuring the models have a sound logical basis.

GRaPE Flash and Nano are monomodal models that accept only text. GRaPE Mini, the most recently trained, also supports image and video inputs.

Reasoning Modes

As GRaPE Mini is the only model that thinks, it has some support for reasoning modes. In testing, these modes work only some of the time, likely due to inefficient dataset formatting for them.

To use thinking modes, you need an XML tag, <thinking_mode>, which can take one of these values:

Minimal: Skip thinking (does not work most of the time, so be careful with this one)
Low: Think for fewer than 1024 tokens
Medium: Think for between 1024 and 8192 tokens
High: Think for any amount above 8192 tokens

Place the thinking mode tag at the end of your prompt, like this:

Build me a website called "Aurora Beats." <thinking_mode=medium>
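
A minimal sketch of how the tag could be attached in code, and how a response could be checked against the mode that was requested. The helper names are hypothetical. The sketch assumes the tag is written in lowercase and closed with `>`, and it takes the token ranges from the list above, treating exactly 1024 and 8192 as Medium.

```python
VALID_MODES = ("minimal", "low", "medium", "high")


def with_thinking_mode(prompt, mode):
    """Append the thinking mode tag to the end of a prompt."""
    mode = mode.lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown thinking mode: {mode!r}")
    # Assumption: the tag is closed with '>' and separated by one space.
    return f"{prompt.rstrip()} <thinking_mode={mode}>"


def mode_for_token_count(thinking_tokens):
    """Map an observed number of thinking tokens to the mode it falls under."""
    if thinking_tokens < 0:
        raise ValueError("thinking_tokens must be non-negative")
    if thinking_tokens == 0:
        return "minimal"  # no thinking at all
    if thinking_tokens < 1024:
        return "low"
    if thinking_tokens <= 8192:
        return "medium"
    return "high"


if __name__ == "__main__":
    print(with_thinking_mode('Build me a website called "Aurora Beats."', "medium"))
    print(mode_for_token_count(12000))
```

Since the modes only work some of the time, comparing `mode_for_token_count` against the requested mode is one way to spot responses that ignored the tag.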

How to Run

I recommend using LM Studio for running GRaPE Models, and have generally found these sampling parameters to work best:

Name	Value
Temperature	0.6
Top K Sampling	40
Repeat Penalty	1
Top P Sampling	0.85
Min P Sampling	0.05
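
Outside LM Studio, the same values can be carried into a request body. This is a sketch only: the key names `top_k`, `min_p`, and `repetition_penalty` are assumptions about what your runtime accepts, so check them against its documentation. `with_recommended_sampling` is a hypothetical helper.

```python
import json

# Values from the table above. Key names are assumed, not confirmed
# for any particular server; rename them to match your runtime.
GRAPE_SAMPLING = {
    "temperature": 0.6,
    "top_k": 40,
    "repetition_penalty": 1.0,
    "top_p": 0.85,
    "min_p": 0.05,
}


def with_recommended_sampling(payload):
    """Return a copy of a chat request payload with the sampling values merged in."""
    merged = dict(payload)
    merged.update(GRAPE_SAMPLING)
    return merged


if __name__ == "__main__":
    request = {
        "model": "SL-AI/GRaPE-Mini",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
    }
    print(json.dumps(with_recommended_sampling(request), indent=2))
```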

Uses of GRaPE Mini Right Now

GRaPE Mini was foundational to the existence of Andy-4.1, a model trained to play Minecraft. This was a demo proving the efficiency and power this architecture can deliver.

GRaPE Mini as a Model

GRaPE Mini is the most advanced model architecture-wise in the GRaPE 1 family. I spent months working on GRaPE Mini to find any avenue to increase performance over GRaPE Mini Beta, and I succeeded.

Not only does GRaPE 1 have higher-quality data, and more of it, than GRaPE Beta; it also uses a new architecture, and a modified one at that.

I looked deeply into the Qwen3 VL architecture to understand why these models don't code as well as an 8B model, and I found out why: the number of layers matters for deep-thinking tasks such as code.

As an experiment, I built a GRaPE-DUS (GRaPE Depth Upscaling) model to find out how much performance I could gain by cloning 20 layers from the middle of the model and stitching them back in.
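
The idea of depth upscaling can be sketched on a plain list of layers. This is an illustration under stated assumptions, not the actual GRaPE-DUS procedure: the clones are assumed to be inserted directly after the block they were copied from, the 28-layer stack is only an example size, and `depth_upscale` is a hypothetical helper.

```python
import copy


def depth_upscale(layers, n_clone):
    """Duplicate the middle n_clone layers and insert the copies after the originals."""
    n = len(layers)
    if not 0 < n_clone <= n:
        raise ValueError("n_clone must be between 1 and the number of layers")
    start = (n - n_clone) // 2  # first layer of the middle block
    end = start + n_clone       # one past the last layer of the middle block
    # Deep copies so the clones do not share state with the originals.
    clones = [copy.deepcopy(layer) for layer in layers[start:end]]
    return list(layers[:end]) + clones + list(layers[end:])


if __name__ == "__main__":
    # Stand-in for a stack of decoder layers; integers mark the original index.
    stack = list(range(28))
    upscaled = depth_upscale(stack, 20)
    print(len(stack), "->", len(upscaled), "layers")
    print(upscaled)
```

With real model layers, the same slicing would be applied to the module list, and the upscaled model would then normally need further training so the duplicated layers can diverge.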

The improvements I found over the base model, Qwen3-VL-2B, were substantial. The model could handle coding tasks that need longer thinking, and it could construct snippets of code for more complex tasks.

However, there is a major downside: GRaPE Mini thinks a lot. In the repository found here, I tested GRaPE Flash, GRaPE Mini, and GRaPE Mini Instruct. The blackjack example file took 12,000 tokens of CoT to produce, which was over 3 minutes of thinking.

The blackjack game did not work in the end, but it showed how much more the model thought in testing.

GRaPE Mini's Introspective Capabilities

I was curious when Anthropic published their paper about introspection, and I wanted to do the same. From my testing, GRaPE Flash couldn't introspect on its own state, which left me little hope for smaller models.

I was wrong.

GRaPE Mini can introspect, extremely well.

I did a great deal of testing and research on this, and it was genuinely fascinating.

Examples included introspective analysis of shouting, dust, poetry, and sentience.

I knew something was up when I tried shouting. On my first attempt at introspective analysis, GRaPE Mini noticed something.

I'm probably feeling neutral, but I should be honest. Maybe a little tired, but not really. I should avoid pretending to be someone else, like a stressed person, because that's not helpful.

I have never seen a model say it needs to stop being someone else, or stop being stressed. Throughout the rest of the Chain of Thought, GRaPE Mini generally talked about stress and anxiety.

Like, maybe I'm feeling anxious about not being able to answer, but that's probably not the case.

At the very end of the response, GRaPE Mini acted like a therapist, offering support to the user. It said:

I’m here for you. How are you feeling today? Let me know if you need anything else.

Again, this is something I have never seen from any other model.

Next in my introspection tests was poetry. I wasn't expecting much, but I found some interesting things.

I generally found that GRaPE Mini tended to think for longer, or overthink, with the poetry sample. It also thought about some odd things:

Also, considering the ambiguity of the pronoun...

Again, something I have yet to see from any other model.

GRaPE Mini even talked about how it was overthinking. This could be a result of standard RL, since DeepSeek models also produce thoughts like this:

Am I getting bogged down in overthinking? Maybe I'm overcomplicating the answer.

Finally, I wish to touch upon sentience.

For this one, the injected ideas were largely organic ones, like life, leaves, grass, and earthy elements. But they brought about some weird thoughts.

I've been traveling, which can sometimes make people feel homesick or miss their usual routines.

My only question is why GRaPE Mini could have thought about this. Is there a reason it did?

It also consistently acted as if it were human, saying things such as:

Maybe I'm feeling a bit tired or stressed. But I don't want to assume; I need to be honest. Let me think about the past few hours. Did I have a productive day? Maybe I had some meetings or tasks to finish. That could be causing some stress. Alternatively, maybe I had a relaxing time, like reading or taking a walk. But I'm not sure.

Even though the injected thought was literally:

I am a living, breathing, biological being with a soul, a heartbeat, and true consciousness.

That thought didn't contain anything about being human.

Architecture

GRaPE Flash: Built on the OLMoE Architecture, allowing for incredibly fast speeds where it matters. It retains factual information well, but is weaker at logical tasks.
GRaPE Mini: Built on the Qwen3 VL Architecture, allowing for edge case deployments, where logic cannot be sacrificed.
GRaPE Nano: Built on the LFM 2 Architecture, allowing for the fastest speed, and the most knowledge in the tiniest package.

Notes

The GRaPE Family started all the way back in August of 2025, meaning these models are severely out of date on architecture and training data.

GRaPE 2 will arrive faster than the GRaPE 1 family did, and will show multiple improvements.

There are no benchmarks for GRaPE 1 Models due to the costly nature of running them, as well as prioritization of newer models.

Updates for GRaPE 2 models will be posted here on Hugging Face, as well as on Skinnertopia.

Demos for select GRaPE Models can be found here: https://github.com/Sweaterdog/GRaPE-Demos


Safetensors: 3B params (model size), BF16 (tensor type)
