Instructions for using prithivMLmods/GCIRS-Reasoning-1.5B-R1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use prithivMLmods/GCIRS-Reasoning-1.5B-R1 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/GCIRS-Reasoning-1.5B-R1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/GCIRS-Reasoning-1.5B-R1")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/GCIRS-Reasoning-1.5B-R1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use prithivMLmods/GCIRS-Reasoning-1.5B-R1 with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "prithivMLmods/GCIRS-Reasoning-1.5B-R1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/GCIRS-Reasoning-1.5B-R1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker:
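The same OpenAI-compatible endpoint can also be called from Python. A minimal sketch using only the standard library; it assumes the vLLM server above is running on http://localhost:8000, and `build_payload`/`chat` are local helper names, not part of vLLM itself:

```python
import json
import urllib.request

def build_payload(prompt):
    """Assemble the chat-completions request body."""
    return {
        "model": "prithivMLmods/GCIRS-Reasoning-1.5B-R1",
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, base_url="http://localhost:8000/v1"):
    """POST a chat request to the OpenAI-compatible server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI-style response shape: first choice, message content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` mirrors the curl call above.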
```shell
docker model run hf.co/prithivMLmods/GCIRS-Reasoning-1.5B-R1
```
- SGLang
How to use prithivMLmods/GCIRS-Reasoning-1.5B-R1 with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "prithivMLmods/GCIRS-Reasoning-1.5B-R1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/GCIRS-Reasoning-1.5B-R1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "prithivMLmods/GCIRS-Reasoning-1.5B-R1" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/GCIRS-Reasoning-1.5B-R1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Docker Model Runner
How to use prithivMLmods/GCIRS-Reasoning-1.5B-R1 with Docker Model Runner:
```shell
docker model run hf.co/prithivMLmods/GCIRS-Reasoning-1.5B-R1
```
GCIRS-Reasoning-1.5B-R1
GCIRS-Reasoning-1.5B-R1 is a research-grade reasoning model fine-tuned from Qwen2.5-1.5B-Instruct, focused on non-fictional reasoning, factual consistency, and scientific depth. Trained with reinforcement learning using the Big Reasoning Traces dataset from DeepSeek, this model is tailored for complex analytical tasks and scientific rigor in high-stakes or research environments.
GGUF: https://huggingface.co/prithivMLmods/GCIRS-Reasoning-1.5B-R1-GGUF
Key Features
- Reinforcement Learning on Big Reasoning Traces: Fine-tuned using DeepSeek's Big Reasoning Traces, ensuring clarity in multi-step reasoning, factual deduction, and long-form scientific argumentation.
- Research-Ready Scientific Fidelity: Designed for researchers, educators, and analysts; offers reliable factual recall, logical structuring, and precise step-by-step explanation.
- Structured Output in LaTeX, Markdown, and JSON: Supports technical documentation and publishing with seamless integration of LaTeX equations, Markdown formatting, and JSON output.
- Multilingual Technical Reasoning: Effective across 20+ languages, especially in scientific, academic, and technical domains.
- Efficient Inference: Despite its 1.5B-parameter scale, it is optimized for low-latency inference across modern GPUs and research pipelines.
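When asking for JSON output, generations often wrap the object in prose or Markdown fences. A small post-processing sketch that recovers the first well-formed JSON object from generated text (`extract_first_json` is a hypothetical helper, not shipped with the model or Transformers):

```python
import json

def extract_first_json(text):
    """Return the first decodable JSON object in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            # raw_decode parses a JSON value starting at this offset and
            # ignores any trailing prose after it.
            obj, _ = decoder.raw_decode(text[start:])
            return obj
        except json.JSONDecodeError:
            continue  # not a valid object here; keep scanning
    return None
```

For example, `extract_first_json('Here is the result: {"entropy_units": "J/K"} as requested.')` returns the dict `{"entropy_units": "J/K"}`.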
Quickstart with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/GCIRS-Reasoning-1.5B-R1"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the principle of entropy in thermodynamics with examples."
messages = [
    {"role": "system", "content": "You are a scientific reasoning assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated answer is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
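The quickstart passes only `max_new_tokens=512`, leaving decoding on its defaults. For multi-step reasoning, light sampling often reads better; the settings below are illustrative assumptions, not a published recommendation for this model:

```python
# Illustrative sampling settings to pass to model.generate
# (assumed values, not official defaults for GCIRS-Reasoning-1.5B-R1).
gen_kwargs = {
    "max_new_tokens": 512,      # room for multi-step derivations
    "do_sample": True,          # sample instead of greedy decoding
    "temperature": 0.6,         # moderate randomness keeps reasoning focused
    "top_p": 0.95,              # nucleus sampling cutoff
    "repetition_penalty": 1.1,  # discourage loops in long answers
}

# In the quickstart above this would replace the generate call:
# generated_ids = model.generate(**model_inputs, **gen_kwargs)
```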
Intended Use
- Scientific and research-grade question answering
- Conceptual explanations in physics, biology, and chemistry
- Factual, non-fictional structured content generation
- Academic tutoring and reasoning assessment
- High-fidelity inference in low-latency research settings
Limitations
- Not designed for casual chat or storytelling
- Performance may decline outside scientific/technical domains
- Limited creativity and abstract generalization
- Context limitations in extremely long research documents