Instructions for using prithivMLmods/GCIRS-Reasoning-1.5B-R1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use prithivMLmods/GCIRS-Reasoning-1.5B-R1 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/GCIRS-Reasoning-1.5B-R1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/GCIRS-Reasoning-1.5B-R1")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/GCIRS-Reasoning-1.5B-R1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use prithivMLmods/GCIRS-Reasoning-1.5B-R1 with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "prithivMLmods/GCIRS-Reasoning-1.5B-R1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/GCIRS-Reasoning-1.5B-R1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker:
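The same OpenAI-compatible endpoint can also be called from Python. A minimal sketch using only the standard library; it assumes the vLLM server above is running on http://localhost:8000, and `build_payload`/`chat` are local helper names, not part of vLLM itself:

```python
import json
import urllib.request

def build_payload(prompt):
    """Assemble the chat-completions request body."""
    return {
        "model": "prithivMLmods/GCIRS-Reasoning-1.5B-R1",
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, base_url="http://localhost:8000/v1"):
    """POST a chat request to the OpenAI-compatible server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI-style response shape: first choice, message content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` mirrors the curl call above.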
```shell
docker model run hf.co/prithivMLmods/GCIRS-Reasoning-1.5B-R1
```
- SGLang
How to use prithivMLmods/GCIRS-Reasoning-1.5B-R1 with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "prithivMLmods/GCIRS-Reasoning-1.5B-R1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/GCIRS-Reasoning-1.5B-R1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "prithivMLmods/GCIRS-Reasoning-1.5B-R1" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/GCIRS-Reasoning-1.5B-R1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Docker Model Runner
How to use prithivMLmods/GCIRS-Reasoning-1.5B-R1 with Docker Model Runner:
```shell
docker model run hf.co/prithivMLmods/GCIRS-Reasoning-1.5B-R1
```
GCIRS-Reasoning-1.5B-R1
GCIRS-Reasoning-1.5B-R1 is a research-grade reasoning model fine-tuned from Qwen2.5-1.5B-Instruct, focused on non-fictional reasoning, factual consistency, and scientific depth. Trained with reinforcement learning using the Big Reasoning Traces dataset from DeepSeek, this model is tailored for complex analytical tasks and scientific rigor in high-stakes or research environments.
GGUF: https://huggingface.co/prithivMLmods/GCIRS-Reasoning-1.5B-R1-GGUF
Key Features
- Reinforcement Learning on Big Reasoning Traces: Fine-tuned using DeepSeek's Big Reasoning Traces, ensuring clarity in multi-step reasoning, factual deduction, and long-form scientific argumentation.
- Research-Ready Scientific Fidelity: Designed for researchers, educators, and analysts; offers reliable factual recall, logical structuring, and precise step-by-step explanation.
- Structured Output in LaTeX, Markdown, and JSON: Supports technical documentation and publishing with seamless integration of LaTeX equations, Markdown formatting, and JSON output.
- Multilingual Technical Reasoning: Effective across 20+ languages, especially in scientific, academic, and technical domains.
- Efficient Inference: Despite its 1.5B-parameter scale, it is optimized for low-latency inference across modern GPUs and research pipelines.
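When asking for JSON output, generations often wrap the object in prose or Markdown fences. A small post-processing sketch that recovers the first well-formed JSON object from generated text (`extract_first_json` is a hypothetical helper, not shipped with the model or Transformers):

```python
import json

def extract_first_json(text):
    """Return the first decodable JSON object in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            # raw_decode parses a JSON value starting at this offset and
            # ignores any trailing prose after it.
            obj, _ = decoder.raw_decode(text[start:])
            return obj
        except json.JSONDecodeError:
            continue  # not a valid object here; keep scanning
    return None
```

For example, `extract_first_json('Here is the result: {"entropy_units": "J/K"} as requested.')` returns the dict `{"entropy_units": "J/K"}`.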
Quickstart with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/GCIRS-Reasoning-1.5B-R1"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the principle of entropy in thermodynamics with examples."
messages = [
    {"role": "system", "content": "You are a scientific reasoning assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated answer is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
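The quickstart passes only `max_new_tokens=512`, leaving decoding on its defaults. For multi-step reasoning, light sampling often reads better; the settings below are illustrative assumptions, not a published recommendation for this model:

```python
# Illustrative sampling settings to pass to model.generate
# (assumed values, not official defaults for GCIRS-Reasoning-1.5B-R1).
gen_kwargs = {
    "max_new_tokens": 512,      # room for multi-step derivations
    "do_sample": True,          # sample instead of greedy decoding
    "temperature": 0.6,         # moderate randomness keeps reasoning focused
    "top_p": 0.95,              # nucleus sampling cutoff
    "repetition_penalty": 1.1,  # discourage loops in long answers
}

# In the quickstart above this would replace the generate call:
# generated_ids = model.generate(**model_inputs, **gen_kwargs)
```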
Intended Use
- Scientific and research-grade question answering
- Conceptual explanations in physics, biology, and chemistry
- Factual, non-fictional structured content generation
- Academic tutoring and reasoning assessment
- High-fidelity inference in low-latency research settings
Limitations
- Not designed for casual chat or storytelling
- Performance may decline outside scientific/technical domains
- Limited creativity and abstract generalization
- Context limitations in extremely long research documents