Granite-3.1-2B-Reasoning-LORA (Efficient Fine-Tuned Model)
Model Overview
This model is a LoRA fine-tune of ibm-granite/granite-3.1-2b-instruct, optimized for reasoning tasks while keeping compute and memory costs low. LoRA (Low-Rank Adaptation) trains a small set of adapter weights and leaves the base model's weights frozen, so the adapter targets logical and analytical reasoning while building on the base model's existing capabilities.
- Developed by: ruslanmv
- License: Apache 2.0
- Base Model: ibm-granite/granite-3.1-2b-instruct
- Fine-tuned for: Logical reasoning, structured problem-solving, long-context tasks
- Training Method: LoRA (Low-Rank Adaptation)
- Supported Languages: English
Why Use the LoRA Version?
This LoRA fine-tuned model provides several benefits:
✅ Memory-efficient fine-tuning with LoRA
✅ 2x Faster Training using Unsloth and Hugging Face TRL
✅ Retains the base model’s capabilities while enhancing reasoning skills
✅ Easier to merge with other adapters or apply to specific tasks
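To make the memory-efficiency point concrete, here is a minimal sketch of the LoRA idea in plain Python: a frozen weight matrix W plus a scaled low-rank update B·A. The layer size, rank, and alpha are illustrative assumptions, not the actual configuration of this adapter, and the helper names are hypothetical.

```python
# Illustrative only: the layer size, rank, and alpha below are assumptions,
# not the actual configuration of this adapter.

def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_out * d_in              # parameters if W were updated directly
    lora = rank * (d_in + d_out)     # A is (rank x d_in), B is (d_out x rank)
    return full, lora

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x) with W kept frozen."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

full, lora = lora_param_counts(d_out=2048, d_in=2048, rank=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")

# In standard LoRA, B starts at zero, so the adapted layer initially
# produces the same output as the base layer.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]   # rank 1
B = [[0.0], [0.0]]
x = [1.0, 1.0]
print(lora_forward(W, A, B, x, alpha=2, rank=1))
```

Only A and B are trained, which is why the adapter is small compared with the base model and can be stored and swapped separately.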
Installation & Usage
To use this LoRA fine-tuned model, install the necessary dependencies:
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
pip install peft
pip install bitsandbytes
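Before loading the model, it can help to confirm that the packages above are importable in the active environment. The following is a small sketch using only the standard library; `check_dependencies` is a hypothetical helper name, not part of any of these packages.

```python
import importlib.util

# Top-level module names for the packages installed above
REQUIRED = ["torch", "accelerate", "transformers", "peft", "bitsandbytes"]

def check_dependencies(names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

for name, found in check_dependencies(REQUIRED).items():
    print(f"{name}: {'ok' if found else 'missing'}")
```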
Running the Model
Load the base model and attach the LoRA adapter:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model_path = "ibm-granite/granite-3.1-2b-instruct"
lora_model_path = "ruslanmv/granite-3.1-2b-Reasoning-LORA"

# Load the tokenizer and base model; device_map="auto" places the weights on a GPU if one is available
tokenizer = AutoTokenizer.from_pretrained(base_model_path)
model = AutoModelForCausalLM.from_pretrained(base_model_path, device_map="auto")

# Attach the LoRA adapter to the base model
model = PeftModel.from_pretrained(model, lora_model_path)
model.eval()

input_text = "Can you explain the difference between inductive and deductive reasoning?"
# Move the inputs to the device the model was placed on
input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate without tracking gradients; max_length counts prompt plus generated tokens
with torch.no_grad():
    output = model.generate(**input_tokens, max_length=4000)

output_text = tokenizer.batch_decode(output, skip_special_tokens=True)
print(output_text[0])
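If a standalone model without the PEFT wrapper is preferred, the adapter can be folded into the base weights. The sketch below assumes the adapter is a standard LoRA adapter, for which PEFT provides `merge_and_unload()`. The function name `load_merged_model` and the output directory in the comment are hypothetical.

```python
def load_merged_model(base_model_path, lora_model_path):
    """Load the base model, apply the LoRA adapter, and fold it into the base weights."""
    # Imports are local so the function can be defined before the libraries are needed
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_model_path, device_map="auto")
    adapted = PeftModel.from_pretrained(base, lora_model_path)
    # merge_and_unload() adds the low-rank update into the base weights and
    # returns a plain transformers model without the PEFT wrapper
    return adapted.merge_and_unload()

# Example call (downloads both repositories):
# merged = load_merged_model(
#     "ibm-granite/granite-3.1-2b-instruct",
#     "ruslanmv/granite-3.1-2b-Reasoning-LORA",
# )
# merged.save_pretrained("granite-3.1-2b-reasoning-merged")  # hypothetical output directory
```

Merging removes the small per-layer overhead of the adapter at inference time, at the cost of no longer being able to detach or swap the adapter on that model instance.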
Intended Use
Granite-3.1-2B-Reasoning-LORA is optimized for efficient reasoning while keeping computational costs low, making it ideal for:
- Logical and analytical problem-solving
- Text-based reasoning tasks
- Mathematical and symbolic reasoning
- Advanced instruction-following
This LoRA-based fine-tuning method is particularly useful for lightweight deployment and quick adaptability to specific tasks.
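For instruction-style prompts like those listed above, it is usually better to format the input with the tokenizer's chat template than to pass raw text. The sketch below assumes the base model's tokenizer ships a chat template (worth verifying for your installed version) and that `model` and `tokenizer` have already been loaded. `build_messages` and `chat` are hypothetical helper names, and the default `max_new_tokens` value is an arbitrary choice.

```python
def build_messages(question, system_prompt=None):
    """Build a chat-format message list for a single user question."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": question})
    return messages

def chat(model, tokenizer, question, max_new_tokens=512):
    """Generate a reply to one question using the tokenizer's chat template."""
    inputs = tokenizer.apply_chat_template(
        build_messages(question),
        add_generation_prompt=True,   # append the assistant turn marker
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated reply is decoded
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example call, given a loaded model and tokenizer:
# print(chat(model, tokenizer, "If all A are B and some B are C, must some A be C?"))
```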
License & Acknowledgments
This model is released under the Apache 2.0 license. It is a LoRA fine-tune of IBM's Granite-3.1-2B-Instruct model. Special thanks to the IBM Granite Team for developing the base model.
For more details, visit the IBM Granite Documentation.
Citation
If you use this model in your research or applications, please cite:
@misc{ruslanmv2025granite,
title={LoRA Fine-Tuning of Granite-3.1 for Advanced Reasoning},
author={Ruslan M.V.},
year={2025},
url={https://huggingface.co/ruslanmv/granite-3.1-2b-Reasoning-LORA}
}