Instructions for using ruslanmv/granite-3.1-2b-Reasoning-LORA with libraries, notebooks, and local apps.

Libraries

How to use ruslanmv/granite-3.1-2b-Reasoning-LORA with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ruslanmv/granite-3.1-2b-Reasoning-LORA")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

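# Load model directly
# AutoModelForCausalLM attaches the language-modeling head that generate() needs;
# plain AutoModel returns only the backbone without it.
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("ruslanmv/granite-3.1-2b-Reasoning-LORA", dtype="auto")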
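
When the pipeline is called with a list of chat messages, recent Transformers releases return the whole conversation, with the model's reply appended as the last message. The helper below is a minimal sketch of pulling that reply out. `extract_reply` is a hypothetical name, and the sample result is hand-written to mimic the output shape rather than produced by the model.

```python
def extract_reply(pipeline_output):
    """Return the assistant's text from a chat-style text-generation result.

    Assumes the shape [{"generated_text": [...messages..., assistant_message]}],
    which is what the pipeline returns for chat-message input.
    """
    conversation = pipeline_output[0]["generated_text"]
    last_message = conversation[-1]
    if last_message.get("role") != "assistant":
        raise ValueError("no assistant reply found in pipeline output")
    return last_message["content"]


# Hand-written stand-in for `pipe(messages)`; the reply text is illustrative only.
sample_output = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "(model reply would appear here)"},
        ]
    }
]
print(extract_reply(sample_output))
```

In real use, pass the return value of `pipe(messages)` in place of `sample_output`.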

Local Apps

vLLM

How to use ruslanmv/granite-3.1-2b-Reasoning-LORA with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ruslanmv/granite-3.1-2b-Reasoning-LORA"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ruslanmv/granite-3.1-2b-Reasoning-LORA",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
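
The same request can be issued from Python. This is a sketch using only the standard library; `build_chat_request` and `send_chat_request` are hypothetical helper names, and the response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="ruslanmv/granite-3.1-2b-Reasoning-LORA"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("What is the capital of France?")
print(request.full_url)
# With the server running: print(send_chat_request(request))
```

For a server listening on another port, such as the SGLang examples on port 30000, pass `base_url="http://localhost:30000"`.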

Use Docker

docker model run hf.co/ruslanmv/granite-3.1-2b-Reasoning-LORA

SGLang

How to use ruslanmv/granite-3.1-2b-Reasoning-LORA with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ruslanmv/granite-3.1-2b-Reasoning-LORA" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ruslanmv/granite-3.1-2b-Reasoning-LORA",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ruslanmv/granite-3.1-2b-Reasoning-LORA" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ruslanmv/granite-3.1-2b-Reasoning-LORA",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use ruslanmv/granite-3.1-2b-Reasoning-LORA with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ruslanmv/granite-3.1-2b-Reasoning-LORA to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ruslanmv/granite-3.1-2b-Reasoning-LORA to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for ruslanmv/granite-3.1-2b-Reasoning-LORA to start chatting

Load model with FastModel

# Install Unsloth first, from a shell: pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="ruslanmv/granite-3.1-2b-Reasoning-LORA",
    max_seq_length=2048,
)

Docker Model Runner

How to use ruslanmv/granite-3.1-2b-Reasoning-LORA with Docker Model Runner:

```
docker model run hf.co/ruslanmv/granite-3.1-2b-Reasoning-LORA
```

Granite-3.1-2B-Reasoning-LORA (Efficient Fine-Tuned Model)

Model Overview

This model is a LoRA fine-tune of ibm-granite/granite-3.1-2b-instruct, tuned for reasoning tasks at low computational cost. LoRA (Low-Rank Adaptation) keeps the base model's weights frozen and trains only small low-rank adapter matrices, so the base model's capabilities are preserved while the adapter adds targeted changes for logical and analytical reasoning.

Developed by: ruslanmv
License: Apache 2.0
Base Model: ibm-granite/granite-3.1-2b-instruct
Fine-tuned for: Logical reasoning, structured problem-solving, long-context tasks
Training Method: LoRA (Low-Rank Adaptation)
Supported Languages: English

Why Use the LoRA Version?

This LoRA fine-tuned model provides several benefits:

✅ Memory-efficient fine-tuning with LoRA
✅ 2x faster training using Unsloth and Hugging Face TRL
✅ Retains the base model’s capabilities while enhancing reasoning skills
✅ Easier to merge with other adapters or apply to specific tasks
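
To make the memory claim concrete: LoRA freezes a weight matrix W of shape d × k and trains two small matrices B (d × r) and A (r × k), so the trainable count for that layer drops from d·k to r·(d + k). The sketch below does that arithmetic in plain Python. The dimensions and rank are illustrative assumptions, not the values used to train this adapter.

```python
def lora_trainable_params(d, k, r):
    """Parameters in the low-rank factors B (d x r) and A (r x k)."""
    return r * (d + k)


def full_params(d, k):
    """Parameters in the dense weight matrix W (d x k)."""
    return d * k


# Illustrative values only (assumed, not taken from this model's config).
d, k, r = 2048, 2048, 16

dense = full_params(d, k)
lora = lora_trainable_params(d, k, r)
print(f"dense: {dense:,}  lora: {lora:,}  ratio: {lora / dense:.2%}")
```

Because the ratio scales with r, a smaller rank shrinks the adapter further at the cost of less capacity for the fine-tuned behavior.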

Installation & Usage

To use this LoRA fine-tuned model, install the necessary dependencies:

pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
pip install peft
pip install bitsandbytes

Running the Model

Load and merge the LoRA adapter with the base model:

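import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_path = "ibm-granite/granite-3.1-2b-instruct"
lora_model_path = "ruslanmv/granite-3.1-2b-Reasoning-LORA"

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
# device_map="auto" places the weights on a GPU when one is available (requires accelerate)
model = AutoModelForCausalLM.from_pretrained(base_model_path, device_map="auto")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, lora_model_path)
# Merge the adapter weights into the base model so inference runs without the PEFT wrapper
model = model.merge_and_unload()
model.eval()

input_text = "Can you explain the difference between inductive and deductive reasoning?"
# The base model is instruction-tuned, so format the prompt with its chat template
messages = [{"role": "user", "content": input_text}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)  # follow the device chosen by device_map rather than a separate variable

# max_length caps prompt plus generated tokens
with torch.no_grad():
    output = model.generate(**inputs, max_length=4000)

# Decode only the newly generated tokens and drop special tokens
prompt_length = inputs["input_ids"].shape[-1]
output_text = tokenizer.decode(output[0, prompt_length:], skip_special_tokens=True)

print(output_text)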

Intended Use

Granite-3.1-2B-Reasoning-LORA is optimized for efficient reasoning while keeping computational costs low, making it ideal for:

Logical and analytical problem-solving
Text-based reasoning tasks
Mathematical and symbolic reasoning
Advanced instruction-following

This LoRA-based fine-tuning method is particularly useful for lightweight deployment and quick adaptability to specific tasks.

License & Acknowledgments

This model is released under the Apache 2.0 license. It is a LoRA fine-tune of IBM’s Granite-3.1-2B-Instruct model. Special thanks to the IBM Granite Team for developing the base model.

For more details, visit the IBM Granite Documentation.

Citation

If you use this model in your research or applications, please cite:

@misc{ruslanmv2025granite,
  title={LoRA Fine-Tuning of Granite-3.1 for Advanced Reasoning},
  author={Ruslan M.V.},
  year={2025},
  url={https://huggingface.co/ruslanmv/granite-3.1-2b-Reasoning-LORA}
}


Model tree for ruslanmv/granite-3.1-2b-Reasoning-LORA

Base model: ibm-granite/granite-3.1-2b-base
Finetuned: ibm-granite/granite-3.1-2b-instruct
Finetuned (13): this model