Inelly 4.5 Blaze

Model Description

Inelly 4.5 Blaze is a fine-tuned version of Qwen2.5-1.5B-Instruct, trained on a focused mixture of chain-of-thought reasoning, math, coding, and general knowledge data. It is the compact, fast variant of the Inelly 4.5 family, optimized for quick inference while retaining strong reasoning capabilities.

Developed by: bry
Base model: Qwen2.5-1.5B-Instruct
Fine-tuning method: QLoRA (4-bit NF4, rank 16)
Parameters: 1.54B (base) + ~3.1M trainable (LoRA adapters)
License: Apache 2.0 (inherited from Qwen2.5)

Intended Use

Inelly 4.5 Blaze is intended for:

Chain-of-Thought reasoning – Step-by-step problem solving
Math – Algebra, arithmetic, word problems
Code generation – Python functions with clear logic
Logical deduction – Syllogisms, puzzles, multi-step reasoning
General knowledge Q&A – Science, everyday facts
Quick prototyping – Fast inference on consumer hardware

Out of Scope

Not intended for production deployment without further safety evaluation
Less conversational polish than the 3B variant (Inelly 4.5)
May struggle with very long or complex multi-step tasks

Training Data

Inelly 4.5 Blaze was fine-tuned for 1 epoch on ~5,225 samples drawn from:

Dataset	Samples	Purpose
Bespoke-Stratos-35k	3,000	Chain-of-thought math & reasoning
OpenThoughts-114k	2,500	Code generation with reasoning
dolphin-r1	2,000	General reasoning (DeepSeek-R1 distill)

All samples were deduplicated and reasoning-weighted (2x oversample for CoT examples). Maximum sequence length: 512 tokens.
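The deduplication and 2x chain-of-thought weighting described above could look roughly like the sketch below. The card does not publish the actual preprocessing script, so the `prompt` and `is_cot` field names, the whitespace/case normalization, and the hash-based matching are all assumptions made for illustration.

```python
import hashlib

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies hash alike."""
    return " ".join(text.lower().split())

def deduplicate(samples):
    """Keep the first sample seen for each distinct normalized prompt."""
    seen, unique = set(), []
    for sample in samples:
        key = hashlib.sha256(normalize(sample["prompt"]).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(sample)
    return unique

def oversample_cot(samples, factor=2):
    """Repeat chain-of-thought samples `factor` times; leave other samples as-is."""
    weighted = []
    for sample in samples:
        copies = factor if sample["is_cot"] else 1
        weighted.extend([sample] * copies)
    return weighted

# Hypothetical toy records; the real datasets use their own schemas.
raw = [
    {"prompt": "What is 2 + 2?", "is_cot": True},
    {"prompt": "what is  2 + 2?", "is_cot": True},   # duplicate after normalization
    {"prompt": "Name the largest planet.", "is_cot": False},
]
mixture = oversample_cot(deduplicate(raw))
print(len(mixture))  # 3: one CoT sample counted twice, plus one non-CoT sample
```

Exact-match hashing only catches near-verbatim repeats; fuzzy deduplication (for example MinHash) would be a different technique and is not shown here.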

Training Hyperparameters

Parameter	Value
Base model	Qwen2.5-1.5B-Instruct
Quantization	4-bit NF4 (bitsandbytes)
LoRA rank	16
LoRA alpha	32
LoRA dropout	0.05
Learning rate	2e-4
Batch size	8 (effective, with gradient accumulation)
Epochs	1
Max seq length	512
Optimizer	AdamW 8-bit
LR scheduler	cosine
Warmup ratio	0.05
Training time	~35 min
Hardware	RTX 2080 Ti (11GB VRAM)
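As a rough illustration of what these settings imply, the sketch below derives the number of optimizer steps and the learning rate at a given step. It assumes that the batch size of 8 is the effective batch size, and that the "cosine" scheduler has the common shape of linear warmup followed by cosine decay to zero; neither detail is confirmed by the card.

```python
import math

NUM_SAMPLES = 5225    # from the card
BATCH_SIZE = 8        # from the card, read here as the effective batch size
EPOCHS = 1
BASE_LR = 2e-4
WARMUP_RATIO = 0.05

total_steps = math.ceil(NUM_SAMPLES / BATCH_SIZE) * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(total_steps, warmup_steps)  # 654 33
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the run is about 654 optimizer steps, with the learning rate peaking at 2e-4 after 33 warmup steps.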

Model Architecture

Property	Value
Model type	Qwen2ForCausalLM
Hidden size	1,536
Layers	28
Attention heads	12
Head dim	128
Intermediate size	8,960
Vocab size	151,936
Context length	32,768
Total parameters	~1.54B
Trainable parameters	~3.1M (LoRA)
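The trainable-parameter figure can be sanity-checked from the architecture numbers: a rank-r LoRA adapter on a linear layer with shape (in, out) adds r * (in + out) parameters. The card does not say which modules the adapters target, so the target sets below are hypothetical. The key/value head count of 2 comes from the base model's published configuration, not from this card.

```python
HIDDEN = 1536
LAYERS = 28
HEAD_DIM = 128
KV_HEADS = 2          # assumption: taken from the Qwen2.5-1.5B config, not this card
INTERMEDIATE = 8960
RANK = 16

# (input features, output features) of each linear projection in one layer
PROJ_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(targets, rank=RANK):
    """Total LoRA parameters when adapters are attached to `targets` in every layer."""
    per_layer = sum(rank * sum(PROJ_SHAPES[name]) for name in targets)
    return per_layer * LAYERS

# Hypothetical target sets, for comparison against the card's ~3.1M figure.
for targets in (
    ["q_proj", "v_proj"],
    ["q_proj", "k_proj", "v_proj"],
    ["q_proj", "k_proj", "v_proj", "o_proj"],
    list(PROJ_SHAPES),
):
    print(f"{lora_params(targets):>12,}  {', '.join(targets)}")
```

Of these sets, the attention query/key/value projections (about 2.98M) come closest to the stated ~3.1M, but no set shown matches exactly, so the real configuration may differ.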

Usage

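import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in half precision and let Accelerate place it on the available device(s).
model = AutoModelForCausalLM.from_pretrained("path/to/inelly-4.5-blaze", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("path/to/inelly-4.5-blaze")

# Format the conversation with the model's chat template and append the assistant prompt.
messages = [{"role": "user", "content": "Solve for x: 3x + 7 = 22. Show all steps."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature and top_p to take effect.
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)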

Performance

Informal GPU testing, summarized by category:

Category	Result
Chain-of-Thought reasoning	✅ Correct step-by-step logic
Math	✅ Accurate algebraic solutions
Code generation	✅ Clean Python with comments
Logic puzzles	✅ Sound deductive reasoning
General knowledge	✅ Accurate, clear explanations
Speed	✅ ~1-2s per response (faster than 3B/7B)

Inelly 4.5 Family Comparison

Model	Size	Focus	Training Data
Inelly 4.5	3B	Conversation + CoT	5,700 samples (incl. politeness, conv)
Inelly 4.5 Blaze (this)	1.5B	Fast reasoning + CoT	5,225 samples (reasoning-focused)
Matrix 2	7B	Deep reasoning	5,225 samples (reasoning-focused)

When to use Blaze vs standard 4.5:

Blaze – When you need fast reasoning, math, or coding help and don't need conversational polish
4.5 (3B) – When you want a friendly, polite conversationalist that can also reason

Limitations

Conversational ability: Less polished in casual chat compared to the 3B variant (no conversational fine-tuning data)
Safety: Inherited from Qwen2.5 base; not specifically safety-tuned
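Context length: Fine-tuned on sequences of at most 512 tokens; output quality on longer prompts may degrade even though the base architecture accepts up to 32,768 tokens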
Factual accuracy: May hallucinate in specialized domains

Acknowledgments

Qwen2.5 by Alibaba Cloud (base model)
Bespoke Labs for Stratos dataset
OpenThoughts team
Cognitive Computations for dolphin-r1

Citation

@misc{inelly45blaze,
  title = {Inelly 4.5 Blaze: A Compact Chain-of-Thought Reasoning Model},
  author = {Genue},
  year = {2026},
  note = {Fine-tuned from Qwen2.5-1.5B-Instruct using QLoRA},
}
