Inelly 4.5 Blaze

Model Description

Inelly 4.5 Blaze is a fine-tuned version of Qwen2.5-1.5B-Instruct, trained on a focused mixture of chain-of-thought reasoning, math, coding, and general knowledge data. It is the compact, fast variant of the Inelly 4.5 family, optimized for quick inference while retaining strong reasoning capabilities.

  • Developed by: bry
  • Base model: Qwen2.5-1.5B-Instruct
  • Fine-tuning method: QLoRA (4-bit NF4, rank 16)
  • Parameters: 1.54B (base) + ~3.1M trainable (LoRA adapters)
  • License: Apache 2.0 (inherited from Qwen2.5)

Intended Use

Inelly 4.5 Blaze is intended for:

  • Chain-of-Thought reasoning – Step-by-step problem solving
  • Math – Algebra, arithmetic, word problems
  • Code generation – Python functions with clear logic
  • Logical deduction – Syllogisms, puzzles, multi-step reasoning
  • General knowledge Q&A – Science, everyday facts
  • Quick prototyping – Fast inference on consumer hardware

Out of Scope

  • Not intended for production deployment without further safety evaluation
  • Less conversational polish than the 3B variant (Inelly 4.5)
  • May struggle with very long or complex multi-step tasks

Training Data

Inelly 4.5 Blaze was fine-tuned for 1 epoch on ~5,225 samples drawn from:

| Dataset | Samples | Purpose |
|---|---|---|
| Bespoke-Stratos-35k | 3,000 | Chain-of-thought math & reasoning |
| OpenThoughts-114k | 2,500 | Code generation with reasoning |
| dolphin-r1 | 2,000 | General reasoning (DeepSeek-R1 distill) |

All samples were deduplicated and reasoning-weighted (2x oversample for CoT examples). Maximum sequence length: 512 tokens.
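
The deduplication and 2x chain-of-thought oversampling described above could look roughly like the sketch below. This is an illustration only, not the actual preprocessing script: the sample fields `text` and `is_cot`, the exact-match deduplication key, and the helper names are all assumptions.

```python
import hashlib
import random


def dedupe(samples):
    """Drop exact duplicates, keyed on lowercased, whitespace-normalized text."""
    seen = set()
    unique = []
    for sample in samples:
        normalized = " ".join(sample["text"].lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(sample)
    return unique


def oversample_cot(samples, factor=2):
    """Repeat each chain-of-thought sample `factor` times; others appear once."""
    out = []
    for sample in samples:
        out.extend([sample] * (factor if sample["is_cot"] else 1))
    return out


def build_mixture(samples, factor=2, seed=0):
    """Deduplicate, oversample CoT examples, then shuffle reproducibly."""
    mixed = oversample_cot(dedupe(samples), factor)
    random.Random(seed).shuffle(mixed)
    return mixed
```

Deduplicating before oversampling matters here: in the reverse order, the deliberate CoT repeats would be removed again as duplicates.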


Training Hyperparameters

| Parameter | Value |
|---|---|
| Base model | Qwen2.5-1.5B-Instruct |
| Quantization | 4-bit NF4 (bitsandbytes) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 2e-4 |
| Batch size | 8 (gradient accumulation) |
| Epochs | 1 |
| Max seq length | 512 |
| Optimizer | AdamW 8-bit |
| LR scheduler | cosine |
| Warmup ratio | 0.05 |
| Training time | ~35 min |
| Hardware | RTX 2080 Ti (11GB VRAM) |
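
To make the learning-rate settings concrete, here is a minimal sketch of a cosine schedule with linear warmup using the values listed above (peak 2e-4, warmup ratio 0.05). It assumes that 8 is the effective batch size and that one epoch covers roughly 5,225 samples; the function name `lr_at` and the rounding of the warmup step count are illustrative choices, not taken from the training script.

```python
import math


def lr_at(step, total_steps, base_lr=2e-4, warmup_ratio=0.05):
    """Learning rate at optimizer step `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: ~5,225 samples, effective batch size 8, 1 epoch.
total_steps = math.ceil(5225 / 8)

for step in (0, total_steps // 2, total_steps):
    print(step, lr_at(step, total_steps))
```

Under these assumptions the run is short (a few hundred optimizer steps), so the warmup phase only spans a few dozen steps before the decay begins.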

Model Architecture

| Property | Value |
|---|---|
| Model type | Qwen2ForCausalLM |
| Hidden size | 1,536 |
| Layers | 28 |
| Attention heads | 12 |
| Head dim | 128 |
| Intermediate size | 8,960 |
| Vocab size | 151,936 |
| Context length | 32,768 |
| Total parameters | ~1.54B |
| Trainable parameters | ~3.1M (LoRA) |
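
The trainable-parameter count follows from the LoRA rank and the shapes of the adapted layers: each adapted linear layer with input size d_in and output size d_out adds rank × (d_in + d_out) parameters. The sketch below applies that formula to the attention projections. The card does not say which modules were adapted, and the number of key/value heads (2) is an assumption not listed in the table above, so the totals are illustrative rather than a reproduction of the ~3.1M figure.

```python
def lora_params(d_in, d_out, rank):
    """LoRA adds A (rank x d_in) and B (d_out x rank) to each adapted linear layer."""
    return rank * (d_in + d_out)


HIDDEN = 1536      # hidden size, from the table above
HEAD_DIM = 128     # head dim, from the table above
NUM_LAYERS = 28    # layers, from the table above
NUM_KV_HEADS = 2   # assumption: grouped-query attention with 2 KV heads
RANK = 16          # LoRA rank, from the hyperparameters

# (d_in, d_out) for each attention projection
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def total_lora_params(target_modules, rank=RANK):
    """Total LoRA parameters across all layers for the chosen target modules."""
    per_layer = sum(lora_params(*SHAPES[name], rank) for name in target_modules)
    return per_layer * NUM_LAYERS


for targets in (["q_proj", "v_proj"], ["q_proj", "k_proj", "v_proj", "o_proj"]):
    print(targets, total_lora_params(targets))
```

The total scales linearly with the rank and with the number of adapted modules, so the choice of target modules changes the count considerably.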

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in half precision and place it on the available device(s)
model = AutoModelForCausalLM.from_pretrained("path/to/inelly-4.5-blaze", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("path/to/inelly-4.5-blaze")

# Build the prompt with the model's chat template
messages = [{"role": "user", "content": "Solve for x: 3x + 7 = 22. Show all steps."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True ensures temperature and top_p are applied
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)

Performance

Informal GPU testing, by category:

| Category | Result |
|---|---|
| Chain-of-Thought reasoning | ✅ Correct step-by-step logic |
| Math | ✅ Accurate algebraic solutions |
| Code generation | ✅ Clean Python with comments |
| Logic puzzles | ✅ Sound deductive reasoning |
| General knowledge | ✅ Accurate, clear explanations |
| Speed | ✅ ~1-2s per response (faster than 3B/7B) |

Inelly 4.5 Family Comparison

| Model | Size | Focus | Training Data |
|---|---|---|---|
| Inelly 4.5 | 3B | Conversation + CoT | 5,700 samples (incl. politeness, conv) |
| Inelly 4.5 Blaze (this) | 1.5B | Fast reasoning + CoT | 5,225 samples (reasoning-focused) |
| Matrix 2 | 7B | Deep reasoning | 5,225 samples (reasoning-focused) |

When to use Blaze vs standard 4.5:

  • Blaze – When you need fast reasoning, math, or coding help and don't need conversational polish
  • 4.5 (3B) – When you want a friendly, polite conversationalist that can also reason

Limitations

  • Conversational ability: Less polished in casual chat compared to the 3B variant (no conversational fine-tuning data)
  • Safety: Inherited from Qwen2.5 base; not specifically safety-tuned
  • Context length: Fine-tuned on 512-token sequences
  • Factual accuracy: May hallucinate in specialized domains

Acknowledgments


Citation

@misc{inelly45blaze,
  title = {Inelly 4.5 Blaze: A Compact Chain-of-Thought Reasoning Model},
  author = {Genue},
  year = {2026},
  note = {Fine-tuned from Qwen2.5-1.5B-Instruct using QLoRA},
}