Inelly 4.5 Blaze is a fine-tuned version of Qwen2.5-1.5B-Instruct, trained on a focused mixture of chain-of-thought reasoning, math, coding, and general knowledge data. It is the compact, fast variant of the Inelly 4.5 family -- optimized for quick inference while retaining strong reasoning capabilities.
Inelly 4.5 Blaze is intended for:
Inelly 4.5 Blaze was fine-tuned for 1 epoch on ~5,225 samples drawn from:
| Dataset | Samples | Purpose |
|---|---|---|
| Bespoke-Stratos-35k | 3,000 | Chain-of-thought math & reasoning |
| OpenThoughts-114k | 2,500 | Code generation with reasoning |
| dolphin-r1 | 2,000 | General reasoning (DeepSeek-R1 distill) |
All samples were deduplicated and reasoning-weighted (2x oversample for CoT examples). Maximum sequence length: 512 tokens.
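
The deduplication and 2x chain-of-thought oversampling described above can be sketched roughly as follows. This is a minimal illustration, not the actual preprocessing script: the `text` and `is_cot` field names, the whitespace/case normalization, and the exact-match hashing are all assumptions made for the example.

```python
import hashlib


def dedup_and_oversample(samples, cot_factor=2):
    """Drop duplicate texts, then repeat chain-of-thought samples.

    `samples` is a list of dicts with a hypothetical "text" field and an
    optional hypothetical "is_cot" flag; neither name comes from the card.
    """
    seen = set()
    mixed = []
    for sample in samples:
        # Normalize case and whitespace so trivially different copies collide.
        normalized = " ".join(sample["text"].lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        # CoT samples appear cot_factor times; everything else appears once.
        repeats = cot_factor if sample.get("is_cot") else 1
        mixed.extend([sample] * repeats)
    return mixed


if __name__ == "__main__":
    demo = [
        {"text": "Solve 2x = 4 step by step.", "is_cot": True},
        {"text": "solve 2x = 4   step by step.", "is_cot": True},  # duplicate
        {"text": "What is the capital of France?", "is_cot": False},
    ]
    print(len(dedup_and_oversample(demo)))
```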
| Parameter | Value |
|---|---|
| Base model | Qwen2.5-1.5B-Instruct |
| Quantization | 4-bit NF4 (bitsandbytes) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 2e-4 |
| Batch size | 8 (gradient accumulation) |
| Epochs | 1 |
| Max seq length | 512 |
| Optimizer | AdamW 8-bit |
| LR scheduler | cosine |
| Warmup ratio | 0.05 |
| Training time | ~35 min |
| Hardware | RTX 2080 Ti (11GB VRAM) |
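
To make the learning-rate settings in the table concrete, here is a small sketch of a cosine schedule with linear warmup using the listed peak rate (2e-4) and warmup ratio (0.05). It assumes an effective batch size of 8 and one optimizer step per batch over ~5,225 samples; the real step count may differ, for example if oversampling changes the number of training examples. The decay to zero mirrors a common cosine-with-warmup formulation and is not confirmed by the card.

```python
import math

PEAK_LR = 2e-4        # from the table
WARMUP_RATIO = 0.05   # from the table
SAMPLES = 5225        # approximate sample count from the card
EFFECTIVE_BATCH = 8   # assumption: 8 is the effective batch size

# One epoch, so the total step count is the number of batches.
total_steps = math.ceil(SAMPLES / EFFECTIVE_BATCH)
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at a given optimizer step (linear warmup, cosine decay)."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, warmup_steps, total_steps // 2, total_steps):
        print(step, f"{lr_at(step):.2e}")
```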
| Property | Value |
|---|---|
| Model type | Qwen2ForCausalLM |
| Hidden size | 1,536 |
| Layers | 28 |
| Attention heads | 12 |
| Head dim | 128 |
| Intermediate size | 8,960 |
| Vocab size | 151,936 |
| Context length | 32,768 |
| Total parameters | ~1.54B |
| Trainable parameters | ~3.1M (LoRA) |
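
The figures in the table can be cross-checked with a little arithmetic. The sketch below verifies that attention heads times head dimension equals the hidden size, and shows how the LoRA parameter count depends on which projections are adapted: a rank-r adapter on a linear layer of shape d_in x d_out adds r * (d_in + d_out) parameters. The card does not say which modules were targeted, and the number of key/value heads (2) is not listed in the table, so both are assumptions here.

```python
HIDDEN = 1536   # from the table
LAYERS = 28     # from the table
HEADS = 12      # from the table
HEAD_DIM = 128  # from the table
RANK = 16       # LoRA rank from the training table
KV_HEADS = 2    # assumption: grouped-query attention, not stated in the card

# Sanity check: 12 heads x 128 dims per head = 1,536 hidden size.
assert HEADS * HEAD_DIM == HIDDEN

KV_DIM = KV_HEADS * HEAD_DIM

# (d_in, d_out) of each attention projection under the assumptions above.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(d_in, d_out, rank=RANK):
    """Parameters added by one adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def trainable(target_modules):
    """Total LoRA parameters across all layers for the chosen modules."""
    return LAYERS * sum(lora_params(*SHAPES[name]) for name in target_modules)


if __name__ == "__main__":
    for targets in (["q_proj", "v_proj"],
                    ["q_proj", "k_proj", "v_proj"],
                    ["q_proj", "k_proj", "v_proj", "o_proj"]):
        print(targets, f"{trainable(targets):,}")
```

Under these assumptions, adapting `q_proj`, `k_proj`, and `v_proj` gives roughly 3.0M parameters, which is in the same range as the ~3.1M reported in the table; other target sets give noticeably different totals.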
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from a local path or Hub ID.
model = AutoModelForCausalLM.from_pretrained("path/to/inelly-4.5-blaze", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("path/to/inelly-4.5-blaze")

# Format the conversation with the model's chat template.
messages = [{"role": "user", "content": "Solve for x: 3x + 7 = 22. Show all steps."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sampling must be enabled for temperature and top_p to take effect.
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
Informal GPU testing across 8 categories:
| Category | Result |
|---|---|
| Chain-of-Thought reasoning | Correct step-by-step logic |
| Math | Accurate algebraic solutions |
| Code generation | Clean Python with comments |
| Logic puzzles | Sound deductive reasoning |
| General knowledge | Accurate, clear explanations |
| Speed | ~1-2s per response (faster than 3B/7B) |
| Model | Size | Focus | Training Data |
|---|---|---|---|
| Inelly 4.5 | 3B | Conversation + CoT | 5,700 samples (incl. politeness, conv) |
| Inelly 4.5 Blaze (this) | 1.5B | Fast reasoning + CoT | 5,225 samples (reasoning-focused) |
| Matrix 2 | 7B | Deep reasoning | 5,225 samples (reasoning-focused) |
When to use Blaze vs standard 4.5:
```bibtex
@misc{inelly45blaze,
  title  = {Inelly 4.5 Blaze: A Compact Chain-of-Thought Reasoning Model},
  author = {Genue},
  year   = {2026},
  note   = {Fine-tuned from Qwen2.5-1.5B-Instruct using QLoRA},
}
```