Inelly 4.5

Model Description

Inelly 4.5 is a fine-tuned version of Qwen2.5-3B-Instruct, trained on a diverse mixture of conversational, reasoning, math, coding, and politeness data. It is designed to be a compact, friendly, and capable assistant that excels at step-by-step reasoning while maintaining a warm, polite conversational tone.

  • Developed by: bry
  • Base model: Qwen2.5-3B-Instruct
  • Fine-tuning method: QLoRA (4-bit NF4, rank 16)
  • Parameters: 3.09B (base) + ~4.2M trainable (LoRA adapters)
  • License: Apache 2.0 (inherited from Qwen2.5)

Intended Use

Inelly 4.5 is intended for:

  • Conversational AI – Natural, polite, helpful dialogue
  • Chain-of-Thought reasoning – Step-by-step problem solving
  • Math & Logic – Algebraic word problems, arithmetic, deductive reasoning
  • Code generation – Python functions with comments
  • General knowledge Q&A – Science, everyday facts, explanations
  • Creative writing – Short poems, comparisons, lists

Out of Scope

  • Not intended for production deployment without further safety evaluation
  • Safety alignment inherited from Qwen2.5 base; fine-tuning data did not include adversarial safety examples
  • May struggle with highly specialized domains (law, medicine, finance)

Training Data

Inelly 4.5 was fine-tuned for 1 epoch on ~5,700 samples drawn from:

| Dataset | Samples | Purpose |
|---|---|---|
| Bespoke-Stratos-35k | 2,500 | Chain-of-thought math & reasoning |
| OpenThoughts-114k | 2,000 | Code generation with reasoning |
| dolphin-r1 | 1,500 | General reasoning (DeepSeek-R1 distill) |
| OpenHermes | 2,000 | Diverse conversational data |
| HelpSteer2 | 1,000 | Helpful, polite response style |

All samples were deduplicated and reasoning-weighted (2x oversample for CoT examples). Maximum sequence length: 512 tokens.
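The deduplication and 2x reasoning weighting described above could look roughly like the sketch below. It is only an illustration: the `text` and `is_cot` field names, the whitespace/case normalization, and the `build_mixture` helper are assumptions, not the pipeline actually used for Inelly 4.5.

```python
import hashlib


def build_mixture(samples, cot_weight=2):
    """Drop duplicate samples, then repeat chain-of-thought samples cot_weight times.

    Assumes each sample is a dict with a "text" field and an optional
    boolean "is_cot" flag (hypothetical field names).
    """
    seen = set()
    mixture = []
    for sample in samples:
        # Normalize case and whitespace so trivially different copies collapse.
        normalized = " ".join(sample["text"].lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        # Oversample reasoning examples; keep everything else once.
        repeats = cot_weight if sample.get("is_cot", False) else 1
        mixture.extend([sample] * repeats)
    return mixture


samples = [
    {"text": "What is 2 + 2? Step 1: ...", "is_cot": True},
    {"text": "what is 2 + 2?   Step 1: ...", "is_cot": True},  # duplicate after normalization
    {"text": "Hello, how are you?", "is_cot": False},
]
mixture = build_mixture(samples)
print(len(mixture))
```

Deduplicating before oversampling matters here: doing it the other way round would remove the intentional repeats again.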


Training Hyperparameters

| Parameter | Value |
|---|---|
| Base model | Qwen2.5-3B-Instruct |
| Quantization | 4-bit NF4 (bitsandbytes) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 2e-4 |
| Batch size | 8 (gradient accumulation) |
| Epochs | 1 |
| Max seq length | 512 |
| Optimizer | AdamW 8-bit |
| LR scheduler | cosine |
| Warmup ratio | 0.05 |
| Training time | ~67 min |
| Hardware | RTX 2080 Ti (11GB VRAM) |
| Final training loss | ~0.30 |
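To make the learning-rate settings concrete, here is a minimal sketch of a linear-warmup plus cosine-decay schedule using the peak rate (2e-4) and warmup ratio (0.05) listed above. The exact scheduler implementation used in training is not stated in this card, and the step count is an assumption derived from ~5,700 samples at an effective batch size of 8 for one epoch.

```python
import math


def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_ratio=0.05):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: ceil(5700 / 8) optimizer steps in one epoch.
total_steps = math.ceil(5700 / 8)
for step in (0, 35, total_steps // 2, total_steps):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```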

Model Architecture

| Property | Value |
|---|---|
| Model type | Qwen2ForCausalLM |
| Hidden size | 2,048 |
| Layers | 36 |
| Attention heads | 16 |
| Head dim | 128 |
| Intermediate size | 11,008 |
| Vocab size | 151,936 |
| Context length | 32,768 |
| Total parameters | ~3.09B |
| Trainable parameters | ~4.2M (LoRA) |
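As a rough sanity check on why 4-bit quantization fits an 11GB card, the sketch below estimates the memory taken by the weights alone at different precisions, using the ~3.09B parameter count from the table. It is a back-of-the-envelope calculation: it ignores quantization constants, activations, LoRA adapters, and optimizer state, and `weight_memory_gib` is a helper introduced here for illustration.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


TOTAL_PARAMS = 3.09e9  # total parameter count from the architecture table

for label, bits in [("4-bit NF4", 4), ("float16", 16), ("float32", 32)]:
    print(f"{label}: {weight_memory_gib(TOTAL_PARAMS, bits):.2f} GiB")
```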

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights in half precision and let device_map place them on the available device(s).
model = AutoModelForCausalLM.from_pretrained("path/to/inelly-4.5", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("path/to/inelly-4.5")

# Build the prompt with the model's chat template.
messages = [{"role": "user", "content": "Explain why the sky is blue, step by step."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True makes the temperature and top_p settings take effect.
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)

Chat Format

Inelly 4.5 uses the Qwen2 chat template:

<|im_start|>system
You are Inelly 4.5, a helpful and polite assistant.<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
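For readers who want to see how a message list maps onto this layout, the following is a hand-rolled sketch of the format shown above. In practice `tokenizer.apply_chat_template` should be preferred, since the template shipped with the tokenizer may differ in details (for example, a default system prompt); `render_chatml` is a hypothetical helper, not part of any library.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into the ChatML-style layout shown above."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Inelly 4.5, a helpful and polite assistant."},
    {"role": "user", "content": "What is 12 * 7?"},
]
prompt = render_chatml(messages)
print(prompt)
```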

Performance

Informal testing across 8 categories (15 test prompts):

| Category | Result |
|---|---|
| Chain-of-Thought reasoning | ✅ Correct step-by-step logic |
| Math (algebra, word problems) | ✅ Accurate with work shown |
| Code generation | ✅ Clean, commented Python |
| Logic & deduction | ✅ Sound reasoning |
| General knowledge | ✅ Accurate explanations |
| Conversational ability | ✅ Polite, natural responses |
| Creative writing | ✅ Poems, lists, comparisons |
| Safety | ⚠️ Inherited from base; not specifically fine-tuned |

Limitations

  • Safety: The fine-tuning data did not include adversarial safety training. The model inherits Qwen2.5's base safety alignment, which is imperfect. It may occasionally follow harmful instructions.
  • Context length: Fine-tuned on 512-token sequences. Performance may degrade on longer contexts.
  • Coherence: As with most small models, very long or complex multi-step tasks may lose coherence.
  • Factual accuracy: May hallucinate facts, especially in specialized domains.

Other Models in the Inelly Family

| Model | Size | Focus |
|---|---|---|
| Inelly 4.5 (this model) | 3B | Conversation + politeness + CoT |
| Matrix 2 | 7B | Deep reasoning, math, coding |
| Inelly 4.5 Blaze | 1.5B | Compact reasoning |

Acknowledgments


Citation

@misc{inelly45,
  title = {Inelly 4.5: A Compact Conversational Model with Chain-of-Thought Reasoning},
  author = {GenueAI},
  year = {2026},
  note = {Fine-tuned from Qwen2.5-3B-Instruct using QLoRA},
}