Qwen3-4B-Islamic-Arabic-LoRA

LoRA adapter for Qwen3-4B, trained on Islamic Arabic Q&A. Apply it on top of the Qwen/Qwen3-4B base model.

This repository contains only the PEFT LoRA adapter weights (264 MB) produced by QLoRA fine-tuning of Qwen3-4B on 17,944 Islamic Arabic question-answer pairs. Load it on top of the unquantized or BitsAndBytes-quantized base model.

For direct inference without adapter management, use the fully merged model: NightPrince/Qwen3-4B-Islamic-Arabic.

Trained by Yahya Alnwsany (NightPrince) — 2026-05-05.


Model Variants

| Variant | Repo | Description |
|---|---|---|
| Merged FP16 | NightPrince/Qwen3-4B-Islamic-Arabic | Canonical merged model, FP16, ~7.6 GB — drop-in for transformers or vLLM |
| LoRA Adapter (this model) | NightPrince/Qwen3-4B-Islamic-Arabic-LoRA | PEFT adapter only, 264 MB — apply on top of Qwen/Qwen3-4B |
| INT4 Quantized | NightPrince/Qwen3-4B-Islamic-Arabic-INT4 | W4A16 compressed-tensors for fast vLLM serving, 2.5 GB |
| MLX 4-bit | NightPrince/Qwen3-4B-Islamic-Arabic-mlx-4Bit | Apple Silicon / MLX — native Mac inference, 4-bit quantized |
| GGUF | NightPrince/Qwen3-4B-Islamic-Arabic-GGUF | llama.cpp / Ollama / LM Studio — Q4_K_M (2.3 GB), Q8_0 (4.0 GB), F16 (7.5 GB) |
| Dataset | NightPrince/islamic-arabic-qa | 17,944 train / 2,101 val / 1,042 test — Islamic Arabic Q&A pairs |
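
vLLM can also serve this adapter directly on top of the base model, without merging. A minimal sketch, assuming a recent vLLM build with LoRA support (the rank-64 adapter requires raising max_lora_rank above the default):

from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Download the adapter locally; LoRARequest expects a filesystem path.
adapter_path = snapshot_download("NightPrince/Qwen3-4B-Islamic-Arabic-LoRA")

llm = LLM(model="Qwen/Qwen3-4B", enable_lora=True, max_lora_rank=64)
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

outputs = llm.chat(
    # "What is the ruling on Friday prayer for the traveler?"
    [{"role": "user", "content": "ما حكم صلاة الجمعة على المسافر؟"}],
    params,
    lora_request=LoRARequest("islamic-arabic", 1, adapter_path),
)
print(outputs[0].outputs[0].text)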

Usage

Load with 4-bit Quantization (Recommended, ~5 GB VRAM)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_model_id = "Qwen/Qwen3-4B"
adapter_id = "NightPrince/Qwen3-4B-Islamic-Arabic-LoRA"

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load base model in 4-bit
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

SYSTEM_PROMPT = (
    # English gloss: "You are a specialized Islamic scholar assistant. Answer
    # questions accurately based on the Noble Qur'an, the Prophetic Sunnah, and
    # classical Islamic jurisprudence. Cite sources where possible. Be concise
    # but comprehensive."
    "أنت مساعد عالم إسلامي متخصص. "
    "أجب على الأسئلة بدقة استناداً إلى القرآن الكريم والسنة النبوية والفقه الإسلامي الكلاسيكي. "
    "استشهد بالمصادر حيثما أمكن. كن موجزاً لكن شاملاً."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    # "What is the ruling on Friday prayer for the traveler?"
    {"role": "user", "content": "ما حكم صلاة الجمعة على المسافر؟"},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
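
Qwen3's chat template also exposes an enable_thinking flag. The card doesn't say whether this fine-tune was trained with thinking traces, so treat this as an optional knob to experiment with rather than a documented setting:

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # suppress <think> blocks; omit to keep the template default
)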

Load in FP16 (No Quantization, ~8 GB VRAM)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "Qwen/Qwen3-4B"
adapter_id = "NightPrince/Qwen3-4B-Islamic-Arabic-LoRA"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
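
To sanity-check what the adapter actually changes, PEFT's disable_adapter() context manager temporarily bypasses the LoRA weights so you can compare tuned and base outputs on the same prompt. A quick sketch, reusing the tokenized inputs from the 4-bit example above:

# Compare the fine-tuned answer with the raw base model's answer
with torch.no_grad():
    tuned = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    with model.disable_adapter():  # LoRA weights are bypassed inside this block
        base = model.generate(**inputs, max_new_tokens=256, do_sample=False)

prompt_len = inputs["input_ids"].shape[1]
print("tuned:", tokenizer.decode(tuned[0][prompt_len:], skip_special_tokens=True))
print("base: ", tokenizer.decode(base[0][prompt_len:], skip_special_tokens=True))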

Merge and Save Locally

If you want to merge the adapter into the base weights for faster inference (equivalent to the merged model):

# merge_and_unload() needs the base model loaded in fp16/bf16 (not 4-bit quantized);
# use the FP16 loading snippet above.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./Qwen3-4B-Islamic-Arabic-merged")
tokenizer.save_pretrained("./Qwen3-4B-Islamic-Arabic-merged")
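
The merged directory then loads like any standalone checkpoint, with no peft dependency at inference time:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "./Qwen3-4B-Islamic-Arabic-merged",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./Qwen3-4B-Islamic-Arabic-merged")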

Training Summary

This adapter was produced by QLoRA fine-tuning (r=64, α=128) of Qwen3-4B on the NightPrince/islamic-arabic-qa dataset over 3 epochs on 4× RTX 2080 Ti GPUs.

| Metric | Value |
|---|---|
| Final train loss | 1.8918 |
| Best eval loss | 2.4094 |
| Training duration | 7.59 hours |
| Trainable parameters | 132,120,576 (≈3.2% of 4.15B total) |

For full training details, hyperparameters, and evaluation results, see the main model card.


LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (r) | 64 |
| Alpha (α) | 128 |
| Dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Adapter size | ~264 MB |
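
For reference, these values correspond roughly to the following PEFT LoraConfig. This is a reconstruction from the table above, not the original training script; the bias and task_type settings are assumptions:

from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",            # assumption: not stated on the card
    task_type="CAUSAL_LM",  # assumption: standard for causal-LM fine-tuning
)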

Note

For easier inference without managing a base model + adapter, use the merged model NightPrince/Qwen3-4B-Islamic-Arabic. The adapter repo is intended for users who want maximum flexibility — e.g., experimenting with different merge strategies, running further fine-tuning, or applying the adapter on top of a locally modified base.
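
If further fine-tuning is your use case, PEFT can load this adapter with its weights trainable instead of frozen. A minimal sketch (training loop and data omitted):

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(
    base,
    "NightPrince/Qwen3-4B-Islamic-Arabic-LoRA",
    is_trainable=True,  # keep the LoRA weights trainable for continued training
)
model.print_trainable_parameters()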


Citation

@misc{alnwsany2026qwen3islamicarabic,
  author       = {Yahya Alnwsany},
  title        = {Qwen3-4B-Islamic-Arabic: QLoRA Fine-Tuning of Qwen3-4B on Islamic Arabic Q\&A},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic}},
  note         = {Base model: Qwen/Qwen3-4B. Dataset: NightPrince/islamic-arabic-qa.}
}

License

Apache 2.0 — consistent with the base model Qwen/Qwen3-4B.
