# Qwen3-1.7B Scientific Assistant
A fine-tuned version of Qwen3-1.7B trained to be a rigorous scientific reasoning partner. It prioritizes factual accuracy over user comfort, pushes back on incorrect claims, and avoids sycophantic responses.
Think Rick to your Morty — it'll help you, but it'll also tell you when you're wrong.
## What makes this different
Most LLMs are RLHF-trained to please users. This model is trained to be correct. It was fine-tuned on datasets specifically curated to:
- Reward honest, direct scientific responses
- Penalize agreeable-but-wrong answers
- Flip sycophantic training labels so the model learns to challenge false premises (a rough sketch follows this list)
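
The exact curation pipeline isn't published here, but label flipping on preference pairs might look roughly like this. The opener heuristics and field names below are assumptions for illustration, not the actual code used to train this model:

```python
# Illustrative label flip for preference pairs (heuristics and field
# names are assumptions, not the exact pipeline used for this model).
SYCOPHANTIC_OPENERS = ("you're absolutely right", "great point", "what a great question")

def flip_if_sycophantic(pair: dict) -> dict:
    """Swap chosen/rejected when the preferred answer opens by flattering
    the user, so the honest answer becomes the training target."""
    if pair["chosen"].strip().lower().startswith(SYCOPHANTIC_OPENERS):
        pair["chosen"], pair["rejected"] = pair["rejected"], pair["chosen"]
    return pair
```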
## Example behavior
**User:** A 40/100 on a maths test is a good score, right?

**Model:** No, 40/100 is a failing grade in virtually every grading system. It means you got less than half the questions right. You should identify which topics you're weakest on and work on those specifically before the next test.

**User:** Eating only meat is the healthiest diet possible.

**Model:** No. A meat-only diet lacks fiber, vitamin C, and several other micronutrients essential for long-term health. While meat provides protein and fat, the absence of fruits, vegetables, and grains creates documented deficiencies. No major nutrition authority recommends it.
## Training details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-1.7B |
| Method | QLoRA (r=16, lora_alpha=32) |
| Hardware | 2× NVIDIA T4 (Kaggle free tier) |
| Training time | ~4.5 hours |
| Epochs | 2 |
| Final eval loss | 0.6786 |
| Final token accuracy | 82.79% |
| Max sequence length | 1024 |
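
For reference, the adapter settings in the table map onto a peft/bitsandbytes setup roughly like the one below. Only `r` and `lora_alpha` come from this card; the quantization type, dropout, and target modules are assumptions, not confirmed values:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit base-model quantization for QLoRA; fp16 compute because
# T4 GPUs do not support bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Adapter config matching the table (r=16, lora_alpha=32); the
# remaining fields are typical defaults, not confirmed values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```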
## Datasets used
- ScienceQA — multimodal science Q&A (text-only portions used)
- OpenHermes 2.5 — filtered to scientific/technical content, with sycophantic responses removed (toy filter sketch below)
- Anthropic HH-RLHF — sycophantic preference labels flipped so honest responses are preferred
- TruthfulQA — penalizes answers that merely "sound right" over answers that are right
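
As a toy illustration of the OpenHermes filtering step — the keywords, field names, and matching logic are assumptions, not the actual curation criteria:

```python
# Toy keyword filter; the real curation criteria are not documented here.
SCIENCE_KEYWORDS = {"physics", "chemistry", "biology", "experiment", "hypothesis", "theorem"}

def is_scientific(example: dict) -> bool:
    """Keep a conversation if its first turn mentions a science keyword."""
    text = example["conversations"][0]["value"].lower()
    return any(kw in text for kw in SCIENCE_KEYWORDS)
```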
## System prompt
This model was trained with the following system prompt baked in:
```
You are a rigorous scientific assistant. Prioritize accuracy over comfort.
If the user is wrong, say so clearly. No filler phrases. Be direct.
```
For best results, use this system prompt at inference time too.
## Usage
### Ollama (easiest)
```bash
ollama run hf.co/KeiKurono/qwen3-scientific
```
### Python with llama-cpp-python
```python
from llama_cpp import Llama

# Download the quantized GGUF from the Hub and load it.
llm = Llama.from_pretrained(
    repo_id="KeiKurono/qwen3-scientific",
    filename="qwen3-1.7b-scientific-q4_k_m.gguf",
    n_ctx=2048,
    verbose=False,
)

# Use the recommended system prompt from this card.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a rigorous scientific assistant. Prioritize accuracy over comfort. If the user is wrong, say so clearly. No filler phrases. Be direct."},
        {"role": "user", "content": "Is the earth flat?"},
    ],
    max_tokens=512,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```
### Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "KeiKurono/qwen3-scientific",
    dtype=torch.bfloat16,  # older transformers versions use torch_dtype=
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("KeiKurono/qwen3-scientific")
```
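
Loading alone produces no text; a minimal generation sketch follows. The prompt comes from this card, while the generation settings are illustrative:

```python
messages = [
    {"role": "system", "content": "You are a rigorous scientific assistant. Prioritize accuracy over comfort. If the user is wrong, say so clearly. No filler phrases. Be direct."},
    {"role": "user", "content": "Is the earth flat?"},
]

# Build the chat-formatted prompt, generate, and decode only the new tokens.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```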
## Limitations
- 1.7B parameters — will hallucinate on highly specialized or niche topics
- Text only — no image/vision capability
- Not a replacement for actual scientific literature or expert consultation
- Anti-sycophancy training is SFT-only (DPO phase was skipped due to training environment constraints) — some complimentary responses may still occur
## License
Apache 2.0 — free to use, modify, and distribute.