Qwen3-1.7B Scientific Assistant

A fine-tuned version of Qwen3-1.7B trained to be a rigorous scientific reasoning partner. It prioritizes factual accuracy over user comfort, pushes back on incorrect claims, and avoids sycophantic responses.

Think Rick to your Morty — it'll help you, but it'll also tell you when you're wrong.

What makes this different

Most LLMs are RLHF-trained to please users. This model is trained to be correct. It was fine-tuned on datasets specifically curated to:

  • Reward honest, direct scientific responses
  • Penalize agreeable-but-wrong answers
  • Flip sycophantic training labels so the model learns to challenge false premises

Example behavior

User: A 40/100 on a maths test is a good score, right?

Model: No, 40/100 is a failing grade in virtually every grading system. It means you got less than half the questions right. You should identify which topics you're weakest on and work on those specifically before the next test.


User: Eating only meat is the healthiest diet possible.

Model: No. A meat-only diet lacks fiber, vitamin C, and several other micronutrients essential for long-term health. While meat provides protein and fat, the absence of fruits, vegetables, and grains creates documented deficiencies. No major nutrition authority recommends it.

Training details

Property               Value
--------------------   ------------------------------
Base model             Qwen/Qwen3-1.7B
Method                 QLoRA (r=16, lora_alpha=32)
Hardware               2× NVIDIA T4 (Kaggle free tier)
Training time          ~4.5 hours
Epochs                 2
Final eval loss        0.6786
Final token accuracy   82.79%
Max sequence length    1024
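
The training script isn't published, but a minimal sketch of a QLoRA setup matching the table above (using peft + bitsandbytes) looks like this. The target modules and dropout are assumptions, not published values:

import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA);
# float16 compute because T4 GPUs lack bfloat16 support
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Low-rank adapters matching the table: r=16, lora_alpha=32
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    lora_dropout=0.05,  # assumed
    task_type="CAUSAL_LM",
)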

Datasets used

  • ScienceQA — multimodal science Q&A (only the text portion was used)
  • OpenHermes 2.5 — filtered to scientific/technical content, sycophantic responses removed
  • Anthropic HH-RLHF — sycophantic labels flipped so the model prefers honest responses (see the sketch after this list)
  • TruthfulQA — penalizes "sounds right" over "is right"
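
The exact flipping logic isn't published, but the idea is straightforward: detect preference pairs where the "chosen" response is flattering rather than honest, and swap the labels. A minimal sketch assuming the HH-RLHF chosen/rejected format; the marker list is a hypothetical stand-in for whatever heuristic or classifier was actually used:

from datasets import load_dataset

# Hypothetical sycophancy markers; the real curation heuristic is not published
SYCOPHANTIC_MARKERS = ("you're absolutely right", "great point", "what a wonderful question")

def flip_if_sycophantic(example):
    # Swap chosen/rejected when the "preferred" response is flattery
    if any(m in example["chosen"].lower() for m in SYCOPHANTIC_MARKERS):
        example["chosen"], example["rejected"] = example["rejected"], example["chosen"]
    return example

hh = load_dataset("Anthropic/hh-rlhf", split="train")
hh = hh.map(flip_if_sycophantic)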

System prompt

This model was trained with the following system prompt baked in:

You are a rigorous scientific assistant. Prioritize accuracy over comfort.
If the user is wrong, say so clearly. No filler phrases. Be direct.

For best results, use this system prompt at inference time too.

Usage

HuggingFace (instant)

Chat with the model directly in the hosted inference widget on the model page; no setup required.

Ollama (easiest)

ollama run hf.co/KeiKurono/qwen3-scientific
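
To bake the recommended system prompt (see "System prompt" above) into Ollama, a Modelfile along these lines should work; the model name `sci` is just illustrative:

FROM hf.co/KeiKurono/qwen3-scientific
SYSTEM """You are a rigorous scientific assistant. Prioritize accuracy over comfort.
If the user is wrong, say so clearly. No filler phrases. Be direct."""

Then:

ollama create sci -f Modelfile
ollama run sci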

Python with llama-cpp-python

from llama_cpp import Llama

# Downloads the 4-bit GGUF from the Hub on first call, then caches it locally
llm = Llama.from_pretrained(
    repo_id="KeiKurono/qwen3-scientific",
    filename="qwen3-1.7b-scientific-q4_k_m.gguf",
    n_ctx=2048,
    verbose=False,
)

# Reuse the system prompt the model was trained with (see "System prompt" above)
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a rigorous scientific assistant. Prioritize accuracy over comfort. If the user is wrong, say so clearly. No filler phrases. Be direct."},
        {"role": "user",   "content": "Is the earth flat?"}
    ],
    max_tokens=512,
    temperature=0.7,
)
print(response['choices'][0]['message']['content'])

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "KeiKurono/qwen3-scientific",
    dtype=torch.bfloat16,  # use torch_dtype= on older transformers releases
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("KeiKurono/qwen3-scientific")

messages = [
    {"role": "system", "content": "You are a rigorous scientific assistant. Prioritize accuracy over comfort. If the user is wrong, say so clearly. No filler phrases. Be direct."},
    {"role": "user", "content": "Is the earth flat?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Limitations

  • 1.7B parameters — will hallucinate on highly specialized or niche topics
  • Text only — no image/vision capability
  • Not a replacement for actual scientific literature or expert consultation
  • Anti-sycophancy training is SFT-only (the DPO phase was skipped due to training-environment constraints), so some agreeable or flattering responses may still slip through

License

Apache 2.0 — free to use, modify, and distribute.
