PrepareBuddy IELTS-4B (Qwen3.5)
A specialised content-generation model that produces IELTS Academic practice material across all four sections (Reading, Writing, Listening, Speaking) from a simple structured prompt. A fine-tune of Qwen3.5-4B (Apache-2.0), trained on PrepareBuddy's own curated IELTS content. This is the recommended, best-balanced model in the family.
A content generator, not an assessment tool. It writes passages, transcripts, tasks, questions and answer keys. It does not score student work. It is a fine-tune of Qwen3.5-4B, not a from-scratch foundation model.
Part of a 3-model study (2B / 4B / 9B): we trained three model sizes on identical data to ask "does data or size drive quality?" The honest finding (fine-tuning's benefit is inversely proportional to base capability; data can rival size on a target skill) is in the technical report. This 4B is the balanced pick: strong across types, no extreme weakness.
Where it fits best (real-world use cases)
The 4B is the default choice for most applications (one model, all sections, no extreme weakness):
- The backbone of a practice-test app: generates every section + full exam sections at reliable quality.
- Interactive tools & live demos: best quality-per-visible-defect; the demo Space defaults to it.
- A single mid-range GPU: ~9 GB bf16, or ~3 GB in 4-bit on a small GPU.
- Teacher / author tools: drafts mixed-type sections that need only light review.
Not the best pick for extreme cost/footprint sensitivity (→ 2B) or maximum-fidelity fact-heavy passages (→ 9B).
Pros & cons
| ✅ Pros | ⚠️ Cons |
|---|---|
| Most balanced: strong across all sections, no weak spot | ~8% verdict-logic slips (≈25% on varied from-scratch); verify |
| 100% completion answers in-passage; good facts | MCQ correct-answer letter can cluster; spread at serving |
| Strong Writing / Speaking / Listening / MCQ | From-scratch passages can still carry a wrong fact; ground |
| Reasonable size; great default; Apache-2.0 | Heavier than the 2B for edge/on-device use |
Links
- Models: ielts-2b · ielts-4b · ielts-9b
- Apple Silicon / LM Studio (MLX): ielts-2b-mlx · ielts-4b-mlx
- Try the live demo: Hugging Face Space
- Full technical report & findings: ielts-qwen3.5
What it generates
| Section | Types | Output |
|---|---|---|
| Reading | TFNG, YNNG, MCQ, Sentence/Summary Completion, Matching, Long-form | passage + questions + answer key with justifications |
| Writing | Task 1 (chart), Task 2 (essay) | task prompt + word limit + timing (+ chart data for T1) |
| Listening | dialogue/monologue | transcript + questions + answer key (text for downstream TTS) |
| Speaking | Part 1, 2, 3 | examiner question / cue card + model answer |
Prompt format (not a chat model: use the tag prefix)
<TEST=IELTS><SECTION=READING><TYPE=TFNG><DIFF=medium><TOPIC=ocean currents> Generate a short passage with 4 True/False/Not Given statements and an answer key.
- SECTION = READING | WRITING | LISTENING | SPEAKING
- TYPE: (Reading) TFNG | YNNG | MCQ | MCQ_MULTI | SENTENCE_COMPLETION | SUMMARY_COMPLETION | MATCHING_HEADINGS | MATCHING_FEATURES | MATCHING_ENDINGS | LONGFORM; (Writing) TASK1 | TASK2; (Speaking) PART1 | PART2 | PART3; (Listening) LISTENING
- DIFF = easy | medium | hard
- Serve with enable_thinking=False: for this task, reasoning mode lowers verdict accuracy.
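The tag prefix is easy to mistype, so a small helper can assemble and validate it. The following is a minimal sketch, not part of the model's tooling: the names `SECTION_TYPES`, `DIFFICULTIES` and `build_prompt` are hypothetical, and the allowed values are simply copied from the list above.

```python
# Allowed values copied from the tag list in this card (names here are illustrative).
SECTION_TYPES = {
    "READING": {"TFNG", "YNNG", "MCQ", "MCQ_MULTI", "SENTENCE_COMPLETION",
                "SUMMARY_COMPLETION", "MATCHING_HEADINGS", "MATCHING_FEATURES",
                "MATCHING_ENDINGS", "LONGFORM"},
    "WRITING": {"TASK1", "TASK2"},
    "SPEAKING": {"PART1", "PART2", "PART3"},
    "LISTENING": {"LISTENING"},
}
DIFFICULTIES = {"easy", "medium", "hard"}


def build_prompt(section, qtype, diff, topic, instruction):
    """Return the tag prefix followed by the free-text instruction."""
    section, qtype, diff = section.upper(), qtype.upper(), diff.lower()
    if section not in SECTION_TYPES:
        raise ValueError(f"unknown SECTION: {section}")
    if qtype not in SECTION_TYPES[section]:
        raise ValueError(f"TYPE {qtype} is not valid for SECTION {section}")
    if diff not in DIFFICULTIES:
        raise ValueError(f"unknown DIFF: {diff}")
    tags = f"<TEST=IELTS><SECTION={section}><TYPE={qtype}><DIFF={diff}><TOPIC={topic}>"
    return f"{tags} {instruction.strip()}"


if __name__ == "__main__":
    print(build_prompt("reading", "tfng", "medium", "ocean currents",
                       "Generate a short passage with 4 True/False/Not Given "
                       "statements and an answer key."))
```

Rejecting mismatched SECTION/TYPE pairs before calling the model keeps malformed prefixes out of the generation path.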
Examples (real, unedited outputs from this model; passages/transcripts trimmed with […])
⚠️ Not a chat model: it expects the <TEST=IELTS>… prefix. Reading examples below use the recommended grounded approach (generate against a real passage), which is how you get verified-correct answer keys; every key shown here was checked against its passage.
Writing: Task 2 (temp 0.7)
Input
<TEST=IELTS><SECTION=WRITING><TYPE=TASK2><DIFF=medium><TOPIC=whether university education should be free> Generate an IELTS Academic Writing Task 2 question with word limit and timing.
Output
TASK:
Some people believe that university education should be free for all students who wish to
pursue a degree, while others think this would result in very high costs.
To what extent do you agree or disagree?
WORD LIMIT: 250 words TIME: 40 minutes
Speaking: Part 2 (temp 0.7)
Input
<TEST=IELTS><SECTION=SPEAKING><TYPE=PART2><DIFF=medium><TOPIC=a memorable journey> Generate an IELTS Speaking Part 2 cue card with a model answer.
Output
CUE CARD:
Describe a journey you remember clearly. You should say:
- what the journey was and where it took place
- who you were with
- what you experienced along the way - and why it stood out
MODEL ANSWER:
The journey I'd like to talk about is my solo train trip across northern Italy, which took place
during a summer break when I was nineteen. Rather than flying, I took the regional train from
Milan northwards through the Alpine region […] What struck me most was how dramatically the
landscape changed over just four days - from dense urban sprawl to alpine valleys with snow-capped
peaks. Despite the language barrier in the small mountain villages, gestures and smiles made
communication surprisingly easy - it reminded me how much more meaningful travel becomes when you
step out of your comfort zone.
Listening: dialogue + questions (temp 0.7)
Input
<TEST=IELTS><SECTION=LISTENING><TYPE=LISTENING><DIFF=medium><TOPIC=booking a guided tour> Generate an IELTS Listening transcript with 4 questions and an answer key.
Output
TRANSCRIPT:
Staff: Good morning, welcome to the City Heritage Centre. I'm Leo from tours - how can we help?
Customer: We're a small family of four, interested in the Roman wall section.
Staff: For the Roman wall there's an afternoon tour leaving at one o'clock every day this week. It
runs about two hours and includes the wall, a Roman bathhouse, then the city centre. […] We have
discounted lunch vouchers with afternoon tickets - one hot meal at a partner restaurant, about
thirty-five pounds per person. […] The one o'clock tour runs Monday to Saturday, not Sundays. […]
QUESTIONS - Choose the correct letter, A, B or C.
1. What time does the Roman-wall afternoon tour leave each day? A. 10:30 B. 12:00 C. 1:00
2. The discounted lunch voucher covers … A. any food B. a hot meal only C. drinks only
[… 2 more …]
ANSWER KEY:
1. C - "an afternoon tour leaving at one o'clock every day this week."
2. B - "a hot meal at a partner restaurant."
Reading: True/False/Not Given (grounded, temp 0.3, keys verified)
Input
<TEST=IELTS><SECTION=READING><TYPE=TFNG><DIFF=medium><TOPIC=coral reefs> Using ONLY the passage below, write True/False/Not Given statements with an answer key. Do not write a new passage.
PASSAGE: Coral reefs … are built by tiny animals called polyps, which secrete calcium carbonate to form hard skeletons. Reefs grow only in warm, shallow, sunlit water … The Great Barrier Reef, off Australia, is the largest reef system in the world.
Output
QUESTIONS:
1. Coral reefs are built by tiny animals called polyps that secrete calcium carbonate.
2. Coral reefs grow in cold, deep water because the polyps depend on algae for food.
3. The Great Barrier Reef is the largest reef system on Earth.
ANSWER KEY:
1. TRUE - "built by tiny animals called polyps, which secrete calcium carbonate".
2. FALSE - the passage says reefs "grow only in warm, shallow, sunlit water", not cold deep water.
3. TRUE - "The Great Barrier Reef … is the largest reef system in the world".
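To re-check or store a generated key, the verdict lines first have to be pulled out of the raw output. The sketch below is one way to do that, assuming the output keeps the `ANSWER KEY:` heading and the `N. VERDICT <separator> justification` line shape shown in the example above; `parse_tfng_key` is a hypothetical helper name, not something shipped with the model.

```python
import re

# One key line: a question number, a verdict, then an optional separator
# (hyphen or dash) and a justification.
_KEY_LINE = re.compile(r"^\s*(\d+)\.\s*(TRUE|FALSE|NOT GIVEN)\b(.*)$", re.IGNORECASE)


def parse_tfng_key(output):
    """Return {question_number: (verdict, justification)} from the ANSWER KEY block."""
    _, found, key_block = output.partition("ANSWER KEY:")
    if not found:
        return {}
    parsed = {}
    for line in key_block.splitlines():
        match = _KEY_LINE.match(line)
        if match:
            number, verdict, rest = match.groups()
            # Trim the separator (hyphen, en dash, em dash or colon) and whitespace.
            justification = rest.strip(" \t-\u2013\u2014:")
            parsed[int(number)] = (verdict.upper(), justification)
    return parsed


if __name__ == "__main__":
    sample = 'ANSWER KEY:\n1. TRUE - "built by tiny animals called polyps".\n2. FALSE - not cold water.'
    print(parse_tfng_key(sample))
```

Only lines after the `ANSWER KEY:` heading are scanned, so numbered statements in the QUESTIONS block are not mistaken for verdicts.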
Reading: Multiple Choice (grounded, temp 0.3, key verified)
Output
QUESTION - Choose the correct letter, A, B, C or D.
Why do coral polyps depend on algae living in their tissues?
A. The algae provide calcium carbonate to build the skeleton.
B. Algae help polyps survive in cold, deep water.
C. The algae supply food to the polyps through photosynthesis.
D. Algae protect polyps from bleaching.
ANSWER KEY:
1. C - the passage states polyps "depend on algae living in their tissues for food"; the other options are not stated.
Reading: Sentence Completion (grounded, temp 0.3, answers in-passage)
Output
QUESTIONS โ Complete each sentence with ONE word from the passage.
1. Reefs grow only in warm, shallow water because the polyps depend on algae for _____. → food
2. When water becomes too warm, the polyps expel the algae and turn white, a process known as _____. → bleaching
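The "answers in-passage" property of completion items can be checked mechanically before a question is served. This is a minimal sketch assuming a simple whole-word, case-insensitive match is an acceptable test; `answers_in_passage` is a hypothetical helper, and the passage in the demo is a shortened stand-in rather than real model output.

```python
import re


def _normalise(text):
    """Lower-case and collapse whitespace so line wraps do not break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def answers_in_passage(passage, answers):
    """Map each answer to True if it occurs as whole word(s) in the passage."""
    haystack = _normalise(passage)
    result = {}
    for answer in answers:
        needle = _normalise(answer)
        # An empty answer can never be grounded in the passage.
        found = bool(needle) and re.search(rf"\b{re.escape(needle)}\b", haystack) is not None
        result[answer] = found
    return result


if __name__ == "__main__":
    passage = ("Polyps depend on algae for food. When water is too warm they expel "
               "the algae, a process known as bleaching.")
    print(answers_in_passage(passage, ["food", "bleaching", "photosynthesis"]))
```

Any answer that maps to `False` is a candidate for regeneration or manual review.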
Supported types per section โ and how to prompt each
| Section · Type | Prompt <TYPE=…> | Temp | What you get |
|---|---|---|---|
| Reading · True/False/Not Given | TFNG | 0.3 | passage + statements + key (TRUE/FALSE/NOT GIVEN) |
| Reading · Yes/No/Not Given | YNNG | 0.3 | opinion passage + statements + key |
| Reading · Multiple choice | MCQ / MCQ_MULTI | 0.3 | passage + A–D question(s) + key |
| Reading · Sentence/Summary completion | SENTENCE_COMPLETION / SUMMARY_COMPLETION | 0.3 | passage + gap items + key (words from passage) |
| Reading · Matching | MATCHING_HEADINGS / MATCHING_FEATURES / MATCHING_ENDINGS | 0.5 | passage + matching task + key (experimental) |
| Reading · Long-form | LONGFORM | 0.6 | ~600-word passage + mixed question types + key |
| Writing · Task 1 / Task 2 | TASK1 / TASK2 | 0.7 | task + word limit + timing (+ chart data for T1) |
| Speaking · Part 1/2/3 | PART1 / PART2 / PART3 | 0.7 | examiner question / cue card + model answer |
| Listening | LISTENING | 0.7 | transcript + questions + key |
Tip: for dependable Reading answer keys, generate grounded. Prepend a real passage and add "Using ONLY the passage below … Do not write a new passage." (as in the examples above).
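The per-type temperatures in the table can be kept in one lookup so that serving code never hard-codes them. A minimal sketch follows; `TEMPERATURE_BY_TYPE` and `sampling_kwargs` are hypothetical names, the values are copied from the table, and `top_p=0.9` and `max_new_tokens=900` mirror the usage snippet in this card. The returned dictionary uses the keyword names that `model.generate` accepts.

```python
# Temperatures copied from the "Supported types" table in this card.
TEMPERATURE_BY_TYPE = {
    "TFNG": 0.3, "YNNG": 0.3, "MCQ": 0.3, "MCQ_MULTI": 0.3,
    "SENTENCE_COMPLETION": 0.3, "SUMMARY_COMPLETION": 0.3,
    "MATCHING_HEADINGS": 0.5, "MATCHING_FEATURES": 0.5, "MATCHING_ENDINGS": 0.5,
    "LONGFORM": 0.6,
    "TASK1": 0.7, "TASK2": 0.7,
    "PART1": 0.7, "PART2": 0.7, "PART3": 0.7,
    "LISTENING": 0.7,
}


def sampling_kwargs(qtype, max_new_tokens=900):
    """Return sampling keyword arguments for the given TYPE tag."""
    temperature = TEMPERATURE_BY_TYPE[qtype.upper()]  # KeyError for unknown types
    return {
        "do_sample": True,
        "temperature": temperature,
        "top_p": 0.9,
        "max_new_tokens": max_new_tokens,
    }


if __name__ == "__main__":
    print(sampling_kwargs("TFNG"))
    print(sampling_kwargs("PART2"))
```

With a loaded model this would be used as `model.generate(**inputs, **sampling_kwargs("TFNG"))`.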
Generating a full exam section (one passage → all question types)
Don't ask for a whole section in one shot. Generate one passage, then each question type against it:
- Generate the passage: <…TYPE=LONGFORM…> Write ONLY a ~600-word IELTS reading passage. No questions.
- For each type: Using ONLY the passage below, write 5 TFNG statements with an answer key. Do not write a new passage.\nPASSAGE:\n<passage>
- Concatenate passage + each block → a real-exam-style section (e.g. TFNG ×5 + MCQ ×4 + Sentence completion ×4). This grounds every question in one source text, so facts stay consistent and keys are verifiable. (The demo Space does exactly this.)
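The three steps above can be sketched as a small orchestration function. This is an illustration under stated assumptions, not the demo Space's code: `generate` is a hypothetical callable that takes a prompt string and returns the model's text, the full tag prefix for the passage prompt is filled in following the card's tag format, and the instruction wording for MCQ and sentence completion is assumed by analogy with the TFNG prompt.

```python
# (TYPE tag, what to ask for) pairs; wording beyond the TFNG case is an assumption.
DEFAULT_BLOCKS = (
    ("TFNG", "5 True/False/Not Given statements"),
    ("MCQ", "4 multiple-choice questions"),
    ("SENTENCE_COMPLETION", "4 sentence-completion items"),
)


def grounded_prompt(qtype, diff, topic, instruction, passage):
    """Build a prompt that asks for one question type against a fixed passage."""
    tags = f"<TEST=IELTS><SECTION=READING><TYPE={qtype}><DIFF={diff}><TOPIC={topic}>"
    return (f"{tags} Using ONLY the passage below, write {instruction} with an answer key. "
            f"Do not write a new passage.\nPASSAGE:\n{passage}")


def build_section(generate, topic, diff="medium", blocks=DEFAULT_BLOCKS):
    """Generate one passage, then each question block against it, and join them."""
    passage_prompt = (
        f"<TEST=IELTS><SECTION=READING><TYPE=LONGFORM><DIFF={diff}><TOPIC={topic}> "
        "Write ONLY a ~600-word IELTS reading passage. No questions."
    )
    passage = generate(passage_prompt).strip()
    parts = [passage]
    for qtype, instruction in blocks:
        prompt = grounded_prompt(qtype, diff, topic, instruction, passage)
        parts.append(generate(prompt).strip())
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Stand-in generator so the sketch runs without a model.
    def echo(prompt):
        return "[passage]" if "TYPE=LONGFORM" in prompt else "[questions + key]"

    print(build_section(echo, "coral reefs"))
```

Passing `generate` in as a parameter keeps the orchestration independent of how the model is served (local `transformers`, an HTTP endpoint, or a stub in tests).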
Usage (transformers)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
repo = "preparebuddy/ielts-4b"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, dtype=torch.bfloat16, device_map="auto").eval()
SYSTEM = ("You generate authentic IELTS Academic practice content across reading, writing, "
          "listening, and speaking. Produce passages, transcripts, tasks, questions, and answer "
          "keys or model answers as appropriate to the section. Use IELTS-style register: "
          "academic, neutral, factually plausible. This is content generation, not assessment.")
user = "<TEST=IELTS><SECTION=READING><TYPE=TFNG><DIFF=medium><TOPIC=solar power> Generate a short passage with 4 True/False/Not Given statements and an answer key."
# Build the chat-formatted input; enable_thinking=False keeps reasoning mode off.
messages = [{"role": "system", "content": SYSTEM}, {"role": "user", "content": user}]
inp = tok.apply_chat_template(messages, add_generation_prompt=True, enable_thinking=False,
                              return_tensors="pt", return_dict=True).to(model.device)
# Sample at temperature 0.3 (a verdict-type task), then decode only the newly generated tokens.
out = model.generate(**inp, max_new_tokens=900, do_sample=True, temperature=0.3, top_p=0.9)
print(tok.decode(out[0][inp["input_ids"].shape[1]:], skip_special_tokens=True))
Settings: temp 0.3 for verdicts (TFNG/YNNG/MCQ), 0.7 for passages/writing/speaking; top_p 0.9; one SECTION+TYPE per call.
Recommended architecture for reliable output (important)
The model is a strong drafter. For dependable answer keys, run it as a system:
- Ground: generate questions against a real passage (facts come from the source, not the model).
- Verify: re-check each answer key with an independent judge (this 4B is itself a good verifier, ~74–79%) and flag disagreements.
- Review/regenerate the small flagged minority.
Measured end-to-end: raw grounded generation scores ≈ 75%, rising to ≈ 85–90% with this verify loop. A reference verify-and-flag loop is in the demo Space's app.py.
Strengths & honest limits (4B)
- ✅ Most balanced model in the family: Writing/Speaking/Listening/MCQ strong; completion answers 100% verbatim-in-passage; facts generally accurate.
- ⚠️ Verdict generation (TFNG/YNNG): ~8% of generated verdicts have logic slips on an easy set (≈25% on varied from-scratch passages), and from-scratch passages can include a wrong fact; review verdict items, or use grounding + verification.
- ⚠️ Minor: MCQ correct-answer letter can cluster; spread positions at serving.
- ✅ 0 non-English-token leak; controlled length.
- Listening/Speaking output is text (for downstream TTS); no audio. Not an assessment tool.
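"Spread positions at serving" can be done by shuffling the options of each MCQ and remapping the key letter before the question is shown. The following is a minimal sketch assuming the options have already been parsed into a list in A–D order; `shuffle_mcq` and `LETTERS` are hypothetical names.

```python
import random

LETTERS = "ABCD"


def shuffle_mcq(options, correct_letter, rng=None):
    """Shuffle option texts and return (new_options, new_correct_letter).

    options: option texts in their original A, B, C, D order.
    correct_letter: the letter of the correct option in that original order.
    """
    rng = rng or random.Random()
    if not 0 < len(options) <= len(LETTERS):
        raise ValueError("expected between 1 and 4 options")
    correct_index = LETTERS.index(correct_letter.upper())
    if correct_index >= len(options):
        raise ValueError("correct_letter does not point at an existing option")
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    # The correct option now sits wherever its original index landed in `order`.
    return shuffled, LETTERS[order.index(correct_index)]


if __name__ == "__main__":
    opts = ["provide calcium carbonate", "help in cold water",
            "supply food through photosynthesis", "protect from bleaching"]
    new_opts, key = shuffle_mcq(opts, "C")
    for letter, text in zip(LETTERS, new_opts):
        print(f"{letter}. {text}")
    print("Key:", key)
```

Because the shuffle happens after generation, it evens out the answer position without requiring any change to the prompt or the model.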
Training
LoRA fine-tune of Qwen3.5-4B (bf16; r16/α32; completion-only loss; enable_thinking=False; light recipe: 2 epochs, lr 1e-4) on 1,438 curated + balanced examples (including NOT GIVEN items in verdict types), trained on NVIDIA cloud GPUs (up to 48 GB); the resulting model runs on a laptop. Dataset not released (proprietary). Full method, hardware and results: technical report.
License
Apache-2.0, inheriting from Qwen3.5-4B. Free to use, modify, distribute (incl. commercially); retain attribution to the base model and PrepareBuddy.
The 2B / 4B / 9B family: pick the right one
| | ielts-2b | ielts-4b ⭐ | ielts-9b |
|---|---|---|---|
| Best for | cheapest; best verdict judge/verifier | balanced general use | best facts (from scratch) |
| Verdict accuracy (fine-tuned)¹ | 80% | 74% | 77% |
| Completion answers in-passage | ⚠️ 37% | ✅ 100% | ✅ 100% |
| Facts in from-scratch passages | weakest | good | ✅ best |
| MCQ answer-position | ok | ok | ⚠️ skews "B" |
| Size (bf16) | ~5 GB | ~9 GB | ~18 GB |
| Use with grounding | strongly | recommended | recommended |
¹ Greedy decoding, 101-item held-out gold set. Fine-tuning's benefit is inversely proportional to base capability: it transformed the 2B (+40) and was flat on the 4B/9B. Full method + findings + tables: technical report.
Getting better results: grounding + the re-checking loop (the biggest quality lever)
The model is a strong drafter. For reliable answer keys, run it as a small system; this matters more than picking a bigger model:
1. Ground: generate against a real passage so facts come from the source:
Using ONLY the passage below, write 4 True/False/Not Given statements with an answer key. Do NOT write a new passage.
PASSAGE: <your real passage>
2. Re-check (verify): independently re-judge each answer key and flag disagreements:
    for statement in generated_statements:
        verdict = judge(model, passage, statement)  # TRUE / FALSE / NOT GIVEN
        if verdict != generated_key[statement]:
            flag_for_review_or_regenerate(statement)  # the verifier catches ~75-80% of errors
A trained 2B catches ~75–80% of verdict errors as a verifier (cheap); any 4B+ works too. Training a small model makes it a far better checker (40% → 80%), not just a better generator.
3. Review / regenerate the flagged minority.
Measured: raw grounded generation ≈ 75%, lifted to ≈ 85–90% by this loop. It's a strong filter, not a magic fixer.
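A runnable version of the re-checking step might look like the sketch below. It is not the demo Space's implementation: `judge` is a hypothetical callable that takes `(passage, statement)` and returns the verifier model's raw text, and `parse_verdict` and `flag_disagreements` are illustrative names. A judge reply that contains no recognisable verdict is treated as a disagreement so that it gets reviewed rather than silently accepted.

```python
import re

_VERDICT = re.compile(r"\b(NOT GIVEN|TRUE|FALSE)\b")


def parse_verdict(raw):
    """Extract TRUE / FALSE / NOT GIVEN from free-form judge output, else None."""
    match = _VERDICT.search(raw.upper())
    return match.group(1) if match else None


def flag_disagreements(passage, statements, generated_key, judge):
    """Return one record per statement whose judged verdict differs from the key."""
    flagged = []
    for statement in statements:
        judged = parse_verdict(judge(passage, statement))
        generated = generated_key[statement].strip().upper()
        if judged != generated:
            flagged.append({"statement": statement,
                            "generated": generated,
                            "judged": judged})
    return flagged


if __name__ == "__main__":
    # Stand-in judge so the sketch runs without a model.
    canned = {"Reefs grow in cold water.": "Verdict: FALSE",
              "Polyps build reefs.": "TRUE"}
    key = {"Reefs grow in cold water.": "TRUE", "Polyps build reefs.": "TRUE"}
    flags = flag_disagreements("...", list(key), key, lambda p, s: canned[s])
    for item in flags:
        print(item)
```

Flagged items then go to human review or regeneration; unflagged items are not guaranteed correct, which is why the card describes the loop as a filter.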
Prompt tips
- Always use the tag prefix <TEST=IELTS><SECTION=…><TYPE=…><DIFF=…><TOPIC=…>: it's not a chat model.
- Temperature: 0.3 for verdicts (TFNG/YNNG/MCQ), 0.7 for passages/writing/speaking; top_p 0.9.
- enable_thinking=False: reasoning mode lowers verdict accuracy here.
- One SECTION+TYPE per call; build a full section by generating each type against one shared passage.


