PrepareBuddy IELTS-3B

A specialized content-generation model that produces IELTS Academic practice material across all four sections — Reading, Writing, Listening, Speaking — from a simple structured prompt. It is a fine-tune of SmolLM3-3B (Apache-2.0), trained on PrepareBuddy's own curated IELTS content.

This is a content generator, not an assessment tool. It writes passages, transcripts, tasks, questions and answer keys. It does not score student work — scoring is intentionally out of scope.

Built by PrepareBuddy. A fine-tune of SmolLM3-3B — not a from-scratch foundation model.

Run it anywhere:

  • 🖥️ LM Studio / Ollama / llama.cpp — the GGUF build (Q8_0, runs on Mac / Windows / Linux). Easiest for most people.
  • 🍎 Apple Silicon (MLX) — the MLX build (mlx-lm, or LM Studio's MLX runtime).
  • 🐍 In code: transformers (this repo); see Usage below.
  • 🌐 In your browser — the demo Space (free tier — may be slow).

What it generates

| Section | Types | Output |
|---|---|---|
| Reading | TFNG, YNNG, MCQ, Sentence/Summary Completion, Matching*, Long-form | passage + questions + answer key with justifications |
| Writing | Task 1 (chart), Task 2 (essay) | task prompt + word limit + timing |
| Listening | dialogue/monologue | transcript + questions + answer key (text for downstream TTS) |
| Speaking | Part 1, 2, 3 | examiner question / cue card + model answer |

* Matching is experimental — see Limitations.

Prompt format

Conditioned on a structured tag prefix + a short instruction:

<TEST=IELTS><SECTION=READING><TYPE=TFNG><DIFF=medium><TOPIC=ocean currents> Generate a short passage with 4 True/False/Not Given statements and an answer key.
  • SECTION = READING | WRITING | LISTENING | SPEAKING
  • TYPE (Reading) = TFNG | YNNG | MCQ | MCQ_MULTI | SENTENCE_COMPLETION | SUMMARY_COMPLETION | MATCHING_HEADINGS | MATCHING_FEATURES | MATCHING_ENDINGS | LONGFORM
  • TYPE (Writing) = TASK1 | TASK2; (Speaking) = PART1 | PART2 | PART3; (Listening) = LISTENING
  • DIFF = easy | medium | hard
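
The tag prefix above can be assembled programmatically rather than typed by hand. The following is a minimal sketch, not part of the model's own tooling: `build_prompt` and `ALLOWED_TYPES` are hypothetical names, and the allowed values are copied from the lists in this card (MCQ_MULTI is taken from the reading reliability table).

```python
# Hypothetical helper: builds the structured prefix described in this card.
ALLOWED_TYPES = {
    "READING": {"TFNG", "YNNG", "MCQ", "MCQ_MULTI", "SENTENCE_COMPLETION",
                "SUMMARY_COMPLETION", "MATCHING_HEADINGS", "MATCHING_FEATURES",
                "MATCHING_ENDINGS", "LONGFORM"},
    "WRITING": {"TASK1", "TASK2"},
    "SPEAKING": {"PART1", "PART2", "PART3"},
    "LISTENING": {"LISTENING"},
}
ALLOWED_DIFFS = {"easy", "medium", "hard"}


def build_prompt(section: str, type_: str, diff: str, topic: str, instruction: str) -> str:
    """Return '<TEST=IELTS><SECTION=..><TYPE=..><DIFF=..><TOPIC=..> instruction'."""
    section, type_, diff = section.upper(), type_.upper(), diff.lower()
    if section not in ALLOWED_TYPES:
        raise ValueError(f"unknown SECTION: {section}")
    if type_ not in ALLOWED_TYPES[section]:
        raise ValueError(f"TYPE {type_} is not listed for SECTION {section}")
    if diff not in ALLOWED_DIFFS:
        raise ValueError(f"unknown DIFF: {diff}")
    return (f"<TEST=IELTS><SECTION={section}><TYPE={type_}>"
            f"<DIFF={diff}><TOPIC={topic}> {instruction}")
```

Rejecting an unlisted SECTION/TYPE pair before calling the model avoids spending a generation on a prompt the model was presumably not trained to follow.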

Supported types per section — and how to generate each

Reading — set <TYPE=...> to one of these (direct, one call → passage + that question type):

| IELTS question type | TYPE token | Reliability |
|---|---|---|
| True / False / Not Given | TFNG | good (review verdicts) |
| Yes / No / Not Given | YNNG | good (review verdicts) |
| Multiple choice (single answer) | MCQ | strong |
| Multiple choice (choose two) | MCQ_MULTI | ok |
| Sentence completion | SENTENCE_COMPLETION | good |
| Summary completion | SUMMARY_COMPLETION | ok |
| Matching headings | MATCHING_HEADINGS | experimental |
| Matching features | MATCHING_FEATURES | experimental |
| Matching sentence endings | MATCHING_ENDINGS | experimental |
| Full passage, mixed question types | LONGFORM | variable |

<TEST=IELTS><SECTION=READING><TYPE=MCQ><DIFF=medium><TOPIC=the printing press> Generate a short passage followed by one multiple-choice question (A-D) with an answer key.

Listening — uses a single token, <TYPE=LISTENING>, which produces a transcript + questions (there are no per-question-type tokens for listening). To control the question type, generate the transcript first, then ask for the type you want against it (this is "exam mode"):

Step 1: <TEST=IELTS><SECTION=LISTENING><TYPE=LISTENING><DIFF=medium><TOPIC=booking a tour> Write only the transcript (label each speaker). No questions.
Step 2: Using ONLY the transcript below, write 5 multiple-choice questions (A-C) with an answer key. <transcript>

Supported listening question types this way: multiple choice, sentence / note / form completion, matching. (Map/plan labelling and audio are out of scope — output is text for downstream TTS.)
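
The two-step "exam mode" flow can be wrapped in one function. This is a sketch under assumptions: `generate` is a hypothetical callable that takes a prompt string and returns the model's text (for example, a thin wrapper around the transformers snippet in Usage), and `listening_exam_mode` is an illustrative name, not something shipped with the model.

```python
from typing import Callable, Dict


def listening_exam_mode(generate: Callable[[str], str], topic: str,
                        question_request: str, diff: str = "medium") -> Dict[str, str]:
    """Step 1: transcript only. Step 2: questions grounded in that transcript."""
    step1 = (f"<TEST=IELTS><SECTION=LISTENING><TYPE=LISTENING><DIFF={diff}><TOPIC={topic}> "
             "Write only the transcript (label each speaker). No questions.")
    transcript = generate(step1).strip()

    # question_request is free text such as "5 multiple-choice questions (A-C)".
    step2 = (f"Using ONLY the transcript below, write {question_request} "
             f"with an answer key.\n\n{transcript}")
    questions = generate(step2).strip()
    return {"transcript": transcript, "questions": questions}
```

Calling it with `question_request="5 multiple-choice questions (A-C)"` reproduces the Step 2 wording shown above; swapping in a completion or matching request reuses the same transcript.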

Writing: <TYPE=TASK1> (Academic Task 1: describe a chart/graph) · <TYPE=TASK2> (Task 2 essay).

Speaking: <TYPE=PART1> (interview) · <TYPE=PART2> (cue card / long turn) · <TYPE=PART3> (discussion).

Whole section at once (one passage/transcript → all question types, or all parts/tasks): see Generating a full exam section below; a working implementation lives in the demo Space's app.py (preparebuddy/ielts-3b-demo).

Examples (input → output)

⚠️ Not a chat model. It expects the structured <TEST=IELTS>... prefix — plain chat questions ("make me a test") give weaker results. The examples below are real, unedited outputs (passages/transcripts trimmed with […] for length).

Reading — True/False/Not Given (temp 0.3)

Input

<TEST=IELTS><SECTION=READING><TYPE=TFNG><DIFF=medium><TOPIC=the Sahara Desert> Generate a short passage with 4 True/False/Not Given statements and an answer key.

Output

PASSAGE:
The Sahara Desert is the largest hot desert in the world, covering approximately 9 million
square kilometres across North Africa. It is divided into two main regions: the Western Sahara
and the Eastern Sahara. The Western Sahara is generally cooler and receives more rainfall than
the Eastern Sahara, which is hotter and drier. The Sahara is home to several archaeological
sites, including the ancient city of Timbuktu, once a major centre of learning and trade.

Statements:
1. The Sahara Desert is the largest hot desert in the world.
2. The Sahara Desert covers less than 5 million square kilometres.
3. The Eastern Sahara is cooler and receives more rainfall than the Western Sahara.
4. Archaeological sites in the Sahara include the ancient city of Timbuktu.

ANSWER KEY (with justification):
1. TRUE  — The passage states the Sahara is "the largest hot desert in the world".
2. FALSE — The passage states it covers "approximately 9 million square kilometres".
3. FALSE — The passage says the Western Sahara (not the Eastern) is cooler and wetter.
4. TRUE  — The passage names Timbuktu as an archaeological site in the Sahara.

(A real mix of TRUE/FALSE here; NOT GIVEN is the model's least reliable verdict — see Limitations.)

Reading — Multiple Choice (temp 0.3)

Input

<TEST=IELTS><SECTION=READING><TYPE=MCQ><DIFF=medium><TOPIC=the discovery of DNA> Generate a short passage followed by one multiple-choice question (A-D) with an answer key.

Output

PASSAGE:
[...] In 1953, James Watson and Francis Crick proposed a double-helix model for DNA [...]
Rosalind Franklin's X-ray diffraction images, particularly Photo 51, provided critical
evidence that supported the double-helix model. [...]

QUESTIONS — Choose the correct letter, A, B, C or D.
1. Which scientist's X-ray diffraction images provided critical evidence for the double-helix model?
   A. Rosalind Franklin   B. James Watson   C. Francis Crick   D. Maurice Wilkins

ANSWER KEY (with justification):
1. A — Rosalind Franklin's X-ray diffraction images, particularly Photo 51, provided the critical evidence.

Writing — Task 2 (temp 0.7)

Input

<TEST=IELTS><SECTION=WRITING><TYPE=TASK2><DIFF=medium><TOPIC=whether governments should fund the arts> Generate an IELTS Academic Writing Task 2 question with word limit and timing.

Output

TASK:
Some people think that governments should fund the arts, while others believe that
private funding is sufficient.
Do you agree or disagree?

WORD LIMIT: 250 words   TIME: 40 minutes

Writing — Task 1 (temp 0.7)

Input

<TEST=IELTS><SECTION=WRITING><TYPE=TASK1><DIFF=medium><TOPIC=a chart of city populations> Generate an IELTS Academic Writing Task 1 question describing a chart, with word limit and timing.

Output

TASK:
The graph below shows the population growth (in millions) of four cities — A, B, C, and D —
between 2010 and 2020. Summarise the information by selecting and reporting the main
features, and make comparisons where relevant.

CHART DESCRIPTION:
A line graph showing population figures (in millions) from 2010 to 2020 [...] City A
starts at 6.1 million and reaches 7.3 million [...] City D falls from 2.1 to 1.5 million.

WORD LIMIT: 150 words   TIME: 20 minutes

Speaking — Part 2 (temp 0.7)

Input

<TEST=IELTS><SECTION=SPEAKING><TYPE=PART2><DIFF=medium><TOPIC=a skill you would like to learn> Generate an IELTS Speaking Part 2 cue card with a model answer.

Output

EXAMINER QUESTION:
What's one skill you'd like to learn in your free time?

MODEL ANSWER:
I'd love to learn photography. There's something incredibly satisfying about capturing a
moment with a camera [...] I'd love to learn the finer details of composition, lighting,
and editing software to really develop my skill.

Listening (temp 0.7)

Input

<TEST=IELTS><SECTION=LISTENING><TYPE=LISTENING><DIFF=medium><TOPIC=registering for a library membership> Generate an IELTS Listening transcript with 4 questions and an answer key.

Output

TRANSCRIPT:
Customer Service Representative: [...] welcome to the library. How may I assist you?
Student: I'd like to register for a library membership. [...]
CSR: [...] fill out a form [...] full name, email address, home phone number, mobile
phone number, and a brief statement about how you plan to use the library. [...]
CSR: [...] library membership in the United Kingdom is free of charge. [...]

QUESTIONS
1. What did the representative ask the student to do to register?
   A. Call a friend  B. Read a book  C. Fill out a form  D. Attend a workshop
[... 3 more ...]

ANSWER KEY (with justification):
1. C — The representative asked the student to fill out a form. [...]
4. A — Library membership in the United Kingdom is free of charge.

Matching (experimental)

<TYPE=MATCHING_HEADINGS|MATCHING_FEATURES|MATCHING_ENDINGS> produce a passage + a statements-to-options matching task, but are not reliable — they may truncate or mis-format. Generate with a validate-and-retry loop and review the result. See Limitations.
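
A validate-and-retry loop for the fragile types might look like the sketch below. It is not the demo Space's implementation. `generate` is a hypothetical prompt-to-text callable, and the structural check assumes the `PASSAGE:` and `ANSWER KEY` headers seen in the example outputs above; matching outputs may be laid out differently, so the check should be adjusted to what the model actually emits.

```python
import re
from typing import Callable, Optional


def looks_complete(text: str, min_items: int = 3) -> bool:
    """Cheap structural check: both headers present and enough numbered key entries."""
    if "PASSAGE:" not in text or "ANSWER KEY" not in text:
        return False
    key_part = text.split("ANSWER KEY", 1)[1]
    # Count lines in the answer key that start with "1. ", "2. ", ...
    entries = re.findall(r"^\s*\d+\.\s+\S", key_part, flags=re.MULTILINE)
    return len(entries) >= min_items


def generate_with_retry(generate: Callable[[str], str], prompt: str,
                        validate: Callable[[str], bool] = looks_complete,
                        max_attempts: int = 3) -> Optional[str]:
    """Regenerate until the output passes validation; None if every attempt fails."""
    for _ in range(max_attempts):
        text = generate(prompt)
        if validate(text):
            return text
    return None
```

Returning `None` instead of the last malformed output forces the caller to handle the failure explicitly, which fits the advice to review results before use.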


Usage (transformers)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "preparebuddy/ielts-3b"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, dtype=torch.float16, device_map="auto").eval()

SYSTEM = ("You generate authentic IELTS Academic practice content across reading, writing, "
          "listening, and speaking. Produce passages, transcripts, tasks, questions, and answer "
          "keys or model answers as appropriate to the section. Use IELTS-style register: "
          "academic, neutral, factually plausible. This is content generation, not assessment.")
user = ("<TEST=IELTS><SECTION=READING><TYPE=TFNG><DIFF=medium><TOPIC=solar power> "
        "Generate a short passage with 4 True/False/Not Given statements and an answer key.")

inp = tok.apply_chat_template(
    [{"role":"system","content":SYSTEM},{"role":"user","content":user}],
    add_generation_prompt=True, return_tensors="pt", return_dict=True).to(model.device)
out = model.generate(**inp, max_new_tokens=900, do_sample=True, temperature=0.3, top_p=0.9)
print(tok.decode(out[0][inp["input_ids"].shape[1]:], skip_special_tokens=True))

Recommended settings

  • Temperature 0.3 for verdict tasks (TFNG / YNNG / MCQ) — cleaner, more consistent verdicts.
  • Temperature 0.7 for passages, writing, speaking, listening — natural variety. top_p = 0.9.
  • One SECTION+TYPE per call; assemble a full test from multiple calls.
  • For production, validate the structure and regenerate on malformed output (the fragile reading types occasionally need a retry). A reference validate-and-regenerate implementation is in the demo Space's app.py (preparebuddy/ielts-3b-demo).
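
The temperature guidance above can be encoded once so every call picks the right settings. A small sketch: `sampling_kwargs` and `VERDICT_TYPES` are hypothetical names, and only the three types named in the list are treated as verdict tasks.

```python
# Types the list above names as verdict tasks (temperature 0.3).
VERDICT_TYPES = {"TFNG", "YNNG", "MCQ"}


def sampling_kwargs(type_: str) -> dict:
    """Return generate() sampling arguments for a given TYPE token."""
    temperature = 0.3 if type_.upper() in VERDICT_TYPES else 0.7
    return {"do_sample": True, "temperature": temperature, "top_p": 0.9}
```

With the Usage snippet this would be passed as `model.generate(**inp, max_new_tokens=900, **sampling_kwargs("TFNG"))`.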

Generating a full exam section (one passage → all question types)

Real IELTS sections have one passage/transcript with several question types. A single model call makes one passage + one type, so to build a real-exam section you orchestrate: generate the passage/transcript once, then generate each question type against that same context, and assemble. (Each call stays small, so quality holds.)

PASSAGE = generate_reading_passage(topic)          # 1) one ~600-word passage
tfng = ask_questions(PASSAGE, "TFNG", n=5)         # 2) questions grounded in THAT passage
mcq  = ask_questions(PASSAGE, "MCQ",  n=4)
comp = ask_questions(PASSAGE, "SENTENCE_COMPLETION", n=4)
full_reading_section = PASSAGE + tfng + mcq + comp # 3) assemble

where ask_questions prompts: "Using ONLY the passage below, write N {TYPE} questions with an answer key. Do not write a new passage." (the model writes questions about the supplied passage). The same pattern builds a full Listening section (transcript → MCQ + completion), a full Speaking test (Part 1 + 2 + 3), and a full Writing test (Task 1 + Task 2). A working implementation (the full_section(...) function) is in the demo Space's app.py (preparebuddy/ielts-3b-demo).
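
For readers who want something runnable without opening the demo Space, the orchestration can be sketched as follows. This is an illustration, not the Space's `full_section(...)`: `generate` is a hypothetical prompt-to-text callable, `assemble_reading_section` is an invented name, and `ask_questions` here takes that callable as an extra first argument.

```python
from typing import Callable, Iterable, Tuple


def ask_questions(generate: Callable[[str], str], passage: str, qtype: str, n: int) -> str:
    """Ask for n questions of one type, grounded in the supplied passage."""
    prompt = (f"Using ONLY the passage below, write {n} {qtype} questions with an answer key. "
              f"Do not write a new passage.\n\n{passage}")
    return generate(prompt).strip()


def assemble_reading_section(generate: Callable[[str], str], passage: str,
                             plan: Iterable[Tuple[str, int]]) -> str:
    """One passage, then one question block per (TYPE, count) pair in the plan."""
    blocks = [passage.strip()]
    for qtype, n in plan:
        blocks.append(ask_questions(generate, passage, qtype, n))
    return "\n\n".join(blocks)
```

A plan such as `[("TFNG", 5), ("MCQ", 4), ("SENTENCE_COMPLETION", 4)]` mirrors the pseudocode above; each block could also be passed through a structure check before it is appended.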

Strengths

  • Writing, Speaking, Listening, and Reading-MCQ are consistently strong and natural.
  • Reading TFNG / YNNG / sentence-completion / long-form are reliable with validate+regenerate.
  • ~3B params; runs on a laptop or a modest GPU (and on Apple Silicon via MLX after conversion).

Limitations (please read)

  • Verdict correctness (~70–75%). For TFNG/YNNG, roughly one verdict in four is wrong or inconsistent with its own justification, a small-model reasoning ceiling. Recommended: gate verdict items through your own answer-checking step before use.
  • Matching types are experimental and may fail to produce a complete, well-formed item.
  • Occasional factual slips in passages — plausible, not a factual reference.
  • Listening/Speaking output is text for downstream TTS; no audio is produced.
  • Not an assessment/scoring tool.
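
One inexpensive form of the answer-checking step recommended above is to flag items whose verdict label is unrecognized or whose quoted evidence does not actually appear in the passage. The sketch below is a heuristic only: it assumes the `N. VERDICT ... "quoted text"` answer-key layout seen in the examples, `flag_suspect_items` is a hypothetical name, and passing the check does not prove a verdict is correct.

```python
import re
from typing import List

# "NOT GIVEN" is listed first so it is matched before the shorter "NO".
VERDICTS = ("NOT GIVEN", "TRUE", "FALSE", "YES", "NO")


def _norm(text: str) -> str:
    """Lower-case and collapse whitespace so line wrapping does not matter."""
    return " ".join(text.lower().split())


def flag_suspect_items(passage: str, answer_key: str) -> List[int]:
    """Return item numbers that need human review."""
    suspects = []
    passage_norm = _norm(passage)
    for line in answer_key.splitlines():
        match = re.match(r"\s*(\d+)\.\s*(.+)", line)
        if not match:
            continue
        number, rest = int(match.group(1)), match.group(2)
        has_verdict = any(rest.upper().startswith(v) for v in VERDICTS)
        quotes = re.findall(r'"([^"]+)"', rest)
        quote_missing = any(_norm(q) not in passage_norm for q in quotes)
        if not has_verdict or quote_missing:
            suspects.append(number)
    return suspects
```

Flagged items can be regenerated or sent to a human reviewer; unflagged items still deserve spot checks, since a verdict can contradict an accurately quoted sentence.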

Training

Fine-tune of SmolLM3-3B on PrepareBuddy's proprietary, human-curated IELTS content, plus a small set of author-written examples to stabilize under-represented formats. The training dataset is not released (proprietary).

License & attribution

Apache-2.0, inheriting from the SmolLM3-3B base. Free to use, modify, and distribute, including commercially; please retain attribution to the base model and to PrepareBuddy.
