Abercrombie-GRPO
LoRA adapter for Qwen/Qwen3.5-4B, trained with GRPO to classify trademarks on the Abercrombie distinctiveness spectrum (Generic / Descriptive / Suggestive / Arbitrary / Fanciful).
Important: this adapter only works with the specific system prompt below and with `enable_thinking=False`. The model was trained to emit a strict six-line format; without the system prompt, or with thinking mode on, output will be unreliable.
Links
- Training environment (Prime Intellect, public): smolclaims/abercrombie
- Base model: Qwen/Qwen3.5-4B
- Benchmark: LegalBench Abercrombie
Results
On the 95-row LegalBench Abercrombie held-out test set (non-thinking inference, greedy decoding):
| Category | Base Qwen3.5-4B | + Abercrombie-GRPO LoRA | Delta (pp) |
|---|---|---|---|
| Generic | 89% | 100% | +11 |
| Descriptive | 100% | 74% | -26 |
| Suggestive | 5% | 26% | +21 |
| Arbitrary | 0% | 47% | +47 |
| Fanciful | 5% | 95% | +90 |
| Overall | 40.0% | 68.4% | +28.4 |
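Mean ordinal distance (how many steps along the five-point spectrum a prediction lands from the true label; lower is better): 1.09 -> 0.53, roughly halved.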
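Both headline metrics are simple to reproduce. The sketch below is an illustration under assumptions: the card does not spell out how ordinal distance is encoded, so the rank mapping (Generic=0 through Fanciful=4) and the helper names are assumptions, and the labels are a made-up toy example, not benchmark data. Unparseable model outputs would raise a `KeyError` here and need to be handled separately.

```python
# Assumed ordinal positions on the Abercrombie spectrum, least to most distinctive.
SPECTRUM = ["Generic", "Descriptive", "Suggestive", "Arbitrary", "Fanciful"]
RANK = {label: i for i, label in enumerate(SPECTRUM)}


def accuracy(preds: list[str], golds: list[str]) -> float:
    """Fraction of exact label matches."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def mean_ordinal_distance(preds: list[str], golds: list[str]) -> float:
    """Average |rank(pred) - rank(gold)|; 0 is perfect, 4 is the maximum."""
    return sum(abs(RANK[p] - RANK[g]) for p, g in zip(preds, golds)) / len(golds)


# Hypothetical toy example (not LegalBench data).
golds = ["Generic", "Suggestive", "Fanciful", "Arbitrary"]
preds = ["Generic", "Descriptive", "Fanciful", "Descriptive"]
print(accuracy(preds, golds))               # 0.5
print(mean_ordinal_distance(preds, golds))  # 0.75
```

Two predictions are wrong in both cases, but the Arbitrary -> Descriptive miss is two steps away, so it costs twice as much distance as the one-step Suggestive -> Descriptive miss. That is why ordinal distance can improve even where accuracy moves less.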
Output format
The model is trained to emit exactly six lines and nothing else:
```text
Q1: [Yes/No]
Q2: [Yes/No]
Q3: [Yes/No]
Q4: [Yes/No]
Q5: [Yes/No]
FINAL_CLASSIFICATION: [Generic/Descriptive/Suggestive/Arbitrary/Fanciful]
```
Each Qn is a doctrinal sub-question. Q1 = coined term test, Q2 = semantic relationship, Q3 = imagination test, Q4 = immediate conveyance, Q5 = genus test. The routing rule (Q1=Yes -> Fanciful, else Q2=No -> Arbitrary, else Q5=Yes -> Generic, else Q4=Yes -> Descriptive, else Q3=Yes -> Suggestive) is baked into the system prompt.
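To score outputs, or to check that a stated label agrees with the model's own Q-answers, the six-line format can be parsed and the routing rule re-applied in code. This is a minimal sketch; the function names and the strictness choices (exactly six lines after trimming whitespace, Q-numbers in order) are assumptions, not part of the released adapter. `route` returns `None` for the one answer pattern no branch covers (Q1=No, Q2=Yes, Q5=No, Q4=No, Q3=No), which is exactly the case the system prompt's Q3/Q4 constraint forbids.

```python
from __future__ import annotations

import re

LABELS = {"Generic", "Descriptive", "Suggestive", "Arbitrary", "Fanciful"}
Q_LINE = re.compile(r"^Q([1-5]): (Yes|No)$")
FINAL_LINE = re.compile(r"^FINAL_CLASSIFICATION: (\w+)$")


def parse_output(text: str) -> tuple[dict[int, bool], str] | None:
    """Parse the six-line format; return None on any deviation."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    if len(lines) != 6:
        return None
    answers: dict[int, bool] = {}
    for expected_q, line in enumerate(lines[:5], start=1):
        m = Q_LINE.match(line)
        if m is None or int(m.group(1)) != expected_q:
            return None
        answers[expected_q] = m.group(2) == "Yes"
    m = FINAL_LINE.match(lines[5])
    if m is None or m.group(1) not in LABELS:
        return None
    return answers, m.group(1)


def route(q: dict[int, bool]) -> str | None:
    """Apply the routing rule from the system prompt; None if no branch fires."""
    if q[1]:
        return "Fanciful"
    if not q[2]:
        return "Arbitrary"
    if q[5]:
        return "Generic"
    if q[4]:
        return "Descriptive"
    if q[3]:
        return "Suggestive"
    return None


def satisfies_q3_q4_constraint(q: dict[int, bool]) -> bool:
    """If Q2=Yes and Q5=No, exactly one of Q3/Q4 must be Yes."""
    if q[2] and not q[5]:
        return q[3] != q[4]
    return True


sample = """Q1: No
Q2: Yes
Q3: No
Q4: No
Q5: Yes
FINAL_CLASSIFICATION: Generic"""
answers, stated = parse_output(sample)
print(stated, route(answers), route(answers) == stated)  # Generic Generic True
```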
Usage
1. Install
```bash
pip install transformers accelerate peft torch
```
2. System prompt (required - do not modify)
```python
SYSTEM_PROMPT = """You are a trademark distinctiveness classifier. Given a mark and the goods or services it identifies, classify the mark on the Abercrombie spectrum: Generic, Descriptive, Suggestive, Arbitrary, or Fanciful.
Answer five questions about the mark, then provide a final classification. Evaluate each question in relation to the specific goods or services and the relevant purchasing public. Treat the mark as a whole; do not decompose compound marks into separate components.
Q1 - Coined Term Test. Is the mark an invented term created solely for trademark use, with no prior independent meaning?
Q2 - Semantic Relationship Test. Does the mark's ordinary dictionary meaning have any plausible semantic relationship to the goods or services?
Q3 - Imagination Test. Must the consumer use imagination, thought, or a multi-step mental process to connect the mark to the nature of the goods or services?
Q4 - Immediate Conveyance Test. Does the mark immediately convey an idea of a feature, quality, function, ingredient, or characteristic of the goods or services to the relevant purchasing public?
Q5 - Genus Test. Does the relevant purchasing public understand the mark primarily as the name of the general category of goods or services, rather than as an indicator of source?
When Q2=Yes and Q5=No, exactly one of Q3 or Q4 must be Yes: a semantically-related, non-generic mark is either descriptively immediate or suggestively imaginative, never neither.
Apply this routing rule to determine the final classification:
- If Q1 = Yes, classify as Fanciful
- Else if Q2 = No, classify as Arbitrary
- Else if Q5 = Yes, classify as Generic
- Else if Q4 = Yes, classify as Descriptive
- Else if Q3 = Yes, classify as Suggestive
Respond in exactly this format with no other text:
Q1: [Yes/No]
Q2: [Yes/No]
Q3: [Yes/No]
Q4: [Yes/No]
Q5: [Yes/No]
FINAL_CLASSIFICATION: [Generic/Descriptive/Suggestive/Arbitrary/Fanciful]"""
```
3. Load and run
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen3.5-4B"
LORA = "DoodDood/abercrombie-grpo"
dtype = torch.bfloat16

tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=dtype, device_map="auto")
model = PeftModel.from_pretrained(model, LORA)  # attach the LoRA adapter to the base model
model.eval()


def classify(mark_and_goods: str) -> str:
    # SYSTEM_PROMPT is the exact string from step 2.
    msgs = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": mark_and_goods},
    ]
    # Thinking mode must be off; the adapter was trained without it.
    prompt = tok.apply_chat_template(
        msgs, tokenize=False, add_generation_prompt=True,
        enable_thinking=False,
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Greedy decoding; 128 new tokens leaves headroom for the six-line answer.
        out = model.generate(
            **inputs, max_new_tokens=128, do_sample=False,
            pad_token_id=tok.eos_token_id,
        )
    # Decode only the newly generated tokens, not the echoed prompt.
    return tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)


# Input format: `The mark "X" for Y.` (matches LegalBench phrasing)
print(classify('The mark "Kodak" for cameras.'))
# Expected: Q1: Yes, Q2-Q5: No, FINAL_CLASSIFICATION: Fanciful
print(classify('The mark "Apple" for personal computers.'))
# Expected: Q1: No, Q2: No, ..., FINAL_CLASSIFICATION: Arbitrary
print(classify('The mark "Salt" for packages of sodium chloride.'))
# Expected: Q1: No, Q2: Yes, ..., Q5: Yes, FINAL_CLASSIFICATION: Generic
# (Q2 must be Yes here: under the routing rule, Q2=No would route to Arbitrary before Q5 is checked.)
```
Important caveats
- Don't modify the system prompt. The model was trained against this exact prompt, including the Q-numbering and routing rule; changes are likely to degrade output.
- Always use `enable_thinking=False`. The adapter was trained on non-thinking rollouts; thinking-mode inference produces unreliable output.
- Use greedy decoding only (`do_sample=False`). Sampling adds noise to a strict-format task.
- Phrase the input as `The mark "X" for Y.` This matches the LegalBench surface form the model was trained on. Other phrasings may work but are not guaranteed.
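When marks and goods come from a spreadsheet or database, a small helper keeps every input in the expected surface form. This is a hypothetical sketch (`format_input` is not part of the adapter); it assumes the goods are already a plain noun phrase and only trims whitespace and a trailing period. Its output can be passed straight to the `classify` function from step 3.

```python
def format_input(mark: str, goods: str) -> str:
    """Build a user message in the LegalBench form: The mark "X" for Y."""
    goods = goods.strip().rstrip(".")  # avoid a doubled period
    return f'The mark "{mark.strip()}" for {goods}.'


print(format_input("Kodak", "cameras"))  # The mark "Kodak" for cameras.
```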
Method
Trained on Prime Intellect's hosted RL with the Verifiers framework on a custom synthetic dataset (2,100 marks, balanced across 5 classes). The data generator uses a blacklist that excludes every LegalBench test mark, so no test mark appears in the training data.
Reward stack (5 weighted functions):
- Ordinal accuracy (weight 1.0) - distance-based score on the final label; the dominant signal.
- Decisive Q (weight 0.3) - rewards the dispositive sub-question for the true label only.
- Consistency bonus (weight 0.2) - gated on a correct final label AND a matching decisive Q.
- Routing consistency (weight 0.15) - the stated FINAL matches the label the model's own Q-answers route to.
- Routed truth (weight 0.3) - the model's own Q-chain routes to the true label.
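The five rewards can be sketched as one weighted sum. This is a minimal illustration, not the environment's code. The linear ordinal decay (1 - distance/4), the zero reward for an unparseable label, and the names `DECISIVE`, `WEIGHTS`, `route` and `reward` are all assumptions; the actual reward functions are in the public Prime Intellect environment. The decisive sub-question per label is read off the routing rule (for example, Q5=Yes selects Generic).

```python
from __future__ import annotations

SPECTRUM = ["Generic", "Descriptive", "Suggestive", "Arbitrary", "Fanciful"]
RANK = {label: i for i, label in enumerate(SPECTRUM)}

# Weights as listed on this card.
WEIGHTS = {"ordinal": 1.0, "decisive_q": 0.3, "consistency": 0.2,
           "routing": 0.15, "routed_truth": 0.3}

# Assumed: the sub-question (and answer) that selects each label under the routing rule.
DECISIVE = {"Fanciful": (1, True), "Arbitrary": (2, False), "Generic": (5, True),
            "Descriptive": (4, True), "Suggestive": (3, True)}


def route(q: dict[int, bool]) -> str | None:
    """Routing rule from the system prompt; None if no branch fires."""
    if q[1]:
        return "Fanciful"
    if not q[2]:
        return "Arbitrary"
    if q[5]:
        return "Generic"
    if q[4]:
        return "Descriptive"
    if q[3]:
        return "Suggestive"
    return None


def reward(q: dict[int, bool], stated: str, gold: str) -> float:
    """Weighted sum of the five reward terms (hypothetical functional forms)."""
    if stated not in RANK:
        return 0.0  # assumption: an unparseable label earns nothing
    routed = route(q)
    q_idx, q_val = DECISIVE[gold]
    correct = stated == gold
    decisive_ok = q[q_idx] == q_val
    terms = {
        # Assumed linear decay: 1.0 at distance 0, 0.0 at the maximum distance of 4.
        "ordinal": 1.0 - abs(RANK[stated] - RANK[gold]) / 4,
        "decisive_q": float(decisive_ok),
        "consistency": float(correct and decisive_ok),
        "routing": float(routed == stated),
        "routed_truth": float(routed == gold),
    }
    return sum(WEIGHTS[name] * value for name, value in terms.items())


perfect = {1: False, 2: True, 3: False, 4: False, 5: True}
print(round(reward(perfect, "Generic", "Generic"), 2))  # 1.95
near_miss = {1: False, 2: True, 3: True, 4: False, 5: False}
print(round(reward(near_miss, "Suggestive", "Descriptive"), 2))  # 0.9
```

In the near-miss case the model answers consistently with itself (routing term earns 0.15) and lands one step from the truth (ordinal term earns 0.75), but gets no credit for the decisive Q4, so it still scores well below a correct answer.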
300 steps, batch 128, 16 rollouts/example, LoRA r=16. Total compute: ~$12.
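GRPO treats the 16 rollouts per example as a group: each rollout's reward is compared against its siblings rather than against a learned value function. A minimal sketch of the standard group-relative advantage follows; it assumes population standard deviation and a small epsilon, and `group_advantages` is a hypothetical name, since the exact normalization used by Prime Intellect's trainer is not stated on this card.

```python
import statistics


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for one group (4 rollouts instead of 16, for brevity).
print(group_advantages([1.95, 0.9, 0.9, 0.0]))
# A group where every rollout scores the same yields all-zero advantages.
print(group_advantages([1.95] * 16))  # sixteen 0.0 values
```

Because a group with identical rewards produces zero advantage, that example contributes no gradient; a graded reward such as the distance-based ordinal term makes such ties less likely than a binary correct/incorrect reward would.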
The full environment, reward functions, and synthetic training data are public at the Prime Intellect env page.