"Summarization Bias" — a short data-engineering note.
Write a scene so that no emotion word appears, then ask a model to
summarize it. It comes back as "the character feels anxious." The model
performed the exact move the text was built to avoid: it re-attached
the label.
This bites hardest in evaluation. An LLM-as-judge runs the same step
internally, silently re-labels what was shown, then penalizes text that
did its job. So the dataset ships a transparent, rule-based detector
instead of an LLM judge, plus hard negatives that mark the re-labeling
move as a negative.
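To make the re-labeling move concrete, here is a minimal sketch of what a transparent, rule-based check could look like. It is not the dataset's actual detector: the lexicon `EMOTION_WORDS`, the pattern `LABEL_PATTERN`, and the function `relabels` are all hypothetical names chosen for illustration, and the word list is deliberately tiny.

```python
import re

# Hypothetical mini-lexicon (assumption); a real detector would need a far larger list.
EMOTION_WORDS = {"anxious", "afraid", "angry", "sad", "happy",
                 "nervous", "scared", "lonely"}

# Matches simple labeling constructions such as "feels anxious" or "was sad".
LABEL_PATTERN = re.compile(
    r"\b(?:feels?|felt|is|was|seems?|seemed)\s+(\w+)", re.IGNORECASE
)


def emotion_words_in(text):
    """Return the lexicon words that appear anywhere in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return set(tokens) & EMOTION_WORDS


def relabels(source, summary):
    """Return emotion words the summary attaches that the source never used."""
    attached = {m.group(1).lower() for m in LABEL_PATTERN.finditer(summary)}
    return sorted((attached & EMOTION_WORDS) - emotion_words_in(source))


source = "She checked the lock twice, then a third time. The kettle had gone cold."
print(relabels(source, "The character feels anxious."))                # ['anxious']
print(relabels(source, "The character checks the lock three times."))  # []
```

Every decision this check makes can be traced to one lexicon entry and one pattern match, which is the kind of transparency an LLM judge does not offer. The price is coverage: anything outside the pattern or the lexicon passes unnoticed.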
Honest limit: one rule (atmosphere contradiction) has a detection rate of
only ~10%, which marks the boundary where flat pattern-matching runs out.
Learn More: https://huggingface.co/blog/leventbulut/summarization-bias
Dataset: leventbulut/objective-projection