# Roadblock Classification Model (v2)

## Overview

The Roadblock Classification Model (v2) is a fine-tuned transformer model built on BERT that classifies student check-ins into two categories:

- ROADBLOCK → the student cannot move forward
- NOT_ROADBLOCK → the student is still making progress

The model is designed to capture semantic meaning, not just keywords, enabling it to differentiate between difficulty and true blockage.
## Motivation

### Problem with Version 1

The first version of this model attempted to classify struggles, confusion, and being stuck all under one label. This created a major issue: the model could not distinguish between temporary difficulty and an actual inability to proceed.
### Why Version 2 Was Created

Version 2 was developed to separate the definitions clearly:

| Concept | Meaning |
|---|---|
| Struggle | The student is experiencing difficulty |
| Roadblock | The student cannot move forward |
### Key Insight

Not all struggles are roadblocks. For example:

| Check-in | Correct Label |
|---|---|
| "I had problems but made progress" | NOT_ROADBLOCK |
| "I can't fix my code and I'm stuck" | ROADBLOCK |
## Model Architecture

- Base model: `bert-base-uncased`
- Task: binary classification
- Framework: Hugging Face Transformers
- Training environment: Google Colab (GPU)
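Assuming the standard Hugging Face Transformers API, the setup above can be sketched as follows. The label-to-index mapping is an assumption (not taken from the released checkpoint), and the model here is randomly initialized for illustration; a real fine-tuning run would load pretrained weights via `from_pretrained("bert-base-uncased")`:

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Assumed label/index mapping; the actual ordering used in training
# is not documented in this card.
id2label = {0: "NOT_ROADBLOCK", 1: "ROADBLOCK"}
label2id = {v: k for k, v in id2label.items()}

# BertConfig() defaults match the bert-base architecture. A real run would
# use BertForSequenceClassification.from_pretrained("bert-base-uncased", ...)
# to start from pretrained weights instead of random initialization.
config = BertConfig(num_labels=2, id2label=id2label, label2id=label2id)
model = BertForSequenceClassification(config)

# Shape check: a batch of one 8-token sequence yields one logit per class.
dummy_ids = torch.randint(0, config.vocab_size, (1, 8))
logits = model(input_ids=dummy_ids).logits  # shape (1, 2)
```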
## Dataset Design

The dataset was synthetically generated and refined iteratively to ensure:

✅ Semantic accuracy
- Focus on meaning, not keywords

✅ Balanced classes
- Controlled distribution of ROADBLOCK vs NOT_ROADBLOCK

✅ Language diversity
- Includes:
  - formal phrasing
  - informal/slang expressions
  - varied sentence structures
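As an illustration only (the actual synthetic dataset and its templates are not published with this card), a template-based generator along these lines can produce balanced classes with overlapping vocabulary; every template string and slot value below is a made-up example:

```python
import random

# Hypothetical templates; note both classes share keywords like "stuck"
# and "can't" so the label cannot be inferred from a keyword alone.
ROADBLOCK_TEMPLATES = [
    "I can't {verb} the {thing} and I'm completely stuck",
    "still stuck on the {thing}, no idea what to do",
]
NOT_ROADBLOCK_TEMPLATES = [
    "I had a problem with the {thing} but I made progress",
    "couldn't {verb} the {thing} at first, but it's working now",
]
VERBS = ["fix", "debug", "finish"]
THINGS = ["code", "API call", "assignment"]

def generate(n_per_class, seed=0):
    """Return a shuffled, class-balanced list of (text, label) pairs."""
    rng = random.Random(seed)
    rows = []
    for label, templates in [("ROADBLOCK", ROADBLOCK_TEMPLATES),
                             ("NOT_ROADBLOCK", NOT_ROADBLOCK_TEMPLATES)]:
        for _ in range(n_per_class):
            template = rng.choice(templates)
            rows.append((template.format(verb=rng.choice(VERBS),
                                         thing=rng.choice(THINGS)), label))
    rng.shuffle(rows)
    return rows
```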
## Bias Identification and Correction

### Initial Problem

Early versions of the dataset showed strong keyword bias, such as:

- "problem" → always NOT_ROADBLOCK
- "can't" → always ROADBLOCK
- "stuck" → always ROADBLOCK
### Why This Was Dangerous

The model learned:

❌ keyword → label

instead of

✅ meaning → label

This caused incorrect predictions in real-world scenarios.
### Bias Mitigation Strategy

To eliminate this bias, the dataset was redesigned to include:

1. Keyword Symmetry

Each keyword appears under both labels:

| Keyword | ROADBLOCK | NOT_ROADBLOCK |
|---|---|---|
| "problem" | ✔️ | ✔️ |
| "can't" | ✔️ | ✔️ |
| "stuck" | ✔️ | ✔️ |
2. Contrastive Examples

Pairs of sentences with similar wording but different meanings:

- "I can't fix it and I'm stuck" → ROADBLOCK
- "I can't fix it yet but I'm making progress" → NOT_ROADBLOCK

3. Pattern Diversity

Avoided over-reliance on shortcuts such as:

- "but" → NOT_ROADBLOCK

Instead, varied progress markers were included:

- "and I fixed it"
- "and it's working now"
- "and I solved it"
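The keyword-symmetry property above can be audited mechanically. A minimal sketch (the function name and the sample sentences are illustrative, not from the actual dataset) counts how often each keyword appears under each label:

```python
def keyword_label_counts(dataset, keywords):
    """Count keyword occurrences per label to check for keyword bias.

    dataset: iterable of (text, label) pairs.
    keywords: substrings to audit, e.g. "problem", "can't", "stuck".
    A heavily skewed count for a keyword signals a shortcut the model
    could exploit instead of learning meaning.
    """
    counts = {k: {"ROADBLOCK": 0, "NOT_ROADBLOCK": 0} for k in keywords}
    for text, label in dataset:
        lowered = text.lower()
        for k in keywords:
            if k in lowered:
                counts[k][label] += 1
    return counts

# Tiny illustrative sample.
sample = [
    ("I can't fix it and I'm stuck", "ROADBLOCK"),
    ("I can't fix it yet but I'm making progress", "NOT_ROADBLOCK"),
    ("still stuck but I solved the main problem", "NOT_ROADBLOCK"),
    ("stuck on this problem, no idea what to do", "ROADBLOCK"),
]
audit = keyword_label_counts(sample, ["problem", "can't", "stuck"])
```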
### Result

The model now learns progress vs. no progress instead of relying on surface-level patterns.
## Model Evaluation

The model was tested on three tiers:

1. Clean synthetic data
- Achieved near-perfect validation scores (expected, given the similarity to the training distribution)

2. Edge cases
- Handled ambiguous phrasing correctly

3. Realistic language

Test examples:

| Input | Prediction |
|---|---|
| "lowkey stuck but I think I got it" | NOT_ROADBLOCK |
| "this bug annoying but I fixed it" | NOT_ROADBLOCK |
| "ngl I can't get this working" | ROADBLOCK |
| "still stuck idk what to do" | ROADBLOCK |
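A small harness can score any classifier against labeled cases like the ones in the table above. The `toy_classify` function below is only a placeholder so the sketch is self-contained; the real evaluation wraps the fine-tuned BERT model, not a rule:

```python
def evaluate(classify, cases):
    """Score a classifier callable against labeled test cases.

    Returns (accuracy, list of misclassified inputs).
    """
    misses = [text for text, expected in cases if classify(text) != expected]
    return 1 - len(misses) / len(cases), misses

# Labeled cases taken from the table above.
cases = [
    ("lowkey stuck but I think I got it", "NOT_ROADBLOCK"),
    ("this bug annoying but I fixed it", "NOT_ROADBLOCK"),
    ("ngl I can't get this working", "ROADBLOCK"),
    ("still stuck idk what to do", "ROADBLOCK"),
]

# Placeholder stand-in for the model, for demonstration only; the actual
# classifier is the fine-tuned BERT model, not a keyword rule like this.
def toy_classify(text):
    progress_markers = ("got it", "fixed it", "working now", "solved it")
    return "NOT_ROADBLOCK" if any(m in text for m in progress_markers) else "ROADBLOCK"

accuracy, misses = evaluate(toy_classify, cases)
```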
### Observed Limitation

A minor generalization gap remains:

- "I was confused but it's working now" → incorrectly predicted ROADBLOCK

### Fix Approach

Instead of regenerating the dataset, targeted examples are added to cover the missing language patterns.
## Active Learning Strategy

This model is designed to serve as a base model for active learning.

### Active Learning Workflow

1. The model predicts on real check-ins
2. Incorrect predictions are identified
3. High-value error samples are collected
4. Corrected examples are added to the dataset
5. The model is retrained
### Key Principle

High-confidence errors are more valuable than random samples.
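A minimal sketch of this principle, assuming each check-in record carries the model's prediction, a human-provided true label, and the prediction confidence (the function name, threshold, and the sample records below are all illustrative; the first text is the misprediction noted in the evaluation section):

```python
def high_value_errors(records, confidence_threshold=0.9):
    """Select confident mistakes for relabeling and retraining.

    records: (text, predicted_label, true_label, confidence) tuples.
    A high-confidence wrong prediction contradicts the model most
    strongly, so it carries more training signal than a random sample.
    """
    return [r for r in records
            if r[1] != r[2] and r[3] >= confidence_threshold]

# Illustrative records, not real check-in data.
checkins = [
    ("I was confused but it's working now", "ROADBLOCK", "NOT_ROADBLOCK", 0.97),
    ("still stuck idk what to do", "ROADBLOCK", "ROADBLOCK", 0.95),
    ("kinda lost tbh", "NOT_ROADBLOCK", "ROADBLOCK", 0.55),
]
selected = high_value_errors(checkins)
```

Only the first record is selected: the second is a correct prediction, and the third, while wrong, was low-confidence and therefore carries less signal.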
### Goal

Continuously improve the model using real-world feedback, not just synthetic data.
## Future Improvements

- Integrate real Slack check-in data
- Expand the dataset with informal and noisy text
- Add confidence-based filtering for active learning
- Combine with a Struggle Detection Model for multi-signal analysis
## Final Insight

This model represents a shift from:

❌ pattern-based classification

to

✅ meaning-based understanding
## Conclusion

The Roadblock Classification Model (v2):

- Correctly distinguishes difficulty from blockage
- Handles diverse language patterns
- Minimizes keyword bias
- Serves as a strong foundation for active learning systems

This is not just a model: it is a continuously improving system.