🚧 Roadblock Classification Model (v2)

📌 Overview

The Roadblock Classification Model (v2) is a BERT-based transformer fine-tuned to classify student check-ins into two categories:

- ROADBLOCK → The student cannot move forward
- NOT_ROADBLOCK → The student is still making progress

This model is designed to understand semantic meaning, not just keywords, enabling it to differentiate between difficulty and true blockage.
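
A minimal inference sketch, assuming the model is published on the Hugging Face Hub under the repository id `mjpsm/roadblock-classifier-v2` and that its config maps class ids to the two label names above (if it does not, the pipeline returns generic names such as `LABEL_0` and `LABEL_1`). The helper names `load_classifier` and `classify` are illustrative and not part of the model.

```python
from typing import Callable, Dict, List

# Assumption: Hub repository id as shown on the model page.
MODEL_ID = "mjpsm/roadblock-classifier-v2"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the weights on first use)."""
    # Imported lazily so the rest of this sketch works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts: List[str], classifier: Callable) -> List[Dict]:
    """Run the classifier on a batch of check-ins and pair each text with its result."""
    results = classifier(texts)  # one {"label": ..., "score": ...} dict per input
    return [
        {"text": text, "label": res["label"], "score": float(res["score"])}
        for text, res in zip(texts, results)
    ]
```

Typical use would be `classify(["I can't fix my code and I'm stuck"], load_classifier())`. Because `classify` accepts any callable, a stub can stand in for the model in tests.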

🧠 Motivation

❌ Problem with Version 1

The first version of this model grouped three different situations under one label:

- struggles
- confusion
- being stuck

This created a major issue:

The model could not distinguish between temporary difficulty and actual inability to proceed

🔥 Why Version 2 Was Created

Version 2 was developed to separate the two definitions clearly:

| Concept | Meaning |
|---|---|
| Struggle | The student is experiencing difficulty |
| Roadblock | The student cannot move forward |

💥 Key Insight

Not all struggles are roadblocks.

Example:

| Check-in | Correct Label |
|---|---|
| "I had problems but made progress" | NOT_ROADBLOCK |
| "I can't fix my code and I'm stuck" | ROADBLOCK |

⚙️ Model Architecture

- Base Model: bert-base-uncased
- Task: Binary Classification
- Framework: Hugging Face Transformers
- Training Environment: Google Colab (GPU)
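
The setup listed above could be instantiated roughly as follows. This is a sketch, not the author's training script: the id-to-label order in `ID2LABEL` and the helper name `build_model_and_tokenizer` are assumptions, since the card does not state which class id corresponds to which label.

```python
from typing import Dict

# Assumption: class id 1 means ROADBLOCK. The card does not specify the order.
ID2LABEL: Dict[int, str] = {0: "NOT_ROADBLOCK", 1: "ROADBLOCK"}
LABEL2ID: Dict[str, int] = {label: idx for idx, label in ID2LABEL.items()}


def build_model_and_tokenizer(base_model: str = "bert-base-uncased"):
    """Load the base checkpoint with a fresh two-class classification head."""
    # Imported lazily; requires the transformers and torch packages.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(
        base_model,
        num_labels=len(ID2LABEL),
        id2label=ID2LABEL,  # stored in the config so predictions carry readable names
        label2id=LABEL2ID,
    )
    return model, tokenizer
```

Storing the label names in the config means downstream pipelines report `ROADBLOCK` and `NOT_ROADBLOCK` instead of generic class names.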

📊 Dataset Design

The dataset was synthetically generated and refined iteratively to ensure:

✅ Semantic Accuracy

Focus on meaning, not keywords

✅ Balanced Classes

The distribution of ROADBLOCK and NOT_ROADBLOCK examples was controlled

✅ Language Diversity

Includes:
- formal phrasing
- informal/slang expressions
- varied sentence structures

🚨 Bias Identification and Correction

🔍 Initial Problem

Early versions of the dataset showed strong keyword bias, such as:

- "problem" → always NOT_ROADBLOCK
- "can't" → always ROADBLOCK
- "stuck" → always ROADBLOCK

⚠️ Why This Was Dangerous

The model learned:

❌ keyword → label
instead of
✅ meaning → label

This caused incorrect predictions in real-world scenarios.

🔧 Bias Mitigation Strategy

To reduce this bias, the dataset was redesigned to include:

1. Keyword Symmetry

Each keyword appears under both labels:

| Keyword | ROADBLOCK | NOT_ROADBLOCK |
|---|---|---|
| "problem" | ✔️ | ✔️ |
| "can't" | ✔️ | ✔️ |
| "stuck" | ✔️ | ✔️ |

2. Contrastive Examples

Pairs of sentences with similar wording but different meanings:

- "I can't fix it and I'm stuck" → ROADBLOCK
- "I can't fix it yet but I'm making progress" → NOT_ROADBLOCK

3. Pattern Diversity

The dataset avoids over-reliance on single connective patterns like:

- "but" → NOT_ROADBLOCK

Instead, NOT_ROADBLOCK examples also express progress with other phrasings:

- "and I fixed it"
- "and it's working now"
- "and I solved it"

✅ Result

The model now learns progress vs no progress instead of relying on surface-level patterns.
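
The keyword-symmetry and contrastive-example checks described in this section can be approximated with a small dataset audit. This is a sketch under stated assumptions: the dataset is a list of `(text, label)` tuples, keywords are matched as plain substrings, and the function names and the `0.4` overlap threshold are illustrative choices that do not come from the original project.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Example = Tuple[str, str]  # (check-in text, label)
LABELS = ("ROADBLOCK", "NOT_ROADBLOCK")


def one_sided_keywords(examples: List[Example], keywords: List[str]) -> List[str]:
    """Return keywords that are missing from at least one label."""
    counts: Dict[str, Counter] = {kw: Counter() for kw in keywords}
    for text, label in examples:
        lowered = text.lower()
        for kw in keywords:
            # Substring match: "problem" also hits "problems".
            # Curly apostrophes would need normalizing before matching "can't".
            if kw in lowered:
                counts[kw][label] += 1
    return [kw for kw in keywords if any(counts[kw][label] == 0 for label in LABELS)]


def _tokens(text: str) -> Set[str]:
    """Lowercased word set; apostrophes are kept so "can't" stays one token."""
    return set(re.findall(r"[a-z']+", text.lower()))


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two sentences."""
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def contrastive_pairs(
    examples: List[Example], min_overlap: float = 0.4
) -> List[Tuple[str, str]]:
    """Return pairs that share much of their wording but carry different labels."""
    return [
        (t1, t2)
        for (t1, l1), (t2, l2) in combinations(examples, 2)
        if l1 != l2 and token_overlap(t1, t2) >= min_overlap
    ]
```

An empty result from `one_sided_keywords` means every audited keyword occurs under both labels. A large result from `contrastive_pairs` suggests the dataset forces the model to rely on meaning, because wording alone does not separate those pairs.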

🧪 Model Evaluation

The model was tested on:

1. Clean Synthetic Data

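Achieved near-perfect validation scores. This is expected, because the validation data closely resembles the training data, so these scores say little about real-world performance.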

2. Edge Cases

Handled most ambiguous phrasing correctly; one known miss is listed under Observed Limitation

3. Realistic Language

Test examples:

| Input | Prediction |
|---|---|
| "lowkey stuck but I think I got it" | NOT_ROADBLOCK |
| "this bug annoying but I fixed it" | NOT_ROADBLOCK |
| "ngl I can't get this working" | ROADBLOCK |
| "still stuck idk what to do" | ROADBLOCK |

⚠️ Observed Limitation

Minor generalization gap:

"I was confused but it's working now" → incorrectly predicted ROADBLOCK (expected NOT_ROADBLOCK)

🔧 Fix Approach

Instead of regenerating the dataset, add targeted examples that cover the missing language patterns.
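
One way to confirm that targeted examples close a gap without breaking earlier behavior is a small regression check over the evaluation inputs from this section. The sketch below assumes a hypothetical `predict` callable that maps one check-in to a label string; `CASES` and `find_mismatches` are illustrative names.

```python
from typing import Callable, List, Tuple

# Inputs and intended labels taken from the evaluation examples in this card.
CASES: List[Tuple[str, str]] = [
    ("lowkey stuck but I think I got it", "NOT_ROADBLOCK"),
    ("this bug annoying but I fixed it", "NOT_ROADBLOCK"),
    ("ngl I can't get this working", "ROADBLOCK"),
    ("still stuck idk what to do", "ROADBLOCK"),
    # Reported as misclassified; the intended label is NOT_ROADBLOCK.
    ("I was confused but it's working now", "NOT_ROADBLOCK"),
]


def find_mismatches(
    predict: Callable[[str], str],
    cases: List[Tuple[str, str]] = CASES,
) -> List[Tuple[str, str, str]]:
    """Return (text, expected, predicted) for every case the predictor gets wrong."""
    mismatches = []
    for text, expected in cases:
        predicted = predict(text)
        if predicted != expected:
            mismatches.append((text, expected, predicted))
    return mismatches
```

Running this before and after each retraining round shows whether a newly added example fixed its target case and whether any previously correct case regressed.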

🔁 Active Learning Strategy

This model is designed to serve as a base model for active learning.

🔥 Active Learning Workflow

1. Model predicts on real check-ins
2. Identify incorrect predictions
3. Collect high-value error samples
4. Add corrected examples to dataset
5. Retrain model

💥 Key Principle

High-confidence errors are more valuable than random samples
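
The selection step of this workflow could look like the sketch below. It assumes each prediction has been reviewed by a person who supplied the correct label; the `ReviewedPrediction` record, the function names, and the `0.8` confidence threshold are illustrative assumptions, not values from the original project.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ReviewedPrediction:
    text: str          # the student check-in
    predicted: str     # label produced by the model
    confidence: float  # model score for the predicted label, between 0 and 1
    corrected: str     # label assigned by a human reviewer


def high_value_errors(
    reviews: List[ReviewedPrediction],
    min_confidence: float = 0.8,  # assumed cut-off; tune on real data
    limit: Optional[int] = None,
) -> List[ReviewedPrediction]:
    """Keep wrong predictions made with high confidence, most confident first."""
    errors = [
        r for r in reviews
        if r.predicted != r.corrected and r.confidence >= min_confidence
    ]
    errors.sort(key=lambda r: r.confidence, reverse=True)
    return errors if limit is None else errors[:limit]


def to_training_rows(errors: List[ReviewedPrediction]) -> List[Tuple[str, str]]:
    """Convert selected errors into (text, label) rows using the corrected label."""
    return [(r.text, r.corrected) for r in errors]
```

Sorting by confidence puts the cases where the model was most certain and still wrong at the top, which are the samples the key principle above prioritizes.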

🎯 Goal

Continuously improve the model using real-world feedback, not just synthetic data.

🚀 Future Improvements

- Integrate real Slack check-in data
- Expand dataset with informal and noisy text
- Add confidence-based filtering for active learning
- Combine with a Struggle Detection Model for multi-signal analysis

🧠 Final Insight

This model represents a shift from:

❌ pattern-based classification
to
✅ meaning-based understanding

💯 Conclusion

The Roadblock Classification Model (v2):

- Distinguishes difficulty from blockage
- Handles diverse language patterns
- Minimizes keyword bias
- Serves as a strong foundation for active learning systems

🔥 This is not just a model; it is a continuously improving system.

Model ID: mjpsm/roadblock-classifier-v2
Model size: 0.1B params
Tensor type: F32 (Safetensors)