🚧 Roadblock Classification Model (v2)

📌 Overview

The Roadblock Classification Model (v2) is a fine-tuned transformer-based model built on BERT to classify student check-ins into two categories:

  • ROADBLOCK → The student cannot move forward
  • NOT_ROADBLOCK → The student is still making progress

This model is designed to understand semantic meaning, not just keywords, enabling it to differentiate between difficulty and true blockage.
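
A minimal sketch of how check-ins might be classified with the Hugging Face `pipeline` API. The repository ID is the one shown on this model's hosting page. The helper names are hypothetical, and the sketch assumes the saved config reports the label strings used in this document; if the pipeline returns generic names such as `LABEL_0` / `LABEL_1` instead, map them before use.

```python
MODEL_ID = "mjpsm/roadblock-classifier-v2"  # repository ID as shown on the model page


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the weights on first use)."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts, classifier=None):
    """Return (text, label, score) triples for a list of check-ins.

    `classifier` is any callable that maps a list of strings to a list of
    {"label": ..., "score": ...} dicts, which is the shape a
    text-classification pipeline returns.
    """
    classifier = classifier or load_classifier()
    results = classifier(list(texts))
    return [(text, r["label"], r["score"]) for text, r in zip(texts, results)]


def is_roadblock(label: str, roadblock_label: str = "ROADBLOCK") -> bool:
    """Assumption: the model config uses the label names from this document."""
    return label == roadblock_label
```

Calling `classify(["I can't fix my code and I'm stuck"])` with no `classifier` argument loads the hosted model; passing a stub callable keeps the function usable offline.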


🧠 Motivation

❌ Problem with Version 1

The first version of this model grouped three different situations under a single label:

  • struggles
  • confusion
  • being stuck

This created a major issue:

The model could not distinguish between temporary difficulty and an actual inability to proceed.


🔥 Why Version 2 Was Created

Version 2 was developed to separate definitions clearly:

| Concept   | Meaning                                |
|-----------|----------------------------------------|
| Struggle  | The student is experiencing difficulty |
| Roadblock | The student cannot move forward        |

💥 Key Insight

Not all struggles are roadblocks.

Example:

| Check-in                            | Correct Label |
|-------------------------------------|---------------|
| "I had problems but made progress"  | NOT_ROADBLOCK |
| "I can't fix my code and I'm stuck" | ROADBLOCK     |

⚙️ Model Architecture

  • Base Model: bert-base-uncased
  • Task: Binary Classification
  • Framework: Hugging Face Transformers
  • Training Environment: Google Colab (GPU)
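
A minimal sketch of how this architecture could be instantiated for fine-tuning with Hugging Face Transformers. It is not the original training code: the index assigned to each label and the function name are assumptions made for illustration.

```python
BASE_MODEL = "bert-base-uncased"

# Assumption: this index order is illustrative; the original training run may differ.
ID2LABEL = {0: "NOT_ROADBLOCK", 1: "ROADBLOCK"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def build_model_and_tokenizer(base_model: str = BASE_MODEL):
    """Load BERT with a freshly initialised two-class classification head."""
    # Imported lazily so the label mappings can be reused without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(
        base_model,
        num_labels=len(ID2LABEL),  # binary classification
        id2label=ID2LABEL,         # stored in the config so predictions carry readable names
        label2id=LABEL2ID,
    )
    return model, tokenizer
```

Storing `id2label` and `label2id` in the config means downstream inference code sees `ROADBLOCK` / `NOT_ROADBLOCK` rather than generic `LABEL_0` / `LABEL_1` names.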

📊 Dataset Design

The dataset was synthetically generated and refined iteratively to ensure:

✅ Semantic Accuracy

  • Focus on meaning, not keywords

✅ Balanced Classes

  • The ROADBLOCK vs NOT_ROADBLOCK distribution was controlled

✅ Language Diversity

  • Includes:
    • formal phrasing
    • informal/slang expressions
    • varied sentence structures
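
One way the class-balance property could be checked on a list of `(text, label)` pairs is sketched below. The function names and the 0.1 tolerance are illustrative assumptions, not values from the original dataset pipeline.

```python
from collections import Counter

LABELS = ("ROADBLOCK", "NOT_ROADBLOCK")


def class_shares(examples, labels=LABELS):
    """Return each label's share of the dataset as a fraction in [0, 1]."""
    counts = Counter(label for _, label in examples)
    total = sum(counts[label] for label in labels)
    if total == 0:
        return {label: 0.0 for label in labels}
    return {label: counts[label] / total for label in labels}


def is_balanced(examples, tolerance=0.1, labels=LABELS):
    """True when the largest and smallest class shares differ by at most `tolerance`.

    A label that never appears has a share of 0.0, so a single-class
    dataset is reported as unbalanced.
    """
    shares = class_shares(examples, labels)
    if sum(shares.values()) == 0:
        return False
    return max(shares.values()) - min(shares.values()) <= tolerance
```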

🚨 Bias Identification and Correction

🔍 Initial Problem

Early versions of the dataset showed strong keyword bias, such as:

  • "problem" → always NOT_ROADBLOCK
  • "can't" → always ROADBLOCK
  • "stuck" → always ROADBLOCK

⚠️ Why This Was Dangerous

The model learned:

❌ keyword → label
instead of
✅ meaning → label

This caused incorrect predictions in real-world scenarios.


🔧 Bias Mitigation Strategy

To reduce keyword bias, the dataset was redesigned to include:

1. Keyword Symmetry

Each keyword appears in both labels:

| Keyword   | ROADBLOCK | NOT_ROADBLOCK |
|-----------|-----------|---------------|
| "problem" | ✔️        | ✔️            |
| "can't"   | ✔️        | ✔️            |
| "stuck"   | ✔️        | ✔️            |

2. Contrastive Examples

Pairs of sentences with similar wording but different meanings:

  • "I can't fix it and I'm stuck" → ROADBLOCK
  • "I can't fix it yet but I'm making progress" → NOT_ROADBLOCK
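
A contrastive pair can be described as two check-ins that share much of their wording but carry different labels. The sketch below tests that property with word-level Jaccard overlap; the overlap measure, the 0.3 threshold, and the function names are assumptions chosen for illustration, not the method used to build the dataset.

```python
import re


def tokens(text):
    """Lowercased word set; apostrophes are kept so "can't" stays one token."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(text_a, text_b):
    """Shared words divided by all distinct words across both texts."""
    a, b = tokens(text_a), tokens(text_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_contrastive(pair, min_overlap=0.3):
    """True when the two examples have different labels but similar wording."""
    (text_a, label_a), (text_b, label_b) = pair
    return label_a != label_b and jaccard(text_a, text_b) >= min_overlap


PAIR = (
    ("I can't fix it and I'm stuck", "ROADBLOCK"),
    ("I can't fix it yet but I'm making progress", "NOT_ROADBLOCK"),
)
```

For the pair above, 5 of the 11 distinct words are shared, so the overlap is about 0.45 and `is_contrastive(PAIR)` holds.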

3. Pattern Diversity

Avoided over-reliance on a single connective as a cue, such as:

  • "but" → NOT_ROADBLOCK

Instead, NOT_ROADBLOCK examples also express resolution without "but":

  • "and I fixed it"
  • "and it's working now"
  • "and I solved it"

✅ Result

The model now learns to separate progress from no progress, instead of relying on surface-level patterns.
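
The keyword-symmetry requirement from the mitigation strategy can be audited mechanically. The sketch below counts how often each keyword appears under each label and flags keywords that are missing from either class. It uses plain substring matching, which is an assumption, and the function names are hypothetical.

```python
KEYWORDS = ("problem", "can't", "stuck")
LABELS = ("ROADBLOCK", "NOT_ROADBLOCK")


def keyword_label_counts(examples, keywords=KEYWORDS, labels=LABELS):
    """Count, per keyword, how many examples of each label contain it."""
    counts = {kw: {label: 0 for label in labels} for kw in keywords}
    for text, label in examples:
        lowered = text.lower()
        for kw in keywords:
            # Substring match, so "problem" also matches "problems".
            if kw in lowered and label in counts[kw]:
                counts[kw][label] += 1
    return counts


def asymmetric_keywords(examples, keywords=KEYWORDS, labels=LABELS):
    """Keywords that are absent from at least one label (including never seen)."""
    counts = keyword_label_counts(examples, keywords, labels)
    return [kw for kw, per_label in counts.items() if min(per_label.values()) == 0]
```

An empty result from `asymmetric_keywords` means every keyword occurs in both classes; it does not by itself show that the counts are evenly split.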


🧪 Model Evaluation

The model was tested on:

1. Clean Synthetic Data

  • Achieved near-perfect validation scores, as expected when the validation data closely resembles the training data

2. Edge Cases

  • Handled ambiguous phrasing correctly

3. Realistic Language

Test examples:

| Input                               | Prediction    |
|-------------------------------------|---------------|
| "lowkey stuck but I think I got it" | NOT_ROADBLOCK |
| "this bug annoying but I fixed it"  | NOT_ROADBLOCK |
| "ngl I can't get this working"      | ROADBLOCK     |
| "still stuck idk what to do"        | ROADBLOCK     |

⚠️ Observed Limitation

Minor generalization gap:

  • "I was confused but it's working now" → incorrectly predicted ROADBLOCK
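
The realistic-language examples and the misclassified sentence above can be kept as a small regression suite and re-run after each retraining. This is a sketch under two assumptions: the predictions listed for the four realistic inputs are treated as the expected labels, and `predict` is a hypothetical callable that maps a check-in string to a label string.

```python
SPOT_CHECKS = [
    ("lowkey stuck but I think I got it", "NOT_ROADBLOCK"),
    ("this bug annoying but I fixed it", "NOT_ROADBLOCK"),
    ("ngl I can't get this working", "ROADBLOCK"),
    ("still stuck idk what to do", "ROADBLOCK"),
    # Known miss: the model predicted ROADBLOCK for this one.
    ("I was confused but it's working now", "NOT_ROADBLOCK"),
]


def run_spot_checks(predict, checks=SPOT_CHECKS):
    """Return (accuracy, failures); each failure is (text, expected, predicted)."""
    failures = []
    for text, expected in checks:
        predicted = predict(text)
        if predicted != expected:
            failures.append((text, expected, predicted))
    if not checks:
        return 0.0, failures
    accuracy = (len(checks) - len(failures)) / len(checks)
    return accuracy, failures
```

Running the suite against a deliberately naive keyword rule is a quick way to confirm that the checks actually catch keyword-driven behaviour.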

🔧 Fix Approach

Instead of regenerating the dataset, add targeted examples that cover the missing language patterns.


🔁 Active Learning Strategy

This model is designed to serve as a base model for active learning.


🔥 Active Learning Workflow

  1. Model predicts on real check-ins
  2. Identify incorrect predictions
  3. Collect high-value error samples
  4. Add corrected examples to dataset
  5. Retrain model

💥 Key Principle

High-confidence errors are more valuable than random samples
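
Steps 2 to 4 of the workflow, combined with the key principle, could look roughly like the sketch below: keep only reviewed predictions that were wrong, rank them by model confidence, and turn them into corrected training rows. The `Prediction` record, the 0.8 threshold, and the function names are assumptions for illustration; the document lists confidence-based filtering as a future improvement, not as existing functionality.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    text: str
    predicted: str       # label the model produced
    confidence: float    # model probability for the predicted label
    correct_label: str   # label assigned by a human reviewer


def select_high_value_errors(predictions, min_confidence=0.8, limit=None):
    """Wrong predictions at or above `min_confidence`, most confident first."""
    errors = [
        p for p in predictions
        if p.predicted != p.correct_label and p.confidence >= min_confidence
    ]
    errors.sort(key=lambda p: p.confidence, reverse=True)
    return errors if limit is None else errors[:limit]


def to_training_rows(errors):
    """Corrected (text, label) pairs ready to append to the dataset."""
    return [(p.text, p.correct_label) for p in errors]
```

Confident mistakes point at patterns the model has learned incorrectly, which is why they are ranked ahead of low-confidence misses here.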


🎯 Goal

Continuously improve the model using real-world feedback, not just synthetic data.


🚀 Future Improvements

  • Integrate real Slack check-in data
  • Expand dataset with informal and noisy text
  • Add confidence-based filtering for active learning
  • Combine with a Struggle Detection Model for multi-signal analysis

🧠 Final Insight

This model represents a shift from:

❌ pattern-based classification
to
✅ meaning-based understanding


💯 Conclusion

The Roadblock Classification Model (v2):

  • Distinguishes difficulty from blockage, apart from the known misclassification noted under Observed Limitation
  • Handles diverse language patterns
  • Minimizes keyword bias
  • Serves as a strong foundation for active learning systems

🔥 This is not just a model; it is a continuously improving system.


Space using mjpsm/roadblock-classifier-v2 1