license: apache-2.0
language: en
tags:
  - text-classification
  - ai-detection
  - academic-integrity
  - transformers
pipeline_tag: text-classification
base_model: roberta-base
model-index:
  - name: roberta-ai-detector-v2
    results:
      - task:
          type: text-classification
          name: AI Text Detection
        metrics:
          - type: accuracy
            value: 99.04
            name: Accuracy
          - type: f1
            value: 99.04
            name: F1 Score
          - type: roc_auc
            value: 99.74
            name: ROC AUC

roberta-ai-detector-v2

RoBERTa-based AI text detector fine-tuned for academic writing

Model Description

This model is fine-tuned from roberta-base to detect AI-generated text in academic papers and essays. It classifies a text as human-written or AI-generated, with a reported accuracy of 99.04% and a ROC AUC of 99.74%.

Model type: roberta
Language(s): EN
License: Apache 2.0
Fine-tuned from: roberta-base

Intended Use

This model is intended for:

Detecting AI-generated content in academic submissions
Research on AI text detection
Educational tools for academic integrity

Important: This model should be used as one signal among many when evaluating text authenticity. It should not be the sole basis for academic misconduct decisions.

Performance

Metric	Score
Accuracy	99.04%
F1 Score	99.04%
ROC AUC	99.74%

Training Data

The model was trained on 56,213 samples of paired human and AI-generated academic text, including outputs from:

Claude (Anthropic)
GPT models (OpenAI)
Gemini (Google)

Evaluation

Evaluated on 11,023 held-out test samples.
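
As a rough worked figure, the reported accuracy can be turned into an absolute error count. This is only a sketch: it assumes the accuracy in the Performance table was measured on these 11,023 samples, which the card implies but does not state, and the function name `expected_errors` is illustrative.

```python
# Worked arithmetic: how many test samples a 99.04% accuracy leaves misclassified.
# Assumption: the reported accuracy was measured on the 11,023-sample test set.
TEST_SAMPLES = 11_023
ACCURACY = 0.9904

def expected_errors(n_samples: int, accuracy: float) -> int:
    """Return the approximate number of misclassified samples."""
    return round(n_samples * (1.0 - accuracy))

print(expected_errors(TEST_SAMPLES, ACCURACY))  # about 106
```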

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model
model_name = "coai/roberta-ai-detector-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

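# Predict (input beyond 512 tokens is truncated)
text = "Your text to analyze..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    # Index 1 is read as the AI-generated class; check model.config.id2label to confirm
    ai_probability = probs[0][1].item()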

print(f"AI Probability: {ai_probability:.2%}")
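
The snippet above truncates input at 512 tokens, so anything past that point in a long paper is ignored. One possible workaround, sketched below, is to split the text into word windows, score each window, and average the results. The helper names (`split_into_windows`, `score_long_text`), the 300-word window, and the plain mean are illustrative assumptions, not part of the model's API. `score_fn` stands for any callable that returns an AI probability for one string, for example a wrapper around the code above.

```python
from statistics import mean
from typing import Callable, List

def split_into_windows(text: str, words_per_window: int = 300) -> List[str]:
    """Split text into consecutive, non-overlapping windows of whole words."""
    if words_per_window <= 0:
        raise ValueError("words_per_window must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + words_per_window])
        for i in range(0, len(words), words_per_window)
    ]

def score_long_text(text: str, score_fn: Callable[[str], float],
                    words_per_window: int = 300) -> float:
    """Average the per-window AI probabilities returned by score_fn."""
    windows = split_into_windows(text, words_per_window)
    if not windows:
        raise ValueError("text contains no words")
    return mean(score_fn(window) for window in windows)
```

The 300-word default is a guess meant to stay under the 512-token limit for typical English prose; it is not guaranteed to, so `truncation=True` should stay enabled in the scoring call.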

Limitations

Optimized for academic/formal writing; may be less accurate on casual text
Performance may vary on text from AI models not in the training set
Should not be used as the sole determinant of academic misconduct
May have reduced accuracy on very short texts (<50 words)
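
Applications can guard against the short-text limitation before scoring. The sketch below is one way to do that. The function name `is_long_enough` and the whitespace-based word count are assumptions for illustration; the 50-word cutoff is taken from the list above.

```python
MIN_WORDS = 50  # cutoff mentioned in the Limitations list

def is_long_enough(text: str, min_words: int = MIN_WORDS) -> bool:
    """Return True when text has at least min_words whitespace-separated words."""
    return len(text.split()) >= min_words

sample = "This essay is far too short to score reliably."
if not is_long_enough(sample):
    print("Text is under 50 words; treat any score as low-confidence.")
```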

Ethical Considerations

False positives can have serious consequences for students
Always use human judgment alongside model predictions
Consider the context and provide opportunities for appeal
This tool is meant to assist, not replace, human evaluation
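
To see why false positives matter even for an accurate classifier, it helps to work through the base-rate arithmetic. The sketch below applies Bayes' rule. The 99% true positive rate, 1% false positive rate, and 5% share of AI-generated submissions are hypothetical values chosen for illustration, since this card reports no per-class error rates; the function name `flagged_precision` is likewise illustrative.

```python
def flagged_precision(tpr: float, fpr: float, prevalence: float) -> float:
    """Probability that a flagged text is actually AI-generated (Bayes' rule)."""
    true_flags = tpr * prevalence            # AI-generated texts that get flagged
    false_flags = fpr * (1.0 - prevalence)   # human-written texts that get flagged
    return true_flags / (true_flags + false_flags)

# Hypothetical rates, not measured values for this model.
precision = flagged_precision(tpr=0.99, fpr=0.01, prevalence=0.05)
print(f"{precision:.1%} of flagged texts are AI-generated")
```

Under these assumed rates, roughly one in six flagged submissions would be human-written, which is why a flag alone should not decide a misconduct case.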

Citation

If you use this model, please cite:

@misc{roberta_ai_detector_v2,
  author = {COAI},
  title = {roberta-ai-detector-v2: AI Text Detection Model},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/coai/roberta-ai-detector-v2}
}

Contact

For questions or issues, please open an issue on the model repository.