Audio Transcription Model Stages

This repository contains comparable exports of the same audio transcription stack at two training stages:

stage_a
stage_b

Each stage subfolder contains:

the full Whisper encoder weights
the full LLM weights
the stage-specific audio_projector weights
tokenizer and feature extractor files
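
To check what a stage subfolder actually ships before downloading any weights, the repository file listing can be grouped by stage. The sketch below is illustrative, not part of the repository: `group_by_stage` and `list_stage_files` are hypothetical helper names, and it assumes the `huggingface_hub` package is installed and that the stage folders sit at the top level of the repo.

```python
from collections import defaultdict

REPO_ID = "RaghaRao314159/transcription-models"
STAGES = ("stage_a", "stage_b")


def group_by_stage(paths):
    """Group repository file paths by their top-level stage subfolder.

    Paths outside the known stage folders (for example a top-level README)
    are ignored.
    """
    grouped = defaultdict(list)
    for path in paths:
        top, _, rest = path.partition("/")
        if top in STAGES and rest:
            grouped[top].append(rest)
    return {stage: sorted(files) for stage, files in grouped.items()}


def list_stage_files(repo_id=REPO_ID):
    """Fetch the file listing from the Hub and group it (needs network access)."""
    # Imported lazily so the grouping helper works without huggingface_hub.
    from huggingface_hub import list_repo_files

    return group_by_stage(list_repo_files(repo_id))
```

Calling `list_stage_files()` returns a dict keyed by stage name, which makes it easy to confirm that both stages expose the same set of files.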

Load A Specific Stage

import torch
from transformers import AutoModel, AutoTokenizer, WhisperFeatureExtractor

stage = "stage_b"  # or "stage_a"

model = AutoModel.from_pretrained(
    "RaghaRao314159/transcription-models",
    subfolder=stage,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(
    "RaghaRao314159/transcription-models",
    subfolder=stage,
    trust_remote_code=True,
)
feature_extractor = WhisperFeatureExtractor.from_pretrained(
    "RaghaRao314159/transcription-models",
    subfolder=stage,
)
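
When both stages are needed in the same session, the three calls above can be wrapped in one function. This is a minimal sketch under the assumption that both stages load with identical arguments; `stage_kwargs` and `load_stage` are hypothetical names, not something the repository provides.

```python
REPO_ID = "RaghaRao314159/transcription-models"
STAGES = ("stage_a", "stage_b")


def stage_kwargs(stage):
    """Validate the stage name and build the shared from_pretrained arguments."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    return {"pretrained_model_name_or_path": REPO_ID, "subfolder": stage}


def load_stage(stage):
    """Load the model, tokenizer, and feature extractor for one stage."""
    # Heavy imports are deferred so a bad stage name fails fast.
    import torch
    from transformers import AutoModel, AutoTokenizer, WhisperFeatureExtractor

    kwargs = stage_kwargs(stage)
    model = AutoModel.from_pretrained(
        **kwargs, trust_remote_code=True, torch_dtype=torch.bfloat16
    )
    tokenizer = AutoTokenizer.from_pretrained(**kwargs, trust_remote_code=True)
    feature_extractor = WhisperFeatureExtractor.from_pretrained(**kwargs)
    return model, tokenizer, feature_extractor
```

For example, `{s: load_stage(s) for s in STAGES}` would hold both stages side by side, at the cost of keeping two full sets of weights in memory.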

Compare Both Stages

python pull_model_and_infer.py --model-source "RaghaRao314159/transcription-models" --stage both --audio-path test.mp3
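
Once a transcript is available for each stage, a word-level diff is one simple way to see where the stages disagree. The sketch below uses only the standard library and assumes the two transcripts are plain strings; the output format of `pull_model_and_infer.py` is not documented here, so `word_diff` is a hypothetical helper rather than something the script provides.

```python
import difflib


def word_diff(text_a, text_b):
    """Return (tag, words_a, words_b) for each span where the transcripts differ.

    tag is one of "replace", "delete", or "insert", as reported by
    difflib.SequenceMatcher; matching spans are skipped.
    """
    words_a, words_b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    return [
        (tag, words_a[i1:i2], words_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

An empty result means the two stages produced the same word sequence for that audio file.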

Model tree for RaghaRao314159/transcription-models

Base model: Qwen/Qwen3-1.7B-Base
Finetuned from the base model: Qwen/Qwen3-1.7B
Finetuned from Qwen/Qwen3-1.7B: this model