---
language: en
license: apache-2.0
tags:
- text-summarization
- meeting-summarization
- t5
- transformers
- qmsum
datasets:
- qmsum
metrics:
- rouge
pipeline_tag: summarization
---
# Meeting Summarizer

This model is a fine-tuned version of `t5-small` for meeting summarization tasks.
## Model Details

- **Base Model**: t5-small
- **Task**: Abstractive Meeting Summarization
- **Training Data**: QMSum Dataset + Enhanced Training
- **Parameters**: about 60 million (t5-small architecture)
## Training Configuration

- **Max Input Length**: 256 tokens
- **Max Output Length**: 64 tokens
- **Batch Size**: 16
- **Learning Rate**: 5e-05
- **Training Epochs**: 1
- **Training Samples**: N/A
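
The hyperparameters above can be collected in a small config object. This is a minimal sketch, not the author's training script: the `TrainingConfig` class, its field names, and the sample count in the usage line are hypothetical, and only the values come from the list above. It also shows how the number of optimizer steps per epoch would follow from the batch size once a dataset size is known.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the hyperparameters listed in this card."""

    base_model: str = "t5-small"
    max_input_length: int = 256   # tokens
    max_output_length: int = 64   # tokens
    batch_size: int = 16
    learning_rate: float = 5e-5
    num_epochs: int = 1

    def steps_per_epoch(self, num_samples: int) -> int:
        """Optimizer steps per epoch, assuming no gradient accumulation."""
        if num_samples < 0:
            raise ValueError("num_samples must be non-negative")
        return math.ceil(num_samples / self.batch_size)


config = TrainingConfig()
# 1000 is an illustrative sample count; the card does not report the real one.
print(config.steps_per_epoch(1000))
```

Since the card lists the number of training samples as N/A, the actual step count of the training run cannot be derived from it.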
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("CodeXRyu/meeting-summarizer")
model = AutoModelForSeq2SeqLM.from_pretrained("CodeXRyu/meeting-summarizer")


def generate_summary(meeting_text, max_length=150):
    # Prepare input with the T5 task prefix; longer inputs are truncated
    input_text = "summarize: " + meeting_text
    inputs = tokenizer(input_text, max_length=512, truncation=True, return_tensors="pt")

    # Generate summary with beam search
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_length,
        num_beams=4,
        length_penalty=2.0,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


# Example usage
meeting_transcript = '''
John: Good morning team. Let's discuss our Q3 results.
Sarah: Our sales exceeded targets by 15%, reaching $2.1M in revenue.
Mike: The new marketing campaign was very effective.
John: Great work everyone. Let's plan for Q4.
'''

summary = generate_summary(meeting_transcript)
print(summary)
```
## Training Data

This model was trained on the QMSum dataset, which contains real meeting transcripts from multiple domains:

- Academic meetings
- Product development meetings
- Committee meetings
## Performance

The model achieves competitive ROUGE scores on meeting summarization benchmarks.
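
ROUGE measures n-gram overlap between a generated summary and a reference summary. As a rough illustration of what the metric computes, here is a simplified ROUGE-1 F1 in plain Python. It is a sketch under stated assumptions (lowercased whitespace tokenization, no stemming), so it will not reproduce the numbers of an official ROUGE scorer. The function name `rouge1_f1` and the example strings are hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts.

    Assumption: tokens are lowercased whitespace-separated words.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count per shared token.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Illustrative strings, not model output.
score = rouge1_f1("sales exceeded targets", "sales exceeded q3 targets")
print(round(score, 3))
```

For reported results, an established ROUGE implementation is preferable to a hand-rolled one like this.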
## Limitations

- Optimized for English meeting transcripts
- The usage example truncates input to 512 tokens and training used a 256-token maximum, so performance may vary on longer meetings and content past the truncation point is ignored
- Best suited for structured meeting formats with speaker labels
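
One common workaround for the input-length limit is to split a long transcript into chunks of whole speaker turns and summarize each chunk separately. The sketch below assumes one speaker turn per line and approximates token counts by whitespace-separated words, which only roughly tracks the T5 tokenizer. The names `chunk_transcript`, `summarize_long`, and `max_words` are hypothetical, the default of 150 words is an arbitrary conservative value, and `summarize_fn` stands in for any function that maps a string to a summary, such as `generate_summary` from the Usage section. This approach was not evaluated in this card.

```python
from typing import Callable, List


def chunk_transcript(transcript: str, max_words: int = 150) -> List[str]:
    """Group whole speaker turns (one per line) into chunks of at most max_words words.

    A single turn longer than max_words is kept as its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_words = 0

    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        n_words = len(line.split())
        # Start a new chunk when adding this turn would exceed the budget.
        if current and current_words + n_words > max_words:
            chunks.append("\n".join(current))
            current, current_words = [], 0
        current.append(line)
        current_words += n_words

    if current:
        chunks.append("\n".join(current))
    return chunks


def summarize_long(
    transcript: str,
    summarize_fn: Callable[[str], str],
    max_words: int = 150,
) -> str:
    """Summarize each chunk independently and join the partial summaries."""
    return " ".join(summarize_fn(chunk) for chunk in chunk_transcript(transcript, max_words))
```

Because each chunk is summarized in isolation, decisions that span several chunks may be lost or repeated in the combined output.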
## Citation

If you use this model, please cite:

```bibtex
@misc{meeting-summarizer-codexryu,
  author = {CodeXRyu},
  title = {Meeting Summarizer},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/CodeXRyu/meeting-summarizer}
}
```