Question about the evaluation metrics for captioning benchmarks
Hi, thanks for releasing Marlin-2B and the evaluation results. I have a question regarding the metric used in the leaderboard figures.
In the captioning plots, the y-axis is labeled as “VideoEvalV2 mean / 10” for benchmarks such as DREAM-1K and CaReBench. I noticed that the reported scores do not match the official leaderboard scores, which use the Recall/Precision/F1 metric.
Could you clarify: What exactly is “VideoEvalV2”?
I’m very interested in video caption tasks, so I’d really appreciate any clarification.
This week we are releasing a series of blog posts on the benchmarks we use and our whole journey, which should shed more light on this. "VideoEvalV2" is our own benchmark, where we use an LLM as a judge on the videos directly rather than relying on text-based ground truth, as CaReBench and DREAM-1K do.
Here is our blog post series: https://nemostation.com/blog/marlin-2b-the-map-was-wrong
We will release the benchmark by the end of the blog series.
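For anyone comparing the two kinds of numbers in the meantime, here is a minimal sketch of why they would not be expected to line up. F1 is the harmonic mean of precision and recall, so it lives on a 0 to 1 scale, while a judge-based score is an average of per-video ratings. The function names, the 0 to 10 rating scale (guessed from the "mean / 10" axis label), and the sample numbers are all assumptions for illustration, not the actual VideoEvalV2 implementation.

```python
from statistics import mean


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_judge_score(scores: list[float]) -> float:
    """Average of per-video judge ratings.

    Hypothetical: assumes each rating is on a 0-10 scale, which is only
    inferred from the plot's axis label.
    """
    return mean(scores)


# Made-up inputs, for illustration only.
official_style = f1_score(precision=0.5, recall=0.5)
judge_style = mean_judge_score([6, 8, 7])
print(official_style, judge_style)
```

The two values measure different things on different scales, so a gap between the plotted scores and the official leaderboard numbers is expected rather than a sign of an error.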