Model card for leoflx/pvera_dinov2_b_clevrcount

This is the PVeRA adapter trained on the Clevr-Count dataset from the VTAB-1k benchmark. It is based on the ViT-B variant of DINOv2.

If you use this adapter, please cite:

@InProceedings{fillioux2025pvera,
  title={{PVeRA}: Probabilistic Vector-Based Random Matrix Adaptation},
  author={Fillioux, Leo and Ferrante, Enzo and Cournède, Paul-Henry and Vakalopoulou, Maria and Christodoulidis, Stergios},
  booktitle={Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  year={2026}
}

Model Details

Model Description

This repository holds the weights of the PVeRA adapter and of the linear classification head, both trained on the VTAB-1k (few-shot) version of the Clevr-Count dataset. The adapters were inserted into a frozen DINOv2 (ViT-B) model. More information about the training procedure is given below. The model reached an accuracy of 0.7388 on the predefined test set. Please see the original GitHub repository for instructions on how to download the dataset and for the dataset class implementation.

Important note: these results do not exactly reproduce those of the original PVeRA paper (different implementation, averaging across multiple seeds, ...).

Developed by: Leo Fillioux
Finetuned from model: DINOv2-B
Paper: PVeRA: Probabilistic Vector-Based Random Matrix Adaptation

Recommendations

This adapter was trained for DINOv2-B. Using it with other base models will likely degrade performance.

How to Get Started with the Model

Use the code below to get started with the model.

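from peft import PeftModel
from transformers import AutoModelForImageClassification

# Load the DINOv2-B backbone with an 8-class classification head.
base = AutoModelForImageClassification.from_pretrained("facebook/dinov2-base", num_labels=8)
# Load the PVeRA adapter from the Hub on top of the base model.
model = PeftModel.from_pretrained(base, "leoflx/pvera_dinov2_b_clevrcount")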

Training Details

Training Data

The dataset used is the VTAB-1k (few-shot) variant of the Clevr-Count dataset.

Training Procedure

As in the original paper, a grid search was performed over three adapter learning rates (1e-3, 3e-3, 1e-2); the released adapter is the one that achieved the best validation accuracy. A fixed learning rate was used for the classifier head.
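The selection step of this grid search can be sketched as follows. This is a minimal illustration, not the training code used for this adapter: `train_and_validate` is a hypothetical callable that is assumed to train the adapter with a given learning rate and return its validation accuracy, and the accuracies in the usage note are made up purely for demonstration.

```python
from typing import Callable, Iterable, Tuple

# Candidate adapter learning rates listed in the training procedure.
ADAPTER_LRS = (1e-3, 3e-3, 1e-2)


def select_adapter_lr(
    train_and_validate: Callable[[float], float],
    candidates: Iterable[float] = ADAPTER_LRS,
) -> Tuple[float, float]:
    """Return (best_lr, best_val_acc) over the candidate learning rates.

    `train_and_validate` is a hypothetical callable (an assumption of this
    sketch): it trains the adapter with the given learning rate and returns
    the validation accuracy.
    """
    best_lr, best_acc = None, float("-inf")
    for lr in candidates:
        acc = train_and_validate(lr)
        if acc > best_acc:  # strict comparison: ties keep the earlier candidate
            best_lr, best_acc = lr, acc
    if best_lr is None:
        raise ValueError("no candidate learning rates were given")
    return best_lr, best_acc
```

For example, with a stand-in such as `lambda lr: {1e-3: 0.61, 3e-3: 0.70, 1e-2: 0.66}[lr]` (illustrative numbers only), the function picks the learning rate `3e-3`.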

Preprocessing

The preprocessing steps are the following.

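from torchvision import transforms

# Resize to 224x224, convert to a [0, 1] tensor, normalize with ImageNet mean/std.
preprocess = transforms.Compose([transforms.Resize([224, 224]),
                                 transforms.ToTensor(),
                                 transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))])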
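To make the arithmetic of the last two steps concrete, here is a dependency-free sketch of what happens to a single 8-bit RGB pixel: it is scaled to [0, 1] (as `ToTensor` does for 8-bit images) and then normalized per channel as `(value - mean) / std`. The helper `normalize_pixel` is a hypothetical name introduced only for this illustration; the actual pipeline operates on whole image tensors.

```python
# Per-channel statistics taken from the Normalize call in the preprocessing steps.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel (three ints in 0..255) to normalized floats.

    Hypothetical helper for illustration only.
    """
    scaled = [v / 255.0 for v in rgb]  # scaling to [0, 1], as ToTensor does for 8-bit images
    return [(s - m) / d for s, m, d in zip(scaled, MEAN, STD)]


black = normalize_pixel((0, 0, 0))        # each channel becomes -mean / std
white = normalize_pixel((255, 255, 255))  # each channel becomes (1 - mean) / std
```

A black pixel therefore maps to negative values and a white pixel to positive values, with slightly different ranges in each channel.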