PVT v2: Improved Baselines with Pyramid Vision Transformer
How to use OpenGVLab/pvt_v2_b3 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-classification", model="OpenGVLab/pvt_v2_b3")
pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")
```

```python
# Load model directly
from transformers import AutoImageProcessor, AutoModelForImageClassification

processor = AutoImageProcessor.from_pretrained("OpenGVLab/pvt_v2_b3")
model = AutoModelForImageClassification.from_pretrained("OpenGVLab/pvt_v2_b3")
```

This is the Hugging Face PyTorch implementation of the PVTv2 model.
The Pyramid Vision Transformer v2 (PVTv2) is a powerful, lightweight hierarchical transformer backbone for vision tasks. PVTv2 embeds convolution operations in its transformer layers, giving it CNN-like properties that help it learn from image data efficiently. This mix-transformer architecture requires no added positional embeddings and produces multi-scale feature maps, which are known to be beneficial for dense and fine-grained prediction tasks.
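To make the "multi-scale feature maps" concrete, here is a minimal sketch of the pyramid geometry: a hierarchical backbone like PVTv2 emits one feature map per stage at strides 4, 8, 16, and 32 relative to the input. The helper function and the 224×224 input size below are illustrative assumptions, not part of the Transformers API:

```python
# Sketch: spatial resolutions of the four pyramid stages for a given input size.
# Strides (4, 8, 16, 32) are the standard hierarchical-backbone values;
# channel widths vary per model size (b0-b5) and are omitted here.
def pyramid_shapes(height, width, strides=(4, 8, 16, 32)):
    """Return the (H, W) of each stage's feature map."""
    return [(height // s, width // s) for s in strides]

print(pyramid_shapes(224, 224))  # [(56, 56), (28, 28), (14, 14), (7, 7)]
```

Dense-prediction heads (detection, segmentation) consume several of these resolutions at once, which is why a pyramid backbone pairs well with such tasks.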
Vision models using PVTv2 as a backbone: