arxiv:2603.28480

INSID3: Training-Free In-Context Segmentation with DINOv3

Published on Mar 30

Submitted by Gabriele Trivigno on Mar 31

Politecnico di Torino

Abstract

INSID3 demonstrates that frozen DINOv3 features can support semantic, part, and personalized segmentation without supervision or auxiliary models, outperforming previous work by +7.5% mIoU while using 3x fewer parameters.

AI-generated summary

In-context segmentation (ICS) aims to segment arbitrary concepts, e.g., objects, parts, or personalized instances, given one annotated visual example. Existing work either (i) fine-tunes vision foundation models (VFMs), which improves in-domain results but harms generalization, or (ii) combines multiple frozen VFMs, which preserves generalization but yields architectural complexity and fixed segmentation granularities. We revisit ICS from a minimalist perspective and ask: Can a single self-supervised backbone support both semantic matching and segmentation, without any supervision or auxiliary models? We show that scaled-up dense self-supervised features from DINOv3 exhibit strong spatial structure and semantic correspondence. We introduce INSID3, a training-free approach that segments concepts at varying granularities only from frozen DINOv3 features, given an in-context example. INSID3 achieves state-of-the-art results across one-shot semantic, part, and personalized segmentation, outperforming previous work by +7.5% mIoU, while using 3x fewer parameters and without any mask or category-level supervision. Code is available at https://github.com/visinf/INSID3.
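To make the idea of segmenting from frozen dense features concrete, here is a minimal sketch of a generic prototype-matching baseline. It is not the INSID3 method, whose details are in the paper and repository. It assumes per-patch features are already available as arrays. The helper `toy_features` is hypothetical and produces synthetic stand-ins for DINOv3 patch features, and the similarity threshold of 0.5 is an arbitrary illustrative choice.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def prototype_segment(ref_feats, ref_mask, tgt_feats, threshold=0.5):
    """Generic prototype matching (an illustrative baseline, not INSID3).

    ref_feats: (H, W, C) dense features of the annotated example
    ref_mask:  (H, W) boolean mask of the concept in the example
    tgt_feats: (H2, W2, C) dense features of the image to segment
    Returns a boolean mask and the cosine-similarity map, both (H2, W2).
    """
    ref = l2_normalize(ref_feats)
    tgt = l2_normalize(tgt_feats)
    # Average the example's foreground features into a single prototype.
    prototype = l2_normalize(ref[ref_mask].mean(axis=0))
    # Cosine similarity between every target patch and the prototype.
    sim = tgt @ prototype
    return sim > threshold, sim


def toy_features(mask, rng, dim=8, noise=0.05):
    """Hypothetical stand-in for backbone features: foreground and
    background patches point along different axes, plus small noise."""
    feats = np.zeros(mask.shape + (dim,))
    feats[mask, 0] = 1.0
    feats[~mask, 1] = 1.0
    return feats + noise * rng.standard_normal(feats.shape)


rng = np.random.default_rng(0)
ref_mask = np.zeros((8, 8), dtype=bool)
ref_mask[2:5, 2:5] = True
tgt_mask = np.zeros((8, 8), dtype=bool)
tgt_mask[4:8, 0:3] = True

pred, sim = prototype_segment(
    toy_features(ref_mask, rng), ref_mask, toy_features(tgt_mask, rng)
)
iou = (pred & tgt_mask).sum() / (pred | tgt_mask).sum()
print(f"IoU on toy data: {iou:.2f}")
```

With real images, the synthetic arrays would be replaced by patch features extracted from a frozen backbone. A single global prototype is the simplest possible matching rule and is used here only for illustration.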

Community

gabTriv

Paper submitter about 11 hours ago

INSID3

A collaboration between Politecnico di Torino, TU Darmstadt, and TU Munich.

A training-free framework for in-context segmentation built directly on frozen DINOv3 features, without decoders, fine-tuning, or multi-model pipelines.

Shows that dense self-supervised representations alone can solve semantic, part, and personalized segmentation with strong generalization across domains.