Active Learning for Segmentation with Vision Foundation Models

The goal of this thesis is to develop and evaluate Active Learning methods for semantic segmentation using modern Vision Foundation Models (VFMs) such as SAM, DINOv3, or RADIO. The focus is on reducing annotation effort, especially in Out-Of-Distribution (OOD) settings, where the target domain differs significantly from the source data used during pretraining.

The work investigates how representation-space information from foundation models can be used to identify informative samples for annotation. In contrast to classical uncertainty-based Active Learning approaches, the thesis focuses on feature-space novelty, segmentation instability, and representation diversity as acquisition signals for sample selection.
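
As an illustration of what a feature-space novelty signal could look like, the following minimal sketch scores each unlabeled image by the cosine distance between its embedding and the nearest already-labeled embedding, then picks the highest-scoring images for annotation. This is only one possible formulation and not the method of the thesis. The function names, the choice of cosine distance, and the toy 2-D vectors are assumptions; in practice the embeddings would come from a foundation model encoder.

```python
import numpy as np

def novelty_scores(unlabeled, labeled):
    """Cosine distance from each unlabeled embedding to its nearest labeled one.

    Both inputs are arrays of shape (n_samples, dim). Hypothetical helper,
    not part of any library.
    """
    u = unlabeled / np.linalg.norm(unlabeled, axis=1, keepdims=True)
    l = labeled / np.linalg.norm(labeled, axis=1, keepdims=True)
    sim = u @ l.T                      # (n_unlabeled, n_labeled) cosine similarities
    return 1.0 - sim.max(axis=1)       # high value = far from everything labeled

def select_most_novel(unlabeled, labeled, budget):
    """Indices of the `budget` unlabeled samples with the highest novelty."""
    scores = novelty_scores(unlabeled, labeled)
    return np.argsort(-scores)[:budget]

# Toy embeddings (assumed values, for illustration only).
labeled = np.array([[1.0, 0.0], [0.9, 0.1]])
unlabeled = np.array([[1.0, 0.05], [0.0, 1.0], [0.7, 0.7]])
picked = select_most_novel(unlabeled, labeled, budget=2)
print(picked)
```

Selecting purely by novelty can pick several near-duplicate outliers in one round, which is one reason the text above also lists representation diversity as a separate signal.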

The data used may include natural image datasets as well as specialized domains such as medical imaging or cultural heritage data. A particular emphasis is placed on evaluating how Active Learning strategies behave under domain shift and how efficiently foundation models can be adapted to new segmentation tasks with limited labeled data.

The work includes a literature review, the preparation of suitable datasets, the implementation of Active Learning pipelines, and the evaluation of different acquisition strategies. A particular focus is placed on the development and investigation of novel acquisition functions tailored to Vision Foundation Models and out-of-distribution segmentation settings. Finally, the results should be documented and analyzed with respect to annotation efficiency, segmentation performance, and robustness under OOD conditions. Interest in scientific writing and in potentially publishing the developed methods is welcome.

Figure: Active learning pipeline illustration.

Core Related Work

Revisiting Active Learning in the Era of Vision Foundation Models

Tasks

Conduct a literature review on Active Learning, semantic segmentation, and Vision Foundation Models
Prepare and analyse (cross-domain) segmentation datasets
Implement segmentation pipelines using pretrained Vision Foundation Models
Develop and evaluate Active Learning strategies based on feature-space novelty and segmentation instability
Compare representation-based acquisition methods with classical uncertainty-based approaches
Assess annotation efficiency and robustness under domain shift
Document results and discuss practical applicability and future research directions
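
The segmentation-instability signal named in the tasks could, under one simple reading, be measured as the fraction of pixels whose predicted class changes across several predictions for the same image. The sketch below assumes the predictions are integer class maps already aligned to the same pixel grid. How the different predictions are produced (for example input augmentations or varied prompts) is left open here and is an assumption, as is the function name.

```python
import numpy as np

def instability(masks):
    """Fraction of pixels on which the predicted class maps do not all agree.

    `masks` has shape (n_views, H, W) and holds integer class labels for the
    same image. Hypothetical helper, one possible definition among many.
    """
    masks = np.asarray(masks)
    disagree = (masks != masks[0]).any(axis=0)   # True where any view differs from the first
    return float(disagree.mean())

# Two toy 2x2 predictions that differ in exactly one pixel.
views = [
    [[0, 1], [1, 1]],
    [[0, 1], [0, 1]],
]
score = instability(views)
print(score)
```

Images with a high score would be candidates for annotation. A per-class or boundary-weighted variant may be more informative for segmentation, but that choice is not specified in the topic description.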

Contact

To apply or for further enquiries, please send an email to:

Rafael Sterzinger: rafael.sterzinger@tuwien.ac.at