
Smart Labeling: How ConformalSAM Improves Segmentation with Foundational Models

TLDR: ConformalSAM is a novel framework for semi-supervised semantic segmentation that leverages foundational models like SEEM. It addresses the issue of low-quality pseudo-labels from these models by using conformal prediction to filter out unreliable pixel labels, ensuring only high-confidence labels are used for early training. A subsequent self-reliance training stage mitigates overfitting. This two-stage approach significantly boosts performance on standard benchmarks and can be integrated as a plug-in for existing SSSS methods.

In the world of artificial intelligence, especially in tasks like semantic segmentation where computers identify and outline objects pixel by pixel in images, there’s a big challenge: getting enough high-quality labeled data. Manually labeling images at this level is incredibly time-consuming and expensive. To ease this burden, a field called semi-supervised semantic segmentation (SSSS) has emerged, aiming to train models effectively using a mix of both labeled and a much larger amount of unlabeled data.

Recently, powerful ‘foundational models’ like the Segment Anything Model (SAM) and the related SEEM (Segment Everything Everywhere All at Once), which are pre-trained on vast datasets, have shown a remarkable ability to understand and segment images across different scenarios. This naturally leads to an exciting question: can these advanced foundational models help solve the data scarcity problem by acting as annotators for unlabeled images?

Researchers explored this by using SEEM, a segmentation model that accepts text prompts, to generate predictive masks for unlabeled images. However, simply using these SEEM-generated masks directly as training supervision didn’t work well. The reason is a ‘domain gap’: the foundational model’s pre-training data may differ from the specific target data, leading to low-quality or inconsistent pixel labels.

Introducing ConformalSAM

To overcome these limitations and truly unlock the potential of foundational models in specific target domains with limited labels, a new framework called ConformalSAM has been proposed. ConformalSAM is built upon a technique called Conformal Prediction (CP), which is a powerful tool for quantifying uncertainty in predictions.

The framework operates in two main stages:

Stage I: CP-Calibrated Foundation Model

In the first stage, ConformalSAM uses the small amount of available labeled data to ‘calibrate’ the foundational model (SEEM). Think of this as teaching SEEM how to be more reliable for the specific task at hand. Conformal Prediction helps filter out unreliable pixel labels from SEEM’s initial predictions, ensuring that only high-confidence labels are used as supervision for the unlabeled data. This is particularly useful for identifying objects that are not background, especially when background pixels are dominant in an image. This calibrated approach helps the model learn effectively in its early training phase.
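To make the calibration step concrete, here is a minimal sketch of split conformal prediction applied to pixel-label filtering. It is illustrative only: the function names, the nonconformity score (1 minus the probability of the true class), and the singleton-set rule are common textbook choices, not the paper's exact implementation.

```python
import numpy as np

def calibrate_threshold(cal_probs, cal_labels, alpha=0.1):
    """Compute a conformal threshold from labeled calibration pixels.

    cal_probs:  (N, C) softmax probabilities from the foundation model (e.g. SEEM)
    cal_labels: (N,)   ground-truth class indices for those pixels
    alpha:      target error rate (0.1 -> roughly 90% coverage)
    """
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-adjusted quantile used in split conformal prediction.
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return q

def filter_pseudo_labels(probs, q):
    """Keep a pixel's pseudo-label only when its prediction set is a singleton.

    probs: (M, C) softmax probabilities on unlabeled pixels
    q:     calibrated threshold
    Returns pseudo-labels with unreliable pixels set to -1 (ignored in the loss).
    """
    # A class enters the prediction set if 1 - p(class) <= q, i.e. p(class) >= 1 - q.
    in_set = probs >= (1.0 - q)
    pseudo = probs.argmax(axis=1)
    # Unreliable: empty prediction set or more than one plausible class.
    reliable = in_set.sum(axis=1) == 1
    pseudo[~reliable] = -1
    return pseudo
```

Pixels marked -1 can simply be excluded from the segmentation loss (e.g. via an ignore index), so only high-confidence SEEM labels supervise early training.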

Stage II: Self-Reliance Training

While the calibrated SEEM masks are great for initial learning, relying on them too much can lead to the model ‘overfitting’ to any remaining inaccuracies in these pseudo-labels. To prevent this, ConformalSAM transitions to a ‘self-reliance training’ strategy in the later stages. Here, the model stops using SEEM-generated masks and instead generates its own pseudo-labels, refining its understanding of the target domain. A dynamic weighting strategy is also employed to adjust the balance between using ground-truth labels and these self-generated pseudo-labels, further mitigating overfitting and enhancing the model’s generalization.
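One way to picture the dynamic weighting is a schedule that gradually shifts trust from ground-truth supervision toward the model's own pseudo-labels. The linear ramp below is an assumed, illustrative choice; ConformalSAM's actual weighting scheme may differ.

```python
def pseudo_label_weight(step, total_steps, w_max=0.5):
    """Linearly ramp the pseudo-label loss weight from 0 up to w_max.

    An assumed schedule for illustration; the paper's exact strategy may differ.
    """
    return w_max * min(1.0, step / max(1, total_steps))

def total_loss(loss_labeled, loss_pseudo, step, total_steps):
    """Blend the supervised loss and the self-generated pseudo-label loss."""
    w = pseudo_label_weight(step, total_steps)
    return (1.0 - w) * loss_labeled + w * loss_pseudo
```

Early in training the loss is dominated by ground-truth labels; as the model's own predictions improve, the self-generated pseudo-labels contribute more, which is the balance the self-reliance stage aims for.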


Performance and Impact

Experiments on standard semi-supervised semantic segmentation benchmarks like PASCAL VOC and ADE20K demonstrate that ConformalSAM achieves superior performance compared to many recent SSSS methods. It significantly improves the quality of SEEM masks and, when combined with other strong SSSS methods like AllSpark, further boosts their performance, acting as a versatile ‘plug-in’. This highlights ConformalSAM’s ability to effectively balance the strong generalization capabilities of foundational models with the specific nuances of domain data.

While ConformalSAM’s effectiveness depends somewhat on the overlap between the foundational model’s knowledge and the target task, its flexible nature suggests it will become even more valuable as foundational models continue to expand their capabilities. This work paves the way for using foundational segmentation models as reliable annotators, especially when properly calibrated for specific tasks. For more technical details, you can refer to the full research paper here.

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
