SAM 3

AI RESEARCH FROM META

Introducing Meta Segment Anything Model 3 (SAM 3)

With SAM 3 you can use text and visual prompts to precisely identify, segment, and follow any object in images or videos—coming soon to Instagram Edits and Vibes on the Meta AI app.

SAM 3 CAPABILITIES

Advanced features, simple prompts

Using open vocabulary text or visual prompts, SAM 3 can detect, segment and track all matching objects in images and videos.

Text prompts

You can prompt SAM 3 with a word or short phrase, and it will mask every object that matches the text description.

Exemplar prompts

With exemplar prompts, you can simply draw a box around an example of the object you want to segment, and SAM 3 will mask all objects matching the outlined example.

Visual prompts

SAM 3 retains all the capabilities of SAM 2, so you can still segment objects using positive and negative clicks.

Interactivity

If SAM 3 ever misses an object or makes a mistake, you can easily add follow-up prompts to help further guide the model.
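
To make the prompt types above concrete, here is a minimal sketch of how a text phrase, an exemplar box, and positive and negative clicks could be collected in one place and extended with follow-up prompts. All names here (Click, Box, PromptSession, add_click, as_point_arrays) are hypothetical and exist only for illustration. This is not the SAM 3 API, and the 1/0 label encoding for positive and negative clicks is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical types for illustration only; this is NOT the SAM 3 API.
@dataclass(frozen=True)
class Click:
    x: int
    y: int
    positive: bool  # True = "this belongs to the object", False = "this does not"


@dataclass(frozen=True)
class Box:
    x_min: int
    y_min: int
    x_max: int
    y_max: int


@dataclass
class PromptSession:
    """Collects every prompt given for one image or video frame."""
    text: str = ""
    exemplar_boxes: List[Box] = field(default_factory=list)
    clicks: List[Click] = field(default_factory=list)

    def add_click(self, x: int, y: int, positive: bool = True) -> None:
        # Follow-up prompts are appended, so earlier guidance is kept.
        self.clicks.append(Click(x, y, positive))

    def as_point_arrays(self) -> Tuple[List[Tuple[int, int]], List[int]]:
        # Assumed encoding: 1 for a positive click, 0 for a negative click.
        coords = [(c.x, c.y) for c in self.clicks]
        labels = [1 if c.positive else 0 for c in self.clicks]
        return coords, labels


# Start with a text phrase and one exemplar box (values are made up).
session = PromptSession(text="yellow school bus")
session.exemplar_boxes.append(Box(40, 60, 220, 180))

# Refine with follow-up clicks: include one region, exclude another.
session.add_click(120, 110, positive=True)
session.add_click(300, 90, positive=False)

coords, labels = session.as_point_arrays()
print(coords, labels)
```

Keeping all prompts in a single session object mirrors the interactive workflow described above: each correction adds information rather than replacing what came before.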

BENCHMARKS

State-of-the-art performance

SAM 3 is state-of-the-art across all text and visual segmentation tasks in both images and videos. The model additionally maintains all the performance and functionality of SAM 2.

SAM 3 performance chart

Designed for real-world applications

Edits is the new video creation app by Instagram that helps creators make great videos on their phones. Creators will soon be able to use SAM 3 in Edits to quickly apply effects to people or objects in their videos, helping their creations stand out.

ENHANCED CAPABILITIES

Evolution of SAM

The Segment Anything models build on each other, offering increasingly advanced capabilities for developers and researchers to create, experiment with, and improve media workflows.

SAM 3

Detect, segment and track every example of any object category in an image or video, using text or examples

Segment an object from a click

Track segmented objects in videos

Refine prediction with follow-up clicks

Detect and segment matching instances from text

Refine detection with visual examples

SAM 2

Segment and track any object in any image or video using click, box or mask prompts

Segment an object from a click

Track segmented objects in videos

Refine prediction with follow-up clicks

SAM 1

Segment any object in any image with as little as a single click

Segment an object from a click

Refine prediction with follow-up clicks

Try SAM 3 today

Experiment with SAM 3 in the Segment Anything Playground.

OUR APPROACH

New unified architecture

SAM 3 is built as a unified, promptable model that enables segmentation with language, exemplars and visual prompts across images and videos. It leverages a large-scale, diverse training dataset and a powerful perception encoder backbone to achieve state-of-the-art performance in segmentation and tracking using open-vocabulary short text phrases and visual prompts.

More from Segment Anything

SAM 3D

SAM 3D enables precise reconstruction and analysis of 3D people and objects, providing new opportunities for spatial understanding and applications.