Meta’s new AI model tags and tracks every object in your videos
Meta has a new AI model that can label and track any object in a video as it moves. The Segment Anything Model 2 (SAM 2) expands the capabilities of its predecessor, SAM, which was limited to images, and opens up new possibilities for video editing and analysis.
SAM 2’s real-time segmentation is a potentially huge technical leap. It shows how AI can process moving images and distinguish between elements on the screen, even as they move around or out of the frame and back in.
Segmentation is the term for how software determines which pixels in an image belong to which objects. An AI assistant that can do this makes it a lot easier to process or edit complex images. That was the breakthrough of Meta’s original SAM. SAM has been used to segment sonar images of coral reefs, parse satellite imagery to aid disaster response, and even analyze cellular images to detect skin cancer.
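For a concrete picture of what that means in practice, here is roughly how prompt-based segmentation works with the original SAM’s open-source Python package, segment-anything. This is a minimal sketch, not Meta’s production pipeline: the checkpoint file name, image path, and prompt point are placeholders, and SAM 2 extends the same prompt-to-mask idea across video frames rather than a single image.

```python
# Minimal sketch of point-prompted segmentation with Meta's segment-anything
# package (pip install segment-anything). Checkpoint, image path, and prompt
# coordinates below are illustrative placeholders.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained SAM backbone (vit_h is the largest released variant).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Read an image and hand it to the predictor (expects an RGB HxWx3 array).
image = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt with a single foreground point; SAM returns candidate masks,
# each a boolean array marking which pixels belong to the clicked object.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),   # 1 = foreground point
    multimask_output=True,
)
best_mask = masks[scores.argmax()]   # HxW boolean mask for the chosen object
```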
Extending segmentation to video is no small feat, and it would have been out of reach until recently. As part of SAM 2’s debut, Meta shared a database of 50,000 videos captured to train the model, on top of the 100,000 other videos Meta says it also used. Along with all that training data, real-time video segmentation takes a significant amount of computing power, so while SAM 2 is currently open and free, it likely won’t stay that way forever.
Segment success
SAM 2 could let video editors isolate and manipulate objects within a scene far more easily than current editing software allows, and without manually adjusting every frame. Meta also envisions SAM 2 revolutionizing interactive video, allowing users to select and manipulate objects within live video or virtual spaces using its AI model.
Meta believes SAM 2 can also play a crucial role in the development and training of computer vision systems, particularly for autonomous vehicles. Accurate and efficient object tracking is essential for these systems to safely interpret and navigate their environments. SAM 2 could accelerate the annotation of visual data, providing high-quality training data for these AI systems.
Much of the AI video hype has revolved around generating video from text prompts. Models like OpenAI’s Sora, Runway, and Google’s Veo get a lot of attention for a reason. But the kind of editing power that SAM 2 offers could play an even bigger role in embedding AI into video creation.
And while Meta may have a head start for now, other AI video developers are eager to produce their own versions. Google’s recent research, for example, has led to video summarization and object recognition features that it’s testing on YouTube. Adobe’s Firefly AI tools are also focused on photo and video editing, with features like content-aware fill and auto-reframing.