Segment Anything

Meta AI Computer Vision Research

Introduction

What is Segment Anything?

Segment Anything is a Meta AI research project that introduces the Segment Anything Model (SAM), which can "cut out" any object in an image with a single click. SAM is a promptable segmentation system that generalizes to unfamiliar objects and images without additional training.

Features of Segment Anything
  • Zero-shot generalization to unfamiliar objects and images
  • Promptable design enables flexible integration with other systems
  • Output masks can be used as inputs to other AI systems
  • Can take input prompts from other systems, such as object detectors or AR/VR headsets
  • Can generate multiple valid masks for ambiguous prompts
How to Use Segment Anything
  • Try the demo to see how SAM can segment objects in an image
  • Use interactive points and boxes to prompt SAM (see the sketch after this list)
  • Automatically segment everything in an image
  • Generate multiple valid masks for ambiguous prompts
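
A minimal sketch of both the interactive and automatic modes, assuming the publicly released segment_anything package and the ViT-H checkpoint from the project's GitHub repository (the file names and click coordinates below are illustrative):

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator

# Load the model from a downloaded checkpoint (illustrative file name).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")

# SamPredictor expects an RGB uint8 image.
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)

# Interactive mode: prompt with a single foreground click.
predictor = SamPredictor(sam)
predictor.set_image(image)  # runs the heavy image encoder once
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),  # (x, y) pixel coordinates
    point_labels=np.array([1]),           # 1 = foreground, 0 = background
    multimask_output=True,                # several candidate masks for one click
)

# Automatic mode: segment everything in the image.
mask_generator = SamAutomaticMaskGenerator(sam)
all_masks = mask_generator.generate(image)  # list of dicts with masks and metadata
```
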
Price
  • The model is free to use: Meta released SAM's code and model weights under the Apache 2.0 license, and the SA-1B dataset under a separate license for research purposes.
Helpful Tips
  • SAM can be used for a wide range of applications, including image editing, object tracking, and creative tasks like collaging.
  • The model can be integrated with other AI systems to enable more complex tasks.
  • The dataset used to train SAM, SA-1B, is available for download and includes over 1.1 billion segmentation masks collected on 11 million licensed and privacy-preserving images (a sketch for reading its annotations follows this list).
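
SA-1B ships per-image JSON annotation files with masks stored in COCO run-length encoding. A hedged sketch of decoding one mask with pycocotools; the annotation file name is hypothetical:

```python
import json
from pycocotools import mask as mask_utils  # pip install pycocotools

# Hypothetical file name; SA-1B pairs each image with a JSON annotation file.
with open("sa_000000.json") as f:
    annotations = json.load(f)["annotations"]

first = annotations[0]
binary_mask = mask_utils.decode(first["segmentation"])  # H x W uint8 array
print(binary_mask.shape, first["area"])
```
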
Frequently Asked Questions

What type of prompts are supported?

  • Foreground/background points
  • Bounding box
  • Mask (e.g. low-resolution logits from a previous prediction; see the sketch after this list)
  • Text prompts (not currently released)
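
Continuing the earlier sketch, the same SamPredictor accepts a box prompt (XYXY pixel coordinates) and can refine a prediction by feeding back the low-resolution mask logits from a previous call; the coordinates here are illustrative:

```python
import numpy as np

# Box prompt: XYXY pixel coordinates (illustrative values).
masks, scores, low_res_logits = predictor.predict(
    box=np.array([100, 100, 400, 400]),
    multimask_output=False,
)

# Mask prompt: feed a previous call's 1x256x256 logits back in to refine.
refined, _, _ = predictor.predict(
    point_coords=np.array([[250, 250]]),
    point_labels=np.array([1]),
    mask_input=low_res_logits,
    multimask_output=False,
)
```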

What is the structure of the model?

  • A ViT-H image encoder that runs once per image and outputs an image embedding (reused across prompts, as sketched after this list)
  • A prompt encoder that embeds input prompts such as clicks or boxes
  • A lightweight transformer-based mask decoder that predicts object masks from the image embedding and prompt embeddings
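
This split is what makes interactive use cheap: the expensive encoder runs once, and each new prompt only re-runs the small decoder. A sketch of that pattern with the predictor from above (the click coordinates are made up):

```python
import numpy as np

predictor.set_image(image)                   # ViT-H image encoder, once per image
embedding = predictor.get_image_embedding()  # torch tensor, roughly 1x256x64x64

# Each prompt reuses the cached embedding; only the small decoder runs.
for click in [(120, 80), (300, 210), (450, 400)]:
    masks, scores, _ = predictor.predict(
        point_coords=np.array([click]),
        point_labels=np.array([1]),
        multimask_output=True,
    )
```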

What platforms does the model use?

  • PyTorch for the image encoder
  • ONNX Runtime for the prompt encoder and mask decoder, which can run on CPU or GPU across a variety of platforms (see the sketch after this list)
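
A hedged sketch of driving an exported mask decoder with onnxruntime in Python; the input names follow the repository's export script, but the exported file name here is hypothetical. (The browser demo does the same thing through the web ONNX runtime.)

```python
import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession("sam_decoder.onnx")  # hypothetical export

# Prompt coordinates must be mapped into the model's input frame first.
coords = predictor.transform.apply_coords(
    np.array([[500.0, 375.0]]), image.shape[:2]
)[None, :, :].astype(np.float32)

masks, iou_predictions, low_res_masks = session.run(None, {
    "image_embeddings": embedding.cpu().numpy(),  # from the PyTorch encoder
    "point_coords": coords,
    "point_labels": np.array([[1.0]], dtype=np.float32),
    "mask_input": np.zeros((1, 1, 256, 256), dtype=np.float32),  # no mask prompt
    "has_mask_input": np.zeros(1, dtype=np.float32),
    "orig_im_size": np.array(image.shape[:2], dtype=np.float32),
})
```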

How big is the model?

  • The image encoder has 632M parameters
  • The prompt encoder and mask decoder together have 4M parameters (a parameter-counting sketch follows this list)
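
These counts can be checked directly on the loaded PyTorch model; a quick sketch using the sam object from the earlier example:

```python
encoder = sum(p.numel() for p in sam.image_encoder.parameters())
decoder = sum(
    p.numel()
    for module in (sam.prompt_encoder, sam.mask_decoder)
    for p in module.parameters()
)
print(f"image encoder: {encoder / 1e6:.0f}M parameters")       # ~632M for ViT-H
print(f"prompt encoder + mask decoder: {decoder / 1e6:.0f}M")  # ~4M
```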

How long does inference take?

  • The image encoder takes ~0.15 seconds on an NVIDIA A100 GPU
  • The prompt encoder and mask decoder take ~50 ms on CPU in the browser using multithreaded SIMD execution (a rough timing sketch follows this list)
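
A rough way to reproduce this split on your own hardware, reusing the predictor from the earlier sketch (numbers vary; on GPU, call torch.cuda.synchronize() around each section for accurate timings):

```python
import time
import numpy as np

start = time.perf_counter()
predictor.set_image(image)   # image encoder: the expensive, once-per-image step
encode_s = time.perf_counter() - start

start = time.perf_counter()
predictor.predict(           # prompt encoder + mask decoder: cheap per prompt
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
decode_s = time.perf_counter() - start

print(f"encode: {encode_s:.3f}s  decode: {decode_s:.3f}s")
```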
