Segment Anything | Meta AI

Meta AI Computer Vision Research

Introduction

What is Segment Anything?

Segment Anything is a Meta AI research project that introduces the Segment Anything Model (SAM), which can "cut out" any object in an image with a single click. SAM is a promptable segmentation system that can generalize to unfamiliar objects and images without additional training.

Features of Segment Anything
  • Zero-shot generalization to unfamiliar objects and images
  • Promptable design enables flexible integration with other systems
  • Extensible outputs: output masks can be used as inputs to other AI systems
  • Can take input prompts from other systems, such as object detectors or AR/VR headsets
  • Can generate multiple valid masks for ambiguous prompts

How to Use Segment Anything
  • Try the demo to see how SAM can segment objects in an image
  • Use interactive points and boxes to prompt SAM (see the sketch after this list)
  • Automatically segment everything in an image
  • Generate multiple valid masks for ambiguous prompts
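
A minimal sketch of both workflows, assuming the official segment-anything Python package, OpenCV for image loading, and hypothetical local paths for the image and the released ViT-H checkpoint:

    import cv2
    import numpy as np
    from segment_anything import SamAutomaticMaskGenerator, SamPredictor, sam_model_registry

    # Hypothetical local paths; the checkpoint is the publicly released ViT-H weights.
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)

    # Interactive prompting: one foreground click, several candidate masks back.
    predictor = SamPredictor(sam)
    predictor.set_image(image)                # heavy image encoding, done once per image
    masks, scores, logits = predictor.predict(
        point_coords=np.array([[500, 375]]),  # (x, y) pixel location of the click
        point_labels=np.array([1]),           # 1 = foreground, 0 = background
        multimask_output=True,                # return multiple valid masks for an ambiguous click
    )

    # Automatic mode: segment everything in the image without any prompt.
    mask_generator = SamAutomaticMaskGenerator(sam)
    all_masks = mask_generator.generate(image)  # list of dicts with "segmentation", "area", ...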

Price
  • The model is currently available for research use; pricing for commercial use has not been disclosed.

Helpful Tips
  • SAM can be used for a wide range of applications, including image editing, object tracking, and creative tasks like collaging.
  • The model can be integrated with other AI systems to enable more complex tasks.
  • The SA-1B dataset used to train SAM is available for download and includes more than 1.1 billion segmentation masks collected on about 11 million licensed, privacy-preserving images.

Frequently Asked Questions

What types of prompts are supported?

  • Foreground/background points
  • Bounding box (see the prompting sketch after this list)
  • Mask
  • Text prompts (not currently released)
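
A sketch of the released prompt types, continuing the setup from the usage sketch above (the predictor already holds an encoded image); the coordinates and the refinement step are illustrative:

    import numpy as np

    # Box prompt: XYXY pixel coordinates of a rough box around the object.
    masks, scores, low_res_logits = predictor.predict(
        box=np.array([100, 100, 400, 380]),
        multimask_output=False,
    )

    # Mask prompt: feed the best low-resolution logits back in, together with an
    # extra foreground click, to refine the previous prediction.
    refined_masks, _, _ = predictor.predict(
        point_coords=np.array([[250, 240]]),
        point_labels=np.array([1]),
        mask_input=low_res_logits[np.argmax(scores)][None, :, :],  # shape (1, 256, 256)
        multimask_output=False,
    )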

What is the structure of the model?

  • A ViT-H image encoder that runs once per image and outputs an image embedding
  • A prompt encoder that embeds input prompts such as clicks or boxes
  • A lightweight transformer-based mask decoder that predicts object masks from the image embedding and prompt embeddings (see the sketch after this list)
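
These three components are exposed as submodules of the loaded model, so the split can be inspected directly; a sketch assuming the segment-anything package and a hypothetical local checkpoint path:

    from segment_anything import sam_model_registry

    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")

    for name, module in [("image encoder", sam.image_encoder),
                         ("prompt encoder", sam.prompt_encoder),
                         ("mask decoder", sam.mask_decoder)]:
        n_params = sum(p.numel() for p in module.parameters())
        print(f"{name}: {n_params / 1e6:.1f}M parameters")

    # The expensive image encoder runs once per image (SamPredictor.set_image);
    # each new click or box only re-runs the small prompt encoder and mask decoder.

For the ViT-H checkpoint this should roughly match the parameter counts quoted in the model-size question below.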

What platforms does the model use?

  • PyTorch for the image encoder
  • ONNX Runtime for the prompt encoder and mask decoder, which can run on CPU or GPU across a variety of platforms (see the sketch after this list)
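
A sketch of loading a decoder exported with the repository's ONNX export script and checking what it expects; the file name is hypothetical, and the exact input names should be read from the exported model rather than assumed:

    import onnxruntime as ort

    # Hypothetical path to a prompt-encoder + mask-decoder model produced by the
    # export script in the segment-anything repository.
    session = ort.InferenceSession("sam_decoder.onnx", providers=["CPUExecutionProvider"])

    # Inspect the inputs the exported decoder expects (image embedding, prompt
    # coordinates and labels, optional mask input, original image size, ...).
    for inp in session.get_inputs():
        print(inp.name, inp.shape, inp.type)

    # At inference time the image embedding still comes from the PyTorch image
    # encoder (e.g. SamPredictor.get_image_embedding()); session.run(None, feed)
    # then returns masks for each new prompt on CPU or GPU.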

How big is the model?

  • The image encoder has 632M parameters
  • The prompt encoder and mask decoder have 4M parameters

How long does inference take?

  • The image encoder takes ~0.15 seconds on an NVIDIA A100 GPU
  • The prompt encoder and mask decoder take ~50 ms on CPU in the browser using multithreaded SIMD execution (a timing sketch follows this list)
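
These numbers depend on hardware, and the sketch below times the PyTorch decoder rather than the in-browser ONNX path, so it is only indicative. It continues the usage sketch above; move the model with sam.to("cuda") first to time the encoder on a GPU:

    import time

    import numpy as np
    import torch

    def timed(fn):
        # Synchronize around the call so asynchronous GPU work is actually counted.
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.perf_counter()
        out = fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        return out, time.perf_counter() - start

    _, encode_s = timed(lambda: predictor.set_image(image))  # heavy ViT-H image encoder
    _, decode_s = timed(lambda: predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
        multimask_output=True,
    ))
    print(f"image encoding: {encode_s:.3f} s, prompt decoding: {decode_s * 1000:.1f} ms")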
