Built with developer experience in mind, Tensorkube simplifies the process of deploying serverless GPU apps. In this guide, we will walk you through deploying SAM2 on your private cloud.

Meta recently unveiled Segment Anything Model 2 (SAM 2), a major advance in object segmentation. SAM 2 brings real-time, promptable object segmentation to both images and videos, improving on its predecessor in both accuracy and speed.

Prerequisites

Before you begin, ensure you have configured Tensorkube on your AWS account. If you haven’t done that yet, follow the Getting Started guide.

Deploying SAM2 with Tensorfuse

Each Tensorkube deployment requires two things - your code and your environment (as a Dockerfile). When deploying machine learning models, it is beneficial to make the model itself part of your container image, as this reduces cold-start times by a significant margin. To enable this, along with the FastAPI app, we will download the model weights during the image build and bake them into the container.

Code files

We will write a small FastAPI app that takes an image and a point prompt as input and returns the predicted segmentation masks. The FastAPI app will have three endpoints - /readiness, /, and /segment. Remember that the /readiness endpoint is used by Tensorkube to check the health of your deployments.

main.py
import torch
from fastapi import FastAPI, UploadFile, File, Form
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
import shutil
import numpy as np
import json
from PIL import Image

app = FastAPI()
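
# Load the SAM2 predictor once at startup; the checkpoint path matches the
# location where the Dockerfile downloads the weights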
checkpoint = "./checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

@app.get("/")
async def root():
    is_cuda_available = torch.cuda.is_available()
    return {
        "message": "Hello World-2",
        "cuda_available": is_cuda_available,
    }

@app.get("/readiness")
async def readiness():
    return {"status": "ready"}

# an inference endpoint for image segmentation
@app.post("/segment")
async def segment_image(data: str = Form(...), image: UploadFile = File(...)):
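    # save the uploaded image to disk so it can be reopened with PIL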
    with open(image.filename, 'wb') as buffer:
        shutil.copyfileobj(image.file, buffer)

    data = json.loads(data)

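    # run inference on the GPU under bf16 autocast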
    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        img = np.array(Image.open(image.filename).convert("RGB"))
        predictor.set_image(img)
        input_point = np.array([data["point"]])
        input_label = np.array(data["label"])
        print(data["point"])
        print(data["label"])
        masks, scores, logits = predictor.predict(point_coords=input_point, 
                                                  point_labels=input_label,
                                                  multimask_output=True,)
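        # sort the returned masks by confidence score, highest first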
        sorted_ind = np.argsort(scores)[::-1]
        masks = masks[sorted_ind]
        scores = scores[sorted_ind]
        logits = logits[sorted_ind]
        return {"masks": masks.tolist(), "scores": scores.tolist(), "logits": logits.tolist()}

Environment files (Dockerfile)

Next, create a Dockerfile for your FastAPI app. Given below is a simple Dockerfile that you can use:

Dockerfile
# Use the nvidia cuda base image
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04
# Update and install required packages
RUN apt-get update && apt-get install -y \
    python3.10 \
    python3.10-dev \
    python3-pip \
    git \
    wget \
    && rm -rf /var/lib/apt/lists/*
# Set Python 3.10 as the default Python version
RUN ln -s /usr/bin/python3.10 /usr/bin/python
# Upgrade pip
RUN pip3 install --no-cache-dir --upgrade pip \
    && pip install transformers \
    && pip install --upgrade torch fastapi uvicorn pydantic huggingface_hub torchvision packaging setuptools python-multipart \
    && git clone https://github.com/facebookresearch/segment-anything-2.git \
    && pip install -e segment-anything-2
# Set working directory
WORKDIR /code
# Download the SAM2 model checkpoint so it ships inside the image
RUN mkdir checkpoints \
    && cd checkpoints \
    && wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt \
    && cd ..
COPY main.py /code/main.py
EXPOSE 80
# Start a uvicorn server on port 80
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]
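
Optionally, you can sanity-check the image locally before deploying (this assumes Docker with the NVIDIA container toolkit and a local GPU; the sam2-app tag is just an example):

docker build -t sam2-app .
docker run --gpus all -p 80:80 sam2-app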

Deploying the app

SAM2 is now ready to be deployed on Tensorkube. Navigate to your project root and run the following command:

tensorkube deploy --gpus 1 --gpu-type a10g

SAM2 is now deployed on your AWS account. You can access your app at the URL shown in the deployment output, or retrieve it at any time with the following command:

tensorkube list deployments

And that’s it! You have successfully deployed SAM2 on serverless GPUs using Tensorkube. 🚀

To test it out, run the following command, replacing the URL with the one from the deployment output:

curl -X POST <YOUR_APP_URL_HERE>/segment -H "accept: application/json" -H "Content-Type: multipart/form-data" -F "data={\"point\": [100, 200], \"label\": [1]}" -F "image=@./img_car.jpeg"
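
If you prefer calling the endpoint from Python, here is a minimal client sketch using the requests library (the URL, image path, and point prompt are placeholders mirroring the curl call above):

client.py
import json

import numpy as np
import requests

# placeholder values: substitute your deployment URL and a local image
URL = "<YOUR_APP_URL_HERE>/segment"
payload = {"point": [100, 200], "label": [1]}

with open("./img_car.jpeg", "rb") as f:
    response = requests.post(
        URL,
        data={"data": json.dumps(payload)},
        files={"image": f},
    )
response.raise_for_status()

result = response.json()
# masks are returned as nested lists, already sorted by score (highest first)
best_mask = np.array(result["masks"][0])
print("scores:", result["scores"])
print("best mask shape:", best_mask.shape)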

You can also call the readiness endpoint to wake up your nodes when you are expecting incoming traffic:

curl <YOUR_APP_URL_HERE>/readiness