Built with developer experience in mind, Tensorkube simplifies the process of deploying serverless GPU apps. In this guide, we will walk you through the process of deploying a ResNet model that runs on an A10G GPU.

Prerequisites

Before you begin, ensure you have configured Tensorkube on your AWS account. If you haven’t done that yet, follow the Getting Started guide.

Deploying ResNet-18 on Tensorfuse

Each Tensorkube deployment requires two things - your code and your environment (as a Dockerfile). While deploying machine learning models, it is beneficial if your model is also part of your container image, as this reduces cold-start times by a significant margin. To enable this, in addition to a FastAPI app and a Dockerfile, we will also write a script that downloads the model so it can be baked into the image at build time.

Download the model

We will write a small script that downloads the ResNet model from the Hugging Face model hub and saves it in the ./models/resnet-18 directory.

download.py
from transformers import AutoModelForImageClassification, AutoImageProcessor

def save_model_to_local(model_name, save_directory):
    # Load the model and image processor
    model = AutoModelForImageClassification.from_pretrained(model_name)
    image_processor = AutoImageProcessor.from_pretrained(model_name)

    # Save the model and image processor
    model.save_pretrained(save_directory)
    image_processor.save_pretrained(save_directory)

if __name__ == "__main__":
    model_name = "microsoft/resnet-18"
    save_directory = "./models/resnet-18"
    save_model_to_local(model_name, save_directory)
    print(f"Model and image processor saved to {save_directory}")

Code files

We will write a small FastAPI app that loads the model and serves predictions. The FastAPI app will have three endpoints - /readiness, /gpu_check and /predict/. Remember that the /readiness endpoint is very important as it is used by Tensorkube to check the health of your deployments.

app.py
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import JSONResponse
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch
import io

# Initialize the FastAPI app
app = FastAPI()

# Load the model and image processor from local directory
model_directory = "./models/resnet-18"
image_processor = AutoImageProcessor.from_pretrained(model_directory)
model = AutoModelForImageClassification.from_pretrained(model_directory)

# Move the model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Define the inference function
def predict(image: Image.Image):
    inputs = image_processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_label = logits.argmax(-1).item()
    return model.config.id2label[predicted_label]

# Define the image upload endpoint
@app.post("/predict/")
async def predict_image(file: UploadFile = File(...)):
    contents = await file.read()
    image = Image.open(io.BytesIO(contents)).convert("RGB")  # ensure a 3-channel image for the processor
    prediction = predict(image)
    return JSONResponse({"predicted_label": prediction})

# Define the readiness endpoint
@app.get("/readiness")
async def readiness():
    return {"status": "ready"}

# Define the GPU check endpoint
@app.get("/gpu_check")
async def gpu_check():
    if torch.cuda.is_available():
        return {"gpu": "available", "gpu_name": torch.cuda.get_device_name(0)}
    else:
        return {"gpu": "not available"}

Environment files (Dockerfile)

Next, create a Dockerfile for your FastAPI app. Remember that Tensorkube assumes that your server runs on port 80. Given below is a simple Dockerfile that you can use:

Dockerfile
# Use the NVIDIA CUDA base image
FROM nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04

# Set up the environment
ENV DEBIAN_FRONTEND=noninteractive

# Install dependencies
RUN apt-get update && apt-get install -y \
    python3-pip \
    python3-dev \
    && rm -rf /var/lib/apt/lists/*

# Install FastAPI, Uvicorn, and other dependencies
RUN pip3 install fastapi uvicorn transformers torch pillow pydantic

# Copy the application code
COPY app.py /app/app.py
COPY download.py /app/download.py

# Set the working directory
WORKDIR /app

# Download the model
RUN python3 download.py

# Expose the port
EXPOSE 80

# Run the application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "80"]

Deploying the app

ResNet is now ready to be deployed on Tensorkube. Navigate to your project root and run the following command:

tensorkube deploy --gpus 1 --gpu-type a10g --cpu 500 --memory 512

This command can take anywhere from 1 minute to about 10 minutes to complete depending on the size of the model and the complexity of your Dockerfile. You can list all options using the following command:

tensorkube deploy --help

ResNet is now deployed on your AWS account. You can access your app at the URL provided in the output, or retrieve it using the following command:

tensorkube list deployments

And that’s it! You have successfully deployed ResNet on serverless GPUs using Tensorkube. 🚀

To test it out, you can run the following command, replacing the URL with the one provided in the output:

curl -X POST <YOUR_APP_URL_HERE>/predict/ -F "file=@/Users/path/to/dog.jpeg"
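
If you prefer Python over curl, a minimal client using the requests library looks like the sketch below (the URL and the image path are placeholders you need to replace):

predict_client.py
import requests

url = "<YOUR_APP_URL_HERE>/predict/"

# Send the image as multipart/form-data under the "file" field expected by the app
with open("dog.jpeg", "rb") as f:
    response = requests.post(url, files={"file": ("dog.jpeg", f, "image/jpeg")})

print(response.json())  # e.g. {"predicted_label": "golden retriever"}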

You can also hit the readiness endpoint to wake up your nodes if you are expecting incoming traffic:

curl <YOUR_APP_URL_HERE>/readiness

Happy ResNet-ing! 🎉