Built with developer experience in mind, Tensorkube simplifies the process of deploying serverless GPU apps. In this guide, we will walk you through the process of setting up Tensorkube and deploying a FastAPI app on it.


Before you begin, ensure you have the following:

  • AWS credentials set up on your machine
  • Python and pip installed on your machine

You can run the following command to set up AWS credentials on your machine:

aws configure

Alternatively, you can export them manually as environment variables:

export AWS_ACCESS_KEY_ID=your_access_key_id
export AWS_SECRET_ACCESS_KEY=your_secret_access_key
export AWS_DEFAULT_REGION=your_default_region
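
If you chose the environment-variable route, it can help to confirm that the variables are visible in your current session before continuing. The following is a minimal sketch of such a check; the helper name missing_aws_env_vars is hypothetical and not part of Tensorkube or the AWS tooling. It only checks that the three variables are set to non-empty values and does not validate the credentials themselves. If you used aws configure instead, the credentials are stored in the AWS CLI's own files, so this check will report the variables as missing.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the export commands above.
REQUIRED_AWS_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
)


def missing_aws_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required AWS variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_aws_env_vars()
    if missing:
        print("Missing AWS environment variables: " + ", ".join(missing))
    else:
        print("All three AWS environment variables are set.")
```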


First, install the tensorkube Python package:

pip install tensorkube

After this, run the following command to set up Tensorkube on your AWS account:

tensorkube configure

This is a one-time setup that needs to be run once per AWS account. If you are a team using Tensorfuse, only one team member needs to run this command.

The command creates a CloudFormation stack, creates a k8s cluster, and deploys custom resources to enable serverless GPUs on your AWS account. It can take anywhere from 8 to 15 minutes to complete.

Deploying your first Tensorkube app

Each Tensorkube deployment requires two things: your code and your environment (as a Dockerfile).

Code files

Let’s create a simple FastAPI app and deploy it on Tensorkube. Before deploying your app, ensure it exposes a /readiness endpoint; Tensorkube uses this endpoint to check the health of your deployments. Given below is a simple FastAPI app that you can deploy (save it as main.py, the module name that the uvicorn command in the Dockerfile expects):

from fastapi import FastAPI

app = FastAPI()

# Health check endpoint that Tensorkube uses
@app.get("/readiness")
def readiness():
    return {"status": "ready"}

# Root endpoint
@app.get("/")
def read_root():
    return {"message": "Hello, World!"}

Environment files

Add your Python dependencies to requirements.txt:

fastapi
uvicorn

Next, create a Dockerfile for your FastAPI app. Given below is a simple Dockerfile that you can use:

# Use an official Python runtime as a parent image
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 80 available to the world outside this container
EXPOSE 80

# Run main.py with uvicorn when the container launches
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]

Deploying the app

This is the easiest part. Navigate to your project root and run the following command:

tensorkube deploy

Voila! Your first deployment is ready to go. You can access your app at the URL provided in the output.
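
Once you have the URL, a quick way to confirm the app is serving traffic is to call its /readiness endpoint. The following is a minimal sketch using only the Python standard library. It assumes the deployment URL is reachable without extra authentication and that the app returns the {"status": "ready"} body from the sample code; the function names are hypothetical, and this is not how Tensorkube itself performs its health checks.

```python
import http.client
import json
import sys
import urllib.error
import urllib.request


def readiness_url(base_url: str) -> str:
    """Build the /readiness URL for a deployment's base URL."""
    return base_url.rstrip("/") + "/readiness"


def is_ready_payload(body: bytes) -> bool:
    """Return True if the body is the JSON the sample app returns when ready."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers malformed JSON as well as undecodable bytes.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ready"


def check_readiness(base_url: str, timeout: float = 5.0) -> bool:
    """Send one GET to /readiness; network and HTTP errors count as not ready."""
    try:
        with urllib.request.urlopen(readiness_url(base_url), timeout=timeout) as resp:
            return resp.status == 200 and is_ready_payload(resp.read())
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the URL printed by `tensorkube deploy` as the first argument.
    print("ready" if check_readiness(sys.argv[1]) else "not ready")
```

The same check works against a local run of the app, for example with http://localhost:80 as the argument if you start the container on your machine.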

Deploying with GPUs

If you want to deploy your app with GPUs, you can specify the number and type of GPUs to use in your deployment:

tensorkube deploy --gpus 1 --gpu-type a10g

The --gpu-type argument supports all the GPU types that are available on AWS. You can find the list of supported GPU types here.
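
If you trigger deployments from a script or CI job, you may prefer to assemble the command programmatically. The following is a minimal sketch, assuming only the two flags shown in this guide (--gpus and --gpu-type); the helper build_deploy_command is hypothetical and not part of the tensorkube package. The decision to omit --gpu-type when no GPUs are requested is a choice made for this sketch, not documented Tensorkube behavior.

```python
import shlex
from typing import List, Optional


def build_deploy_command(gpus: int = 0, gpu_type: Optional[str] = None) -> List[str]:
    """Build the argument list for `tensorkube deploy` (hypothetical helper).

    Only the --gpus and --gpu-type flags from this guide are used, and
    gpu_type is added only when gpus is greater than zero.
    """
    if gpus < 0:
        raise ValueError("gpus must be zero or a positive integer")
    command = ["tensorkube", "deploy"]
    if gpus > 0:
        command += ["--gpus", str(gpus)]
        if gpu_type:
            command += ["--gpu-type", gpu_type]
    return command


if __name__ == "__main__":
    cmd = build_deploy_command(gpus=1, gpu_type="a10g")
    # Print the command instead of running it. To actually deploy, pass `cmd`
    # to subprocess.run(cmd, check=True) from your project root.
    print(shlex.join(cmd))
```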

Check the status of your deployment

You can list all your deployments using the following command:

tensorkube list deployments

You can also query specific details about a particular deployment, such as the endpoint URL or the status of the deployment:

tensorkube get deployment <deployment_id>

And that’s it. Your automatic GPU serverless deployment is now up and running. Enjoy!