Introduction

Features

Run any code on your private cloud without worrying about infrastructure.
Use GPUs that can scale down to zero and up to handle many concurrent function calls.
Run functions in custom container environments or use our optimized Machine Images.
Monitor and debug multimodal chains.
Run your code on various hardware, including GPUs (A10G, A100, H100), Trainium/Inferentia chips, TPUs, or FPGAs.
Expose your models as OpenAI compatible APIs.
Run serverless training jobs without managing conda environments or turning off ML instances.

How Does It Work?

Tensorfuse manages a Kubernetes cluster on your infrastructure. It provides the necessary custom resources and code to enable serverless GPUs. Here’s how it works:

Cluster Management: Tensorfuse monitors your current workloads and scales nodes to zero when not in use.
Function Execution: Tensorfuse brings in general purpose AMIs that let your run functions on different hardware such as GPUs, TPUs, Inferentia / Trainium chips and FPGAs.
Custom Docker Implementation: Tensorfuse includes a custom Docker implementation to build larger images (including models) with faster cold start times.
Autoscaling: An autoscaler adjusts resources based on the number of incoming HTTP requests.
Custom Networking Layer : Tensorfuse extends Istio to set up a custom networking layer, allowing you to define endpoints and communicate between functions and data sources.

The best part is that all of this is abstracted away. While working with tensorfuse, you will not be dealing with any of the concepts mentioned above.

Basics

Guides

Features

How Does It Work?

Basics

Guides

​Features

​How Does It Work?

Features

How Does It Work?