Model API


FAQ

Can I use the API for free?

Yes, you get 100 free credits every month.

If you need more than 100 credits per month, upgrade to Premium for unlimited access.

What's a credit?

1 credit is $0.01 USD
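The conversion is simple enough to sketch. The helpers below are illustrative, not part of any Bytez SDK; the only fact they encode is the $0.01-per-credit rate stated above.

```python
CREDIT_USD = 0.01  # from the FAQ: 1 credit is $0.01 USD

def credits_to_usd(credits: float) -> float:
    """Convert credits to US dollars."""
    return credits * CREDIT_USD

def usd_to_credits(usd: float) -> float:
    """Convert US dollars to credits."""
    return usd / CREDIT_USD

print(credits_to_usd(100))  # 1.0 -> the 100 free monthly credits are $1.00
```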

How much inference is 100 free credits?

| Tokens generated | Images generated | Videos generated |
|---|---|---|
| up to 5,000,000 tokens | up to 12,000 images | up to 250 short videos |

Above are only a few examples of what you can do with 100 credits. The Model API supports >40k open + closed models, across dozens of tasks (e.g. chat models, object detection, segmentation, video generation, and more). Build something awesome. Have fun.

What does inference cost?

Cost is based on model size.

We benchmark models for optimal GPU placement. For example, a 7-billion-parameter model requiring 10GB of RAM is placed on a 16GB "micro" instance.

Language models (text-gen, chat, text-to-image, etc.) are billed at the language-model rate (e.g., $0.0000872/sec for the 7B model example), while non-language models use the "other models" rate.
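Per-second billing makes request costs easy to estimate. A minimal sketch of that math, using only the micro language-model rate from the example above (the 60-second duration is illustrative):

```python
MICRO_LANGUAGE_RATE = 0.0000872  # $/sec for the 7B-on-micro example above
CREDIT_USD = 0.01                # 1 credit = $0.01

def inference_cost(seconds: float, rate_per_sec: float) -> tuple[float, float]:
    """Return (usd, credits) for `seconds` of billed instance time."""
    usd = seconds * rate_per_sec
    return usd, usd / CREDIT_USD

usd, credits = inference_cost(60, MICRO_LANGUAGE_RATE)
print(f"${usd:.6f} = {credits:.4f} credits")  # 60s on micro ~ half a credit
```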

| Instance size | GPU RAM (GB) | Free credits | Language models ($/sec) | Other models ($/sec) |
|---|---|---|---|---|
| micro | 16 | | | |
| xs | 24 | | | |
| sm | 64 | | | |
| md | 96 | | | |
| lg | 128 | | | |
| xl | 192 | | | |
| xxl | 320 | | | |
| super | 640 | | | |
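The sizing rule above amounts to "pick the smallest instance whose GPU RAM covers the model." The sizes and RAM figures below come from the table; the selection logic itself is an assumed sketch, not Bytez's actual placement code.

```python
INSTANCES = [  # (size, GPU RAM in GB), smallest first, per the table above
    ("micro", 16), ("xs", 24), ("sm", 64), ("md", 96),
    ("lg", 128), ("xl", 192), ("xxl", 320), ("super", 640),
]

def place(model_ram_gb: float) -> str:
    """Return the smallest instance size whose GPU RAM fits the model."""
    for size, ram in INSTANCES:
        if model_ram_gb <= ram:
            return size
    raise ValueError(f"no instance fits {model_ram_gb} GB")

print(place(10))  # micro -> matches the 7B / 10GB example
print(place(70))  # md   -> 64GB "sm" is too small, 96GB "md" fits
```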

Can you help me understand your Serverless billing?

When it comes to serverless, there are two things to think about: 1) cost per second, and 2) how billing is calculated.

Cost per second

Compared to serverless on GCP, AWS, and Azure, Bytez offers 10x savings and throws in GPUs.

| Service | vCPU | Memory (GB) | GPU (GB) | $/sec | Savings | Billing |
|---|---|---|---|---|---|---|
| Bytez Model API "micro" instance | 4 | 16 | 16 | 0.0000872 | | instance |
| GCP Cloud Functions (2nd Gen) | 4 | 16 | | 0.000136 | 1.56x | request-based |
| GCP Cloud Run with GPU | 4 | 16 | 24 | 0.001044 | 11.97x | instance-based |
| AWS Lambda | 1.79 | 10.24 | | 0.0001707 | 1.96x | request-based |
| Azure Functions Premium Plan | 4 | 14 | | 0.0002745 | 3.15x | instance-based |
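The "Savings" column is just each provider's per-second rate divided by the Bytez micro rate. A quick check of that arithmetic, using only the rates from the table:

```python
BYTEZ_RATE = 0.0000872  # Bytez Model API "micro" instance, $/sec

competitors = {  # $/sec from the comparison table
    "GCP Cloud Functions (2nd Gen)": 0.000136,
    "GCP Cloud Run with GPU": 0.001044,
    "AWS Lambda": 0.0001707,
    "Azure Functions Premium Plan": 0.0002745,
}

for name, rate in competitors.items():
    # e.g. GCP Cloud Run: 0.001044 / 0.0000872 -> 11.97x
    print(f"{name}: {rate / BYTEZ_RATE:.2f}x")
```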

Billing

Serverless billing on GCP, AWS, and Azure is done in two ways: 1) request-based billing, and 2) instance-based billing. Bytez offers a sweet spot in between.

Lambda and Cloud Functions follow Request-based billing, where you are billed during request execution.

Azure Functions (Premium Plan) and Cloud Run follow Instance-based billing, where you are billed for the entire instance lifecycle.

With the Model API, you are billed for only a portion of the instance lifecycle: billing starts when the instance boots and stops when the instance receives a shutdown signal.

Compared to request-based and instance-based billing, our billing model minimizes your costs and maximizes speed for large ML inference by avoiding expensive model reloads and reducing latency. Idle instances auto-shutdown to save costs. All of this can be configured by you, the developer (see our docs).
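The three billing windows described above can be contrasted with a small sketch. The timestamps are illustrative; the only claim taken from the text is which window each billing style charges for.

```python
def billed_seconds_request(requests: list[tuple[float, float]]) -> float:
    """Request-based (Lambda, Cloud Functions): sum of request durations."""
    return sum(end - start for start, end in requests)

def billed_seconds_bytez(boot: float, shutdown_signal: float) -> float:
    """Bytez Model API: from instance boot until the shutdown signal."""
    return shutdown_signal - boot

def billed_seconds_instance(boot: float, terminated: float) -> float:
    """Instance-based (Cloud Run, Azure Premium): the full lifecycle."""
    return terminated - boot

# Same hypothetical workload, three different billed windows:
reqs = [(10.0, 40.0), (50.0, 70.0)]  # two requests on a warm instance
print(billed_seconds_request(reqs))          # 50.0
print(billed_seconds_bytez(0.0, 120.0))      # 120.0
print(billed_seconds_instance(0.0, 150.0))   # 150.0
```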

Bytez Model API offers serverless inference designed for economies of scale.

Learn more about credits and billing in our docs.
| Plan | $/mo | Requests/sec |
|---|---|---|
| Free | $0 | 10/sec |
| Premium | $10 | 100/sec |
| Enterprise | customized for you | unlimited |