Model API
You get 100 free credits every month.
If you need more than 100 credits per month, upgrade to Premium for unlimited access.
1 credit is $0.01 USD
| Tokens generated | Images generated | Videos generated |
|---|---|---|
| up to 5,000,000 tokens | up to 12,000 images | up to 250 short videos |
Above are only a few examples of what you can do with 100 credits. The Model API supports >40k open + closed models, across dozens of tasks (e.g. chat models, object detection, segmentation, video generation, and more). Build something awesome. Have fun.
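To put the numbers in perspective, here is a quick back-of-the-envelope calculation (a minimal sketch in Python; the credit price and the micro language-model rate come from this page, everything else is illustrative):

```python
# What do 100 free credits buy on a "micro" instance?
# Figures from this page: 1 credit = $0.01, micro language-model rate = $0.0000872/sec.
CREDIT_USD = 0.01
FREE_CREDITS = 100
MICRO_LM_RATE_PER_SEC = 0.0000872

budget_usd = FREE_CREDITS * CREDIT_USD        # $1.00 per month
seconds = budget_usd / MICRO_LM_RATE_PER_SEC  # ~11,468 seconds
print(f"${budget_usd:.2f} buys ~{seconds:,.0f} s (~{seconds / 3600:.1f} h) of micro inference")
```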
Cost is based on model size.
We benchmark models for optimal GPU placement. For example, a 7-billion-parameter model requiring 10 GB of GPU RAM is placed on a 16 GB "micro" instance.
Language models (text generation, chat, text-to-image, etc.) are billed at the language models rate (e.g., $0.0000872/sec for the 7B example above), while non-language models are billed at the "other models" rate.
| Instance size | GPU RAM (GB) | Free credits | Language models ($ / sec) | Other models ($ / sec) |
|---|---|---|---|---|
| micro | 16 | | 0.0000872 | |
| xs | 24 | | | |
| sm | 64 | | | |
| md | 96 | | | |
| lg | 128 | | | |
| xl | 192 | | | |
| xxl | 320 | | | |
| super | 640 | | | |
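To make the placement rule concrete, here is a minimal sketch of the idea (the size and GPU RAM pairs come from the table above; the selection logic and the example memory figures are illustrative assumptions, not the actual placement algorithm):

```python
# Pick the smallest instance whose GPU RAM fits the model's memory requirement,
# mirroring the example above: a 7B model needing ~10 GB lands on the 16 GB "micro".
INSTANCES = [  # (size, GPU RAM in GB), from the table above
    ("micro", 16), ("xs", 24), ("sm", 64), ("md", 96),
    ("lg", 128), ("xl", 192), ("xxl", 320), ("super", 640),
]

def place(model_ram_gb: float) -> str:
    """Return the smallest instance size that fits the model (illustrative)."""
    for size, ram in INSTANCES:
        if ram >= model_ram_gb:
            return size
    raise ValueError("model too large for any instance size")

print(place(10))  # -> "micro"  (the 7B example above)
print(place(70))  # -> "md"     (illustrative: a model needing ~70 GB)
```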
When it comes to serverless, there are two things to think about: 1) cost per second, and 2) how billing is calculated.
Cost per second
Compared to serverless on GCP, AWS, and Azure, Bytez is roughly 1.5x to 12x cheaper per second (see the table below) and includes a GPU with every instance.
| Service | vCPU | Memory (GB) | GPU (GB) | $ / sec | Savings | Billing |
|---|---|---|---|---|---|---|
| Bytez Model API "micro" instance | 4 | 16 | 16 | 0.0000872 | — | instance (partial lifecycle) |
| GCP Cloud Functions (2nd Gen) | 4 | 16 | — | 0.000136 | 1.56x | request-based |
| GCP Cloud Run with GPU | 4 | 16 | 24 | 0.001044 | 11.97x | instance-based |
| AWS Lambda | 1.79 | 10.24 | — | 0.0001707 | 1.96x | request-based |
| Azure Functions Premium Plan | 4 | 14 | — | 0.0002745 | 3.15x | instance-based |
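The Savings column is simply each provider's per-second rate divided by the Bytez micro rate; here is a quick sanity check using the rates from the table above:

```python
# Reproduce the "Savings" column: provider rate / Bytez micro rate.
BYTEZ_RATE = 0.0000872  # $/sec, Model API "micro" instance

providers = {
    "GCP Cloud Functions (2nd Gen)": 0.000136,
    "GCP Cloud Run with GPU": 0.001044,
    "AWS Lambda": 0.0001707,
    "Azure Functions Premium Plan": 0.0002745,
}

for name, rate in providers.items():
    print(f"{name}: {rate / BYTEZ_RATE:.2f}x")
# -> 1.56x, 11.97x, 1.96x, 3.15x
```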
Billing
Serverless billing on GCP, AWS, and Azure is done in one of two ways: 1) Request-based billing, or 2) Instance-based billing. Bytez offers a sweet spot in between.
Lambda and Cloud Functions follow Request-based billing, where you are billed during request execution.
Azure Functions (Premium Plan) and Cloud Run follow Instance-based billing, where you are billed for the entire instance lifecycle.
With Model API, you are billed for only a portion of the instance lifecycle. Billing starts when the instance is booted and ceases when the instance receives a shutdown signal.
Compared to Request-based and Instance-based billing, our billing model minimizes your costs and maximizes speed for large ML inference by avoiding expensive model reloads and reducing latency. Idle instances automatically shut down to save costs. All of this can be configured by you, the developer (see our docs).
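Put differently, your cost is the billed window (boot to shutdown signal) multiplied by the instance rate. A minimal sketch of that arithmetic (the timestamps and the 600-second idle window are illustrative, not defaults):

```python
# Billed cost = (time of shutdown signal - boot time) * instance rate.
# Requests served while the instance stays warm add no per-request charge.
MICRO_LM_RATE_PER_SEC = 0.0000872  # $/sec, from the table above

boot_time_s = 0.0          # billing starts when the instance boots
shutdown_signal_s = 600.0  # billing stops at the shutdown signal -- illustrative idle window

billed_seconds = shutdown_signal_s - boot_time_s
print(f"~${billed_seconds * MICRO_LM_RATE_PER_SEC:.4f} for a {billed_seconds:.0f} s lifecycle")
# -> ~$0.0523 for a 600 s lifecycle
```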
Bytez Model API offers serverless inference built for economies of scale.
| Plan | $ / month | Requests / sec |
|---|---|---|
| Free | $0 | 10 |
| Premium | $10 | 100 |
| Enterprise | customized for you | unlimited |
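If you are on the Free plan, a simple client-side throttle keeps you under the 10 requests/sec limit (a minimal sketch; send_request is a placeholder, not the documented API client):

```python
import time

# Naive client-side throttle for the Free plan's 10 requests/sec limit.
MAX_RPS = 10
MIN_INTERVAL = 1.0 / MAX_RPS  # at least 0.1 s between requests

_last_call = 0.0

def throttled(send):
    """Wrap a request function so calls never exceed MAX_RPS (illustrative)."""
    def wrapper(*args, **kwargs):
        global _last_call
        wait = MIN_INTERVAL - (time.monotonic() - _last_call)
        if wait > 0:
            time.sleep(wait)
        _last_call = time.monotonic()
        return send(*args, **kwargs)
    return wrapper

@throttled
def send_request(payload):
    # Placeholder: swap in your actual Model API call here.
    return {"echo": payload}

for i in range(3):
    print(send_request({"n": i}))
```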