Pricing

Find the plan that meets your real-time inference needs

API

Use our hosted models and pay per token consumed.

Price varies by

Token-in

Token-out

Context window

Coming Soon

Dedicated

Access capacity across all hyperscalers and GPU neoclouds with zero infrastructure friction.

Starting at

L40/hr: $1.20

H100/hr: $2.00

A100/hr: $1.60

H200/hr: $2.30
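
As a rough illustration of how the starting rates listed above translate into a budget, the sketch below multiplies rate by GPU count by hours. It makes several assumptions that this page does not confirm: that each rate applies per GPU per hour, that billing is linear with no minimum commitment or volume discount, and that the "starting at" figure is the rate actually charged. The function name `estimate_cost` and the rates dictionary are hypothetical helpers, not part of any published API.

```python
# Starting hourly rates in USD, copied from the price list above.
# Assumption: each rate is per GPU, per hour.
STARTING_RATES_USD_PER_HOUR = {
    "L40": 1.20,
    "H100": 2.00,
    "A100": 1.60,
    "H200": 2.30,
}


def estimate_cost(gpu: str, gpu_count: int, hours: float) -> float:
    """Return a lower-bound cost estimate in USD: rate * GPU count * hours.

    Hypothetical helper. Actual invoices may differ, because the listed
    prices are "starting at" rates and billing rules are not stated here.
    """
    if gpu not in STARTING_RATES_USD_PER_HOUR:
        raise ValueError(f"Unknown GPU type: {gpu!r}")
    if gpu_count < 0 or hours < 0:
        raise ValueError("gpu_count and hours must be non-negative")
    return round(STARTING_RATES_USD_PER_HOUR[gpu] * gpu_count * hours, 2)


if __name__ == "__main__":
    # Two H100s running for one full day (24 hours).
    print(f"2x H100 for 24h: ${estimate_cost('H100', 2, 24):.2f}")
    # Four H200s running for a 30-day month (720 hours).
    print(f"4x H200 for 720h: ${estimate_cost('H200', 4, 720):.2f}")
```

Treat the result as a floor rather than a quote, since the listed prices are starting rates.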

Solutions

Need help designing the right solution or implementing it?

Our Forward Deployed Engineers can take you all the way.

Support

Business-hours support is included for all teams.

Contact us for 24/7 Enterprise support plans.