Pricing

Find the plan that meets your real-time inference needs

API

Use our hosted models and pay per token consumed.

Price varies by

Input tokens

Output tokens

Context window

Dedicated

Access capacity across all hyperscalers and GPU neoclouds with zero infrastructure friction.

Starting at

L40: $1.20/hr

H100: $2.00/hr

A100: $1.60/hr

H200: $2.30/hr
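
As a rough illustration of how the starting rates above translate into a budget, the sketch below multiplies the hourly rate by GPU count and hours. It assumes the listed rates are per GPU per hour and that billing is linear; the function name `estimate_cost` and the rate table are hypothetical and not part of any official API. Actual pricing may be higher than these "starting at" figures.

```python
# Starting rates from the list above, assumed to be USD per GPU per hour.
STARTING_RATES_USD_PER_HOUR = {
    "L40": 1.20,
    "H100": 2.00,
    "A100": 1.60,
    "H200": 2.30,
}


def estimate_cost(gpu: str, gpu_count: int, hours: float) -> float:
    """Hypothetical helper: lower-bound cost = rate x GPUs x hours."""
    rate = STARTING_RATES_USD_PER_HOUR[gpu]
    return round(rate * gpu_count * hours, 2)


# Example: eight H100s for one day at the starting rate.
print(estimate_cost("H100", 8, 24))  # 384.0
```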

Book a demo

Solutions

Need help designing the right solution or implementing it?

Our Forward Deployed Engineers can take you all the way.

Support

Business-hours support is included for all teams.

Contact us for 24/7 Enterprise support plans.