Oumi AI

Start free. Scale when you're ready.

Try it free. Ship with your team. Scale with ours.

100x Faster Development

Starter

Starts free · $25/month, with pay-as-you-go usage after

Build high-performing custom models faster. No need to bring your own compute.

  • Oumi Agent included with workspace, with a monthly credit allowance for pooled team usage
  • 1 deployment and unlimited proxy models, billed per GPU-second
  • 1 concurrent training job, standard GPU queue
  • 25 GB included storage, overage at standard rate
  • Inference logs with 7-day retention
  • Community support

Self-Hosted

Team

For serious teams shipping custom models · $499/month

Everything in Starter, plus the reliability and priority real workloads need.

  • Priority GPU queue with SLA
  • Up to 5 deployments and unlimited proxy models, billed per GPU-second
  • 100 GB included storage, overage at standard rate
  • Inference logs with 30-day retention
  • Priority email support, next-business-day SLA

For Production at Scale

Enterprise

Custom Pricing

Oumi's team works alongside yours to build custom models and agents for your most critical use cases.

  • Dedicated experts embedded with your team
  • Models and agents tuned to your domain
  • Bespoke engagements scoped to your goals

Hosted Platform – detailed pricing

Detailed breakdown of tools, storage, training, and inference pricing.

Tools & Storage

Service          Unit               Price
Evaluation       1,000 judgments    $1
Data Synthesis   1,000 rows         $1
Storage          4 GB/month         $1

Training

Priced per 1M training tokens — calculated as the number of tokens in your training dataset multiplied by the number of epochs.

Model Size    Price per 1M tokens
Up to 16B     $0.49
16.1–32B      $2.00
32.1–80B      $3.00
80.1–300B     $6.00
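As a back-of-the-envelope sketch of the billing rule above (billable tokens = dataset tokens × epochs, priced per 1M tokens), the following helper estimates a training run's cost. The function name and tier keys are illustrative, not an Oumi API:

```python
# Prices per 1M training tokens, from the table above.
# The tier keys here are shorthand invented for this sketch.
PRICE_PER_1M_TOKENS = {
    "<=16B": 0.49,
    "16.1-32B": 2.00,
    "32.1-80B": 3.00,
    "80.1-300B": 6.00,
}

def training_cost(dataset_tokens: int, epochs: int, size_tier: str) -> float:
    """Billable tokens = dataset tokens x epochs, priced per 1M tokens."""
    billable_tokens = dataset_tokens * epochs
    return billable_tokens / 1_000_000 * PRICE_PER_1M_TOKENS[size_tier]

# e.g. a 50M-token dataset trained for 3 epochs on a model up to 16B:
print(round(training_cost(50_000_000, 3, "<=16B"), 2))
```

For example, 50M tokens × 3 epochs = 150M billable tokens, or about $73.50 at the smallest tier.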

Deployment

Hosted inference deployments are priced per GPU hour. GPU type is subject to availability. Typical deployments run on 8-GPU nodes.

GPU Type                $/hour
A100 80GB               $3.00
H100/H200 80GB/141GB    $7.00
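A quick sketch of how a deployment bill adds up under the rates above, assuming the quoted rate is per GPU and a typical 8-GPU node (`deployment_cost` is an illustrative helper, not an Oumi API):

```python
# Back-of-the-envelope deployment cost: per-GPU-hour rate x GPU count x hours.
def deployment_cost(rate_per_gpu_hour: float, gpus: int = 8, hours: float = 1.0) -> float:
    return rate_per_gpu_hour * gpus * hours

# An 8-GPU H100 node at $7.00 per GPU hour, running for 24 hours:
print(deployment_cost(7.00, gpus=8, hours=24))  # -> 1344.0
```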

On-Policy Distillation

Priced per GPU hour. Training runs on 8 GPUs; the GPU type used is subject to availability.

GPU          $/GPU-hour
A100-80GB    $2.90
H100         $4.00

Inference for evaluation & synthesis

Model                         Input / 1M tokens    Output / 1M tokens
Llama 3.1 70B                 $1.00                $1.00
Llama 3.1 8B                  $0.22                $0.22
Qwen2.5 7B Instruct           $0.22                $0.22
Qwen3 235B A22B Instruct      $0.25                $0.90
Qwen3.5 9B                    $0.11                $0.17
Qwen3.5 397B A17B             $0.70                $4.00
Mixtral 8x7B Instruct v0.1    $0.55                $0.55
Mistral 7B Instruct v0.3      $0.22                $0.22
Kimi K2 Instruct              $0.70                $2.80
Kimi K2 Thinking              $0.70                $2.80
Kimi K2 Instruct 0905         $0.70                $2.80
gpt-oss-120b                  $0.15                $0.60
DeepSeek V3.1                 $0.60                $1.70
DeepSeek-V4-Pro               $1.91                $3.83
GLM-4.6                       $0.60                $2.40
GLM-5                         $1.10                $3.50
GLM-5.1                       $1.55                $4.85
Gemma 4 31B                   $0.22                $0.55

Inference is only charged when you utilize models hosted by Oumi to power an action on the platform.
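To illustrate how the per-1M-token prices above translate into a bill, here is a minimal sketch (`inference_cost` is a hypothetical helper, not a platform API):

```python
# Illustrative evaluation/synthesis inference cost from per-1M-token prices.
def inference_cost(input_tokens: int, output_tokens: int,
                   input_price_per_1m: float, output_price_per_1m: float) -> float:
    return (input_tokens / 1_000_000 * input_price_per_1m
            + output_tokens / 1_000_000 * output_price_per_1m)

# e.g. Llama 3.1 8B ($0.22 in / $0.22 out), 2M input tokens + 1M output tokens:
print(round(inference_cost(2_000_000, 1_000_000, 0.22, 0.22), 2))
```

At those rates the example run works out to roughly $0.66.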

Production Inference

Deploy fully fine-tuned or LoRA models for inference and pay:

  • Per token (auto-scales based on traffic)
  • Per GPU hour (you control capacity)

Contact us for pricing.


Sign up today with a corporate email for $50 in credits, or a personal email for $25.