Start free. Scale when you're ready
Try it free. Ship with your team. Scale with ours.
100x Faster Development
Starter
Starts free · $25/month
with pay-as-you-go after
Build high-performing custom models faster. No need to bring your own compute.
- Oumi Agent included with your workspace, with a monthly credit allowance for pooled team usage
- 1 deployment and unlimited proxy models, billed per GPU-second
- 1 concurrent training job, standard GPU queue
- 25 GB included storage, overage at standard rate
- Inference logs with 7-day retention
- Community support
Team
For serious teams shipping custom models · $499/month
Everything in Starter, plus the reliability and priority real workloads need.
- Priority GPU queue with SLA
- Up to 5 deployments and unlimited proxy models, billed per GPU-second
- 100 GB included storage, overage at standard rate
- Inference logs with 30-day retention
- Priority email support, next-business-day SLA
For Production at Scale
Enterprise
Custom Pricing
Oumi's team works alongside yours to build custom models and agents for your most critical use cases.
- Dedicated experts embedded with your team
- Models and agents tuned to your domain
- Bespoke engagements scoped to your goals
Hosted Platform – detailed pricing
Detailed breakdown of tools, storage, training, and inference pricing.
Tools & Storage
| Item | Unit | Price |
|---|---|---|
| Evaluation | 1,000 judgments | $1 |
| Data Synthesis | 1,000 rows | $1 |
| Storage | 4 GB / month | $1 |
Training
Priced per 1M training tokens — calculated as the number of tokens in your training dataset multiplied by the number of epochs.
| Model Size | Price / 1M training tokens |
|---|---|
| Up to 16B | $0.49 |
| 16.1–32B | $2.00 |
| 32.1–80B | $3.00 |
| 80.1–300B | $6.00 |
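As a worked example of the formula above, here is a minimal Python sketch. The rates are copied from the table; the tier labels are illustrative names of ours, not official API identifiers.

```python
# Estimate hosted training cost from the pricing table above.
# Rates are $ per 1M training tokens, keyed by model-size tier.
RATES = {
    "up_to_16b": 0.49,
    "16.1_to_32b": 2.00,
    "32.1_to_80b": 3.00,
    "80.1_to_300b": 6.00,
}

def training_cost(dataset_tokens: int, epochs: int, tier: str) -> float:
    """Training tokens = dataset tokens x epochs; billed per 1M tokens."""
    training_tokens = dataset_tokens * epochs
    return training_tokens / 1_000_000 * RATES[tier]

# Example: a 50M-token dataset trained for 3 epochs on an 8B model:
# 150M training tokens x $0.49 per 1M = $73.50
print(training_cost(50_000_000, 3, "up_to_16b"))
```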
Deployment
Hosted inference deployments are priced per GPU hour. GPU type is subject to availability. Typical deployments run on 8-GPU nodes.
| GPU Type | $/hour |
|---|---|
| A100 80GB | $3.00 |
| H100/H200 80GB/141GB | $7.00 |
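The per-GPU-hour pricing multiplies out simply; a minimal sketch, assuming billing for full wall-clock time (rates copied from the table above; the dictionary keys are our shorthand, not platform identifiers):

```python
# Rough deployment cost: $/GPU-hour x number of GPUs x hours running.
GPU_HOURLY = {
    "A100 80GB": 3.00,
    "H100/H200": 7.00,
}

def deployment_cost(gpu_type: str, gpus: int, hours: float) -> float:
    """Cost of keeping a deployment up for `hours` wall-clock hours."""
    return GPU_HOURLY[gpu_type] * gpus * hours

# Example: a typical 8-GPU H100 node running for 24 hours:
# $7.00 x 8 x 24 = $1,344
print(deployment_cost("H100/H200", 8, 24))
```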
On-Policy Distillation
Priced per GPU hour. Training runs on 8 GPUs. GPU type is subject to availability.
| GPU | $/hour |
|---|---|
| A100 80GB | $2.90 |
| H100 | $4.00 |
Inference for Evaluation & Synthesis
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Llama 3.1 70B | $1.00 | $1.00 |
| Llama 3.1 8B | $0.22 | $0.22 |
| Qwen2.5 7B Instruct | $0.22 | $0.22 |
| Qwen3 235B A22B Instruct | $0.25 | $0.90 |
| Qwen3.5 9B | $0.11 | $0.17 |
| Qwen3.5 397B A17B | $0.70 | $4.00 |
| Mixtral 8x7B Instruct v0.1 | $0.55 | $0.55 |
| Mistral 7B Instruct v0.3 | $0.22 | $0.22 |
| Kimi K2 Instruct | $0.70 | $2.80 |
| Kimi K2 Thinking | $0.70 | $2.80 |
| Kimi K2 Instruct 0905 | $0.70 | $2.80 |
| gpt-oss-120b | $0.15 | $0.60 |
| DeepSeek V3.1 | $0.60 | $1.70 |
| DeepSeek-V4-Pro | $1.91 | $3.83 |
| GLM-4.6 | $0.60 | $2.40 |
| GLM-5 | $1.10 | $3.50 |
| GLM-5.1 | $1.55 | $4.85 |
| Gemma 4 31B | $0.22 | $0.55 |
Inference is charged only when you use models hosted by Oumi to power an action on the platform.
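Putting the per-token rates together, here is a sketch of how a per-call cost estimate works. The rates are copied from two rows of the table above; the helper function is ours, not a platform API.

```python
# Estimate evaluation/synthesis inference cost from $-per-1M-token rates.
# (input_rate, output_rate) pairs copied from the table above.
RATES = {
    "Llama 3.1 8B": (0.22, 0.22),
    "gpt-oss-120b": (0.15, 0.60),
}

def inference_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Input and output tokens are billed at separate per-1M-token rates."""
    in_rate, out_rate = RATES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Example: 2M input + 0.5M output tokens on gpt-oss-120b:
# 2 x $0.15 + 0.5 x $0.60 = $0.60
print(round(inference_cost("gpt-oss-120b", 2_000_000, 500_000), 2))
```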
Production Inference
Deploy fully fine-tuned or LoRA models for inference and pay: