Oumi AI

Start free. Scale when you're ready.

Try it free. Ship with your team. Scale with ours.

100x Faster Development

Starter

Starts free · $25/month, with pay-as-you-go usage after

Build high-performing custom models faster. No need to bring your own compute.

  • Oumi Agent included with workspace, with a monthly credit allowance for pooled team usage
  • 1 deployment and unlimited proxy models, billed per GPU-second
  • 1 concurrent training job, standard GPU queue
  • 25 GB included storage, overage at standard rate
  • Inference logs with 7-day retention
  • Community support

Self-Hosted

Team

For serious teams shipping custom models · $499/month

Everything in Starter, plus the reliability and priority real workloads need.

  • Priority GPU queue with SLA
  • Up to 5 deployments and unlimited proxy models, billed per GPU-second
  • 100 GB included storage, overage at standard rate
  • Inference logs with 30-day retention
  • Priority email support, next-business-day SLA

For Production at Scale

Enterprise

Custom Pricing

Oumi's team works alongside yours to build custom models and agents for your most critical use cases.

  • Dedicated experts embedded with your team
  • Models and agents tuned to your domain
  • Bespoke engagements scoped to your goals

Hosted Platform – detailed pricing

Detailed breakdown of tools, storage, training, and inference pricing.

Tools & Storage

Service          Unit               Price
Evaluation       1,000 judgments    $1
Data Synthesis   1,000 rows         $1
Storage          4 GB/month         $1

Training

Priced per 1M training tokens — calculated as the number of tokens in your training dataset multiplied by the number of epochs.

Model Size    Price per 1M tokens
Up to 16B     $0.49
16.1–32B      $2.00
32.1–80B      $3.00
80.1–300B     $6.00
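As a back-of-the-envelope sketch of the billing rule above (billable tokens = dataset tokens × epochs, priced per 1M tokens), the following helper estimates a training run's cost. The function name and tier keys are illustrative, not an Oumi API:

```python
# Prices per 1M training tokens, from the table above.
# The tier keys here are shorthand invented for this sketch.
PRICE_PER_1M_TOKENS = {
    "<=16B": 0.49,
    "16.1-32B": 2.00,
    "32.1-80B": 3.00,
    "80.1-300B": 6.00,
}

def training_cost(dataset_tokens: int, epochs: int, size_tier: str) -> float:
    """Billable tokens = dataset tokens x epochs, priced per 1M tokens."""
    billable_tokens = dataset_tokens * epochs
    return billable_tokens / 1_000_000 * PRICE_PER_1M_TOKENS[size_tier]

# e.g. a 50M-token dataset trained for 3 epochs on a model up to 16B:
print(round(training_cost(50_000_000, 3, "<=16B"), 2))
```

For example, 50M tokens × 3 epochs = 150M billable tokens, or about $73.50 at the smallest tier.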

Deployment

Hosted inference deployments are priced per GPU hour. GPU type is subject to availability. Typical deployments run on 8-GPU nodes.

GPU Type                $/hour
A100 80GB               $3.00
H100/H200 80GB/141GB    $7.00
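A quick sketch of how a deployment bill adds up under the rates above, assuming the quoted rate is per GPU and a typical 8-GPU node (`deployment_cost` is an illustrative helper, not an Oumi API):

```python
# Back-of-the-envelope deployment cost: per-GPU-hour rate x GPU count x hours.
def deployment_cost(rate_per_gpu_hour: float, gpus: int = 8, hours: float = 1.0) -> float:
    return rate_per_gpu_hour * gpus * hours

# An 8-GPU H100 node at $7.00 per GPU hour, running for 24 hours:
print(deployment_cost(7.00, gpus=8, hours=24))  # -> 1344.0
```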

On-Policy Distillation

Priced per GPU hour. Training runs on 8 GPUs; the GPU type used is subject to availability.

GPU          $/GPU-hour
A100-80GB    $2.90
H100         $4.00

Inference for evaluation & synthesis

Model                         Input / 1M tokens    Output / 1M tokens
Llama 3.1 70B                 $1.00                $1.00
Llama 3.1 8B                  $0.22                $0.22
Qwen2.5 7B Instruct           $0.22                $0.22
Qwen3 235B A22B Instruct      $0.25                $0.90
Qwen3.5 9B                    $0.11                $0.17
Qwen3.5 397B A17B             $0.70                $4.00
Mixtral 8x7B Instruct v0.1    $0.55                $0.55
Mistral 7B Instruct v0.3      $0.22                $0.22
Kimi K2 Instruct              $0.70                $2.80
Kimi K2 Thinking              $0.70                $2.80
Kimi K2 Instruct 0905         $0.70                $2.80
gpt-oss-120b                  $0.15                $0.60
DeepSeek V3.1                 $0.60                $1.70
DeepSeek-V4-Pro               $1.91                $3.83
GLM-4.6                       $0.60                $2.40
GLM-5                         $1.10                $3.50
GLM-5.1                       $1.55                $4.85
Gemma 4 31B                   $0.22                $0.55

Inference is only charged when you utilize models hosted by Oumi to power an action on the platform.
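To illustrate how the per-1M-token prices above translate into a bill, here is a minimal sketch (`inference_cost` is a hypothetical helper, not a platform API):

```python
# Illustrative evaluation/synthesis inference cost from per-1M-token prices.
def inference_cost(input_tokens: int, output_tokens: int,
                   input_price_per_1m: float, output_price_per_1m: float) -> float:
    return (input_tokens / 1_000_000 * input_price_per_1m
            + output_tokens / 1_000_000 * output_price_per_1m)

# e.g. Llama 3.1 8B ($0.22 in / $0.22 out), 2M input tokens + 1M output tokens:
print(round(inference_cost(2_000_000, 1_000_000, 0.22, 0.22), 2))
```

At those rates the example run works out to roughly $0.66.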

Production Inference

Deploy fully fine-tuned or LoRA models for inference and pay:

  • Per token (auto-scales based on traffic)
  • Per GPU hour (you control capacity)

Contact us for pricing.


Sign up today with a corporate email for $50 in credits, or a personal email for $25.