Pricing designed to scale well

Whether you're testing ideas or scaling enterprise workloads, pick a pricing model that fits your needs.

No infrastructure? No problem. Just use the API.

150+ models with usage-based pricing. Optimized for throughput.
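
As a rough illustration of usage-based access, the sketch below sends a single chat request over HTTPS from Python. The endpoint URL, header names, model identifier, and payload shape are assumptions for illustration only; the platform's API reference defines the actual values.

    import requests

    # Hypothetical usage-based inference call; the endpoint, headers, and
    # payload shape below are illustrative assumptions, not the documented API.
    API_KEY = "YOUR_API_KEY"
    response = requests.post(
        "https://api.example.com/v1/chat/completions",  # placeholder endpoint
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "llama-3.1-8b",  # placeholder model identifier
            "messages": [{"role": "user", "content": "Summarize our Q3 results."}],
            "max_tokens": 256,
        },
        timeout=30,
    )
    response.raise_for_status()
    print(response.json())

Each request is billed against the per-model token rates listed below.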

Model pricing (per 1M tokens):

DeepSeek-R1: Advanced reasoning model with state-of-the-art performance ($3.90)
DeepSeek-V3: Efficient 671B MoE model with state-of-the-art performance ($0.90)
Gemma 3 4B: Compact multimodal model with large context for reasoning tasks ($0.10)
Gemma 3 1B: Lightweight multimodal model for text and image understanding ($0.06)
Llama 3.1 405B: Top-tier LLM for advanced needs ($3.00)
Llama 3.1 70B: High-capacity LLM for complex tasks ($0.74)
Llama 3.1 8B: Compact LLM with fast inference ($0.13)
Llama 3.3 70B: Versatile and scalable language model for complex tasks ($0.74)
Phi-3 128K: Lightweight SLM with extended context length ($0.08)
Phi-3 4K: Lightweight SLM with 4K context length ($0.08)
Qwen2.5 72B: Cutting-edge LLM for extensive applications ($1.08)
Qwen2.5 7B Instruct: Instruction-tuned version of Qwen for high versatility ($0.30)
Qwen3 4B: Multilingual reasoning with text and code generation ($0.10)
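
To make the per-token rates concrete, here is a minimal cost estimate in Python, assuming a single blended rate per 1M tokens as listed above (if input and output tokens are billed at different rates, the calculation would need to split them).

    # Rough cost estimate from the per-1M-token rates listed above.
    # Assumes one blended rate covers both input and output tokens.
    def estimate_token_cost(tokens: int, price_per_million: float) -> float:
        return tokens / 1_000_000 * price_per_million

    # e.g. 50M tokens per month on Llama 3.1 8B at $0.13 per 1M tokens:
    print(f"${estimate_token_cost(50_000_000, 0.13):.2f}")  # -> $6.50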

Image model pricing (per 1024x1024 image):

Flux 1.1 [Pro]: Image generation with superior composition and artistic fidelity ($0.05)
Flux Dev: Next-gen model for fast, high-quality image generation ($0.03)
Flux.1 Kontext: Image-to-image model for high-quality image editing ($0.04)
Flux 1.1 [Pro] Redux: Rapid image transformations with FLUX 1.1 precision ($0.05)
Flux.1 [Pro] Canny: Generate detailed images with precise control and guidance ($0.05)
Flux.1 [Pro] Depth: Generate detailed images with precise control and guidance ($0.05)
SDXL: Advanced model for image generation ($0.28)

Speech model pricing (per audio minute):

Whisper Large v2: Enhanced multilingual speech transcription and translation model ($0.0014)
Whisper Large v3: Improved multilingual speech transcription and translation model ($0.0015)
Whisper v3 Turbo: Fastest multilingual speech transcription and translation model ($0.0009)

Dedicated GPUs for growing traffic

Ideal for production workloads. Choose a model, and we’ll recommend the optimal GPU setup with pricing.

Hardware            GPUs    Instances      Cost / GPU Hour
Nvidia T4 GPU       1 x     1, 2, 4, 8     $1.20
Nvidia L4 GPU       1 x     1, 2, 4, 8     $1.50
Nvidia A10G GPU     1 x     1, 2, 4, 8     $2.00
Nvidia A100 GPU     1 x     1, 2, 4, 8     $3.00
Nvidia H100 GPU     1 x     1, 2, 4, 8     $4.00
Nvidia H200 GPU     1 x     8              $5.20
Nvidia B200 GPU     1 x     8              Contact Us

Large-scale GPU reservations can be secured at more competitive rates than current on-demand GPU/hour pricing.
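
For a back-of-the-envelope view of dedicated capacity, the sketch below multiplies instances, GPUs per instance, uptime, and the GPU-hour rate from the table above; the instance count and hours are illustrative assumptions, not a quote.

    # Rough monthly cost for a dedicated deployment at the on-demand
    # GPU-hour rates above (instance count and uptime are examples).
    def monthly_gpu_cost(instances: int, gpus_per_instance: int,
                         rate_per_gpu_hour: float, hours: float = 730.0) -> float:
        return instances * gpus_per_instance * rate_per_gpu_hour * hours

    # e.g. 2 instances x 1 A100 each, running continuously at $3.00 per GPU hour:
    print(f"${monthly_gpu_cost(2, 1, 3.00):,.2f}")  # -> $4,380.00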

Looking for a specific model?

Even if it’s not listed, you can deploy your own custom model on our platform. Just send us the details and we’ll set it up.

Get Started

Pricing & Deployment Consultation

Deploy Simplismart in your private cloud, on-premises data center, or regulated environment without compromising on performance, scalability, or security.

Our BYOC (Bring Your Own Cloud) and on-prem deployments are fully customizable to match your hardware, compliance, and operational requirements.

Run jobs on-demand. Pay-as-you-go.

Distributed inference across GPUs and nodes, plus accelerated training.
Supports PEFT (LoRA, QLoRA), SFT, RFT, GRPO, and DPO.
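
For a sense of what a PEFT (LoRA) fine-tune involves on the framework side, here is a minimal sketch using the Hugging Face transformers and peft libraries; the base model and hyperparameters are illustrative, and how the job is packaged and submitted to the platform is not shown.

    # Minimal LoRA setup with Hugging Face peft/transformers. The base model
    # and hyperparameters are illustrative; training data, the Trainer loop,
    # and job submission are out of scope here.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model
    lora = LoraConfig(
        r=8,                        # adapter rank
        lora_alpha=16,              # scaling factor
        lora_dropout=0.05,
        target_modules=["c_attn"],  # attention projection in GPT-2
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, lora)
    model.print_trainable_parameters()  # only the small LoRA adapter is trainable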

Hardware            GPUs    Instances      Cost / GPU Hour
Nvidia T4 GPU       1 x     1, 2, 4, 8     $1.20
Nvidia L4 GPU       1 x     1, 2, 4, 8     $1.50
Nvidia A10G GPU     1 x     1, 2, 4, 8     $2.00
Nvidia A100 GPU     1 x     1, 2, 4, 8     $3.00
Nvidia H100 GPU     1 x     1, 2, 4, 8     $4.00
Nvidia H200 GPU     1 x     8              $5.20
Nvidia B200 GPU     1 x     8              Contact Us

Large-scale GPU reservations can be secured at more competitive rates than current on-demand GPU/hour pricing.

Reserving GPU hours can unlock lower prices.

Talk to us