Pricing designed to scale well
Whether you're testing ideas or scaling enterprise workloads, pick a pricing model that fits your needs.
No infrastructure? No problem. Just use the API.
150+ models with usage-based pricing. Optimized for throughput.
Model
DeepSeek-R1
Advanced reasoning model with state-of-the-art performance
DeepSeek-V3
Efficient 671B MoE model with state-of-the-art performance

Gemma 3 4B
Compact multimodal model with large context for reasoning tasks

Gemma 3 1B
Lightweight multimodal model for text and image understanding

Llama 3.1 405B
Top-tier LLM for advanced needs

Llama 3.1 70B
High-capacity LLM for complex tasks

Llama 3.1 8B
Compact LLM with fast inference

Llama 3.3 70B
Versatile and scalable language model for complex tasks

Phi-3 128K
Lightweight SLM with extended context length

Phi-3 4K
Powerful LLM for extensive applications

Qwen2.5 72B
Cutting-edge LLM for extensive applications

Qwen2.5 7B Instruct
Instruction-tuned version of Qwen for high versatility

Qwen3 4B
Multilingual reasoning with text and code generation
Price per 1M tokens
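Per-1M-token billing is easy to estimate up front. A minimal sketch of the arithmetic — the rates below are placeholders for illustration, not actual Simplismart prices:

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the cost of one request under per-1M-token pricing."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Placeholder rates in $ per 1M tokens -- see the table above for real ones.
cost = token_cost(input_tokens=50_000, output_tokens=10_000,
                  input_price_per_m=0.50, output_price_per_m=1.50)
print(f"${cost:.4f}")  # prints $0.0400
```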
Model

Flux 1.1 [Pro]
Image generation with superior composition and artistic fidelity

Flux Dev
Next-gen model for fast, high-quality image generation

Flux.1 Kontext
Image-to-image model for high-quality image editing

Flux 1.1 [Pro] Redux
Rapid image transformations with FLUX1.1 precision

Flux.1 [Pro] Canny
Generate detailed images with precise control and guidance

Flux.1 [Pro] Depth
Generate detailed images with precise control and guidance
SDXL
Advanced model for image generation
Price per image (1024x1024)
Model

Whisper Large v2
Enhanced multilingual speech transcription and translation model

Whisper Large v3
Improved multilingual speech transcription and translation model

Whisper v3 Turbo
Fastest multilingual speech transcription and translation model
Price per audio minute
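Per-minute audio billing works the same way. A quick sketch — the rate here is a placeholder, not an actual Whisper price:

```python
def transcription_cost(audio_seconds: float, price_per_minute: float) -> float:
    """Estimate the cost of transcribing audio billed per minute."""
    return (audio_seconds / 60) * price_per_minute

# 15.5 minutes of audio at a placeholder rate of $0.006/minute.
print(f"${transcription_cost(930, 0.006):.4f}")  # prints $0.0930
```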
Dedicated GPUs for growing traffic
Ideal for production workloads. Choose a model, and we’ll recommend the optimal GPU setup with pricing.
Hardware

Nvidia T4 GPU

Nvidia L4 GPU

Nvidia A10G GPU

Nvidia A100 GPU

Nvidia H100 GPU

Nvidia H200 GPU

Nvidia B200 GPU
GPUs
Instances
Large-scale GPU reservations are available at rates lower than on-demand GPU/hour pricing.

Looking for a specific model?
Even if it’s not listed, you can deploy your own custom model on our platform. Just send us the details and we’ll set it up.
Pricing & Deployment Consultation
Deploy Simplismart in your private cloud, on-premises data center, or regulated environment without compromising on performance, scalability, or security.
Our BYOC (Bring Your Own Cloud) and on-prem deployments are fully customizable to match your hardware, compliance, and operational requirements.
Run jobs on-demand. Pay-as-you-go.
Distributed inference across GPUs and nodes, with accelerated training.
Supported methods: PEFT (LoRA, QLoRA), SFT, RFT, GRPO, and DPO
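PEFT methods such as LoRA keep the base weights frozen and train only a small low-rank update, which is then merged back as W' = W + (alpha/r)·B·A. A toy pure-Python sketch of that merge step, with made-up shapes (this illustrates the idea, not our training stack):

```python
def matmul(a, b):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_merge(W, A, B, alpha):
    """Merge a LoRA update into a frozen weight W: W + (alpha/r) * B @ A."""
    r = len(A)                       # LoRA rank = number of rows of A
    scale = alpha / r
    BA = matmul(B, A)                # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Tiny example: d_out = d_in = 2, rank r = 1, alpha = 2.
W = [[1.0, 0.0], [0.0, 1.0]]         # frozen base weight
A = [[0.5, 0.5]]                     # r x d_in, trainable
B = [[1.0], [2.0]]                   # d_out x r, trainable
print(lora_merge(W, A, B, alpha=2))  # prints [[2.0, 1.0], [2.0, 3.0]]
```

Only A and B are trained; at realistic sizes (d in the thousands, r in the tens) that is a tiny fraction of the parameters in W, which is what makes LoRA fine-tuning cheap.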
Hardware

Nvidia T4 GPU

Nvidia L4 GPU

Nvidia A10G GPU

Nvidia A100 GPU

Nvidia H100 GPU

Nvidia H200 GPU

Nvidia B200 GPU
GPUs
Instances
Large-scale GPU reservations are available at rates lower than on-demand GPU/hour pricing.
