Today we're shipping two things: the Simplismart Python SDK, and a Simplismart skill for AI agents that makes the whole workflow conversational. The SDK is for when you want model deployments living in your scripts and CI pipelines. The skill is for when you'd rather describe the deployment configuration and let an agent handle the execution.
Both run on the same battle-tested Simplismart MLOps platform. Let's install the Simplismart SDK first:
pip install simplismart-sdk
What the Simplismart SDK Covers
Every operation has a Python method and a CLI equivalent. Pick what fits your workflow and stick with it.
From HuggingFace to a Production-Ready Endpoint
Here's the core workflow using the Simplismart SDK: compile the Gemma 4 12B model from Hugging Face on Simplismart's H100 infrastructure, then deploy it with autoscaling.
First, set your credentials. See the SDK authentication doc for where to find them.
export SIMPLISMART_PG_TOKEN=...
export ORG_ID=...
# sdk-example.py
import os
from time import sleep
from simplismart import (
    Simplismart,
    ModelRepoCompileAvatar,
    ModelRepoCompileCreate,
    ModelRepoListParams,
    DeploymentCreate,
)
pg_token = os.getenv("SIMPLISMART_PG_TOKEN")
org_id = os.getenv("ORG_ID")
if not pg_token or not org_id:
    raise SystemExit("Set SIMPLISMART_PG_TOKEN and ORG_ID in your environment.")
client = Simplismart(pg_token=pg_token)
MODEL_NAME = "gemma-4-12B-it"
FAILED_STATUSES = {"FAILED", "FAILED_OPTIMISING", "FAILED_LAUNCHING", "ERROR", "DELETED"}
# 1. Compile Gemma 4 12B from Hugging Face, optimised for H100
client.create_model_repo_private_compile(
    ModelRepoCompileCreate(
        name=MODEL_NAME,
        avatar=ModelRepoCompileAvatar(
            image_url=f"https://ui-avatars.com/api/?background=f3f3f3&color=000000&name={MODEL_NAME}"
        ),
        source_type="huggingface",
        source_url="google/gemma-4-12B-it",
        model_class="Gemma4UnifiedForConditionalGeneration",
        accelerator_type="nvidia-h100",
    )
)
# 2. Wait until compilation finishes
while True:
    results = client.list_model_repos(
        ModelRepoListParams(org_id=org_id, name=MODEL_NAME, count=1)
    )["results"]
    if not results:
        print("Waiting for the model repo to appear ...")
        sleep(30)
        continue
    repo = results[0]
    print("Compilation status:", repo["status"])
    if repo["status"] in FAILED_STATUSES:
        raise SystemExit(f"Compilation failed: {repo['status']}")
    if repo["status"] == "SUCCESS":
        break
    sleep(30)
# 3. Deploy the compiled model on an H100 (autoscale 1–2 replicas on GPU usage)
deployment = client.create_deployment(
    DeploymentCreate(
        org=org_id,
        model_repo=repo["uuid"],
        gpu_id="nvidia-h100",
        name=MODEL_NAME,
        min_pod_replicas=1,
        max_pod_replicas=2,
        autoscale_config={"targets": [{"metric": "gpu", "target": 80}]},
    )
)
print("Endpoint:", f"https://{deployment['model_endpoint']}")
# 4. Poll health until the deployment is ready
while True:
    health = client.fetch_deployment_health(deployment_id=deployment["deployment_id"])
    status = health.get("data", "unknown")
    print("Health:", status)
    if status.startswith("FAILED") or status == "ERROR":
        raise SystemExit(f"Deployment failed: {status}")
    if status == "Healthy":
        print("Deployment is ready.")
        break
    sleep(30)
The autoscale config isn't limited to GPU utilization. You can also target request latency at a specific percentile, throughput (requests per second), or concurrency: whatever your SLA actually cares about. See the full autoscaling reference.
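As a rough sketch, a small helper can build the autoscale_config dictionary in the shape used in the script above. Only the "gpu" metric and the "targets", "metric", and "target" keys come from that example. The helper name build_autoscale_config is hypothetical and not part of the SDK, and the metric names for latency, throughput, or concurrency should be taken from the autoscaling reference rather than guessed.

```python
def build_autoscale_config(targets):
    """Build an autoscale_config dict from (metric, target) pairs.

    Hypothetical helper, not part of the Simplismart SDK. It only mirrors
    the shape shown in the deployment example:
    {"targets": [{"metric": ..., "target": ...}]}
    """
    if not targets:
        raise ValueError("At least one autoscaling target is required.")
    config = {"targets": []}
    for metric, target in targets:
        if not isinstance(metric, str) or not metric:
            raise ValueError(f"Invalid metric name: {metric!r}")
        if target <= 0:
            raise ValueError(f"Target for {metric!r} must be positive.")
        config["targets"].append({"metric": metric, "target": target})
    return config


# Same config as in the deployment example: scale on 80% GPU usage.
# Other metric names are assumptions until checked against the reference.
autoscale_config = build_autoscale_config([("gpu", 80)])
print(autoscale_config)
```

The resulting dictionary can be passed as autoscale_config to DeploymentCreate, exactly as the literal dictionary is in the script.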
For teams who prefer shell over Python, the CLI reference covers the same workflow as bash commands, which means you can drop it directly into a GitHub Actions step or any CI system that can run a shell script.
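One thing to watch in CI: the polling loops in the script above run until the status changes, so a stuck compilation or deployment could hold a job open indefinitely. Below is a minimal sketch of bounded polling with a timeout. The helper wait_for_status is hypothetical and not part of the SDK, and the "PENDING" status in the demo is a placeholder, not a documented Simplismart status.

```python
import time


def wait_for_status(fetch_status, success, failed, interval=30.0,
                    timeout=3600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a success or failure status.

    Hypothetical helper, not part of the Simplismart SDK. fetch_status is
    any zero-argument callable that returns a status string.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in success:
            return status
        if status in failed:
            raise RuntimeError(f"Operation failed with status: {status}")
        if clock() >= deadline:
            raise TimeoutError(
                f"Timed out after {timeout}s; last status: {status}"
            )
        sleep(interval)


# Demo with a fake status source so the sketch runs without the SDK.
# "PENDING" is a placeholder status used only for this illustration.
fake_statuses = iter(["PENDING", "PENDING", "SUCCESS"])
result = wait_for_status(
    lambda: next(fake_statuses),
    success={"SUCCESS"},
    failed={"FAILED", "ERROR"},
    interval=0,
)
print(result)
```

In the script, fetch_status could wrap the list_model_repos call for compilation or the fetch_deployment_health call for the deployment, with the timeout chosen to fit your CI job limits.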
Deploy with the Simplismart Skill
If you use Claude Code or another AI agent, there's a faster path: install the Simplismart skill and describe what you want instead of writing SDK calls manually. The skill lives in our Cookbook repository. Clone it and copy the skill into your Claude Code skills directory:
git clone https://github.com/simpli-smart/cookbook.git
mkdir -p ~/.claude/skills/simplismart
cp cookbook/simplismart-sdk/simplismart-skill.md ~/.claude/skills/simplismart/SKILL.md
Then just describe what you need:
"Deploy moonshotai/Kimi-K2.6, min 1 replica, max 4"
Your AI agent runs the full workflow against the CLI: compile, poll until ready, deploy, health check. The skill uses the same Simplismart SDK under the hood, so everything it does you can also do directly in code.
Get Started with the Simplismart SDK
- Get your Playground token from Simplismart Settings
- Set SIMPLISMART_PG_TOKEN and ORG_ID in your environment
- Follow the SDK reference
Your infrastructure config lives in code. There's no good reason your model deployment config shouldn't too. If you're building anything that uses AI, start using the shared endpoints today or contact the Simplismart team for a dedicated deployment tuned to your specific workload.






