Announcing the Simplismart SDK: Deploy AI Models with Code

Last Updated: June 19, 2026

Today we're shipping two things: the Simplismart Python SDK, and a Simplismart skill for AI agents that makes the whole workflow conversational. The SDK is for when you want model deployments living in your scripts and CI pipelines. The skill is for when you'd rather describe the deployment configuration and let an agent handle the execution.

Both run on the same battle-tested Simplismart MLOps platform. Start by installing the SDK:

pip install simplismart-sdk

What the Simplismart SDK Covers

Capability	What you get
Model repositories	Import models from Hugging Face, AWS S3, GCP GCS, or container registries (Docker Hub, Depot, NVIDIA NIM, Amazon ECR)
Compile and optimize	Trigger model compilation on Simplismart infrastructure, converting weights into an inference artifact optimized for the target hardware
Deployments	Create and manage dedicated or Bring Your Own Compute (BYOC) deployments
Autoscaling	Scale GPU replicas on GPU utilization, CPU utilization, latency percentiles, throughput, or concurrency
Secrets	Store and manage Hugging Face, Docker Hub, Depot, Amazon ECR, and NVIDIA NIM registry credentials as secrets on the Simplismart platform

Every operation has a Python method and a CLI equivalent. Pick what fits your workflow and stick with it.

From Hugging Face to a Production-Ready Endpoint

Here's the core workflow with the Simplismart SDK: compile the Gemma 4 12B model from Hugging Face on Simplismart's H100 infrastructure, then deploy it with autoscaling.

First, set your credentials. See the SDK authentication docs for where to find them.

export SIMPLISMART_PG_TOKEN=...
export ORG_ID=...

# sdk-example.py

import os
from time import sleep

from simplismart import (
    Simplismart,
    ModelRepoCompileAvatar,
    ModelRepoCompileCreate,
    ModelRepoListParams,
    DeploymentCreate,
)

pg_token = os.getenv("SIMPLISMART_PG_TOKEN")
org_id = os.getenv("ORG_ID")
if not pg_token or not org_id:
    raise SystemExit("Set SIMPLISMART_PG_TOKEN and ORG_ID in your environment.")

client = Simplismart(pg_token=pg_token)

MODEL_NAME = "gemma-4-12B-it"

FAILED_STATUSES = {"FAILED", "FAILED_OPTIMISING", "FAILED_LAUNCHING", "ERROR", "DELETED"}

# 1. Compile Gemma 4 12B from Hugging Face, optimised for H100
client.create_model_repo_private_compile(
    ModelRepoCompileCreate(
        name=MODEL_NAME,
        avatar=ModelRepoCompileAvatar(
            image_url=f"https://ui-avatars.com/api/?background=f3f3f3&color=000000&name={MODEL_NAME}"
        ),
        source_type="huggingface",
        source_url="google/gemma-4-12B-it",
        model_class="Gemma4UnifiedForConditionalGeneration",
        accelerator_type="nvidia-h100",
    )
)

# 2. Wait until compilation finishes
while True:
    results = client.list_model_repos(
        ModelRepoListParams(org_id=org_id, name=MODEL_NAME, count=1)
    )["results"]
    if not results:
        print("Waiting for the model repo to appear ...")
        sleep(30)
        continue
    repo = results[0]
    print("Compilation status:", repo["status"])
    if repo["status"] in FAILED_STATUSES:
        raise SystemExit(f"Compilation failed: {repo['status']}")
    if repo["status"] == "SUCCESS":
        break
    sleep(30)

# 3. Deploy the compiled model on an H100 (autoscale 1–2 replicas on GPU usage)
deployment = client.create_deployment(
    DeploymentCreate(
        org=org_id,
        model_repo=repo["uuid"],
        gpu_id="nvidia-h100",
        name=MODEL_NAME,
        min_pod_replicas=1,
        max_pod_replicas=2,
        autoscale_config={"targets": [{"metric": "gpu", "target": 80}]},
    )
)
print("Endpoint:", f"https://{deployment['model_endpoint']}")

# 4. Poll health until the deployment is ready
while True:
    health = client.fetch_deployment_health(deployment_id=deployment["deployment_id"])
    status = health.get("data", "unknown")
    print("Health:", status)
    if status.startswith("FAILED") or status == "ERROR":
        raise SystemExit(f"Deployment failed: {status}")
    if status == "Healthy":
        print("Deployment is ready.")
        break
    sleep(30)
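
Both polling loops above run forever if a status never reaches a terminal value. Below is a minimal sketch of a reusable helper with a deadline. `wait_for_status`, `PollTimeout`, and the one-hour default timeout are our own illustrative names and choices, not part of the Simplismart SDK; the status strings in the commented wiring are the ones the example already checks. The `sleep` and `clock` parameters are injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable, Optional


class PollTimeout(Exception):
    """Raised when no terminal status is seen before the deadline (hypothetical helper)."""


def wait_for_status(
    fetch_status: Callable[[], Optional[str]],
    is_done: Callable[[str], bool],
    is_failed: Callable[[str], bool],
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until a status is done or failed, or the deadline passes.

    fetch_status may return None while the resource does not exist yet,
    for example before a new model repo shows up in list_model_repos.
    Returns the successful status, raises RuntimeError on a failed status,
    and raises PollTimeout once timeout_s has elapsed.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status is not None:
            if is_failed(status):
                raise RuntimeError(f"terminal failure status: {status}")
            if is_done(status):
                return status
        if clock() >= deadline:
            raise PollTimeout(f"no terminal status after {timeout_s}s (last: {status})")
        sleep(interval_s)


# Wiring it to the calls from the example above (not executed here):
#
# def compile_status():
#     results = client.list_model_repos(
#         ModelRepoListParams(org_id=org_id, name=MODEL_NAME, count=1)
#     )["results"]
#     return results[0]["status"] if results else None
#
# wait_for_status(compile_status,
#                 is_done=lambda s: s == "SUCCESS",
#                 is_failed=lambda s: s in FAILED_STATUSES)
# # then list the repo once more to read repo["uuid"] for the deployment
#
# wait_for_status(
#     lambda: client.fetch_deployment_health(
#         deployment_id=deployment["deployment_id"]
#     ).get("data", "unknown"),
#     is_done=lambda s: s == "Healthy",
#     is_failed=lambda s: s.startswith("FAILED") or s == "ERROR",
# )
```

The deadline uses `time.monotonic` rather than `time.time`, so system clock adjustments can't shorten or stretch the timeout.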

The autoscale config isn't limited to GPU utilization. You can also target CPU utilization, request latency at a specific percentile, throughput (requests per second), or concurrency: whatever your SLA actually cares about. See the full autoscaling reference.
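
As a sketch, the replica bounds and `autoscale_config` from the example can be assembled in one place and validated before calling `create_deployment`. The keyword names (`min_pod_replicas`, `max_pod_replicas`, `autoscale_config`) and the `{"targets": [{"metric": ..., "target": ...}]}` shape come from the example above. The helper name `scaling_kwargs` is hypothetical, only the `"gpu"` metric name is confirmed by this post, and whether several targets can be combined in one list is an assumption to check against the autoscaling reference.

```python
from typing import Any, Dict


def scaling_kwargs(
    min_replicas: int, max_replicas: int, targets: Dict[str, float]
) -> Dict[str, Any]:
    """Build replica bounds and autoscale_config as DeploymentCreate keyword arguments.

    `targets` maps a metric name to its target value, e.g. {"gpu": 80}.
    Only "gpu" is confirmed by the example in this post; look up the exact
    names for other metrics in the autoscaling reference.
    """
    # This sketch requires at least one replica, as in the example above.
    if min_replicas < 1:
        raise ValueError("min_replicas must be at least 1")
    if max_replicas < min_replicas:
        raise ValueError("max_replicas must be >= min_replicas")
    if not targets:
        raise ValueError("at least one scaling target is required")
    config_targets = []
    for metric, target in targets.items():
        if target <= 0:
            raise ValueError(f"target for {metric!r} must be positive, got {target}")
        config_targets.append({"metric": metric, "target": target})
    return {
        "min_pod_replicas": min_replicas,
        "max_pod_replicas": max_replicas,
        "autoscale_config": {"targets": config_targets},
    }


# Usage with the deployment call from the example above (not executed here):
# deployment = client.create_deployment(
#     DeploymentCreate(
#         org=org_id,
#         model_repo=repo["uuid"],
#         gpu_id="nvidia-h100",
#         name=MODEL_NAME,
#         **scaling_kwargs(1, 2, {"gpu": 80}),
#     )
# )
```

Called as `scaling_kwargs(1, 2, {"gpu": 80})`, the helper returns exactly the three values the example passes inline, so swapping it in doesn't change the deployment request.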

For teams who prefer shell over Python, the CLI reference covers the same workflow as bash commands, which means you can drop it directly into a GitHub Actions step or any CI system that can run a shell script.

Deploy with the Simplismart Skill

If you use Claude Code or another AI agent, there's a faster path: describe what you want instead of writing SDK calls by hand. The Simplismart skill lives in our Cookbook repository. Clone it and copy the skill into your Claude Code skills directory:

git clone https://github.com/simpli-smart/cookbook.git
mkdir -p ~/.claude/skills/simplismart
cp cookbook/simplismart-sdk/simplismart-skill.md ~/.claude/skills/simplismart/SKILL.md
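
On machines without a POSIX shell, the `mkdir -p` and `cp` steps can be done with the Python standard library instead. This is a minimal sketch assuming the repository has already been cloned into `./cookbook` and that the skill file sits at the path used in the commands above; `install_skill` is a hypothetical helper, not part of the SDK or the Cookbook.

```python
import shutil
from pathlib import Path
from typing import Optional


def install_skill(cookbook_dir: Path, skills_root: Optional[Path] = None) -> Path:
    """Copy the Simplismart skill from a cloned cookbook into a Claude Code skills directory.

    Equivalent to:
        mkdir -p ~/.claude/skills/simplismart
        cp cookbook/simplismart-sdk/simplismart-skill.md ~/.claude/skills/simplismart/SKILL.md
    """
    if skills_root is None:
        skills_root = Path.home() / ".claude" / "skills"
    source = Path(cookbook_dir) / "simplismart-sdk" / "simplismart-skill.md"
    if not source.is_file():
        raise FileNotFoundError(f"skill file not found: {source}")
    target_dir = Path(skills_root) / "simplismart"
    target_dir.mkdir(parents=True, exist_ok=True)  # like mkdir -p
    target = target_dir / "SKILL.md"
    shutil.copyfile(source, target)  # overwrites an existing SKILL.md, like cp
    return target
```

After `git clone`, running `install_skill(Path("cookbook"))` places the file at `~/.claude/skills/simplismart/SKILL.md`.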

Then describe what you need:

"Deploy moonshotai/Kimi-K2.6, min 1 replica, max 4"

Your agent then runs the full workflow through the CLI: compile, poll until ready, deploy, and health check. The skill uses the same Simplismart SDK under the hood, so anything it does, you can also do directly in code.

Get Started with the Simplismart SDK

Get your Playground token from Simplismart Settings
Set SIMPLISMART_PG_TOKEN and ORG_ID in your environment
Follow the SDK reference
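
For CI jobs, a preflight check that names every missing variable, and treats blank values as missing, fails faster and more clearly than the first API call. A minimal sketch; `missing_credentials` and `require_credentials` are hypothetical helper names, and only the two variable names come from this post.

```python
import os
from typing import List, Mapping, Optional

# The two variables the SDK example reads from the environment.
REQUIRED_VARS = ("SIMPLISMART_PG_TOKEN", "ORG_ID")


def missing_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return required variables that are unset or contain only whitespace."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def require_credentials(env: Optional[Mapping[str, str]] = None) -> None:
    """Exit with a message listing every missing variable, not just the first."""
    missing = missing_credentials(env)
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

Calling `require_credentials()` at the top of a CI script, before creating the `Simplismart` client, stops the job with one message that lists everything still to configure.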

Your infrastructure config already lives in code; there's no good reason your model deployment config shouldn't too. If you're building anything that uses AI, start with the shared endpoints today, or contact the Simplismart team for a dedicated deployment tuned to your workload.

