Developer Docs

API Documentation

Integrate with the Qube Compute GPU Cloud and Groq LPX Inference APIs. Provision bare-metal GPU instances or run sub-second LLM inference, all with a single API key.

Authentication

API Key Authentication

Every request must include your API key as a Bearer token in the Authorization header. Generate keys in the Control Panel under Settings → API Keys.

Authorization: Bearer qb_live_sk_7f3a2b1c9d4e...

# Base URL
https://api.qubecompute.com
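As a minimal sketch of the authentication scheme above, the helper below builds the headers for a request. The function name `build_headers` and the environment variable `QUBE_API_KEY` are illustrative assumptions, not part of the documented API.

```python
import os

BASE_URL = "https://api.qubecompute.com"


def build_headers(api_key: str) -> dict[str, str]:
    """Return the Bearer-token and JSON headers used by the examples in these docs."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# QUBE_API_KEY is a hypothetical variable name; the fallback is a placeholder, not a real key.
headers = build_headers(os.environ.get("QUBE_API_KEY", "qb_live_sk_example"))
```

Reading the key from the environment keeps it out of source control; any secret store would serve the same purpose.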
GPU Cloud

GPU Instances API

Create, list, inspect, and terminate bare-metal GPU instances.

POST /v1/instances

Create a new GPU instance

Request Body
{
  "gpu_type": "H200_SXM",
  "gpu_count": 8,
  "region": "kz-almaty-1",
  "image": "ubuntu-22.04-cuda-12.4",
  "ssh_key_id": "key_8f3a2b1c"
}
Response
{
  "id": "inst_7xKp2mNqR4",
  "status": "provisioning",
  "gpu_type": "H200_SXM",
  "gpu_count": 8,
  "region": "kz-almaty-1",
  "ip_address": null,
  "created_at": "2026-09-15T08:30:00Z"
}
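The create response reports "status": "provisioning" with a null ip_address, so a client will likely need to poll GET /v1/instances/{id} until the instance is running. The sketch below shows one way to do that. `get_instance` is a hypothetical callable that wraps the GET request, and the polling interval and attempt limit are arbitrary choices. Only the status values shown in these docs are handled; the API may define others.

```python
import time
from typing import Callable


def wait_until_running(
    get_instance: Callable[[str], dict],
    instance_id: str,
    poll_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll an instance until its status is "running", then return its details."""
    for _ in range(max_attempts):
        instance = get_instance(instance_id)
        status = instance["status"]
        if status == "running":
            return instance
        if status != "provisioning":
            # Assumption: anything other than provisioning/running means the launch failed.
            raise RuntimeError(f"unexpected status: {status}")
        sleep(poll_seconds)
    raise TimeoutError(f"{instance_id} not running after {max_attempts} polls")


# Example wiring with requests (not executed here):
# get = lambda iid: requests.get(
#     f"https://api.qubecompute.com/v1/instances/{iid}", headers=headers, timeout=30
# ).json()
# instance = wait_until_running(get, "inst_7xKp2mNqR4")
```

Passing the fetch function and the sleep function as parameters keeps the loop testable without network access.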
GET /v1/instances

List all instances

Response
{
  "data": [
    {
      "id": "inst_7xKp2mNqR4",
      "status": "running",
      "gpu_type": "H200_SXM",
      "gpu_count": 8,
      "ip_address": "185.120.44.12",
      "created_at": "2026-09-15T08:30:00Z"
    }
  ],
  "has_more": false
}
GET /v1/instances/{id}

Get instance details

Response
{
  "id": "inst_7xKp2mNqR4",
  "status": "running",
  "gpu_type": "H200_SXM",
  "gpu_count": 8,
  "region": "kz-almaty-1",
  "ip_address": "185.120.44.12",
  "image": "ubuntu-22.04-cuda-12.4",
  "ssh_key_id": "key_8f3a2b1c",
  "created_at": "2026-09-15T08:30:00Z",
  "hourly_rate": "31.68"
}
DELETE /v1/instances/{id}

Terminate an instance

Response
{
  "id": "inst_7xKp2mNqR4",
  "status": "terminating",
  "terminated_at": "2026-09-15T12:45:00Z"
}
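The created_at, terminated_at, and hourly_rate fields in the responses above are enough to estimate what an instance cost. The sketch below assumes that hourly_rate covers the whole instance and that usage is billed linearly by the second. The docs state neither, nor the currency, so treat the result as an estimate rather than an invoice figure.

```python
from datetime import datetime
from decimal import Decimal


def parse_timestamp(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing Z, as used in the API responses."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def estimate_cost(created_at: str, terminated_at: str, hourly_rate: str) -> Decimal:
    """Estimate cost as elapsed hours times hourly_rate (assumed per-second linear billing)."""
    elapsed = parse_timestamp(terminated_at) - parse_timestamp(created_at)
    hours = Decimal(int(elapsed.total_seconds())) / Decimal(3600)
    return (hours * Decimal(hourly_rate)).quantize(Decimal("0.01"))


# Values taken from the example responses: 4 h 15 min at 31.68 per hour.
cost = estimate_cost("2026-09-15T08:30:00Z", "2026-09-15T12:45:00Z", "31.68")
print(cost)  # 134.64
```

Decimal is used because hourly_rate arrives as a string, which avoids floating-point rounding in money arithmetic.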
Groq LPX

Groq LPX Inference API

Ultra-low-latency LLM inference powered by Groq LPU hardware. The endpoint is OpenAI-compatible: swap your base URL and go.

POST /v1/completions

Curl
curl -X POST https://api.qubecompute.com/v1/completions \
  -H "Authorization: Bearer qb_live_sk_7f3a..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in one paragraph."}
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'
Python
import requests

response = requests.post(
    "https://api.qubecompute.com/v1/completions",
    headers={
        "Authorization": "Bearer qb_live_sk_7f3a...",
        "Content-Type": "application/json",
    },
    json={
        "model": "llama-3.1-70b",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum computing in one paragraph."},
        ],
        "max_tokens": 256,
        "temperature": 0.7,
    },
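    timeout=30,  # avoid hanging indefinitely on a stalled connection
)
response.raise_for_status()  # raise on 4xx/5xx instead of printing an error body

print(response.json())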
Node.js
const response = await fetch("https://api.qubecompute.com/v1/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer qb_live_sk_7f3a...",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "llama-3.1-70b",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Explain quantum computing in one paragraph." },
    ],
    max_tokens: 256,
    temperature: 0.7,
  }),
});

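// Fail loudly on HTTP errors instead of logging an error body.
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const data = await response.json();
console.log(data);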
Response
{
  "id": "cmpl_9qWvX3mK",
  "object": "chat.completion",
  "model": "llama-3.1-70b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing leverages..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 87,
    "total_tokens": 115
  }
}
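Most callers only need the assistant text and the token counts from the response above. The sketch below pulls both out of a parsed response dictionary. The helper name `extract_reply` is illustrative, and the field layout is taken from the example response only, so other fields or multiple choices may need extra handling.

```python
def extract_reply(completion: dict) -> tuple[str, int]:
    """Return the first choice's message text and the total token count."""
    choice = completion["choices"][0]
    return choice["message"]["content"], completion["usage"]["total_tokens"]


# Sample mirrors the example response in these docs.
sample = {
    "id": "cmpl_9qWvX3mK",
    "object": "chat.completion",
    "model": "llama-3.1-70b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Quantum computing leverages..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 28, "completion_tokens": 87, "total_tokens": 115},
}

text, total_tokens = extract_reply(sample)
```

In the example, total_tokens equals prompt_tokens plus completion_tokens (28 + 87 = 115), which is a convenient sanity check when tracking usage.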

POST /v1/embeddings

curl -X POST https://api.qubecompute.com/v1/embeddings \
  -H "Authorization: Bearer qb_live_sk_7f3a..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-70b",
    "input": "Qube Compute provides GPU cloud infrastructure."
  }'
Response
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0154, ... ]
    }
  ],
  "model": "llama-3.1-70b",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
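A common use of the embedding vectors returned above is comparing two texts by cosine similarity. The sketch below is a plain-Python version and is not specific to this API; the short vectors are made-up stand-ins for real embeddings, which are much longer.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Illustrative vectors only; in practice use response["data"][i]["embedding"].
similarity = cosine_similarity([0.0023, -0.0091, 0.0154], [0.0023, -0.0091, 0.0154])
```

Identical vectors score 1 (up to floating-point error), orthogonal vectors score 0, and opposite vectors score -1.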
Limits

Rate Limits

GPU Cloud API: 1,000 requests / minute

Instance management endpoints (create, list, get, terminate). Bursts of up to 1,500 requests per minute are allowed, provided clients back off.

Groq LPX Inference: 100 requests / minute

Completions and embeddings endpoints. Higher limits are available on Enterprise plans.

Rate limit headers are included in every response: X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset.
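A client can use these headers to decide how long to pause before the next request. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds. The docs do not specify the format, and some APIs send seconds remaining instead, so verify against a real response before relying on it. The helper name `seconds_until_retry` is illustrative.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request; 0.0 if quota remains."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    current = time.time() if now is None else now
    # Assumption: the reset header holds a Unix epoch timestamp in seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", current))
    return max(0.0, reset_at - current)


# Exhausted quota that resets 10 seconds after "now".
wait = seconds_until_retry(
    {"X-RateLimit-Limit": "100", "X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1000"},
    now=990.0,
)
print(wait)  # 10.0
```

With the requests library, `response.headers` can be passed in directly, since it is a case-insensitive mapping.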
SDKs

Official SDKs

SDKs coming soon. We are building first-class libraries for the most popular languages.

Python SDK (coming soon)
pip install qubecompute

Node.js SDK (coming soon)
npm install @qubecompute/sdk

Ready to Build?

Get your API key and start provisioning GPU instances or running inference in minutes.