What GPU hardware does Qube Compute use?

We deploy NVIDIA Vera Rubin R100 NVL72 — the most powerful commercially available GPU system with 1,400+ ExaFLOPS FP4 per rack and NVLink 6.0 fabric. We also offer Groq LPX for sub-10ms real-time inference.

How much does GPU cloud cost at Qube Compute?

Anchor contracts start at $14/GPU-package-hour (6-24 month terms). Cloud On-Demand is $19/hr and Spot/Night is $25/hr. Our energy cost of $0.048/kWh makes us 3x cheaper than AWS/Azure.

Is Qube Compute Sharia-compliant?

Yes. We are the world's only AFSA-certified halal GPU cloud. Our Mudaraba profit-sharing structure has zero debt (riba) and no derivatives (gharar). All payments are held in Sharia-compliant escrow at Al Hilal Bank.

Where is the data center located?

Our 8 MW Tier III TIA-942 facility is located in SEZ PIT Alatau, Almaty, Kazakhstan. The Special Economic Zone provides 0% corporate tax, 0% VAT, and 0% personal income tax until 2029.

How are payments protected?

All prepayments are held in escrow at Al Hilal Bank under AIFC English Common Law. Funds are released only after GPU access has been delivered and verified. If we fail to deliver, you receive an automatic full refund.

Developer Docs

API Documentation

Integrate with the Qube Compute GPU Cloud and Groq LPX Inference APIs. Provision bare-metal GPU instances or run sub-second LLM inference — all from a single API key.

Authentication

API Key Authentication

All requests must include your API key in the Authorization header. You can generate keys from the Control Panel under Settings → API Keys.

Authorization: Bearer qb_live_sk_7f3a2b1c9d4e...

# Base URL
https://api.qubecompute.com
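
As a minimal sketch of the authentication scheme described above, the helpers below build the headers and full URL for a request. The base URL and the Bearer format come from this page; the helper names `auth_headers` and `endpoint` are hypothetical and are not part of any official SDK.

```python
# Sketch only: BASE_URL and the Bearer scheme are taken from the docs above.
# The function names are illustrative assumptions, not an official client.
BASE_URL = "https://api.qubecompute.com"


def auth_headers(api_key: str) -> dict[str, str]:
    """Return the headers that every request is expected to carry."""
    if not api_key:
        raise ValueError("API key must not be empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def endpoint(path: str) -> str:
    """Join the base URL and an API path such as /v1/instances."""
    return f"{BASE_URL}/{path.lstrip('/')}"
```

Keeping the key out of source code (for example, reading it from an environment variable of your choosing) avoids committing it by accident.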

GPU Cloud

GPU Instances API

Create, list, inspect, and terminate bare-metal GPU instances.

POST /v1/instances

Create a new GPU instance

Request Body

{
  "gpu_type": "H200_SXM",
  "gpu_count": 8,
  "region": "kz-almaty-1",
  "image": "ubuntu-22.04-cuda-12.4",
  "ssh_key_id": "key_8f3a2b1c"
}

Response

{
  "id": "inst_7xKp2mNqR4",
  "status": "provisioning",
  "gpu_type": "H200_SXM",
  "gpu_count": 8,
  "region": "kz-almaty-1",
  "ip_address": null,
  "created_at": "2026-09-15T08:30:00Z"
}
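
The request above could be assembled in Python roughly as follows. This is a sketch using only the standard library: the field names mirror the documented request body, while the function names `build_create_instance_request` and `send` are hypothetical. Building the request is kept separate from sending it so the payload can be inspected first.

```python
import json
import urllib.request


def build_create_instance_request(api_key, gpu_type, gpu_count, region,
                                  image, ssh_key_id):
    """Build (but do not send) a POST /v1/instances request."""
    if not isinstance(gpu_count, int) or gpu_count < 1:
        raise ValueError("gpu_count must be a positive integer")
    body = {
        "gpu_type": gpu_type,
        "gpu_count": gpu_count,
        "region": region,
        "image": image,
        "ssh_key_id": ssh_key_id,
    }
    return urllib.request.Request(
        "https://api.qubecompute.com/v1/instances",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout_s=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send(build_create_instance_request(...))` with the values from the example would be expected to return an object like the response shown above, with `status` set to `provisioning`.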

GET /v1/instances

List all instances

Response

{
  "data": [
    {
      "id": "inst_7xKp2mNqR4",
      "status": "running",
      "gpu_type": "H200_SXM",
      "gpu_count": 8,
      "ip_address": "185.120.44.12",
      "created_at": "2026-09-15T08:30:00Z"
    }
  ],
  "has_more": false
}

GET /v1/instances/{id}

Get instance details

Response

{
  "id": "inst_7xKp2mNqR4",
  "status": "running",
  "gpu_type": "H200_SXM",
  "gpu_count": 8,
  "region": "kz-almaty-1",
  "ip_address": "185.120.44.12",
  "image": "ubuntu-22.04-cuda-12.4",
  "ssh_key_id": "key_8f3a2b1c",
  "created_at": "2026-09-15T08:30:00Z",
  "hourly_rate": "31.68"
}
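
Because a new instance starts in `provisioning` with a null `ip_address`, a client typically polls this endpoint until the status becomes `running`. The sketch below shows one way to do that. The function name `wait_until_running`, the polling interval, and the timeout are assumptions. So is treating any status other than `provisioning` or `running` as an error, since this page does not list the full set of statuses. The fetch function is passed in, so the loop works with any HTTP client.

```python
import time
from typing import Callable


def wait_until_running(fetch_instance: Callable[[], dict],
                       timeout_s: float = 600.0,
                       interval_s: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll fetch_instance() until the instance reports status 'running'.

    fetch_instance should perform GET /v1/instances/{id} and return the
    decoded JSON body as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        instance = fetch_instance()
        status = instance["status"]
        if status == "running":
            return instance
        if status != "provisioning":
            # Assumption: anything else means the instance will not start.
            raise RuntimeError(f"unexpected instance status: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError("instance did not reach 'running' in time")
        sleep(interval_s)
```

Once the call returns, the `ip_address` field of the returned dict can be used to connect over SSH.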

DELETE /v1/instances/{id}

Terminate an instance

Response

{
  "id": "inst_7xKp2mNqR4",
  "status": "terminating",
  "terminated_at": "2026-09-15T12:45:00Z"
}
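
The `created_at`, `terminated_at`, and `hourly_rate` fields shown in the responses above are enough for a rough cost estimate. The sketch below assumes that `hourly_rate` covers the whole instance and that usage is billed pro rata by the second with no minimum charge. Neither assumption is confirmed on this page, and the function name `estimate_cost` is hypothetical.

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP


def parse_timestamp(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2026-09-15T08:30:00Z."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def estimate_cost(created_at: str, terminated_at: str,
                  hourly_rate: str) -> Decimal:
    """Estimate the charge for one instance, rounded to two decimals."""
    elapsed = parse_timestamp(terminated_at) - parse_timestamp(created_at)
    seconds = int(elapsed.total_seconds())
    if seconds < 0:
        raise ValueError("terminated_at is earlier than created_at")
    hours = Decimal(seconds) / Decimal(3600)
    cost = hours * Decimal(hourly_rate)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Example values from this page: 08:30 to 12:45 is 4.25 hours at 31.68/hour.
print(estimate_cost("2026-09-15T08:30:00Z", "2026-09-15T12:45:00Z", "31.68"))
# prints 134.64
```

`Decimal` is used instead of `float` because `hourly_rate` is returned as a string, which suggests it is meant to be handled as an exact amount.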

Groq LPX

Groq LPX Inference API

Ultra-low-latency LLM inference powered by Groq LPU hardware. OpenAI-compatible endpoint — swap your base URL and go.

POST /v1/completions

cURL

curl -X POST https://api.qubecompute.com/v1/completions \
  -H "Authorization: Bearer qb_live_sk_7f3a..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in one paragraph."}
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'

Python

import requests

response = requests.post(
    "https://api.qubecompute.com/v1/completions",
    headers={
        "Authorization": "Bearer qb_live_sk_7f3a...",
        "Content-Type": "application/json",
    },
    json={
        "model": "llama-3.1-70b",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum computing in one paragraph."},
        ],
        "max_tokens": 256,
        "temperature": 0.7,
    },
)

print(response.json())

Node.js

const response = await fetch("https://api.qubecompute.com/v1/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer qb_live_sk_7f3a...",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "llama-3.1-70b",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Explain quantum computing in one paragraph." },
    ],
    max_tokens: 256,
    temperature: 0.7,
  }),
});

const data = await response.json();
console.log(data);

Response

{
  "id": "cmpl_9qWvX3mK",
  "object": "chat.completion",
  "model": "llama-3.1-70b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing leverages..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 87,
    "total_tokens": 115
  }
}
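
To use the response above, a client usually needs only the assistant text and the token count. A minimal sketch, assuming the response always has the shape shown in the example (the helper name `extract_reply` is hypothetical):

```python
def extract_reply(completion: dict) -> tuple[str, int]:
    """Return (assistant text, total tokens) from a completion response."""
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    text = choices[0]["message"]["content"]
    # usage may be absent on error responses; default to 0 in that case.
    total_tokens = completion.get("usage", {}).get("total_tokens", 0)
    return text, total_tokens


sample = {
    "id": "cmpl_9qWvX3mK",
    "object": "chat.completion",
    "model": "llama-3.1-70b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant",
                        "content": "Quantum computing leverages..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 28, "completion_tokens": 87,
              "total_tokens": 115},
}

text, tokens = extract_reply(sample)
print(tokens)  # prints 115
```

Checking `finish_reason` as well can help detect replies that were cut off by `max_tokens`.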

POST /v1/embeddings

curl -X POST https://api.qubecompute.com/v1/embeddings \
  -H "Authorization: Bearer qb_live_sk_7f3a..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-70b",
    "input": "Qube Compute provides GPU cloud infrastructure."
  }'

Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0154, ... ]
    }
  ],
  "model": "llama-3.1-70b",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
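
Embedding vectors like the one above are commonly compared with cosine similarity, for example to rank documents against a query. The sketch below is generic and not specific to this API. It assumes both vectors came from the same model and therefore have the same length.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

In practice the inputs would be the `embedding` arrays taken from `data[i]` in two responses.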

Limits

Rate Limits

GPU Cloud API: 1,000 requests / minute

Instance management endpoints (create, list, get, terminate). Burst up to 1,500 with backoff.

Groq LPX Inference: 100 requests / minute

Completions and embeddings endpoints. Higher limits available on Enterprise plans.

Rate limit headers are included in every response: X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset.
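
A client can use these headers to pause before it hits the limit. The sketch below assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds. This page does not state the format, so verify it against a real response. The function name `seconds_to_wait` is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str],
                    now: Optional[float] = None) -> float:
    """Return how long to sleep before sending the next request.

    Assumes X-RateLimit-Reset holds a Unix timestamp in seconds (unverified).
    """
    if now is None:
        now = time.time()
    # If the header is missing, assume there is still budget left.
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)
```

A plain `dict` is case-sensitive, whereas the header mappings returned by most HTTP clients are not. Pass the client's own header object where possible.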

SDKs

Official SDKs

SDKs coming soon. We are building first-class libraries for the most popular languages.

Python SDK (coming soon)

pip install qubecompute

Node.js SDK (coming soon)

npm install @qubecompute/sdk

Ready to Build?

Get your API key and start provisioning GPU instances or running inference in minutes.
