Gradient Inference
/install gradient-inference
🦞 Gradient AI — Serverless Inference
⚠️ This is an unofficial community skill, not maintained by DigitalOcean. Use at your own risk.
"Why manage GPUs when the ocean provides?" — ancient lobster proverb
Use DigitalOcean's Gradient Serverless Inference to call large language models without managing infrastructure. The API is OpenAI-compatible, so standard SDKs and patterns work — just point at https://inference.do-ai.run/v1 and swim.
Authentication
All requests need a Model Access Key in the Authorization: Bearer header.
export GRADIENT_API_KEY="your-model-access-key"
Where to get one: DigitalOcean Console → Gradient AI → Model Access Keys → Create Key.
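For callers who skip the bundled scripts, here is a minimal sketch of picking up the key and building the Bearer header described above. The helper name `auth_headers` is hypothetical and not part of this skill.

```python
import os


def auth_headers(env=None):
    """Build request headers from GRADIENT_API_KEY (hypothetical helper)."""
    env = os.environ if env is None else env
    key = env.get("GRADIENT_API_KEY", "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError("GRADIENT_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real environment.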
Tools
🔍 List Available Models
Window-shop for LLMs before you swipe the card.
python3 gradient_models.py # Pretty table
python3 gradient_models.py --json # Machine-readable
python3 gradient_models.py --filter "llama" # Search by name
Use this before hardcoding model IDs — models are added and deprecated over time.
Direct API call:
curl -s https://inference.do-ai.run/v1/models \
-H "Authorization: Bearer $GRADIENT_API_KEY" | python3 -m json.tool
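The same lookup can be sketched in Python with only the standard library. This assumes the endpoint returns the OpenAI-style list shape (`{"data": [{"id": ...}]}`), which the document implies but does not show; `fetch_models` and `filter_model_ids` are hypothetical names, not the internals of gradient_models.py.

```python
import json
import os
import urllib.request

MODELS_URL = "https://inference.do-ai.run/v1/models"


def fetch_models(api_key=None):
    """GET /v1/models and return the parsed JSON body."""
    api_key = api_key or os.environ["GRADIENT_API_KEY"]
    req = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def filter_model_ids(payload, query=""):
    """Return sorted model IDs containing `query` (case-insensitive).

    Assumes the OpenAI-style shape {"data": [{"id": "..."}, ...]}.
    """
    needle = query.lower()
    ids = (m.get("id", "") for m in payload.get("data", []))
    return sorted(i for i in ids if i and needle in i.lower())
```

Something like `filter_model_ids(fetch_models(), "llama")` would then roughly mirror the `--filter "llama"` flag.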
💬 Chat Completions
The classic. Send structured messages (system/user/assistant roles), get a response. OpenAI-compatible, so you probably already know how this works.
python3 gradient_chat.py \
--model "openai-gpt-oss-120b" \
--system "You are a helpful assistant." \
--prompt "Explain serverless inference in one paragraph."
# Different model
python3 gradient_chat.py \
--model "llama3.3-70b-instruct" \
--prompt "Write a haiku about cloud computing."
Direct API call:
curl -s https://inference.do-ai.run/v1/chat/completions \
-H "Authorization: Bearer $GRADIENT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai-gpt-oss-120b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
"temperature": 0.7,
"max_tokens": 1000
}'
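For reference, the curl call above might look like this in Python. It is a sketch, not the code inside gradient_chat.py: `build_chat_request` and `first_reply` are hypothetical helpers, and the response path `choices[0].message.content` is assumed from the OpenAI-compatible format rather than confirmed here.

```python
import json
import urllib.request

CHAT_URL = "https://inference.do-ai.run/v1/chat/completions"


def build_chat_request(api_key, model, prompt, system=None,
                       temperature=0.7, max_tokens=1000):
    """Return an unsent urllib Request mirroring the curl call above."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    body = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_reply(response_json):
    """Extract the assistant text, assuming an OpenAI-style response."""
    return response_json["choices"][0]["message"]["content"]
```

To send it, pass the request to `urllib.request.urlopen` and run `json.load` on the result before calling `first_reply`.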
⚡ Responses API (Recommended)
DigitalOcean's recommended endpoint for new integrations. It has a simpler request format and supports prompt caching, a.k.a. "stop paying twice for the same context."
# Basic usage
python3 gradient_chat.py \
--model "openai-gpt-oss-120b" \
--prompt "Summarize this earnings report." \
--responses-api
# With prompt caching (saves cost on follow-up queries)
python3 gradient_chat.py \
--model "openai-gpt-oss-120b" \
--prompt "Now compare it to last quarter." \
--responses-api --cache
Direct API call:
curl -s https://inference.do-ai.run/v1/responses \
-H "Authorization: Bearer $GRADIENT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai-gpt-oss-120b",
"input": "Explain prompt caching.",
"store": true
}'
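The request body is small enough to build by hand. The sketch below produces the JSON shown in the curl call; `build_responses_body` is a hypothetical helper, and treating `--cache` as equivalent to `"store": true` is an assumption based on this document, not on the script's source.

```python
import json


def build_responses_body(model, prompt, cache=False):
    """Return the JSON body for POST /v1/responses as a string.

    cache=True adds "store": true, the field this document associates
    with prompt caching.
    """
    body = {"model": model, "input": prompt}
    if cache:
        body["store"] = True
    return json.dumps(body)
```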
When to use which:
| | Chat Completions | Responses API |
|---|---|---|
| Request format | Array of messages with roles | Single input string |
| Prompt caching | ❌ | ✅ via store: true |
| Multi-step tool use | Manual | Built-in |
| Best for | Structured conversations | Simple queries, cost savings |
🖼️ Generate Images
Turn text prompts into images. Because sometimes a chart isn't enough.
python3 gradient_image.py --prompt "A lobster trading stocks on Wall Street"
python3 gradient_image.py --prompt "Sunset over the NYSE" --output sunset.png
python3 gradient_image.py --prompt "Fintech logo" --json
Direct API call:
curl -s https://inference.do-ai.run/v1/images/generations \
-H "Authorization: Bearer $GRADIENT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "dall-e-3",
"prompt": "A lobster analyzing candlestick charts",
"n": 1
}'
🧠 Model Selection Guide
Not all models are created equal. Choose wisely, young crustacean:
| Model | Best For | Speed | Quality | Context |
|---|---|---|---|---|
| openai-gpt-oss-120b | Complex reasoning, analysis, writing | Medium | ★★★★★ | 128K |
| llama3.3-70b-instruct | General tasks, instruction following | Fast | ★★★★ | 128K |
| deepseek-r1-distill-llama-70b | Math, code, step-by-step reasoning | Slow | ★★★★★ | 128K |
| qwen3-32b | Quick triage, short tasks | Fastest | ★★★ | 32K |
🦞 Pro tip: Cost-aware routing. Use a fast model (e.g., qwen3-32b) to score or triage, then only escalate to a strong model (e.g., openai-gpt-oss-120b) when depth is needed. Enable prompt caching for repeated context.
Always run python3 gradient_models.py to check what's currently available — the menu changes.
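The cost-aware routing tip in this section can be sketched as a small chooser. Everything here is illustrative: `pick_model`, the 0.5 threshold, and the toy `length_heuristic` scorer are assumptions, and a real setup would more likely have the fast model produce the score.

```python
FAST_MODEL = "qwen3-32b"
STRONG_MODEL = "openai-gpt-oss-120b"


def pick_model(prompt, score_fn, threshold=0.5):
    """Return the strong model only when the triage score crosses threshold.

    score_fn is a hypothetical callable returning a difficulty score in
    [0, 1]; in practice it might wrap a call to FAST_MODEL.
    """
    score = score_fn(prompt)
    return STRONG_MODEL if score >= threshold else FAST_MODEL


def length_heuristic(prompt, long_at=400):
    """Toy stand-in scorer: longer prompts count as harder."""
    return min(len(prompt) / long_at, 1.0)
```

Keeping the scorer as a plain callable lets you swap the heuristic for a model-based triage step without touching the routing logic.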
💰 Model Pricing Lookup
Check what models cost before you rack up a bill. Scrapes the official DigitalOcean pricing page — no API key needed.
python3 gradient_pricing.py # Pretty table
python3 gradient_pricing.py --json # Machine-readable
python3 gradient_pricing.py --model "llama" # Filter by model name
python3 gradient_pricing.py --no-cache # Skip cache, fetch live
How it works:
- Fetches live pricing from DigitalOcean's docs (public page, no auth)
- Caches results for 24 hours in /tmp/gradient_pricing_cache.json
- Falls back to a bundled snapshot if the live fetch fails
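The three steps above can be sketched as a cache-then-live-then-snapshot lookup. This is one plausible reading, not the actual logic of gradient_pricing.py: `load_pricing`, its parameters, and the use of file modification time to measure cache age are all assumptions.

```python
import json
import os
import time

CACHE_PATH = "/tmp/gradient_pricing_cache.json"
CACHE_TTL_SECONDS = 24 * 60 * 60


def load_pricing(fetch_live, snapshot, cache_path=CACHE_PATH,
                 use_cache=True, now=None):
    """Return pricing from a fresh cache, else live, else the snapshot."""
    now = time.time() if now is None else now
    if use_cache and os.path.exists(cache_path):
        age = now - os.path.getmtime(cache_path)
        if age < CACHE_TTL_SECONDS:
            with open(cache_path, encoding="utf-8") as fh:
                return json.load(fh)
    try:
        data = fetch_live()
    except Exception:
        return snapshot  # live fetch failed: fall back to bundled data
    # Only successful live fetches refresh the cache.
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh)
    return data
```

Calling it with `use_cache=False` corresponds to the `--no-cache` flag: the cache is not read, but a successful live fetch still refreshes it.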
🦞 Pro tip: Run python3 gradient_pricing.py --model "gpt-oss" before choosing a model to see the cost difference between gpt-oss-120b ($0.10/$0.70) and gpt-oss-20b ($0.05/$0.45) per 1M tokens.
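To turn those per-1M-token figures into a per-request estimate, a quick calculation helps. The sketch below assumes each pair reads as input price / output price, which the tip does not state explicitly; `estimate_cost` is a hypothetical helper and the prices are just the ones quoted above, so check live pricing before relying on them.

```python
# USD per 1M tokens as (input, output); figures quoted in the tip above.
# Reading each pair as input/output is an assumption.
PRICES = {
    "gpt-oss-120b": (0.10, 0.70),
    "gpt-oss-20b": (0.05, 0.45),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request from token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

Under that reading, 2M input tokens plus 1M output tokens would come to about $0.90 on gpt-oss-120b versus about $0.55 on gpt-oss-20b.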
CLI Reference
All scripts accept --json for machine-readable output.
gradient_models.py [--json] [--filter QUERY]
gradient_chat.py --prompt TEXT [--model ID] [--system TEXT]
[--responses-api] [--cache] [--temperature F]
[--max-tokens N] [--json]
gradient_image.py --prompt TEXT [--model ID] [--output PATH]
[--size WxH] [--json]
gradient_pricing.py [--json] [--model QUERY] [--no-cache]
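Because every script accepts --json and, per the notes in this document, exits with code 1 and writes errors to stderr on failure, the scripts are straightforward to drive from another program. A minimal wrapper sketch follows; `run_json` is a hypothetical helper, and the shape of each script's JSON output is not documented here, so the result is returned unparsed beyond `json.loads`.

```python
import json
import subprocess
import sys


def run_json(cmd):
    """Run a command, parse stdout as JSON, raise on a nonzero exit."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # Surface the script's stderr so the failure reason is not lost.
        raise RuntimeError(
            f"{cmd[0]} exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return json.loads(proc.stdout)


# Example call (requires the skill's scripts and GRADIENT_API_KEY):
# models = run_json([sys.executable, "gradient_models.py", "--json"])
```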
External Endpoints
| Endpoint | Purpose |
|---|---|
| https://inference.do-ai.run/v1/models | List available models |
| https://inference.do-ai.run/v1/chat/completions | Chat Completions API |
| https://inference.do-ai.run/v1/responses | Responses API (recommended) |
| https://inference.do-ai.run/v1/images/generations | Image generation |
| https://docs.digitalocean.com/.../pricing/ | Pricing page (scraped, public) |
Security & Privacy
- All inference requests go to inference.do-ai.run, DigitalOcean's own endpoint; the pricing lookup only reads the public DigitalOcean docs page
- Your GRADIENT_API_KEY is sent as a Bearer token in the Authorization header
- No other credentials or local data leave the machine
- Model Access Keys are scoped to inference only — they can't manage your DO account
- Prompt caching entries are scoped to your account and automatically expire
Trust Statement
By using this skill, prompts and data are sent to DigitalOcean's Gradient Inference API. Only install if you trust DigitalOcean with the content you send to their LLMs.
Important Notes
- Run python3 gradient_models.py before assuming a model exists; they rotate
- All scripts exit with code 1 and print errors to stderr on failure
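- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install gradient-inference
- Once installed, call the Skill by name or trigger it with /gradient-inference
- Provide the required inputs described in the Skill's parameters to get structured output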
What is Gradient Inference?
Community skill (unofficial) for DigitalOcean Gradient AI Serverless Inference. Discover available models and pricing, run chat completions or the Responses... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 727 downloads to date.
How do I install Gradient Inference?
Run the command /install gradient-inference in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.
Is Gradient Inference free?
Yes. Gradient Inference is completely free (free and open source) and can be freely downloaded, installed, and used.
Which platforms does Gradient Inference support?
Gradient Inference is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.
Who developed Gradient Inference?
It is developed and maintained by Simon DeLorean (@simondelorean); the current version is v0.1.3.