wang-junjian

LLM Deploy

by 军舰 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
419 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install llm-deploy
Description
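Deploys LLM model services (vLLM) on GPU servers. Supports multiple server configurations, automatically checks GPU status and port usage, and deploys popular open-source large language models with one command.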
README (SKILL.md)

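🚀 LLM Deployment Skill

Quickly deploy vLLM model services on GPU servers.

✨ Features

  • 🖥️ Multi-server support - configure multiple GPU servers and choose between them
  • 🔍 Automatic checks - check GPU status and port usage with one command
  • 🤖 Model library - preset configurations for popular models
  • Fast deployment - start a service with a simple command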

📋 Quick Start

1. Configure servers

Create ~/.config/llm-deploy/servers.json:

{
  "servers": {
    "gpu1": {
      "host": "gpu1",
      "user": "lnsoft",
      "gpu_count": 4,
      "model_path": "/data/models/llm"
    },
    "my-gpu": {
      "host": "192.168.1.100",
      "user": "ubuntu",
      "gpu_count": 2,
      "model_path": "/home/ubuntu/models"
    }
  },
  "default_server": "gpu1"
}
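
The listing does not include the llm-deploy script itself, so how it reads this file is not documented. As a minimal sketch, assuming the file is parsed as ordinary JSON and that `default_server` is used whenever no `--server` value is given, server resolution could look like this. The function names `load_servers` and `resolve_server` and the derived `ssh_target` field are hypothetical, introduced only for illustration.

```python
import json
from pathlib import Path
from typing import Optional

# Path taken from the README; the helpers below are hypothetical.
CONFIG_PATH = Path.home() / ".config" / "llm-deploy" / "servers.json"


def load_servers(path: Path = CONFIG_PATH) -> dict:
    """Read servers.json and return the parsed dictionary."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def resolve_server(config: dict, name: Optional[str] = None) -> dict:
    """Return the entry for `name`, falling back to `default_server`."""
    key = name or config.get("default_server")
    servers = config.get("servers", {})
    if key not in servers:
        raise KeyError(f"unknown server {key!r}; known: {sorted(servers)}")
    entry = dict(servers[key])
    # Assumed convenience field: the user@host string passed to ssh.
    entry["ssh_target"] = f"{entry['user']}@{entry['host']}"
    return entry


# Same content as the servers.json example above.
example = {
    "servers": {
        "gpu1": {"host": "gpu1", "user": "lnsoft", "gpu_count": 4,
                 "model_path": "/data/models/llm"},
        "my-gpu": {"host": "192.168.1.100", "user": "ubuntu", "gpu_count": 2,
                   "model_path": "/home/ubuntu/models"},
    },
    "default_server": "gpu1",
}
print(resolve_server(example)["ssh_target"])            # lnsoft@gpu1
print(resolve_server(example, "my-gpu")["ssh_target"])  # ubuntu@192.168.1.100
```

Raising on an unknown name, rather than silently falling back to the default, keeps a typo in `--server` from deploying to the wrong machine.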

2. Check server status

# Use the default server
llm-deploy check

# Specify a server
llm-deploy check --server gpu1

3. Deploy a model

# Deploy a preset model
llm-deploy deploy deepseek-r1-32b

# Specify a port
llm-deploy deploy deepseek-r1-32b --port 8112

🎛️ Available Commands

check - check server status

Checks GPU memory and port usage.

llm-deploy check [--server NAME] [--port PORT]

Example output:

✅ GPU status OK
- 4 × Tesla T4 (15GB)
- VRAM in use: 12.6GB per GPU
- Temperature: 51-55°C

✅ Port 8111 is available

deploy - deploy a model

Starts a vLLM model service.

llm-deploy deploy <MODEL_NAME> [--server NAME] [--port PORT]

Supported models:

  • deepseek-r1-32b - DeepSeek-R1-Distill-Qwen-32B-AWQ
  • llama-3-8b - Llama 3 8B
  • qwen-7b - Qwen 7B
  • mistral-7b - Mistral 7B

list - list available models

llm-deploy list

ps - show running services

llm-deploy ps [--server NAME]

stop - stop a service

llm-deploy stop [--server NAME] [--port PORT]

🔧 Manual Usage (No Script)

If you prefer not to use the wrapper script, you can run the underlying commands directly:

Check GPUs

ssh <user>@<host> nvidia-smi
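
Plain `nvidia-smi` output is meant for reading, not parsing. If the check needs to be scripted, one option is nvidia-smi's CSV query mode. The sketch below assumes you run the query shown in `QUERY_COMMAND` over SSH and pass its text output to a hypothetical `parse_gpu_csv` helper; the sample values are made up for illustration and are not output from a real server.

```python
import csv
import io
from typing import Dict, List

# nvidia-smi's CSV query mode; the choice of fields is this sketch's own.
QUERY_COMMAND = (
    "nvidia-smi "
    "--query-gpu=name,memory.used,memory.total,temperature.gpu "
    "--format=csv,noheader,nounits"
)


def parse_gpu_csv(text: str) -> List[Dict[str, object]]:
    """Turn one CSV line per GPU into a dictionary (memory values in MiB)."""
    gpus = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(record) != 4:
            continue  # skip blank or malformed lines
        name, used, total, temp = (field.strip() for field in record)
        gpus.append({
            "name": name,
            "used_mib": int(used),
            "total_mib": int(total),
            "free_mib": int(total) - int(used),
            "temp_c": int(temp),
        })
    return gpus


# Illustrative sample only; real output depends on the server.
sample = "Tesla T4, 12902, 15360, 51\nTesla T4, 12902, 15360, 55\n"
for gpu in parse_gpu_csv(sample):
    print(gpu["name"], gpu["free_mib"], "MiB free,", gpu["temp_c"], "C")
```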

Check a port

ssh <user>@<host> "lsof -i :<port> 2>/dev/null || echo 'port available'"
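
The one-liner above relies on `lsof` exiting with a non-zero status when nothing is listening. A scripted version should also tell "port is free" apart from "SSH itself failed", since `ssh` exits with 255 on connection errors. The following is a rough sketch under those assumptions; `lsof_argv` and `port_is_free` are hypothetical names, and the runner is injectable so the example can be exercised without a real host. Note that `lsof` run as an unprivileged user may not see sockets owned by other users, so a "free" result is not a guarantee.

```python
import subprocess
from types import SimpleNamespace
from typing import Callable, List


def lsof_argv(target: str, port: int) -> List[str]:
    """Build the ssh argv that runs `lsof -i :<port>` on the remote host."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return ["ssh", target, f"lsof -i :{port}"]


def port_is_free(target: str, port: int, run: Callable = subprocess.run) -> bool:
    """Interpret the exit status of the remote lsof call."""
    result = run(lsof_argv(target, port), capture_output=True, text=True)
    if result.returncode == 0:
        return False  # lsof listed at least one process on the port
    if result.returncode == 1:
        return True   # lsof found nothing to list
    # Anything else (for example 255 from ssh) is treated as a failed check.
    raise RuntimeError(f"check failed (exit {result.returncode}): {result.stderr}")


# Dry run with a stand-in runner, so no SSH connection is made here.
fake_run = lambda argv, **kwargs: SimpleNamespace(returncode=1, stderr="")
print(port_is_free("ubuntu@192.168.1.100", 8111, run=fake_run))  # True
```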

Deploy a model (DeepSeek R1 32B)

ssh <user>@<host> "tmux new-session -d -s vllm '
source /data/miniconda3/etc/profile.d/conda.sh && \
conda activate vllm && \
cd /data/models/llm && \
vllm serve /data/models/llm/deepseek/DeepSeek-R1-Distill-Qwen-32B-AWQ/ \
  --tensor-parallel-size 4 \
  --max-model-len 102400 \
  --dtype half \
  --port 8111 \
  --served-model-name gpt-4o-mini
'"
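
Hand-writing nested quotes for `ssh` plus `tmux` is error-prone. As a sketch of one way to assemble the same kind of command programmatically, the snippet below builds the `vllm serve` line from a model entry and lets `shlex` handle quoting. It uses only the flags that appear in the manual command above; the helper names are hypothetical, and the conda activation step is left out for brevity (it could be prepended to `inner` with `&&`).

```python
import shlex
from typing import Dict, List


def vllm_serve_command(model: Dict[str, object]) -> str:
    """Render a `vllm serve` command line from a model entry."""
    argv = [
        "vllm", "serve", str(model["path"]),
        "--tensor-parallel-size", str(model["tensor_parallel_size"]),
        "--max-model-len", str(model["max_model_len"]),
        "--dtype", str(model["dtype"]),
        "--port", str(model["port"]),
        "--served-model-name", str(model["served_model_name"]),
    ]
    return shlex.join(argv)


def remote_tmux_argv(target: str, session: str, inner: str) -> List[str]:
    """Wrap `inner` in a detached tmux session started over ssh."""
    remote = shlex.join(["tmux", "new-session", "-d", "-s", session, inner])
    return ["ssh", target, remote]


# Placeholder values for illustration only.
model = {
    "path": "/path/to/model",
    "tensor_parallel_size": 2,
    "max_model_len": 8192,
    "dtype": "half",
    "port": 8111,
    "served_model_name": "my-model",
}
inner = vllm_serve_command(model)
print(inner)
print(remote_tmux_argv("ubuntu@192.168.1.100", "vllm", inner))
```

Printing the argv before running it makes it easy to inspect exactly what would execute on the remote host.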

📦 Adding a Custom Model

Add an entry to ~/.config/llm-deploy/models.json:

{
  "my-model": {
    "name": "My Awesome Model",
    "path": "/path/to/model",
    "tensor_parallel_size": 2,
    "max_model_len": 8192,
    "dtype": "half",
    "port": 8111,
    "served_model_name": "my-model"
  }
}
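
The README does not say which of these fields are mandatory. Assuming all seven fields in the example are required, and that `tensor_parallel_size` should not exceed the target server's `gpu_count`, a pre-deployment sanity check might look like the sketch below. `validate_model` and `REQUIRED_FIELDS` are hypothetical names, not part of the skill.

```python
from typing import Dict, List

# Assumption: every field from the README example is required.
REQUIRED_FIELDS = {
    "name": str,
    "path": str,
    "tensor_parallel_size": int,
    "max_model_len": int,
    "dtype": str,
    "port": int,
    "served_model_name": str,
}


def validate_model(entry: Dict[str, object], gpu_count: int) -> List[str]:
    """Return a list of human-readable problems; empty means the entry looks fine."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    tp = entry.get("tensor_parallel_size")
    if isinstance(tp, int) and not 1 <= tp <= gpu_count:
        problems.append(
            f"tensor_parallel_size {tp} must be between 1 and gpu_count {gpu_count}"
        )
    port = entry.get("port")
    if isinstance(port, int) and not 1 <= port <= 65535:
        problems.append(f"port out of range: {port}")
    return problems


my_model = {
    "name": "My Awesome Model",
    "path": "/path/to/model",
    "tensor_parallel_size": 2,
    "max_model_len": 8192,
    "dtype": "half",
    "port": 8111,
    "served_model_name": "my-model",
}
print(validate_model(my_model, gpu_count=2))  # []
print(validate_model(my_model, gpu_count=1))
```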

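⚠️ Notes

  1. Check before deploying - always run check first to confirm that resources are available
  2. Run in the background - use tmux or screen to keep the service running
  3. Port management - give each model its own port
  4. VRAM estimate - a 7B model needs roughly 8-10GB and a 32B model roughly 10-14GB per GPU; these are rough figures that vary with quantization, tensor-parallel size, and context length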
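
To see where estimates like these come from, it helps to compute the weight footprint alone: parameter count times bytes per parameter, divided across the tensor-parallel GPUs. The sketch below is back-of-the-envelope arithmetic only. It ignores quantization metadata, activations, and the KV cache, and vLLM preallocates GPU memory for the KV cache, so observed usage per GPU is normally well above the weight figure. The function name is hypothetical.

```python
def weight_gb_per_gpu(params_billion: float, bits_per_param: int,
                      tensor_parallel_size: int = 1) -> float:
    """Approximate weight memory per GPU in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / tensor_parallel_size / 1e9


# 7B model in FP16 (16 bits per parameter) on a single GPU.
print(round(weight_gb_per_gpu(7, 16), 1))      # 14.0
# 32B model with 4-bit weights (e.g. AWQ) split across 4 GPUs.
print(round(weight_gb_per_gpu(32, 4, 4), 1))   # 4.0
# The same 32B model in FP16 across 4 GPUs.
print(round(weight_gb_per_gpu(32, 16, 4), 1))  # 16.0
```

By this arithmetic a 7B model only fits in 8-10GB when its weights are stored in 8 bits or fewer, and a 32B model in FP16 would not fit on four 15GB cards, which is consistent with the preset using an AWQ-quantized checkpoint.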


Contributed by the OpenClaw community 🦞

Usage Guidance
This skill is an instruction-only deployment guide that uses SSH to run commands on your machines. Before using it:
  1. Do not copy or run any 'llm-deploy' script unless you have the actual script source. This package does not include an executable even though the README suggests one.
  2. Understand that SSH will use your local SSH keys/agent and that the skill will read/write ~/.config/llm-deploy/*.
  3. Inspect all remote commands (tmux, conda activate, vllm serve) before running them. They will execute on the remote host and can run arbitrary programs there.
  4. Ensure remote hosts are trusted, reachable, and have the expected conda/vllm installation paths, model files, and permissions.
  5. If you want to proceed, ask the contributor for the llm-deploy script source or a verified release, or run the provided SSH commands manually instead of blindly copying an unknown script into your PATH.
Capability Analysis
Type: OpenClaw Skill
Name: llm-deploy
Version: 1.0.0
The skill bundle provides documentation and instructions for an agent to manage vLLM model deployments on remote GPU servers via SSH. The instructions in SKILL.md and README.md are consistent with the stated purpose, covering server status checks, port monitoring, and model serving using standard tools like tmux and conda. No evidence of data exfiltration, malicious code execution, or harmful prompt injection was found; the use of SSH is appropriate for the tool's administrative function.
Capability Assessment
Purpose & Capability
The name/description match the instructions: this is an SSH-based how-to for deploying vLLM on GPU servers. Requesting the ssh binary is appropriate. However the README suggests an 'llm-deploy' script that users should copy into PATH, but the package contains only SKILL.md and README.md (no script). That omission is a mismatch: either the skill is instruction-only (agent runs SSH commands directly) or it is missing an executable to install.
Instruction Scope
The SKILL.md explicitly instructs the agent/user to create and read configuration files under ~/.config/llm-deploy, to run ssh to arbitrary hosts (including running nvidia-smi, lsof, and remote tmux sessions), and to invoke conda and vllm on the remote host. Those actions are within the stated deployment purpose, but they implicitly require access to SSH keys/config and will run arbitrary commands on remote machines — the instructions do not limit or warn about that beyond high-level notes.
Install Mechanism
No install spec is present (instruction-only), which is the lowest-risk install mechanism. The README's copy-to-PATH suggestion is inconsistent with the package contents (no script provided). There's no remote download or archive extraction in the skill itself.
Credentials
The skill declares no required environment variables, which is consistent, but it implicitly depends on access to local SSH credentials (private keys and SSH agent/config) and will read/write ~/.config/llm-deploy/*. Those implicit credential/config accesses are not called out in metadata. Users should be aware SSH keys/agents will be used and that the skill will create config files in their home directory.
Persistence & Privilege
always:false and no install effects are declared. The skill does instruct creating config files under the user's home directory, but it does not request system-wide changes or persistent elevated privileges. No changes to other skills or global agent config are specified.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install llm-deploy
  3. After installation, invoke the skill by name or use /llm-deploy
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
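llm-deploy v1.0.0: initial release
  • One-command deployment of open-source LLM services with vLLM on multi-GPU servers
  • Automatic checks of server GPU status and port usage
  • Configure and switch between multiple servers; manage popular and custom models
  • Common commands for deploying models, listing service processes, and stopping services
  • Detailed quick-start and manual-operation instructions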
Metadata
Slug llm-deploy
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is LLM Deploy?

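LLM Deploy deploys LLM model services (vLLM) on GPU servers. It supports multiple server configurations, automatically checks GPU status and port usage, and deploys popular open-source large language models with one command. It is an AI Agent Skill for Claude Code / OpenClaw, with 419 downloads so far.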

How do I install LLM Deploy?

Run "/install llm-deploy" in the OpenClaw or Claude Code chat to install it in one step. Deploying models afterwards still requires SSH access to your GPU servers and a ~/.config/llm-deploy/servers.json file.

Is LLM Deploy free?

Yes, LLM Deploy is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does LLM Deploy support?

LLM Deploy is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created LLM Deploy?

It is built and maintained by 军舰 (@wang-junjian); the current version is v1.0.0.
