huaweiclouddev

huawei-cloud-ascend-models-deploy

Author: huaweicloud-skills-team · v0.0.1 · MIT-0
Install in OpenClaw
/install huawei-cloud-ascend-models-deploy
Usage Instructions (SKILL.md)

Huawei Cloud Ascend Models Deploy

Deploy and test large language models on Huawei Cloud Ascend DevServer (910B series). Supports single-machine and dual-machine deployment, model inference testing, and deployment monitoring.

Overview

This skill covers four model categories: LLM, VL, Embedding, and Rerank. LLM and VL models can be deployed on a single machine or across two machines; Embedding and Rerank models run on a single card only (see Model Catalog).

Related Skills (Agent orchestrated, no direct call, Rule 3):

  • huawei-cloud-ascend-remote-connect - SSH connection to DevServer (prerequisite for deployment)
  • huawei-cloud-ascend-command - NPU status check and monitoring (prerequisite and post-deploy monitoring)

Capabilities:

  • Model deployment (single-node, dual-node)
  • Inference testing (LLM chat, VL multimodal, Embedding, Rerank)
  • Deployment log and status monitoring
  • Model catalog and script auto-matching

Deployment Workflow (Agent orchestrated):

  1. Agent calls huawei-cloud-ascend-remote-connect to establish SSH connection
  2. Agent calls huawei-cloud-ascend-command to check NPU health and availability
  3. Agent calls this skill (huawei-cloud-ascend-models-deploy) to deploy model
  4. Agent calls huawei-cloud-ascend-command to monitor NPU status during deployment

Architecture

System Architecture Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                         Agent Orchestration                         │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  1. SSH connect (remote-connect)                             │    │
│  │  2. NPU health check (ascend-command)                        │    │
│  │  3. Deploy model (this skill)                                 │    │
│  │  4. Monitor NPU (ascend-command)                             │    │
│  └────────────────────────────┬────────────────────────────────┘    │
│                               │ Explicit param passing (Rule 1)    │
│                               ▼                                     │
├─────────────────────────────────────────────────────────────────────┤
│              Huawei Cloud Ascend Models Deploy                      │
│                      (Stateless, Rule 2)                            │
├─────────────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐    ┌──────────────────────────────────┐      │
│  │  Natural Language│    │          Deploy Helper           │      │
│  │     Commands     │───▶│  - Model Matching & Catalog      │      │
│  └──────────────────┘    │  - Script Auto-Match             │      │
│                          │  - Command Generation            │      │
│                          └──────────────────────────────────┘      │
│                                           │                         │
│          ┌─────────────────────────────────┼──────────────┐        │
│          ▼                                 ▼              ▼        │
│  ┌───────────────┐              ┌─────────────────┐ ┌────────┐    │
│  │ Model         │              │ Inference       │ │ Log    │    │
│  │ Deployment    │              │ Testing         │ │ Status │    │
│  │               │              │                 │ │        │    │
│  │ • Single-node │              │ • LLM Chat      │ │ • View │    │
│  │ • Dual-node   │              │ • VL Multimodal │ │ • Check│    │
│  │ • 910B Series │              │ • Embedding     │ │        │    │
│  └───────────────┘              │ • Rerank        │ └────────┘    │
│                                 └─────────────────┘               │
└─────────────────────────────────────────────────────────────────────┘

Agent Orchestration Flow

User request: "Deploy Qwen2.5-72B on DevServer 116.204.23.145"
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 1: SSH Connection                                 │
│   → Call huawei-cloud-ascend-remote-connect                  │
│   → Pass: host, user, password (explicit, Rule 1)            │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 2: NPU Health Check                               │
│   → Call huawei-cloud-ascend-command                         │
│   → Check: NPU list, health, HBM availability                │
│   → Fail if NPU not healthy or insufficient HBM              │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 3: Deploy Model (this skill)                      │
│   → Match model from catalog                                 │
│   → Generate deploy script                                   │
│   → Execute deployment                                        │
│   → Stateless execution (Rule 2)                             │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 4: Monitor NPU                                    │
│   → Call huawei-cloud-ascend-command                         │
│   → Monitor: HBM usage, temperature, processes               │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
      Deployment Complete

Related Skills Table

| Skill | Purpose | Orchestration Stage |
|-------|---------|---------------------|
| huawei-cloud-ascend-remote-connect | SSH connection | Pre-deploy: Establish connection to DevServer |
| huawei-cloud-ascend-command | NPU management | Pre-deploy: Health check; Post-deploy: Monitoring |

Note: No direct calls between Skills. All orchestration by Agent based on user intent (Rule 3).

Prerequisites

Prerequisite check: Ascend 910B series required

  • Supported: 910B1, 910B2, 910B3, 910B4
  • Unsupported: 910A, 310, 310P, etc.
  • Check with: npu-smi info

Mandatory Rules (AI Must Follow)

  1. Never guess commands from memory — Must read "Deploy Script Auto-Match" section
  2. Must call deploy_helper.py first — Confirm model category and script URL
  3. Different models use different scripts:
    • LLM / Embedding / Rerank → deploy-large-models.sh
    • VL → deploy-qwen3-vl-model.sh
    • OpenSource → deploy-ai-models.sh
  4. Must validate before deployment — Port, NPU, model, card count
  5. Show command and wait for confirmation — Sensitive operation, never execute directly

Natural Language Understanding Rules

Extract key information from user natural language and assemble commands accurately.

Operation Type Detection

| Keywords | Operation |
|----------|-----------|
| deploy / start / launch | Single-machine deployment |
| dual-machine / two-node / dual-node | Dual-machine deployment |
| test / inference / call | Test (execute) |
| write command / generate command | Write test command (generate only, no execute) |
| deployment log / view log | View deployment log |
| deployment status / is ready | View deployment status |
| model list / supported models | Show model catalog |
| parameter help / API parameters | Show parameter manual |
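
The keyword table above can be read as an ordered lookup. The following is a minimal sketch, not the skill's actual implementation: the operation identifiers and the function name `detect_operation` are hypothetical, and the phrase "write test command" is added because the Operation Flow section uses it. Specific phrases are checked before generic ones so that "deployment log" is not mistaken for a deploy request.

```python
from typing import Optional

# Hypothetical operation identifiers; the keywords come from the table above.
# Order matters: specific phrases must be tested before generic ones.
OPERATION_KEYWORDS = [
    ("write_test_command", ("write command", "generate command", "write test command")),
    ("view_log", ("deployment log", "view log")),
    ("view_status", ("deployment status", "is ready")),
    ("model_list", ("model list", "supported models")),
    ("parameter_help", ("parameter help", "api parameters")),
    ("dual_machine_deploy", ("dual-machine", "two-node", "dual-node")),
    ("test", ("test", "inference", "call")),
    ("single_machine_deploy", ("deploy", "start", "launch")),
]


def detect_operation(text: str) -> Optional[str]:
    """Return the first operation whose keyword occurs in the text, else None."""
    lowered = text.lower()
    for operation, keywords in OPERATION_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return operation
    return None
```

Plain substring matching is deliberately simple here; short keywords such as "call" can match inside unrelated words, so a real matcher would likely need word boundaries.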

Information Extraction Rules

Model Name (fuzzy match, case-insensitive, supports card count filter):

  • "qwen3-14b" → Qwen3-14B
  • "qwen3-235b" → Multiple matches, prefer Instruct version (Qwen3-235B-A22B-Instruct-2507), or ask user
  • "vl-32b" → Qwen3-VL-32B-Instruct
  • "bge-m3" → bge-m3
  • "qwen3-vl" + 2 cards → Match VL models with ≤2 cards, list for user to choose
  • "qwen3" + 2 cards → Match all Qwen3 models with ≤2 cards, list for user to choose
  • Multiple candidates → List all candidates (with card count and category), let user confirm
  • No match → Show full model catalog for user to select

Card Count:

  • "2 cards" / "use 2 cards" / "2 npus" → 2
  • "16 cards" / "16 npus" → 16
  • "dual-machine" → 16
  • Not specified → Use minimum card count from model catalog

Port:

  • "port 8022" / "port:8022" → 8022
  • Not specified → Default 8080
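
The card-count and port rules above could be sketched with two regular expressions. This is an illustrative sketch only: the function names and the `catalog_min_cards` parameter are assumptions, and the regexes cover just the phrasings listed above.

```python
import re

DEFAULT_PORT = 8080        # default stated in the Port rules above
DUAL_MACHINE_CARDS = 16    # "dual-machine" maps to 16 cards


def extract_card_count(text: str, catalog_min_cards: int) -> int:
    """Explicit "N cards"/"N npus" wins, then "dual-machine", then the catalog minimum."""
    lowered = text.lower()
    match = re.search(r"(\d+)\s*(?:cards?|npus?)\b", lowered)
    if match:
        return int(match.group(1))
    if "dual-machine" in lowered:
        return DUAL_MACHINE_CARDS
    return catalog_min_cards


def extract_port(text: str) -> int:
    """Accept "port 8022" and "port:8022"; fall back to the default port."""
    match = re.search(r"\bport\s*:?\s*(\d{1,5})\b", text.lower())
    return int(match.group(1)) if match else DEFAULT_PORT
```

For example, `extract_card_count("deploy qwen3-14b", 1)` falls through to the catalog minimum because the digits in the model name are not followed by "cards" or "npus".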

Missing Parameters (check each, prompt what is missing):

  • Missing model name → "Please specify model name" + show model list
  • Missing card count → "Please specify card count, e.g.: 2 cards" + show minimum cards for this model
  • Missing port → "Please specify port (default 8080), e.g.: port 8001"
  • Dual-machine missing head IP → "Please specify head node IP, e.g.: head:192.168.1.1"
  • Dual-machine missing worker IP → "Please specify worker node IP, e.g.: worker:192.168.1.2"

Head/Worker IP (dual-machine deployment):

  • "head:1.1.1.1" / "head node 1.1.1.1" → Head node IP
  • "worker:2.2.2.2" / "worker node 2.2.2.2" → Worker node IP
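
A possible way to combine the head/worker extraction with the missing-parameter prompts is shown below. It is a sketch under assumptions: the function names are hypothetical, and validating the captured token with the standard `ipaddress` module is an added safeguard that the source does not mention.

```python
import ipaddress
import re
from typing import List, Optional


def extract_node_ip(text: str, role: str) -> Optional[str]:
    """Find "<role>:IP" or "<role> node IP" and return the IP if it parses."""
    match = re.search(rf"\b{role}(?:\s+node\s+|\s*:\s*)(\S+)", text.lower())
    if not match:
        return None
    candidate = match.group(1).rstrip(",;")
    try:
        ipaddress.ip_address(candidate)  # rejects values such as 999.1.1.1
    except ValueError:
        return None
    return candidate


def missing_dual_machine_params(text: str) -> List[str]:
    """Return the prompts from the Missing Parameters list that still apply."""
    prompts = []
    if extract_node_ip(text, "head") is None:
        prompts.append("Please specify head node IP, e.g.: head:192.168.1.1")
    if extract_node_ip(text, "worker") is None:
        prompts.append("Please specify worker node IP, e.g.: worker:192.168.1.2")
    return prompts
```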

Prompt:

  • "prompt:hello" / "ask:hello" → Prompt text
  • Not specified → LLM default "hello", VL default "describe the image", Embedding default "I love shanghai", Rerank default "What is the capital of France?"

Image URL (VL test):

  • "image:https://xxx.jpg" / direct URL → Image URL
  • User sends image attachment → Auto-convert to base64 data URL
  • Not specified and testing multimodal model → Prompt user for image URL

Multimodal Capability Auto-Detection:

  • VL category → Supports multimodal
  • OpenSource: Qwen3.6-35B-A3B, Qwen3.6-27B → Supports multimodal
  • LLM category → Text only
  • Embedding → Text only
  • Rerank → Text only
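
The detection rules above reduce to a small predicate. The sketch below assumes the category string is the one reported for the model by the catalog; the function name is hypothetical.

```python
# OpenSource models listed above as multimodal.
MULTIMODAL_OPEN_SOURCE = {"Qwen3.6-35B-A3B", "Qwen3.6-27B"}


def supports_multimodal(category: str, model_name: str) -> bool:
    """VL models and the two listed OpenSource models accept images; everything else is text only."""
    if category == "VL":
        return True
    if category == "OpenSource":
        return model_name in MULTIMODAL_OPEN_SOURCE
    return False
```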

Image URL Conversion (local image → data URL):

# Efficient base64 conversion
IMG_B64=$(base64 -w 0 "${local_image_path}")
IMG_URL="data:image/jpeg;base64,${IMG_B64}"
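
The shell snippet above always labels the image as JPEG. Where the conversion runs on the Agent side, a Python equivalent could derive the MIME type from the file extension instead. This is a sketch using only the standard library; the function name is hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL, guessing the MIME type from its extension."""
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognised image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The resulting string can be used directly as the `url` value of an `image_url` content item.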

Advanced Parameters (optional):

  • "max_tokens:64" → max_tokens=64
  • "temperature:0.7" → temperature=0.7
  • "stream" → stream=true
  • "system:You are assistant" → system_prompt
  • "disable thinking" / "no thinking" → chat_template_kwargs: {"enable_thinking": false}
  • (Default = thinking mode enabled)

Thinking Mode: Qwen3/Qwen3.6 models default to thinking mode and output their reasoning process before the final response.

  • Enable thinking: Higher quality, more token consumption
  • Disable thinking: Direct output, less token consumption, suitable for simple queries
  • Request-level control via "chat_template_kwargs": {"enable_thinking": false/true}
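
To illustrate how the advanced parameters and the thinking switch might end up in a request body, here is a minimal sketch. The function name and keyword arguments are hypothetical; the field names (`max_tokens`, `temperature`, `stream`, `chat_template_kwargs`) are the ones listed above. Optional fields are only added when set, so the server's defaults apply otherwise.

```python
from typing import Any, Dict, Optional


def build_chat_payload(
    model: str,
    prompt: str,
    enable_thinking: bool = True,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    stream: bool = False,
    system_prompt: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a /v1/chat/completions request body from extracted parameters."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})

    payload: Dict[str, Any] = {"model": model, "messages": messages}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        payload["temperature"] = temperature
    if stream:
        payload["stream"] = True
    # Thinking is the default, so the override is only sent when disabling it.
    if not enable_thinking:
        payload["chat_template_kwargs"] = {"enable_thinking": False}
    return payload
```

Serializing the result with `json.dumps` yields the string to pass to `curl -d`.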

Supported Machine Types

Only Ascend 910B series (910B1 / 910B2 / 910B3 / 910B4). Must check NPU model before deployment, reject non-910B series.


Model Catalog

Large Language Models (LLM) — Endpoint: /v1/chat/completions

| Model | Min Cards |
|-------|-----------|
| Qwen3-14B | 1 |
| Qwen3-30B-A3B-Instruct-2507 | 2 |
| Qwen3-32B | 2 |
| Qwen3-235B-A22B-Thinking-2507 | 16 |
| Qwen3-235B-A22B-Instruct-2507 | 16 |
| DeepSeek-R1-Distill-Llama-70B | 4 |

Vision-Language (VL) — Endpoint: /v1/chat/completions

| Model | Min Cards |
|-------|-----------|
| Qwen3-VL-30B-A3B-Instruct | 2 |
| Qwen3-VL-32B-Instruct | 2 |
| Qwen3-VL-235B-A22B-Instruct | 16 |
| Qwen3-VL-235B-A22B-Instruct-W8A8 | 8 |

Embedding — Endpoint: /v1/embeddings (V0 backend only, single card only)

| Model | Min Cards | Multi-card |
|-------|-----------|------------|
| Qwen3-Embedding-8B | 1 | No |
| bge-large-zh-v1.5 | 1 | No |
| bge-m3 | 1 | No |

Rerank — Endpoint: /v1/rerank (single card only)

| Model | Min Cards | Multi-card |
|-------|-----------|------------|
| Qwen3-Reranker-8B | 1 | No |
| bge-reranker-v2-m3 | 1 | No |

OpenSource (Multimodal)

| Model | Min Cards | Capability |
|-------|-----------|------------|
| Qwen3.6-35B-A3B | 2 | Text + Image (MoE) |
| Qwen3.6-27B | 2 | Text + Image (MoE) |
| Qwen3-Next-80B-A3B-Instruct | 4 | Large language model |
| DeepSeek-V4-Flash-w8a8-mtp | 8 | Large language model |

Deploy Script Auto-Match (Must use, never guess script URL)

Script Path: scripts/deploy_helper.py

Match Rules (hardcoded in deploy_helper.py, one script per category):

| Model Category | Deploy Script | Notes |
|----------------|---------------|-------|
| LLM | deploy-large-models.sh | Shared with Embedding/Rerank |
| Embedding | deploy-large-models.sh | Same as above |
| Rerank | deploy-large-models.sh | Same as above |
| VL | deploy-qwen3-vl-model.sh | Multimodal specific |
| OpenSource | deploy-ai-models.sh | OpenSource specific |
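
For readers who want to see the rule as data, the table above is equivalent to a fixed lookup. This sketch is not the contents of deploy_helper.py, which the source does not show; the names `SCRIPT_BY_CATEGORY` and `script_for_category` are hypothetical. The helper remains the authority for the full script URL.

```python
# Category-to-script mapping copied from the Match Rules table above.
SCRIPT_BY_CATEGORY = {
    "LLM": "deploy-large-models.sh",
    "Embedding": "deploy-large-models.sh",
    "Rerank": "deploy-large-models.sh",
    "VL": "deploy-qwen3-vl-model.sh",
    "OpenSource": "deploy-ai-models.sh",
}


def script_for_category(category: str) -> str:
    """Return the deploy script name for a category, refusing unknown categories."""
    try:
        return SCRIPT_BY_CATEGORY[category]
    except KeyError:
        raise ValueError(f"unknown model category: {category!r}") from None
```

Raising on an unknown category, rather than falling back to a default script, matches the rule that script names are never guessed.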

Usage:

# Match model (returns category, script URL, min cards, etc.)
python3 scripts/deploy_helper.py match <model_name>

# Generate deploy command directly
python3 scripts/deploy_helper.py command <model_name> <cards> <port>

# List all models (optional category filter)
python3 scripts/deploy_helper.py list [LLM|VL|Embedding|Rerank|OpenSource]

AI must call deploy_helper.py match first to confirm category and script, then use returned deploy_url to assemble command. Never guess from memory!


Core Commands

Core commands for model deployment and testing. See Operation Flow for detailed steps.

| Command | Description |
|---------|-------------|
| deploy <model> <port> | Deploy model on single machine |
| deploy <model> <port> <cards> | Deploy with specified card count |
| dual-machine deploy <model> head:<IP> worker:<IP> port:<PORT> | Deploy on dual-machine cluster |
| test <model> <port> | Test model inference |
| deployment log | View deployment log |
| deployment status | Check deployment status |
| model list | Show supported models |

Operation Flow

I. Deployment

1. Pre-deployment Check (Must execute every time, cannot skip)

Check in order, stop if any fails:

  1. NPU Model Check — Agent calls huawei-cloud-ascend-command to check chip model, reject non-910B series
  2. NPU Card Count Check — Agent calls huawei-cloud-ascend-command to check available cards, confirm >= required cards
  3. User Card Count Check — User-specified cards must be >= minimum and within supported range (1,2,4,8,16)
  4. Embedding/Rerank Single Card Check — Embedding and Rerank only support single card, reject multi-card
  5. Port Occupancy Check — Agent calls huawei-cloud-ascend-remote-connect to run ss -tlnp | grep :port, notify if occupied
  6. SSH Connectivity Check — For dual-machine, verify both head and worker nodes are SSH accessible

2. Single-machine Deployment

User says: "deploy model_name port XXXX" or "deploy model_name port XXXX N cards"

Before deploying, run mkdir -p /home/modelarts-agent over SSH to ensure the directory exists.

LLM / Embedding / Rerank Command Template:

nohup bash -c 'export model_name=${model} && export required_cards=${cards} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/single-machine/deploy-large-models.sh && chmod 755 /home/modelarts-agent/deploy-large-models.sh && sh /home/modelarts-agent/deploy-large-models.sh ${model} ${cards} ${port}' > /home/modelarts-agent/deploy_${model}.log 2>&1 &

VL Multimodal Command Template:

nohup bash -c 'export model_name=${model} && export required_cards=${cards} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-vl-model/single-machine/deploy-qwen3-vl-model.sh && chmod 755 /home/modelarts-agent/deploy-qwen3-vl-model.sh && sh /home/modelarts-agent/deploy-qwen3-vl-model.sh ${model} ${cards} ${port}' > /home/modelarts-agent/deploy_${model}.log 2>&1 &

OpenSource Command Template:

nohup bash -c 'export model_name=${model} && export required_cards=${cards} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/single-machine/open_source/deploy-ai-models.sh && chmod 755 /home/modelarts-agent/deploy-ai-models.sh && sh /home/modelarts-agent/deploy-ai-models.sh ${model} ${cards} ${port}' > /home/modelarts-agent/deploy_${model}.log 2>&1 &
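
Because the templates above are filled in by text substitution and then run in a shell, it may be worth validating the values before substituting them. The sketch below is one possible approach, not part of the skill: the allow-list pattern and function name are assumptions, and the template used for illustration is only the final script invocation from the commands above. Python's `string.Template` happens to use the same `${name}` placeholder syntax.

```python
import re
from string import Template

# Assumed allow-list: letters, digits, dot, underscore, hyphen (covers the catalog names).
SAFE_MODEL_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def render_template(template: str, model: str, cards: int, port: int) -> str:
    """Substitute ${model}, ${cards} and ${port} after rejecting unsafe values."""
    if not SAFE_MODEL_NAME.fullmatch(model):
        raise ValueError(f"refusing to substitute unsafe model name: {model!r}")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return Template(template).substitute(model=model, cards=str(cards), port=str(port))


invocation = "sh /home/modelarts-agent/deploy-large-models.sh ${model} ${cards} ${port}"
print(render_template(invocation, "Qwen3-14B", 1, 8080))
```

A model name such as `x; rm -rf /` fails the allow-list and raises before any command text is produced.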

3. Dual-machine Deployment

User says: "dual-machine deploy model_name head:IP worker:IP port XXXX"

Before dual-machine deployment, run mkdir -p /home/modelarts-agent on both the head and worker nodes.

Head Node Command Template:

nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/dual-machine/qwen3-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-235b-a22b.sh && sh /home/modelarts-agent/qwen3-235b-a22b.sh head ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_head.log 2>&1 &

Worker Node Command Template:

nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/dual-machine/qwen3-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-235b-a22b.sh && sh /home/modelarts-agent/qwen3-235b-a22b.sh worker ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_worker.log 2>&1 &

VL Dual-machine Deployment:

For VL models (Qwen3-VL-235B-A22B-Instruct, etc.), use the following scripts:

VL Head Node Command:

nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-vl-model/dual-machine/qwen3-vl-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-vl-235b-a22b.sh && sh /home/modelarts-agent/qwen3-vl-235b-a22b.sh head ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_head.log 2>&1 &

VL Worker Node Command:

nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-vl-model/dual-machine/qwen3-vl-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-vl-235b-a22b.sh && sh /home/modelarts-agent/qwen3-vl-235b-a22b.sh worker ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_worker.log 2>&1 &

4. Deployment Confirmation Flow

Sensitive operation, must show full command and wait for user "confirm" before executing.

After deploy command sent:

  1. Notify user: Ready, starting deployment of ${model}, log at /home/modelarts-agent/deploy_${model}.log
  2. Check log every 2 minutes, report progress (loading weights, Dynamo compiling, service starting, etc.)
  3. When port is listening, notify deployment success
  4. Deployment failure handling (strict compliance):
    • Deployment failed = Report failure reason, no automatic retry
    • Never auto-change image and retry
    • Never auto-modify parameters and retry
    • Never try other deployment methods
    • Only report error, let user decide next step
  5. Output API sample for user:
Deployment successful! ${model} is ready

Service URL: http://${IP}:${PORT}/v1/chat/completions

Example request:
curl -X POST http://${IP}:${PORT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","messages":[{"role":"user","content":"hello"}],"max_tokens":256}'

Multimodal request (if supported):
curl -X POST http://${IP}:${PORT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"${image_url}"}},{"type":"text","text":"describe the image"}]}],"max_tokens":512}'

II. Deployment Log

User says: "deployment log model_name"

Agent uses huawei-cloud-ascend-remote-connect to execute:

tail -50 /home/modelarts-agent/deploy_${model}.log

III. Deployment Status

User says: "deployment status port XXXX"

Agent uses huawei-cloud-ascend-remote-connect to execute:

ss -tlnp | grep :${port}

Port listening = Service ready for testing.
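
A plain `grep` on the port number also matches longer ports that merely start with it (searching for :80 matches :8080). If the Agent post-processes the `ss -tlnp` output itself, an exact match on the address column avoids that. The following is a sketch with a hypothetical function name; the sample output in the usage line is illustrative, not captured from a real DevServer.

```python
def port_is_listening(ss_output: str, port: int) -> bool:
    """Return True if any LISTEN row in `ss -tlnp` output has an address ending in :<port>."""
    suffix = f":{port}"
    for line in ss_output.splitlines():
        fields = line.split()
        # The header row has no LISTEN field; the peer column is "addr:*" and never matches.
        if "LISTEN" in fields and any(field.endswith(suffix) for field in fields):
            return True
    return False


sample = (
    "State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process\n"
    'LISTEN 0      128          0.0.0.0:8080      0.0.0.0:*     users:(("python3",pid=4242,fd=7))\n'
)
print(port_is_listening(sample, 8080), port_is_listening(sample, 80))
```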


IV. Test (Execute)

User says: "test model_name prompt:xxx" or "test model_name image:URL"

Test flow (strict compliance):

  1. Show full curl command for user to review
  2. Wait for user "confirm" or "send" before executing
  3. Structured result output:
Test Result

| Field | Value |
|-------|-------|
| id | chatcmpl-xxx |
| model | Qwen3-VL-32B-Instruct |
| prompt_tokens | 93 |
| completion_tokens | 400 |
| total_tokens | 493 |
| finish_reason | stop |

Model Response:
[Extract full content, no truncation]

Raw Response:
[Full JSON, no truncation]
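
The fields in the result table above can be pulled out of the raw response with a few dictionary lookups. This sketch assumes the service returns an OpenAI-style chat completion body (`choices[0].message.content`, `choices[0].finish_reason`, and a `usage` object); the source implies that layout through its field names but does not show a full response. The function name is hypothetical.

```python
import json
from typing import Any, Dict


def summarize_chat_response(raw_json: str) -> Dict[str, Any]:
    """Extract the table fields and the full model response from a chat completion body."""
    data = json.loads(raw_json)
    choice = data["choices"][0]
    usage = data.get("usage", {})
    return {
        "id": data.get("id"),
        "model": data.get("model"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
        "finish_reason": choice.get("finish_reason"),
        "content": choice["message"]["content"],  # full content, no truncation
    }
```

Missing usage fields come back as `None` rather than raising, so a partial response can still be reported to the user alongside the raw JSON.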

LLM Chat Completions

curl -s -X POST http://${IP}:${PORT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","messages":[{"role":"user","content":"${prompt}"}],"max_tokens":1024,"temperature":0.7}'

Multimodal VL

curl -s -X POST http://${IP}:${PORT}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":[{"type":"image_url","image_url":{"url":"${image_url}"}},{"type":"text","text":"${prompt}"}]}],"max_tokens":512,"temperature":0.7}'

Embedding

curl -s -X POST http://${IP}:${PORT}/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","input":"${text}"}'

Rerank

curl -s -X POST http://${IP}:${PORT}/v1/rerank \
  -H 'Content-Type: application/json' \
  -d '{"model":"${model}","query":"${query}","documents":["${doc1}","${doc2}"]}'

V. Write Test Command (Generate Only)

User says: "write test command model_name prompt:xxx"

Same logic as "test", but only output command text, no execution.


API Parameter Manual

LLM Parameters (/v1/chat/completions)

| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| model | Yes | | Model name, same as deployment |
| messages | Yes | | Message list, each with role and content |
| max_tokens | No | 16 | Max generation tokens |
| temperature | No | 1.0 | Sampling randomness, 0=greedy |
| top_p | No | 1.0 | Nucleus sampling threshold |
| top_k | No | -1 | Only consider top-K tokens |
| stream | No | false | Streaming output (SSE) |
| chat_template_kwargs | No | {} | Template params, e.g. {"enable_thinking": false} |

VL Extra Parameters

| Parameter | Description |
|-----------|-------------|
| content[] | Array format: image_url object + text object |
| detail | Image precision: auto/high/low |

Embedding Parameters (/v1/embeddings)

| Parameter | Required | Description |
|-----------|----------|-------------|
| model | Yes | Model name |
| input | Yes | String or string list |
| encoding_format | No | float/base64 |

Rerank Parameters (/v1/rerank)

| Parameter | Required | Description |
|-----------|----------|-------------|
| model | Yes | Model name |
| query | Yes | Query text |
| documents | Yes | Document list to rerank |
| top_n | No | Return top N |

Execution Mode

This skill operates in stateless mode (Rule 2). All context (host, credentials, model info) must be explicitly passed by Agent (Rule 1).

Prerequisites (Agent orchestrated)

Before calling this skill, Agent MUST:

  1. Establish SSH connection using huawei-cloud-ascend-remote-connect

    • Agent receives: host, port, user, password from user
    • Agent validates connection is successful
  2. Check NPU status using huawei-cloud-ascend-command

    • Agent checks: NPU health, HBM availability
    • Agent validates: sufficient cards for model deployment

Skill Execution

This skill receives explicit parameters from Agent:

# Model matching (local operation)
python3 scripts/deploy_helper.py match <model_name>

# Script URL generation (local operation)
python3 scripts/deploy_helper.py script <model_name>

# Deploy command generation (local operation)
python3 scripts/deploy_helper.py command <model> <cards> <port>

Remote Deployment Execution

Agent executes deployment commands on remote server:

# Agent uses SSH to execute deployment on DevServer
ssh root@<host> "cd /path/to/model && bash deploy.sh"

Post-Deployment (Agent orchestrated)

After deployment, Agent calls huawei-cloud-ascend-command to:

  • Monitor NPU HBM usage
  • Check deployment process status
  • Verify model endpoint is responding

Parameter Flow

User Input                    Agent                      Skills
    │                          │                            │
    │ host, password           │                            │
    ├─────────────────────────▶│                            │
    │                          │ SSH connect                │
    │                          ├───────────────────────────▶│
    │                          │                            │ (remote-connect)
    │                          │◀───────────────────────────┤
    │                          │                            │
    │                          │ NPU check                  │
    │                          ├───────────────────────────▶│
    │                          │                            │ (ascend-command)
    │                          │◀───────────────────────────┤
    │                          │                            │
    │ model_name, cards        │                            │
    ├─────────────────────────▶│                            │
    │                          │ match model                │
    │                          ├───────────────────────────▶│
    │                          │                            │ deploy_helper.py
    │                          │◀───────────────────────────┤
    │                          │                            │
    │                          │ execute deploy             │
    │                          ├───────────────────────────▶│
    │                          │                            │ (via SSH)
    │                          │◀───────────────────────────┤
    │                          │                            │
    │                          │ monitor NPU                │
    │                          ├───────────────────────────▶│
    │                          │                            │ (ascend-command)
    │                          │◀───────────────────────────┤
    │                          │                            │
    ▼                          ▼                            ▼

Note: No direct skill-to-skill calls. All orchestration by Agent (Rule 3).


References

| Document | Description |
|----------|-------------|
| task-deploy-model.md | Deployment task steps |
| task-test-model.md | Testing task steps |
| model-catalog.md | Complete model catalog |
| api-parameters.md | API parameter reference |
| prerequisites.md | Prerequisites checklist |
| verification-method.md | Verification steps |
| troubleshooting.md | Troubleshooting guide |
| scripts/deploy_helper.py | Model matching helper |

Security Recommendations
Install only if you trust the publisher and the referenced Huawei Cloud deployment scripts. Before running generated commands, inspect or pin the downloaded scripts, prefer a least-privileged deployment account, confirm the target host and port, and make sure there is a clear stop/rollback process for services started under /home/modelarts-agent.
Capability Assessment
Purpose & Capability
The declared purpose is model deployment, testing, logs, and status on Ascend DevServer, and the helper script/model catalog align with that purpose.
Instruction Scope
Runtime instructions include broad deployment/test triggers and generate shell commands for remote infrastructure; the skill requires confirmation before execution, but command scope is high-impact.
Install Mechanism
No install-time package execution is shown, but runtime deployment templates fetch shell scripts from a Huawei Cloud OBS URL and execute them with sh.
Credentials
The requested environment authority is substantial: SSH-orchestrated remote execution, /home/modelarts-agent writes, chmod, background nohup services, port checks, and examples using root SSH.
Persistence & Privilege
Deployments are intentionally long-running background services and leave downloaded scripts/logs under /home/modelarts-agent; this is purpose-aligned but should be explicitly controlled and reversible.
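How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install huawei-cloud-ascend-models-deploy
  3. After installation, invoke the Skill by name or trigger it with /huawei-cloud-ascend-models-deploy
  4. Provide the inputs described in the Skill's parameter documentation to get structured output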
Version History
v0.0.1
Initial release of the Huawei Cloud Ascend Models Deploy skill for LLM and multimodal deployment and testing.

  • Deploy and test large language models (LLM, VL, Embedding, Rerank) on Ascend DevServer (910B series) in single- or dual-machine mode.
  • Provides model inference testing, deployment log viewing, and deployment status monitoring.
  • Automates model matching and generates appropriate deployment scripts.
  • Lists supported models and checks deployment prerequisites.
  • Designed for orchestrated use with related SSH connection and NPU monitoring skills.
