Huawei Cloud Ascend Models Deploy
Deploy and test large language models on Huawei Cloud Ascend DevServer (910B series). Supports single-machine and dual-machine deployment, model inference testing, and deployment monitoring.
Overview
This skill deploys and tests large language models on Huawei Cloud Ascend DevServer (910B series). Supports single-machine and dual-machine deployment for LLM, VL, Embedding, and Rerank models.
Related Skills (Agent orchestrated, no direct call, Rule 3):
- huawei-cloud-ascend-remote-connect: SSH connection to DevServer (prerequisite for deployment)
- huawei-cloud-ascend-command: NPU status check and monitoring (prerequisite and post-deploy monitoring)
Capabilities:
- Model deployment (single-node, dual-node)
- Inference testing (LLM chat, VL multimodal, Embedding, Rerank)
- Deployment log and status monitoring
- Model catalog and script auto-matching
Deployment Workflow (Agent orchestrated):
- Agent calls huawei-cloud-ascend-remote-connect to establish SSH connection
- Agent calls huawei-cloud-ascend-command to check NPU health and availability
- Agent calls this skill (huawei-cloud-ascend-models-deploy) to deploy model
- Agent calls huawei-cloud-ascend-command to monitor NPU status during deployment
Architecture
System Architecture Diagram
┌─────────────────────────────────────────────────────────────────────┐
│ Agent Orchestration │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ 1. SSH connect (remote-connect) │ │
│ │ 2. NPU health check (ascend-command) │ │
│ │ 3. Deploy model (this skill) │ │
│ │ 4. Monitor NPU (ascend-command) │ │
│ └────────────────────────────┬────────────────────────────────┘ │
│ │ Explicit param passing (Rule 1) │
│ ▼ │
├─────────────────────────────────────────────────────────────────────┤
│ Huawei Cloud Ascend Models Deploy │
│ (Stateless, Rule 2) │
├─────────────────────────────────────────────────────────────────────┤
│ ┌──────────────────┐ ┌──────────────────────────────────┐ │
│ │ Natural Language│ │ Deploy Helper │ │
│ │ Commands │───▶│ - Model Matching & Catalog │ │
│ └──────────────────┘ │ - Script Auto-Match │ │
│ │ - Command Generation │ │
│ └──────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ ┌───────────────┐ ┌─────────────────┐ ┌────────┐ │
│ │ Model │ │ Inference │ │ Log │ │
│ │ Deployment │ │ Testing │ │ Status │ │
│ │ │ │ │ │ │ │
│ │ • Single-node │ │ • LLM Chat │ │ • View │ │
│ │ • Dual-node │ │ • VL Multimodal │ │ • Check│ │
│ │ • 910B Series │ │ • Embedding │ │ │ │
│ └───────────────┘ │ • Rerank │ └────────┘ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Agent Orchestration Flow
User request: "Deploy Qwen2.5-72B on DevServer 116.204.23.145"
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 1: SSH Connection │
│ → Call huawei-cloud-ascend-remote-connect │
│ → Pass: host, user, password (explicit, Rule 1) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 2: NPU Health Check │
│ → Call huawei-cloud-ascend-command │
│ → Check: NPU list, health, HBM availability │
│ → Fail if NPU not healthy or insufficient HBM │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 3: Deploy Model (this skill) │
│ → Match model from catalog │
│ → Generate deploy script │
│ → Execute deployment │
│ → Stateless execution (Rule 2) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Agent Step 4: Monitor NPU │
│ → Call huawei-cloud-ascend-command │
│ → Monitor: HBM usage, temperature, processes │
└─────────────────────────────────────────────────────────────┘
│
▼
Deployment Complete
Related Skills Table
| Skill | Purpose | Orchestration Stage |
|---|---|---|
| huawei-cloud-ascend-remote-connect | SSH connection | Pre-deploy: Establish connection to DevServer |
| huawei-cloud-ascend-command | NPU management | Pre-deploy: Health check; Post-deploy: Monitoring |
Note: No direct calls between Skills. All orchestration by Agent based on user intent (Rule 3).
Prerequisites
Prerequisite check: Ascend 910B series required
- Supported: 910B1, 910B2, 910B3, 910B4
- Unsupported: 910A, 310, 310P, etc.
- Check with: npu-smi info
Mandatory Rules (AI Must Follow)
- Never guess commands from memory — Must read "Deploy Script Auto-Match" section
- Must call deploy_helper.py first — Confirm model category and script URL
- Different models use different scripts:
  - LLM / Embedding / Rerank → deploy-large-models.sh
  - VL → deploy-qwen3-vl-model.sh
  - OpenSource → deploy-ai-models.sh
- Must validate before deployment — Port, NPU, model, card count
- Show command and wait for confirmation — Sensitive operation, never execute directly
Natural Language Understanding Rules
Extract key information from user natural language and assemble commands accurately.
Operation Type Detection
| Keywords | Operation |
|---|---|
| deploy / start / launch | Single-machine deployment |
| dual-machine / two-node / dual-node | Dual-machine deployment |
| test / inference / call | Test (execute) |
| write command / generate command | Write test command (generate only, no execute) |
| deployment log / view log | View deployment log |
| deployment status / is ready | View deployment status |
| model list / supported models | Show model catalog |
| parameter help / API parameters | Show parameter manual |
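The keyword table above can be sketched as an ordered lookup. This is an illustration only, not the skill's actual parser: the name detect_operation, the operation labels, and the priority order are assumptions. More specific phrases are checked first so that "deployment log" is not classified as a deployment, and plain substring matching is deliberately crude.

```python
# Hypothetical sketch: ordered keyword rules, most specific phrases first.
OPERATION_RULES = [
    ("write_test_command", ("write command", "generate command", "write test command")),
    ("view_log", ("deployment log", "view log")),
    ("view_status", ("deployment status", "is ready")),
    ("model_list", ("model list", "supported models")),
    ("parameter_help", ("parameter help", "api parameters")),
    ("dual_deploy", ("dual-machine", "two-node", "dual-node")),
    ("test", ("test", "inference", "call")),
    ("single_deploy", ("deploy", "start", "launch")),
]


def detect_operation(text):
    """Return the first operation whose keyword appears in the text, else None."""
    lowered = text.lower()
    for operation, keywords in OPERATION_RULES:
        if any(keyword in lowered for keyword in keywords):
            return operation
    return None
```

A request that matches nothing returns None, which a caller could treat as a cue to ask the user what they want to do.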
Information Extraction Rules
Model Name (fuzzy match, case-insensitive, supports card count filter):
- "qwen3-14b" → Qwen3-14B
- "qwen3-235b" → Multiple matches, prefer Instruct version (Qwen3-235B-A22B-Instruct-2507), or ask user
- "vl-32b" → Qwen3-VL-32B-Instruct
- "bge-m3" → bge-m3
- "qwen3-vl" + 2 cards → Match VL models with ≤2 cards, list for user to choose
- "qwen3" + 2 cards → Match all Qwen3 models with ≤2 cards, list for user to choose
- Multiple candidates → List all candidates (with card count and category), let user confirm
- No match → Show full model catalog for user to select
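The matching rules above could look roughly like the following. It is a sketch under stated assumptions, not the logic inside deploy_helper.py (which is not shown here): match_models and pick_model are hypothetical names, matching is a plain case-insensitive substring test, and the catalog is copied from the Model Catalog tables in this document.

```python
# Catalog copied from the Model Catalog tables: name -> (category, min cards).
CATALOG = {
    "Qwen3-14B": ("LLM", 1),
    "Qwen3-30B-A3B-Instruct-2507": ("LLM", 2),
    "Qwen3-32B": ("LLM", 2),
    "Qwen3-235B-A22B-Thinking-2507": ("LLM", 16),
    "Qwen3-235B-A22B-Instruct-2507": ("LLM", 16),
    "DeepSeek-R1-Distill-Llama-70B": ("LLM", 4),
    "Qwen3-VL-30B-A3B-Instruct": ("VL", 2),
    "Qwen3-VL-32B-Instruct": ("VL", 2),
    "Qwen3-VL-235B-A22B-Instruct": ("VL", 16),
    "Qwen3-VL-235B-A22B-Instruct-W8A8": ("VL", 8),
    "Qwen3-Embedding-8B": ("Embedding", 1),
    "bge-large-zh-v1.5": ("Embedding", 1),
    "bge-m3": ("Embedding", 1),
    "Qwen3-Reranker-8B": ("Rerank", 1),
    "bge-reranker-v2-m3": ("Rerank", 1),
    "Qwen3.6-35B-A3B": ("OpenSource", 2),
    "Qwen3.6-27B": ("OpenSource", 2),
    "Qwen3-Next-80B-A3B-Instruct": ("OpenSource", 4),
    "DeepSeek-V4-Flash-w8a8-mtp": ("OpenSource", 8),
}


def match_models(query, max_cards=None):
    """Case-insensitive substring match, optionally filtered by card count."""
    needle = query.lower()
    exact = [name for name in CATALOG if name.lower() == needle]
    if exact:
        return exact
    hits = [name for name in CATALOG if needle in name.lower()]
    if max_cards is not None:
        hits = [name for name in hits if CATALOG[name][1] <= max_cards]
    return hits


def pick_model(query, max_cards=None):
    """Return one model name, or the candidate list when the user must choose."""
    hits = match_models(query, max_cards)
    if len(hits) == 1:
        return hits[0]
    instruct = [name for name in hits if "Instruct" in name]
    if len(instruct) == 1:
        return instruct[0]  # prefer the single Instruct variant
    return hits  # empty list: show the full catalog; several: ask the user
```

For example, pick_model("qwen3-235b") resolves to the Instruct variant, while pick_model("qwen3-vl", max_cards=2) returns two candidates for the user to choose from.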
Card Count:
- "2 cards" / "use 2 cards" / "2 npus" → 2
- "16 cards" / "16 npus" → 16
- "dual-machine" → 16
- Not specified → Use minimum card count from model catalog
Port:
- "port 8022" / "port:8022" → 8022
- Not specified → Default 8080
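The card count and port rules can be sketched with two small regular expressions. The function names and exact patterns are assumptions made for illustration; returning None for the card count signals that the caller should fall back to the catalog minimum.

```python
import re


def extract_cards(text):
    """Return the requested card count, 16 for dual-machine, or None if absent."""
    lowered = text.lower()
    found = re.search(r"(\d+)\s*(?:cards?|npus?)\b", lowered)
    if found:
        return int(found.group(1))
    if "dual-machine" in lowered:
        return 16
    return None  # caller uses the minimum card count from the model catalog


def extract_port(text, default=8080):
    """Return the port from 'port 8022' or 'port:8022', else the default."""
    found = re.search(r"\bport\s*:?\s*(\d{1,5})\b", text.lower())
    return int(found.group(1)) if found else default
```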
Missing Parameters (check each, prompt what is missing):
- Missing model name → "Please specify model name" + show model list
- Missing card count → "Please specify card count, e.g.: 2 cards" + show minimum cards for this model
- Missing port → "Please specify port (default 8080), e.g.: port 8001"
- Dual-machine missing head IP → "Please specify head node IP, e.g.: head:192.168.1.1"
- Dual-machine missing worker IP → "Please specify worker node IP, e.g.: worker:192.168.1.2"
Head/Worker IP (dual-machine deployment):
- "head:1.1.1.1" / "head node 1.1.1.1" → Head node IP
- "worker:2.2.2.2" / "worker node 2.2.2.2" → Worker node IP
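One way to pull the head and worker addresses out of a request is shown below. It is a hedged sketch: extract_node_ip is a hypothetical helper, and validating with the standard ipaddress module is an added safeguard rather than something the skill is documented to do.

```python
import ipaddress
import re


def extract_node_ip(text, role):
    """Find 'head:IP' or 'head node IP' (likewise for worker) and validate the IP."""
    pattern = rf"\b{re.escape(role)}(?:\s+node\s+|\s*:\s*)(\S+)"
    found = re.search(pattern, text, flags=re.IGNORECASE)
    if not found:
        return None
    candidate = found.group(1).rstrip(",.;")
    try:
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        return None  # present but not a valid address: prompt the user again
```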
Prompt:
- "prompt:hello" / "ask:hello" → Prompt text
- Not specified → LLM default "hello", VL default "describe the image", Embedding default "I love shanghai", Rerank default "What is the capital of France?"
Image URL (VL test):
- "image:https://xxx.jpg" / direct URL → Image URL
- User sends image attachment → Auto-convert to base64 data URL
- Not specified and testing multimodal model → Prompt user for image URL
Multimodal Capability Auto-Detection:
- VL category → Supports multimodal
- OpenSource: Qwen3.6-35B-A3B, Qwen3.6-27B → Supports multimodal
- LLM category → Text only
- Embedding → Text only
- Rerank → Text only
Image URL Conversion (local image → data URL):
# Efficient base64 conversion
IMG_B64=$(base64 -w 0 "${local_image_path}")
IMG_URL="data:image/jpeg;base64,${IMG_B64}"
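The same conversion can be done in Python when the shell is not convenient. This sketch differs from the shell snippet in one respect, which is an assumption on my part: it guesses the MIME type from the file extension instead of always using image/jpeg. The function name image_to_data_url is hypothetical.

```python
import base64
import mimetypes


def image_to_data_url(path):
    """Read a local image and return it as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image file: {path}")
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```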
Advanced Parameters (optional):
- "max_tokens:64" → max_tokens=64
- "temperature:0.7" → temperature=0.7
- "stream" → stream=true
- "system:You are assistant" → system_prompt
- "disable thinking" / "no thinking" → chat_template_kwargs: {"enable_thinking": false}
- (Default = thinking mode enabled)
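A request body carrying these optional parameters might be assembled as follows. The function build_chat_payload and its keyword names are assumptions for illustration; the JSON keys themselves (max_tokens, temperature, stream, chat_template_kwargs) are the ones listed above. Unset options are omitted so the server applies its own defaults.

```python
def build_chat_payload(model, prompt, *, system=None, max_tokens=None,
                       temperature=None, stream=False, enable_thinking=None):
    """Build a chat-completions request body, including only options that were set."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})

    payload = {"model": model, "messages": messages}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        payload["temperature"] = temperature
    if stream:
        payload["stream"] = True
    if enable_thinking is not None:
        # Request-level switch for thinking mode.
        payload["chat_template_kwargs"] = {"enable_thinking": enable_thinking}
    return payload
```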
Thinking Mode: Qwen3/Qwen3.6 models default to thinking mode, outputting reasoning process before final response.
- Enable thinking: Higher quality, more token consumption
- Disable thinking: Direct output, less token consumption, suitable for simple queries
- Request-level control via "chat_template_kwargs": {"enable_thinking": false/true}
Supported Machine Types
Only Ascend 910B series (910B1 / 910B2 / 910B3 / 910B4). Must check NPU model before deployment, reject non-910B series.
Model Catalog
Large Language Models (LLM) — Endpoint: /v1/chat/completions
| Model | Min Cards |
|---|---|
| Qwen3-14B | 1 |
| Qwen3-30B-A3B-Instruct-2507 | 2 |
| Qwen3-32B | 2 |
| Qwen3-235B-A22B-Thinking-2507 | 16 |
| Qwen3-235B-A22B-Instruct-2507 | 16 |
| DeepSeek-R1-Distill-Llama-70B | 4 |
Vision-Language (VL) — Endpoint: /v1/chat/completions
| Model | Min Cards |
|---|---|
| Qwen3-VL-30B-A3B-Instruct | 2 |
| Qwen3-VL-32B-Instruct | 2 |
| Qwen3-VL-235B-A22B-Instruct | 16 |
| Qwen3-VL-235B-A22B-Instruct-W8A8 | 8 |
Embedding — Endpoint: /v1/embeddings (V0 backend only, single card only)
| Model | Min Cards | Multi-card |
|---|---|---|
| Qwen3-Embedding-8B | 1 | No |
| bge-large-zh-v1.5 | 1 | No |
| bge-m3 | 1 | No |
Rerank — Endpoint: /v1/rerank (single card only)
| Model | Min Cards | Multi-card |
|---|---|---|
| Qwen3-Reranker-8B | 1 | No |
| bge-reranker-v2-m3 | 1 | No |
OpenSource (Multimodal)
| Model | Min Cards | Capability |
|---|---|---|
| Qwen3.6-35B-A3B | 2 | Text + Image (MoE) |
| Qwen3.6-27B | 2 | Text + Image (MoE) |
| Qwen3-Next-80B-A3B-Instruct | 4 | Large language model |
| DeepSeek-V4-Flash-w8a8-mtp | 8 | Large language model |
Deploy Script Auto-Match (Must use, never guess script URL)
Script Path: scripts/deploy_helper.py
Match Rules (hardcoded, 100% accurate):
| Model Category | Deploy Script | Notes |
|---|---|---|
| LLM | deploy-large-models.sh | Shared with Embedding/Rerank |
| Embedding | deploy-large-models.sh | Same as above |
| Rerank | deploy-large-models.sh | Same as above |
| VL | deploy-qwen3-vl-model.sh | Multimodal specific |
| OpenSource | deploy-ai-models.sh | OpenSource specific |
Usage:
# Match model (returns category, script URL, min cards, etc.)
python3 scripts/deploy_helper.py match <model_name>
# Generate deploy command directly
python3 scripts/deploy_helper.py command <model_name> <cards> <port>
# List all models (optional category filter)
python3 scripts/deploy_helper.py list [LLM|VL|Embedding|Rerank|OpenSource]
AI must call deploy_helper.py match first to confirm category and script, then use returned deploy_url to assemble command. Never guess from memory!
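For orientation, the single-machine match rules amount to a fixed lookup like the one below. This is only a sketch of the table, not the contents of deploy_helper.py, whose internals are not shown in this document; script_for is a hypothetical name, and the helper remains the source of truth for script URLs.

```python
# Single-machine match rules, copied from the table above.
SCRIPT_BY_CATEGORY = {
    "LLM": "deploy-large-models.sh",
    "Embedding": "deploy-large-models.sh",
    "Rerank": "deploy-large-models.sh",
    "VL": "deploy-qwen3-vl-model.sh",
    "OpenSource": "deploy-ai-models.sh",
}


def script_for(category):
    """Return the deploy script name for a model category, or raise ValueError."""
    try:
        return SCRIPT_BY_CATEGORY[category]
    except KeyError:
        raise ValueError(f"unknown model category: {category!r}") from None
```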
Core Commands
Core commands for model deployment and testing. See Operation Flow for detailed steps.
| Command | Description |
|---|---|
| deploy <model> <port> | Deploy model on single machine |
| deploy <model> <port> <cards> | Deploy with specified card count |
| dual-machine deploy <model> head:<IP> worker:<IP> port:<PORT> | Deploy on dual-machine cluster |
| test <model> <port> | Test model inference |
| deployment log | View deployment log |
| deployment status | Check deployment status |
| model list | Show supported models |
Operation Flow
I. Deployment
1. Pre-deployment Check (Must execute every time, cannot skip)
Check in order, stop if any fails:
- NPU Model Check: Agent calls huawei-cloud-ascend-command to check chip model, reject non-910B series
- NPU Card Count Check: Agent calls huawei-cloud-ascend-command to check available cards, confirm >= required cards
- User Card Count Check: User-specified cards must be >= minimum and within supported range (1,2,4,8,16)
- Embedding/Rerank Single Card Check: Embedding and Rerank only support single card, reject multi-card
- Port Occupancy Check: Agent calls huawei-cloud-ascend-remote-connect to run ss -tlnp | grep :port, notify if occupied
- SSH Connectivity Check: For dual-machine, verify both head and worker nodes are SSH accessible
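The local part of this checklist (chip model, card counts, single-card categories) can be sketched as one function that stops at the first failure, in the order listed above. The port and SSH checks are left out because they require remote commands. The function name and its parameters are assumptions, and the chip string is assumed to be already normalized to a value such as "910B3".

```python
SUPPORTED_CHIPS = {"910B1", "910B2", "910B3", "910B4"}
SUPPORTED_CARD_COUNTS = (1, 2, 4, 8, 16)
SINGLE_CARD_CATEGORIES = ("Embedding", "Rerank")


def first_failed_check(chip, category, min_cards, cards, available_cards):
    """Run the checks in order; return the first failure message, or None."""
    if chip.upper() not in SUPPORTED_CHIPS:
        return f"unsupported NPU model: {chip}"
    if available_cards < cards:
        return f"only {available_cards} cards available, {cards} required"
    if cards not in SUPPORTED_CARD_COUNTS:
        return f"card count must be one of {SUPPORTED_CARD_COUNTS}, got {cards}"
    if cards < min_cards:
        return f"model needs at least {min_cards} cards, got {cards}"
    if category in SINGLE_CARD_CATEGORIES and cards != 1:
        return f"{category} models only support a single card"
    return None
```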
2. Single-machine Deployment
User says: "deploy model_name port XXXX" or "deploy model_name port XXXX N cards"
Before deploying, must SSH execute mkdir -p /home/modelarts-agent to ensure directory exists.
LLM / Embedding / Rerank Command Template:
nohup bash -c 'export model_name=${model} && export required_cards=${cards} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/single-machine/deploy-large-models.sh && chmod 755 /home/modelarts-agent/deploy-large-models.sh && sh /home/modelarts-agent/deploy-large-models.sh ${model} ${cards} ${port}' > /home/modelarts-agent/deploy_${model}.log 2>&1 &
VL Multimodal Command Template:
nohup bash -c 'export model_name=${model} && export required_cards=${cards} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-vl-model/single-machine/deploy-qwen3-vl-model.sh && chmod 755 /home/modelarts-agent/deploy-qwen3-vl-model.sh && sh /home/modelarts-agent/deploy-qwen3-vl-model.sh ${model} ${cards} ${port}' > /home/modelarts-agent/deploy_${model}.log 2>&1 &
OpenSource Command Template:
nohup bash -c 'export model_name=${model} && export required_cards=${cards} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/single-machine/open_source/deploy-ai-models.sh && chmod 755 /home/modelarts-agent/deploy-ai-models.sh && sh /home/modelarts-agent/deploy-ai-models.sh ${model} ${cards} ${port}' > /home/modelarts-agent/deploy_${model}.log 2>&1 &
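Because ${model}, ${cards}, and ${port} are substituted into a shell command, it is worth validating them before rendering. The sketch below is an added precaution, not part of the documented flow: render_command and the model-name pattern are assumptions, and EXAMPLE_TEMPLATE is a shortened stand-in for the real templates above, which use the same placeholder syntax.

```python
import re
from string import Template

# Assumption: model names use only letters, digits, dots, underscores and hyphens,
# as every name in the Model Catalog does.
MODEL_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")

# Shortened stand-in for the real command templates.
EXAMPLE_TEMPLATE = Template("sh deploy.sh ${model} ${cards} ${port}")


def render_command(template, model, cards, port):
    """Validate the three values, then substitute them into the template."""
    if not MODEL_NAME.fullmatch(model):
        raise ValueError(f"unsafe model name: {model!r}")
    if cards not in (1, 2, 4, 8, 16):
        raise ValueError(f"unsupported card count: {cards!r}")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port!r}")
    return template.substitute(model=model, cards=cards, port=port)
```

The rendered command would still be shown to the user for confirmation before anything is executed.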
3. Dual-machine Deployment
User says: "dual-machine deploy model_name head:IP worker:IP port XXXX"
Before dual-machine deploy, both head and worker nodes need mkdir -p /home/modelarts-agent.
Head Node Command Template:
nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/dual-machine/qwen3-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-235b-a22b.sh && sh /home/modelarts-agent/qwen3-235b-a22b.sh head ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_head.log 2>&1 &
Worker Node Command Template:
nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-large-models/dual-machine/qwen3-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-235b-a22b.sh && sh /home/modelarts-agent/qwen3-235b-a22b.sh worker ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_worker.log 2>&1 &
VL Dual-machine Deployment:
For VL models (Qwen3-VL-235B-A22B-Instruct, etc.), use the following scripts:
VL Head Node Command:
nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-vl-model/dual-machine/qwen3-vl-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-vl-235b-a22b.sh && sh /home/modelarts-agent/qwen3-vl-235b-a22b.sh head ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_head.log 2>&1 &
VL Worker Node Command:
nohup bash -c 'export ray_head_ip=${head_ip} && export model_name=${model} && export port=${port} && wget -P /home/modelarts-agent/ https://documentation-samples-17.obs.cn-north-9.myhuaweicloud.com/solution-as-code-publicbucket/solution-as-code-module/quickly-deploy-llm-on-modelarts-lite-devserver/userdata/deploy-vl-model/dual-machine/qwen3-vl-235b-a22b.sh && chmod 755 /home/modelarts-agent/qwen3-vl-235b-a22b.sh && sh /home/modelarts-agent/qwen3-vl-235b-a22b.sh worker ${head_ip} ${model} ${port}' > /home/modelarts-agent/deploy_${model}_worker.log 2>&1 &
4. Deployment Confirmation Flow
Sensitive operation, must show full command and wait for user "confirm" before executing.
After deploy command sent:
- Notify user: Ready, starting deployment of ${model}, log at /home/modelarts-agent/deploy_${model}.log
- Check log every 2 minutes, report progress (loading weights, Dynamo compiling, service starting, etc.)
- When port is listening, notify deployment success
- Deployment failure handling (strict compliance):
  - Deployment failed = Report failure reason, no automatic retry
  - Never auto-change image and retry
  - Never auto-modify parameters and retry
  - Never try other deployment methods
  - Only report error, let user decide next step
- Output API sample for user:
Deployment successful! ${model} is ready
Service URL: http://${IP}:${PORT}/v1/chat/completions
Example request:
curl -X POST http://${IP}:${PORT}/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"${model}","messages":[{"role":"user","content":"hello"}],"max_tokens":256}'
Multimodal request (if supported):
curl -X POST http://${IP}:${PORT}/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"${model}","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"${image_url}"}},{"type":"text","text":"describe the image"}]}],"max_tokens":512}'
II. Deployment Log
User says: "deployment log model_name"
Agent uses huawei-cloud-ascend-remote-connect to execute:
tail -50 /home/modelarts-agent/deploy_${model}.log
III. Deployment Status
User says: "deployment status port XXXX"
Agent uses huawei-cloud-ascend-remote-connect to execute:
ss -tlnp | grep :${port}
Port listening = Service ready for testing.
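A client-side approximation of this readiness check is to attempt a TCP connection and retry on an interval. Note the difference from ss -tlnp, which runs on the DevServer itself: a connection attempt can also fail because of a firewall or security group. The function names and the default of 30 attempts are assumptions; the 120-second interval mirrors the "every 2 minutes" log check.

```python
import socket
import time


def is_port_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, attempts=30, interval=120.0):
    """Poll until the port accepts connections or the attempts run out."""
    for attempt in range(attempts):
        if is_port_listening(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(interval)
    return False
```

If wait_for_port returns False, the failure rules apply: report it and let the user decide, with no automatic retry of the deployment.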
IV. Test (Execute)
User says: "test model_name prompt:xxx" or "test model_name image:URL"
Test flow (strict compliance):
- Show full curl command for user to review
- Wait for user "confirm" or "send" before executing
- Structured result output:
Test Result
| Field | Value |
|-------|-------|
| id | chatcmpl-xxx |
| model | Qwen3-VL-32B-Instruct |
| prompt_tokens | 93 |
| completion_tokens | 400 |
| total_tokens | 493 |
| finish_reason | stop |
Model Response:
[Extract full content, no truncation]
Raw Response:
[Full JSON, no truncation]
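The result table above can be filled from the response JSON along these lines. The sketch assumes the service returns an OpenAI-compatible chat completion object (id, model, usage, choices[0].message.content), which the /v1/chat/completions endpoint suggests but this document does not spell out; both function names are hypothetical.

```python
FIELDS = ("id", "model", "prompt_tokens", "completion_tokens",
          "total_tokens", "finish_reason")


def summarize_chat_response(response):
    """Extract the table fields and the full message content from a response dict."""
    usage = response.get("usage") or {}
    choice = (response.get("choices") or [{}])[0]
    return {
        "id": response.get("id"),
        "model": response.get("model"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
        "finish_reason": choice.get("finish_reason"),
        "content": (choice.get("message") or {}).get("content"),
    }


def render_table(summary):
    """Render the summary fields as the two-column table shown above."""
    rows = ["| Field | Value |", "|-------|-------|"]
    rows.extend(f"| {field} | {summary[field]} |" for field in FIELDS)
    return "\n".join(rows)
```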
LLM Chat Completions
curl -s -X POST http://${IP}:${PORT}/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"${model}","messages":[{"role":"user","content":"${prompt}"}],"max_tokens":1024,"temperature":0.7}'
Multimodal VL
curl -s -X POST http://${IP}:${PORT}/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"${model}","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":[{"type":"image_url","image_url":{"url":"${image_url}"}},{"type":"text","text":"${prompt}"}]}],"max_tokens":512,"temperature":0.7}'
Embedding
curl -s -X POST http://${IP}:${PORT}/v1/embeddings \
-H 'Content-Type: application/json' \
-d '{"model":"${model}","input":"${text}"}'
Rerank
curl -s -X POST http://${IP}:${PORT}/v1/rerank \
-H 'Content-Type: application/json' \
-d '{"model":"${model}","query":"${query}","documents":["${doc1}","${doc2}"]}'
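The four curl templates share one shape: a JSON POST to a category-specific path. A standard-library sketch that builds, but does not send, the equivalent request is shown below. build_request is a hypothetical name, and the OpenSource category is omitted because this document does not state its endpoint.

```python
import json
import urllib.request

# Endpoints as listed in the Model Catalog section.
ENDPOINT_BY_CATEGORY = {
    "LLM": "/v1/chat/completions",
    "VL": "/v1/chat/completions",
    "Embedding": "/v1/embeddings",
    "Rerank": "/v1/rerank",
}


def build_request(host, port, category, payload):
    """Build a JSON POST request for the given model category without sending it."""
    url = f"http://{host}:{port}{ENDPOINT_BY_CATEGORY[category]}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is a separate step, for example urllib.request.urlopen(request, timeout=60), and per the test flow it should happen only after the user has reviewed the command and confirmed.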
V. Write Test Command (Generate Only)
User says: "write test command model_name prompt:xxx"
Same logic as "test", but only output command text, no execution.
API Parameter Manual
LLM Parameters (/v1/chat/completions)
| Parameter | Required | Default | Description |
|---|---|---|---|
| model | Yes | — | Model name, same as deployment |
| messages | Yes | — | Message list, each with role and content |
| max_tokens | No | 16 | Max generation tokens |
| temperature | No | 1.0 | Sampling randomness, 0=greedy |
| top_p | No | 1.0 | Nucleus sampling threshold |
| top_k | No | -1 | Only consider top-K tokens |
| stream | No | false | Streaming output (SSE) |
| chat_template_kwargs | No | {} | Template params, e.g. {"enable_thinking": false} |
VL Extra Parameters
| Parameter | Description |
|---|---|
| content[] | Array format: image_url object + text object |
| detail | Image precision: auto/high/low |
Embedding Parameters (/v1/embeddings)
| Parameter | Required | Description |
|---|---|---|
| model | Yes | Model name |
| input | Yes | String or string list |
| encoding_format | No | float/base64 |
Rerank Parameters (/v1/rerank)
| Parameter | Required | Description |
|---|---|---|
| model | Yes | Model name |
| query | Yes | Query text |
| documents | Yes | Document list to rerank |
| top_n | No | Return top N |
Execution Mode
This skill operates in stateless mode (Rule 2). All context (host, credentials, model info) must be explicitly passed by Agent (Rule 1).
Prerequisites (Agent orchestrated)
Before calling this skill, Agent MUST:
- Establish SSH connection using huawei-cloud-ascend-remote-connect
  - Agent receives: host, port, user, password from user
  - Agent validates connection is successful
- Check NPU status using huawei-cloud-ascend-command
  - Agent checks: NPU health, HBM availability
  - Agent validates: sufficient cards for model deployment
Skill Execution
This skill receives explicit parameters from Agent:
# Model matching (local operation)
python3 scripts/deploy_helper.py match <model_name>
# Script URL generation (local operation)
python3 scripts/deploy_helper.py script <model_name>
# Deploy command generation (local operation)
python3 scripts/deploy_helper.py command <model> <cards> <port>
Remote Deployment Execution
Agent executes deployment commands on remote server:
# Agent uses SSH to execute deployment on DevServer
ssh root@<host> "cd /path/to/model && bash deploy.sh"
Post-Deployment (Agent orchestrated)
After deployment, Agent calls huawei-cloud-ascend-command to:
- Monitor NPU HBM usage
- Check deployment process status
- Verify model endpoint is responding
Parameter Flow
User Input Agent This Skill
│ │ │
│ host, password │ │
├─────────────────────────▶│ │
│ │ SSH connect │
│ ├───────────────────────────▶│
│ │ │ (remote-connect)
│ │◀───────────────────────────┤
│ │ │
│ │ NPU check │
│ ├───────────────────────────▶│
│ │ │ (ascend-command)
│ │◀───────────────────────────┤
│ │ │
│ model_name, cards │ │
├─────────────────────────▶│ │
│ │ match model │
│ ├───────────────────────────▶│
│ │ │ deploy_helper.py
│ │◀───────────────────────────┤
│ │ │
│ │ execute deploy │
│ ├───────────────────────────▶│
│ │ │ (via SSH)
│ │◀───────────────────────────┤
│ │ │
│ │ monitor NPU │
│ ├───────────────────────────▶│
│ │ │ (ascend-command)
│ │◀───────────────────────────┤
│ │ │
▼ ▼ ▼
Note: No direct skill-to-skill calls. All orchestration by Agent (Rule 3).
References
| Document | Description |
|---|---|
| task-deploy-model.md | Deployment task steps |
| task-test-model.md | Testing task steps |
| model-catalog.md | Complete model catalog |
| api-parameters.md | API parameter reference |
| prerequisites.md | Prerequisites checklist |
| verification-method.md | Verification steps |
| troubleshooting.md | Troubleshooting guide |
| scripts/deploy_helper.py | Model matching helper |
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install huawei-cloud-ascend-models-deploy
- After installation, invoke the skill by name or use /huawei-cloud-ascend-models-deploy
- Provide required inputs per the skill's parameter spec and get structured output
What is huawei-cloud-ascend-models-deploy?
Huawei Cloud Ascend model deployment and testing skill for large language models on Ascend DevServer (910B series). Supports single-machine and dual-machine... It is an AI Agent Skill for Claude Code / OpenClaw, with 36 downloads so far.
How do I install huawei-cloud-ascend-models-deploy?
Run "/install huawei-cloud-ascend-models-deploy" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is huawei-cloud-ascend-models-deploy free?
Yes, huawei-cloud-ascend-models-deploy is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does huawei-cloud-ascend-models-deploy support?
huawei-cloud-ascend-models-deploy is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created huawei-cloud-ascend-models-deploy?
It is built and maintained by huaweicloud-skills-team (@huaweiclouddev); the current version is v0.0.1.