Huawei Cloud CCE Availability Risk Scanner

Author: pintudeyudi

Author shijingcheng · GitHub · v0.1.2 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install huawei-cloud-cce-availability-risk-scanner

Description

Huawei Cloud CCE availability risk scanning skill using Python SDK dispatcher for read-only cluster risk assessment. Use this skill when the user wants to: (...

Usage Instructions (SKILL.md)

# Huawei Cloud CCE Availability Risk Scanner

**⚠️ Execution Method (Must Read)**: This skill executes queries via the local Python dispatcher script. Using hcloud, openstack, or other CLI tools or direct API calls is prohibited.

- The dispatcher script is located at `scripts/huawei-cloud.py` within the skill directory.
- All scripts and environment check scripts are inside the skill package. You must use skill action=exec to execute them. Do not run them directly in a shell.
- Do not attempt hcloud, openstack, curl IAM, or any other CLI/API methods. This skill does not depend on those tools.
- All paths are relative to the skill directory, which is the directory where this SKILL.md is located.

## Overview

This skill scans Huawei Cloud CCE clusters for availability risks. It performs read-only checks, produces risk-rated reports, and generates remediation plans with YAML suggestions. It does NOT directly modify workloads, PDBs, affinity rules, probes, node pools, or cluster configuration.

**Architecture**: Python dispatcher (`scripts/huawei-cloud.py`) → Huawei Cloud Python SDK + Kubernetes client → Nodes, Pods, Deployments, StatefulSets, DaemonSets, PDBs, Services, Ingresses, Events, Metrics → Risk classification → Remediation plan → Reports

**Related Skills**:

| Skill | Purpose |
|-------|---------|
| huawei-cloud-cce-pod-failure-diagnoser | Pod-level failure diagnosis (CrashLoopBackOff, OOMKilled, Pending) |
| huawei-cloud-cce-node-failure-diagnoser | Node-level failure diagnosis (NotReady, pressure) |
| huawei-cloud-cce-network-failure-diagnoser | Network failure diagnosis (Service, DNS, Ingress, ELB) |
| huawei-cloud-cce-root-cause-analyzer | Cross-resource root cause correlation |
| huawei-cloud-cce-auto-remediation-runner | Execute remediation actions (scale, PDB, affinity, probes) |
| huawei-cloud-cce-cce-workload-manager | Workload lifecycle management (Deployment/StatefulSet operations) |

**Capabilities**:

- One-shot availability risk scan with automated inventory collection and risk classification (`huawei_scan_cce_availability_risk`)
- Control-plane visibility and master HA assessment (node count, AZ distribution, CPU/memory metrics)
- Node AZ distribution and nodepool distribution analysis
- Workload risk detection: single replicas, missing PDBs, Pod AZ/node concentration, missing health probes, hard affinity, anti-affinity gaps, topology spread gaps
- Gateway workload identification and distribution assessment (nginx, gateway, ingress, proxy, kong, apisix, traefik)
- Core addon anti-affinity and distribution checks (CoreDNS, nginx-ingress, ingress-nginx)
- Resource request/limit overcommit detection and cluster capacity illusion identification
- Risk-rated reports with severity classification, remediation suggestions, and authorized execution plans

**Typical Use Cases**:

- "Scan my CCE cluster for availability risks"
- "Check if my cluster has single points of failure"
- "Assess master HA and node AZ distribution"
- "Find workloads missing PodDisruptionBudgets"
- "Identify gateway workloads concentrated on a single node or AZ"
- "Detect resource overcommit and capacity illusions"
- "Check health probe coverage for my Deployments"
- "Assess workload affinity and topology spread"
- "Review core addon (CoreDNS, nginx-ingress) anti-affinity"
- "Generate an availability risk report with remediation plan"

## Prerequisites

### 1. Python Requirements (MANDATORY)

- Python >= 3.6 installed
- Required packages: `huaweicloudsdkcore`, `huaweicloudsdkcce`, `huaweicloudsdkaom`, `huaweicloudsdkhss`, `huaweicloudsdkvpc`, `huaweicloudsdkecs`, `huaweicloudsdkces`, `huaweicloudsdkevs`, `huaweicloudsdkeip`, `huaweicloudsdkelb`, `huaweicloudsdkiam`, `kubernetes`
- Verify: `python3 --version`
- Install packages: `pip3 install huaweicloudsdkcore huaweicloudsdkcce huaweicloudsdkaom huaweicloudsdkhss huaweicloudsdkvpc huaweicloudsdkecs huaweicloudsdkces huaweicloudsdkevs huaweicloudsdkeip huaweicloudsdkelb huaweicloudsdkiam kubernetes`

### 2. Credential Configuration

- Valid Huawei Cloud credentials (AK/SK mode)

**Security Rules**:

- 🚫 Never expose AK/SK values in code, conversation, or commands
- 🚫 Never use `echo $HUAWEI_AK` or `echo $HUAWEI_SK` to check credentials
- 🚫 Never write credentials to files, logs, or responses
- ✅ Use environment variables: `HUAWEI_AK`, `HUAWEI_SK`, `HUAWEI_REGION`
- ✅ Credentials exist only in the current request call stack and are released after each invocation
- ✅ Prefer IAM users over root account for cloud operations

**Configuration Method (Environment Variables Only)**:

```bash
export HUAWEI_AK=<your-ak>
export HUAWEI_SK=<your-sk>
export HUAWEI_REGION=cn-north-4
```

**Additional Variables**:

| Variable | Required | Description |
|----------|----------|-------------|
| `HUAWEI_AK` | Yes | Huawei Cloud Access Key |
| `HUAWEI_SK` | Yes | Huawei Cloud Secret Key |
| `HUAWEI_REGION` | No | Default region (overrides `region` param if set) |
| `HUAWEI_PROJECT_ID` | No | Project ID (auto-obtained via IAM API when not set) |
| `HUAWEI_SECURITY_TOKEN` | No | Required when using temporary AK/SK |

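Since the security rules forbid echoing credential values, a presence-only check is one way to confirm the environment is configured. The following is a minimal sketch, not part of the skill package; the helper names `credential_status` and `missing_required` are hypothetical, and only the variable names come from the table above.

```python
import os

# Variable names are taken from the table above; everything else is illustrative.
REQUIRED_VARS = ("HUAWEI_AK", "HUAWEI_SK")
OPTIONAL_VARS = ("HUAWEI_REGION", "HUAWEI_PROJECT_ID", "HUAWEI_SECURITY_TOKEN")


def credential_status(env=None):
    """Report which variables are set, without ever returning their values."""
    env = os.environ if env is None else env
    # bool(...) collapses each secret to True/False so the value cannot leak.
    return {name: bool(env.get(name)) for name in REQUIRED_VARS + OPTIONAL_VARS}


def missing_required(env=None):
    """Return the names of required variables that are unset or empty."""
    status = credential_status(env)
    return [name for name in REQUIRED_VARS if not status[name]]


if __name__ == "__main__":
    for name, is_set in credential_status().items():
        print(f"{name}: {'set' if is_set else 'not set'}")
```

The check prints only `set` or `not set` per variable, so its output can be shared in a conversation or log without exposing AK/SK material.
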
### 3. IAM Permission Requirements

| API Action | Service | Purpose |
|------------|---------|---------|
| CCE cluster read | CCE | `huawei_list_cce_clusters` |
| CCE node read | CCE | `huawei_get_kubernetes_nodes`, `huawei_get_cce_nodes` |
| CCE workload read | CCE | `huawei_get_cce_pods`, `huawei_get_cce_deployments` |
| CCE nodepool read | CCE | `huawei_list_cce_nodepools` |
| CCE addon read | CCE | `huawei_list_cce_addons` |
| AOM metrics read | AOM | `huawei_get_cce_node_metrics`, `huawei_get_cce_node_metrics_topN`, `huawei_get_aom_metrics` |
| Kubernetes API read | CCE (kubeconfig) | `huawei_get_cce_pods`, `huawei_get_cce_deployments`, `huawei_list_cce_statefulsets`, `huawei_list_cce_daemonsets` |

**Permission Failure Handling**:

1. When any action fails due to permission errors, display the required permission list
2. Guide the user to create a custom policy in the IAM console
3. Pause execution and wait for user confirmation that permissions have been granted
4. Retry the failed action

## Core Commands

All actions are invoked via the dispatcher script:

```bash
python3 scripts/huawei-cloud.py <action> region=<region> cluster_id=<cluster_id> [key=value ...]
```

### 1. Primary Scan Action (One-Call)

The primary scan command that collects all availability risk data in a single call and outputs a risk-rated report.

```bash
python3 scripts/huawei-cloud.py huawei_scan_cce_availability_risk \
  region=cn-north-4 cluster_id=<cluster_id> \
  exclude_namespaces=kube-system \
  gateway_keywords=nginx,gateway,ingress,proxy,kong,apisix,traefik \
  metrics_hours=24 \
  output_dir=./output
```

Returns: risk-rated issues, severity classification, inventory summary, data gaps, remediation suggestions, and optionally `availability-risk-summary.json` and `availability-risk-report.md` files.

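The `key=value` calling convention shown above can be assembled programmatically. This is a minimal sketch that only builds the argument list and does not run anything; `build_command` is a hypothetical helper, not a function shipped with the skill, and joining list values with commas is an assumption based on the `gateway_keywords` example.

```python
import shlex

# Path as documented in this SKILL.md, relative to the skill directory.
DISPATCHER = "scripts/huawei-cloud.py"


def build_command(action, **params):
    """Return the argv list for one dispatcher call; None-valued params are skipped."""
    argv = ["python3", DISPATCHER, action]
    for key, value in params.items():
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            # Assumption: multi-valued params are comma-separated, as in gateway_keywords.
            value = ",".join(str(v) for v in value)
        argv.append(f"{key}={value}")
    return argv


argv = build_command(
    "huawei_scan_cce_availability_risk",
    region="cn-north-4",
    cluster_id="<cluster_id>",  # placeholder, replace with a real cluster ID
    exclude_namespaces=["kube-system"],
    metrics_hours=24,
    output_dir=None,  # omitted from the command because it is None
)
# Quote each argument so the printed line is safe to paste into a shell.
print(" ".join(shlex.quote(arg) for arg in argv))
```

Building an argv list rather than a single string avoids shell-quoting mistakes when a parameter value contains spaces or special characters.
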
### 2. Inventory Collection Actions

| Action | Required Params | Description |
|--------|----------------|-------------|
| `huawei_get_kubernetes_nodes` | `region`, `cluster_id` | Query v1.Node Ready/conditions/AZ distribution |
| `huawei_get_cce_pods` | `region`, `cluster_id` | List Pod phase/reason/state/node/AZ |
| `huawei_get_cce_deployments` | `region`, `cluster_id` | List Deployments with replicas/PDB/affinity |
| `huawei_get_cce_services` | `region`, `cluster_id` | List Services for workload correlation |
| `huawei_get_cce_ingresses` | `region`, `cluster_id` | List Ingresses for gateway identification |
| `huawei_list_cce_nodepools` | `region`, `cluster_id` | List node pools with AZ distribution |
| `huawei_list_cce_daemonsets` | `region`, `cluster_id` | List DaemonSets for probe/affinity check |
| `huawei_list_cce_statefulsets` | `region`, `cluster_id` | List StatefulSets for PDB/single-replica check |
| `huawei_get_cce_node_metrics_topN` | `region`, `cluster_id` | Top-N node CPU/memory metrics |
| `huawei_get_aom_metrics` | `region` | AOM metric data for master/node trends |
| `huawei_list_cce_clusters` | `region` | List CCE clusters (for cluster selection) |

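To illustrate what an AZ distribution check over node inventory might look like, here is a minimal sketch. The input shape (a list of dicts with `name`, `ready`, and `labels`) is an assumption made for this example; the actual output of `huawei_get_kubernetes_nodes` is defined in the skill's output schema and may differ. The zone label keys are the standard Kubernetes well-known labels.

```python
from collections import Counter

# Well-known Kubernetes zone labels: the current key first, the deprecated one as fallback.
ZONE_LABELS = (
    "topology.kubernetes.io/zone",
    "failure-domain.beta.kubernetes.io/zone",
)


def zone_of(node):
    """Return the node's availability zone, or 'unknown' if no zone label is present."""
    labels = node.get("labels") or {}
    for key in ZONE_LABELS:
        if labels.get(key):
            return labels[key]
    return "unknown"


def ready_nodes_by_zone(nodes):
    """Count Ready nodes per availability zone."""
    return Counter(zone_of(n) for n in nodes if n.get("ready"))


def all_ready_nodes_in_one_az(nodes):
    """True when every Ready node sits in the same zone (a single point of failure)."""
    return len(ready_nodes_by_zone(nodes)) == 1


# Hypothetical inventory, for illustration only.
sample_nodes = [
    {"name": "node-a", "ready": True, "labels": {"topology.kubernetes.io/zone": "cn-north-4a"}},
    {"name": "node-b", "ready": True, "labels": {"topology.kubernetes.io/zone": "cn-north-4a"}},
    {"name": "node-c", "ready": False, "labels": {"topology.kubernetes.io/zone": "cn-north-4b"}},
]
print(dict(ready_nodes_by_zone(sample_nodes)))
print(all_ready_nodes_in_one_az(sample_nodes))
```

In the sample, the only node in a second zone is not Ready, so all Ready capacity is concentrated in one AZ even though the cluster nominally spans two.
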
### 3. Supplementary Query Actions

For targeted evidence when the user requests specific information:

```bash
# Node AZ distribution detail
python3 scripts/huawei-cloud.py huawei_get_kubernetes_nodes \
  region=cn-north-4 cluster_id=<cluster_id>

# Pod distribution across AZs
python3 scripts/huawei-cloud.py huawei_get_cce_pods \
  region=cn-north-4 cluster_id=<cluster_id> namespace=default

# Deployment detail with PDB and affinity
python3 scripts/huawei-cloud.py huawei_get_cce_deployments \
  region=cn-north-4 cluster_id=<cluster_id> namespace=default

# Node pool AZ distribution
python3 scripts/huawei-cloud.py huawei_list_cce_nodepools \
  region=cn-north-4 cluster_id=<cluster_id>

# Node metrics trend
python3 scripts/huawei-cloud.py huawei_get_cce_node_metrics_topN \
  region=cn-north-4 cluster_id=<cluster_id> top_n=10
```

## Parameter Reference

### `huawei_scan_cce_availability_risk` (Primary Action)

| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| `region` | Yes | - | Huawei Cloud region (e.g., `cn-north-4`) |
| `cluster_id` | Yes | - | CCE cluster ID |
| `exclude_namespaces` | No | `kube-system` | Namespaces excluded from business risk scanning; core addons still checked |
| `gateway_keywords` | No | `nginx,gateway,ingress,proxy,kong,apisix,traefik` | Keywords for identifying gateway-class workloads |
| `metrics_hours` | No | 24 | Lookback window (hours) for master/node CPU/memory trend metrics |
| `output_dir` | No | - | Directory for `availability-risk-summary.json` and `availability-risk-report.md` output |

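The `gateway_keywords` parameter is a comma-separated list used to recognize gateway-class workloads. How the scanner matches keywords is not specified here; the sketch below assumes a case-insensitive substring match on the workload name, and both helper names are hypothetical.

```python
# Default value copied from the parameter table above.
DEFAULT_GATEWAY_KEYWORDS = "nginx,gateway,ingress,proxy,kong,apisix,traefik"


def parse_keywords(raw=DEFAULT_GATEWAY_KEYWORDS):
    """Split the comma-separated parameter into lowercase keywords, dropping blanks."""
    return [k.strip().lower() for k in raw.split(",") if k.strip()]


def is_gateway_workload(name, keywords):
    """Assumed rule: a workload is gateway-class if its name contains any keyword."""
    lowered = name.lower()
    return any(keyword in lowered for keyword in keywords)


keywords = parse_keywords()
for workload in ("prod-Kong-proxy", "orders-api", "edge-envoy"):
    print(workload, is_gateway_workload(workload, keywords))
```

Under this assumed rule, a custom gateway such as an Envoy-based deployment is only recognized after its keyword is appended, for example `gateway_keywords=nginx,gateway,ingress,proxy,kong,apisix,traefik,envoy`.
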
### Common Parameters

| Parameter | Required | Description | Default |
|-----------|----------|-------------|---------|
| `region` | Yes | Huawei Cloud region | - |
| `cluster_id` | Yes (most actions) | CCE cluster ID | - |
| `namespace` | Context-dependent | Kubernetes namespace | - |
| `top_n` | No | Number of top results | 10 |
| `metrics_hours` | No | Metric lookback hours | 24 |

### Common Region IDs

| Region Name | Region ID |
|-------------|-----------|
| North China - Beijing 4 | `cn-north-4` |
| North China - Beijing 1 | `cn-north-1` |
| East China - Shanghai 1 | `cn-east-3` |
| East China - Shanghai 2 | `cn-east-2` |
| South China - Guangzhou | `cn-south-1` |
| South China - Shenzhen | `cn-south-4` |
| Southwest China - Guiyang 1 | `cn-southwest-2` |
| Asia Pacific - Bangkok | `ap-southeast-2` |
| Asia Pacific - Singapore | `ap-southeast-3` |
| Asia Pacific - Hong Kong | `ap-southeast-1` |
| Europe - Paris | `eu-west-0` |

## Output Format

The primary action `huawei_scan_cce_availability_risk` returns structured risk data. See [Output Schema](references/output-schema.md) for the full JSON response schema.

**Key Output Fields**:

| Field | Description |
|-------|-------------|
| `success` | Whether the scan completed successfully |
| `scope` | Scan scope (region, cluster_id, excluded namespaces, gateway keywords) |
| `inventory` | Collected resource counts (nodes, workloads, pods, PDBs, services, ingresses) and AZ distribution |
| `cluster.control_plane` | Master HA status, visible node count, zone distribution, metrics |
| `cluster.resources` | Ratios of CPU/memory requests and limits to allocatable capacity; count of containers missing requests |
| `issues[]` | Risk issues with severity, category, resource, message, recommendation |
| `summary.risk_level` | Overall risk level: critical, high, medium, low |
| `summary.issue_count` | Total issues with severity breakdown |
| `recommendations` | Remediation recommendations list |
| `remediation_plan` | Authorized execution plan items |
| `data_gaps` | Data gaps when control-plane or metrics are unavailable |
| `files` | Optional output file paths (summary JSON, report Markdown, raw inventory) |

**Issue Severity Levels**:

| Severity | Criteria |
|----------|----------|
| `critical` | Single replica gateway, no master HA, single-AZ concentration of all Ready nodes |
| `high` | Multi-replica workload missing PDB, Pod concentration on single node/AZ, missing health probes |
| `medium` | Missing resource requests, memory overcommit ratio > 2x, core addon single replica |
| `low` | CPU overcommit ratio > 4x (may be intentional burst), minor affinity gaps |

**Issue Categories**:

| Category | Description |
|----------|-------------|
| `single-replica` | Workload or gateway running with < 2 replicas |
| `pdb` | Multi-replica workload missing PodDisruptionBudget |
| `health-check` | Workload missing readinessProbe or livenessProbe |
| `affinity` | Hard affinity pinning to single AZ/node/nodepool, missing anti-affinity |
| `az-distribution` | Nodes or Pods concentrated in a single AZ |
| `gateway` | Gateway workload risk (concentration, missing PDB, missing probes) |
| `resources` | Missing requests, overcommit, or capacity illusion |

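As an illustration of how the `issues[]` entries relate to a severity breakdown, the sketch below tallies issues per severity and derives an overall level. The rule "overall level equals the highest severity present" is an assumption for this example, not a documented behavior of the scanner, and the sample issues are invented.

```python
from collections import Counter

# Severities from the table above, ordered from most to least severe.
SEVERITY_ORDER = ("critical", "high", "medium", "low")


def summarize(issues):
    """Build a severity breakdown and an overall level from a list of issue dicts."""
    counts = Counter(issue.get("severity") for issue in issues)
    breakdown = {severity: counts.get(severity, 0) for severity in SEVERITY_ORDER}
    # Assumed rule: the overall level is the most severe level with at least one issue.
    risk_level = next((s for s in SEVERITY_ORDER if breakdown[s]), "low")
    return {"risk_level": risk_level, "issue_count": {"total": len(issues), **breakdown}}


# Hypothetical issues using the documented severity and category values.
sample_issues = [
    {"severity": "high", "category": "pdb", "resource": "default/web"},
    {"severity": "medium", "category": "resources", "resource": "default/batch"},
    {"severity": "high", "category": "health-check", "resource": "default/api"},
]
print(summarize(sample_issues))
```

Sorting or filtering `issues[]` by the same severity order is a natural way to present the most urgent findings first in a report.
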
## Verification

See [Verification Method](references/verification-method.md) for step-by-step verification.

## Best Practices

1. **Primary action first**: Always call `huawei_scan_cce_availability_risk` first; use manual inventory queries only if the primary scan fails or the user requests specific detail
2. **Control-plane data gap**: When CCE managed control plane does not expose master nodes, mark it as a data gap in the report; do NOT assume master HA
3. **Core addon awareness**: Even when `kube-system` is in `exclude_namespaces`, CoreDNS, nginx-ingress, and ingress-nginx are still individually identified and checked
4. **Gateway identification**: Use `gateway_keywords` to identify gateway-class workloads; adjust keywords for custom gateway implementations
5. **Remediation authorization**: All real remediation (scaling replicas, creating PDB, modifying probes, adjusting affinity, migrating nodes, resizing node pools) requires explicit user authorization before execution
6. **Remediation hand-off**: When remediation is needed, hand off to `huawei-cloud-cce-auto-remediation-runner` with proper safeguards and user confirmation
7. **Read-only boundary**: This skill does NOT scale replicas, create PDBs, modify probes, adjust affinity, migrate nodes, or resize node pools; it only generates remediation plans and YAML suggestions
8. **Resource overcommit interpretation**: CPU overcommit ratio > 4x is marked as low risk (may be intentional burst); memory overcommit ratio > 2x is marked as medium risk (OOM and bin-packing risk)

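The overcommit thresholds in item 8 can be expressed as a small classification function. This is a sketch under stated assumptions: the ratios are taken as already-computed inputs (how the scanner derives them is not specified here), the comparison is strictly "greater than" as written, and `classify_overcommit` is a hypothetical name.

```python
# Thresholds from the best-practice rule above.
CPU_OVERCOMMIT_THRESHOLD = 4.0     # above this: low risk (may be intentional burst)
MEMORY_OVERCOMMIT_THRESHOLD = 2.0  # above this: medium risk (OOM and bin-packing risk)


def classify_overcommit(cpu_ratio, memory_ratio):
    """Return one finding per resource whose overcommit ratio exceeds its threshold."""
    findings = []
    if memory_ratio > MEMORY_OVERCOMMIT_THRESHOLD:
        findings.append({
            "category": "resources",
            "resource": "memory",
            "severity": "medium",
            "ratio": memory_ratio,
        })
    if cpu_ratio > CPU_OVERCOMMIT_THRESHOLD:
        findings.append({
            "category": "resources",
            "resource": "cpu",
            "severity": "low",
            "ratio": cpu_ratio,
        })
    return findings


for finding in classify_overcommit(cpu_ratio=5.0, memory_ratio=2.5):
    print(finding["resource"], finding["severity"], finding["ratio"])
```

Memory is rated more severely than CPU because CPU is compressible (containers are throttled), while exhausted memory leads to OOM kills.
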
## Reference Documents

| Document | Description |
|----------|-------------|
| [Workflow](references/workflow.md) | Scan workflow, evidence collection steps, and risk classification rules |
| [Risk Rules](references/risk-rules.md) | Safety constraints, mutation boundaries, and authorization requirements |
| [Output Schema](references/output-schema.md) | Complete JSON response format for scan results |
| [Verification Method](references/verification-method.md) | Step-by-step verification for skill setup and scan execution |
| [Common Pitfalls](references/common-pitfalls.md) | Troubleshooting guides for scan pitfalls |

## Notes

- **Read-only by design**: this skill does NOT modify workloads, PDBs, probes, affinity, node pools, or cluster configuration
- **Remediation hand-off**: all mutation suggestions are handed off to `huawei-cloud-cce-auto-remediation-runner` with `requires_confirmation: true`
- **Never expose or log AK/SK or environment variable values**
- **All actions are executed via `python3 scripts/huawei-cloud.py <action>`; do not use hcloud CLI or direct API calls**
- **Data gaps**: when CCE managed control plane does not expose master nodes, the scan marks this as a data gap and recommends verifying in the CCE console/API
- **Gateway keywords**: default keywords cover common gateway implementations; custom gateways can be added via the `gateway_keywords` parameter
- **`kube-system` exclusion**: business risk scanning excludes `kube-system` by default, but core addons (CoreDNS, nginx-ingress, ingress-nginx) are still individually checked for anti-affinity and distribution risks

## Common Pitfalls

See [Common Pitfalls & Solutions](references/common-pitfalls.md) for detailed troubleshooting guides.

**Quick Reference**:

| Pitfall | Symptom | Quick Fix |
|---------|---------|-----------|
| Assuming master HA | Report concludes "master HA OK" with no visible master nodes | Mark as data gap; recommend CCE console/API verification |
| Skipping PDB check | Missing PDB for multi-replica gateway not flagged | Include gateway keywords and check PDB for all multi-replica workloads |
| Ignoring gateway concentration | All gateway Pods on one node/AZ | Use `gateway_keywords` and check Pod distribution across nodes/AZs |
| Treating CPU overcommit as critical | CPU limit/request ratio > 4x flagged as critical | Mark as low risk; confirm whether intentional burst design |
| Missing resource requests | Containers with no CPU/memory requests not flagged | Always check request/limit presence; mark missing requests as medium risk |
| Excluding core addons | `kube-system` excluded removes CoreDNS from checks | Core addons are individually identified regardless of namespace exclusion |
| Wrong cluster_id | API returns 404 or empty results | Verify cluster ID via `huawei_list_cce_clusters` |
| Credential permission denied | API returns 403 | Check IAM permissions for CCE node/workload/metrics access |
| Metrics API unavailable | Node/Pod metrics query fails | Ensure metrics-server addon is installed in cluster |

Security Recommendations

Install only if you are prepared to audit and restrict it as a privileged cloud administration tool, not just a scanner. Use a dedicated read-only Huawei IAM user and Kubernetes RBAC identity, avoid granting permissions for create/update/delete/patch/scale/secret/log access, and do not pass confirm=true unless you intentionally want a production change.

Capability Tags

requires-wallet, requires-sensitive-credentials

Capability Assessment

⚠ Purpose & Capability

The stated purpose, SKILL.md, references, and skill-profile describe read-only availability assessment, but the code registers actions for deleting and creating CCE resources, scaling and resizing workloads, managing addons, changing AOM alarms, draining nodes, rolling back workloads, binding EIPs, returning kubeconfig material, reading Secrets, and fetching pod logs.

⚠ Instruction Scope

The runtime instruction tells agents to use a generic dispatcher script while documenting only scanner-style actions and explicitly saying the skill does not modify cluster state; the actual registered action surface is much broader, and confirm=true is only a CLI parameter gate.

ℹ Install Mechanism

No hidden install hook or startup persistence was found; installation is mainly Python SDK dependencies, but the skill requires sensitive Huawei Cloud AK/SK credentials.

⚠ Credentials

Read-only CCE, Kubernetes, and metrics access fits the scanner purpose, but the packaged admin and mutation capabilities are not proportionate unless the user independently enforces read-only IAM/RBAC.

⚠ Persistence & Privilege

There is no background persistence, but the skill can write reports/raw inventory to user-supplied paths and includes privileged paths that can return kubeconfig, Secret data, and logs; temporary certificate files are generally cleaned up but still created during use.

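How to Use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install huawei-cloud-cce-availability-risk-scanner
Once installation completes, invoke the Skill by name or trigger it with /huawei-cloud-cce-availability-risk-scanner
Provide the required inputs according to the Skill's parameter reference to get structured output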

Version History

v0.1.2

Initial release

v0.1.1

Initial release

v0.1.0

Initial release

Metadata

Slug huawei-cloud-cce-availability-risk-scanner

Version 0.1.2

License MIT-0

Total installs 0

Current installs 0

Version count 3
