pintudeyudi

Huawei Cloud Cce Auto Remediation Runner

by shijingcheng · GitHub ↗ · v0.1.2 · MIT-0
cross-platform ⚠ suspicious
Downloads: 38
Stars: 0
Active Installs: 0
Versions: 3
Install in OpenClaw
/install huawei-cloud-cce-auto-remediation-runner
Description
Huawei Cloud CCE auto-remediation runner skill that converts remediation intent into preview-first, confirm-required, post-verify execution plans. Use this s...
README (SKILL.md)

# CCE Auto Remediation Runner

⚠️ Execution Method (Must Read): This skill executes remediation actions via local Python scripts using the `scripts/huawei-cloud.py` dispatcher. Using `hcloud`, `kubectl`, or other CLI tools or direct API calls is prohibited.

- All actions are dispatched through `scripts/huawei-cloud.py` with `--action <action_name>` and `--params <json_params>`
- All scripts and environment check scripts are inside the skill package. You must use `skill action=exec` to execute them; do not run them directly in a shell
- For action names and parameters, see the Core Tools section below
- Do not attempt `hcloud`, `kubectl`, `curl` IAM, or other CLI/API methods. This skill does not depend on these tools
- All paths are relative to the skill directory, which is the directory where this SKILL.md resides
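As a rough illustration of the `--action` / `--params` convention described above, the sketch below only assembles the argument list for one dispatcher call. The helper name `build_dispatch_args` and the sample node name are hypothetical, and how the host actually launches the script through `skill action=exec` is not shown here.

```python
import json

# Path taken from this README; it is relative to the skill directory.
DISPATCHER = "scripts/huawei-cloud.py"


def build_dispatch_args(action, params):
    """Return the argument list for one dispatcher call (nothing is executed here)."""
    if not action:
        raise ValueError("action name is required")
    # Serialize params as JSON so they travel as a single --params argument.
    return [DISPATCHER, "--action", action, "--params", json.dumps(params, sort_keys=True)]


# Hypothetical example values; no confirm key, so this would be a preview request.
args = build_dispatch_args(
    "huawei_cce_node_cordon",
    {"region": "cn-north-4", "cluster_id": "cluster-id", "node_name": "node-1"},
)
print(args)
```

Passing the parameters as one JSON string avoids shell quoting problems with nested values.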

## Overview

This skill converts remediation intent into reviewable, confirmable, verifiable execution plans. It operates in preview-first mode by default. Every mutation action follows four steps: preview without `confirm=true`, explicit user confirmation of the action, object, and risks, execution with `confirm=true`, and read-only verification.

This skill is applicable to the following scenarios:

1. Remediation actions triggered by root-cause analysis conclusions (e.g., Deployment rollback for CrashLoop/ImagePull/CommandNotFound)
2. Node operations: cordon, uncordon, drain, reboot ECS
3. Workload operations: scale, resize, rollback, delete
4. Node pool operations: resize node pool
5. Cluster operations: hibernate, awake
6. Security operations: HSS vulnerability status change
7. Auto-remediation orchestration via `huawei_auto_remediation_run` for multi-step remediation plans
8. Traffic cutover: bind/unbind cluster EIP
9. ECS instance operations: start, stop

This skill does NOT handle the following:

1. Read-only diagnosis (use `huawei-cloud-cce-root-cause-analyzer` or domain-specific diagnoser skills)
2. Auto-executing remediation without preview and user confirmation
3. Guessing or fabricating remediation results without evidence
4. Batch or fuzzy-target remediation without explicit user confirmation per object

---

## Prerequisites

Before using, you must run the environment check script to complete environment validation and dependency installation in one step:

- Linux / macOS: `skill action=exec: bash skill://scripts/check_env.sh`
- Windows: `skill action=exec: powershell -ExecutionPolicy Bypass -File skill://scripts/check_env.ps1`

Windows Note: Do not use `&&` to chain commands (PowerShell 5.x does not support it). Use semicolons if you need to change directories first.

The script will check in sequence: Python >= 3.6 → install dependencies → validate SDK → validate credentials → validate service availability.
If the environment check fails, fix the issues before continuing with other actions.

Environment Variables:

| Variable | Required | Description |
|----------|----------|-------------|
| HW_ACCESS_KEY | Yes | Huawei Cloud AK |
| HW_SECRET_KEY | Yes | Huawei Cloud SK |
| HW_REGION_NAME | No | Default cn-north-4 |
| HW_PROJECT_ID | No | Project ID (automatically obtained via IAM API when not set) |
| HW_SECURITY_TOKEN | No | Required when using temporary AK/SK |
| HW_CLUSTER_ID | No | Default CCE cluster ID (can also be passed per action) |

Security Constraints:

1. Never persist credentials (AK/SK/Token/Certificate) to the filesystem
2. AK/SK exist only within the current request call stack; released after use
3. Only non-sensitive project IDs are cached in process memory (never written to disk)
4. All temporary certificate files must be deleted immediately after use
5. Never expose AK/SK in logs, responses, or error messages

Do not output the values of the above environment variables.

---
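The environment variables and the rule against printing their values, both described under Prerequisites, could be checked with something like the sketch below. It is not the skill's own `check_env` script; the function name `env_status` and the sample values are hypothetical. It reports only whether each variable is set, never its value.

```python
import os

# Variable names copied from the Environment Variables table in this README.
REQUIRED_VARS = ("HW_ACCESS_KEY", "HW_SECRET_KEY")
OPTIONAL_VARS = ("HW_REGION_NAME", "HW_PROJECT_ID", "HW_SECURITY_TOKEN", "HW_CLUSTER_ID")


def env_status(env=None):
    """Report which variables are set, without ever returning their values."""
    env = os.environ if env is None else env
    status = {
        name: ("set" if env.get(name) else "missing")
        for name in REQUIRED_VARS + OPTIONAL_VARS
    }
    missing_required = [name for name in REQUIRED_VARS if status[name] == "missing"]
    return status, missing_required


# Hypothetical environment: the secret key is empty, so it counts as missing.
status, missing = env_status({"HW_ACCESS_KEY": "example-ak", "HW_SECRET_KEY": ""})
print(status)
print(missing)
```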

## IAM Permission Requirements

| API Action | Permission | Purpose |
|-----------|------------|---------|
| cce:cluster:get | Get cluster | View cluster details |
| cce:cluster:list | List clusters | List CCE clusters |
| cce:node:get | Get node | View node details |
| cce:node:list | List nodes | List cluster nodes |
| cce:node:update | Update node | Cordon/uncordon/drain nodes |
| cce:nodepool:update | Update node pool | Resize node pools |
| cce:nodepool:get | Get node pool | View node pool details |
| cce:nodepool:list | List node pools | List node pools |
| aom:*:get | Read AOM | Query AOM metrics and alarms |
| aom:alarmRule:list | List alarm rules | Query alarm rules for validation |
| aom:event:list | List events | Query AOM alarm events |

Permission Failure Handling:

1. When any command fails due to permission errors, display required permission list and policy JSON
2. Guide the user to create a custom policy in the IAM console and grant authorization
3. Pause execution and wait for user confirmation that permissions have been granted

---
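The "policy JSON" mentioned in the permission failure handling steps might be assembled as in the sketch below. The action names are copied from the IAM Permission Requirements table. The `Version` / `Statement` / `Effect` / `Action` layout is an assumption based on the general shape of Huawei Cloud custom policies and should be checked against the IAM console before use; `build_policy` is a hypothetical helper.

```python
import json

# Action names copied from the IAM Permission Requirements table in this README.
REQUIRED_ACTIONS = [
    "cce:cluster:get",
    "cce:cluster:list",
    "cce:node:get",
    "cce:node:list",
    "cce:node:update",
    "cce:nodepool:update",
    "cce:nodepool:get",
    "cce:nodepool:list",
    "aom:*:get",
    "aom:alarmRule:list",
    "aom:event:list",
]


def build_policy(actions):
    """Build an allow-only policy document (layout is an assumption, see lead-in)."""
    return {
        "Version": "1.1",
        "Statement": [{"Effect": "Allow", "Action": sorted(set(actions))}],
    }


policy = build_policy(REQUIRED_ACTIONS)
print(json.dumps(policy, indent=2))
```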

## Core Tools

All actions are dispatched through `scripts/huawei-cloud.py` using `skill action=exec`.

### Auto-Remediation Orchestration

| Action | Required Parameters | Description |
|--------|---------------------|-------------|
| huawei_auto_remediation_run | region, cluster_id, strategy | Orchestrate multi-step remediation plan; strategy determines actions (rollback_previous_revision, scale_out, drain_and_replace, etc.) |

### Workload Actions

| Action | Required Parameters | Description |
|--------|---------------------|-------------|
| huawei_rollback_cce_workload | region, cluster_id, namespace, kind, name | Rollback Deployment/StatefulSet/DaemonSet to previous revision |
| huawei_scale_cce_workload | region, cluster_id, namespace, kind, name, replicas | Scale workload replicas |
| huawei_resize_cce_workload | region, cluster_id, namespace, kind, name | Resize workload resource limits |
| huawei_delete_cce_workload | region, cluster_id, namespace, kind, name | Delete a workload |

### Node Actions

| Action | Required Parameters | Description |
|--------|---------------------|-------------|
| huawei_cce_node_cordon | region, cluster_id, node_name | Mark node as unschedulable |
| huawei_cce_node_uncordon | region, cluster_id, node_name | Mark node as schedulable again |
| huawei_cce_node_drain | region, cluster_id, node_name | Evict all pods from node |
| huawei_reboot_ecs | region, ecs_id | Reboot the underlying ECS instance |

### Node Pool and Cluster Actions

| Action | Required Parameters | Description |
|--------|---------------------|-------------|
| huawei_resize_cce_nodepool | region, cluster_id, nodepool_id, target_count | Resize node pool to target count |
| huawei_hibernate_cce_cluster | region, cluster_id | Hibernate (sleep) the CCE cluster |
| huawei_awake_cce_cluster | region, cluster_id | Awake (wake) the CCE cluster |
| huawei_delete_cce_cluster | region, cluster_id | Delete the CCE cluster |
| huawei_delete_cce_node | region, cluster_id, node_name | Delete a node from the cluster |

### ECS Instance Actions

| Action | Required Parameters | Description |
|--------|---------------------|-------------|
| huawei_start_ecs_instance | region, ecs_id | Start ECS instance |
| huawei_stop_ecs_instance | region, ecs_id | Stop ECS instance |

### Elastic Scaling Policy

| Action | Required Parameters | Description |
|--------|---------------------|-------------|
| huawei_configure_cce_hpa | region, cluster_id, namespace, kind, name, min_replicas, max_replicas | Configure HPA policy for workload |

### Network / Traffic Actions

| Action | Required Parameters | Description |
|--------|---------------------|-------------|
| huawei_bind_cce_cluster_eip | region, cluster_id, eip_id | Bind EIP to cluster for external access |
| huawei_unbind_cce_cluster_eip | region, cluster_id | Unbind EIP from cluster |
| huawei_network_verify_pod_scheduling | region, cluster_id, namespace | Verify pod scheduling network connectivity |

### Security Actions

| Action | Required Parameters | Description |
|--------|---------------------|-------------|
| huawei_hss_change_vul_status | region, vul_id, status | Change HSS vulnerability handling status |

### Verification (Read-Only) Actions

| Action | Required Parameters | Description |
|--------|---------------------|-------------|
| huawei_get_cce_pods | region, cluster_id | List pods in cluster |
| huawei_get_kubernetes_nodes | region, cluster_id | List Kubernetes nodes in cluster |
| huawei_get_cce_events | region, cluster_id | List Kubernetes Events in cluster |
| huawei_workload_rollout_diagnose | region, cluster_id, namespace, kind, name | Diagnose workload rollout status |
| huawei_root_cause_analyze | region, cluster_id | Comprehensive root cause analysis (cross-skill: huawei-cloud-cce-root-cause-analyzer) |
| huawei_dependency_impact_analyze | region, cluster_id | Dependency impact analysis (cross-skill: huawei-cloud-cce-root-cause-analyzer) |
| huawei_node_diagnose | region, cluster_id | Node-level diagnosis |
| huawei_workload_diagnose | region, cluster_id | Workload status diagnosis |

---
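The "Required Parameters" columns in the Core Tools tables lend themselves to a check before any dispatcher call. The sketch below copies the parameter lists for a handful of actions from those tables; the helper `missing_params` is hypothetical, and whether the dispatcher performs the same validation itself is not stated in this README.

```python
# Required parameters for a few actions, copied from the Core Tools tables.
REQUIRED_PARAMS = {
    "huawei_rollback_cce_workload": ("region", "cluster_id", "namespace", "kind", "name"),
    "huawei_scale_cce_workload": ("region", "cluster_id", "namespace", "kind", "name", "replicas"),
    "huawei_cce_node_drain": ("region", "cluster_id", "node_name"),
    "huawei_resize_cce_nodepool": ("region", "cluster_id", "nodepool_id", "target_count"),
    "huawei_reboot_ecs": ("region", "ecs_id"),
}


def missing_params(action, params):
    """Return required parameter names that are absent or None for the given action."""
    if action not in REQUIRED_PARAMS:
        raise KeyError(f"no parameter list recorded for action: {action}")
    # Compare against None so that a legitimate value such as replicas=0 is accepted.
    return [name for name in REQUIRED_PARAMS[action] if params.get(name) is None]


# Hypothetical request that forgot the replica count.
request = {
    "region": "cn-north-4",
    "cluster_id": "cluster-id",
    "namespace": "default",
    "kind": "Deployment",
    "name": "app-server",
}
print(missing_params("huawei_scale_cce_workload", request))
```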

## Parameter Reference

Common Parameters:

| Parameter | Required | Description |
|-----------|----------|-------------|
| region | Yes | Huawei Cloud region, e.g., cn-north-4 |
| cluster_id | Yes* | CCE cluster ID |
| namespace | Yes* | Kubernetes namespace (required for workload actions) |
| kind | Yes* | Workload type: Deployment, StatefulSet, or DaemonSet |
| name | Yes* | Workload name or node name |
| node_name | Yes* | Node name (required for node actions) |
| nodepool_id | Yes* | Node pool ID (required for node pool resize) |
| ecs_id | Yes* | ECS instance ID (required for ECS actions) |
| replicas | Yes* | Target replica count (required for scale) |
| target_count | Yes* | Target node count (required for node pool resize) |
| strategy | Yes* | Remediation strategy (required for auto-remediation) |
| confirm | No | Set to true ONLY after explicit user confirmation |

*Required for specific actions as noted.

Other Parameters (passed via `--params` JSON):

| Parameter | Description |
|-----------|-------------|
| ak | Override AK (uses HW_ACCESS_KEY by default) |
| sk | Override SK (uses HW_SECRET_KEY by default) |
| project_id | Override project ID (auto-obtained via IAM when not set) |
| min_replicas | HPA minimum replicas (required for huawei_configure_cce_hpa) |
| max_replicas | HPA maximum replicas (required for huawei_configure_cce_hpa) |
| vul_id | HSS vulnerability ID (required for huawei_hss_change_vul_status) |
| status | HSS vulnerability handling status (required for huawei_hss_change_vul_status) |
| eip_id | EIP ID for bind action (required for huawei_bind_cce_cluster_eip) |

---
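The defaults named in this README (`cn-north-4` for the region, `HW_CLUSTER_ID` for the cluster) could be merged into a parameter set as sketched below. Whether the dispatcher applies these defaults on its own is not stated here, so treat `with_env_defaults` as a hypothetical caller-side helper. It deliberately never adds `confirm`.

```python
import os


def with_env_defaults(params, env=None):
    """Fill region and cluster_id from the environment when the caller omitted them."""
    env = os.environ if env is None else env
    merged = dict(params)
    if "region" not in merged:
        merged["region"] = env.get("HW_REGION_NAME") or "cn-north-4"
    if "cluster_id" not in merged and env.get("HW_CLUSTER_ID"):
        merged["cluster_id"] = env["HW_CLUSTER_ID"]
    # confirm is never defaulted: it must come from an explicit user confirmation.
    return merged


# Hypothetical environment and request values.
merged = with_env_defaults({"node_name": "node-1"}, {"HW_CLUSTER_ID": "cluster-id"})
print(merged)
```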

## Output Format

### Remediation Preview (confirm=false)

```json
{
  "success": false,
  "requires_confirmation": true,
  "remediation_trace_id": "ARR-...",
  "strategy": "rollback_previous_revision",
  "diagnosis": {},
  "action_result": {},
  "preview": {
    "action": "huawei_rollback_cce_workload",
    "target": {
      "region": "cn-north-4",
      "cluster_id": "cluster-id",
      "namespace": "default",
      "kind": "Deployment",
      "name": "app-server"
    },
    "current_state": {},
    "expected_state": {},
    "impact_scope": {},
    "rollback_method": "Re-apply current revision"
  },
  "risk_level": "R2",
  "rollback_notes": [],
  "summary": "Remediation plan preview — requires user confirmation before execution"
}
```
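A preview response like the one above has to be restated to the user before confirmation. The sketch below turns the documented fields (`preview.action`, `preview.target`, `risk_level`, `preview.rollback_method`) into a short review text; the function name `summarize_preview` is hypothetical and the sample dictionary is a trimmed copy of the README example.

```python
def summarize_preview(response):
    """Build the text a user reviews before confirming; raises if this is not a preview."""
    if not response.get("requires_confirmation"):
        raise ValueError("response is not a preview awaiting confirmation")
    preview = response.get("preview", {})
    target = preview.get("target", {})
    target_text = ", ".join(f"{key}={value}" for key, value in target.items())
    lines = [
        f"Action: {preview.get('action', 'unknown')}",
        f"Target: {target_text or 'unknown'}",
        f"Risk level: {response.get('risk_level', 'unknown')}",
        f"Rollback: {preview.get('rollback_method', 'not stated')}",
    ]
    return "\n".join(lines)


# Trimmed copy of the preview example from this README.
sample = {
    "success": False,
    "requires_confirmation": True,
    "preview": {
        "action": "huawei_rollback_cce_workload",
        "target": {"region": "cn-north-4", "cluster_id": "cluster-id",
                   "namespace": "default", "kind": "Deployment", "name": "app-server"},
        "rollback_method": "Re-apply current revision",
    },
    "risk_level": "R2",
}
text = summarize_preview(sample)
print(text)
```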

### Remediation Execution (confirm=true)

```json
{
  "success": true,
  "requires_confirmation": false,
  "confirmation_received": true,
  "remediation_trace_id": "ARR-...",
  "strategy": "rollback_previous_revision",
  "action_result": {},
  "execution": {
    "action": "huawei_rollback_cce_workload",
    "timestamp": "...",
    "result": {}
  },
  "verification": [
    {
      "method": "huawei_get_cce_pods",
      "status": "healthy",
      "details": {}
    }
  ],
  "report_markdown": "# CCE Auto Remediation Execution Report...",
  "report_file": "optional"
}
```
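One way to read an execution response is to accept it only when the action succeeded and every verification entry reports `healthy`. This README shows only the `healthy` status value, so the sketch below treats anything else as needing review; that is an assumption, and `execution_verified` is a hypothetical helper.

```python
def execution_verified(response):
    """True only if the action succeeded and every listed check reports 'healthy'."""
    checks = response.get("verification")
    # An absent, empty, or non-list verification field means nothing was verified.
    if not response.get("success") or not isinstance(checks, list) or not checks:
        return False
    return all(check.get("status") == "healthy" for check in checks)


# Trimmed copy of the execution example from this README.
executed = {
    "success": True,
    "confirmation_received": True,
    "verification": [{"method": "huawei_get_cce_pods", "status": "healthy", "details": {}}],
}
print(execution_verified(executed))
```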

### Full Auto-Remediation Orchestration Output

```json
{
  "success": false,
  "requires_confirmation": true,
  "remediation_trace_id": "ARR-...",
  "strategy": "rollback_previous_revision",
  "diagnosis": {},
  "action_result": {},
  "summary": "remediation plan or execution result",
  "action": "huawei_auto_remediation_run",
  "risk_level": "R2",
  "target": {
    "region": "cn-north-4",
    "cluster_id": "optional",
    "resource": "optional"
  },
  "preview": {},
  "confirmation_received": false,
  "execution": {},
  "verification": [],
  "rollback_notes": [],
  "report_markdown": "# CCE Auto Remediation Execution Report...",
  "report_file": "optional"
}
```
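The `success`, `requires_confirmation`, and `confirmation_received` fields together indicate where an orchestration run stands. The sketch below maps them to a next step; the step labels and the function name `next_step` are hypothetical and not part of the skill's output.

```python
def next_step(response):
    """Map the documented status fields to a hypothetical next-step label."""
    if response.get("requires_confirmation") and not response.get("confirmation_received"):
        return "await_user_confirmation"
    if response.get("success"):
        return "review_verification"
    return "report_failure"


# Status fields as they appear in the orchestration example from this README.
orchestration = {
    "success": False,
    "requires_confirmation": True,
    "confirmation_received": False,
    "action": "huawei_auto_remediation_run",
}
print(next_step(orchestration))
```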

---

## Verification

1. Run the environment check script to confirm dependencies and credentials are available
2. Use `huawei_cce_node_cordon` (without `confirm=true`) on a test node to verify preview mode returns `requires_confirmation: true`
3. After user confirmation, execute with `confirm=true` and verify node status with `huawei_get_kubernetes_nodes`
4. Use `huawei_rollback_cce_workload` preview mode to verify it shows current vs expected state
5. After rollback execution, verify workload health with `huawei_workload_rollout_diagnose`
6. Use `huawei_auto_remediation_run` preview mode to verify multi-step orchestration plan is shown before execution
7. Confirm that all R3 actions (drain, reboot, delete, hibernate) require explicit user confirmation
8. Verify that post-execution verification actions return healthy/expected status
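Steps 2 and 3 above (preview, confirm, execute, verify) can be sketched as one function. Here `dispatch` and `ask_user` are hypothetical callables supplied by the caller, and `fake_dispatch` only imitates the documented response fields so the flow can be exercised without touching a cluster.

```python
def remediate(dispatch, ask_user, action, params, verify_action):
    """Preview, ask, execute, verify. `dispatch(action, params)` must return a dict."""
    preview = dispatch(action, dict(params))  # no confirm key: preview only
    if not preview.get("requires_confirmation"):
        raise RuntimeError("expected a preview that requires confirmation")
    if not ask_user(preview):
        return {"executed": False, "preview": preview}
    result = dispatch(action, {**params, "confirm": True})
    verify_params = {key: params[key] for key in ("region", "cluster_id") if key in params}
    verification = dispatch(verify_action, verify_params)
    return {"executed": True, "result": result, "verification": verification}


calls = []


def fake_dispatch(action, params):
    """Stand-in for the real dispatcher; records calls and returns canned responses."""
    calls.append((action, dict(params)))
    if action == "huawei_cce_node_cordon" and not params.get("confirm"):
        return {"success": False, "requires_confirmation": True}
    return {"success": True}


outcome = remediate(
    fake_dispatch,
    lambda preview: True,  # stands in for an explicit "yes" from the user
    "huawei_cce_node_cordon",
    {"region": "cn-north-4", "cluster_id": "cluster-id", "node_name": "node-1"},
    "huawei_get_kubernetes_nodes",
)
print(outcome["executed"], [name for name, _ in calls])
```

If the user declines, the function returns after the preview and the mutation call is never made.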

---

## Best Practices

1. **Always preview first**: Never call any mutation action with `confirm=true` on the first invocation. Always preview without `confirm=true` first
2. **State the essentials**: Before confirmation, restate the action, object, parameters, impact scope, and rollback plan to the user
3. **Prefer rollback for deployment failures**: If the root cause reported by `huawei-cloud-cce-root-cause-analyzer` is a startup command, CrashLoop, probe, or image problem that makes the new version unavailable, prefer `huawei_auto_remediation_run` with the `rollback_previous_revision` strategy
4. **Verify after execution**: Every execution must be followed by read-only verification (Pod status, Node status, Events, workload rollout diagnosis)
5. **Classify risk correctly**: Refer to `references/risk-rules.md` for R1/R2/R3 classification; apply appropriate confirmation requirements
6. **Never auto-add confirm**: Deployment rollback, scale, resize, resource modification, delete cluster/node/workload, drain, reboot, and HSS vulnerability status change must all be preview → user confirm → execute → verify
7. **Use auto-remediation orchestration for multi-step plans**: When remediation involves multiple actions, use `huawei_auto_remediation_run` to produce a complete execution report with diagnosis basis, action results, and verification results
8. **Cross-skill handoff for diagnosis**: When root cause analysis is needed before remediation, hand off to `huawei-cloud-cce-root-cause-analyzer`; this skill only executes confirmed remediation actions
9. **Document rollback notes**: Every execution plan must include the rollback method, that is, how to revert if the remediation causes unintended effects
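For the risk classification practice, this README names drain, reboot, delete, and hibernate as R3 examples and scale, resize, cordon, and rollback as R2 examples. The sketch below looks up only those examples by matching whole words in the action name. The authoritative rules live in `references/risk-rules.md`, which is not reproduced here, so the function returns `None` for anything it cannot place; `example_risk_level` is a hypothetical name.

```python
# Example keywords taken from this README; the full rules are in references/risk-rules.md.
R3_KEYWORDS = ("drain", "reboot", "delete", "hibernate")
R2_KEYWORDS = ("scale", "resize", "cordon", "rollback")


def example_risk_level(action):
    """Return 'R3', 'R2', or None when the README examples do not cover the action."""
    parts = action.split("_")  # whole-word match, so "uncordon" does not match "cordon"
    if any(keyword in parts for keyword in R3_KEYWORDS):
        return "R3"
    if any(keyword in parts for keyword in R2_KEYWORDS):
        return "R2"
    return None


for name in ("huawei_cce_node_drain", "huawei_scale_cce_workload", "huawei_cce_node_uncordon"):
    print(name, example_risk_level(name))
```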

---

## Reference Documents

- Workflow and action orchestration steps: `references/workflow.md`
- Risk classification and confirm=true rules: `references/risk-rules.md`
- Output execution record schema: `references/output-schema.md`
- [Huawei Cloud CCE Documentation](https://support.huaweicloud.com/cce/index.html)
- [Huawei Cloud Python SDK Documentation](https://support.huaweicloud.com/api-cce/cce_02_0113.html)

---

## Notes

1. This skill is a **MUTATION skill**: it performs write actions (drain, cordon, scale, restart, delete, reboot, hibernate, vulnerability status change). The preview+confirm workflow is mandatory
2. Do not output the values of HW_ACCESS_KEY, HW_SECRET_KEY, HW_SECURITY_TOKEN, or other environment variables
3. All scripts must be executed via `skill action=exec`; do not run them directly in a shell
4. NEVER auto-add `confirm=true`. The user must explicitly confirm the specific action, object, and risks
5. The environment check script must be run before any remediation action
6. When using temporary AK/SK, HW_SECURITY_TOKEN must be set
7. After execution, you must call read-only verification actions to confirm status
8. Cross-skill references: diagnosis → `huawei-cloud-cce-root-cause-analyzer`; domain-specific diagnosis → `huawei-cloud-cce-pod-failure-diagnoser`, `huawei-cloud-cce-node-failure-diagnoser`, `huawei-cloud-cce-network-failure-diagnoser`

---

## Common Pitfalls

1. **Auto-adding confirm=true**: The most critical pitfall. NEVER assume user intent implies confirmation. Always preview first, show results, and wait for explicit user confirmation
2. **Skipping preview for R2 actions**: Even medium-risk actions (scale, resize, cordon, rollback) require preview. No mutation action may skip the preview step
3. **Not verifying after execution**: Every R2/R3 execution must be followed by read-only verification (Pod/Node/Workload/Events status). Skipping verification leaves remediation unconfirmed
4. **Batch or fuzzy-target remediation**: R3 actions (drain, reboot, delete, hibernate) must have explicit, specific target objects. Never execute with vague or batch targets without per-object confirmation
5. **Not documenting rollback method**: Every remediation plan must state how to revert if the action causes unintended effects. Omitting rollback notes is a safety hazard
6. **Executing remediation without diagnosis**: Always confirm root cause via `huawei-cloud-cce-root-cause-analyzer` or domain diagnoser before remediation. Blind remediation without evidence is prohibited
7. **Confusing R2 and R3 risk levels**: R2 (runtime impact) requires preview+confirm; R3 (destructive) requires explicit per-object confirmation with additional verification. See `references/risk-rules.md`
8. **Not restating the plan to the user**: Before requesting confirmation, restate the action, target object, region, cluster_id, expected impact, and rollback plan. The user must confirm every one of these items
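The per-object confirmation rule from the pitfalls above could be enforced as in the sketch below: each target is asked about separately, and only an explicit `True` counts as consent. `confirm_each` and the `ask_user` callable are hypothetical names.

```python
def confirm_each(targets, ask_user):
    """Ask about every target separately; return only those explicitly confirmed."""
    confirmed = []
    for target in targets:
        # Anything other than an explicit True (None, "", "maybe") counts as "no".
        if ask_user(target) is True:
            confirmed.append(target)
    return confirmed


# Hypothetical node names; the stand-in "user" approves only node-1.
approved = confirm_each(["node-1", "node-2"], lambda target: target == "node-1")
print(approved)
```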
Usage Guidance
Review this skill carefully before installing. Use only isolated test environments or least-privileged, disposable Huawei Cloud credentials until the credential exposure, TLS verification, secret/kubeconfig export, and missing confirmation gates are fixed. Do not point it at production clusters unless you are prepared for it to read secrets, write local reports/credentials, and make live infrastructure changes.
Capability Tags
requires-wallet, requires-sensitive-credentials
Capability Assessment
Purpose & Capability
The stated purpose is preview-first CCE remediation, which explains cloud credentials and mutation authority, but the dispatcher exposes broader capabilities including kubeconfig export, Kubernetes Secret data access, addon changes, and node creation that are not clearly disclosed in the advertised tool profile.
Instruction Scope
SKILL.md repeatedly promises preview-first, confirm-required mutation handling, but verified registered actions can directly call live create/update APIs without an internal confirmation gate, and sensitive read actions lack warning or confirmation.
Install Mechanism
No hidden installer or startup persistence was found, but the documented check_env.sh/check_env.ps1 environment validation scripts are not present in the artifact, which weakens the install/runtime story.
Credentials
The skill requires Huawei Cloud AK/SK and cluster access, but code paths embed AK/SK in generated subagent commands/prompts, return kubeconfig/Secret material, and disable TLS verification in multiple Kubernetes client paths.
Persistence & Privilege
Some local reporting is expected for diagnostics, but the artifact writes caller-controlled reports/raw inventories and writes kubeconfig/certificate material to temporary paths, including predictable /tmp kubeconfig paths, despite documentation saying credentials should not be persisted.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install huawei-cloud-cce-auto-remediation-runner
  3. After installation, invoke the skill by name or use /huawei-cloud-cce-auto-remediation-runner
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.1.2
Initial release
v0.1.1
Initial release
v0.1.0
Initial release
Metadata
Slug huawei-cloud-cce-auto-remediation-runner
Version 0.1.2
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 3
Frequently Asked Questions

What is Huawei Cloud Cce Auto Remediation Runner?

Huawei Cloud CCE auto-remediation runner skill that converts remediation intent into preview-first, confirm-required, post-verify execution plans. Use this s... It is an AI Agent Skill for Claude Code / OpenClaw, with 38 downloads so far.

How do I install Huawei Cloud Cce Auto Remediation Runner?

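Run "/install huawei-cloud-cce-auto-remediation-runner" in the OpenClaw or Claude Code chat to install it in one step. Installation itself needs no extra setup, but the README requires running the environment check script and setting the Huawei Cloud credentials before any action is used.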

Is Huawei Cloud Cce Auto Remediation Runner free?

Yes, Huawei Cloud Cce Auto Remediation Runner is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Huawei Cloud Cce Auto Remediation Runner support?

Huawei Cloud Cce Auto Remediation Runner is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Huawei Cloud Cce Auto Remediation Runner?

It is built and maintained by shijingcheng (@pintudeyudi); the current version is v0.1.2.
