Huawei Cloud CCE Alarm Correlation Engine
/install huawei-cloud-cce-alarm-correlation-engine
⚠️ Execution Method (Must Read): This skill executes all actions, both queries and mutations, through the local Python dispatcher script. Using hcloud, openstack, other CLI tools, or direct API calls is prohibited.
- The dispatcher script is located at `scripts/huawei-cloud.py` within the skill directory.
- All scripts, including environment check scripts, are inside the skill package. You must use `skill action=exec` to execute them; do not run them directly in a shell.
- Do not attempt hcloud, openstack, curl calls against IAM, or any other CLI/API methods. This skill does not depend on those tools.
- All paths are relative to the skill directory, which is the directory where this SKILL.md is located.
Overview
This skill correlates Huawei Cloud AOM active and historical alarms for CCE, turning raw alarm event streams into actionable alarm leads. The core principle is to consider both active and historical alarms, so that resource alarms that have already recovered but still matter for diagnosis are not missed.
This skill has both read-only tools (alarm query, analysis, inspection, rule query) and mutation tools (alarm rule create/update/delete/enable/disable, action rule delete). Every mutation operation requires a two-step confirmation workflow with `confirm=true`.
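To illustrate why merging active and history alarms matters, here is a minimal sketch of a merge-and-deduplicate step. It assumes a hypothetical alarm shape (dicts with `event_name`, `resource_id`, and `starts_at` in epoch seconds) and a hypothetical dedup key of `(event_name, resource_id)`; it is not the AOM response schema or the dispatcher's actual logic.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical dedup key: (event_name, resource_id). Not the real AOM schema.
Key = Tuple[str, str]


def _prefer(new: dict, old: dict) -> bool:
    """Return True if `new` should replace `old` for the same dedup key."""
    if new["source"] != old["source"]:
        return new["source"] == "active"  # a firing alarm beats a resolved one
    return new["starts_at"] >= old["starts_at"]  # same source: keep the newest


def merge_alarms(active: Iterable[dict], history: Iterable[dict]) -> List[dict]:
    """Merge active and history alarms into one record per dedup key.

    History-only alarms are kept, so resource alarms that already recovered
    still show up. Each record gains "source" and "occurrences" fields.
    """
    tagged = [dict(a, source="active") for a in active]
    tagged += [dict(h, source="history") for h in history]

    merged: Dict[Key, dict] = {}
    counts: Dict[Key, int] = {}
    for alarm in tagged:
        key = (alarm["event_name"], alarm["resource_id"])
        counts[key] = counts.get(key, 0) + 1
        if key not in merged or _prefer(alarm, merged[key]):
            merged[key] = alarm

    records = [dict(alarm, occurrences=counts[key]) for key, alarm in merged.items()]
    # Newest first, which is usually the most useful order for triage.
    return sorted(records, key=lambda a: a["starts_at"], reverse=True)
```

Letting active records win means a still-firing alarm is never masked by an older resolved copy, while the `occurrences` count keeps the recurrence signal that history alarms carry.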
Related Skills
| Skill | Purpose |
|---|---|
| huawei-cloud-cce-pod-failure-diagnoser | Pod-level failure diagnosis (CrashLoopBackOff, ImagePullBackOff, etc.) |
| huawei-cloud-cce-node-failure-diagnoser | Node failure diagnosis (NotReady, resource pressure, NPD events) |
| huawei-cloud-cce-network-failure-diagnoser | Network failure diagnosis (Ingress 502/504, ELB anomalies) |
| huawei-cloud-cce-auto-remediation-runner | Execute remediation actions (scale, reboot, drain) |
| huawei-cloud-cce-root-cause-analyzer | Multi-category alarm root cause analysis |
| huawei-cloud-cce-observability-context-builder | Observability context enrichment |
Capabilities
- Query active + history alarms, merged and deduplicated (`huawei_list_aom_alarms`)
- Query current active alarms only (`huawei_list_aom_current_alarms`)
- Analyze alarms: deduplication, severity grouping, burst/steady identification (`huawei_analyze_aom_alarms`)
- Query, create, update, delete, enable, and disable AOM alarm rules (mutations require `confirm=true`)
- Create AOM event alarm rules that reference the CCE event list (`huawei_create_aom_event_alarm_rule`)
- Query and delete AOM action/notification rules (delete requires `confirm=true`)
- Query AOM mute rules (`huawei_list_aom_mute_rules`)
- Inspect CCE cluster alarms with a risk summary (`huawei_aom_alarm_inspection`)
Typical Use Cases
- Query all alarms (active + history) for a CCE cluster and group by severity
- Reduce alarm storm noise by deduplicating and classifying burst vs. steady alarms
- Inspect a CCE cluster for alarm health risks
- Create, update, or delete AOM alarm rules with preview + confirmation workflow
- Check whether action rules or mute rules are suppressing alarms and causing notification gaps
- Create event alarm rules for CCE workload, node, network, storage, or autoscaling events
Prerequisites
Python Dependencies
The dispatcher script requires Python >= 3.6 and the following packages:
- huaweicloudsdkcore
- huaweicloudsdkaom
- huaweicloudsdkiam
Credential Configuration
| Variable | Required | Description |
|---|---|---|
| HUAWEI_AK | Yes | Huawei Cloud Access Key |
| HUAWEI_SK | Yes | Huawei Cloud Secret Key |
| HUAWEI_REGION | No | Default region (overrides region param if set) |
| HUAWEI_PROJECT_ID | No | Project ID (auto-obtained via IAM API when not set) |
| HUAWEI_SECURITY_TOKEN | No | Required when using temporary AK/SK |
🚫 Never expose or log AK/SK values. Credentials exist only in the current request call stack and are released after each invocation. Do not write credentials to files, logs, or responses.
✅ Use environment variables HUAWEI_AK / HUAWEI_SK for authentication. The dispatcher reads them automatically.
Security rules for credentials:
- No persistent storage — never write AK/SK, tokens, or certificates to disk files
- No long-term memory cache — AK/SK exists only during the current API call and is released afterward
- Project ID memory cache only — only non-sensitive project IDs may be cached in process memory (never written to disk)
- No log leakage — never include AK/SK in logs, response output, or error messages
- Output desensitization — output only alarm, resource, and rule information; never expose authentication credentials
AK/SK may be provided in two ways:
- Via environment variables `HUAWEI_AK` / `HUAWEI_SK` (recommended)
- Via per-call parameters `ak` and `sk` (not recommended for production)
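The credential and region precedence described above can be sketched as follows. This is a minimal illustration, not the dispatcher's code: it assumes environment variables take precedence over per-call `ak`/`sk` (consistent with "environment variable preferred"), that `HUAWEI_REGION` overrides the `region` parameter as the table states, and that a missing project ID is left as `None` for a later IAM lookup. The helpers `resolve_settings` and `redact` are hypothetical.

```python
import os
from typing import Iterable, Mapping, Optional


class MissingCredentials(Exception):
    """Raised when neither the environment nor the call parameters supply AK/SK."""


def resolve_settings(params: Mapping[str, str],
                     env: Mapping[str, str] = os.environ) -> dict:
    """Resolve credentials and scope for a single call (kept in memory only)."""
    # Assumed precedence: environment first, per-call ak/sk only as a fallback.
    ak = env.get("HUAWEI_AK") or params.get("ak")
    sk = env.get("HUAWEI_SK") or params.get("sk")
    if not ak or not sk:
        # The message names the variables, never their values.
        raise MissingCredentials("set HUAWEI_AK and HUAWEI_SK")
    # HUAWEI_REGION overrides the region parameter when set.
    region = env.get("HUAWEI_REGION") or params.get("region")
    if not region:
        raise ValueError("region is required")
    project_id: Optional[str] = env.get("HUAWEI_PROJECT_ID") or params.get("project_id")
    return {
        "ak": ak,
        "sk": sk,
        "security_token": env.get("HUAWEI_SECURITY_TOKEN"),  # temporary AK/SK only
        "region": region,
        "project_id": project_id,  # None means: look it up via IAM
    }


def redact(text: str, secrets: Iterable[Optional[str]]) -> str:
    """Mask any secret value before text reaches logs, errors, or responses."""
    for secret in secrets:
        if secret:
            text = text.replace(secret, "***")
    return text
```

Passing `env` explicitly keeps the function testable without touching the real process environment; nothing it returns is written to disk.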
IAM Permissions
| Permission | Description |
|---|---|
| aom:event:list | Query AOM alarm events |
| aom:alarmRule:list | Query AOM alarm rules |
| aom:alarmRule:create | Create AOM alarm rules |
| aom:alarmRule:update | Update AOM alarm rules |
| aom:alarmRule:delete | Delete AOM alarm rules |
| aom:actionRule:list | Query AOM action rules |
| aom:muteRule:list | Query AOM mute rules |
| cce:cluster:list | Get the cluster name and auxiliary filter info from the cluster ID |
Core Tools
All actions are invoked via the dispatcher script:
```bash
python3 scripts/huawei-cloud.py <action> region=<region> [key=value ...]
```
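As an illustration of the `<action> key=value ...` convention, the sketch below splits an argument vector into an action name and a parameter dict. The function `parse_dispatcher_args` is hypothetical; the real dispatcher may validate or type-convert values differently. Splitting on the first `=` only keeps JSON values such as `updates={...}` intact.

```python
import shlex
from typing import Dict, List, Tuple


def parse_dispatcher_args(argv: List[str]) -> Tuple[str, Dict[str, str]]:
    """Split ["<action>", "key=value", ...] into (action, params)."""
    if not argv:
        raise ValueError("missing action")
    action, *pairs = argv
    params: Dict[str, str] = {}
    for pair in pairs:
        # partition() splits on the first "=" only, so values may contain "=".
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key] = value
    if "region" not in params:
        raise ValueError("region is required for every action")
    return action, params


# The shell removes the quotes in comparison_operator='>' before Python sees it;
# shlex.split mimics that here.
argv = shlex.split(
    "huawei_create_aom_alarm_rule region=cn-north-4 comparison_operator='>' threshold=80"
)
action, params = parse_dispatcher_args(argv)
print(action, params)
```

Quoting `'>'` in the examples matters because an unquoted `>` would be read by the shell as output redirection.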
Two-Step Confirmation Workflow for Mutation Operations
Mutation operations (create, update, delete, enable, and disable alarm rules; delete action rules) require `confirm=true` to execute. Without `confirm`, the tool returns only a preview and a confirmation prompt.
Step 1: Preview — call without confirm:
```bash
python3 scripts/huawei-cloud.py huawei_create_aom_alarm_rule \
region=cn-north-4 rule_name=my-rule metric_name=cpuUsage \
namespace=PAAS.NODE comparison_operator='>' threshold=80 \
period=60 evaluation_periods=3 statistic=average alarm_level=2
```
Returns an operation preview (target rule and rule fields) plus an example confirmation command. No rule is actually created.
Step 2: Confirm execution — call again with confirm=true:
```bash
python3 scripts/huawei-cloud.py huawei_create_aom_alarm_rule \
region=cn-north-4 rule_name=my-rule metric_name=cpuUsage \
namespace=PAAS.NODE comparison_operator='>' threshold=80 \
period=60 evaluation_periods=3 statistic=average alarm_level=2 \
confirm=true
```
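The two-step gate described above can be modeled in a few lines. This is a minimal sketch under stated assumptions: `run_mutation` and the `execute` callback are hypothetical stand-ins for the dispatcher's internals, and only the exact string `true` counts as confirmation, matching "must be explicitly set to true". Credentials are dropped from the preview so they never appear in output.

```python
import shlex
from typing import Callable, Dict, Mapping

SECRET_KEYS = {"ak", "sk"}


def run_mutation(action: str,
                 params: Mapping[str, str],
                 execute: Callable[[Dict[str, str]], dict]) -> dict:
    """Return a preview unless confirm=true; only then call `execute`."""
    fields = {k: v for k, v in params.items()
              if k != "confirm" and k not in SECRET_KEYS}
    if params.get("confirm") != "true":
        # Step 1: preview only. Nothing is created, changed, or deleted.
        example = " ".join([action,
                            *(shlex.quote(f"{k}={v}") for k, v in fields.items()),
                            "confirm=true"])
        return {"status": "preview", "action": action, "fields": fields,
                "confirm_example": example}
    # Step 2: explicit confirmation was given; perform the real call.
    return {"status": "executed", "action": action, "result": execute(fields)}
```

Using `shlex.quote` for the example command means a value such as `comparison_operator=>` is emitted as `'comparison_operator=>'`, so the suggested command is safe to paste.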
Operations Requiring Confirmation
| Tool | Operation | Risk Level | Description |
|---|---|---|---|
| huawei_create_aom_alarm_rule | Create | 🟡 Medium | Create a new AOM alarm rule; may introduce new alarm notifications |
| huawei_create_aom_event_alarm_rule | Create | 🟡 Medium | Create an AOM event alarm rule; may introduce new event notifications |
| huawei_update_aom_alarm_rule | Update | 🟠 High | Update an AOM alarm rule's threshold, toggle, notification action, description, etc. |
| huawei_delete_aom_alarm_rule | Delete | 🔴 High | Delete an AOM alarm rule; may prevent future alarms from triggering |
| huawei_disable_aom_alarm_rule | Disable | 🔴 High | Disable an AOM alarm rule; may stop related alarms from triggering |
| huawei_enable_aom_alarm_rule | Enable | 🟠 High | Enable an AOM alarm rule; may restore and trigger alarm notifications |
| huawei_delete_aom_action_rule | Delete | 🔴 High | Delete an AOM notification action rule; may prevent alarm notifications |
Prohibited Actions
| Action | Description |
|---|---|
| Create/update action rules | Do not create or update notification action rules |
| Modify mute rules | Do not create, update, or delete mute rules |
| Execute remediation actions | Do not scale, reboot, drain, or delete workloads or nodes |
| Modify cluster resources | Do not change CCE, ECS, ELB, EIP, VPC, security groups, etc. |
If analysis results require scaling, rebooting, draining, vulnerability status changes, or other remediation actions, output recommendations only and hand off to huawei-cloud-cce-auto-remediation-runner for preview, confirmation, and post-verification.
Alarm Query and Correlation (Read-Only)
| Action | Description | Cluster Filter | Required Params |
|---|---|---|---|
| huawei_list_aom_alarms | Query active + history alarms, merged and deduplicated | Supports cluster_id | region |
| huawei_list_aom_current_alarms | Query current active alarms only | Supports cluster_id | region |
| huawei_analyze_aom_alarms | Analyze alarms: deduplication, severity grouping, burst/steady identification | Supports cluster_id | region |
```bash
# Query active + history alarms in a region
python3 scripts/huawei-cloud.py huawei_list_aom_alarms region=cn-north-4
# Query alarms for a specific cluster
python3 scripts/huawei-cloud.py huawei_list_aom_alarms \
region=cn-north-4 cluster_id=xxx
# Query current active alarms for a cluster
python3 scripts/huawei-cloud.py huawei_list_aom_current_alarms \
region=cn-north-4 cluster_id=xxx
# Analyze alarms for a cluster (burst, attention, steady groups)
python3 scripts/huawei-cloud.py huawei_analyze_aom_alarms \
region=cn-north-4 cluster_id=xxx
```
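This document does not define how `huawei_analyze_aom_alarms` separates burst, attention, and steady alarms. The sketch below shows one plausible approach under assumed thresholds: a group is *burst* if at least 5 occurrences start within 10 minutes, *steady* if any single occurrence has been firing for an hour or more, and *attention* otherwise. The thresholds, the field names (`starts_at`, `ends_at` in epoch seconds, with `ends_at=None` while firing), and the grouping key are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str]


def classify_alarm_groups(alarms: List[dict], now: int,
                          burst_count: int = 5,
                          burst_window: int = 600,
                          steady_seconds: int = 3600) -> Dict[str, List[Key]]:
    """Group alarms by (event_name, resource_id) and label each group."""
    groups: Dict[Key, List[dict]] = defaultdict(list)
    for alarm in alarms:
        groups[(alarm["event_name"], alarm["resource_id"])].append(alarm)

    result: Dict[str, List[Key]] = {"burst": [], "steady": [], "attention": []}
    for key, items in groups.items():
        starts = sorted(a["starts_at"] for a in items)
        # Burst: some run of `burst_count` consecutive starts fits in the window.
        is_burst = any(
            starts[i + burst_count - 1] - starts[i] <= burst_window
            for i in range(len(starts) - burst_count + 1)
        )
        # Steady: the longest single occurrence (still-firing ones run until `now`).
        longest = max(
            (a["ends_at"] if a["ends_at"] is not None else now) - a["starts_at"]
            for a in items
        )
        if is_burst:
            result["burst"].append(key)
        elif longest >= steady_seconds:
            result["steady"].append(key)
        else:
            result["attention"].append(key)
    return result
```

Burst is checked first so that a flapping alarm is not mistaken for a long-running one; the thresholds would need tuning to a cluster's normal alarm rate.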
Alarm Rule Management (Mutation Requires confirm=true)
Event alarm rule constraints:
- When creating event alarm rules via `huawei_create_aom_event_alarm_rule`, the `event_name` must follow the event list and naming format in `references/cce-event-list.md` (recommended format: `Chinese event description##Event name`).
- Event alarm rules are created with alarm noise reduction enabled by default (`route_group_enable=true`).
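A pre-flight check of the `Chinese event description##Event name` shape can catch a malformed `event_name` before a preview is requested. This sketch (`split_event_name` is a hypothetical helper) checks only the shape: exactly one `##` separator, two non-empty parts, and at least one CJK character in the description. It does not verify that the event actually appears in `references/cce-event-list.md`.

```python
import re
from typing import Tuple

# Basic CJK Unified Ideographs block; enough to tell a Chinese description
# from an English one in this sketch.
_CJK = re.compile(r"[\u4e00-\u9fff]")


def split_event_name(event_name: str) -> Tuple[str, str]:
    """Return (description, event name) or raise ValueError on a bad shape."""
    parts = [p.strip() for p in event_name.split("##")]
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'description##EventName', got {event_name!r}")
    description, name = parts
    if not _CJK.search(description):
        raise ValueError("the part before '##' should be the Chinese event description")
    return description, name
```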
Metric alarm rule constraints:
- When creating metric alarm rules via `huawei_create_aom_alarm_rule`, base PromQL expressions and metric thresholds on `references/cce-prometheus-metric-alarms.md`.
| Action | Description | Risk Level | Requires confirm | Required Params |
|---|---|---|---|---|
| huawei_list_aom_alarm_rules | Query AOM alarm rules | 🟢 Low | No | region |
| huawei_create_aom_alarm_rule | Create AOM metric alarm rule | 🟡 Medium | Yes | region, rule_name, metric_name, namespace, comparison_operator, threshold, period, evaluation_periods, statistic, alarm_level |
| huawei_create_aom_event_alarm_rule | Create AOM event alarm rule | 🟡 Medium | Yes | region, rule_name, event_name, namespace |
| huawei_update_aom_alarm_rule | Update AOM alarm rule | 🟠 High | Yes | region, rule_name |
| huawei_delete_aom_alarm_rule | Delete AOM alarm rule | 🔴 High | Yes | region, rule_name |
| huawei_disable_aom_alarm_rule | Disable AOM alarm rule | 🔴 High | Yes | region, rule_id |
| huawei_enable_aom_alarm_rule | Enable AOM alarm rule | 🟠 High | Yes | region, rule_id |
| huawei_list_aom_action_rules | Query AOM action/notification rules | 🟢 Low | No | region |
| huawei_delete_aom_action_rule | Delete AOM notification action rule | 🔴 High | Yes | region, rule_name |
| huawei_list_aom_mute_rules | Query AOM mute rules | 🟢 Low | No | region |
```bash
# Query alarm rules
python3 scripts/huawei-cloud.py huawei_list_aom_alarm_rules region=cn-north-4
# Preview create alarm rule (no execution)
python3 scripts/huawei-cloud.py huawei_create_aom_alarm_rule \
region=cn-north-4 rule_name=my-rule metric_name=cpuUsage \
namespace=PAAS.NODE comparison_operator='>' threshold=80 \
period=60 evaluation_periods=3 statistic=average alarm_level=2
# Confirm create alarm rule
python3 scripts/huawei-cloud.py huawei_create_aom_alarm_rule \
region=cn-north-4 rule_name=my-rule metric_name=cpuUsage \
namespace=PAAS.NODE comparison_operator='>' threshold=80 \
period=60 evaluation_periods=3 statistic=average alarm_level=2 \
confirm=true
# Preview update alarm rule
python3 scripts/huawei-cloud.py huawei_update_aom_alarm_rule \
region=cn-north-4 rule_name=my-rule threshold=80
# Confirm update alarm rule
python3 scripts/huawei-cloud.py huawei_update_aom_alarm_rule \
region=cn-north-4 rule_name=my-rule threshold=80 confirm=true
# Preview delete alarm rule
python3 scripts/huawei-cloud.py huawei_delete_aom_alarm_rule \
region=cn-north-4 rule_name=xxx
# Confirm delete alarm rule
python3 scripts/huawei-cloud.py huawei_delete_aom_alarm_rule \
region=cn-north-4 rule_name=xxx confirm=true
# Preview disable alarm rule
python3 scripts/huawei-cloud.py huawei_disable_aom_alarm_rule \
region=cn-north-4 rule_id=xxx
# Confirm disable alarm rule
python3 scripts/huawei-cloud.py huawei_disable_aom_alarm_rule \
region=cn-north-4 rule_id=xxx confirm=true
# Preview enable alarm rule
python3 scripts/huawei-cloud.py huawei_enable_aom_alarm_rule \
region=cn-north-4 rule_id=xxx
# Confirm enable alarm rule
python3 scripts/huawei-cloud.py huawei_enable_aom_alarm_rule \
region=cn-north-4 rule_id=xxx confirm=true
# Query action rules
python3 scripts/huawei-cloud.py huawei_list_aom_action_rules region=cn-north-4
# Preview delete action rule
python3 scripts/huawei-cloud.py huawei_delete_aom_action_rule \
region=cn-north-4 rule_name=xxx
# Confirm delete action rule
python3 scripts/huawei-cloud.py huawei_delete_aom_action_rule \
region=cn-north-4 rule_name=xxx confirm=true
# Query mute rules
python3 scripts/huawei-cloud.py huawei_list_aom_mute_rules region=cn-north-4
```
Cluster Alarm Inspection (Read-Only)
| Action | Description | Required Params |
|---|---|---|
| huawei_aom_alarm_inspection | Inspect AOM alarms for a CCE cluster and output risk items | region, cluster_id |
```bash
# Inspect alarms for a specific cluster
python3 scripts/huawei-cloud.py huawei_aom_alarm_inspection \
region=cn-north-4 cluster_id=xxx
```
Parameter Reference
Alarm Query Parameters
| Parameter | Required | Description |
|---|---|---|
| region | Yes | Huawei Cloud region (e.g., cn-north-4) |
| cluster_id | No | CCE cluster ID; when provided, only alarms related to this cluster are returned |
| ak | No | Access Key ID; the HUAWEI_AK environment variable is preferred |
| sk | No | Secret Access Key; the HUAWEI_SK environment variable is preferred |
| project_id | No | Huawei Cloud project ID; obtained automatically via the IAM API when not provided |
Alarm Rule Mutation Parameters
| Parameter | Required | Description |
|---|---|---|
| region | Yes | Huawei Cloud region |
| rule_name | Yes (create, update, delete) | Alarm rule name |
| rule_id | Yes (enable, disable) | Alarm rule ID |
| metric_name | Yes (create metric rule) | Metric name (e.g., cpuUsage) |
| namespace | Yes (create) | Metric namespace (e.g., PAAS.NODE) |
| event_name | Yes (create event rule) | Event name; see references/cce-event-list.md for the naming format |
| comparison_operator | Yes (create metric rule) | Threshold comparison operator (e.g., `>`, `<`, `>=`, `<=`) |
| threshold | Yes (create metric rule) | Alarm threshold value |
| period | Yes (create metric rule) | Statistics period in seconds (recommended: 60) |
| evaluation_periods | Yes (create metric rule) | Number of consecutive periods that must meet the condition before the alarm triggers (recommended: 3) |
| statistic | Yes (create metric rule) | Statistics method (e.g., average, max, min) |
| alarm_level | Yes (create) | Alarm severity level (1=Critical, 2=Major, 3=Minor, 4=Info) |
| fields | No (create) | Additional JSON fields for rule creation, e.g., `{"unit":"%","is_turn_on":true}` |
| updates | No (update) | JSON fields for a batch update, e.g., `{"threshold":"80","is_turn_on":true}` |
| enterprise_project_id | No (list action rules) | Enterprise project scope; defaults to all_granted_eps |
| confirm | No | Must be explicitly set to true for a mutation operation to execute |
| ak | No | Access Key ID |
| sk | No | Secret Access Key |
| project_id | No | Huawei Cloud project ID |
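To make the interaction of `period`, `statistic`, `comparison_operator`, `threshold`, and `evaluation_periods` concrete, here is a simplified model: raw samples are bucketed into `period`-second windows, each window is reduced with `statistic`, and the rule fires only when the most recent `evaluation_periods` windows are consecutive and all breach the threshold. This is an assumption about the semantics the parameter descriptions imply, not AOM's actual evaluator; `should_fire` is a hypothetical name.

```python
import operator
from statistics import mean
from typing import Dict, List, Sequence, Tuple

OPERATORS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}
STATISTICS = {"average": mean, "max": max, "min": min}


def should_fire(samples: Sequence[Tuple[int, float]], comparison_operator: str,
                threshold: float, period: int, evaluation_periods: int,
                statistic: str) -> bool:
    """Return True if the last `evaluation_periods` periods all breach the threshold."""
    if period <= 0 or evaluation_periods < 1:
        raise ValueError("period and evaluation_periods must be positive")
    # Bucket samples by period index (timestamp in seconds // period length).
    buckets: Dict[int, List[float]] = {}
    for timestamp, value in samples:
        buckets.setdefault(timestamp // period, []).append(value)
    latest = sorted(buckets)[-evaluation_periods:]
    # Not enough data, or a gap: the periods are not consecutive.
    if len(latest) < evaluation_periods or latest[-1] - latest[0] != evaluation_periods - 1:
        return False
    compare = OPERATORS[comparison_operator]
    aggregate = STATISTICS[statistic]
    return all(compare(aggregate(buckets[b]), threshold) for b in latest)
```

With the recommended `period=60` and `evaluation_periods=3`, a single spike does not fire the rule; the trade-off is roughly three minutes of detection delay.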
Output Format
Alarm Summary
Output must include:
| Field | Description |
|---|---|
| region | Queried region |
| cluster_id | Cluster ID (if specified by the user) |
| total_count | Total alarm count |
| firing_count | Currently firing alarm count |
| resolved_count | Resolved alarm count |
| severity_stats | Distribution by severity level |
| type_stats | Alarm counts grouped by type |
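A minimal sketch of assembling these summary fields from a merged alarm list follows. The per-alarm fields it reads (`status` as `firing`/`resolved`, `severity` as the 1 to 4 level, `type`) are hypothetical, and the severity names reuse the `alarm_level` mapping from the parameter reference; the authoritative schema is in `references/output-schema.md`.

```python
from collections import Counter
from typing import Iterable, Optional

# Severity names reuse the alarm_level mapping (1=Critical ... 4=Info).
SEVERITY_NAMES = {1: "Critical", 2: "Major", 3: "Minor", 4: "Info"}


def build_alarm_summary(alarms: Iterable[dict], region: str,
                        cluster_id: Optional[str] = None) -> dict:
    """Count alarms by status, severity, and type for the summary block."""
    alarms = list(alarms)
    status = Counter(a["status"] for a in alarms)  # missing keys count as 0
    summary = {
        "region": region,
        "total_count": len(alarms),
        "firing_count": status["firing"],
        "resolved_count": status["resolved"],
        "severity_stats": dict(Counter(
            SEVERITY_NAMES.get(a["severity"], "Unknown") for a in alarms)),
        "type_stats": dict(Counter(a["type"] for a in alarms)),
    }
    # cluster_id is only included when the user specified one.
    if cluster_id is not None:
        summary["cluster_id"] = cluster_id
    return summary
```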
Resource Clues
For CCE alarms, prioritize the following resource dimensions:
| Field | Description |
|---|---|
| cluster_name | Cluster name |
| namespace | Kubernetes namespace |
| pod_name | Pod name |
| resource_kind | Resource type |
| event_name | Alarm name |
| message | Alarm message |
Recommended Diagnosis Skills
| Alarm Characteristics | Recommended Skill |
|---|---|
| CrashLoopBackOff, BackOffStart, FailedStart, ImagePullBackOff | huawei-cloud-cce-pod-failure-diagnoser |
| FailedScheduling, Insufficient cpu, Insufficient memory | huawei-cloud-cce-pod-failure-diagnoser or huawei-cloud-cce-node-failure-diagnoser |
| NodeNotReady, node resource pressure, NPD events | huawei-cloud-cce-node-failure-diagnoser |
| Ingress 502/504, Service unreachable, ELB anomalies | huawei-cloud-cce-network-failure-diagnoser |
| Multiple alarm categories impacting business simultaneously | huawei-cloud-cce-root-cause-analyzer |
| Scaling, reboot, drain, or other remediation actions needed | huawei-cloud-cce-auto-remediation-runner |
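The routing table above can be approximated with keyword matching. In this hypothetical sketch, matching is a case-insensitive substring test, node resource pressure is approximated by the keyword `pressure`, NPD events are not matched, and "multiple alarm categories" is read as alarms matching more than one table row. Remediation handoff is left out because it depends on the analysis outcome rather than on alarm text.

```python
from typing import List, Sequence, Tuple

# (keywords, recommended skills), transcribed from the routing table.
ROUTES: List[Tuple[Tuple[str, ...], List[str]]] = [
    (("CrashLoopBackOff", "BackOffStart", "FailedStart", "ImagePullBackOff"),
     ["huawei-cloud-cce-pod-failure-diagnoser"]),
    (("FailedScheduling", "Insufficient cpu", "Insufficient memory"),
     ["huawei-cloud-cce-pod-failure-diagnoser", "huawei-cloud-cce-node-failure-diagnoser"]),
    (("NodeNotReady", "pressure"),  # "pressure" approximates node resource pressure
     ["huawei-cloud-cce-node-failure-diagnoser"]),
    (("502", "504", "Service unreachable", "ELB"),
     ["huawei-cloud-cce-network-failure-diagnoser"]),
]
ROOT_CAUSE_SKILL = "huawei-cloud-cce-root-cause-analyzer"


def recommend_skills(alarm_texts: Sequence[str]) -> List[str]:
    """Return recommended skills in first-match order, without duplicates."""
    recommended: List[str] = []
    matched_rows = set()
    for text in alarm_texts:
        lowered = text.lower()
        for row, (keywords, skills) in enumerate(ROUTES):
            if any(keyword.lower() in lowered for keyword in keywords):
                matched_rows.add(row)
                for skill in skills:
                    if skill not in recommended:
                        recommended.append(skill)
    # Alarms spanning more than one table row: also suggest root cause analysis.
    if len(matched_rows) > 1:
        recommended.append(ROOT_CAUSE_SKILL)
    return recommended
```

Substring matching on keywords such as `502` can produce false positives; matching on a structured field such as `event_name` would be more reliable where it is available.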
See references/output-schema.md for the full JSON response schema.
Verification
- Run the dispatcher with a known region and cluster to confirm connectivity:
  `python3 scripts/huawei-cloud.py huawei_list_aom_alarms region=cn-north-4 cluster_id=<cluster_id>`
- Execute `huawei_analyze_aom_alarms` and verify that the burst, attention, and steady groupings are returned correctly.
- Verify that `huawei_aom_alarm_inspection` returns a cluster alarm summary with risk items.
- Test the mutation preview workflow: call `huawei_create_aom_alarm_rule` without `confirm` and verify that it returns a preview only (nothing is created).
- After a mutation operation with `confirm=true`, call `huawei_list_aom_alarm_rules` to verify the rule state change.
Best Practices
- Always query both active and history alarms via `huawei_list_aom_alarms`. Never assume that the absence of active alarms means there is no problem; check history alarms as well.
- Use `huawei_analyze_aom_alarms` in alarm storm scenarios to deduplicate alarms and separate burst alarms from steady ones.
- For mutation operations, always follow the two-step workflow: preview first (without `confirm`), then execute (with `confirm=true`) only after explicit user approval.
- After creating event alarm rules, verify that `event_name` follows the format in `references/cce-event-list.md`.
- When checking notification gaps, query both action rules (`huawei_list_aom_action_rules`) and mute rules (`huawei_list_aom_mute_rules`).
- When creating metric alarm rules, use `references/cce-prometheus-metric-alarms.md` for recommended PromQL expressions and thresholds.
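When an alarm fires but no notification arrives, the two usual suspects are a matching mute rule and a missing action rule. The sketch below checks both for one alarm. The rule and alarm shapes (a `match` dict of label equalities with a `starts_at`/`ends_at` window, and an `action_rule` name on the alarm) are hypothetical simplifications; real AOM mute and action rules may use richer matching conditions.

```python
from typing import Collection, Dict, List, Mapping, Sequence


def notification_gaps(alarm: Mapping, mute_rules: Sequence[Mapping],
                      action_rule_names: Collection[str], now: int) -> List[str]:
    """List reasons why `alarm` might not produce a notification."""
    reasons: List[str] = []
    labels: Dict[str, str] = alarm.get("labels", {})
    for rule in mute_rules:
        in_window = rule["starts_at"] <= now < rule["ends_at"]
        # An empty match would mute everything here, so require at least one condition.
        matches = bool(rule["match"]) and all(
            labels.get(k) == v for k, v in rule["match"].items()
        )
        if in_window and matches:
            reasons.append(f"muted by {rule['name']}")
    action = alarm.get("action_rule")
    if not action:
        reasons.append("no action rule bound to the alarm rule")
    elif action not in action_rule_names:
        reasons.append(f"action rule {action} not found")
    return reasons
```

The `action_rule_names` set would come from a read-only action rule query; an empty result list means neither suspect explains the gap.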
Reference Documents
| Document | Description |
|---|---|
| references/workflow.md | Alarm correlation workflow: collection, deduplication, grouping, and diagnosis handoff |
| references/output-schema.md | Output JSON schema for alarm correlation and inspection results |
| references/risk-rules.md | Risk boundary rules: read-only vs. mutation actions, prohibited operations |
| references/cce-event-list.md | CCE event list with naming format for creating event alarm rules |
| references/cce-prometheus-metric-alarms.md | Prometheus metric alarm reference for creating metric alarm rules |
| Huawei Cloud Python SDK Documentation | SDK reference |
| Huawei Cloud API Explorer | API interactive explorer |
Notes
- This skill has both read-only and mutation tools. Mutation operations (create, update, delete, enable, and disable alarm rules; delete action rules) require the two-step `confirm=true` confirmation.
- Never create or update action rules, and never create, update, or delete mute rules. Action rules may only be queried, or deleted after confirmation.
- If remediation actions are needed (scale, reboot, drain), output recommendations only and hand off to huawei-cloud-cce-auto-remediation-runner.
- Never expose or log AK/SK or environment variable values.
- All actions are executed via `python3 scripts/huawei-cloud.py <action>`; do not use the hcloud CLI or direct API calls.
- Do not interpret the absence of active alarms as "no problem"; always verify against history alarms.
Common Pitfalls
| Pitfall | Correct Approach |
|---|---|
| Only querying active alarms and ignoring history | Always use huawei_list_aom_alarms which merges active + history; history alarms may indicate recurring resource issues |
| Calling mutation tools without preview step | Always call without confirm first to preview; only add confirm=true after explicit user approval |
| Creating event alarm rules with an incorrect event_name format | Reference references/cce-event-list.md and use the Chinese description##EventName format |
| Creating metric alarm rules with arbitrary thresholds | Reference references/cce-prometheus-metric-alarms.md for recommended PromQL and threshold values |
| Deleting action rules without understanding notification impact | Preview first; verify which alarms depend on the action rule before confirming deletion |
| Executing remediation actions directly from this skill | This skill does not perform remediation; hand off to huawei-cloud-cce-auto-remediation-runner |
| Assuming "no active alarms" means "no problems" | Check history alarms — resolved alarms may indicate ongoing resource issues that flare periodically |
| Not checking mute rules when alarms are missing from notifications | Always query mute rules (huawei_list_aom_mute_rules) alongside action rules to identify suppression |
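- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box: `/install huawei-cloud-cce-alarm-correlation-engine`
- After installation, invoke the skill by name or trigger it with `/huawei-cloud-cce-alarm-correlation-engine`.
- Provide the required inputs described in the skill's parameter reference to get structured output.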
What is Huawei Cloud CCE Alarm Correlation Engine?
Huawei Cloud AOM alarm correlation analysis skill for CCE operations. Use this skill when the user wants to: (1) query AOM active and historical alarms, (2)... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 38 times to date.
How do I install Huawei Cloud CCE Alarm Correlation Engine?
Run the command `/install huawei-cloud-cce-alarm-correlation-engine` in the OpenClaw or Claude Code chat box to install it in one step, with no additional configuration.
Is Huawei Cloud CCE Alarm Correlation Engine free?
Yes. Huawei Cloud CCE Alarm Correlation Engine is completely free under the MIT-0 license and can be freely downloaded, installed, and used.
Which platforms does Huawei Cloud CCE Alarm Correlation Engine support?
Huawei Cloud CCE Alarm Correlation Engine is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Huawei Cloud CCE Alarm Correlation Engine?
It is developed and maintained by shijingcheng (@pintudeyudi); the current version is v0.1.2.