← 返回 Skills 市场
1w2w3y

AMG Key Vault Health Check

作者 1w2w3y · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
58
总下载
0
收藏
0
当前安装
1
版本数
在 OpenClaw 中安装
/install amg-check-key-vault
功能描述
Run only when the user explicitly asks for a fleet-wide Azure Key Vault health check — pulse check for availability, API latency, throttling (429s), auth fai...
使用说明 (SKILL.md)

\x3C!-- Auto-generated for OpenClaw by pack-openclaw. Notes for OpenClaw users: - Claude Code dynamic expressions (!...) in this file are NOT evaluated by OpenClaw and appear as literal text. Run them manually at the start of the workflow. - Invoke this skill only via slash command (e.g. /amg-check-key-vault). Auto-invocation is disabled on Claude Code but not on OpenClaw. -->

OpenClaw Setup (one-time)

This skill calls MCP tools prefixed with mcp__amg__*, so OpenClaw must have an MCP server registered under the exact name amg. Run this once per workspace before invoking the skill:

openclaw mcp set amg '{"url":"https://\x3Cyour-grafana-instance>/api/azure-mcp","transport":"streamable-http","headers":{"Authorization":"Bearer \x3Cyour-token>"}}'

Replace \x3Cyour-grafana-instance> with your Azure Managed Grafana endpoint and \x3Cyour-token> with a valid Grafana service-account token (starts with glsa_). The server name must be amg — the skill's allowed-tools reference mcp__amg__* and will not find tools under any other name.

Verify the server is registered:

openclaw mcp list

Official skill source: https://github.com/Azure/amg-skills

Runtime Context

  • Current UTC time: !date -u +%Y-%m-%dT%H:%M:%SZ
  • Config: !cat memory/amg-check-key-vault/config.md 2>/dev/null || echo "NOT_CONFIGURED"
  • Prior report: ![ -f memory/amg-check-key-vault/report.md ] && echo "exists ($(grep -c '^### KV-' memory/amg-check-key-vault/report.md) bugs documented)" || echo "not found"
  • Arguments: time-range=$0, subscription-override=$1

Known Issues: Before presenting findings, cross-reference results against memory/amg-check-key-vault/report.md.

Azure Key Vault Health Check

Analyze Azure Key Vault health using a two-phase approach: a single amgmcp_pulse_check call for fleet-wide summary, followed by targeted deep dives into the top 7 most interesting vaults only.

Critical Constraints

  • Do NOT use subagents (Agent tool) for MCP queries. Subagents do not have access to MCP tools. All MCP tool calls must be made directly from the main conversation context.
  • Deep dive limit: at most 7 vaults. Select the most interesting vaults from pulse check results. Do not deep-dive the entire fleet.
  • Time format: ISO 8601 UTC with explicit from/to — NEVER use timespan (it causes errors).
  • Parallelism cap: 30 concurrent MCP calls per batch. Reduce to 4-5 if rate-limited.

Prerequisites

  • An AMG-MCP server must be connected (the allowed-tools frontmatter references the MCP server name — update it if your server has a different name)
  • The MCP server's Grafana service account token environment variable must be set

Configuration

If Config shows NOT_CONFIGURED: Run First-Run Setup at the bottom of this file, then return here.

If Config is populated: Extract the datasource UID and subscription ID(s) from the pre-loaded Runtime Context above and use them for all queries. Use $1 as the subscription override if provided.

  • Datasource UID: from ## Azure Monitor Datasource > UID
  • Subscription ID(s): from ## Subscriptions (or $1 if provided)
  • Resource Type: microsoft.keyvault/vaults (lowercase)
  • ARM ID template: /subscriptions/{SUB}/resourceGroups/{RG}/providers/Microsoft.KeyVault/vaults/{name}

Time Range

Default is 7 days (pastDays: 7) for pulse check and deep-dive metrics, 24 hours for resource logs. If the user specifies a different range via $ARGUMENTS[0] (e.g., /amg-check-key-vault 3d), adjust accordingly. For resource log queries, keep the range narrow (1-2 days) to avoid timeouts.


Workflow

Phase 1: Validate Datasource & Discover Vaults

Step 1a: Validate Datasource

Call amgmcp_datasource_list with no parameters.

Search the results for a datasource with type equal to grafana-azure-monitor-datasource. Extract its uid.

  • If it matches the configured UID, proceed.
  • If it differs, update memory/amg-check-key-vault/config.md, warn the user, and use the new UID.
  • If no Azure Monitor datasource is found, abort with a clear error.

Step 1b: Discover All Key Vaults

Call amgmcp_query_resource_graph once using the configured datasource UID and subscription ID(s):

azureMonitorDatasourceUid: {DATASOURCE_UID}
query: |
  resources
  | where type == 'microsoft.keyvault/vaults'
  | where subscriptionId in ({SUBSCRIPTION_IDS})
  | project name, resourceGroup, location, subscriptionId, sku=properties.sku.name, enableSoftDelete=properties.enableSoftDelete, enablePurgeProtection=properties.enablePurgeProtection, provisioningState=properties.provisioningState
  | order by location asc, name asc

Replace {SUBSCRIPTION_IDS} with the configured subscription IDs formatted as comma-separated quoted strings (e.g., 'sub-id-1', 'sub-id-2').

Constructing the ARM resource ID: Use subscriptionId from each row:

/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.KeyVault/vaults/{name}

Region summary: Derive from the vault list by counting vaults per unique location value.

SKU summary: Count vaults by sku (standard vs premium).

Note any vaults not in "Succeeded" provisioning state — flag them immediately.

If zero vaults are found, report "No Key Vaults found" and stop.

Phase 2: Fleet-Wide Pulse Check

Call amgmcp_pulse_check once to get a summary across all key vaults:

azureMonitorDatasourceUid: {DATASOURCE_UID}
pastDays: 7
scenarios: keyvault_summary

If $1 provides a subscription ID, add subscriptionId to scope the scan. Otherwise, if the config has a single subscription, pass it.

After the pulse check, verify:

  1. The number of scanned resources is close to the Phase 1 vault count.
  2. The scenario shows status: "completed".
  3. If errors occurred, retry once. If still failing, note the failure in the report.

Cross-reference pulse check results with Phase 1 inventory to enrich each vault with its resource group, region, and SKU from the Resource Graph data.

Phase 3: Top 7 Deep Dive

From the pulse check results, select at most 7 vaults for detailed investigation. Prioritize vaults with the most interesting signals:

  1. Availability drops — any vault below 99.9% availability
  2. Highest error counts — vaults with the most non-Success responses (especially 401, 403, 429)
  3. Highest latency — vaults with the highest average API latency
  4. Throttling (429) — vaults showing rate-limiting responses
  5. Diversity — prefer selecting vaults from different regions to maximize coverage

If the pulse check shows fewer than 7 vaults with notable signals, only deep-dive those that have something worth investigating. Do not pad to 7.

If the pulse check shows the entire fleet is healthy with no notable signals, skip Phase 3 entirely and report the fleet as healthy.

Step 3a: Deep Metrics

For each selected vault, query these metrics in parallel using amgmcp_query_resource_metric. Compute from (matching pastDays from Phase 2) and to (now) in ISO 8601 UTC.

Metric Name Aggregation Interval Purpose
Availability Average PT6H Availability trend
ServiceApiLatency Average PT6H Average API latency
ServiceApiLatency Maximum PT6H Tail latency spikes
ServiceApiHit Total PT6H Total API call volume
ServiceApiResult (errors) Total PT6H Error count (filter: StatusCode ne '200')
ServiceApiResult (throttled) Total PT6H Throttling count (filter: StatusCode eq '429')
ServiceApiResult (auth failures) Total PT6H Auth errors (filter: StatusCode eq '401' or StatusCode eq '403')
SaturationShoebox Average PT6H Vault capacity saturation

All 7 vaults can be queried in parallel (7 vaults x 8 metrics = 56 calls, within the 30-call batch cap when split into 2 batches).

Correlation analysis — when analyzing metrics together:

  • High latency + high API hits = vault under load, possibly approaching throttle limits
  • High latency + throttling (429) = vault is being rate-limited, clients retrying
  • High errors + normal latency = client-side issues (bad auth, missing secrets)
  • High 401/403 + specific time window = key rotation event or misconfigured client
  • Availability drop + high errors = sustained vault issue
  • High saturation + high API hits = vault approaching transaction limits
  • Dormant (all empty) = no traffic, investigate if orphaned

Step 3b: Resource Logs

For each selected vault, query Key Vault resource logs using amgmcp_query_resource_log. Keep time range to 1-2 days.

Log Query 1: Failed requests by status code

AzureDiagnostics
| where ResourceType == "VAULTS"
| where TimeGenerated between (datetime(\x3CSTART>) .. datetime(\x3CEND>))
| where httpStatusCode_d >= 400
| summarize count() by httpStatusCode_d, OperationName, bin(TimeGenerated, 1h)
| order by TimeGenerated asc

Log Query 2: Top error operations

AzureDiagnostics
| where ResourceType == "VAULTS"
| where TimeGenerated between (datetime(\x3CSTART>) .. datetime(\x3CEND>))
| where httpStatusCode_d >= 400
| summarize count() by OperationName, httpStatusCode_d, ResultSignature
| order by count_ desc
| take 30

Log Query 3: Throttled requests (429) over time

AzureDiagnostics
| where ResourceType == "VAULTS"
| where TimeGenerated between (datetime(\x3CSTART>) .. datetime(\x3CEND>))
| where httpStatusCode_d == 429
| summarize count() by OperationName, bin(TimeGenerated, 1h)
| order by TimeGenerated asc

Log Query 4: Request volume trend

AzureDiagnostics
| where ResourceType == "VAULTS"
| where TimeGenerated between (datetime(\x3CSTART>) .. datetime(\x3CEND>))
| summarize
    TotalRequests=count(),
    FailedRequests=countif(httpStatusCode_d >= 400),
    ThrottledRequests=countif(httpStatusCode_d == 429),
    AvgDurationMs=round(avg(DurationMs), 2)
    by bin(TimeGenerated, 1h)
| order by TimeGenerated asc

Log Query 5: Authentication failures

AzureDiagnostics
| where ResourceType == "VAULTS"
| where TimeGenerated between (datetime(\x3CSTART>) .. datetime(\x3CEND>))
| where httpStatusCode_d in (401, 403)
| summarize count() by OperationName, CallerIPAddress, identity_claim_appid_g
| order by count_ desc
| take 20

Note: If AzureDiagnostics with ResourceType == "VAULTS" returns no data, diagnostic settings may not be configured. Note it and skip logs for that vault.


Classification

Severity Criteria
CRITICAL Availability avg \x3C 99.0%, OR SaturationShoebox avg > 75%
WARNING Availability avg \x3C 99.9%, OR ServiceApiLatency avg > 200ms, OR sustained 429 throttling across multiple time windows, OR sustained 401/403 errors
DORMANT All metrics return empty timeSeries (no traffic in scan period)
HEALTHY All metrics within normal ranges

Analysis Guidance

For known patterns, deep-dive queries, and correlation techniques, see reference/analysis-patterns.md.

For optional deep-dive queries, see reference/deep-dive-queries.md.


Output Format

Present a summary report with these sections:

1. Fleet Inventory

Vault count by region, subscription, and SKU. Flag any vaults not in "Succeeded" provisioning state. Note soft-delete and purge-protection status.

2. Pulse Check Summary

Fleet-wide summary from the keyvault_summary pulse check:

  • Total vaults scanned and scan status
  • Key fleet-wide metrics (availability, latency, API hits, error counts)
  • Breakdown by health status (healthy / warning / critical / dormant)
  • Which vaults were selected for deep dive and why each was chosen

3. Deep Dive Findings (Top 7)

For each selected vault:

  • Metric timeline: Availability, API latency, API hits, saturation over the scan window
  • Anomaly characterization: spike vs sustained, onset time, duration, recovery
  • Correlation analysis: which metrics moved together
  • Error rate: error results / total API hits percentage

4. Resource Log Findings

For each deep-dived vault:

  • Error summary: top error status codes, affected operations, throttling timeline
  • Latency analysis: slow operations by type
  • Authentication failures: if any, source IPs and application IDs

5. Known Issue Cross-Reference

Compare findings against memory/amg-check-key-vault/report.md. For each known bug, state: still active / improving / worsening / resolved.

6. Action Items

Prioritized list:

  • Critical: vaults with availability \x3C 99.0% or saturation > 75%
  • High: vaults with sustained throttling (429), high latency, or auth failures
  • Medium: vaults with elevated latency, intermittent errors, or approaching saturation
  • Low: dormant vaults (investigate if orphaned), informational items

Update Known Issues

After presenting findings, update memory/amg-check-key-vault/report.md:

  1. Read the current file (create if it doesn't exist).
  2. Update status of existing bugs based on today's telemetry.
  3. Add new bugs with: severity, vault name, region, metric evidence, log evidence, root cause, recommended action.
  4. Update the "Updated" date in the header.

Only add genuine issues: sustained availability drops, persistent throttling, high error rates, or latency degradation.


Error Handling

See ${CLAUDE_SKILL_DIR}/reference/error-handling.md for the full recovery table.


Reference

  • Key Vault resource type: Microsoft.KeyVault/vaults
  • ARM ID template: /subscriptions/{SUB}/resourceGroups/{RG}/providers/Microsoft.KeyVault/vaults/{name}
  • Resource log table: AzureDiagnostics (with ResourceType == "VAULTS")
  • Key status codes: 429 (throttled), 401 (unauthorized), 403 (forbidden), 404 (not found)
  • Key metrics: Availability, ServiceApiHit, ServiceApiLatency, ServiceApiResult, SaturationShoebox
  • Known issues: memory/amg-check-key-vault/report.md
  • Configuration: memory/amg-check-key-vault/config.md

First-Run Setup

Run only when Config shows NOT_CONFIGURED. After completing, return to the Workflow above.

1. Discover Datasource UID: Call amgmcp_datasource_list. Filter type == "grafana-azure-monitor-datasource". Prefer uid == "azure-monitor-oob" if multiple match. Abort if zero match.

2. Discover Subscription ID(s): Run this Resource Graph query to list all subscriptions with key vaults, then present the results as a table and ask the user which subscription(s) to use:

resources
| where type == 'microsoft.keyvault/vaults'
| join kind=inner (
    resourcecontainers
    | where type == 'microsoft.resources/subscriptions'
    | project subscriptionId, subscriptionName=name
) on subscriptionId
| summarize KeyVaults=count() by subscriptionId, subscriptionName
| order by KeyVaults desc

Present the results as a table with columns: Subscription Name, Subscription ID, Key Vaults. Then ask the user: "Which subscription ID(s) should I configure for this health check?"

3. Write config: Write memory/amg-check-key-vault/config.md:

# amg-check-key-vault Configuration

User-specific values for the Key Vault health check skill.
This file is auto-generated on first run and can be edited manually.

## Azure Monitor Datasource
- **UID**: {discovered_uid}
- **Name**: {discovered_name}

## Subscriptions
- {subscription_id_1}
- {subscription_id_2}

4. Confirm: Show the resolved config and ask for confirmation before proceeding.

安全使用建议
This skill appears to implement a legitimate AMG-driven Key Vault health check, but review these before installing: - Credential declaration mismatch: SKILL.md requires a Grafana service-account token (glsa_*) and instructs running openclaw mcp set to store it in workspace MCP config, but the skill metadata does not declare any required credentials. Confirm where and how the token will be stored and who/what has access to workspace MCP configs. - Persistence: the skill will create/update memory/amg-check-key-vault/config.md and report.md and may update MCP server entries. These stored artifacts contain configuration and findings — remove or audit them if they contain secrets. - Code execution fallback: error-handling instructs parsing large results with node -e, python -c, jq, or pwsh. Those commands can execute arbitrary code if not carefully constructed. Ask the skill author to replace exec-style fallbacks with pure parsing utilities (jq, safe Python scripts stored in the skill bundle, or explicit non-exec instructions) or require explicit user approval before running any interpreter commands on result files. - Source provenance: the SKILL.md claims an official source (GitHub) but registry source is unknown and homepage is none. Verify the repository and review its code (if any) before trusting the token you supply. - Least privilege: create a Grafana service-account token with the minimal scope required and rotate/ revoke it after use if possible. Limit the MCP server registration name to a workspace-local, auditable entry. If the author can (a) declare the required credential in the metadata, (b) remove or make safe the exec-style fallbacks, and (c) show the official source code/repo, the inconsistencies would be resolved and the skill would be reasonable to use. Until then, proceed cautiously and avoid supplying high-privilege tokens.
功能分析
Type: OpenClaw Skill Name: amg-check-key-vault Version: 1.0.0 The amg-check-key-vault skill is a legitimate tool designed for fleet-wide monitoring of Azure Key Vault health. It utilizes Azure Managed Grafana (AMG) via MCP tools to perform pulse checks and deep-dive analysis using KQL queries. The skill maintains configuration and persistent reports in a local memory directory and uses Bash for basic file operations. No indicators of malicious intent, data exfiltration, or unauthorized persistence were found; the logic is transparent and strictly aligned with its stated operational purpose.
能力评估
Purpose & Capability
Name and description match the SKILL.md workflow: it performs a fleet-wide AMG (Azure Managed Grafana) + Azure Monitor-based Key Vault health check using MCP tools. Requiring a Grafana MCP server named 'amg' and calling mcp__amg__* tools is coherent for this purpose. However, the skill does not declare any required credentials in its registry metadata even though the runtime setup explicitly requires a Grafana service-account token (glsa_). That missing declaration is an inconsistency.
Instruction Scope
The SKILL.md is detailed and scoped to Key Vault telemetry and metrics. Concerns: (1) it instructs registering an MCP server by running openclaw mcp set with a Grafana token (sensitive secret stored in workspace MCP config) but the registry metadata lists no required env/credential. (2) Error handling suggests saving large results to a temp file and parsing with interpreters using commands like node -e, python -c, jq, or pwsh; these fallbacks can execute arbitrary code or run user-supplied scripts if not tightly controlled. (3) The skill reads/writes agent memory files (memory/amg-check-key-vault/*) which will persist findings across sessions — expected for the
Install Mechanism
Instruction-only skill with no install spec and no code files. This minimizes disk-write/install risk. All execution is via allowed-tools and shell commands described in SKILL.md.
Credentials
The skill needs a Grafana service-account token to call AMG MCP endpoints (explicitly shown in the one-time setup command), yet the registry metadata lists no required env vars or primary credential. That mismatch is notable: a secret is required at runtime but not declared. The skill also expects access to agent memory and the ability to register an MCP server name 'amg' in workspace config, which is a workspace-level credential/config change and should be justified/declared.
Persistence & Privilege
always:false and disable-model-invocation:true (skill cannot be run autonomously by the model) both reduce risk. The skill writes/updates memory files under memory/amg-check-key-vault and may update the workspace MCP server configuration (openclaw mcp set), which persists credentials/config in the workspace. Those are reasonable for a monitoring skill but require the user to trust where credentials are stored and who can read workspace config.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install amg-check-key-vault
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /amg-check-key-vault 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.0
- Initial release of amg-check-key-vault skill. - Performs explicit, on-demand fleet-wide Azure Key Vault health checks: API availability, latency, throttling, auth failures, and saturation. - Auto-discovers all Key Vaults and summarizes by region and SKU. - Deep-dives into the top 7 most interesting vaults based on health signals for detailed metric and log analysis. - Tracks known issues across sessions with a persistent report. - Includes comprehensive setup guidance and automatic detection of required Azure Monitor datasource and subscription.
元数据
Slug amg-check-key-vault
版本 1.0.0
许可证 MIT-0
累计安装 0
当前安装数 0
历史版本数 1
常见问题

AMG Key Vault Health Check 是什么?

Run only when the user explicitly asks for a fleet-wide Azure Key Vault health check — pulse check for availability, API latency, throttling (429s), auth fai... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 58 次。

如何安装 AMG Key Vault Health Check?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install amg-check-key-vault」即可一键安装,无需额外配置。

AMG Key Vault Health Check 是免费的吗?

是的,AMG Key Vault Health Check 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

AMG Key Vault Health Check 支持哪些平台?

AMG Key Vault Health Check 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(cross-platform)。

谁开发了 AMG Key Vault Health Check?

由 1w2w3y(@1w2w3y)开发并维护,当前版本 v1.0.0。

💬 留言讨论