/install amg-check-cosmosdb-mongo-ru
<!-- Auto-generated for OpenClaw by pack-openclaw. Notes for OpenClaw users:
- Claude Code dynamic expressions (!...) in this file are NOT evaluated by OpenClaw
and appear as literal text. Run them manually at the start of the workflow.
- Invoke this skill only via slash command (e.g. /amg-check-cosmosdb-mongo-ru). Auto-invocation is
disabled on Claude Code but not on OpenClaw. -->
OpenClaw Setup (one-time)
This skill calls MCP tools prefixed with mcp__amg__*, so OpenClaw must have an MCP server registered under the exact name amg. Run this once per workspace before invoking the skill:
openclaw mcp set amg '{"url":"https://<your-grafana-instance>/api/azure-mcp","transport":"streamable-http","headers":{"Authorization":"Bearer <your-token>"}}'
Replace <your-grafana-instance> with your Azure Managed Grafana endpoint and <your-token> with a valid Grafana service-account token (starts with glsa_). The server name must be amg: the skill's allowed-tools reference mcp__amg__* and will not find tools under any other name.
Verify the server is registered:
openclaw mcp list
Official skill source: https://github.com/Azure/amg-skills
Runtime Context
- Current UTC time: ! date -u +%Y-%m-%dT%H:%M:%SZ
- Config: ! cat memory/amg-check-cosmosdb-mongo-ru/config.md 2>/dev/null || echo "NOT_CONFIGURED"
- Prior report: ! [ -f memory/amg-check-cosmosdb-mongo-ru/report.md ] && echo "exists ($(grep -c '^### BUG-' memory/amg-check-cosmosdb-mongo-ru/report.md) bugs documented)" || echo "not found"
- Arguments: time-range=$0, subscription-override=$1
Known Issues: Before presenting findings, cross-reference results against
memory/amg-check-cosmosdb-mongo-ru/report.md.
Cosmos DB for MongoDB (RU) Health Check
Critical Constraints
- No subagents for MCP. The Agent tool cannot access MCP tools — all MCP calls must be made from the main context.
- Scan every resource. No sampling or early stopping.
- Time format: ISO 8601 UTC with explicit from/to. NEVER use timespan (it causes errors).
- Safe interval: Always use PT1H; it works for all Cosmos DB metrics. PT6H is NOT supported. DataUsage, IndexUsage, and DocumentCount do NOT support P1D.
- Parallelism cap: 30 concurrent MCP calls per batch. Reduce to 4-5 if rate-limited.
- Result too large: Save to temp file and parse outside the context window. Prefer node -e "..." if installed; otherwise fall back to python -c "...", jq, or pwsh -Command "...". Bash permission for the chosen interpreter will be prompted on first use.
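The parallelism cap above can be sketched as a simple batching helper. This is a minimal illustration, not part of the skill: the function name batches and the placeholder account IDs are assumptions, and the actual MCP calls are left out.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], size: int = 30) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder IDs standing in for the accounts discovered in Phase 1b.
account_ids = [f"account-{i}" for i in range(65)]

# Default cap of 30 concurrent calls per batch.
print([len(b) for b in batches(account_ids)])  # [30, 30, 5]

# Reduced batch size to use after a rate-limit response.
print(len(list(batches(account_ids, size=5))))  # 13
```

Issuing one batch at a time and waiting for it to finish before starting the next keeps the number of in-flight calls at or below the cap.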
Progress Tracking
Update checkboxes as you complete each phase:
- [ ] Phase 1a: Datasource validated
- [ ] Phase 1b: Accounts discovered (N=?)
- [ ] Phase 1c: Non-succeeded accounts investigated (if any)
- [ ] Phase 2: Metric definitions validated
- [ ] Phase 3: Pulse check completed (N scanned, N findings)
- [ ] Phase 4: Deep metrics for abnormal accounts
- [ ] Phase 5: Resource logs for abnormal accounts
- [ ] Report presented
- [ ] Known issues updated in memory/amg-check-cosmosdb-mongo-ru/report.md
Configuration
If Config shows NOT_CONFIGURED: Run First-Run Setup at the bottom of this file, then return here.
If Config is populated: Extract the datasource UID and subscription ID from the pre-loaded Runtime Context above and use them for all queries. Use $1 as the subscription override if provided.
- Datasource UID: from ## Azure Monitor Datasource > UID
- Subscription ID: from ## Subscription (or $1 if provided)
- Resource Type: microsoft.documentdb/databaseaccounts (lowercase) with kind == 'MongoDB'
Time Range
Default: 7 days for metrics, 24 hours for logs. Override with $0 (e.g., 3d). Keep log queries to 1-2 days to avoid timeouts.
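One way to turn the $0 override into the explicit from/to pair that the constraints require is sketched below. It is an assumption-laden illustration: the function name resolve_window is hypothetical, and only the Nd form (e.g., 3d) appears in this document, so the Nh form accepted here is an added assumption.

```python
import re
from datetime import datetime, timedelta, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
UNITS = {"h": "hours", "d": "days"}  # "h" is an assumed extension; the doc only shows "3d"


def resolve_window(arg, default=timedelta(days=7), now=None):
    """Return (from, to) as ISO 8601 UTC strings for an override like '3d'."""
    now = (now or datetime.now(timezone.utc)).replace(microsecond=0)
    span = default
    if arg:
        match = re.fullmatch(r"(\d+)([hd])", arg.strip())
        if not match:
            raise ValueError(f"unsupported time range: {arg!r}")
        span = timedelta(**{UNITS[match.group(2)]: int(match.group(1))})
    return (now - span).strftime(ISO_FORMAT), now.strftime(ISO_FORMAT)


fixed_now = datetime(2024, 5, 10, 12, 0, 0, tzinfo=timezone.utc)
print(resolve_window("3d", now=fixed_now))
# ('2024-05-07T12:00:00Z', '2024-05-10T12:00:00Z')
print(resolve_window(None, default=timedelta(hours=24), now=fixed_now))
# ('2024-05-09T12:00:00Z', '2024-05-10T12:00:00Z')
```

Passing default=timedelta(hours=24) gives the log window, and the 7-day default covers metrics.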
Workflow
Phase 1a: Validate Datasource
Call amgmcp_datasource_list (no parameters). Find entry with type == "grafana-azure-monitor-datasource".
- Matches configured UID → proceed.
- Different UID → update memory/amg-check-cosmosdb-mongo-ru/config.md, warn user, use new UID.
- Not found → abort with error.
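The three outcomes above can be sketched as a small decision function. The response shape (a list of objects with type and uid fields) is an assumption about what amgmcp_datasource_list returns, and validate_datasource is a hypothetical name used only for illustration.

```python
AZURE_MONITOR_TYPE = "grafana-azure-monitor-datasource"


def validate_datasource(datasources, configured_uid):
    """Return (uid_to_use, config_needs_update); raise if no datasource matches.

    `datasources` is assumed to be a list of dicts with "type" and "uid" keys.
    """
    uids = [d["uid"] for d in datasources if d.get("type") == AZURE_MONITOR_TYPE]
    if not uids:
        # "Not found" case: abort with an error.
        raise LookupError("no Azure Monitor datasource found")
    if configured_uid in uids:
        # "Matches configured UID" case: proceed unchanged.
        return configured_uid, False
    # "Different UID" case: caller should rewrite config.md and warn the user.
    return uids[0], True


sample = [
    {"type": "prometheus", "uid": "prom-1"},
    {"type": AZURE_MONITOR_TYPE, "uid": "azure-monitor-oob"},
]
print(validate_datasource(sample, "azure-monitor-oob"))  # ('azure-monitor-oob', False)
print(validate_datasource(sample, "stale-uid"))          # ('azure-monitor-oob', True)
```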
Phase 1b: Discover All Cosmos DB for MongoDB (RU) Accounts
azureMonitorDatasourceUid: {DATASOURCE_UID}
query: |
  resources
  | where type == 'microsoft.documentdb/databaseaccounts'
  | where kind == 'MongoDB'
  | project name, resourceGroup, location, subscriptionId, id, properties.provisioningState
  | order by location asc, name asc
If the config specifies subscription IDs (not "all"), add | where subscriptionId in ('{ID1}', '{ID2}'). Derive region summary by counting accounts per location. Flag accounts not in "Succeeded" state. Stop if zero accounts found.
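A rough sketch of the region summary and state check follows. The row shape is an assumption: in particular, the column name properties_provisioningState is what Resource Graph is expected to produce for the projected properties.provisioningState, so it is exposed as a parameter in case the actual response differs.

```python
from collections import Counter


def summarize_accounts(rows, state_key="properties_provisioningState"):
    """Return (accounts per region, names of accounts not in 'Succeeded' state).

    `rows` is assumed to be a list of dicts, one per Resource Graph result row.
    """
    if not rows:
        raise LookupError("zero accounts found; stop the health check")
    per_region = Counter(row["location"] for row in rows)
    not_succeeded = [row["name"] for row in rows if row.get(state_key) != "Succeeded"]
    return dict(sorted(per_region.items())), not_succeeded


# Illustrative rows only; names and regions are made up.
rows = [
    {"name": "a", "location": "westus", "properties_provisioningState": "Succeeded"},
    {"name": "b", "location": "eastus", "properties_provisioningState": "Updating"},
    {"name": "c", "location": "eastus", "properties_provisioningState": "Succeeded"},
]
print(summarize_accounts(rows))  # ({'eastus': 2, 'westus': 1}, ['b'])
```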
Why kind == 'MongoDB'? Filters for RU-based MongoDB API accounts. vCore-based MongoDB uses microsoft.documentdb/mongoclusters.
Phase 1c: Activity Log for Non-Succeeded Accounts
If any accounts are not in "Succeeded" state, query the activity log for up to 3 of them:
azureMonitorDatasourceUid: {DATASOURCE_UID}
scope: {account's full ARM resource ID}
startTime: now-3d
endTime: now
select: eventTimestamp,operationName,status,caller,subStatus
If the response exceeds 500 KB, retry with startTime: now-1d. Summarize: operations performed, caller type, success/in-progress status, likely cause.
Phase 2: Validate Available Metrics
Call amgmcp_query_resource_metric_definition on the first account from Phase 1. Confirm expected metrics exist. Run only once — definitions are the same across all accounts.
Phase 3: Tier 1 — Fleet-Wide Pulse Check
azureMonitorDatasourceUid: {DATASOURCE_UID}
pastDays: 7
scenarios: cosmosdb_mongo
Scans all accounts across 3 scenarios: cosmosdb_mongo_ru, cosmosdb_mongo_throttling, cosmosdb_mongo_availability.
Before moving to Phase 4, verify:
- scanSummary.totalResourcesScanned matches Phase 1 account count.
- All 3 scenarios show status: "completed" in scenarioResults.
- If errors non-empty, retry affected scenarios individually.
- If >10% accounts missing, fall back to batched amgmcp_query_resource_metric for unscanned accounts.
Accounts in the findings array are abnormal. Also flag any non-Succeeded accounts from Phase 1.
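The verification checklist above could be expressed roughly as follows. The field names scanSummary, totalResourcesScanned, scenarioResults, status, and errors come from this document; treating scenarioResults as a list of objects with a scenario name field is an assumption, as is the function name verify_pulse_check.

```python
def verify_pulse_check(result, expected_accounts):
    """Check a pulse-check response against the Phase 1 account count.

    Assumes `scenarioResults` is a list of dicts with "scenario" and "status".
    """
    problems = []
    scanned = result.get("scanSummary", {}).get("totalResourcesScanned", 0)
    if scanned != expected_accounts:
        problems.append(f"scanned {scanned} of {expected_accounts} accounts")

    # Scenarios that did not complete should be retried individually.
    retry = [s.get("scenario") for s in result.get("scenarioResults", [])
             if s.get("status") != "completed"]
    if retry:
        problems.append(f"incomplete scenarios: {retry}")

    errors = result.get("errors") or []
    if errors:
        problems.append(f"{len(errors)} error(s) reported")

    # More than 10% of accounts missing triggers the batched metric fallback.
    missing = max(expected_accounts - scanned, 0)
    fallback = expected_accounts > 0 and missing / expected_accounts > 0.10

    return {"ok": not problems, "problems": problems,
            "retry_scenarios": retry, "fallback_needed": fallback}


response = {
    "scanSummary": {"totalResourcesScanned": 40},
    "scenarioResults": [
        {"scenario": "cosmosdb_mongo_ru", "status": "completed"},
        {"scenario": "cosmosdb_mongo_throttling", "status": "failed"},
        {"scenario": "cosmosdb_mongo_availability", "status": "completed"},
    ],
    "errors": [],
}
check = verify_pulse_check(response, expected_accounts=50)
print(check["ok"], check["retry_scenarios"], check["fallback_needed"])
# False ['cosmosdb_mongo_throttling'] True
```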
Note: Sustained-high detection (>50% for 6+ hours), RU spike pattern detection (>30pp jump in 1h), and latency analysis require hourly time-series data and are performed in Phase 4 on flagged accounts only.
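The two detections named in the note can be sketched over an hourly NormalizedRU series (one percentage value per PT1H bucket). This is an illustrative reading of the thresholds, not the skill's own implementation: it assumes the series has no gaps and that "6+ hours" means six or more consecutive hourly points above 50%.

```python
def sustained_high(series, threshold=50.0, min_hours=6):
    """True if `min_hours` or more consecutive hourly values exceed `threshold`."""
    run = 0
    for value in series:
        run = run + 1 if value > threshold else 0
        if run >= min_hours:
            return True
    return False


def spike_indexes(series, jump_pp=30.0):
    """Indexes where the value rose by more than `jump_pp` points in one hour."""
    return [i for i in range(1, len(series))
            if series[i] - series[i - 1] > jump_pp]


hourly_ru = [10, 60, 60, 60, 60, 60, 60, 20]  # made-up sample data
print(sustained_high(hourly_ru))              # True
print(spike_indexes(hourly_ru))               # [1]
print(spike_indexes([10, 45, 50, 20, 55]))    # [1, 4]
```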
Phase 4: Tier 2 — Deep Metrics for Abnormal Accounts
Read reference/phase4-deep-metrics.md before starting Phase 4. It contains:
- Response size management (critical — fleet-wide PT1H queries exceed 500 KB)
- Fleet-wide triage strategy (when >50% accounts are flagged)
- Core and secondary metrics tables
- Batch strategy and correlation analysis patterns (use ultrathink)
Phase 5: Resource Logs for Abnormal Accounts
Read reference/phase5-resource-logs.md before starting Phase 5. It contains:
- 5 KQL query templates: throttling, high latency, request volume, top RU operations, error codes
- Fallback table guidance (CDBDataPlaneRequests if CDBMongoRequests is empty)
Output
Present the report using the structure in reference/output-format.md.
Classification:
| Severity | Criteria |
|---|---|
| CRITICAL | NormalizedRU = 100% sustained, OR ServiceAvailability < 99.9%, OR latency avg > 50ms |
| HIGH | NormalizedRU max 85-100% with frequent spikes, OR ReplicationLatency > 1000ms |
| WARNING | NormalizedRU max 70-85% sustained, OR sustained RU > 50% for 6h+, OR RU spike >30pp in 1h, OR ServiceAvailability < 99.99%, OR latency avg > 10ms, OR ReplicationLatency > 100ms |
| MODERATE | NormalizedRU max 50-70% |
| HEALTHY | All metrics within normal ranges (NormalizedRU < 50%) |
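A simplified sketch of the table as a first-match classifier is shown below. Several choices are assumptions rather than rules from this document: "sustained" and "frequent spikes" are not quantified above, so they are passed in as precomputed booleans; range boundaries are treated as inclusive on the lower end; and an account at 85%+ without frequent spikes falls through to WARNING. The AccountStats type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AccountStats:
    """Per-account summary; all field names are illustrative assumptions."""
    ru_max: float                        # max NormalizedRU, percent
    availability: float = 100.0          # ServiceAvailability, percent
    latency_avg_ms: float = 0.0
    replication_latency_ms: float = 0.0
    pegged_at_100: bool = False          # NormalizedRU = 100% sustained
    frequent_spikes: bool = False
    sustained_over_50: bool = False      # > 50% for 6h+
    spike_over_30pp: bool = False        # > 30pp jump within 1h


def classify(s: AccountStats) -> str:
    """Return the first (most severe) matching severity."""
    if s.pegged_at_100 or s.availability < 99.9 or s.latency_avg_ms > 50:
        return "CRITICAL"
    if (s.ru_max >= 85 and s.frequent_spikes) or s.replication_latency_ms > 1000:
        return "HIGH"
    if (s.ru_max >= 70 or s.sustained_over_50 or s.spike_over_30pp
            or s.availability < 99.99 or s.latency_avg_ms > 10
            or s.replication_latency_ms > 100):
        return "WARNING"
    if s.ru_max >= 50:
        return "MODERATE"
    return "HEALTHY"


print(classify(AccountStats(ru_max=30)))                        # HEALTHY
print(classify(AccountStats(ru_max=92, frequent_spikes=True)))  # HIGH
print(classify(AccountStats(ru_max=40, availability=99.5)))     # CRITICAL
```

Checking severities from most to least severe means an account that matches several rows is reported at its highest level.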
Update Known Issues
After presenting findings, update memory/amg-check-cosmosdb-mongo-ru/report.md:
- Read the current file.
- Rebuild the Resource Inventory table at the end: every account, full ARM ID, region, subscription, state. Group by region, sorted alphabetically.
- Update existing bug status from today's telemetry (resolved / improving / worsening / still active).
- Add new bugs with: severity, account name, region, metric evidence, log evidence, root cause, recommended action.
- Update the "Updated" date header.
Only add genuine issues: sustained throttling, availability drops, high latency patterns, or replication problems. Skip transient single-hour spikes or expected maintenance windows.
Error Handling
See reference/error-handling.md for the full recovery table.
Analysis Guidance
- Known patterns, signals, root causes: reference/analysis-patterns.md
- Optional deep-dive KQL queries: reference/deep-dive-queries.md
Reference
- Cosmos DB resource type: microsoft.documentdb/databaseaccounts (kind: MongoDB)
- vCore resource type (different): microsoft.documentdb/mongoclusters
- Latency metrics: ServerSideLatencyDirect and ServerSideLatencyGateway (the old ServerSideLatency is deprecated)
- Resource log tables: CDBMongoRequests (primary), CDBDataPlaneRequests (fallback)
- Key error codes: 429/16500 (throttling), 50 (server error), 13 (unauthorized)
- Safe metric interval: PT1H for all metrics (PT6H NOT supported)
- Known issues: memory/amg-check-cosmosdb-mongo-ru/report.md
- User config: memory/amg-check-cosmosdb-mongo-ru/config.md
First-Run Setup
Run only when Config shows NOT_CONFIGURED. After completing, return to the Workflow above.
1. Discover Datasource UID: Call amgmcp_datasource_list. Filter type == "grafana-azure-monitor-datasource". Prefer uid == "azure-monitor-oob" if multiple match. Abort if zero match.
2. Discover Subscription ID: Run this Resource Graph query to list all subscriptions that contain Cosmos DB for MongoDB (RU) accounts:
resources
| where type == 'microsoft.documentdb/databaseaccounts'
| where kind == 'MongoDB'
| join kind=inner (
resourcecontainers
| where type == 'microsoft.resources/subscriptions'
| project subscriptionId, subscriptionName=name
) on subscriptionId
| summarize AccountCount=count() by subscriptionId, subscriptionName
| order by AccountCount desc
Present the results as a table with columns: Subscription Name, Subscription ID, Account Count. Then ask the user: "Which subscription ID(s) should I configure for this health check? Or type 'all' to scan all subscriptions."
3. Write config: Write memory/amg-check-cosmosdb-mongo-ru/config.md:
# amg-check-cosmosdb-mongo-ru Configuration
User-specific values for the Cosmos DB for MongoDB (RU) health check skill.
This file is auto-generated on first run and can be edited manually.
## Azure Monitor Datasource
- **UID**: {discovered_uid}
- **Name**: {discovered_name}
## Subscription
- {subscription_id_or_"all"}
4. Confirm: Show the resolved config and ask for confirmation before proceeding.
What is AMG Cosmos DB for MongoDB (RU) Health Check?
Run only when the user explicitly asks for a fleet-wide Cosmos DB for MongoDB (RU) health check — scans NormalizedRU consumption, service availability, serve... It is an AI Agent Skill for Claude Code / OpenClaw, with 56 downloads so far.
Is AMG Cosmos DB for MongoDB (RU) Health Check free?
Yes, AMG Cosmos DB for MongoDB (RU) Health Check is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does AMG Cosmos DB for MongoDB (RU) Health Check support?
AMG Cosmos DB for MongoDB (RU) Health Check is cross-platform and runs anywhere OpenClaw or Claude Code is available.
Who created AMG Cosmos DB for MongoDB (RU) Health Check?
It is built and maintained by 1w2w3y (@1w2w3y); the current version is v1.0.0.