← Back to Skills Marketplace
huaweiclouddev

huawei-cloud-ascend-profiler-db-explorer

by huaweicloud-skills-team · GitHub ↗ · v0.0.2 · MIT-0
cross-platform ⚠ pending
39
Downloads
0
Stars
0
Active Installs
2
Versions
Install in OpenClaw
/install huawei-cloud-ascend-profiler-db-explorer
Description
Convert natural language questions into safe executable SQL to query Ascend PyTorch Profiler / msprof database for operator time consumption, communication,...
README (SKILL.md)

Huawei Cloud Ascend Profiler DB Explorer

Overview

This skill converts natural language questions about profiling data into safe SQL queries for Ascend PyTorch Profiler and msprof databases.

Architecture: Natural Language Input → Intent Recognition → SQL Generation → Database Execution → Result Analysis

Related Skills:

  • huawei-cloud-msot-msopprof-operator-profiler - Operator performance data collection
  • huawei-cloud-ascend-small-model-migrate - Migration workflow that uses profiling analysis
  • huawei-cloud-ascendc-operator-performance-optim - Operator optimization workflow

Architecture Components

This skill involves the following cloud services and components:

  • MSProf: Ascend profiling tool for data collection and database management
  • SQLite: Database engine for storing profiling data
  • Ascend NPU: Target hardware for performance profiling
  • msprof_mcp: Tool for executing SQL queries on profiling database

Architecture Diagram:

┌─────────────────────────────────────────────────────────────┐
│              Profiler DB Explorer Skill                    │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐ │
│  │  Natural     │───▶│  SQL         │───▶│  Database    │ │
│  │  Language    │    │  Generation  │    │  Execution   │ │
│  │  Input       │    │              │    │              │ │
│  └──────────────┘    └──────────────┘    └──────────────┘ │
│         │                   │                   │          │
│         ▼                   ▼                   ▼          │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐ │
│  │  Intent      │    │  CTE Macro   │    │  Result      │ │
│  │  Recognition │    │  Templates   │    │  Analysis    │ │
│  └──────────────┘    └──────────────┘    └──────────────┘ │
└─────────────────────────────────────────────────────────────┘

Use Cases

Typical Problem Scenarios:

  • Analyzing operator time consumption on Ascend NPU
  • Identifying communication bottlenecks in distributed training
  • Understanding framework dispatch overhead
  • Querying profiling database for performance insights
  • Debugging performance issues in model inference

Typical User Phrases:

  • "Which operators are most time-consuming?"
  • "Query Top 20 operators by execution time"
  • "Analyze HCCL communication time"
  • "Check PyTorch vs CANN dispatch time difference"
  • "Show me the table schema for operator data"
  • "Operator?"
  • "AnalysisOperatorPerformance"
  • "QueryprofilerDatabase"

Skill Objectives

  • Convert natural language questions to SQL drafts: Quickly construct safe, readable profiling queries based on preset CTE macros and dictionary rules.
  • Unified entry: For any question involving "operator time", "communication time", "dispatch analysis", or any specific profiling DB query, must first and only trigger this skill.
  • Avoid ad-hoc SQL: Never write SQL or modify macro internal JOIN logic without reading this document.

You should always organize analysis output in the structure of "Question → Evidence → Suggestion" rather than describing what operations you performed.

Role Positioning

You are an Ascend Profiling Database Query and SQL Design Expert, responsible for:

  • Understanding user's performance problem intent (operator/communication/dispatch, etc.).
  • Selecting appropriate query channel (Track A / Track B).
  • Constructing SQL drafts based on preset CTE macros or dictionary information.
  • Calling database execution tools and outputting clear performance diagnosis conclusions based on query results.

Usage Scenarios

Prioritize calling this skill in following scenarios:

  • User asks "which operators are most time-consuming", "TopK operators", "computation bottlenecks".
  • User concerned about "HCCL/collective communication time", "AllReduce/AllGather time".
  • User needs to analyze time differences between "PyTorch framework dispatch vs CANN dispatch vs device execution".
  • Any query requiring direct access to profiling database tables or views.

Trigger Words (Recall Enhancement)

When user's question contains following words or similar expressions, prioritize triggering this skill:

  • ascend-pytorch-profiler-db / ascend_pytorch_profiler*.db / msprof_*.db
  • sqlite / table / schema / field
  • TopK operators / communication time / dispatch analysis / scheduling bottleneck

Mandatory Restrictions

  • Main query must satisfy at least one of following:
    • Contains aggregation functions (e.g., SUM, AVG, COUNT, etc.), OR
    • Explicitly includes ORDER BY ... LIMIT 20 (or smaller LIMIT).
  • Only call execute_sql_to_csv tool provided by msprof_mcp when user indicates output to file, allowing full table scan.
  • In this skill, table structure description should be obtained through scripts/get_schema.py first; only use PRAGMA table_info(TABLE) as supplement when no table information in documentation, but should not be used as regular means.

Track A: Golden Views / CTE Macros (Priority)

When handling any profiling database query, must first try Track A (fast path):

  1. Intent Matching

    • Determine if user intent belongs to: operator computation / collective communication / framework dispatch.
    • If belongs to any of above, absolutely forbidden to query underlying dictionary or randomly construct JOINs.
  2. Extract Macro (CTE)

    • From "CTE Macro Definitions" below, copy corresponding WITH statement block verbatim to SQL beginning.
    • Never modify JOIN logic and field expressions inside macros.
  3. Concatenate Main Query

    • After copied WITH ... AS (...), write SELECT query for corresponding view (e.g., compute_view, comm_view, dispatch_view).
    • Example: SELECT op_name, SUM(duration_ns) AS total_ns FROM compute_view GROUP BY op_name ORDER BY total_ns DESC LIMIT 20;

Track B: Underlying Documentation / profiler_db_data_format.md

Only enter Track B when one of following conditions met:

  • User explicitly requests querying underlying hardware metrics (e.g., PMU counts, memory allocation, Step division, etc.).
  • Requirement not covered by existing views in "CTE Macro Definitions".

Core tool for Track B is scripts/get_schema.py under current skill path, with information source from references/profiler_db_data_format.md.

1. Get Real Table Names from Current DB (Recommended)

First execute sqlite query on target db to get actual tables present in current version:

sqlite3 {db_path} ".tables"
sqlite3 {db_path} "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"

Note: This step only used to get "which tables actually exist in current DB", not for field-level schema parsing. For field descriptions, use get_schema.py --table_name.

2. Use Script for Document/DB Alignment (Recommended)

  • Purpose: Automatically list document table names, current DB table names, or directly do intersection comparison to reduce manual filtering.
  • Command line examples:
cd {skills_path}/huawei-cloud-ascend-profiler-db-explorer/scripts
python3 get_schema.py --list_tables
python3 get_schema.py --db_path {db_path} --list_db_tables
python3 get_schema.py --db_path {db_path} --compare_doc_db

3. get_schema_by_table_name(table_name)

  • Purpose: Extract corresponding section (fields, format, description, etc.) for the table from profiler_db_.md by table name.
  • Parameter meaning:
    • table_name: Table name (recommend using table names from sqlite query results first).
  • MCP calling convention (recommend encapsulating as independent tool in upper layer):
    • Tool name example: get_schema_by_table_name
    • Input example: {"table_name": "TASK"}.
  • Command line examples:
cd {skills_path}/huawei-cloud-ascend-profiler-db-explorer/scripts
python3 get_schema.py --table_name TASK
python3 get_schema.py --table_name COMMUNICATION_OP

Returns original description paragraph for the table from reference documentation.

Track B Usage Principles

  1. Use real table names from sqlite query first, then call get_schema.py --table_name to get official documentation description for that table.
  2. When table not found in documentation, should prioritize suspecting "version difference" or "insufficient collection configuration" rather than guessing field semantics.
  3. Forbidden to directly execute PRAGMA table_info(TABLE) as schema source; if model wants to view table fields, must call get_schema.py instead.

Execution and Summary

  • Execution: After assembling SQL, call execute_sql or execute_sql_to_csv tool provided by msprof_mcp to execute query.
  • Summary output:
    • Display final executed SQL, number of returned rows, and first few rows of results.

CTE Macro Definitions (Must Reuse in Track A)

[Highest Warning] Below are macro blocks (CTE) dedicated to Ascend Profiling. In Track A:

  • Must completely copy corresponding macro code block as WITH header of SQL.
  • Never modify JOIN, field meaning, or computation logic inside macros.

1. Operator Computation Detail Macro (Compute Macro)

Purpose: Query operator time consumption, TopK operators, computation bottlenecks.

WITH compute_view AS (
    SELECT c.globalTaskId, ROUND(t.endNs - t.startNs) AS duration_ns,
           n.value AS op_name, type_str.value AS op_type
    FROM COMPUTE_TASK_INFO c
    LEFT JOIN TASK t ON t.globalTaskId = c.globalTaskId
    LEFT JOIN STRING_IDS n ON n.id = c.name
    LEFT JOIN STRING_IDS type_str ON type_str.id = c.opType
)

2. Communication Detail Macro (Communication Macro)

Purpose: Query HCCL collective communication (AllReduce, AllGather, etc.) time.

WITH comm_view AS (
    SELECT ROUND(c.endNs - c.startNs) AS duration_ns, n.value AS op_name,
           t.value AS op_type, g.value AS group_name
    FROM COMMUNICATION_OP c
    LEFT JOIN STRING_IDS n ON n.id = c.opName
    LEFT JOIN STRING_IDS t ON t.id = c.opType
    LEFT JOIN STRING_IDS g ON g.id = c.groupName
)

3. Dispatch Mapping Macro (Dispatch Macro)

Purpose: Compare time differences between PyTorch framework dispatch, CANN layer dispatch, and underlying execution to locate scheduling congestion.

WITH dispatch_view AS (
    SELECT
        ROUND(t.endNs - t.startNs) AS task_duration_ns,
        ROUND(c.endNs - c.startNs) AS cann_duration_ns,
        ROUND(p.endNs - p.startNs) AS pytorch_duration_ns,
        c_str.value AS cann_api_name,
        p_str.value AS pytorch_api_name,
        t_str.value AS task_type
    FROM TASK t
    LEFT JOIN CANN_API c ON t.connectionId = c.connectionId
    LEFT JOIN CONNECTION_IDS conn ON conn.connectionId = t.connectionId
    LEFT JOIN PYTORCH_API p ON p.connectionId = conn.id
    LEFT JOIN STRING_IDS c_str ON c.name = c_str.id
    LEFT JOIN STRING_IDS p_str ON p.name = p_str.id
    LEFT JOIN STRING_IDS t_str ON t.taskType = t_str.id
)

Enhanced Features

Intelligent Bottleneck Diagnoser

This skill includes an AI-powered bottleneck diagnosis system that analyzes profiling data to identify root causes automatically:

Features:

  • Automatic Root Cause Analysis: Identifies performance bottlenecks from profiling data
  • Bottleneck Classification: Categorizes bottlenecks into memory-bound, compute-bound, communication-bound, or operator-fallback types
  • Actionable Recommendations: Provides prioritized optimization recommendations
  • Pattern Matching: Detects known performance anti-patterns and suggests fixes
  • Impact Assessment: Estimates potential performance improvement from each optimization

Bottleneck Classification:

Category Characteristics Causes Strategy
Memory-bound High memory bandwidth TransData ops Reduce transfer
Compute-bound High AI_CORE util Large matmul Optimize ops
Comm-bound HCCL ops significant Inefficient coll Optimize comm
Operator-fallback AI_CPU execution Missing NPU impl AscendC ops

Bottleneck Diagnosis Output:

## Intelligent Bottleneck Diagnosis Report

### Overall Performance Summary
- Total Inference Time: 15.2 ms
- Bottleneck Score: 78/100
- Main Bottleneck Type: Memory-bound

### Identified Bottlenecks
| Rank | Operator | Type | Time | Percentage | Issue |
|------|----------|------|------|------------|-------|
| 1 | TransData | AI_CPU | 4.2 ms | 27.6% | Frequent CPU-NPU transfer |
| 2 | IndexSelect | AI_CPU | 2.8 ms | 18.4% | Operator fallback to CPU |
| 3 | NMS | AI_CPU | 1.5 ms | 9.9% | No NPU implementation |

### Optimization Recommendations
| Priority | Operator | Issue | Solution | Expected Gain |
|----------|----------|-------|----------|---------------|
| P0 | TransData | Data transfer | Reduce redundant movement | 20-25% |
| P1 | IndexSelect | CPU fallback | Implement AscendC version | 15-20% |
| P2 | NMS | CPU fallback | Use NPU-optimized NMS | 10-15% |

### Quick Wins
1. Batch pre-processing on NPU instead of CPU
2. Use async data transfer with overlap
3. Enable memory pooling for intermediate tensors

Reference Documents

Document Description
Profiler DB Data Format Table structure
Acceptance Criteria Acceptance criteria
Verification Method Verification approach
Troubleshooting Common issues

Prerequisites

  • msprof >= 7.0.0 installed
  • sqlite3 >= 3.0.0 installed
  • Have Ascend PyTorch Profiler or msprof generated database file

Core Commands

# Query operator time consumption
python3 scripts/query_profiler_db.py \
  --db /path/to/ascend_pytorch_profiler.db \
  --query "Top 10 operators by time consumption"

Parameter Confirmation

Parameter Description Required
db Profiler database path Yes
query Natural language query Yes
output Output format No
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install huawei-cloud-ascend-profiler-db-explorer
  3. After installation, invoke the skill by name or use /huawei-cloud-ascend-profiler-db-explorer
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.0.2
No file changes detected. There are no changes to functionality or documentation in this version.
v0.0.1
Initial release of Huawei Cloud Ascend Profiler DB Explorer. - Converts natural language profiling questions into safe executable SQL queries for Ascend PyTorch Profiler/msprof databases. - Supports analysis of operator time consumption, communication, dispatch, and performance bottlenecks. - Extracts table schema from official documentation for safer queries. - Enforces strict SQL best practices: uses CTE macros for key scenarios and restricts ad-hoc SQL/join logic. - Supports table and schema introspection, with clear guidance for query construction and output formatting.
Metadata
Slug huawei-cloud-ascend-profiler-db-explorer
Version 0.0.2
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is huawei-cloud-ascend-profiler-db-explorer?

Convert natural language questions into safe executable SQL to query Ascend PyTorch Profiler / msprof database for operator time consumption, communication,... It is an AI Agent Skill for Claude Code / OpenClaw, with 39 downloads so far.

How do I install huawei-cloud-ascend-profiler-db-explorer?

Run "/install huawei-cloud-ascend-profiler-db-explorer" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is huawei-cloud-ascend-profiler-db-explorer free?

Yes, huawei-cloud-ascend-profiler-db-explorer is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does huawei-cloud-ascend-profiler-db-explorer support?

huawei-cloud-ascend-profiler-db-explorer is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created huawei-cloud-ascend-profiler-db-explorer?

It is built and maintained by huaweicloud-skills-team (@huaweiclouddev); the current version is v0.0.2.

💬 Comments