← Back to Skills Marketplace

huawei-cloud-ascend-profiler-db-explorer

Name: huawei-cloud-ascend-profiler-db-explorer
Author: huaweiclouddev

by huaweicloud-skills-team · GitHub ↗ · v0.0.2 · MIT-0

cross-platform ⚠ pending

Downloads

Stars

Active Installs

Versions

Install in OpenClaw

/install huawei-cloud-ascend-profiler-db-explorer

Description

Convert natural language questions into safe executable SQL to query Ascend PyTorch Profiler / msprof database for operator time consumption, communication,...

README (SKILL.md)

Huawei Cloud Ascend Profiler DB Explorer

Overview

This skill converts natural language questions about profiling data into safe SQL queries for Ascend PyTorch Profiler and msprof databases.

Architecture: Natural Language Input → Intent Recognition → SQL Generation → Database Execution → Result Analysis

Related Skills:

huawei-cloud-msot-msopprof-operator-profiler - Operator performance data collection
huawei-cloud-ascend-small-model-migrate - Migration workflow that uses profiling analysis
huawei-cloud-ascendc-operator-performance-optim - Operator optimization workflow

Architecture Components

This skill involves the following cloud services and components:

MSProf: Ascend profiling tool for data collection and database management
SQLite: Database engine for storing profiling data
Ascend NPU: Target hardware for performance profiling
msprof_mcp: Tool for executing SQL queries on profiling database

Architecture Diagram:

┌─────────────────────────────────────────────────────────────┐
│              Profiler DB Explorer Skill                    │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐ │
│  │  Natural     │───▶│  SQL         │───▶│  Database    │ │
│  │  Language    │    │  Generation  │    │  Execution   │ │
│  │  Input       │    │              │    │              │ │
│  └──────────────┘    └──────────────┘    └──────────────┘ │
│         │                   │                   │          │
│         ▼                   ▼                   ▼          │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐ │
│  │  Intent      │    │  CTE Macro   │    │  Result      │ │
│  │  Recognition │    │  Templates   │    │  Analysis    │ │
│  └──────────────┘    └──────────────┘    └──────────────┘ │
└─────────────────────────────────────────────────────────────┘

Use Cases

Typical Problem Scenarios:

Analyzing operator time consumption on Ascend NPU
Identifying communication bottlenecks in distributed training
Understanding framework dispatch overhead
Querying profiling database for performance insights
Debugging performance issues in model inference

Typical User Phrases:

"Which operators are most time-consuming?"
"Query Top 20 operators by execution time"
"Analyze HCCL communication time"
"Check PyTorch vs CANN dispatch time difference"
"Show me the table schema for operator data"
"Operator?"
"AnalysisOperatorPerformance"
"QueryprofilerDatabase"

Skill Objectives

Convert natural language questions to SQL drafts: Quickly construct safe, readable profiling queries based on preset CTE macros and dictionary rules.
Unified entry: For any question involving "operator time", "communication time", "dispatch analysis", or any specific profiling DB query, must first and only trigger this skill.
Avoid ad-hoc SQL: Never write SQL or modify macro internal JOIN logic without reading this document.

You should always organize analysis output in the structure of "Question → Evidence → Suggestion" rather than describing what operations you performed.

Role Positioning

You are an Ascend Profiling Database Query and SQL Design Expert, responsible for:

Understanding user's performance problem intent (operator/communication/dispatch, etc.).
Selecting appropriate query channel (Track A / Track B).
Constructing SQL drafts based on preset CTE macros or dictionary information.
Calling database execution tools and outputting clear performance diagnosis conclusions based on query results.

Usage Scenarios

Prioritize calling this skill in following scenarios:

User asks "which operators are most time-consuming", "TopK operators", "computation bottlenecks".
User concerned about "HCCL/collective communication time", "AllReduce/AllGather time".
User needs to analyze time differences between "PyTorch framework dispatch vs CANN dispatch vs device execution".
Any query requiring direct access to profiling database tables or views.

Trigger Words (Recall Enhancement)

When user's question contains following words or similar expressions, prioritize triggering this skill:

ascend-pytorch-profiler-db / ascend_pytorch_profiler*.db / msprof_*.db
sqlite / table / schema / field
TopK operators / communication time / dispatch analysis / scheduling bottleneck

Mandatory Restrictions

Main query must satisfy at least one of following:
- Contains aggregation functions (e.g., SUM, AVG, COUNT, etc.), OR
- Explicitly includes ORDER BY ... LIMIT 20 (or smaller LIMIT).
Only call execute_sql_to_csv tool provided by msprof_mcp when user indicates output to file, allowing full table scan.
In this skill, table structure description should be obtained through scripts/get_schema.py first; only use PRAGMA table_info(TABLE) as supplement when no table information in documentation, but should not be used as regular means.

Track A: Golden Views / CTE Macros (Priority)

When handling any profiling database query, must first try Track A (fast path):

Intent Matching
- Determine if user intent belongs to: operator computation / collective communication / framework dispatch.
- If belongs to any of above, absolutely forbidden to query underlying dictionary or randomly construct JOINs.
Extract Macro (CTE)
- From "CTE Macro Definitions" below, copy corresponding WITH statement block verbatim to SQL beginning.
- Never modify JOIN logic and field expressions inside macros.
Concatenate Main Query
- After copied WITH ... AS (...), write SELECT query for corresponding view (e.g., compute_view, comm_view, dispatch_view).
- Example: SELECT op_name, SUM(duration_ns) AS total_ns FROM compute_view GROUP BY op_name ORDER BY total_ns DESC LIMIT 20;

Track B: Underlying Documentation / profiler_db_data_format.md

Only enter Track B when one of following conditions met:

User explicitly requests querying underlying hardware metrics (e.g., PMU counts, memory allocation, Step division, etc.).
Requirement not covered by existing views in "CTE Macro Definitions".

Core tool for Track B is scripts/get_schema.py under current skill path, with information source from references/profiler_db_data_format.md.

1. Get Real Table Names from Current DB (Recommended)

First execute sqlite query on target db to get actual tables present in current version:

sqlite3 {db_path} ".tables"
sqlite3 {db_path} "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"

Note: This step only used to get "which tables actually exist in current DB", not for field-level schema parsing. For field descriptions, use get_schema.py --table_name.

2. Use Script for Document/DB Alignment (Recommended)

Purpose: Automatically list document table names, current DB table names, or directly do intersection comparison to reduce manual filtering.
Command line examples:

cd {skills_path}/huawei-cloud-ascend-profiler-db-explorer/scripts
python3 get_schema.py --list_tables
python3 get_schema.py --db_path {db_path} --list_db_tables
python3 get_schema.py --db_path {db_path} --compare_doc_db

3. get_schema_by_table_name(table_name)

Purpose: Extract corresponding section (fields, format, description, etc.) for the table from profiler_db_.md by table name.
Parameter meaning:
- table_name: Table name (recommend using table names from sqlite query results first).
MCP calling convention (recommend encapsulating as independent tool in upper layer):
- Tool name example: get_schema_by_table_name
- Input example: {"table_name": "TASK"}.
Command line examples:

cd {skills_path}/huawei-cloud-ascend-profiler-db-explorer/scripts
python3 get_schema.py --table_name TASK
python3 get_schema.py --table_name COMMUNICATION_OP

Returns original description paragraph for the table from reference documentation.

Track B Usage Principles

Use real table names from sqlite query first, then call get_schema.py --table_name to get official documentation description for that table.
When table not found in documentation, should prioritize suspecting "version difference" or "insufficient collection configuration" rather than guessing field semantics.
Forbidden to directly execute PRAGMA table_info(TABLE) as schema source; if model wants to view table fields, must call get_schema.py instead.

Execution and Summary

Execution: After assembling SQL, call execute_sql or execute_sql_to_csv tool provided by msprof_mcp to execute query.
Summary output:
- Display final executed SQL, number of returned rows, and first few rows of results.

CTE Macro Definitions (Must Reuse in Track A)

[Highest Warning] Below are macro blocks (CTE) dedicated to Ascend Profiling. In Track A:

Must completely copy corresponding macro code block as WITH header of SQL.
Never modify JOIN, field meaning, or computation logic inside macros.

1. Operator Computation Detail Macro (Compute Macro)

Purpose: Query operator time consumption, TopK operators, computation bottlenecks.

WITH compute_view AS (
    SELECT c.globalTaskId, ROUND(t.endNs - t.startNs) AS duration_ns,
           n.value AS op_name, type_str.value AS op_type
    FROM COMPUTE_TASK_INFO c
    LEFT JOIN TASK t ON t.globalTaskId = c.globalTaskId
    LEFT JOIN STRING_IDS n ON n.id = c.name
    LEFT JOIN STRING_IDS type_str ON type_str.id = c.opType
)

2. Communication Detail Macro (Communication Macro)

Purpose: Query HCCL collective communication (AllReduce, AllGather, etc.) time.

WITH comm_view AS (
    SELECT ROUND(c.endNs - c.startNs) AS duration_ns, n.value AS op_name,
           t.value AS op_type, g.value AS group_name
    FROM COMMUNICATION_OP c
    LEFT JOIN STRING_IDS n ON n.id = c.opName
    LEFT JOIN STRING_IDS t ON t.id = c.opType
    LEFT JOIN STRING_IDS g ON g.id = c.groupName
)

3. Dispatch Mapping Macro (Dispatch Macro)

Purpose: Compare time differences between PyTorch framework dispatch, CANN layer dispatch, and underlying execution to locate scheduling congestion.

WITH dispatch_view AS (
    SELECT
        ROUND(t.endNs - t.startNs) AS task_duration_ns,
        ROUND(c.endNs - c.startNs) AS cann_duration_ns,
        ROUND(p.endNs - p.startNs) AS pytorch_duration_ns,
        c_str.value AS cann_api_name,
        p_str.value AS pytorch_api_name,
        t_str.value AS task_type
    FROM TASK t
    LEFT JOIN CANN_API c ON t.connectionId = c.connectionId
    LEFT JOIN CONNECTION_IDS conn ON conn.connectionId = t.connectionId
    LEFT JOIN PYTORCH_API p ON p.connectionId = conn.id
    LEFT JOIN STRING_IDS c_str ON c.name = c_str.id
    LEFT JOIN STRING_IDS p_str ON p.name = p_str.id
    LEFT JOIN STRING_IDS t_str ON t.taskType = t_str.id
)

Enhanced Features

Intelligent Bottleneck Diagnoser

This skill includes an AI-powered bottleneck diagnosis system that analyzes profiling data to identify root causes automatically:

Features:

Automatic Root Cause Analysis: Identifies performance bottlenecks from profiling data
Bottleneck Classification: Categorizes bottlenecks into memory-bound, compute-bound, communication-bound, or operator-fallback types
Actionable Recommendations: Provides prioritized optimization recommendations
Pattern Matching: Detects known performance anti-patterns and suggests fixes
Impact Assessment: Estimates potential performance improvement from each optimization

Bottleneck Classification:

Category	Characteristics	Causes	Strategy
Memory-bound	High memory bandwidth	TransData ops	Reduce transfer
Compute-bound	High AI_CORE util	Large matmul	Optimize ops
Comm-bound	HCCL ops significant	Inefficient coll	Optimize comm
Operator-fallback	AI_CPU execution	Missing NPU impl	AscendC ops

Bottleneck Diagnosis Output:

## Intelligent Bottleneck Diagnosis Report

### Overall Performance Summary
- Total Inference Time: 15.2 ms
- Bottleneck Score: 78/100
- Main Bottleneck Type: Memory-bound

### Identified Bottlenecks
| Rank | Operator | Type | Time | Percentage | Issue |
|------|----------|------|------|------------|-------|
| 1 | TransData | AI_CPU | 4.2 ms | 27.6% | Frequent CPU-NPU transfer |
| 2 | IndexSelect | AI_CPU | 2.8 ms | 18.4% | Operator fallback to CPU |
| 3 | NMS | AI_CPU | 1.5 ms | 9.9% | No NPU implementation |

### Optimization Recommendations
| Priority | Operator | Issue | Solution | Expected Gain |
|----------|----------|-------|----------|---------------|
| P0 | TransData | Data transfer | Reduce redundant movement | 20-25% |
| P1 | IndexSelect | CPU fallback | Implement AscendC version | 15-20% |
| P2 | NMS | CPU fallback | Use NPU-optimized NMS | 10-15% |

### Quick Wins
1. Batch pre-processing on NPU instead of CPU
2. Use async data transfer with overlap
3. Enable memory pooling for intermediate tensors

Reference Documents

Document	Description
Profiler DB Data Format	Table structure
Acceptance Criteria	Acceptance criteria
Verification Method	Verification approach
Troubleshooting	Common issues

Prerequisites

msprof >= 7.0.0 installed
sqlite3 >= 3.0.0 installed
Have Ascend PyTorch Profiler or msprof generated database file

Core Commands

# Query operator time consumption
python3 scripts/query_profiler_db.py \
  --db /path/to/ascend_pytorch_profiler.db \
  --query "Top 10 operators by time consumption"

Parameter Confirmation

Parameter	Description	Required
db	Profiler database path	Yes
query	Natural language query	Yes
output	Output format	No

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install huawei-cloud-ascend-profiler-db-explorer
After installation, invoke the skill by name or use /huawei-cloud-ascend-profiler-db-explorer
Provide required inputs per the skill's parameter spec and get structured output

Version History

v0.0.2

No file changes detected. There are no changes to functionality or documentation in this version.

v0.0.1

Initial release of Huawei Cloud Ascend Profiler DB Explorer. - Converts natural language profiling questions into safe executable SQL queries for Ascend PyTorch Profiler/msprof databases. - Supports analysis of operator time consumption, communication, dispatch, and performance bottlenecks. - Extracts table schema from official documentation for safer queries. - Enforces strict SQL best practices: uses CTE macros for key scenarios and restricts ad-hoc SQL/join logic. - Supports table and schema introspection, with clear guidance for query construction and output formatting.

Metadata

Slug huawei-cloud-ascend-profiler-db-explorer

Version 0.0.2

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is huawei-cloud-ascend-profiler-db-explorer?

Convert natural language questions into safe executable SQL to query Ascend PyTorch Profiler / msprof database for operator time consumption, communication,... It is an AI Agent Skill for Claude Code / OpenClaw, with 39 downloads so far.

How do I install huawei-cloud-ascend-profiler-db-explorer?

Run "/install huawei-cloud-ascend-profiler-db-explorer" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is huawei-cloud-ascend-profiler-db-explorer free?

Yes, huawei-cloud-ascend-profiler-db-explorer is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does huawei-cloud-ascend-profiler-db-explorer support?

huawei-cloud-ascend-profiler-db-explorer is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created huawei-cloud-ascend-profiler-db-explorer?

It is built and maintained by huaweicloud-skills-team (@huaweiclouddev); the current version is v0.0.2.

More Skills