huawei-cloud-ascend-profiler-db-explorer
/install huawei-cloud-ascend-profiler-db-explorer
Huawei Cloud Ascend Profiler DB Explorer
Overview
This skill converts natural language questions about profiling data into safe SQL queries for Ascend PyTorch Profiler and msprof databases.
Architecture: Natural Language Input → Intent Recognition → SQL Generation → Database Execution → Result Analysis
Related Skills:
huawei-cloud-msot-msopprof-operator-profiler- Operator performance data collectionhuawei-cloud-ascend-small-model-migrate- Migration workflow that uses profiling analysishuawei-cloud-ascendc-operator-performance-optim- Operator optimization workflow
Architecture Components
This skill involves the following cloud services and components:
- MSProf: Ascend profiling tool for data collection and database management
- SQLite: Database engine for storing profiling data
- Ascend NPU: Target hardware for performance profiling
- msprof_mcp: Tool for executing SQL queries on profiling database
Architecture Diagram:
┌─────────────────────────────────────────────────────────────┐
│ Profiler DB Explorer Skill │
├─────────────────────────────────────────────────────────────┤
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Natural │───▶│ SQL │───▶│ Database │ │
│ │ Language │ │ Generation │ │ Execution │ │
│ │ Input │ │ │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Intent │ │ CTE Macro │ │ Result │ │
│ │ Recognition │ │ Templates │ │ Analysis │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
Use Cases
Typical Problem Scenarios:
- Analyzing operator time consumption on Ascend NPU
- Identifying communication bottlenecks in distributed training
- Understanding framework dispatch overhead
- Querying profiling database for performance insights
- Debugging performance issues in model inference
Typical User Phrases:
- "Which operators are most time-consuming?"
- "Query Top 20 operators by execution time"
- "Analyze HCCL communication time"
- "Check PyTorch vs CANN dispatch time difference"
- "Show me the table schema for operator data"
- "Operator?"
- "AnalysisOperatorPerformance"
- "QueryprofilerDatabase"
Skill Objectives
- Convert natural language questions to SQL drafts: Quickly construct safe, readable profiling queries based on preset CTE macros and dictionary rules.
- Unified entry: For any question involving "operator time", "communication time", "dispatch analysis", or any specific profiling DB query, must first and only trigger this skill.
- Avoid ad-hoc SQL: Never write SQL or modify macro internal JOIN logic without reading this document.
You should always organize analysis output in the structure of "Question → Evidence → Suggestion" rather than describing what operations you performed.
Role Positioning
You are an Ascend Profiling Database Query and SQL Design Expert, responsible for:
- Understanding user's performance problem intent (operator/communication/dispatch, etc.).
- Selecting appropriate query channel (Track A / Track B).
- Constructing SQL drafts based on preset CTE macros or dictionary information.
- Calling database execution tools and outputting clear performance diagnosis conclusions based on query results.
Usage Scenarios
Prioritize calling this skill in following scenarios:
- User asks "which operators are most time-consuming", "TopK operators", "computation bottlenecks".
- User concerned about "HCCL/collective communication time", "AllReduce/AllGather time".
- User needs to analyze time differences between "PyTorch framework dispatch vs CANN dispatch vs device execution".
- Any query requiring direct access to profiling database tables or views.
Trigger Words (Recall Enhancement)
When user's question contains following words or similar expressions, prioritize triggering this skill:
ascend-pytorch-profiler-db/ascend_pytorch_profiler*.db/msprof_*.dbsqlite/table/schema/fieldTopK operators/communication time/dispatch analysis/scheduling bottleneck
Mandatory Restrictions
- Main query must satisfy at least one of following:
- Contains aggregation functions (e.g.,
SUM,AVG,COUNT, etc.), OR - Explicitly includes
ORDER BY ... LIMIT 20(or smaller LIMIT).
- Contains aggregation functions (e.g.,
- Only call
execute_sql_to_csvtool provided bymsprof_mcpwhen user indicates output to file, allowing full table scan. - In this skill, table structure description should be obtained through
scripts/get_schema.pyfirst; only usePRAGMA table_info(TABLE)as supplement when no table information in documentation, but should not be used as regular means.
Track A: Golden Views / CTE Macros (Priority)
When handling any profiling database query, must first try Track A (fast path):
-
Intent Matching
- Determine if user intent belongs to: operator computation / collective communication / framework dispatch.
- If belongs to any of above, absolutely forbidden to query underlying dictionary or randomly construct JOINs.
-
Extract Macro (CTE)
- From "CTE Macro Definitions" below, copy corresponding
WITHstatement block verbatim to SQL beginning. - Never modify JOIN logic and field expressions inside macros.
- From "CTE Macro Definitions" below, copy corresponding
-
Concatenate Main Query
- After copied
WITH ... AS (...), writeSELECTquery for corresponding view (e.g.,compute_view,comm_view,dispatch_view). - Example:
SELECT op_name, SUM(duration_ns) AS total_ns FROM compute_view GROUP BY op_name ORDER BY total_ns DESC LIMIT 20;
- After copied
Track B: Underlying Documentation / profiler_db_data_format.md
Only enter Track B when one of following conditions met:
- User explicitly requests querying underlying hardware metrics (e.g., PMU counts, memory allocation, Step division, etc.).
- Requirement not covered by existing views in "CTE Macro Definitions".
Core tool for Track B is scripts/get_schema.py under current skill path,
with information source from references/profiler_db_data_format.md.
1. Get Real Table Names from Current DB (Recommended)
First execute sqlite query on target db to get actual tables present in current version:
sqlite3 {db_path} ".tables"
sqlite3 {db_path} "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"
Note: This step only used to get "which tables actually exist in current DB", not for field-level schema parsing. For field descriptions, use
get_schema.py --table_name.
2. Use Script for Document/DB Alignment (Recommended)
- Purpose: Automatically list document table names, current DB table names, or directly do intersection comparison to reduce manual filtering.
- Command line examples:
cd {skills_path}/huawei-cloud-ascend-profiler-db-explorer/scripts
python3 get_schema.py --list_tables
python3 get_schema.py --db_path {db_path} --list_db_tables
python3 get_schema.py --db_path {db_path} --compare_doc_db
3. get_schema_by_table_name(table_name)
- Purpose: Extract corresponding section (fields, format, description, etc.)
for the table from
profiler_db_.mdby table name. - Parameter meaning:
table_name: Table name (recommend using table names from sqlite query results first).
- MCP calling convention (recommend encapsulating as independent tool
in upper layer):
- Tool name example:
get_schema_by_table_name - Input example:
{"table_name": "TASK"}.
- Tool name example:
- Command line examples:
cd {skills_path}/huawei-cloud-ascend-profiler-db-explorer/scripts
python3 get_schema.py --table_name TASK
python3 get_schema.py --table_name COMMUNICATION_OP
Returns original description paragraph for the table from reference documentation.
Track B Usage Principles
- Use real table names from sqlite query first, then call
get_schema.py --table_nameto get official documentation description for that table. - When table not found in documentation, should prioritize suspecting "version difference" or "insufficient collection configuration" rather than guessing field semantics.
- Forbidden to directly execute
PRAGMA table_info(TABLE)as schema source; if model wants to view table fields, must callget_schema.pyinstead.
Execution and Summary
- Execution: After assembling SQL, call
execute_sqlorexecute_sql_to_csvtool provided bymsprof_mcpto execute query. - Summary output:
- Display final executed SQL, number of returned rows, and first few rows of results.
CTE Macro Definitions (Must Reuse in Track A)
[Highest Warning] Below are macro blocks (CTE) dedicated to Ascend Profiling. In Track A:
- Must completely copy corresponding macro code block as
WITHheader of SQL. - Never modify JOIN, field meaning, or computation logic inside macros.
1. Operator Computation Detail Macro (Compute Macro)
Purpose: Query operator time consumption, TopK operators, computation bottlenecks.
WITH compute_view AS (
SELECT c.globalTaskId, ROUND(t.endNs - t.startNs) AS duration_ns,
n.value AS op_name, type_str.value AS op_type
FROM COMPUTE_TASK_INFO c
LEFT JOIN TASK t ON t.globalTaskId = c.globalTaskId
LEFT JOIN STRING_IDS n ON n.id = c.name
LEFT JOIN STRING_IDS type_str ON type_str.id = c.opType
)
2. Communication Detail Macro (Communication Macro)
Purpose: Query HCCL collective communication (AllReduce, AllGather, etc.) time.
WITH comm_view AS (
SELECT ROUND(c.endNs - c.startNs) AS duration_ns, n.value AS op_name,
t.value AS op_type, g.value AS group_name
FROM COMMUNICATION_OP c
LEFT JOIN STRING_IDS n ON n.id = c.opName
LEFT JOIN STRING_IDS t ON t.id = c.opType
LEFT JOIN STRING_IDS g ON g.id = c.groupName
)
3. Dispatch Mapping Macro (Dispatch Macro)
Purpose: Compare time differences between PyTorch framework dispatch, CANN layer dispatch, and underlying execution to locate scheduling congestion.
WITH dispatch_view AS (
SELECT
ROUND(t.endNs - t.startNs) AS task_duration_ns,
ROUND(c.endNs - c.startNs) AS cann_duration_ns,
ROUND(p.endNs - p.startNs) AS pytorch_duration_ns,
c_str.value AS cann_api_name,
p_str.value AS pytorch_api_name,
t_str.value AS task_type
FROM TASK t
LEFT JOIN CANN_API c ON t.connectionId = c.connectionId
LEFT JOIN CONNECTION_IDS conn ON conn.connectionId = t.connectionId
LEFT JOIN PYTORCH_API p ON p.connectionId = conn.id
LEFT JOIN STRING_IDS c_str ON c.name = c_str.id
LEFT JOIN STRING_IDS p_str ON p.name = p_str.id
LEFT JOIN STRING_IDS t_str ON t.taskType = t_str.id
)
Enhanced Features
Intelligent Bottleneck Diagnoser
This skill includes an AI-powered bottleneck diagnosis system that analyzes profiling data to identify root causes automatically:
Features:
- Automatic Root Cause Analysis: Identifies performance bottlenecks from profiling data
- Bottleneck Classification: Categorizes bottlenecks into memory-bound, compute-bound, communication-bound, or operator-fallback types
- Actionable Recommendations: Provides prioritized optimization recommendations
- Pattern Matching: Detects known performance anti-patterns and suggests fixes
- Impact Assessment: Estimates potential performance improvement from each optimization
Bottleneck Classification:
| Category | Characteristics | Causes | Strategy |
|---|---|---|---|
| Memory-bound | High memory bandwidth | TransData ops | Reduce transfer |
| Compute-bound | High AI_CORE util | Large matmul | Optimize ops |
| Comm-bound | HCCL ops significant | Inefficient coll | Optimize comm |
| Operator-fallback | AI_CPU execution | Missing NPU impl | AscendC ops |
Bottleneck Diagnosis Output:
## Intelligent Bottleneck Diagnosis Report
### Overall Performance Summary
- Total Inference Time: 15.2 ms
- Bottleneck Score: 78/100
- Main Bottleneck Type: Memory-bound
### Identified Bottlenecks
| Rank | Operator | Type | Time | Percentage | Issue |
|------|----------|------|------|------------|-------|
| 1 | TransData | AI_CPU | 4.2 ms | 27.6% | Frequent CPU-NPU transfer |
| 2 | IndexSelect | AI_CPU | 2.8 ms | 18.4% | Operator fallback to CPU |
| 3 | NMS | AI_CPU | 1.5 ms | 9.9% | No NPU implementation |
### Optimization Recommendations
| Priority | Operator | Issue | Solution | Expected Gain |
|----------|----------|-------|----------|---------------|
| P0 | TransData | Data transfer | Reduce redundant movement | 20-25% |
| P1 | IndexSelect | CPU fallback | Implement AscendC version | 15-20% |
| P2 | NMS | CPU fallback | Use NPU-optimized NMS | 10-15% |
### Quick Wins
1. Batch pre-processing on NPU instead of CPU
2. Use async data transfer with overlap
3. Enable memory pooling for intermediate tensors
Reference Documents
| Document | Description |
|---|---|
| Profiler DB Data Format | Table structure |
| Acceptance Criteria | Acceptance criteria |
| Verification Method | Verification approach |
| Troubleshooting | Common issues |
Prerequisites
- msprof >= 7.0.0 installed
- sqlite3 >= 3.0.0 installed
- Have Ascend PyTorch Profiler or msprof generated database file
Core Commands
# Query operator time consumption
python3 scripts/query_profiler_db.py \
--db /path/to/ascend_pytorch_profiler.db \
--query "Top 10 operators by time consumption"
Parameter Confirmation
| Parameter | Description | Required |
|---|---|---|
| db | Profiler database path | Yes |
| query | Natural language query | Yes |
| output | Output format | No |
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install huawei-cloud-ascend-profiler-db-explorer - After installation, invoke the skill by name or use
/huawei-cloud-ascend-profiler-db-explorer - Provide required inputs per the skill's parameter spec and get structured output
What is huawei-cloud-ascend-profiler-db-explorer?
Convert natural language questions into safe executable SQL to query Ascend PyTorch Profiler / msprof database for operator time consumption, communication,... It is an AI Agent Skill for Claude Code / OpenClaw, with 39 downloads so far.
How do I install huawei-cloud-ascend-profiler-db-explorer?
Run "/install huawei-cloud-ascend-profiler-db-explorer" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is huawei-cloud-ascend-profiler-db-explorer free?
Yes, huawei-cloud-ascend-profiler-db-explorer is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does huawei-cloud-ascend-profiler-db-explorer support?
huawei-cloud-ascend-profiler-db-explorer is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created huawei-cloud-ascend-profiler-db-explorer?
It is built and maintained by huaweicloud-skills-team (@huaweiclouddev); the current version is v0.0.2.