huawei-cloud-ascend-op-mfu-calculator
/install huawei-cloud-ascend-op-mfu-calculator
Huawei Cloud Ascend Operator MFU Calculator
Overview
This skill calculates MFU (Machine FLOP Utilization) for operators such as matmul, GEMM, and FlashAttention on Ascend NPUs, and shows the formulas and the derivation behind each result.
Architecture: Input Validation → FLOPs Calculation → Achieved TFLOPs/s → MFU Calculation → Result Analysis
Related Skills:
- huawei-cloud-ascend-profiler-db-explorer: Profiling data analysis for operator performance data
Prerequisites
- Python 3.8+ installed
- Basic understanding of FLOPs calculation concepts
Usage Scenarios
Typical Problem Scenarios:
- Evaluating how well an operator utilizes Ascend NPU compute power
- Comparing performance of different operator implementations
- Identifying optimization opportunities for matrix operations
Typical User Utterances:
- "Calculate MFU for my GEMM operator"
- "What's the machine FLOP utilization for FlashAttention?"
- "Analyze my matmul operator performance efficiency"
Workflow
- Input Collection: Gather operator parameters (matrix dimensions, data types, execution time)
- FLOPs Calculation: Compute theoretical FLOPs for the operation
- Achieved Performance: Calculate achieved TFLOPs/s from execution time
- MFU Calculation: Apply the formula MFU = (Achieved FLOPs / Peak FLOPs) × 100%
- Result Analysis: Provide interpretation and optimization suggestions
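The FLOPs Calculation step can be sketched as follows. This is a minimal illustration, not the skill's own implementation: the function names `gemm_flops` and `attention_forward_flops` are hypothetical, each multiply-add is counted as 2 FLOPs, and the attention count covers only the two matmuls (Q·Kᵀ and P·V), ignoring softmax and scaling.

```python
def gemm_flops(m: int, n: int, k: int, batch: int = 1) -> int:
    """FLOPs for C[m, n] = A[m, k] @ B[k, n]; one multiply-add counts as 2 FLOPs."""
    return 2 * batch * m * n * k


def attention_forward_flops(batch: int, heads: int, seq_len: int,
                            head_dim: int, causal: bool = False) -> int:
    """Matmul FLOPs of an attention forward pass (Q @ K^T plus P @ V).

    Each of the two matmuls costs 2 * seq_len * seq_len * head_dim per head.
    A causal mask is approximated as halving the work (an assumption).
    """
    flops = 4 * batch * heads * seq_len * seq_len * head_dim
    return flops // 2 if causal else flops


print(gemm_flops(4096, 4096, 4096))                # 137438953472
print(attention_forward_flops(1, 32, 2048, 128))   # 68719476736
```

The resulting value is what the `flops` parameter of the skill expects; backward passes and fused epilogues would need their own counts.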
MFU Calculation Formula
MFU = (Achieved FLOPs / Peak FLOPs) × 100%
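Where:
- Achieved FLOPs = Operation FLOPs / Execution Time, i.e. a throughput in FLOPs per second (divide by 10^12 to express it in TFLOPS)
- Peak FLOPs = Hardware-specific peak throughput for the same data type, in the same unit (e.g., Ascend 910B: 256 TFLOPS for FP16; confirm the figure for your exact device variant against the official specifications)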
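As a worked example of the formula, the sketch below converts an operator's FLOPs and execution time into achieved TFLOPS and then into MFU. The function names are hypothetical, the argument names mirror the skill's `flops`, `time_ms`, and `peak_tflops` parameters, and the 256 TFLOPS peak is simply the example figure quoted above.

```python
def achieved_tflops(flops: float, time_ms: float) -> float:
    """Achieved throughput in TFLOPS from total FLOPs and time in milliseconds."""
    if time_ms <= 0:
        raise ValueError("time_ms must be positive")
    seconds = time_ms * 1e-3
    return flops / seconds / 1e12


def mfu_percent(flops: float, time_ms: float, peak_tflops: float) -> float:
    """MFU = (achieved throughput / peak throughput) * 100."""
    if peak_tflops <= 0:
        raise ValueError("peak_tflops must be positive")
    return achieved_tflops(flops, time_ms) / peak_tflops * 100.0


# A 4096 x 4096 x 4096 FP16 GEMM (2 * 4096**3 FLOPs) measured at 1.0 ms,
# against the example peak of 256 TFLOPS.
flops = 2 * 4096 ** 3
print(f"{achieved_tflops(flops, 1.0):.2f} TFLOPS")     # 137.44 TFLOPS
print(f"{mfu_percent(flops, 1.0, 256.0):.2f} % MFU")   # 53.69 % MFU
```

An MFU above 100% usually indicates a wrong input, for example a time in the wrong unit or a peak value for a different data type.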
Reference Documents
| Document | Description |
|---|---|
| Ascend 910B Series Technical Specifications | Official Ascend 910B series product specifications |
| MFU Calculation Methodology | Detailed MFU calculation formulas and examples |
| FlashAttention Technical Paper | Original FlashAttention research paper |
Enhanced Features
Intelligent Bottleneck Diagnoser
- AI-powered bottleneck diagnosis that analyzes profiling data to identify root causes automatically
- Classifies bottlenecks into categories: memory-bound, compute-bound, communication-bound, or operator-fallback
- Provides actionable optimization recommendations with priority ranking
- Includes pattern matching for known performance anti-patterns
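One common way to separate the memory-bound and compute-bound categories is a roofline check, sketched below. This is an assumption about how such a classification could work, not a description of the skill's diagnoser: the function `classify_roofline`, the `bytes_moved` and `mem_bandwidth_gbs` inputs, and the 1000 GB/s bandwidth in the usage lines are all illustrative. Communication-bound and operator-fallback cases need profiling data and are not covered.

```python
def classify_roofline(flops: float, bytes_moved: float,
                      peak_tflops: float, mem_bandwidth_gbs: float) -> str:
    """Classify an operator by comparing its arithmetic intensity to the ridge point.

    Arithmetic intensity = FLOPs per byte moved to or from device memory.
    Ridge point = peak compute (FLOPs/s) / memory bandwidth (bytes/s).
    """
    if min(flops, bytes_moved, peak_tflops, mem_bandwidth_gbs) <= 0:
        raise ValueError("all inputs must be positive")
    intensity = flops / bytes_moved
    ridge = (peak_tflops * 1e12) / (mem_bandwidth_gbs * 1e9)
    return "compute-bound" if intensity >= ridge else "memory-bound"


# FP16 GEMM, 4096^3: three 4096 x 4096 matrices at 2 bytes per element.
gemm_bytes = 3 * 4096 * 4096 * 2
print(classify_roofline(2 * 4096 ** 3, gemm_bytes, 256.0, 1000.0))  # compute-bound

# Element-wise op: 1 FLOP per element, 4 bytes moved per element.
print(classify_roofline(1e6, 4e6, 256.0, 1000.0))                   # memory-bound
```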
Parameter Confirmation
| Parameter | Description | Required |
|---|---|---|
| operator | Operator type (matmul/flash_attention/gemm, etc.) | Yes |
| flops | Theoretical FLOPs of the operator | Yes |
| time_ms | Operator execution time (milliseconds) | Yes |
| peak_tflops | Hardware peak computing power (TFLOPS) | Yes |
| device | NPU device type (910B/910, etc.) | No |
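The parameter table maps naturally onto a small validated record. The sketch below is one possible shape for the Input Validation stage; the class name `MfuRequest` and the specific checks are assumptions, not the skill's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MfuRequest:
    """Inputs from the parameter table; only `device` is optional."""
    operator: str
    flops: float
    time_ms: float
    peak_tflops: float
    device: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.operator.strip():
            raise ValueError("operator must be a non-empty string")
        for name in ("flops", "time_ms", "peak_tflops"):
            value = getattr(self, name)
            # `not value > 0` also rejects NaN, which fails every comparison.
            if not value > 0:
                raise ValueError(f"{name} must be a positive number, got {value!r}")


request = MfuRequest(operator="matmul", flops=2 * 4096 ** 3,
                     time_ms=1.0, peak_tflops=256.0, device="910B")
print(request)
```

Rejecting non-positive values up front avoids division by zero in the later throughput and MFU steps.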
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install huawei-cloud-ascend-op-mfu-calculator
- After installation, invoke the skill by name or use /huawei-cloud-ascend-op-mfu-calculator
- Provide required inputs per the skill's parameter spec and get structured output
What is huawei-cloud-ascend-op-mfu-calculator?
Calculate MFU (Machine FLOP Utilization) for operators like matmul/GEMM/FlashAttention on Ascend NPU, providing clear formulas and the derivation process. Use thi... It is an AI Agent Skill for Claude Code / OpenClaw, with 38 downloads so far.
How do I install huawei-cloud-ascend-op-mfu-calculator?
Run "/install huawei-cloud-ascend-op-mfu-calculator" in the OpenClaw or Claude Code chat to install it in one step; no extra setup is required.
Is huawei-cloud-ascend-op-mfu-calculator free?
Yes, huawei-cloud-ascend-op-mfu-calculator is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does huawei-cloud-ascend-op-mfu-calculator support?
huawei-cloud-ascend-op-mfu-calculator is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created huawei-cloud-ascend-op-mfu-calculator?
It is built and maintained by huaweicloud-skills-team (@huaweiclouddev); the current version is v0.0.2.