huawei-cloud-ascend-op-mfu-calculator
/install huawei-cloud-ascend-op-mfu-calculator
Huawei Cloud Ascend Operator MFU Calculator
Overview
This skill calculates MFU (Machine FLOP Utilization) for operators such as matmul, GEMM, and FlashAttention on Ascend NPUs, and shows the formulas and derivation steps behind each result.
Architecture: Input Validation → FLOPs Calculation → Achieved TFLOPs/s → MFU Calculation → Result Analysis
Related Skills:
- huawei-cloud-ascend-profiler-db-explorer: Profiling data analysis for operator performance data
Prerequisites
- Python 3.8+ installed
- Basic understanding of FLOPs calculation concepts
Usage Scenarios
Typical Problem Scenarios:
- Evaluating how well an operator utilizes Ascend NPU compute power
- Comparing performance of different operator implementations
- Identifying optimization opportunities for matrix operations
Typical User Utterances:
- "Calculate MFU for my GEMM operator"
- "What's the machine FLOP utilization for FlashAttention?"
- "Analyze my matmul operator performance efficiency"
Workflow
- Input Collection: Gather operator parameters (matrix dimensions, data types, execution time)
- FLOPs Calculation: Compute theoretical FLOPs for the operation
- Achieved Performance: Calculate achieved TFLOPs/s from execution time
- MFU Calculation: Apply the formula MFU = Achieved TFLOP/s / Peak TFLOP/s
- Result Analysis: Provide interpretation and optimization suggestions
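The FLOPs Calculation step can be sketched as follows. This is a minimal illustration, not the skill's own code: the function names `matmul_flops` and `flash_attention_fwd_flops` are hypothetical, each multiply-add is counted as 2 FLOPs, and the FlashAttention count covers only the two forward-pass matmuls (Q·Kᵀ and P·V), ignoring softmax and masking overhead.

```python
def matmul_flops(m: int, n: int, k: int, batch: int = 1) -> int:
    """FLOPs for C[m, n] = A[m, k] @ B[k, n], counting a multiply-add as 2 FLOPs."""
    return 2 * batch * m * n * k


def flash_attention_fwd_flops(batch: int, heads: int, seq_len: int,
                              head_dim: int, causal: bool = False) -> int:
    """Forward-pass attention FLOPs (assumed convention: matmuls only).

    Q @ K^T and P @ V each cost 2 * seq_len^2 * head_dim per head,
    so the total is 4 * batch * heads * seq_len^2 * head_dim.
    A causal mask skips roughly half of the score matrix.
    """
    flops = 4 * batch * heads * seq_len * seq_len * head_dim
    return flops // 2 if causal else flops


if __name__ == "__main__":
    print(matmul_flops(4096, 4096, 4096))                 # 2 * 4096^3
    print(flash_attention_fwd_flops(1, 32, 4096, 128))    # non-causal forward
```

The resulting count is what the `flops` parameter of the skill expects; other counting conventions (for example, including the backward pass) would change the number and therefore the MFU.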
MFU Calculation Formula
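MFU = (Achieved FLOP/s / Peak FLOP/s) × 100%
Where:
- Achieved FLOP/s = Operation FLOPs / Execution Time (in seconds)
- Peak FLOP/s = Hardware-specific peak throughput for the data type in use (e.g., Ascend 910B: 256 TFLOP/s for FP16; confirm the figure against the specification for your exact chip variant)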
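As a worked example of the formula above, the sketch below converts an operation count and a measured time into achieved TFLOP/s and then into an MFU percentage. The helper names are hypothetical, the 1.0 ms execution time is an invented measurement for illustration, and the 256 TFLOP/s peak is simply the FP16 figure quoted above.

```python
def achieved_tflops(flops: float, time_ms: float) -> float:
    """Achieved throughput in TFLOP/s from total FLOPs and time in milliseconds."""
    seconds = time_ms * 1e-3
    return flops / seconds / 1e12


def mfu_percent(flops: float, time_ms: float, peak_tflops: float) -> float:
    """MFU = achieved TFLOP/s divided by peak TFLOP/s, as a percentage."""
    return achieved_tflops(flops, time_ms) / peak_tflops * 100.0


if __name__ == "__main__":
    flops = 2 * 4096 ** 3   # 4096 x 4096 x 4096 matmul
    time_ms = 1.0           # hypothetical measured execution time
    peak = 256.0            # peak FP16 TFLOP/s quoted in this document
    print(f"achieved: {achieved_tflops(flops, time_ms):.2f} TFLOP/s")
    print(f"MFU: {mfu_percent(flops, time_ms, peak):.2f}%")
```

With these inputs the matmul reaches about 137.44 TFLOP/s, which is roughly 53.7% of the quoted peak.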
Reference Documents
| Document | Description |
|---|---|
| Ascend 910B Series Technical Specifications | Official Ascend 910B series product specifications |
| MFU Calculation Methodology | Detailed MFU calculation formulas and examples |
| FlashAttention Technical Paper | Original FlashAttention research paper |
Enhanced Features
Intelligent Bottleneck Diagnoser
- AI-powered bottleneck diagnosis that analyzes profiling data to identify root causes automatically
- Classifies bottlenecks into categories: memory-bound, compute-bound, communication-bound, or operator-fallback
- Provides actionable optimization recommendations with priority ranking
- Includes pattern matching for known performance anti-patterns
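The diagnoser's internals are not described here, so the following is only a roofline-style heuristic for telling the memory-bound and compute-bound categories apart, not the skill's actual method. The function name `classify_bound`, the `bytes_moved` input, and the 1000 GB/s bandwidth used in the example are all assumptions made for illustration.

```python
def classify_bound(flops: float, bytes_moved: float,
                   peak_tflops: float, peak_bandwidth_gbs: float) -> str:
    """Compare arithmetic intensity with the hardware ridge point.

    intensity = FLOPs per byte moved to/from memory
    ridge     = peak FLOP/s divided by peak bytes/s
    Operators below the ridge cannot saturate compute and are memory-bound.
    """
    intensity = flops / bytes_moved
    ridge = (peak_tflops * 1e12) / (peak_bandwidth_gbs * 1e9)
    return "compute-bound" if intensity >= ridge else "memory-bound"


if __name__ == "__main__":
    n = 4096
    # FP16 matmul: three n x n matrices at 2 bytes per element (ideal traffic).
    print(classify_bound(2 * n ** 3, 3 * n * n * 2, 256.0, 1000.0))
    # FP16 elementwise add on 1M elements: 1 FLOP per element, 3 arrays touched.
    print(classify_bound(1e6, 3 * 1e6 * 2, 256.0, 1000.0))
```

A low MFU on an operator classified as memory-bound usually points at data movement rather than compute scheduling; communication-bound and operator-fallback cases need profiling data that this heuristic does not use.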
Parameter Confirmation
| Parameter | Description | Required |
|---|---|---|
| operator | Operator type (matmul/flash_attention/gemm, etc.) | Yes |
| flops | Theoretical FLOPs of the operator | Yes |
| time_ms | Operator execution time (milliseconds) | Yes |
| peak_tflops | Hardware peak computing power (TFLOPS) | Yes |
| device | NPU device type (910B/910, etc.) | No |
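The parameter table above can be mirrored by a small validation step before any calculation runs. The sketch below is an assumption about how such a check might look; the `MfuRequest` class is hypothetical and only the field names come from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MfuRequest:
    """Inputs from the parameter table; `device` is the only optional field."""
    operator: str
    flops: float
    time_ms: float
    peak_tflops: float
    device: Optional[str] = None

    def __post_init__(self) -> None:
        if not isinstance(self.operator, str) or not self.operator.strip():
            raise ValueError("operator must be a non-empty string")
        # The three numeric inputs must be positive, otherwise MFU is undefined.
        for name in ("flops", "time_ms", "peak_tflops"):
            value = getattr(self, name)
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"{name} must be a number")
            if value <= 0:
                raise ValueError(f"{name} must be positive")


if __name__ == "__main__":
    request = MfuRequest(operator="matmul", flops=2 * 4096 ** 3,
                         time_ms=1.0, peak_tflops=256.0, device="910B")
    print(request)
```

Rejecting zero or negative `time_ms` early avoids a division by zero when achieved throughput is computed.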
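Installation
- Make sure OpenClaw is installed (local or Docker deployment)
- Enter the install command in the chat box: /install huawei-cloud-ascend-op-mfu-calculator
- Once installation completes, invoke the Skill by name or trigger it with /huawei-cloud-ascend-op-mfu-calculator
- Provide the required inputs as described in the Skill's parameter table to get structured output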
What is huawei-cloud-ascend-op-mfu-calculator?
Calculate MFU (Machine FLOP Utilization) for operators like matmul/GEMM/FlashAttention on Ascend NPU, providing clear formulas and derivation process. Use thi... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 38 cumulative downloads to date.
How do I install huawei-cloud-ascend-op-mfu-calculator?
Run the command "/install huawei-cloud-ascend-op-mfu-calculator" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.
Is huawei-cloud-ascend-op-mfu-calculator free?
Yes. huawei-cloud-ascend-op-mfu-calculator is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.
Which platforms does huawei-cloud-ascend-op-mfu-calculator support?
huawei-cloud-ascend-op-mfu-calculator is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed huawei-cloud-ascend-op-mfu-calculator?
It is developed and maintained by huaweicloud-skills-team (@huaweiclouddev); the current version is v0.0.2.