huaweiclouddev

huawei-cloud-ascend-op-mfu-calculator

Author: huaweicloud-skills-team · GitHub · v0.0.2 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 38 · Favorites: 0 · Current installs: 0 · Versions: 2
Install in OpenClaw
/install huawei-cloud-ascend-op-mfu-calculator
Description
Calculate MFU (Machine FLOP Utilization) for operators like matmul/GEMM/FlashAttention on Ascend NPU, providing clear formulas and derivation process. Use thi...
Usage instructions (SKILL.md)

Huawei Cloud Ascend Operator MFU Calculator

Overview

This skill calculates MFU (Machine FLOP Utilization) for operators such as matmul, GEMM, and FlashAttention on Ascend NPUs, and shows the formulas and derivation behind each result.

Architecture: Input Validation → FLOPs Calculation → Achieved TFLOPs/s → MFU Calculation → Result Analysis

Related Skills:

  • huawei-cloud-ascend-profiler-db-explorer - Profiling data analysis for operator performance data

Prerequisites

  1. Python 3.8+ installed
  2. Basic understanding of FLOPs calculation concepts

Usage Scenarios

Typical Problem Scenarios:

  • Evaluating how well an operator utilizes Ascend NPU compute power
  • Comparing performance of different operator implementations
  • Identifying optimization opportunities for matrix operations

Typical User Utterances:

  • "Calculate MFU for my GEMM operator"
  • "What's the machine FLOP utilization for FlashAttention?"
  • "Analyze my matmul operator performance efficiency"

Workflow

  1. Input Collection: Gather operator parameters (matrix dimensions, data types, execution time)
  2. FLOPs Calculation: Compute theoretical FLOPs for the operation
  3. Achieved Performance: Calculate achieved TFLOPs/s from execution time
  4. MFU Calculation: Apply the formula MFU = (Achieved TFLOPs/s / Peak TFLOPs/s) × 100%
  5. Result Analysis: Provide interpretation and optimization suggestions
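
Step 2 of the workflow can be sketched as follows. This is a minimal sketch, assuming the common counting conventions of 2·M·N·K FLOPs for an (M×K)·(K×N) matmul (one multiply and one add per term) and 4·B·H·S²·D FLOPs for an attention forward pass (the QKᵀ and PV matmuls only; softmax, scaling, and masking overhead are ignored). The function names are hypothetical and are not part of the skill.

```python
def gemm_flops(m: int, n: int, k: int, batch: int = 1) -> int:
    """FLOPs for a (batched) M x K by K x N matrix multiply."""
    # Each of the M*N outputs needs K multiplies and K adds.
    return 2 * batch * m * n * k


def attention_forward_flops(batch: int, heads: int, seq_len: int,
                            head_dim: int, causal: bool = False) -> int:
    """Approximate forward-pass FLOPs for self-attention (matmuls only)."""
    # Q @ K^T and P @ V each cost 2 * S * S * D per head.
    flops = 4 * batch * heads * seq_len * seq_len * head_dim
    # Assumption: a causal mask skips roughly half of the score matrix.
    return flops // 2 if causal else flops


print(gemm_flops(4096, 4096, 4096))               # 137438953472
print(attention_forward_flops(1, 32, 4096, 128))  # 274877906944
```

Whichever convention is used, it should match the one behind the hardware peak figure, otherwise the resulting MFU is not comparable across operators.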

MFU Calculation Formula

MFU = (Achieved TFLOPs/s / Peak TFLOPs/s) × 100%

Where:

  • Achieved TFLOPs/s = Operation FLOPs / Execution Time (in seconds) / 10^12
  • Peak TFLOPs/s = Hardware-specific peak throughput for the data type in use (e.g., Ascend 910B: 256 TFLOPs/s for FP16)
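
As a worked example of the formula, the sketch below computes MFU for a 4096×4096×4096 FP16 matmul. The 1.0 ms execution time is an illustrative value, not a measurement, and the 256 peak figure is the one quoted above; both function names are hypothetical.

```python
def achieved_tflops(flops: float, time_ms: float) -> float:
    """Convert an operation's FLOP count and runtime into TFLOPs/s."""
    if time_ms <= 0:
        raise ValueError("time_ms must be positive")
    seconds = time_ms / 1e3
    return flops / seconds / 1e12


def mfu_percent(flops: float, time_ms: float, peak_tflops: float) -> float:
    """MFU as a percentage of the hardware peak."""
    if peak_tflops <= 0:
        raise ValueError("peak_tflops must be positive")
    return achieved_tflops(flops, time_ms) / peak_tflops * 100.0


flops = 2 * 4096 ** 3   # matmul FLOPs, assuming 2*M*N*K
time_ms = 1.0           # illustrative execution time
peak_tflops = 256.0     # peak figure quoted in the text above

print(f"{achieved_tflops(flops, time_ms):.2f} TFLOPs/s")    # 137.44 TFLOPs/s
print(f"{mfu_percent(flops, time_ms, peak_tflops):.2f} %")  # 53.69 %
```

An MFU above 100% usually signals a mismatch between inputs, for example a time given in microseconds or a peak figure for a different data type.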

Reference Documents

| Document | Description |
| --- | --- |
| Ascend 910B Series Technical Specifications | Official Ascend 910B series product specifications |
| MFU Calculation Methodology | Detailed MFU calculation formulas and examples |
| FlashAttention Technical Paper | Original FlashAttention research paper |

Enhanced Features

Intelligent Bottleneck Diagnoser

  • AI-powered bottleneck diagnosis that analyzes profiling data to identify root causes automatically
  • Classifies bottlenecks into categories: memory-bound, compute-bound, communication-bound, or operator-fallback
  • Provides actionable optimization recommendations with priority ranking
  • Includes pattern matching for known performance anti-patterns
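
The skill's own diagnosis logic is not shown on this page. As one illustration of how the memory-bound versus compute-bound split can be made, the sketch below uses a simple roofline comparison, which is a stand-in technique rather than the skill's method. The function name, the `bytes_moved` input, and the 1000 GB/s bandwidth are all assumptions chosen for illustration, and the communication-bound and operator-fallback categories are not covered.

```python
def classify_bottleneck(flops: float, bytes_moved: float,
                        peak_tflops: float, peak_bandwidth_gbs: float) -> str:
    """Roofline-style split between compute-bound and memory-bound."""
    # Arithmetic intensity: FLOPs performed per byte of memory traffic.
    intensity = flops / bytes_moved
    # Ridge point: the intensity at which compute and bandwidth limits meet.
    ridge = (peak_tflops * 1e12) / (peak_bandwidth_gbs * 1e9)
    return "compute-bound" if intensity >= ridge else "memory-bound"


PEAK_TFLOPS = 256.0        # peak figure quoted in this document
PEAK_BANDWIDTH_GBS = 1000  # placeholder bandwidth, NOT a real specification

# FP16 4096^3 matmul: three 4096 x 4096 matrices at 2 bytes per element.
n = 4096
print(classify_bottleneck(2 * n ** 3, 3 * n * n * 2,
                          PEAK_TFLOPS, PEAK_BANDWIDTH_GBS))

# FP16 element-wise add: 1 FLOP per element, 3 elements of traffic.
elems = 1 << 20
print(classify_bottleneck(elems, 3 * elems * 2,
                          PEAK_TFLOPS, PEAK_BANDWIDTH_GBS))
```

With these placeholder numbers the matmul lands on the compute side of the ridge and the element-wise add on the memory side; real classification would need measured memory traffic from profiling data.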

Parameter Confirmation

| Parameter | Description | Required |
| --- | --- | --- |
| operator | Operator type (matmul/flash_attention/gemm, etc.) | Yes |
| flops | Theoretical FLOPs of the operator | Yes |
| time_ms | Operator execution time (milliseconds) | Yes |
| peak_tflops | Hardware peak computing power (TFLOPS) | Yes |
| device | NPU device type (910B/910, etc.) | No |
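
The input validation stage could look like the sketch below, which checks the parameters listed above before any calculation runs. The field names come from the parameter list; the `MfuRequest` class and its `validate` method are hypothetical and are not an interface defined by the skill.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MfuRequest:
    operator: str                 # e.g. "matmul", "gemm", "flash_attention"
    flops: float                  # theoretical FLOPs of the operator
    time_ms: float                # execution time in milliseconds
    peak_tflops: float            # hardware peak in TFLOPS
    device: Optional[str] = None  # optional, e.g. "910B"

    def validate(self) -> None:
        """Raise ValueError if a required parameter is missing or invalid."""
        if not self.operator:
            raise ValueError("operator is required")
        for name in ("flops", "time_ms", "peak_tflops"):
            value = getattr(self, name)
            if value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")


request = MfuRequest("matmul", flops=2 * 4096 ** 3, time_ms=1.0,
                     peak_tflops=256.0, device="910B")
request.validate()  # passes silently for well-formed input
```

Rejecting non-positive values up front avoids division by zero and negative utilization figures in the later steps.
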
Safe usage recommendations
Install this if you need Ascend MFU calculation guidance. Treat hardware peak numbers and formulas as analysis aids to verify against current official documentation, and only provide profiler CSV files you intend the assistant to analyze.
Capability assessment
Purpose & Capability
The skill purpose, references, and examples all align around calculating MFU for matmul, GEMM, and FlashAttention performance analysis on Ascend NPUs.
Instruction Scope
The skill declares python3 as an allowed tool and includes examples that may read a user-provided profiling CSV, which is purpose-aligned but should be used only with intended performance data.
Install Mechanism
The artifact contains only markdown files and references; there are no executable scripts, package install hooks, dependencies, or setup commands.
Credentials
Requested capability is proportionate for arithmetic and CSV-based performance calculations; there is no request for credentials, account access, network calls, broad filesystem access, or data mutation.
Persistence & Privilege
No persistence, background workers, privilege escalation, credential/session handling, or long-running execution is present in the artifacts.
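How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install huawei-cloud-ascend-op-mfu-calculator
  3. Once installed, invoke the Skill by name or trigger it with /huawei-cloud-ascend-op-mfu-calculator
  4. Provide the inputs listed in the Skill's parameter description to get structured output
Version history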
v0.0.2
No file changes or updates were made compared to the previous version.
v0.0.1
  • Initial release of the Huawei Cloud Ascend Operator MFU Calculator skill.
  • Calculates MFU (Machine FLOP Utilization) for operators such as MatMul, GEMM, and FlashAttention on Ascend NPUs.
  • Provides step-by-step calculation: input parameters → FLOPs calculation → achieved performance → MFU computation → result analysis.
  • Includes intelligent bottleneck diagnosis with optimization suggestions and pattern recognition.
  • Offers clear usage scenarios, formula documentation, and hardware-specific references for Ascend devices.
Metadata
Slug huawei-cloud-ascend-op-mfu-calculator
Version 0.0.2
License MIT-0
Total installs 0
Current installs 0
Versions 2
FAQ

What is huawei-cloud-ascend-op-mfu-calculator?

Calculate MFU (Machine FLOP Utilization) for operators like matmul/GEMM/FlashAttention on Ascend NPU, providing clear formulas and derivation process. Use thi... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 38 total downloads to date.

How do I install huawei-cloud-ascend-op-mfu-calculator?

Run the command "/install huawei-cloud-ascend-op-mfu-calculator" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is huawei-cloud-ascend-op-mfu-calculator free?

Yes. huawei-cloud-ascend-op-mfu-calculator is completely free and released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does huawei-cloud-ascend-op-mfu-calculator support?

huawei-cloud-ascend-op-mfu-calculator is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops huawei-cloud-ascend-op-mfu-calculator?

It is developed and maintained by huaweicloud-skills-team (@huaweiclouddev); the current version is v0.0.2.
