jackdark425

Data Quality Audit

by jackdark · GitHub ↗ · v0.8.2 · MIT-0
cross-platform ✓ Security Clean
Downloads: 134 · Stars: 0 · Active Installs: 0 · Versions: 1
Install in OpenClaw
/install data-quality-audit
Description
Independent cross-source audit of a completed CN banker deliverable. Use when the user asks to audit / double-check / 双核实 / 交叉验证 / 数据质量检查 / 审计 an existing de...
README (SKILL.md)

Data Quality Audit Skill

Independently cross-check the data quality of a CN banker deliverable: find the numbers that look plausible but are actually wrong.

Why this exists

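The three existing gates (verify_intelligence / cn_typo_scan / provenance_verify) guarantee that every hard number has a source, that no typo escapes, and that the prose matches the provenance table.

They do not guarantee that the numbers themselves are correct. Specific gaps:

  • The agent mis-copied a Tushare return value (shifted decimal point / wrong unit conversion)
  • data-provenance.md states "source: CNINFO (巨潮) 2024 annual report" but the report was never actually fetched
  • A single-source number was never cross-checked (Tushare and Eastmoney (东方财富) differ by more than 5% for the same period)
  • Internal consistency is violated (the text says "gross margin 98%" and also "revenue RMB 10 billion / gross profit RMB 8 billion", which implies an actual gross margin of 80%, a self-contradiction)
  • Market cap ≠ share price × share count (stale share-count data)

This skill performs an independent audit: it walks the data sources a second time, reconciles the numbers, applies common-sense checks, and produces an audit report. It is comparable to an internal auditor's second review of completed accounts.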

When to trigger

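The user says:

  • "audit ~/deliverables/xxx"
  • "双核实一下 xxx 这份报告" (double-check the xxx report)
  • "对 xxx 做数据质量检查" (run a data quality check on xxx)
  • "交叉验证 xxx 的数字" (cross-validate the numbers in xxx)
  • "给 xxx 跑审计" (run an audit on xxx)

Or, at the end of the banker workflow, the user requires "audit before delivery".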

Workflow

Step 1 — Parse data-provenance.md

Read <deliverable>/data-provenance.md (standard schema: one row per entry, "metric | value | unit | period | Tier | source | URL/tool | fetch time | cross-validation status"). A helper is available:

python3 ~/.openclaw/extensions/aigroup-lead-discovery-openclaw/skills/data-quality-audit/scripts/quality-audit.py \
    --parse-only \
    ~/deliverables/<company>/data-provenance.md
# stdout: JSON array of {metric, value, unit, period, tier, source, url_tool, fetched_at, status}

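If the provenance file does not follow the standard format, fall back to a regex scan of analysis.md for hard numbers, pairing each with the first source mentioned for it.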
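The parsing step can be pictured with a minimal sketch. This is not the bundled quality-audit.py; it only assumes that the provenance file is a pipe-delimited table with the nine columns listed above, and it reuses the JSON keys shown in the helper's stdout comment. The function name `parse_provenance`, the header labels it skips, and the sample row are all hypothetical.

```python
import json

# Keys mirror the JSON shape documented for --parse-only.
FIELDS = ["metric", "value", "unit", "period", "tier",
          "source", "url_tool", "fetched_at", "status"]

# Assumed first-cell labels of the header row (Chinese or English).
HEADER_LABELS = {"指标", "metric"}


def parse_provenance(markdown: str) -> list:
    """Turn nine-cell pipe-delimited rows into dicts, skipping header and separator."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != len(FIELDS):
            continue  # not a row of the standard schema
        if all(c and set(c) <= set("-:") for c in cells):
            continue  # separator row such as |---|---|
        if cells[0].lower() in HEADER_LABELS:
            continue  # header row
        rows.append(dict(zip(FIELDS, cells)))
    return rows


if __name__ == "__main__":
    # Made-up sample row for illustration only.
    sample = (
        "| 指标 | 数值 | 单位 | 期间 | Tier | 源 | URL/工具 | 取数时间 | 交叉验证状态 |\n"
        "|---|---|---|---|---|---|---|---|---|\n"
        "| 营收 | 1088 | 亿元 | 2024Q3 | 1 | Tushare | example_tool | 2026-04-18 | pending |\n"
    )
    print(json.dumps(parse_provenance(sample), ensure_ascii=False, indent=2))
```

Rows that do not have exactly nine cells are silently ignored here; a real parser would more likely report them so the regex fallback can be triggered.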

Step 2 — Independent re-fetch per hard number

For each hard number, re-fetch it through an MCP that is different from the source declared in provenance:

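| Provenance claims source | Audit must call |
|--------------------------|-----------------|
| Tushare / aigroup-market-mcp | Eastmoney (东方财富) web_fetch / FMP (HK stocks) / CNINFO (巨潮) PDF |
| 巨潮 / CNINFO | Tushare / Eastmoney (东方财富) web_fetch |
| FMP / Finnhub | Tushare / yfinance web_fetch |
| 天眼查 / 企查查 | National credit disclosure site gsxt.gov.cn web_fetch / cross-check one against the other |
| Financial media (财新 / 21 世纪 / 财联社) | Locate the original official disclosure (exchange / CNINFO) |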

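Do not "verify" a number against the same source it was declared from. That is not verification; it is copying the value back.

When no second source can be obtained → mark FLAG (single-source-unverifiable). This does not count as FAIL.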
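The source-independence rule can be sketched as a lookup. The sketch below assumes the claimed source has already been normalized to a short lowercase label; every identifier in `AUDIT_SOURCES` is a hypothetical stand-in for the connectors named in the mapping above, not a real MCP tool name.

```python
from typing import Optional

# Claimed source -> candidate audit sources, transcribed from the mapping above.
# All labels are illustrative placeholders.
AUDIT_SOURCES = {
    "tushare": ["eastmoney_web_fetch", "fmp", "cninfo_pdf"],  # fmp: HK stocks
    "aigroup-market-mcp": ["eastmoney_web_fetch", "fmp", "cninfo_pdf"],
    "cninfo": ["tushare", "eastmoney_web_fetch"],
    "fmp": ["tushare", "yfinance_web_fetch"],
    "finnhub": ["tushare", "yfinance_web_fetch"],
    "tianyancha": ["gsxt_web_fetch", "qichacha"],
    "qichacha": ["gsxt_web_fetch", "tianyancha"],
    "financial_media": ["exchange_disclosure", "cninfo_pdf"],
}


def pick_audit_source(claimed: str, available: set) -> Optional[str]:
    """Return the first available source that differs from the claimed one."""
    for candidate in AUDIT_SOURCES.get(claimed, []):
        if candidate != claimed and candidate in available:
            return candidate
    return None


def audit_plan(claimed: str, available: set) -> str:
    """Describe the next action: re-fetch elsewhere, or flag as unverifiable."""
    source = pick_audit_source(claimed, available)
    if source is None:
        return "FLAG (single-source-unverifiable)"
    return f"re-fetch via {source}"


if __name__ == "__main__":
    print(audit_plan("tushare", {"tushare", "cninfo_pdf"}))
    print(audit_plan("cninfo", set()))
```

Because the claimed source never appears in its own candidate list, the lookup cannot return the same source it is meant to check.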

Step 3 — Diff & classify

Compare the audit-fetched value against the provenance value:

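| Diff | Classification |
|------|----------------|
| abs(diff) / value < 2% | PASS (exact) |
| abs(diff) / value ∈ [2%, 5%) | PASS (minor variance): differing reporting conventions between two sources are normal |
| abs(diff) / value ∈ [5%, 15%) | FLAG (material variance): needs manual review |
| abs(diff) / value ≥ 15% or sign reversal | FAIL (material conflict): the deliverable must not ship as-is |
| Second source cannot be obtained | FLAG (single-source-unverifiable) |
| Second source returns 404 / 403 / data does not exist | FLAG (second-source-unavailable) |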
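The thresholds above translate directly into a small classifier. This is a sketch, not the skill's own code: the function name `classify` is hypothetical, and the text does not say which value goes in the denominator, so the provenance value is assumed here. The handling of a zero provenance value is also an assumption.

```python
from typing import Optional


def classify(provenance_value: float,
             audit_value: Optional[float],
             unavailable: bool = False) -> str:
    """Map a provenance/audit pair to a verdict label from the diff table."""
    if unavailable:
        # Second source answered 404 / 403 / no data.
        return "FLAG (second-source-unavailable)"
    if audit_value is None:
        return "FLAG (single-source-unverifiable)"
    if provenance_value * audit_value < 0:
        return "FAIL (material conflict)"  # sign reversal
    if provenance_value == 0:
        # Relative diff is undefined; assumed handling, not from the source.
        return "PASS (exact)" if audit_value == 0 else "FLAG (material variance)"
    rel = abs(audit_value - provenance_value) / abs(provenance_value)
    if rel < 0.02:
        return "PASS (exact)"
    if rel < 0.05:
        return "PASS (minor variance)"
    if rel < 0.15:
        return "FLAG (material variance)"
    return "FAIL (material conflict)"


if __name__ == "__main__":
    print(classify(1088, 1086))   # revenue row from the sample report
    print(classify(100, 110))
    print(classify(100, -100))
    print(classify(30000, None))
```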

Step 4 — Common-sense sanity rules

Feed the parsed hard numbers to the rule set defined in scripts/common-sense-rules.yaml:

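  • gross_margin_range: company gross margin should fall within [-20%, 95%] (a negative gross margin or one above 95% needs manual confirmation)
  • revenue_growth_range: year-over-year revenue growth should fall within [-50%, +200%] (flag drastic business swings)
  • market_cap_price_shares: if the same deliverable states market cap / share price / share count, abs(market_cap − price × shares_outstanding) / market_cap < 3%
  • gross_profit_identity: if revenue / gross profit / gross margin are all stated, abs(gross_profit − revenue × gross_margin) / gross_profit < 3%
  • net_margin_not_above_gross: net margin must not exceed gross margin (a basic relationship)
  • ocf_net_income_direction: operating cash flow and net income should have the same sign (flag if one is positive and the other negative)
  • employee_market_cap_ratio: market cap / headcount should fall within the broad range [RMB 500,000, RMB 500 million] (otherwise the data is suspicious, but this only flags and never fails)
  • dividend_payout_bound: a payout ratio above 100% needs manual confirmation (a bonus issue or a capitalization of capital reserve may have been misread as a dividend)
  • roe_extreme: an ROE above 50% or below zero needs verification (leverage plus special events)
  • valuation_multiple_sanity: P/E ≤ 200 and EV/EBITDA ≤ 100
  • restatement_aware (NEW): for restatement-sensitive fields {EPS, net profit attributable to the parent (归母净利润), BPS, gross margin, net assets, revenue}, if the primary value differs from the second-source cross-check by > 10%, flag it as a possible mix of pre-restatement and post-restatement figures. The recommended first check is the exchange's final XBRL version of the latest restated annual report (not the earliest published original). The 海天味业 audit of 2026-04-19 is a real case caught by this rule: 2022 EPS of 1.34 元 (pre-restatement) vs 1.11 元 (post-restatement), a 20.7% difference.
  • roe_definition_check (NEW): ROE definition check. If the derivation field in provenance.md reads "净利/营收" / "NI/Revenue" while the metric name is ROE / 净资产收益率, the verdict is fail (that formula is net margin, not ROE). Correct formula: ROE = net income / average net assets. In the 五粮液 audit of 2026-04-18, the gap between an ROE of ~22% and the annual report's 25.06% came from this wrong definition.
  • price_basis_check (NEW): if the share price of the same company differs from the second source by > 3% → flag. Common causes: T+0 vs T-1 vs latest real-time quote, currency, adjusted vs unadjusted prices. In the 海天 audit of 2026-04-19, 37.68 元 vs 35.60 元 was exactly such a price-basis mismatch.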

Each rule violation produces its own FLAG or FAIL entry. The rule set currently contains 13 rules (10 original + 3 new: restatement_aware / roe_definition_check / price_basis_check).
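A few of these rules are simple enough to sketch in code. The functions below are illustrative only and do not reflect how common-sense-rules.yaml is actually structured; the function names and the True-means-rule-holds convention are assumptions. Margins are expressed as fractions, and market cap, price and share count must be in mutually consistent units. For restatement_aware, the second-source value is assumed as the denominator because that reproduces the 20.7% quoted for the 海天味业 case.

```python
def within_tolerance(stated: float, implied: float, tol: float = 0.03) -> bool:
    """True when the stated value is within tol (relative) of the implied one."""
    return abs(stated - implied) / abs(stated) < tol


def gross_profit_identity(revenue: float, gross_profit: float,
                          gross_margin: float) -> bool:
    return within_tolerance(gross_profit, revenue * gross_margin)


def market_cap_price_shares(market_cap: float, price: float,
                            shares_outstanding: float) -> bool:
    return within_tolerance(market_cap, price * shares_outstanding)


def net_margin_not_above_gross(net_margin: float, gross_margin: float) -> bool:
    return net_margin <= gross_margin


RESTATEMENT_SENSITIVE = {"EPS", "归母净利润", "BPS", "毛利率", "净资产", "营业收入"}


def restatement_suspect(field: str, primary: float, second: float) -> bool:
    """True when a sensitive field differs from the second source by > 10%."""
    return (field in RESTATEMENT_SENSITIVE
            and abs(primary - second) / abs(second) > 0.10)


def roe_definition_check(metric: str, derivation: str) -> bool:
    """False when a metric named ROE is derived as net income over revenue."""
    is_roe = metric.strip().lower() in {"roe", "净资产收益率"}
    formula = derivation.replace(" ", "").lower()
    return not (is_roe and formula in {"净利/营收", "ni/revenue"})


if __name__ == "__main__":
    # Sample-report figures: 1088 x 92% vs stated gross profit 980.
    print(gross_profit_identity(1088, 980, 0.92))
    # Contradiction from "Why this exists": margin 98% vs 80 / 100.
    print(gross_profit_identity(100, 80, 0.98))
    print(restatement_suspect("EPS", 1.34, 1.11))
    print(roe_definition_check("ROE", "NI/Revenue"))
```

With the sample-report figures the gross profit identity holds (a gap of about 2.1%), while the "gross margin 98%" contradiction from the motivation section fails it by 22.5%.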

Step 5 — Emit audit-report.md

Write the output to <deliverable>/audit-report.md, with this structure:

# Audit Report — <Company> <ticker>

**Audit date:** 2026-04-18
**Target deliverable:** /Users/jackdong/deliverables/<company>/
**Auditor:** aigroup-lead-discovery-openclaw/data-quality-audit@0.8.2

## Overall verdict

OVERALL PASS  (12/14 PASS, 2 FLAG, 0 FAIL)

## Per-number cross-check

| Metric | MD value | Independent source | Independent value | Diff | Verdict |
|--------|---------|--------------------|-------------------|------|---------|
| 2024Q3 revenue | 1,088 亿元 | Eastmoney (东方财富) Choice | 1,086 亿元 | -0.2% | PASS (exact) |
| 2024-04-17 market cap | 20,150 亿元 | Tushare stock_data | 20,180 亿元 | +0.15% | PASS (exact) |
| Headcount | 30,000 | CNINFO (巨潮) 2023 annual report | N/A | - | FLAG (second-source-unavailable) |
| ... |

## Common-sense rules

- ✅ gross_margin_range (88% ∈ [-20%, 95%])
- ✅ revenue_growth_range (+17% YoY ∈ [-50%, +200%])
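- ✅ gross_profit_identity: stated revenue 1088 × gross margin 92% = 1001, but gross profit in the MD is 980 → 2.1% difference (within the 3% tolerance, PASS)
- ⚠️ employee_market_cap_ratio: 20150/3 = 6717 万 (about RMB 67 million) per employee (typical for premium baijiu, FLAG rather than FAIL)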

## Action items

- [ ] Re-fetch headcount: try the national credit disclosure site gsxt.gov.cn
- [ ] Add a source for the 2024 Q1 gross margin (currently single-source)

## Raw audit data

JSON dump at `<deliverable>/audit-raw.json`.

Step 6 — Return verdict to user

The agent's final message must include:

  • the overall verdict (OVERALL PASS / OVERALL FLAG: N items / OVERALL FAIL: M items)
  • the path to audit-report.md
  • the path to audit-raw.json
  • the top three action items (if any)
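A sketch of assembling that final message follows. The function names and the "Report:" / "Raw data:" labels are hypothetical; only the count format is taken from the sample report ("12/14 PASS, 2 FLAG, 0 FAIL"). The text does not spell out how individual FLAGs roll up into the overall label, so the caller passes that label in rather than having it derived here.

```python
from collections import Counter


def tally(verdicts: list) -> str:
    """Summarize verdict labels such as 'PASS (exact)' into a count string."""
    counts = Counter(v.split()[0] for v in verdicts)
    return (f"{counts['PASS']}/{len(verdicts)} PASS, "
            f"{counts['FLAG']} FLAG, {counts['FAIL']} FAIL")


def final_message(overall: str, verdicts: list, report_path: str,
                  raw_path: str, action_items: list) -> str:
    """Build the message: verdict line, two paths, and at most three actions."""
    lines = [
        f"{overall}  ({tally(verdicts)})",
        f"Report: {report_path}",
        f"Raw data: {raw_path}",
    ]
    lines += [f"- [ ] {item}" for item in action_items[:3]]
    return "\n".join(lines)


if __name__ == "__main__":
    verdicts = ["PASS (exact)"] * 12 + ["FLAG (second-source-unavailable)"] * 2
    print(final_message(
        "OVERALL PASS", verdicts,
        "<deliverable>/audit-report.md", "<deliverable>/audit-raw.json",
        ["Re-fetch headcount", "Add a second source for gross margin"],
    ))
```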

What this skill does NOT do

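  • It does not re-run the banker analysis workflow (that is the job of datapack-builder / dcf-model)
  • It does not re-run verify_intelligence / cn_typo_scan / provenance_verify (that is the job of validate-delivery.py)
  • It does not modify the deliverable itself: the audit only produces a report, and a human decides whether to send the deliverable back for rework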

Pair with validate-delivery.py

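Recommended order:

  1. validate-delivery.py: the fast gate (exit 0 is required before delivery is even considered)
  2. data-quality-audit (this skill): the deep audit (OVERALL PASS is required before client-ship)

Passing the gate does not imply passing the audit, and vice versa. Both layers are necessary.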

Usage Guidance
This skill appears to do what it says: parse provenance tables, re-fetch the same metrics from independent market sources, apply sanity rules, and write an audit report. Before running it:
  1. Ensure the agent/environment has the appropriate MCP connectors and API keys (Tushare, FMP, web_fetch, etc.) and that those credentials are minimal-scope and trusted.
  2. Only run the audit on deliverable directories you expect it to read; it will parse and write files under that path (audit-report.md, audit-raw.json).
  3. Expect network calls to official market/data endpoints; verify your platform's outbound rules if you need to restrict traffic.
  4. Review the produced audit-report.md before sharing externally (the report contains fetched financial data).
If you need stricter guarantees, inspect/limit which MCPs the agent may use or run the audit in an isolated environment.
Capability Analysis
Type: OpenClaw Skill
Name: data-quality-audit
Version: 0.8.2
The skill bundle is a specialized tool for auditing financial data quality by cross-referencing report values against independent sources (e.g., Tushare, CNINFO). The Python script `quality-audit.py` is a standard markdown table parser, and `common-sense-rules.yaml` contains domain-specific financial validation logic. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found; the instructions in `SKILL.md` are strictly aligned with the stated purpose of financial auditing.
Capability Assessment
Purpose & Capability
The name/description match the provided SKILL.md, YAML rules, and the parser script: the skill parses a deliverable's data-provenance.md, re-fetches numbers from independent market connectors, applies sanity rules, and emits an audit report. It does not request unrelated environment variables or binaries. The only external dependency implied is access to the platform's MCP/web-fetch tools (expected for cross-source verification).
Instruction Scope
SKILL.md explicitly instructs the agent to read <deliverable>/data-provenance.md, independently fetch values from other market sources, apply the rule set, write audit-report.md and audit-raw.json, and return a verdict. Those file reads/writes are coherent with the stated audit purpose. There are no instructions to read unrelated system files, to exfiltrate deliverable contents to unknown endpoints, or to access environment variables beyond what MCP fetchers may require.
Install Mechanism
No install spec; this is instruction-only with one small helper script (a deterministic markdown parser). Nothing is downloaded or written to system paths by an installer. Risk from install mechanism is minimal.
Credentials
The skill declares no required env vars or credentials, which is proportionate. Note: real cross-source fetches (Tushare, FMP, vendor MCPs) typically require API keys or access configured on the agent platform; the SKILL.md assumes those MCP tools and creds exist but does not request them. This is reasonable but means the agent executing the skill must already have the necessary credentials (scope them appropriately).
Persistence & Privilege
always:false and default model-invocation settings are normal. The skill does not request permanent presence or modify other skills. It reads and writes files inside the target deliverable directory only, which matches its purpose.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install data-quality-audit
  3. After installation, invoke the skill by name or use /data-quality-audit
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v0.8.2
- Added new rule checks: restatement_aware, roe_definition_check, and price_basis_check (total rule set increased to 13).
- Enhanced common-sense sanity checks to flag or fail suspect data based on updated rules.
- Improved cross-source auditing workflow: enforces use of independent data sources for each number, flags unverifiable or unavailable second sources.
- Audit report now provides clearer verdicts (PASS / FLAG / FAIL) and includes actionable next steps.
- Documentation thoroughly explains audit logic, typical failure scenarios, and integration with other data verification steps.
Metadata
Slug data-quality-audit
Version 0.8.2
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Data Quality Audit?

Independent cross-source audit of a completed CN banker deliverable. Use when the user asks to audit / double-check / 双核实 / 交叉验证 / 数据质量检查 / 审计 an existing de... It is an AI Agent Skill for Claude Code / OpenClaw, with 134 downloads so far.

How do I install Data Quality Audit?

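Run "/install data-quality-audit" in the OpenClaw or Claude Code chat to install it in one step. Installation itself needs no extra setup, although running an audit requires the MCP connectors and credentials described under Usage Guidance.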

Is Data Quality Audit free?

Yes, Data Quality Audit is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Data Quality Audit support?

Data Quality Audit is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Data Quality Audit?

It is built and maintained by jackdark (@jackdark425); the current version is v0.8.2.
