← Back to Skills Marketplace
liangzaiz666

Data Classification

by liangzai · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
48
Downloads
0
Stars
0
Active Installs
1
Versions
Install in OpenClaw
/install data-classification
Description
对单一数据字段名或数据库表 SQL文件进行数据分类分级,支持GB/T 43697-2024 通用数据分类分级、JR/T 0197-2020 金融数据分类分级。
README (SKILL.md)

Data Classification

Purpose

Classify user-provided field names or SQL DDL into:

  1. 普通/通用数据分类分级: GB/T 43697-2024 style category + level (一般数据 / 重要数据 / 核心数据).
  2. 金融数据双标签体系: general label + JR/T 0197-2020 financial label (一级/二级/三级/四级子类 + 最低安全级别1-5).

This skill produces classification suggestions, not final regulatory determinations. Mark uncertain items for business-owner review.

Quick workflow

  1. Identify input type:
    • Single field name: classify directly.
    • SQL/DDL file: extract table names, column names, types, and comments.
  2. Run the helper when useful:
    python3 skills/data-classification/scripts/classify_data.py --field "customer_id" --mode finance
    python3 skills/data-classification/scripts/classify_data.py --sql path/to/schema.sql --mode finance --format markdown
    
  3. Review financial rows against JR/T 0197-2020 Appendix A before falling back to heuristics:
    • references/jrt0197-appendix-a-full.csv is the machine-readable full Appendix A table.
    • references/jrt0197-appendix-a-compact.md is the human-readable compact Appendix A table.
    • references/financial-dual-label.md contains dual-label workflow and fallback heuristics.
    • references/general-rules.md contains GB/T 43697-2024 logic.
  4. Return a field-level result that covers every input field. Do not replace the full field list with a summary.
  5. Choose output delivery by field count internally, but do not explain this threshold policy to the user:
    • ≤20 fields: output the complete field-level table inline in chat; do not create/attach files unless the user explicitly asks for an export/file.
    • >20 fields: save the complete field-level result as a CSV file, attach it with a MEDIA: line using the CSV file's absolute filesystem path, show the first 20 classified fields inline, and include a coverage statement in the message. Do not inline rows after the first 20. Never provide only a plain local path as the download method. Do not write the coverage statement into the CSV file itself.
  6. Run a coverage check before finalizing: compare parsed/input field count with classified output row count. If any field is missing, fix the output or explicitly mark the field as [blocked: 未解析/缺少字段信息].
  7. Ask for business context only if the field name/comment is too ambiguous.

Output requirements

For a single field, include:

  • 字段名
  • 通用分类:行业领域、描述对象/数据主体、内容类别
  • 通用分级:一般/重要/核心 + 理由
  • 置信度与需确认点
  • 金融标签(仅金融场景输出):推荐的一级/二级/三级/四级子类 + 最低安全级别
  • 候选金融标签(仅金融场景输出):当字段可落入多个 JR/T 分类时,一并列出候选项并说明推荐依据
  • 双标签结果(仅金融场景输出):通用标签 + 金融标签

For a SQL file/table, classify all parsed columns from all tables. Choose the delivery format internally and do not tell the user the threshold/routing rule.

  • ≤20 fields: the complete field-level table inline. Do not create/attach files unless the user explicitly asks for an export/file.
  • >20 fields: create a complete CSV result file. Return a concise completion note, attach the CSV with MEDIA:\x3Cabsolute-csv-path> on its own line so the UI can render a downloadable link, include the first 20 field-level rows inline, and include the coverage statement in the message. Do not inline rows after the first 20. Do not rely on a bare local path as the user's download link. Do not include the coverage statement as a row in the CSV file.

Do not provide only a subset such as “core fields”, “sample rows”, or “summary table” unless the user explicitly asks for a summary.

The following output columns are mandatory for every field and must be non-empty in all scenarios:

  1. 字段名
  2. 通用分类
  3. 通用分级
  4. 置信度

For financial data/scenarios only, also include these mandatory non-empty columns:

  1. 推荐金融分类标签
  2. JR/T最低级别
  3. 候选金融标签

For financial fields, match against references/jrt0197-appendix-a-full.csv or references/jrt0197-appendix-a-compact.md first. Use financial-dual-label.md heuristics only when Appendix A has no clear match or when field/table context creates multiple reasonable candidates.

For non-financial data, do not output 推荐金融分类标签JR/T最低级别候选金融标签.

Recommended non-financial table shape:

表名 字段名 类型/注释 通用分类 通用分级 置信度 依据/备注

Recommended financial table shape:

表名 字段名 类型/注释 通用分类 通用分级 推荐金融分类标签 JR/T最低级别 候选金融标签 双标签 置信度 依据/备注

After the table, include a coverage line:

覆盖校验:输入/解析字段 N 个,已分类 N 个,遗漏 0 个。

If output is saved to a file, still include the coverage line in the message and an attachment. For CSV outputs, include MEDIA:\x3Cabsolute-csv-path> on its own line so the user can click/download directly; use the absolute path returned by the file-writing step, not a relative workspace path. File output is allowed for >20 fields as CSV, or whenever the user explicitly requests a file/export. Do not write the coverage line into the CSV file. Do not explain that files are chosen because of the field-count threshold unless the user asks why.

Classification principles

  • Coverage is mandatory: every user-provided field/parsed SQL column must receive a classification row.
  • Use 就高从严: if multiple rules match, choose the stricter level as the recommendation, list reasonable candidate labels, and explain why.
  • Treat field names alone as weak evidence; comments and table names improve confidence.
  • Do not infer 核心数据 from a field name alone unless the field clearly describes large-scale national/security/critical-infrastructure data. Usually mark as 需人工确认.
  • 重要数据 usually requires scale, coverage, precision, or public/national impact context. For isolated personal or organization fields, default to 一般数据 unless a law/industry rule says otherwise.
  • For financial data, Appendix A match takes precedence over broad keyword heuristics. JR/T 0197 level is the minimum security level; business context may raise it.
  • For personal financial information, authentication credentials, biometric identifiers, account/payment/transaction data, and credit data should be handled conservatively.

Helper script notes

classify_data.py is deterministic and heuristic. It is designed for first-pass tagging:

  • Inputs: --field, --fields, or --sql.
  • Modes: general, finance.
  • Formats: markdown, json, csv.
  • It parses common CREATE TABLE DDL and column comments.
  • Low confidence means the assistant should inspect context and possibly ask one focused follow-up.

References

  • references/general-rules.md: compact GB/T 43697-2024 classification/grading rules.
  • references/financial-dual-label.md: financial dual-label workflow and fallback heuristics.
  • references/jrt0197-appendix-a-compact.md: compact human-readable JR/T 0197-2020 Appendix A typical data grading table.
  • references/jrt0197-appendix-a-full.csv: full machine-readable JR/T 0197-2020 Appendix A typical data grading table.
Usage Guidance
This skill looks reasonable for classifying field names and SQL/DDL schemas. Before using it, be comfortable with a local Python helper reading the schema you provide and with CSV files being generated for larger outputs. Avoid supplying real data rows or secrets when schema metadata is enough, and delete generated CSVs if they contain sensitive schema details.
Capability Analysis
Type: OpenClaw Skill Name: data-classification Version: 1.0.0 The skill bundle is a legitimate tool designed for data classification and grading according to Chinese national and financial standards (GB/T 43697-2024 and JR/T 0197-2020). The core logic in `scripts/classify_data.py` uses deterministic regex matching to analyze SQL DDL or field names, and the `SKILL.md` provides clear instructions for the agent to format results or generate CSV files for large datasets. No evidence of data exfiltration, malicious execution, or harmful prompt injection was found.
Capability Tags
cryptocan-make-purchases
Capability Assessment
Purpose & Capability
The artifacts consistently describe classifying user-provided field names or SQL/DDL using bundled reference rules; no artifact-backed purchase execution, account mutation, credential use, or network transfer is shown.
Instruction Scope
The instructions tell the agent to use an internal field-count threshold and not proactively explain that routing rule, which is not harmful by itself but is less transparent to users.
Install Mechanism
There is no install spec or remote installer; the documented helper is a bundled local Python script.
Credentials
For larger inputs, the skill directs the agent to write a CSV and attach it with an absolute path; this is aligned with bulk classification output but creates a local file containing schema/classification metadata.
Persistence & Privilege
The only persistence evidenced is generated CSV output; there is no evidence of background services, credential storage, elevated privileges, or long-running agents.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install data-classification
  3. After installation, invoke the skill by name or use /data-classification
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release: supports general data classification/grading and financial dual-label classification.
Metadata
Slug data-classification
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Data Classification?

对单一数据字段名或数据库表 SQL文件进行数据分类分级,支持GB/T 43697-2024 通用数据分类分级、JR/T 0197-2020 金融数据分类分级。 It is an AI Agent Skill for Claude Code / OpenClaw, with 48 downloads so far.

How do I install Data Classification?

Run "/install data-classification" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Data Classification free?

Yes, Data Classification is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Data Classification support?

Data Classification is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Data Classification?

It is built and maintained by liangzai (@liangzaiz666); the current version is v1.0.0.

💬 Comments