Description

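Collection and analysis of fiscal revenue and expenditure data from China's Ministry of Finance. Use this skill when the user mentions either of the following scenarios: (1) scraping fiscal data, with trigger phrases 抓取财政数据 (scrape fiscal data), 采集财政数据 (collect fiscal data), 最新财政数据 (latest fiscal data), 财政数据采集 (fiscal data collection); (2) analyzing fiscal data, with trigger phrases 分析财政数据 (analyze fiscal data), 分析财政赤字 (analyze the fiscal deficit), 研究财政收入 (study fiscal revenue), 对比财政收支 (compare fiscal revenue and expenditure). It runs the fiscal data collection pipeline against the Ministry of Finance website and, for the collected...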

README (SKILL.md)

Fiscal Data Collection and Analysis Skill

Name: 财政数据采集分析 (Fiscal Data Collection and Analysis)
Author: cy7533

Environment setup

This skill depends on the conda environment scrapyEnv. Before running it, make sure the environment is installed:

conda activate scrapyEnv

If the environment does not exist, create it from the project's environment.yml:

conda env create -f $SKILL_DIR/FinancialDataCollection/environment.yml

Entry script

python3 $SKILL_DIR/FinancialDataCollection/scripts/run_pipeline.py

Optional arguments:

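--start-month YYYY-MM: first month to collect
--end-month YYYY-MM: last month to collect
--output-dir DIR: output directory (default: output; outputs should go to the unified path described below, i.e. pass $WORKSPACE/output as in the examples)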

Examples:

# Collect all historical data
python3 $SKILL_DIR/FinancialDataCollection/scripts/run_pipeline.py --output-dir $WORKSPACE/output

# Collect 2024 data only
python3 $SKILL_DIR/FinancialDataCollection/scripts/run_pipeline.py --start-month 2024-01 --end-month 2024-12 --output-dir $WORKSPACE/output
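When the pipeline is launched from Python instead of a shell, the command lines above can be assembled as an argument list. The sketch below is hypothetical: build_pipeline_cmd is not part of the skill, it uses only the three documented flags, and it assumes $SKILL_DIR and $WORKSPACE are set in the environment (os.path.expandvars leaves unset variables unchanged).

```python
import os
import re

# Zero-padded YYYY-MM with a valid month (01-12).
YYYY_MM = re.compile(r"^\d{4}-(0[1-9]|1[0-2])$")


def build_pipeline_cmd(start_month=None, end_month=None, output_dir="$WORKSPACE/output"):
    """Hypothetical helper: assemble the run_pipeline.py command from the documented flags only."""
    script = os.path.expandvars("$SKILL_DIR/FinancialDataCollection/scripts/run_pipeline.py")
    cmd = ["python3", script]
    for flag, value in (("--start-month", start_month), ("--end-month", end_month)):
        if value is None:
            continue  # omitting both flags collects all historical data, as in the first example
        if not YYYY_MM.match(value):
            raise ValueError(f"{flag} expects YYYY-MM, got {value!r}")
        cmd += [flag, value]
    # Zero-padded YYYY-MM strings sort chronologically, so a string comparison is enough.
    if start_month and end_month and start_month > end_month:
        raise ValueError(f"start month {start_month} is after end month {end_month}")
    cmd += ["--output-dir", os.path.expandvars(output_dir)]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) from inside the activated scrapyEnv environment; passing a list rather than a string avoids shell quoting problems with paths.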

Path notes:

$SKILL_DIR: the skill's own directory (~/.openclaw/skills/financial-data-collection/); the project code lives in $SKILL_DIR/FinancialDataCollection/

$WORKSPACE: the agent workspace root (~/.openclaw/workspace/); all collected output data goes under $WORKSPACE/output/

To migrate to another machine, copy the entire skill directory; no paths need to be changed.

Output structure

After a run, outputs are organized by period under the unified output path:

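$WORKSPACE/output/
├── YYYYMM-YYYYMM/             ← one directory per statistical period
│   ├── raw_documents.xlsx     ← raw announcement layer
│   └── extracted_metrics.xlsx ← raw extraction layer (cumulative values)
└── 全量汇总/                  ← full-run summary (use first)
    └── YYYYMMDDHHMMSS/
        ├── derived_metrics_*.xlsx   ← derived layer (single-month values, deficit-derived metrics)
        └── monthly_summary_*.xlsx   ← monthly summary wide table (rows = metrics, columns = months)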
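Analysis should read the newest 全量汇总/YYYYMMDDHHMMSS/ run. Because those directory names are fixed-width timestamps, sorting them as strings also sorts them chronologically. A minimal sketch, assuming only the layout shown above; find_latest_summary is a hypothetical helper, not part of the skill.

```python
import os
from pathlib import Path


def find_latest_summary(output_root, pattern="monthly_summary_*.xlsx"):
    """Return the file matching `pattern` in the newest 全量汇总/<timestamp>/ run, or None."""
    summary_root = Path(os.path.expandvars(str(output_root))) / "全量汇总"
    if not summary_root.is_dir():
        return None
    # YYYYMMDDHHMMSS names are fixed-width digits, so lexical order is chronological.
    run_dirs = sorted(
        d for d in summary_root.iterdir()
        if d.is_dir() and d.name.isdigit() and len(d.name) == 14
    )
    # Walk from newest to oldest and return the first run that contains a matching file.
    for run_dir in reversed(run_dirs):
        matches = sorted(run_dir.glob(pattern))
        if matches:
            return matches[-1]
    return None
```

For example, find_latest_summary("$WORKSPACE/output") locates the wide table, and passing pattern="derived_metrics_*.xlsx" locates the derived layer from the same kind of run.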

File descriptions

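Per-period extracted_metrics.xlsx: the raw extraction layer. Each row is the cumulative value of one metric for one statistical period. Fields include 指标（单位：亿元） (the metric name; unit: 亿元, i.e. 100 million yuan), 指标值 (metric value), 同比增速 (year-over-year growth), 来源公告 (source announcement), and others.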

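全量汇总/derived_metrics_*.xlsx: the derived layer, containing 3 kinds of derived records:

Cumulative-difference derivation (single-month values)
January-February average split (the combined January-February figure divided evenly between the two months)
Deficit-derived metrics (narrow and broad definitions)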

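全量汇总/monthly_summary_*.xlsx: the monthly wide table; rows = metrics (47 in total), columns = months (from 201301 onward). It can be used as-is for cross-year month-over-month and year-over-year analysis.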
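Month-over-month and year-over-year growth on the wide table reduce to dividing each month's column by an earlier column. A minimal pandas sketch, assuming the table was loaded with the metric names as the index (e.g. pd.read_excel(path, index_col=0)) and month columns labeled YYYYMM; growth_wide is a hypothetical helper, not part of the skill.

```python
import pandas as pd


def _prev_year(label):
    """Same month, previous year: '202401' -> '202301'."""
    return f"{int(label[:4]) - 1}{label[4:]}"


def _prev_month(label):
    """Previous calendar month: '202401' -> '202312', '202405' -> '202404'."""
    year, month = int(label[:4]), int(label[4:])
    return f"{year - 1}12" if month == 1 else f"{year}{month - 1:02d}"


def growth_wide(summary, mode="yoy"):
    """Growth rates (decimals) for a wide table: rows = metrics, columns = YYYYMM. mode: 'yoy' or 'mom'."""
    wide = summary.copy()
    wide.columns = [str(c) for c in wide.columns]  # Excel may load 201301 as an int
    prev_of = _prev_year if mode == "yoy" else _prev_month
    out = {}
    for col in wide.columns:
        if not (col.isdigit() and len(col) == 6):
            continue  # skip non-month columns, e.g. a metric-name column
        prev = prev_of(col)
        if prev in wide.columns:  # months without a comparison column are left out
            out[col] = wide[col] / wide[prev] - 1
    return pd.DataFrame(out, index=wide.index)
```

Growth rates are only meaningful for metrics with a positive base; for values that can change sign, such as deficits, compare absolute differences instead.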

Data file priority

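⚠️ Prefer the files in the 全量汇总 (full summary) folder; consult the per-period files only when the summary files are missing or when verification is needed.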

Which file to use when:

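Analyzing a metric's monthly trend / year-over-year / month-over-month change → prefer monthly_summary_*.xlsx
Analyzing single-month derived values or the deficit derivation → use derived_metrics_*.xlsx
Verifying the source of a raw data point → use the per-period extracted_metrics.xlsx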

Core metric definitions

Revenue

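National general public budget revenue (全国一般公共预算收入) / national tax revenue (全国税收收入) / non-tax revenue (非税收入)
Central / local general public budget revenue
Major taxes: domestic value-added tax, consumption tax, corporate income tax, individual income tax, securities transaction stamp duty, etc.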

Expenditure

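National general public budget expenditure (全国一般公共预算支出) / central-level expenditure (中央本级支出) / local expenditure
Major expenditure categories: education, social security and employment, health, energy conservation and environmental protection, transportation, debt interest payments, etc.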

Derived metrics

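Narrow-definition fiscal deficit = single-month general public budget expenditure − single-month general public budget revenue
Broad-definition fiscal deficit = (single-month general public budget expenditure + single-month government-managed fund expenditure) − (single-month general public budget revenue + single-month government-managed fund revenue)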
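The two deficit formulas translate directly into code. A minimal sketch; the function and parameter names are hypothetical, and all inputs are assumed to be single-month values in 亿元 (cumulative values must be converted first).

```python
def fiscal_deficits(gpb_expenditure, gpb_revenue, fund_expenditure, fund_revenue):
    """
    Narrow and broad fiscal deficits for one month, in 亿元.
    gpb_* = general public budget, fund_* = government-managed funds.
    All inputs must be single-month values, not cumulative ones.
    A positive result is a deficit, a negative result a surplus.
    """
    narrow = gpb_expenditure - gpb_revenue
    broad = (gpb_expenditure + fund_expenditure) - (gpb_revenue + fund_revenue)
    return {"narrow": narrow, "broad": broad}
```

For instance, fiscal_deficits(100.0, 80.0, 30.0, 20.0) returns {'narrow': 20.0, 'broad': 30.0}: the fund accounts add a further 10 of net spending on top of the narrow deficit.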

Converting cumulative values to single-month values

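January: the cumulative value is the single-month value
February onward: single-month value = current-period cumulative value − previous-period cumulative value (previous period = the preceding statistical period in the same year)
Example: Jan-Oct cumulative − Jan-Sep cumulative = October single-month value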
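The rules above, together with the January-February average split listed among the derived records, can be sketched for one year's data as follows. This is an illustrative reading, not the pipeline's own code: the input is assumed to be a {end month: cumulative value} mapping, a missing January is assumed to mean only a combined January-February figure was published, and a month whose immediately preceding cumulative value is absent is skipped with a warning rather than differenced across a gap.

```python
import warnings


def cumulative_to_monthly(cumulative):
    """
    Convert one year's cumulative values to single-month values.

    cumulative: {end_month: cumulative value}, e.g. {2: Jan-Feb total, 3: Jan-Mar total}.
    Returns {month: single-month value}; months that cannot be derived are omitted.
    """
    monthly = {}
    if 1 in cumulative:
        monthly[1] = cumulative[1]  # January: cumulative value is the monthly value
    elif 2 in cumulative:
        # Only a combined Jan-Feb figure exists: split it evenly between the two months.
        monthly[1] = monthly[2] = cumulative[2] / 2
    for month in range(2, 13):
        if month not in cumulative or month in monthly:
            continue
        if month - 1 in cumulative:
            monthly[month] = cumulative[month] - cumulative[month - 1]
        else:
            # Mirrors the "missing previous-period data" anomaly: do not guess.
            warnings.warn(f"month {month}: previous-period cumulative value missing")
    return dict(sorted(monthly.items()))
```

With {2: 200.0, 3: 350.0, 5: 600.0} this yields January and February at 100.0 each and March at 150.0, and warns about May because the Jan-Apr total is missing.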

Handling scraping anomalies

Lines in the run log that begin with [WARN] record anomalies, in the format:

[WARN] YYYY-MM <metric name> - <anomaly reason>
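To triage a long run log, the [WARN] lines can be pulled out and grouped by reason. A minimal sketch, assuming the lines follow the format above exactly, with whitespace around the separating " - " (metric names such as 1-2月 contain a bare hyphen, which the pattern tolerates); parse_warnings and count_by_reason are hypothetical helpers.

```python
import re
from collections import Counter

# [WARN] YYYY-MM <metric name> - <anomaly reason>
WARN_RE = re.compile(r"^\[WARN\]\s+(\d{4}-\d{2})\s+(.+?)\s+-\s+(.+)$")


def parse_warnings(log_lines):
    """Return (period, metric, reason) tuples for [WARN] lines; all other lines are ignored."""
    records = []
    for line in log_lines:
        match = WARN_RE.match(line.strip())
        if match:
            records.append(match.groups())
    return records


def count_by_reason(records):
    """Count anomalies per reason, e.g. to see whether parse failures dominate."""
    return Counter(reason for _, _, reason in records)
```

Reading the log with open(path, encoding="utf-8") and passing the file object to parse_warnings gives the list of anomalous periods and metrics to work through.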

Common anomaly types and how to fix them:

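Anomaly type	Cause	Fix
Missing previous-period data	The previous period's announcement was not scraped or failed to parse	Scrape the previous period's data, then rerun the pipeline
Parse failure	The web page structure changed or the metric format does not match	Check the parsing regexes in src/fiscal_parser.py, locate what changed, and update the regex or add a new metric mapping rule
Duplicate data	The same metric has multiple records in the same period	Check the deduplication logic, clear the output cache, and rerun
Inconsistent units	The source text uses a unit other than 亿元 (100 million yuan), such as 万元 (10,000 yuan)	Check the unit conversion logic in fiscal_transform.py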
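When checking unit handling, it helps to have the conversion factors written out: 1 万元 = 10,000 yuan and 1 亿元 = 100,000,000 yuan, so 1 万元 = 0.0001 亿元. A minimal sketch of a conversion to 亿元; the unit table and to_yi_yuan are illustrative assumptions, not the logic in fiscal_transform.py.

```python
# Yuan per unit; the set of units handled here is an illustrative assumption.
YUAN_PER_UNIT = {
    "元": 1,
    "万元": 10_000,               # 10 thousand yuan
    "亿元": 100_000_000,          # 100 million yuan, the target unit
    "万亿元": 1_000_000_000_000,  # 1 trillion yuan
}


def to_yi_yuan(value, unit):
    """Convert `value` expressed in `unit` to 亿元 (100 million yuan)."""
    if unit not in YUAN_PER_UNIT:
        raise ValueError(f"unknown unit: {unit!r}")
    # Go through yuan so every factor is an exact integer.
    return value * YUAN_PER_UNIT[unit] / YUAN_PER_UNIT["亿元"]
```

Thus 25,000 万元 converts to 2.5 亿元, and 1.5 万亿元 to 15,000 亿元; an unknown unit raises instead of silently passing through.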

Fix procedure:

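Locate the anomalous metric and period
Check the original announcement (raw_documents.xlsx or the corresponding page on the Ministry of Finance website)
Determine whether the problem lies in the parsing rules or in the data itself
Update the corresponding rule in src/fiscal_parser.py or fiscal_transform.py
Delete the cache directory of the anomalous period, then rerun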
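The last step can be scripted with a dry run first. A minimal sketch under two assumptions that the text does not confirm: that the "cache directory" of a period is its YYYYMM-YYYYMM directory under the output root, and that the YYYY-MM in a [WARN] line is that directory's end month. clear_period_dirs is hypothetical; the 全量汇总 folder never matches the pattern and is left alone.

```python
import re
import shutil
from pathlib import Path

PERIOD_DIR = re.compile(r"^(\d{6})-(\d{6})$")  # YYYYMM-YYYYMM


def clear_period_dirs(output_root, period, dry_run=True):
    """
    Remove per-period directories whose end month equals `period` (YYYY-MM).
    With dry_run=True (the default) nothing is deleted; targets are only printed.
    """
    end_month = period.replace("-", "")
    targets = []
    for d in sorted(Path(output_root).iterdir()):
        match = PERIOD_DIR.match(d.name)
        if d.is_dir() and match and match.group(2) == end_month:
            targets.append(d)
    for d in targets:
        print(("would remove " if dry_run else "removing ") + str(d))
        if not dry_run:
            shutil.rmtree(d)
    return targets
```

After deleting, rerunning the pipeline with --start-month / --end-month covering just that period avoids re-scraping the full history.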

Data analysis guide

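Approach

Read monthly_summary_*.xlsx first: it already arranges all derived values as a wide table (rows = 47 metrics, columns = months from 201301 onward), ready for cross-year year-over-year and month-over-month analysis without further calculation
For single-month derivation details or the deficit derivation: read derived_metrics_*.xlsx
To verify raw data: read the per-period extracted_metrics.xlsx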

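Data granularity

monthly_summary: single-month values (derived from the raw cumulative values)
derived_metrics: records the derivation process, including the calculation formulas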

Output path for generated files

⚠️ All charts and data files generated by this skill (e.g. .png, .csv) must be saved to the following directory, not to the workspace root:

$WORKSPACE/output/artifacts/

Output file names should include the analysis topic and the date so they are easy to identify, e.g. tax_revenue_yoy_20260327.png.
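The path and naming rule can be wrapped in a small helper so every chart or table lands in the same place. A minimal sketch; artifact_path is hypothetical, and the fallback workspace ~/.openclaw/workspace is taken from the path notes above for the case where $WORKSPACE is not set.

```python
import os
from datetime import date
from pathlib import Path


def artifact_path(topic, ext="png", day=None, workspace=None):
    """Build <workspace>/output/artifacts/<topic>_<YYYYMMDD>.<ext>, creating the folder if needed."""
    root = Path(workspace or os.environ.get("WORKSPACE", "~/.openclaw/workspace")).expanduser()
    folder = root / "output" / "artifacts"
    folder.mkdir(parents=True, exist_ok=True)
    day = day or date.today()
    return folder / f"{topic}_{day:%Y%m%d}.{ext}"
```

Called as artifact_path("tax_revenue_yoy", day=date(2026, 3, 27)), it produces a path ending in output/artifacts/tax_revenue_yoy_20260327.png, matching the example file name.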

Analysis examples

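Example 1: Analyze the monthly fiscal deficit trend in 2024

Read extracted_metrics.xlsx and filter the relevant 2024 periods (e.g. 2024-01, 2024-03, 2024-06, 2024-09, 2024-12)
Compute the single-month value of each metric
Fiscal revenue = national general public budget revenue (single-month value)
Fiscal expenditure = national general public budget expenditure (single-month value)
Fiscal deficit = fiscal expenditure − fiscal revenue
Plot the result or output a table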
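Because monthly_summary_*.xlsx already holds single-month values, the same deficit series can also be read straight off the wide table. A minimal pandas sketch, assuming the table was loaded with metric names as the index (e.g. index_col=0) and that the two rows are labeled exactly 全国一般公共预算收入 and 全国一般公共预算支出; those labels are hypothetical, so check the actual row names, which may carry a unit suffix.

```python
import pandas as pd

# Hypothetical row labels; verify them against the actual monthly_summary file.
REVENUE = "全国一般公共预算收入"
EXPENDITURE = "全国一般公共预算支出"


def monthly_deficit(summary, year):
    """Narrow-definition deficit for each month of `year` (rows = metrics, columns = YYYYMM)."""
    wide = summary.copy()
    wide.columns = [str(c) for c in wide.columns]  # Excel may load 202401 as an int
    cols = [c for c in wide.columns if c.isdigit() and len(c) == 6 and c.startswith(str(year))]
    revenue = wide.loc[REVENUE, cols]
    expenditure = wide.loc[EXPENDITURE, cols]
    # One row per month, indexed by the YYYYMM column label.
    return pd.DataFrame({
        "revenue": revenue,
        "expenditure": expenditure,
        "deficit": expenditure - revenue,
    })
```

The result can be written with to_csv or plotted and saved under $WORKSPACE/output/artifacts/ following the naming rule above.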

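Example 2: Compare tax revenue in the same periods of 2024 and 2025

Extract the matching periods for both years (e.g. both have Jan-Feb and Jan-Jun cumulative values)
Compute same-period single-month values, or compare cumulative values directly (note: cumulative values are comparable across years only when they cover the same period on the same definition, e.g. Jan-Jun 2024 vs Jan-Jun 2025)
Compute year-over-year growth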
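Summing single-month values for January through the same end month in both years gives a like-for-like cumulative comparison. A minimal sketch; same_period_yoy is hypothetical, and the inputs are assumed to be {month: single-month value} mappings for one metric in each year.

```python
def same_period_yoy(monthly_prev, monthly_curr, through_month):
    """
    YoY growth (decimal) of the Jan..through_month total, built from single-month values.
    Both years must cover every month in the range, so the two totals are on the same basis.
    """
    months = range(1, through_month + 1)
    missing = [m for m in months if m not in monthly_prev or m not in monthly_curr]
    if missing:
        raise ValueError(f"months missing in one of the years: {missing}")
    prev_total = sum(monthly_prev[m] for m in months)
    curr_total = sum(monthly_curr[m] for m in months)
    return curr_total / prev_total - 1
```

Refusing to compute when a month is missing in either year enforces the same-period requirement instead of comparing, say, a Jan-Jun total against a Jan-May total.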

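Example 3: Identify anomalous metrics

Compute year-over-year growth for each metric
Flag metrics with abnormal growth (e.g. growth above +50% or below -30%)
Use the policy context to judge whether it is a data anomaly or a change in statistical definition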
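The flagging step is a simple threshold filter. A minimal sketch, assuming growth rates are given as decimals (0.5 = 50%) in a {metric: growth} mapping; flag_outliers is hypothetical, and the default thresholds are the example values above.

```python
import math


def flag_outliers(yoy, upper=0.5, lower=-0.3):
    """
    Return {metric: growth} for metrics whose YoY growth is above `upper` or below `lower`.
    Missing (None) or NaN growth values are skipped rather than flagged.
    """
    flagged = {}
    for metric, growth in yoy.items():
        if growth is None or math.isnan(growth):
            continue
        if growth > upper or growth < lower:
            flagged[metric] = growth
    return flagged
```

The flagged metrics are only candidates: each still needs the policy-context check described above before being treated as a data error.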

Reference code for reading the Excel files

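import os
import pandas as pd

# Per-period file (raw extraction layer). Python does not expand $WORKSPACE in strings, so expand it explicitly.
df = pd.read_excel(os.path.expandvars("$WORKSPACE/output/202401-202412/extracted_metrics.xlsx"))

# Full-run monthly wide table (use first); replace <latest_timestamp> and <timestamp> with the actual names. Reading .xlsx uses openpyxl.
df_summary = pd.read_excel(os.path.expandvars("$WORKSPACE/output/全量汇总/<latest_timestamp>/monthly_summary_<timestamp>.xlsx"))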

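Trigger rules

Scraping data: first confirm the target month range, and prefer --start-month / --end-month to limit the range and avoid redundant scraping
Analyzing data: first confirm the time range and metrics the user needs, then decide whether a scrape must be run first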

Usage Guidance

This skill appears to be what it says: a web crawler + ETL for Ministry of Finance announcements. Before running it: (1) inspect the shipped code (you have it) and run it in an isolated environment (create the conda env named scrapyEnv as instructed); (2) run with --output-dir pointing to a dedicated directory you control (not your OS home) to review generated files; (3) be aware it will make HTTP requests to https://www.mof.gov.cn and follow listing pages — if you need to restrict network access, run it in a sandbox or offline; (4) parsing is regex-based and brittle: verify outputs and test reruns for missing-previous-period warnings; (5) only proceed if you trust the skill source — although no secrets are requested, executing code from unknown authors carries risk, so prefer running in a disposable VM or container first.

Capability Analysis

Type: OpenClaw Skill
Name: financial-data-collection
Version: 1.0.0

The skill bundle is a legitimate data collection and analysis tool designed to scrape financial reports from the official website of the Ministry of Finance of the People's Republic of China (mof.gov.cn). The Python code implements a standard ETL (Extract, Transform, Load) pipeline using well-known libraries like requests, BeautifulSoup, and openpyxl to convert cumulative fiscal data into monthly metrics and calculate deficits. No evidence of data exfiltration, unauthorized network calls, malicious execution, or harmful prompt injection was found; the logic is transparent and strictly aligned with the stated purpose of financial data processing.

Capability Assessment

✓ Purpose & Capability

Name/description (collect and analyze Ministry of Finance fiscal revenue and expenditure data) match the code and runtime instructions: a crawler (requests + BeautifulSoup) that reads Ministry of Finance pages, parsers that extract cumulative values and derive monthly values, and transform/export modules that write Excel outputs. No unrelated credentials, binaries, or services are requested.

ℹ Instruction Scope

SKILL.md tells the agent to activate a conda env and run the included run_pipeline.py which enumerates listing pages, fetches official MOF pages, parses content, and writes outputs under an output directory. This stays within the stated purpose. Minor note: SKILL.md recommends storing artifacts under $WORKSPACE/output/artifacts, while the code writes to the provided --output-dir (default 'output') and creates per-period and summary directories; this is an operational mismatch but not malicious.

ℹ Install Mechanism

There is no platform install spec in the registry entry (instruction-only), but the bundle includes code and an environment.yml / requirements.txt that use common packages (requests, beautifulsoup4, lxml, openpyxl). Creating the conda env will install those packages — reasonable for a scraper but the user should run in an isolated environment (no remote arbitrary downloads during install).

✓ Credentials

The skill requires no environment variables, secrets, or external credentials. It only performs HTTP GETs to the official MOF domain and writes local Excel files — request scope is proportional to purpose.

✓ Persistence & Privilege

Skill is not always-included and does not request elevated privileges. It writes outputs into the configured output directory but does not modify other skills or system-wide configurations.

Version History

v1.0.0

Initial release

Metadata

Slug financial-data-collection

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is 财政数据采集分析?

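Collection and analysis of fiscal revenue and expenditure data from China's Ministry of Finance. Use this skill when the user mentions either of the following scenarios: (1) scraping fiscal data, with trigger phrases 抓取财政数据 (scrape fiscal data), 采集财政数据 (collect fiscal data), 最新财政数据 (latest fiscal data), 财政数据采集 (fiscal data collection); (2) analyzing fiscal data, with trigger phrases 分析财政数据 (analyze fiscal data), 分析财政赤字 (analyze the fiscal deficit), 研究财政收入 (study fiscal revenue), 对比财政收支 (compare fiscal revenue and expenditure). It runs the fiscal data collection pipeline against the Ministry of Finance website and, for the collected... It is an AI Agent Skill for Claude Code / OpenClaw, with 119 downloads so far.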

How do I install 财政数据采集分析?

Run "/install financial-data-collection" in the OpenClaw or Claude Code chat to install it in one step. Running the pipeline additionally requires the scrapyEnv conda environment described in the README.

Is 财政数据采集分析 free?

Yes, 财政数据采集分析 is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does 财政数据采集分析 support?

财政数据采集分析 is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created 财政数据采集分析?

It is built and maintained by cy7533 (@cy7533); the current version is v1.0.0.
