Description

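Automatically downloads annual report PDFs of A-share listed companies from CNINFO (巨潮资讯) and extracts structured financial data, with support for data validation and batch processing.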

Usage Instructions (SKILL.md)

Financial Analysis Data Collection Pipeline

Name: financial-data-gateway
Author: shihugh5-lab

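The LibraQuant financial analysis data collection pipeline comprises two core skills that together cover the path from source retrieval to structured extraction.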

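Skill Overview

Category	Count	Scope
Data collection	1	Download A-share annual report PDFs from CNINFO
Data extraction	1	Extract structured financial data from PDFs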

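1. Data Collection (1 Skill)

CNINFO Annual Report Download

Skill	Purpose	Data source
`cninfo-report-download`	Download A-share annual report PDFs from CNINFO	cninfo.com.cn

Features:

Searches for an annual report by exact company name and year
Automatically identifies the exchange (Shanghai or Shenzhen Stock Exchange)
Selects the correct version, excluding summaries, corrections, and English editions
Automatically strips HTML tags and illegal characters from file names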
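
The version filtering and file-name cleanup described above could look roughly like the following. This is a minimal sketch, not the skill's actual code: the function names, the exclusion keyword list, the `<em>` highlight tags in titles, and the "YYYY年年度报告" title pattern are all assumptions made for illustration.

```python
import re
from typing import Iterable, Optional

# Assumed keywords for versions to skip: summary, correction, English edition.
EXCLUDE_KEYWORDS = ("摘要", "更正", "英文")


def clean_title(raw: str) -> str:
    """Remove HTML tags, then replace characters that are illegal in file names."""
    text = re.sub(r"<[^>]+>", "", raw)
    text = re.sub(r'[\\/:*?"<>|\r\n\t]', "_", text)
    return text.strip()


def is_full_annual_report(title: str, year: int) -> bool:
    """True if the title looks like the full annual report for the given year."""
    clean = clean_title(title)
    if any(keyword in clean for keyword in EXCLUDE_KEYWORDS):
        return False
    # Assumed title pattern, e.g. "2024年年度报告".
    return f"{year}年年度报告" in clean


def pick_report(titles: Iterable[str], year: int) -> Optional[str]:
    """Return a safe PDF file name for the first matching title, else None."""
    for title in titles:
        if is_full_annual_report(title, year):
            return clean_title(title) + ".pdf"
    return None


titles = [
    "海天味业：<em>2024年年度报告</em>摘要",
    "海天味业：<em>2024年年度报告</em>（英文版）",
    "海天味业：<em>2024年年度报告</em>",
]
print(pick_report(titles, 2024))
```

Filtering on the cleaned title rather than the raw one keeps highlight markup from splitting a keyword and hiding it from the check.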

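2. Data Extraction (1 Skill)

Financial Statement Data Extraction

Skill	Purpose	Output format
`financial-statement-extraction`	Extract structured financial data from annual report PDFs	JSON

Features:

AI-assisted location of the financial statement pages (two-tier location strategy)
Three lines of defense to filter out parent-company data (page level / mixed page / row level)
Automatic unit detection and normalization (元 / 千元 / 万元 / 亿元, i.e. yuan, thousand yuan, ten thousand yuan, hundred million yuan)
Built-in articulation checks (balance sheet equality, net margin plausibility)
Extracts the balance sheet, income statement, and cash flow statement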
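
Unit normalization of the kind listed above can be sketched as follows. The helper names, the "单位：..." header pattern, and the choice of 亿元 as the default target unit are assumptions for illustration; the source does not state how the skill detects or converts units.

```python
import re
from typing import Optional

# Value of each supported unit expressed in yuan.
UNIT_IN_YUAN = {"元": 1, "千元": 1_000, "万元": 10_000, "亿元": 100_000_000}


def detect_unit(header_text: str) -> Optional[str]:
    """Find a unit declaration such as '单位：万元' or '单位:人民币元' (assumed format)."""
    match = re.search(r"单位[:：]\s*(?:人民币)?\s*(亿元|万元|千元|元)", header_text)
    return match.group(1) if match else None


def convert(value: float, from_unit: str, to_unit: str = "亿元") -> float:
    """Convert a reported figure from one unit to another via yuan."""
    return value * UNIT_IN_YUAN[from_unit] / UNIT_IN_YUAN[to_unit]


unit = detect_unit("合并资产负债表  单位：万元")
print(unit, convert(823_700, unit))
```

Listing the longer unit names first in the pattern ensures that 千元, 万元, and 亿元 are not misread as plain 元.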

3. Usage Flow

Standard flow (recommended)

┌─────────────────────────┐
│  cninfo-report-download │  ← Input: company name + year
│  CNINFO report download │     Output: PDF file path
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ financial-statement-    │  ← Input: PDF file path
│ extraction              │     Output: structured JSON data
│ Financial statement     │
│ data extraction         │
└─────────────────────────┘

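Usage Examples

Example 1: single-company annual report analysis

cninfo-report-download: download the 2024 annual report of "海天味业" (Haitian Flavouring)
financial-statement-extraction: extract the financial data
Downstream analysis skills (e.g. DCF valuation, financial ratio analysis)

Example 2: batch data collection

Call cninfo-report-download in a loop to download the annual reports of several companies
Call financial-statement-extraction in a loop to extract the data in bulk
Build a financial database or run a cross-company comparison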

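4. Input and Output Specification

cninfo-report-download

Input:

Parameter	Type	Required	Description
company_name	str	✅	Company short name (e.g. "海天味业")
year	int	✅	Report year (e.g. 2024)
output_dir	str	Optional	Download directory, defaults to `workspace/reports/`

Output:

Success: path to the PDF file (str)
Failure: None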

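financial-statement-extraction

Input:

Parameter	Type	Required	Description
pdf_path	str	✅	Path to the annual report PDF
output_dir	str	Optional	Output directory
extract_notes	bool	Optional	Whether to extract the notes (v2.1+; the current version extracts the main statements only)

Output: structured data in JSON format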

{
  "meta": { "company": "公司名", "stock_code": "代码", ... },
  "balance_sheet": { "资产总计": 82.37, ... },
  "income_statement": { "营业收入": 55.19, ... },
  "cashflow_statement": { "经营活动现金流净额": 10.88, ... },
  "validation": { "balance_check": {...}, ... }
}

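5. Data Quality Assurance

Validation Mechanisms

Check	Rule	Handling
Balance sheet equality	Assets = Liabilities + Equity	A difference above 1% triggers a fallback re-extraction
Net margin plausibility	A net margin above 100% suggests parent-company data may have been mixed in	Automatically triggers a fallback re-extraction
Data completeness	Checks that key line items are present	Warns when items are missing
Parent-company data filtering	Three lines of defense ensure only consolidated statements are extracted	Page-level / mixed-page / row-level filtering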
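
The first three checks in the table can be expressed as simple functions. The sketch below is an illustration under stated assumptions: the function names and returned fields are hypothetical, and the 1% difference is assumed to be measured relative to total assets, which the source does not specify.

```python
from typing import Dict, Iterable, List, Optional


def check_balance(assets: float, liabilities: float, equity: float,
                  tolerance: float = 0.01) -> Dict[str, object]:
    """Assets = Liabilities + Equity, within a relative tolerance (1% by default)."""
    diff = abs(assets - (liabilities + equity))
    # Assumption: the difference is measured relative to total assets.
    relative = diff / abs(assets) if assets else float("inf")
    return {"passed": relative <= tolerance, "relative_diff": relative}


def check_net_margin(net_profit: float, revenue: float,
                     max_margin: float = 1.0) -> Dict[str, object]:
    """Flag a net margin above 100%, which may indicate mixed-in parent-company data."""
    if revenue == 0:
        return {"passed": False, "net_margin": None}
    margin: Optional[float] = net_profit / revenue
    return {"passed": margin <= max_margin, "net_margin": margin}


def missing_items(statement: Dict[str, float], required: Iterable[str]) -> List[str]:
    """Return the required line items that are absent from a statement."""
    return [item for item in required if item not in statement]


balance = check_balance(assets=82.37, liabilities=30.0, equity=52.37)
margin = check_net_margin(net_profit=120.0, revenue=100.0)
gaps = missing_items({"资产总计": 82.37}, ["资产总计", "负债合计"])
print(balance["passed"], margin["passed"], gaps)
```

A caller would re-run extraction with a fallback strategy when either of the first two checks fails, and only log a warning for missing items, mirroring the handling column above.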

Verified Data Sources

Company	Code	Exchange	Year	Status
海天味业	603288	SSE	2024	✅
千禾味业	603027	SSE	2024	✅
中炬高新	600872	SSE	2024	✅
甘源食品	002991	SZSE	2024	✅
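
The exchange column above follows the usual A-share code prefixes, so automatic exchange identification can be approximated from the stock code alone. This is a simplified sketch with a hypothetical function name; the skill may instead rely on its own company code mapping table, and B-share or Beijing Stock Exchange codes are deliberately left out.

```python
def guess_exchange(stock_code: str) -> str:
    """Infer the exchange of an A-share from its 6-digit code prefix."""
    code = stock_code.strip()
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"not a 6-digit stock code: {stock_code!r}")
    if code.startswith(("60", "68")):  # Shanghai main board and STAR Market
        return "SSE"
    if code.startswith(("00", "30")):  # Shenzhen main board and ChiNext
        return "SZSE"
    raise ValueError(f"unsupported code prefix: {stock_code!r}")


for code in ("603288", "603027", "600872", "002991"):
    print(code, guess_exchange(code))
```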

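6. Known Limitations

Limitation	Details	Plan
A-shares only	Hong Kong and US stocks are not supported	Multi-market support planned for v2.0
Annual reports only	Semi-annual and quarterly reports require parameter changes	Support planned for v1.2
No scanned documents	Requires PDFs with selectable text	OCR planned for v3.0
No English reports	The line-item mapping table covers Chinese only	English support planned for v2.2
No notes extraction	The current version extracts the main statements only	Notes extraction planned for v2.1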

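7. Roadmap

v1.1 (current)
    ↓
v1.2 (semi-annual and quarterly report support)
    ↓
v1.3 (local cache + resumable downloads)
    ↓
v2.0 (batch download + multithreading + Hong Kong stock support)
    ↓
v2.1 (notes extraction + prior-period data)
    ↓
v2.2 (English report support)
    ↓
v3.0 (OCR support for scanned documents)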

8. Tool Files

src/tools/download_cninfo.py: main annual report download logic
src/tools/extract_pdf_tables.py: main financial data extraction logic
src/tools/ai_page_locator.py: AI page location
src/data/company_codes.py: company code mapping table

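9. Usage Recommendations

Best Practices

Validate on a single company first: on first use, check extraction quality on one or two companies
Check the validation field: always confirm that the articulation checks passed
Handle exceptions: scanned or encrypted PDFs raise errors and need manual handling
Pace batch downloads: add a delay between requests to avoid triggering CNINFO rate limiting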
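
A paced batch loop in the spirit of the last recommendation might look like this. It is a sketch only: `batch_download` and its parameters are hypothetical, the download function is passed in as a callable that returns a path or None (matching the documented output of cninfo-report-download), and the 2 to 5 second delay range is an arbitrary assumption rather than a documented CNINFO limit.

```python
import random
import time
from typing import Callable, Dict, Iterable, Optional


def batch_download(
    companies: Iterable[str],
    year: int,
    download: Callable[[str, int], Optional[str]],
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[str]]:
    """Download reports one by one, pausing a random interval between requests."""
    results: Dict[str, Optional[str]] = {}
    for index, name in enumerate(companies):
        if index > 0:
            # Randomized pause so requests do not arrive at a fixed rhythm.
            sleep(random.uniform(min_delay, max_delay))
        # None marks a failed download and is kept for later manual review.
        results[name] = download(name, year)
    return results


def fake_download(name: str, year: int) -> Optional[str]:
    """Stand-in for the real downloader, used only for this demonstration."""
    return None if name == "千禾味业" else f"workspace/reports/{name}_{year}.pdf"


pauses = []
results = batch_download(["海天味业", "千禾味业"], 2024, fake_download, sleep=pauses.append)
failed = [name for name, path in results.items() if path is None]
print(results, failed)
```

Keeping failures in the result map instead of raising lets a long batch finish and leaves a clear list of companies to retry, for example by stock code.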

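Common Questions

Q: Why do the extracted figures not match the report?
A: Parent-company data may have been mixed in. Check the validation field and retry if necessary.
Q: What should I do if a download fails?
A: Check that the company name is correct, or try searching by stock code.
Q: Which industries are supported?
A: General manufacturing is supported best; special industries such as banking, insurance, and securities may need extra handling.

LibraQuant Financial Analysis Pipeline: a complete solution from data collection to structured extraction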

Security Recommendations

What to consider before installing:

- Metadata mismatch: the registry shows no required env vars or install steps, but the skill's docs require pip packages and an OPENAI_API_KEY. Expect to provide an OpenAI key if you use the extraction feature.
- Data exfiltration risk: the extraction flow sends page summaries/text to api.openai.com for page localization. Only run this on public/non-sensitive documents or in an environment where you accept sending content to OpenAI.
- Missing code files: the skill references src/tools/*.py but the package contains only markdown docs. Confirm whether the implementation exists elsewhere or whether you will need to supply/run those scripts yourself.
- Installation: following the embedded install instructions will run 'pip install' for standard packages (pdfplumber, openai, requests). If you allow automatic installation, do so in an isolated environment (virtualenv/container) to limit risk.
- Operational cautions: batch downloads may trigger cninfo rate limits; add delays or proxies as suggested. Verify outputs on 1-2 companies before running large batches and check the validation fields in the JSON results.

Recommended actions: ask the publisher for the missing code files or a verified install spec; if you must proceed, provide an OpenAI key only in a controlled/test environment and avoid sending any non-public documents to the skill until you confirm behavior and provenance.

Functional Analysis

Type: OpenClaw Skill
Name: financial-data-gateway
Version: 1.0.1

The bundle provides a professional financial data pipeline for downloading A-share annual reports from the official CNINFO website and extracting structured data into JSON format. It utilizes standard libraries like `requests` and `pdfplumber`, and incorporates OpenAI for intelligent page localization, with clear disclosure regarding the use of an API key and data transmission to OpenAI. The documentation is highly detailed, focusing on functional logic, data validation, and error handling without any evidence of malicious intent, obfuscation, or unauthorized access.

Capability Assessment

ℹ Purpose & Capability

The stated purpose (download A‑share annual reports from cninfo and extract structured financial statements) matches the instructions and network targets (cninfo and OpenAI). However, the top‑level registry metadata declares no required env vars, no install spec and no code files, while the skill's markdown documents list dependencies (pdfplumber, openai, requests), an OPENAI_API_KEY env var, and reference src/tools/*.py files that are not present in the package. This mismatch (claimed code/tools but no code files; undeclared env var) is inconsistent and may indicate packaging or disclosure issues.

⚠ Instruction Scope

The runtime instructions explicitly direct the agent to read PDF pages (workspace/reports/*.pdf), extract text snippets (page summaries) and send them to the OpenAI API for page localization. Transmitting PDF text to an external LLM is outside pure local processing and may expose sensitive text. The instructions also reference filesystem paths and tool files (src/tools/*.py) that are not included, meaning the agent or operator would need to run installation commands or implement missing scripts. While these actions are coherent with the stated extraction purpose, the external transmission of document content and missing code files are noteworthy risks.

ℹ Install Mechanism

There is no install spec in the registry (instruction‑only), but the included markdown documents contain 'install' sections that recommend 'pip install pdfplumber openai' and 'pip install requests'. Because installation is left to the operator/agent rather than declared in the registry, installing runtime packages would require executing pip at runtime — a moderate risk if done automatically. The install sources are standard PyPI packages (not arbitrary download URLs).

⚠ Credentials

The financial-statement-extraction document requires OPENAI_API_KEY (to call api.openai.com) to perform AI page localization; that need is plausible for the LLM-based page-locating strategy. However, the registry metadata lists no required env vars — an inconsistency. Requiring an OpenAI key is proportionate to the LLM approach, but it grants a third party (OpenAI) access to excerpts of PDF content; if PDFs include sensitive data, this could result in unintended disclosure. No other unrelated credentials are requested.

✓ Persistence & Privilege

The skill does not request elevated or persistent privileges (always:false, no system-wide changes). It specifies reading PDF files under workspace/reports/ and writing outputs to workspace/output/, which is proportionate for its purpose. There is no indication it modifies other skills or system configs. The inconsistency is that these filesystem requirements exist only in internal docs and are not declared in registry metadata.

Version History

v1.0.1

No changes were detected in this version.

- Version and content in SKILL.md remain the same as the previous release.
- No updates, bug fixes, or new features included.

v1.0.0

First release of the LibraQuant financial analysis data collection and extraction pipeline:

- Automatically downloads annual report PDFs of China A-share listed companies from CNINFO, with intelligent filtering for the requested year and the correct version.
- Extracts core financial data, including the balance sheet, income statement, and cash flow statement, from PDF annual reports and outputs structured JSON.
- Built-in multi-layer validation: articulation checks, parent/consolidated statement filtering, unit unification, etc., to ensure data quality.
- Fully documents the workflow, input/output specifications, applicable scenarios, and common issues.
- Clearly lists known limitations and the planned iteration roadmap.

Metadata

Slug financial-data-gateway

Version 1.0.1

License MIT-0

Total installs 0

Current installs 0

Number of versions 2

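FAQ

What is financial-data-gateway?

It automatically downloads annual report PDFs of A-share listed companies from CNINFO and extracts structured financial data, with support for data validation and batch processing. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 108 times to date.

How do I install financial-data-gateway?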

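Run the command "/install financial-data-gateway" in an OpenClaw or Claude Code chat to install it in one step. Note that, according to the security review on this page, the extraction skill's documentation additionally calls for pip packages (pdfplumber, openai, requests) and an OPENAI_API_KEY.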

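Is financial-data-gateway free?

Yes. financial-data-gateway is completely free and released under the MIT-0 license; you may download, install, and use it freely.

Which platforms does financial-data-gateway support?

financial-data-gateway is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed financial-data-gateway?

It is developed and maintained by Shi Changlong (@shihugh5-lab); the current version is v1.0.1.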
