financial-data-gateway
Author: Shi Changlong · v1.0.1 · MIT-0
Total downloads: 108 · Favorites: 2 · Current installs: 0 · Versions: 2
Install in OpenClaw
/install financial-data-gateway
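Description
Automatically downloads annual report PDFs of A-share listed companies from CNINFO (巨潮资讯) and extracts structured financial data, with support for data validation and batch processing.
Usage (SKILL.md)
Financial Analysis Data Collection Pipeline
The LibraQuant financial analysis data collection pipeline consists of 2 core skills that together cover the full path from data source retrieval to structured extraction.
Skill overview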
| Category | Count | Scope |
|---|---|---|
| Data collection | 1 | Download A-share annual report PDFs from CNINFO |
| Data extraction | 1 | Extract structured financial data from PDFs |
1. Data collection (1 skill)
CNINFO annual report download
| Skill | Purpose | Data source |
|---|---|---|
| cninfo-report-download | Download A-share annual report PDFs from CNINFO | cninfo.com.cn |
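Features:
- Exact search for annual reports by company name and year
- Automatic detection of the exchange (Shanghai/SSE or Shenzhen/SZSE)
- Smart selection of the correct version (excludes summaries, corrections, and English editions)
- Automatic removal of HTML tags and illegal characters from file names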
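The version filtering and file name cleanup described above could look roughly like the following. This is a minimal sketch, not the skill's actual code: the function names, the exclusion keywords (摘要, 更正, 英文), and the assumed title pattern "<year>年年度报告" are all illustrative assumptions.

```python
import re

# Assumed keywords for versions to skip: summary, correction, English edition.
EXCLUDED_KEYWORDS = ("摘要", "更正", "英文")


def strip_html(text: str) -> str:
    """Remove HTML tags such as <em>...</em> from a title."""
    return re.sub(r"<[^>]+>", "", text)


def is_full_annual_report(title: str, year: int) -> bool:
    """Return True if the title looks like the full annual report for `year`."""
    clean = strip_html(title)
    if any(keyword in clean for keyword in EXCLUDED_KEYWORDS):
        return False
    # Assumed title pattern, e.g. "2024年年度报告".
    return f"{year}年年度报告" in clean


def sanitize_filename(title: str) -> str:
    """Strip HTML tags and replace characters that are illegal in file names."""
    clean = strip_html(title)
    clean = re.sub(r'[\\/:*?"<>|\r\n\t]', "_", clean)
    return clean.strip(" ._")
```

For example, `sanitize_filename("<em>海天味业</em>:2024年年度报告")` yields a name with the tags removed and the colon replaced by an underscore.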
2. Data extraction (1 skill)
Financial statement extraction
| Skill | Purpose | Output format |
|---|---|---|
| financial-statement-extraction | Extract structured financial data from annual report PDFs | JSON |
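Features:
- AI-assisted location of the financial statement pages (two-tier location strategy)
- Three lines of defense for filtering out parent-company data (page level, mixed-page level, row level)
- Automatic detection and normalization of units (元 / 千元 / 万元 / 亿元, i.e. yuan, thousand yuan, ten thousand yuan, hundred million yuan)
- Built-in cross-statement consistency checks (balance sheet equation, net margin plausibility)
- Extraction of the balance sheet, income statement, and cash flow statement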
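Unit detection and normalization can be illustrated with a small sketch. It assumes that a statement page declares its unit in a header such as "单位:万元" and that values are normalized to 亿元. Both the header pattern and the target unit are assumptions made for illustration, and the helper names are hypothetical.

```python
import re
from typing import Optional

# Conversion factors to yuan for the four units the skill lists.
UNIT_TO_YUAN = {"元": 1, "千元": 1_000, "万元": 10_000, "亿元": 100_000_000}

# Assumed header pattern, e.g. "单位:万元" or "单位:人民币元".
UNIT_PATTERN = re.compile(r"单位[::]\s*(?:人民币)?\s*(亿元|万元|千元|元)")


def detect_unit(header_text: str) -> Optional[str]:
    """Return the unit declared in a statement header, or None if not found."""
    match = UNIT_PATTERN.search(header_text)
    return match.group(1) if match else None


def convert(value: float, unit: str, target: str = "亿元") -> float:
    """Convert `value` from `unit` to `target` by going through yuan."""
    return value * UNIT_TO_YUAN[unit] / UNIT_TO_YUAN[target]
```

With these helpers, a figure of 823700 reported in 万元 converts to 82.37 in 亿元.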
3. Workflow
Standard workflow (recommended)
┌─────────────────────────┐
│ cninfo-report-download  │ ← Input: company name + year
│ CNINFO report download  │   Output: PDF file path
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ financial-statement-    │ ← Input: PDF file path
│ extraction              │   Output: structured JSON data
│ Statement extraction    │
└─────────────────────────┘
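Usage examples
Example 1: single-company annual report analysis
- cninfo-report-download: download the 2024 annual report of "海天味业"
- financial-statement-extraction: extract the financial data
- Downstream analysis skills (e.g. DCF valuation, financial ratio analysis)
Example 2: batch data collection
- Call cninfo-report-download in a loop to download annual reports for multiple companies
- Call financial-statement-extraction in a loop to extract the data in batch
- Build a financial database or run cross-company comparisons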
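The batch flow in Example 2 can be sketched as a loop that pauses between downloads. The skill's Python entry points are not documented here, so the `download` and `extract` callables below are hypothetical stand-ins for the two skills; the only behavior taken from the documentation is that a failed download returns None. The default delay is an arbitrary placeholder.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def collect_batch(
    companies: Iterable[str],
    year: int,
    download: Callable[[str, int], Optional[str]],  # hypothetical: returns PDF path or None
    extract: Callable[[str], dict],                 # hypothetical: returns the parsed JSON result
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[dict]]:
    """Download and extract reports one company at a time, pausing between requests."""
    results: Dict[str, Optional[dict]] = {}
    for index, name in enumerate(companies):
        if index > 0:
            sleep(delay_seconds)  # spacing out requests to reduce the chance of rate limiting
        pdf_path = download(name, year)
        if pdf_path is None:
            results[name] = None  # download failed; leave for manual follow-up
            continue
        results[name] = extract(pdf_path)
    return results
```

Passing the two steps in as callables keeps the loop independent of how the skills are actually invoked, and the injectable `sleep` makes the pacing easy to test.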
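4. Input and output specification
cninfo-report-download
Input:
| Parameter | Type | Required | Description |
|---|---|---|---|
| company_name | str | ✅ | Company short name (e.g. "海天味业") |
| year | int | ✅ | Report year (e.g. 2024) |
| output_dir | str | Optional | Download directory; defaults to workspace/reports/ |
Output:
- Success: PDF file path (str)
- Failure: None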
financial-statement-extraction
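Input:
| Parameter | Type | Required | Description |
|---|---|---|---|
| pdf_path | str | ✅ | Path to the annual report PDF |
| output_dir | str | Optional | Output directory |
| extract_notes | bool | Optional | Whether to extract the notes (planned for v2.1+; the current version extracts only the main statements) |
Output: structured data in JSON format
{
"meta": { "company": "company name", "stock_code": "stock code", ... },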
"balance_sheet": { "资产总计": 82.37, ... },
"income_statement": { "营业收入": 55.19, ... },
"cashflow_statement": { "经营活动现金流净额": 10.88, ... },
"validation": { "balance_check": {...}, ... }
}
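Before passing a result to downstream analysis, it may help to confirm that the five top-level sections shown above are present. The sketch below is illustrative only: `load_result` and `missing_sections` are hypothetical helper names, and the contents of each section are not inspected because the sample output elides them.

```python
import json

# Top-level sections shown in the sample output above.
EXPECTED_SECTIONS = (
    "meta",
    "balance_sheet",
    "income_statement",
    "cashflow_statement",
    "validation",
)


def load_result(path: str) -> dict:
    """Load an extraction result from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def missing_sections(result: dict) -> list:
    """Return the expected top-level sections that are absent from `result`."""
    return [section for section in EXPECTED_SECTIONS if section not in result]
```

An empty list from `missing_sections` means the overall shape matches the sample; it says nothing about whether the values themselves are correct.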
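5. Data quality assurance
Validation checks
| Check | Rule | Handling |
|---|---|---|
| Balance sheet equation | Assets = liabilities + equity | A difference > 1% triggers a fallback re-extraction |
| Net margin plausibility | A net margin > 100% suggests parent-company data may have been mixed in | Automatically triggers a fallback re-extraction |
| Data completeness | Checks that key line items are present | Warns when items are missing |
| Parent-company data filtering | Three lines of defense ensure only consolidated statements are extracted | Page-level / mixed-page / row-level filtering |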
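The first two checks in the table are simple arithmetic and can be sketched as follows. The thresholds (1% and 100%) come from the table; the function names, the choice to measure the balance difference relative to total assets, and the handling of zero or negative inputs are assumptions of this sketch rather than the skill's documented behavior.

```python
def balance_check(assets: float, liabilities: float, equity: float,
                  tolerance: float = 0.01) -> bool:
    """Pass if assets and liabilities + equity differ by at most `tolerance` of assets."""
    if assets == 0:
        return False  # assumption: a zero total cannot be verified, so treat it as a failure
    return abs(assets - (liabilities + equity)) / abs(assets) <= tolerance


def net_margin_plausible(net_profit: float, revenue: float) -> bool:
    """Pass if net profit does not exceed revenue (net margin <= 100%)."""
    if revenue <= 0:
        return False  # assumption: margin is undefined here, so flag for review
    return net_profit / revenue <= 1.0


def needs_reextraction(assets: float, liabilities: float, equity: float,
                       net_profit: float, revenue: float) -> bool:
    """Return True if either check fails and a fallback re-extraction is warranted."""
    return not (balance_check(assets, liabilities, equity)
                and net_margin_plausible(net_profit, revenue))
```

A failed check does not say which figure is wrong, only that the extracted set is inconsistent and should be re-extracted or reviewed.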
Verified data sources
| Company | Code | Exchange | Year | Status |
|---|---|---|---|---|
| 海天味业 | 603288 | SSE | 2024 | ✅ |
| 千禾味业 | 603027 | SSE | 2024 | ✅ |
| 中炬高新 | 600872 | SSE | 2024 | ✅ |
| 甘源食品 | 002991 | SZSE | 2024 | ✅ |
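6. Known limitations
| Limitation | Details | Planned fix |
|---|---|---|
| A-shares only | Hong Kong and US stocks are not supported | Multi-market support planned for v2.0 |
| Annual reports only | Semi-annual and quarterly reports require parameter changes | Support planned for v1.2 |
| No scanned PDFs | Requires PDFs with copyable text | OCR planned for v3.0 |
| No English reports | The line-item mapping table covers Chinese only | English support planned for v2.2 |
| No notes extraction | The current version extracts only the main statements | Notes extraction planned for v2.1 |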
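7. Roadmap
v1.1 (current)
↓
v1.2 (semi-annual and quarterly report support)
↓
v1.3 (local cache + resumable downloads)
↓
v2.0 (batch download + multithreading + Hong Kong stock support)
↓
v2.1 (notes extraction + prior-period data)
↓
v2.2 (English report support)
↓
v3.0 (OCR support for scanned PDFs)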
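8. Tool files
- src/tools/download_cninfo.py: main annual report download logic
- src/tools/extract_pdf_tables.py: main financial data extraction logic
- src/tools/ai_page_locator.py: AI page location
- src/data/company_codes.py: company code mapping table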
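9. Usage recommendations
Best practices
- Validate on a single company first: on first use, check extraction quality on 1-2 companies
- Check the validation field: always confirm that the consistency checks pass
- Handle exceptions: scanned and encrypted PDFs raise errors and need manual handling
- Pace batch downloads: add a delay between requests to avoid triggering CNINFO rate limiting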
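FAQ
- Q: Why doesn't the extracted data match the report?
  A: Parent-company data may have been mixed in. Check the validation field and retry if necessary.
- Q: What if a download fails?
  A: Check that the company name is correct, or try searching by stock code.
- Q: Which industries are supported?
  A: General manufacturing is supported best; special industries such as banking, insurance, and securities may need additional handling.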
LibraQuant Financial Analysis Pipeline: a complete solution from data collection to structured extraction
Security recommendations
What to consider before installing:
- Metadata mismatch: the registry shows no required env vars or install steps, but the skill's docs require pip packages and an OPENAI_API_KEY. Expect to provide an OpenAI key if you use the extraction feature.
- Data exfiltration risk: the extraction flow sends page summaries/text to api.openai.com for page localization. Only run this on public/non‑sensitive documents or in an environment where you accept sending content to OpenAI.
- Missing code files: the skill references src/tools/*.py but the package contains only markdown docs. Confirm whether the implementation exists elsewhere or you will need to supply/run those scripts yourself.
- Installation: following the embedded install instructions will run 'pip install' for standard packages (pdfplumber, openai, requests). If you allow automatic installation, do so in an isolated environment (virtualenv/container) to limit risk.
- Operational cautions: batch downloads may trigger cninfo rate limits — add delays or proxies as suggested. Verify outputs on 1–2 companies before running large batches and check the validation fields in the JSON results.
Recommended actions: ask the publisher for the missing code files or a verified install spec; if you must proceed, provide an OpenAI key only in a controlled/test environment and avoid sending any non‑public documents to the skill until you confirm behavior and provenance.
Functional analysis
Type: OpenClaw Skill
Name: financial-data-gateway
Version: 1.0.1
The bundle provides a professional financial data pipeline for downloading A-share annual reports from the official CNINFO website and extracting structured data into JSON format. It utilizes standard libraries like `requests` and `pdfplumber`, and incorporates OpenAI for intelligent page localization, with clear disclosure regarding the use of an API key and data transmission to OpenAI. The documentation is highly detailed, focusing on functional logic, data validation, and error handling without any evidence of malicious intent, obfuscation, or unauthorized access.
Capability assessment
Purpose & Capability
The stated purpose (download A‑share annual reports from cninfo and extract structured financial statements) matches the instructions and network targets (cninfo and OpenAI). However, the top‑level registry metadata declares no required env vars, no install spec and no code files, while the skill's markdown documents list dependencies (pdfplumber, openai, requests), an OPENAI_API_KEY env var, and reference src/tools/*.py files that are not present in the package. This mismatch (claimed code/tools but no code files; undeclared env var) is inconsistent and may indicate packaging or disclosure issues.
Instruction Scope
The runtime instructions explicitly direct the agent to read PDF pages (workspace/reports/*.pdf), extract text snippets (page summaries) and send them to the OpenAI API for page localization. Transmitting PDF text to an external LLM is outside pure local processing and may expose sensitive text. The instructions also reference filesystem paths and tool files (src/tools/*.py) that are not included, meaning the agent or operator would need to run installation commands or implement missing scripts. While these actions are coherent with the stated extraction purpose, the external transmission of document content and missing code files are noteworthy risks.
Install Mechanism
There is no install spec in the registry (instruction‑only), but the included markdown documents contain 'install' sections that recommend 'pip install pdfplumber openai' and 'pip install requests'. Because installation is left to the operator/agent rather than declared in the registry, installing runtime packages would require executing pip at runtime — a moderate risk if done automatically. The install sources are standard PyPI packages (not arbitrary download URLs).
Credentials
The financial-statement-extraction document requires OPENAI_API_KEY (to call api.openai.com) to perform AI page localization; that need is plausible for the LLM-based page-locating strategy. However, the registry metadata lists no required env vars — an inconsistency. Requiring an OpenAI key is proportionate to the LLM approach, but it grants a third party (OpenAI) access to excerpts of PDF content; if PDFs include sensitive data, this could result in unintended disclosure. No other unrelated credentials are requested.
Persistence & Privilege
The skill does not request elevated or persistent privileges (always:false, no system-wide changes). It specifies reading PDF files under workspace/reports/ and writing outputs to workspace/output/, which is proportionate for its purpose. There is no indication it modifies other skills or system configs. The inconsistency is that these filesystem requirements exist only in internal docs and are not declared in registry metadata.
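How to use
- Make sure OpenClaw is installed (locally or via Docker)
- Enter the install command in the chat box: /install financial-data-gateway
- Once installed, invoke the Skill by name or trigger it with /financial-data-gateway
- Provide the required inputs as described in the Skill's parameter documentation to get structured output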
Version history
v1.0.1
No changes were detected in this version.
- Version and content in SKILL.md remain the same as the previous release.
- No updates, bug fixes, or new features included.
v1.0.0
First release of the LibraQuant financial analysis data collection and extraction pipeline:
- Automatically downloads annual report PDFs of China A-share listed companies from CNINFO, with intelligent filtering for the right year and the correct version.
- Extracts core financial data, including the balance sheet, income statement, and cash flow statement, from PDF annual reports and outputs structured JSON.
- Built-in validation: cross-statement consistency checks, parent/consolidated statement filtering, and unit normalization, to safeguard data quality.
- Documents the workflow, input/output specification, applicable scenarios, and common issues.
- Lists known limitations and the planned roadmap.
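FAQ
What is financial-data-gateway?
It automatically downloads annual report PDFs of A-share listed companies from CNINFO and extracts structured financial data, with support for data validation and batch processing. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 108 downloads to date.
How do I install financial-data-gateway?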
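Run the command "/install financial-data-gateway" in the OpenClaw or Claude Code chat box to install it in one step. The registry declares no additional configuration, although the skill's own documentation calls for pip packages and an OPENAI_API_KEY for the extraction feature.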
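Is financial-data-gateway free?
Yes. financial-data-gateway is free of charge and released under the MIT-0 license; you can download, install, and use it freely.
Which platforms does financial-data-gateway support?
financial-data-gateway is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops financial-data-gateway?
It is developed and maintained by Shi Changlong (@shihugh5-lab); the current version is v1.0.1.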