Cn Financial Notes Extraction
Author: cgxxxxxxxxxxxx · v1.0.0 · MIT-0
65 total downloads · 0 favorites · 0 current installs · 1 version
Install in OpenClaw
/install cn-financial-notes-extraction
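Description
Extracts the detailed notes to the financial statements from the annual reports (PDF) of companies listed on China's A-share market. Use it to obtain in-depth data that the primary statements do not show, such as CapEx details, R&D expense breakdowns, accounts receivable aging, and related-party transactions.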
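Usage (SKILL.md)
Core capability
Locates and extracts tabular data from the notes to the financial statements in annual report PDFs downloaded from CNINFO (巨潮资讯).
Use cases
- CapEx analysis: extract the current-period increases and decreases from notes such as Construction in progress (在建工程), Fixed assets (固定资产), and Intangible assets (无形资产), to compare the MD&A basis with the cash flow statement basis.
- Risk screening: extract accounts receivable aging, bad-debt provision ratios, and goodwill impairment details.
- Related-party transactions: extract related-party balances and purchase/sale amounts.
- R&D breakdown: extract the split of R&D spending between capitalized and expensed amounts.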
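For the CapEx use case, an extracted roll-forward row can be sanity-checked with the identity opening balance + additions - decreases = closing balance. The following is a minimal sketch, assuming the amounts are already plain decimal strings; the function name `check_roll_forward`, the tolerance, and the sample figures are hypothetical and not part of the skill.

```python
from decimal import Decimal


def check_roll_forward(opening, additions, decreases, closing,
                       tolerance=Decimal("0.01")):
    """Return True if opening + additions - decreases matches closing
    within the given tolerance (to absorb rounding in the report)."""
    expected = Decimal(opening) + Decimal(additions) - Decimal(decreases)
    return abs(expected - Decimal(closing)) <= tolerance


# Hypothetical figures for illustration only (not taken from any real report).
row = {"期初余额": "1000.00", "本期增加": "250.50",
       "本期减少": "100.25", "期末余额": "1150.25"}
ok = check_roll_forward(row["期初余额"], row["本期增加"],
                        row["本期减少"], row["期末余额"])
print(ok)
```

A row that fails this check is often a sign that a table was split across pages or that a column was misaligned during extraction.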
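Workflow
- Download: fetch the latest annual report PDF with CNInfo API Scraper or East Money Announcement Downloader.
- Locate: open the PDF with pdfplumber (system Python environment) and search the full text for the keyword 财务报表附注 ("notes to the financial statements").
- Extract:
  - Scan page by page, starting from the located page.
  - Extract tables (extract_tables()).
  - Smart filtering: keep only valid tables, based on header keywords such as 项目, 期末余额, 本期增加, 本期减少, 账面余额.
  - Ignore text-only pages and layout tables that carry no data.
- Structure: convert the extracted data to Dict[List] or DataFrame format for output.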
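The structuring step above can be sketched as a plain conversion from a list of rows to a dict of column lists. This is only an illustration, assuming the first row of each table is its header; the helper name `table_to_columns` and the placeholder column names are hypothetical, and the sample values are made up.

```python
def table_to_columns(table):
    """Convert a list-of-rows table into {header: [values, ...]}.

    Assumption: the first row is the header. Empty header cells get a
    placeholder such as "col_2"; duplicate names get an index suffix.
    """
    if not table:
        return {}
    header = []
    for i, cell in enumerate(table[0]):
        name = (cell or "").strip() or f"col_{i}"
        if name in header:
            name = f"{name}_{i}"
        header.append(name)
    columns = {name: [] for name in header}
    for row in table[1:]:
        # Pad short rows so every column ends up with the same length.
        padded = list(row) + [None] * (len(header) - len(row))
        for name, cell in zip(header, padded):
            columns[name].append(cell)
    return columns


# Made-up sample in the shape returned by extract_tables().
sample = [
    ["项目", "期末余额", "期初余额"],
    ["房屋及建筑物", "120.00", "100.00"],
    ["机器设备", "80.00", None],
]
columns = table_to_columns(sample)
print(columns["期末余额"])
```

Because every column list has the same length, the resulting dict can be passed directly to `pandas.DataFrame(columns)` if a DataFrame is preferred.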
Key parameters and code logic
import pdfplumber


def extract_notes(pdf_path, keywords=None):
    found_data = []
    with pdfplumber.open(pdf_path) as pdf:
        # 1. Locate the first page that mentions the notes section.
        #    If the keyword is never found, scanning starts from page 1.
        start_idx = 0
        for i, page in enumerate(pdf.pages):
            text = page.extract_text()
            if text and "财务报表附注" in text:
                start_idx = i
                break
        # 2. Scan tables from the located page to the end of the document.
        for i in range(start_idx, len(pdf.pages)):
            page = pdf.pages[i]
            tables = page.extract_tables()
            for table in tables:
                # Skip empty or very short tables.
                if len(table) > 3 and any(row[0] for row in table if row):
                    # Optional: if keywords are given, require a header match.
                    if keywords:
                        headers = " ".join([str(c) for c in table[0] if c])
                        if any(kw in headers for kw in keywords):
                            found_data.append({"page": i + 1, "table": table})
                    else:
                        found_data.append({"page": i + 1, "table": table})
    return found_data
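The tables collected by `extract_notes` are raw: pdfplumber returns each cell as a string or `None`, and wrapped cells contain line breaks. Before any numeric analysis the cells usually need normalising. The sketch below is one possible approach, assuming amounts use comma thousands separators and a leading minus sign for negatives; `clean_cell` and `parse_amount` are hypothetical helper names, not part of the skill.

```python
import re


def clean_cell(cell):
    """Normalise one raw cell: None becomes "", and all whitespace
    (including line breaks from wrapped text) is removed.
    Removing whitespace suits Chinese labels; it would join English words."""
    if cell is None:
        return ""
    return re.sub(r"\s+", "", str(cell))


def parse_amount(cell):
    """Return the cell as a float, or None if it is empty or not numeric.
    Assumption: thousands separators are ASCII or full-width commas."""
    cleaned = clean_cell(cell).replace(",", "").replace(",", "")
    if cleaned in ("", "-", "--"):
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


print(clean_cell("期末\n余额"))
print(parse_amount("1,234,567.89"))
```

Returning `None` for non-numeric cells keeps label columns such as 项目 distinguishable from amounts without raising exceptions mid-scan.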
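Notes
- Environment dependency: use the host machine's python3 and pdfplumber. Do not run directly in a sandbox (subagent) unless the library is confirmed to be installed there.
- Table merging: a table that spans pages may be split into two and must be merged logically (by checking header continuity).
- Non-standard layout: a very small number of older annual reports may be scanned copies that require OCR (such as MinerU or Tesseract), but most A-share annual reports today are native PDFs, where pdfplumber works best.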
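The table-merging note can be illustrated with a simple heuristic over the list returned by `extract_notes`. This is a sketch under stated assumptions, not the skill's own logic: a table is treated as a continuation only when it sits on the page immediately after the previous table and has the same number of columns, and a repeated header row is dropped. The function name `merge_split_tables` and the `last_page` key are hypothetical.

```python
def merge_split_tables(found):
    """Merge tables that appear to continue from the previous page.

    `found` is a list of {"page": int, "table": list_of_rows} in page order.
    Heuristic (assumption): same column count on the very next page means
    continuation; an identical first row is a repeated header and is skipped.
    """
    merged = []
    for item in found:
        table = item["table"]
        if not table:
            continue  # nothing to merge or keep
        if merged:
            prev = merged[-1]
            same_width = len(table[0]) == len(prev["table"][0])
            next_page = item["page"] == prev["last_page"] + 1
            if same_width and next_page:
                rows = table[1:] if table[0] == prev["table"][0] else table
                prev["table"].extend(rows)
                prev["last_page"] = item["page"]
                continue
        merged.append({"page": item["page"], "last_page": item["page"],
                       "table": list(table)})
    return merged


# Made-up input: one table split across pages 10-11, plus an unrelated table.
found = [
    {"page": 10, "table": [["项目", "期末余额"], ["A", "1"]]},
    {"page": 11, "table": [["项目", "期末余额"], ["B", "2"]]},
    {"page": 11, "table": [["账龄", "金额", "比例"], ["1年以内", "5", "50%"]]},
]
result = merge_split_tables(found)
print(len(result))
```

The heuristic can merge two unrelated tables that happen to share a column count, so merged results are worth spot-checking against the PDF.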
Security recommendations
This skill is coherent with its description and doesn't request secrets, but it tells the agent to run on the host Python environment rather than in a sandbox. Before installing or running: (1) prefer executing the extraction in an isolated environment (container or VM) to limit risk from malicious or malformed PDFs and third-party OCR binaries; (2) review and test the exact Python code you run and only install pdfplumber / Tesseract from trusted sources; (3) avoid giving the agent broad host filesystem or network access—limit it to the directories and network endpoints needed to fetch known PDF sources; (4) if you cannot run in isolation, treat the 'run on host' instruction as a reason to be cautious. If you want higher assurance, ask the skill author for a signed/reviewable code implementation or run the provided extraction logic yourself in a controlled environment.
Functional analysis
Type: OpenClaw Skill
Name: cn-financial-notes-extraction
Version: 1.0.0
The skill is a utility for extracting financial table data from Chinese A-share annual report PDFs. The code logic in SKILL.md uses the standard 'pdfplumber' library to locate and parse tables based on specific keywords (e.g., '财务报表附注'), and it contains no evidence of data exfiltration, malicious execution, or prompt injection.
Capability assessment
Purpose & Capability
Name/description (extract financial-note tables from A-share annual report PDFs) align with the workflow and code snippet: locating the '财务报表附注' section, using pdfplumber to extract tables, filtering and structuring results. Referencing CNInfo/East Money as PDF sources is consistent with the stated purpose.
Instruction Scope
SKILL.md stays focused on PDF download and parsing. The only notable instruction beyond pure parsing is the advisory: 'don't run in subagent (sandbox) unless pdfplumber is installed in it'—this steers execution toward the host environment. That is not inconsistent with the purpose but does broaden the agent's operational surface (runs code against host filesystem/environment). The instructions do not ask for unrelated system files, credentials, or data exfiltration.
Install Mechanism
No install spec and no code files — instruction-only. This is low-risk from an installer perspective (nothing will be downloaded/installed automatically by the skill itself).
Credentials
The skill requests use of the host's python3 and pdfplumber (and optionally OCR tools like Tesseract/MinerU). It requests no environment variables or credentials. Using host Python is reasonable for PDF processing, but advising against sandbox use increases privilege required at runtime; that should be considered when granting execution rights.
Persistence & Privilege
Skill is user-invocable, not always-enabled, and does not request persistent system-wide privileges or modify other skills. It does not request autonomous always-on presence.
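How to use
- Make sure OpenClaw is installed (local or Docker deployment).
- Enter the install command in the chat box: /install cn-financial-notes-extraction
- Once installed, call the Skill by name or trigger it with /cn-financial-notes-extraction
- Provide the inputs required by the Skill's parameter description to get structured output.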
Version history
v1.0.0
- Initial release of cn-financial-notes-extraction skill.
- Extracts detailed tables from financial notes sections in China A-share listed companies’ annual report PDFs.
- Supports analyses including CapEx breakdown, receivables aging, goodwill impairment, related party transactions, and R&D expense details.
- Workflow covers: annual report PDF download, financial notes section detection, table extraction/filtering, and structured output.
- Uses pdfplumber for PDF parsing; advises handling multi-page tables and scanned (non-native) PDFs with care.
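FAQ
What is Cn Financial Notes Extraction?
It extracts the detailed notes to the financial statements from the annual reports (PDF) of companies listed on China's A-share market, for in-depth data that the primary statements do not show (such as CapEx details, R&D expense breakdowns, accounts receivable aging, and related-party transactions). It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 65 total downloads to date.
How do I install Cn Financial Notes Extraction?
Run the command "/install cn-financial-notes-extraction" in the OpenClaw or Claude Code chat box. Installation takes one step and needs no additional configuration.
Is Cn Financial Notes Extraction free?
Yes. Cn Financial Notes Extraction is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.
Which platforms does Cn Financial Notes Extraction support?
Cn Financial Notes Extraction is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Cn Financial Notes Extraction?
It is developed and maintained by cgxxxxxxxxxxxx (@cgxxxxxxxxxxxx). The current version is v1.0.0.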