Cn Financial Notes Extraction
by cgxxxxxxxxxxxx · v1.0.0 · MIT-0
65 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install cn-financial-notes-extraction
Description
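Extracts the detailed notes to the financial statements from annual reports (PDF) of Chinese A-share listed companies. Useful for obtaining in-depth data that the primary statements do not show (e.g. CapEx breakdown, R&D expense breakdown, accounts receivable aging, related-party transactions).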
README (SKILL.md)
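Core Capability
Precisely locates and extracts tabular data from the notes to the financial statements in annual report PDFs downloaded from CNINFO (巨潮资讯).
Use Cases
- CapEx analysis: extract the current-period increase/decrease details from notes such as construction in progress (在建工程), fixed assets (固定资产), and intangible assets (无形资产), to compare the MD&A basis with the cash flow statement basis.
- Risk screening: extract accounts receivable aging, bad-debt provision ratios, and goodwill impairment details.
- Related-party transactions: extract related-party balances and purchase/sale amounts.
- R&D breakdown: extract the split between capitalized and expensed R&D costs.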
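Workflow
- Download: fetch the latest annual report PDF via CNInfo API Scraper or East Money Announcement Downloader.
- Locate: open the PDF with pdfplumber (system Python environment) and search the full text for the keyword 财务报表附注 (notes to the financial statements).
- Extract:
  - Scan page by page, starting from the located page.
  - Extract tables with extract_tables().
  - Smart filtering: keep valid tables based on header keywords (e.g. 项目, 期末余额, 本期增加, 本期减少, 账面余额).
  - Ignore text-only pages and meaningless layout tables.
- Structure: convert the extracted data to Dict[List] or DataFrame format for output.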
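The structuring step can be sketched as follows. This is a minimal illustration rather than the skill's own implementation: the helper names `clean_cell` and `table_to_columns` are hypothetical, the first row is assumed to be the header, header names are assumed to be unique, and all whitespace inside a cell is removed (reasonable for Chinese labels, less so for English text).

```python
from typing import Dict, List, Optional

Table = List[List[Optional[str]]]


def clean_cell(cell: Optional[str]) -> str:
    """Remove all whitespace, including line breaks inside wrapped cells."""
    return "".join((cell or "").split())


def table_to_columns(table: Table) -> Dict[str, List[str]]:
    """Treat the first row as the header and return {header: [values...]}."""
    # Empty header cells get a positional placeholder name (hypothetical scheme).
    header = [clean_cell(c) or f"col_{i}" for i, c in enumerate(table[0])]
    columns: Dict[str, List[str]] = {name: [] for name in header}
    for row in table[1:]:
        # Pad short rows so every column ends up with the same length.
        padded = list(row) + [None] * (len(header) - len(row))
        for name, cell in zip(header, padded):
            columns[name].append(clean_cell(cell))
    return columns


# Sample rows shaped like a nested-list table (values are made up).
sample = [
    ["项目", "期末\n余额", "期初余额"],
    ["1年以内", "1,200.00", "900.00"],
    ["1至2年", "300.00", None],
]
result = table_to_columns(sample)
```

Because every list in the result has the same length, the dictionary can be passed directly to `pandas.DataFrame(result)` when a DataFrame is preferred.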
Key Parameters and Code Logic
import pdfplumber


def extract_notes(pdf_path, keywords=None):
    """Collect tables from the first page mentioning 财务报表附注 onward."""
    found_data = []
    with pdfplumber.open(pdf_path) as pdf:
        # 1. Locate the first page of the notes (falls back to the first page if not found)
        start_idx = 0
        for i, page in enumerate(pdf.pages):
            text = page.extract_text()
            if text and "财务报表附注" in text:
                start_idx = i
                break
        # 2. Scan tables from that page onward
        for i in range(start_idx, len(pdf.pages)):
            for table in pdf.pages[i].extract_tables():
                # Skip empty tables and short tables (3 rows or fewer)
                if len(table) <= 3 or not any(row[0] for row in table if row):
                    continue
                # Optional: if keywords are given, require a match in the header row
                if keywords:
                    headers = " ".join(str(c) for c in table[0] if c)
                    if not any(kw in headers for kw in keywords):
                        continue
                found_data.append({"page": i + 1, "table": table})
    return found_data
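One practical wrinkle with the header check above: extracted header cells can contain line breaks when the label wraps inside the cell (e.g. `期末\n余额`), in which case a plain substring test for `期末余额` fails. The sketch below shows one possible whitespace-tolerant matcher. The names `header_text`, `header_matches`, and the `header_rows` parameter are hypothetical and not part of the skill.

```python
from typing import List, Optional, Sequence

Table = List[List[Optional[str]]]


def header_text(table: Table, header_rows: int = 1) -> str:
    """Join the cleaned cells of the first `header_rows` rows with '|'."""
    cells = [c for row in table[:header_rows] for c in row if c]
    # Strip all whitespace inside each cell; '|' keeps cell boundaries visible.
    return "|".join("".join(c.split()) for c in cells)


def header_matches(table: Table, keywords: Sequence[str], header_rows: int = 1) -> bool:
    """True if any keyword occurs in the normalized header text."""
    text = header_text(table, header_rows)
    return any(kw in text for kw in keywords)


# Made-up tables for illustration.
wrapped = [["项目", "期末\n余额"], ["应收账款", "100.00"]]
layout_only = [["序号", "名称"], ["1", "x"]]

naive_hit = "期末余额" in " ".join(str(c) for c in wrapped[0] if c)
robust_hit = header_matches(wrapped, ["期末余额", "本期增加"])
layout_hit = header_matches(layout_only, ["期末余额", "本期增加"])
```

For tables with a two-row header, raising `header_rows` to 2 widens the search, at the cost of occasionally matching text in the first data row.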
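Notes
- Environment dependencies: use the host machine's python3 and pdfplumber; do not run directly in a sandbox (subagent) unless the library is confirmed to be installed there.
- Table merging: a table that spans pages may be split into two and must be merged logically (by checking header continuity).
- Non-standard layout: a very small number of older annual reports may be scanned copies that require OCR (e.g. MinerU or Tesseract), but most A-share annual reports today are native PDFs, for which pdfplumber works best.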
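The table-merging note can be illustrated with a small heuristic. It assumes the input has the shape returned by `extract_notes` (a page-ordered list of `{"page", "table"}` dicts) and that a continuation table repeats the header row on the following page; continuations without a repeated header are not handled. The function name `merge_split_tables` and the `last_page` field are hypothetical.

```python
from typing import Any, Dict, List, Optional

Row = List[Optional[str]]


def normalize_row(row: Row) -> List[str]:
    """Strip all whitespace from each cell so wrapped headers compare equal."""
    return ["".join((c or "").split()) for c in row]


def merge_split_tables(found: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge a table into the previous one when it starts on the next page
    and repeats the same header row."""
    merged: List[Dict[str, Any]] = []
    for item in found:
        table = item["table"]
        prev = merged[-1] if merged else None
        is_continuation = (
            prev is not None
            and bool(table)
            and bool(prev["table"])
            and item["page"] == prev["last_page"] + 1
            and normalize_row(table[0]) == normalize_row(prev["table"][0])
        )
        if is_continuation:
            # Drop the repeated header and append the remaining rows.
            prev["table"].extend(table[1:])
            prev["last_page"] = item["page"]
        else:
            # Copy the row list so the caller's input is not mutated.
            merged.append(
                {"page": item["page"], "last_page": item["page"], "table": list(table)}
            )
    return merged


# Made-up extraction results for illustration.
found = [
    {"page": 10, "table": [["项目", "期末余额"], ["A", "1"]]},
    {"page": 11, "table": [["项目", "期末\n余额"], ["B", "2"]]},
    {"page": 12, "table": [["关联方", "金额"], ["C", "3"]]},
]
merged = merge_split_tables(found)
```

Two unrelated tables that happen to share a header on consecutive pages would also be merged by this rule, so the result is worth spot-checking against the PDF.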
Usage Guidance
This skill is coherent with its description and doesn't request secrets, but it tells the agent to run on the host Python environment rather than in a sandbox. Before installing or running: (1) prefer executing the extraction in an isolated environment (container or VM) to limit risk from malicious or malformed PDFs and third-party OCR binaries; (2) review and test the exact Python code you run and only install pdfplumber / Tesseract from trusted sources; (3) avoid giving the agent broad host filesystem or network access—limit it to the directories and network endpoints needed to fetch known PDF sources; (4) if you cannot run in isolation, treat the 'run on host' instruction as a reason to be cautious. If you want higher assurance, ask the skill author for a signed/reviewable code implementation or run the provided extraction logic yourself in a controlled environment.
Capability Analysis
Type: OpenClaw Skill
Name: cn-financial-notes-extraction
Version: 1.0.0
The skill is a utility for extracting financial table data from Chinese A-share annual report PDFs. The code logic in SKILL.md uses the standard 'pdfplumber' library to locate and parse tables based on specific keywords (e.g., '财务报表附注'), and it contains no evidence of data exfiltration, malicious execution, or prompt injection.
Capability Assessment
Purpose & Capability
Name/description (extract financial-note tables from A-share annual report PDFs) align with the workflow and code snippet: locating the '财务报表附注' section, using pdfplumber to extract tables, filtering and structuring results. Referencing CNInfo/East Money as PDF sources is consistent with the stated purpose.
Instruction Scope
SKILL.md stays focused on PDF download and parsing. The only notable instruction beyond pure parsing is the advisory: 'don't run in subagent (sandbox) unless pdfplumber is installed in it'—this steers execution toward the host environment. That is not inconsistent with the purpose but does broaden the agent's operational surface (runs code against host filesystem/environment). The instructions do not ask for unrelated system files, credentials, or data exfiltration.
Install Mechanism
No install spec and no code files — instruction-only. This is low-risk from an installer perspective (nothing will be downloaded/installed automatically by the skill itself).
Credentials
The skill requests use of the host's python3 and pdfplumber (and optionally OCR tools like Tesseract/MinerU). It requests no environment variables or credentials. Using host Python is reasonable for PDF processing, but advising against sandbox use increases privilege required at runtime; that should be considered when granting execution rights.
Persistence & Privilege
Skill is user-invocable, not always-enabled, and does not request persistent system-wide privileges or modify other skills. It does not request autonomous always-on presence.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install cn-financial-notes-extraction
- After installation, invoke the skill by name or use /cn-financial-notes-extraction
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
- Initial release of cn-financial-notes-extraction skill.
- Extracts detailed tables from financial notes sections in China A-share listed companies’ annual report PDFs.
- Supports analyses including CapEx breakdown, receivables aging, goodwill impairment, related party transactions, and R&D expense details.
- Workflow covers: annual report PDF download, financial notes section detection, table extraction/filtering, and structured output.
- Uses pdfplumber for PDF parsing; advises handling multi-page tables and scanned (non-native) PDFs with care.
Frequently Asked Questions
What is Cn Financial Notes Extraction?
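It extracts the detailed notes to the financial statements from annual reports (PDF) of Chinese A-share listed companies, for in-depth data that the primary statements do not show (e.g. CapEx breakdown, R&D expense breakdown, accounts receivable aging, related-party transactions). It is an AI Agent Skill for Claude Code / OpenClaw, with 65 downloads so far.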
How do I install Cn Financial Notes Extraction?
Run "/install cn-financial-notes-extraction" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Cn Financial Notes Extraction free?
Yes, Cn Financial Notes Extraction is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Cn Financial Notes Extraction support?
Cn Financial Notes Extraction is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Cn Financial Notes Extraction?
It is built and maintained by cgxxxxxxxxxxxx (@cgxxxxxxxxxxxx); the current version is v1.0.0.