
Cn Financial Notes Extraction

Name: Cn Financial Notes Extraction
Author: cgxxxxxxxxxxxx

Version v1.0.0 · License MIT-0

cross-platform ✓ Security check passed


Install in OpenClaw

/install cn-financial-notes-extraction

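Description

Extracts the detailed notes to the financial statements from annual reports (PDF) of companies listed on China's A-share market. Useful for obtaining deeper data that the primary statements do not show, such as CapEx breakdowns, R&D expense details, accounts receivable aging, and related-party transactions.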

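Usage (SKILL.md)

Core capability

Precisely locates and extracts tabular data from the notes to the financial statements in annual report PDFs downloaded from CNINFO (巨潮资讯).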

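Use cases

CapEx analysis: extract the current-period additions and disposals from notes such as 在建工程 (construction in progress), 固定资产 (fixed assets), and 无形资产 (intangible assets) (MD&A basis vs. cash flow statement basis).
Risk screening: extract accounts receivable aging, bad-debt provision ratios, and goodwill impairment details.
Related-party transactions: extract related-party balances and purchase/sales amounts.
R&D breakdown: extract the capitalized vs. expensed breakdown of R&D spending.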

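Workflow

Download: fetch the latest annual report PDF via CNInfo API Scraper or East Money Announcement Downloader.
Locate: open the PDF with pdfplumber (system Python environment) and search the full text for the keyword 财务报表附注 (notes to the financial statements).
Extract:
- Scan page by page, starting from the located page.
- Extract tables (extract_tables()).
- Smart filtering: keep valid tables based on header keywords (e.g. 项目, 期末余额, 本期增加, 本期减少, 账面余额).
- Ignore text-only pages and layout-only tables that carry no data.
Structure: convert the extracted data to Dict[List] or DataFrame format for output.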
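
The structuring step can be sketched as follows. This is a minimal illustration, not the skill's own implementation: `clean_cell` and `table_to_records` are hypothetical helpers, and the sketch assumes the first row of each extracted table is the header row. pdfplumber returns each table as a list of rows whose cells are strings or `None`, so cells are normalized before the records are built.

```python
from typing import Dict, List, Optional

Table = List[List[Optional[str]]]


def clean_cell(cell: Optional[str]) -> str:
    """Turn None into "" and join the pieces of a cell wrapped over several lines."""
    if cell is None:
        return ""
    return "".join(part.strip() for part in cell.splitlines())


def table_to_records(table: Table) -> List[Dict[str, str]]:
    """Convert one extracted table into a list of row dicts keyed by header.

    Assumes table[0] is the header row (an assumption of this sketch).
    """
    if not table:
        return []
    # Give unnamed columns a placeholder name so no column is lost
    headers = [clean_cell(c) or f"col_{i}" for i, c in enumerate(table[0])]
    records = []
    for row in table[1:]:
        cells = [clean_cell(c) for c in row]
        if not any(cells):
            continue  # skip fully empty rows
        # Pad short rows so every header has a value
        cells += [""] * (len(headers) - len(cells))
        records.append(dict(zip(headers, cells)))
    return records
```

If pandas is available, passing the returned list to `pandas.DataFrame(records)` gives the DataFrame form mentioned above.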

Key parameters and code logic

import pdfplumber

def extract_notes(pdf_path, keywords=None):
    """Return [{"page": n, "table": rows}, ...] for tables found from the notes section on."""
    found_data = []
    with pdfplumber.open(pdf_path) as pdf:
        # 1. Locate the first page that mentions the notes section.
        #    If the keyword is never found, start_idx stays 0 and the whole file is scanned.
        start_idx = 0
        for i, page in enumerate(pdf.pages):
            text = page.extract_text()
            if text and "财务报表附注" in text:
                start_idx = i
                break

        # 2. Scan tables from the start page onward
        for i in range(start_idx, len(pdf.pages)):
            page = pdf.pages[i]
            for table in page.extract_tables():
                # Skip short tables and tables whose first column is entirely empty
                if len(table) <= 3 or not any(row[0] for row in table if row):
                    continue
                # Optional: if keywords are given, keep only tables whose header row matches
                if keywords:
                    headers = " ".join(str(c) for c in table[0] if c)
                    if not any(kw in headers for kw in keywords):
                        continue
                found_data.append({"page": i + 1, "table": table})
    return found_data
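
Cells returned by `extract_tables()` are strings, so amounts usually need parsing before any analysis. The sketch below is one hedged way to do that. `parse_amount` is a hypothetical helper, and the formats it handles (thousands separators, a dash or empty cell for nil, parentheses for negatives, a trailing percent sign) are assumptions about typical report formatting rather than anything the skill specifies.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_amount(cell: Optional[str]) -> Optional[Decimal]:
    """Parse an amount cell into a Decimal, or return None if it is not numeric."""
    if cell is None:
        return None
    text = cell.strip().replace(",", "").replace(" ", "")
    if text in ("", "-"):
        return None  # nil or not-applicable cell
    # Assumed convention: a value wrapped in parentheses is negative
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    is_percent = text.endswith("%")
    if is_percent:
        text = text[:-1]
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None  # label cells such as "合计" are not amounts
    if is_percent:
        value = value / Decimal(100)
    return -value if negative else value
```

Decimal is used instead of float so that reported figures are not altered by binary rounding.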

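Notes

Environment dependency: use the host machine's python3 and pdfplumber; do not run directly in a sandbox (subagent) unless you have confirmed the library is installed there.
Table merging: a table that spans pages may be split into two and needs to be merged logically (by checking header continuity).
Non-standard layout: a very small number of older annual reports may be scanned images and require OCR (e.g. MinerU or Tesseract), but most A-share annual reports today are native PDFs, on which pdfplumber works best.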
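
The header-continuity check for cross-page tables might look like the sketch below. It operates on the `{"page": ..., "table": ...}` entries that `extract_notes` returns. `has_header` and `merge_split_tables` are hypothetical helpers, and the merge rule is an assumption of this sketch, not part of the skill: a fragment is merged when it sits on the page right after the previous table, has the same column count, and either repeats the previous header row or has no header keywords in its first row.

```python
from typing import Any, Dict, List

HEADER_KEYWORDS = ("项目", "期末余额", "本期增加", "本期减少", "账面余额")


def has_header(row: List[Any]) -> bool:
    """True if any cell in the row contains one of the header keywords."""
    text = " ".join(str(c) for c in row if c)
    return any(kw in text for kw in HEADER_KEYWORDS)


def merge_split_tables(found: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge each table into the previous one when it looks like a continuation."""
    merged: List[Dict[str, Any]] = []
    for item in found:
        table = item["table"]
        prev = merged[-1] if merged else None
        if (prev and prev["table"] and table
                and item["page"] == prev["last_page"] + 1):
            prev_header = prev["table"][0]
            same_width = len(table[0]) == len(prev_header)
            if same_width and table[0] == prev_header:
                prev["table"].extend(table[1:])  # drop the repeated header
                prev["last_page"] = item["page"]
                continue
            if same_width and not has_header(table[0]):
                prev["table"].extend(table)  # headerless continuation
                prev["last_page"] = item["page"]
                continue
        # Not a continuation: start a new entry (copy so the input is untouched)
        merged.append({"page": item["page"], "last_page": item["page"],
                       "table": list(table)})
    return merged
```

Headerless fragments only reach this step if `extract_notes` was called without `keywords`, since the keyword filter discards tables whose first row has no matching header.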

Security recommendations

This skill is coherent with its description and doesn't request secrets, but it tells the agent to run on the host Python environment rather than in a sandbox. Before installing or running: (1) prefer executing the extraction in an isolated environment (container or VM) to limit risk from malicious or malformed PDFs and third-party OCR binaries; (2) review and test the exact Python code you run and only install pdfplumber / Tesseract from trusted sources; (3) avoid giving the agent broad host filesystem or network access—limit it to the directories and network endpoints needed to fetch known PDF sources; (4) if you cannot run in isolation, treat the 'run on host' instruction as a reason to be cautious. If you want higher assurance, ask the skill author for a signed/reviewable code implementation or run the provided extraction logic yourself in a controlled environment.

Functional analysis

Type: OpenClaw Skill
Name: cn-financial-notes-extraction
Version: 1.0.0

The skill is a utility for extracting financial table data from Chinese A-share annual report PDFs. The code logic in SKILL.md uses the standard 'pdfplumber' library to locate and parse tables based on specific keywords (e.g., '财务报表附注'), and it contains no evidence of data exfiltration, malicious execution, or prompt injection.

Capability assessment

✓ Purpose & Capability

Name/description (extract financial-note tables from A-share annual report PDFs) align with the workflow and code snippet: locating the '财务报表附注' section, using pdfplumber to extract tables, filtering and structuring results. Referencing CNInfo/East Money as PDF sources is consistent with the stated purpose.

ℹ Instruction Scope

SKILL.md stays focused on PDF download and parsing. The only notable instruction beyond pure parsing is the advisory: 'don't run in subagent (sandbox) unless pdfplumber is installed in it'—this steers execution toward the host environment. That is not inconsistent with the purpose but does broaden the agent's operational surface (runs code against host filesystem/environment). The instructions do not ask for unrelated system files, credentials, or data exfiltration.

✓ Install Mechanism

No install spec and no code files — instruction-only. This is low-risk from an installer perspective (nothing will be downloaded/installed automatically by the skill itself).

ℹ Credentials

The skill requests use of the host's python3 and pdfplumber (and optionally OCR tools like Tesseract/MinerU). It requests no environment variables or credentials. Using host Python is reasonable for PDF processing, but advising against sandbox use increases privilege required at runtime; that should be considered when granting execution rights.

✓ Persistence & Privilege

Skill is user-invocable, not always-enabled, and does not request persistent system-wide privileges or modify other skills. It does not request autonomous always-on presence.

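How to use

Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install cn-financial-notes-extraction
Once installed, invoke the skill by name or trigger it with /cn-financial-notes-extraction
Provide the required inputs per the skill's parameter description to get structured output.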

Version history

v1.0.0

- Initial release of cn-financial-notes-extraction skill.
- Extracts detailed tables from financial notes sections in China A-share listed companies' annual report PDFs.
- Supports analyses including CapEx breakdown, receivables aging, goodwill impairment, related party transactions, and R&D expense details.
- Workflow covers: annual report PDF download, financial notes section detection, table extraction/filtering, and structured output.
- Uses pdfplumber for PDF parsing; advises handling multi-page tables and scanned (non-native) PDFs with care.

Metadata

Slug cn-financial-notes-extraction

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

FAQ