suncxw-creator

Fund Report Extractor

Author: suncxw-creator · GitHub · v1.0.0
cross-platform ✓ Security check passed
Total downloads: 401 · Favorites: 0 · Current installs: 1 · Versions: 1
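Install in OpenClaw
/install fund-report-extractor
Description
Automatically extracts the full text of the "投资策略和运作分析" (investment strategy and operations analysis) section from the periodic reports of publicly offered funds, with precise locating and consolidation for both text-based and scanned PDFs.
Usage (SKILL.md)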

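Fund Periodic Report Investment Strategy Extraction Skill

Function

Automatically extracts the full text of the "投资策略和运作分析" (investment strategy and operations analysis) section from the periodic reports of publicly offered funds.

Use cases

  • Extracting the periodic reports of funds managed by a given fund manager
  • Obtaining the original text of "报告期内基金的投资策略和运作分析" (the fund's investment strategy and operations analysis during the reporting period)
  • Compiling the extracted sections in chronological order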

Usage

1. Get the fund code

If you do not know the fund code, look it up first:

  • Search for the fund name on 天天基金网 (Tiantian Fund)
  • Or search with AKShare
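
The AKShare lookup could be sketched roughly as follows. This is a minimal sketch under stated assumptions: it relies on AKShare's `fund_name_em()` listing and assumes the result has `基金代码` (code) and `基金简称` (short name) columns. `load_fund_list` and `search_fund_codes` are hypothetical helper names, not part of the skill's script.

```python
def load_fund_list():
    """Fetch the fund listing via AKShare (network call).

    Assumption: ak.fund_name_em() returns a DataFrame that includes
    '基金代码' (code) and '基金简称' (short name) columns.
    """
    import akshare as ak  # imported lazily so the filter below works offline

    return ak.fund_name_em().to_dict("records")


def search_fund_codes(records, name_part):
    """Return (code, name) pairs whose short name contains name_part."""
    matches = []
    for row in records:
        name = str(row.get("基金简称", ""))
        if name_part in name:
            matches.append((str(row.get("基金代码", "")), name))
    return matches
```

A typical call would be `search_fund_codes(load_fund_list(), "景顺长城")`, which yields the candidate codes to pass to the extraction script.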

2. Run the extraction script

python 基金报告提取.py --code <fund code> --name "<fund name>"

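Technical notes

1. Data retrieval: AKShare

import akshare as ak
# Replace '基金代码' with the actual fund code
df = ak.fund_announcement_report_em(symbol='基金代码')
  • Returns the fund's full history of announcements
  • Includes the announcement ID, which can be used to build the PDF download link

2. PDF download link format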

http://pdf.dfcfw.com/pdf/H2_{报告ID}_1.pdf
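
Filling in this template and fetching the file might look like the sketch below. It assumes the report ID has already been read from the announcement list (the source does not show which column holds it). `build_pdf_url` and `download_pdf` are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```python
PDF_URL_TEMPLATE = "http://pdf.dfcfw.com/pdf/H2_{report_id}_1.pdf"


def build_pdf_url(report_id):
    """Build the download URL for one report from its ID."""
    return PDF_URL_TEMPLATE.format(report_id=str(report_id).strip())


def download_pdf(report_id, timeout=30):
    """Download one report and return its raw bytes (network call)."""
    import requests  # imported lazily so build_pdf_url has no dependencies

    resp = requests.get(build_pdf_url(report_id), timeout=timeout)
    resp.raise_for_status()  # surface HTTP errors instead of saving an error page
    return resp.content
```

The returned bytes can be passed directly to a parser that accepts an in-memory stream, or written to the reports directory first.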

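3. PDF parsing options

Option A: PyMuPDF (text-based PDFs)

import fitz  # PyMuPDF
import re

# pdf_content: raw bytes of the downloaded PDF
doc = fitz.open(stream=pdf_content, filetype='pdf')
full_text = ''
for page in doc:
    html = page.get_text('html')
    # Decode the hex character references (&#x...;) in the HTML output,
    # which is how the Chinese characters appear; characters not written
    # as such references are not collected
    codes = re.findall(r'&#x([0-9a-fA-F]+);', html)
    for c in codes:
        full_text += chr(int(c, 16))

Option B: pdfplumber (scanned PDFs)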

import pdfplumber

# pdf_file: path to the downloaded PDF (or an open file object)
with pdfplumber.open(pdf_file) as pdf:
    all_text = ''
    for page in pdf.pages:
        text = page.extract_text()
        # Skip pages that yield no text
        if text:
            all_text += text + '\n'
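
The two options can be chained so that the second parser is tried only when the first yields no usable text. The following is a minimal sketch, assuming each option is wrapped in a function that takes the PDF bytes and returns a string. `extract_with_fallback` and the `min_chars` threshold are hypothetical, and the default of 50 characters is arbitrary.

```python
def extract_with_fallback(pdf_bytes, extractors, min_chars=50):
    """Try each extractor in order; return (name, text) of the first usable result.

    extractors: list of (name, callable) pairs; each callable takes the PDF
    bytes and returns the extracted text (or None / '' when nothing is found).
    Returns (None, '') if no extractor produces at least min_chars characters.
    """
    for name, extract in extractors:
        try:
            text = extract(pdf_bytes) or ""
        except Exception:
            # A parser that cannot handle this file is skipped, not fatal
            continue
        if len(text.strip()) >= min_chars:
            return name, text
    return None, ""
```

With option A and option B wrapped as functions, a call such as `extract_with_fallback(pdf_bytes, [("pymupdf", option_a), ("pdfplumber", option_b)])` also reports which parser succeeded, which is useful when tuning keywords per fund company.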

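4. Keyword locating

The keyword's position varies by fund company and report type:

Text-based PDFs (景顺长城):

  • "报告期内基金的投资策略和运作分析"
  • "管理人对报告期内基金的投资策略和业绩表现的说明"
  • "管理人对宏观经济、证券市场及行业走势的简要展望"

Scanned PDFs (中泰星元):

  • "4.4 报告期内基金的投资策略和运作分析"
  • The content is usually on pages 7-9
  • Search for the keyword page by page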
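
The page-by-page search can be sketched as below. It assumes the per-page text has already been collected into a list. `find_keyword_pages` is a hypothetical helper name, not something taken from the skill's script.

```python
def find_keyword_pages(page_texts, keywords):
    """Return 1-based page numbers whose text contains any of the keywords.

    page_texts: list of per-page strings (None or '' for pages with no text).
    """
    hits = []
    for page_no, text in enumerate(page_texts, start=1):
        if text and any(kw in text for kw in keywords):
            hits.append(page_no)
    return hits
```

With pdfplumber, the input list can be built as `[p.extract_text() for p in pdf.pages]`. Searching every page avoids hard-coding the 7-9 range, which may not hold for every report.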

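5. Content extraction template

# Extract the investment strategy section
if '报告期内基金的投资策略和运作分析' in full_text:
    idx1 = full_text.find('报告期内基金的投资策略和运作分析')
    # The section ends where the performance section begins
    idx2 = full_text.find('报告期内基金的业绩表现', idx1)
    if idx2 == -1:
        # End heading not found: fall back to a fixed 2,500-character window
        idx2 = idx1 + 2500
    content = full_text[idx1:idx2]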
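
The template can be generalized to try several heading variants in turn. This is a sketch only: the keyword tuples reuse strings quoted in this document, while `extract_section`, its parameters, and the order in which variants are tried are assumptions.

```python
START_KEYWORDS = (
    "报告期内基金的投资策略和运作分析",
    "管理人对报告期内基金的投资策略和业绩表现的说明",
)
END_KEYWORDS = ("报告期内基金的业绩表现",)


def extract_section(full_text, start_keywords=START_KEYWORDS,
                    end_keywords=END_KEYWORDS, max_len=2500):
    """Return the text from the first start keyword found up to the nearest
    end keyword, or a max_len-character window if no end keyword follows.
    Returns None when no start keyword is present."""
    for start_kw in start_keywords:
        start = full_text.find(start_kw)
        if start == -1:
            continue
        # Look for the end heading only after the start heading itself
        body_from = start + len(start_kw)
        ends = [full_text.find(kw, body_from) for kw in end_keywords]
        ends = [e for e in ends if e != -1]
        end = min(ends) if ends else start + max_len
        return full_text[start:end]
    return None
```

Like the template, this matches the first occurrence of a heading, so a table-of-contents entry with the same wording would be picked up first. Checking the length of the result is a cheap way to spot that case.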

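FAQ

Q: The PDF is a scanned copy and no text can be extracted?

A: Use pdfplumber instead of PyMuPDF, and target pages 7, 8, and 9 specifically. Note that pdfplumber reads the PDF's text layer and does not perform OCR, so this only helps when the file still contains extractable text.

Q: The keyword cannot be matched?

A: Check for whitespace differences in the keyword and try different variants.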
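
One way to sidestep whitespace differences is to strip all whitespace from both the text and the keyword before comparing. A minimal sketch follows; `strip_whitespace` and `contains_keyword` are hypothetical helper names.

```python
import re


def strip_whitespace(text):
    """Remove all whitespace, including newlines and full-width spaces (U+3000)."""
    return re.sub(r"\s+", "", text)


def contains_keyword(text, keyword):
    """Whitespace-insensitive containment check."""
    return strip_whitespace(keyword) in strip_whitespace(text)
```

If positions are needed for slicing, run the search on the stripped text as well, since indices found there do not map back to the original string.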

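Q: Some reports are missing?

A: East Money (东方财富) keeps only the most recent 4 years of reports; earlier reports must be obtained through other channels.

Q: Network requests fail?

A: Add a delay of 1 to 2 seconds between requests, e.g. time.sleep(1), to avoid rate limiting.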
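
The delay can be combined with a simple retry. The sketch below is an assumption about how one might structure it, not the skill's actual logic. `fetch_with_retry` is a hypothetical name, and the `sleep` parameter exists only so the pause can be replaced in tests.

```python
import random
import time


def fetch_with_retry(fetch, retries=3, min_delay=1.0, max_delay=2.0, sleep=time.sleep):
    """Call fetch() up to `retries` times, pausing a random 1-2 s between attempts.

    fetch: zero-argument callable that raises an exception on failure.
    sleep: injectable for testing; defaults to time.sleep.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as exc:
            last_error = exc
            # No pause after the final attempt
            if attempt < retries - 1:
                sleep(random.uniform(min_delay, max_delay))
    raise last_error
```

When looping over many reports, the same randomized pause between successive downloads keeps the request rate low even when nothing fails.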

Output files

  • reports_{基金代码}/ - original report files
  • {基金名称}_投资策略汇总.txt - full consolidated summary
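
Writing the consolidated summary in chronological order could look like the sketch below. The source gives only the file name pattern. The header line format, the `(date, title, content)` tuple shape, and the `write_summary` name are assumptions.

```python
import os


def write_summary(entries, fund_name, out_dir="."):
    """Write extracted sections to '{fund_name}_投资策略汇总.txt', oldest first.

    entries: list of (date, title, content) tuples, where date is a
    'YYYY-MM-DD' string so that plain string sorting is chronological.
    Returns the path of the file written.
    """
    path = os.path.join(out_dir, f"{fund_name}_投资策略汇总.txt")
    with open(path, "w", encoding="utf-8") as f:
        for date, title, content in sorted(entries, key=lambda e: e[0]):
            f.write(f"===== {date} {title} =====\n")
            f.write(content.strip() + "\n\n")
    return path
```

Sorting on the date string works only for zero-padded ISO-style dates; other formats would need to be parsed first.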

Dependencies

pip install akshare pymupdf pdfplumber pandas requests

Created: 2026-03-08 Author: 有才

Safety recommendations
This skill appears to do what it says: download public fund PDFs and extract the 'investment strategy and operations analysis' sections. Before running, consider:
  1. The package source/homepage is unknown: review the extract.py source (you have it) and ensure it matches your expectations.
  2. Run in an isolated environment (virtualenv or container) to limit side effects.
  3. Dependencies (akshare, pymupdf, pdfplumber, requests, pandas) will be installed from PyPI; pin versions if you care about supply-chain risk.
  4. The script downloads PDFs from pdf.dfcfw.com and writes files to the current directory; ensure you are comfortable with network access and disk writes.
  5. There are no credentials requested and no obvious exfiltration, but if you plan to run this inside sensitive environments, validate akshare's network behavior and avoid exposing secrets.
If you want higher assurance, run it on a small test fund code and inspect the downloaded PDFs and outputs first.
Functional analysis
Type: OpenClaw Skill
Name: fund-report-extractor
Version: 1.0.0
The skill is a legitimate tool designed to automate the extraction of investment strategy sections from Chinese mutual fund periodic reports. It utilizes the AKShare library to fetch announcement metadata and downloads PDFs from the official East Money (dfcfw.com) servers. The code in `extract.py` uses standard PDF processing libraries (PyMuPDF and pdfplumber) to parse text and perform keyword-based extraction, with no evidence of data exfiltration, malicious execution, or prompt injection.
Capability assessment
Purpose & Capability
Name/description (extract fund report 'investment strategy' sections) align with the code and SKILL.md. Required libraries (akshare, pdf parsers, requests) are appropriate for scraping and parsing PDFs; no unrelated credentials or binaries are requested.
Instruction Scope
SKILL.md and extract.py confine actions to: fetching announcement lists via akshare, constructing PDF URLs on pdf.dfcfw.com, downloading PDFs, extracting text with pdfplumber or PyMuPDF, and writing text files locally. There are no instructions to read unrelated files, access other credentials, or exfiltrate data to unexpected endpoints.
Install Mechanism
There is no install spec (instruction-only + a single Python script). Dependencies are standard Python packages from PyPI; no remote archives or obscure installers are downloaded by the skill itself.
Credentials
No environment variables, secrets, or config paths are requested. The skill only needs network access for public data and permission to write files in the working directory — both are reasonable for this task.
Persistence & Privilege
The skill does not request permanent/always-on inclusion and does not modify other skills or system-wide settings. It simply writes output files to its working directory.
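How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install fund-report-extractor
  3. Once installed, invoke the Skill by name or trigger it with /fund-report-extractor
  4. Provide the required inputs as described by the Skill's parameters to get structured output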
Version history
v1.0.0
  • Initial release of 基金定期报告投资策略提取Skill.
  • Automatically extracts the full text of "投资策略和运作分析" from public fund regular reports.
  • Supports both text-based and scanned PDFs, with extraction using PyMuPDF or pdfplumber.
  • Summarizes and organizes extracted content in chronological order.
  • Provides instructions for fund code lookup, report downloading, and troubleshooting common issues.
  • Output includes original report files and a consolidated .txt summary.
Metadata
Slug: fund-report-extractor
Version: 1.0.0
License:
Total installs: 1
Current installs: 1
Version count: 1
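FAQ

What is Fund Report Extractor?

It automatically extracts the full text of the "投资策略和运作分析" (investment strategy and operations analysis) section from the periodic reports of publicly offered funds, with precise locating and consolidation for both text-based and scanned PDFs. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 401 total downloads to date.

How do I install Fund Report Extractor?

Run the command "/install fund-report-extractor" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is Fund Report Extractor free?

Yes. Fund Report Extractor is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Fund Report Extractor support?

Fund Report Extractor is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Fund Report Extractor?

It is developed and maintained by suncxw-creator (@suncxw-creator); the current version is v1.0.0.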
