suncxw-creator

Fund Report Extractor

by suncxw-creator · GitHub ↗ · v1.0.0
cross-platform ✓ Security Clean
401 Downloads · 0 Stars · 1 Active Install · 1 Version
Install in OpenClaw
/install fund-report-extractor
Description
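Automatically extracts the full text of the "Investment Strategy and Operations Analysis" (投资策略和运作分析) section from Chinese public mutual fund periodic reports, with precise location and aggregation for both text-based and scanned PDFs.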
README (SKILL.md)

Fund Periodic Report Investment Strategy Extraction Skill

Features

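Automatically extracts the full text of the "投资策略和运作分析" (investment strategy and operations analysis) section from public mutual fund periodic reports.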

Use cases

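  • Extracting the periodic reports of the funds managed by a given fund manager
  • Obtaining the original text of "报告期内基金的投资策略和运作分析" (the fund's investment strategy and operations analysis for the reporting period)
  • Organizing and summarizing the results in chronological order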

Usage

1. Get the fund code

If you don't know the fund code, look it up first:

  • By fund name → search on Tiantian Fund (天天基金网)
  • Or search with AKShare
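If you search with AKShare, a minimal sketch could filter the full fund list by name. It assumes `ak.fund_name_em()` returns a DataFrame with the columns `基金代码` (fund code) and `基金简称` (fund short name); the helper `find_fund_codes` is hypothetical and not part of the skill.

```python
def find_fund_codes(keyword, rows=None):
    """Return (code, short_name) pairs whose short name contains keyword.

    rows is an iterable of dicts; when omitted, the full fund list is fetched
    with AKShare (network access required).
    """
    if rows is None:
        import akshare as ak  # imported lazily so the filter itself works offline
        # Assumption: fund_name_em() returns columns '基金代码' and '基金简称'
        rows = ak.fund_name_em().to_dict(orient='records')
    return [
        (row['基金代码'], row['基金简称'])
        for row in rows
        if keyword in str(row['基金简称'])
    ]
```

Calling `find_fund_codes('中泰星元')` returns every fund whose short name contains that string; a fund often has several share classes (for example A and C), each with its own code.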

2. Run the extraction script

python 基金报告提取.py --code <fund_code> --name "<fund_name>"

Technical notes

1. Data retrieval: AKShare

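import akshare as ak

# fund_code: the fund's 6-digit code as a string
df = ak.fund_announcement_report_em(symbol=fund_code)

  • Retrieves the fund's full list of historical announcements
  • Includes the announcement ID (the 报告ID column), from which the PDF download URL can be built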
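The announcement list can be narrowed to periodic reports and turned into download URLs, sorted oldest first to match the chronological summary. This sketch assumes each row carries `公告标题` (title), `公告日期` (date) and `报告ID` (report ID), and that periodic reports can be recognised by the title keywords shown; the keyword list, the exclusion of summary editions (摘要) and the helper name are all assumptions.

```python
PDF_URL = 'http://pdf.dfcfw.com/pdf/H2_{}_1.pdf'
# Assumed title keywords: quarterly / interim / annual report
REPORT_KEYWORDS = ('季度报告', '中期报告', '年度报告')

def periodic_report_urls(rows):
    """Return (date, title, url) tuples for periodic reports, oldest first."""
    selected = [
        (str(r['公告日期'])[:10], r['公告标题'], PDF_URL.format(r['报告ID']))
        for r in rows
        if any(k in r['公告标题'] for k in REPORT_KEYWORDS)
        and '摘要' not in r['公告标题']  # skip abridged summary editions
    ]
    return sorted(selected)  # YYYY-MM-DD strings sort chronologically
```

With AKShare, `rows = ak.fund_announcement_report_em(symbol=fund_code).to_dict(orient='records')` should produce input in this shape.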

2. PDF download URL format

http://pdf.dfcfw.com/pdf/H2_{报告ID}_1.pdf
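Downloading a report is then a single GET on the URL pattern above. This sketch uses the standard library's `urllib.request` instead of `requests` so it stays self-contained, skips files that already exist so re-runs do not hit the server again, and pauses after each download. The function name, the `{report_id}.pdf` file naming and the 1.5-second default delay are assumptions.

```python
import os
import time
import urllib.request

def download_report(report_id, out_dir, delay=1.5):
    """Download one report PDF into out_dir and return its local path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f'{report_id}.pdf')
    if os.path.exists(path):
        return path  # already on disk: no network request
    url = f'http://pdf.dfcfw.com/pdf/H2_{report_id}_1.pdf'
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = resp.read()
    with open(path, 'wb') as f:
        f.write(data)
    time.sleep(delay)  # pause between downloads to avoid rate limiting
    return path
```

Passing `f'reports_{fund_code}'` as `out_dir` keeps the raw PDFs in the per-fund report directory listed under the output files.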

3. PDF parsing options

Option A: PyMuPDF (text-based PDFs)

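import re
import fitz  # PyMuPDF
import requests

resp = requests.get(pdf_url, timeout=30)  # pdf_url: built from the URL pattern above
resp.raise_for_status()
pdf_content = resp.content

full_text = ''
with fitz.open(stream=pdf_content, filetype='pdf') as doc:
    for page in doc:
        html = page.get_text('html')
        # Collect the characters PyMuPDF writes as &#x...; entities
        # (non-ASCII text, mainly Chinese); plain ASCII such as digits
        # and Latin letters is not captured by this pattern
        codes = re.findall(r'&#x([0-9a-fA-F]+);', html)
        full_text += ''.join(chr(int(c, 16)) for c in codes)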

Option B: pdfplumber (scanned PDFs)

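import pdfplumber

# pdf_file: path to a downloaded report (a file-like object such as io.BytesIO also works)
all_text = ''
with pdfplumber.open(pdf_file) as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        if text:  # skip pages where no text could be extracted
            all_text += text + '\n'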
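The two options can be chained: try PyMuPDF first and fall back to pdfplumber when too little text comes back. This is a sketch, not the skill's own logic; it uses PyMuPDF's plain `get_text('text')` output rather than Option A's HTML-entity approach, and the 200-character threshold is an arbitrary assumption. Both libraries are imported lazily, and the extractors are passed in as functions so the fallback rule can be checked without a real PDF.

```python
import io

def text_pymupdf(pdf_bytes):
    import fitz  # PyMuPDF, imported lazily
    with fitz.open(stream=pdf_bytes, filetype='pdf') as doc:
        return ''.join(page.get_text('text') for page in doc)

def text_pdfplumber(pdf_bytes):
    import pdfplumber  # imported lazily
    with pdfplumber.open(io.BytesIO(pdf_bytes)) as pdf:
        return '\n'.join(page.extract_text() or '' for page in pdf.pages)

def extract_pdf_text(pdf_bytes, extractors=(text_pymupdf, text_pdfplumber), min_chars=200):
    """Return the first extractor's output with at least min_chars
    non-whitespace characters; otherwise the longest output seen."""
    best = ''
    for extract in extractors:
        text = extract(pdf_bytes)
        if len(''.join(text.split())) >= min_chars:
            return text
        if len(text) > len(best):
            best = text
    return best
```

Neither library performs OCR, so a report made only of images still comes back (nearly) empty from both.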

4. Keyword location

The keywords, and where they appear, differ between fund companies and report types:

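Text-based PDFs (景顺长城 / Invesco Great Wall):

  • "报告期内基金的投资策略和运作分析" (investment strategy and operations analysis for the reporting period)
  • "管理人对报告期内基金的投资策略和业绩表现的说明" (manager's explanation of the fund's investment strategy and performance for the reporting period)
  • "管理人对宏观经济、证券市场及行业走势的简要展望" (manager's brief outlook on the macroeconomy, the securities market and industry trends)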

Scanned PDFs (中泰星元):

  • "4.4 报告期内基金的投资策略和运作分析"
  • The content is usually on pages 7-9
  • The keyword has to be searched for page by page
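For scanned reports, where the section sits on only a few pages, a page-by-page search can report which pages contain a keyword before anything is sliced. A minimal sketch; `find_pages` is a hypothetical helper, and `page_texts` is assumed to hold one string (or None) per page, for example from pdfplumber's `page.extract_text()`.

```python
def find_pages(page_texts, keywords):
    """Return the 1-based numbers of pages whose text contains any keyword."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if text and any(k in text for k in keywords)
    ]
```

With pdfplumber, `page_texts = [page.extract_text() for page in pdf.pages]` inside the `with` block supplies the input.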

5. Content extraction template

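# Extract the investment strategy section from full_text (the text from step 3)
start_kw = '报告期内基金的投资策略和运作分析'
end_kw = '报告期内基金的业绩表现'  # the next heading marks the end of the section
content = ''
idx1 = full_text.find(start_kw)
if idx1 != -1:
    idx2 = full_text.find(end_kw, idx1)
    if idx2 == -1:
        idx2 = idx1 + 2500  # end marker missing: keep a fixed 2500-character window
    content = full_text[idx1:idx2]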
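The template generalises to several start keywords (such as the text-based and scanned variants listed above) and several possible end markers, cutting at whichever end marker appears first after the start. A sketch under those assumptions; the function name, and searching for end markers only after the start keyword rather than from its first character, are not from the original.

```python
def extract_section(full_text, start_keywords, end_keywords, max_len=2500):
    """Return the text from the first start keyword found up to the nearest
    end keyword after it, or max_len characters if no end keyword follows.
    Returns '' when no start keyword is present."""
    for start_kw in start_keywords:
        idx1 = full_text.find(start_kw)
        if idx1 != -1:
            break
    else:
        return ''
    # Search after the start keyword so it cannot match itself as an end
    ends = [full_text.find(k, idx1 + len(start_kw)) for k in end_keywords]
    ends = [e for e in ends if e != -1]
    idx2 = min(ends) if ends else idx1 + max_len
    return full_text[idx1:idx2]
```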

FAQ

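Q: The PDF is scanned and no text can be extracted?

A: Use pdfplumber instead of PyMuPDF, and target pages 7, 8 and 9 directly. Note that pdfplumber only reads a PDF's existing text layer; it does not perform OCR, so purely image-based pages still yield no text.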

Q: The keyword is not matched?

A: Check whether the keyword differs only in spacing, and try other variants.
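One way to tolerate spacing differences is to allow optional whitespace (including line breaks and full-width spaces) between every character of the keyword, so the match positions still refer to the original text. A minimal sketch with Python's `re`; `find_keyword` is a hypothetical helper.

```python
import re

def find_keyword(text, variants):
    """Return the (start, end) span of the first variant found in text, or None.

    Whitespace between characters is tolerated, so a keyword broken across
    a line ('投资策略和\\n运作分析') still matches '投资策略和运作分析'.
    """
    for variant in variants:
        chars = [re.escape(c) for c in variant if not c.isspace()]
        if not chars:
            continue  # ignore empty or whitespace-only variants
        match = re.search(r'\s*'.join(chars), text)
        if match:
            return match.span()
    return None
```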

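Q: Some reports are missing?

A: East Money (东方财富) only keeps reports from the most recent 4 years; older reports must be obtained from other sources.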

Q: Network requests fail?

A: Add a delay of 1 to 2 seconds between requests, e.g. time.sleep(1.5), to avoid being rate-limited.
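A simple wrapper can combine the delay with a few retries. This is a sketch: the attempt count and delay are assumptions, and catching every `Exception` is deliberately broad; in practice you would narrow it to network errors.

```python
import time

def with_retries(func, attempts=3, delay=1.5):
    """Call func(); after a failure, wait `delay` seconds and try again.

    Re-raises the last exception once `attempts` calls have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

For example, `with_retries(lambda: ak.fund_announcement_report_em(symbol=fund_code))` retries the announcement-list request.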

Output files

  • reports_{fund_code}/ - original report files
  • {fund_name}_投资策略汇总.txt - the complete summary (投资策略汇总 = investment strategy summary)
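The summary file can be produced by sorting the extracted sections by date and writing them into one UTF-8 text file. The `(date, title, text)` tuple shape, the separator line and the helper name are assumptions; only the file name pattern comes from the list above.

```python
import os

def write_summary(fund_name, sections, out_dir='.'):
    """Write (date, title, text) sections, oldest first, to
    {fund_name}_投资策略汇总.txt in out_dir and return the file path."""
    path = os.path.join(out_dir, f'{fund_name}_投资策略汇总.txt')
    with open(path, 'w', encoding='utf-8') as f:
        for date, title, text in sorted(sections):  # ISO dates sort chronologically
            f.write(f'===== {date} {title} =====\n')
            f.write(text.strip() + '\n\n')
    return path
```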

Dependencies

pip install akshare pymupdf pdfplumber pandas requests

Created: 2026-03-08 · Author: 有才

Usage Guidance
This skill appears to do what it says: download public fund PDFs and extract the 'investment strategy and operations analysis' sections. Before running it, consider the following:
  1. The package source/homepage is unknown; review the extract.py source (you have it) and make sure it matches your expectations.
  2. Run it in an isolated environment (virtualenv or container) to limit side effects.
  3. The dependencies (akshare, pymupdf, pdfplumber, requests, pandas) will be installed from PyPI; pin versions if you care about supply-chain risk.
  4. The script downloads PDFs from pdf.dfcfw.com and writes files to the current directory; make sure you are comfortable with network access and disk writes.
  5. No credentials are requested and there is no obvious exfiltration, but if you plan to run this inside sensitive environments, validate akshare's network behavior and avoid exposing secrets.
If you want higher assurance, run it on a small test fund code first and inspect the downloaded PDFs and outputs.
Capability Analysis
Type: OpenClaw Skill · Name: fund-report-extractor · Version: 1.0.0
The skill is a legitimate tool designed to automate the extraction of investment strategy sections from Chinese mutual fund periodic reports. It uses the AKShare library to fetch announcement metadata and downloads PDFs from the official East Money (dfcfw.com) servers. The code in `extract.py` uses standard PDF processing libraries (PyMuPDF and pdfplumber) to parse text and perform keyword-based extraction, with no evidence of data exfiltration, malicious execution, or prompt injection.
Capability Assessment
Purpose & Capability
Name/description (extract fund report 'investment strategy' sections) align with the code and SKILL.md. Required libraries (akshare, pdf parsers, requests) are appropriate for scraping and parsing PDFs; no unrelated credentials or binaries are requested.
Instruction Scope
SKILL.md and extract.py confine actions to: fetching announcement lists via akshare, constructing PDF URLs on pdf.dfcfw.com, downloading PDFs, extracting text with pdfplumber or PyMuPDF, and writing text files locally. There are no instructions to read unrelated files, access other credentials, or exfiltrate data to unexpected endpoints.
Install Mechanism
There is no install spec (instruction-only + a single Python script). Dependencies are standard Python packages from PyPI; no remote archives or obscure installers are downloaded by the skill itself.
Credentials
No environment variables, secrets, or config paths are requested. The skill only needs network access for public data and permission to write files in the working directory — both are reasonable for this task.
Persistence & Privilege
The skill does not request permanent/always-on inclusion and does not modify other skills or system-wide settings. It simply writes output files to its working directory.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install fund-report-extractor
  3. After installation, invoke the skill by name or use /fund-report-extractor
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
- Initial release of the Fund Periodic Report Investment Strategy Extraction Skill (基金定期报告投资策略提取Skill).
- Automatically extracts the full text of the "投资策略和运作分析" (investment strategy and operations analysis) section from public fund periodic reports.
- Supports both text-based and scanned PDFs, with extraction using PyMuPDF or pdfplumber.
- Summarizes and organizes extracted content in chronological order.
- Provides instructions for fund code lookup, report downloading, and troubleshooting common issues.
- Output includes original report files and a consolidated .txt summary.
Metadata
Slug fund-report-extractor
Version 1.0.0
License
All-time Installs 1
Active Installs 1
Total Versions 1
Frequently Asked Questions

What is Fund Report Extractor?

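Automatically extracts the full text of the "Investment Strategy and Operations Analysis" (投资策略和运作分析) section from Chinese public mutual fund periodic reports, with precise location and aggregation for both text-based and scanned PDFs. It is an AI Agent Skill for Claude Code / OpenClaw, with 401 downloads so far.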

How do I install Fund Report Extractor?

Run "/install fund-report-extractor" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Fund Report Extractor free?

Yes, Fund Report Extractor is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Fund Report Extractor support?

Fund Report Extractor is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Fund Report Extractor?

It is built and maintained by suncxw-creator (@suncxw-creator); the current version is v1.0.0.
