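Description

Parses financial-industry PDF documents (A-share annual reports, Hong Kong-listed company financial reports, US 10-K filings, prospectuses, and so on) and converts them into structured Markdown and JSON. It is designed for the difficulties common in financial documents: cross-page tables, borderless tables, multi-level headers, dense numeric data, multi-column layouts, and scanned pages. Parsing is done by a multimodal vision large model; the Skill itself only handles orchestration, prompts, IO, and evaluation integration.

Usage (SKILL.md)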

PDF Finance Parser Skill

Name: my
Author: xcbc

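Financial PDFs and scans → structured Markdown + JSON. The Skill itself does no rule-based parsing; all parsing capability is delegated to a multimodal vision large model (Doubao Vision, Qwen-VL, or another OpenAI-compatible endpoint). The scripts only handle orchestration, IO, writing results to disk, and evaluation integration.

Design patterns

Follows the engineering conventions of byted-las-document-parse:

- Tool Wrapper: the multimodal VLM is the only parsing engine; local code is just its IO shell
- Pipeline: render → call model → parse response → write to disk → package
- Inversion: confirm the parse mode (normal/detail) with the user before calling; never decide unilaterally
- No local rules: scripts/ in this repository contains no layout analysis, table extraction, or numeric normalization rules; the model does all of that

The old rule-based code is kept in _legacy/ for reference and is not called on the production path.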
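
The Tool Wrapper and Pipeline patterns can be sketched as follows. This is a minimal illustration, not the code in scripts/: `run_pipeline`, the `call_model` callable, and the shape of the returned dictionary are all hypothetical, and rendering and packaging are left out.

```python
import json
from typing import Callable, Dict, List


def run_pipeline(page_images: List[bytes], call_model: Callable[[bytes], str]) -> Dict:
    """Send each rendered page to the model and collect parsed results.

    `call_model` is a hypothetical stand-in for the VLM client: it takes one
    page image and returns the model's raw text response.
    """
    pages = []
    failed_pages = []
    for number, image in enumerate(page_images, start=1):
        try:
            # Local code only decodes the response; it does not interpret tables.
            parsed = json.loads(call_model(image))
        except Exception:
            # Any model or decoding error marks the page as failed.
            failed_pages.append(number)
            continue
        pages.append({"source_page": number, "content": parsed})
    return {"pages": pages, "failed_pages": failed_pages}
```

The model client is passed in as a parameter so the local shell stays free of parsing logic and can be exercised with a fake model.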

Gotchas

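- ARK_API_KEY (Volcano Engine Ark) or OPENAI_API_KEY (any OpenAI-compatible endpoint) must be configured
- For long reports (>200 pages), split the run into segments with --pages (for example --pages 1-50) so that no single call runs too long
- Financial tables often have complex multi-level headers or continue across pages; if a result is unsatisfactory, first upgrade to --parse-mode detail, and only then consider changing the prompt
- Scans, seals, and handwriting: leave them to the model's detail mode; do not run local OCR preprocessing
- API QPM is limited; for multi-page documents the script backs off and retries automatically; do not spawn multiple processes in parallel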
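
The segmenting advice for long reports can be sketched as a helper that produces `--pages` arguments. `page_segments` is a hypothetical name, and the 50-page segment size is taken from the example above rather than from any documented limit.

```python
from typing import List


def page_segments(total_pages: int, size: int = 50) -> List[str]:
    """Split pages 1..total_pages into consecutive ranges such as '1-50'."""
    if total_pages < 1 or size < 1:
        raise ValueError("total_pages and size must be positive")
    segments = []
    for start in range(1, total_pages + 1, size):
        end = min(start + size - 1, total_pages)
        segments.append(f"{start}-{end}")
    return segments
```

Each returned string is meant to be passed to one `parse --pages ...` run, one after another rather than in parallel, in line with the QPM note above.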

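Workflow

Copy this checklist and track progress:

Parsing progress:
- [ ] Step 0: Confirm the parse mode (normal / detail)
- [ ] Step 1: Confirm the virtual environment is ready
- [ ] Step 2: Run parse (one shot) or submit + check-and-notify (asynchronous split)
- [ ] Step 3: Check meta.json.failed_pages
- [ ] Step 4: Report using the "Result reply template"

Step 0: Confirm the parse mode (must run first)

Show the user the following and ask:

Please choose a PDF parse mode:

| Mode   | Description                                                                     | Price tier |
|--------|---------------------------------------------------------------------------------|------------|
| normal | Default; single inference pass, faster; suited to clearly structured documents  | Standard   |
| detail | Deep mode; higher accuracy on complex tables, scans, and seals                  | 2x         |

Recommendation: choose normal for standard A-share / Hong Kong annual reports; choose detail for notes to financial statements, consolidated statements that span pages, and scanned contracts.

- If the user does not specify a mode → default to normal
- For project acceptance, run both modes and compare the results (a key evaluation point)

Step 1: Environment ready (first run only)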

cd {skill_directory} && \
  (test -d .venv || (python3 -m venv .venv && .venv/bin/pip install -r requirements.txt)) && \
  .venv/bin/python3 scripts/skill.py info

The info command verifies whether the API key is configured, which endpoint is called, and which model is used. Subsequent commands reuse .venv/bin/python3.
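
A pre-flight check for the API key might look like the sketch below. It is not the `info` command's implementation; `configured_key_name` is a hypothetical helper, and checking ARK_API_KEY before OPENAI_API_KEY is an assumed precedence.

```python
import os
from typing import Mapping, Optional


def configured_key_name(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the first API key variable that is set, else None."""
    # Assumed precedence: ARK_API_KEY first, then OPENAI_API_KEY.
    for name in ("ARK_API_KEY", "OPENAI_API_KEY"):
        if env.get(name, "").strip():
            return name
    return None
```

Taking the environment as a parameter keeps the check testable without touching the real process environment.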

Step 2: Run the parse

Synchronous mode (recommended for a single document or a small batch):

.venv/bin/python3 scripts/skill.py parse \
  --input <pdf_or_image_path> \
  --output <output_dir> \
  --parse-mode normal

- stdout prints a single line of JSON: {"status":"COMPLETED","task_id":"...","output_dir":"...","page_count":N,"table_count":N,"preview":"..."}
- stderr prints progress logs (rendering progress, model calls, retries, and so on)
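
Because stdout carries one JSON line and stderr carries the logs, a caller can capture the two streams separately. The sketch below assumes the command is run from the skill directory and that a non-zero exit code signals failure; `run_parse` and `parse_result_line` are hypothetical helper names.

```python
import json
import subprocess


def parse_result_line(stdout_text: str) -> dict:
    """Decode the last non-empty stdout line as the JSON result object."""
    lines = [line for line in stdout_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output on stdout")
    result = json.loads(lines[-1])
    if not isinstance(result, dict) or "status" not in result:
        raise ValueError("stdout line is not a result object")
    return result


def run_parse(input_path: str, output_dir: str, mode: str = "normal") -> dict:
    """Run the synchronous parse command and return its decoded result."""
    completed = subprocess.run(
        [".venv/bin/python3", "scripts/skill.py", "parse",
         "--input", input_path, "--output", output_dir, "--parse-mode", mode],
        capture_output=True, text=True, check=True,  # stderr holds the logs
    )
    return parse_result_line(completed.stdout)
```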

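Asynchronous split mode (long documents, or when the agent must not block):

# Submit
.venv/bin/python3 scripts/skill.py submit --input <pdf> --parse-mode detail
# Returns {"task_id":"...","eta":"..."}

# Check (polls automatically until a terminal state)
.venv/bin/python3 scripts/skill.py check-and-notify --task-id <id> --output <dir> --poll

In asynchronous mode, each page's result is written to <output_dir>/pages/p{N}.json, so a run can be resumed incrementally.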
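
Incremental resume amounts to checking which per-page files already exist. The sketch below assumes pages are numbered from 1 (as in p1.json) and that a non-empty p{N}.json means page N is done; both function names are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Set

_PAGE_FILE = re.compile(r"^p(\d+)\.json$")


def completed_pages(output_dir: str) -> Set[int]:
    """Page numbers that already have a non-empty pages/p{N}.json file."""
    pages_dir = Path(output_dir) / "pages"
    if not pages_dir.is_dir():
        return set()
    done = set()
    for path in pages_dir.iterdir():
        match = _PAGE_FILE.match(path.name)
        if match and path.stat().st_size > 0:
            done.add(int(match.group(1)))
    return done


def pending_pages(output_dir: str, page_count: int) -> List[int]:
    """Pages in 1..page_count that still need to be parsed."""
    done = completed_pages(output_dir)
    return [n for n in range(1, page_count + 1) if n not in done]
```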

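Step 3: Check failed_pages

Required fields in meta.json:

| Field                 | Meaning                                                                              | Action                                                                                     |
|-----------------------|--------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|
| `failed_pages`        | Pages where the model returned an error, timed out, or the response failed to parse | If non-empty → tell the user which pages failed and suggest rerunning them in detail mode |
| `n_tables_cross_page` | Number of tables that continue across pages                                          | For observation only; the model has already merged them                                    |
| `wall_time_seconds`   | End-to-end elapsed time                                                              | Used for evaluation comparisons                                                            |
| `parse_mode`          | normal / detail                                                                      | State it explicitly in the report                                                          |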
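
Checking `failed_pages` and turning it into a rerun suggestion can be sketched as below. The sketch assumes `failed_pages` is a JSON list of page numbers, which the table does not state; `load_failed_pages` and `rerun_hint` are hypothetical names.

```python
import json
from pathlib import Path
from typing import List


def load_failed_pages(output_dir: str) -> List[int]:
    """Read meta.json and return the failed page numbers in ascending order."""
    meta_path = Path(output_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    # Assumption: failed_pages is a list of page numbers.
    return sorted(int(page) for page in meta.get("failed_pages", []))


def rerun_hint(failed: List[int]) -> str:
    """Build the flags for rerunning only the failed pages in detail mode."""
    if not failed:
        return ""
    pages = ",".join(str(page) for page in failed)
    return f"--parse-mode detail --pages {pages}"
```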

Step 4: Result reply template

✅ Parsing complete

📄 Document info
- File: {filename}
- Pages: {page_count} | Tables: {table_count}
- Mode: {parse_mode} | Elapsed: {wall_time}s

📁 Output
- Markdown: {output_dir}/output.md
- JSON: {output_dir}/output.json
- Per-page results: {output_dir}/pages/
- For evaluation: {output_dir}/output.json (conforms to assets/output_schema.json)

⚠️ Notes (if any)
- Page X failed to parse: {reason} → suggest rerunning it separately with `--parse-mode detail --pages X`

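Output directory structure

{output_dir}/
├── output.md                # Merged Markdown (concatenated page by page)
├── output.json              # Structured JSON (read by the evaluation script)
├── meta.json                # Metadata (elapsed time, mode, failed pages, model version)
├── pages/                   # Per-page results (used for resuming)
│   ├── p1.json
│   ├── p1.md
│   └── ...
├── images/                  # Rendered page images (optional, --save-images)
└── run.log                  # Detailed log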

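Error-handling decision tree

| Symptom                     | Source                        | Handling                                                                  |
|-----------------------------|-------------------------------|---------------------------------------------------------------------------|
| API key missing             | `info` reports an error       | Guide the user to configure `env.sh` or `--env-file`                      |
| Model returns non-JSON      | Single page in `failed_pages` | Retry automatically 2 times; if it still fails, mark the page as failed   |
| Rate limit / 429            | API                           | Retry with exponential backoff                                            |
| PDF is encrypted            | Rendering stage               | Prompt the user to retry with `--password`                                |
| Single-page timeout (>60s)  | API                           | Retry automatically; if it still fails, suggest switching to normal mode  |
| Model output truncated      | Response ends incomplete      | Upgrade to detail, or reduce the size of the page sent                    |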
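
The "retry with exponential backoff" row can be illustrated with the sketch below. It is not the script's actual retry logic: `RateLimited`, `call_with_backoff`, and every default value (retry count, delays, 10% jitter) are assumptions chosen for the example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by the caller's client on HTTP 429."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, doubling the wait after each rate-limit error."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted; let the caller mark the page failed
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A small random jitter spreads out retries from repeated calls.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")
```

The `sleep` parameter exists so the wait can be replaced in tests; in normal use the default `time.sleep` applies.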

Advanced usage

Parse only certain pages

.venv/bin/python3 scripts/skill.py parse --input <pdf> --output <dir> --pages 1-10,15,20-25
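
A page specification such as `1-10,15,20-25` expands to an explicit page list. The sketch below shows one plausible reading (1-based, inclusive ranges, duplicates merged); it is not taken from scripts/skill.py, and `parse_page_spec` is a hypothetical name.

```python
from typing import List


def parse_page_spec(spec: str) -> List[int]:
    """Expand a spec like '1-10,15,20-25' into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            number = int(part)
            if number < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(number)
    return sorted(pages)
```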

Force `detail` mode

.venv/bin/python3 scripts/skill.py parse --input <pdf> --output <dir> --parse-mode detail

Switch models

.venv/bin/python3 scripts/skill.py parse --input <pdf> --output <dir> --model doubao-1.5-vision-pro
# or qwen-vl-max / qwen2.5-vl-72b-instruct

Integrating with the evaluation script

# 1. Run every sample document through the parser
for pdf in samples/documents/*.pdf; do
  doc_id=$(basename "$pdf" .pdf)
  .venv/bin/python3 scripts/skill.py parse \
    --input "$pdf" --output "samples/outputs/$doc_id" --parse-mode normal
done

# 2. One-command evaluation against the ground truth (GT)
python3 ../../evaluation/scripts/run_eval.py \
  --pred_dir samples/outputs \
  --gt_dir ../../evaluation/gt_templates/annotated \
  --report_dir ../../evaluation/reports

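Important constraints

- Do not write rule-based parsing code in this Skill: table, layout, and numeric recognition all go to the model; scripts/ in this repository may only contain CLI, API client, IO, and prompt templates
- Do not modify the original PDF: read-only
- Do not make "creative corrections" to numbers: source 1,234 → output 1234 (thousands separators removed) is fine, but do not guess from context
- Preserve traceability: every cell/block must carry source_page; coordinates are optional, depending on whether the model returns them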
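
The last two constraints can be checked after a run, for example during evaluation. The sketch below is meant as a checker outside scripts/, not as parsing logic to add to the Skill. It assumes blocks are dictionaries with an integer `source_page` key, which may not match assets/output_schema.json; both function names are hypothetical.

```python
import re
from typing import Any, Dict, List

# A comma between a digit and exactly three digits, e.g. the one in "1,234".
_THOUSANDS = re.compile(r"(?<=\d),(?=\d{3}(?!\d))")


def strip_thousands(text: str) -> str:
    """Remove thousands separators only; every other character is untouched."""
    return _THOUSANDS.sub("", text)


def blocks_missing_source_page(blocks: List[Dict[str, Any]]) -> List[int]:
    """Indexes of blocks that lack an integer source_page."""
    return [
        index
        for index, block in enumerate(blocks)
        if not isinstance(block.get("source_page"), int)
    ]
```

The pattern deliberately leaves strings such as `12,34` alone, since deciding what they mean would require guessing from context.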

Security recommendations

Install only if you are comfortable sending the pages you parse to the configured VLM provider, or configure a local provider such as Ollama for sensitive documents. Avoid using it on confidential, regulated, customer, legal, or non-public financial PDFs until the skill adds explicit remote-upload consent, clearer provider/data-retention disclosure, and tighter permission metadata.

Capability tags

requires-sensitive-credentials

Capability assessment

ℹ Purpose & Capability

The skill's core purpose is coherent: parse financial PDFs/images into Markdown and JSON using a VLM, with local file reads and output writes aligned to that purpose. However, the production script also includes an auto rules route despite documentation emphasizing VLM-only behavior, which is a documentation mismatch rather than a security concern.

⚠ Instruction Scope

The artifacts disclose VLM use and provider configuration, but they do not plainly warn at invocation time that rendered document pages and prompts may be transmitted to remote providers such as ARK, DashScope, or OpenAI. For financial reports, contracts, and scanned documents, that under-disclosure is material.

ℹ Install Mechanism

The skill instructs users to create a Python virtual environment and install requirements, but the referenced requirements file is inconsistent or absent in the artifact paths reviewed. This looks like packaging fragility, not malicious installation behavior.

⚠ Credentials

Reading PDF/image inputs, API keys, and writing parsed outputs is proportionate, but the default remote VLM data flow is high-impact for confidential financial documents and lacks explicit opt-in, provider allowlisting, or a prominent local-only mode warning.

ℹ Persistence & Privilege

No background service, startup persistence, privilege escalation, destructive mutation, or broad local indexing was found. The skill writes task metadata and parsed outputs to user-selected or /tmp output directories, and may optionally save rendered page images.

Version history

v1.0.1

No user-facing changes in this version.
- No file changes detected from the previous release.

v1.0.0

Initial release of pdf-finance-parser skill.
- Parses financial PDFs (annual reports, financial disclosures, prospectuses) into structured Markdown and JSON.
- Designed to robustly handle complex layouts: cross-page tables, borderless tables, multi-level headers, dense numeric data, multi-column formats, and scanned/handwritten documents.
- Delegates all parsing to multimodal vision large models; local scripts handle orchestration, prompts, I/O, and evaluation integration only.
- Supports "normal" and "detail" parsing modes, selected interactively.
- Output includes structured JSON, Markdown, per-page results, and detailed metadata on failures and processing.
- Requires ARK_API_KEY or compatible OpenAI endpoint configuration for use.

Metadata

Slug pdf-fin-parse

Version 1.0.1

License MIT-0

Total installs 0

Current installs 0

Number of versions 2

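FAQ

What is my?

my is an AI Agent Skill plugin for Claude Code / OpenClaw. It parses financial-industry PDF documents (A-share annual reports, Hong Kong-listed company financial reports, US 10-K filings, prospectuses, and so on) into structured Markdown and JSON, with the parsing done by a multimodal vision large model. It currently has 48 cumulative downloads.

How do I install my?

Run the command "/install pdf-fin-parse" in the OpenClaw or Claude Code chat box to install it in one step. The install itself needs no extra configuration, but parsing requires ARK_API_KEY or OPENAI_API_KEY to be configured.

Is my free?

Yes. my is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.

Which platforms does my support?

my is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed my?

Developed and maintained by xcbc (@xcbc); the current version is v1.0.1.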
