Description

Use this skill when the user asks to OCR, transcribe, extract, or convert the contents of a scanned PDF, image, or office document into Markdown, HTML, DOCX,...

Usage Instructions (SKILL.md)

Finance OCR Pro

Name: Finance OCR Pro
Author: rizmoon

Run this skill only after the user has explicitly asked for OCR.

This skill is especially helpful for financial reports, annual reports, prospectuses, investor presentations, regulatory filings, research reports, and other documents with complicated structure, charts, graphs, tables, and mixed layout elements.

Security And Privacy

Before running OCR, make the operating model clear:

This skill requires three environment variables, all of which must be configured before OCR can run:
- API_KEY (sensitive) -- the API key for authenticating with the VLM endpoint.
- BASE_URL -- the base URL of the OpenAI-compatible VLM endpoint. All page images and OCR prompts are transmitted to this URL.
- VLM_MODEL -- the vision-capable model identifier. Must support image inputs; text-only models will not work.
OCR sends rendered page images and structured prompts to BASE_URL. This is the primary data-transmission path. Users must verify that the endpoint is trusted before processing sensitive documents.
If the user wants offline or local-only OCR, BASE_URL must point to a local VLM service. Do not run this skill against an external endpoint with sensitive documents unless the provider is trusted.
scripts/ocr_setup.py checks dependencies and creates .env templates, but it never installs Python packages automatically. Users must review and run dependency installation themselves.
HTML report generation uses vendored Mermaid and MathJax files from scripts/vendor/ and does not download frontend assets from a CDN at runtime.
Local subprocess usage is limited to starting the local OCR worker and invoking document-conversion tools such as LibreOffice or osascript. Commands are executed with explicit argument lists rather than shell strings.
Never commit a populated .env file. Use .env.example as a template and keep real credentials local.
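The configuration requirements above can be checked before any page is rendered. The following is a minimal sketch, not the skill's own code: the helper names `missing_vars` and `is_local_endpoint` are hypothetical, and treating only loopback hosts as "local" is an assumption made for illustration.

```python
import os
from urllib.parse import urlparse

# The three variables the skill requires before OCR can run.
REQUIRED_VARS = ("API_KEY", "BASE_URL", "VLM_MODEL")

# Hosts treated as "local" in this sketch (an assumption, not the skill's rule).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def missing_vars(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def is_local_endpoint(base_url):
    """Return True when the host part of base_url is a loopback name or address."""
    host = urlparse(base_url).hostname
    return host is not None and host.lower() in LOCAL_HOSTS


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        kind = "local" if is_local_endpoint(os.environ["BASE_URL"]) else "remote"
        print(f"BASE_URL is {kind}; page images and prompts will be sent there.")
```

A private-network address would be reported as remote by this check, so the result is a prompt for the user to confirm trust, not a guarantee.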

Pre-Run Notice

After the user asks for OCR or extraction, give a short notice that includes:

- whether BASE_URL is local or remote
- which VLM_MODEL will be used
- which execution mode will be used
- where results will be written
- that the skill supports multi-thread OCR and the thread count can be increased when the user's API endpoint, rate limits, and plan support parallel OCR requests
- that page images and prompts will be transmitted to the configured endpoint

Proceed automatically unless the user asks to change those defaults.

Defaults To Announce

Running mode: background job by default
Model: VLM_MODEL
Threads: 1. If the user's API endpoint or plan supports safe parallel OCR requests, tell them they can choose a higher thread count.
Result path:
- background: ~/.semantic-ocr/jobs/<job_id>/results/
- synchronous: ocr_output/OCR_<filename>/results/
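The pre-run notice and the defaults listed above could be assembled as follows. This is an illustrative sketch only: `result_path` and `pre_run_notice` are hypothetical helpers, and the wording of each notice line is an assumption. Only the default values and the two path patterns come from this document.

```python
def result_path(mode, job_id="<job_id>", filename="<filename>"):
    """Return the documented default results directory for an execution mode."""
    if mode == "background":
        return f"~/.semantic-ocr/jobs/{job_id}/results/"
    if mode == "synchronous":
        return f"ocr_output/OCR_{filename}/results/"
    raise ValueError(f"unknown mode: {mode!r}")


def pre_run_notice(endpoint_kind, model, mode="background", threads=1, **path_args):
    """Build the short notice shown to the user before OCR starts."""
    lines = [
        f"Endpoint: {endpoint_kind} (BASE_URL)",
        f"Model: {model}",
        f"Running mode: {mode}",
        f"Threads: {threads} (can be raised if your endpoint, rate limits, "
        "and plan support parallel OCR requests)",
        f"Results: {result_path(mode, **path_args)}",
        "Page images and prompts will be transmitted to the configured endpoint.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # "my-vlm-model" is a placeholder, not a real model identifier.
    print(pre_run_notice("remote", "my-vlm-model"))
```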

Setup

Use the skill-local virtual environment if present.

macOS/Linux: .venv/bin/python
Windows: .venv/Scripts/python.exe
Fallback: python3 on macOS/Linux, python on Windows

Before running any command, resolve the interpreter and reuse it for the rest of the session:

macOS/Linux: PYTHON="${PYTHON:-$( [ -x .venv/bin/python ] && printf .venv/bin/python || printf python3 )}"
Windows: use .venv\Scripts\python.exe when present, otherwise python
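For callers that cannot rely on a POSIX shell, the same resolution order can be sketched in Python. The function name `resolve_python` and its parameters are hypothetical. The order it follows (an existing PYTHON override on macOS/Linux, then the skill-local virtual environment, then the platform fallback) mirrors the one-liner and the Windows rule above.

```python
import os
import sys
from pathlib import Path


def resolve_python(skill_dir=".", platform=sys.platform, env=None):
    """Pick the interpreter to reuse for the rest of the session."""
    env = os.environ if env is None else env
    windows = platform.startswith("win")

    # On macOS/Linux the shell one-liner keeps an already-set PYTHON value.
    if not windows and env.get("PYTHON"):
        return env["PYTHON"]

    if windows:
        candidate, fallback = Path(skill_dir, ".venv", "Scripts", "python.exe"), "python"
    else:
        candidate, fallback = Path(skill_dir, ".venv", "bin", "python"), "python3"

    # Use the virtual environment only if its interpreter exists and is executable.
    usable = candidate.is_file() and os.access(candidate, os.X_OK)
    return str(candidate) if usable else fallback


if __name__ == "__main__":
    print(resolve_python())
```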

Run:

$PYTHON scripts/ocr_setup.py --check

If setup is incomplete, run:

$PYTHON scripts/ocr_setup.py

Preferred Execution

By default, start a background worker:

$PYTHON scripts/ocrctl.py --json start /path/to/document.pdf

If the provider supports concurrency and the user wants faster OCR, offer a higher thread count such as:

$PYTHON scripts/ocrctl.py --json start -t 4 /path/to/document.pdf

Then inspect progress and outputs:

$PYTHON scripts/ocrctl.py --json status <job_id>
$PYTHON scripts/ocrctl.py --json artifacts <job_id>
$PYTHON scripts/ocrctl.py --json tail <job_id>
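When these commands are driven from a script, they can be built as explicit argument lists, in line with the subprocess policy described under Security And Privacy. The sketch below uses hypothetical helpers `ocrctl_argv` and `run_ocrctl`. It assumes that `--json` makes ocrctl.py print a single JSON document on stdout, and it makes no assumption about the fields inside that document.

```python
import json
import subprocess


def ocrctl_argv(python, command, *args, threads=None):
    """Build an argument list for scripts/ocrctl.py (never a shell string)."""
    argv = [python, "scripts/ocrctl.py", "--json", command]
    if threads is not None:
        argv += ["-t", str(threads)]  # same flag as the `start -t 4` example
    argv += list(args)
    return argv


def run_ocrctl(python, command, *args, threads=None):
    """Run ocrctl and parse stdout as JSON (assumed output format)."""
    result = subprocess.run(
        ocrctl_argv(python, command, *args, threads=threads),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Show the argument list without running anything.
    print(ocrctl_argv("python3", "status", "<job_id>"))
```

Because the arguments are passed as a list, a document path containing spaces or shell metacharacters is handed to the script unchanged.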

Use synchronous mode only when the user explicitly wants inline execution:

$PYTHON scripts/ocr_main.py /path/to/document.pdf

Notes

Inputs: PDF, common office documents, Apple office formats, and images.
Outputs: merged Markdown, HTML review report, DOCX, and Excel.
OCR requires API_KEY, BASE_URL, and VLM_MODEL to be configured before running.
The default page-rendering resolution is 200 DPI.
The skill supports multi-thread OCR. Keep the default at 1 unless the user's API endpoint, rate limits, and plan support concurrent OCR requests.
Sensitive document pages are transmitted to the configured endpoint during OCR unless the endpoint is a local service.
Best suited for financial documents and other visually dense materials with tables, charts, graphs, and complex page structure.
Office-document conversion may require LibreOffice.
OCR extraction by the VLM may be time-consuming; check the job status regularly.

Security Recommendations

This skill appears coherent for VLM-based OCR, but it will transmit rendered page images and OCR prompts to whatever BASE_URL you configure. Before installing or running:

1) Verify BASE_URL points to a trusted service (or a local VLM) and that the API_KEY has minimal privileges and usage limits; do not point it to an untrusted remote API for sensitive documents.
2) Review scripts/ai_service_vlm.py and ocr_setup.py to confirm there are no hardcoded endpoints or unexpected telemetry.
3) Run in an isolated environment (virtualenv, container, or machine) and inspect requirements.txt before installing dependencies.
4) Keep real credentials out of checked-in files (.env.example only) and use network egress controls if you need to prevent accidental external uploads.
5) Note the skill vendor/source is unknown and there is no homepage; if you need higher assurance, audit the code thoroughly or run OCR against non-sensitive sample documents first.

Capability Assessment

✓ Purpose & Capability

Name/description, required env vars (API_KEY, BASE_URL, VLM_MODEL), and the included Python scripts (ai_service_vlm.py, image_to_md.py, docs_to_image.py, etc.) line up with an OCR-to-Markdown/HTML/DOCX/Excel workflow. Asking for a vision-model endpoint and API key is proportionate for a VLM-based OCR pipeline.

ℹ Instruction Scope

The SKILL.md explicitly instructs agents to convert pages to images and send them to BASE_URL (the declared data-exfiltration path). It also instructs starting local background workers, using a local virtualenv when present, and invoking document-conversion helpers (LibreOffice, osascript). These behaviors are within scope for the stated purpose but have important privacy implications because page images (possibly sensitive) are transmitted to the configured endpoint.

✓ Install Mechanism

There is no automatic install spec; this is instruction-only with code files and a requirements.txt. scripts/ocr_setup.py only prints manual install commands and does not auto-install packages. This reduces install-time risk compared with remote downloads or automatic installers.

✓ Credentials

The three required environment variables (API_KEY, BASE_URL, VLM_MODEL) are directly relevant to the VLM-based OCR use case. The primary credential is API_KEY. No unrelated secrets or surprising environment variables are requested.

✓ Persistence & Privilege

The skill is not always-on (always: false) and is user-invocable. It stores job state under a local job directory (~/.semantic-ocr/jobs) for background jobs, which is appropriate for a local runtime and does not modify other skills or global agent settings.

Version History

v1.0.6

**Bundled all HTML report assets locally and improved privacy: no more runtime CDN downloads.**
- Added vendored copies of MathJax and Mermaid (including licenses) to eliminate external asset downloads at report view time.
- Updated security documentation to clarify that HTML reports use only local assets and never download assets from external sources.
- Included SECURITY.md and THIRD_PARTY_NOTICES.md for transparency and license compliance.
- No functional changes to OCR workflow logic or requirements.

v1.0.5

finance-ocr-pro 1.0.5
- Added .env.example for environment variable setup guidance.
- Added .gitignore to prevent unwanted files from being committed.
- Added LICENSE file to clarify usage and distribution rights.
- Improved SKILL.md with more flexible Python interpreter discovery and session reuse instructions.
- Documented multi-threaded OCR support, including guidance on choosing thread count based on API and plan.
- Updated pre-run notice and defaults to inform users about concurrency options and interpreter selection.

v1.0.4

- Removed the `openai.yaml` configuration file.
- Updated documentation to clarify that OCR runs only after explicit user intent, not just file upload.
- Revised execution instructions and removed automated start based on file attachments.
- Clarified security, privacy, and default behaviors.
- Added a note that OCR extraction may be time-consuming; users should check status regularly.

v1.0.3

finance-ocr-pro 1.0.3
- Added openai.yaml and skill.yaml configuration files for improved compatibility and clarity.
- Updated SKILL.md to use a simplified, declarative environment variable section under "requires".
- No changes to OCR logic or workflow; documentation and structure only.

v1.0.2

- Added explicit environment variable and credential definitions for API_KEY, BASE_URL, and VLM_MODEL in SKILL.md.
- Expanded security and privacy sections to clarify data flow, endpoint trust, credential handling, and local/offline usage.
- Included a new "data_transmission" summary for transparency around what is sent to the OCR endpoint.
- Emphasized not to commit real credentials and to only use the skill after explicit user OCR requests.
- No functional changes; documentation improvements only.

v1.0.1

**Skill now clarifies external data transmission and security requirements.**
- Added detailed security and privacy notice describing the transmission of page images and prompts to a configured endpoint.
- Pre-run notice now includes endpoint location, model, execution mode, and result path, and informs users of external data transmission.
- Usage requires `API_KEY`, `BASE_URL`, and `VLM_MODEL` to be configured.
- Emphasized that OCR should only run after explicit user intent and should not begin automatically with file uploads.

v1.0.0

Initial release of Finance OCR Pro for advanced document OCR and conversion tasks.
- Added support for extracting and converting scanned PDFs, images, and office documents (including financial reports and documents with dense tables, charts, or multi-layout pages) into Markdown, HTML, DOCX, or Excel.
- Introduced background job system for long-running OCR tasks using easy status, artifact, and log checks.
- Provided scripts, configs, and environment examples for streamlined setup and operation across major platforms.
- Supports conversion from a broad range of input formats and produces multiple structured outputs optimized for review and analysis.
- Included a user guide and command examples for setup, job management, and best practices.

Metadata

Slug finance-ocr-pro

Version 1.0.6

License MIT-0

Total installs 0

Current installs 0

Number of versions 7

FAQ

What is Finance OCR Pro?

Use this skill when the user asks to OCR, transcribe, extract, or convert the contents of a scanned PDF, image, or office document into Markdown, HTML, DOCX,... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 202 cumulative downloads to date.

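How do I install Finance OCR Pro?

Run the command "/install finance-ocr-pro" in the OpenClaw or Claude Code chat box to install it in one step. Installation itself needs no extra configuration, but OCR requires API_KEY, BASE_URL, and VLM_MODEL to be configured before it can run.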

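Is Finance OCR Pro free?

Yes. Finance OCR Pro is completely free and is released under the MIT-0 license, so it can be freely downloaded, installed, and used.

Which platforms does Finance OCR Pro support?

Finance OCR Pro is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Finance OCR Pro?

It is developed and maintained by RizMoon (@rizmoon); the current version is v1.0.6.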
