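Description

VAT invoice recognition skill: automatically recognizes invoices in PDF (single- or multi-page) or common image formats (PNG, JPG, etc.), calls the Baidu Cloud VAT invoice OCR API to extract key information, and outputs a structured Excel report. It applies in the following scenarios: a user uploads an invoice file and asks to recognize, extract, or convert its information; invoices need to be batch-processed into an Excel summary sheet; invoices need detection, content...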

Usage (SKILL.md)

VAT Invoice OCR Skill

Name: 发票内容识别 (Invoice Content Recognition)
Author: pengyulong

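Overview

This skill recognizes invoices in PDF (single- or multi-page) and common image formats. It depends on the Baidu Cloud VAT invoice OCR API to turn unstructured documents into structured data, and writes the result to an Excel report. Processing consists of three core modules:

Invoice detection: filters out non-invoice images and makes an initial guess at the invoice type.
Content recognition: calls the dedicated Baidu Cloud API to extract core fields (falls back to general OCR automatically on failure).
Quality assessment: scores each result automatically from field completeness and engine confidence.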

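Workflow

Step 1: Environment and credential check

Before running a task, confirm the following:

Input file validity: the file must be in a supported format such as .pdf, .png, .jpg, or .jpeg.
Environment configuration (.env): a .env file must exist in the working directory and contain BAIDU_API_KEY and BAIDU_SECRET_KEY.
- If the configuration is not found, ask the user for it. If the user has no credentials, direct them to the Baidu AI Cloud console, where they can apply for free.
Output path: confirm where the generated Excel file will be stored (default: result.xlsx).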
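
The checks above can be sketched as a small pre-flight function. This is only an illustration, not the code in invoice_ocr_main.py: the name `check_preconditions` is hypothetical, and the credentials are passed in as a plain mapping (for example the result of python-dotenv's `dotenv_values(".env")`) so the sketch runs with the standard library alone.

```python
from pathlib import Path
from typing import List, Mapping, Optional

# Extensions named in the checklist; the real script may accept more.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}
REQUIRED_KEYS = ("BAIDU_API_KEY", "BAIDU_SECRET_KEY")


def check_preconditions(input_path: str, env: Mapping[str, Optional[str]]) -> List[str]:
    """Return a list of problems; an empty list means the task can start."""
    problems = []
    suffix = Path(input_path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file format: {suffix or '(none)'}")
    for key in REQUIRED_KEYS:
        # Treat a missing key, a None value and a blank value the same way.
        if not (env.get(key) or "").strip():
            problems.append(f"missing credential: {key}")
    return problems
```

A non-empty result is the cue to ask the user for the missing credentials instead of starting the run.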

Step 2: Install dependencies and run

# 1. Install core dependencies
pip install pymupdf openpyxl requests Pillow python-dotenv --break-system-packages -q

# 2. Run the minimal command line
python invoice_ocr_main.py \
  --input <file-path> \
  --output ./outputs/发票识别结果.xlsx

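Step 3: Deliver the result

Deliver the generated Excel file to the user with a brief summary of the run (for example: "Processed a 3-page PDF: 2 special VAT invoices recognized, 1 page skipped as non-invoice content. See the Excel summary.").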

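Core architecture and modules

The core architecture of the script invoice_ocr_main.py is as follows:

1. Input handling (InputHandler)

Converts every type of input into a standard list of JPEG byte streams.
Large images (>3 MB) or high-resolution images are automatically cropped by aspect ratio and recompressed at lower quality to stay within the API's payload limits.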
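
A minimal sketch of the crop-and-compress idea, under stated assumptions: the function names are hypothetical, the crop is assumed to be centered, the quality ladder is illustrative, and the 3:1 bound is taken from this document's edge-case table. The JPEG encoder is injected as a callable so the logic can be shown without an imaging library; with Pillow, the box could feed `Image.crop` and `encode` could wrap `Image.save(buffer, format="JPEG", quality=q)`.

```python
from typing import Callable, Optional, Sequence, Tuple

MAX_BYTES = 3 * 1024 * 1024   # the 3 MB threshold mentioned above
MAX_RATIO = 3.0               # long side : short side bound (3:1)


def center_crop_box(width: int, height: int,
                    max_ratio: float = MAX_RATIO) -> Tuple[int, int, int, int]:
    """Return (left, upper, right, lower) trimming the long side to the ratio bound."""
    if width > height * max_ratio:
        new_width = int(height * max_ratio)
        left = (width - new_width) // 2
        return (left, 0, left + new_width, height)
    if height > width * max_ratio:
        new_height = int(width * max_ratio)
        upper = (height - new_height) // 2
        return (0, upper, width, upper + new_height)
    return (0, 0, width, height)  # already within the bound


def compress_until_fits(encode: Callable[[int], bytes],
                        max_bytes: int = MAX_BYTES,
                        qualities: Sequence[int] = (90, 80, 70, 60, 50, 40)
                        ) -> Optional[bytes]:
    """Try decreasing JPEG qualities; return the first result that fits, else None."""
    for quality in qualities:
        data = encode(quality)
        if len(data) <= max_bytes:
            return data
    return None
```

Returning `None` when no quality level fits leaves the caller free to downscale further or to report the page as failed.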

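2. Invoice detection (InvoiceDetector)

Performs an initial filter on image dimensions (discarding very small images).
Combines the OCR text with regular expressions and keywords (such as "增值税专用发票", special VAT invoice, and "价税合计", total including tax) to determine the invoice type and output a confidence score.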
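
Keyword-based detection of this kind might look like the sketch below. Only the two keywords quoted above come from the source; the extra patterns, the `detect_invoice` name, and the scoring rule (0.6 for a title match plus 0.1 per supporting keyword, with a 0.5 cut-off) are assumptions made for illustration.

```python
import re

# (label, pattern) pairs checked in order. The second entry is an assumed extra type.
TYPE_PATTERNS = [
    ("special_vat_invoice", re.compile(r"增值税专用发票")),
    ("general_vat_invoice", re.compile(r"增值税普通发票")),
]
# Supporting keywords; everything except "价税合计" is an illustrative assumption.
EVIDENCE_KEYWORDS = ["价税合计", "发票号码", "开票日期", "纳税人识别号"]


def detect_invoice(text: str) -> dict:
    """Classify OCR text; return has_invoice, invoice_type and a 0..1 confidence."""
    compact = re.sub(r"\s+", "", text)  # OCR output often contains stray spaces/newlines
    invoice_type = next(
        (label for label, pattern in TYPE_PATTERNS if pattern.search(compact)), None
    )
    hits = sum(1 for keyword in EVIDENCE_KEYWORDS if keyword in compact)
    confidence = round((0.6 if invoice_type else 0.0) + 0.1 * hits, 2)
    return {
        "has_invoice": confidence >= 0.5,
        "invoice_type": invoice_type or "unknown",
        "confidence": confidence,
    }
```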

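3. Content extraction (ContentExtractor)

Reads credentials strictly through python-dotenv, then obtains and caches the access token.
Requests the dedicated vat_invoice endpoint first.
If the endpoint returns an error code (for example, for a non-standard invoice type), falls back to the general_basic general text recognition endpoint and recovers key fields with regular expressions on a best-effort basis.
Includes automatic retry (up to 3 attempts with increasing intervals) to tolerate network jitter.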
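
The retry-then-fallback flow can be sketched as follows. This is an assumption-laden illustration rather than the script's code: `call_with_retry` and `extract_fields` are hypothetical names, the two endpoint calls are injected as callables instead of real HTTP requests, the delay schedule (1 s, 2 s) is a guess at "increasing intervals", and an error response is assumed to be a JSON object carrying an `error_code` key.

```python
import time
from typing import Callable, Optional


def call_with_retry(call: Callable[[], dict], attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Run call() up to `attempts` times, waiting base_delay * n after the n-th failure."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception:  # a real script would catch narrower network errors
            if attempt < attempts:
                sleep(base_delay * attempt)  # increasing interval between tries
    return None


def extract_fields(image: bytes,
                   vat_call: Callable[[bytes], dict],
                   general_call: Callable[[bytes], dict]) -> dict:
    """Prefer the dedicated VAT endpoint; fall back to general OCR otherwise."""
    result = call_with_retry(lambda: vat_call(image))
    if result is not None and "error_code" not in result:
        return {"source": "vat_invoice", "data": result}
    fallback = call_with_retry(lambda: general_call(image))
    if fallback is not None and "error_code" not in fallback:
        return {"source": "general_basic", "data": fallback}
    return {"source": None, "data": None}  # both paths failed
```

Injecting `sleep` keeps the backoff testable without real waiting; the `source` field records which endpoint produced the data, so regex-recovered fields from the fallback can be treated with more caution.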

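4. Quality assessment (QualityAssessor)

Each successfully recognized invoice is scored out of 100, with the following weights:

Dimension	Weight	Description
Field completeness	60%	Checks whether the core fields (number, date, buyer and seller, total amount) were extracted
Image clarity	40%	Derived from the text confidence returned by the OCR engine

Rating scale:

🟢 Excellent (90-100): no manual intervention needed
🟡 Good (70-89): spot-check the core amounts
🔴 Poor (<70): must be verified manually or rescanned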
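
The 60/40 weighting and the rating bands can be expressed as a short calculation. The sketch below assumes completeness is the fraction of core fields that are non-empty and that the OCR confidence is already normalized to 0..1; the field keys and function names are hypothetical.

```python
# Hypothetical keys for the five core fields named in the table.
CORE_FIELDS = ("invoice_number", "invoice_date", "seller", "buyer", "total_amount")


def quality_score(fields: dict, ocr_confidence: float) -> int:
    """Score 0..100: 60 points for field completeness, 40 for OCR confidence."""
    filled = sum(1 for name in CORE_FIELDS if str(fields.get(name) or "").strip())
    completeness = filled / len(CORE_FIELDS)
    confidence = min(max(ocr_confidence, 0.0), 1.0)  # clamp to the 0..1 range
    return round(60 * completeness + 40 * confidence)


def quality_grade(score: int) -> str:
    """Map a score to the rating labels used in the Excel report."""
    if score >= 90:
        return "🟢 优秀"  # Excellent
    if score >= 70:
        return "🟡 良好"  # Good
    return "🔴 较差"      # Poor
```

For example, an invoice with all five fields extracted and a confidence of 0.95 scores 60 + 38 = 98, which falls in the top band.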

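Excel report structure

Data is exported to a single sheet ("发票汇总", Invoice Summary) designed as a concise, at-a-glance overview, with the following column headers:

Page number (页码)
Invoice type (发票类型)
Invoice number (发票号码)
Issue date (开票日期)
Seller (销售方)
Buyer (购买方)
Total including tax (价税合计)
Quality assessment (质量评估): shows the red/yellow/green indicator and rating directly, e.g. "🟢 优秀" (Excellent)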
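
One way to assemble the sheet is to build plain rows in the column order above and hand them to the spreadsheet writer. In this sketch the header strings come from the list above, while the record keys and the `build_rows` helper are hypothetical.

```python
from typing import Iterable, List

# (header, record key) pairs in output order; the keys are assumptions.
COLUMNS = [
    ("页码", "page"),
    ("发票类型", "invoice_type"),
    ("发票号码", "invoice_number"),
    ("开票日期", "invoice_date"),
    ("销售方", "seller"),
    ("购买方", "buyer"),
    ("价税合计", "total_amount"),
    ("质量评估", "quality"),
]


def build_rows(records: Iterable[dict]) -> List[list]:
    """Return the header row followed by one row per record; missing values become ''."""
    rows = [[header for header, _ in COLUMNS]]
    for record in records:
        rows.append([record.get(key, "") for _, key in COLUMNS])
    return rows
```

With openpyxl, each row could then be passed to `Worksheet.append` on a sheet titled "发票汇总" before saving the workbook.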

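Edge-case handling

Scenario	Automatic handling
Mixed multi-page PDF	Pages are rendered separately with PyMuPDF, processed independently, and merged into one Excel file
Oversized image	Automatically cropped so the aspect ratio does not exceed 3:1, then JPEG quality is lowered step by step until the Baidu API limits are met
API call failure	Retried 3 times with stepped delays; on final failure, "识别失败" (recognition failed) is recorded in Excel and processing continues with the next page
Non-invoice / blank page	`has_invoice` is set to False, no API call is spent, and the page is recorded as such in Excel
Fields with noise characters or spaces	Extracted invoice numbers and tax IDs are cleaned in code with `.strip()` and `.replace(" ", "")`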
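
Taken together, the table describes a per-page loop that never aborts the whole run. A minimal sketch under assumptions: `process_pages` and `clean_id` are hypothetical names, detection and extraction are injected as callables, and the status strings other than "识别失败" are invented for illustration.

```python
from typing import Callable, Iterable, List


def clean_id(value: str) -> str:
    """Remove surrounding whitespace and inner spaces from invoice numbers or tax IDs."""
    return value.strip().replace(" ", "")


def process_pages(pages: Iterable[bytes],
                  detect: Callable[[bytes], bool],
                  extract: Callable[[bytes], dict]) -> List[dict]:
    """Process each page independently; record skips and failures, then continue."""
    results = []
    for page_no, image in enumerate(pages, start=1):
        if not detect(image):
            # Non-invoice or blank page: no API call is spent.
            results.append({"page": page_no, "has_invoice": False, "status": "skipped"})
            continue
        try:
            fields = extract(image)
        except Exception:
            # Final failure after retries: record it and move on to the next page.
            results.append({"page": page_no, "has_invoice": True, "status": "识别失败"})
            continue
        fields["invoice_number"] = clean_id(fields.get("invoice_number", ""))
        results.append({"page": page_no, "has_invoice": True, "status": "ok", **fields})
    return results
```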

Security recommendations

This skill appears to implement what it claims, but there are a few important things to check before installing or running it:
- Metadata mismatch: The registry lists no required env vars, but the script and SKILL.md both require BAIDU_API_KEY and BAIDU_SECRET_KEY in a .env file. Do not provide credentials until you confirm where and how they will be stored.
- Data exposure: The script uploads image data to Baidu OCR endpoints (aip.baidubce.com). Invoices contain sensitive personal and financial data, so ensure you have authorization and that sending this data to Baidu complies with your privacy, regulatory, and corporate policies.
- Secrets handling: The included scripts/.env is a template. Avoid committing real keys to source control. Prefer environment-specific secret storage, rotate keys regularly, and use least-privilege API credentials if supported.
- Run in an isolated environment: Execute the script in a controlled environment (local VM, container, or isolated workspace) so it cannot accidentally read other files. Review the full invoice_ocr_main.py file yourself (or with a security colleague) to confirm there are no unexpected network endpoints or hidden behaviors in the truncated section.
- Test with non-sensitive samples first: Validate functionality using synthetic or redacted invoices to confirm behavior and outputs before processing real data.
- Ask the publisher to fix metadata: Request that the skill's registry metadata explicitly list BAIDU_API_KEY and BAIDU_SECRET_KEY as required env vars and describe data flows (which endpoints it contacts). This improves transparency and trust.

Functional analysis

Type: OpenClaw Skill
Name: invoice-ocr
Version: 1.0.0

The skill is a legitimate implementation for VAT invoice OCR using the Baidu Cloud API. It processes PDF and image files, extracts structured data, and exports results to Excel. The script (invoice_ocr_main.py) correctly uses environment variables for API credentials and communicates only with official Baidu endpoints (aip.baidubce.com). No evidence of malicious behavior, data exfiltration, or harmful instructions was found.

Capability assessment

ℹ Purpose & Capability

The skill's name/description (VAT invoice OCR via Baidu) aligns with the code and SKILL.md: the script converts PDF/images to JPEG, calls Baidu VAT OCR and falls back to general OCR, and writes Excel results. That capability legitimately requires Baidu API credentials and image-processing libraries. However, the registry metadata claims 'Required env vars: none' while both SKILL.md and the script require BAIDU_API_KEY and BAIDU_SECRET_KEY — a clear metadata omission.

✓ Instruction Scope

SKILL.md instructions are narrowly scoped to the stated task: check for .env with Baidu keys (or ask user), install Python deps, render PDF pages, call Baidu OCR, evaluate results, and write an Excel. The instructions explicitly require user-provided credentials and do not instruct broad system reconnaissance. They do instruct installing pip packages and reading the input file and local .env.

ℹ Install Mechanism

No install spec is provided (instruction-only), but a Python script is included. The runtime requires pip installing commonly used libraries (pymupdf, openpyxl, requests, Pillow, python-dotenv). No external arbitrary downloads or obscure installers are used. The absence of an install spec in registry metadata is an operational omission but not an immediate danger.

⚠ Credentials

The code and SKILL.md require BAIDU_API_KEY and BAIDU_SECRET_KEY (via a .env file), which is appropriate for calling Baidu OCR. The concern is that the skill registry metadata does not declare these required environment variables, so an install could be attempted without the user realizing credentials are necessary. Also note that providing these credentials permits the script to send invoice images (sensitive PII) to Baidu's servers — this is expected for a cloud OCR integration but has privacy/consent implications.

✓ Persistence & Privilege

The skill does not request always:true and does not modify other skills or global agent configuration. It runs as a normal, user-invoked/autonomously-invokable skill without elevated or persistent system privileges.

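Version history

v1.0.0

- First release of the VAT invoice OCR skill, supporting PDF (single- and multi-page) and mainstream image formats.
- Calls the Baidu Cloud VAT invoice OCR API automatically in a three-stage flow: pre-detection, structured extraction, and quality scoring.
- Supports batch processing and multi-page invoices, and outputs a clear Excel summary report.
- Integrates automatic dependency installation, API fallback and retry, quality rating, and fault tolerance.
- Output includes detailed fields, processing status, and recognition quality, covering a range of invoice recognition needs.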

Metadata

Slug invoice-ocr

Version 1.0.0

License MIT-0

Total installs 1

Current installs 1

Number of released versions 1

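FAQ

What is Invoice Content Recognition (发票内容识别)?

VAT invoice recognition skill: automatically recognizes invoices in PDF (single- or multi-page) or common image formats (PNG, JPG, etc.), calls the Baidu Cloud VAT invoice OCR API to extract key information, and outputs a structured Excel report. It applies in the following scenarios: a user uploads an invoice file and asks to recognize, extract, or convert its information; invoices need to be batch-processed into an Excel summary sheet; invoices need detection, content... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 340 cumulative downloads to date.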

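How do I install Invoice Content Recognition?

Run the command "/install invoice-ocr" in the OpenClaw or Claude Code chat box to install it in one step. Installation itself needs no extra configuration, but the skill requires BAIDU_API_KEY and BAIDU_SECRET_KEY in a .env file before it can run.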

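Is Invoice Content Recognition free?

Yes. Invoice Content Recognition is completely free, is released under the MIT-0 license, and can be freely downloaded, installed, and used.

Which platforms does Invoice Content Recognition support?

Invoice Content Recognition is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Invoice Content Recognition?

It is developed and maintained by pengyulong (@pengyulong); the current version is v1.0.0.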
