alan5168

Fapiao Clipper

by Alan5168 · GitHub · v1.5.2 · MIT-0
cross-platform · ⚠ suspicious
143 Downloads · 0 Stars · 1 Active Installs · 6 Versions
Install in OpenClaw
/install fapiao-clipper
Description
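Fapiao Clipper (发票夹子) v1.4: a local-LLM-driven tool for automatic invoice recognition and reimbursement management. Two-level fallback chain: PyMuPDF text extraction (with a cross-line matching fix) → Qwen3-VL vision model. New in this release: a seller/buyer cross-line matching fix and date normalization. Features: 8 risk-control verification checks, one-click Excel export and merged PDF output.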
README (SKILL.md)

发票夹子 (Invoice Clipper) v1.3

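A pure-Python CLI tool that works on any agent platform, including OpenClaw, Claude Code and KimiClaw.

v1.3 major update

Architecture simplified to two levels (2026-04-03):

  • Level 1: PyMuPDF text extraction (with a cross-line matching fix)
  • Level 2: Qwen3-VL vision model (fallback)
  • Removed GLM-OCR (unstable) and TurboQuant (not enabled)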

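Design philosophy

Invoice → drop it into the folder
      ↓
PDF text extraction (choice of two engines)
      ↓ falls through to Level 2 only if the text can't be read
Vision model (triggered only for scanned documents)
      ↓
Stored in the SQLite database
      ↓
Agent reads the database directly to answer questions ← no extra LLM API tokens consumed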
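The fall-through logic in this flow can be sketched in Python. This is a minimal illustration, not the project's code. `extract`, `looks_like_invoice` and the digit-run heuristic are hypothetical. The heuristic loosely mirrors the v1.5.1 rule that valid content must contain an invoice number and the word 发票. Only `pymupdf_text` calls a real library (PyMuPDF's `fitz.open` and `Page.get_text`).

```python
import re
from typing import Callable, List, Optional, Tuple

# An extractor takes a file path and returns the text it found, or None.
Extractor = Callable[[str], Optional[str]]

# Hypothetical "is this really an invoice?" heuristic: the word 发票 plus a
# run of 8-20 digits that could be the invoice number.
INVOICE_NO = re.compile(r"\d{8,20}")


def looks_like_invoice(text: str) -> bool:
    return "发票" in text and INVOICE_NO.search(text) is not None


def pymupdf_text(path: str) -> Optional[str]:
    """Level 1: read the embedded text layer of a searchable PDF with PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so this module loads without it

    with fitz.open(path) as doc:
        text = "".join(page.get_text() for page in doc)
    return text or None  # scanned PDFs have no text layer and yield ""


def extract(
    path: str, levels: List[Tuple[str, Extractor]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each level in order; stop at the first that yields invoice-like text."""
    for name, extractor in levels:
        try:
            text = extractor(path)
        except Exception:  # e.g. an unreadable file: move on to the next level
            text = None
        if text and looks_like_invoice(text):
            return name, text
    return None, None  # no level could read the file
```

To wire it up, pass something like `[("pymupdf", pymupdf_text), ("qwen3-vl", vision_fn)]`, where `vision_fn` is whatever wraps your local vision model. The sketch validates the text instead of only checking that it is non-empty. That way, a PDF with a garbled text layer still falls through to Level 2.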

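Two-level recognition chain (v1.3)

Level   | Engine          | Trigger                    | Notes
Level 1 | PyMuPDF         | Searchable PDF (default)   | Milliseconds, no Java required
Level 2 | Ollama Qwen3-VL | Images / scanned documents | ~6.1 GB of memory

Most invoices are handled at Level 1, at zero cost.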
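For Level 2, here is a sketch of sending one rendered page to a local Ollama server using only the standard library plus PyMuPDF. It makes three assumptions: Ollama listens on its default port 11434, the model tag is `qwen3-vl` (check `ollama list`), and the prompt is yours to choose. The 300 DPI rendering and 300 s timeout echo the v1.5.1 release notes. The code is not taken from the project's source.

```python
import base64
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
VISION_MODEL = "qwen3-vl"  # assumed tag; use the one `ollama list` shows


def render_pages_png(pdf_path: str, dpi: int = 300) -> List[bytes]:
    """Rasterize every page to PNG bytes for the vision model."""
    import fitz  # PyMuPDF; imported lazily

    with fitz.open(pdf_path) as doc:
        return [page.get_pixmap(dpi=dpi).tobytes("png") for page in doc]


def build_request(png: bytes, prompt: str) -> urllib.request.Request:
    """A non-streaming /api/generate call with one base64-encoded image attached."""
    payload = {
        "model": VISION_MODEL,
        "prompt": prompt,
        "images": [base64.b64encode(png).decode("ascii")],
        "stream": False,  # ask for a single JSON object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_vision_model(png: bytes, prompt: str, timeout: float = 300) -> str:
    """Send one page and return the model's text answer (generous timeout for CPU inference)."""
    with urllib.request.urlopen(build_request(png, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```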

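Database (read directly by the agent)

Processed invoices are stored in ~/Documents/发票夹子/invoices.db (SQLite).

The agent can answer natural-language questions by reading the database directly, for example:

  • "Which invoices did I receive this month?"
  • "Are there any invoices older than 365 days?"
  • "Are there any invoices from company XX?"

No additional LLM API calls are needed; the agent reads the database within its own context.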
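Here is a sketch of how the three example questions might translate into SQL, using Python's built-in `sqlite3`. The `invoices` table and its columns are hypothetical stand-ins. Inspect the real file, for example with `sqlite3 invoices.db .schema`, before relying on any name. The date comparisons also assume `issue_date` holds normalized ISO `YYYY-MM-DD` strings. The v1.3 date normalization suggests this but does not guarantee it.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema for illustration only.
SCHEMA = """CREATE TABLE IF NOT EXISTS invoices (
    id INTEGER PRIMARY KEY,
    seller TEXT,
    amount REAL,
    issue_date TEXT,          -- assumed normalized ISO date, e.g. '2026-03-15'
    excluded INTEGER DEFAULT 0
)"""


def this_month(conn: sqlite3.Connection, today: date) -> list:
    """'Which invoices did I receive this month?'"""
    start = today.replace(day=1)
    nxt = date(start.year + (start.month == 12), start.month % 12 + 1, 1)  # 1st of next month
    return conn.execute(
        "SELECT id, seller, amount, issue_date FROM invoices"
        " WHERE issue_date >= ? AND issue_date < ? ORDER BY issue_date",
        (start.isoformat(), nxt.isoformat()),
    ).fetchall()


def older_than(conn: sqlite3.Connection, today: date, days: int = 365) -> list:
    """'Are there any invoices older than 365 days?'"""
    cutoff = (today - timedelta(days=days)).isoformat()
    return conn.execute(
        "SELECT id, seller, issue_date FROM invoices WHERE issue_date < ? ORDER BY issue_date",
        (cutoff,),
    ).fetchall()


def from_seller(conn: sqlite3.Connection, name: str) -> list:
    """'Are there any invoices from company XX?' (substring match on the seller)"""
    return conn.execute(
        "SELECT id, seller, amount FROM invoices WHERE seller LIKE ? ORDER BY id",
        (f"%{name}%",),
    ).fetchall()
```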

Command quick reference

User intent                | Command
Scan invoices              | python3 {baseDir}/main.py scan
List invoices              | python3 {baseDir}/main.py list
Query by date              | python3 {baseDir}/main.py query --from 2026-03-01 --to 2026-03-31
Exclude from reimbursement | python3 {baseDir}/main.py exclude <ID>
Restore for reimbursement  | python3 {baseDir}/main.py include <ID>
Export for reimbursement   | python3 {baseDir}/main.py export --from 2026-03-01 --to 2026-03-31 --format both
Batch verification         | python3 {baseDir}/main.py verify
View problem invoices      | python3 {baseDir}/main.py problems
Sync blacklist             | python3 {baseDir}/main.py blacklist-sync
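Judging by the examples, which end March on 2026-03-31, `query` and `export` take an inclusive `--from`/`--to` range. A small helper can compute a calendar month's bounds and build the argument list. `month_range`, `export_command` and `run_export` are hypothetical names. The sketch uses `sys.executable` in place of `python3` so it runs under the current interpreter.

```python
import calendar
import subprocess
import sys
from datetime import date
from typing import List, Tuple


def month_range(year: int, month: int) -> Tuple[str, str]:
    """First and last day of a month as ISO strings (inclusive bounds)."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1).isoformat(), date(year, month, last_day).isoformat()


def export_command(base_dir: str, year: int, month: int) -> List[str]:
    """argv for: main.py export --from <first> --to <last> --format both"""
    start, end = month_range(year, month)
    return [sys.executable, f"{base_dir}/main.py", "export",
            "--from", start, "--to", end, "--format", "both"]


def run_export(base_dir: str, year: int, month: int) -> subprocess.CompletedProcess:
    """Run the export; check=True raises if main.py exits non-zero."""
    return subprocess.run(export_command(base_dir, year, month), check=True)
```

Passing a list rather than a shell string avoids quoting issues if `{baseDir}` contains spaces or non-ASCII characters.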

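Intent recognition rules

User says                                                    | Command run
"Scan invoices" (扫描发票) / "Tidy up my mailbox" (整理邮箱) | scan
"This month's invoices" (本月发票) / "List all" (列出所有)   | list
"Invoices from merchant XX" (XX商家发票)                     | query --seller XX
"Export for reimbursement" (导出报销)                        | export --from ... --to ... --format both
"Don't reimburse #3" (不要报销#3那张)                        | exclude 3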

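One naive way to implement the table above is ordered substring and regex matching on the original Chinese trigger phrases. This is a hypothetical sketch, not how the skill or the agent actually interprets requests. The export date range stays elided, exactly as in the table.

```python
import re
from typing import List, Optional

# Trigger phrases from the intent table (original Chinese); matching is naive.
_EXCLUDE = re.compile(r"不要报销\s*#\s*(\d+)")  # "don't reimburse #N"
_SELLER = re.compile(r"(.+?)商家发票")          # "<merchant> invoices"


def route(utterance: str) -> Optional[List[str]]:
    """Map a user utterance to main.py arguments, or None if nothing matches."""
    m = _EXCLUDE.search(utterance)
    if m:
        return ["exclude", m.group(1)]
    m = _SELLER.search(utterance)
    if m:
        return ["query", "--seller", m.group(1).strip()]
    if "导出报销" in utterance:
        # The --from/--to range is left to the caller, as in the table.
        return ["export", "--format", "both"]
    if any(k in utterance for k in ("扫描发票", "整理邮箱")):
        return ["scan"]
    if any(k in utterance for k in ("本月发票", "列出所有")):
        return ["list"]
    return None
```

More specific patterns are checked first, so a request that names a merchant is not swallowed by a generic keyword.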
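Using it on agent platforms

Zero configuration (recommended for first-time use)

Don't want to edit YAML? Run the interactive wizard and answer a few questions: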

python3 {baseDir}/setup_config.py

Installation

git clone https://github.com/Alan5168/fapiao-clipper.git
cd fapiao-clipper
pip install -r requirements.txt
cp config/config.yaml.template config/config.yaml

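Notes

  • Original files are never deleted; exclude only marks an invoice.
  • Invoices are valid for 365 days by default (configurable).
  • With OpenClaw/Claude Code: once Level 1 succeeds, the agent reads the database directly without consuming API tokens.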
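The first two notes imply that reimbursement eligibility is a filter over flagged records, not a deletion. The sketch below assumes a hypothetical `Invoice` record. It also treats an invoice exactly 365 days old as still valid; that boundary is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Invoice:
    """Hypothetical record shape; the real DB columns may differ."""
    id: int
    issue_date: date
    excluded: bool = False  # toggled by `exclude <ID>` / `include <ID>`; the file is kept


def is_reimbursable(inv: Invoice, today: date, validity_days: int = 365) -> bool:
    """Excluded invoices are only flagged, so filtering happens at query time."""
    if inv.excluded:
        return False
    # Counting an invoice exactly `validity_days` old as valid is an assumption.
    return (today - inv.issue_date).days <= validity_days
```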
Usage Guidance
This repository appears to implement exactly what it claims: a local invoice OCR and reimbursement helper. Before installing or running it, consider the following:

  • Credentials/config: The email watcher expects an IMAP username and password in config/config.yaml. These are stored in plaintext in that file unless you take other measures. Restrict file permissions (chmod 600) and keep the config out of backups if you don't want the credentials stored elsewhere.
  • Network I/O: The email component downloads attachments and follows links found in email HTML (including re-posting form actions) to retrieve PDFs. This is necessary for auto-download, but it increases the risk of fetching malicious content embedded in emails. If you enable mail scanning, run it on a trusted machine or in an isolated environment.
  • Local services: The OCR fallback uses a local Ollama model (Qwen3-VL) or optional cloud providers. If you use a cloud provider (siliconflow etc.), you will need to supply an API key. Review those settings in config.yaml and requirements.txt before enabling them.
  • Privacy claims: The project advertises "zero upload", but the code downloads from tax.gov for blacklist/verification and clicks the tax bureau's check link, so verification likely involves querying public tax-check endpoints. Review verifier.py (not shown in full in the bundle) to confirm that it only queries public verification endpoints and does not post invoice contents to third-party services.
  • Exposed interfaces: The README documents options for exposing the Web UI (Tailscale/frp, or running in Docker). If you enable remote access, secure it (VPN/Tailscale, firewall rules), because the Web UI can read the local invoice DB and exports.
  • Dependency audit: Inspect requirements.txt and vet the dependencies before running pip install. Consider installing into a dedicated virtualenv or container.
  • Least privilege: If you only need local PDF/image processing (no mail auto-fetch), leave email.enabled=false and run manual scans to reduce network exposure.

If you want deeper analysis, provide the full verifier.py and the complete requirements.txt so I can check whether any dependency or verification code sends invoice data to third-party endpoints beyond the stated tax-check/blacklist lookups.
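The chmod 600 advice can also be applied from Python, for example in a setup script. This is a minimal POSIX-only sketch. The function name is hypothetical, and the default path follows the install steps above. On Windows, `os.chmod` can only toggle the read-only flag, so this does not protect the file there.

```python
import os
import stat


def tighten_config_permissions(path: str = "config/config.yaml") -> bool:
    """Make the config (which holds the IMAP password) owner read/write only.

    Returns True if the permissions were changed, False if already 0o600.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current == 0o600:
        return False
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to `chmod 600`
    return True
```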
Capability Analysis
Type: OpenClaw Skill · Name: fapiao-clipper · Version: 1.5.2. The skill bundle provides a comprehensive invoice management system with several high-risk automated features. Specifically, `email_watcher.py` automatically identifies, follows, and downloads files from URLs found in email bodies, which presents a significant SSRF (Server-Side Request Forgery) risk and a vector for downloading malicious payloads. The Web UI in `app.py` utilizes `unsafe_allow_html=True` to render data extracted from invoices (such as seller names), posing a potential Cross-Site Scripting (XSS) risk if invoice content is maliciously crafted. While these capabilities are aligned with the stated purpose of automated invoice collection and verification, the combination of automated network activity, IMAP credential handling, and local file system manipulation warrants a suspicious classification.
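For the XSS concern, the usual mitigation is to escape any invoice-derived string before embedding it in HTML rendered with `unsafe_allow_html=True`. Here is a sketch using the standard library's `html.escape`. The `seller_cell` helper is hypothetical. The commented usage assumes `app.py` is a Streamlit app, where `st.markdown(..., unsafe_allow_html=True)` is the usual call site.

```python
import html


def seller_cell(seller: str) -> str:
    """Wrap an invoice-derived string in HTML with <, >, &, and quotes escaped."""
    return f'<span class="seller">{html.escape(seller)}</span>'

# Assumed usage in a Streamlit page:
# st.markdown(seller_cell(row["seller"]), unsafe_allow_html=True)
```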
Capability Assessment
Purpose & Capability
Name/description (local invoice OCR, verification, export) matches the code and files: PDF/OFD handling, PyMuPDF extractor, optional local Ollama Qwen3-VL, SQLite DB, email downloader, blacklist sync and tax-check interactions. Required binary is only python3 and no unrelated cloud credentials are demanded in the skill metadata. The components present are appropriate for the stated purpose.
Instruction Scope
Runtime instructions are limited to cloning the repo, pip installing requirements, configuring config.yaml, and running CLI/web UI commands. The code will read/write files under the user-specified storage path (default ~/Documents/发票夹子) and lets the agent read the SQLite DB. The email watcher will log in to the user's IMAP account, download attachments, extract links from email HTML and follow those links (including re-requesting forms) to retrieve PDFs — behavior needed for 'auto fetch invoices' but it means the skill fetches external URLs and writes downloaded payloads locally. This is within scope but worth noting as an I/O/network surface that can pull arbitrary remote content if present in mail.
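If mail scanning is enabled, one way to narrow the link-following surface is to vet each URL before fetching it. Below is a hypothetical pre-check, not part of `email_watcher.py`. It rejects non-HTTP(S) schemes, `localhost`, and literal IP addresses that are not globally routable. Domain names pass this check, so a real guard would also have to verify the address each name resolves to at connection time.

```python
import ipaddress
from urllib.parse import urlsplit


def is_safe_download_url(url: str) -> bool:
    """Cheap SSRF pre-check for links found in email HTML (not a complete defense)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False  # blocks file:, ftp:, data:, and malformed URLs
    host = parts.hostname  # already lowercased, IPv6 brackets stripped
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a domain name: its resolved address still needs checking
    return ip.is_global  # rejects loopback, private, link-local (e.g. cloud metadata)
```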
Install Mechanism
No automated install spec is embedded in the skill metadata (instruction-only), but SKILL.md / README instruct cloning from the GitHub homepage and pip installing requirements.txt. That is a normal install path. The repo includes executable Python code (not just prose), so installing and running will execute that code locally. No suspicious remote binary downloads or URL-shortener installs are used in the provided install instructions; Docker compose references local services (Ollama) and optional env vars.
Credentials
The skill declares no required env vars in metadata, which aligns with shipping a config file-based tool. Operationally, the tool requires IMAP credentials (username/password) in config to enable mailbox scanning, and may require local Ollama or optional third-party API keys if you choose those providers (config example shows siliconflow.api_key, docker-compose shows DASHSCOPE_API_KEY and OLLAMA_BASE_URL). These credential needs match the features (email scanning, local vision model, optional cloud provider) and are not excessive, but the user must supply them in plaintext config.yaml — treat those credentials as sensitive and protect config file permissions.
Persistence & Privilege
Skill is not force-installed (always: false) and does not request to modify other skills or system-wide agent settings. It stores data locally (SQLite DB, inbox directory, exports) in the user-specified storage path. Allowing the agent to read the DB is intentional for answering invoice queries; autonomous invocation is allowed by platform default but is not combined with additional privileged flags here.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install fapiao-clipper
  3. After installation, invoke the skill by name or use /fapiao-clipper
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.5.2
v1.5.2: invoice content validation + OFD Chinese-path fix + PDF scattered-character extraction + qwen3-vl timeout 300 s + DPI 300
v1.5.1
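v1.5.1 (2026-04-08): invoice content validation (content must include an invoice number and the word 发票, "invoice") + OFD Chinese-path fix + fix for extracting scattered characters from PDFs + qwen3-vl timeout extended to 300 s + DPI raised to 300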
v1.5.0
- Added web UI preview documentation and new utility scripts (app.py, update_readme.py). - Introduced pyproject.toml for improved packaging and dependency management. - Updated skill version to 1.4.0 and refreshed documentation. - Made multiple code updates across core modules for stability and feature expansion.
v1.3.1
No file changes detected for version 1.3.1; this is a version bump only, with no code, documentation, feature or bug-fix changes.
v1.3.0
fapiao-clipper v1.3.0 - Major update: Architecture simplified to a 2-level extraction chain (PyMuPDF → Qwen3-VL). - Added seller/buyer cross-line matching fix for more reliable invoice recognition. - Implemented date normalization to standardize date fields. - Removed GLM-OCR and TurboQuant support for clarity and stability. - Updated docs and metadata to reflect changes.
v1.0.0
Initial release
Metadata
Slug fapiao-clipper
Version 1.5.2
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 6
Frequently Asked Questions

What is Fapiao Clipper?

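Fapiao Clipper (发票夹子) v1.4 is a local-LLM-driven tool for automatic invoice recognition and reimbursement management. It uses a two-level fallback chain: PyMuPDF text extraction (with a cross-line matching fix) → Qwen3-VL vision model. New in this release: a seller/buyer cross-line matching fix and date normalization. Features: 8 risk-control verification checks, one-click Excel export and merged PDF output. It is an AI agent skill for Claude Code / OpenClaw, with 143 downloads so far.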

How do I install Fapiao Clipper?

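Run "/install fapiao-clipper" in the OpenClaw or Claude Code chat to install the skill in one step. The README's installation steps (cloning the repo, pip install -r requirements.txt and creating config/config.yaml) may still be needed before the tool itself runs.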

Is Fapiao Clipper free?

Yes, Fapiao Clipper is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Fapiao Clipper support?

Fapiao Clipper is cross-platform and runs anywhere OpenClaw / Claude Code is available; the tool itself only requires python3.

Who created Fapiao Clipper?

It is built and maintained by Alan5168 (@alan5168); the current version is v1.5.2.
