
Ca File Processor

Name: Ca File Processor
Author: purvik6062

By purvik6062 · GitHub · v1.0.3 · MIT-0

cross-platform ✓ Security check passed

Total downloads: 141

Install in OpenClaw

/install ca-file-processor

Description

Process financial documents for Indian CA firms. Use when any PDF, Excel (.xlsx/.xls), CSV, JPG, or PNG file is received or uploaded — including GST returns,...

Usage instructions (SKILL.md)

CA File Processor

This skill processes the four most common file formats used by Indian CA firms and extracts structured information from them for analysis, summarisation, and answering queries.

Supported formats

PDF — GST returns, ITR acknowledgements, audit reports, scanned invoices (text-layer and scanned via OCR)
Excel (.xlsx / .xls) — Trial balance, P&L, balance sheets, payroll registers, GST workings
CSV — Bank statement exports (HDFC, ICICI, SBI), GSTR-2B downloads, Tally exports
Images (.jpg / .png) — WhatsApp invoice photos, scanned Form 16, cheque images

How to use

When a file is attached or uploaded, run the appropriate script:

python3 scripts/skill_router.py <file_path>

The router auto-detects the file type and calls the correct processor. It returns a structured JSON dict.
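The routing step described above can be sketched as a lookup from file extension to processor. This is only an illustration: the `PROCESSORS` table, the `route` function, and the shape of the returned dict are assumptions, and the real `scripts/skill_router.py` may detect file types differently (for example by inspecting file contents).

```python
import json
from pathlib import Path

# Hypothetical mapping from file extension to processor name.
# The actual router's detection logic is not shown in this document.
PROCESSORS = {
    ".pdf": "pdf",
    ".xlsx": "excel",
    ".xls": "excel",
    ".csv": "csv",
    ".jpg": "image",
    ".png": "image",
}


def route(file_path: str) -> dict:
    """Pick a processor from the file extension and report the choice."""
    suffix = Path(file_path).suffix.lower()  # case-insensitive match
    processor = PROCESSORS.get(suffix)
    if processor is None:
        return {"file": file_path, "status": "unsupported", "processor": None}
    return {"file": file_path, "status": "ok", "processor": processor}


if __name__ == "__main__":
    # Example path is made up for illustration.
    print(json.dumps(route("uploads/bank_statement.CSV"), indent=2))
```

Returning an explicit "unsupported" status rather than raising keeps the output a JSON-serialisable dict in every case, which is easier for a calling agent to handle.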

What to do with the output

Once the script returns output, use it to:

Answer the user's question about the document
Extract specific fields they asked for (GSTIN, totals, dates)
Summarise the document in plain language
Flag anomalies or missing information
Compare figures across multiple documents

Field extraction — what gets detected automatically

For invoices and PDFs:

GSTIN (supplier and recipient)
Invoice number and date
Total amount / grand total
PAN number
Email and phone
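As a rough sketch of how identifiers like these can be pulled from extracted text, the snippet below matches the published GSTIN and PAN layouts with regular expressions. The function name `extract_ids` and the sample text are made up for illustration; the skill's own patterns are not shown in this document and may differ. The patterns check the layout only, not the GSTIN check character.

```python
import re

# GSTIN layout: 2-digit state code + 10-character PAN + entity code + "Z" + check character.
GSTIN_RE = re.compile(r"\b\d{2}[A-Z]{5}\d{4}[A-Z][1-9A-Z]Z[0-9A-Z]\b")
# PAN layout: 5 letters + 4 digits + 1 letter.
PAN_RE = re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b")


def extract_ids(text: str) -> dict:
    """Return all GSTIN-shaped and PAN-shaped tokens found in the text."""
    return {
        "gstin": GSTIN_RE.findall(text),
        # The word boundaries stop the PAN embedded inside a GSTIN
        # from being reported a second time as a standalone PAN.
        "pan": PAN_RE.findall(text),
    }


if __name__ == "__main__":
    sample = "Supplier GSTIN: 27AAPFU0939F1ZV, PAN: ABCDE1234F"
    print(extract_ids(sample))
```

When supplier and recipient GSTINs both appear, deciding which is which needs surrounding labels or position on the page; this sketch only lists the matches in order of appearance.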

For bank statements (CSV):

Total debits and credits
Date range of transactions
Detected bank format
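The totals and date range above could be computed along the following lines. This sketch uses only the standard library and assumes hypothetical column names (`Date`, `Debit`, `Credit`) and a `DD/MM/YYYY` date format; real exports from HDFC, ICICI, or SBI use their own headers, which is presumably why the skill detects the bank format first.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal


def summarise_statement(csv_text: str) -> dict:
    """Total the debit and credit columns and find the transaction date range."""
    debit_total = Decimal("0")
    credit_total = Decimal("0")
    dates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Column names and date format are assumptions for this example.
        dates.append(datetime.strptime(row["Date"], "%d/%m/%Y").date())
        # Strip thousands separators; treat empty cells as zero.
        debit_total += Decimal(row["Debit"].replace(",", "") or "0")
        credit_total += Decimal(row["Credit"].replace(",", "") or "0")
    return {
        "total_debits": str(debit_total),
        "total_credits": str(credit_total),
        "date_from": min(dates).isoformat() if dates else None,
        "date_to": max(dates).isoformat() if dates else None,
    }


SAMPLE = """Date,Narration,Debit,Credit
01/04/2024,Opening deposit,,"50,000.00"
03/04/2024,Office rent,"12,500.00",
15/04/2024,Client receipt,,8250.50
"""

if __name__ == "__main__":
    print(summarise_statement(SAMPLE))
```

`Decimal` is used instead of `float` so that rupee amounts add up without binary rounding error.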

For Excel files:

Document type (trial balance / P&L / balance sheet / payroll / GST workings / ledger)
Sheet names and row counts
Preview of header rows
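One simple way to guess a document type from sheet names and header rows is keyword matching, sketched below. The keyword lists and the `guess_document_type` function are hypothetical; the skill's actual detection rules are not described in this document.

```python
# Hypothetical keyword lists, checked in this order; first match wins.
KEYWORDS = {
    "trial balance": ["trial balance"],
    "P&L": ["profit and loss", "profit & loss", "p&l"],
    "balance sheet": ["balance sheet"],
    "payroll": ["payroll", "salary", "basic pay"],
    "GST workings": ["gstr", "igst", "cgst", "sgst"],
    "ledger": ["ledger"],
}


def guess_document_type(sheet_names, header_cells):
    """Classify a workbook from its sheet names and header-row text."""
    haystack = " ".join([*sheet_names, *header_cells]).lower()
    for doc_type, words in KEYWORDS.items():
        if any(word in haystack for word in words):
            return doc_type
    return "unknown"


if __name__ == "__main__":
    print(guess_document_type(["Sheet1"], ["Trial Balance as at 31-03-2024", "Debit", "Credit"]))
    print(guess_document_type(["Apr"], ["Employee", "Basic Pay", "PF"]))
```

Because the first match wins, more specific types are listed before generic ones such as "ledger", a word that also appears inside other document types.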

OCR notes

Text-layer PDFs are read directly (fast, accurate)
Scanned PDFs and images go through Tesseract OCR (English + Hindi)
Confidence is rated high / medium / low in the output
Always flag low-confidence results to the user and ask for confirmation on numeric fields
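A high / medium / low rating like the one described above can be derived from per-word OCR confidence scores. The sketch below assumes scores on Tesseract's 0 to 100 scale, with -1 marking boxes that contain no recognised text; the thresholds (85 and 60) and the function names are illustrative assumptions, not the skill's actual values.

```python
def rate_confidence(word_confidences, high=85.0, medium=60.0):
    """Map per-word OCR confidences (0-100) to 'high', 'medium' or 'low'."""
    # Drop the -1 entries reported for boxes without recognised text.
    scores = [c for c in word_confidences if c >= 0]
    if not scores:
        return "low"  # nothing recognised: treat as unreliable
    mean = sum(scores) / len(scores)
    if mean >= high:
        return "high"
    if mean >= medium:
        return "medium"
    return "low"


def needs_confirmation(rating: str) -> bool:
    """Low-confidence results should be confirmed with the user."""
    return rating == "low"


if __name__ == "__main__":
    rating = rate_confidence([96, 91, 88, -1])
    print(rating, needs_confirmation(rating))
```

A plain mean is the simplest aggregate; a stricter variant could rate a page by its lowest-scoring numeric field, since a single misread digit matters more than a misread word.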

Trust statement

This skill runs entirely locally on your server. No data is sent to any external service. All processing happens via open-source Python libraries (PyMuPDF, pytesseract, openpyxl, pandas).

Security recommendations

This skill appears coherent and operates locally, but take standard precautions before installing or using it:
1) Install system dependencies (tesseract, poppler) and pip packages in an isolated environment (virtualenv or container).
2) Review and upgrade pinned dependencies for known vulnerabilities.
3) Test on non-sensitive sample files first to confirm behavior.
4) Because it processes sensitive financial documents, run it on a trusted machine or inside a restricted environment to avoid accidental data exposure.
5) The skill returns extracted text and fields, so make sure downstream handling (LLM, logs) is secure and that you do not inadvertently forward sensitive data to external services.

Capability assessment

✓ Purpose & Capability

The name, description, and included scripts (router, PDF, image, Excel, CSV) align with a local CA document processing skill. Required binaries (python3, tesseract) and Python libraries match the declared functionality (OCR and PDF/Excel/CSV parsing).

✓ Instruction Scope

SKILL.md and the scripts only reference local file processing, reading the provided file path and returning structured JSON. There are no instructions to read unrelated system files, environment secrets, or to send data to external endpoints.

ℹ Install Mechanism

No automated install spec is provided (instruction-only), but a requirements.txt and system dependency notes are included. This is reasonable for a local Python skill; user must manually install pip deps and system packages (tesseract, poppler). Pinning of specific package versions is normal but should be reviewed for known CVEs before deployment.

✓ Credentials

The skill requests no environment variables or credentials. It only needs local binaries (tesseract) and reads files provided to it. There are no unexpected secret access patterns.

✓ Persistence & Privilege

The skill sets always: false and uses default invocation settings. It does not attempt to modify other skills or system-wide configs. It runs on demand against supplied files.

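How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install ca-file-processor
Once installed, call the Skill by name or trigger it with /ca-file-processor
Provide the required inputs according to the Skill's parameter description to get structured output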

Version history

v1.0.3

Update skill.md

v1.0.2

Change format

v1.0.1

Change version

v1.0.0

Initial release of CA File Processor.
Supports automated processing of PDF, Excel (.xlsx/.xls), CSV, JPG, and PNG files commonly used by Indian CA firms.
Extracts key fields (GSTIN, invoice number, totals, dates, etc.) and tables from documents.
Auto-detects file type and routes to the correct extraction script.
Includes OCR support for scanned PDFs and images (English + Hindi).
Outputs structured JSON for easy analysis, summarisation, and answering user queries.
All processing is done locally for privacy; no data is sent externally.

Metadata

Slug ca-file-processor

Version 1.0.3

License MIT-0

Total installs 0

Current installs 0

Version count 4

FAQ