# Invoice Extractor

Extract invoice information from images (PNG, JPG) and PDF files, then export to Excel format.

## Capabilities
\r
- Multi-format support: PNG, JPG, JPEG, BMP, TIFF, PDF\r
- High accuracy: Uses Baidu OCR API specialized for invoice recognition\r
- Complete fields: Extracts all invoice fields including buyer/seller info, amounts, items\r
- Excel export: Formatted Excel output with summary and detail sheets\r
- Flexible input: Single file, multiple files, or entire directory processing\r
- Batch processing: Process hundreds of invoices in one command\r
- Preview mode: List files before processing

## Prerequisites
\r
- Baidu Cloud OCR API credentials (free tier: 50,000 requests/day)\r
- Python environment with required packages

## Quick Start

### 1. Setup Baidu OCR

Get API credentials from https://cloud.baidu.com/product/ocr:
- Register/login to Baidu Cloud\r
- Create an application\r
- Get API Key and Secret Key

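
For orientation, Baidu's OCR services are generally used by first exchanging the API Key and Secret Key for an access token. The sketch below only builds that token request URL and sends nothing. The endpoint and parameter names are assumptions based on Baidu's public OAuth documentation and should be verified there; `build_token_url` is a hypothetical helper, not part of this project.

```python
from urllib.parse import urlencode

# Assumed endpoint from Baidu's public OAuth docs; verify before relying on it.
TOKEN_ENDPOINT = "https://aip.baidubce.com/oauth/2.0/token"


def build_token_url(api_key: str, secret_key: str) -> str:
    """Return the URL that would request an access token (hypothetical helper)."""
    query = urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,         # the API Key from the Baidu Cloud console
        "client_secret": secret_key,  # the Secret Key from the Baidu Cloud console
    })
    return f"{TOKEN_ENDPOINT}?{query}"


if __name__ == "__main__":
    # Placeholder values only; no network request is made.
    print(build_token_url("your_api_key_here", "your_secret_key_here"))
```
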
### 2. Configure

Create `config.txt` in the project root:
```
BAIDU_API_KEY=your_api_key_here
BAIDU_SECRET_KEY=your_secret_key_here
```
\r
Or run the setup wizard:\r
```bash\r
python main_baidu.py --setup\r
```\r
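
The `config.txt` format above is plain `KEY=VALUE` lines, so it can be read with a few lines of Python. This is a minimal sketch, not the project's actual `config.py`: `parse_config` and `load_config` are hypothetical names, and skipping `#` comment lines is an assumption.

```python
from pathlib import Path


def parse_config(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments (assumed syntax)."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def load_config(path: str = "config.txt") -> dict[str, str]:
    """Read the file if it exists; return an empty dict otherwise."""
    config_path = Path(path)
    if not config_path.is_file():
        return {}
    return parse_config(config_path.read_text(encoding="utf-8"))
```

Calling `load_config()` from the project root would then return a dictionary with `BAIDU_API_KEY` and `BAIDU_SECRET_KEY` entries, provided the file follows the layout shown above.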
\r
### 3. Run\r
\r
**Process a single file:**\r
```bash\r
python main_baidu.py -f invoice.pdf\r
```\r
\r
**Process multiple files:**\r
```bash\r
python main_baidu.py -f invoice1.pdf -f invoice2.png\r
```\r
\r
**Process entire directory:**\r
```bash\r
python main_baidu.py -i ./fp\r
```\r
\r
**Mixed mode (directory + extra files):**\r
```bash\r
python main_baidu.py -i ./fp -f extra_invoice.pdf\r
```\r
\r
Output is saved to the `output/` directory as an Excel file.
\r
## Workflow\r
\r
```\r
Task Progress:\r
- [ ] Check prerequisites (Baidu API credentials)\r
- [ ] Choose input method (single file / multiple files / directory)\r
- [ ] Scan and collect invoice files\r
- [ ] Preview files (optional with --list)\r
- [ ] Process each file with Baidu OCR\r
- [ ] Parse invoice fields\r
- [ ] Export to Excel\r
- [ ] Verify output\r
```\r
\r
## Input Methods\r
\r
### Single File\r
Process one specific invoice file:\r
```bash\r
python main_baidu.py -f invoice.pdf\r
python main_baidu.py -f "path/to/invoice.png"\r
```\r
\r
### Multiple Files\r
Process several specific files:\r
```bash\r
python main_baidu.py -f file1.pdf -f file2.png -f file3.jpg\r
```\r
\r
### Entire Directory\r
Process all invoice files in a directory (recursive):\r
```bash\r
python main_baidu.py -i ./my_invoices\r
python main_baidu.py -i "/path/to/invoice/folder"\r
```\r
\r
### Mixed Mode\r
Combine directory and individual files:\r
```bash\r
python main_baidu.py -i ./fp -f ./extra/invoice.pdf\r
```\r
\r
### Preview Mode\r
List files without processing:\r
```bash\r
python main_baidu.py -i ./fp --list\r
```\r
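
The input methods above amount to building one list of files from a recursive directory scan plus any individually named files. The following sketch shows one way that collection step could work; it is not taken from `main_baidu.py`. `collect_invoices` is a hypothetical name, and de-duplicating paths and silently skipping unsupported extra files are assumptions.

```python
from pathlib import Path

# Extensions listed under Capabilities.
SUPPORTED = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".pdf"}


def collect_invoices(input_dir=None, extra_files=()):
    """Return a sorted, de-duplicated list of invoice paths (hypothetical helper)."""
    found = set()
    if input_dir is not None:
        for path in Path(input_dir).rglob("*"):  # recursive directory scan
            if path.is_file() and path.suffix.lower() in SUPPORTED:
                found.add(path.resolve())
    for name in extra_files:  # files passed individually, as with -f
        path = Path(name)
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            found.add(path.resolve())
    return sorted(found)


if __name__ == "__main__":
    # Mirrors the preview idea: print what would be processed, then stop.
    for invoice in collect_invoices("./fp"):
        print(invoice)
```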
\r
## Extracted Fields\r
\r
### Basic Information\r
- Invoice code (发票代码)\r
- Invoice number (发票号码)\r
- Invoice date (开票日期)\r
- Invoice type (发票类型)\r
\r
### Buyer Information\r
- Name (购买方名称)\r
- Tax number (纳税人识别号)\r
- Address and phone (地址电话)\r
- Bank account (开户行及账号)\r
\r
### Seller Information\r
- Name (销售方名称)\r
- Tax number (纳税人识别号)\r
- Address and phone (地址电话)\r
- Bank account (开户行及账号)\r
\r
### Amounts\r
- Total amount (合计金额)\r
- Total tax (合计税额)\r
- Amount with tax (价税合计)\r
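
The three amounts are related: the amount with tax (价税合计) is the total amount (合计金额) plus the total tax (合计税额), which gives a cheap sanity check on OCR output. The sketch below assumes the amounts arrive as decimal strings; `amounts_consistent` and its tolerance are hypothetical and not part of the tool.

```python
from decimal import Decimal


def amounts_consistent(total_amount: str, total_tax: str,
                       amount_with_tax: str, tolerance: str = "0.01") -> bool:
    """Check that total amount + total tax matches the tax-inclusive total."""
    expected = Decimal(total_amount) + Decimal(total_tax)
    # Allow a small rounding difference (the tolerance is an assumption).
    return abs(expected - Decimal(amount_with_tax)) <= Decimal(tolerance)


if __name__ == "__main__":
    print(amounts_consistent("100.00", "13.00", "113.00"))
```

`Decimal` is used instead of `float` so that currency values are compared without binary rounding noise.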
\r
### Items\r
- Product name (货物名称)\r
- Specification (规格型号)\r
- Unit (单位)\r
- Quantity (数量)\r
- Unit price (单价)\r
- Amount (金额)\r
- Tax rate (税率)\r
- Tax amount (税额)\r
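
To make the field groups above concrete, here is one way they could be modelled as dataclasses. This is an illustrative sketch only: the class and attribute names are hypothetical and are not taken from `invoice_model.py`, and every value is kept as a string because OCR output is text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Party:
    """Buyer or seller block."""
    name: str = ""
    tax_number: str = ""      # 纳税人识别号
    address_phone: str = ""   # 地址电话
    bank_account: str = ""    # 开户行及账号


@dataclass
class LineItem:
    """One row of the items table."""
    name: str = ""            # 货物名称
    specification: str = ""   # 规格型号
    unit: str = ""            # 单位
    quantity: str = ""        # 数量
    unit_price: str = ""      # 单价
    amount: str = ""          # 金额
    tax_rate: str = ""        # 税率
    tax_amount: str = ""      # 税额


@dataclass
class Invoice:
    code: str = ""            # 发票代码
    number: str = ""          # 发票号码
    date: str = ""            # 开票日期
    invoice_type: str = ""    # 发票类型
    buyer: Party = field(default_factory=Party)
    seller: Party = field(default_factory=Party)
    total_amount: str = ""    # 合计金额
    total_tax: str = ""       # 合计税额
    amount_with_tax: str = "" # 价税合计
    items: List[LineItem] = field(default_factory=list)
```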
\r
## Command Line Options\r
\r
```bash\r
python main_baidu.py [options]\r
\r
Input Options:
  -f FILE, --file FILE    Specify invoice file (can be used multiple times)
  -i DIR, --input DIR     Input directory (default: fp)

Output Options:
  -o DIR, --output DIR    Output directory (default: output)
  -n NAME, --name NAME    Output filename prefix (default: 发票信息)

Authentication Options:
  --api-key KEY           Baidu API Key
  --secret-key KEY        Baidu Secret Key

Other Options:
  --setup                 Run configuration wizard
  --list                  List files to be processed without processing
  -h, --help              Show help
```\r
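
The options listed above map naturally onto `argparse`. The sketch below reconstructs such a parser from the help text alone; it is an assumption about how the interface could be declared, not the actual code in `main_baidu.py`, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="main_baidu.py")
    # action="append" lets -f be given several times.
    parser.add_argument("-f", "--file", action="append", default=[],
                        metavar="FILE", help="invoice file (repeatable)")
    parser.add_argument("-i", "--input", default="fp", metavar="DIR",
                        help="input directory")
    parser.add_argument("-o", "--output", default="output", metavar="DIR",
                        help="output directory")
    parser.add_argument("-n", "--name", default="发票信息", metavar="NAME",
                        help="output filename prefix")
    parser.add_argument("--api-key", metavar="KEY", help="Baidu API Key")
    parser.add_argument("--secret-key", metavar="KEY", help="Baidu Secret Key")
    parser.add_argument("--setup", action="store_true",
                        help="run configuration wizard")
    parser.add_argument("--list", action="store_true",
                        help="list files without processing")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```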
\r
## Usage Examples\r
\r
### Example 1: Single File\r
```bash\r
python main_baidu.py -f "invoice.pdf"\r
```\r
\r
### Example 2: Multiple Files\r
```bash\r
python main_baidu.py -f "1.pdf" -f "2.png" -f "3.jpg"\r
```\r
\r
### Example 3: Entire Directory\r
```bash\r
python main_baidu.py -i "./2024_invoices"\r
```\r
\r
### Example 4: Preview Before Processing\r
```bash\r
python main_baidu.py -i ./fp --list\r
# Then process:\r
python main_baidu.py -i ./fp\r
```\r
\r
### Example 5: Mixed Input\r
```bash\r
python main_baidu.py -i ./fp -f ./urgent/invoice.pdf -o ./output -n "March_2024"\r
```\r
\r
### Example 6: Custom Output\r
```bash\r
python main_baidu.py -i ./fp -o ./reports -n "Q1_Invoice_Summary"\r
```\r
\r
## Project Structure\r
\r
```\r
.
├── fp/                          # Place invoice files here
├── output/                      # Excel output directory
├── src/
│   ├── main_baidu.py            # Main entry point
│   ├── baidu_ocr_extractor.py   # Baidu OCR wrapper
│   ├── invoice_model.py         # Data models
│   ├── excel_exporter.py        # Excel export
│   └── config.py                # Configuration
├── scripts/                     # Utility scripts
│   ├── batch_process.py         # Batch processing helper
│   └── verify_export.py         # Verify Excel export
├── config.txt                   # API credentials
├── requirements.txt             # Dependencies
├── SKILL.md                     # This file
├── setup.md                     # Detailed setup guide
└── examples.md                  # Usage examples
```\r
\r
## Utility Scripts\r
\r
### Batch Processing Helper\r
```bash\r
python scripts/batch_process.py /path/to/invoices\r
```\r
\r
### Verify Export\r
```bash\r
python scripts/verify_export.py output/invoice_info.xlsx\r
```\r
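
What `verify_export.py` actually checks is not described here. As a rough idea of a first-pass check, an `.xlsx` file is a ZIP package that contains a `xl/workbook.xml` part, so a basic structural test needs only the standard library. `looks_like_xlsx` is a hypothetical helper and does not validate sheet contents.

```python
import zipfile


def looks_like_xlsx(path) -> bool:
    """Return True if the file is a ZIP archive containing xl/workbook.xml."""
    # is_zipfile returns False for missing files and non-ZIP data.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return "xl/workbook.xml" in archive.namelist()


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, "OK" if looks_like_xlsx(name) else "not a valid workbook")
```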
\r
## Error Handling\r
\r
Common issues and solutions:\r
\r
**"Baidu OCR authentication failed"**\r
- Check API Key and Secret Key in config.txt\r
- Verify credentials are correct in Baidu Cloud console\r
\r
**"No invoice files found"**\r
- Ensure files are in the specified directory\r
- Check file formats (supported: png, jpg, jpeg, bmp, tiff, pdf)\r
- Use `--list` to see what files are detected\r
\r
**"Image format error"**\r
- PDF files are automatically converted to images\r
- Ensure PDF is not corrupted or password-protected\r
\r
**"File not found"**\r
- Check file path is correct\r
- Use quotes for paths with spaces: `"path/to/file name.pdf"`\r
\r
## Advanced Usage\r
\r
### Environment Variables\r
Set credentials via environment:\r
```bash\r
export BAIDU_API_KEY="your_key"\r
export BAIDU_SECRET_KEY="your_secret"\r
```\r
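
On the Python side, the exported variables can be read from `os.environ`. The sketch below shows one way to fail early with a clear message when a variable is missing; `credentials_from_env` is a hypothetical helper, and how the real tool prioritises environment variables against `config.txt` or the `--api-key` and `--secret-key` options is not specified here.

```python
import os


def credentials_from_env(environ=os.environ):
    """Return (api_key, secret_key), or raise if either variable is missing."""
    api_key = environ.get("BAIDU_API_KEY", "").strip()
    secret_key = environ.get("BAIDU_SECRET_KEY", "").strip()
    missing = [name for name, value in (("BAIDU_API_KEY", api_key),
                                        ("BAIDU_SECRET_KEY", secret_key))
               if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return api_key, secret_key
```

Passing a plain dictionary as `environ` makes the function easy to exercise without touching the real environment.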
\r
### Batch Processing Script\r
Create a script for monthly processing:\r
```bash\r
#!/bin/bash\r
MONTH=$(date +%Y%m)\r
python main_baidu.py \
  -i "/invoices/$MONTH" \
  -o "/reports/$MONTH" \
  -n "Invoice_Report_$MONTH"
```\r
\r
## Additional Resources\r
\r
- For detailed setup instructions, see [setup.md](setup.md)\r
- For more examples, see [examples.md](examples.md)\r
- For API documentation, visit https://cloud.baidu.com/doc/OCR/index.html\r

## Installation

- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: `/install invoice-extractor`
- After installation, invoke the skill by name or use `/invoice-extractor`
- Provide the required inputs per the skill's parameter spec and get structured output

## FAQ

**What is Invoice-Recognition?**
Extract invoice information from images and PDF files using Baidu OCR API, export to Excel. Supports single file, multiple files, or entire directory process... It is an AI Agent Skill for Claude Code / OpenClaw, with 258 downloads so far.

**How do I install Invoice-Recognition?**
Run `/install invoice-extractor` in the OpenClaw or Claude Code chat to install it in one step. Installation itself needs no extra setup, but running the skill still requires the Baidu OCR credentials listed under Prerequisites.

**Is Invoice-Recognition free?**
Yes, Invoice-Recognition is completely free, licensed under MIT-0. You can download, install and use it at no cost.

**Which platforms does Invoice-Recognition support?**
Invoice-Recognition is cross-platform and runs anywhere OpenClaw / Claude Code is available.

**Who created Invoice-Recognition?**
It is built and maintained by aitanjp (@aitanjp); the current version is v1.0.0.