laimiaohua

GI Excel PDF Process

by laimiaohua · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
293 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install gi-excel-pdf-process
Description
Process Excel and PDF files - extract data, parse tables, generate reports. Use when working with .xlsx, .xls, .csv, .pdf files, or when the user mentions sp...
README (SKILL.md)

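Excel / PDF Processing

Process Excel and PDF files: extract data, parse tables, and generate reports. Suitable for data import/export, report generation, document parsing, and similar scenarios.

When to Use

  • The user provides or asks to process .xlsx, .xls, .csv, or .pdf files
  • The user mentions "table", "Excel", "report", "PDF extraction", or "form"
  • Data must be read from a file, or a downloadable file must be generated

Executable scripts: scripts/excel_extract.py (Excel→CSV) and scripts/pdf_extract.py (PDF text/table extraction). Dependencies are listed in scripts/requirements.txt.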

Excel Processing

Reading Excel

import pandas as pd

# Read the first sheet of the workbook
df = pd.read_excel("file.xlsx", sheet_name=0)

# Read a specific sheet by name
df = pd.read_excel("file.xlsx", sheet_name="Sheet1")

# Read a CSV file
df = pd.read_csv("file.csv", encoding="utf-8")

Writing Excel

# Single sheet
df.to_excel("output.xlsx", index=False)

# Multiple sheets (df1 and df2 are DataFrames prepared beforehand)
with pd.ExcelWriter("output.xlsx") as writer:
    df1.to_excel(writer, sheet_name="Summary", index=False)
    df2.to_excel(writer, sheet_name="Details", index=False)

Common Operations

  • Filter: df[df['column_name'] > 0]
  • Deduplicate: df.drop_duplicates(subset=['column_name'])
  • Combine: pd.concat([df1, df2]) or pd.merge(df1, df2, on='key')
  • Pivot: df.pivot_table(values='val', index='row', columns='col', aggfunc='sum')
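
The operations listed above can be tried on a small in-memory table. The following is a minimal sketch; the `orders` and `regions` frames and all of their column names are invented for illustration and are not part of the skill.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "region_id": [10, 10, 10, 20, 20],
    "quarter": ["Q1", "Q1", "Q1", "Q1", "Q2"],
    "amount": [100, 0, 0, 250, 50],
})
regions = pd.DataFrame({"region_id": [10, 20], "region": ["North", "South"]})

# Filter: keep rows with a positive amount.
positive = orders[orders["amount"] > 0]

# Deduplicate: keep the first row for each order_id.
unique_orders = orders.drop_duplicates(subset=["order_id"])

# Merge: attach the region name to each order via the shared key.
merged = pd.merge(unique_orders, regions, on="region_id")

# Pivot: total amount per region (rows) and quarter (columns).
pivot = merged.pivot_table(
    values="amount", index="region", columns="quarter", aggfunc="sum"
)
```

Region/quarter combinations with no rows appear as NaN in the pivot result, so a follow-up `fillna(0)` may be wanted before exporting.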

Dependencies

pip install pandas openpyxl  # openpyxl is required for .xlsx; legacy .xls files need xlrd instead

PDF Processing

Extracting Text

import pdfplumber

with pdfplumber.open("file.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        # Skip pages where no text was extracted (for example, scanned pages)
        if text:
            print(text)

Extracting Tables

import pdfplumber

with pdfplumber.open("file.pdf") as pdf:
    page = pdf.pages[0]
    tables = page.extract_tables()
    for table in tables:
        # Each table is a two-dimensional list (rows of cells)
        for row in table:
            print(row)
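
Because each extracted table is a plain two-dimensional list, it can be handed to pandas for further processing. The helper below is a minimal sketch: `table_to_dataframe` is a hypothetical name, and it assumes the first row holds the column headers and that every row has the same number of cells.

```python
import pandas as pd

def table_to_dataframe(table):
    """Convert a 2D list (first row = header) into a DataFrame."""
    if not table:
        return pd.DataFrame()
    header, *rows = table
    # Empty header cells get a positional placeholder name.
    columns = [name if name else f"col_{i}" for i, name in enumerate(header)]
    return pd.DataFrame(rows, columns=columns)

# Hypothetical table in the shape of one entry of page.extract_tables().
sample = [["item", "qty"], ["apple", "3"], ["pear", None]]
df = table_to_dataframe(sample)
```

Cell values stay as strings (or None for empty cells), so numeric columns usually need an explicit conversion such as `pd.to_numeric` afterwards.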

Dependencies

pip install pdfplumber

If OCR is needed (scanned PDFs): pip install pdf2image pytesseract, and install Tesseract.
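
A minimal sketch of the OCR path, assuming pdf2image and pytesseract are installed along with the Tesseract binary; pdf2image also relies on the Poppler utilities being available on the system. The function name `ocr_pdf` and its defaults are illustrative and not part of the skill's scripts.

```python
from pathlib import Path

def ocr_pdf(path, lang="eng", dpi=300):
    """Return the OCR text of each page of a scanned PDF as a list of strings."""
    if not Path(path).is_file():
        raise FileNotFoundError(path)
    # Imported lazily so the rest of a script works without the OCR extras.
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(str(path), dpi=dpi)  # one PIL image per page
    return [pytesseract.image_to_string(image, lang=lang) for image in images]
```

For Chinese documents, a language pack such as `chi_sim` would need to be installed for Tesseract and passed as `lang`.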

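Report Generation Workflow

  1. Data preparation: fetch data from an API/DB or Excel, then clean it with pandas
  2. Calculation/aggregation: build summary tables according to the business logic
  3. Output
    • Excel: df.to_excel()
    • PDF: use reportlab, or generate Excel first and then convert it to PDF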
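
The workflow above can be sketched end to end for the Excel output case. This is an illustration under stated assumptions: the `department` and `amount` columns, the cleaning rules, and the function names are all hypothetical, and the sample frame stands in for data loaded from an API, a database, or a workbook.

```python
import pandas as pd

def build_summary(detail):
    """Clean the detail rows and aggregate the amount per department."""
    # Step 1: drop rows without a department and treat missing amounts as 0.
    cleaned = detail.dropna(subset=["department"]).copy()
    cleaned["amount"] = cleaned["amount"].fillna(0)
    # Step 2: aggregate into a summary table.
    summary = cleaned.groupby("department", as_index=False)["amount"].sum()
    return cleaned, summary

def write_report(cleaned, summary, path):
    """Step 3: write both tables to one workbook (requires openpyxl)."""
    with pd.ExcelWriter(path) as writer:
        summary.to_excel(writer, sheet_name="Summary", index=False)
        cleaned.to_excel(writer, sheet_name="Details", index=False)

# Hypothetical input data.
detail = pd.DataFrame({
    "department": ["A", "A", "B", None],
    "amount": [10.0, None, 5.0, 7.0],
})
cleaned, summary = build_summary(detail)
# write_report(cleaned, summary, "report.xlsx")  # uncomment to produce the file
```

Keeping the aggregation separate from the file writing makes the summary logic easy to check without touching the disk.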

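Notes

  • Large files: read in chunks or limit the number of rows to avoid running out of memory
  • Encoding: CSV files are commonly utf-8 or gbk; try utf-8 first
  • Missing values: use df.fillna(0) or df.dropna() as needed
  • Dates: normalize the format with pd.to_datetime(df['date_col'])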
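
The encoding and large-file notes can be sketched as two small helpers. Both function names and their defaults are assumptions made for illustration; they are not taken from the skill's scripts.

```python
import pandas as pd

def read_csv_with_fallback(path, encodings=("utf-8", "gbk")):
    """Try each encoding in order and return the first successful parse."""
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding)
        except UnicodeDecodeError as error:
            last_error = error
    raise last_error

def sum_column_in_chunks(path, column, chunksize=100_000, encoding="utf-8"):
    """Sum one numeric column without loading the whole file into memory."""
    total = 0
    reader = pd.read_csv(
        path, usecols=[column], chunksize=chunksize, encoding=encoding
    )
    for chunk in reader:
        total += chunk[column].sum()
    return total
```

Trying utf-8 first matters because gbk will decode many byte sequences without raising an error, which can silently produce garbled text if it is tried first.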
Usage Guidance
This skill appears coherent and implements what it describes. Before installing or running: (1) ensure you install the Python requirements in a controlled environment (virtualenv) — pip packages are from PyPI; (2) if you need OCR, install the Tesseract binary separately as noted; (3) only run the scripts on files you trust or have isolated — while pandas/pdfplumber don't execute Excel macros, maliciously crafted files can still cause issues; (4) review and run the scripts locally to confirm behavior before granting the agent access to your files.
Capability Analysis
Type: OpenClaw Skill Name: gi-excel-pdf-process Version: 1.0.0 The skill bundle provides standard utilities for processing Excel and PDF files using well-known libraries like pandas and pdfplumber. The Python scripts (excel_extract.py and pdf_extract.py) and the SKILL.md instructions are consistent with the stated purpose of data extraction and report generation, showing no signs of malicious intent, data exfiltration, or prompt injection.
Capability Assessment
Purpose & Capability
Name/description match the included scripts, SKILL.md, and requirements.txt. The scripts perform Excel→CSV and PDF text/table extraction as advertised; required Python packages are appropriate.
Instruction Scope
SKILL.md stays within the stated purpose (reading/writing spreadsheets and extracting PDFs). It mentions optional OCR (pdf2image/pytesseract) which requires installing Tesseract (an external binary) — this is expected but the skill does not provide that binary or an install step.
Install Mechanism
No install spec; this is an instruction-only skill with a requirements.txt listing common Python packages (pandas, openpyxl, pdfplumber). These are standard for the task and come from PyPI.
Credentials
The skill requests no environment variables, credentials, or config paths. The scope of access (local files provided by the user) is proportional to the functionality.
Persistence & Privilege
Does not request always:true or modify other skills or system settings. It runs as-invoked and has no privileged persistence.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install gi-excel-pdf-process
  3. After installation, invoke the skill by name or use /gi-excel-pdf-process
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release. Gravitech Innovations.
Metadata
Slug gi-excel-pdf-process
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is GI Excel PDF Process?

Process Excel and PDF files - extract data, parse tables, generate reports. Use when working with .xlsx, .xls, .csv, .pdf files, or when the user mentions sp... It is an AI Agent Skill for Claude Code / OpenClaw, with 293 downloads so far.

How do I install GI Excel PDF Process?

Run "/install gi-excel-pdf-process" in the OpenClaw or Claude Code chat to install it. The bundled scripts additionally need the Python packages listed in scripts/requirements.txt (pandas, openpyxl, pdfplumber).

Is GI Excel PDF Process free?

Yes, GI Excel PDF Process is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does GI Excel PDF Process support?

GI Excel PDF Process is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created GI Excel PDF Process?

It is built and maintained by laimiaohua (@laimiaohua); the current version is v1.0.0.
