
GI Excel PDF Process

Author: laimiaohua · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
293 total downloads · 0 favorites · 0 current installs · 1 version
Install in OpenClaw
/install gi-excel-pdf-process
Description
Process Excel and PDF files - extract data, parse tables, generate reports. Use when working with .xlsx, .xls, .csv, .pdf files, or when the user mentions sp...
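Usage (SKILL.md)

Excel / PDF Processing

Process Excel and PDF files: extract data, parse tables, and generate reports. Suited to data import/export, report generation, and document parsing.

When to use

  • The user provides, or asks to process, .xlsx, .xls, .csv, or .pdf files
  • The user mentions "spreadsheet", "Excel", "report", "PDF extraction", or "form"
  • Data must be read from a file, or a downloadable file must be generated

Executable scripts: scripts/excel_extract.py (Excel → CSV) and scripts/pdf_extract.py (PDF text/table extraction); dependencies are listed in scripts/requirements.txt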

Excel Processing

Reading Excel

import pandas as pd

# Read the first sheet (sheet_name=0 selects it by position)
df = pd.read_excel("file.xlsx", sheet_name=0)

# Read a sheet by name
df = pd.read_excel("file.xlsx", sheet_name="Sheet1")

# Read a CSV file
df = pd.read_csv("file.csv", encoding="utf-8")

Writing Excel

# Single sheet
df.to_excel("output.xlsx", index=False)

# Multiple sheets (df1 and df2 are existing DataFrames)
with pd.ExcelWriter("output.xlsx") as writer:
    df1.to_excel(writer, sheet_name="Summary", index=False)
    df2.to_excel(writer, sheet_name="Details", index=False)

Common operations

  • Filter: df[df['column_name'] > 0]
  • Deduplicate: df.drop_duplicates(subset=['column_name'])
  • Combine: pd.concat([df1, df2]) or pd.merge(df1, df2, on='key')
  • Pivot: df.pivot_table(values='val', index='row', columns='col', aggfunc='sum')
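
As a rough illustration of the operations listed above, the sketch below runs each of them on a small in-memory DataFrame. The sample data and the column names (region, product, amount, manager) are made up for the example and do not come from the skill's scripts.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
orders = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "amount": [100, 0, 50, 50, 70],
})
regions = pd.DataFrame({"region": ["East", "West"], "manager": ["Li", "Wang"]})

# Filter: keep rows with a positive amount
positive = orders[orders["amount"] > 0]

# Deduplicate: drop fully identical rows
# (pass subset=[...] to compare selected columns only)
deduped = positive.drop_duplicates()

# Merge: attach the manager of each region (join on a key column)
merged = pd.merge(deduped, regions, on="region")

# Concat: stack two frames with the same columns on top of each other
stacked = pd.concat([orders, orders], ignore_index=True)

# Pivot: total amount per region (rows) and product (columns)
pivot = merged.pivot_table(
    values="amount", index="region", columns="product", aggfunc="sum"
)
```

pd.concat appends rows (or columns) without matching keys, while pd.merge behaves like a SQL join, so which one to use depends on whether the two frames share a schema or a key.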

Dependencies

pip install pandas openpyxl  # openpyxl is required for .xlsx

PDF Processing

Extracting text

import pdfplumber

with pdfplumber.open("file.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        if text:
            print(text)

Extracting tables

import pdfplumber

with pdfplumber.open("file.pdf") as pdf:
    page = pdf.pages[0]  # first page only
    tables = page.extract_tables()
    for table in tables:
        # each table is a 2D list: a list of rows, each a list of cells
        for row in table:
            print(row)
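
Since each extracted table is a 2D list, it can be handed to pandas for further processing. The sketch below is one possible way to do that, assuming the first row of the table is the header. The helper name table_to_dataframe and the sample values are hypothetical, and the hard-coded list stands in for one element of page.extract_tables() so that the example runs without a PDF.

```python
import pandas as pd

def table_to_dataframe(table):
    """Convert one table (a list of rows) into a DataFrame.

    Assumes the first row is the header; empty cells (None) become "".
    """
    cleaned = [[(cell or "").strip() for cell in row] for row in table]
    header, *body = cleaned
    return pd.DataFrame(body, columns=header)

# Stand-in for one element of page.extract_tables(); values are made up.
sample_table = [
    ["Item", "Qty", "Price"],
    ["Widget", "2", "9.50"],
    ["Gadget", None, "4.00"],
]

df = table_to_dataframe(sample_table)

# Cells arrive as strings, so numeric columns need an explicit conversion.
df["Price"] = pd.to_numeric(df["Price"])
```

Real PDFs often have multi-row headers or merged cells, so the first-row-is-header assumption should be checked against the actual output before relying on it.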

Dependencies

pip install pdfplumber

For OCR (scanned PDFs): pip install pdf2image pytesseract, and install Tesseract separately.

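Report generation workflow

  1. Prepare data: fetch data from an API/DB or Excel and clean it with pandas
  2. Compute/aggregate: build summary tables according to the business logic
  3. Output
    • Excel: df.to_excel()
    • PDF: use reportlab, or generate Excel first and convert it to PDF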
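
The three steps above could be sketched roughly as follows. This is a minimal illustration under assumed inputs, not the skill's own implementation: the function names (prepare, summarize, write_report), the column names, and the sample rows are all hypothetical.

```python
import pandas as pd

def prepare(raw):
    """Step 1: clean the raw data (drop rows without an amount, parse dates)."""
    df = raw.dropna(subset=["amount"]).copy()
    df["date"] = pd.to_datetime(df["date"])
    return df

def summarize(df):
    """Step 2: aggregate to one row per region."""
    return df.groupby("region", as_index=False)["amount"].sum()

def write_report(summary, detail, path):
    """Step 3: write both tables into one workbook (requires openpyxl)."""
    with pd.ExcelWriter(path) as writer:
        summary.to_excel(writer, sheet_name="Summary", index=False)
        detail.to_excel(writer, sheet_name="Details", index=False)

# Hypothetical input standing in for data from an API, DB, or Excel file.
raw = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "region": ["East", "West", "East"],
    "amount": [100.0, None, 25.0],
})

detail = prepare(raw)
summary = summarize(detail)
```

Calling write_report(summary, detail, "report.xlsx") would then produce the Excel output. Keeping the cleaning and aggregation steps as separate functions makes each one easy to check before anything is written to disk.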

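Notes

  • Large files: read in chunks or limit the number of rows to avoid running out of memory
  • Encoding: CSV files are commonly utf-8 or gbk; try utf-8 first
  • Missing values: use df.fillna(0) or df.dropna() as needed
  • Dates: normalize with pd.to_datetime(df['date_col'])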
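
The encoding and large-file notes above might look like this in practice. The sketch assumes a CSV with a numeric column named value. The helper names (read_csv_with_fallback, sum_column_in_chunks) are hypothetical, and the tiny chunksize is only there to make the chunking visible on a three-row file.

```python
import os
import tempfile

import pandas as pd

def read_csv_with_fallback(path, encodings=("utf-8", "gbk"), **kwargs):
    """Try each encoding in order and return the first successful parse."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs)
        except UnicodeDecodeError as err:
            last_error = err
    raise last_error

def sum_column_in_chunks(path, column, chunksize=2, encoding="utf-8"):
    """Sum one column without loading the whole file into memory."""
    total = 0
    for chunk in pd.read_csv(path, chunksize=chunksize, encoding=encoding):
        total += chunk[column].sum()
    return total

# Demo: write a small GBK-encoded CSV to a temporary directory.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "sample_gbk.csv")
with open(csv_path, "w", encoding="gbk", newline="") as f:
    f.write("name,value\n张三,1\n李四,2\n王五,3\n")

df = read_csv_with_fallback(csv_path)  # utf-8 fails, gbk succeeds
total = sum_column_in_chunks(csv_path, "value", encoding="gbk")
```

For real files a chunksize in the tens of thousands of rows is more typical; pd.read_csv also accepts nrows to read only the first part of a file.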
Safety recommendations
This skill appears coherent and implements what it describes. Before installing or running: (1) ensure you install the Python requirements in a controlled environment (virtualenv) — pip packages are from PyPI; (2) if you need OCR, install the Tesseract binary separately as noted; (3) only run the scripts on files you trust or have isolated — while pandas/pdfplumber don't execute Excel macros, maliciously crafted files can still cause issues; (4) review and run the scripts locally to confirm behavior before granting the agent access to your files.
Functional analysis
Type: OpenClaw Skill · Name: gi-excel-pdf-process · Version: 1.0.0. The skill bundle provides standard utilities for processing Excel and PDF files using well-known libraries like pandas and pdfplumber. The Python scripts (excel_extract.py and pdf_extract.py) and the SKILL.md instructions are consistent with the stated purpose of data extraction and report generation, showing no signs of malicious intent, data exfiltration, or prompt injection.
Capability assessment
Purpose & Capability
Name/description match the included scripts, SKILL.md, and requirements.txt. The scripts perform Excel→CSV and PDF text/table extraction as advertised; required Python packages are appropriate.
Instruction Scope
SKILL.md stays within the stated purpose (reading/writing spreadsheets and extracting PDFs). It mentions optional OCR (pdf2image/pytesseract) which requires installing Tesseract (an external binary) — this is expected but the skill does not provide that binary or an install step.
Install Mechanism
No install spec; this is an instruction-only skill with a requirements.txt listing common Python packages (pandas, openpyxl, pdfplumber). These are standard for the task and come from PyPI.
Credentials
The skill requests no environment variables, credentials, or config paths. The scope of access (local files provided by the user) is proportional to the functionality.
Persistence & Privilege
Does not request always:true or modify other skills or system settings. It runs as-invoked and has no privileged persistence.
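How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install gi-excel-pdf-process
  3. Once installed, invoke the Skill by name or trigger it with /gi-excel-pdf-process
  4. Provide the inputs described in the Skill's parameter notes to get structured output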
Version history
v1.0.0
Initial release. Gravitech Innovations.
Metadata
Slug gi-excel-pdf-process
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Versions 1
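FAQ

What is GI Excel PDF Process?

Process Excel and PDF files - extract data, parse tables, generate reports. Use when working with .xlsx, .xls, .csv, .pdf files, or when the user mentions sp... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 293 total downloads to date.

How do I install GI Excel PDF Process?

Run the command "/install gi-excel-pdf-process" in the OpenClaw or Claude Code chat box. The skill itself needs no further configuration; the Python packages listed in scripts/requirements.txt are installed separately with pip.

Is GI Excel PDF Process free?

Yes. GI Excel PDF Process is free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does GI Excel PDF Process support?

GI Excel PDF Process is cross-platform and runs in any environment where OpenClaw / Claude Code is deployed.

Who develops GI Excel PDF Process?

It is developed and maintained by laimiaohua (@laimiaohua); the current version is v1.0.0.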
