lava-chen

Ielts Extractor

Author: Xuanyu Chen · GitHub · v1.1.0
cross-platform ⚠ suspicious
Total downloads: 291
Favorites: 0
Current installs: 1
Versions: 7
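Install in OpenClaw
/install ielts-extractor
Description
Automatically extracts multi-page continuous reading passages and their questions from Cambridge IELTS PDFs, handles two-column layouts, and saves the result as structured JSON.
Usage instructions (SKILL.md)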

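IELTS Test Data Extraction Skill

Overview

Extracts reading passages and questions from Cambridge IELTS PDFs.

Trigger

Use when the user asks to "extract IELTS test questions" (提取雅思试题).

Workflow (4 steps)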

1. Locate the PDF and page numbers

  • Search for "Test X" + "READING PASSAGE Y"
  • Record the starting page number
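
The lookup in step 1 can be sketched roughly as follows. This is a minimal illustration, not the skill's shipped code: it assumes the per-page text has already been extracted into a list of strings, and the function name `find_passage_start` is hypothetical.

```python
import re

# Matches a test heading such as "Test 1" and captures the test number.
TEST_RE = re.compile(r"\bTest\s+(\d+)\b", re.IGNORECASE)


def find_passage_start(page_texts, test_no, passage_no):
    """Return the 1-based page number where the passage heading appears, or None.

    page_texts is a list with one extracted text string per PDF page
    (an assumption of this sketch).
    """
    passage_re = re.compile(rf"\bREADING\s+PASSAGE\s+{passage_no}\b")
    current_test = None
    for page_no, text in enumerate(page_texts, start=1):
        match = TEST_RE.search(text)
        if match:
            # Remember which test the most recent heading belongs to.
            current_test = int(match.group(1))
        if current_test == test_no and passage_re.search(text):
            return page_no
    return None
```

Tracking the most recent test heading keeps a missing passage in one test from being matched against the same passage number in the next test.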

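2. Extract the passage

  • Pages must be extracted continuously across multiple pages
  • Handle the two-column layout
  • Check the word count: 1500-2500 words per passage

See: references/pdf-extraction.md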
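
One possible way to handle step 2 is sketched below. It assumes word boxes shaped like the dicts returned by pdfplumber's `extract_words()` (only the `text`, `x0`, and `top` keys are used) and a simple split at the page midline. All function names are hypothetical, and the 1500-2500 range is taken from the checklist above.

```python
def group_lines(words, line_tol=3.0):
    """Group word boxes into text lines by vertical position."""
    lines, current, current_top = [], [], None
    for w in sorted(words, key=lambda w: (w["top"], w["x0"])):
        # Start a new line when the word sits clearly below the current one.
        if current and abs(w["top"] - current_top) > line_tol:
            lines.append(current)
            current = []
        if not current:
            current_top = w["top"]
        current.append(w)
    if current:
        lines.append(current)
    return [" ".join(x["text"] for x in sorted(line, key=lambda x: x["x0"]))
            for line in lines]


def order_two_columns(words, page_width, line_tol=3.0):
    """Return page text with the left column read fully before the right one."""
    mid = page_width / 2  # assumption: the column gutter is at the midline
    left = [w for w in words if w["x0"] < mid]
    right = [w for w in words if w["x0"] >= mid]
    return "\n".join(group_lines(left, line_tol) + group_lines(right, line_tol))


def join_pages(page_texts):
    """Concatenate consecutive pages, skipping empty ones."""
    return "\n".join(t.strip() for t in page_texts if t.strip())


def word_count_ok(text, low=1500, high=2500):
    """Check the passage length against the range given in the checklist."""
    return low <= len(text.split()) <= high
```

A passage that fails `word_count_ok` usually signals that a page was skipped or that the columns were interleaved, so it is worth re-checking the page range before saving.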

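3. Extract the questions

  • Group them by question section
  • Choose the correct question type
  • Include the complete options (A-D for every single-choice question)

See: references/question-types.md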
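
The grouping and option checks in step 3 might look like the sketch below. The actual schema lives in references/question-types.md, which is not reproduced here, so the keys `group_title`, `type`, `number`, and `text` are hypothetical placeholders.

```python
from itertools import groupby


def group_questions(questions):
    """Group consecutive questions that share the same 'group_title'.

    Each question is a dict with the hypothetical keys
    'number', 'text', 'type', and 'group_title'.
    """
    groups = []
    for title, items in groupby(questions, key=lambda q: q["group_title"]):
        items = list(items)
        groups.append({
            "title": title,
            "type": items[0]["type"],
            "questions": [{"number": q["number"], "text": q["text"]}
                          for q in items],
        })
    return groups


def missing_option_labels(options, expected=("A", "B", "C", "D")):
    """Return the expected option labels that are absent from `options`.

    `options` maps a label such as "A" to its option text.
    """
    present = {label.strip().upper() for label in options}
    return [label for label in expected if label not in present]
```

An empty list from `missing_option_labels` means the single-choice question carries the full A-D set.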

4. Save the JSON

  • Use the content field
  • Format the options correctly

See: references/json-format.md
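
A minimal sketch of step 4 follows, assuming a top-level object with the `content` field named above and the `question_groups` key mentioned in the version history. The full schema is defined in references/json-format.md and may differ; `save_passage` is a hypothetical helper.

```python
import json
from pathlib import Path


def save_passage(path, content, question_groups, overwrite=False):
    """Write the passage and its question groups to a JSON file.

    By default the file is opened in exclusive mode, so an existing
    file is never silently replaced.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    mode = "w" if overwrite else "x"  # "x" raises FileExistsError if present
    with path.open(mode, encoding="utf-8") as fh:
        json.dump(
            {"content": content, "question_groups": question_groups},
            fh,
            ensure_ascii=False,  # keep non-ASCII text readable in the file
            indent=2,
        )
    return path
```

Refusing to overwrite by default is a deliberate choice here, since the output paths sit inside an existing project directory.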

Data files

ielts-tracker/data/tests/cambridge-{4,5,6}/test-{1-4}/test.json
ielts-tracker/ielts-app/public/images/
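
The path pattern above can be expanded with a small helper. This sketch assumes `{4,5,6}` means Cambridge books 4 to 6 and `{1-4}` means tests 1 to 4; the function name `json_path_for` is hypothetical.

```python
from pathlib import Path


def json_path_for(book, test, root="ielts-tracker"):
    """Build the test.json path for a given Cambridge book and test number."""
    if book not in (4, 5, 6):
        raise ValueError(f"unsupported book: {book}")
    if test not in (1, 2, 3, 4):
        raise ValueError(f"unsupported test: {test}")
    return (Path(root) / "data" / "tests"
            / f"cambridge-{book}" / f"test-{test}" / "test.json")
```

For example, `json_path_for(4, 2)` points at `ielts-tracker/data/tests/cambridge-4/test-2/test.json`.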

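Question type quick reference

Question type                   type
Heading matching                matching-headings
Judgment (Yes/No/Not Given)     yes-no-not-given
Single choice                   multiple-choice-single
Multiple choice                 multiple-answer
Table completion                table-completion
Fill in the blank               fill-blank-summary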
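
The type values in the quick reference can be kept in one lookup so that typos are caught before the JSON is written. This is a sketch; the English labels are translations and `validate_type` is a hypothetical helper.

```python
# type value -> human-readable label, taken from the quick-reference table
QUESTION_TYPES = {
    "matching-headings": "Heading matching",
    "yes-no-not-given": "Judgment (Yes/No/Not Given)",
    "multiple-choice-single": "Single choice",
    "multiple-answer": "Multiple choice",
    "table-completion": "Table completion",
    "fill-blank-summary": "Fill in the blank",
}


def validate_type(value):
    """Return the type value unchanged, or raise if it is not in the table."""
    if value not in QUESTION_TYPES:
        known = ", ".join(sorted(QUESTION_TYPES))
        raise ValueError(f"unknown question type {value!r}; expected one of: {known}")
    return value
```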
Security usage advice
This skill is internally inconsistent rather than obviously malicious: its description and docs promise full passage+question extraction and JSON output, but the shipped code only extracts concatenated article text and prints it. Before using, ask the author to (1) implement or remove question-parsing and JSON write logic, (2) declare/install dependencies (pdfplumber, and optionally PyMuPDF/fitz), and (3) confirm where files will be written to avoid overwriting local project files. Test the script on non-sensitive sample PDFs in a sandbox or disposable directory to verify behavior. If you don't trust the author, do not run the code on sensitive systems or give it write access to important directories.
Functional analysis
Type: OpenClaw Skill
Name: ielts-extractor
Version: 1.1.0
The skill is a legitimate tool for extracting IELTS reading passages and questions from PDF files into structured JSON data. The Python script (extract_article.py) uses standard libraries like pdfplumber for text extraction and includes logic for handling two-column layouts, while the documentation (SKILL.md and references/) provides clear, task-aligned instructions for the AI agent without any signs of prompt injection, data exfiltration, or malicious execution.
Capability assessment
Purpose & Capability
The name/description promise: extract multi-page Cambridge IELTS passages and questions, handle two-column layout, and save structured JSON. The included Python (extract_article.py) only locates a passage and concatenates page text (no question parsing, no JSON output, and the two-column helper is defined but not used). References describe image extraction with PyMuPDF and JSON schema, but those behaviors are not implemented in the code.
Instruction Scope
SKILL.md describes locating passages, extracting multi-page text, handling two-column layout, extracting question groups and options, and saving JSON to specific ielts-tracker paths. The runtime instructions reference writing files into project directories (ielts-tracker/.../public/images/) and expect question extraction, but the actual code does not perform those writes or parse questions. Instructions don't request unrelated credentials or environment variables.
Install Mechanism
No install spec (instruction-only) — lower risk — but the code imports pdfplumber and references fitz/PyMuPDF in docs. Dependencies are not declared; runtime may fail or require installing third-party packages. No external downloads or suspicious URLs are present.
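
Because the dependencies are undeclared, a quick pre-flight check can report what is missing before the script is run. The sketch below uses only the standard library; the module names `pdfplumber` and `fitz` come from the review above, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # pdfplumber is imported by the script; fitz (PyMuPDF) is only referenced in the docs.
    missing = missing_modules(["pdfplumber", "fitz"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All modules found.")
```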
Credentials
The skill requests no environment variables, no credentials, and no config paths. However, SKILL.md and references expect writing output files into project paths which could overwrite local files if run with file-system access; this is expected for a data-extraction tool but worth noting.
Persistence & Privilege
always is false and the skill is user-invocable. It does not request permanent agent inclusion or modify other skills. There is no autonomously elevated privilege requested.
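How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install ielts-extractor
  3. Once installed, invoke the Skill by name or trigger it with /ielts-extractor
  4. Provide the inputs described in the Skill's parameter notes to get structured output
Version history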
v1.1.0
Adds modular references and streamlines documentation.
- Separated documentation into concise reference files: `json-format.md`, `pdf-extraction.md`, and `question-types.md`
- Simplified and condensed main documentation in `SKILL.md` for clarity and ease of use
- Added quick-reference tables for question types and data paths
- Improved instructions for PDF extraction, article length, and options formatting
- Enhanced guidance now points to relevant reference files for in-depth details
v1.0.5
- Expanded and clarified the list of IELTS reading question types with detailed type mapping and examples.
- Added concise, tabular overviews of main question formats, including Chinese names and answer format notes.
- Provided new sample JSON structures for complex types such as table completion and label matching.
- Included a checklist for writers to ensure content length, type correctness, and answer integrity.
- Reorganized sections for better reference and practical extraction guidance.
v1.0.4
- Documentation reformat and cleanup in SKILL.md.
- No logic or code changes; content unchanged except for formatting.
- Improved readability by removing duplicate or misaligned code block endings.
v1.0.3
- Expanded and clarified the question type definitions with a summary table, including rules for common IELTS reading tasks.
- Added detailed guidance on distinguishing between matching, single-choice, multiple-choice, and fill-blank questions.
- Provided new, precise JSON format examples for each question type, indicating where to place options for group and per-question scenarios.
- Highlighted frequent mistakes in structuring question data and options, especially for single-choice and matching types.
- Improved readability with summary boxes, warnings, and step-by-step breakdowns for data entry.
v1.0.2
- Added explicit formatting rules for multiple-choice and matching question options.
- Included correct and incorrect examples for the options structure in question groups.
- Clarified that options should not be nested inside individual questions.
- Improved examples for question group JSON structure and standardization.
- No code or logic changes; documentation update only.
v1.0.1
- Enhanced question extraction: introduced `question_groups` with structure for group title, type, instruction, passage fragment, and grouped questions.
- Added detailed instructions and example JSON for grouping questions by type (fill-blank, matching, etc.).
- Updated test status to indicate "分组" (grouped) completion for Cambridge 4 Test 2.
- Expanded guidance on extraction techniques and provided tips for handling complex question layouts.
- Included new command-line example for querying tests by ID.
v1.0.0
- Initial release of "IELTS 试题数据提取 Skill" (IELTS Test Data Extraction Skill) for automated extraction of IELTS reading passages and questions from Cambridge IELTS PDFs.
- Supports multi-page extraction, proper handling of two-column layouts, and consistent JSON output formatting.
- Detailed workflow, file storage conventions, and typical PDF paths documented.
- Progress table included to track extraction status of various Cambridge test sets.
Metadata
Slug: ielts-extractor
Version: 1.1.0
License:
Total installs: 1
Current installs: 1
Historical versions: 7
FAQ

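What is Ielts Extractor?

It automatically extracts multi-page continuous reading passages and their questions from Cambridge IELTS PDFs, handles two-column layouts, and saves the result as structured JSON. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 291 total downloads to date.

How do I install Ielts Extractor?

Run the command "/install ielts-extractor" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is Ielts Extractor free?

Yes. Ielts Extractor is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does Ielts Extractor support?

Ielts Extractor is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Ielts Extractor?

It is developed and maintained by Xuanyu Chen (@lava-chen); the current version is v1.1.0.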
