Ielts Extractor
by Xuanyu Chen · GitHub ↗ · v1.1.0
291 Downloads · 0 Stars · 1 Active Installs · 7 Versions
Install in OpenClaw
/install ielts-extractor
Description
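Automatically extracts continuous multi-page reading passages and their questions from Cambridge IELTS PDFs, supports two-column layouts, and saves the result as structured JSON.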
README (SKILL.md)
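IELTS Test Data Extraction Skill
Overview
Extracts reading passages and questions from Cambridge IELTS PDFs.
Trigger
Use when the user asks to "extract IELTS test questions".
Workflow (4 steps)
1. Locate the PDF and page number
- Search for "Test X" + "READING PASSAGE Y"
- Record the starting page number
2. Extract the passage
- Always extract across consecutive pages
- Handle the two-column layout
- Check the word count: 1500-2500 words per passage
See: references/pdf-extraction.md
3. Extract the questions
- Group by question section
- Select the correct question type
- Include the complete options (A-D for each single-choice question)
See: references/question-types.md
4. Save the JSON
- Use the content field
- Format the options correctly
See: references/json-format.md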
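Steps 1 and 2 of the workflow can be sketched in plain Python. This is a minimal illustration, not the skill's shipped `extract_article.py`. The function names `find_passage_start`, `order_two_column_words`, and `word_count_in_range` are hypothetical. Page texts are assumed to be already extracted as strings, and each word is assumed to be a dictionary with `text`, `x0`, and `top` keys, the shape that pdfplumber's `extract_words()` returns. Splitting at the page midline is an assumed heuristic for two-column pages.

```python
import re


def find_passage_start(page_texts, test_no, passage_no):
    """Return the 0-based index of the first page that contains
    'READING PASSAGE Y' at or after the first page mentioning 'Test X'."""
    test_re = re.compile(rf"\bTest\s+{test_no}\b", re.IGNORECASE)
    passage_re = re.compile(rf"\bREADING\s+PASSAGE\s+{passage_no}\b")
    in_test = False
    for index, text in enumerate(page_texts):
        if test_re.search(text):
            in_test = True
        if in_test and passage_re.search(text):
            return index
    return None


def order_two_column_words(words, page_width):
    """Join words so the left column is read fully before the right one.

    Assumption: the column gutter sits at the horizontal midline of the page.
    """
    middle = page_width / 2
    left = [w for w in words if w["x0"] < middle]
    right = [w for w in words if w["x0"] >= middle]

    def reading_order(column):
        # Rounding `top` is a crude way to group words on the same text line.
        return sorted(column, key=lambda w: (round(w["top"]), w["x0"]))

    ordered = reading_order(left) + reading_order(right)
    return " ".join(w["text"] for w in ordered)


def word_count_in_range(text, low=1500, high=2500):
    """Check the passage length against the range stated in the workflow."""
    return low <= len(text.split()) <= high
```

A passage that fails `word_count_in_range` probably means extraction stopped too early or pulled in pages from the next passage, so the page range is worth rechecking before moving on to the questions.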
Data files
ielts-tracker/data/tests/cambridge-{4,5,6}/test-{1-4}/test.json
ielts-tracker/ielts-app/public/images/
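As a hedged sketch of how the `test.json` path pattern above might be expanded and written safely: the helper names `build_test_json_path` and `save_test_json` are hypothetical, and reading `{4,5,6}` and `{1-4}` as "books 4 to 6, tests 1 to 4" is an interpretation of the pattern rather than something the skill states. The example payload uses only the `content` field named in the workflow. The real schema lives in references/json-format.md.

```python
import json
from pathlib import Path


def build_test_json_path(root, book, test):
    """Build <root>/data/tests/cambridge-<book>/test-<test>/test.json.

    Assumption: valid books are 4, 5, 6 and valid tests are 1 to 4.
    """
    if book not in (4, 5, 6) or test not in (1, 2, 3, 4):
        raise ValueError(f"unsupported book/test combination: {book}/{test}")
    return Path(root) / "data" / "tests" / f"cambridge-{book}" / f"test-{test}" / "test.json"


def save_test_json(path, data, overwrite=False):
    """Write `data` as UTF-8 JSON, refusing to replace an existing file
    unless `overwrite` is explicitly set."""
    path = Path(path)
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} already exists; pass overwrite=True to replace it")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

For example, `save_test_json(build_test_json_path("ielts-tracker", 4, 1), {"content": "..."})` would create the first data file and raise `FileExistsError` on a second call, which guards against silently overwriting existing project data.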
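Question type quick reference
| Question type | type |
|---|---|
| Matching headings | matching-headings |
| Judgment (Yes/No/Not Given) | yes-no-not-given |
| Single choice | multiple-choice-single |
| Multiple answer | multiple-answer |
| Table completion | table-completion |
| Fill in the blank | fill-blank-summary |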
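The `type` values in the table above lend themselves to a simple validation step before saving. The following is a minimal sketch with hypothetical helper names (`check_question_type`, `has_complete_single_choice_options`). It only encodes what the README states: the six `type` identifiers and the rule that each single-choice question carries options A-D.

```python
# The six identifiers listed in the quick-reference table.
QUESTION_TYPES = frozenset({
    "matching-headings",
    "yes-no-not-given",
    "multiple-choice-single",
    "multiple-answer",
    "table-completion",
    "fill-blank-summary",
})


def check_question_type(type_name):
    """Return `type_name` unchanged if it is a known type, else raise."""
    if type_name not in QUESTION_TYPES:
        raise ValueError(f"unknown question type: {type_name!r}")
    return type_name


def has_complete_single_choice_options(option_labels):
    """True when the labels are exactly A, B, C, D in that order."""
    return list(option_labels) == ["A", "B", "C", "D"]
```

Rejecting unknown identifiers early catches typos such as `multiple-choice` before they reach the saved JSON.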
Usage Guidance
This skill is internally inconsistent rather than obviously malicious: its description and docs promise full passage+question extraction and JSON output, but the shipped code only extracts concatenated article text and prints it. Before using, ask the author to (1) implement or remove question-parsing and JSON write logic, (2) declare/install dependencies (pdfplumber, and optionally PyMuPDF/fitz), and (3) confirm where files will be written to avoid overwriting local project files. Test the script on non-sensitive sample PDFs in a sandbox or disposable directory to verify behavior. If you don't trust the author, do not run the code on sensitive systems or give it write access to important directories.
Capability Analysis
Type: OpenClaw Skill
Name: ielts-extractor
Version: 1.1.0
The skill is a legitimate tool for extracting IELTS reading passages and questions from PDF files into structured JSON data. The Python script (extract_article.py) uses standard libraries like pdfplumber for text extraction and includes logic for handling two-column layouts, while the documentation (SKILL.md and references/) provides clear, task-aligned instructions for the AI agent without any signs of prompt injection, data exfiltration, or malicious execution.
Capability Assessment
Purpose & Capability
The name/description promise: extract multi-page Cambridge IELTS passages and questions, handle two-column layout, and save structured JSON. The included Python (extract_article.py) only locates a passage and concatenates page text (no question parsing, no JSON output, and the two-column helper is defined but not used). References describe image extraction with PyMuPDF and JSON schema, but those behaviors are not implemented in the code.
Instruction Scope
SKILL.md describes locating passages, extracting multi-page text, handling two-column layout, extracting question groups and options, and saving JSON to specific ielts-tracker paths. The runtime instructions reference writing files into project directories (ielts-tracker/.../public/images/) and expect question extraction, but the actual code does not perform those writes or parse questions. Instructions don't request unrelated credentials or environment variables.
Install Mechanism
No install spec (instruction-only) — lower risk — but the code imports pdfplumber and references fitz/PyMuPDF in docs. Dependencies are not declared; runtime may fail or require installing third-party packages. No external downloads or suspicious URLs are present.
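Because the dependencies are undeclared, a quick pre-flight check can show what is missing before the script is run. This is a sketch using only the standard library. The helper name `missing_modules` is hypothetical, and the module names `pdfplumber` and `fitz` (the import name used for PyMuPDF) are taken from the assessment above.

```python
from importlib.util import find_spec

REQUIRED = ("pdfplumber",)  # imported by the shipped script, per the assessment
OPTIONAL = ("fitz",)        # PyMuPDF, referenced only in the docs


def missing_modules(names):
    """Return the module names from `names` that cannot be found here."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_modules(names)
        if missing:
            print(f"Missing {label} modules: {', '.join(missing)}")
        else:
            print(f"All {label} modules are available")
```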
Credentials
The skill requests no environment variables, no credentials, and no config paths. However, SKILL.md and references expect writing output files into project paths which could overwrite local files if run with file-system access; this is expected for a data-extraction tool but worth noting.
Persistence & Privilege
always is false and the skill is user-invocable. It does not request permanent agent inclusion or modify other skills. There is no autonomously elevated privilege requested.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install ielts-extractor
- After installation, invoke the skill by name or use /ielts-extractor
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.1.0
Version 1.1.0 – Adds modular references and streamlines documentation.
- Separated documentation into concise reference files: `json-format.md`, `pdf-extraction.md`, and `question-types.md`
- Simplified and condensed main documentation in `SKILL.md` for clarity and ease of use
- Added quick-reference tables for question types and data paths
- Improved instructions for PDF extraction, article length, and options formatting
- Enhanced guidance now points to relevant reference files for in-depth details
v1.0.5
- Expanded and clarified the list of IELTS reading question types with detailed type mapping and examples.
- Added concise, tabular overviews of main question formats, including Chinese names and answer format notes.
- Provided new sample JSON structures for complex types such as table completion and label matching.
- Included a checklist for writers to ensure content length, type correctness, and answer integrity.
- Reorganized sections for better reference and practical extraction guidance.
v1.0.4
- Documentation reformat and cleanup in SKILL.md.
- No logic or code changes; content unchanged except for formatting.
- Improved readability by removing duplicate or misaligned code block endings.
v1.0.3
- Expanded and clarified the question type definitions with a summary table, including rules for common IELTS reading tasks.
- Added detailed guidance on distinguishing between matching, single-choice, multiple-choice, and fill-blank questions.
- Provided new, precise JSON format examples for each question type, indicating where to place options for group and per-question scenarios.
- Highlighted frequent mistakes in structuring question data and options, especially for single-choice and matching types.
- Improved readability with summary boxes, warnings, and step-by-step breakdowns for data entry.
v1.0.2
- Added explicit formatting rules for multiple-choice and matching question options.
- Included correct and incorrect examples for the options structure in question groups.
- Clarified that options should not be nested inside individual questions.
- Improved examples for question group JSON structure and standardization.
- No code or logic changes; documentation update only.
v1.0.1
- Enhanced question extraction: introduced `question_groups` with structure for group title, type, instruction, passage fragment, and grouped questions.
- Added detailed instructions and example JSON for grouping questions by type (fill-blank, matching, etc.).
- Updated test status to indicate "分组" (grouped) completion for Cambridge 4 Test 2.
- Expanded guidance on extraction techniques and provided tips for handling complex question layouts.
- Included new command-line example for querying tests by ID.
v1.0.0
- Initial release of "IELTS 试题数据提取 Skill" (IELTS Test Data Extraction Skill) for automated extraction of IELTS reading passages and questions from Cambridge IELTS PDFs.
- Supports multi-page extraction, proper handling of two-column layouts, and consistent JSON output formatting.
- Detailed workflow, file storage conventions, and typical PDF paths documented.
- Progress table included to track extraction status of various Cambridge test sets.
Frequently Asked Questions
What is Ielts Extractor?
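Ielts Extractor automatically extracts continuous multi-page reading passages and their questions from Cambridge IELTS PDFs, supports two-column layouts, and saves the result as structured JSON. It is an AI Agent Skill for Claude Code / OpenClaw, with 291 downloads so far.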
How do I install Ielts Extractor?
Run "/install ielts-extractor" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Ielts Extractor free?
Yes, Ielts Extractor is completely free (open-source). You can download, install and use it at no cost.
Which platforms does Ielts Extractor support?
Ielts Extractor is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created Ielts Extractor?
It is built and maintained by Xuanyu Chen (@lava-chen); the current version is v1.1.0.