hwp-reader
Author: mupengi-bot · v1.0.0
Total downloads: 473
Favorites: 0
Current installs: 2
Versions: 1
Install in OpenClaw
/install hwp-reader
Feature description
Extract and analyze text, tables, images, and metadata from Korean HWP and HWPX documents, supporting both legacy and modern formats.
Usage instructions (SKILL.md)
🐧 HWP Reader — Read & Analyze Korean HWP/HWPX Documents
Author: 무펭이 🐧 | v1.0.0
Description
Read and extract text content from Korean HWP (한글) and HWPX files. Supports both legacy HWP format (via pyhwp) and modern HWPX format (ZIP-based XML).
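Because the two formats need different readers, it can help to check the container type before choosing one. The sketch below is illustrative and not part of the skill: it assumes a legacy HWP 5.x file is an OLE2 compound file (identified by the standard OLE2 signature bytes) and that an HWPX file is a ZIP archive, as the description states. The function name `detect_hwp_format` is hypothetical.

```python
import zipfile

# Standard OLE2 compound-file signature (assumption: legacy .hwp is HWP 5.x).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def detect_hwp_format(path):
    """Return 'hwp', 'hwpx', or 'unknown' based on the file's container type."""
    with open(path, "rb") as handle:
        header = handle.read(len(OLE2_MAGIC))
    if header == OLE2_MAGIC:
        return "hwp"
    # HWPX is described as ZIP-based XML, so any readable ZIP is treated as HWPX here.
    if zipfile.is_zipfile(path):
        return "hwpx"
    return "unknown"
```

Checking content rather than the file extension avoids misrouting a renamed file, although a generic ZIP that is not HWPX would still be reported as `hwpx` by this sketch.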
When to Use
- User asks to read/analyze a .hwp or .hwpx file
- Government support application forms (정부지원사업 신청서)
- Any Korean document in Hangul Word Processor format
How It Works
HWP Files (Legacy Format)
python3 -c "
from hwp5.hwp5txt import main
import sys
sys.argv = ['hwp5txt', 'FILE_PATH']
main()
"
HWPX Files (Modern XML Format)
python3 -c "
import zipfile
z = zipfile.ZipFile('FILE_PATH')
# Quick preview text
if 'Preview/PrvText.txt' in z.namelist():
    print(z.read('Preview/PrvText.txt').decode('utf-8'))
# Full content from section XMLs
import xml.etree.ElementTree as ET
for name in sorted(z.namelist()):
    if name.startswith('Contents/section') and name.endswith('.xml'):
        root = ET.fromstring(z.read(name))
        for elem in root.iter():
            if elem.text and elem.text.strip():
                print(elem.text.strip())
"
Capabilities
| Feature | HWP | HWPX |
|---|---|---|
| Text extraction | ✅ pyhwp | ✅ ZIP+XML |
| Table detection | ⚠️ <표> markers | ✅ XML tags |
| Image extraction | ❌ | ✅ from BinData/ |
| Metadata | ✅ via hwp5 | ✅ from version.xml |
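The HWPX column of the table mentions images under BinData/ and metadata in version.xml, but the skill shows no code for either. A minimal sketch, assuming only those two archive locations from the table: the function names are hypothetical, and the attribute names inside version.xml are not assumed, so the root element's attributes are returned as-is.

```python
import zipfile
import xml.etree.ElementTree as ET


def list_hwpx_images(source):
    """List archive members stored under BinData/ (path taken from the table above)."""
    with zipfile.ZipFile(source) as archive:
        return [
            name
            for name in archive.namelist()
            if name.startswith("BinData/") and not name.endswith("/")
        ]


def read_hwpx_version(source):
    """Return the root attributes of version.xml, or {} if the member is absent."""
    with zipfile.ZipFile(source) as archive:
        if "version.xml" not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read("version.xml"))
        return dict(root.attrib)
```

To save an image, `archive.read(name)` returns its raw bytes, which can be written to a file of the caller's choosing. `source` may be a path or a binary file-like object.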
Dependencies
- pyhwp (pip install pyhwp): installed at /Users/mupeng/Library/Python/3.9/lib/python/site-packages/hwp5/
- Python 3.9+: standard library zipfile, xml.etree.ElementTree
Limitations
- HWP text extraction loses table structure (shows <표> placeholder)
- HWPX Preview/PrvText.txt is truncated to ~1KB; use section XMLs for full content
- Complex formatting (colors, fonts, page layout) not preserved in text mode
- Encrypted/password-protected HWP files not supported
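Since Preview/PrvText.txt is truncated, full text has to come from the section XMLs. The sketch below wraps that step in a reusable function, with two additions that are assumptions and not part of the skill: sections are ordered by their numeric suffix (a plain sort would place section10 before section2), and a hypothetical size cap `MAX_SECTION_BYTES` rejects very large members before parsing.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Assumed cap on a section's declared uncompressed size; tune for your documents.
MAX_SECTION_BYTES = 50 * 1024 * 1024


def section_index(name):
    """Extract the numeric suffix from a name like 'Contents/section12.xml'."""
    match = re.search(r"section(\d+)\.xml$", name)
    return int(match.group(1)) if match else -1


def extract_hwpx_text(source):
    """Return the non-empty text chunks of every section XML, in section order."""
    lines = []
    with zipfile.ZipFile(source) as archive:
        names = [
            name
            for name in archive.namelist()
            if name.startswith("Contents/section") and name.endswith(".xml")
        ]
        for name in sorted(names, key=section_index):
            if archive.getinfo(name).file_size > MAX_SECTION_BYTES:
                raise ValueError(f"{name} exceeds the assumed size cap")
            root = ET.fromstring(archive.read(name))
            # itertext() yields element text and tail text in document order.
            for chunk in root.itertext():
                if chunk.strip():
                    lines.append(chunk.strip())
    return lines
```

Returning a list instead of printing makes the result easy to summarize, diff, or search. Table structure is still flattened here; preserving it would require inspecting the specific table elements of the HWPX schema.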
Usage Examples
Read a government application form
"이 HWP 파일 읽어줘: /path/to/신청서.hwp" ("Read this HWP file: /path/to/신청서.hwp")
→ Extract text → Analyze structure → Summarize sections
Compare two versions
"v1.hwp와 v2.hwp 차이점 분석해줘" ("Analyze the differences between v1.hwp and v2.hwp")
→ Extract both → Diff content → Report changes
Fill in a template
"이 양식에 우리 사업 내용 채워줘" ("Fill in this form with our business details")
→ Read template → Identify blanks → Generate content suggestions
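For the "Compare two versions" example, the "Diff content" step could be done with Python's standard difflib once both documents have been extracted to lists of lines. This is a sketch under that assumption; the function name `diff_extracted_text` and the default labels are illustrative only.

```python
import difflib


def diff_extracted_text(old_lines, new_lines, old_name="v1.hwp", new_name="v2.hwp"):
    """Return a unified diff between two lists of extracted text lines."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=old_name,
            tofile=new_name,
            lineterm="",  # inputs carry no trailing newlines, so emit none
        )
    )


# Usage: lines starting with '-' were removed, lines starting with '+' were added.
for line in diff_extracted_text(["Budget: 10", "Team: 3"], ["Budget: 12", "Team: 3"]):
    print(line)
```

An empty result means the extracted text is identical, which does not rule out formatting-only changes because formatting is not preserved in text mode.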
🐧 무펭이 — Making Korean documents accessible to AI agents
Security recommendations
This skill appears to do what it says: extract text/images/metadata from .hwp/.hwpx files. Before installing/using it, consider: (1) The skill is instruction-only and expects Python 3.9+ and the pyhwp (hwp5) package — the registry metadata did not declare these requirements, so ensure your agent environment has them installed. (2) The dependency path shown in the README is the author's local path and not an installer; verify and install pyhwp from a trusted source (PyPI or the project's official repo) if you intend to run the provided commands. (3) The skill will read and print document contents — avoid using with sensitive/confidential documents unless you trust the execution environment. (4) If you want stronger guarantees, ask the author to add an explicit install spec (or steps) and to avoid hardcoded user-specific paths. Overall this is coherent and not suspicious, but verify dependencies and run in an isolated/trusted environment.
Functional analysis
Type: OpenClaw Skill
Name: hwp-reader
Version: 1.0.0
The skill is designed to read HWP/HWPX files, which is a legitimate purpose. However, the `SKILL.md` file contains `python3 -c "..."` commands that use a `FILE_PATH` placeholder. If the OpenClaw agent directly substitutes user-controlled input into this placeholder without proper sanitization, it could lead to shell injection vulnerabilities, allowing arbitrary command execution. Additionally, the HWPX parsing code uses `xml.etree.ElementTree`, which could be susceptible to XML-based denial-of-service attacks with specially crafted HWPX files. These are vulnerabilities that allow attacks, classifying the skill as suspicious rather than benign.
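One way to reduce the injection risk described above is to keep the Python program text fixed and pass the file path as a separate argument, so it is never spliced into shell or Python source. The sketch below is illustrative only: `build_hwp_command` and `extract_hwp_text` are hypothetical names, and the call to `hwp5.hwp5txt.main` simply mirrors the skill's own snippet.

```python
import subprocess
import sys
from pathlib import Path

# Fixed program text: the path arrives via sys.argv[1], never by string substitution.
HWP5TXT_SNIPPET = (
    "import sys\n"
    "from hwp5.hwp5txt import main\n"
    "sys.argv = ['hwp5txt', sys.argv[1]]\n"
    "main()\n"
)


def build_hwp_command(file_path):
    """Build an argument list for running the extraction without a shell."""
    path = Path(file_path)
    if path.suffix.lower() != ".hwp":
        raise ValueError(f"expected a .hwp file, got: {file_path!r}")
    return [sys.executable, "-c", HWP5TXT_SNIPPET, str(path)]


def extract_hwp_text(file_path):
    """Run the extraction in a child process and return its standard output."""
    result = subprocess.run(
        build_hwp_command(file_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pyhwp fails or is missing
    )
    return result.stdout
```

Because `subprocess.run` receives a list and no shell is involved, quotes or semicolons in the file name stay inside a single argument. This does not address the XML denial-of-service concern, which applies to the HWPX path.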
Capability assessment
Purpose & Capability
The SKILL.md clearly describes how to extract text from legacy HWP (pyhwp/hwp5) and HWPX (zip+XML). That matches the declared purpose. Minor mismatch: the registry metadata lists no required binaries or dependencies, but the instructions require Python 3.9+ and the pyhwp package (hwp5). This is an omission in the manifest rather than a capability mismatch.
Instruction Scope
Runtime instructions are narrowly focused: run small python snippets to extract text/images/metadata from a provided .hwp or .hwpx file. They reference only the target file(s) and standard Python libraries (zipfile, xml.etree). There are no instructions to read unrelated system files, environment secrets, or exfiltrate data to external endpoints.
Install Mechanism
This is an instruction-only skill with no install spec or code to fetch. That lowers installation risk. The SKILL.md does recommend installing pyhwp but provides no automated install instructions; the author also lists a local install path (a user-specific /Users/... path), which is informational and not a remote download.
Credentials
The skill declares no environment variables or credentials, which is appropriate. Note: it implicitly requires Python 3.9+ and the pyhwp package; these requirements are present in the documentation but not in the registry metadata. Also the listed dependency path appears to be the author's local installation path — harmless but out-of-place for a distributable skill.
Persistence & Privilege
Skill does not request permanent presence (always:false) and uses normal agent invocation. It does not attempt to modify other skills or system-wide configuration.
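How to use
- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install hwp-reader
- Once installation completes, call the Skill by name or trigger it with /hwp-reader.
- Provide the inputs required by the Skill's parameter description to get structured output.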
Version history
v1.0.0
Initial publish
Metadata
FAQ
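What is hwp-reader?
Extract and analyze text, tables, images, and metadata from Korean HWP and HWPX documents, supporting both legacy and modern formats. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 473 downloads to date.
How do I install hwp-reader?
Run the command "/install hwp-reader" in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is required.
Is hwp-reader free?
Yes. hwp-reader is completely free (free and open source) and can be freely downloaded, installed, and used.
Which platforms does hwp-reader support?
hwp-reader is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops hwp-reader?
It is developed and maintained by mupengi-bot (@mupengi-bot); the current version is v1.0.0.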