mupengi-bot

hwp-reader

by mupengi-bot · GitHub ↗ · v1.0.0
cross-platform ⚠ suspicious
473 Downloads · 0 Stars · 2 Active Installs · 1 Version
Install in OpenClaw
/install hwp-reader
Description
Extract and analyze text, tables, images, and metadata from Korean HWP and HWPX documents, supporting both legacy and modern formats.
README (SKILL.md)

🐧 HWP Reader — Read & Analyze Korean HWP/HWPX Documents

Author: 무펭이 🐧 | v1.0.0

Description

Read and extract text content from Korean HWP (한글) and HWPX files. Supports both legacy HWP format (via pyhwp) and modern HWPX format (ZIP-based XML).

When to Use

  • User asks to read/analyze a .hwp or .hwpx file
  • Government support application forms (정부지원사업 신청서)
  • Any Korean document in Hangul Word Processor format
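The two file types listed above can usually be told apart by content rather than by extension. The sketch below assumes an HWP 5.x file is an OLE2 compound file and an HWPX file is a ZIP container; `detect_hwp_format` is a hypothetical helper name, not part of this skill or of pyhwp.

```python
import zipfile

# Signature of the OLE2 / Compound File Binary container (assumed for HWP 5.x).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def detect_hwp_format(path):
    """Return 'hwp', 'hwpx', or 'unknown' based on the file's content."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header == OLE2_MAGIC:
        return "hwp"
    # Any ZIP archive is reported as 'hwpx' here; this does not verify
    # that the archive really contains HWPX parts.
    if zipfile.is_zipfile(path):
        return "hwpx"
    return "unknown"
```

A caller could use the result to choose between the pyhwp route and the ZIP+XML route described under How It Works.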

How It Works

HWP Files (Legacy Format)

python3 -c "
from hwp5.hwp5txt import main
import sys
sys.argv = ['hwp5txt', 'FILE_PATH']
main()
"

HWPX Files (Modern XML Format)

python3 -c "
import zipfile
import xml.etree.ElementTree as ET

with zipfile.ZipFile('FILE_PATH') as z:
    # Quick preview text (may be truncated; see Limitations)
    if 'Preview/PrvText.txt' in z.namelist():
        print(z.read('Preview/PrvText.txt').decode('utf-8'))

    # Full content from section XMLs
    for name in sorted(z.namelist()):
        if name.startswith('Contents/section') and name.endswith('.xml'):
            root = ET.fromstring(z.read(name))
            for elem in root.iter():
                if elem.text and elem.text.strip():
                    print(elem.text.strip())
"
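The same HWPX extraction can be written as a reusable function. This is a minimal sketch, not the author's implementation: `extract_hwpx_text` is a hypothetical name, and it assumes section files are named `Contents/section<N>.xml`. It differs from the one-liner in two ways: sections are ordered numerically, so `section10.xml` comes after `section2.xml`, and `itertext()` also picks up text that follows a child element.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Assumed naming pattern for section parts inside the archive.
SECTION_RE = re.compile(r"^Contents/section(\d+)\.xml$")


def extract_hwpx_text(source):
    """Return non-empty text fragments from all section XMLs, in section order.

    `source` may be a file path or a binary file-like object.
    """
    fragments = []
    with zipfile.ZipFile(source) as archive:
        sections = []
        for name in archive.namelist():
            match = SECTION_RE.match(name)
            if match:
                sections.append((int(match.group(1)), name))
        # Sort by the numeric section index rather than by string.
        for _, name in sorted(sections):
            root = ET.fromstring(archive.read(name))
            for text in root.itertext():
                if text.strip():
                    fragments.append(text.strip())
    return fragments
```

Calling `"\n".join(extract_hwpx_text("FILE_PATH"))` would give output comparable to the snippet above.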

Capabilities

Feature HWP HWPX
Text extraction ✅ pyhwp ✅ ZIP+XML
Table detection ⚠️ <표> markers ✅ XML tags
Image extraction ✅ from BinData/
Metadata ✅ via hwp5 ✅ from version.xml
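The README gives no code for the image and metadata rows of this table. The sketch below shows one way they might look for HWPX, assuming embedded binaries sit under a `BinData/` prefix in the archive and that `version.xml` carries its information as attributes on the root element. All three function names are hypothetical.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET


def list_hwpx_images(archive):
    """Return member names stored under BinData/ (directory entries skipped)."""
    return [
        name for name in archive.namelist()
        if name.startswith("BinData/") and not name.endswith("/")
    ]


def read_hwpx_binaries(source):
    """Map each BinData/ member's base name to its raw bytes."""
    with zipfile.ZipFile(source) as archive:
        return {
            # Using only the base name avoids writing outside a target
            # directory if the caller later saves these to disk.
            posixpath.basename(name): archive.read(name)
            for name in list_hwpx_images(archive)
        }


def read_version_attributes(source):
    """Return the root attributes of version.xml, or {} if the part is absent."""
    with zipfile.ZipFile(source) as archive:
        if "version.xml" not in archive.namelist():
            return {}
        return dict(ET.fromstring(archive.read("version.xml")).attrib)
```

Returning bytes instead of writing files leaves the decision about where, or whether, to save extracted images to the caller.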

Dependencies

  • pyhwp (pip install pyhwp) — installed at /Users/mupeng/Library/Python/3.9/lib/python/site-packages/hwp5/
  • Python 3.9+ — standard library zipfile, xml.etree.ElementTree

Limitations

  • HWP text extraction loses table structure (shows <표> placeholder)
  • HWPX Preview/PrvText.txt is truncated to ~1KB; use section XMLs for full content
  • Complex formatting (colors, fonts, page layout) not preserved in text mode
  • Encrypted/password-protected HWP files not supported

Usage Examples

Read a government application form

"이 HWP 파일 읽어줘: /path/to/신청서.hwp" ("Read this HWP file: /path/to/신청서.hwp")
→ Extract text → Analyze structure → Summarize sections

Compare two versions

"v1.hwp와 v2.hwp 차이점 분석해줘" ("Analyze the differences between v1.hwp and v2.hwp")
→ Extract both → Diff content → Report changes
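The "Diff content" step is not spelled out in the README. One plausible approach, once both files have been extracted to plain text, is a line-based unified diff from the standard library. `diff_extracted_text` is a hypothetical helper and the labels are only defaults.

```python
import difflib


def diff_extracted_text(old_text, new_text,
                        old_label="v1.hwp", new_label="v2.hwp"):
    """Return a unified diff, as a list of lines, between two extracted texts."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",  # inputs have no trailing newlines, so add none
    ))
```

An empty list means the extracted text is identical. Formatting-only changes would not show up, since formatting is already lost at extraction time.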

Fill in a template

"이 양식에 우리 사업 내용 채워줘" ("Fill in this form with our business details")
→ Read template → Identify blanks → Generate content suggestions

🐧 무펭이 — Making Korean documents accessible to AI agents

Usage Guidance
This skill appears to do what it says: extract text/images/metadata from .hwp/.hwpx files. Before installing/using it, consider: (1) The skill is instruction-only and expects Python 3.9+ and the pyhwp (hwp5) package — the registry metadata did not declare these requirements, so ensure your agent environment has them installed. (2) The dependency path shown in the README is the author's local path and not an installer; verify and install pyhwp from a trusted source (PyPI or the project's official repo) if you intend to run the provided commands. (3) The skill will read and print document contents — avoid using with sensitive/confidential documents unless you trust the execution environment. (4) If you want stronger guarantees, ask the author to add an explicit install spec (or steps) and to avoid hardcoded user-specific paths. Overall this is coherent and not suspicious, but verify dependencies and run in an isolated/trusted environment.
Capability Analysis
Type: OpenClaw Skill · Name: hwp-reader · Version: 1.0.0
The skill is designed to read HWP/HWPX files, which is a legitimate purpose. However, the `SKILL.md` file contains `python3 -c "..."` commands that use a `FILE_PATH` placeholder. If the OpenClaw agent directly substitutes user-controlled input into this placeholder without proper sanitization, it could lead to shell injection vulnerabilities, allowing arbitrary command execution. Additionally, the HWPX parsing code uses `xml.etree.ElementTree`, which could be susceptible to XML-based denial-of-service attacks with specially crafted HWPX files. These are vulnerabilities that allow attacks, classifying the skill as suspicious rather than benign.
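Both concerns can be reduced without changing what the skill does. The sketch below is illustrative only, and its names are hypothetical. `run_snippet_with_path` passes the file path as a separate argument instead of pasting it into the command string, and `parse_untrusted_xml` applies two coarse checks before parsing. `HWP5TXT_SNIPPET` assumes that `hwp5.hwp5txt.main()` reads its file argument from `sys.argv`, as the README's own snippet implies.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET

# Assumed equivalent of the README's HWP snippet. The path arrives through
# sys.argv, so it is never interpolated into Python or shell source.
HWP5TXT_SNIPPET = "from hwp5.hwp5txt import main; main()"


def run_snippet_with_path(snippet, file_path):
    """Run `python -c snippet file_path` without a shell and return stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", snippet, file_path],  # list form: no shell parsing
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


def parse_untrusted_xml(data, max_bytes=10 * 1024 * 1024):
    """Parse XML bytes after a size cap and a DTD/entity check.

    This is a heuristic that assumes an ASCII-compatible encoding such as
    UTF-8. It is not a complete defence against hostile XML.
    """
    if len(data) > max_bytes:
        raise ValueError("XML part exceeds the size limit")
    if b"<!DOCTYPE" in data or b"<!ENTITY" in data:
        raise ValueError("DTD or entity declarations are not accepted")
    return ET.fromstring(data)
```

With this shape, `run_snippet_with_path(HWP5TXT_SNIPPET, path)` would replace the string-substituted command. A path that begins with `-` could still be read as an option by the called tool, so callers may want to reject or normalise such paths. The third-party `defusedxml` package is a stricter alternative to the entity check shown here.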
Capability Assessment
Purpose & Capability
The SKILL.md clearly describes how to extract text from legacy HWP (pyhwp/hwp5) and HWPX (zip+XML). That matches the declared purpose. Minor mismatch: the registry metadata lists no required binaries or dependencies, but the instructions require Python 3.9+ and the pyhwp package (hwp5). This is an omission in the manifest rather than a capability mismatch.
Instruction Scope
Runtime instructions are narrowly focused: run small python snippets to extract text/images/metadata from a provided .hwp or .hwpx file. They reference only the target file(s) and standard Python libraries (zipfile, xml.etree). There are no instructions to read unrelated system files, environment secrets, or exfiltrate data to external endpoints.
Install Mechanism
This is an instruction-only skill with no install spec or code to fetch. That lowers installation risk. The SKILL.md does recommend installing pyhwp but provides no automated install instructions; the author also lists a local install path (a user-specific /Users/... path), which is informational and not a remote download.
Credentials
The skill declares no environment variables or credentials, which is appropriate. Note: it implicitly requires Python 3.9+ and the pyhwp package; these requirements are present in the documentation but not in the registry metadata. Also the listed dependency path appears to be the author's local installation path — harmless but out-of-place for a distributable skill.
Persistence & Privilege
Skill does not request permanent presence (always:false) and uses normal agent invocation. It does not attempt to modify other skills or system-wide configuration.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install hwp-reader
  3. After installation, invoke the skill by name or use /hwp-reader
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial publish
Metadata
Slug hwp-reader
Version 1.0.0
License
All-time Installs 2
Active Installs 2
Total Versions 1
Frequently Asked Questions

What is hwp-reader?

hwp-reader extracts and analyzes text, tables, images, and metadata from Korean HWP and HWPX documents, supporting both legacy and modern formats. It is an AI Agent Skill for Claude Code / OpenClaw, with 473 downloads so far.

How do I install hwp-reader?

Run "/install hwp-reader" in the OpenClaw or Claude Code chat to install it in one step. The commands in the README also need Python 3.9+ and the pyhwp package, which the skill does not install for you.

Is hwp-reader free?

Yes, hwp-reader is completely free (open-source). You can download, install and use it at no cost.

Which platforms does hwp-reader support?

hwp-reader is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created hwp-reader?

It is built and maintained by mupengi-bot (@mupengi-bot); the current version is v1.0.0.
