harrylabsj

PDF Sanitizer

by haidong · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
40 downloads · 0 stars · 0 active installs · 1 version
Install in OpenClaw
/install pdf-sanitizer
Description
Detect and redact sensitive information in PDFs — ID numbers, phone numbers, addresses, bank cards.
README (SKILL.md)

PDF Sanitizer

Detect and redact sensitive information in PDF documents while preserving original layout.

Workflow

  1. Ingest PDF — extract text layer and metadata via pdfplumber/PyMuPDF.
  2. Scan for PII — run regex + AI pattern matching against Chinese and international PII:
    • Chinese ID number (18-digit)
    • Chinese phone numbers
    • Bank card numbers
    • Email addresses
    • Residential addresses (Chinese)
    • Person names (context-based)
  3. Highlight — annotate every match with bounding boxes and category labels.
  4. Confirm — present categories to user for selection. Default: all categories enabled.
  5. Redact — apply chosen mode per category:
    • blackout — solid black rectangle over sensitive text
    • blur — pixel-level Gaussian blur on image-rendered area
    • placeholder — replace with [REDACTED] while keeping surrounding text
  6. Rebuild PDF — flatten redactions into final output, preserving original fonts, images, and layout.
  7. Report — output redacted PDF + JSON report listing each redaction:
    • original snippet (truncated), category, page number, bounding box, mode applied.
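
Step 2 can be illustrated with a minimal Python sketch. It assumes the text of one page has already been extracted as a string. The category keys, patterns, and function names below are hypothetical and are not taken from the skill's own script. ID numbers are checked against the standard 18-digit checksum and bank card candidates against the Luhn check that most payment cards use, which helps cut false positives. Matches are reported as character offsets; mapping them to bounding boxes depends on the PDF library and is not shown.

```python
import re

# Hypothetical categories and patterns (assumptions, not the skill's real rules).
PATTERNS = {
    "id_card": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
    "phone": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "bank_card": re.compile(r"(?<!\d)\d{16,19}(?!\d)"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}

# Weights and check characters of the 18-digit Chinese ID checksum.
ID_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
ID_CHECK_CHARS = "10X98765432"


def valid_id_card(number: str) -> bool:
    """Check the last character of an 18-character ID against its checksum."""
    total = sum(int(d) * w for d, w in zip(number[:17], ID_WEIGHTS))
    return ID_CHECK_CHARS[total % 11] == number[17].upper()


def valid_luhn(number: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text: str, page: int) -> list[dict]:
    """Return validated matches for one page, ordered by position."""
    matches = []
    for category, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            value = m.group()
            if category == "id_card" and not valid_id_card(value):
                continue
            if category == "bank_card" and not valid_luhn(value):
                continue
            # Skip spans that overlap a match from an earlier category.
            if any(m.start() < e["end"] and e["start"] < m.end() for e in matches):
                continue
            matches.append({"category": category, "page": page,
                            "start": m.start(), "end": m.end(), "text": value})
    return sorted(matches, key=lambda e: e["start"])
```

Addresses and person names are left out of the sketch because they need context-based matching, which a plain regex does not capture well.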

Sample Prompts

pdf-sanitizer redact --input contract.pdf --categories id_card,phone,address --mode blackout
pdf-sanitizer redact --input 社保材料.pdf --output clean.pdf --categories all --mode placeholder
pdf-sanitizer scan --input report.pdf
pdf-sanitizer review --input contract.pdf --page 3-7
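
If these prompts were backed by a command-line entry point, their arguments could be parsed roughly as follows. This is a sketch under assumptions: only the subcommands and flags shown in the samples above are used, and the helper names, the required/optional split, and the single-page form of --page are hypothetical.

```python
import argparse


def parse_pages(spec: str) -> range:
    """Turn '3-7' into range(3, 8); a single number such as '5' means one page."""
    first, _, last = spec.partition("-")
    start = int(first)
    end = int(last) if last else start
    if start < 1 or end < start:
        raise argparse.ArgumentTypeError(f"invalid page range: {spec!r}")
    return range(start, end + 1)


def parse_categories(spec: str) -> list[str]:
    """Split 'id_card,phone,address' into a list; 'all' is passed through as-is."""
    return [c.strip() for c in spec.split(",") if c.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pdf-sanitizer")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("scan", "review", "redact"):
        cmd = sub.add_parser(name)
        cmd.add_argument("--input", required=True)
        if name == "review":
            cmd.add_argument("--page", type=parse_pages)
        if name == "redact":
            cmd.add_argument("--output")
            # The workflow enables all categories by default.
            cmd.add_argument("--categories", type=parse_categories, default=["all"])
            cmd.add_argument("--mode", required=True,
                             choices=("blackout", "blur", "placeholder"))
    return parser
```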
Usage Guidance
Before installing, confirm that your workflow can tolerate JSON reports that may include truncated PII and exact page locations. Store reports with the same protections as the original PDFs, and prefer masked or category-only reporting for highly sensitive documents.
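
One way to follow this guidance is to mask snippets before they are written to the report, or to drop them entirely. The sketch below is illustrative only: the entry fields mirror the report items listed in the workflow, while the function names and the "masked" / "category_only" detail levels are assumptions, not options the skill is known to provide.

```python
def mask_snippet(snippet: str, keep: int = 2) -> str:
    """Keep the first and last `keep` characters and replace the rest with '*'."""
    if keep <= 0 or len(snippet) <= keep * 2:
        return "*" * len(snippet)
    return snippet[:keep] + "*" * (len(snippet) - keep * 2) + snippet[-keep:]


def report_entry(match: dict, mode: str, detail: str = "masked") -> dict:
    """Build one report entry; 'category_only' omits the snippet and bounding box."""
    entry = {"category": match["category"], "page": match["page"], "mode": mode}
    if detail == "masked":
        entry["snippet"] = mask_snippet(match["text"])
        entry["bbox"] = match["bbox"]
    return entry
```

With category-only entries, the report still shows what kind of data was redacted on each page, but it no longer reveals where on the page the data sat or any part of its content.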
Capability Assessment
Purpose & Capability
The artifacts coherently describe detecting and redacting PII in PDFs; the included Python script performs local regex detection on stdin and emits JSON matches.
Instruction Scope
The workflow discloses scanning, user confirmation, redaction modes, and report generation, but the script includes passport and IPv4 detection that are not explicitly listed in the short description or workflow category list.
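
For orientation, a helper of the kind described here (local detection on stdin, JSON matches on stdout) might look like the following sketch for the IPv4 category. It is not the bundled script: the names and output fields are assumptions, and passport detection is omitted because passport formats vary by country.

```python
import ipaddress
import json
import re
import sys

# Candidate dotted quads; octet ranges are validated afterwards.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def find_ipv4(text: str) -> list[dict]:
    """Return every candidate that the standard library accepts as an IPv4 address."""
    matches = []
    for m in IPV4_CANDIDATE.finditer(text):
        try:
            ipaddress.IPv4Address(m.group())
        except ValueError:
            continue  # e.g. an octet above 255
        matches.append({"category": "ipv4", "start": m.start(),
                        "end": m.end(), "text": m.group()})
    return matches


def main() -> None:
    """Read text from stdin and write the matches to stdout as JSON."""
    json.dump(find_ipv4(sys.stdin.read()), sys.stdout, ensure_ascii=False, indent=2)
```

Nothing in this sketch touches the network or the file system beyond stdin and stdout, which matches the local-only behavior noted in the assessment.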
Install Mechanism
The artifact contains a SKILL.md and one small Python helper script, with no installer, package fetch, network behavior, auto-start hook, or hidden setup mechanism.
Credentials
Reading PDF text and detecting PII is proportional to the stated sanitizer purpose; no evidence shows credential use, network transmission, unrelated file access, or broad local indexing.
Persistence & Privilege
The documented JSON report can retain truncated original snippets and precise bounding boxes, which is privacy-sensitive, but it is disclosed and there is no evidence of background persistence or privilege escalation.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install pdf-sanitizer
  3. After installation, invoke the skill by name or use /pdf-sanitizer
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release: Detect and redact Chinese PII in PDFs
Metadata
Slug: pdf-sanitizer
Version: 1.0.0
License: MIT-0
All-time Installs: 0
Active Installs: 0
Total Versions: 1
Frequently Asked Questions

What is PDF Sanitizer?

PDF Sanitizer is an AI Agent Skill for Claude Code / OpenClaw that detects and redacts sensitive information in PDFs, including ID numbers, phone numbers, addresses, and bank cards. It has 40 downloads so far.

How do I install PDF Sanitizer?

Run "/install pdf-sanitizer" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is PDF Sanitizer free?

Yes, PDF Sanitizer is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does PDF Sanitizer support?

PDF Sanitizer is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created PDF Sanitizer?

It is built and maintained by haidong (@harrylabsj); the current version is v1.0.0.
