xuanwuskill

Has Anonymizer

Author: Huiming Liu · GitHub · v1.0.3 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 1016
Favorites: 11
Current installs: 3
Versions: 4
Install in OpenClaw
/install has-anonymizer
Description
HaS (Hide and Seek) on-device text and image anonymization. Text: 8 languages (zh/en/fr/de/es/pt/ja/ko), open-set entity types. Image: 21 privacy categories...
Usage instructions (SKILL.md)

HaS Privacy

HaS exposes a single umbrella CLI:

  • has text ... for text anonymization, restoration, and scanning
  • has image ... for image scanning, masking, and category discovery

Use it when you need to remove private data locally before sending content elsewhere, inspect a directory for privacy risks, or mask visual privacy targets in photos and screenshots.

Agent Decision Guidelines

  • Prefer has text for plaintext and has image for raster images. For mixed directories, run both and combine the results into one report.
  • For PDFs, Word documents, or scanned pages, extract text first and then use has text. For screenshots/photos where the goal is simply to hide visible carriers such as faces, screens, paper, labels, or QR codes, use has image. If the goal is to reason about the text content inside an image, run OCR first and then use has text.
  • Do not overwrite or delete the original files. Text anonymization can be restored later with the mapping file; image masking is irreversible.
  • Proactively mention configurable knobs when the user intent is clear: has text uses repeated --type; has image uses repeated --type, plus --method and --strength.
  • If the user intent is ambiguous, start with scan before hide.
  • After batch scans, summarize text file count, image file count, findings by type/category, high-risk items, and the suggested next step.
  • If timing matters to the user, add --timing and report the elapsed result in plain language afterward.
  • For qr_code and barcode, the default mosaic strength is automatically raised based on the detection size to ensure the encoding is destroyed. The agent does not need to manually increase --strength for these categories. If a detection output includes effective_strength, report it to the user.
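
The routing guidance above can be sketched as a small dispatcher. This is a minimal sketch and not part of HaS: the `route_file` name and the extension sets are assumptions chosen for illustration, and a real workspace may need a longer list.

```python
from pathlib import Path

# Hypothetical extension sets; adjust them to the files you actually handle.
TEXT_EXTS = {".txt", ".md", ".csv", ".json", ".log"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}
EXTRACT_FIRST_EXTS = {".pdf", ".doc", ".docx"}


def route_file(path: str) -> str:
    """Return which HaS namespace should handle a file.

    "text" and "image" map to `has text` and `has image`.
    "extract-then-text" means text must be extracted before `has text`.
    "unsupported" means the file should be reported as unprocessed.
    """
    ext = Path(path).suffix.lower()
    if ext in TEXT_EXTS:
        return "text"
    if ext in IMAGE_EXTS:
        return "image"
    if ext in EXTRACT_FIRST_EXTS:
        return "extract-then-text"
    return "unsupported"
```

Returning an explicit "unsupported" label keeps unknown files visible in the final report instead of silently treating them as clean.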

Shared CLI Contract

The current CLI contract is designed for agents first:

  • Success returns compact JSON.
  • Failure also returns compact JSON with error.code and error.message.
  • Returned path fields are absolute.
    • This includes file, output, mapping_output, and skipped[].file.
  • Invalid combinations fail fast instead of silently falling back.
  • Directory mode is non-recursive. Only immediate children are processed.
  • Batch results can include skipped and skipped_count.
    • Treat skipped entries as unprocessed files, not as clean files.
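
An agent consuming this contract might wrap the JSON handling as follows. This is a sketch under assumptions: the failure payload is taken to be a nested object `{"error": {"code": ..., "message": ...}}`, and the helper names (`parse_has_output`, `unprocessed_files`, `HasCliError`) are hypothetical.

```python
import json


class HasCliError(Exception):
    """Raised when the CLI reports a failure payload."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_has_output(stdout: str) -> dict:
    """Parse compact JSON from the CLI and surface failures as exceptions."""
    payload = json.loads(stdout)
    error = payload.get("error")
    if error:
        # Assumed shape: {"error": {"code": "...", "message": "..."}}
        raise HasCliError(error.get("code", "unknown"), error.get("message", ""))
    return payload


def unprocessed_files(payload: dict) -> list:
    """Return skipped file paths; these are unprocessed, not clean."""
    return [entry["file"] for entry in payload.get("skipped", [])]
```

Raising on the error payload makes it harder to mistake a failed batch for an empty one.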

Shared command layout:

{baseDir}/scripts/has.sh <text|image> <command> [options]

Shared options can be placed before or after the subcommand.


Part 1: has text

has text is the plaintext namespace. It supports:

  • scan
  • hide
  • restore

It runs entirely on-device and uses a local llama-server plus the HaS text model when model inference is required.

Core Text Concepts

Semantic tags

Anonymized text uses semantic tags such as:

<EntityType[ID].Category.Attribute>

This preserves structure better than a flat [REDACTED] token, which is why downstream LLM output can remain usable after restoration.

Open-set types

Repeated --type flags are open-set. They are not limited to a fixed catalog. Natural language type names such as "person name", "address", "phone number", or "numeric values (transaction amounts)" are valid.

Public/private distinction

Type wording matters. For example, "personal location" is usually safer than "location" if you want to preserve public places but hide private addresses. Public/private person-name distinctions remain less stable and should not be trusted without verification.

Multilingual support

The text model supports Chinese, English, French, German, Spanish, Portuguese, Japanese, and Korean, including mixed-language text.

Type name language

Match the --type language to the source text language:

  • Chinese text → use Chinese type names: --type "人名" --type "电话号码" --type "地址"
  • Non-Chinese text (English, French, German, etc.) → use English type names: --type "person name" --type "phone number" --type "address"
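
One way to apply this rule automatically is a script-detection heuristic. The sketch below is an assumption, not HaS behavior: it treats text as Chinese when it contains Han characters and no kana or hangul, and the `type_flags_for` helper and the three-entry type lists are illustrative only. Mixed-language text may need a manual choice.

```python
import re

# Han ideographs, plus kana and hangul to tell Japanese/Korean apart from Chinese.
_HAN = re.compile(r"[\u4e00-\u9fff]")
_KANA_OR_HANGUL = re.compile(r"[\u3040-\u30ff\uac00-\ud7af]")

# Illustrative type names taken from the examples in this section.
TYPE_NAMES = {
    "zh": ["人名", "电话号码", "地址"],
    "en": ["person name", "phone number", "address"],
}


def type_flags_for(text: str) -> list:
    """Build repeated --type flags in the language that matches the text."""
    looks_chinese = bool(_HAN.search(text)) and not _KANA_OR_HANGUL.search(text)
    names = TYPE_NAMES["zh" if looks_chinese else "en"]
    flags = []
    for name in names:
        flags += ["--type", name]
    return flags
```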

Text Runtime Prerequisites

has text auto-starts a local llama-server when needed.

  • Default model path: ~/.openclaw/tools/has-anonymizer/models/has_text_model.gguf
  • Override model path: HAS_TEXT_MODEL_PATH=/abs/path/to/has_text_model.gguf
  • Override parallel cap: HAS_TEXT_MAX_PARALLEL_REQUESTS
  • If HuggingFace downloads fail, see Model Download Mirrors below.

Text Usage

{baseDir}/scripts/has.sh text [--timing] [--verbose] <scan|hide|restore> [options]

Namespace options:

Option     Description
--timing   Include elapsed_ms in the JSON output
--verbose  Emit runtime status and progress messages to stderr

Input methods:

Method           Description
--text '<text>'  Pass text directly
--file <path>    Read text from a file
--dir <path>     Process immediate plaintext files in a directory
stdin            Used in single-text mode when no --text, --file, or --dir is provided

Rules:

  • --text, --file, and --dir are mutually exclusive.
  • Empty --type values are rejected.
  • Directory mode only accepts batch output flags.
  • Single-file hide requires --mapping-output.
  • Single-file restore requires --mapping.
  • In text directory mode, skipped can include unprocessed files (binary, encoding, or read errors).
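
Because invalid combinations fail fast, an agent may want to check a planned call before spawning the CLI. The following is a sketch of such a pre-check, assuming that "single-file" means any call without --dir; the `check_text_args` helper and its messages are hypothetical and do not reproduce the CLI's own validation.

```python
def check_text_args(command, *, text=None, file_path=None, dir_path=None,
                    types=(), mapping=None, mapping_output=None):
    """Return a list of rule violations for a planned `has text` call."""
    problems = []
    sources = [s for s in (text, file_path, dir_path) if s is not None]
    if len(sources) > 1:
        problems.append("--text, --file, and --dir are mutually exclusive")
    if any(not t.strip() for t in types):
        problems.append("empty --type values are rejected")
    if command in ("scan", "hide") and not types:
        problems.append("--type is required")
    single = dir_path is None  # assumption: no --dir means single-file mode
    if command == "hide" and single and mapping_output is None:
        problems.append("single-file hide requires --mapping-output")
    if command == "restore" and single and mapping is None:
        problems.append("single-file restore requires --mapping")
    if not single and mapping is not None:
        problems.append("directory mode does not accept a shared --mapping")
    return problems
```

An empty list means the call is consistent with the rules listed above; it does not guarantee that the CLI will accept it.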

has text scan

Finds sensitive entities without replacing them.

{baseDir}/scripts/has.sh text scan --type "person name" --type "phone number" --file report.txt
{baseDir}/scripts/has.sh text scan --type "person name" --type "phone number" --dir ./reports/

Parameters:

Parameter                Required   Description
--type                   yes        Entity type to scan for; repeat to add more
--text / --file / --dir  one input  Input source
--max-chunk-tokens       no         Max tokens per chunk, default 5000
--max-parallel-requests  no         Max scan chunks in parallel, default 4

Output:

  • Single-text mode returns {"entities": ...}
  • Directory mode returns {"results":[...],"count":N,"summary":{...}}
  • Batch output may include skipped and skipped_count

has text hide

Replaces sensitive entities with semantic tags.

{baseDir}/scripts/has.sh text hide --type "person name" --type "address" --text "John lives in Brooklyn" --mapping-output ./mapping.json
{baseDir}/scripts/has.sh text hide --type "person name" --file note.txt --output ./note.anonymized.txt --mapping-output ./note.mapping.json
{baseDir}/scripts/has.sh text hide --type "person name" --dir ./docs/

Parameters:

Parameter                Required          Description
--type                   yes               Entity type to anonymize; repeat to add more
--text / --file / --dir  one input         Input source
--mapping-output         single-file: yes  Output path for generated mapping JSON
--output                 single-file       Output path for anonymized text
--mapping                single-file       Existing mapping JSON file for incremental anonymization
--output-dir             batch             Output directory for anonymized files (default: <dir>/.has/anonymized/)
--mapping-dir            batch             Output directory for per-file mapping JSON files (default: <output-dir>/mappings/)
--max-chunk-tokens       no                Max tokens per chunk, default 3000
--max-parallel-requests  no                Max files in parallel for --dir, default 4
--no-tool-pair           no                Disable diff-based pair extraction; always use Model-Pair (slower but more robust)

Behavior:

  • Single-file mode never emits the mapping table inline.
  • Single-file mode returns either:
    • {"text":"...","mapping_output":"/abs/path/to/map.json"}
    • {"output":"/abs/path/to/out.txt","mapping_output":"/abs/path/to/map.json"}
  • Batch mode does not accept shared --mapping.
  • Mapping files are sensitive assets. Protect them.
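
Since single-file hide returns one of two shapes, a caller needs to handle both. The sketch below shows one way to do that, plus an optional check that the mapping file is readable only by its owner. Both helper names are hypothetical, and the permission check assumes a POSIX filesystem.

```python
import os
import stat
from pathlib import Path


def anonymized_text_from(result: dict) -> str:
    """Return anonymized text from either single-file result shape."""
    if "text" in result:
        return result["text"]
    # Otherwise the text was written to the path given in "output".
    return Path(result["output"]).read_text(encoding="utf-8")


def mapping_is_private(result: dict) -> bool:
    """True when group and others have no access to the mapping file (POSIX)."""
    mode = stat.S_IMODE(os.stat(result["mapping_output"]).st_mode)
    return mode & 0o077 == 0
```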

has text restore

Restores anonymized text using mapping JSON.

{baseDir}/scripts/has.sh text restore --mapping mapping.json --text "<person name[1].personal.name> lives in ..."
{baseDir}/scripts/has.sh text restore --mapping mapping.json --file anonymized.txt --output restored.txt
{baseDir}/scripts/has.sh text restore --dir ./.has/anonymized/ --output-dir ./.has/restored/

Parameters:

Parameter                Required          Description
--mapping                single-file: yes  Mapping JSON file path
--text / --file / --dir  one input         Input source
--output                 single-file       Output path for restored text
--mapping-dir            batch             Per-file mapping directory (default: <dir>/mappings/)
--output-dir             batch             Output directory for restored files (default: sibling restored/ under .has/, or <dir>/.has/restored/)
--max-chunk-tokens       no                Max tokens per chunk when model restore is needed, default 3000
--max-parallel-requests  no                Max model-backed restore chunks in parallel

Behavior:

  • Single-file mode returns inline text unless --output is provided.
  • restore --dir uses per-file mapping JSON files. It does not accept a shared --mapping.
  • restore --dir expects mapping files at <mapping-dir>/<filename>.mapping.json (matching the naming convention produced by hide --dir).

Typical Text Workflow

Anonymize text before sending it to a cloud LLM, then restore the answer:

  1. hide to produce anonymized text plus mapping
  2. send anonymized text to the cloud model with a tag-format explanation (see below)
  3. restore the model response with the mapping

For multi-line text, prefer file-based intermediates over shell variables.
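
The file-based variant of this workflow can be sketched by building the two CLI calls as argument lists. This is an illustration only: `HAS` keeps the `{baseDir}` placeholder from this document and must be replaced with a real path, the helper names are hypothetical, and step 2 (calling the cloud model) is left to the caller.

```python
import subprocess

# Placeholder path from this document; substitute the real install location.
HAS = "{baseDir}/scripts/has.sh"


def hide_file_cmd(src, anonymized, mapping, types):
    """argv for step 1: anonymize a file and write the mapping JSON."""
    cmd = [HAS, "text", "hide"]
    for name in types:
        cmd += ["--type", name]
    return cmd + ["--file", src, "--output", anonymized, "--mapping-output", mapping]


def restore_file_cmd(reply, restored, mapping):
    """argv for step 3: restore the cloud model's reply with the same mapping."""
    return [HAS, "text", "restore", "--mapping", mapping,
            "--file", reply, "--output", restored]


def run(cmd):
    """Run one CLI call and return its stdout (compact JSON per the contract)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
```

Passing argument lists rather than a shell string avoids quoting problems with multi-word type names such as "person name".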

Prompting the cloud LLM with anonymized text

When forwarding anonymized text to a cloud LLM, the agent must prepend a brief explanation of the tag format so the model understands and preserves the tags. Include wording equivalent to the following (adjust language to match the conversation):

The text below has been anonymized. Sensitive entities are replaced by tags in the format <EntityType[ID].Category.Attribute>:

  • EntityType: the kind of entity (matches the --type value, e.g. person name, address, phone number).
  • [ID]: a numeric identifier. The same type + same ID always refers to the same real-world entity (e.g. every <person name[1]> is the same person; <person name[2]> is a different person).
  • .Category.Attribute: additional semantic classification of the entity.

Rules:

  1. Preserve every tag exactly as-is in your response — do not modify, translate, paraphrase, omit, or expand any tag.
  2. When referring to an anonymized entity, reuse the original tag with the correct ID.
  3. Do not attempt to guess the real values behind the tags.

Omitting this explanation may cause the cloud model to strip, rewrite, or misinterpret the tags, which will break the restore step.
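
Before running restore, it can help to check that the cloud model's reply contains only tags that were present in the anonymized input. The sketch below is one possible check, not a HaS feature: the regular expression is an assumption inferred from the tag examples in this document, the preamble is a shortened paraphrase of the wording above, and the helper names are hypothetical.

```python
import re

# Matches tags such as <person name[1]> or <person name[1].personal.name>.
# Assumed pattern, inferred from the examples in this document.
TAG_RE = re.compile(r"<[^<>\[\]]+\[\d+\][^<>]*>")

PREAMBLE = (
    "The text below has been anonymized. Sensitive entities are replaced by "
    "tags in the format <EntityType[ID].Category.Attribute>. Preserve every "
    "tag exactly as-is, reuse the original tag with the correct ID, and do "
    "not guess the real values behind the tags."
)


def build_cloud_prompt(anonymized_text: str) -> str:
    """Prepend the tag-format explanation to the anonymized text."""
    return PREAMBLE + "\n\n" + anonymized_text


def unknown_tags(anonymized_text: str, reply: str) -> list:
    """Tags in the reply that never appeared in the anonymized input."""
    known = set(TAG_RE.findall(anonymized_text))
    return sorted(set(TAG_RE.findall(reply)) - known)
```

A non-empty result from `unknown_tags` suggests the model rewrote or invented a tag, so the reply deserves a closer look before it is restored.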

Model Download Mirrors

If HuggingFace downloads fail, use these ModelScope mirrors:

  • text model: https://modelscope.cn/models/TencentXuanwu/HaS_Text_0209_0.6B_Q8
  • image model: https://modelscope.cn/models/TencentXuanwu/HaS_Image_0209_FP32

Part 2: has image

has image is the image namespace. It supports:

  • scan
  • hide
  • categories

It loads the YOLO segmentation model directly and does not require llama-server.

Image Usage

{baseDir}/scripts/has.sh image [--timing] [--model MODEL] <scan|hide|categories> [options]

Namespace options:

Option        Applies to          Description
--timing      all image commands  Include elapsed_ms in the JSON output
--model PATH  scan, hide          Override the image model path

Image Privacy Categories

Common categories include biometric_face, id_card, passport, license_plate, qr_code, mobile_screen, and paper.

Use has image categories when you need the full catalog of 21 supported classes.

--type accepts:

  • English names
  • Chinese names
  • numeric IDs
  • unique partial matches such as face

Rules:

  • Empty --type values are rejected.
  • Ambiguous partial matches fail fast.
  • Omit --type to scan or mask all supported categories.
  • In image directory mode, skipped can include unprocessed files.
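
The partial-match rules can be illustrated with a small resolver. This is a sketch, not the CLI's actual logic: substring matching is an assumption, `KNOWN` holds only the category names mentioned in this document rather than the full catalog of 21, and `resolve_category` is a hypothetical helper.

```python
# Subset of category names mentioned in this document; the full catalog has 21.
KNOWN = ("biometric_face", "id_card", "passport", "license_plate",
         "qr_code", "barcode", "mobile_screen", "monitor_screen", "paper")


def resolve_category(query: str, names=KNOWN) -> str:
    """Resolve an exact or unique partial name; fail fast otherwise."""
    query = query.strip().lower()
    if not query:
        raise ValueError("empty --type value")
    if query in names:
        return query
    # Assumption: a partial match is a substring match against the name.
    hits = [name for name in names if query in name]
    if len(hits) == 1:
        return hits[0]
    if not hits:
        raise ValueError(f"unknown category: {query}")
    raise ValueError(f"ambiguous category {query!r}: {', '.join(hits)}")
```

With this subset, "face" resolves to a single category, while "screen" is rejected because it matches both mobile_screen and monitor_screen.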

has image scan

Finds privacy regions without modifying the image.

{baseDir}/scripts/has.sh image scan --image photo.jpg --type face --type id_card
{baseDir}/scripts/has.sh image scan --dir ./photos/ --type face

Parameters:

Parameter        Required   Description
--image / --dir  one input  Single image or batch directory
--type           no         Category filter; repeat to add more
--conf           no         Confidence threshold, default 0.25
--model          no         Override image model path

Output:

  • Single-image mode returns detections and summary
  • Directory mode returns results, count, summary, and optional skipped

has image hide

Detects and masks privacy regions in images.

{baseDir}/scripts/has.sh image hide --image photo.jpg --type face --method blur --strength 25
{baseDir}/scripts/has.sh image hide --dir ./photos/

Parameters:

Parameter        Required      Description
--image / --dir  one input     Single image or batch directory
--output         single-image  Output image path
--output-dir     batch         Output directory
--type           no            Category filter; repeat to add more
--method         no            mosaic, blur, or fill; default mosaic
--strength       no            Mosaic block size or blur radius; default 15
--fill-color     no            Fill color for fill; default #000000
--conf           no            Confidence threshold; default 0.25
--model          no            Override image model path

Behavior:

  • Refuses to overwrite the source image.
  • Directory mode accepts --output-dir, not --output.
  • For qr_code and barcode detections with --method mosaic, the block size is automatically raised to max(strength, bbox_short_side // 10, 20) to prevent the encoding from surviving pixelation. After masking, a lightweight verification confirms the code is no longer machine-readable; if it is, the strength is escalated further (up to a fill fallback). Each affected detection includes an effective_strength field in the output.
  • A cv2-based fallback supplements YOLO detection for QR codes and barcodes. When YOLO misses a code (e.g. large codes on plain backgrounds), cv2.QRCodeDetector and cv2.barcode.BarcodeDetector provide additional coverage. When YOLO misclassifies a code region as a different category (e.g. monitor_screen), cv2 corrects the category before --type filtering, so --type qr_code catches all QR codes regardless of YOLO's label. Corrected detections include a "corrected_from" field; new detections include "cv2_fallback": true.
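
The adaptive block-size formula above can be worked through in a few lines. The sketch covers only the initial `max(strength, bbox_short_side // 10, 20)` step and does not model the post-masking verification or escalation. The function names are hypothetical, and the detection dictionaries are assumed to carry only the `corrected_from` and `cv2_fallback` fields named above.

```python
def initial_mosaic_strength(strength: int, bbox_width: int, bbox_height: int) -> int:
    """Initial mosaic block size for a qr_code or barcode detection."""
    short_side = min(bbox_width, bbox_height)
    return max(strength, short_side // 10, 20)


def detection_origin(detection: dict) -> str:
    """Describe where a detection came from, based on the documented fields."""
    if detection.get("cv2_fallback"):
        return "cv2 fallback"
    if "corrected_from" in detection:
        return f"relabeled from {detection['corrected_from']}"
    return "model"


# Default strength 15 on a 400x380 code: 380 // 10 = 38 wins.
large = initial_mosaic_strength(15, 400, 380)
# Default strength 15 on a 120x150 code: 120 // 10 = 12, so the floor of 20 wins.
small = initial_mosaic_strength(15, 120, 150)
```

The floor of 20 matters for small codes, where a tenth of the short side would be below the default strength of 15.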

has image categories

Lists all supported image privacy categories.

{baseDir}/scripts/has.sh image categories
{baseDir}/scripts/has.sh image categories --timing

Behavior:

  • Returns {"categories":[...]}
  • Supports --timing

Suggested Combined Scan

For a mixed workspace:

  1. run has text scan ... --dir <dir> for plaintext
  2. run has image scan --dir <dir> for images
  3. merge the two JSON results into one privacy report

If the user wants masking after that, use hide on the specific files or directories you already identified.
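
The merge step could look like the sketch below. It relies only on the batch keys documented above (`count`, `summary`, `skipped`) and treats the contents of `summary` as opaque, since their shape is not specified here. The `merge_scan_reports` name and the report field names are hypothetical.

```python
def merge_scan_reports(text_result: dict, image_result: dict) -> dict:
    """Combine batch scan outputs from `has text` and `has image` into one report."""
    skipped = text_result.get("skipped", []) + image_result.get("skipped", [])
    return {
        "text_file_count": text_result.get("count", 0),
        "image_file_count": image_result.get("count", 0),
        # Summaries are passed through unchanged; their inner shape is not assumed.
        "text_summary": text_result.get("summary", {}),
        "image_summary": image_result.get("summary", {}),
        # Skipped files are unprocessed, not clean, so they are listed explicitly.
        "unprocessed_files": [entry["file"] for entry in skipped],
    }
```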

Security recommendations
This skill appears coherent and implements local text/image anonymization using on-device models. Before installing: (1) be aware the model files are large and will be downloaded to disk (HuggingFace links are used); verify you trust and want those models locally and that you have enough disk space. (2) The install entries provide brew formulas for macOS only — on Linux/Windows you'll need to install 'uv' and 'llama-server' yourself. (3) The runtime uses 'uv run' which will install Python packages (ultralytics, opencv, etc.) from PyPI; review those dependencies if you require pinning or vetting. (4) No credentials are requested and the CLI is local, but treat models and any outputs as sensitive when processing private data. If you need higher assurance, review the full source files (they are included) and the downloaded model checksums, or run the tool in an isolated environment (VM/container) before use.
Functional analysis
Type: OpenClaw Skill
Name: has-anonymizer
Version: 1.0.3
The has-anonymizer skill bundle provides a comprehensive toolset for on-device text and image anonymization using local ML models (YOLO11 and llama.cpp). The implementation follows security best practices, such as using restrictive file permissions (0600) for sensitive mapping files in `mapping.py` and `cli_utils.py`, and ensuring that data processing remains local to preserve privacy. The code is well-structured, lacks obfuscation, and its behavior is strictly aligned with the stated purpose of privacy protection without any indicators of data exfiltration or malicious intent.
Capability assessment
Purpose & Capability
Name/description match the included artifacts: CLI wrappers, Python implementations for text/image anonymization, model downloads for a text GGUF and an image YOLO .pt, and references to llama-server for local inference. Required binaries (uv, llama-server) are reasonable for the described on-device workflow.
Instruction Scope
SKILL.md and the CLI scripts narrowly instruct the agent to run local scan/hide/restore and image mask operations. The runtime guidance focuses on scanning and masking and explicitly warns not to overwrite originals. There are no instructions that read unrelated system secrets or forward user data to unexpected external endpoints.
Install Mechanism
The install spec downloads two large model files from HuggingFace (well-known host) and references two brew formulas (uv, llama.cpp). Downloading models from HF is expected for on-device ML. Two caveats: (1) the brew install entries are macOS-specific but the skill declares no OS restriction, so users on Linux/Windows must manually provide binaries; (2) the scripts use 'uv run' which will install Python dependencies from PyPI at runtime — this is normal but means pip packages will be fetched/installed on the machine.
Credentials
The skill does not request credentials or secrets. Optional environment variables relate to model paths and concurrency (HAS_TEXT_MODEL_PATH, HAS_IMAGE_MODEL, HAS_TEXT_MAX_PARALLEL_REQUESTS) and are appropriate for runtime configuration.
Persistence & Privilege
The skill is not always-enabled and is user-invocable. Install writes model files to disk (expected for offline models) but it does not request persistent elevated privileges or attempt to modify other skills or system-wide agent configurations.
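How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install has-anonymizer
  3. After installation, invoke the Skill by name or trigger it with /has-anonymizer.
  4. Provide the required inputs according to the Skill's parameter documentation to get structured output.
Version history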
v1.0.3
fix(image): adaptive mosaic for QR codes and barcodes. Previously, the default mosaic strength aligned with the QR module size, leaving codes machine-readable. Added adaptive strength, cv2 post-masking verification with escalation fallback, cv2 supplementary scan for missed codes, and cv2 type correction for YOLO misclassifications. Refactored detect-correct-filter-mask pipeline.
v1.0.2
  1. Fix chunk budget formula: when the mapping expands, budget reduction changes from 1:1 to ~0.49:1.
  2. ContextOverflowError detection: detect prompt overflow and output truncation, preventing silent corruption.
  3. Self-healing retry: on overflow, automatically shrink the chunk (×0.75) and retry up to 2 times.
  4. Tool-Pair acceleration: skip Model-Pair and use a diff algorithm to extract the mapping, saving ~13.5% time.
  5. 16K fallback: default 8K, auto-upgrade to 16K only when mapping expansion causes insufficient budget.
  6. Skill doc cleanup following the "don't document what CLI owns" principle: removed 10 CLI internal details (pair strategy, server lifecycle, mapping path conventions, skip reason enums, etc.); added LLM prompting guidelines (the agent's core responsibility outside the CLI); fixed inconsistent tag format (`person_name` → `person name`).
  7. Unified batch output directory: all batch default outputs moved from scattered user directories (`anonymized/`, `restored/`, `masked/`) into a unified `<input-dir>/.has/` hidden directory. Added `_default_restore_output_dir()` smart detection to ensure restore output is placed alongside `anonymized/` rather than nested inside `.has/`.
v1.0.1
  • Added environment variable support for model file locations and parallel request settings (HAS_TEXT_MODEL_PATH, HAS_IMAGE_MODEL, HAS_TEXT_MAX_PARALLEL_REQUESTS).
  • Updated skill requirements metadata to document these environment variables and their usage.
  • No functional or documentation changes to program logic detected.
v1.0.0
Initial release of HaS Privacy (Hide and Seek) on-device anonymizer.
  • Provides on-device text and image anonymization for privacy protection.
  • Text anonymization: Supports 8 languages, open-set entity types, anonymization and restoration.
  • Image anonymization: Detects and masks 21 privacy categories (faces, IDs, passports, license plates, etc.).
  • Scenarios include anonymizing before sharing/sending to cloud LLMs, scanning for sensitive content, and preparing privacy-compliant reports.
  • Offers configurable options for entity/category selection, masking method, and strength.
  • Preserves original files and consolidates scan reports with risk assessment and time taken.
Metadata
Slug: has-anonymizer
Version: 1.0.3
License: MIT-0
Total installs: 3
Current installs: 3
Versions: 4
FAQ

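What is Has Anonymizer?

HaS (Hide and Seek) on-device text and image anonymization. Text: 8 languages (zh/en/fr/de/es/pt/ja/ko), open-set entity types. Image: 21 privacy categories... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1016 total downloads to date.

How do I install Has Anonymizer?

Run the command "/install has-anonymizer" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Has Anonymizer free?

Yes. Has Anonymizer is completely free and released under the MIT-0 license; it can be downloaded, installed, and used freely.

Which platforms does Has Anonymizer support?

Has Anonymizer is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Has Anonymizer?

It is developed and maintained by Huiming Liu (@xuanwuskill); the current version is v1.0.3.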
