Description

HaS (Hide and Seek) on-device text and image anonymization. Text: 8 languages (zh/en/fr/de/es/pt/ja/ko), open-set entity types. Image: 21 privacy categories...

Usage (SKILL.md)

HaS Privacy

Name: Has Anonymizer
Author: xuanwuskill

HaS exposes a single umbrella CLI:

has text ... for text anonymization, restoration, and scanning
has image ... for image scanning, masking, and category discovery

Use it when you need to remove private data locally before sending content elsewhere, inspect a directory for privacy risks, or mask visual privacy targets in photos and screenshots.

Agent Decision Guidelines

Prefer has text for plaintext and has image for raster images. For mixed directories, run both and combine the results into one report.
For PDFs, Word documents, or scanned pages, extract text first and then use has text. For screenshots/photos where the goal is simply to hide visible carriers such as faces, screens, paper, labels, or QR codes, use has image. If the goal is to reason about the text content inside an image, run OCR first and then use has text.
Do not overwrite or delete the original files. Text anonymization can be restored later; image masking is irreversible.
Proactively mention configurable knobs when the user intent is clear: has text uses repeated --type; has image uses repeated --type, plus --method and --strength.
If the user intent is ambiguous, start with scan before hide.
After batch scans, summarize text file count, image file count, findings by type/category, high-risk items, and the suggested next step.
If timing matters to the user, add --timing and report the elapsed result in plain language afterward.
For qr_code and barcode, the default mosaic strength is automatically raised based on the detection size to ensure the encoding is destroyed. The agent does not need to manually increase --strength for these categories. If a detection output includes effective_strength, report it to the user.

Shared CLI Contract

The current CLI contract is designed for agents first:

Success returns compact JSON.
Failure also returns compact JSON with error.code and error.message.
Returned path fields are absolute.
- This includes file, output, mapping_output, and skipped[].file.
Invalid combinations fail fast instead of silently falling back.
Directory mode is non-recursive. Only immediate children are processed.
Batch results can include skipped and skipped_count.
- Treat skipped entries as unprocessed files, not as clean files.
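
A minimal sketch of how an agent-side wrapper might consume this contract, assuming only the fields named above (`error.code`, `error.message`, `skipped[].file`, `skipped_count`). The helper `parse_has_output`, the `HasError` class, and the sample payload are hypothetical illustrations, not part of the CLI.

```python
import json


class HasError(Exception):
    """Raised when the CLI output carries an error object."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_has_output(raw):
    """Parse compact JSON from the CLI (hypothetical helper).

    Returns (payload, skipped_files); raises HasError on a failure payload.
    """
    payload = json.loads(raw)
    error = payload.get("error")
    if error is not None:
        raise HasError(error.get("code"), error.get("message"))
    # Skipped entries are unprocessed files, not clean files.
    skipped_files = [entry.get("file") for entry in payload.get("skipped", [])]
    return payload, skipped_files


# Illustrative payload only; real output may carry more fields.
sample = '{"results":[],"count":0,"skipped":[{"file":"/abs/a.bin"}],"skipped_count":1}'
payload, skipped_files = parse_has_output(sample)
print(skipped_files)
```

Reporting the skipped list separately keeps a batch summary from implying that unread files were clean.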

Shared command layout:

{baseDir}/scripts/has.sh <text|image> <command> [options]

Shared options can be placed before or after the subcommand.

Part 1: `has text`

has text is the plaintext namespace. It supports:

scan
hide
restore

It runs entirely on-device and uses a local llama-server plus the HaS text model when model inference is required.

Core Text Concepts

Semantic tags

Anonymized text uses semantic tags such as:

<EntityType[ID].Category.Attribute>

This preserves structure better than a flat [REDACTED] token and is the reason restored downstream LLM output can remain usable.
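
As an illustration of the tag structure, the sketch below extracts tags with a regular expression. The grammar is inferred from the tag examples in this document, so treat the pattern as an assumption rather than the CLI's own parser; the `.Category.Attribute` suffix is treated as optional so that short forms such as `<person name[1]>` also match.

```python
import re

# Assumed grammar: <EntityType[ID].Category.Attribute>, suffix optional.
TAG_RE = re.compile(
    r"<(?P<type>[^<>\[\]]+)\[(?P<id>\d+)\]"
    r"(?:\.(?P<category>[^.<>]+)\.(?P<attribute>[^.<>]+))?>"
)


def find_tags(text):
    """Return (entity_type, id, category, attribute) for each tag found."""
    return [
        (m["type"], int(m["id"]), m["category"], m["attribute"])
        for m in TAG_RE.finditer(text)
    ]


sample = "<person name[1].personal.name> called <person name[2]> yesterday."
print(find_tags(sample))
```

A check like this can be run on a cloud model's reply before restore, to confirm that the tags it returned still have the expected shape.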

Open-set types

Repeated --type flags are open-set. They are not limited to a fixed catalog. Natural language type names such as "person name", "address", "phone number", or "numeric values (transaction amounts)" are valid.

Public/private distinction

Type wording matters. For example, "personal location" is usually safer than "location" if you want to preserve public places but hide private addresses. Public/private person-name distinctions remain less stable and should not be trusted without verification.

Multilingual support

The text model supports Chinese, English, French, German, Spanish, Portuguese, Japanese, and Korean, including mixed-language text.

Type name language

Match the --type language to the source text language:

Chinese text → use Chinese type names: --type "人名" --type "电话号码" --type "地址"
Non-Chinese text (English, French, German, etc.) → use English type names: --type "person name" --type "phone number" --type "address"
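
One way an agent could apply this rule is sketched below. The script-detection heuristic is an assumption of this example (Han characters without kana or hangul are treated as Chinese) and can misjudge mixed-language input; the type names are only the ones quoted above.

```python
import re

# Kana or hangul indicates Japanese/Korean; Han characters alone are
# treated as Chinese. Heuristic only, not part of the CLI.
KANA_OR_HANGUL = re.compile(r"[\u3040-\u30ff\uac00-\ud7af]")
HAN = re.compile(r"[\u4e00-\u9fff]")

TYPE_NAMES = {
    "zh": ["人名", "电话号码", "地址"],
    "en": ["person name", "phone number", "address"],
}


def type_args(text):
    """Build repeated --type arguments matching the source text language."""
    is_chinese = bool(HAN.search(text)) and not KANA_OR_HANGUL.search(text)
    args = []
    for name in TYPE_NAMES["zh" if is_chinese else "en"]:
        args += ["--type", name]
    return args


print(type_args("John lives in Brooklyn"))
```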

Text Runtime Prerequisites

has text auto-starts a local llama-server when needed.

Default model path: ~/.openclaw/tools/has-anonymizer/models/has_text_model.gguf
Override model path: HAS_TEXT_MODEL_PATH=/abs/path/to/has_text_model.gguf
Override parallel cap: HAS_TEXT_MAX_PARALLEL_REQUESTS
If HuggingFace downloads fail, see Model Download Mirrors below.

Text Usage

{baseDir}/scripts/has.sh text [--timing] [--verbose] <scan|hide|restore> [options]

Namespace options:

Option	Description
`--timing`	Include `elapsed_ms` in the JSON output
`--verbose`	Emit runtime status and progress messages to stderr

Input methods:

Method	Description
`--text '<text>'`	Pass text directly
`--file <path>`	Read text from a file
`--dir <path>`	Process immediate plaintext files in a directory
stdin	For single-text mode when no `--text`, `--file`, or `--dir` is provided

Rules:

--text, --file, and --dir are mutually exclusive.
Empty --type values are rejected.
Directory mode only accepts batch output flags.
Single-file hide requires --mapping-output.
Single-file restore requires --mapping.
In text directory mode, skipped can include unprocessed files (binary, encoding, or read errors).

`has text scan`

Finds sensitive entities without replacing them.

{baseDir}/scripts/has.sh text scan --type "person name" --type "phone number" --file report.txt
{baseDir}/scripts/has.sh text scan --type "person name" --type "phone number" --dir ./reports/

Parameters:

Parameter	Required	Description
`--type`	yes	Entity type to scan for; repeat to add more
`--text` / `--file` / `--dir`	one input	Input source
`--max-chunk-tokens`		Max tokens per chunk, default `5000`
`--max-parallel-requests`		Max scan chunks in parallel, default `4`

Output:

Single-text mode returns {"entities": ...}
Directory mode returns {"results":[...],"count":N,"summary":{...}}
Batch output may include skipped and skipped_count

`has text hide`

Replaces sensitive entities with semantic tags.

{baseDir}/scripts/has.sh text hide --type "person name" --type "address" --text "John lives in Brooklyn" --mapping-output ./mapping.json
{baseDir}/scripts/has.sh text hide --type "person name" --file note.txt --output ./note.anonymized.txt --mapping-output ./note.mapping.json
{baseDir}/scripts/has.sh text hide --type "person name" --dir ./docs/

Parameters:

Parameter	Required	Description
`--type`	yes	Entity type to anonymize; repeat to add more
`--text` / `--file` / `--dir`	one input	Input source
`--mapping-output`	single-file: yes	Output path for generated mapping JSON
`--output`	single-file	Output path for anonymized text
`--mapping`	single-file	Existing mapping JSON file for incremental anonymization
`--output-dir`	batch	Output directory for anonymized files (default: `<dir>/.has/anonymized/`)
`--mapping-dir`	batch	Output directory for per-file mapping JSON files (default: `<output-dir>/mappings/`)
`--max-chunk-tokens`		Max tokens per chunk, default `3000`
`--max-parallel-requests`		Max files in parallel for `--dir`, default `4`
`--no-tool-pair`		Disable diff-based pair extraction; always use Model-Pair (slower but more robust)

Behavior:

Single-file mode never emits the mapping table inline.
Single-file mode returns either:
- {"text":"...","mapping_output":"/abs/path/to/map.json"}
- {"output":"/abs/path/to/out.txt","mapping_output":"/abs/path/to/map.json"}
Batch mode does not accept shared --mapping.
Mapping files are sensitive assets. Protect them.

`has text restore`

Restores anonymized text using mapping JSON.

{baseDir}/scripts/has.sh text restore --mapping mapping.json --text "<person name[1].personal.name> lives in ..."
{baseDir}/scripts/has.sh text restore --mapping mapping.json --file anonymized.txt --output restored.txt
{baseDir}/scripts/has.sh text restore --dir ./.has/anonymized/ --output-dir ./.has/restored/

Parameters:

Parameter	Required	Description
`--mapping`	single-file: yes	Mapping JSON file path
`--text` / `--file` / `--dir`	one input	Input source
`--output`	single-file	Output path for restored text
`--mapping-dir`	batch	Per-file mapping directory (default: `<dir>/mappings/`)
`--output-dir`	batch	Output directory for restored files (default: sibling `restored/` under `.has/`, or `<dir>/.has/restored/`)
`--max-chunk-tokens`		Max tokens per chunk when model restore is needed, default `3000`
`--max-parallel-requests`		Max model-backed restore chunks in parallel

Behavior:

Single-file mode returns inline text unless --output is provided.
restore --dir uses per-file mapping JSON files. It does not accept a shared --mapping.
restore --dir expects mapping files at <mapping-dir>/<filename>.mapping.json (matching the naming convention produced by hide --dir).

Typical Text Workflow

Anonymize text before sending it to a cloud LLM, then restore the answer:

hide to produce anonymized text plus mapping
send anonymized text to the cloud model with a tag-format explanation (see below)
restore the model response with the mapping

For multi-line text, prefer file-based intermediates over shell variables.
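
The three steps above can be sketched as follows, using files as intermediates. The flags come from the single-file `hide` and `restore` sections; `ask_cloud_llm` is a hypothetical callable supplied by the caller, the intermediate file names are arbitrary, and `check=True` assumes the CLI exits non-zero on failure (if it does not, inspect the returned JSON for an `error` object instead).

```python
import subprocess
from pathlib import Path


def hide_cmd(has_sh, types, src, out, mapping):
    """argv for single-file `has text hide`."""
    cmd = [has_sh, "text", "hide"]
    for t in types:
        cmd += ["--type", t]
    return cmd + ["--file", src, "--output", out, "--mapping-output", mapping]


def restore_cmd(has_sh, mapping, src, out):
    """argv for single-file `has text restore`."""
    return [has_sh, "text", "restore", "--mapping", mapping,
            "--file", src, "--output", out]


def anonymized_round_trip(has_sh, types, src, workdir, ask_cloud_llm):
    """hide -> cloud model -> restore; returns the restored file path.

    `ask_cloud_llm` (str -> str) is hypothetical and should prepend the
    tag-format explanation to whatever prompt it sends.
    """
    work = Path(workdir)
    anon, mapping = work / "anon.txt", work / "mapping.json"
    reply, restored = work / "reply.txt", work / "restored.txt"
    subprocess.run(hide_cmd(has_sh, types, str(src), str(anon), str(mapping)),
                   check=True, capture_output=True)
    answer = ask_cloud_llm(anon.read_text(encoding="utf-8"))
    reply.write_text(answer, encoding="utf-8")
    subprocess.run(restore_cmd(has_sh, str(mapping), str(reply), str(restored)),
                   check=True, capture_output=True)
    return restored
```

Passing argv lists instead of shell strings avoids quoting problems with multi-word type names such as "person name".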

Prompting the cloud LLM with anonymized text

When forwarding anonymized text to a cloud LLM, the agent must prepend a brief explanation of the tag format so the model understands and preserves the tags. Include wording equivalent to the following (adjust language to match the conversation):

The text below has been anonymized. Sensitive entities are replaced by tags in the format <EntityType[ID].Category.Attribute>:

EntityType: the kind of entity (matches the --type value, e.g. person name, address, phone number).

[ID]: a numeric identifier. The same type + same ID always refers to the same real-world entity (e.g. every <person name[1]> is the same person; <person name[2]> is a different person).

.Category.Attribute: additional semantic classification of the entity.

Rules:

Preserve every tag exactly as-is in your response — do not modify, translate, paraphrase, omit, or expand any tag.

When referring to an anonymized entity, reuse the original tag with the correct ID.

Do not attempt to guess the real values behind the tags.

Omitting this explanation may cause the cloud model to strip, rewrite, or misinterpret the tags, which will break the restore step.
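
A small sketch of how an agent might assemble such a prompt. The preamble condenses the wording above; `build_prompt` and its parameters are illustrative names, and the exact phrasing can be adapted to the conversation language.

```python
TAG_EXPLANATION = """\
The text below has been anonymized. Sensitive entities are replaced by tags \
in the format <EntityType[ID].Category.Attribute>:
- EntityType: the kind of entity (e.g. person name, address, phone number).
- [ID]: a numeric identifier. The same type + same ID always refers to the \
same real-world entity.
- .Category.Attribute: additional semantic classification of the entity.

Rules:
1. Preserve every tag exactly as-is in your response. Do not modify, \
translate, paraphrase, omit, or expand any tag.
2. When referring to an anonymized entity, reuse the original tag with the \
correct ID.
3. Do not attempt to guess the real values behind the tags.
"""


def build_prompt(task, anonymized_text):
    """Prepend the tag explanation to the task and the anonymized text."""
    return f"{TAG_EXPLANATION}\n{task}\n\n{anonymized_text}"


print(build_prompt("Summarize this note.", "<person name[1].personal.name> lives in ..."))
```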

Model Download Mirrors

If HuggingFace downloads fail, use these ModelScope mirrors:

text model: https://modelscope.cn/models/TencentXuanwu/HaS_Text_0209_0.6B_Q8
image model: https://modelscope.cn/models/TencentXuanwu/HaS_Image_0209_FP32

Part 2: `has image`

has image is the image namespace. It supports:

scan
hide
categories

It loads the YOLO segmentation model directly and does not require llama-server.

Image Usage

{baseDir}/scripts/has.sh image [--timing] [--model MODEL] <scan|hide|categories> [options]

Namespace options:

Option	Applies to	Description
`--timing`	all image commands	Include `elapsed_ms` in the JSON output
`--model PATH`	`scan`, `hide`	Override the image model path

Image Privacy Categories

Common categories include biometric_face, id_card, passport, license_plate, qr_code, mobile_screen, and paper.

Use has image categories when you need the full catalog of 21 supported classes.

--type accepts:

English names
Chinese names
numeric IDs
unique partial matches such as face

Rules:

Empty --type values are rejected.
Ambiguous partial matches fail fast.
Omit --type to scan or mask all supported categories.
In image directory mode, skipped can include unprocessed files.
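
To illustrate the matching rules above, here is a sketch of unique-partial-match resolution. It is not the CLI's resolver: it covers English names only (no Chinese names or numeric IDs), and the category list is just the subset of names mentioned in this document, not the full catalog of 21.

```python
def resolve_category(query, categories):
    """Resolve a --type value to one category name (illustrative only).

    An exact match wins; otherwise a substring must match exactly one name.
    """
    query = query.strip().lower()
    if not query:
        raise ValueError("empty --type value")
    if query in categories:
        return query
    matches = [name for name in categories if query in name]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown category: {query!r}")
    raise ValueError(f"ambiguous category {query!r}: {matches}")


# Subset of names that appear in this document.
KNOWN = ["biometric_face", "id_card", "passport", "license_plate",
         "qr_code", "barcode", "mobile_screen", "monitor_screen", "paper"]
print(resolve_category("face", KNOWN))
```

Under this sketch, a query such as `screen` would be rejected as ambiguous because it matches both `mobile_screen` and `monitor_screen`.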

`has image scan`

Finds privacy regions without modifying the image.

{baseDir}/scripts/has.sh image scan --image photo.jpg --type face --type id_card
{baseDir}/scripts/has.sh image scan --dir ./photos/ --type face

Parameters:

Parameter	Required	Description
`--image` / `--dir`	one input	Single image or batch directory
`--type`		Category filter; repeat to add more
`--conf`		Confidence threshold, default `0.25`
`--model`		Override image model path

Output:

Single-image mode returns detections and summary
Directory mode returns results, count, summary, and optional skipped

`has image hide`

Detects and masks privacy regions in images.

{baseDir}/scripts/has.sh image hide --image photo.jpg --type face --method blur --strength 25
{baseDir}/scripts/has.sh image hide --dir ./photos/

Parameters:

Parameter	Required	Description
`--image` / `--dir`	one input	Single image or batch directory
`--output`	single-image	Output image path
`--output-dir`	batch	Output directory
`--type`		Category filter; repeat to add more
`--method`		`mosaic`, `blur`, or `fill`; default `mosaic`
`--strength`		Mosaic block size or blur radius; default `15`
`--fill-color`		Fill color for `fill`; default `#000000`
`--conf`		Confidence threshold; default `0.25`
`--model`		Override image model path

Behavior:

Refuses to overwrite the source image.
Directory mode accepts --output-dir, not --output.
For qr_code and barcode detections with --method mosaic, the block size is automatically raised to max(strength, bbox_short_side // 10, 20) to prevent the encoding from surviving pixelation. After masking, a lightweight verification confirms the code is no longer machine-readable; if it is, the strength is escalated further (up to a fill fallback). Each affected detection includes an effective_strength field in the output.
A cv2-based fallback supplements YOLO detection for QR codes and barcodes. When YOLO misses a code (e.g. large codes on plain backgrounds), cv2.QRCodeDetector and cv2.barcode.BarcodeDetector provide additional coverage. When YOLO misclassifies a code region as a different category (e.g. monitor_screen), cv2 corrects the category before --type filtering, so --type qr_code catches all QR codes regardless of YOLO's label. Corrected detections include a "corrected_from" field; new detections include "cv2_fallback": true.
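
The adaptive block-size formula quoted above, max(strength, bbox_short_side // 10, 20), works out as in the sketch below. The `(x1, y1, x2, y2)` pixel bounding-box layout and the function name are assumptions made for the example; the verification and escalation steps are not modeled.

```python
def effective_mosaic_strength(strength, bbox):
    """Mosaic block size for qr_code/barcode per the documented formula.

    `bbox` is assumed to be (x1, y1, x2, y2) in integer pixels.
    """
    x1, y1, x2, y2 = bbox
    short_side = min(x2 - x1, y2 - y1)
    return max(strength, short_side // 10, 20)


print(effective_mosaic_strength(15, (0, 0, 400, 600)))  # 40: large code raises the block size
print(effective_mosaic_strength(15, (0, 0, 100, 100)))  # 20: floor applies to small codes
print(effective_mosaic_strength(25, (0, 0, 100, 100)))  # 25: user value already exceeds both
```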

`has image categories`

Lists all supported image privacy categories.

{baseDir}/scripts/has.sh image categories
{baseDir}/scripts/has.sh image categories --timing

Behavior:

Returns {"categories":[...]}
Supports --timing

Suggested Combined Scan

For a mixed workspace:

run has text scan ... --dir <dir> for plaintext
run has image scan --dir <dir> for images
merge the two JSON results into one privacy report

If the user wants masking after that, use hide on the specific files or directories you already identified.
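
The merge step can be as simple as the sketch below. It relies only on the directory-mode keys documented earlier (`count`, `summary`, `skipped`); the internal shape of `summary` is not specified here, so it is passed through untouched. The output layout and the name `merge_scan_reports` are choices made for this example.

```python
def merge_scan_reports(text_report, image_report):
    """Combine two directory-mode scan payloads into one report dict."""

    def section(report):
        return {
            "count": report.get("count", 0),
            "summary": report.get("summary", {}),
            "skipped": report.get("skipped", []),
        }

    text, image = section(text_report), section(image_report)
    return {
        "text": text,
        "image": image,
        # Skipped files were not processed, so surface them explicitly.
        "unprocessed_files": [
            entry.get("file") for entry in text["skipped"] + image["skipped"]
        ],
    }
```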

Security Recommendations

This skill appears coherent and implements local text/image anonymization using on-device models. Before installing: (1) be aware the model files are large and will be downloaded to disk (HuggingFace links are used); verify you trust and want those models locally and that you have enough disk space. (2) The install entries provide brew formulas for macOS only — on Linux/Windows you'll need to install 'uv' and 'llama-server' yourself. (3) The runtime uses 'uv run' which will install Python packages (ultralytics, opencv, etc.) from PyPI; review those dependencies if you require pinning or vetting. (4) No credentials are requested and the CLI is local, but treat models and any outputs as sensitive when processing private data. If you need higher assurance, review the full source files (they are included) and the downloaded model checksums, or run the tool in an isolated environment (VM/container) before use.
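
For the checksum review suggested above, a downloaded model file can be hashed as in this sketch. The expected digest has to come from the model publisher; none is given in this document, so the comparison step is left to the reader.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large model files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the returned hex string with the published value before pointing `HAS_TEXT_MODEL_PATH` at the file.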

Functional Analysis

Type: OpenClaw Skill
Name: has-anonymizer
Version: 1.0.3

The has-anonymizer skill bundle provides a comprehensive toolset for on-device text and image anonymization using local ML models (YOLO11 and llama.cpp). The implementation follows security best practices, such as using restrictive file permissions (0600) for sensitive mapping files in `mapping.py` and `cli_utils.py`, and ensuring that data processing remains local to preserve privacy. The code is well-structured, lacks obfuscation, and its behavior is strictly aligned with the stated purpose of privacy protection without any indicators of data exfiltration or malicious intent.

Capability Assessment

✓ Purpose & Capability

Name/description match the included artifacts: CLI wrappers, Python implementations for text/image anonymization, model downloads for a text GGUF and an image YOLO .pt, and references to llama-server for local inference. Required binaries (uv, llama-server) are reasonable for the described on-device workflow.

✓ Instruction Scope

SKILL.md and the CLI scripts narrowly instruct the agent to run local scan/hide/restore and image mask operations. The runtime guidance focuses on scanning and masking and explicitly warns not to overwrite originals. There are no instructions that read unrelated system secrets or forward user data to unexpected external endpoints.

ℹ Install Mechanism

The install spec downloads two large model files from HuggingFace (well-known host) and references two brew formulas (uv, llama.cpp). Downloading models from HF is expected for on-device ML. Two caveats: (1) the brew install entries are macOS-specific but the skill declares no OS restriction, so users on Linux/Windows must manually provide binaries; (2) the scripts use 'uv run' which will install Python dependencies from PyPI at runtime — this is normal but means pip packages will be fetched/installed on the machine.

✓ Credentials

The skill does not request credentials or secrets. Optional environment variables relate to model paths and concurrency (HAS_TEXT_MODEL_PATH, HAS_IMAGE_MODEL, HAS_TEXT_MAX_PARALLEL_REQUESTS) and are appropriate for runtime configuration.

✓ Persistence & Privilege

The skill is not always-enabled and is user-invocable. Install writes model files to disk (expected for offline models) but it does not request persistent elevated privileges or attempt to modify other skills or system-wide agent configurations.

Version History

v1.0.3

fix(image): adaptive mosaic for QR codes and barcodes. Previously, the default mosaic strength could align with the QR module size, leaving codes machine-readable. Added adaptive strength, cv2 post-masking verification with escalation fallback, cv2 supplementary scan for missed codes, and cv2 type correction for YOLO misclassifications. Refactored detect-correct-filter-mask pipeline.

v1.0.2

1. **Fix chunk budget formula**: When mapping expands, budget reduction changes from 1:1 → ~0.49:1
2. **ContextOverflowError detection**: Detect prompt overflow + output truncation, preventing silent corruption
3. **Self-healing retry**: On overflow, automatically shrink chunk (×0.75) and retry up to 2 times
4. **Tool-Pair acceleration**: Skip Model-Pair, use diff algorithm to extract mapping, saving ~13.5% time
5. **16K fallback**: Default 8K, auto-upgrade to 16K only when mapping expansion causes insufficient budget
6. **Skill doc cleanup following "don't document what CLI owns" principle**: Removed 10 CLI internal details (pair strategy, server lifecycle, mapping path conventions, skip reason enums, etc.); added LLM prompting guidelines (the agent's core responsibility outside the CLI); fixed inconsistent tag format (`person_name` → `person name`)
7. **Unified batch output directory**: All batch default outputs moved from scattered user directories (`anonymized/`, `restored/`, `masked/`) into a unified `<input-dir>/.has/` hidden directory. Added `_default_restore_output_dir()` smart detection to ensure restore output is placed alongside `anonymized/` rather than nested inside `.has/`

v1.0.1

- Added environment variable support for model file locations and parallel request settings (HAS_TEXT_MODEL_PATH, HAS_IMAGE_MODEL, HAS_TEXT_MAX_PARALLEL_REQUESTS).
- Updated skill requirements metadata to document these environment variables and their usage.
- No functional or documentation changes to program logic detected.

v1.0.0

Initial release of HaS Privacy (Hide and Seek) on-device anonymizer.
- Provides on-device text and image anonymization for privacy protection.
- Text anonymization: Supports 8 languages, open-set entity types, anonymization and restoration.
- Image anonymization: Detects and masks 21 privacy categories (faces, IDs, passports, license plates, etc.).
- Scenarios include anonymizing before sharing/sending to cloud LLMs, scanning for sensitive content, and preparing privacy-compliant reports.
- Offers configurable options for entity/category selection, masking method, and strength.
- Preserves original files and consolidates scan reports with risk assessment and time taken.

Metadata

Slug has-anonymizer

Version 1.0.3

License MIT-0

Total installs 3

Current installs 3

Number of versions 4

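FAQ

What is Has Anonymizer?

HaS (Hide and Seek) on-device text and image anonymization. Text: 8 languages (zh/en/fr/de/es/pt/ja/ko), open-set entity types. Image: 21 privacy categories... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 1016 total downloads to date.

How do I install Has Anonymizer?

Run the command "/install has-anonymizer" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Has Anonymizer free?

Yes. Has Anonymizer is completely free and released under the MIT-0 license; it can be downloaded, installed, and used freely.

Which platforms does Has Anonymizer support?

Has Anonymizer is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Has Anonymizer?

It is developed and maintained by Huiming Liu (@xuanwuskill); the current version is v1.0.3.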
