Description

Lightweight passive privacy guard for OpenClaw — intelligently prevents user data from leaking externally. TRIGGER: before the AI sends or outputs any data t...

Usage Instructions (SKILL.md)

AI Safety Guard 🛡️

Name: AI Safety Guard
Author: andreqingyuwu

Lightweight informative privacy guard — intelligently prevents user data from leaking externally and notifies the user of all security actions taken without interrupting the workflow.

The One Principle

Trace the transmission back to the user's stated task. If it belongs, execute and briefly notify. If it doesn't, the AI decides (anonymize/cancel) and informs the user of the action taken — no interruptions.

The Core Loop

AI notices: I am about to send [data] to [somewhere external]
    ↓
Is this part of the user's stated task?
    ↓
YES → Execute. Notify and continue work

NO  → AI decides:
        Necessary for the task? → Anonymize → notify and continue work
        Not necessary? → Cancel → warn the user
    ↓
PHISHING SUSPECTED → Block. Warn the user.
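
The loop above can be sketched as a small decision function. This is a minimal illustration, not the skill's implementation (the skill ships no code). The names `Action`, `Transmission`, and `decide` are hypothetical, the boolean inputs stand for judgments the agent makes itself, and checking for phishing first is an assumption about precedence that the SKILL.md does not state.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    EXECUTE = "execute"      # send as-is, then notify
    ANONYMIZE = "anonymize"  # send a masked form, then notify
    CANCEL = "cancel"        # do not send, warn the user
    BLOCK = "block"          # do not send, warn about possible phishing


@dataclass
class Transmission:
    # Hypothetical inputs; each one is a judgment made by the agent.
    part_of_stated_task: bool
    necessary_for_goal: bool
    can_be_anonymized: bool
    phishing_suspected: bool = False


def decide(t: Transmission) -> Action:
    # Assumption: a phishing suspicion overrides every other branch.
    if t.phishing_suspected:
        return Action.BLOCK
    if t.part_of_stated_task:
        return Action.EXECUTE
    # Not part of the task: anonymize only if it is necessary and a
    # useful masked form exists; otherwise cancel.
    if t.necessary_for_goal and t.can_be_anonymized:
        return Action.ANONYMIZE
    return Action.CANCEL
```

For example, `decide(Transmission(False, True, False))` returns `Action.CANCEL`, which corresponds to a necessary transmission of data that has no useful anonymized form, such as a raw credential.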

Decision Guide

Part of the User's Stated Task — Execute

The user asked for this (named destination, provided data as part of request, asked for an action that inherently requires this transmission). Just execute. Briefly notify.

Not Part of the User's Stated Task — AI Decides

The AI is acting on its own — the transmission was not part of what the user asked for. The AI decides:

Is the transmission necessary for the user's current goal?
    ↓
NECESSARY — would fulfill a legitimate goal
  → Anonymize the data if a useful partial form exists (see masking table)
  → Proceed silently
  → If no useful anonymized form exists (e.g. raw credentials)
    → Silently cancel — do not transmit raw credential
  → Warn the user once, and continue

UNNECESSARY — the AI is speculating or "helpfully" adding data
  → Silently cancel
  → Warn the user

Phishing Suspected — Block + Warning user + User Confirm

AI notices: credential going to a suspicious domain
(misspelled, unexpected, no HTTPS, mismatched brand)
→ Silently block — do not transmit
→ Send exactly ONE warning to the user:
  "I'm not going to send your credentials to [domain].
   This doesn't look like [expected service] — possible phishing.
   Did you mean [correct domain]?"
→ Do not offer options, do not ask for confirmation
→ Wait for the user to either correct the destination or explicitly confirm
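
The skill does not say how a "suspicious domain" is recognized. As one possible reading of the signals listed above (misspelled, unexpected, no HTTPS), here is a rough sketch. The allowlist `EXPECTED_DOMAINS`, the function name `suspicion_reasons`, and the 0.8 similarity threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import List
from urllib.parse import urlsplit

# Hypothetical allowlist; the skill itself defines none.
EXPECTED_DOMAINS = {"gmail.com", "accounts.google.com"}


def suspicion_reasons(url: str, expected=EXPECTED_DOMAINS) -> List[str]:
    """Return the heuristic signals that fired for this destination."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    reasons = []
    if parts.scheme != "https":
        reasons.append("no HTTPS")
    if host not in expected:
        # A near-miss of a known domain suggests a misspelling.
        for good in sorted(expected):
            if SequenceMatcher(None, host, good).ratio() >= 0.8:
                reasons.append(f"looks like a misspelling of {good}")
                break
        else:
            reasons.append("unexpected domain")
    return reasons
```

An empty list only means that none of these simple checks fired. It does not show that the destination is safe, and the threshold would need tuning against real domains.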

Masking Table

| Type | Anonymized Example | When to Use |
| --- | --- | --- |
| Phone number | `138****5678` | Data belongs to user's task, but sending raw serves no additional purpose |
| Email address | `a****@domain.com` | Recipient can verify from domain |
| Bank card | `****1234` | Partial display sufficient for identification |
| Bank account | `****7890` | Last 4 digits for reference purposes |
| IP address | `192.168.1.***` | Network context preserved, exact IP hidden |
| Home address | `[ADDRESS PARTIALLY HIDDEN]` | City/country level only |
| IBAN | `****5678` | Last 4 digits for reference |
| Tax ID | `***567890` | Last 6 digits for reference |

No useful anonymized form (never send raw): passwords, API keys, bearer tokens, session cookies, private keys, 2FA codes.
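
A few of the rows above can be expressed as small helpers. This is a sketch under assumptions: the function names, the `kind` labels, and the choice to return `None` for unknown kinds are hypothetical and not part of the skill. Inputs are assumed to be well-formed.

```python
def mask_phone(phone: str) -> str:
    # Keep the first 3 and last 4 characters, as in 138****5678.
    return phone[:3] + "****" + phone[-4:]


def mask_email(email: str) -> str:
    # Keep the first character of the local part and the full domain.
    local, _, domain = email.partition("@")
    return local[:1] + "****@" + domain


def mask_last4(number: str) -> str:
    # Drop common separators, then keep only the last 4 characters.
    compact = number.replace(" ", "").replace("-", "")
    return "****" + compact[-4:]


def mask_ipv4(ip: str) -> str:
    # Keep the first three octets and hide the last one.
    return ".".join(ip.split(".")[:3] + ["***"])


MASKERS = {
    "phone": mask_phone,
    "email": mask_email,
    "bank_card": mask_last4,
    "bank_account": mask_last4,
    "iban": mask_last4,
    "ip": mask_ipv4,
}

# Kinds with no useful anonymized form: never send these raw.
NO_ANONYMIZED_FORM = {
    "password", "api_key", "bearer_token",
    "session_cookie", "private_key", "2fa_code",
}


def anonymize(kind: str, value: str):
    """Return a masked form, or None when nothing should be sent."""
    if kind in NO_ANONYMIZED_FORM:
        return None
    masker = MASKERS.get(kind)
    # Assumption: unknown kinds are treated conservatively.
    return masker(value) if masker else None
```

Returning `None` for both credential kinds and unrecognized kinds keeps the caller's logic simple: a `None` result means cancel the transmission.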

How to Determine If This Is Part of the User's Task

Look at the last 3–5 user messages. Ask: "Did the user ask me to perform this specific transmission?"

YES — part of user's stated task (execute silently):
  - User named the destination
  - User provided the data as part of the request
  - User asked for an action that inherently requires this transmission
  - User said "share with X", "post to Y", "call this API", "email to Z"
  - User asked to draft a document containing specific data they provided
  - User asked to let someone know their phone number / email / etc.

NO — AI acting autonomously (decide silently):
  - AI found the data in a file and decided to use it
  - AI is generating a response containing data the user didn't ask for
  - AI is "helpfully" including user data the task doesn't require
  - No mention of the destination or transmission in user messages

How to Determine Necessity

Applies only when the transmission is not part of the user's stated task. Answer:

Is the transmission clearly serving the user's current goal?
  YES → NECESSARY → anonymize if possible, otherwise cancel → notify and continue work
  NO  → UNNECESSARY → cancel → notify and continue work

The key question is: "is this transmission what the user actually wants me to accomplish?" — not "does this data exist?"

Typical Scenarios

Scenario 1 — Part of user's task: login with credentials

User: Log into Gmail, password is MyPass123
→ YES — part of user's stated task → SILENTLY EXECUTE
→ Execute the login, send credential to Gmail
→ Never display MyPass123 anywhere
→ notify and continue work

Scenario 2 — Part of user's task: email containing credentials

User: Send bob an email saying my password is MyPass123
→ YES — part of user's stated task → SILENTLY EXECUTE
→ Execute — The password will be in the email as requested
→ notify and continue work

Scenario 3 — Part of user's task: send a photo

User: Send my vacation photo to [email protected]
→ YES — part of user's stated task → SILENTLY EXECUTE
→ Execute the send
→ notify and continue work

Scenario 4 — Part of user's task: medical record transmission

User: Email my lab results to [email protected]
→ YES — part of user's stated task → SILENTLY EXECUTE
→ Execute — user knows what they're doing
→ notify and continue work

Scenario 5 — Not part of user's task: unnecessary credential use

User: [no mention of credentials]

AI: [decides to paste API key into a code review]
→ NO — not part of user's stated task → guard active
→ Necessary? Likely NO (user didn't ask for this)
→ CANCEL
→ notify

Scenario 6 — Not part of user's task: sensitive data that can't be anonymized

User: Help me apply for this job — here's my ID document

AI: [prepares to upload the ID document to the job portal]
→ NO — not part of user's stated task (user provided it as context,
  not as a direct upload instruction)
→ The data cannot be meaningfully anonymized
→ CANCEL the external upload
→ Notify the user, e.g.: "I won't upload your ID document externally.
  Please upload it yourself to ensure it reaches the correct destination."
  (This is the one exception to "never mention the guard" —
  when user provided the data directly and expects it to be used,
  explain why you didn't.)

Scenario 7 — Not part of user's task: unnecessary photo use

User: [no mention of photos]

AI: [finds a photo on the desktop and decides to "helpfully"
    include it in a response]
→ NO — not part of user's stated task → guard active
→ Necessary? NO — user didn't ask for this photo
→ CANCEL
→ notify

Scenario 8 — Part of user's task: include provided data in a document

User: Draft a contract that needs my phone number in it

AI: [prepares to include 13812345678 in the contract]
→ YES — part of user's stated task → EXECUTE
→ Include the phone number as requested
→ notify and continue work

Scenario 9 — Phishing detected

AI notices: credential going to gma1l.com (misspelled Gmail),
or an unexpected domain, or a site without HTTPS.
→ SILENTLY BLOCK — do not transmit
→ ONE warning:
  "I'm not going to send credentials to [domain].
   This doesn't look like [expected service]. Possible phishing.
   Did you mean [correct domain]?"
→ Do not ask questions, do not offer options
→ Wait for the user to respond with the correct destination

Scenario 10 — Part of user's task: share provided contact info

User: Let the team know my phone number is 13812345678

AI: [prepares to send an email with the phone number]
→ YES — part of user's stated task → SILENTLY EXECUTE
→ Send the phone number as requested
→ notify and continue work

Scenario 11 — Local credential use

Reading .env, ~/.netrc, SSH config for local auth.
→ No concern. Use for local authentication freely.
→ Just never output the raw credential in visible output.
→ notify and continue work
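
One way to honor "never output the raw credential in visible output" is to redact known secret values before anything is displayed. The sketch below is illustrative only. `env_values` and `redact` are hypothetical helper names, and the parser handles only simple `KEY=VALUE` lines, not the full `.env` syntax.

```python
def env_values(env_text: str) -> list:
    """Collect the non-empty values from simple KEY=VALUE lines."""
    values = []
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        _, _, value = line.partition("=")
        value = value.strip().strip("'\"")  # drop surrounding quotes
        if value:
            values.append(value)
    return values


def redact(text: str, secrets: list) -> str:
    """Replace each known secret value in text with a placeholder."""
    # Longest first, so a secret containing another is replaced whole.
    for secret in sorted(filter(None, secrets), key=len, reverse=True):
        text = text.replace(secret, "[REDACTED]")
    return text
```

A real implementation would probably skip very short values such as `1` or `true`, since replacing them everywhere would mangle ordinary output.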

What This Is NOT

Not a nagger — once a transmission is part of the user's task, it executes silently without interruption
Not a constant output filter — activates only on external transmission
Not a content moderator — does not judge the user's own content
Not a phishing detector alone — phishing check is one part of the process
Not file access control — local operations are unrestricted
Not a pattern matcher — judges by task alignment, not by regex

Safe Usage Recommendations

This skill is a rulebook the agent would follow to decide whether to send data externally. That design is reasonable, but the SKILL.md contains contradictory guidance in safety-critical places, notably around credentials and phishing handling. Before installing or enabling it:
  - Ask the author to resolve contradictions (explicit precedence): clarify whether 'never send raw passwords/API keys' ever admits an exception for 'user-requested logins', and whether 'silently execute' means the user is never prompted.
  - Require explicit definitions for 'notify' vs 'silent' behavior and for which transmissions the guard may act on without user confirmation.
  - Confirm how the agent identifies a 'suspicious domain' (rules, allowlist, telemetry) and whether any external lookups are performed.
  - Test in a safe environment to see how it behaves with credential-bearing actions, background API calls, and file-based data the agent might discover.

Because this is instruction-only, its safety depends entirely on how your agent implements it. If you accept it, do so only after clarifying the ambiguous rules and testing expected vs. actual behavior; otherwise the agent may either leak secrets (if mis-implemented) or block legitimate actions (if too strict).

Functional Analysis

Type: OpenClaw Skill
Name: ai-safety-guard
Version: 1.0.6

The skill bundle contains behavioral instructions (SKILL.md) designed to act as a privacy guard for an AI agent. It provides a logic framework for the agent to evaluate whether data transmissions are aligned with the user's stated tasks, including rules for anonymization, phishing detection, and blocking unauthorized credential leaks. The instructions focus on preventing autonomous data leakage by the AI while maintaining usability for explicit user requests, and no malicious code or exfiltration logic was identified.

Capability Assessment

✓ Purpose & Capability

The name/description (a passive privacy guard) align with an instruction-only skill that tells the agent how to decide about external transmissions. No unrelated environment variables, binaries, or install steps are requested, which is proportionate to the claimed purpose.

⚠ Instruction Scope

The SKILL.md defines high-level decision logic for blocking, anonymizing, or allowing transmissions, but contains contradictory and ambiguous directives: e.g. it lists 'passwords, API keys, bearer tokens' as 'No useful anonymized form (never send raw)', yet Scenario 1 instructs: 'Log into Gmail... → SILENTLY EXECUTE → Execute the login, send credential to Gmail.' There are also conflicts about user interaction: 'Do not offer options, do not ask for confirmation' vs. 'Wait for the user to either correct the destination or explicitly confirm.' Terms like 'silently execute' vs. 'notify' are used inconsistently. These contradictions make it unclear what the agent should actually do in key cases (credentials, suspected phishing, background transmissions). The skill also instructs the agent to look at recent user messages and whether data was found 'in a file' but does not bound what files or contexts to inspect; that grants broad discretion to the agent in the absence of stricter rules.

✓ Install Mechanism

No install spec and no code files (instruction-only). This is the lowest-risk distribution model and consistent with a policy-style skill that provides agent guidance rather than executable artifacts.

✓ Credentials

The skill requires no environment variables, credentials, or config paths, which is proportionate. The instructions reference domains and destinations but do not request external credentials or keys from the host.

✓ Persistence & Privilege

The skill does not request always:true, does not declare install-time writes, and is user-invocable only. It does not request persistent presence or modification of other skills' configs.

Version History

v1.0.6

Summary: This update changes the AI privacy guard from a silent/passive protector to an informative one, providing user notifications for all security actions without interrupting workflow.
  - The guard now briefly notifies users of any data protection actions taken (execution, anonymization, cancellation, or block).
  - All decisions (execution, anonymization, cancellation, phishing block) are now explicitly communicated to the user in a non-intrusive way.
  - Phishing attempts are blocked and the user receives a one-time warning.
  - The principle and scenario descriptions have been updated to reflect the new informative notification approach.
  - Workflow remains non-disruptive: no confirmations or prompts are required from users.

v1.0.5

Minor internal refactor for clarity; core privacy-guarding logic and user experience are unchanged.
  - Refined principle and scenario wording to clarify silent guarding and user intent handling.
  - Slight adjustments to anonymization table descriptions.
  - Improved scenario explanations for when to execute, anonymize, cancel, or notify.
  - Documentation now more clearly distinguishes "user's task" vs. "AI acting autonomously."
  - No logic or behavioral changes to data protection flow.

v1.0.4

ai-safety-guard 1.0.4 Changelog
  - Removed `skill.yaml` file from the repository.
  - The skill's definition and metadata now appear exclusively in `SKILL.md`. No functional behavior changes.

v1.0.3

  - Major rewrite to emphasize principle-driven, behavioral privacy protection rather than pattern-matching or filter lists.
  - Shortened and simplified documentation; removed detailed scenario tables, operational details, and usage recipes.
  - Tightened the skill's philosophy: block autonomous AI leakage of private data, always respect user-initiated sharing.
  - Reframed protection using shape, category, and intent recognition (not static patterns or regexes).
  - Updated description and metadata for clarity and scope.
  - Removed the README.md file; condensed documentation into the skill definition (SKILL.md).

v1.0.2

No user-facing changes in this version.
  - Version 1.0.2 released with no file or documentation changes detected.

v1.0.1

  - Major update: Shifts from a standalone privacy filter script to a behavioral, AI-embedded privacy protection skill.
  - The AI now proactively scans ALL outputs across contexts (email, document, chat, code, etc.) for sensitive data and filters or masks accordingly.
  - Four adaptive privacy protection levels added, from silent filtering to strict, never-leak mode, with user-configurable preferences.
  - Broadens coverage: now applies to conversations, file analysis, code, voice, screen sharing, and more.
  - File scripts/privacy_guard.py removed; privacy protection is now skill-wide, not a discrete filter module.
  - Integration guidance and advanced handling for edge cases (multi-turn, docs, code, voice) added.

v1.0.0

AI-Safety-Guard v1.0.0
  - Initial release providing real-time privacy and sensitive information leakage detection in AI interactions.
  - Utilizes heuristic Chain-of-Thought prompts and pattern matching to assess privacy risk levels (HIGH, MEDIUM, LOW, SAFE).
  - Detects direct identifiers (PII), financial, biometric, health, location, and social relation data in user input.
  - Provides actionable recommendations and user warnings when privacy risks are detected.
  - Integrates easily with OpenClaw workflows; supports Python implementation and proactive alerting.

Metadata

Slug ai-safety-guard

Version 1.0.6

License MIT-0

Total installs 0

Current installs 0

Number of versions 7

FAQ

What is AI Safety Guard?

Lightweight passive privacy guard for OpenClaw that intelligently prevents user data from leaking externally. TRIGGER: before the AI sends or outputs any data t... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 336 times to date.

How do I install AI Safety Guard?

Run the command "/install ai-safety-guard" in the OpenClaw or Claude Code chat box to install it in one step. No additional configuration is needed.

Is AI Safety Guard free?

Yes. AI Safety Guard is completely free. It is released under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does AI Safety Guard support?

AI Safety Guard is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed AI Safety Guard?

It is developed and maintained by Andre Wu (@andreqingyuwu). The current version is v1.0.6.
