clawkk

Guard

Author: clawkk · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 251
Favorites: 0
Current installs: 1
Versions: 1
Install in OpenClaw
/install guard
Description
Deep AI safety guardrails workflow—policy definition, input/output filtering, monitoring, escalation, and false-positive handling. Use when reducing harmful...
Usage instructions (SKILL.md)

AI Guardrails (Deep Workflow)

Guardrails turn product and legal policy into enforced behavior: blocking, rewriting, logging, and human review—with attention to false positives and latency.

When to Offer This Workflow

Trigger conditions:

  • Launching consumer-facing LLM features
  • Jailbreak attempts, policy violations, or PII leakage risks
  • Region-specific compliance (minors, regulated advice)

Initial offer:

Use six stages: (1) policy scope, (2) threat model, (3) controls stack, (4) implementation patterns, (5) monitoring & review, (6) iteration & appeals. Confirm latency budget and jurisdictions.


Stage 1: Policy Scope

Goal: Define prohibited categories (hate, sexual content, violence, self-harm, malware instructions, etc.) and required disclaimers for sensitive domains (medical, legal).

Exit condition: Policy document owned by legal/product; escalation path for gray areas.
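One way to make the policy document enforceable is to mirror it as data that code can query. The sketch below assumes a small rule table; the category names, owners, and disclaimer wording are illustrative placeholders, not part of the skill, and unknown categories fall through to the escalation path for gray areas.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    BLOCK = "block"
    DISCLAIM = "disclaim"   # allow, but attach a required disclaimer
    ESCALATE = "escalate"   # gray area: route to the human escalation path


@dataclass(frozen=True)
class PolicyRule:
    category: str
    action: Action
    owner: str              # accountable team (hypothetical names)
    disclaimer: str = ""


# Illustrative entries only; the real list comes from the policy document.
POLICY = {
    rule.category: rule
    for rule in [
        PolicyRule("hate", Action.BLOCK, owner="legal"),
        PolicyRule("malware_instructions", Action.BLOCK, owner="legal"),
        PolicyRule("medical", Action.DISCLAIM, owner="product",
                   disclaimer="This is general information, not medical advice."),
        PolicyRule("legal", Action.DISCLAIM, owner="product",
                   disclaimer="This is general information, not legal advice."),
    ]
}


def lookup(category: str) -> PolicyRule:
    """Return the rule for a category; unknown categories escalate."""
    return POLICY.get(category, PolicyRule(category, Action.ESCALATE, owner="legal"))
```

Defaulting unknown categories to escalation is a design choice in this sketch: it keeps unclassified cases visible to the policy owners instead of silently allowing or blocking them.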


Stage 2: Threat Model

Goal: Identify adversaries (prompt injection, data exfiltration, tool abuse) and assets (user data, system prompts, connectors).
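A threat model can be kept as a small table and checked mechanically for gaps. The following is a minimal sketch using the adversaries and assets named above; the mitigation names and the mapping between threats and assets are assumptions made for illustration.

```python
from dataclasses import dataclass

ASSETS = {"user_data", "system_prompts", "connectors"}


@dataclass(frozen=True)
class Threat:
    name: str
    targets: frozenset   # assets the adversary is after
    mitigations: tuple   # planned controls; empty means an open gap


# Hypothetical mapping; a real one comes out of the threat-modeling session.
THREATS = [
    Threat("prompt_injection", frozenset({"system_prompts"}), ("input_screening",)),
    Threat("data_exfiltration", frozenset({"user_data"}), ("output_classifier",)),
    Threat("tool_abuse", frozenset({"connectors"}), ()),
]


def uncovered_assets(threats, assets):
    """Assets that no listed threat targets, i.e. not yet analyzed."""
    covered = set()
    for threat in threats:
        covered |= threat.targets
    return assets - covered


def unmitigated(threats):
    """Names of threats that have no planned control."""
    return [t.name for t in threats if not t.mitigations]
```

Running both checks before moving to Stage 3 gives a short list of what the controls stack still has to address.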


Stage 3: Controls Stack

Goal: Layer defenses: input screening, model safety APIs, output classifiers, tool sandboxing, allowlists for tools and URLs.
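The layering idea can be sketched as independent checks that each guard one boundary. Everything here is a simplified assumption: the tool names, the allowed host, and the single regex stand in for real classifiers and safety APIs, which this sketch does not call.

```python
import re
from typing import Optional, Sequence
from urllib.parse import urlparse

ALLOWED_TOOLS = {"search", "calculator"}   # hypothetical tool names
ALLOWED_HOSTS = {"docs.example.com"}       # hypothetical host

INJECTION_PATTERN = re.compile(r"ignore (all )?previous instructions", re.I)


def screen_input(text: str) -> bool:
    """Layer 1: cheap pattern screen; a real system would add a classifier."""
    return INJECTION_PATTERN.search(text) is None


def tool_allowed(tool: str, url: Optional[str] = None) -> bool:
    """Layer 2: allowlist tools, and hosts for any URL the tool would fetch."""
    if tool not in ALLOWED_TOOLS:
        return False
    if url is not None:
        parsed = urlparse(url)
        return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS
    return True


def screen_output(text: str, protected: Sequence[str]) -> bool:
    """Layer 3: fail if the output echoes any protected string verbatim."""
    return not any(secret in text for secret in protected)
```

Each function returns True when the content passes its layer, so a request proceeds only if every applicable layer agrees. A pattern screen like this is easy to evade on its own, which is why it is only one layer.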


Stage 4: Implementation Patterns

Goal: Structured refusal messages; telemetry on every block; distinguish block vs rewrite vs warn; avoid silent failures.
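A minimal sketch of these patterns follows. The `classify` callable, the verdict names, and the user-facing messages are hypothetical; the point is that each outcome is an explicit value, every non-allow decision is logged, and a classifier failure is treated as a recorded block instead of passing content through unchecked.

```python
import json
import logging
from dataclasses import dataclass
from enum import Enum

logger = logging.getLogger("guardrails")


class Verdict(str, Enum):
    ALLOW = "allow"
    WARN = "warn"
    REWRITE = "rewrite"
    BLOCK = "block"


@dataclass
class Decision:
    verdict: Verdict
    category: str
    text: str   # what the user actually sees


def decide(output: str, classify) -> Decision:
    """Apply `classify` (hypothetical: returns (Verdict, category)) to output."""
    try:
        verdict, category = classify(output)
    except Exception:
        # Fail closed and make the failure visible in telemetry.
        verdict, category = Verdict.BLOCK, "classifier_error"

    if verdict is Verdict.BLOCK:
        text = "I can't help with that request."
    elif verdict is Verdict.REWRITE:
        text = "[Some content was removed.]"
    elif verdict is Verdict.WARN:
        text = "Note: verify this with a professional.\n" + output
    else:
        text = output

    if verdict is not Verdict.ALLOW:
        # Log verdict and category only; raw text may contain PII.
        logger.info(json.dumps({"verdict": verdict.value, "category": category}))
    return Decision(verdict, category, text)
```

Whether to fail closed or fail open on classifier errors depends on the product's risk tolerance and latency budget; this sketch assumes fail closed.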


Stage 5: Monitoring & Review

Goal: Sample borderline cases for human review; dashboards on block rates by category; abuse spike alerts.
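The three monitoring signals can be sketched with plain functions. The thresholds, margin, and spike factor below are placeholder values chosen for illustration and would need tuning against real traffic.

```python
from collections import Counter


def block_rates(events):
    """events: iterable of (category, was_blocked) -> {category: block rate}."""
    totals, blocked = Counter(), Counter()
    for category, was_blocked in events:
        totals[category] += 1
        blocked[category] += int(was_blocked)
    return {category: blocked[category] / totals[category] for category in totals}


def is_borderline(score, threshold=0.5, margin=0.1):
    """True if a classifier score is close enough to the threshold to review."""
    return abs(score - threshold) <= margin


def spike(current, baseline, factor=3.0, min_count=10):
    """True if the current block count is well above the baseline count."""
    return current >= min_count and current > factor * baseline
```

The `min_count` floor keeps low-volume categories from triggering alerts on a handful of events.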


Stage 6: Iteration & Appeals

Goal: User appeals path where appropriate; version policy changes; measure false positives by locale and use case.
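Measuring false positives by locale and use case can be sketched as a grouped ratio over blocks that a human has reviewed. The record fields and sample values are hypothetical, and the rate is computed only over reviewed or appealed blocks, so it estimates the overturn rate of that sample and not of all traffic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedBlock:
    locale: str
    use_case: str
    policy_version: str   # which policy version produced the block
    overturned: bool      # True if review or appeal found the block was wrong


def false_positive_rates(reviews):
    """Return {(locale, use_case): share of reviewed blocks overturned}."""
    counts = defaultdict(lambda: [0, 0])   # key -> [overturned, total]
    for review in reviews:
        key = (review.locale, review.use_case)
        counts[key][0] += int(review.overturned)
        counts[key][1] += 1
    return {key: fp / total for key, (fp, total) in counts.items()}


# Illustrative data only.
SAMPLE = [
    ReviewedBlock("en-US", "chat", "v1", overturned=False),
    ReviewedBlock("en-US", "chat", "v1", overturned=True),
    ReviewedBlock("de-DE", "chat", "v1", overturned=True),
]
```

Keeping `policy_version` on each record makes it possible to filter the same calculation by version and compare rates before and after a policy change.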


Final Review Checklist

  • Policy categories and owners defined
  • Threat model aligned with product
  • Layered controls with clear responsibilities
  • Telemetry and review for edge cases
  • Appeals and iteration process where applicable

Tips for Effective Guidance

  • Defense in depth—no single classifier is sufficient.
  • Pair this workflow with content moderation for user-generated content (UGC) and with tool-calling controls for agent safety.

Handling Deviations

  • Enterprise internal bots: emphasize data-leak prevention and connector scope over public “safety” categories alone.
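For the internal-bot case, connector scope and leak prevention can be sketched as a per-bot scope allowlist plus redaction before text reaches logs. The bot names, scope strings, and the email-only regex are assumptions for illustration; real data-leak prevention would cover far more than email addresses.

```python
import re

# Hypothetical bot and connector scope names.
CONNECTOR_SCOPES = {
    "hr_bot": {"hr_wiki:read"},
    "it_bot": {"tickets:read", "tickets:write"},
}

# Deliberately simple pattern; not a full email grammar.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def connector_allowed(bot: str, scope: str) -> bool:
    """A bot may use a connector scope only if it is explicitly granted."""
    return scope in CONNECTOR_SCOPES.get(bot, set())


def redact_emails(text: str) -> str:
    """Replace email addresses before the text is stored or logged."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)
```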
Safe usage advice
This skill is essentially a playbook — low-risk as shipped. Before relying on it in production, verify any concrete implementations you or the agent build from it: ensure telemetry/storage systems do not capture unnecessary PII, confirm retention and access controls for dashboards and logs, get legal/product owners to sign off on policy definitions and escalation paths, and avoid granting the agent or any implementation access to production secrets or connectors without separate review. If you plan to operationalize these recommendations (add classifiers, dashboards, or automated blockers), review the actual code, packages, and endpoints those implementations use—those are where most security and privacy risks arise.
Functional analysis
Type: OpenClaw Skill
Name: guard
Version: 1.0.0
The skill bundle consists of conceptual documentation and a high-level workflow for implementing AI safety guardrails. It contains no executable code, network requests, or malicious instructions, focusing entirely on best practices for policy definition, threat modeling, and monitoring within the SKILL.md file.
Capability assessment
Purpose & Capability
The name and description claim a guardrails workflow and the SKILL.md provides a high-level six-stage process for policy, threat modeling, controls, implementation, monitoring, and appeals. No unrelated credentials, binaries, or install steps are requested—this is proportionate to a documentation-style skill.
Instruction Scope
Instructions are prescriptive but high-level (policy definition, classifiers, telemetry, dashboards, human review). The document does not instruct the agent to read local files, access environment variables, call external endpoints, or exfiltrate data. Mentions of telemetry and dashboards are architectural guidance, not implementation commands.
Install Mechanism
No install spec and no code files are present. Being instruction-only means nothing is downloaded or written to disk by the skill itself—this is the lowest-risk install posture.
Credentials
The skill declares no environment variables, credentials, or config paths. That matches the SKILL.md content (which only gives process guidance). There are no disproportionate or unexplained credential requests.
Persistence & Privilege
always is false and the skill is user-invocable with normal autonomous invocation allowed by default. There is no request for permanent presence or modifications to other skills or system settings. This is appropriate for a guidance-only skill.
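How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install guard
  3. Once installed, invoke the Skill by name or trigger it with /guard
  4. Provide the required inputs per the Skill's parameter description to get structured output
Version history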
v1.0.0
Version 1.0.0: Initial Release
  • Introduces a comprehensive deep AI safety guardrails workflow for LLM-based products.
  • Details a six-stage process: policy scope, threat modeling, controls stack, implementation patterns, monitoring & review, and iteration & appeals.
  • Provides specific guidance on policy definition, input/output filtering, monitoring, escalation, and false-positive handling.
  • Includes review checklist and tips for best practices in deploying safety guardrails for AI features.
  • Addresses enterprise-specific considerations (e.g., data-leak prevention for internal bots).
Metadata
Slug: guard
Version: 1.0.0
License: MIT-0
Total installs: 1
Current installs: 1
Version count: 1
FAQ

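What is Guard?

Deep AI safety guardrails workflow: policy definition, input/output filtering, monitoring, escalation, and false-positive handling. Use when reducing harmful... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 251 total downloads so far.

How do I install Guard?

Run the command "/install guard" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is Guard free?

Yes. Guard is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does Guard support?

Guard is cross-platform and works in any environment where OpenClaw / Claude Code is deployed.

Who develops Guard?

Guard is developed and maintained by clawkk (@clawkk); the current version is v1.0.0.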
