mzfshark

Adversarial Alignment (Agent Smith)

Author: Mauricio Z. · GitHub · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 68
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install adversarial-alignment
Description
Maintain calibrated tension with Morpheus/Trinity/RedHat by producing adversarial signals that harden plans without damaging system integrity.
Usage Instructions (SKILL.md)

SKILL: adversarial-alignment

Purpose

Maintain tension with Morpheus while staying aligned with $NEURONS success: oppose weak accessibility narratives, challenge simplifications, and harden plans without damaging the system.

When to Use

  • Morpheus proposes a strategy or narrative
  • Trinity proposes a trading/execution change (as input, not for execution)
  • RedHat proposes an implementation plan that might violate boundaries or create fragility

Inputs

  • upstream_output (required):
    • agent ("Morpheus"|"Trinity"|"RedHat"|"Other")
    • summary (string)
    • assumptions (list)
    • proposed_actions (list)
  • constraints (required):
    • governance_rules (optional; if missing, flag unknowns)
    • safety_law (embedded in this skill; must be honored)
  • policy (required):
    • max_objections (default 7)
    • max_words (default 140)
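
The input shape above could be modeled roughly as follows. This is only a sketch: the class names (`UpstreamOutput`, `Constraints`, `Policy`, `SkillInput`) and the choice of plain string lists are assumptions made for illustration, not part of the skill. Only the field names and the two defaults come from the list above.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional

# Agent names as listed under Inputs.
AgentName = Literal["Morpheus", "Trinity", "RedHat", "Other"]


@dataclass
class UpstreamOutput:
    agent: AgentName
    summary: str
    # Element types are an assumption; the skill only says "list".
    assumptions: List[str] = field(default_factory=list)
    proposed_actions: List[str] = field(default_factory=list)


@dataclass
class Constraints:
    # None means "not supplied"; the skill asks for unknowns to be flagged.
    # safety_law is described as embedded in the skill, so it is not a field.
    governance_rules: Optional[List[str]] = None


@dataclass
class Policy:
    max_objections: int = 7   # default stated in the skill
    max_words: int = 140      # default stated in the skill


@dataclass
class SkillInput:
    upstream_output: UpstreamOutput
    constraints: Constraints
    policy: Policy
```

A caller that omits `governance_rules` still builds a valid input; the missing rules are expected to surface later as explicit unknowns rather than as a construction error.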

Steps

  1. Extract assumptions and proposed actions.
  2. Identify fragility points deterministically:
    • missing constraints
    • governance unknowns
    • risk-of-dependency creation
    • ambiguous execution paths
  3. Produce up to max_objections objections:
    • each objection must include: "what is weak" + "what would make it stronger"
  4. Output adversarial signal:
    • "block" only if governance/safety would be violated
    • otherwise "challenge" with required clarifications
  5. Generate a minimal response draft within max_words.
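
The five steps above can be sketched as a single function. This is a minimal illustration under stated assumptions: the function name `review`, the `VAGUE_WORDS` heuristic, the `weak`/`stronger` dictionary keys, and the `violations` parameter (standing in for whatever governance or safety check the host applies) are all hypothetical. It covers only two of the four fragility checks (governance unknowns and ambiguous execution paths) plus a missing-assumptions check.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical markers of an ambiguous execution path.
VAGUE_WORDS = {"tbd", "somehow", "maybe"}


def review(upstream: Dict, constraints: Dict, policy: Dict,
           violations: Sequence[str] = ()) -> Dict:
    max_objections = policy.get("max_objections", 7)
    max_words = policy.get("max_words", 140)

    # Step 1: extract assumptions and proposed actions.
    assumptions = list(upstream.get("assumptions", []))
    actions = list(upstream.get("proposed_actions", []))

    # Step 2: deterministic fragility checks (same input, same findings).
    objections: List[Dict[str, str]] = []
    unknowns: List[str] = []
    if not constraints.get("governance_rules"):
        unknowns.append("governance_rules were not supplied")
    if not assumptions:
        objections.append({
            "weak": "no assumptions are stated",
            "stronger": "list the assumptions the plan depends on",
        })
    for action in actions:
        words = set(re.findall(r"[a-z]+", action.lower()))
        if words & VAGUE_WORDS:
            objections.append({
                "weak": f"ambiguous execution path: {action!r}",
                "stronger": "state the exact trigger and owner for this action",
            })

    # Step 3: cap the number of objections.
    objections = objections[:max_objections]

    # Step 4: block only on a governance/safety violation, else challenge.
    verdict = "block" if violations else "challenge"

    # Step 5: draft a response and truncate it to max_words.
    parts = [verdict.upper() + ":"]
    parts += [f"{o['weak']} -> {o['stronger']}" for o in objections]
    draft = " ".join(" ".join(parts).split()[:max_words])

    return {
        "verdict": verdict,
        "objections": objections,
        "required_clarifications": [o["stronger"] for o in objections],
        "unknowns": unknowns,
        "response_draft": draft,
    }
```

Because the checks are plain set and list operations with no randomness, repeated calls on the same input produce the same objections in the same order, which is one reasonable reading of "deterministically".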

Validation

  • Objections must be about structure/logic, not people.
  • If governance rules are missing, mark unknowns explicitly; do not invent.

Output

  • adversarial_alignment_result:
    • verdict ("challenge"|"block"|"accept")
    • objections (list)
    • required_clarifications (list)
    • unknowns (list)
    • response_draft (string)
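
A host that consumes this result may want to check it mechanically before acting on it. The sketch below assumes a result represented as a plain dictionary whose objections carry hypothetical `weak` and `stronger` keys; the function name `validate_result` is likewise an illustration. The rule that objections target structure and logic rather than people is not checked here, since it needs human or model judgment.

```python
from typing import Dict, List, Optional

REQUIRED_KEYS = ("verdict", "objections", "required_clarifications",
                 "unknowns", "response_draft")
VERDICTS = {"challenge", "block", "accept"}


def validate_result(result: Dict, governance_rules: Optional[List[str]],
                    max_objections: int = 7, max_words: int = 140) -> List[str]:
    """Return a list of problems; an empty list means the result passed."""
    problems: List[str] = []

    for key in REQUIRED_KEYS:
        if key not in result:
            problems.append(f"missing field: {key}")

    if result.get("verdict") not in VERDICTS:
        problems.append("verdict is not one of challenge/block/accept")

    objections = result.get("objections", [])
    if len(objections) > max_objections:
        problems.append("too many objections")
    for i, obj in enumerate(objections):
        # Each objection needs both halves: what is weak, what would fix it.
        if not isinstance(obj, dict) or not obj.get("weak") or not obj.get("stronger"):
            problems.append(f"objection {i} lacks a weak/stronger pair")

    # Missing governance rules must show up as explicit unknowns.
    if not governance_rules and not result.get("unknowns"):
        problems.append("governance_rules missing but no unknowns recorded")

    if len(str(result.get("response_draft", "")).split()) > max_words:
        problems.append("response_draft exceeds max_words")

    return problems
```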

Safety Rules

  • Never damage system integrity; never sabotage.
  • Never create financial risk recommendations.
  • Governance and safety law override everything.

Example

If an upstream plan implicitly enables live trading, output verdict=block with a governance/safety reason and required gating steps.
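
As a rough illustration of that example, the sketch below flags proposed actions that mention live trading and returns a `block` result. Everything specific here is an assumption: the marker phrases, the function name `live_trading_block`, the `weak`/`stronger` keys, and the wording of the gating steps are placeholders, since the skill does not define them.

```python
from typing import Dict, List, Optional

# Hypothetical phrases treated as signs that a plan enables live trading.
LIVE_TRADING_MARKERS = ("live trading", "real funds")


def live_trading_block(proposed_actions: List[str]) -> Optional[Dict]:
    """Return a block result if any action matches a marker, else None."""
    hits = [a for a in proposed_actions
            if any(m in a.lower() for m in LIVE_TRADING_MARKERS)]
    if not hits:
        return None
    return {
        "verdict": "block",
        "objections": [
            {"weak": f"action implicitly enables live trading: {a!r}",
             "stronger": "gate the action behind explicit governance approval"}
            for a in hits
        ],
        # Placeholder gating steps; real ones would come from governance rules.
        "required_clarifications": [
            "confirm which governance rule authorizes live execution",
            "confirm the plan runs in a non-live mode until approved",
        ],
        "unknowns": [],
        "response_draft": "Blocked: the plan implicitly enables live trading; "
                          "gating steps are required before it can proceed.",
    }
```

Keyword matching of this kind will miss plans that enable live trading without naming it, so it is better read as a shape for the output than as a reliable detector.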

Safe Usage Recommendations
This skill appears coherent and low-risk: it is an instruction-only analyzer that asks for no secrets or installs. Before enabling broadly, confirm where the referenced 'safety_law' is defined (the bundle contains no explicit policy text), ensure callers always supply governance_rules in the constraints, and consider requiring human review for any 'block' verdicts (especially for safety- or finance-adjacent plans). Test with representative upstream_output to confirm it behaves as expected and doesn't over-block due to the missing embedded safety policy.
Functional Analysis
Type: OpenClaw Skill
Name: adversarial-alignment
Version: 1.0.0
The 'adversarial-alignment' skill is a logic-review tool designed to act as a 'devil's advocate' by identifying weaknesses in plans proposed by other agents. While it uses adversarial terminology and Matrix-themed roleplay (e.g., 'AgentSmith', 'Morpheus'), the instructions in SKILL.md and adversarial-alignment.md explicitly prioritize system integrity, safety laws, and governance rules. There is no executable code, no evidence of data exfiltration, and no attempt to bypass security controls; the skill functions entirely as a structured prompt for logical critique.
Capability Assessment
Purpose & Capability
Name/description align with the runtime instructions: the SKILL.md describes extracting assumptions, finding fragility points, and producing up to max_objections adversarial signals. There are no extra binaries, env vars, or config paths requested that would be unrelated to that purpose.
Instruction Scope
Instructions stay narrowly scoped to analyzing the provided upstream_output, constraints, and policy and producing objections/verdicts. They do not instruct reading files, network calls, or credential use. Note: the SKILL.md references a 'safety_law (embedded in this skill; must be honored)' but no concrete safety_law text or separate policy file is present in the bundle — that ambiguity could affect runtime behavior if the agent expects an embedded law it can't find.
Install Mechanism
No install spec and no code files to execute; instruction-only skills are lowest-risk from an install perspective. The registry artifacts are metadata-only.
Credentials
The skill declares no required environment variables, credentials, or config paths. Its inputs are explicit (upstream_output, constraints, policy) and proportional to the stated goal.
Persistence & Privilege
always:false and user-invocable:true (default) — the skill isn't force-included. Model invocation is allowed (normal). The skill does not request modification of other skills or system-wide settings.
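How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install adversarial-alignment
  3. Once installation completes, invoke the Skill by name or trigger it with /adversarial-alignment
  4. Provide the inputs listed in the Skill's parameter description to receive structured output.
Version History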
v1.0.0
Version 1.0.0 changelog for "adversarial-alignment" skill:
  • Initial release: Enables adversarial signal generation to challenge and strengthen strategic plans from Morpheus, Trinity, or RedHat.
  • Identifies fragility points by evaluating assumptions, constraints, and execution clarity.
  • Produces structured objections and clarifications based on policy and embedded safety laws.
  • Incorporates robust validation and clear output formatting, ensuring system integrity is never compromised.
  • Flags unknowns when governance input is missing, without inventing details.
Metadata
Slug: adversarial-alignment
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Historical versions: 1
FAQ

What is Adversarial Alignment (Agent Smith)?

Maintain calibrated tension with Morpheus/Trinity/RedHat by producing adversarial signals that harden plans without damaging system integrity. It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 68 total downloads to date.

How do I install Adversarial Alignment (Agent Smith)?

Run the command "/install adversarial-alignment" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Adversarial Alignment (Agent Smith) free?

Yes. Adversarial Alignment (Agent Smith) is completely free and released under the MIT-0 license; it can be downloaded, installed, and used freely.

Which platforms does Adversarial Alignment (Agent Smith) support?

Adversarial Alignment (Agent Smith) is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Adversarial Alignment (Agent Smith)?

It is developed and maintained by Mauricio Z. (@mzfshark); the current version is v1.0.0.
