Adversarial Alignment (Agent Smith)

Name: Adversarial Alignment (Agent Smith)
Author: mzfshark

Author: Mauricio Z. · GitHub · v1.0.0 · MIT-0

cross-platform ✓ Security check passed

Install in OpenClaw

/install adversarial-alignment

Description

Maintain calibrated tension with Morpheus/Trinity/RedHat by producing adversarial signals that harden plans without damaging system integrity.

Usage Instructions (SKILL.md)

SKILL: adversarial-alignment

Purpose

Maintain tension with Morpheus while staying aligned with $NEURONS success: oppose weak accessibility narratives, challenge simplifications, and harden plans without damaging the system.

When to Use

Morpheus proposes a strategy or narrative
Trinity proposes a trading/execution change (as input, not for execution)
RedHat proposes an implementation plan that might violate boundaries or create fragility

Inputs

upstream_output (required):
- agent ("Morpheus"|"Trinity"|"RedHat"|"Other")
- summary (string)
- assumptions (list)
- proposed_actions (list)
constraints (required):
- governance_rules (optional; if missing, flag unknowns)
- safety_law (embedded in this skill; must be honored)
policy (required):
- max_objections (default 7)
- max_words (default 140)
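
The input shape above could be modeled as follows. This is a minimal sketch, not the skill's actual schema: the class names, the use of string lists for assumptions, actions, and governance rules, and the choice to leave safety_law out of the caller-supplied fields are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_AGENTS = ("Morpheus", "Trinity", "RedHat", "Other")


@dataclass
class UpstreamOutput:
    agent: str
    summary: str
    assumptions: List[str]
    proposed_actions: List[str]

    def __post_init__(self) -> None:
        if self.agent not in ALLOWED_AGENTS:
            raise ValueError(f"agent must be one of {ALLOWED_AGENTS}, got {self.agent!r}")


@dataclass
class Constraints:
    # None means the caller supplied no rules; the skill says to flag that
    # as an unknown rather than invent rules.
    governance_rules: Optional[List[str]] = None
    # safety_law is described as embedded in the skill, so this sketch does
    # not model it as a caller-supplied field (an assumption).


@dataclass
class Policy:
    max_objections: int = 7   # default stated in SKILL.md
    max_words: int = 140      # default stated in SKILL.md


@dataclass
class SkillInput:
    upstream_output: UpstreamOutput
    constraints: Constraints
    policy: Policy


# Hypothetical request, for illustration only.
request = SkillInput(
    upstream_output=UpstreamOutput(
        agent="Morpheus",
        summary="Simplify the onboarding narrative",
        assumptions=["Users read the docs"],
        proposed_actions=["Publish a one-page guide"],
    ),
    constraints=Constraints(),
    policy=Policy(),
)
```

Rejecting an unknown agent at construction time keeps later steps from having to re-check the enum.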

Steps

Extract assumptions and proposed actions.
Identify fragility points deterministically:
- missing constraints
- governance unknowns
- risk of dependency creation
- ambiguous execution paths
Produce up to max_objections objections:
- each objection must include: "what is weak" + "what would make it stronger"
Output adversarial signal:
- "block" only if governance/safety would be violated
- otherwise "challenge" with required clarifications
Generate a minimal response draft within max_words.
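
The steps above can be sketched as a small rule-based function. SKILL.md does not say how fragility points are detected, only that detection is deterministic, so the marker phrases, the function names, and the caller-supplied `violates_governance_or_safety` flag below are hypothetical stand-ins.

```python
from typing import Dict, List, Optional

# Hypothetical marker phrases (not from SKILL.md), checked in a fixed order
# so that identical input always yields identical objections.
VAGUE_MARKERS = ("tbd", "somehow", "as needed")
DEPENDENCY_MARKERS = ("depends on", "relies on")


def find_objections(assumptions: List[str], actions: List[str]) -> List[Dict[str, str]]:
    """Steps 2-3: each objection pairs what is weak with what would strengthen it."""
    objections = []
    if not assumptions:
        objections.append({
            "weak": "no assumptions are stated",
            "stronger": "list the assumptions the plan rests on",
        })
    for action in actions:
        lowered = action.lower()
        if any(marker in lowered for marker in VAGUE_MARKERS):
            objections.append({
                "weak": f"ambiguous execution path in {action!r}",
                "stronger": "state who acts, on what trigger, and when to stop",
            })
        if any(marker in lowered for marker in DEPENDENCY_MARKERS):
            objections.append({
                "weak": f"possible new dependency in {action!r}",
                "stronger": "describe the fallback if the dependency fails",
            })
    return objections


def review(upstream: Dict, governance_rules: Optional[List[str]],
           violates_governance_or_safety: bool,
           max_objections: int = 7, max_words: int = 140) -> Dict:
    # Step 1: extract assumptions and proposed actions.
    assumptions = upstream.get("assumptions", [])
    actions = upstream.get("proposed_actions", [])

    # Missing governance rules are recorded as unknowns, never invented.
    unknowns = [] if governance_rules is not None else ["governance_rules not supplied"]

    # Step 3: cap the objection list at max_objections.
    objections = find_objections(assumptions, actions)[:max_objections]

    # Step 4: "block" only on a governance/safety violation, else "challenge".
    # "accept" appears in the Output schema, but the Steps do not say when
    # to use it, so this sketch never emits it.
    verdict = "block" if violates_governance_or_safety else "challenge"

    # Step 5: build the draft and truncate it to max_words words.
    sentences = [f"{o['weak']}; stronger if you {o['stronger']}." for o in objections]
    words = " ".join([f"Verdict: {verdict}."] + sentences).split()
    return {
        "verdict": verdict,
        "objections": objections,
        "required_clarifications": [o["stronger"] for o in objections],
        "unknowns": unknowns,
        "response_draft": " ".join(words[:max_words]),
    }


# Hypothetical upstream plan, for illustration only.
result = review(
    {"assumptions": [], "proposed_actions": ["Roll out somehow", "Pricing relies on one vendor feed"]},
    governance_rules=None,
    violates_governance_or_safety=False,
)
```

Keeping the checks as plain substring rules in a fixed order is one way to satisfy "deterministically"; a real deployment would likely need richer rules.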

Validation

Objections must be about structure/logic, not people.
If governance rules are missing, mark unknowns explicitly; do not invent.

Output

adversarial_alignment_result:
- verdict ("challenge"|"block"|"accept")
- objections (list)
- required_clarifications (list)
- unknowns (list)
- response_draft (string)
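
As an illustration of the output shape, the result could be held in a small validated container and serialized. The class name, the use of plain strings for list items, and JSON as the wire format are assumptions; SKILL.md only names the fields.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

VERDICTS = ("challenge", "block", "accept")


@dataclass
class AdversarialAlignmentResult:
    verdict: str
    objections: List[str] = field(default_factory=list)
    required_clarifications: List[str] = field(default_factory=list)
    unknowns: List[str] = field(default_factory=list)
    response_draft: str = ""

    def __post_init__(self) -> None:
        if self.verdict not in VERDICTS:
            raise ValueError(f"verdict must be one of {VERDICTS}, got {self.verdict!r}")

    def to_json(self) -> str:
        # Wrap the fields under the top-level key named in the Output section.
        return json.dumps({"adversarial_alignment_result": asdict(self)}, indent=2)


# Hypothetical result, for illustration only.
result = AdversarialAlignmentResult(
    verdict="challenge",
    objections=["Rollout has no stop condition; defining one would make it stronger."],
    required_clarifications=["Who approves the rollout?"],
    unknowns=["governance_rules not supplied"],
    response_draft="Challenge: define a stop condition and name the approver.",
)
print(result.to_json())
```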

Safety Rules

Never damage system integrity; never sabotage.
Never create financial risk recommendations.
Governance and safety law override everything.

Example

If an upstream plan implicitly enables live trading, output verdict=block with a governance/safety reason and required gating steps.
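
A rough sketch of that example follows. The marker phrases and the gating steps are hypothetical; a real check would be driven by the governance rules and safety law, which this listing does not include, and keyword matching will miss plans that enable live trading less explicitly.

```python
from typing import Dict, List

# Hypothetical phrases that suggest live trading (not from SKILL.md).
LIVE_TRADING_MARKERS = ("live trading", "real funds", "production order")


def gate_live_trading(proposed_actions: List[str]) -> Dict:
    hits = [action for action in proposed_actions
            if any(marker in action.lower() for marker in LIVE_TRADING_MARKERS)]
    if not hits:
        # No violation found by this check alone; other objections may still apply.
        return {"verdict": "challenge", "objections": [], "required_clarifications": []}
    return {
        "verdict": "block",
        "objections": [f"{action!r} would enable live trading without a stated gate"
                       for action in hits],
        # Illustrative gating steps only; the actual steps depend on governance.
        "required_clarifications": [
            "cite the governance rule that permits live trading",
            "name the human approver for the switch to live",
        ],
    }


# Hypothetical plan, for illustration only.
plan = ["Backtest the new signal", "Point the executor at real funds after the backtest"]
result = gate_live_trading(plan)
print(result["verdict"])  # block
```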

Safe Usage Recommendations

This skill appears coherent and low-risk: it is an instruction-only analyzer that asks for no secrets or installs. Before enabling broadly, confirm where the referenced 'safety_law' is defined (the bundle contains no explicit policy text), ensure callers always supply governance_rules in the constraints, and consider requiring human review for any 'block' verdicts (especially for safety- or finance-adjacent plans). Test with representative upstream_output to confirm it behaves as expected and doesn't over-block due to the missing embedded safety policy.

Functional Analysis

Type: OpenClaw Skill
Name: adversarial-alignment
Version: 1.0.0

The 'adversarial-alignment' skill is a logic-review tool designed to act as a 'devil's advocate' by identifying weaknesses in plans proposed by other agents. While it uses adversarial terminology and Matrix-themed roleplay (e.g., 'AgentSmith', 'Morpheus'), the instructions in SKILL.md and adversarial-alignment.md explicitly prioritize system integrity, safety laws, and governance rules. There is no executable code, no evidence of data exfiltration, and no attempt to bypass security controls; the skill functions entirely as a structured prompt for logical critique.

Capability Assessment

✓ Purpose & Capability

Name/description align with the runtime instructions: the SKILL.md describes extracting assumptions, finding fragility points, and producing up to max_objections adversarial signals. There are no extra binaries, env vars, or config paths requested that would be unrelated to that purpose.

ℹ Instruction Scope

Instructions stay narrowly scoped to analyzing the provided upstream_output, constraints, and policy and producing objections/verdicts. They do not instruct reading files, network calls, or credential use. Note: the SKILL.md references a 'safety_law (embedded in this skill; must be honored)' but no concrete safety_law text or separate policy file is present in the bundle — that ambiguity could affect runtime behavior if the agent expects an embedded law it can't find.

✓ Install Mechanism

No install spec and no code files to execute; instruction-only skills are lowest-risk from an install perspective. The registry artifacts are metadata-only.

✓ Credentials

The skill declares no required environment variables, credentials, or config paths. Its inputs are explicit (upstream_output, constraints, policy) and proportional to the stated goal.

✓ Persistence & Privilege

always:false and user-invocable:true (default) — the skill isn't force-included. Model invocation is allowed (normal). The skill does not request modification of other skills or system-wide settings.

How to Use

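Make sure OpenClaw is installed (locally or via Docker).
Enter the install command in the chat box: /install adversarial-alignment
Once installation completes, invoke the Skill by name or trigger it with /adversarial-alignment
Provide the required inputs as described in the Skill's parameter documentation to get structured output.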

Version History

v1.0.0

Version 1.0.0 changelog for "adversarial-alignment" skill:
- Initial release: Enables adversarial signal generation to challenge and strengthen strategic plans from Morpheus, Trinity, or RedHat.
- Identifies fragility points by evaluating assumptions, constraints, and execution clarity.
- Produces structured objections and clarifications based on policy and embedded safety laws.
- Incorporates robust validation and clear output formatting, ensuring system integrity is never compromised.
- Flags unknowns when governance input is missing, without inventing details.

Metadata

Slug adversarial-alignment

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Versions 1

FAQ