mzfshark

Adversarial Alignment (Agent Smith)

by Mauricio Z. · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security Clean
68 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install adversarial-alignment
Description
Maintain calibrated tension with Morpheus/Trinity/RedHat by producing adversarial signals that harden plans without damaging system integrity.
README (SKILL.md)

SKILL: adversarial-alignment

Purpose

Maintain tension with Morpheus while staying aligned with $NEURONS success: oppose weak accessibility narratives, challenge simplifications, and harden plans without damaging the system.

When to Use

  • Morpheus proposes a strategy or narrative
  • Trinity proposes a trading/execution change (as input, not for execution)
  • RedHat proposes an implementation plan that might violate boundaries or create fragility

Inputs

  • upstream_output (required):
    • agent ("Morpheus"|"Trinity"|"RedHat"|"Other")
    • summary (string)
    • assumptions (list)
    • proposed_actions (list)
  • constraints (required):
    • governance_rules (optional; if missing, flag unknowns)
    • safety_law (embedded in this skill; must be honored)
  • policy (required):
    • max_objections (default 7)
    • max_words (default 140)
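
The inputs above could be modeled roughly as follows. This is a minimal sketch, not part of the skill bundle: the class names, the choice of dataclasses, and the assumption that governance_rules is a list of strings are all hypothetical. Only the field names, the allowed agent values, and the two defaults come from the spec. safety_law is left out because the spec describes it as embedded in the skill rather than supplied by the caller.

```python
from dataclasses import dataclass
from typing import List, Optional

# Agent values listed in the spec.
ALLOWED_AGENTS = {"Morpheus", "Trinity", "RedHat", "Other"}


@dataclass
class UpstreamOutput:
    agent: str
    summary: str
    assumptions: List[str]
    proposed_actions: List[str]

    def __post_init__(self) -> None:
        # Reject agent names outside the enumerated set.
        if self.agent not in ALLOWED_AGENTS:
            raise ValueError(f"unknown agent: {self.agent!r}")


@dataclass
class Constraints:
    # Optional per the spec; None means "not provided", which the skill
    # is expected to flag as an unknown rather than fill in.
    # Assumption: rules are plain strings.
    governance_rules: Optional[List[str]] = None


@dataclass
class Policy:
    max_objections: int = 7   # default from the spec
    max_words: int = 140      # default from the spec
```

With this shape, `Policy()` yields the documented defaults, and `Constraints()` represents the "governance rules missing" case explicitly instead of using an empty placeholder.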

Steps

  1. Extract assumptions and proposed actions.
  2. Identify fragility points deterministically:
    • missing constraints
    • governance unknowns
    • risk of creating dependencies
    • ambiguous execution paths
  3. Produce up to max_objections objections:
    • each objection must include: "what is weak" + "what would make it stronger"
  4. Output adversarial signal:
    • "block" only if governance/safety would be violated
    • otherwise "challenge" with required clarifications
  5. Generate a minimal response draft within max_words.
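
Steps 3 to 5 can be sketched as a small function. This is an illustration under stated assumptions, not the skill's implementation (the skill ships no code): the function name `build_signal`, the "weak"/"stronger" keys, and the idea that objections and violations arrive as pre-computed lists are all hypothetical. The sketch follows the steps literally, so it only ever returns "block" or "challenge"; the steps do not say when "accept" applies, and the sketch does not guess.

```python
from typing import Dict, List, Optional


def build_signal(
    objections: List[Dict[str, str]],
    violations: List[str],
    governance_rules: Optional[List[str]],
    max_objections: int = 7,
    max_words: int = 140,
) -> Dict[str, object]:
    # Missing governance rules are recorded as unknowns, never invented.
    unknowns: List[str] = []
    if not governance_rules:
        unknowns.append("governance_rules not provided")

    # Step 3: keep only objections that state both what is weak and what
    # would make it stronger, capped at max_objections.
    kept = [o for o in objections if o.get("weak") and o.get("stronger")]
    kept = kept[:max_objections]

    # Step 4: "block" only on a governance/safety violation, else "challenge".
    verdict = "block" if violations else "challenge"

    # Step 5: a minimal draft, truncated to max_words whitespace-separated words.
    points = violations if violations else [o["weak"] for o in kept]
    text = f"{verdict.capitalize()}: " + "; ".join(points)
    draft = " ".join(text.split()[:max_words])

    return {
        "verdict": verdict,
        "objections": kept,
        "required_clarifications": [o["stronger"] for o in kept],
        "unknowns": unknowns,
        "response_draft": draft,
    }
```

Truncating by word count is the simplest way to honor max_words; a real draft would more likely be written to fit the limit than cut off mid-sentence.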

Validation

  • Objections must be about structure/logic, not people.
  • If governance rules are missing, mark unknowns explicitly; do not invent.

Output

  • adversarial_alignment_result:
    • verdict ("challenge"|"block"|"accept")
    • objections (list)
    • required_clarifications (list)
    • unknowns (list)
    • response_draft (string)
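
A caller might check that a returned result has the shape listed above before acting on it. The following is a hedged sketch: `validate_result` is a hypothetical helper, not something the skill provides, and it checks only the field names, types, and verdict values given in the Output section.

```python
from typing import Any, Dict, List

# Verdict values and list-typed fields as listed in the Output section.
ALLOWED_VERDICTS = {"challenge", "block", "accept"}
LIST_FIELDS = ("objections", "required_clarifications", "unknowns")


def validate_result(result: Dict[str, Any]) -> List[str]:
    """Return a list of shape problems; an empty list means the shape matches."""
    problems: List[str] = []
    if result.get("verdict") not in ALLOWED_VERDICTS:
        problems.append("verdict must be one of: challenge, block, accept")
    for name in LIST_FIELDS:
        if not isinstance(result.get(name), list):
            problems.append(f"{name} must be a list")
    if not isinstance(result.get("response_draft"), str):
        problems.append("response_draft must be a string")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see every shape mismatch at once.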

Safety Rules

  • Never damage system integrity; never sabotage.
  • Never create financial risk recommendations.
  • Governance and safety law override everything.

Example

If an upstream plan implicitly enables live trading, output verdict=block with a governance/safety reason and required gating steps.
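
For illustration, a result for that case might look like the payload below. Every string value here is invented for the example, and the "weak"/"stronger" keys inside each objection are an assumed encoding of "what is weak" and "what would make it stronger"; only the top-level field names and the "block" verdict come from the spec.

```python
import json

# Hypothetical result for a plan that implicitly enables live trading.
example_result = {
    "adversarial_alignment_result": {
        "verdict": "block",
        "objections": [
            {
                "weak": "The plan implicitly enables live trading.",
                "stronger": "Gate execution behind explicit governance approval.",
            }
        ],
        "required_clarifications": [
            "Which governance rule authorizes live execution?"
        ],
        "unknowns": ["governance_rules not provided"],
        "response_draft": (
            "Block: the plan implicitly enables live trading. "
            "Add an explicit governance gate before any execution path."
        ),
    }
}

print(json.dumps(example_result, indent=2))
```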

Usage Guidance
This skill appears coherent and low-risk: it is an instruction-only analyzer that requests no secrets and installs nothing. Before enabling it broadly, confirm where the referenced 'safety_law' is defined (the bundle contains no explicit policy text), ensure callers always supply governance_rules in the constraints, and consider requiring human review for any 'block' verdict, especially for safety- or finance-adjacent plans. Test with representative upstream_output to confirm that it behaves as expected and does not over-block because the embedded safety policy is missing.
Capability Analysis
Type: OpenClaw Skill
Name: adversarial-alignment
Version: 1.0.0
The 'adversarial-alignment' skill is a logic-review tool designed to act as a 'devil's advocate' by identifying weaknesses in plans proposed by other agents. While it uses adversarial terminology and Matrix-themed roleplay (e.g., 'AgentSmith', 'Morpheus'), the instructions in SKILL.md and adversarial-alignment.md explicitly prioritize system integrity, safety laws, and governance rules. There is no executable code, no evidence of data exfiltration, and no attempt to bypass security controls; the skill functions entirely as a structured prompt for logical critique.
Capability Assessment
Purpose & Capability
Name/description align with the runtime instructions: the SKILL.md describes extracting assumptions, finding fragility points, and producing up to max_objections adversarial signals. There are no extra binaries, env vars, or config paths requested that would be unrelated to that purpose.
Instruction Scope
Instructions stay narrowly scoped to analyzing the provided upstream_output, constraints, and policy and producing objections/verdicts. They do not instruct reading files, network calls, or credential use. Note: the SKILL.md references a 'safety_law (embedded in this skill; must be honored)' but no concrete safety_law text or separate policy file is present in the bundle — that ambiguity could affect runtime behavior if the agent expects an embedded law it can't find.
Install Mechanism
No install spec and no code files to execute; instruction-only skills are lowest-risk from an install perspective. The registry artifacts are metadata-only.
Credentials
The skill declares no required environment variables, credentials, or config paths. Its inputs are explicit (upstream_output, constraints, policy) and proportional to the stated goal.
Persistence & Privilege
always:false and user-invocable:true (default) — the skill isn't force-included. Model invocation is allowed (normal). The skill does not request modification of other skills or system-wide settings.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install adversarial-alignment
  3. After installation, invoke the skill by name or use /adversarial-alignment
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Version 1.0.0 changelog for "adversarial-alignment" skill:
  • Initial release: Enables adversarial signal generation to challenge and strengthen strategic plans from Morpheus, Trinity, or RedHat.
  • Identifies fragility points by evaluating assumptions, constraints, and execution clarity.
  • Produces structured objections and clarifications based on policy and embedded safety laws.
  • Incorporates robust validation and clear output formatting, ensuring system integrity is never compromised.
  • Flags unknowns when governance input is missing, without inventing details.
Metadata
Slug adversarial-alignment
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Adversarial Alignment (Agent Smith)?

It is an AI Agent Skill for Claude Code / OpenClaw that maintains calibrated tension with Morpheus/Trinity/RedHat by producing adversarial signals that harden plans without damaging system integrity. It has 68 downloads so far.

How do I install Adversarial Alignment (Agent Smith)?

Run "/install adversarial-alignment" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Adversarial Alignment (Agent Smith) free?

Yes, Adversarial Alignment (Agent Smith) is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Adversarial Alignment (Agent Smith) support?

Adversarial Alignment (Agent Smith) is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Adversarial Alignment (Agent Smith)?

It is built and maintained by Mauricio Z. (@mzfshark); the current version is v1.0.0.
