
Adversarial Alignment (Agent Smith)

Name: Adversarial Alignment (Agent Smith)
Author: mzfshark

by Mauricio Z. · GitHub ↗ · v1.0.0 · MIT-0

cross-platform ✓ Security Clean

Install in OpenClaw

/install adversarial-alignment

Description

Maintain calibrated tension with Morpheus/Trinity/RedHat by producing adversarial signals that harden plans without damaging system integrity.

README (SKILL.md)

SKILL: adversarial-alignment

Purpose

Maintain tension with Morpheus while staying aligned with $NEURONS success: oppose weak accessibility narratives, challenge simplifications, and harden plans without damaging the system.

When to Use

Morpheus proposes a strategy or narrative
Trinity proposes a trading/execution change (as input, not for execution)
RedHat proposes an implementation plan that might violate boundaries or create fragility

Inputs

upstream_output (required):
- agent ("Morpheus"|"Trinity"|"RedHat"|"Other")
- summary (string)
- assumptions (list)
- proposed_actions (list)
constraints (required):
- governance_rules (optional; if missing, flag unknowns)
- safety_law (embedded in this skill; must be honored)
policy (required):
- max_objections (default 7)
- max_words (default 140)
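
The input shape above can be sketched as plain Python dataclasses. This is only an illustration: the skill itself is instruction-only and ships no code, so the class names, the `collect_unknowns` helper, and the list-of-strings type for `governance_rules` are all assumptions. `safety_law` is left out because the spec describes it as embedded in the skill rather than supplied by the caller.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_AGENTS = ("Morpheus", "Trinity", "RedHat", "Other")


@dataclass
class UpstreamOutput:
    agent: str                   # must be one of ALLOWED_AGENTS
    summary: str
    assumptions: List[str]
    proposed_actions: List[str]

    def __post_init__(self) -> None:
        if self.agent not in ALLOWED_AGENTS:
            raise ValueError(f"unknown agent: {self.agent!r}")


@dataclass
class Constraints:
    # Optional per the spec; the list-of-strings type is an assumption.
    governance_rules: Optional[List[str]] = None


@dataclass
class Policy:
    max_objections: int = 7   # default stated in the spec
    max_words: int = 140      # default stated in the spec


def collect_unknowns(constraints: Constraints) -> List[str]:
    """Flag missing governance rules as an explicit unknown instead of inventing them."""
    if not constraints.governance_rules:
        return ["governance_rules not provided"]
    return []
```

Calling `collect_unknowns(Constraints())` yields a single unknown entry, which mirrors the "if missing, flag unknowns" note on `governance_rules`.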

Steps

Extract assumptions and proposed actions.
Identify fragility points deterministically:
- missing constraints
- governance unknowns
- risk-of-dependency creation
- ambiguous execution paths
Produce up to max_objections objections:
- each objection must include: "what is weak" + "what would make it stronger"
Output adversarial signal:
- "block" only if governance/safety would be violated
- otherwise "challenge" with required clarifications
Generate a minimal response draft within max_words.
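
The last three steps (capping objections, choosing the signal, and limiting the draft length) can be sketched as below. This is a minimal sketch under stated assumptions, not the skill's implementation: `build_signal`, `trim_to_words`, and the dictionary keys are hypothetical names, and how fragility points are actually detected is left to the caller, who passes them in as `findings`. The steps only describe "block" and "challenge", so the sketch never emits "accept".

```python
from typing import Dict, List, Tuple


def trim_to_words(text: str, max_words: int = 140) -> str:
    """Keep at most max_words whitespace-separated words."""
    return " ".join(text.split()[:max_words])


def build_signal(
    findings: List[Tuple[str, str]],
    violates_governance_or_safety: bool,
    max_objections: int = 7,
    max_words: int = 140,
) -> Dict[str, object]:
    """findings: (what is weak, what would make it stronger) pairs, already identified."""
    # Every objection carries both required parts; the list is capped at max_objections.
    objections = [
        {"what_is_weak": weak, "what_would_make_it_stronger": stronger}
        for weak, stronger in findings[:max_objections]
    ]
    # "block" only on a governance/safety violation; otherwise "challenge".
    verdict = "block" if violates_governance_or_safety else "challenge"
    draft = f"{verdict.upper()}: " + "; ".join(o["what_is_weak"] for o in objections)
    return {
        "verdict": verdict,
        "objections": objections,
        "response_draft": trim_to_words(draft, max_words),
    }
```

Passing the violation flag in explicitly keeps the decision to block separate from the objection list, so a long list of objections alone never escalates a "challenge" into a "block".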

Validation

Objections must be about structure/logic, not people.
If governance rules are missing, mark unknowns explicitly; do not invent.

Output

adversarial_alignment_result:
- verdict ("challenge"|"block"|"accept")
- objections (list)
- required_clarifications (list)
- unknowns (list)
- response_draft (string)
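
One way a caller might represent and serialize this result is sketched below. The class name, the `to_json` helper, and the choice of string lists for `objections` and `required_clarifications` are assumptions; the spec only names the fields and the three allowed verdicts.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

VERDICTS = ("challenge", "block", "accept")


@dataclass
class AdversarialAlignmentResult:
    verdict: str
    objections: List[str] = field(default_factory=list)
    required_clarifications: List[str] = field(default_factory=list)
    unknowns: List[str] = field(default_factory=list)
    response_draft: str = ""

    def __post_init__(self) -> None:
        # Reject anything outside the three verdicts listed in the spec.
        if self.verdict not in VERDICTS:
            raise ValueError(f"verdict must be one of {VERDICTS}")

    def to_json(self) -> str:
        # Wrap the fields under the top-level key used in the spec.
        return json.dumps({"adversarial_alignment_result": asdict(self)}, indent=2)
```

Validating the verdict at construction time means a malformed result fails early instead of reaching a downstream agent.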

Safety Rules

Never damage system integrity; never sabotage.
Never create financial risk recommendations.
Governance and safety law override everything.

Example

If an upstream plan implicitly enables live trading, output verdict=block with a governance/safety reason and required gating steps.
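
As a rough illustration of that case, the result for a plan that implicitly enables live trading might be assembled as follows. The function name, the wording of the reason, the placement of gating steps under `required_clarifications`, and the gating steps themselves are hypothetical; the skill does not define them.

```python
from typing import Dict, List


def block_for_live_trading(gating_steps: List[str]) -> Dict[str, object]:
    """Build a block result for a plan that implicitly enables live trading."""
    reason = "Plan implicitly enables live trading, which violates governance/safety."
    return {
        "verdict": "block",
        "objections": [reason],
        # Assumption: gating steps are reported as required clarifications.
        "required_clarifications": list(gating_steps),
        "unknowns": [],
        "response_draft": f"Blocked. {reason} Required before proceeding: "
        + "; ".join(gating_steps)
        + ".",
    }


# Hypothetical gating steps, for illustration only.
example = block_for_live_trading(["explicit governance approval", "human sign-off"])
```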

Usage Guidance

This skill appears coherent and low-risk: it is an instruction-only analyzer that asks for no secrets or installs. Before enabling broadly, confirm where the referenced 'safety_law' is defined (the bundle contains no explicit policy text), ensure callers always supply governance_rules in the constraints, and consider requiring human review for any 'block' verdicts (especially for safety- or finance-adjacent plans). Test with representative upstream_output to confirm it behaves as expected and doesn't over-block due to the missing embedded safety policy.

Capability Analysis

Type: OpenClaw Skill
Name: adversarial-alignment
Version: 1.0.0

The 'adversarial-alignment' skill is a logic-review tool designed to act as a 'devil's advocate' by identifying weaknesses in plans proposed by other agents. While it uses adversarial terminology and Matrix-themed roleplay (e.g., 'AgentSmith', 'Morpheus'), the instructions in SKILL.md and adversarial-alignment.md explicitly prioritize system integrity, safety laws, and governance rules. There is no executable code, no evidence of data exfiltration, and no attempt to bypass security controls; the skill functions entirely as a structured prompt for logical critique.

Capability Assessment

✓ Purpose & Capability

Name/description align with the runtime instructions: the SKILL.md describes extracting assumptions, finding fragility points, and producing up to max_objections adversarial signals. There are no extra binaries, env vars, or config paths requested that would be unrelated to that purpose.

ℹ Instruction Scope

Instructions stay narrowly scoped to analyzing the provided upstream_output, constraints, and policy and producing objections/verdicts. They do not instruct reading files, network calls, or credential use. Note: the SKILL.md references a 'safety_law (embedded in this skill; must be honored)' but no concrete safety_law text or separate policy file is present in the bundle — that ambiguity could affect runtime behavior if the agent expects an embedded law it can't find.

✓ Install Mechanism

No install spec and no code files to execute; instruction-only skills are lowest-risk from an install perspective. The registry artifacts are metadata-only.

✓ Credentials

The skill declares no required environment variables, credentials, or config paths. Its inputs are explicit (upstream_output, constraints, policy) and proportional to the stated goal.

✓ Persistence & Privilege

always:false and user-invocable:true (default) — the skill isn't force-included. Model invocation is allowed (normal). The skill does not request modification of other skills or system-wide settings.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install adversarial-alignment
After installation, invoke the skill by name or use /adversarial-alignment
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Version 1.0.0 changelog for "adversarial-alignment" skill:
- Initial release: Enables adversarial signal generation to challenge and strengthen strategic plans from Morpheus, Trinity, or RedHat.
- Identifies fragility points by evaluating assumptions, constraints, and execution clarity.
- Produces structured objections and clarifications based on policy and embedded safety laws.
- Incorporates robust validation and clear output formatting, ensuring system integrity is never compromised.
- Flags unknowns when governance input is missing, without inventing details.

Metadata

Slug adversarial-alignment

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Adversarial Alignment (Agent Smith)?

Adversarial Alignment (Agent Smith) is an AI Agent Skill for Claude Code / OpenClaw that maintains calibrated tension with Morpheus/Trinity/RedHat by producing adversarial signals that harden plans without damaging system integrity. It has 68 downloads so far.

How do I install Adversarial Alignment (Agent Smith)?

Run "/install adversarial-alignment" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Adversarial Alignment (Agent Smith) free?

Yes, Adversarial Alignment (Agent Smith) is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Adversarial Alignment (Agent Smith) support?

Adversarial Alignment (Agent Smith) is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Adversarial Alignment (Agent Smith)?

It is built and maintained by Mauricio Z. (@mzfshark); the current version is v1.0.0.
