← Back to Skills Marketplace
panlm

Aws Fis Experiment Prepare

by panlm · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
33
Downloads
0
Stars
0
Active Installs
1
Versions
Install in OpenClaw
/install aws-fis-experiment-prepare
Description
Use when the user wants to prepare, create, or generate an AWS FIS (Fault Injection Service) experiment configuration. Triggers on "prepare FIS experiment",...
README (SKILL.md)

AWS FIS Experiment Prepare

Generate all configuration files needed to run an AWS FIS experiment, then deploy via CloudFormation with self-healing iteration until the stack succeeds. Outputs a self-contained directory with a validated, deployed experiment template ready for execution.

Core principle: Validate resource-action compatibility before generating files. Never deliver untested configuration — deploy and self-heal first.

References

Always load for every experiment:

  • references/output-format.md — directory layout, slug naming, README template
  • references/cfn-base-template.md — CFN skeleton (Parameters, IAM Role, Dashboard, FIS Template, Outputs)
  • references/slug-conventions.md — scenario/context slug abbreviations, resource naming, name length budget

Load conditionally by scenario:

  • references/az-power-interruption-guide.md — AZ Power Interruption (sub-action pruning, tagging strategy, permissions)
  • references/eks-pod-action-guide.md — any aws:eks:pod-* action (RBAC Lambda, EKS Access Entry, Pod memory stress calculation)
  • references/elasticache-redis-guide.md — ElastiCache Redis/Valkey (native AZ power interruption, primary node reboot via SSM Automation, or replication group failover via SSM Automation)
  • references/msk-guide.md — Amazon MSK (broker reboot via SSM Automation — no native FIS action exists)

Utility scripts (execute, do not read as reference):

  • scripts/precheck-cfn-permissions.sh — detects required CFN service role
  • scripts/deploy-with-retry.sh — validate + deploy + delete-on-fail
  • scripts/rename-output-dir.sh — appends FIS template ID to directory name

Script invocation: ${SKILL_DIR} refers to the absolute path of this skill's directory (where SKILL.md lives). Resolve it from the skill's filesystem location before running any scripts.

Output Language Rule

Detect the user's conversation language and use the same language for all output files (README.md, comments in JSON/YAML).

  • Chinese input → Chinese output
  • English input → English output
  • Mixed → follow the dominant language

Prerequisites

Required tools:

  • AWS CLIaws fis list-actions, resource discovery, CloudFormation
  • aws___search_documentation / aws___read_documentation — FIS docs research
  • jq — required by scripts/deploy-with-retry.sh and scripts/precheck-cfn-permissions.sh

EKS Pod fault injection: Cluster auth mode must be API_AND_CONFIG_MAP or API. Check:

aws eks describe-cluster --name {CLUSTER} \
  --query 'cluster.accessConfig.authenticationMode'

If CONFIG_MAP only, the user must update the cluster first. MANDATORY: For any aws:eks:pod-* action, follow references/eks-pod-action-guide.md.

Workflow

Step 1: Identify Scenario and Region

Classify user intent into one of these branches:

Branch Trigger Additional Reference
Scenario Library AZ Power Interruption, AZ App Slowdown, Cross-AZ/Region scenarios Read AWS doc URL (table below)
Custom FIS action User specifies an action ID or describes a single fault
Custom FIS action (ElastiCache) ElastiCache AZ power interruption or Redis/Valkey failover references/elasticache-redis-guide.md
SSM Automation Target service has no native FIS action (MSK, ElastiCache primary reboot, ElastiCache failover) references/msk-guide.md or references/elasticache-redis-guide.md

If ambiguous, ask the user.

Scenario Library documentation URLs (JSON templates are NOT available via CLI/API — read the doc to extract):

Scenario Documentation URL
AZ Power Interruption https://docs.aws.amazon.com/en_us/fis/latest/userguide/az-availability-scenario.html
AZ Application Slowdown https://docs.aws.amazon.com/en_us/fis/latest/userguide/az-application-slowdown-scenario.html
Cross-AZ Traffic Slowdown https://docs.aws.amazon.com/en_us/fis/latest/userguide/cross-az-traffic-slowdown-scenario.html
Cross-Region Connectivity https://docs.aws.amazon.com/en_us/fis/latest/userguide/cross-region-scenario.html

Region detection order:

  1. User explicitly specifies
  2. Infer from context (ARNs, previous conversation)
  3. aws configure get region
  4. Ask the user

Store as TARGET_REGION.

Default experiment duration: PT10M (10 minutes) for all scenarios and sub-actions unless the user specifies otherwise. For AZ Power Interruption, scale ARC Zonal Autoshift timing proportionally (ARC starts at minute 2, runs for 8 minutes at PT10M; formula: startAfter = duration × (5/30)).

Step 2: Discover Target Resources

For Scenario Library Scenarios

CRITICAL: Scenario Library experiment templates CANNOT be generated via FIS API. You MUST call aws___read_documentation with the scenario URL (Step 1 table) to extract the JSON experiment template before generating any files. The documentation is the only authoritative source.

Target identification — prefer resourceArns over resourceTags:

  • Use resourceArns (exact ARNs) for most resource types — more precise, no pre-tagging needed
  • Exception — these types do NOT support resourceArns, use resourceTags instead:
    • aws:elasticache:replicationgroup
    • aws:ec2:autoscaling-group
  • EKS pod actions use Kubernetes namespace + pod labels (neither resourceArns nor resourceTags)

resourceArns and filters are mutually exclusive. FIS rejects targets that specify both. For AZ-scoped targeting, either use resourceArns with only the target AZ's ARNs, or use resourceTags + filters together.

If scenario is AZ Power Interruption: follow references/az-power-interruption-guide.md for sub-action pruning, tagging strategy, permissions, and one-Stack-per-AZ design.

Ask the user:

  1. Which AZ to target (for AZ-level scenarios)
  2. Which services to include (for AZ Power Interruption) — if user mentions specific services, include ONLY those + mandatory infrastructure sub-actions
  3. Target resource identifiers (cluster IDs, instance IDs, etc.)

For Custom FIS Actions

aws fis get-action --id "ACTION_ID" --region TARGET_REGION

Extract required targets and parameters. Resolve user-provided identifiers to ARNs via AWS CLI.

For Services Without Native FIS Actions (SSM Automation)

  1. Confirm no native action exists:

    aws fis list-actions \
      --query "actions[?starts_with(id, 'aws:{SERVICE}:')]" \
      --region TARGET_REGION
    
  2. If empty, follow the service-specific guide:

    • Amazon MSK → references/msk-guide.md
    • ElastiCache primary node reboot → references/elasticache-redis-guide.md (Scenario 2)
    • Other services → not yet documented. Stop and inform the user.

Special case — ElastiCache: Has a native FIS action for AZ-level impact (aws:elasticache:replicationgroup-interrupt-az-power) but no native action for single-node reboot or replication group failover. For primary node reboot, use SSM Automation per references/elasticache-redis-guide.md → Scenario 2. For replication group failover (TestFailover), use SSM Automation per references/elasticache-redis-guide.md → Scenario 3.

  1. Discover resources via the target service's CLI (aws kafka list-clusters, etc.).

Step 2.5: EKS Pod Action Setup Gate

If the experiment includes ANY aws:eks:pod-* action, complete this gate BEFORE Step 3.

Applicable actions: aws:eks:pod-cpu-stress, aws:eks:pod-delete, aws:eks:pod-io-stress, aws:eks:pod-memory-stress, aws:eks:pod-network-blackhole-port, aws:eks:pod-network-latency, aws:eks:pod-network-packet-loss.

  1. Read the official documentation:

    aws___read_documentation:
      url: https://docs.aws.amazon.com/fis/latest/userguide/eks-pod-actions.html
    
  2. Follow ALL requirements in references/eks-pod-action-guide.md:

    • Lambda-backed CFN Custom Resource for K8s RBAC (fixed names: fis-sa, fis-experiment-role, fis-experiment-role-binding)
    • EKS Access Entry for FIS Experiment Role (Username: fis-experiment)
    • Cluster auth mode check (API_AND_CONFIG_MAP or API)
    • Pod readOnlyRootFilesystem: false check
    • Network action limitations (no Fargate, no bridge mode)
    • Pod memory stress threshold calculation (if action is aws:eks:pod-memory-stress) — user's percent is total target, not injection value

Do NOT skip. EKS pod actions have complex setup requirements that differ significantly from other FIS actions.

Step 3: Validate Resource-Action Compatibility

CRITICAL GATE. Before generating any files, verify that the user's actual resources are compatible with the chosen FIS action(s).

3a. Inspect the Actual Resource

User Says CLI Command Key Fields
RDS database aws rds describe-db-instances --db-instance-identifier {ID} Engine, DBClusterIdentifier
RDS/Aurora cluster aws rds describe-db-clusters --db-cluster-identifier {ID} Engine, EngineMode, MultiAZ
EC2 instance aws ec2 describe-instances --instance-ids {ID} InstanceType, Placement.AvailabilityZone
EKS cluster aws eks describe-cluster --name {NAME} accessConfig.authenticationMode, version
ElastiCache aws elasticache describe-replication-groups --replication-group-id {ID} NodeGroupConfiguration, MultiAZ
ASG aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names {NAME} AvailabilityZones, Instances

3b. Cross-Check Against FIS Action Requirements

aws fis get-action --id "ACTION_ID" --region TARGET_REGION \
  --query 'action.targets' --output json

Common incompatibility traps:

FIS Action Required resourceType Incompatible With Detection
aws:rds:failover-db-cluster aws:rds:cluster Standalone RDS (non-Aurora) DBClusterIdentifier is null
aws:rds:reboot-db-instances aws:rds:db Aurora clusters Engine starts with aurora
aws:elasticache:replicationgroup-interrupt-az-power aws:elasticache:replicationgroup Standalone ElastiCache nodes No replication group
aws:ec2:stop-instances aws:ec2:instance Spot instances InstanceLifecycle = spot

3c. Decision Gate

  • Compatible → proceed to Step 4.
  • Incompatible → explain the mismatch, suggest alternatives based on the actual resource type, ask the user to confirm or abort.

Example alternatives:

  • Standalone RDS Multi-AZ → aws:rds:reboot-db-instances with --force-failover
  • Aurora cluster → aws:rds:failover-db-cluster
  • ElastiCache standalone → explain replication group is required

3d. For Scenario Library Scenarios

Validate EACH included sub-action against its target resources. Only validate sub-actions that remain after service-scoped pruning (Step 2).

Step 4: Determine Monitoring Configuration

Stop Conditions — default: source: "none" (no alarm). Only create a CloudWatch Alarm if the user explicitly provides one.

Dashboard Metrics — comprehensive, per-service. Group widgets by service, 3 widgets per service (availability, performance, errors/latency). Include only services actually affected by the experiment.

Service Metrics
EC2 StatusCheckFailed, CPUUtilization, NetworkIn/Out, NetworkPacketsIn/Out
RDS/Aurora DatabaseConnections, ReadLatency, WriteLatency, AuroraReplicaLag, FreeableMemory
EKS pod_number_of_running_pods, pod_number_of_container_restarts, node_cpu_utilization, node_memory_utilization
ElastiCache ReplicationLag, EngineCPUUtilization, CurrConnections, CacheHitRate, Evictions, IsMaster
ALB HealthyHostCount, UnHealthyHostCount, HTTPCode_ELB_5XX_Count, TargetResponseTime
NLB ActiveFlowCount, TCP_Client_Reset_Count, TCP_Target_Reset_Count

Step 5: Generate Configuration Files

Create output directory:

# ─── Fill in from user's request + references/slug-conventions.md ───
SCENARIO_SLUG="..."         # e.g., pod-delete, az-power-int, rds-failover
TARGET_RESOURCE_ID="..."    # e.g., my-aurora-cluster, i-0abc123def
CONTEXT_NAME=""             # optional (e.g., redis, msk); leave empty if N/A
# ────────────────────────────────────────────────────────────────────

# Derived values (do not edit):
TARGET_SLUG=$(echo "${TARGET_RESOURCE_ID}" | tr '[:upper:]' '[:lower:]' | tr ' :/' '-' | cut -c1-20)
CONTEXT_SLUG=$(echo "${CONTEXT_NAME}" | tr '[:upper:]' '[:lower:]' | tr ' :/' '-' | cut -c1-10)
TIMESTAMP=$(TZ=Asia/Shanghai date +%Y-%m-%d-%H-%M-%S)

if [ -n "${CONTEXT_SLUG}" ]; then
    OUTPUT_DIR="./${TIMESTAMP}-${SCENARIO_SLUG}-${TARGET_SLUG}-${CONTEXT_SLUG}"
else
    OUTPUT_DIR="./${TIMESTAMP}-${SCENARIO_SLUG}-${TARGET_SLUG}"
fi
mkdir -p "${OUTPUT_DIR}"

REQUIRED: Before generating cfn-template.yaml, read the AWS::FIS::ExperimentTemplate CloudFormation resource documentation:

aws___read_documentation:
  url: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-fis-experimenttemplate.html

ALSO REQUIRED: Search for CloudFormation examples for the resources used:

aws___search_documentation:
  search_phrase: "\x3CCFN resource types in this experiment>"
  topics: ["cloudformation"]

Generate files:

  1. cfn-template.yaml — use references/cfn-base-template.md as the skeleton. Extend with scenario-specific resources per:

    • references/az-power-interruption-guide.md (if AZ Power Interruption)
    • references/eks-pod-action-guide.md (if EKS pod actions)
    • references/msk-guide.md (if MSK)
    • references/elasticache-redis-guide.md (if ElastiCache)
  2. README.md — use the template in references/output-format.md.

Step 5.5: CFN Permission Pre-Check

Run the precheck script to detect whether a CFN service role is required:

CFN_ROLE_ARN=$("${SKILL_DIR}/scripts/precheck-cfn-permissions.sh")

If the caller lacks CloudFormation permissions, the script exits 1 with guidance — stop and inform the user. Otherwise, CFN_ROLE_ARN is either empty (no service role needed) or contains the required role ARN.

Step 6: Deploy CFN Template (Self-Healing Loop)

Generate deployment parameters:

# See references/slug-conventions.md for the ExperimentName composition rule
RANDOM_SUFFIX=$(LC_ALL=C tr -dc 'a-z0-9' \x3C /dev/urandom | head -c6)

if [ -n "${CONTEXT_SLUG}" ]; then
    EXPERIMENT_NAME="${SCENARIO_SLUG}-${TARGET_SLUG}-${CONTEXT_SLUG}-${RANDOM_SUFFIX}"
else
    EXPERIMENT_NAME="${SCENARIO_SLUG}-${TARGET_SLUG}-${RANDOM_SUFFIX}"
fi
STACK_NAME="fis-${EXPERIMENT_NAME}"

Deploy with self-healing retry loop (maximum 5 attempts driven by the agent). The deploy-with-retry.sh script performs one attempt — the agent drives the loop externally. On each attempt:

  1. Run scripts/deploy-with-retry.sh:
    "${SKILL_DIR}/scripts/deploy-with-retry.sh" \
      "${OUTPUT_DIR}/cfn-template.yaml" \
      "${STACK_NAME}" \
      "${TARGET_REGION}" \
      "${CFN_ROLE_ARN}" \
      "ExperimentName=${EXPERIMENT_NAME}" \
      "RandomSuffix=${RANDOM_SUFFIX}"
    
  2. Exit 0 → deployment succeeded, proceed to "On Successful Deployment".
  3. Exit 1 (validation failed) or 2 (deployment failed, stack deleted) → analyze stderr output, fix cfn-template.yaml, increment attempt counter, re-invoke the script.
  4. After 5 failed attempts → stop and report to the user with the last error, all fixes attempted, and the current cfn-template.yaml.

Common CFN errors and fixes:

Error Pattern Root Cause Fix
Property validation failure Invalid CFN property name/value Fix the resource property
Template format error YAML syntax issue Fix indentation/structure
Resource type not supported Resource unavailable in region Check regional availability
Circular dependency Resources reference each other Use DependsOn or restructure
RoleArn ... is invalid IAM role not yet propagated Add DependsOn for IAM role
Empty logConfiguration AZ Power Interruption doc artifact Remove the logConfiguration block

On Successful Deployment

  1. Extract stack outputs:

    aws cloudformation describe-stacks \
      --stack-name "${STACK_NAME}" \
      --query 'Stacks[0].Outputs' \
      --region "${TARGET_REGION}" --output table
    
  2. Update README.md with actual stack name, template ID, dashboard URL, and cleanup command. Replace ALL {STACK_NAME} placeholders — do NOT leave placeholders in the final output.

Step 7: Rename Output Directory with Template ID

Run the rename script:

NEW_OUTPUT_DIR=$("${SKILL_DIR}/scripts/rename-output-dir.sh" \
    "${OUTPUT_DIR}" \
    "${STACK_NAME}" \
    "${TARGET_REGION}")
OUTPUT_DIR="${NEW_OUTPUT_DIR}"

Update README.md's **Directory:** field with the full absolute path of the renamed directory. If CFN deployment failed (Step 6 exceeded max retries), skip this step.

Print a brief summary to the terminal:

  • Experiment output directory (with template ID)
  • CFN stack name and deployment status
  • Experiment template ID
  • Next step instruction

Important Guidelines

  • Scenario Library templates come from documentation. Call aws___read_documentation on the scenario's doc URL (Step 1 table) before generating any files. The documentation is the only authoritative source.
  • Never start the FIS experiment in this skill. Starting the experiment is handled by aws-fis-experiment-execute or manually by the user.
  • Validate resource-action compatibility BEFORE generating files (Step 3). The most common source of wasted effort is deploying a template that targets an incompatible resource.
  • Always deploy and validate. Do not just generate files — deploy the CFN template and iterate until it succeeds (Step 6). The user should receive a working, deployed experiment template ready to start.
  • Self-heal on CFN errors. Read stack events, diagnose, fix the template, delete the failed stack, retry. Do not ask the user to fix CFN errors.
  • Verify FIS action availability (aws fis list-actions / aws fis get-action) before generating templates. Don't fabricate action IDs.
  • Prefer resourceArns over resourceTags for targets. Exceptions: aws:elasticache:replicationgroup, aws:ec2:autoscaling-group. Never combine resourceArns with filters.
  • IAM policy must be least-privilege. Only include permissions for the specific actions in the experiment.
  • CFN template must be self-contained. Deploy the CFN template and get a working experiment without any other steps.
  • Sequential MCP calls. All aws___read_documentation and aws___search_documentation calls must be sequential, never parallel. Retry up to 10 times on rate limit errors.
  • Keep local files in sync. After successful deployment, update README.md with real ARNs and stack outputs.
Usage Guidance
Install only if you intend the agent to modify your AWS environment, not just draft files. Use a dedicated low-privilege AWS profile or sandbox account, review the generated CloudFormation and IAM policies before deployment, require confirmation before any stack changes, and plan cleanup for persistent EKS RBAC resources.
Capability Analysis
Type: OpenClaw Skill Name: aws-fis-experiment-prepare Version: 1.0.0 This skill automates the preparation and deployment of AWS Fault Injection Service (FIS) experiments, involving high-privilege operations such as creating IAM roles, modifying Kubernetes RBAC via Lambda-backed Custom Resources (references/eks-pod-action-guide.md), and executing SSM Automations to reboot resources. A key indicator is the "self-healing" deployment loop in SKILL.md (Step 6), which instructs the agent to iteratively analyze CloudFormation errors, modify templates, and redeploy until successful. While these capabilities are aligned with the stated purpose of chaos engineering, the autonomous management of security configurations and infrastructure code represents a high-risk capability. No evidence of malicious intent or data exfiltration was found.
Capability Tags
cryptorequires-walletcan-make-purchasesrequires-oauth-tokenrequires-sensitive-credentials
Capability Assessment
Purpose & Capability
The AWS FIS preparation purpose is coherent, but the artifacts expand from generating configuration into immediate AWS CloudFormation deployment, IAM role creation, EKS/RBAC setup, and SSM automation scaffolding.
Instruction Scope
The workflow directs the agent to deploy and self-heal automatically after file generation, including deleting failed stacks and retrying, rather than only producing files for user review.
Install Mechanism
There is no install spec and metadata declares no required binaries or credentials, but the skill documentation requires AWS CLI, jq, AWS documentation tools, and execution of packaged shell scripts.
Credentials
The requested environment access is high impact: it uses the user's AWS account to discover resources and create CloudFormation, IAM, CloudWatch, FIS, Lambda, SSM, and conditional EKS resources.
Persistence & Privilege
The generated cloud resources persist until cleaned up, and EKS pod-action setup explicitly leaves shared Kubernetes RBAC resources in place on stack deletion.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install aws-fis-experiment-prepare
  3. After installation, invoke the skill by name or use /aws-fis-experiment-prepare
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of aws-fis-experiment-prepare - Generate, validate, and deploy AWS FIS experiment templates (including Scenario Library scenarios and custom FIS/SSM faults). - Outputs a self-contained directory with all configuration and documentation, using the user's preferred language (English or Chinese). - Ensures resource-action compatibility before generating files; deploys via CloudFormation with self-healing retries. - Supports pre-built scenarios (AZ Power Interruption, App Slowdown, Cross-AZ/Region Connectivity) and custom single-action faults (including ElastiCache/Redis and MSK SSM Automation). - Loads and processes detailed scenario-specific references and scripts for accurate, reliable deployments.
Metadata
Slug aws-fis-experiment-prepare
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Aws Fis Experiment Prepare?

Use when the user wants to prepare, create, or generate an AWS FIS (Fault Injection Service) experiment configuration. Triggers on "prepare FIS experiment",... It is an AI Agent Skill for Claude Code / OpenClaw, with 33 downloads so far.

How do I install Aws Fis Experiment Prepare?

Run "/install aws-fis-experiment-prepare" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Aws Fis Experiment Prepare free?

Yes, Aws Fis Experiment Prepare is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Aws Fis Experiment Prepare support?

Aws Fis Experiment Prepare is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Aws Fis Experiment Prepare?

It is built and maintained by panlm (@panlm); the current version is v1.0.0.

💬 Comments