Description

Intelligent Automation Worker — analyzes video/image streams and generates structured, real-time operating steps for physical tasks (debug, repair, assembly,...

Usage (SKILL.md)

iaworker — Intelligent Automation Worker

Name: iaworker
Author: yinleunglai

Analyze video/image streams, diagnose physical problems, and generate structured step-by-step operating guidance. Deliver instructions both visually (displayed markdown) and audibly (TTS spoken aloud).

Core Workflow

┌─────────────────────────────────────────────────────────────────────┐
│                        iaworker PROCESS                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  [1] RECEIVE INPUT                                                   │
│      Video file path, image path, or live camera frame              │
│           ↓                                                          │
│  [2] ANALYZE (video_analyzer.py)                                     │
│      - Extract key frames                                             │
│      - Identify objects, damage, components                           │
│      - Detect anomaly patterns (cracks, loose parts, fluid leaks)   │
│      - Classify task type (repair / assembly / inspection / debug)   │
│           ↓                                                          │
│  [3] GENERATE STEPS (step_engine.py)                                 │
│      - Build ordered, numbered action steps                           │
│      - Include tool requirements, safety warnings                   │
│      - Flag prerequisite steps (disconnect power, etc.)             │
│      - Estimate difficulty/time for each step                       │
│           ↓                                                          │
│  [4] DELIVER (speaker.py + display)                                  │
│      - Display formatted markdown step guide                         │
│      - Speak each step aloud via TTS                                  │
│      - Step-by-step progression (not all at once)                    │
│      - Wait for user confirmation before advancing (configurable)    │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Quick Start

Analyze an image and get spoken steps

python scripts/video_analyzer.py \
  --input /path/to/image.jpg \
  --task repair \
  --lang en \
  --speak

Analyze a video and get per-segment steps

python scripts/video_analyzer.py \
  --input /path/to/video.mp4 \
  --task debug \
  --lang en \
  --speak \
  --step-by-step

Analyze from camera feed (live)

python scripts/video_analyzer.py \
  --input camera \
  --task inspection \
  --lang en \
  --speak \
  --live

Scripts

video_analyzer.py

Entry point. Analyzes visual input and triggers step generation.

python scripts/video_analyzer.py [options]

Options:

Flag	Description	Default
`--input PATH`	Image path, video path, or `camera` for live	Required
`--task TYPE`	`repair`, `debug`, `assembly`, `inspection`, `auto`	`auto`
`--lang CODE`	`en` or `zh`	`en`
`--speak`	Enable TTS for step output	Disabled
`--step-by-step`	Speak and display one step at a time, wait for confirmation	Sequential mode
`--live`	Live camera mode with continuous analysis	Off
`--output PATH`	Write steps to markdown file	None (console only)
`--frame-skip N`	Skip every N frames in video (speed up analysis)	10
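
The option table above could map onto an `argparse` parser roughly as follows. This is a minimal sketch rather than the actual `video_analyzer.py` source: the function name `build_parser` is hypothetical, and only the flags and defaults listed in the table are reproduced.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the real source)."""
    parser = argparse.ArgumentParser(prog="video_analyzer.py")
    parser.add_argument("--input", required=True,
                        help="Image path, video path, or 'camera' for live input")
    parser.add_argument("--task", default="auto",
                        choices=["repair", "debug", "assembly", "inspection", "auto"])
    parser.add_argument("--lang", default="en", choices=["en", "zh"])
    parser.add_argument("--speak", action="store_true", help="Enable TTS")
    parser.add_argument("--step-by-step", action="store_true",
                        help="One step at a time, wait for confirmation")
    parser.add_argument("--live", action="store_true", help="Live camera mode")
    parser.add_argument("--output", default=None, help="Markdown output path")
    parser.add_argument("--frame-skip", type=int, default=10)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--input", "camera", "--task", "inspection", "--live"])
print(args.task, args.live, args.frame_skip)
```

Flags that are not passed fall back to the defaults from the table, so `args.frame_skip` is 10 and `args.speak` is `False` in this run.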

Task auto-detection:

repair — Something is broken; find damage, suggest fixes
debug — Something isn't working; trace fault to cause
assembly — Something needs to be built/put together
inspection — Check condition, report findings

step_engine.py

Generates structured steps from analysis results.

from step_engine import StepEngine

engine = StepEngine(lang="en")
steps = engine.generate(
    task_type="repair",
    objects=["wheel", "chain", "brake caliper"],
    anomalies=["chain loose", "brake pad worn"],
    context={"bike_type": "mountain"}
)

for step in steps:
    print(step["number"], step["title"])
    print(step["description"])
    print(f"[Tools: {step['tools']}] [Time: {step['time_estimate']}]")
    if step["safety_warning"]:
        print(f"⚠️  {step['safety_warning']}")

Step object schema:

{
    "number": int,                 # 1-based step number
    "title": str,                  # Short action title
    "description": str,            # Detailed description
    "tools": list[str],            # Required tools
    "time_estimate": str,          # e.g. "5-10 min"
    "difficulty": str,             # "easy" | "medium" | "hard" | "expert"
    "safety_warning": str | None,  # Warning text, or None if there is none
    "prerequisite": bool,          # Must be done before others proceed
    "common_mistakes": list[str],  # What to avoid
}
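
A step dictionary can be checked against this schema before it is displayed or spoken. The sketch below is an illustration only: `FIELD_TYPES`, `DIFFICULTIES`, and `validate_step` are hypothetical names and are not part of `step_engine.py`.

```python
# Expected Python type for each schema field (hypothetical helper table).
FIELD_TYPES = {
    "number": int,
    "title": str,
    "description": str,
    "tools": list,
    "time_estimate": str,
    "difficulty": str,
    "safety_warning": (str, type(None)),
    "prerequisite": bool,
    "common_mistakes": list,
}
DIFFICULTIES = {"easy", "medium", "hard", "expert"}


def validate_step(step: dict) -> list:
    """Return a list of problems; an empty list means the step fits the schema."""
    problems = []
    for name, expected in FIELD_TYPES.items():
        if name not in step:
            problems.append(f"missing field: {name}")
        elif not isinstance(step[name], expected):
            problems.append(f"wrong type for {name}")
    if isinstance(step.get("number"), int) and step["number"] < 1:
        problems.append("number must be 1-based")
    if isinstance(step.get("difficulty"), str) and step["difficulty"] not in DIFFICULTIES:
        problems.append(f"unknown difficulty: {step['difficulty']}")
    return problems


example = {
    "number": 1,
    "title": "Inspect the chain tensioner",
    "description": "Look for stiff links, rust, and kinks.",
    "tools": [],
    "time_estimate": "5-10 min",
    "difficulty": "easy",
    "safety_warning": None,
    "prerequisite": False,
    "common_mistakes": ["Skipping the inspection"],
}
print(validate_step(example))
```

The example step is valid, so the printed problem list is empty.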

Difficulty classification:

Level	Indicator
`easy`	No special tools, minimal risk
`medium`	Basic tools, some disassembly
`hard`	Specialty tools, significant disassembly
`expert`	Professional tools, structural risk

speaker.py

Handles TTS output and markdown display.

from speaker import Speaker

speaker = Speaker(lang="en", tts_enabled=True)

speaker.display_and_speak("Step 1: Inspect the chain tensioner")
speaker.display_steps(steps)  # steps: list of step dicts, e.g. from StepEngine.generate()
speaker.speak_only("Make sure to wear safety glasses.")
speaker.wait_for_user("Press Enter when ready to continue")

Features:

gtts (Google TTS) — default, works out of the box
pyttsx3 — offline fallback
Markdown rendering in terminal with rich library
Per-step speak with configurable pacing
Confirmation gating between steps (for --step-by-step mode)
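
Confirmation gating between steps can be pictured as a loop that shows one step and then waits for a reply. The following is a sketch under assumptions, not the `speaker.py` implementation: `deliver_steps` and its parameters are hypothetical, and the `show` and `confirm` callables are injectable so the loop can run without a terminal or a TTS engine.

```python
from typing import Callable, Iterable, List


def deliver_steps(steps: Iterable[dict],
                  show: Callable[[str], None] = print,
                  confirm: Callable[[str], str] = input,
                  wait_confirmation: bool = True) -> List[int]:
    """Show steps one at a time; return the numbers of the steps delivered."""
    delivered = []
    for step in steps:
        show(f"Step {step['number']}: {step['title']}")
        delivered.append(step["number"])
        if wait_confirmation:
            reply = confirm("Press Enter to continue, or type 'q' to stop: ")
            if reply.strip().lower() == "q":
                break
    return delivered


# Non-interactive demo: scripted replies stand in for keyboard input.
replies = iter(["", "q"])
demo_steps = [
    {"number": 1, "title": "Inspect the chain tensioner"},
    {"number": 2, "title": "Loosen rear axle bolts"},
    {"number": 3, "title": "Re-tighten axle bolts"},
]
print(deliver_steps(demo_steps, confirm=lambda prompt: next(replies)))
```

With the scripted replies, the first step is confirmed, the second is answered with `q`, and the third is never shown. Passing `wait_confirmation=False` delivers every step without pausing.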

Step Generation Guidelines

Steps must follow this structure:

Prerequisites — Things that must be done first (disconnect power, secure object, etc.)
Assessment — Inspect and confirm the problem
Preparation — Gather tools, clear workspace
Main actions — Numbered, one clear action per step
Verification — Test that the fix/assembly worked
Cleanup — Put back together, tidy tools

Rules:

Each step = one action. If it has "and", it's two steps.
Always include a safety check step after anything involving power, hot parts, or fluids.
Difficulty and time estimate must be realistic.
Flag the most common mistakes for each step.
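
Some of these rules can be checked mechanically. The sketch below is a hypothetical linter (`lint_steps` is not part of the shipped scripts) that assumes steps use the schema from the `step_engine.py` section. It flags titles containing "and", prerequisite steps that appear after a regular step, and steps with no common mistakes listed.

```python
import re
from typing import List


def lint_steps(steps: List[dict]) -> List[str]:
    """Return human-readable warnings for steps that break the rules above."""
    warnings = []
    seen_regular = False
    for step in steps:
        n = step["number"]
        # Rule: one action per step; "and" in the title suggests two actions.
        if re.search(r"\band\b", step["title"], flags=re.IGNORECASE):
            warnings.append(f"step {n}: title contains 'and'; consider splitting")
        # Structure: prerequisites come before everything else.
        if step.get("prerequisite"):
            if seen_regular:
                warnings.append(f"step {n}: prerequisite appears after a regular step")
        else:
            seen_regular = True
        # Rule: flag the most common mistakes for each step.
        if not step.get("common_mistakes"):
            warnings.append(f"step {n}: no common mistakes listed")
    return warnings


draft = [
    {"number": 1, "title": "Loosen and remove axle bolts",
     "prerequisite": False, "common_mistakes": ["Stripping the bolt head"]},
    {"number": 2, "title": "Disconnect power",
     "prerequisite": True, "common_mistakes": []},
]
for warning in lint_steps(draft):
    print(warning)
```

The word-boundary match keeps words such as "handlebars" from triggering the "and" check. Whether time estimates are realistic cannot be judged this way and still needs human review.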

Configuration

Config file: scripts/config.yaml

tts:
  engine: "gtts"          # "gtts" or "pyttsx3"
  lang: "en"
  speed: 1.0              # 0.5 = slow, 2.0 = fast
  volume: 1.0             # 0.0 to 1.0

display:
  use_rich: true          # Pretty terminal output
  color: "cyan"           # Step highlight color
  show_icons: true        # Show ✅ ⚠️ 🔧 icons

analysis:
  default_task: "auto"
  frame_skip: 10
  confidence_threshold: 0.6

step_delivery:
  auto_speak: true
  wait_confirmation: false
  speak_difficulty: true
  speak_time_estimate: true
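
User settings from this file would typically be overlaid on built-in defaults and sanity-checked. The sketch below illustrates that idea with plain dictionaries, using only values shown above. `DEFAULTS`, `merge_config`, and `check_tts` are hypothetical names, and how the real scripts load `config.yaml` is not documented here.

```python
import copy

# Subset of the documented settings, used as built-in defaults.
DEFAULTS = {
    "tts": {"engine": "gtts", "lang": "en", "speed": 1.0, "volume": 1.0},
    "step_delivery": {"auto_speak": True, "wait_confirmation": False},
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay `overrides` on a deep copy of `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def check_tts(tts: dict) -> list:
    """Return error strings for TTS values outside the documented options."""
    errors = []
    if tts["engine"] not in ("gtts", "pyttsx3"):
        errors.append("tts.engine must be 'gtts' or 'pyttsx3'")
    if not 0.0 <= tts["volume"] <= 1.0:
        errors.append("tts.volume must be between 0.0 and 1.0")
    if tts["speed"] <= 0:
        errors.append("tts.speed must be positive")
    return errors


# Switch to the offline engine and set an out-of-range volume on purpose.
cfg = merge_config(DEFAULTS, {"tts": {"engine": "pyttsx3", "volume": 1.5}})
print(cfg["tts"])
print(check_tts(cfg["tts"]))
```

Keys that the override leaves out, such as `tts.lang`, keep their default values, and the defaults themselves are not mutated.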

Task Reference

Bike Repair — Chain Adjustment

🔧 Tools: Hex keys (4mm, 5mm), chain tool, lubricant
⏱ Time: 15-25 min
⚠️ Safety: Flip the bike first; a chain released under tension can snap back

Flip bike, rest on seat and handlebars
Inspect chain for stiff links, rust, kinks
Loosen rear axle bolts (5mm hex)
Adjust chain tension via horizontal dropouts
Check tension: 10-15mm deflection at midpoint
Re-tighten axle bolts
Lubricate if needed, wipe excess
Test ride

Car Debug — Engine Won't Start

🔧 Tools: OBD2 scanner, multimeter, basic socket set
⏱ Time: 20-40 min (diagnosis first)
⚠️ Safety: Disable ignition, disconnect battery negative first

Check if fuel pump primes (turn key to ON, listen)
Test battery voltage (>12.4 V at rest with the engine off, >13.5 V with the engine running)
Connect OBD2 scanner, read fault codes
Inspect spark plugs for gap/damage
Check for crank/cam position sensor signals
Verify immobilizer status
Narrow to most likely cause, then address

Generic Assembly — IKEA-style

🔧 Tools: Hex key (included), Phillips screwdriver, hammer
⏱ Time: varies
⚠️ Safety: Enlist a second person for large panels

Unpack and sort all hardware (count screws, dowels)
Lay out all panels, identify front/back
Pre-assemble sub-groups before final join
Hand-tighten all screws first
Use cardboard to protect floors
Final torque pass after 24h

Troubleshooting

"No audio output"

Check if gtts is installed: pip install gtts
Fallback: engine: pyttsx3 in config (offline)
On headless servers: set DISPLAY env var or use pyttsx3

"Analysis is slow on video"

Increase --frame-skip (e.g., --frame-skip 30)
Use --input camera --live for real-time with throttled analysis
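
To see why a larger `--frame-skip` helps, the sketch below counts the frames that would be analyzed. It assumes the flag means "analyze one frame out of every N", which is one reading of the option description and is not confirmed by the source. `sampled_frame_indices` is a hypothetical helper.

```python
def sampled_frame_indices(total_frames: int, frame_skip: int) -> range:
    """Frame indices analyzed when one frame in every `frame_skip` is kept.

    Assumed semantics; the real video_analyzer.py may sample differently.
    """
    if frame_skip < 1:
        raise ValueError("frame_skip must be >= 1")
    return range(0, total_frames, frame_skip)


# A 60-second clip at 30 fps contains 1800 frames.
for skip in (10, 30):
    print(skip, len(sampled_frame_indices(1800, skip)))
```

Under this assumption, raising the value from 10 to 30 cuts the analyzed frames for that clip from 180 to 60.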

"Steps are too generic"

Provide more context in the initial prompt
Use --task repair explicitly if auto-detect fails
For specialized equipment, the LLM analysis quality depends on prompt specificity

"OpenCV camera not found"

Check camera index: python scripts/video_analyzer.py --input camera --list-devices
Try --input camera --camera-index 1 if default is wrong

Extending for Specific Domains

iaworker ships with general-purpose analysis. To add domain-specific knowledge:

Create references/domains/MYDOMAIN.md with known failure modes and tool lists
In step_engine.py, add a DOMAIN_HANDLERS map that loads these
The step engine will then reference domain files when generating steps

Example domain file:

# Domain: electric_bike

## Common Failures
- Motor controller overheating → reduce load, check ventilation
- Battery BMS cutout → reset via unplugging 30s
- Torque sensor miscalibration → re-zero via display menu

## Safety
- Never open motor housing — high voltage capacitors retain charge
- Battery must be removed before any repair
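
A domain file in this layout is simple to load. The sketch below shows one way a `DOMAIN_HANDLERS`-style loader might read it, assuming `## ` headings, `- ` bullets, and `→` between a failure and its remedy. `parse_domain_file` and `split_failure` are hypothetical names; the actual loading logic in `step_engine.py` is left to the implementer.

```python
from typing import Dict, List, Tuple

SAMPLE = """\
# Domain: electric_bike

## Common Failures
- Motor controller overheating → reduce load, check ventilation
- Battery BMS cutout → reset via unplugging 30s
- Torque sensor miscalibration → re-zero via display menu

## Safety
- Battery must be removed before any repair
"""


def parse_domain_file(text: str) -> Dict[str, List[str]]:
    """Group bullet items under their '## ' section heading."""
    sections: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections


def split_failure(item: str) -> Tuple[str, str]:
    """Split 'failure → remedy' into two parts (remedy is '' if absent)."""
    failure, _, remedy = item.partition("→")
    return failure.strip(), remedy.strip()


sections = parse_domain_file(SAMPLE)
for item in sections["Common Failures"]:
    print(split_failure(item))
print(sections["Safety"])
```

The top-level `# Domain:` line is ignored by this parser. A fuller loader would probably read it to pick the key under which the domain is registered.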

Safe-use recommendations

This skill appears to do what it claims, but review these practical concerns before installing:

(1) Dependencies: it relies on OpenCV, Pillow and optionally torch/transformers; the latter will download large models from the internet unless pre-cached. Add a proper install step and pinned package versions.

(2) Network usage: gTTS will send text to an online Google TTS endpoint; transformers may fetch models from Hugging Face. If you need offline privacy, configure pyttsx3 and avoid the classifier pipeline or pre-download models.

(3) Device access: the skill can read camera devices and write files / temporary audio; run it in a sandbox or a controlled environment if you have sensitive cameras or images.

(4) Safety: generated repair instructions can be safety-critical; validate steps and do not rely solely on automated guidance for high-risk tasks.

(5) Recommended actions: run code review or tests in an isolated virtualenv/container, set tts.enabled=false if you want no external TTS by default, and require the author to include an install spec that documents network calls, model sources, and exact dependencies.

Functional analysis

Type: OpenClaw Skill
Name: iaworker
Version: 1.0.0

The iaworker skill bundle is a legitimate utility designed for visual analysis of physical tasks (repair, assembly, inspection) using computer vision and text-to-speech (TTS). The code uses standard libraries like OpenCV, Transformers, and gTTS to process images/videos and provide guided instructions. While it utilizes subprocess calls in `speaker.py` to play audio files across different platforms (macOS, Linux, Windows), these are implemented using system-generated temporary file paths and align with the stated purpose of providing audible guidance. No evidence of data exfiltration, persistence, or malicious prompt injection was found.

Capability tags

crypto

can-make-purchases

Capability assessment

✓ Purpose & Capability

Name/description (visual analysis → step generation → TTS) matches the provided scripts (video_analyzer.py, step_engine.py, speaker.py). The code implements object/anomaly detection, step generation and TTS/display delivery, all coherent with the stated purpose. Minor note: heavy ML libs (torch/transformers, cv2, PIL) are used even though the skill has no install spec; this is plausible for image analysis but should be declared to users.

ℹ Instruction Scope

SKILL.md and the scripts confine behavior to analyzing provided images/videos or a camera feed, producing steps, writing markdown output, and TTS playback. That is within scope. Important caveats: the analyzer will access local files and camera devices, write output files (markdown, temp audio), and run subprocesses to play audio. The _llm_ analysis is implemented locally as prompt templates (no external LLM call in the code shown), but the classifier pipeline may fetch models from the network (see install notes).

⚠ Install Mechanism

No install spec is provided. The code depends on sizable native libraries (opencv-python, pillow), optional torch/transformers (which will download models like 'microsoft/resnet-50' from the model hub at runtime if installed), and gTTS/pyttsx3 for audio. Those model downloads and gTTS network calls are implicit and not documented in SKILL.md; absence of an install block means a user may be surprised by large downloads, network traffic, or missing runtime dependencies.

ℹ Credentials

The skill requests no environment variables or credentials, which is proportionate. However, it uses gTTS (an online TTS client) by default in config.yaml which will make network calls to Google’s TTS service; transformers will pull models from Hugging Face if used. These network interactions are reasonable for the feature set but are not declared in the metadata and may be privacy-sensitive (image data uploaded to remote services via those libraries).

✓ Persistence & Privilege

The skill is not always-included and uses normal agent invocation. It does not modify other skills or system-wide configs. It reads/writes files within its own directory and uses system devices (camera, audio) — this is expected given the functionality.

Version history

v1.0.0

Initial release of iaworker: Intelligent Automation Worker for real-time physical task guidance.

- Analyze images, videos, or live camera feeds to detect physical issues (repair, debug, assembly, inspection).
- Generate structured, step-by-step operating instructions with estimated difficulty and tool requirements.
- Delivers guidance both visually (markdown display) and audibly (TTS spoken aloud) with optional step-by-step confirmation.
- Supports multiple task types, per-step safety warnings, and mistake avoidance tips.
- Highly configurable TTS, display, and workflow settings via YAML config file.
- Includes modular scripts: video/image analyzer, step generator, and TTS/display handler.

Metadata

Slug iaworker

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

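FAQ

What is iaworker?

Intelligent Automation Worker: analyzes video/image streams and generates structured, real-time operating steps for physical tasks (debug, repair, assembly,... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 91 cumulative downloads to date.

How do I install iaworker?

Run the command "/install iaworker" in an OpenClaw or Claude Code chat to install it in one step; no additional configuration is required.

Is iaworker free?

Yes. iaworker is completely free and released under the MIT-0 license; it can be downloaded, installed, and used freely.

Which platforms does iaworker support?

iaworker is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops iaworker?

It is developed and maintained by yinleunglai (@yinleunglai); the current version is v1.0.0.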
