yinleunglai

iaworker

Author: yinleunglai · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 91
Favorites: 1
Current installs: 0
Versions: 1
Install in OpenClaw
/install iaworker
Description
Intelligent Automation Worker — analyzes video/image streams and generates structured, real-time operating steps for physical tasks (debug, repair, assembly,...
Usage (SKILL.md)

iaworker — Intelligent Automation Worker

Analyze video/image streams, diagnose physical problems, and generate structured step-by-step operating guidance. Deliver instructions both visually (displayed markdown) and audibly (TTS spoken aloud).


Core Workflow

┌─────────────────────────────────────────────────────────────────────┐
│                        iaworker PROCESS                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  [1] RECEIVE INPUT                                                   │
│      Video file path, image path, or live camera frame              │
│           ↓                                                          │
│  [2] ANALYZE (video_analyzer.py)                                     │
│      - Extract key frames                                             │
│      - Identify objects, damage, components                           │
│      - Detect anomaly patterns (cracks, loose parts, fluid leaks)   │
│      - Classify task type (repair / assembly / inspection / debug)   │
│           ↓                                                          │
│  [3] GENERATE STEPS (step_engine.py)                                 │
│      - Build ordered, numbered action steps                           │
│      - Include tool requirements, safety warnings                   │
│      - Flag prerequisite steps (disconnect power, etc.)             │
│      - Estimate difficulty/time for each step                       │
│           ↓                                                          │
│  [4] DELIVER (speaker.py + display)                                  │
│      - Display formatted markdown step guide                         │
│      - Speak each step aloud via TTS                                  │
│      - Step-by-step progression (not all at once)                    │
│      - Wait for user confirmation before advancing (configurable)    │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘
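
The four stages in the diagram can be sketched as a small pipeline. This is a minimal illustration, not the skill's actual code: `run_pipeline`, `Analysis`, and the stand-in stage functions are hypothetical names, and the stages are passed in as plain callables so the flow can be followed without OpenCV or a TTS engine.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Analysis:
    """Hypothetical container for what stage [2] produces."""
    task_type: str
    objects: list[str] = field(default_factory=list)
    anomalies: list[str] = field(default_factory=list)


def run_pipeline(
    source: str,
    analyze: Callable[[str], Analysis],
    generate: Callable[[Analysis], list[dict]],
    deliver: Callable[[dict], None],
) -> list[dict]:
    """Run [1] receive -> [2] analyze -> [3] generate -> [4] deliver."""
    analysis = analyze(source)   # [2] objects, anomalies, task type
    steps = generate(analysis)   # [3] ordered step dicts
    for step in steps:           # [4] one step at a time, in order
        deliver(step)
    return steps


# Stand-in stages so the sketch runs on its own.
def fake_analyze(source: str) -> Analysis:
    return Analysis("repair", ["chain"], ["chain loose"])


def fake_generate(analysis: Analysis) -> list[dict]:
    return [
        {"number": i, "title": f"Address: {a}"}
        for i, a in enumerate(analysis.anomalies, start=1)
    ]


delivered: list[str] = []
steps = run_pipeline(
    "/path/to/image.jpg",
    fake_analyze,
    fake_generate,
    lambda step: delivered.append(step["title"]),
)
```

Keeping each stage behind a callable makes it possible to swap the analyzer or the delivery channel without touching the others.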

Quick Start

Analyze an image and get spoken steps

python scripts/video_analyzer.py \
  --input /path/to/image.jpg \
  --task repair \
  --lang en \
  --speak

Analyze a video and get per-segment steps

python scripts/video_analyzer.py \
  --input /path/to/video.mp4 \
  --task debug \
  --lang en \
  --speak \
  --step-by-step

Analyze from camera feed (live)

python scripts/video_analyzer.py \
  --input camera \
  --task inspection \
  --lang en \
  --speak \
  --live

Scripts

video_analyzer.py

Entry point. Analyzes visual input and triggers step generation.

python scripts/video_analyzer.py [options]

Options:

| Flag | Description | Default |
|---|---|---|
| --input PATH | Image path, video path, or camera for live input | Required |
| --task TYPE | repair, debug, assembly, inspection, auto | auto |
| --lang CODE | en or zh | en |
| --speak | Enable TTS for step output | Disabled |
| --step-by-step | Speak and display one step at a time, wait for confirmation | Sequential mode |
| --live | Live camera mode with continuous analysis | Off |
| --output PATH | Write steps to markdown file | None (console only) |
| --frame-skip N | Frame-skip interval N for video; higher values speed up analysis | 10 |
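
As a rough illustration of how the documented flags and defaults fit together, here is an `argparse` sketch. It assumes the script uses `argparse`, which the source does not confirm; `build_parser` is a hypothetical name, and only the flags listed in the options are included.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented flags and defaults (a sketch, not the real parser)."""
    parser = argparse.ArgumentParser(prog="video_analyzer.py")
    parser.add_argument("--input", required=True,
                        help="image path, video path, or 'camera'")
    parser.add_argument("--task", default="auto",
                        choices=["repair", "debug", "assembly", "inspection", "auto"])
    parser.add_argument("--lang", default="en", choices=["en", "zh"])
    parser.add_argument("--speak", action="store_true")
    parser.add_argument("--step-by-step", action="store_true")
    parser.add_argument("--live", action="store_true")
    parser.add_argument("--output", default=None)
    parser.add_argument("--frame-skip", type=int, default=10)
    return parser


# Equivalent to the "Analyze a video" quick-start command, minus --lang.
args = build_parser().parse_args(
    ["--input", "/path/to/video.mp4", "--task", "debug", "--speak", "--step-by-step"]
)
```

Omitted flags fall back to the documented defaults, so `args.lang` is `"en"` and `args.frame_skip` is `10` here.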

Task types (with --task auto, the analyzer picks one of these):

  • repair — Something is broken; find damage, suggest fixes
  • debug — Something isn't working; trace fault to cause
  • assembly — Something needs to be built/put together
  • inspection — Check condition, report findings

step_engine.py

Generates structured steps from analysis results.

from step_engine import StepEngine

engine = StepEngine(lang="en")
steps = engine.generate(
    task_type="repair",
    objects=["wheel", "chain", "brake caliper"],
    anomalies=["chain loose", "brake pad worn"],
    context={"bike_type": "mountain"}
)

for step in steps:
    print(step["number"], step["title"])
    print(step["description"])
    print(f"[Tools: {step['tools']}] [Time: {step['time_estimate']}]")
    if step["safety_warning"]:
        print(f"⚠️  {step['safety_warning']}")

Step object schema:

{
    "number": int,              # 1-based step number
    "title": str,               # Short action title
    "description": str,         # Detailed description
    "tools": list[str],         # Required tools
    "time_estimate": str,       # e.g. "5-10 min"
    "difficulty": str,          # "easy" | "medium" | "hard" | "expert"
    "safety_warning": str | None,   # Warning text, or None if there is none
    "prerequisite": bool,           # Must be completed before later steps proceed
    "common_mistakes": list[str],   # What to avoid
}
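
The schema above maps naturally onto a `TypedDict`. The sketch below is one possible way to express and check it; `Step`, `DIFFICULTIES`, and `validate_step` are hypothetical names and are not taken from `step_engine.py`.

```python
from typing import Optional, TypedDict

DIFFICULTIES = ("easy", "medium", "hard", "expert")


class Step(TypedDict):
    number: int
    title: str
    description: str
    tools: list[str]
    time_estimate: str
    difficulty: str
    safety_warning: Optional[str]
    prerequisite: bool
    common_mistakes: list[str]


def validate_step(step: dict) -> list[str]:
    """Return a list of problems; an empty list means the step looks valid."""
    problems = []
    missing = set(Step.__required_keys__) - set(step)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    number = step.get("number")
    if not isinstance(number, int) or number < 1:
        problems.append("number must be an int >= 1")
    if step.get("difficulty") not in DIFFICULTIES:
        problems.append(f"difficulty must be one of {DIFFICULTIES}")
    return problems


example: Step = {
    "number": 1,
    "title": "Flip the bike",
    "description": "Rest the bike on its seat and handlebars.",
    "tools": [],
    "time_estimate": "1-2 min",
    "difficulty": "easy",
    "safety_warning": None,
    "prerequisite": True,
    "common_mistakes": ["Resting the bike on the brake levers"],
}
```

Returning a list of problems rather than raising lets a caller report every schema issue in one pass.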

Difficulty classification:

| Level | Indicator |
|---|---|
| easy | No special tools, minimal risk |
| medium | Basic tools, some disassembly |
| hard | Specialty tools, significant disassembly |
| expert | Professional tools, structural risk |

speaker.py

Handles TTS output and markdown display.

from speaker import Speaker

speaker = Speaker(lang="en", tts_enabled=True)

speaker.display_and_speak("Step 1: Inspect the chain tensioner")
speaker.display_steps(steps)  # steps: list of step dicts, e.g. from StepEngine.generate()
speaker.speak_only("Make sure to wear safety glasses.")
speaker.wait_for_user("Press Enter when ready to continue")

Features:

  • gtts (Google TTS): default engine; sends the text to Google's online TTS service, so it needs network access
  • pyttsx3 — offline fallback
  • Markdown rendering in terminal with rich library
  • Per-step speak with configurable pacing
  • Confirmation gating between steps (for --step-by-step mode)
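
Confirmation gating between steps can be sketched as follows. This is an illustration only: `deliver_steps` and its `show` and `confirm` parameters are hypothetical and stand in for whatever `Speaker` does internally. The "q to quit" shortcut is also an assumption, not a documented feature.

```python
from typing import Callable, Iterable


def deliver_steps(
    steps: Iterable[dict],
    show: Callable[[str], None] = print,
    confirm: Callable[[str], str] = input,
    wait_confirmation: bool = True,
) -> int:
    """Show steps one at a time; return how many were delivered."""
    delivered = 0
    for step in steps:
        show(f"Step {step['number']}: {step['title']}")
        if step.get("safety_warning"):
            show(f"Warning: {step['safety_warning']}")
        delivered += 1
        if wait_confirmation:
            # Block until the user acknowledges; "q" stops early (assumed shortcut).
            reply = confirm("Press Enter when ready to continue (q to quit) ")
            if reply.strip().lower() == "q":
                break
    return delivered
```

Passing `show` and `confirm` as parameters keeps the gating logic testable without a terminal or an audio device.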

Step Generation Guidelines

Steps must follow this structure:

  1. Prerequisites — Things that must be done first (disconnect power, secure object, etc.)
  2. Assessment — Inspect and confirm the problem
  3. Preparation — Gather tools, clear workspace
  4. Main actions — Numbered, one clear action per step
  5. Verification — Test that the fix/assembly worked
  6. Cleanup — Put back together, tidy tools
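
One way to enforce this six-phase order is to tag each step with its phase and sort before numbering. The `phase` key below is an assumption for illustration; it is not part of the documented step schema, and `order_and_number` is a hypothetical helper.

```python
PHASES = ["prerequisites", "assessment", "preparation",
          "main", "verification", "cleanup"]


def order_and_number(steps: list[dict]) -> list[dict]:
    """Sort steps by phase (stable within a phase) and renumber from 1."""
    rank = {name: i for i, name in enumerate(PHASES)}
    ordered = sorted(steps, key=lambda s: rank[s["phase"]])
    return [{**s, "number": n} for n, s in enumerate(ordered, start=1)]


draft = [
    {"phase": "main", "title": "Loosen rear axle bolts"},
    {"phase": "verification", "title": "Test ride"},
    {"phase": "prerequisites", "title": "Flip the bike"},
    {"phase": "main", "title": "Adjust chain tension"},
]
plan = order_and_number(draft)
```

Because `sorted` is stable, steps within the same phase keep the order in which they were drafted.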

Rules:

  • Each step = one action. If it has "and", it's two steps.
  • Always include a safety check step after anything involving power, hot parts, or fluids.
  • Difficulty and time estimate must be realistic.
  • Flag the most common mistakes for each step.
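
Some of these rules can be checked mechanically. The sketch below is a heuristic linter, not part of the skill: `lint_steps` and the keyword pattern for hazardous steps are assumptions, and keyword matching will miss or over-flag some cases.

```python
import re

# Assumed keywords for "power, hot parts, or fluids"; extend as needed.
HAZARD_RE = re.compile(r"\b(power|hot|fluids?)\b", re.IGNORECASE)
AND_RE = re.compile(r"\band\b", re.IGNORECASE)


def lint_steps(steps: list[dict]) -> list[str]:
    """Return human-readable rule violations for a list of step dicts."""
    issues = []
    for i, step in enumerate(steps):
        title, num = step["title"], step["number"]
        if AND_RE.search(title):
            issues.append(f"step {num}: contains 'and', consider splitting")
        if not step.get("common_mistakes"):
            issues.append(f"step {num}: no common mistakes listed")
        if HAZARD_RE.search(title):
            nxt = steps[i + 1] if i + 1 < len(steps) else None
            if nxt is None or "safety" not in nxt["title"].lower():
                issues.append(f"step {num}: hazardous step not followed by a safety check")
    return issues


issues = lint_steps([
    {"number": 1, "title": "Disconnect power and remove cover",
     "common_mistakes": ["Forgetting to unplug"]},
    {"number": 2, "title": "Safety check: confirm no voltage",
     "common_mistakes": ["Skipping the meter test"]},
    {"number": 3, "title": "Tighten the band clamp", "common_mistakes": []},
])
```

The word-boundary match means "band" in step 3 is not mistaken for "and".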

Configuration

Config file: scripts/config.yaml

tts:
  engine: "gtts"          # "gtts" or "pyttsx3"
  lang: "en"
  speed: 1.0              # 0.5 = slow, 2.0 = fast
  volume: 1.0             # 0.0 to 1.0

display:
  use_rich: true          # Pretty terminal output
  color: "cyan"           # Step highlight color
  show_icons: true        # Show ✅ ⚠️ 🔧 icons

analysis:
  default_task: "auto"
  frame_skip: 10
  confidence_threshold: 0.6

step_delivery:
  auto_speak: true
  wait_confirmation: false
  speak_difficulty: true
  speak_time_estimate: true
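
A common way to consume a file like this is to overlay it on built-in defaults, so that a partial config.yaml still yields a complete configuration. The sketch below assumes that approach; how the skill actually loads its config is not documented. `merge_config` and `load_config` are hypothetical names, `DEFAULTS` copies only part of the file above, and `load_config` needs the third-party PyYAML package.

```python
import copy

DEFAULTS = {
    "tts": {"engine": "gtts", "lang": "en", "speed": 1.0, "volume": 1.0},
    "analysis": {"default_task": "auto", "frame_skip": 10,
                 "confidence_threshold": 0.6},
    "step_delivery": {"auto_speak": True, "wait_confirmation": False},
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay overrides on a copy of defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: str) -> dict:
    """Read a YAML file and overlay it on DEFAULTS (requires PyYAML)."""
    import yaml  # third-party: pip install pyyaml
    with open(path, encoding="utf-8") as fh:
        return merge_config(DEFAULTS, yaml.safe_load(fh) or {})


# Offline setup: switch to pyttsx3 and gate each step on confirmation.
config = merge_config(DEFAULTS, {
    "tts": {"engine": "pyttsx3"},
    "step_delivery": {"wait_confirmation": True},
})
```

The deep copy ensures that overriding one key never mutates the shared defaults.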

Task Reference

Bike Repair — Chain Adjustment

🔧 Tools: Hex keys (4mm, 5mm), chain tool, lubricant
⏱ Time: 15-25 min
⚠️ Safety: Flip bike first — chain tension releases can snap
  1. Flip bike, rest on seat and handlebars
  2. Inspect chain for stiff links, rust, kinks
  3. Loosen rear axle bolts (5mm hex)
  4. Adjust chain tension via horizontal dropouts
  5. Check tension: 10-15mm deflection at midpoint
  6. Re-tighten axle bolts
  7. Lubricate if needed, wipe excess
  8. Test ride

Car Debug — Engine Won't Start

🔧 Tools: OBD2 scanner, multimeter, basic socket set
⏱ Time: 20-40 min (diagnosis first)
⚠️ Safety: Disable ignition, disconnect battery negative first
  1. Check if fuel pump primes (turn key to ON, listen)
  2. Test battery voltage (>12.4V idle, >13.5V running)
  3. Connect OBD2 scanner, read fault codes
  4. Inspect spark plugs for gap/damage
  5. Check for crank/cam position sensor signals
  6. Verify immobilizer status
  7. Narrow to most likely cause, then address

Generic Assembly — IKEA-style

🔧 Tools: Hex key (included), Phillips screwdriver, hammer
⏱ Time: varies
⚠️ Safety: Enlist a second person for large panels
  1. Unpack and sort all hardware (count screws, dowels)
  2. Lay out all panels, identify front/back
  3. Pre-assemble sub-groups before final join
  4. Hand-tighten all screws first
  5. Use cardboard to protect floors
  6. Final torque pass after 24h

Troubleshooting

"No audio output"

  • Check if gtts is installed: pip install gtts
  • Fallback: engine: pyttsx3 in config (offline)
  • On headless servers: set DISPLAY env var or use pyttsx3
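
The gtts-to-pyttsx3 fallback described above could be automated by checking which engine is importable. This is a sketch under that assumption; `pick_engine`, `is_installed`, and `FALLBACK_ORDER` are hypothetical names, and an importable module does not guarantee working audio (gtts still needs network access at speak time).

```python
import importlib.util
from typing import Callable, Optional

FALLBACK_ORDER = ["gtts", "pyttsx3"]


def is_installed(module: str) -> bool:
    """True if the top-level module can be found without importing it."""
    return importlib.util.find_spec(module) is not None


def pick_engine(
    preferred: str,
    available: Callable[[str], bool] = is_installed,
) -> Optional[str]:
    """Return the first usable engine, trying the preferred one first."""
    candidates = [preferred] + [e for e in FALLBACK_ORDER if e != preferred]
    for name in candidates:
        if available(name):
            return name
    return None  # no TTS engine importable: fall back to display only
```

A `None` result is a signal to continue with the on-screen guide only instead of failing the whole run.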

"Analysis is slow on video"

  • Increase --frame-skip (e.g., --frame-skip 30)
  • Use --input camera --live for real-time with throttled analysis
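
To see why a larger --frame-skip helps, the sketch below counts analyzed frames under the assumption that one frame in every N is analyzed. The exact semantics of --frame-skip are not spelled out in the source, and `frames_to_analyze` is a hypothetical helper.

```python
def frames_to_analyze(total_frames: int, frame_skip: int) -> list[int]:
    """Indices of analyzed frames when one frame in every `frame_skip` is kept."""
    if frame_skip < 1:
        raise ValueError("frame_skip must be >= 1")
    return list(range(0, total_frames, frame_skip))


# A 10-second clip at 30 fps has 300 frames.
default_load = len(frames_to_analyze(300, 10))  # frames analyzed with the default
faster_load = len(frames_to_analyze(300, 30))   # frames analyzed with --frame-skip 30
```

Under this assumption, going from 10 to 30 cuts the number of analyzed frames to a third, at the cost of possibly missing short-lived anomalies.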

"Steps are too generic"

  • Provide more context in the initial prompt
  • Use --task repair explicitly if auto-detect fails
  • For specialized equipment, the LLM analysis quality depends on prompt specificity

"OpenCV camera not found"

  • Check camera index: python scripts/video_analyzer.py --input camera --list-devices
  • Try --input camera --camera-index 1 if default is wrong

Extending for Specific Domains

iaworker ships with general-purpose analysis. To add domain-specific knowledge:

  1. Create references/domains/MYDOMAIN.md with known failure modes and tool lists
  2. In step_engine.py, add a DOMAIN_HANDLERS map that loads these
  3. The step engine will then reference domain files when generating steps

Example domain file:

# Domain: electric_bike

## Common Failures
- Motor controller overheating → reduce load, check ventilation
- Battery BMS cutout → reset via unplugging 30s
- Torque sensor miscalibration → re-zero via display menu

## Safety
- Never open motor housing — high voltage capacitors retain charge
- Battery must be removed before any repair
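
A domain file in this layout can be parsed with a few lines of Python. The sketch below shows one possible shape for the loader behind a DOMAIN_HANDLERS map; `parse_domain` and `load_domain_handlers` are hypothetical names, and the parser assumes the exact heading and "symptom → remedy" bullet format of the example above.

```python
from pathlib import Path


def parse_domain(markdown: str) -> dict:
    """Parse a domain file into name, failures {symptom: remedy}, and safety notes."""
    domain = {"name": None, "failures": {}, "safety": []}
    section = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("# Domain:"):
            domain["name"] = line.split(":", 1)[1].strip()
        elif line.startswith("## "):
            section = line[3:].strip().lower()
        elif line.startswith("- "):
            item = line[2:].strip()
            if section == "common failures" and "→" in item:
                symptom, remedy = (part.strip() for part in item.split("→", 1))
                domain["failures"][symptom] = remedy
            elif section == "safety":
                domain["safety"].append(item)
    return domain


def load_domain_handlers(root: str = "references/domains") -> dict:
    """Build a DOMAIN_HANDLERS-style map from every .md file under root."""
    handlers = {}
    for path in sorted(Path(root).glob("*.md")):
        parsed = parse_domain(path.read_text(encoding="utf-8"))
        handlers[parsed["name"] or path.stem] = parsed
    return handlers
```

Keying the map by the declared domain name, with the file stem as a fallback, lets the step engine look up a domain by the same identifier used in the file header.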
Safety recommendations
This skill appears to do what it claims, but review these practical concerns before installing:

  1. Dependencies: it relies on OpenCV, Pillow and optionally torch/transformers. The latter will download large models from the internet unless they are pre-cached. Add a proper install step and pinned package versions.
  2. Network usage: gTTS will send text to an online Google TTS endpoint, and transformers may fetch models from Hugging Face. If you need offline privacy, configure pyttsx3 and either avoid the classifier pipeline or pre-download the models.
  3. Device access: the skill can read camera devices and write files and temporary audio. Run it in a sandbox or a controlled environment if you have sensitive cameras or images.
  4. Safety: generated repair instructions can be safety-critical. Validate the steps and do not rely solely on automated guidance for high-risk tasks.
  5. Recommended actions: run code review or tests in an isolated virtualenv/container, set tts.enabled=false if you want no external TTS by default, and require the author to include an install spec that documents network calls, model sources, and exact dependencies.
Functional analysis
Type: OpenClaw Skill
Name: iaworker
Version: 1.0.0
The iaworker skill bundle is a legitimate utility designed for visual analysis of physical tasks (repair, assembly, inspection) using computer vision and text-to-speech (TTS). The code uses standard libraries like OpenCV, Transformers, and gTTS to process images/videos and provide guided instructions. While it uses subprocess calls in `speaker.py` to play audio files across different platforms (macOS, Linux, Windows), these are implemented using system-generated temporary file paths and align with the stated purpose of providing audible guidance. No evidence of data exfiltration, persistence, or malicious prompt injection was found.
Capability tags
crypto, can-make-purchases
Capability assessment
Purpose & Capability
Name/description (visual analysis → step generation → TTS) matches the provided scripts (video_analyzer.py, step_engine.py, speaker.py). The code implements object/anomaly detection, step generation and TTS/display delivery, all coherent with the stated purpose. Minor note: heavy ML libraries (torch/transformers, cv2, PIL) are used even though the skill has no install spec; this is plausible for image analysis but should be declared to users.
Instruction Scope
SKILL.md and the scripts confine behavior to analyzing provided images/videos or a camera feed, producing steps, writing markdown output, and TTS playback. That is within scope. Important caveats: the analyzer will access local files and camera devices, write output files (markdown, temp audio), and run subprocesses to play audio. The _llm_ analysis is implemented locally as prompt templates (no external LLM call in the code shown), but the classifier pipeline may fetch models from the network (see install notes).
Install Mechanism
No install spec is provided. The code depends on sizable native libraries (opencv-python, pillow), optional torch/transformers (which will download models like 'microsoft/resnet-50' from the model hub at runtime if installed), and gTTS/pyttsx3 for audio. Those model downloads and gTTS network calls are implicit and not documented in SKILL.md; absence of an install block means a user may be surprised by large downloads, network traffic, or missing runtime dependencies.
Credentials
The skill requests no environment variables or credentials, which is proportionate. However, it uses gTTS (an online TTS client) by default in config.yaml, which will make network calls to Google's TTS service; transformers will pull models from Hugging Face if used. These network interactions are reasonable for the feature set but are not declared in the metadata and may be privacy-sensitive (with gTTS, the text of each spoken step is sent to a remote service).
Persistence & Privilege
The skill is not always-included and uses normal agent invocation. It does not modify other skills or system-wide configs. It reads/writes files within its own directory and uses system devices (camera, audio) — this is expected given the functionality.
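How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install iaworker
  3. Once installed, invoke the skill by name or trigger it with /iaworker.
  4. Provide the inputs described in the skill's parameter documentation to get structured output.
Version history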
v1.0.0
Initial release of iaworker: Intelligent Automation Worker for real-time physical task guidance.
  • Analyze images, videos, or live camera feeds to detect physical issues (repair, debug, assembly, inspection).
  • Generate structured, step-by-step operating instructions with estimated difficulty and tool requirements.
  • Deliver guidance both visually (markdown display) and audibly (TTS spoken aloud) with optional step-by-step confirmation.
  • Support multiple task types, per-step safety warnings, and mistake avoidance tips.
  • Highly configurable TTS, display, and workflow settings via YAML config file.
  • Include modular scripts: video/image analyzer, step generator, and TTS/display handler.
Metadata
Slug: iaworker
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions released: 1
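FAQ

What is iaworker?

iaworker (Intelligent Automation Worker) analyzes video/image streams and generates structured, real-time operating steps for physical tasks (debug, repair, assembly,... It is an AI agent skill plugin for Claude Code / OpenClaw, with 91 total downloads to date.

How do I install iaworker?

Run the command /install iaworker in the OpenClaw or Claude Code chat box for a one-step install; no additional configuration is needed.

Is iaworker free?

Yes. iaworker is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.

Which platforms does iaworker support?

iaworker is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops iaworker?

It is developed and maintained by yinleunglai (@yinleunglai); the current version is v1.0.0.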
