AgentKVM

Author: iamtwz · GitHub · v0.2.1 · MIT-0
Tags: cross-platform · ⚠ suspicious
Total downloads: 218 · Favorites: 0 · Current installs: 1 · Versions: 1

Install in OpenClaw
/install agentkvm

Description
Control physical devices (phones, PCs, Macs) through NanoKVM-USB hardware. Use this skill whenever the user asks you to interact with a physical screen, take...
Usage instructions (SKILL.md)

Requirements

Before using AgentKVM, ensure the following are installed and available:

  • AgentKVM CLI (npm install -g agentkvm)
  • Node.js >= 18
  • ffmpeg — required for screenshot capture (brew install ffmpeg on macOS, apt install ffmpeg on Linux)
  • NanoKVM-USB hardware connected to the host machine via USB
  • HDMI input from the target device connected to the NanoKVM-USB

Run agentkvm status to verify everything is set up correctly. If the CLI is not found, install it first. If the device is not detected, check agentkvm list for available serial ports.

AgentKVM — AI-Driven Device Control

AgentKVM lets you see and operate physical devices (iPhones, Android phones, PCs, Macs, Linux machines) connected via NanoKVM-USB hardware. You take screenshots to observe the screen, then send mouse clicks, keyboard input, and scrolls to interact — just like a human sitting in front of the device.

Core Loop

Every interaction with a physical device follows the same pattern:

Screenshot → Analyze → Act → Verify
  1. Screenshot — capture what's currently on screen
  2. Analyze — look at the image to understand the UI state
  3. Act — click, type, scroll, or drag based on what you see
  4. Verify — take another screenshot to confirm the action worked

This loop is your fundamental building block. Chain multiple iterations to accomplish complex tasks.
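
The loop above can be sketched as a small driver function. This is a minimal illustration, not part of AgentKVM: the name `run_loop` and its parameters (`screenshot`, `analyze`, `act`, `max_iterations`) are hypothetical, and the three callables are assumed to be supplied by the caller (for example, thin wrappers around the CLI and whatever does the image analysis).

```python
from typing import Callable, Optional

def run_loop(
    screenshot: Callable[[], str],
    analyze: Callable[[str], Optional[dict]],
    act: Callable[[dict], None],
    max_iterations: int = 10,
) -> int:
    """Repeat screenshot -> analyze -> act until analyze returns None.

    Returns the number of actions performed. All names here are
    illustrative assumptions, not an AgentKVM API.
    """
    actions = 0
    for _ in range(max_iterations):
        image_path = screenshot()      # 1. Screenshot
        action = analyze(image_path)   # 2. Analyze (None means the goal is reached)
        if action is None:
            return actions
        act(action)                    # 3. Act
        actions += 1
        # 4. Verify: the screenshot at the top of the next iteration
        #    shows whether the action had the intended effect.
    raise RuntimeError("goal not reached within max_iterations")
```

Capping the number of iterations keeps a misread screen from turning into an endless click loop.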

Quick Start

Check connection

agentkvm --json status

If this fails, the device isn't connected. Check the serial port with agentkvm list.

See the screen

agentkvm --json screenshot

Returns { "path": "/path/to/screenshot.png", ... }. Read the image to see what's on screen.
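
A script that consumes this output might look like the sketch below. Only the `path` field is documented above, so the code relies on nothing else in the JSON. The helper names `screenshot_path` and `take_screenshot` are hypothetical, and `take_screenshot` assumes the CLI is on PATH with hardware attached.

```python
import json
import subprocess

def screenshot_path(stdout: str) -> str:
    """Extract the image path from the JSON printed by the screenshot command."""
    data = json.loads(stdout)
    path = data.get("path") if isinstance(data, dict) else None
    if not isinstance(path, str) or not path:
        raise ValueError("screenshot output has no usable 'path' field")
    return path

def take_screenshot() -> str:
    """Run the CLI and return the captured image path (needs real hardware)."""
    result = subprocess.run(
        ["agentkvm", "--json", "screenshot"],
        capture_output=True, text=True, check=True,
    )
    return screenshot_path(result.stdout)
```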

Interact

# Click at pixel coordinates (relative to the cropped image)
agentkvm mouse click 223 485

# Type text
agentkvm type "hello world"

# Press key combos
agentkvm key enter
agentkvm key ctrl+c
agentkvm key cmd+space

# Scroll (positive = up, negative = down)
agentkvm mouse scroll 300 500 --delta -3

# Drag from point A to point B
agentkvm mouse drag 100 200 400 600

Remote operation

If AgentKVM is running on another machine, all commands work identically with --remote:

agentkvm --remote http://192.168.1.100:7070 --token my-secret screenshot --json
agentkvm --remote http://192.168.1.100:7070 --token my-secret mouse click 223 485

Or use the HTTP API directly — see references/api.md.
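
When driving a remote instance from a script, building the argument list in one place avoids repeating the URL and token. The sketch below uses only the `--json`, `--remote`, and `--token` flags shown above. The function name `agentkvm_argv` and its keyword arguments are hypothetical, and it places `--json` before the subcommand, as in the Quick Start status example.

```python
from typing import List, Optional

def agentkvm_argv(
    *command: object,
    remote: Optional[str] = None,
    token: Optional[str] = None,
    as_json: bool = False,
) -> List[str]:
    """Build an argument list for subprocess.run (illustrative helper)."""
    argv = ["agentkvm"]
    if as_json:
        argv.append("--json")
    if remote:
        argv += ["--remote", remote]
    if token:
        argv += ["--token", token]
    # Subcommand and its arguments, e.g. ("mouse", "click", 223, 485).
    argv += [str(part) for part in command]
    return argv
```

Passing the result to `subprocess.run` as a list, rather than as a shell string, keeps the token from being interpreted by the shell.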

How Coordinates Work

This is critical to get right. When you analyze a screenshot and identify a UI element at pixel (x, y), those coordinates are relative to the screenshot image itself — top-left is (0, 0). Pass these coordinates directly to agentkvm mouse click x y.

AgentKVM handles the translation to the actual hardware coordinates internally, based on the device type and crop settings. You don't need to do any math.

Two coordinate modes

The device type determines how coordinates are translated:

"device" mode (iPhone, Android) — The cropped region IS the device's full screen. HID absolute coordinates 0–4096 map to the device's own display. Use this when the HDMI output shows the device screen within a larger capture frame.

"frame" mode (PC, Mac, Linux) — The cropped region is just a visual focus area; HID coordinates still map to the full monitor. Use this when you're controlling a computer where the capture resolution matches the target display.

The mode is selected automatically from the config. You rarely need to think about it.
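
To build intuition for the two modes, here is one plausible version of the arithmetic. AgentKVM's actual implementation is not shown in this document, so treat this as an assumption: the function `to_hid`, the rounding, and the exact formulas are illustrative only. The inputs reuse the `crop` and `resolution` shapes from the config file.

```python
HID_MAX = 4096  # absolute HID range mentioned above

def to_hid(x: int, y: int, mode: str, crop: dict, resolution: dict) -> tuple:
    """Map screenshot pixel (x, y) to absolute HID coordinates (assumed formula)."""
    if mode == "device":
        # The cropped region is the whole device screen.
        fx = x / crop["width"]
        fy = y / crop["height"]
    elif mode == "frame":
        # The crop is only a viewport; HID spans the full capture frame.
        fx = (crop["x"] + x) / resolution["width"]
        fy = (crop["y"] + y) / resolution["height"]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return round(fx * HID_MAX), round(fy * HID_MAX)
```

The same screenshot pixel lands on different HID values in the two modes, which is why the device type in the config has to match the hardware being controlled.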

Implementing a Task

When asked to perform a GUI task (e.g., "open Safari and search for X"):

Step 1: Observe first

Always start with a screenshot. Never assume what's on screen.

agentkvm --json screenshot

Read the returned image file. Describe what you see — this grounds your actions in reality.

Step 2: Plan your actions

Break the task into individual interactions. For "open Safari and search for X":

  1. Find the Safari icon → click it
  2. Wait for Safari to load → screenshot to verify
  3. Find the address bar → click it
  4. Type the search query
  5. Press Enter
  6. Screenshot to verify results

Step 3: Execute with verification

After each significant action, take a screenshot to verify it worked. Screens can be slow to update, so add brief waits between actions when needed (use sleep in your script).

Common pattern in a bash script:

# Click Safari icon at the observed position
agentkvm mouse click 223 950
sleep 1

# Verify it opened
agentkvm --json screenshot
# (read and analyze the screenshot)

# Click address bar
agentkvm mouse click 300 50
sleep 0.3

# Type search query
agentkvm type "weather today"
agentkvm key enter
sleep 2

# Verify search results loaded
agentkvm --json screenshot

Step 4: Handle failures

If an action didn't produce the expected result:

  • The element might have moved — take a fresh screenshot and re-locate it
  • The screen might not have updated yet — wait and retry
  • You might have clicked the wrong spot — re-analyze and adjust coordinates
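
These recovery steps can be folded into a small retry helper. This is a sketch under stated assumptions: `act_with_retry` and its parameters are hypothetical names. `act` is expected to take a fresh screenshot and re-locate the element each time it is called, and `verify` to take a screenshot and report whether the expected state is visible.

```python
import time
from typing import Callable

def act_with_retry(
    act: Callable[[], None],
    verify: Callable[[], bool],
    attempts: int = 3,
    checks_per_attempt: int = 2,
    wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Perform an action, wait for the screen to update, and retry on failure."""
    for _ in range(attempts):
        act()  # re-locates the target from a fresh screenshot on every call
        for _ in range(checks_per_attempt):
            sleep(wait_seconds)  # the screen may simply not have updated yet
            if verify():
                return True
    return False
```

Checking more than once before repeating the action matters for clicks that toggle state: clicking again on a slow screen could undo the first click.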

Config Reference

All settings live in ~/.config/agentkvm/config.json. A typical setup:

{
  "serialPort": "/dev/tty.usbserial-2140",
  "resolution": { "width": 1920, "height": 1080 },
  "videoDevice": "USB3 Video",
  "deviceType": "iphone",
  "crop": { "x": 738, "y": 55, "width": 447, "height": 970 }
}

Key fields:

  • serialPort — path to the NanoKVM-USB serial device
  • resolution — HDMI capture resolution
  • videoDevice — video capture device name or index
  • deviceType — determines coordinate mode (iphone/android = device, pc/mac/linux = frame)
  • crop — sub-region of the capture frame to use as the working area

When config is set, you can run bare commands without flags: agentkvm screenshot, agentkvm mouse click 100 200, etc.
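
Before relying on bare commands, a script can sanity-check the config. The sketch below uses only the path and fields listed above. The helper names (`load_config`, `coordinate_mode`, `crop_fits`) are hypothetical, and the rule that the crop must fit inside the capture resolution is an inference from "sub-region of the capture frame", not a documented check.

```python
import json
from pathlib import Path

# Path and field names are taken from the Config Reference above.
CONFIG_PATH = Path.home() / ".config" / "agentkvm" / "config.json"

# deviceType -> coordinate mode, as listed under "Key fields".
MODE_BY_DEVICE = {
    "iphone": "device", "android": "device",
    "pc": "frame", "mac": "frame", "linux": "frame",
}

def load_config(path: Path = CONFIG_PATH) -> dict:
    """Read the JSON config file into a dict."""
    return json.loads(path.read_text(encoding="utf-8"))

def coordinate_mode(config: dict) -> str:
    """Return "device" or "frame" for the configured deviceType."""
    device_type = config.get("deviceType")
    if device_type not in MODE_BY_DEVICE:
        raise ValueError(f"unknown deviceType: {device_type!r}")
    return MODE_BY_DEVICE[device_type]

def crop_fits(config: dict) -> bool:
    """Check that the crop rectangle lies inside the capture resolution."""
    crop, res = config["crop"], config["resolution"]
    return (
        crop["x"] >= 0 and crop["y"] >= 0
        and crop["x"] + crop["width"] <= res["width"]
        and crop["y"] + crop["height"] <= res["height"]
    )
```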

Tips for Reliable Automation

Prefer clicking on text labels over icons — text is easier to locate precisely in screenshots.

Use --json for programmatic access — all commands support it and return structured data you can parse.

Double-click when single-click doesn't respond — some UI elements need --double.

Scroll in small increments: --delta 1 or --delta -1 is one scroll step. Use multiple steps with verification screenshots in between.

Type slowly for unreliable connections — increase --delay (default 50ms) if characters get dropped.

Use key combos for navigation: cmd+space (Spotlight), alt+tab (window switch), and ctrl+c (cancel) are often faster than finding and clicking UI elements.
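
The small-increment scrolling tip can be sketched as a helper that splits one large scroll into single-step commands, leaving room for a verification screenshot between them. The function name `scroll_steps` is hypothetical; the command shape follows the `agentkvm mouse scroll X Y --delta N` example from the Quick Start.

```python
from typing import List

def scroll_steps(x: int, y: int, total_delta: int) -> List[List[str]]:
    """Split a scroll of total_delta into one-step scroll commands."""
    step = 1 if total_delta > 0 else -1  # positive = up, negative = down
    return [
        ["agentkvm", "mouse", "scroll", str(x), str(y), "--delta", str(step)]
        for _ in range(abs(total_delta))
    ]
```

Each returned list can be passed to `subprocess.run`, with a screenshot taken after every step to decide whether to keep scrolling.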

For the full CLI reference, key combo syntax, and HTTP API details, see references/api.md.

Security recommendations
This skill appears to do what it says, but take operational precautions before installing or running it:
  • Verify the npm package source and integrity before running `npm install -g agentkvm` (check the package owner, README, and repository link, and consider installing in a container or VM first).
  • Do not run the HTTP server bound to 0.0.0.0 on untrusted networks. Bind it to localhost or put it behind a firewall, and always configure a strong token if you enable remote mode.
  • Be aware that screenshots may contain sensitive data (passwords, 2FA codes). Avoid routing screenshots to remote endpoints you do not control.
  • Avoid instructing the agent to type secrets automatically unless you trust the environment and connection; prefer manual input when handling credentials.
  • Inspect ~/.config/agentkvm/config.json after installation for unexpected entries, and restrict access to the host running the hardware.
For a higher-confidence safety assessment, provide the actual npm package source (package repository or tarball) so its code can be reviewed for hidden network calls, telemetry, or unexpected behavior.
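
Two of these precautions are easy to script: generating a strong token and checking that a bind address is local-only. The sketch below uses only the Python standard library. The helper names `new_token` and `is_loopback` are hypothetical, and how the token and host are passed to the server is not covered here because that interface is described in references/api.md, not in this page.

```python
import ipaddress
import secrets

def new_token(num_bytes: int = 32) -> str:
    """Return a random URL-safe token suitable for use as a shared secret."""
    return secrets.token_urlsafe(num_bytes)

def is_loopback(host: str) -> bool:
    """True if host is localhost or a loopback IP, i.e. not reachable from the network."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. some other hostname): treat as not provably local.
        return False
```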
Capability analysis
Type: OpenClaw Skill · Name: agentkvm · Version: 0.2.1
The skill provides high-risk capabilities, including full-screen capture and remote input injection (keyboard/mouse) for physical devices via NanoKVM hardware. It includes a built-in HTTP server (`agentkvm serve`) for remote operation and explicitly instructs the agent to handle sensitive tasks like typing passwords on connected machines (`SKILL.md`). While these features are aligned with the stated purpose of hardware automation, the broad control over target systems and the potential for remote access via the HTTP API (`references/api.md`) represent significant security risks.
Capability assessment
Purpose & Capability
The name/description match the runtime instructions: screenshots, mouse/keyboard HID actions, and the described CLI/HTTP API. Requiring Node/npm, ffmpeg, and NanoKVM hardware is expected for this functionality.
Instruction Scope
SKILL.md stays within device-control scope (screenshot → analyze → act → verify). It documents reading screenshots from disk and writing/reading a config at ~/.config/agentkvm/config.json, and shows remote API usage. However, the examples include typing passwords and running a remote server that can return screenshots (sensitive data), so the operator must be careful about what gets captured or typed.
Install Mechanism
This is an instruction-only skill (no bundled install). It tells users to run `npm install -g agentkvm` and install ffmpeg via package managers. That is a normal distribution path but does bring in remotely hosted code (npm) which will run on the host when installed — the skill itself does not provide or pin any verified release or checksum.
Credentials
No environment variables or unrelated credentials are requested. The documented config path (~/.config/agentkvm/config.json) and optional server token are appropriate for the tool's behavior.
Persistence & Privilege
The skill does not request 'always: true', but the documented HTTP server mode exposes a control surface (default host 0.0.0.0:7070) and accepts screenshots/remote control. If the server is started without a strong token or with public binding, it could allow remote control and exfiltration of screen images. Autonomous invocation being allowed is platform-default and not itself flagged, but combined with network-exposed control it increases blast radius.
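How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install agentkvm
  3. After installation, invoke the Skill by name or trigger it with /agentkvm
  4. Provide the required inputs according to the Skill's parameter documentation to get structured output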
Version history
v0.2.1
  • Improved and expanded documentation in SKILL.md, detailing requirements, setup instructions, and usage patterns for controlling devices through NanoKVM-USB.
  • Clarified how to structure device control loops: screenshot, analyze, act, and verify.
  • Added comprehensive examples for common tasks, including coordinate handling and remote operation.
  • Provided troubleshooting tips, workflow strategies, and a detailed config reference for easier setup.
  • Enhanced guidelines for scaling automation reliably across device types and scenarios.
Metadata
Slug: agentkvm
Version: 0.2.1
License: MIT-0
Total installs: 1
Current installs: 1
Versions: 1
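FAQ

What is AgentKVM?

Control physical devices (phones, PCs, Macs) through NanoKVM-USB hardware. Use this skill whenever the user asks you to interact with a physical screen, take... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 218 total downloads to date.

How do I install AgentKVM?

Run the command "/install agentkvm" in the OpenClaw or Claude Code chat box to install the skill in one step. The skill itself needs no further configuration, but the prerequisites listed under Requirements (the AgentKVM CLI, Node.js, ffmpeg, and NanoKVM-USB hardware) must be set up separately.

Is AgentKVM free?

Yes. AgentKVM is completely free under the MIT-0 license and can be freely downloaded, installed, and used.

Which platforms does AgentKVM support?

AgentKVM is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops AgentKVM?

It is developed and maintained by iamtwz (@iamtwz). The current version is v0.2.1.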
