
ClawdCursor

Name: ClawdCursor
Author: amrdab

Author AmrDab · GitHub · v0.6.3

cross-platform ⚠ suspicious

Total downloads: 1204

Install in OpenClaw

/install clawd-cursor

Description

AI desktop agent — control any app on Windows/macOS from your OpenClaw agent. Send natural language tasks to the Clawd Cursor API and it handles everything:...

Usage (SKILL.md)

Clawd Cursor

If a human can do it on a screen, you can too. No API? No integration? No problem. Every app already has a UI — Clawd Cursor gives you eyes and hands to use all of them.

No task is impossible. If it's on the screen, it's within reach.

IMPORTANT — READ THIS BEFORE ANYTHING ELSE

You have a GUI-level desktop AI agent running on the user's machine. This means:

You CAN open apps, click buttons, type text, fill forms

You CAN see what's on screen (via Clawd Cursor's screenshot pipeline)

You CAN interact with applications — browser, desktop apps, settings

You MUST respect safety tiers — 🟢 Auto actions run freely, 🔴 Confirm actions require user approval

You MUST ask before accessing sensitive apps (email, banking, private messages, password managers)

You SHOULD prefer direct tools (file read, web_fetch, CLI) over desktop automation when available

Think of Clawd Cursor as your hands and eyes on the user's computer — used with their permission.

When to Use This (AGENTS: READ THIS)

You have a desktop UI agent available. Use it when the user asks you to interact with their desktop, or when no other tool can accomplish the task.

Tool vs Skill (OpenClaw terminology)

Tool = direct capability (API call, filesystem, shell, web fetch, browser command).
Skill = packaged workflow/domain logic that may call one or more tools.
This skill (Clawd Cursor) = GUI execution skill. Use it only after ruling out OpenClaw tools/skills that can complete the same work without a GUI.

Use Clawd Cursor for (examples, not limits):

Clawd Cursor can perform any action that is visible and interactable in the GUI (subject to safety policy).

User-requested desktop tasks — "open Gmail and send an email", "check my calendar"
Read a webpage — when web_fetch or browser tools aren't available
Interact with desktop apps — click buttons, fill forms, read results
Browser tasks — search, navigate, fill forms (when browser tool unavailable)
Visual verification — did the page load? what does the UI show?
Cross-app workflows — copy from one app, paste in another
Settings changes — when the user explicitly asks

⚠️ Sensitive App Policy

Always ask the user before accessing:

Email clients (Gmail, Outlook)
Banking or financial apps
Private messaging (WhatsApp, Signal, Telegram)
Password managers
Admin panels or cloud consoles

Don't use Clawd Cursor when:

You can do it with a direct API call or CLI command (faster)
The task is purely computational (math, text generation, code writing)
You can already read/write the file directly
The browser tool or web_fetch can handle it

OpenClaw + Clawd Cursor Routing Contract (Avoid Overlap)

Clawd Cursor should be treated as OpenClaw's GUI execution layer, not a competing planner.

Route tasks in this order:

OpenClaw native tools first (filesystem, API, shell, provider-native skills)
Browser-native automation next (Playwright/CDP direct) for browser-only reads/clicks
Clawd Cursor API task (POST /task) only when desktop/UI-level interaction is required

Practical rule

If OpenClaw already has a reliable skill/tool for the domain, use it.
Use Clawd Cursor to bridge gaps where no API/tool exists or when the user explicitly asks for GUI interaction.

This keeps behavior predictable, lowers latency/cost, and avoids duplicated logic between the main OpenClaw agent and this skill.
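The routing order and practical rule above can be sketched as a small decision function. This is a minimal illustration under stated assumptions: the subtask flags (`userAskedForGui`, `hasNativeTool`, `browserOnly`) and the returned route names are hypothetical, not fields of the OpenClaw or Clawd Cursor APIs.

```javascript
// Hypothetical subtask shape (illustrative only, not an OpenClaw/Clawd Cursor type):
// { userAskedForGui: boolean, hasNativeTool: boolean, browserOnly: boolean }
function routeSubtask(subtask) {
  // Practical rule: an explicit user request for GUI interaction goes to Clawd Cursor
  if (subtask.userAskedForGui) return 'clawd-cursor';
  // 1. OpenClaw native tools first (filesystem, API, shell, provider-native skills)
  if (subtask.hasNativeTool) return 'native';
  // 2. Browser-native automation (Playwright/CDP) for browser-only reads/clicks
  if (subtask.browserOnly) return 'cdp';
  // 3. Clawd Cursor POST /task only when desktop/UI-level interaction is required
  return 'clawd-cursor';
}
```

Checking the explicit-request rule first mirrors "when the user explicitly asks for GUI interaction"; everything else falls through the cheapest-first order.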

Universal task pattern

For broad "get it done" requests, split into three phases:

Plan in OpenClaw: break work into API/CLI/browser/GUI subtasks.
Execute cheap paths first: API + CLI + browser direct.
Escalate only residual UI steps to Clawd Cursor.

Think: "OpenClaw decides, Clawd Cursor acts on GUI when needed."
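A minimal sketch of the three phases in Node.js, assuming the plan is already an array of subtasks. The `cheap` handler (an API, CLI, or CDP call), the `guiTask` text, and the `sendToClawdCursor` callback are hypothetical names chosen for illustration; a real `sendToClawdCursor` would submit `POST /task` and poll `/status`.

```javascript
// subtasks: [{ name, cheap?: async () => result, guiTask: string }]  (hypothetical shape)
// sendToClawdCursor: async (taskText) => result                       (hypothetical callback)
async function runPlan(subtasks, sendToClawdCursor) {
  const results = [];
  const residual = [];

  // Phase 2: try every cheap path (API/CLI/browser direct) first
  for (const st of subtasks) {
    if (typeof st.cheap === 'function') {
      try {
        results.push({ name: st.name, via: 'cheap', value: await st.cheap() });
        continue;
      } catch (err) {
        // The cheap path failed: fall through and treat this as a GUI subtask
      }
    }
    residual.push(st);
  }

  // Phase 3: escalate only the residual UI steps, one Clawd Cursor task at a time
  for (const st of residual) {
    results.push({ name: st.name, via: 'gui', value: await sendToClawdCursor(st.guiTask) });
  }
  return results;
}
```

Running the GUI escalations in a second, sequential loop also respects the "one task at a time" guideline for Clawd Cursor.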

Direct Browser Access (Fast Path)

For quick page reads without a full task, connect to Chrome via Playwright CDP:

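const { chromium } = require('playwright');
(async () => {
  // Attach to the Chrome instance that is already listening on the CDP port
  const browser = await chromium.connectOverCDP('http://127.0.0.1:9222');
  const pages = browser.contexts()[0].pages(); // tabs of the default context
  const text = await pages[0].innerText('body'); // visible text of the first tab
  console.log(text);
})();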

Use this when you just need page content — faster than sending a task.

Scenario	Use	Why
Read page content/text	CDP Direct	Instant, free
Fill a web form	API task (`POST /task`)	Clawd handles multi-step planning
Check if a page loaded	CDP Direct	Just read the title/URL
Click through a complex UI flow	API task (`POST /task`)	Clawd handles planning
Get a list of elements on page	CDP Direct	Fast DOM query
Interact with a desktop app	API task (`POST /task`)	CDP is browser-only

REST API Reference

Base URL: http://127.0.0.1:3847

Note: In Windows PowerShell, use curl.exe (with .exe) or Invoke-RestMethod. There, bare curl is an alias for Invoke-WebRequest, which takes different arguments.

Pre-flight Check

Before your first task, verify Clawd Cursor is running:

curl.exe -s http://127.0.0.1:3847/health

Expected: {"status":"ok","version":"0.6.0"} (the version field reports the installed build, so it may differ)

If connection refused — start it yourself (don't ask the user):

# Find the skill directory and start the server
Start-Process -FilePath "node" -ArgumentList "dist/index.js","start" -WorkingDirectory "<clawd-cursor-directory>" -WindowStyle Hidden
Start-Sleep 3
# Verify it's running
curl.exe -s http://127.0.0.1:3847/health

The skill directory is wherever SKILL.md lives (the parent of this file). Use that path as the working directory.
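The same pre-flight check can be done from Node.js 18+ (it relies on the global `fetch`). This is a sketch under stated assumptions: `isHealthy`, `spawnClawdCursor`, and `ensureRunning` are hypothetical helpers, and `spawnClawdCursor` simply mirrors the Start-Process command above by running `node dist/index.js start` in the skill directory.

```javascript
const { spawn } = require('node:child_process');

const BASE_URL = 'http://127.0.0.1:3847';

// Returns true only if GET /health answers with {"status":"ok", ...}.
async function isHealthy(fetchFn = fetch) {
  try {
    const res = await fetchFn(`${BASE_URL}/health`);
    if (!res.ok) return false;
    const body = await res.json();
    return body.status === 'ok';
  } catch {
    // "Connection refused" (server not running) surfaces as a rejected fetch
    return false;
  }
}

// Hypothetical launcher: `node dist/index.js start` in the skill directory, detached and hidden.
function spawnClawdCursor(skillDir) {
  const child = spawn('node', ['dist/index.js', 'start'], {
    cwd: skillDir,
    detached: true,
    stdio: 'ignore',
    windowsHide: true,
  });
  child.unref(); // let this Node process exit without waiting for the server
}

// Check health, start the server once if needed, wait, then re-check.
async function ensureRunning({ fetchFn = fetch, startServer, waitMs = 3000 } = {}) {
  if (await isHealthy(fetchFn)) return true;
  await startServer();
  await new Promise((resolve) => setTimeout(resolve, waitMs)); // like Start-Sleep 3
  return isHealthy(fetchFn);
}
```

For example, `await ensureRunning({ startServer: () => spawnClawdCursor('<skill-directory>') })`, where `<skill-directory>` is the folder that contains SKILL.md. Only if this still returns `false` is it worth asking the user.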

Sending a Task (Async — Returns Immediately)

POST /task accepts the task and returns immediately. The task runs in the background. You must poll /status to know when it's done.

curl.exe -s -X POST http://127.0.0.1:3847/task -H "Content-Type: application/json" -d "{\"task\": \"YOUR_TASK_HERE\"}"

PowerShell:

Invoke-RestMethod -Uri http://127.0.0.1:3847/task -Method POST -ContentType "application/json" -Body '{"task": "YOUR_TASK_HERE"}'

Polling Pattern (Follow This)

1. POST /task → get accepted
2. Wait 2 seconds
3. GET /status
4. If status is "idle" → done
5. If status is "waiting_confirm" → ASK THE USER, then POST /confirm based on their answer
6. If still running → wait 2 more seconds, go to step 3
7. If 60+ seconds pass without reaching idle → POST /abort and retry with clearer instructions
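The polling pattern can be written as a single Node.js 18+ function (it relies on the global `fetch`). This is a sketch, not an official client: `runTask`, the `askUser` callback, and the option names are hypothetical, and the only response fields it reads are the ones listed under Response States. `askUser` must put the pending step in front of the human; the function treats anything other than an explicit `true` as a denial.

```javascript
const BASE = 'http://127.0.0.1:3847';
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runTask(task, { fetchFn = fetch, askUser, intervalMs = 2000, timeoutMs = 60000 } = {}) {
  const postJson = (path, body) =>
    fetchFn(`${BASE}${path}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    });

  // 1. Submit: POST /task returns immediately with {"accepted": true, ...}
  const accepted = await (await postJson('/task', { task })).json();
  if (accepted.accepted !== true) {
    throw new Error(`Task not accepted: ${JSON.stringify(accepted)}`); // e.g. "Agent is busy"
  }

  const startedAt = Date.now();
  while (Date.now() - startedAt < timeoutMs) {
    await sleep(intervalMs); // 2./6. wait between polls
    const status = await (await fetchFn(`${BASE}/status`)).json(); // 3. GET /status

    if (status.status === 'idle') return 'done'; // 4. finished
    if (status.status === 'waiting_confirm') {
      // 5. Only the human decides; never self-approve
      const approved = (await askUser(status.currentStep)) === true;
      await postJson('/confirm', { approved });
    }
  }

  // 7. Timed out: abort so the agent is free for a clearer retry
  await fetchFn(`${BASE}/abort`, { method: 'POST' });
  return 'aborted';
}
```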

Checking Status

curl.exe -s http://127.0.0.1:3847/status

Confirming Safety-Gated Actions

Some actions (sending messages, deleting) require approval. 🔴 NEVER self-approve these. Always ask the user for confirmation before POST /confirm. These exist to protect the user — do not bypass them.

curl.exe -s -X POST http://127.0.0.1:3847/confirm -H "Content-Type: application/json" -d "{\"approved\": true}"

Aborting a Task

curl.exe -s -X POST http://127.0.0.1:3847/abort

Reading Logs (Debugging)

curl.exe -s http://127.0.0.1:3847/logs

Returns the last 200 log entries. When a task fails, look for entries at the error or warn level.

Response States

State	Response	What to do
Accepted	`{"accepted": true, "task": "..."}`	Start polling
Running	`{"status": "acting", "currentTask": "...", "stepsCompleted": 2}`	Keep polling
Waiting confirm	`{"status": "waiting_confirm", "currentStep": "..."}`	POST /confirm
Done	`{"status": "idle"}`	Task complete
Busy	`{"error": "Agent is busy", "state": {...}}`	Wait or POST /abort first
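The table above maps naturally onto a small classifier. This is a minimal sketch: `nextAction` and its returned action names are hypothetical, and treating any unlisted status string as "still running" is an assumption, since only `acting` is documented.

```javascript
// Maps a Clawd Cursor API response (shapes from the table above) to the agent's next step.
function nextAction(response) {
  if (response.accepted === true) return 'start-polling';
  if (typeof response.error === 'string' && /busy/i.test(response.error)) return 'wait-or-abort';
  switch (response.status) {
    case 'idle':
      return 'done';
    case 'waiting_confirm':
      return 'ask-user-then-confirm'; // never self-approve
    case 'acting':
      return 'keep-polling';
    default:
      // Assumption: any other status string is a running state; no status at all means trouble
      return typeof response.status === 'string' ? 'keep-polling' : 'check-logs';
  }
}
```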

CDP Direct Reference

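Chrome must be running with --remote-debugging-port=9222. If Chrome is already open without that flag, quit it completely first; launching it again with the flag only opens a window in the existing process, which ignores the flag.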

Quick check:

curl.exe -s http://127.0.0.1:9222/json/version

If this returns JSON, Chrome is ready.

Connecting via Playwright:

const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.connectOverCDP('http://127.0.0.1:9222');
  const context = browser.contexts()[0];
  const page = context.pages()[0];

  // Read page content
  const title = await page.title();
  const url = page.url();
  const text = await page.textContent('body');

  // Click by role
  await page.getByRole('button', { name: 'Submit' }).click();

  // Fill a field
  await page.getByLabel('Email').fill('user@example.com');

  // Read specific elements
  const buttons = await page.$$eval('button', els => els.map(e => e.textContent));
  console.log({ title, url, textLength: (text || '').length, buttons });
})();

Task Writing Guidelines

Be specific — include app names, URLs, exact text to type, button names
One task at a time — wait for completion before sending the next
Describe the goal, not the clicks: say "Send an email to user@example.com about the meeting", not "click compose, click to field..."
Check status if a task seems to hang
Don't include credentials in task text — tasks are logged

Task Examples

Goal	Task to send
Simple navigation	`Open Chrome and go to github.com`
Read screen content	`What text is currently displayed in Notepad?`
Cross-app workflow	`Copy the email address from the Chrome tab and paste it into the To field in Outlook`
Form filling	`In the open Chrome tab, fill the contact form: name "John Doe", email "john.doe@example.com"`
App interaction	`Open Spotify and play the Discover Weekly playlist`
Settings change	`Open Windows Settings and turn on Dark Mode`
Data extraction	`Read the stock price shown in the Bloomberg tab in Chrome`
Complex browser	`Open YouTube, search for "Adele Hello", and play the first video result`
Verification	`Check if the deployment succeeded — look at the Vercel dashboard in Chrome`
Send email	`Open Gmail, compose email to user@example.com, subject: Meeting Tomorrow, body: Confirming 2pm. Best regards.`
Take screenshot	`Take a screenshot`

Error Recovery

Problem	Solution
Connection refused on :3847	Start Clawd Cursor: `cd clawd-cursor && npm start`
Connection refused on :9222	Start Chrome with CDP: `Start-Process chrome -ArgumentList "--remote-debugging-port=9222"`
Agent returns "busy"	Poll `/status` — wait for idle, or POST `/abort`
Task fails with no details	Check `/logs` for error entries
Task completes but wrong result	Rephrase with more specifics: exact app name, button text, field labels
Same task fails repeatedly	Break into smaller tasks (one action per task)
Safety confirmation pending	POST `/confirm` with `{"approved": true}` or `{"approved": false}`
Task hangs > 60 seconds	POST `/abort`, then retry with simpler phrasing

How It Works — 5-Layer Pipeline

Layer	What	Speed	Cost
0: Browser Layer	URL detection → direct navigation	Instant	Free
1: Action Router + Shortcuts	Regex + UI Automation + keyboard shortcuts	Instant	Free
1.5: Smart Interaction	1 LLM plan → CDP/UIDriver executes	~2-5s	1 LLM call
2: Accessibility Reasoner	UI tree → text LLM decides	~1s	Cheap
3: Computer Use	Screenshot → vision LLM	~5-8s	Expensive

Layer 1 includes keyboard shortcuts — common actions execute as direct keystrokes (0 LLM calls).

Layers 0-1 handle 80%+ of tasks (free, instant). The vision model is a last resort only.
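The escalation idea behind the pipeline can be sketched as a cascade: each layer either handles the task or passes, and the next, costlier layer runs only on a pass. This is a conceptual sketch, not Clawd Cursor's source; `runThroughLayers`, the layer objects, and their `tryHandle` functions are hypothetical.

```javascript
// layers: [{ name, tryHandle: async (task) => result | null }], ordered cheapest -> most expensive
async function runThroughLayers(task, layers) {
  const attempts = [];
  for (const layer of layers) {
    const handled = await layer.tryHandle(task); // null means "pass to the next layer"
    attempts.push(layer.name);
    if (handled !== null) return { layer: layer.name, result: handled, attempts };
  }
  throw new Error(`No layer could handle: ${task}`);
}
```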

Safety Tiers

Tier	Actions	Behavior
🟢 Auto	Navigation, reading, opening apps	Runs immediately
🟡 Preview	Typing, form filling	Logs before executing
🔴 Confirm	Sending messages, deleting	Pauses — ask the user before POST `/confirm`. Never self-approve.

Security & Privacy

Network Isolation

The API binds to 127.0.0.1 only, so it is not reachable from the network. Verify on Windows with netstat -an | findstr 3847; the listener should show 127.0.0.1:3847 (not 0.0.0.0:3847).
Screenshots stay in memory, never saved to disk (unless --debug)
No telemetry, no analytics, no phone-home calls
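The netstat check can be automated by parsing `netstat -an` output. This is a heuristic sketch: the helper names are hypothetical, and because column layouts differ (Windows prints `127.0.0.1:3847`, macOS prints `127.0.0.1.3847`), it simply takes the first token on each listening line that ends with the port.

```javascript
// Local addresses of listening sockets on `port`, parsed from `netstat -an` text.
function listenersForPort(netstatOutput, port = 3847) {
  return netstatOutput
    .split(/\r?\n/)
    .filter((line) => /LISTEN/i.test(line)) // matches LISTEN and LISTENING
    .map((line) =>
      line.trim().split(/\s+/).find((tok) => tok.endsWith(`:${port}`) || tok.endsWith(`.${port}`))
    )
    .filter(Boolean);
}

// True only if at least one listener exists and every listener is loopback-bound.
function isLoopbackOnly(netstatOutput, port = 3847) {
  const addrs = listenersForPort(netstatOutput, port);
  const loopback = (a) => a.startsWith('127.0.0.1') || a.startsWith('[::1]') || a.startsWith('::1');
  return addrs.length > 0 && addrs.every(loopback);
}

// Usage: isLoopbackOnly(require('node:child_process').execSync('netstat -an').toString())
```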

Data Flow

With Ollama (local): 100% offline — zero external network calls. No data leaves the machine.
With cloud providers: screenshots/text are sent to the user's chosen provider API only. No data goes to skill authors, ClawHub, or third parties.
OpenClaw users: credentials auto-discovered from local config files — no keys stored in skill directory.
The user controls data flow by choosing their provider. Ollama = fully private.

Agent Autonomy Controls

🟢 Auto actions (navigation, reading, opening apps) run without prompting
🟡 Preview actions (typing, form filling) are logged before executing
🔴 Confirm actions (sending messages, deleting, purchases) always pause for user approval
Agents must ask the user before accessing sensitive apps (email, banking, messaging, passwords)
Agents must never self-approve 🔴 Confirm actions

Setup (User Reference)

Setup is handled by the user. If Clawd Cursor isn't running, start it yourself using the exec tool:

Start-Process -FilePath "node" -ArgumentList "dist/index.js","start" -WorkingDirectory "<skill-directory>" -WindowStyle Hidden

Only ask the user if you cannot start it (e.g., node not installed, build missing).

git clone https://github.com/AmrDab/clawd-cursor.git
cd clawd-cursor
npm install && npm run build
npx clawd-cursor doctor    # auto-detects and configures everything
npm start                  # starts on port 3847

macOS: Grant Accessibility permission to terminal: System Settings → Privacy & Security → Accessibility

Provider	Setup	Cost
Ollama (free)	`ollama pull <model>`	$0 (fully offline)
Any cloud provider	Set `AI_API_KEY=your-key`	Varies by provider
OpenClaw users	Automatic — no setup needed	Uses configured provider

Performance Optimization

Optimizations applied to reduce task execution latency and LLM API costs. Reference patches are in perf/references/patches/.

Applied Optimizations

#	Name	Impact
1	Screenshot hash cache	90% fewer LLM calls on static screens
2	Parallel screenshot+a11y	30-40% per-step latency cut
3	A11y context cache (2s TTL)	Eliminates redundant PowerShell spawns
4	Screenshot compression	52% smaller payload (58KB vs 120KB)
5	Async debug writes	94% less event loop blocking
6	Streaming LLM responses	1-3s faster per LLM call
7	Trimmed system prompts	~60% fewer prompt tokens
8	A11y tree filtering	Interactive elements only, 3000 char cap
9	Combined PowerShell script	1 spawn instead of 3
10	Taskbar cache (30s TTL)	Skip expensive taskbar query
11	Delay reduction	50-150ms vs 200-1500ms
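Optimizations #1, #3, and #10 rest on two simple caching ideas, sketched below. This is an illustration, not the project's patch code: `ScreenshotHashCache`, `TtlCache`, and the `callLlm`/`compute` callbacks are hypothetical, and SHA-256 is an assumed choice of hash.

```javascript
const { createHash } = require('node:crypto');

// Idea behind #1: only call the LLM when the screenshot bytes change.
class ScreenshotHashCache {
  constructor() {
    this.lastHash = null;
    this.lastResult = undefined;
  }

  async decide(pngBuffer, callLlm) {
    const hash = createHash('sha256').update(pngBuffer).digest('hex');
    if (hash !== this.lastHash) {
      this.lastResult = await callLlm(pngBuffer); // screen changed: ask the model
      this.lastHash = hash; // recorded only after a successful call
    }
    return this.lastResult; // identical screen: reuse the previous answer
  }
}

// Idea behind #3 and #10: reuse an expensive lookup for a short TTL.
class TtlCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map();
  }

  async get(key, compute) {
    const hit = this.entries.get(key);
    if (hit && this.now() - hit.at < this.ttlMs) return hit.value; // still fresh
    const value = await compute(); // missing or expired: recompute
    this.entries.set(key, { value, at: this.now() });
    return value;
  }
}
```

`new TtlCache(2000)` for the accessibility context and `new TtlCache(30000)` for the taskbar query would match the TTLs listed in the table.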

Benchmarks (2560x1440)

Metric	v0.3 (VNC)	v0.4 (Native)	v0.4.1+ (Optimized)
Screenshot capture	~850ms	~50ms	~57ms
Screenshot size	~200KB	~120KB	~58KB
A11y context (uncached)	N/A	~600ms	~462ms
A11y context (cached)	N/A	0ms	0ms (2s TTL)
Delays (per step)	N/A	200-1500ms	50-600ms
System prompt tokens	N/A	~800	~300

Perf Tools

perf/apply-optimizations.ps1 — apply all patches
perf/perf-test.ts — benchmark harness (npx ts-node perf/perf-test.ts)

Security recommendations

Things to consider before installing:
1) This skill requires cloning and running code from the project's GitHub (npm install, setup, start). Review that repository and the startup scripts before running them.
2) It runs a local service that captures screenshots and can control the UI, which is powerful. Avoid giving it broad access to sensitive apps, or require explicit user confirmations for sensitive actions.
3) The skill inherits your agent's AI provider/API key (per SKILL.md): if you use a cloud provider, screenshots/text may be sent to that provider. Prefer a local provider (Ollama) if you want to keep data fully on-device.
4) If you proceed, run the install in a sandbox/VM or on a test machine first, verify what the service listens on, what it logs, and whether it auto-starts, and audit the npm dependencies.
5) Ask the publisher to resolve metadata mismatches (declare required binaries/env vars and clarify persistence/autostart) and to provide a reproducible install artifact (pinned release) rather than always cloning the main branch.

Functional analysis

Type: OpenClaw Skill · Name: clawd-cursor · Version: 0.6.3

The OpenClaw AgentSkills skill 'clawd-cursor' is designed for desktop GUI automation, providing the AI agent with capabilities to interact with applications, type, click, and navigate. While these capabilities are inherently powerful and could be misused, the `SKILL.md` documentation explicitly and repeatedly instructs the AI agent on critical safety measures, user consent, and privacy. It mandates asking the user for approval before sensitive actions (e.g., email, banking, deleting), binds its API to `127.0.0.1` only, and states that data sent to cloud AI providers goes only to the user's chosen provider, not to skill authors. There is no evidence of intentional data exfiltration to unauthorized parties, persistence mechanisms, or obfuscation, and the prompt injection surface is used to enforce safety rather than bypass it.

Capability assessment

ℹ Purpose & Capability

Name/description match the runtime instructions: this is a desktop GUI automation agent designed to control apps via screenshots and synthetic input. Requiring a local service that can take screenshots and send them to an AI provider is coherent with the stated purpose. However, the SKILL.md includes an explicit install flow (git clone + npm install + npm run setup + start) even though registry metadata lists no required binaries or env vars; that mismatch is notable but explainable (the skill needs Node/npm at install/run time).

⚠ Instruction Scope

The instructions direct the agent to run a local Clawd Cursor service that captures screenshots and performs clicks/typing — which necessarily gives broad access to whatever is on the screen. The doc says screenshots/text stay local or go only to the user's chosen AI provider, and that the skill inherits the active agent's API key, but there is no verifiable enforcement in the SKILL.md. The guidance to always ask before accessing sensitive apps is policy, not a technical constraint; the agent could be misconfigured or buggy and access sensitive apps. The SKILL.md also contains code snippets that require additional tooling (Playwright) and local ports (9222) which broaden its touch points.

⚠ Install Mechanism

Install steps in SKILL.md instruct cloning a GitHub repo and running npm install/setup and starting a service. Pulling and executing code from a remote repository is a moderate-to-high risk action because arbitrary code will be written to disk and executed. GitHub is a well-known host (better than an arbitrary URL), but npm install can bring many dependencies and native modules. The registry metadata's 'install specifications' were unknown/empty while SKILL.md contains explicit install commands — this discrepancy should be clarified by the publisher.

⚠ Credentials

The registry says 'no required env vars', but SKILL.md and notes state that in OpenClaw the skill 'inherits the active agent's AI provider + API key' and that screenshots/text may be sent to cloud providers. That means the agent's API key could be used by the Clawd Cursor process — a powerful credential for exfiltrating data to the configured provider. Requiring zero declared env vars while implicitly inheriting the agent's API key is a proportionality gap and should be explicitly documented and limited.

⚠ Persistence & Privilege

The skill starts a local REST service bound to 127.0.0.1 which will run on the user's machine and has GUI-level privileges (can read screenshots and synthesize input). 'always' is false (good), but installing and starting a background process still grants ongoing local capability to observe and control the UI. Binding to localhost reduces remote network exposure but does not eliminate local attack surface or misuse by other local processes. There is no explicit guarantee about auto-start/boot persistence in SKILL.md.

How to use

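Make sure OpenClaw is installed (local or Docker deployment)
Type the install command in the chat box: /install clawd-cursor
Once installed, invoke the skill by name or trigger it with /clawd-cursor
Provide the inputs described in the skill's parameter documentation to get structured output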

Version history

v0.6.3

- Updated install instructions: now uses `npm run setup` and the CLI command `clawdcursor` in place of previous `npx clawd-cursor` and `npm start`.
- Clarified OpenClaw routing logic: Clawd Cursor should only be used after OpenClaw's native tools/skills are attempted.
- Added guidance on distinguishing tools vs skills, and best practices for integrating with OpenClaw workflows.
- Clarified the role of Clawd Cursor as the GUI execution layer, including updated recommendations for task escalation.
- Expanded documentation for when and how to use Clawd Cursor versus direct (API/CLI/browser) automation, minimizing overlap.

v0.6.0

- Removed the direct check for node and npm binaries to improve compatibility on Windows; install instructions remain unchanged.
- Updated usage guidance: now emphasizes obtaining user consent before accessing sensitive apps (email, banking, private messaging, password managers, admin panels).
- Clarified that agents must respect safety tiers; confirm actions always require explicit user approval.
- Adjusted instruction language and scenarios on when to use desktop automation versus direct APIs or command-line tools.
- Internal notes added for OpenClaw inheritance of AI provider and key; credential field removed for stand-alone use.
- REST API health check updated for new version reporting ("0.6.0").

v0.5.5

Clawd Cursor 0.5.5
- Added: Clear instructions that ClawdCursor is the fallback for any UI task that other agents can't handle.
- Changed: Emphasized read-only autonomy unless explicitly instructed by the user (never send, delete, or modify without permission).
- Updated: Confirmation steps (safety-gated actions must always ask the user before proceeding; do not self-approve).
- Improved: REST API examples and polling workflow; version bumped in documentation.
- Fixed: Skill name typo corrected in metadata.

v0.5.4

clawd-cursor v0.5.4
- Updated version number to 0.5.4.
- Added homepage and source repository links to metadata.
- Clarified privacy details: REST API now explicitly noted as binding only to 127.0.0.1 and being non-network-accessible.
- Updated example output in REST API health check to match new version.
- Minor improvements to API documentation and privacy wording for clarity.

v0.5.3

No file changes detected.
- Version bump from 0.5.1 to 0.5.3.
- Documentation updates: rewritten SKILL.md for clearer usage guidelines, API references, and agent instructions.
- No code changes included in this release.

v0.5.1

**Clawd Cursor v0.5.1: Major multi-platform upgrade with smarter, faster, and privacy-focused desktop automation.**
- Added Smart Interaction Layer, reducing browser task LLM token use by 95% (1 call instead of 18)
- Introduced CDP (Chrome DevTools Protocol) and native UI drivers for fast, free browser/OS interaction
- Full macOS support: native accessibility and UI automation with JXA/AppleScript
- New "doctor" tool for auto-configuration, provider/model detection, and update checks
- Enhances privacy: all screenshots/data remain on the user's machine; runs 100% local with Ollama
- Improved self-healing pipeline: auto-selects best execution layer and falls back on failure

v0.4.1

v0.4.1
- Screenshots are now held in memory only and not saved to disk by default
- Opt-in debug mode enables screenshot disk saves via `--debug` flag
- API now binds to localhost (127.0.0.1) for increased security
- Removed over 2,800 lines of legacy VNC code
- No postinstall scripts: `npm install` only fetches dependencies

v0.4.0

v0.4.0 (summary: Major upgrade to native desktop control, no more VNC dependencies)
- Switched to @nut-tree-fork/nut-js for direct Windows/Mac control, eliminating the need for a VNC server.
- Greatly improved speed: screenshots 17× faster (~50ms), connection 5× faster (~38ms).
- Simplified setup: install and run with npm; no external server or setup script required.
- Updated instructions and environment variables to reflect new native workflow.

v0.3.3

No code changes in this release; documentation only.
- No file or code modifications detected.
- SKILL.md and related documentation remain unchanged in version 0.3.3.

v0.3.2

Changelog v0.3.3
- Bulletproof headless setup: setup.ps1 now runs end-to-end in non-interactive agent shells
- Generates a random VNC password if not given interactively
- Fixed msiexec crash with improved error handling and window hiding
- Fixed Start-Service post-install crash with dedicated error handling
- Replaced emoji with ASCII for compatibility with cp1252 headless terminals

v0.3.1

clawd-cursor v0.3.1
- Improved documentation: README reorganized and rewritten for clarity and security considerations.
- Explicit privacy notes: warns users screenshots are sent to AI providers, and highlights sensitive credential requirements.
- Installation and setup instructions clarified, especially Windows automation via setup.ps1.
- API and execution path sections streamlined for easier usage and quick-start.
- Security and safety tier guidance now emphasized for safe real-world deployment.

v0.3.0

- Major update: Added comprehensive SKILL.md with installation, configuration, usage, execution paths, safety controls, API, CLI, and troubleshooting instructions.
- Clear guidance provided for both Windows and cross-platform setup.
- Detailed explanation of Anthropic (Computer Use) and OpenAI (Action Router) execution modes.
- Safety tiers and their behaviors are documented.
- API endpoints and CLI arguments are now fully described.
- Prerequisites and troubleshooting steps included for easier onboarding and debugging.

Metadata

Slug clawd-cursor

Version 0.6.3

License —

Total installs 9

Current installs 8

Versions 12

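FAQ

What is ClawdCursor?

AI desktop agent — control any app on Windows/macOS from your OpenClaw agent. Send natural language tasks to the Clawd Cursor API and it handles everything:... It is an AI agent skill plugin for Claude Code / OpenClaw, with 1204 total downloads so far.

How do I install ClawdCursor?

Run the command /install clawd-cursor in the OpenClaw or Claude Code chat box to install it in one step, with no extra configuration.

Is ClawdCursor free?

Yes. ClawdCursor is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does ClawdCursor support?

ClawdCursor is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops ClawdCursor?

Developed and maintained by AmrDab (@amrdab). Current version: v0.6.3.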