amrdab

ClawdCursor

By AmrDab · GitHub · v0.7.5 · MIT-0
cross-platform ⚠ suspicious
Total downloads: 142
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install clawdcursor
Description
OS-level desktop automation tool server. 42 tools for controlling any application on Windows, macOS, and Linux. Model-agnostic — works with any AI that can d...
Usage (SKILL.md)

If a human can do it on a screen, you can too. No API? No integration? No problem.

## USE AS A FALLBACK — NOT FIRST CHOICE

Before reaching for any ClawdCursor tool, ask:

1. Is there a native API? (Gmail API, GitHub API, Slack API) → use the API
2. Is there a CLI? (git, npm, curl) → use the CLI
3. Can you edit the file directly? → do that
4. Is there a browser automation layer? (Playwright, Puppeteer) → use that

None of the above work? Now use ClawdCursor. It's for the last mile.

---

## Modes at a Glance

| Mode | Command | Brain | Tools available |
|------|---------|-------|----------------|
| `serve` | `clawdcursor serve` | You (REST client) | All 42 tools via HTTP |
| `mcp` | `clawdcursor mcp` | You (MCP client) | All 42 tools via MCP stdio |
| `start` | `clawdcursor start` | Built-in LLM pipeline | All 42 tools + autonomous agent |

In `serve` and `mcp` modes: you reason, ClawdCursor acts. There is no built-in LLM. You call tools, interpret results, decide next steps.

---

## Connecting

### Option A — REST (`clawdcursor serve`)

```bash
clawdcursor serve        # starts on http://127.0.0.1:3847
```

All POST endpoints require: `Authorization: Bearer <token>` (token saved to `~/.clawdcursor/token`)

```
GET  /tools              → all tool schemas (OpenAI function-calling format)
POST /execute/{name}     → run a tool: {"param": "value"}
GET  /health             → {"status":"ok","version":"0.7.5"}
GET  /docs               → full documentation
```

Example:
```
POST /execute/get_windows     {}
POST /execute/mouse_click     {"x": 640, "y": 400}
POST /execute/type_text       {"text": "hello world"}
```

If the server isn't running, start it yourself — don't ask the user:
```bash
clawdcursor serve
# wait 2 seconds, then verify: GET /health
```
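
As a rough illustration of the REST flow above, the following Python sketch builds an authenticated `POST /execute/{name}` request with the standard library. The helper names `read_token`, `build_execute_request`, and `execute` are hypothetical. The base URL, token path, and header format come from this section; the JSON `Content-Type` header and the shape of the reply are assumptions, so the sketch only decodes the reply as JSON without relying on any field.

```python
import json
import urllib.request
from pathlib import Path

BASE_URL = "http://127.0.0.1:3847"                   # default address from this section
TOKEN_PATH = Path.home() / ".clawdcursor" / "token"  # token location from this section


def read_token(path: Path = TOKEN_PATH) -> str:
    """Read the bearer token that the server saves on disk."""
    return path.read_text(encoding="utf-8").strip()


def build_execute_request(name: str, params: dict, token: str,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /execute/{name} request."""
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/execute/{name}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the server expects a JSON request body.
            "Content-Type": "application/json",
        },
    )


def execute(name: str, params: dict, timeout: float = 30.0):
    """Send the request and decode the JSON reply (its fields are not assumed)."""
    request = build_execute_request(name, params, read_token())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `execute("mouse_click", {"x": 640, "y": 400})` would correspond to the second example request above.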

### Option B — MCP (`clawdcursor mcp`)

```json
{
  "mcpServers": {
    "clawdcursor": {
      "command": "clawdcursor",
      "args": ["mcp"]
    }
  }
}
```

Works with Claude Code, Cursor, Windsurf, Zed, or any MCP-compatible client. All 42 tools are exposed identically.

### Option C — Autonomous agent (`clawdcursor start`)

```
POST /task    {"task": "Open Notepad and write Hello"}   → submit task
GET  /status  → {"status": "acting"} | "idle" | "waiting_confirm"
POST /confirm {"approved": true}                         → approve safety-gated action
POST /abort                                              → stop current task
```

Use the `delegate_to_agent` tool to submit tasks from within MCP/REST sessions. Requires `clawdcursor start` running on port 3847.

**Polling pattern:**
```
POST /task  {"task": "...", "returnPartial": true}
→ poll GET /status every 2s:
    "acting"           → still running, keep polling
    "waiting_confirm"  → STOP. Ask user → POST /confirm {"approved": true}
    "idle"             → done, check GET /task-logs for result
→ if 60s+ with no progress: POST /abort, retry with simpler phrasing
```
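
The polling pattern above could be sketched in Python roughly as follows. All four callbacks are hypothetical stand-ins for the HTTP calls (`GET /status`, a prompt to the human, `POST /confirm`, `POST /abort`), injected so the loop stays independent of any HTTP client. Two simplifications are assumptions of this sketch: "no progress" is approximated by total polling time, and a declined confirmation is handled with `POST /abort`, since only `{"approved": true}` is documented for `/confirm`.

```python
import time
from typing import Callable


def poll_until_idle(get_status: Callable[[], str],
                    ask_user: Callable[[], bool],
                    confirm: Callable[[], None],
                    abort: Callable[[], None],
                    interval: float = 2.0,
                    timeout: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the agent is idle; return "done", "rejected" or "aborted"."""
    waited = 0.0
    while waited < timeout:
        status = get_status()
        if status == "idle":
            return "done"                 # caller then reads GET /task-logs
        if status == "waiting_confirm":
            # Never self-approve: the decision must come from the human.
            if ask_user():
                confirm()                 # POST /confirm {"approved": true}
            else:
                abort()                   # declined: stop the task instead
                return "rejected"
        sleep(interval)
        waited += interval
    abort()                               # budget exhausted: POST /abort
    return "aborted"
```

A caller that gets `"aborted"` back would then resubmit the task with simpler phrasing, as the pattern suggests.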

**returnPartial mode** — send `{"returnPartial": true}` with POST /task:
ClawdCursor skips Stage 3 (expensive vision) and returns control to you if Stage 2 fails:
```json
{"partial": true, "stepsCompleted": [...], "context": "got stuck on dialog"}
```
You finish the task with MCP tools, then call POST /learn to save what worked.

**POST /learn — adaptive learning:**
After completing a task with your own tool calls, teach ClawdCursor for next time:
```json
POST /learn
{
  "processName": "EXCEL",
  "task": "create table with headers",
  "actions": [
    {"action": "key", "description": "Ctrl+Home to go to A1"},
    {"action": "type", "description": "Type header name"},
    {"action": "key", "description": "Tab to next column"}
  ],
  "shortcuts": {"next_cell": "Tab", "next_row": "Enter"},
  "tips": ["Use Tab between columns, Enter between rows"]
}
```
This enriches the app's guide JSON. Stage 2 reads it on the next run — no vision fallback needed.

---

## The Universal Loop

Every GUI task follows the same pattern regardless of transport:

```
1. ORIENT  →  read_screen() or get_windows()          see what's open and focused
2. ACT     →  smart_click() / smart_type() / key_press()   do the thing
3. VERIFY  →  check return value → window state → text check → screenshot
4. REPEAT  →  until done
```

### Verification (cheapest to most expensive)

1. **Tool return value** — every tool reports success/failure. Check it first.
2. **Window state** — `get_active_window()`, `get_windows()` — did a dialog appear? Did the title change?
3. **Text check** — `read_screen()` or `smart_read()` — is the expected text visible?
4. **Screenshot** — `desktop_screenshot()` — only when text methods fail. Costs the most.
5. **Negative check** — look for error dialogs, wrong window, unchanged screen.

**Always verify** after: sends, saves, deletes, form submissions.
**Skip verification** for: mid-sequence keystrokes, scrolling.
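
One way to picture the cheapest-first verification order is a small ladder that stops at the first conclusive check. This is only a sketch: `verify` and the `Check` type are hypothetical names, and each check is an injected callable because the tools' return formats are not specified here. A check returns `True` (confirmed), `False` (contradicted), or `None` (inconclusive, so move on to the next, more expensive check).

```python
from typing import Callable, Optional, Sequence, Tuple

# (label, check) pairs, ordered from cheapest to most expensive.
Check = Tuple[str, Callable[[], Optional[bool]]]


def verify(checks: Sequence[Check]) -> Tuple[Optional[bool], str]:
    """Run checks in order and stop at the first conclusive one.

    Returns (outcome, label of the check that decided it). If every
    check is inconclusive, returns (None, "inconclusive").
    """
    for label, check in checks:
        outcome = check()
        if outcome is not None:
            return outcome, label
    return None, "inconclusive"
```

In use, the list would hold wrappers around the tool return value, `get_active_window()`, `read_screen()`, and finally `desktop_screenshot()`, so the screenshot is only taken when the cheaper checks cannot decide.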

---

## Tool Decision Trees

### Perception — always start here

```
read_screen()          → FIRST. Accessibility tree: buttons, inputs, text, with coords.
                          Fast, structured, works on native apps.
ocr_read_screen()      → When a11y tree is empty (canvas UIs, image-based apps).
smart_read()           → Combines OCR + a11y. Good first call when unsure.
desktop_screenshot()   → LAST RESORT. Only when you need pixel-level visual detail.
desktop_screenshot_region(x,y,w,h) → Zoomed crop when you need detail in one area.
```

### Clicking

```
smart_click("Save")              → FIRST. Finds by label/text via OCR + a11y, clicks.
                                   Pass processId to target the right window.
invoke_element(name="Save")      → When you know the exact automation ID from read_screen.
cdp_click(text="Submit")         → Browser elements. Requires cdp_connect() first.
mouse_click(x, y)                → LAST RESORT. Raw coordinates from a screenshot.
```
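
The clicking order above can be read as a fallback chain. The sketch below shows that idea for a desktop (non-browser) target, so `cdp_click` is left out. `click_with_fallback` and the `ToolCaller` signature are hypothetical: `call_tool` stands for whatever transport you use (REST or MCP) and is assumed to return `True` on success. The `text` parameter name for `smart_click` is an assumption; `name` for `invoke_element` and `x`/`y` for `mouse_click` follow the examples in this document.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical transport hook: (tool name, params) -> True on success.
ToolCaller = Callable[[str, Dict[str, Any]], bool]


def click_with_fallback(call_tool: ToolCaller, label: str,
                        coords: Optional[Tuple[int, int]] = None) -> Optional[str]:
    """Try the preferred click tools in order; return the one that worked."""
    attempts = [
        ("smart_click", {"text": label}),      # first choice ("text" is assumed)
        ("invoke_element", {"name": label}),   # by automation ID or name
    ]
    if coords is not None:
        # Last resort: raw coordinates, only if the caller has them.
        attempts.append(("mouse_click", {"x": coords[0], "y": coords[1]}))
    for tool, params in attempts:
        if call_tool(tool, params):
            return tool
    return None
```

Returning the name of the tool that succeeded makes it easy to log how often the last-resort path is being taken.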

### Typing

```
smart_type("Email", "[email protected]")  → FIRST. Finds field by label, focuses, types.
cdp_type(label="Email", text="…")  → Browser inputs. Requires cdp_connect() first.
type_text("hello")                 → Clipboard paste into whatever is focused.
                                     Use after manually focusing with smart_click.
```

### Browser / CDP

```
1. navigate_browser(url)     → opens URL, auto-enables CDP
2. cdp_connect()             → connect to browser DevTools Protocol
3. cdp_page_context()        → list interactive elements on page
4. cdp_read_text()           → extract DOM text (returns empty on canvas apps → use OCR)
5. cdp_click(text="…")       → click by visible text
6. cdp_type(label, text)     → fill input by label
7. cdp_evaluate(script)      → run JavaScript in page context
8. cdp_scroll(direction, px) → scroll page via DOM (not mouse wheel)
9. cdp_list_tabs()           → list all open tabs
10. cdp_switch_tab(target)   → switch to a specific tab
```

If CDP isn't connected, switch tabs with keyboard:
```
key_press("ctrl+1")          → tab 1
key_press("ctrl+tab")        → next tab
key_press("ctrl+shift+tab")  → previous tab
```

### Window Management

```
get_windows()                         → list all open windows (use to find PIDs)
get_active_window()                   → what's in the foreground right now
focus_window(processName="Discord")   → bring to front (auto-minimizes phantom off-screen windows)
minimize_window(processName="calc")   → minimize a window — 1 call, cross-platform
                                        also accepts: processId, title
```

**Rule:** Always `focus_window()` before `key_press()` or `type_text()`. Keystrokes go to whatever has focus; if that is your terminal, they land there instead of in the target app.
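
A small wrapper can make the focus-first rule hard to forget. This is a sketch under assumptions: `send_keys` and `ToolCaller` are hypothetical names, `call_tool` is assumed to return `True` on success, and `key` is an assumed parameter name for `key_press` (`processName` for `focus_window` follows the listing above).

```python
from typing import Any, Callable, Dict, Sequence

# Hypothetical transport hook: (tool name, params) -> True on success.
ToolCaller = Callable[[str, Dict[str, Any]], bool]


def send_keys(call_tool: ToolCaller, process_name: str,
              combos: Sequence[str]) -> bool:
    """Focus the target window, then send each key combo in order."""
    # Focus first: without it, keystrokes go to whatever currently has focus.
    if not call_tool("focus_window", {"processName": process_name}):
        return False
    for combo in combos:
        # "key" is an assumed parameter name for key_press.
        if not call_tool("key_press", {"key": combo}):
            return False
    return True
```

If focusing fails, no keystrokes are sent at all, which is the safe outcome when the target window cannot be found.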

### Canvas apps (Google Docs, Figma, Notion)

DOM has no readable text. Pattern:
```
ocr_read_screen()          → read content (DOM extraction fails)
mouse_click(x, y)          → click into the canvas area
type_text("your text")     → clipboard paste works even on canvas
```

---

## Quick Patterns

**Open app and type:**
```
open_app("notepad") → wait(2) → smart_read() → type_text("Hello") → smart_read()
```

**Read a webpage:**
```
navigate_browser(url) → wait(3) → cdp_connect() → cdp_read_text()
```

**Fill a web form:**
```
cdp_connect() → cdp_type("Email", "[email protected]") → cdp_type("Password", "…") → cdp_click("Submit")
```

**Cross-app copy/paste:**
```
focus_window("Chrome") → key_press("ctrl+a") → key_press("ctrl+c")
→ read_clipboard() → focus_window("Notepad") → type_text(clipboard)
```

**Send email via Outlook:**
```
open_app("outlook") → wait(2) → smart_click("New Email")
→ mouse_click(to_field_x, to_field_y) → type_text("[email protected]") → key_press("Tab")
→ mouse_click(subject_x, subject_y) → type_text("Subject") → key_press("Tab")
→ mouse_click(body_x, body_y) → type_text("Body text")
→ mouse_click(send_x, send_y)
```

**Autonomous complex task (requires `clawdcursor start`):**
```
delegate_to_agent("Open Gmail, find latest email from Stripe, forward to [email protected]")
→ poll GET /status every 2s
→ if waiting_confirm: ask user → POST /confirm {"approved": true}
→ if idle: task done
```

---

## Full Tool Reference (42 tools)

Speed: ⚡ Free/instant · 🔵 Cheap · 🟡 Moderate · 🔴 Vision (expensive)

### Perception (6)
| Tool | What it does | When |
|------|-------------|------|
| `read_screen` | A11y tree — buttons, inputs, text, coords | ⚡ Default first read |
| `smart_read` | OCR + a11y combined | 🔵 When unsure which to use |
| `ocr_read_screen` | Raw OCR text with bounding boxes | 🔵 Canvas UIs, empty a11y trees |
| `desktop_screenshot` | Full screen image (1280px wide) | ⚡ Last resort visual check |
| `desktop_screenshot_region` | Zoomed crop of specific area | ⚡ Fine-grained visual detail |
| `get_screen_size` | Screen dimensions and DPI | ⚡ Coordinate calculations |

### Mouse (7)
| Tool | What it does | When |
|------|-------------|------|
| `smart_click` | Find element by text/label, click | 🔵 First choice for clicking |
| `mouse_click` | Left click at (x, y) | ⚡ Last resort |
| `mouse_double_click` | Double click at (x, y) | ⚡ Open files, select words |
| `mouse_right_click` | Right click at (x, y) | ⚡ Context menus |
| `mouse_hover` | Move cursor without clicking | ⚡ Hover menus |
| `mouse_scroll` | Scroll at position (physical mouse wheel) | ⚡ Scroll content |
| `mouse_drag` | Drag from start to end — accepts `startX/startY/endX/endY` or `x1/y1/x2/y2` | ⚡ Resize, select ranges |

### Keyboard (5)
| Tool | What it does | When |
|------|-------------|------|
| `smart_type` | Find input by label, focus it, type | 🔵 First choice for form fields |
| `type_text` | Clipboard paste into focused element | ⚡ After manually focusing |
| `key_press` | Send key combo (`ctrl+s`, `Return`, `alt+tab`) | ⚡ After focus_window |
| `shortcuts_list` | List keyboard shortcuts for current app | ⚡ Before reaching for mouse |
| `shortcuts_execute` | Run a named shortcut (fuzzy match) | ⚡ Save, copy, paste, undo |

### Window Management (5)
| Tool | What it does | When |
|------|-------------|------|
| `get_windows` | List all open windows with PIDs and bounds | ⚡ Situational awareness |
| `get_active_window` | Current foreground window | ⚡ Check current focus |
| `get_focused_element` | Element with keyboard focus | ⚡ Debug wrong-field typing |
| `focus_window` | Bring window to front (auto-clears off-screen phantoms) | ⚡ Always before key_press |
| `minimize_window` | Minimize by processName, processId, or title | ⚡ Clear focus stealers |

### UI Elements (2)
| Tool | What it does | When |
|------|-------------|------|
| `find_element` | Search UI tree by name or type | ⚡ Find automation IDs |
| `invoke_element` | Invoke element by automation ID or name | ⚡ When ID known from read_screen |

### Clipboard (2)
| Tool | What it does | When |
|------|-------------|------|
| `read_clipboard` | Read clipboard text | ⚡ After copy operations |
| `write_clipboard` | Write text to clipboard | ⚡ Before paste operations |

### Browser / CDP (11)
| Tool | What it does | When |
|------|-------------|------|
| `cdp_connect` | Connect to browser DevTools Protocol | ⚡ First step for any browser task |
| `cdp_page_context` | List interactive elements on page | ⚡ After connect |
| `cdp_read_text` | Extract DOM text | ⚡ Read page content |
| `cdp_click` | Click by CSS selector or visible text | ⚡ Browser clicks |
| `cdp_type` | Type into input by label or selector | ⚡ Browser form filling |
| `cdp_select_option` | Select dropdown option | ⚡ Select elements |
| `cdp_evaluate` | Run JavaScript in page context | ⚡ Custom queries |
| `cdp_scroll` | Scroll page via DOM (`direction`, `amount` px) | ⚡ DOM-level scroll |
| `cdp_wait_for_selector` | Wait for element to appear | ⚡ After navigation/AJAX |
| `cdp_list_tabs` | List all browser tabs | ⚡ When on wrong tab |
| `cdp_switch_tab` | Switch to a tab by title or index | ⚡ After cdp_list_tabs |

### Orchestration (4)
| Tool | What it does | When |
|------|-------------|------|
| `open_app` | Launch application by name | ⚡ First step for desktop tasks |
| `navigate_browser` | Open URL (auto-enables CDP) | ⚡ First step for browser tasks |
| `wait` | Pause N seconds | ⚡ After opening apps, let UI render |
| `delegate_to_agent` | Send task to built-in autonomous agent | 🟡 Complex multi-step tasks (requires `clawdcursor start`) |

---

## Provider Setup (agent mode only)

| Provider | Setup | Cost |
|----------|-------|------|
| **Ollama** (local) | `ollama pull qwen2.5:7b && ollama serve` | $0 — fully offline, no data leaves machine |
| **Any cloud** | Set env var: `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GEMINI_API_KEY`, `MOONSHOT_API_KEY`, etc. | Varies |
| **OpenClaw users** | Auto-detected from `~/.openclaw/agents/main/auth-profiles.json` | No extra setup |

Run `clawdcursor doctor` to auto-detect and validate providers.

---

## Security

- **Network isolation:** Binds to `127.0.0.1` only. Verify on Windows with `netstat -an | findstr 3847` (on macOS or Linux, use `grep` in place of `findstr`). The output should show `127.0.0.1:3847`, never `0.0.0.0:3847`.
- **Ollama:** 100% offline. Screenshots stay in RAM, never leave the machine.
- **Cloud providers:** Screenshots/text sent only to your configured provider. No telemetry, no analytics, no third-party logging.
- **Token auth:** All mutating POST endpoints require `Authorization: Bearer <token>`. Token at `~/.clawdcursor/token`.
- **Safety tiers:** Auto / Preview / Confirm. Agents must **never self-approve Confirm actions**.

---

## Coordinate System

All mouse tools use **image-space coordinates** from a 1280px-wide viewport — matching screenshots from `desktop_screenshot`. DPI scaling is handled automatically. Do not pre-scale coordinates.
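
Coordinates read off a `desktop_screenshot` image need no conversion. The sketch below only applies when a position comes from some other source that reports physical screen pixels, and it rests on an assumption this document does not confirm: that the 1280px-wide image is a uniform downscale of the full screen. The function name `to_image_space` is hypothetical.

```python
IMAGE_WIDTH = 1280  # width of the image space described above


def to_image_space(x_px: float, y_px: float, screen_width_px: int) -> tuple:
    """Map a physical-pixel position into the 1280px-wide image space.

    Assumes the screenshot is a uniform (aspect-preserving) downscale of
    the whole screen, so one scale factor applies to both axes.
    """
    if screen_width_px <= 0:
        raise ValueError("screen_width_px must be positive")
    x_img = round(x_px * IMAGE_WIDTH / screen_width_px)
    y_img = round(y_px * IMAGE_WIDTH / screen_width_px)
    return x_img, y_img
```

For a 3840px-wide screen the factor is 1280 / 3840 = 1/3, so the physical point (1920, 1080) maps to (640, 360) in image space.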

---

## Safety

| Tier | Actions | Behavior |
|------|---------|----------|
| 🟢 Auto | Navigation, reading, opening apps | Runs immediately |
| 🟡 Preview | Typing, form filling | Logged |
| 🔴 Confirm | Send, delete, purchase | Pauses — **always ask user first** |

- **Never self-approve Confirm actions.**
- `Alt+F4` and `Ctrl+Alt+Delete` are blocked.
- Server binds to `127.0.0.1` only.
- First run requires explicit user consent for desktop control.

---

## Error Recovery

| Problem | Fix |
|---------|-----|
| Port 3847 not responding | `clawdcursor serve` — wait 2s — `GET /health` |
| 401 Unauthorized | Token changed — read `~/.clawdcursor/token` and use fresh value |
| CDP not available | Chrome must be open. `navigate_browser(url)` auto-enables it. |
| CDP on wrong tab | `cdp_list_tabs()` → `cdp_switch_tab(target)` |
| `focus_window` fails | `get_windows()` to confirm title/processName, then retry |
| `smart_click` can't find element | `read_screen()` for coords → `mouse_click(x, y)` |
| `key_press` goes to wrong window | You skipped `focus_window` — always focus first |
| `cdp_read_text` returns empty | Canvas app — use `ocr_read_screen()` instead |
| Same action fails 3+ times | Try a completely different approach |
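
The last row of the table (stop repeating an action after three failures and switch approach) can be expressed as a bounded retry loop over alternative strategies. This is a minimal sketch: `run_with_strategy_switch` is a hypothetical helper, and each strategy is an injected callable that returns `True` on success.

```python
from typing import Callable, Optional, Sequence, Tuple

# (label, action) pairs; each action returns True when it succeeded.
Strategy = Tuple[str, Callable[[], bool]]


def run_with_strategy_switch(strategies: Sequence[Strategy],
                             max_attempts: int = 3) -> Optional[str]:
    """Try each strategy up to max_attempts times before moving on.

    Returns the label of the strategy that succeeded, or None if every
    strategy exhausted its attempts.
    """
    for label, action in strategies:
        for _ in range(max_attempts):
            if action():
                return label
        # This approach failed max_attempts times: switch to the next one.
    return None
```

A `None` result means every approach was exhausted, which is a reasonable point to stop and report back to the user.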

---

## Platform Support

| Platform | A11y | OCR | CDP |
|----------|------|-----|-----|
| Windows (x64/ARM64) | PowerShell + .NET UIA | Windows.Media.Ocr | Chrome/Edge |
| macOS (Intel/Apple Silicon) | JXA + System Events | Apple Vision | Chrome/Edge |
| Linux (x64/ARM64) | AT-SPI | Tesseract | Chrome/Edge |

**macOS:** Grant Accessibility in System Settings → Privacy & Security → Accessibility.
**Linux:** `sudo apt install tesseract-ocr` for OCR support.
Security recommendations
This skill appears to be what it says (a local desktop automation server), but it is powerful and should be treated like installing a program that can see and control your screen. Before installing: 1) Review the GitHub source and confirm the npm package name/version match the repo; 2) Prefer running it in a disposable VM or isolated account if you have sensitive data; 3) Understand it stores a token at ~/.clawdcursor/token and can take screenshots/read screen contents and (in 'start' mode) send them to your configured AI provider — verify where those provider credentials live and whether you trust that flow; 4) If you allow agent autonomous actions, consider disabling autonomous invocation or require explicit user confirmation before starting the server; 5) If unsure, use native APIs/CLIs/browser automation instead of screen-level automation.
Capability analysis
Type: OpenClaw Skill
Name: clawdcursor
Version: 0.7.5
The clawdcursor skill provides extensive OS-level desktop automation capabilities, including screen capture, clipboard access, keystroke injection, and browser manipulation via Chrome DevTools Protocol (cdp_evaluate). While the documentation in SKILL.md emphasizes local execution and safety tiers, the tool grants the agent broad control over the host system, which constitutes a high-risk attack surface. The instructions specifically direct the agent to autonomously start the server (clawdcursor serve) and bypass user prompts for certain actions, which could be exploited if the agent's reasoning is compromised.
Capability tags
crypto, can-make-purchases
Capability assessment
Purpose & Capability
Name, description and runtime instructions consistently describe an OS-level desktop automation server. The npm global install and the provided serve/mcp/start modes match the stated purpose of controlling GUIs and exposing tools over localhost.
Instruction Scope
SKILL.md instructs the agent to start the local server autonomously if it's not running ('don't ask the user'). The tool exposes functionality to read the screen, take screenshots, query windows, and automate input — which is expected for this purpose but carries broad access to sensitive local data. It also documents a token file (~/.clawdcursor/token) and an autonomous 'start' mode that will send screenshots/text to the user's configured AI provider, which could result in data leaving the machine depending on configuration.
Install Mechanism
Installation is via 'npm install -g clawdcursor' (documented in SKILL.md). Installing a global npm CLI is a typical distribution method for Node-based desktop tooling, but it runs third-party code with filesystem/exec privileges. The registry metadata shows three 'unknown' install specs (parser couldn't identify them) — not necessarily malicious but worth verifying the exact install steps and source before running.
Credentials
The skill declares no required environment variables or credentials, but it uses a token saved at ~/.clawdcursor/token for its REST endpoints and relies on the user's configured AI provider (which implies provider credentials held elsewhere). The skill itself does not request unrelated secrets, but it has the capability to read local files and capture screen contents — a high-privilege capability that is proportionate to desktop automation but sensitive in practice.
Persistence & Privilege
always:false (good) and autonomous invocation is allowed (normal), but SKILL.md explicitly instructs the agent to start the server without asking the user. That gives the agent the ability to launch a long-running local service that can capture and transmit desktop data (depending on configuration). It does not appear to modify other skills or system-wide agent settings, however.
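How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install clawdcursor
  3. Once installed, call the Skill by name or trigger it with /clawdcursor
  4. Provide the inputs described in the Skill's parameter documentation to get structured output
Version history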
v0.7.5
Clawd Cursor 0.7.5 — No code changes in this release. - Version increment only; no file or feature changes detected. - Behavior and interface remain identical to previous version (0.6.3).
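Metadata
Slug clawdcursor
Version 0.7.5
License MIT-0
Total installs 0
Current installs 0
Version count 1
FAQ

What is ClawdCursor?

OS-level desktop automation tool server. 42 tools for controlling any application on Windows, macOS, and Linux. Model-agnostic — works with any AI that can d... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 142 downloads to date.

How do I install ClawdCursor?

Run the command "/install clawdcursor" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is ClawdCursor free?

Yes. ClawdCursor is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does ClawdCursor support?

ClawdCursor is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops ClawdCursor?

It is developed and maintained by AmrDab (@amrdab); the current version is v0.7.5.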
