← 返回 Skills 市场
kyzcreig

Peekaboo Linux Gnome Wayland Computer Use

作者 Kyzcreig · GitHub ↗ · v1.0.2 · MIT-0
darwinlinux ⚠ suspicious
83
总下载
0
收藏
0
当前安装
3
版本数
在 OpenClaw 中安装
/install peekaboo-linux-gnome-wayland-computer-use
功能描述
See and control GNOME Wayland desktops using gnome-screenshot, ydotool, Window Calls, AT-SPI hybrid element coordinates, and optional GNOME Remote Desktop. T...
使用说明 (SKILL.md)

peekaboo-linux-gnome-wayland-computer-use

The Linux/Wayland counterpart to macOS computer-use — "see and poke the screen of a box for troubleshooting or automation." On macOS that's trivial (CGWindowList + CGEvent + AX tree). On GNOME/Wayland it's locked down by design, but it can be made to work end-to-end:

  • Seegnome-screenshot (re-enabled on GNOME 49+ via a small GNOME extension).
  • Actydotool (writes to /dev/uinput, no root).
  • Target elements → AT-SPI WINDOW-relative extents + the Window Calls extension's window origin, composed into real absolute click coordinates (works where naive AT-SPI returns (0,0)).
  • Hands-on (optional) → GNOME Remote Desktop in Remote-Login mode for a human at an RDP client.

Both the "see" and "act" sides work natively (an agent process running ON the box) and remotely (from another machine over SSH). A native agent is a single actor — it can screenshot, analyze, then ydotool-click in one loop. No second machine, no human, no RDP roundtrip.

Context tags below — [NATIVE] = an agent process running ON the GNOME box (sharing the desktop session of user@1000); [REMOTE] = on another machine, reaching the box over SSH.

Verified on Ubuntu 26.04 LTS + GNOME Shell 50 (Wayland), NVIDIA + AMD multi-GPU, multi-monitor. Substitute your own host/user where the examples use USER@HOST and /run/user/UID.

Capability matrix

Capability Native (on box) Remote (from another machine)
Screenshot yes — gnome-screenshot -f out.png (extension unlocks it) yes — same over SSH, then scp the file
Inject input (mouse/keys) yes — ydotool via /dev/uinput (no root) yes — over SSH, same ydotool
Click element by role/name yes — locate-element (AT-SPI WINDOW extents + Window Calls origin) yes — same over SSH
Full see→reason→act loop yes — single actor, no RDP yes — drive over SSH
Live interactive desktop n/a yes — RDP client (optional, for a human)

Target environment assumptions

  • A GNOME/Wayland box with a logged-in graphical session (the GUI session must be active on a seat).
  • For remote use: SSH access (key-based recommended), and the box's session env exported in non-login shells — XDG_RUNTIME_DIR=/run/user/UID, DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/UID/bus, WAYLAND_DISPLAY=wayland-0, DISPLAY=:0. (Replace UID with the session user's numeric id, e.g. 1000.)
  • If agents run under systemd --user with the full graphical session env + lingering enabled, they inherit all of the above and survive SSH disconnects.
  • gnome-screenshot grabs the full virtual desktop (all monitors stitched). On multi-4K setups that's huge — use -w (focused window) or the per-display helper below.

CAPABILITY A — Screenshot (the key unlock)

GNOME 49+ revoked gnome-screenshot's access to the private Shell screenshot API, so on a stock box every headless capture path is dead (see "why" below). The fix is the allow-gnome-screenshot extension (https://github.com/siddhpant/allow-gnome-screenshot, GNOME 49/50): it re-adds org.gnome.Screenshot to the Shell's _senderChecker._allowlistMap, so the gnome-screenshot binary specifically regains access. It is scoped to that one sender — it does NOT enable global unsafe_mode.

[NATIVE] — just call it (session env already present):

gnome-screenshot -f /tmp/shot.png        # full virtual desktop (all monitors)
gnome-screenshot -w -f /tmp/shot.png     # focused window only (smaller)
gnome-screenshot -a -f /tmp/shot.png     # interactive area (needs a human at the screen — avoid headless)
# then hand /tmp/shot.png to your image-analysis step

[REMOTE] — over SSH, then pull it back:

ssh USER@HOST '
  export XDG_RUNTIME_DIR=/run/user/UID DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/UID/bus
  gnome-screenshot -f /tmp/shot.png'
scp USER@HOST:/tmp/shot.png /tmp/   # then analyze it

Sanity-check the capture — a 4K full-desktop PNG is ~15-20 MB; a tiny/near-empty file or an all-black image means the extension got disabled or the session dropped to the GDM greeter. Quick non-black check: python3 -c "from PIL import Image; im=Image.open('/tmp/shot.png').convert('RGB'); d=list(im.getdata()); print(sum(1 for p in d[::997] if p!=(0,0,0)))" → should be well >0.

Installing the extension from scratch

# [NATIVE or over SSH with the session env exported]
sudo apt-get install -y gnome-screenshot
D=~/.local/share/gnome-shell/extensions/[email protected]
git clone --depth 1 https://github.com/siddhpant/allow-gnome-screenshot.git /tmp/ags
mkdir -p "$D" && cp -r /tmp/ags/[email protected]/* "$D/"
sudo systemctl restart gdm3      # reload the Shell so it discovers the extension (see WARNING below)
# after the session returns:
export XDG_RUNTIME_DIR=/run/user/UID DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/UID/bus
gnome-extensions enable [email protected]
gnome-extensions info [email protected] | grep State   # expect ACTIVE

Per-display capture (capture ONE monitor, not the stitched canvas)

gnome-screenshot -f grabs the entire virtual desktop. On multi-monitor setups that's a huge, mostly-empty canvas. To get a single clean display, use the bundled screenshot-display helper (scripts/screenshot-display.py). It reads live monitor geometry from Mutter DisplayConfig and crops the full capture to the chosen monitor.

screenshot-display --list                               # enumerate monitors + geometry
screenshot-display --monitor primary --out /tmp/p.png   # primary (default)
screenshot-display --monitor DP-4   --out /tmp/m.png    # by connector name
screenshot-display --monitor 2      --out /tmp/m.png    # by index (0-based, sorted L->R, T->B)
screenshot-display --monitor active --out /tmp/w.png    # the FOCUSED WINDOW (best "being worked on" signal)

Each per-monitor crop is a clean single-display image — feed it straight to your vision step. [REMOTE]: run screenshot-display over SSH (export the session env), then scp the output back.

Window targeting — screenshot a specific program / the display it's on

Shell.Eval/Introspect are locked, but the Window Calls extension (https://github.com/ickyicky/window-calls, GNOME 45-50) exposes window list + geometry over D-Bus at /org/gnome/Shell/Extensions/Windows. The screenshot-display helper uses it to find and crop to a window or its monitor:

screenshot-display --list-windows                       # wm_class, id, focus, monitor, title
screenshot-display --window focused      --out /tmp/w.png   # crop to focused window's exact rect
screenshot-display --window "wm_class=firefox" --out /tmp/w.png  # match by wm_class substring
screenshot-display --window "title=Settings"   --out /tmp/w.png  # match by title substring
screenshot-display --window-display focused        --out /tmp/d.png  # the whole MONITOR that window sits on
screenshot-display --window-display "wm_class=Code" --out /tmp/d.png  # monitor a matched window is on

Flow: List → pick window (focused / wm_class / title) → Details gives x,y,width,height,monitor → helper crops the full capture to the window rect, or to whichever monitor's rect contains the window's center. (Details works; GetFrameBounds is broken in this ext version — the helper doesn't use it.)

Full setup/enablement guide to replicate on any GNOME/Wayland box: references/peekaboo-and-rdp-setup-ubuntu-gnome-wayland.md.

CAPABILITY B — Input injection (ydotool)

ydotool writes to /dev/uinput below the display server (the input analog of the screenshot unlock). Install it, run ydotoold as a persistent user service (socket /run/user/UID/.ydotool_socket), and ACL /dev/uinput to your user (no root needed at runtime).

[NATIVE]:

export YDOTOOL_SOCKET=/run/user/UID/.ydotool_socket
ydotool mousemove -a 960 540     # absolute pixel coords (no element targeting)
ydotool click 0xC0               # left click down+up
ydotool type "hello"
ydotool key 28:1 28:0            # Enter (keycode:press / keycode:release)

[REMOTE]:

ssh USER@HOST '
  export XDG_RUNTIME_DIR=/run/user/UID YDOTOOL_SOCKET=/run/user/UID/.ydotool_socket
  ydotool mousemove -a 700 400; ydotool click 0xC0; ydotool type "hi"'

If ydotoold.service is down — systemctl --user enable --now ydotoold.service. For element targeting (not blind coords) use the locate-element helper below.

CAPABILITY B+ — Click an element by role/name (SOM via hybrid coords)

You do NOT need blind pixel guessing or a vision round-trip for most clicks. The bundled locate-element helper (scripts/locate-element.py) resolves a UI element to a real absolute coordinate and clicks/types it.

Why it works when AT-SPI looks broken: on Wayland, getExtents(SCREEN) returns (0,0) for GTK4/Electron (no global coordinates by design), so naive AT-SPI clicking fails. But getExtents(WINDOW) (widget-relative) is populated, and Window Calls gives the window's screen origin from Mutter's privileged view. The helper composes them: abs = window_origin + window_relative_extents + size/2. Proven on GTK4 apps.

locate-element --app gnome-text-editor --list                       # list elements + computed ABS coords
locate-element --app gnome-text-editor --role text --click          # click first text widget
locate-element --app org.gnome.Nautilus --name "New Folder" --click  # click by visible label
locate-element --app gnome-text-editor --role text --click --type "hello"   # click then type
locate-element --window-id 1367000216 --role "push button" --name Open --click  # by explicit window id
locate-element --app firefox --name Reload --print                  # just print "X Y", act yourself
  • --app matches the AT-SPI app name AND wm_class loosely (normalizes org.gnome.TextEditorgnome-text-editor).
  • --role/--name are case-insensitive substring filters; --index N picks the Nth match.
  • Needs toolkit-accessibility=true (gsettings set org.gnome.desktop.interface toolkit-accessibility true), Window Calls active, and ydotoold running.

Vision fallback (only when an element zeros even WINDOW extents — custom-drawn canvases, some Electron internals): locate-element exits non-zero and prints the fallback recipe — screenshot-display --window 'wm_class=X' --out /tmp/w.png → analyze for pixel coords → ydotool mousemove -a -x X -y Y.

Deep dive on the technique, the gotchas, and why Newton isn't a substitute yet: references/wayland-atspi-hybrid-som.md.

The native loop (single actor — the whole point)

[NATIVE agent on the box]
  1. gnome-screenshot -f /tmp/s.png
  2. analyze /tmp/s.png  → "the Save button is at ~(1820, 640)"
  3. ydotool mousemove -a 1820 640 && ydotool click 0xC0
  4. gnome-screenshot again → verify it landed

No second machine, no human, no RDP. For element targeting prefer locate-element over blind coords.

Window actions (focus / move / resize / close)

Window Calls also does focus/move/resize/close verbs (window id from --list-windows):

WIN=/org/gnome/Shell/Extensions/Windows
gdbus call --session --dest org.gnome.Shell --object-path $WIN --method $WIN.Activate \x3Cid>   # focus/raise
gdbus call --session --dest org.gnome.Shell --object-path $WIN --method $WIN.Move \x3Cid> \x3Cx> \x3Cy>
gdbus call --session --dest org.gnome.Shell --object-path $WIN --method $WIN.Resize \x3Cid> \x3Cw> \x3Ch>
gdbus call --session --dest org.gnome.Shell --object-path $WIN --method $WIN.Maximize \x3Cid>
gdbus call --session --dest org.gnome.Shell --object-path $WIN --method $WIN.Close \x3Cid>

Flow: Activate \x3Cid>screenshot-display --window focusedydotool interact.

CAPABILITY C — Live interactive desktop via RDP (optional, for a human)

For a person to drive the desktop hands-on from another machine, use GNOME Remote Desktop in Remote Login mode (system/headless, fresh negotiated display — NOT Desktop Sharing, which mirrors the physical multi-monitor desktop and re-hits the huge-canvas problem).

  • Mode: grdctl --system (system service gnome-remote-desktop.service, enabled at boot).
  • Listener: *:3389always LAN-scope it in your firewall, never any-source.
  • TLS cert at /var/lib/gnome-remote-desktop/rdp-tls.{crt,key}.
  • Credentials: a device user + password over NLA. Setting the device cred = the Linux user's password makes both stages (NLA device prompt, then GDM login) take the same credential.

Connect from a client (Microsoft Remote Desktop / Windows App, or any RDP client) → HOST → device user/pass → GDM → sign in as the Linux user.

⚠️ #1 RDP gotcha — credential changes need a daemon restart. grdctl --system rdp set-credentials writes the keyfile but the running daemon does NOT hot-reload it into its NTLM SAM. Until you sudo systemctl restart gnome-remote-desktop.service, NLA fails for every client: ntlm_fetch_ntlm_v2_hash: Could not find user in SAM databaseSEC_E_NO_CREDENTIALStransport_accept_nla: client authentication failure. Meanwhile grdctl status shows the correct creds (it reads the file). Clients report this as FREERDP_ERROR_CONNECT_TRANSPORT_FAILED (RoyalTS) or ERRCONNECT_AUTHENTICATION_FAILED (FreeRDP) — looks like transport/cert, is actually stale creds. Fix: set creds (password via STDIN) → restart the service → verify journald shows no SAM/ NO_CREDENTIALS errors. Server requires NLA (HYBRID_REQUIRED_BY_SERVER) — expected.

Diagnosing any RDP auth failure: read the SERVER's journald, not the client error string. Client errors all collapse different server causes into the same opaque message. Full server-log→cause map, the +auth-only//sec:tls client probes, the cert/listener sanity checks, AND the empty-server-log case (tcpdump proves client-vs-network — e.g. a GUI client completing TCP then sending zero RDP bytes = client-side engine fault, fixed by recreating the connection, not touching the server): references/rdp-auth-diagnosis-server-side.md.

For automated capture you do NOT need RDP — Capability A is simpler. RDP is for a human. Full setup steps: references/peekaboo-and-rdp-setup-ubuntu-gnome-wayland.md (Step 7).

Why the stock capture paths fail (so you don't waste time re-testing)

  1. gnome-screenshot WITHOUT the extension → exit 0 but no file (AccessDenied to the private API under the hood).
  2. Raw gdbus/busctl call to org.gnome.Shell.ScreenshotAccessDenied. The extension allowlists the gnome-screenshot binary's D-Bus name, not arbitrary callers — so capture must go through the gnome-screenshot binary, not a hand-rolled gdbus call.
  3. xdg-desktop-portal Screenshot → no Response even with the PermissionStore granted (forced interactive dialog).
  4. org.gnome.Mutter.ScreenCast.CreateSessionSession creation inhibited (gated to grd itself).
  5. ffmpeg kmsgrab (KMS framebuffer) → "No usable planes" on NVIDIA proprietary KMS. /dev/fb0 → all-black (GNOME doesn't scan out to fb0).
  6. grim → wlroots-only, fails on Mutter. Don't apt install grim.
  7. enable unsafe_mode via gdb injection into gnome-shell → CRASHES the Shell, drops the desktop to the GDM greeter. Do NOT use the gdb method. The extension is the safe, scoped, persistent unlock.

No usable screen-coordinate accessibility on Wayland — AT-SPI getExtents(SCREEN) is (0,0) on GTK4/Electron. BUT this is solved by Capability B+ (locate-element): WINDOW-relative AT-SPI extents + Window Calls window origin → real click coords. Use locate-element for "click element labeled Save"; fall back to vision-locate only where even WINDOW extents are zero.

Pitfalls

  • The screenshot unlock is binary-scoped — only the gnome-screenshot binary is allowlisted. A raw gdbus Screenshot call still fails. Always shell out to gnome-screenshot.
  • Reloading the Shell on Wayland = restarting GDM (sudo systemctl restart gdm3). There's no in-place Alt+F2 r over SSH. The live GUI session is briefly torn down, so don't do it mid-GUI-task. If autologin is enabled the desktop recovers in ~12s; systemd --user services survive if lingering is on. Verify recovery: loginctl list-sessions shows a seat0 Type=wayland Active=yes session.
  • Never use the gdb unsafe_mode injection — it crashes gnome-shell. Use the extension.
  • The allow-gnome-screenshot extension can silently flip to INACTIVE (its allowlist hook only re-arms on a fresh Shell init). Symptom: gnome-screenshot -f exits 0 but writes no file. A disable/enable cycle is NOT enough — it needs a Shell reload (sudo systemctl restart gdm3) to re-arm. If you depend on capture in an automated loop, verify the file exists each time and reload-on-miss. (Window Calls can flip too — re-enable after any GDM restart.)
  • gnome-screenshot -a (interactive area) needs a human at the screen — useless headless; use -f (full) or -w (window).
  • Full-desktop shots are huge (multi-4K). Prefer -w, or downscale before analysis.
  • ydotool needs ydotoold alive + YDOTOOL_SOCKET/XDG_RUNTIME_DIR. Re-enable if it dies. Raw ydotool is blind absolute pixels — prefer locate-element (Capability B+) for element targeting.
  • A detached SSH session isn't fully in the graphical session — export XDG_RUNTIME_DIR=/run/user/UID (+ DBUS) explicitly. A native systemd --user service already has it.
  • Always LAN-scope RDP (3389) and SSH (22) in your firewall — never any-source.
安全使用建议
Install only on machines where you intentionally want an agent to see the desktop and control the mouse and keyboard as the logged-in user. Prefer a dedicated low-privilege account or isolated host, avoid sensitive apps on that session, keep SSH/RDP LAN- or VPN-scoped, use window-scoped capture when possible, and disable autologin, ydotoold, lingering, and RDP when not actively needed.
能力评估
Purpose & Capability
The screenshot, window targeting, ydotool input, SSH, and optional RDP capabilities are coherent with the stated computer-use purpose and are not hidden.
Instruction Scope
The instructions emphasize unattended operation, full-desktop capture, remote use over SSH, click/type automation, and window close/move actions without clear app scoping, confirmation rules, or safeguards for sensitive windows and prompts.
Install Mechanism
Setup requires installing GNOME extensions from GitHub, restarting GDM, enabling ydotool, changing /dev/uinput access, installing helper scripts, and optionally enabling GNOME Remote Desktop.
Credentials
The skill targets the user's live graphical session and can capture visible content and inject input as the logged-in user; that is proportionate to the purpose but high impact and broadly scoped.
Persistence & Privilege
The reference setup enables persistent ydotoold, recommends user lingering and autologin for unattended recovery, and optionally enables a boot-starting RDP service, creating durable desktop-control access.
如何使用
  1. 确保已安装 OpenClaw(本地或 Docker 部署)
  2. 在对话框中输入安装命令:/install peekaboo-linux-gnome-wayland-computer-use
  3. 安装完成后,直接呼叫该 Skill 的名称或使用 /peekaboo-linux-gnome-wayland-computer-use 触发
  4. 根据 Skill 的参数说明提供必要输入,即可获得结构化输出
版本历史
v1.0.2
Correct local-VPN RDP diagnosis: stateful/intermittent (NordVPN+Tailscale); recovery is bouncing the VPN helper(s).
v1.0.1
RDP diagnosis: local VPN/security content-filter (NordVPN Threat Protection etc.) swallowing RDP — zero packets reach server from one client. Common works-from-A-not-B cause.
v1.0.0
Initial public release.
元数据
Slug peekaboo-linux-gnome-wayland-computer-use
版本 1.0.2
许可证 MIT-0
累计安装 0
当前安装数 0
历史版本数 3
常见问题

Peekaboo Linux Gnome Wayland Computer Use 是什么?

See and control GNOME Wayland desktops using gnome-screenshot, ydotool, Window Calls, AT-SPI hybrid element coordinates, and optional GNOME Remote Desktop. T... 它是一个面向 Claude Code / OpenClaw 的 AI Agent Skill 插件,目前累计下载 83 次。

如何安装 Peekaboo Linux Gnome Wayland Computer Use?

在 OpenClaw 或 Claude Code 对话框中运行命令「/install peekaboo-linux-gnome-wayland-computer-use」即可一键安装,无需额外配置。

Peekaboo Linux Gnome Wayland Computer Use 是免费的吗?

是的,Peekaboo Linux Gnome Wayland Computer Use 完全免费,采用 MIT-0 许可证,可自由下载、安装和使用。

Peekaboo Linux Gnome Wayland Computer Use 支持哪些平台?

Peekaboo Linux Gnome Wayland Computer Use 跨平台运行,可在任意部署了 OpenClaw / Claude Code 的环境中使用(darwin, linux)。

谁开发了 Peekaboo Linux Gnome Wayland Computer Use?

由 Kyzcreig(@kyzcreig)开发并维护,当前版本 v1.0.2。

💬 留言讨论