← Back to Skills Marketplace
kyzcreig

Peekaboo Linux Gnome Wayland Computer Use

by Kyzcreig · GitHub ↗ · v1.0.2 · MIT-0
darwinlinux ⚠ suspicious
83
Downloads
0
Stars
0
Active Installs
3
Versions
Install in OpenClaw
/install peekaboo-linux-gnome-wayland-computer-use
Description
See and control GNOME Wayland desktops using gnome-screenshot, ydotool, Window Calls, AT-SPI hybrid element coordinates, and optional GNOME Remote Desktop. T...
README (SKILL.md)

peekaboo-linux-gnome-wayland-computer-use

The Linux/Wayland counterpart to macOS computer-use — "see and poke the screen of a box for troubleshooting or automation." On macOS that's trivial (CGWindowList + CGEvent + AX tree). On GNOME/Wayland it's locked down by design, but it can be made to work end-to-end:

  • Seegnome-screenshot (re-enabled on GNOME 49+ via a small GNOME extension).
  • Actydotool (writes to /dev/uinput, no root).
  • Target elements → AT-SPI WINDOW-relative extents + the Window Calls extension's window origin, composed into real absolute click coordinates (works where naive AT-SPI returns (0,0)).
  • Hands-on (optional) → GNOME Remote Desktop in Remote-Login mode for a human at an RDP client.

Both the "see" and "act" sides work natively (an agent process running ON the box) and remotely (from another machine over SSH). A native agent is a single actor — it can screenshot, analyze, then ydotool-click in one loop. No second machine, no human, no RDP roundtrip.

Context tags below — [NATIVE] = an agent process running ON the GNOME box (sharing the desktop session of user@1000); [REMOTE] = on another machine, reaching the box over SSH.

Verified on Ubuntu 26.04 LTS + GNOME Shell 50 (Wayland), NVIDIA + AMD multi-GPU, multi-monitor. Substitute your own host/user where the examples use USER@HOST and /run/user/UID.

Capability matrix

Capability Native (on box) Remote (from another machine)
Screenshot yes — gnome-screenshot -f out.png (extension unlocks it) yes — same over SSH, then scp the file
Inject input (mouse/keys) yes — ydotool via /dev/uinput (no root) yes — over SSH, same ydotool
Click element by role/name yes — locate-element (AT-SPI WINDOW extents + Window Calls origin) yes — same over SSH
Full see→reason→act loop yes — single actor, no RDP yes — drive over SSH
Live interactive desktop n/a yes — RDP client (optional, for a human)

Target environment assumptions

  • A GNOME/Wayland box with a logged-in graphical session (the GUI session must be active on a seat).
  • For remote use: SSH access (key-based recommended), and the box's session env exported in non-login shells — XDG_RUNTIME_DIR=/run/user/UID, DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/UID/bus, WAYLAND_DISPLAY=wayland-0, DISPLAY=:0. (Replace UID with the session user's numeric id, e.g. 1000.)
  • If agents run under systemd --user with the full graphical session env + lingering enabled, they inherit all of the above and survive SSH disconnects.
  • gnome-screenshot grabs the full virtual desktop (all monitors stitched). On multi-4K setups that's huge — use -w (focused window) or the per-display helper below.

CAPABILITY A — Screenshot (the key unlock)

GNOME 49+ revoked gnome-screenshot's access to the private Shell screenshot API, so on a stock box every headless capture path is dead (see "why" below). The fix is the allow-gnome-screenshot extension (https://github.com/siddhpant/allow-gnome-screenshot, GNOME 49/50): it re-adds org.gnome.Screenshot to the Shell's _senderChecker._allowlistMap, so the gnome-screenshot binary specifically regains access. It is scoped to that one sender — it does NOT enable global unsafe_mode.

[NATIVE] — just call it (session env already present):

gnome-screenshot -f /tmp/shot.png        # full virtual desktop (all monitors)
gnome-screenshot -w -f /tmp/shot.png     # focused window only (smaller)
gnome-screenshot -a -f /tmp/shot.png     # interactive area (needs a human at the screen — avoid headless)
# then hand /tmp/shot.png to your image-analysis step

[REMOTE] — over SSH, then pull it back:

ssh USER@HOST '
  export XDG_RUNTIME_DIR=/run/user/UID DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/UID/bus
  gnome-screenshot -f /tmp/shot.png'
scp USER@HOST:/tmp/shot.png /tmp/   # then analyze it

Sanity-check the capture — a 4K full-desktop PNG is ~15-20 MB; a tiny/near-empty file or an all-black image means the extension got disabled or the session dropped to the GDM greeter. Quick non-black check: python3 -c "from PIL import Image; im=Image.open('/tmp/shot.png').convert('RGB'); d=list(im.getdata()); print(sum(1 for p in d[::997] if p!=(0,0,0)))" → should be well >0.

Installing the extension from scratch

# [NATIVE or over SSH with the session env exported]
sudo apt-get install -y gnome-screenshot
D=~/.local/share/gnome-shell/extensions/[email protected]
git clone --depth 1 https://github.com/siddhpant/allow-gnome-screenshot.git /tmp/ags
mkdir -p "$D" && cp -r /tmp/ags/[email protected]/* "$D/"
sudo systemctl restart gdm3      # reload the Shell so it discovers the extension (see WARNING below)
# after the session returns:
export XDG_RUNTIME_DIR=/run/user/UID DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/UID/bus
gnome-extensions enable [email protected]
gnome-extensions info [email protected] | grep State   # expect ACTIVE

Per-display capture (capture ONE monitor, not the stitched canvas)

gnome-screenshot -f grabs the entire virtual desktop. On multi-monitor setups that's a huge, mostly-empty canvas. To get a single clean display, use the bundled screenshot-display helper (scripts/screenshot-display.py). It reads live monitor geometry from Mutter DisplayConfig and crops the full capture to the chosen monitor.

screenshot-display --list                               # enumerate monitors + geometry
screenshot-display --monitor primary --out /tmp/p.png   # primary (default)
screenshot-display --monitor DP-4   --out /tmp/m.png    # by connector name
screenshot-display --monitor 2      --out /tmp/m.png    # by index (0-based, sorted L->R, T->B)
screenshot-display --monitor active --out /tmp/w.png    # the FOCUSED WINDOW (best "being worked on" signal)

Each per-monitor crop is a clean single-display image — feed it straight to your vision step. [REMOTE]: run screenshot-display over SSH (export the session env), then scp the output back.

Window targeting — screenshot a specific program / the display it's on

Shell.Eval/Introspect are locked, but the Window Calls extension (https://github.com/ickyicky/window-calls, GNOME 45-50) exposes window list + geometry over D-Bus at /org/gnome/Shell/Extensions/Windows. The screenshot-display helper uses it to find and crop to a window or its monitor:

screenshot-display --list-windows                       # wm_class, id, focus, monitor, title
screenshot-display --window focused      --out /tmp/w.png   # crop to focused window's exact rect
screenshot-display --window "wm_class=firefox" --out /tmp/w.png  # match by wm_class substring
screenshot-display --window "title=Settings"   --out /tmp/w.png  # match by title substring
screenshot-display --window-display focused        --out /tmp/d.png  # the whole MONITOR that window sits on
screenshot-display --window-display "wm_class=Code" --out /tmp/d.png  # monitor a matched window is on

Flow: List → pick window (focused / wm_class / title) → Details gives x,y,width,height,monitor → helper crops the full capture to the window rect, or to whichever monitor's rect contains the window's center. (Details works; GetFrameBounds is broken in this ext version — the helper doesn't use it.)

Full setup/enablement guide to replicate on any GNOME/Wayland box: references/peekaboo-and-rdp-setup-ubuntu-gnome-wayland.md.

CAPABILITY B — Input injection (ydotool)

ydotool writes to /dev/uinput below the display server (the input analog of the screenshot unlock). Install it, run ydotoold as a persistent user service (socket /run/user/UID/.ydotool_socket), and ACL /dev/uinput to your user (no root needed at runtime).

[NATIVE]:

export YDOTOOL_SOCKET=/run/user/UID/.ydotool_socket
ydotool mousemove -a 960 540     # absolute pixel coords (no element targeting)
ydotool click 0xC0               # left click down+up
ydotool type "hello"
ydotool key 28:1 28:0            # Enter (keycode:press / keycode:release)

[REMOTE]:

ssh USER@HOST '
  export XDG_RUNTIME_DIR=/run/user/UID YDOTOOL_SOCKET=/run/user/UID/.ydotool_socket
  ydotool mousemove -a 700 400; ydotool click 0xC0; ydotool type "hi"'

If ydotoold.service is down — systemctl --user enable --now ydotoold.service. For element targeting (not blind coords) use the locate-element helper below.

CAPABILITY B+ — Click an element by role/name (SOM via hybrid coords)

You do NOT need blind pixel guessing or a vision round-trip for most clicks. The bundled locate-element helper (scripts/locate-element.py) resolves a UI element to a real absolute coordinate and clicks/types it.

Why it works when AT-SPI looks broken: on Wayland, getExtents(SCREEN) returns (0,0) for GTK4/Electron (no global coordinates by design), so naive AT-SPI clicking fails. But getExtents(WINDOW) (widget-relative) is populated, and Window Calls gives the window's screen origin from Mutter's privileged view. The helper composes them: abs = window_origin + window_relative_extents + size/2. Proven on GTK4 apps.

locate-element --app gnome-text-editor --list                       # list elements + computed ABS coords
locate-element --app gnome-text-editor --role text --click          # click first text widget
locate-element --app org.gnome.Nautilus --name "New Folder" --click  # click by visible label
locate-element --app gnome-text-editor --role text --click --type "hello"   # click then type
locate-element --window-id 1367000216 --role "push button" --name Open --click  # by explicit window id
locate-element --app firefox --name Reload --print                  # just print "X Y", act yourself
  • --app matches the AT-SPI app name AND wm_class loosely (normalizes org.gnome.TextEditorgnome-text-editor).
  • --role/--name are case-insensitive substring filters; --index N picks the Nth match.
  • Needs toolkit-accessibility=true (gsettings set org.gnome.desktop.interface toolkit-accessibility true), Window Calls active, and ydotoold running.

Vision fallback (only when an element zeros even WINDOW extents — custom-drawn canvases, some Electron internals): locate-element exits non-zero and prints the fallback recipe — screenshot-display --window 'wm_class=X' --out /tmp/w.png → analyze for pixel coords → ydotool mousemove -a -x X -y Y.

Deep dive on the technique, the gotchas, and why Newton isn't a substitute yet: references/wayland-atspi-hybrid-som.md.

The native loop (single actor — the whole point)

[NATIVE agent on the box]
  1. gnome-screenshot -f /tmp/s.png
  2. analyze /tmp/s.png  → "the Save button is at ~(1820, 640)"
  3. ydotool mousemove -a 1820 640 && ydotool click 0xC0
  4. gnome-screenshot again → verify it landed

No second machine, no human, no RDP. For element targeting prefer locate-element over blind coords.

Window actions (focus / move / resize / close)

Window Calls also does focus/move/resize/close verbs (window id from --list-windows):

WIN=/org/gnome/Shell/Extensions/Windows
gdbus call --session --dest org.gnome.Shell --object-path $WIN --method $WIN.Activate \x3Cid>   # focus/raise
gdbus call --session --dest org.gnome.Shell --object-path $WIN --method $WIN.Move \x3Cid> \x3Cx> \x3Cy>
gdbus call --session --dest org.gnome.Shell --object-path $WIN --method $WIN.Resize \x3Cid> \x3Cw> \x3Ch>
gdbus call --session --dest org.gnome.Shell --object-path $WIN --method $WIN.Maximize \x3Cid>
gdbus call --session --dest org.gnome.Shell --object-path $WIN --method $WIN.Close \x3Cid>

Flow: Activate \x3Cid>screenshot-display --window focusedydotool interact.

CAPABILITY C — Live interactive desktop via RDP (optional, for a human)

For a person to drive the desktop hands-on from another machine, use GNOME Remote Desktop in Remote Login mode (system/headless, fresh negotiated display — NOT Desktop Sharing, which mirrors the physical multi-monitor desktop and re-hits the huge-canvas problem).

  • Mode: grdctl --system (system service gnome-remote-desktop.service, enabled at boot).
  • Listener: *:3389always LAN-scope it in your firewall, never any-source.
  • TLS cert at /var/lib/gnome-remote-desktop/rdp-tls.{crt,key}.
  • Credentials: a device user + password over NLA. Setting the device cred = the Linux user's password makes both stages (NLA device prompt, then GDM login) take the same credential.

Connect from a client (Microsoft Remote Desktop / Windows App, or any RDP client) → HOST → device user/pass → GDM → sign in as the Linux user.

⚠️ #1 RDP gotcha — credential changes need a daemon restart. grdctl --system rdp set-credentials writes the keyfile but the running daemon does NOT hot-reload it into its NTLM SAM. Until you sudo systemctl restart gnome-remote-desktop.service, NLA fails for every client: ntlm_fetch_ntlm_v2_hash: Could not find user in SAM databaseSEC_E_NO_CREDENTIALStransport_accept_nla: client authentication failure. Meanwhile grdctl status shows the correct creds (it reads the file). Clients report this as FREERDP_ERROR_CONNECT_TRANSPORT_FAILED (RoyalTS) or ERRCONNECT_AUTHENTICATION_FAILED (FreeRDP) — looks like transport/cert, is actually stale creds. Fix: set creds (password via STDIN) → restart the service → verify journald shows no SAM/ NO_CREDENTIALS errors. Server requires NLA (HYBRID_REQUIRED_BY_SERVER) — expected.

Diagnosing any RDP auth failure: read the SERVER's journald, not the client error string. Client errors all collapse different server causes into the same opaque message. Full server-log→cause map, the +auth-only//sec:tls client probes, the cert/listener sanity checks, AND the empty-server-log case (tcpdump proves client-vs-network — e.g. a GUI client completing TCP then sending zero RDP bytes = client-side engine fault, fixed by recreating the connection, not touching the server): references/rdp-auth-diagnosis-server-side.md.

For automated capture you do NOT need RDP — Capability A is simpler. RDP is for a human. Full setup steps: references/peekaboo-and-rdp-setup-ubuntu-gnome-wayland.md (Step 7).

Why the stock capture paths fail (so you don't waste time re-testing)

  1. gnome-screenshot WITHOUT the extension → exit 0 but no file (AccessDenied to the private API under the hood).
  2. Raw gdbus/busctl call to org.gnome.Shell.ScreenshotAccessDenied. The extension allowlists the gnome-screenshot binary's D-Bus name, not arbitrary callers — so capture must go through the gnome-screenshot binary, not a hand-rolled gdbus call.
  3. xdg-desktop-portal Screenshot → no Response even with the PermissionStore granted (forced interactive dialog).
  4. org.gnome.Mutter.ScreenCast.CreateSessionSession creation inhibited (gated to grd itself).
  5. ffmpeg kmsgrab (KMS framebuffer) → "No usable planes" on NVIDIA proprietary KMS. /dev/fb0 → all-black (GNOME doesn't scan out to fb0).
  6. grim → wlroots-only, fails on Mutter. Don't apt install grim.
  7. enable unsafe_mode via gdb injection into gnome-shell → CRASHES the Shell, drops the desktop to the GDM greeter. Do NOT use the gdb method. The extension is the safe, scoped, persistent unlock.

No usable screen-coordinate accessibility on Wayland — AT-SPI getExtents(SCREEN) is (0,0) on GTK4/Electron. BUT this is solved by Capability B+ (locate-element): WINDOW-relative AT-SPI extents + Window Calls window origin → real click coords. Use locate-element for "click element labeled Save"; fall back to vision-locate only where even WINDOW extents are zero.

Pitfalls

  • The screenshot unlock is binary-scoped — only the gnome-screenshot binary is allowlisted. A raw gdbus Screenshot call still fails. Always shell out to gnome-screenshot.
  • Reloading the Shell on Wayland = restarting GDM (sudo systemctl restart gdm3). There's no in-place Alt+F2 r over SSH. The live GUI session is briefly torn down, so don't do it mid-GUI-task. If autologin is enabled the desktop recovers in ~12s; systemd --user services survive if lingering is on. Verify recovery: loginctl list-sessions shows a seat0 Type=wayland Active=yes session.
  • Never use the gdb unsafe_mode injection — it crashes gnome-shell. Use the extension.
  • The allow-gnome-screenshot extension can silently flip to INACTIVE (its allowlist hook only re-arms on a fresh Shell init). Symptom: gnome-screenshot -f exits 0 but writes no file. A disable/enable cycle is NOT enough — it needs a Shell reload (sudo systemctl restart gdm3) to re-arm. If you depend on capture in an automated loop, verify the file exists each time and reload-on-miss. (Window Calls can flip too — re-enable after any GDM restart.)
  • gnome-screenshot -a (interactive area) needs a human at the screen — useless headless; use -f (full) or -w (window).
  • Full-desktop shots are huge (multi-4K). Prefer -w, or downscale before analysis.
  • ydotool needs ydotoold alive + YDOTOOL_SOCKET/XDG_RUNTIME_DIR. Re-enable if it dies. Raw ydotool is blind absolute pixels — prefer locate-element (Capability B+) for element targeting.
  • A detached SSH session isn't fully in the graphical session — export XDG_RUNTIME_DIR=/run/user/UID (+ DBUS) explicitly. A native systemd --user service already has it.
  • Always LAN-scope RDP (3389) and SSH (22) in your firewall — never any-source.
Usage Guidance
Install only on machines where you intentionally want an agent to see the desktop and control the mouse and keyboard as the logged-in user. Prefer a dedicated low-privilege account or isolated host, avoid sensitive apps on that session, keep SSH/RDP LAN- or VPN-scoped, use window-scoped capture when possible, and disable autologin, ydotoold, lingering, and RDP when not actively needed.
Capability Assessment
Purpose & Capability
The screenshot, window targeting, ydotool input, SSH, and optional RDP capabilities are coherent with the stated computer-use purpose and are not hidden.
Instruction Scope
The instructions emphasize unattended operation, full-desktop capture, remote use over SSH, click/type automation, and window close/move actions without clear app scoping, confirmation rules, or safeguards for sensitive windows and prompts.
Install Mechanism
Setup requires installing GNOME extensions from GitHub, restarting GDM, enabling ydotool, changing /dev/uinput access, installing helper scripts, and optionally enabling GNOME Remote Desktop.
Credentials
The skill targets the user's live graphical session and can capture visible content and inject input as the logged-in user; that is proportionate to the purpose but high impact and broadly scoped.
Persistence & Privilege
The reference setup enables persistent ydotoold, recommends user lingering and autologin for unattended recovery, and optionally enables a boot-starting RDP service, creating durable desktop-control access.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install peekaboo-linux-gnome-wayland-computer-use
  3. After installation, invoke the skill by name or use /peekaboo-linux-gnome-wayland-computer-use
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.2
Correct local-VPN RDP diagnosis: stateful/intermittent (NordVPN+Tailscale); recovery is bouncing the VPN helper(s).
v1.0.1
RDP diagnosis: local VPN/security content-filter (NordVPN Threat Protection etc.) swallowing RDP — zero packets reach server from one client. Common works-from-A-not-B cause.
v1.0.0
Initial public release.
Metadata
Slug peekaboo-linux-gnome-wayland-computer-use
Version 1.0.2
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 3
Frequently Asked Questions

What is Peekaboo Linux Gnome Wayland Computer Use?

See and control GNOME Wayland desktops using gnome-screenshot, ydotool, Window Calls, AT-SPI hybrid element coordinates, and optional GNOME Remote Desktop. T... It is an AI Agent Skill for Claude Code / OpenClaw, with 83 downloads so far.

How do I install Peekaboo Linux Gnome Wayland Computer Use?

Run "/install peekaboo-linux-gnome-wayland-computer-use" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Peekaboo Linux Gnome Wayland Computer Use free?

Yes, Peekaboo Linux Gnome Wayland Computer Use is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Peekaboo Linux Gnome Wayland Computer Use support?

Peekaboo Linux Gnome Wayland Computer Use is cross-platform and runs anywhere OpenClaw / Claude Code is available (darwin, linux).

Who created Peekaboo Linux Gnome Wayland Computer Use?

It is built and maintained by Kyzcreig (@kyzcreig); the current version is v1.0.2.

💬 Comments