Downloads: 656
Stars: 0
Active Installs: 5
Versions: 3
Install in OpenClaw
/install screen-vision
Description
macOS screen OCR & click automation via Apple Vision + ScreenCaptureKit. Capture any window or screen region, extract text with coordinates, find text, and c...
Usage Guidance
This skill appears to do what it says: it installs a CLI that captures screen contents and can simulate clicks. Before installing, review the upstream GitHub repository and release you will download (setup.sh references the project's GitHub releases). Prefer the Homebrew path when possible, or build from source yourself if you want maximum assurance. Be aware you will need to grant Screen Recording permission to your terminal; that permission allows the tool to capture any visible screen content, so avoid running it when sensitive information is displayed. Also note the setup script extracts a tarball into /usr/local/bin (may require elevated rights or fail depending on permissions) and the script references release v1.0.0 while the registry metadata is v1.2.0 — verify you are installing the intended version. If you are uncomfortable granting screen-recording access or installing binaries from an external release, do not run setup.sh and instead inspect or build the code locally first.
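The version check suggested above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill: the helper names and the sample script line are hypothetical, and the URL is a placeholder rather than the project's real release address.

```python
import re

# Matches release tags such as "v1.0.0" anywhere in a script's text.
TAG_PATTERN = re.compile(r"\bv(\d+)\.(\d+)\.(\d+)\b")


def referenced_versions(script_text: str) -> set[str]:
    """Return every vMAJOR.MINOR.PATCH tag mentioned in the script text."""
    return {match.group(0) for match in TAG_PATTERN.finditer(script_text)}


def version_mismatches(script_text: str, registry_version: str) -> set[str]:
    """Return tags in the script that differ from the registry version."""
    expected = "v" + registry_version.lstrip("v")
    return referenced_versions(script_text) - {expected}


# Hypothetical excerpt standing in for a line of setup.sh (placeholder URL).
sample = 'URL="https://example.invalid/releases/download/v1.0.0/tool.tar.gz"'
print(version_mismatches(sample, "1.2.0"))  # {'v1.0.0'}
```

A non-empty result means the script points at a release other than the one the registry advertises, which is the situation described above.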
Capability Analysis
Type: OpenClaw Skill
Name: screen-vision
Version: 1.2.0
The skill provides macOS screen OCR and click automation, which are high-risk capabilities requiring Screen Recording permissions. The 'setup.sh' script automatically installs dependencies by downloading a pre-built binary from a personal GitHub repository (github.com/jackyun1024/mac-screen-vision) or building from source. While the logic appears aligned with the stated purpose, the automated installation of external binaries and the potential for screen data misuse or unauthorized UI interaction make this bundle suspicious.
Capability Assessment
Purpose & Capability
Name/description (macOS screen OCR + click) match the included instructions and setup script: the script installs a 'screen-vision' binary (Homebrew, GitHub release, or source build) and 'cliclick' for automation. No unrelated services, credentials, or config paths are requested.
Instruction Scope
SKILL.md limits actions to running the CLI and parsing its output (list, ocr, find, tap, wait). It explicitly requires macOS 14+ and Screen Recording permission. There are no instructions to read unrelated files, exfiltrate data, or contact unexpected endpoints.
Install Mechanism
Install is handled by the included setup.sh (no separate install spec). The script uses Homebrew where available, otherwise downloads a tarball from the project's GitHub releases or clones/builds the repo via git/swift. Those are typical approaches, but the curl|tar extraction into /usr/local/bin and building from remote source are operations that write binaries to disk and should be reviewed before running.
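One way to review a release tarball before anything is written to /usr/local/bin is to list its members and flag suspicious entries first. The sketch below uses only Python's standard tarfile module; the function names are hypothetical, and it assumes the archive has already been downloaded into memory or read from a local file.

```python
import io
import tarfile
from pathlib import PurePosixPath


def unsafe_members(archive: tarfile.TarFile) -> list[str]:
    """Names of members that are absolute, contain '..', or are links."""
    flagged = []
    for member in archive.getmembers():
        path = PurePosixPath(member.name)
        escapes = path.is_absolute() or ".." in path.parts
        if escapes or member.issym() or member.islnk():
            flagged.append(member.name)
    return flagged


def inspect_tarball(data: bytes) -> tuple[list[str], list[str]]:
    """Return (all member names, flagged member names) without extracting."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as archive:
        return archive.getnames(), unsafe_members(archive)
```

Calling `inspect_tarball(open("release.tar.gz", "rb").read())` on a locally saved archive (the file name here is only an example) shows what would be extracted. An empty second list means no member tries to write outside the target directory, though it says nothing about what the binary itself does.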
Credentials
The skill declares no environment variables, no credentials, and no config paths. The setup script does not attempt to read or require unrelated secrets or environment variables.
Persistence & Privilege
The skill is not forced-always and does not modify other skills. The setup script installs binaries into /usr/local/bin (write to system path) and instructs the user to grant Screen Recording permission to the terminal app — both are expected for a screen-capture tool but are elevated actions that require user consent and attention.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install screen-vision
- After installation, invoke the skill by name or use /screen-vision
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.2.0
setup.sh now downloads pre-built binary on Apple Silicon (no Xcode/brew required)
v1.1.0
Add setup.sh for auto-install of CLI binary on first use
v1.0.0
Initial release of screen-vision skill.
- Adds macOS screen OCR and click automation via Apple Vision + ScreenCaptureKit.
- Supports OCR, text search, click automation, and wait-for-text features from the terminal.
- Allows targeting specific apps, screen regions, or the full screen.
- Outputs results as JSON or human-readable text with coordinates.
- Provides command-line usage patterns and argument handling for common automation scenarios.
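To illustrate how JSON output with coordinates could feed a click step, here is a minimal Python sketch. The JSON schema (a list of objects with text, x, y, width, and height keys) is an assumption made for this example; the listing does not document the CLI's actual output format, so check it before relying on these field names.

```python
import json


def find_center(ocr_json: str, query: str):
    """Return the center (x, y) of the first entry whose text contains query.

    Assumes a hypothetical schema: a JSON list of objects with
    "text", "x", "y", "width", and "height" keys.
    """
    for item in json.loads(ocr_json):
        if query.lower() in item["text"].lower():
            center_x = item["x"] + item["width"] / 2
            center_y = item["y"] + item["height"] / 2
            return (center_x, center_y)
    return None  # no entry matched the query


# Hypothetical OCR result, not real output from the tool.
sample = json.dumps([
    {"text": "Cancel", "x": 100, "y": 200, "width": 80, "height": 24},
    {"text": "Save changes", "x": 200, "y": 200, "width": 120, "height": 24},
])
print(find_center(sample, "save"))  # (260.0, 212.0)
```

Clicking the center of the bounding box rather than its top-left corner is a common choice because it tolerates small OCR errors in the reported box.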
Frequently Asked Questions
What is Screen Vision?
macOS screen OCR & click automation via Apple Vision + ScreenCaptureKit. Capture any window or screen region, extract text with coordinates, find text, and c... It is an AI Agent Skill for Claude Code / OpenClaw, with 656 downloads so far.
How do I install Screen Vision?
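Run "/install screen-vision" in the OpenClaw or Claude Code chat. The included setup.sh then installs the CLI binary on first use, and you must grant Screen Recording permission to your terminal app before the skill can capture the screen.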
Is Screen Vision free?
Yes, Screen Vision is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Screen Vision support?
Screen Vision is macOS-only. It requires macOS 14+ and Screen Recording permission, so it works only where OpenClaw / Claude Code is running on a Mac.
Who created Screen Vision?
It is built and maintained by Jack Yun (@jackyun1024); the current version is v1.2.0.