johanesalxd

Vision Sandbox

by Jo Alex · GitHub · v1.1.0
cross-platform ✓ Security Clean
6564 Downloads · 1 Star · 35 Active Installs · 2 Versions
Install in OpenClaw
/install vision-sandbox
Description
Agentic Vision via Gemini's native Code Execution sandbox. Use for spatial grounding, visual math, and UI auditing.
Usage Guidance
Install only if you are comfortable sending the chosen images, screenshots, prompts, and resulting analysis to Google Gemini under your API key. Avoid submitting secrets, credentials, private customer data, or confidential screenshots unless that is allowed by your data-handling policy, and use a constrained or monitored Gemini key where possible.
Capability Analysis
Type: OpenClaw Skill · Name: vision-sandbox · Version: 1.1.0
The skill is designed to leverage Google Gemini's vision capabilities, including its native code execution sandbox. The core logic in `scripts/vision_executor.py` reads an image, sends it to the Gemini API along with a prompt, and enables code execution *within Google's remote sandbox environment*. The local script does not execute arbitrary code received from the model; it only prints the sandbox code and its output. File operations are limited to reading the user-provided input image and writing output images generated by the Gemini model. There is no evidence of data exfiltration, malicious local execution, persistence mechanisms, or prompt injection attempts against the OpenClaw agent in any of the analyzed files.
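The request flow described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the contents of `scripts/vision_executor.py`: it assumes the `google-genai` SDK, a model name of `gemini-2.5-flash`, and illustrative helper names `describe_parts` and `run_vision`. In line with the analysis, code returned by the model is only printed locally, never executed.

```python
"""Hypothetical sketch of the vision-sandbox request flow; not the skill's actual code."""
import mimetypes
from pathlib import Path


def describe_parts(parts):
    """Turn response parts into printable strings: model text, sandbox code, sandbox output.

    Nothing returned by the model is executed here; sandbox code is only printed.
    """
    lines = []
    for part in parts:
        if getattr(part, "text", None):
            lines.append(part.text)
        code = getattr(part, "executable_code", None)
        if code is not None:
            lines.append("[sandbox code]\n" + code.code)
        result = getattr(part, "code_execution_result", None)
        if result is not None:
            lines.append("[sandbox output]\n" + (result.output or ""))
    return lines


def run_vision(image_path, prompt, model="gemini-2.5-flash"):
    """Send one image plus a prompt to Gemini with the hosted code execution tool enabled.

    The model name is an assumption; use whichever Gemini model your key can access.
    """
    # Imported lazily so describe_parts() works even without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment (e.g. GEMINI_API_KEY)
    mime_type = mimetypes.guess_type(image_path)[0] or "image/png"
    image = types.Part.from_bytes(data=Path(image_path).read_bytes(), mime_type=mime_type)
    response = client.models.generate_content(
        model=model,
        contents=[image, prompt],
        config=types.GenerateContentConfig(
            # Code generated by the model runs in Google's remote sandbox, not locally.
            tools=[types.Tool(code_execution=types.ToolCodeExecution())],
        ),
    )
    for line in describe_parts(response.candidates[0].content.parts):
        print(line)
```

A call such as `run_vision("screenshot.png", "Measure the spacing between the buttons in this UI.")` needs network access and a configured key; keeping the part-handling logic in a separate pure function makes it easy to inspect what gets printed without contacting the API.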
Capability Assessment
Purpose & Capability
The capability matches the stated purpose: the local script reads a user-specified image, sends it with a prompt to Gemini, enables Gemini's hosted code execution tool, prints returned code/results, and saves any returned inline images. This is sensitive but coherent and disclosed as a vision-sandbox workflow.
Instruction Scope
The instructions focus on visual grounding, visual math, and UI auditing. The README includes an example where another coding agent uses the visual result to update CSS, but that is a user-directed follow-on workflow rather than hidden mutation by this skill.
Install Mechanism
Installation uses ClawHub, or local Python packaging with uv and the google-genai dependency. No automatic persistence or privileged installer behavior is shown, though the dependency is version-ranged rather than pinned.
Credentials
A GEMINI_API_KEY and network submission of selected images/prompts are proportionate for a Gemini vision integration, but users should treat screenshots and prompts as data sent to an external provider.
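A pre-flight check like the following makes the credential requirement explicit and fails before any screenshot or prompt leaves the machine. It is a hypothetical helper (`require_gemini_key` is an illustrative name, not taken from the skill) and only assumes the key is supplied through the `GEMINI_API_KEY` environment variable.

```python
import os


def require_gemini_key(env=None):
    """Return the Gemini API key, or raise before any data is sent to the provider.

    Hypothetical helper; the skill's own error handling may differ.
    """
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; refusing to send data to Gemini")
    return key
```

Passing a mapping instead of reading `os.environ` directly keeps the check easy to test and lets a wrapper substitute a constrained or monitored key.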
Persistence & Privilege
No background service, local execution of model-generated code, credential storage, privilege escalation, or broad filesystem access is shown. The only local write found is saving model-returned inline media as sandbox_output_*.png in the current directory.
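Saving model-returned media can be sketched like this, assuming response parts expose an `inline_data` blob with `data` and `mime_type` fields as in the `google-genai` SDK. The helper name `save_inline_images` and the zero-based numbering are hypothetical; only the `sandbox_output_*.png` pattern and the current-directory default come from this listing.

```python
from pathlib import Path


def save_inline_images(parts, out_dir="."):
    """Write each inline image part to sandbox_output_<n>.png in out_dir; return the paths.

    Hypothetical helper: the numbering scheme is an assumption, and non-image
    or empty inline parts are skipped.
    """
    saved = []
    for part in parts:
        blob = getattr(part, "inline_data", None)
        if blob is None or not blob.data:
            continue
        if not (blob.mime_type or "").startswith("image/"):
            continue
        path = Path(out_dir) / f"sandbox_output_{len(saved)}.png"
        path.write_bytes(blob.data)
        saved.append(path)
    return saved
```

Writing only into a single caller-chosen directory keeps the skill's local footprint limited to the one write the assessment reports.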
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install vision-sandbox
  3. After installation, invoke the skill by name or use /vision-sandbox
  4. Provide the inputs required by the skill's parameter spec (an image and a prompt) and review the returned analysis
Version History
v1.1.0
Migrate to standard OpenClaw tool configuration
v1.0.0
Initial public release
Metadata
Slug vision-sandbox
Version 1.1.0
License
All-time Installs 247
Active Installs 35
Total Versions 2
Frequently Asked Questions

What is Vision Sandbox?

Agentic Vision via Gemini's native Code Execution sandbox. Use for spatial grounding, visual math, and UI auditing. It is an AI Agent Skill for Claude Code / OpenClaw, with 6564 downloads so far.

How do I install Vision Sandbox?

Run "/install vision-sandbox" in the OpenClaw or Claude Code chat to install it in one step. To use it, you also need a GEMINI_API_KEY, since images and prompts are sent to Google Gemini.

Is Vision Sandbox free?

Yes, Vision Sandbox itself is free to download, install, and use. Requests to the Gemini API are made under your own API key.

Which platforms does Vision Sandbox support?

Vision Sandbox is cross-platform and runs anywhere OpenClaw or Claude Code is available.

Who created Vision Sandbox?

It is built and maintained by Jo Alex (@johanesalxd); the current version is v1.1.0.
