
Vision Sandbox

Name: Vision Sandbox
Author: johanesalxd

by Jo Alex · GitHub ↗ · v1.1.0

cross-platform ✓ Security Clean

Downloads 6564

Install in OpenClaw

/install vision-sandbox

Description

Agentic Vision via Gemini's native Code Execution sandbox. Use for spatial grounding, visual math, and UI auditing.

Usage Guidance

Install only if you are comfortable sending the chosen images, screenshots, prompts, and resulting analysis to Google Gemini under your API key. Avoid submitting secrets, credentials, private customer data, or confidential screenshots unless that is allowed by your data-handling policy, and use a constrained or monitored Gemini key where possible.

Capability Analysis

Type: OpenClaw Skill
Name: vision-sandbox
Version: 1.1.0

The skill is designed to leverage Google Gemini's vision capabilities, including its native code execution sandbox. The core logic in `scripts/vision_executor.py` reads an image, sends it to the Gemini API along with a prompt, and enables code execution *within Google's remote sandbox environment*. The local script does not execute arbitrary code received from the model; it only prints the sandbox code and its output. File operations are limited to reading the user-provided input image and writing output images generated by the Gemini model. There is no evidence of data exfiltration, malicious local execution, persistence mechanisms, or prompt injection attempts against the OpenClaw agent in any of the analyzed files.

Capability Assessment

ℹ Purpose & Capability

The capability matches the stated purpose: the local script reads a user-specified image, sends it with a prompt to Gemini, enables Gemini's hosted code execution tool, prints returned code/results, and saves any returned inline images. This is sensitive but coherent and disclosed as a vision-sandbox workflow.
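The flow described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the contents of `scripts/vision_executor.py`: the function names, the default model name, and the PNG fallback for unknown file types are hypothetical, and it assumes the `google-genai` package is installed and `GEMINI_API_KEY` is set.

```python
import mimetypes
import os
from pathlib import Path


def guess_image_mime(path: str) -> str:
    """Guess the image MIME type from the file name; fall back to PNG (assumption)."""
    mime, _ = mimetypes.guess_type(path)
    return mime if mime and mime.startswith("image/") else "image/png"


def analyze_image(image_path: str, prompt: str, model: str = "gemini-2.5-flash"):
    """Send one image plus a prompt to Gemini with hosted code execution enabled.

    The model name is a placeholder; use whichever model your key can access.
    """
    # Imported lazily so guess_image_mime() works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    image_part = types.Part.from_bytes(
        data=Path(image_path).read_bytes(),
        mime_type=guess_image_mime(image_path),
    )
    return client.models.generate_content(
        model=model,
        contents=[image_part, prompt],
        config=types.GenerateContentConfig(
            # Any generated code runs in Google's remote sandbox, not locally.
            tools=[types.Tool(code_execution=types.ToolCodeExecution())],
        ),
    )
```

Calling `analyze_image("screenshot.png", "Count the buttons in this UI")` would upload that screenshot and prompt to Gemini, so the same data-handling caution applies as for the skill itself.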

ℹ Instruction Scope

The instructions focus on visual grounding, visual math, and UI auditing. The README includes an example where another coding agent uses the visual result to update CSS, but that is a user-directed follow-on workflow rather than hidden mutation by this skill.

ℹ Install Mechanism

Installation uses ClawHub or local Python packaging with uv and google-genai. No automatic persistence or privileged installer behavior is shown, though the dependency is version-ranged rather than pinned.
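Because the dependency is version-ranged, the version that actually gets installed can differ between machines. One way to check what was resolved locally is a short standard-library lookup; the helper name `installed_version` is hypothetical and not part of the skill.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string for a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report which google-genai release the ranged dependency resolved to, if any.
print("google-genai:", installed_version("google-genai"))
```

Recording this value alongside results makes it easier to reproduce behavior if a later SDK release changes something.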

ℹ Credentials

A GEMINI_API_KEY and network submission of selected images/prompts are proportionate for a Gemini vision integration, but users should treat screenshots and prompts as data sent to an external provider.
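A small pre-flight check can make the credential requirement explicit before any image leaves the machine. The sketch below is illustrative only: `require_gemini_key` is a hypothetical helper, and the skill's own script may handle a missing key differently.

```python
import os


def require_gemini_key(env=None) -> str:
    """Return the Gemini API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; refusing to send images or prompts."
        )
    return key
```

Passing a plain dictionary as `env` keeps the check easy to exercise without touching the real environment.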

✓ Persistence & Privilege

No background service, local execution of model-generated code, credential storage, privilege escalation, or broad filesystem access is shown. The only local write found is saving model-returned inline media as sandbox_output_*.png in the current directory.
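The print-and-save behavior described above might look something like the following sketch. It is an assumption-laden illustration rather than the skill's actual code: `handle_parts` and the zero-based file numbering are hypothetical, and the stand-in parts only mimic the attribute names used by `google-genai` response parts (`text`, `executable_code`, `code_execution_result`, `inline_data`).

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def handle_parts(parts, out_dir="."):
    """Print text, sandbox code, and sandbox output; save inline images.

    Model-generated code is only displayed, never passed to exec() or a shell.
    """
    saved = []
    for part in parts:
        if getattr(part, "text", None):
            print(part.text)
        code = getattr(part, "executable_code", None)
        if code is not None:
            print("--- sandbox code (already run remotely) ---")
            print(code.code)
        result = getattr(part, "code_execution_result", None)
        if result is not None:
            print("--- sandbox output ---")
            print(result.output)
        media = getattr(part, "inline_data", None)
        if media is not None and media.data:
            # The only local write: model-returned image bytes.
            target = Path(out_dir) / f"sandbox_output_{len(saved)}.png"
            target.write_bytes(media.data)
            saved.append(target)
    return saved


# Stand-in parts with made-up data, shaped like SDK response parts.
fake_parts = [
    SimpleNamespace(text="Found 3 buttons."),
    SimpleNamespace(executable_code=SimpleNamespace(code="print(1 + 2)")),
    SimpleNamespace(code_execution_result=SimpleNamespace(output="3")),
    SimpleNamespace(inline_data=SimpleNamespace(data=b"\x89PNG fake bytes")),
]
demo_dir = tempfile.mkdtemp()
saved_files = handle_parts(fake_parts, out_dir=demo_dir)
```

Keeping the handler free of `exec`, `eval`, and subprocess calls is what confines code execution to the remote sandbox.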

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install vision-sandbox
After installation, invoke the skill by name or use /vision-sandbox
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.1.0

Migrate to standard OpenClaw tool configuration

v1.0.0

Initial public release

Metadata

Slug vision-sandbox

Version 1.1.0

License —

All-time Installs 247

Active Installs 35

Total Versions 2

Frequently Asked Questions

What is Vision Sandbox?

Agentic Vision via Gemini's native Code Execution sandbox. Use for spatial grounding, visual math, and UI auditing. It is an AI Agent Skill for Claude Code / OpenClaw, with 6564 downloads so far.

How do I install Vision Sandbox?

Run "/install vision-sandbox" in the OpenClaw or Claude Code chat to install it in one step. The skill then needs a GEMINI_API_KEY to call Gemini.

Is Vision Sandbox free?

Yes, Vision Sandbox is free to download, install and use. The listing does not state a license, and Gemini requests are made under your own API key.

Which platforms does Vision Sandbox support?

Vision Sandbox is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Vision Sandbox?

It is built and maintained by Jo Alex (@johanesalxd); the current version is v1.1.0.
