Description

Drive Owl Browser as an agent. Read pages as compact, handle-addressable OwlMark and click/type by handle, not by screenshot or pixel coordinates.

README (SKILL.md)

Owl Browser (agent rendering)

Name: Owl Browser
Author: ibnbd

Owl Browser is an AI-native browser. Instead of screenshots or raw HTML, it renders each page as OwlMark: a compact, handle-addressable text view of what is actually on screen. You observe the page, then act on handles. This is far cheaper than a screenshot and removes pixel-coordinate guessing.

Every call is POST $OWL_API_ENDPOINT/execute/<tool> with a JSON body and Authorization: Bearer $OWL_API_TOKEN. A reusable helper:

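# owl TOOL 'JSON_BODY': POST the body to the named tool and print the JSON response.
# -s hides the progress meter; -S still reports transport errors on stderr.
owl() { curl -sS -X POST "$OWL_API_ENDPOINT/execute/$1" \
  -H "Authorization: Bearer $OWL_API_TOKEN" \
  -H "Content-Type: application/json" -d "$2"; }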
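Hand-escaping quotes inside JSON bodies, as the examples in this README do, breaks as soon as a value contains a quote or a backslash. A minimal sketch of a body builder, assuming bash; `owl_body` is a hypothetical helper, not part of Owl, and it only produces flat objects whose values are strings.

```shell
# owl_body KEY VALUE [KEY VALUE ...]
# Hypothetical helper (not part of Owl): print a flat JSON object of string
# values. Escapes backslashes and double quotes; newlines and other control
# characters are NOT escaped, so keep values single-line.
owl_body() {
  local out="{" sep="" k v
  while [ "$#" -ge 2 ]; do
    k=$1 v=$2
    shift 2
    # Escape backslashes first, then double quotes.
    v=$(printf '%s' "$v" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    out="$out$sep\"$k\":\"$v\""
    sep=","
  done
  printf '%s}\n' "$out"
}
```

For example: `owl browser_type "$(owl_body context_id "$CTX" selector x27 text 'say "hi"')"`. Numeric parameters such as max_tokens still have to be written by hand.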

The loop (do this, keep it short)

create_context(render_mode=agent) -> navigate(url) -> observe
   -> click/type(handle) -> observe -> ... -> close_context

Call observe after navigating and after every action. It is the only way you see the page.
Act using the handle tokens observe prints (e.g. l5, b12, x27). No CSS selectors, no pixel coordinates.
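observe blocks until the page is ready, so do not add a separate wait step. The exception is a lazily rendered shell, which observe reports as metadata.status pending (see Edge cases and recovery).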
Never screenshot to read text or find elements. Screenshot only to judge visual design or layout.
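The rule of observing after every action can be enforced with a small wrapper. A sketch, assuming the `owl` helper above; `owl_step` is a hypothetical name. It sends the action's own response (which carries the effect) to stderr so it stays visible, and prints the fresh observation on stdout.

```shell
# owl_step CTX TOOL [EXTRA_FIELDS]
# Hypothetical wrapper: run one action tool against a context, then always
# observe. EXTRA_FIELDS is raw JSON members appended after context_id,
# e.g. '"selector":"l5"'. Requires the owl() helper.
owl_step() {
  local ctx=$1 tool=$2 extra=${3-}
  # Show the action's response (including its effect) on stderr.
  owl "$tool" "{\"context_id\":\"$ctx\"${extra:+,$extra}}" >&2 || return 1
  # The observation is the only view of the page after the action.
  owl browser_observe "{\"context_id\":\"$ctx\"}"
}
```

Usage: `owl_step "$CTX" browser_navigate '"url":"https://example.com"'`, then `owl_step "$CTX" browser_click '"selector":"l5"'`.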

Core tools

browser_create_context

Creates a session. Returns result.context_id (use it in every later call). Do NOT pass context_id in.

owl browser_create_context '{"render_mode":"agent"}'
# use "both" if you will also screenshot; "pixel" is the legacy human render
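Because every later call needs the returned id, it helps to capture it once, stop if it is missing, and make sure the context is closed even when a script fails halfway. A sketch, assuming the `owl` helper above and `jq` (which the minimal example below also uses); `owl_open` and the global `CTX` variable are hypothetical names. It registers a single EXIT trap, so it suits one context per script.

```shell
# owl_open [RENDER_MODE]
# Hypothetical helper: create a context (render_mode "agent" by default),
# store result.context_id in the global CTX, and register an EXIT trap that
# closes it. Requires the owl() helper and jq.
owl_open() {
  local mode=${1:-agent} resp
  resp=$(owl browser_create_context "{\"render_mode\":\"$mode\"}") || return 1
  CTX=$(printf '%s' "$resp" | jq -r '.result.context_id // empty')
  if [ -z "$CTX" ]; then
    printf 'browser_create_context failed: %s\n' "$resp" >&2
    return 1
  fi
  # Single quotes: $CTX is expanded when the trap fires, not now.
  trap 'owl browser_close_context "{\"context_id\":\"$CTX\"}" >/dev/null' EXIT
}
```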

browser_navigate

owl browser_navigate '{"context_id":"ctx_...","url":"https://example.com"}'

Does NOT return the page. Call observe next.

browser_observe (your eyes)

Returns render (OwlMark text), handles (actionable elements), metadata, token_estimate.

owl browser_observe '{"context_id":"ctx_..."}'
owl browser_observe '{"context_id":"ctx_...","detail":"outline"}'   # headings-only map of a long page

Params: detail = min | normal (default) | full | outline; region = main/nav/header/footer or a handle; max_tokens = soft budget.

Handles appear in the render as, for example, - link "Pricing" [#l5] or textbox "Email" [#x27 val=""]. Pass only the token (l5, x27), without the # or brackets.
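When the role and label of the target are known in advance, the token can be pulled out of the render text mechanically. A sketch in plain bash; `find_handle` is a hypothetical helper, and it assumes every element follows the `role "label" [#token ...]` shape shown above, which is worth confirming against real observe output.

```shell
# find_handle ROLE LABEL < render.txt
# Hypothetical helper: print the handle token of the first element whose
# role and exact quoted label match, e.g. link "Pricing" [#l5] -> l5.
# Returns 1 if no element matches.
find_handle() {
  local prefix="$1 \"$2\" [#" line
  line=$(grep -F -m 1 -- "$prefix") || return 1
  line=${line#*"$prefix"}          # drop everything through "[#"
  printf '%s\n' "${line%%[] ]*}"   # keep the token, up to "]" or a space
}
```

Assuming observe returns the render at `.result.render` (not stated above), a lookup would be `owl browser_observe "{\"context_id\":\"$CTX\"}" | jq -r '.result.render' | find_handle textbox Email`.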

browser_click / browser_type (your hands)

Pass the handle token as selector (or handle). The response reports an effect such as navigated, dom-changed, or no-effect (same-page anchors report scrolled). Trust it, then re-observe.

owl browser_click '{"context_id":"ctx_...","selector":"l5"}'
owl browser_type  '{"context_id":"ctx_...","selector":"x27","text":"user@example.com"}'

Also: browser_clear_input '{"context_id":"...","selector":"x27"}' before re-typing, and browser_press_key '{"context_id":"...","key":"Enter"}' to submit.
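Reading the effect can be scripted as well. A sketch, assuming the `owl` helper above and `jq`, plus two response details the text does not spell out: that the effect sits at `.result.effect`, and that a stale handle shows up as the string `STALE_HANDLE` somewhere in the response body. `owl_click` is a hypothetical name.

```shell
# owl_click CTX HANDLE
# Hypothetical helper: click a handle and print the reported effect
# (navigated, dom-changed, no-effect, scrolled, ...). Returns 2 on a
# stale handle so the caller knows to re-observe first.
owl_click() {
  local ctx=$1 handle=$2 resp
  resp=$(owl browser_click "{\"context_id\":\"$ctx\",\"selector\":\"$handle\"}") || return 1
  case $resp in
    *STALE_HANDLE*) echo stale-handle >&2; return 2 ;;
  esac
  # Assumed location of the effect field; adjust if your server differs.
  printf '%s\n' "$resp" | jq -r '.result.effect // "unknown"'
}
```

Whatever the effect, re-observe afterwards; after navigated, every earlier handle is gone.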

browser_screenshot (visual check only)

owl browser_screenshot '{"context_id":"ctx_..."}'

browser_close_context

owl browser_close_context '{"context_id":"ctx_..."}'

Drill-down tools (when observe collapsed something)

browser_expand '{"context_id":"...","handle":"R1"}' re-serializes one collapsed region/template at higher detail.
browser_read_node '{"context_id":"...","handle":"M1"}' returns the full text of a single node (e.g. an article body).

Edge cases and recovery

Check metadata.status on every observe:

ready — act on it.
pending — the page has not rendered its content yet (a lazy client-rendered shell). The envelope has reason and retry_after_ms; re-observe after that delay. Do NOT treat a pending render as an empty page.
incomplete — chrome rendered but main content did not; re-observe once, then use vision.

metadata.dropped_surfaces lists what the text render could not capture:

canvas / webgl / image:N — a visual surface. Use render_mode:"both" + browser_screenshot + your own vision.
sparse_main / shell_unhydrated / main_content_unrendered — content is late or withheld; re-observe, then vision if still empty.
first_tree_timeout — slow or bot-blocked; read it with a screenshot.
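The status rules above can be folded into one observe wrapper. A sketch, assuming the `owl` helper and `jq`, and assuming observe nests its fields under `.result` the way create_context does. The text only says the pending envelope carries `retry_after_ms`, so the sketch looks for it in `.result.metadata`, then `.result`, and defaults to one second. `owl_observe_ready` is a hypothetical name.

```shell
# owl_observe_ready CTX [MAX_TRIES]
# Hypothetical helper: observe, re-observing while metadata.status is
# "pending" (after sleeping retry_after_ms) and, once, if it is
# "incomplete". Prints the last observe response. Returns 0 when the
# status is "ready", 3 otherwise (fall back to a screenshot).
owl_observe_ready() {
  local ctx=$1 tries=${2:-5} incomplete_retried=0 resp status ms
  while :; do
    resp=$(owl browser_observe "{\"context_id\":\"$ctx\"}") || return 1
    status=$(printf '%s' "$resp" | jq -r '.result.metadata.status // "unknown"')
    tries=$((tries - 1))
    if [ "$status" = ready ]; then
      printf '%s\n' "$resp"
      return 0
    elif [ "$tries" -le 0 ]; then
      break
    elif [ "$status" = pending ]; then
      ms=$(printf '%s' "$resp" | jq -r '.result.metadata.retry_after_ms // .result.retry_after_ms // 1000')
      # Fractional sleep works with GNU and BSD sleep, not strict POSIX.
      sleep "$(awk -v ms="$ms" 'BEGIN { printf "%.3f", ms / 1000 }')"
    elif [ "$status" = incomplete ] && [ "$incomplete_retried" -eq 0 ]; then
      incomplete_retried=1
    else
      break
    fi
  done
  printf '%s\n' "$resp"
  return 3
}
```

A "ready" response can still have gaps, so check `.result.metadata.dropped_surfaces` on the printed response before treating the text as complete.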

Other cases:

Handles are per-document. After any navigation (a click whose effect is navigated, or a browser_navigate), re-observe to get fresh handles. Acting on a stale handle returns STALE_HANDLE; when you see that, re-observe.
Same-page anchors scroll, they do not navigate. Clicking an href="#section" link returns effect: "scrolled" and moves the viewport. Expected, not a failure.
Rare click-nav crash: on a few slow sites, a click that triggers a cross-document navigation can crash and auto-respawn the browser (~1s). If a context is lost right after such a click, recreate the context and browser_navigate directly to the destination URL instead of clicking.
PDF / embedded plugins: read the content with a screenshot (render_mode:"both"); in-page plugin controls may not be actionable handles.
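The click-nav recovery rule can be sketched as a fallback, for cases where you already know the destination URL. This assumes the `owl` helper and `jq`, and it assumes that a response with no `result` object (or an empty response from a failed request) means the context was lost. The text does not say what a lost context looks like on the wire, so check that first. `owl_click_or_goto` is a hypothetical name.

```shell
# owl_click_or_goto CTX HANDLE URL
# Hypothetical helper: click HANDLE; if the context appears lost, create a
# fresh agent context and navigate straight to URL instead. Prints the
# context id to keep using (the old one, or the new one).
owl_click_or_goto() {
  local ctx=$1 handle=$2 url=$3 resp new
  resp=$(owl browser_click "{\"context_id\":\"$ctx\",\"selector\":\"$handle\"}") || resp=
  case $resp in
    *STALE_HANDLE*) return 2 ;;  # not a crash: re-observe and retry instead
  esac
  # Assumption: a live context always answers with a .result object.
  if printf '%s' "$resp" | jq -e '.result' >/dev/null 2>&1; then
    printf '%s\n' "$ctx"
    return 0
  fi
  new=$(owl browser_create_context '{"render_mode":"agent"}' | jq -r '.result.context_id // empty')
  [ -n "$new" ] || return 1
  owl browser_navigate "{\"context_id\":\"$new\",\"url\":\"$url\"}" >/dev/null || return 1
  printf '%s\n' "$new"
}
```

Re-observe on whichever context id it prints.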

Do and do not

DO observe before acting and re-observe after every action.
DO act on the exact handle tokens observe printed.
DO read the effect of a click/type before assuming it worked.
DO use detail:"outline" on long reference pages, then expand/read_node the part you need.
DO NOT screenshot to read a page or find elements.
DO NOT guess pixel coordinates. Owl gives you handles so you never have to.
DO NOT pass context_id to create_context; it is returned to you.

Minimal example: search and open a result

CTX=$(owl browser_create_context '{"render_mode":"agent"}' | jq -r .result.context_id)
owl browser_navigate "$(printf '{"context_id":"%s","url":"https://duckduckgo.com"}' "$CTX")"
owl browser_observe  "{\"context_id\":\"$CTX\"}"                 # find the search box, e.g. x4
owl browser_type     "{\"context_id\":\"$CTX\",\"selector\":\"x4\",\"text\":\"owl browser olib ai\"}"
owl browser_press_key "{\"context_id\":\"$CTX\",\"key\":\"Enter\"}"
owl browser_observe  "{\"context_id\":\"$CTX\"}"                 # results appear, pick a link, e.g. l31
owl browser_click    "{\"context_id\":\"$CTX\",\"selector\":\"l31\"}"
owl browser_observe  "{\"context_id\":\"$CTX\"}"                 # read the opened page
owl browser_close_context "{\"context_id\":\"$CTX\"}"

Notes

Over MCP, the Owl MCP server exposes this same loop and defaults to render_mode=agent; the toolset is profile-scoped via OWL_MCP_PROFILE (agent, automation, webdev, full).
The full machine-readable tool reference is served at GET $OWL_API_ENDPOINT/agent-skills.md.

Usage Guidance

Install this only if you trust the Owl Browser server behind OWL_API_ENDPOINT. Prefer localhost or HTTPS for any non-local endpoint, and avoid entering sensitive credentials or private page data unless you control the server receiving the browser actions and observations.

Capability Tags

requires-oauth-token, requires-sensitive-credentials

Capability Assessment

✓ Purpose & Capability

The stated purpose is browser automation through Owl Browser, and the documented capabilities are limited to creating browser contexts, navigating, observing rendered page text, clicking, typing, screenshots for visual checks, and closing contexts.

ℹ Instruction Scope

The skill clearly instructs agents to send requests to OWL_API_ENDPOINT with OWL_API_TOKEN, but it could more explicitly warn that visited URLs, observed page content, and typed input are visible to that endpoint.

✓ Install Mechanism

The artifact contains only SKILL.md and metadata; there are no executable scripts, package installs, startup hooks, or hidden install behavior.

✓ Credentials

Requiring curl, OWL_API_ENDPOINT, and OWL_API_TOKEN is proportionate for controlling an external/local browser API, and the endpoint is user supplied.

✓ Persistence & Privilege

No persistence, privilege escalation, background worker, local credential harvesting, or automatic execution is present; browser contexts are explicitly created and closed by user-directed calls.

Version History

v1.0.0

- Major update: added detailed documentation and usage guide in SKILL.md.
- Improved skill description to highlight OwlMark and handle-based interaction.
- Environment variables (`OWL_API_ENDPOINT`, `OWL_API_TOKEN`) and required binaries clarified.
- Step-by-step usage loop and core API methods now documented for easy use.
- Edge cases, error handling, and best practices section included.
- Minimal example for search and navigation provided.

Metadata

Slug owl-browser

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Owl Browser?

It is an AI Agent Skill for Claude Code / OpenClaw that lets an agent read pages as compact, handle-addressable OwlMark and click or type by handle instead of by screenshot or pixel coordinates. It has 22 downloads so far.

How do I install Owl Browser?

Run "/install owl-browser" in the OpenClaw or Claude Code chat. The skill also needs curl and the OWL_API_ENDPOINT and OWL_API_TOKEN environment variables, pointing at an Owl Browser server you trust.

Is Owl Browser free?

Yes, this skill is free and licensed under MIT-0; you can download, install, and use it at no cost.

Which platforms does Owl Browser support?

It runs anywhere OpenClaw or Claude Code is available, provided curl is installed and the Owl Browser endpoint is reachable.

Who created Owl Browser?

It is built and maintained by Akram H. Sharkar (@ibnbd); the current version is v1.0.0.
