Description

Browser automation for AI agents using PinchTab. Control Chrome programmatically for testing, scraping, and interaction. Features token-efficient text extrac...

Usage Instructions (SKILL.md)

Browser Automation

Name: Browser Automation
Author: huamu668

Browser automation for AI agents using PinchTab — a high-performance Chrome bridge with HTTP API.

What is PinchTab?

Standalone HTTP server — Control Chrome via HTTP API
Token-efficient — 800 tokens/page with text extraction (vs 10,000+ for screenshots)
Multi-instance — Run multiple parallel Chrome processes with isolated profiles
Headless or Headed — Run without window or with visible Chrome
Self-contained — 12MB binary, no external dependencies
MCP integration — Native SMCP plugin for Claude Code

Quick Start

Installation

# macOS / Linux
curl -fsSL https://pinchtab.com/install.sh | bash

# npm
npm install -g pinchtab

# Docker
docker run -d -p 9867:9867 pinchtab/pinchtab

Start Server

# Terminal 1: Start PinchTab server
pinchtab
# Server runs on http://localhost:9867

Basic Commands

# Navigate
pinchtab nav https://pinchtab.com

# Wait 3 seconds for accessibility tree
sleep 3

# Get interactive elements
pinchtab snap -i -c

# Extract text (token-efficient)
pinchtab text

# Click element by ref
pinchtab click e5

# Fill input
pinchtab fill e3 "[email protected]"

Core Concepts

Instance

A running Chrome process. Each instance has isolated state.

# Create headless instance
pinchtab instances create --mode=headless

# Create headed instance (visible window)
pinchtab instances create --mode=headed

# List instances
pinchtab instances list

# Stop instance
pinchtab instances stop <instance-id>

Profile

Browser state (cookies, history, localStorage). Log in once, stay logged in.

# Create instance with profile
pinchtab instances create --profile=work

# Profile persists across restarts

Tab

A single webpage. Each instance can have multiple tabs.

# Open new tab
pinchtab tabs open https://example.com

# List tabs
pinchtab tabs list

# Close tab
pinchtab tabs close <tab-id>

Token-Efficient Patterns

The 3-Second Wait Rule

Critical: Chrome's accessibility tree takes ~3 seconds to populate after navigation.

# ❌ Too fast - empty tree
pinchtab nav https://example.com
pinchtab snap
# Returns: {"count": 1, "nodes": [{"ref": "e0"}]}

# ✅ Wait 3 seconds
pinchtab nav https://example.com
sleep 3
pinchtab snap
# Returns: {"count": 2645, "nodes": [...]}
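
A fixed sleep is either wasted time or too short on a slow page. One alternative is to poll until the snapshot reports enough nodes. The sketch below assumes the snapshot JSON carries the `count` field shown in the output above; `wait_for_nodes`, `snapshot_count`, `POLL_INTERVAL`, and the threshold of 50 are hypothetical names and values chosen for illustration, not part of PinchTab.

```shell
# wait_for_nodes MIN TRIES CMD [ARGS...]
# Runs CMD, which must print a node count, until the count reaches MIN or
# TRIES attempts are used up. Hypothetical helper, not a PinchTab command.
wait_for_nodes() {
  local min tries count i
  min=$1
  tries=$2
  shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    count=$("$@" 2>/dev/null) || count=""
    case $count in
      '' | *[!0-9]*) ;;   # not a number yet: keep polling
      *)
        if [ "$count" -ge "$min" ]; then
          echo "$count"
          return 0
        fi
        ;;
    esac
    sleep "${POLL_INTERVAL:-1}"
    i=$((i + 1))
  done
  echo "accessibility tree still below $min nodes after $tries tries" >&2
  return 1
}

# snapshot_count: print the node count of the current snapshot.
snapshot_count() {
  pinchtab snap | jq -r '.count'
}

# Usage (needs a running PinchTab server):
#   pinchtab nav https://example.com
#   wait_for_nodes 50 10 snapshot_count && pinchtab snap -i -c
```

Passing the counting command as an argument keeps the polling loop independent of how the count is obtained, so the same loop works with the CLI or with curl.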

Optimal Extraction Pattern

# Navigate + wait + filter (14x token savings)
curl -X POST http://localhost:9867/navigate \
  -d '{"url": "https://example.com"}' && \
sleep 3 && \
curl http://localhost:9867/snapshot | \
jq '.nodes[] | select(.name | length > 15) | .name' | \
head -30

Why this works:

Navigate + wait gives the accessibility tree time to populate
The jq filter outputs only each node's name and drops the rest of the snapshot
length > 15 skips short names such as button and field labels
head -30 caps the output at 30 lines

Token comparison:

Exploratory approach: ~3,800 tokens
Pattern-driven: ~270 tokens
Savings: 14x
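
The filter step can be tried offline against a saved snapshot. The sample JSON below is made up for illustration and assumes only the `nodes[].name` shape used by the jq filter in this section; `long_names` is a hypothetical helper name.

```shell
# long_names MIN_LEN LIMIT: read snapshot JSON on stdin and print up to LIMIT
# node names longer than MIN_LEN characters. Nodes without a name are skipped.
long_names() {
  local min_len limit
  min_len=$1
  limit=$2
  jq -r --argjson min "$min_len" \
    '.nodes[] | (.name // "") | select(length > $min)' | head -n "$limit"
}

# Made-up sample in the shape the filter expects.
SAMPLE='{"count":3,"nodes":[
  {"ref":"e0","name":"OK"},
  {"ref":"e1"},
  {"ref":"e2","name":"PinchTab brings browser control to agents"}
]}'

# Usage:
#   echo "$SAMPLE" | long_names 15 30
#   # prints: PinchTab brings browser control to agents
#   curl -s http://localhost:9867/snapshot | long_names 15 30
```

The `// ""` fallback matters because jq's `length` on a missing name would otherwise be applied to `null`, and a later string test on `null` would abort the whole pipeline.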

HTTP API Reference

Base URL

http://localhost:9867

Instances

# Create instance (the variable is named TAB, but it holds the instance ID)
TAB=$(curl -s -X POST http://localhost:9867/instances \
  -d '{"profile":"work","mode":"headless"}' | jq -r '.id')

# List instances
curl http://localhost:9867/instances

# Stop instance
curl -X POST "http://localhost:9867/instances/$TAB/stop"

Navigation

# Navigate to URL
curl -X POST "http://localhost:9867/instances/$TAB/tabs/open" \
  -d '{"url":"https://example.com"}'

# Wait for load
sleep 3

Snapshot

# Full snapshot
curl "http://localhost:9867/instances/$TAB/snapshot"

# Interactive elements only
curl "http://localhost:9867/instances/$TAB/snapshot?filter=interactive"

# With coordinates
curl "http://localhost:9867/instances/$TAB/snapshot?includeCoords=true"

Actions

# Click element
curl -X POST "http://localhost:9867/instances/$TAB/action" \
  -d '{"kind":"click","ref":"e5"}'

# Type text
curl -X POST "http://localhost:9867/instances/$TAB/action" \
  -d '{"kind":"type","ref":"e12","text":"hello"}'

# Press key
curl -X POST "http://localhost:9867/instances/$TAB/action" \
  -d '{"kind":"key","ref":"e12","key":"Enter"}'

# Scroll
curl -X POST "http://localhost:9867/instances/$TAB/action" \
  -d '{"kind":"scroll","direction":"down"}'

Extraction

# Extract text (token-efficient)
curl "http://localhost:9867/instances/$TAB/text"

# Take screenshot
curl "http://localhost:9867/instances/$TAB/screenshot" \
  --output screenshot.png

# Generate PDF
curl "http://localhost:9867/instances/$TAB/pdf" \
  --output page.pdf

# Evaluate JavaScript
curl -X POST "http://localhost:9867/instances/$TAB/evaluate" \
  -d '{"script": "document.title"}'

Common Patterns

Pattern 1: Web Scraping

#!/bin/bash
# scrape-headlines.sh

URL="${1:?usage: scrape-headlines.sh URL}"
INST=$(curl -s -X POST http://localhost:9867/instances \
  -d '{"mode":"headless"}' | jq -r '.id')

# Navigate and wait
curl -s -X POST "http://localhost:9867/instances/$INST/tabs/open" \
  -d "{\"url\":\"$URL\"}"
sleep 3

# Extract headlines (filter by length)
curl -s "http://localhost:9867/instances/$INST/snapshot" | \
  jq '.nodes[] | select(.name | length > 20) | .name' | \
  head -20

# Cleanup
curl -s -X POST "http://localhost:9867/instances/$INST/stop"
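
Splicing $URL directly into a JSON string produces an invalid request body if the URL contains a double quote or a backslash. A minimal sketch of building the body with jq instead; `url_payload` is a hypothetical helper name, not something PinchTab provides.

```shell
# url_payload URL: print {"url": "..."} with the value JSON-escaped by jq.
url_payload() {
  jq -cn --arg url "$1" '{url: $url}'
}

# Usage:
#   url_payload 'https://example.com/search?q="quoted"'
#   # prints: {"url":"https://example.com/search?q=\"quoted\""}
#   curl -s -X POST "http://localhost:9867/instances/$INST/tabs/open" \
#     -d "$(url_payload "$URL")"
```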

Pattern 2: Form Interaction

#!/bin/bash
# fill-form.sh

INST=$(curl -s -X POST http://localhost:9867/instances \
  -d '{"mode":"headless"}' | jq -r '.id')

# Navigate to form
curl -s -X POST "http://localhost:9867/instances/$INST/tabs/open" \
  -d '{"url":"https://example.com/login"}'
sleep 3

# Get snapshot to find element refs
SNAPSHOT=$(curl -s "http://localhost:9867/instances/$INST/snapshot?filter=interactive")

# Extract refs (example: e5=email, e7=password, e9=submit).
# (.name // "") guards against nodes without a name; first(...) keeps a single ref.
EMAIL_REF=$(echo "$SNAPSHOT" | jq -r 'first(.nodes[] | select((.name // "") | contains("email")) | .ref)')
PASS_REF=$(echo "$SNAPSHOT" | jq -r 'first(.nodes[] | select((.name // "") | contains("password")) | .ref)')
SUBMIT_REF=$(echo "$SNAPSHOT" | jq -r 'first(.nodes[] | select(.role == "button") | .ref)')

# Fill form
curl -s -X POST "http://localhost:9867/instances/$INST/action" \
  -d "{\"kind\":\"type\",\"ref\":\"$EMAIL_REF\",\"text\":\"[email protected]\"}"
curl -s -X POST "http://localhost:9867/instances/$INST/action" \
  -d "{\"kind\":\"type\",\"ref\":\"$PASS_REF\",\"text\":\"password123\"}"

# Submit
curl -s -X POST "http://localhost:9867/instances/$INST/action" \
  -d "{\"kind\":\"click\",\"ref\":\"$SUBMIT_REF\"}"

# Wait for navigation
sleep 3

# Verify login
curl -s "http://localhost:9867/instances/$INST/text" | jq -r '.title'

# Cleanup
curl -s -X POST "http://localhost:9867/instances/$INST/stop"
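
If a jq selector matches nothing (or more than one node), the ref variables in fill-form.sh do not hold a single usable ref, and the action requests would be sent anyway. A small guard, sketched here with the hypothetical helper `require_ref`, stops the script early instead. The `e<number>` format is an assumption based on the refs shown in this guide (e5, e12).

```shell
# require_ref LABEL VALUE: succeed only if VALUE looks like exactly one ref,
# that is, the letter e followed by digits and nothing else.
require_ref() {
  local label value digits
  label=$1
  value=$2
  digits=${value#e}          # strip one leading "e", if present
  case $digits in
    '' | *[!0-9]*) digits="" ;;   # empty or contains a non-digit: reject
  esac
  if [ -z "$digits" ] || [ "$value" = "$digits" ]; then
    echo "no single ref for $label (got: '$value')" >&2
    return 1
  fi
}

# Usage inside fill-form.sh, right after the refs are extracted:
#   require_ref email "$EMAIL_REF" || exit 1
#   require_ref password "$PASS_REF" || exit 1
#   require_ref submit "$SUBMIT_REF" || exit 1
```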

Pattern 3: Multi-Instance Parallel Processing

#!/bin/bash
# parallel-scrape.sh

URLS=("https://site1.com" "https://site2.com" "https://site3.com")
INSTANCES=()

# Create instances
for i in {0..2}; do
  INST=$(curl -s -X POST http://localhost:9867/instances \
    -d '{"mode":"headless"}' | jq -r '.id')
  INSTANCES[$i]=$INST
done

# Launch parallel jobs
for i in {0..2}; do
  (
    curl -s -X POST "http://localhost:9867/instances/${INSTANCES[$i]}/tabs/open" \
      -d "{\"url\":\"${URLS[$i]}\"}"
    sleep 3
    TITLE=$(curl -s "http://localhost:9867/instances/${INSTANCES[$i]}/text" | jq -r '.title')
    echo "Result $i: $TITLE"
    curl -s -X POST "http://localhost:9867/instances/${INSTANCES[$i]}/stop"
  ) &
done

wait
echo "All complete"
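
A bare `wait` discards the exit status of every background job, so the script above prints "All complete" even when a job failed. The sketch below waits on each process ID individually and counts failures. `run_parallel` and `scrape_one` are hypothetical names; the worker is whatever function handles a single URL.

```shell
# run_parallel WORKER ITEM...: start WORKER once per ITEM in the background,
# wait for every job, and return the number of jobs that failed.
run_parallel() {
  local worker item pid pids failed
  worker=$1
  shift
  pids=""
  failed=0
  for item in "$@"; do
    "$worker" "$item" &
    pids="$pids $!"
  done
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  return "$failed"
}

# Usage:
#   run_parallel scrape_one "${URLS[@]}" || echo "some jobs failed" >&2
```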

Pattern 4: Visual Regression Testing

#!/bin/bash
# visual-regression.sh

URLS=("https://staging.example.com" "https://production.example.com")
INST=$(curl -s -X POST http://localhost:9867/instances \
  -d '{"mode":"headless"}' | jq -r '.id')

for URL in "${URLS[@]}"; do
  curl -s -X POST "http://localhost:9867/instances/$INST/tabs/open" \
    -d "{\"url\":\"$URL\"}"
  sleep 3

  # Take screenshot
  FILENAME=$(echo "$URL" | sed 's/[^a-zA-Z0-9]/_/g').png
  curl -s "http://localhost:9867/instances/$INST/screenshot" \
    --output "$FILENAME"
  echo "Saved: $FILENAME"
done

curl -s -X POST "http://localhost:9867/instances/$INST/stop"
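
The script saves both screenshots but does not compare them. As a rough sketch, a byte-for-byte comparison with `cmp` can flag any difference at all. This is a crude stand-in for a real image diff, since any pixel or metadata change counts as a mismatch; `same_image` and `report` are hypothetical helper names.

```shell
# same_image A B: succeed if the two files are byte-for-byte identical.
same_image() {
  cmp -s "$1" "$2"
}

# report A B: print a one-line verdict for a pair of screenshots.
report() {
  if same_image "$1" "$2"; then
    echo "MATCH: $1 $2"
  else
    echo "DIFF: $1 $2"
  fi
}

# Usage with the file names the sed expression produces for the two URLs:
#   report https___staging_example_com.png https___production_example_com.png
```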

Pattern 5: Session Persistence

#!/bin/bash
# persistent-session.sh

# Create instance with named profile
INST=$(curl -s -X POST http://localhost:9867/instances \
  -d '{"profile":"myaccount","mode":"headless"}' | jq -r '.id')

# Login once
curl -s -X POST "http://localhost:9867/instances/$INST/tabs/open" \
  -d '{"url":"https://example.com/login"}'
sleep 3
# ... perform login ...

# Stop (cookies saved to profile)
curl -s -X POST "http://localhost:9867/instances/$INST/stop"

# Later: Resume with same profile
INST2=$(curl -s -X POST http://localhost:9867/instances \
  -d '{"profile":"myaccount","mode":"headless"}' | jq -r '.id')

# Already logged in!
curl -s -X POST "http://localhost:9867/instances/$INST2/tabs/open" \
  -d '{"url":"https://example.com/dashboard"}'

MCP Integration

PinchTab provides an SMCP plugin for native Claude Code integration.

Setup

# Set plugin directory
export MCP_PLUGINS_DIR=/path/to/pinchtab/plugins

# Restart Claude Code to load plugin

Available Tools

Tool	Description
`pinchtab__navigate`	Navigate to URL
`pinchtab__snapshot`	Get page structure
`pinchtab__action`	Click, type, press keys
`pinchtab__text`	Extract text content
`pinchtab__screenshot`	Capture screenshot
`pinchtab__pdf`	Generate PDF
`pinchtab__evaluate`	Run JavaScript
`pinchtab__cookies_get`	Get cookies
`pinchtab__stealth_status`	Check stealth mode

Usage in Claude Code

Use pinchtab to navigate to example.com and extract the main headlines.

Claude will:

Call pinchtab__navigate with URL
Wait 3 seconds
Call pinchtab__snapshot with filter
Extract headlines from result

Headless vs Headed

Aspect	Headless	Headed
Window	No visible UI	Chrome window visible
Speed	~20% faster	Slower (rendering overhead)
Memory	~50-80 MB	~100-150 MB
Use Case	CI/CD, scraping, batch	Debugging, visual QA
Interaction	API only	API + manual

# Headless for production
pinchtab instances create --mode=headless

# Headed for debugging
pinchtab instances create --mode=headed
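
One way to avoid running headed mode in a pipeline by accident is to pick the mode from the environment. The sketch below relies on the common convention that CI systems export a non-empty `CI` variable, which is an assumption about the environment and not a PinchTab feature; `pick_mode` is a hypothetical helper name.

```shell
# pick_mode: print "headless" when the CI environment variable is set and
# non-empty, otherwise "headed" for local debugging.
pick_mode() {
  if [ -n "${CI:-}" ]; then
    echo headless
  else
    echo headed
  fi
}

# Usage:
#   pinchtab instances create --mode="$(pick_mode)"
```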

Best Practices

DO

✅ Wait 3+ seconds after navigation
✅ Use text extraction over screenshots (token-efficient)
✅ Filter snapshots to reduce tokens
✅ Use profiles for persistent sessions
✅ Run headless in production
✅ Clean up instances after use
✅ Handle errors gracefully

DON'T

❌ Skip the 3-second wait
❌ Take screenshots for text extraction
❌ Parse full snapshots without filtering
❌ Use headed mode in CI/CD
❌ Leave instances running indefinitely
❌ Hardcode element refs (they change)
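
The cleanup and error-handling points can be combined with a shell `trap`, so that instances are stopped even when a script fails partway through. A minimal sketch, using the stop endpoint from the HTTP API reference; `CREATED_IDS`, `stop_instance`, and `cleanup` are hypothetical names, and instance IDs are assumed to contain no whitespace.

```shell
CREATED_IDS=""   # space-separated IDs of every instance this script created

# stop_instance ID: stop one instance via the HTTP API.
stop_instance() {
  curl -s -X POST "http://localhost:9867/instances/$1/stop" >/dev/null
}

# cleanup: stop everything in CREATED_IDS, warning on failures rather than aborting.
cleanup() {
  local id
  for id in $CREATED_IDS; do
    stop_instance "$id" || echo "warning: could not stop $id" >&2
  done
  CREATED_IDS=""
}

# Run cleanup when the script exits, whether it finished or stopped on an error.
trap cleanup EXIT

# Usage: record each ID right after creating the instance:
#   INST=$(curl -s -X POST http://localhost:9867/instances \
#     -d '{"mode":"headless"}' | jq -r '.id')
#   CREATED_IDS="$CREATED_IDS $INST"
```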

Troubleshooting

Only getting 1 node in snapshot

Cause: Accessibility tree not ready
Fix: Increase wait time to 3+ seconds

pinchtab nav https://example.com
sleep 3  # Increase if needed
pinchtab snap

Timeouts

Cause: Page too slow or Chrome overloaded
Fix: Increase sleep or use headless mode

# Increase wait
sleep 5

# Or use headless for faster rendering
pinchtab instances create --mode=headless

Element not found

Cause: Refs change between snapshots
Fix: Re-snapshot before each action

# Get fresh refs before each action
REF=$(pinchtab snap -i | jq -r '.nodes[] | select(.name == "Submit") | .ref')
pinchtab click "$REF"
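
The two commands above can be wrapped so that every click starts from a fresh snapshot and fails loudly when the element is missing. A sketch using only the `pinchtab snap -i` and `pinchtab click` commands shown in this guide; `click_by_name` is a hypothetical helper name.

```shell
# click_by_name NAME: take a fresh interactive snapshot, look up the first
# node whose name equals NAME, and click it. Fails if nothing matches.
click_by_name() {
  local name ref
  name=$1
  ref=$(pinchtab snap -i |
    jq -r --arg name "$name" \
      'first(.nodes[] | select(.name == $name) | .ref // empty)')
  if [ -z "$ref" ]; then
    echo "no element named '$name' in the current snapshot" >&2
    return 1
  fi
  pinchtab click "$ref"
}

# Usage:
#   click_by_name "Submit" || exit 1
```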

Connection refused

Cause: PinchTab server not running
Fix: Start server first

pinchtab  # In separate terminal
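
When a script starts the server itself, the first request can arrive before the server is listening. The sketch below polls until the server answers. It assumes that a GET on the /instances endpoint from the HTTP API reference returns a success status once the server is up; `wait_for_server` and `POLL_INTERVAL` are hypothetical names.

```shell
# wait_for_server [TRIES]: poll the instances endpoint until the server
# answers, pausing between attempts. Returns 1 if it never comes up.
wait_for_server() {
  local tries i
  tries=${1:-10}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors; -o /dev/null discards the body.
    if curl -sf -o /dev/null http://localhost:9867/instances; then
      return 0
    fi
    sleep "${POLL_INTERVAL:-1}"
    i=$((i + 1))
  done
  echo "PinchTab server not reachable on localhost:9867" >&2
  return 1
}

# Usage:
#   pinchtab &
#   wait_for_server 15 || exit 1
```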

References

Token-efficient browser automation for AI agents.

Security Recommendations

This skill appears to do what it says (PinchTab-based browser automation) but exercise caution before following its install instructions. Verify the authenticity of pinchtab.com and prefer installing from your OS package manager or a verified release with checksums/signatures. Never run curl | bash on a URL you haven't audited—download the installer, inspect it, and verify signatures/checksums. If you plan to automate pages that require credentials, use isolated environments (VM/container) and avoid storing secrets in plaintext in scripts. If you need stronger assurance, request the upstream project's release checksums or use the Docker image instead of piping remote scripts directly. Finally, be aware that the skill omits declaring runtime dependencies (curl, jq, docker, npm, pinchtab CLI); ensure those tools are from trusted sources before use.
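
The download, inspect, and verify steps can be sketched as follows. `verify_sha256` is a hypothetical helper, and `EXPECTED_SHA256` is a placeholder: this page lists no published checksum, so the value has to come from a source you trust. The expected digest must be lowercase hex to match the tool output.

```shell
# verify_sha256 FILE EXPECTED: succeed only if FILE's SHA-256 digest equals
# EXPECTED. Uses sha256sum where available, otherwise shasum -a 256.
verify_sha256() {
  local file expected actual
  file=$1
  expected=$2
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | awk '{print $1}')
  else
    actual=$(shasum -a 256 "$file" | awk '{print $1}')
  fi
  [ "$actual" = "$expected" ]
}

# Usage:
#   curl -fsSL https://pinchtab.com/install.sh -o install.sh
#   less install.sh    # read the script before running it
#   verify_sha256 install.sh "$EXPECTED_SHA256" && bash install.sh
```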

Functional Analysis

Type: OpenClaw Skill
Name: browser-automation-pin
Version: 1.0.0

The skill bundle for 'PinchTab' provides browser automation capabilities for AI agents, including tools for navigation, interaction, and data extraction. It is classified as suspicious due to high-risk instructions in `skill.md`, specifically the recommendation to install the software via a pipe-to-shell command (`curl | bash`) and the inclusion of features for arbitrary JavaScript execution (`pinchtab__evaluate`) and session persistence. While these capabilities are aligned with the stated purpose of web automation, they represent a significant attack surface without explicit security constraints or input sanitization mentioned in the documentation.

Capability Assessment

ℹ Purpose & Capability

The name and description match the SKILL.md content: the guide is about controlling Chrome via PinchTab. However, the SKILL.md repeatedly uses command-line tools (curl, jq, npm, docker, pinchtab CLI), yet the skill metadata declares no required binaries or install steps. Not declaring these runtime dependencies is an inconsistency (informational, not necessarily malicious).

ℹ Instruction Scope

The instructions stay within the stated purpose (navigating pages, extracting text, clicking, filling forms, snapshots). A notable scope concern: the guide recommends piping a remote installer script into a shell (curl -fsSL https://pinchtab.com/install.sh | bash) and calling an evaluate endpoint that runs arbitrary JavaScript. These actions could execute arbitrary code from the PinchTab provider and should be reviewed before running. The examples also fill in inputs such as credentials, which is expected for form automation but could expose secrets to the automated pages. This is a normal risk for browser automation, but worth calling out.

⚠ Install Mechanism

There is no install spec in the registry, but SKILL.md instructs running a remote install script (curl | bash), npm -g installs, and docker pull/run for pinchtab/pinchtab. Executing an unverified remote installer or running a downloaded binary is higher-risk than installing from a vetted package with checksums. The SKILL.md provides no checksums, signatures, or pinned versions.

✓ Credentials

The skill declares no required environment variables or credentials, and the instructions do not attempt to read hidden environment variables or unrelated system config. The absence of requested credentials is proportionate to the described purpose. (Be aware that automated browsing can cause user-entered credentials to be submitted to remote pages—this is an application-level risk, not an incoherence in the skill manifest.)

✓ Persistence & Privilege

The skill is instruction-only, has always=false, and requests no persistent system privileges or configuration changes in other skills. It does not attempt to modify other skills' configurations or request permanent presence.

Version History

v1.0.0

Initial release of browser-automation skill, enabling programmatic Chrome browser control via PinchTab.

- Automate browser tasks including navigation, interaction, testing, and data extraction.
- Supports token-efficient text extraction (800 tokens/page) and interactive element identification.
- Provides multi-instance, profile-based, headless and headed Chrome operation.
- RESTful HTTP API for actions: navigation, snapshots, clicks, typing, scrolling, extraction, and more.
- Includes ready-to-use bash patterns for web scraping, form filling, parallel automation, and visual testing.

Metadata

Slug browser-automation-pin

Version 1.0.0

License —

Total installs 0

Current installs 0

Number of versions 1

FAQ