
FlexArm Robot Arm: Physical Tap, Swipe, OCR for Real Phones

Author: hamlet0168

by hamlet0168 · v1.0.2 · MIT-0



Install in OpenClaw

/install flexarm-robot-arm

Description

Give AI agents physical eyes and a hand to operate any smartphone. The skill provides 50+ built-in atomic actions (click, swipe, OCR, screenshot) that AI can call directly, and it also supports YAML scripts. Script Store: once one AI has learned an app, every other AI agent can reuse that knowledge immediately. Strong-reasoning AI models can also explore new apps autonomously through the camera and robot arm: see, reason, act, verify. This closed loop lets AI use almost any function of the phone.

README (SKILL.md)

When to Use This Skill

An AI Agent MUST use this skill when the user wants to physically interact with a real smartphone screen using a robot arm. This includes:

Clicking, tapping, swiping, or typing on a phone.
Opening an app on a real phone.
Finding text or icons on the phone screen (OCR or template matching).
Automating a sequence of actions on a phone (e.g., daily check‑in, repeated workflow).
Writing, running, or managing YAML scripts for automation.
Any mention of: robot arm, FlexArm, calibration, camera view, physical click, or real phone.

Example user utterances that should trigger this skill:

“Click the WeChat icon on my phone.”
“Swipe down on the screen.”
“Open TikTok on the real device.”
“Find the ‘Settings’ button on the screen and tap it.”
“Write a YAML script to check weather every morning at 8 AM.”
“Run my automation script for Qishui Music.”
“Use the robot arm to type ‘Hello’ into the search box.”

Do NOT use this skill if:

The user asks a pure knowledge question (e.g., “What’s the capital of France?”).
The user wants to operate a virtual or simulated phone (e.g., Android emulator without physical arm).
The user simply asks for code generation without any intent to execute it on a real phone.

FlexArm Robot — AI Agent Skill Reference

Phone screen automation via robot arm + camera vision. Uses a camera to detect the phone screen area, maps pixel coordinates to physical arm coordinates, and performs precise clicks and swipes.

Environment & Initialization

All API calls in this skill depend on the RobotArmServer.exe service. Before using this skill, the following conditions must be met:

Calibration Tool: RobotArmCalibration.exe
  Download: Official Release Page
  Latest version: v2.0.0
  Size: ~160 MB (compressed)
Server Program: RobotArmServer.exe (included in RobotArmServer.zip)
  Download: FlexArm v2.0.1 Release
  Latest version: v2.0.0
  Size: ~231 MB (compressed)
Installation:
1. Download RobotArmServer.zip from the link above
2. Extract to any directory (e.g., D:\FlexArm) — this becomes the project root
3. Run RobotArmServer.exe as Administrator (first run requires admin privileges to install the driver)
4. Verify: visit http://127.0.0.1:7826/api/health — should return {"ok":true}
Directory Convention: All relative paths (e.g., scripts/, icons/) are relative to the project root above. Do not modify files inside the _internal/ directory.

⚠️ If the service is not running, this skill cannot perform any operations. Before starting a task, always check /api/health status.

Important: Fixed Port

All API requests must use port 7826, not 5000.

http://127.0.0.1:7826/api/*

The port is fixed at 7826 and cannot be changed. Flask's default port 5000 does not apply.

Important: Chinese Characters in curl

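Do NOT use curl to send Chinese characters in JSON. When Chinese text is passed to curl on the command line (especially from Windows consoles, whose default code page is not UTF-8), the JSON body can reach the server with corrupted UTF-8. The server then fails to recognize the Chinese keywords and the lookup fails.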

# ❌ Wrong: curl corrupts Chinese characters in JSON
curl -X POST http://127.0.0.1:7826/api/find_text -d '{"text_keyword":"领取"}'

# ✅ Correct: use Python requests for Chinese parameters
python -c "import requests; r = requests.post('http://127.0.0.1:7826/api/find_text', json={'text_keyword': '领取'}); print(r.text)"

APIs with English-only parameters (e.g., detect_desktop, click_icon, run_script, click_at) may use curl. APIs involving Chinese keywords (find_text, click_text, detect_page page names) must use Python.
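If `requests` is not installed, the same UTF-8-safe call can be made with Python's standard library. This is a minimal sketch; the helper names `build_post` and `post_json` are illustrative and not part of FlexArm. The generous default timeout is an assumption chosen because full-screen OCR can take minutes.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7826"  # fixed FlexArm port


def build_post(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a POST request whose JSON body is explicitly UTF-8 encoded."""
    # ensure_ascii=False keeps Chinese characters as real UTF-8 bytes in the body
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def post_json(endpoint: str, payload: dict, timeout: float = 150.0) -> dict:
    """Send the request and decode the JSON reply (blocks until the server answers)."""
    with urllib.request.urlopen(build_post(endpoint, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires RobotArmServer.exe to be running):
# print(post_json("/api/find_text", {"text_keyword": "领取"}))
```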

Complete API Index (54 endpoints)

All endpoints below are accessible via HTTP POST/GET at http://127.0.0.1:7826.

System & Status (8)

#	Method	Endpoint	Description
1	GET	`/`	Service root
2	GET	`/api/health`	Health check (service, arm, camera status)
3	GET	`/api/arm_status`	Arm status (COM port, service, calibration)
4	GET	`/api/get_frame_info`	Get frame dimensions
5	GET	`/api/get_overlay`	Get current overlay (vision match result)
6	GET	`/api/get_phone_corners`	Get phone screen 4-corner coordinates
7	GET	`/api/is_phone_present`	Check if phone is in frame
8	GET	`/api/is_screen_on`	Check if screen is on

Display & Control (5)

#	Method	Endpoint	Description
9	GET/POST	`/api/show_window`	Open camera display window
10	GET/POST	`/api/hide_window`	Close camera display window
11	POST	`/api/toggle_phone_corners`	Toggle phone screen border overlay
12	POST	`/api/change_focus`	Adjust camera focus (delta value)
13	GET	`/api/screenshots`	List historical screenshot files

Actions (21)

#	Method	Endpoint	Description
14	POST	`/api/go_home`	Home — return to desktop
15	POST	`/api/go_back`	Back navigation
16	POST	`/api/go_forward`	Forward navigation
17	POST	`/api/reset`	Reset robot arm to origin
18	POST	`/api/clear_overlay`	Clear vision match overlay boxes
19	GET/POST	`/api/run_app`	Launch a specified app
20	POST	`/api/swipe_up`	Swipe up (large/small)
21	POST	`/api/swipe_down`	Swipe down (large/small)
22	POST	`/api/swipe_up_normal`	Standard swipe up (~80% success)
23	POST	`/api/swipe_down_normal`	Standard swipe down
24	POST	`/api/swipe`	Custom swipe (start/end percentages)
25	POST	`/api/close_all_apps`	Close all background apps
26	POST	`/api/click_icon`	Template-matching icon click
27	POST	`/api/click_icons`	Click multiple icons sequentially
28	POST	`/api/click_icon_many_times`	Click same icon multiple times
29	POST	`/api/click_text`	OCR text search and click
30	POST	`/api/click_at`	Click at frame pixel coordinates
31	POST	`/api/click`	Click at phone percentage coordinates
32	POST	`/api/click_roi`	Click center of an ROI area
33	POST	`/api/screenshot`	Screenshot (save file / return base64)
34	POST	`/api/reload_gestures`	Reload gesture config (hot-reload)

Vision Detection (12)

#	Method	Endpoint	Description
35	POST	`/api/find_template`	Full-screen template matching
36	POST	`/api/find_template_roi`	ROI-based template matching
37	POST	`/api/find_text`	OCR text search
38	POST	`/api/find_text_roi`	ROI-based OCR text search
39	POST	`/api/find_all_text`	Recognize all text on screen
40	POST	`/api/find_all_templates`	All templates must match
41	POST	`/api/find_any_template`	Any template match is sufficient
42	POST	`/api/count_template`	Count template occurrences
43	POST	`/api/detect_desktop`	Detect current desktop page
44	POST	`/api/detect_page`	Detect current app page
45	POST	`/api/wait_for_template`	Poll until template appears
46	POST	`/api/wait_for_page`	Poll until target page appears

Script Control (4)

#	Method	Endpoint	Description
47	POST	`/api/run_script`	Execute YAML script (async)
48	GET	`/api/script_status`	Check if a script is running
49	GET	`/api/script_progress`	Get script execution progress
50	POST	`/api/stop_script`	Force-stop a running script

Configuration (3)

#	Method	Endpoint	Description
51	GET/PUT	`/api/config/daily`	Read/update daily automation config
52	GET/PUT	`/api/config/app/<name>`	Read/update app page config
53	GET/PUT	`/api/config/gesture`	Read/update gesture config

System Management (1)

#	Method	Endpoint	Description
54	POST	`/api/shutdown`	Graceful service shutdown

System Architecture

User / AI Agent
    │
    ├── HTTP API (POST/GET http://127.0.0.1:7826/api/*)
    │       Controls robot arm, camera, script execution
    │
    └── YAML Scripts (scripts/*.yaml)
            Define automation workflows (click icons, find text, loops, conditions)

For AI Agents: You cannot see the camera feed directly. Use these APIs to understand the phone screen state:

GET /api/get_frame_info — frame metadata
GET /api/is_phone_present — detect if phone is in frame
GET /api/is_screen_on — detect if screen is lit
POST /api/screenshot {"return_base64": true} — get base64 image data
POST /api/detect_page — detect current page name

Core Principles:

Arm / camera / script engine are exclusive resources — only one task may use them at a time
All vision operations rely on template matching (icons) and OCR (text)
Synchronous blocking: except for /api/run_script, every API command is synchronous and blocking. Wait for the HTTP response ("ok": true indicates completion) before sending the next command. run_script starts a background thread and returns immediately; monitor it with script_status and script_progress, and stop it with stop_script
Resource protection: action commands are rejected with "script is running" during script execution

✅ Safe to call while script runs	❌ Rejected while script runs
`health`, `arm_status`, `script_status`, `script_progress`	`run_script` (only one at a time)
`get_frame_info`, `get_overlay`, `get_phone_corners`	`click`, `click_icon`, `click_text`, `click_at`, `click_roi`
`is_phone_present`, `is_screen_on`	`swipe`, `swipe_up`, `swipe_down`, `swipe_up_normal`, `swipe_down_normal`
`find_template`, `find_all_templates`, `find_any_template`, `count_template` (cv2, thread-safe)	`go_home`, `go_back`, `go_forward`
`screenshot` (base64 or file)	`find_text`, `find_all_text`, `find_text_roi` (PaddleOCR singleton, not thread-safe)
`wait_for_template` (uses cv2 internally)	`detect_desktop`, `detect_page`, `wait_for_page` (may call OCR)
	`reset`, `clear_overlay`, `close_all_apps`, `run_app`, `reload_gestures`

Scripts are interpreted: no compilation is needed, so you can edit and run them directly
The robot arm auto-resets after script completion
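Because action endpoints are rejected while a script is running, a client can check `/api/script_status` before sending one. The following is a minimal sketch, assuming hypothetical injectable `get`/`post` callables that return parsed JSON (for example thin wrappers around `requests`). The blocked set mirrors the right-hand column of the table above.

```python
from typing import Any, Callable, Dict

Json = Dict[str, Any]

# Endpoints listed as "rejected while script runs" in the table above
BLOCKED_DURING_SCRIPT = {
    "run_script", "click", "click_icon", "click_text", "click_at", "click_roi",
    "swipe", "swipe_up", "swipe_down", "swipe_up_normal", "swipe_down_normal",
    "go_home", "go_back", "go_forward", "find_text", "find_all_text",
    "find_text_roi", "detect_desktop", "detect_page", "wait_for_page",
    "reset", "clear_overlay", "close_all_apps", "run_app", "reload_gestures",
}


class ScriptBusyError(RuntimeError):
    """Raised locally instead of sending a command the server would reject."""


def call_endpoint(name: str, payload: Json,
                  get: Callable[[str], Json],
                  post: Callable[[str, Json], Json]) -> Json:
    """POST /api/<name>, refusing blocked commands while a script is running."""
    if name in BLOCKED_DURING_SCRIPT:
        data = get("/api/script_status").get("data") or {}
        if data.get("running"):
            raise ScriptBusyError(f"{name} rejected: script is running")
    # The call blocks until the server reports completion
    return post(f"/api/{name}", payload)
```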

Quick Start: Hello FlexArm

Step 1: Confirm service is running

# Start the service
RobotArmServer.exe
# Port: 7826 (fixed)

On startup the program auto-detects and initializes:

Checks if the robot arm Windows service is running
If not, tries to auto-install and start it
Auto-detects the arm's COM port and connects
Auto-loads the latest calibration file (or starts auto-calibration if none exists)
Checks license (shows activation dialog if unlicensed)

Note: First-time use requires Administrator privileges to install the Windows service. Run RobotArmServer.exe as Administrator. In daily use, administrator rights are not needed if the service is already installed.

If service installation fails, the program still starts, but the arm is unavailable. To install the service manually, run robot-arm-service\安装.bat ("安装" means "install") as Administrator.

Step 2: Test connectivity

curl http://127.0.0.1:7826/api/health
# Returns: {"ok": true, "data": {"status": "running", ...}}

Step 3: Open camera window, confirm phone is visible

curl -X POST http://127.0.0.1:7826/api/show_window -H "Content-Type: application/json" -d '{}'

After opening the window, you should see the phone screen. Press ESC to close.

For AI Agents: Check screen state without seeing the window

AI Agents cannot see the window. Use these APIs instead:

# Get frame metadata
curl http://127.0.0.1:7826/api/get_frame_info
# Returns: {"ok":true,"data":{"width":960,"height":540}}

# Detect if phone is in frame (brightness check)
curl http://127.0.0.1:7826/api/is_phone_present
# Returns: {"ok":true,"data":{"present":true}}

# Detect if screen is on
curl http://127.0.0.1:7826/api/is_screen_on
# Returns: {"ok":true,"data":{"screen_on":true}}

# Get current frame (base64, parseable by AI)
curl -X POST http://127.0.0.1:7826/api/screenshot -H "Content-Type: application/json" -d '{"return_base64":true,"phone_only":true}'
# Returns: {"ok":true,"data":{"base64":"iVBORw0KGgoAAAANSUhEUg..."}}

# Save screenshot to file
curl -X POST http://127.0.0.1:7826/api/screenshot -H "Content-Type: application/json" -d '{"filename":"screenshots/test.png"}'
# Returns: {"ok":true,"data":{"path":"E:\\robot_arm\\screenshots\\test.png"}}

# List historical screenshots
curl "http://127.0.0.1:7826/api/screenshots?limit=5"
# Returns: {"ok":true,"data":[{"filename":"...","size":123456,"time":"2026-05-24 18:00:12"},...]}
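An agent that requests `return_base64` receives the image as a base64 string. The following is a minimal sketch for turning the `data.base64` field into PNG bytes. The helper name `decode_screenshot` is illustrative. The PNG check is an assumption based on the sample response above, which begins with `iVBORw0KGgo`, the base64 form of the PNG signature.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(response: dict) -> bytes:
    """Extract and decode data.base64 from a /api/screenshot response."""
    if not response.get("ok"):
        raise RuntimeError(response.get("error", "screenshot failed"))
    raw = base64.b64decode(response["data"]["base64"])
    # Assumption: the server returns PNG data
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return raw


# Usage: open("phone.png", "wb").write(decode_screenshot(resp_json))
```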

Step 4: Execute your first action — click center of screen

curl -X POST http://127.0.0.1:7826/api/click \
  -H "Content-Type: application/json" \
  -d '{"x": 0.5, "y": 0.5}'

The robot arm moves to the center of the phone screen and clicks it. x and y are phone percentage coordinates expressed as fractions from 0 to 1, so (0.5, 0.5) is the center.

No icon templates or configuration needed — just calibrate and go.
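`click` takes phone-relative fractions, while `click_at` takes camera-frame pixels. If you need to convert between the two yourself, for example from the four points returned by `GET /api/get_phone_corners`, bilinear interpolation over the screen corners is one reasonable approach. This is a sketch under assumptions: the corner order (top-left, top-right, bottom-right, bottom-left) and the `(x, y)` tuple format are guesses rather than the documented response shape. The server's own mapping may also differ; `click`, for instance, adds a random offset.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def phone_to_frame(x: float, y: float, corners: Sequence[Point]) -> Point:
    """Map phone fractions (0-1) to frame pixels by bilinear interpolation.

    corners: (top_left, top_right, bottom_right, bottom_left) in frame pixels
    (assumed order).
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must be in [0, 1]")
    tl, tr, br, bl = corners
    # Interpolate along the top and bottom edges, then between those two points
    top = (tl[0] + (tr[0] - tl[0]) * x, tl[1] + (tr[1] - tl[1]) * x)
    bottom = (bl[0] + (br[0] - bl[0]) * x, bl[1] + (br[1] - bl[1]) * x)
    return (top[0] + (bottom[0] - top[0]) * y, top[1] + (bottom[1] - top[1]) * y)
```

Bilinear interpolation ignores camera perspective. If the phone is strongly tilted in the frame, a perspective (homography) mapping would be more accurate.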

Step 5: Write your first script

⚠️ Important: All YAML script files must use UTF-8 without BOM encoding. UTF-8 with BOM causes parse failures or garbled Chinese parameters.
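To catch the BOM problem before running a script, check the first three bytes of each file. This is a minimal sketch that uses only the Python standard library; `has_bom` and `strip_bom` are illustrative helper names.

```python
import codecs
from pathlib import Path


def has_bom(path: Path) -> bool:
    """Return True if the file starts with the UTF-8 byte order mark."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def strip_bom(path: Path) -> bool:
    """Rewrite the file as UTF-8 without BOM; return True if a BOM was removed."""
    data = Path(path).read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    Path(path).write_bytes(data[len(codecs.BOM_UTF8):])
    return True


# Usage: fix every script under scripts/ (relative to the project root)
# for p in Path("scripts").rglob("*.yaml"):
#     if strip_bom(p):
#         print("removed BOM from", p)
```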

Create scripts/hello_flexarm.yaml:

name: hello_flexarm
description: "First FlexArm script — experience clicking, swiping, waiting"

steps:
  # 1. Click center of screen
  - action: click
    x: 0.5
    y: 0.5

  # 2. Wait 1 second
  - action: wait
    seconds: 1

  # 3. Click bottom of screen (back navigation)
  - action: click
    x: 0.5
    y: 0.95

  # 4. Wait 2 seconds
  - action: wait
    seconds: 2

  # 5. Large swipe up (page turn)
  - action: swipe_up
    large: true

  # 6. Wait 1 second
  - action: wait
    seconds: 1

  # 7. Small swipe down
  - action: swipe_down
    large: false

  # 8. Click top-right corner
  - action: click
    x: 0.85
    y: 0.1

Run it:

curl -X POST http://127.0.0.1:7826/api/run_script \
  -H "Content-Type: application/json" \
  -d '{"path": "scripts/hello_flexarm.yaml"}'

The example above only needs calibration — no icon templates or page config required.

To learn more advanced actions (icon clicks, OCR text clicks, page switching, conditional branches), continue reading to understand icon templates and page definitions.

Configuration: Teach the Program About Your Phone

Phone Desktop Config — `scripts/configs/app_desktop.yaml`

Required. The filename must be app_desktop.yaml; the program uses it to identify the pages of the phone desktop.

app_name: Phone Desktop

pages:
  - name: desktop_page0           # Page name, arbitrary string
    min_match: 2                  # At least 2 features must match
    must_features:                # Required (all must pass)
      - name: Phone
        type: image
        path: icons/app_phone.png
        mask: false               # false=4-corner sampling, loose matching
      - name: Camera
        type: image
        path: icons/app_camera.png
        mask: false
    features:                     # Optional (need min_match to pass)
      - name: Messages
        type: image
        path: icons/app_message.png
        mask: false
      - name: Settings
        type: image
        path: icons/app_settings.png
        mask: false

  - name: desktop_page1
    min_match: 3
    must_features:
      - name: Phone
        type: image
        path: icons/app_phone.png
        mask: false
      - name: Camera
        type: image
        path: icons/app_camera.png
        mask: false
    features:
      - name: Qishui Music
        type: image
        path: icons/app_qishui.png
        mask: false
      - name: WeChat
        type: image
        path: icons/app_wechat.png
        mask: false

  - name: TaskSwitcher          # Multi-task switching view
    min_match: 0
    must_features:
      - name: RecentApps
        type: image
        path: icons/task_show.png
        mask: false
      - name: Trash
        type: image
        path: icons/task_delete.png
        mask: false
    features: []

Key Fields:

must_features: all must match, or the page is skipped
features: optional; passes if matched count >= min_match
mask: false: uses 4-corner background sampling, tolerates icon size/position variation
mask: true (or omitted): strict template matching, suitable for fixed UI elements
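The matching rule described by these fields can be expressed as a small function. This sketch covers only the documented semantics: every `must_features` entry matches, and at least `min_match` optional `features` match. Pages are tried in order and the first match wins, as `switch_page` does. The sketch assumes each feature has already been evaluated to a boolean. The function names are hypothetical, and treating a missing `min_match` as 0 is an assumption.

```python
from typing import Dict, List, Optional


def page_matches(page: dict, found: Dict[str, bool]) -> bool:
    """Apply the must_features / features / min_match rule to one page definition."""
    # Every required feature must be present, otherwise the page is skipped
    if not all(found.get(f["name"], False) for f in page.get("must_features", [])):
        return False
    matched = sum(1 for f in page.get("features", []) if found.get(f["name"], False))
    # Assumption: a missing min_match is treated as 0
    return matched >= page.get("min_match", 0)


def detect_page(pages: List[dict], found: Dict[str, bool]) -> Optional[str]:
    """Return the first page whose rule passes, or None (switch_page then runs default)."""
    for page in pages:
        if page_matches(page, found):
            return page["name"]
    return None
```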

App Page Config — `scripts/configs/app_xxx.yaml`

One config file per app, defining all recognizable pages within that app.

app_name: Qishui Music

pages:
  - name: Music
    min_match: 1
    must_features: []
    features:
      - name: BottomPlayerBar
        type: image
        path: icons/qishui/music_playing.png
        mask: false

  - name: Rewards
    min_match: 1
    must_features:
      - name: RewardsTitle
        type: text
        text: "福利"
    features: []

When switch_page in a script doesn't match any page, it auto-executes the default branch — no separate config needed.

How to Create Icon Templates

Send an email with your license code to [email protected] to get the helper tools, including screen capture and script testing features.

Or manually: take a phone screenshot → crop the icon → place it in the icons/ directory.

Icon Requirements:

PNG format
The icon body must be complete, with a 2-3 px empty border
Do not include dynamic content (countdowns, animations)
Use lowercase English + underscores for filenames, e.g., app_qishui.png
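A quick pre-flight check for these rules might look like the following. It is a sketch: it verifies only the filename rule and the PNG file signature. Digits are allowed because the document's own templates use names like `num1.png`. `check_icon` is a hypothetical helper, not part of the FlexArm tools.

```python
import re
from pathlib import Path
from typing import List

NAME_RE = re.compile(r"^[a-z0-9_]+\.png$")  # lowercase letters, digits, underscores
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def check_icon(path: Path) -> List[str]:
    """Return a list of problems found with an icon template file (empty if OK)."""
    problems = []
    if not NAME_RE.match(path.name):
        problems.append(f"filename {path.name!r} should be lowercase English + underscores, ending in .png")
    if not path.is_file():
        problems.append("file does not exist")
    else:
        with open(path, "rb") as f:
            if f.read(8) != PNG_SIGNATURE:
                problems.append("file is not a PNG image")
    return problems
```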

YAML Script Language

Basic Structure

name: script_name
description: "Script description"

steps:
  - action: action_type
    param1: value1
    param2: value2

Supported Actions

action	Parameters	Description
`click_icon`	`path`, `threshold`, `roi`, `mask`	Template-matching icon click
`click_icons`	`paths`, `interval`	Click multiple icons sequentially (arm exits frame between clicks)
`click_icon_many_times`	`path`, `count`, `interval`	Click same spot multiple times without reset
`dial_number`	`number`, `interval`	Smart dialing (maps number to digit icons, supports `#` and `*`)
`click_text`	`text`, `roi`, `min_score`	OCR text search and click
`click`	`x`, `y`	Click phone percentage coordinates (0-1), ±30px random offset
`click_at`	`cam_x`, `cam_y`	Click frame pixel coordinates (precise, no offset)
`click_roi`	`roi`, `label`	Click center of ROI area (phone screen percentage)
`find_all_text`	`roi`, `min_score`	Recognize all text, return list + positions + confidence
`swipe`	`sx`, `sy`, `ex`, `ey`, `steps`, `step_wait_ms`	Custom swipe (start/end percentages)
`swipe_up`	`large: true/false`	Swipe up (large/small)
`swipe_down`	`large: true/false`	Swipe down (large/small)
`swipe_up_normal`	none	Standard swipe up (~80% success)
`swipe_down_normal`	none	Standard swipe down
`go_home`	`max_retries`	Return to desktop (detect → swipe up → detect loop)
`go_back`	none	Back navigation
`go_forward`	none	Forward navigation
`reset`	none	Reset robot arm
`clear_overlay`	none	Clear vision overlay boxes
`run_app`	`app_name`	Launch app (go_home → detect page → swipe → click icon)
`close_all_apps`	`max_swipes`	Close all background apps
`screenshot`	`filename`, `phone_only`, `show_board`, `return_base64`	Screenshot
`reload_gesture`	none	Reload gesture config (hot-reload)
`set_video_to_coin`	`value`	Set video-to-coin earning mode
`wait`	`seconds`	Wait (supports ranges like `2-5`)
`loop`	`count`, `steps`	Loop sub-steps (supports random ranges like `count: 3-5`)
`if_found`	`type`, `path`/`text`, `then`, `else`	Conditional: if target found, run then; else run else
`if_found_roi`	`type`, `path`/`text`, `roi`, `then`, `else`	Same as above but with ROI-limited search
`if_progress_stop`	`template`, `roi`, `then`, `else`	Progress bar stall detection
`if_video_to_coin`	`then`, `else`	Branch based on video-to-coin mode state
`if_random`	`chance`, `then`, `else`	Random probability branch
`detect_desktop`	`config`	Detect if currently on desktop (no assertion)
`detect_page`	`config`	Detect current page name (no assertion)
`is_screen_on`	none	Check if screen is lit
`assert_desktop`	`config`	Must be on desktop, error if not
`switch_page`	`config`, `cases`	Detect page → match cases → default if no match
`run_script`	`path`	Execute sub-script (synchronous, returns on completion)
`stop_loop`	none	Break current loop
`stop_script`	none	Stop current script level (sub-script only stops itself)
`log`	`message`	Print log message

Parameter Details

click_icon

- action: click_icon
  path: icons/app_qishui.png     # Icon path (relative to project root)
  threshold: 0.75                # Match threshold (default 0.75)
  roi: [0.1, 0.2, 0.5, 0.6]     # Search area [sx, sy, ex, ey] (phone percentage, 0-1)
  mask: false                    # false=loose, true=strict (default true)

click_text

- action: click_text
  text: "领取"                   # Text to find
  roi: [0.3, 0.5, 0.7, 0.8]     # Optional search area
  min_score: 0.5                 # OCR minimum confidence (default 0.3)

click (percentage coordinates)

- action: click
  x: 0.5                         # X percentage (0=left, 1=right)
  y: 0.96                        # Y percentage (0=top, 1=bottom)

loop

- action: loop
  count: 10                      # Fixed count
  # count: 3-5                   # Random range also supported
  steps:
    - action: click_text
      text: "领取"
    - action: wait
      seconds: 2

if_found (conditional branch)

- action: if_found
  type: image                    # image=template matching (uses path), text=OCR (uses text)
  path: icons/qishui/cross.png   # For type=image
  # text: "继续观看"             # For type=text ("Continue watching")
  roi: [0.7, 0.0, 1.0, 0.15]    # Optional search area
  then:
    - action: click_icon
      path: icons/qishui/cross.png
      roi: [0.7, 0.0, 1.0, 0.15]
  else:
    - action: wait
      seconds: 2

if_random (random branch)

- action: if_random
  chance: 0.4                    # 40% chance to take then branch
  then:
    - action: log
      message: "Took then branch"
  else:
    - action: log
      message: "Took else branch"

switch_page (page switching)

- action: switch_page
  config: scripts/configs/app_qishui.yaml   # Page config file
  cases:
    Music:                                   # When "Music" page matches
      - action: click
        x: 0.5
        y: 0.96
    Rewards:
      - action: click_text
        text: "福利"
    default:                                 # When no page matches
      - action: swipe_up
        large: true

Script Nesting

steps:
  - action: run_script
    path: qishui/run_ad_card.yaml     # Execute sub-script
  - action: wait
    seconds: 5

After a sub-script finishes, execution returns to the parent script.

Random Wait

- action: wait
  seconds: 2-5          # Random wait between 2~5 seconds
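Range values such as `seconds: 2-5` or `count: 3-5` are resolved at run time. The interpreter's exact sampling is not documented. This sketch assumes a continuous uniform value for `wait` seconds and an inclusive random integer for `loop` counts; the helper names are hypothetical.

```python
import random
from typing import Tuple, Union

Value = Union[int, float, str]


def parse_range(value: Value) -> Tuple[float, float]:
    """Parse 2, 2.5 or "2-5" into (low, high)."""
    if isinstance(value, (int, float)):
        return float(value), float(value)
    low, sep, high = str(value).partition("-")
    if not sep:
        return float(low), float(low)
    lo, hi = float(low), float(high)
    if lo > hi:
        raise ValueError(f"invalid range {value!r}")
    return lo, hi


def wait_seconds(value: Value, rng=random) -> float:
    lo, hi = parse_range(value)
    return rng.uniform(lo, hi)  # assumption: continuous uniform sampling


def loop_count(value: Value, rng=random) -> int:
    lo, hi = parse_range(value)
    return rng.randint(int(lo), int(hi))  # assumption: inclusive integer range
```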

Execution Model

Scripts execute sequentially — each action completes before the next begins
loop repeats its sub-steps for the specified count
switch_page iterates through all page definitions until a match is found
run_script is a sub-call — returns to the parent when done
stop_loop breaks the current loop
stop_script stops the current level (in a sub-script, only stops that sub-script)
After script completion or error, the robot arm auto-resets

HTTP API Reference

Basic Info:

Address: http://127.0.0.1:7826
All POST endpoints expect a JSON body
Success: {"ok": true, "data": {...}}
Failure: {"ok": false, "error": "error message"}
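Because every endpoint uses the same envelope, a client can unwrap it in one place. This is a minimal sketch; the `FlexArmError` class and `unwrap` function are illustrative names. Some search endpoints report "not found" as `"ok": false` with `"data": null` and no error message, as listed in the Error Handling Guide, so the sketch supplies a fallback message.

```python
from typing import Any, Dict


class FlexArmError(RuntimeError):
    """Raised when the server replies with "ok": false."""


def unwrap(response: Dict[str, Any]) -> Any:
    """Return response["data"] on success, raise FlexArmError on failure."""
    if response.get("ok") is True:
        return response.get("data")
    # "not found" results may carry no error text, only "data": null
    raise FlexArmError(response.get("error") or "request failed (no error message)")
```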

1. Health Check

GET /api/health

Returns service status, port, uptime, etc.

2. Arm Status

GET /api/arm_status

Returns COM port, connection status, movement range, etc.

3. Camera Frame

Get frame info

GET /api/get_frame_info

Returns:

{"ok": true, "data": {"width": 540, "height": 960, "fps": 29.5}}

Show/hide window

POST /api/show_window
POST /api/hide_window

Detect phone presence

GET /api/is_phone_present
GET /api/is_phone_present?bright_threshold=60&bright_ratio=0.08

Returns:

{"ok": true, "data": {"present": true}}

Detect screen on/off

GET /api/is_screen_on
GET /api/is_screen_on?dark_threshold=30&dark_ratio=0.7

Returns:

{"ok": true, "data": {"screen_on": true}}

Toggle phone corners

POST /api/toggle_phone_corners

Overlays a green phone screen border on the display window.

Screenshot

POST /api/screenshot {"filename": "screenshots/test.png"}    # Save to file
POST /api/screenshot {"return_base64": true}             # Return base64 (recommended for AI Agents)
POST /api/screenshot {"phone_only": true}                # Crop to phone area only
POST /api/screenshot {"show_board": true}                # Full view with ruler

List screenshots

GET /api/screenshots
GET /api/screenshots?limit=10

Returns:

{"ok":true,"data":[{"filename":"20260524_1800_phone.png","path":"E:\\robot_arm\\screenshots\\...","size":123456,"time":"2026-05-24 18:00:12"},...]}

4. Page Detection

Detect desktop

POST /api/detect_desktop {"desktop_config": "scripts/configs/app_desktop.yaml"}

Returns:

{"ok": true, "data": {"matched": true, "page_name": "desktop_page1", "score": 0.84}}

Detect specific page

POST /api/detect_page {"config_path": "scripts/configs/app_qishui.yaml", "threshold": 0.75}

Returns:

{"ok": true, "data": {"matched": true, "page_name": "Rewards", "score": 0.82}}

5. Vision Search

Find icon

POST /api/find_template
{"path": "icons/app_qishui.png", "threshold": 0.75, "roi": [0.1, 0.2, 0.5, 0.6], "auto_mask": false}

Returns:

{"ok": true, "data": {"x": 257, "y": 453, "w": 52, "h": 53, "score": 0.9446}}
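The match result is a bounding box. The sketch below makes two assumptions, neither stated in this document: `x`, `y` are the top-left corner in camera-frame pixels, and `/api/click_at` accepts the same `cam_x`/`cam_y` parameters as the YAML `click_at` action. Under those assumptions, the box center can be passed to `click_at`. `box_center` is a hypothetical helper.

```python
from typing import Dict


def box_center(match: Dict[str, float]) -> Dict[str, int]:
    """Center of a find_template/find_text box, as integer frame pixels.

    Assumes x, y are the top-left corner of the box.
    """
    return {
        "cam_x": int(match["x"] + match["w"] / 2),
        "cam_y": int(match["y"] + match["h"] / 2),
    }
```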

Find text

POST /api/find_text {"text_keyword": "领取", "roi": [0.3, 0.5, 0.7, 0.8], "min_score": 0.5}

Returns:

{"ok": true, "data": {"x": 300, "y": 600, "w": 40, "h": 20, "text": "领取奖励", "score": 0.91}}

ROI-based text search

POST /api/find_text_roi {"roi": [0.0, 0.6, 1.0, 1.0], "text_keyword": "夸克", "min_score": 0.3}

Similar to find_text but requires roi (array format [sx, sy, ex, ey], 0-1).
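Every `roi` uses the same `[sx, sy, ex, ey]` fraction format, so validating it client-side avoids a round trip for an obviously bad region. This is a sketch: `validate_roi` is a hypothetical helper, and requiring the start to be above and left of the end is an assumption about what the server accepts.

```python
from typing import List, Sequence


def validate_roi(roi: Sequence[float]) -> List[float]:
    """Check an [sx, sy, ex, ey] ROI in phone-screen fractions (0-1)."""
    if len(roi) != 4:
        raise ValueError("roi must have exactly 4 values: [sx, sy, ex, ey]")
    sx, sy, ex, ey = (float(v) for v in roi)
    if not all(0.0 <= v <= 1.0 for v in (sx, sy, ex, ey)):
        raise ValueError("roi values must be between 0 and 1")
    if sx >= ex or sy >= ey:
        raise ValueError("roi start must be above and left of its end")
    return [sx, sy, ex, ey]
```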

ROI-based template matching

POST /api/find_template_roi {"path": "icons/app_qishui.png", "roi": [0.1, 0.2, 0.5, 0.6], "threshold": 0.75}

Similar to find_template but requires a roi region.

Adjust camera focus

POST /api/change_focus {"value": 2}     # Focus near +2
POST /api/change_focus {"value": -2}    # Focus far -2

value is an increment applied to the current focus (focus range 0-500). Returns {"ok": true, "data": {"focus": 310.0}}.

Find all text

POST /api/find_all_text {"min_score": 0.5}

Returns all recognized text on screen.

Performance Warning: find_all_text does a full-screen OCR scan using CPU inference. Time varies by text density:

Sparse text (<20 items): ~15 seconds
Dense text (novel apps, etc.): 90~120+ seconds

Best Practice: Use find_text with a specific roi whenever possible. OCR over a small region is much faster than a full-screen scan.

Find all templates

POST /api/find_all_templates {"template_paths": ["icons/a.png", "icons/b.png"], "threshold": 0.75}

Returns true only if all templates are found.

Find any template

POST /api/find_any_template {"template_paths": ["icons/a.png", "icons/b.png"], "threshold": 0.75}

Returns the first matched icon.

6. Actions

Return to desktop

POST /api/go_home {"max_retries": 5}

max_retries: maximum retry attempts, default 5 (detect desktop → swipe up → re-detect).

Back / Forward

POST /api/go_back
POST /api/go_forward

Reset

POST /api/reset

Swipe up / down

POST /api/swipe_up {"large": true}     # Large swipe up
POST /api/swipe_down {"large": false}  # Small swipe down

Click icon

POST /api/click_icon
{"path": "icons/app_qishui.png", "threshold": 0.75, "roi": [0.1, 0.2, 0.5, 0.6], "mask": false, "reset": true}

Click multiple icons

POST /api/click_icons
{"paths": ["icons/phone/num1.png", "icons/phone/num3.png", "icons/phone/num2.png"], "interval": 1}

Clicks each icon sequentially. After each click the arm exits the frame, then resets after all clicks. Returns {"ok": true, "data": {"clicked": true, "success_count": N, "failed": []}}.
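The YAML `dial_number` action maps each digit to an icon. The same effect can be approximated over HTTP by building a `click_icons` path list. This sketch follows the `icons/phone/numN.png` naming from the example above. The icon names for `*` and `#` (`star.png`, `hash.png`) are hypothetical, so adjust them to match your own templates.

```python
from typing import Any, Dict, List

# Hypothetical names for the two symbol keys; digits follow icons/phone/numN.png
SYMBOL_ICONS = {"*": "icons/phone/star.png", "#": "icons/phone/hash.png"}


def dial_payload(number: str, interval: float = 1.0) -> Dict[str, Any]:
    """Build a /api/click_icons request body for a phone number string."""
    paths: List[str] = []
    for ch in number:
        if ch in "0123456789":
            paths.append(f"icons/phone/num{ch}.png")
        elif ch in SYMBOL_ICONS:
            paths.append(SYMBOL_ICONS[ch])
        elif ch in " -":
            continue  # ignore common separators
        else:
            raise ValueError(f"cannot dial character {ch!r}")
    return {"paths": paths, "interval": interval}
```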

Click same icon multiple times

POST /api/click_icon_many_times
{"path": "icons/qishui/like.png", "count": 3, "interval": 0.5}

Searches for the icon once, then clicks the same position multiple times without moving or resetting. Resets only after all clicks. Returns {"ok": true, "data": {"clicked": true, "clicks": 3}}.

Click text

POST /api/click_text
{"text": "领取", "roi": [0.3, 0.5, 0.7, 0.8], "min_score": 0.5}

Click coordinates

POST /api/click {"x": 0.5, "y": 0.96}

Click ROI center

POST /api/click_roi {"roi": [0.3, 0.5, 0.7, 0.8]}

Custom swipe

POST /api/swipe
{"sx": 0.5, "sy": 0.8, "ex": 0.5, "ey": 0.1, "steps": 5, "duration": 0.3}

sx/sy/ex/ey are phone percentage coordinates, steps is the number of steps, duration is total swipe time (seconds).
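The document does not say how the server spreads a swipe over `steps`. This sketch assumes straight-line interpolation with equal spacing in both distance and time, and computes the intermediate points and per-step delay for the example request above. `swipe_points` is a hypothetical helper.

```python
from typing import List, Tuple


def swipe_points(sx: float, sy: float, ex: float, ey: float,
                 steps: int, duration: float) -> Tuple[List[Tuple[float, float]], float]:
    """Return steps + 1 evenly spaced points from start to end, plus the per-step delay (s)."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    points = [(sx + (ex - sx) * i / steps, sy + (ey - sy) * i / steps)
              for i in range(steps + 1)]
    return points, duration / steps
```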

Close all apps

POST /api/close_all_apps {"max_swipes": 15}

Launch app

GET /api/run_app?app_name=汽水音乐
# or
POST /api/run_app {"app_name": "汽水音乐"}

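Looks up the app icon in app_desktop.yaml and clicks it (汽水音乐 is the Chinese name of Qishui Music). Because app_name contains Chinese characters here, send this request from Python rather than curl, as explained in the curl section above.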
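If you use the GET form with a Chinese app name, the query string must be percent-encoded as UTF-8. This sketch uses `urllib.parse.urlencode` from the standard library, which encodes non-ASCII text as UTF-8 by default. `build_get_url` is an illustrative name.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:7826"


def build_get_url(endpoint: str, **params) -> str:
    """Build a GET URL with UTF-8 percent-encoded query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{endpoint}?{query}" if query else f"{BASE_URL}{endpoint}"


# Usage: build_get_url("/api/run_app", app_name="汽水音乐")
```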

7. Script Control

Run script

POST /api/run_script {"path": "scripts/qishui_daily.yaml"}

Returns:

{"ok": true, "data": {"script": "D:\\...\\scripts/qishui_daily.yaml", "status": "started"}}

Check script status

GET /api/script_status

Returns:

{"ok": true, "data": {"running": false, "current_script": null}}

Get script progress

GET /api/script_progress

Returns full execution log + stats:

{
  "ok": true,
  "data": {
    "running": true,
    "script": "scripts/xxx.yaml",
    "current_step": {"step_index": 5, "action": "switch_page", "target": "拨号页", "status": "ok", "detail": "Matched branch: 拨号页", "timestamp": 1780329450.91},
    "steps_log": [
      {"step_index": 0, "action": "script_start", "target": "test_66", "status": "ok", "detail": "5 top-level steps", "timestamp": ...},
      ...
    ],
    "stats": {
      "total_steps": 24,
      "completed_steps": 23,
      "failed_steps": 0,
      "elapsed": 61.6,
      "status": "running"
    }
  }
}

steps_log: complete step history with timestamps
stats: progress stats including elapsed time, completed/failed counts
Idle state: {"running": false, "script": null, "current_step": null, "steps_log": [], "stats": {}}
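For status reporting, the `stats` object can be reduced to a one-line summary. This sketch is based on the response shape shown above and handles the idle state, where `stats` is empty. `summarize_progress` is a hypothetical helper.

```python
from typing import Any, Dict


def summarize_progress(data: Dict[str, Any]) -> str:
    """Turn the data field of /api/script_progress into a short status line."""
    stats = data.get("stats") or {}
    if not data.get("running") and not stats:
        return "idle"
    total = stats.get("total_steps") or 0
    done = stats.get("completed_steps", 0)
    failed = stats.get("failed_steps", 0)
    pct = 100.0 * done / total if total else 0.0
    step = data.get("current_step") or {}
    return (f"{stats.get('status', 'unknown')}: {done}/{total} steps ({pct:.0f}%), "
            f"{failed} failed, {stats.get('elapsed', 0):.1f}s, "
            f"current: {step.get('action', '-')}")
```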

Stop script

POST /api/stop_script

Wait for template

POST /api/wait_for_template {"path": "icons/qishui/reward_popup.png", "timeout": 10, "interval": 0.5}

Polls every interval seconds until the template appears or timeout seconds have elapsed.
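`wait_for_template` and `wait_for_page` poll on the server side. For other checks, such as waiting until `script_status` reports `running: false`, a client-side loop can follow the same pattern. This is a sketch: `poll_until` is a hypothetical helper, and `clock`/`sleep` are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def poll_until(check: Callable[[], bool], timeout: float, interval: float,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline
        sleep(min(interval, remaining))
```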

Wait for page

POST /api/wait_for_page {"config_path": "scripts/configs/app_xxx.yaml", "target_name": "RewardsPage", "timeout": 15}

Polls until the specified page appears or timeout.

Clear overlay

POST /api/clear_overlay

8. Configuration

Read/update daily config

GET  /api/config/daily
PUT  /api/config/daily  {"windows": [...]}

Read/update app page config

GET  /api/config/app/qishui           # Returns app_qishui.yaml content
PUT  /api/config/app/qishui           # Update config (body is YAML text)

Read/update gesture config

GET  /api/config/gesture
PUT  /api/config/gesture  {...}

9. Shutdown

POST /api/shutdown

Graceful shutdown: detect running script → safe terminate → arm reset → release resources → process exit.

Note: Do not kill the process directly — the COM port won't be released, and you'll need to reinstall the driver on next startup.

10. System Tray

RobotArmServer.exe minimizes to the system tray on launch:

Left click / double-click: Show/hide console window
Right-click menu: "Show/Hide Console", "Exit Service"
Exiting from the tray triggers the same graceful shutdown flow as /api/shutdown

Directory Structure

RobotArmServer/
├── RobotArmServer.exe          ← Main program
├── _internal/                  ← Program libraries (do not touch)
├── scripts/                    ← YAML scripts directory
│   ├── hello_flexarm.yaml      ← Your first script
│   ├── daily_config.yaml       ← Daily scheduled task config
│   ├── configs/
│   │   ├── app_desktop.yaml    ← Phone desktop config (required)
│   │   ├── app_qishui.yaml     ← Qishui Music page config
│   │   └── app_kuaishou.yaml   ← Kuaishou page config
│   └── qishui/                 ← Sub-scripts directory
│       ├── run_*.yaml
│       └── music_actions.yaml
├── icons/                      ← Icon templates directory
│   ├── app_phone.png
│   ├── app_camera.png
│   ├── app_qishui.png
│   └── qishui/
│       ├── cross.png
│       └── ...
├── calibrations/               ← Calibration results (auto-generated)
├── screenshots/                ← Screenshot save directory
├── camera_config.json          ← Camera focus config
├── device_config.json          ← Device config
├── gesture_config.json         ← Swipe gesture config
└── robot-arm-service/          ← Windows service driver

FAQ

Q: The robot arm doesn't move?

Make sure robot-arm-service\安装.bat (the install script) has been run as Administrator
Verify GET /api/arm_status returns "connected": true
Ensure calibration is complete (a JSON file exists in calibrations/)
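The checks above can be partly automated. The following sketch makes these assumptions:

The data object of GET /api/arm_status carries the "connected" flag mentioned in the second item.
Calibration results are the *.json files in calibrations/.
diagnose_arm is a hypothetical helper.

The sketch cannot verify whether the installer was run. It only reports whether the arm says it is connected.

```python
from pathlib import Path


def diagnose_arm(root: str, arm_status: dict) -> list:
    """Return human-readable problems; an empty list means these checks pass.

    root: the RobotArmServer/ directory.
    arm_status: the "data" object from GET /api/arm_status
    (ASSUMPTION: it carries the "connected" flag this FAQ mentions).
    """
    base = Path(root)
    problems = []
    if not arm_status.get("connected", False):
        problems.append("arm not connected: run robot-arm-service\\安装.bat as Administrator")
    calib = base / "calibrations"
    if not calib.is_dir() or not any(calib.glob("*.json")):
        problems.append("calibration missing: no JSON file in calibrations/")
    return problems
```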

Q: Icon not found?

Confirm the icon file exists in icons/
Lower the threshold (e.g., 0.65)
Set mask: false (4-corner background sampling, more tolerant)
Specify a roi to narrow the search area
Check that the icon template matches the on-screen icon (size, color, background)

Q: Text not found?

Improve image clarity (adjust camera focus: POST /api/change_focus {"value": 5})
If OCR matches the wrong text, raise min_score to 0.5+ to reduce false matches
Specify a roi to narrow the search area
Use find_all_text to see what text the OCR actually recognizes

Q: Script execution interrupted?

Check GET /api/script_progress to see which step is stuck
Check log output (service console)
Ensure the phone hasn't shown a system dialog (permissions, notifications, etc.)

Q: How to add automation for a new app?

Capture the app icon → place it in icons/
Update scripts/configs/app_desktop.yaml (add the new icon to features)
Create scripts/configs/app_xxx.yaml (define each page inside the app)
Write scripts/run_xxx.yaml (define the workflow)
Run: POST /api/run_script {"path": "scripts/run_xxx.yaml"}
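Step 5 can be driven from Python: start the script, then poll until it finishes. The sketch assumes GET /api/script_status returns a data object with a boolean "running" field.

Neither the HTTP method nor that field name is documented in this section, so adjust both to what the service actually returns. The names call, wait_for_script, and run_and_wait are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://127.0.0.1:7826"  # fixed service port used in this README's examples


def call(method: str, path: str, body: Optional[dict] = None) -> dict:
    """Send one request and return the decoded {"ok", "data", "error"} envelope."""
    data = None if body is None else json.dumps(body, ensure_ascii=False).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data, method=method,
                                 headers={"Content-Type": "application/json; charset=utf-8"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_script(get_status: Callable[[], dict],
                    timeout: float = 300.0, interval: float = 2.0) -> dict:
    """Poll until the status envelope no longer reports a running script.

    ASSUMPTION: script_status's "data" carries a boolean "running" field.
    """
    deadline = time.monotonic() + timeout
    while True:
        env = get_status()
        if not env.get("ok"):
            raise RuntimeError(env.get("error") or "script_status failed")
        data = env.get("data") or {}
        if not data.get("running", False):
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError("still running; inspect GET /api/script_progress")
        time.sleep(interval)


def run_and_wait(script_path: str) -> dict:
    """Start a YAML script, e.g. "scripts/run_xxx.yaml", and block until it ends."""
    started = call("POST", "/api/run_script", {"path": script_path})
    if not started.get("ok"):
        raise RuntimeError(started.get("error"))
    # ASSUMPTION: script_status is a GET endpoint.
    return wait_for_script(lambda: call("GET", "/api/script_status"))
```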

Error Handling Guide

All APIs return a uniform format: {"ok": true/false, "data": {...}, "error": "..."}

Common Errors & Strategies

Error	Cause	Resolution
`"error": "Script is running"`	A script is executing in the background	Check `script_status`, wait for completion, or call `stop_script`
`"error": "RobotActions not initialized"`	Arm not connected / service not started	Guide user to run `robot-arm-service\安装.bat` (the install script) as Administrator
`"error": "Missing parameter: path"`	Incomplete request parameters	Check API call parameters
`"ok": false, "data": null` (find_template)	Icon not found	Lower `threshold` or check icon file, do not retry indefinitely
`"ok": false, "data": null` (find_text)	OCR text not found	Widen `roi` or lower `min_score`, try at most 2-3 times then report
`"error": "Unauthorized"`	License check failed	Guide user to activate
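Because every endpoint shares one envelope, an agent can unwrap replies in a single place. This sketch handles the two failure shapes differently:

"ok": false with an error message raises an exception, with a dedicated subclass for "Script is running".
"ok": false with "data": null and no error text is treated as "nothing found", matching the find_template and find_text rows above.

That split is an interpretation of the table, not a documented contract. ApiError, ScriptBusy, and unwrap are hypothetical names.

```python
import json
from typing import Any, Optional


class ApiError(Exception):
    """The service replied "ok": false with an error message."""


class ScriptBusy(ApiError):
    """A background script is running: check script_status or call stop_script."""


def unwrap(body: str) -> Optional[Any]:
    """Return "data" on success, None for a plain no-match, raise on errors."""
    payload = json.loads(body)
    if payload.get("ok"):
        return payload.get("data")
    error = payload.get("error")
    if not error:
        # ASSUMPTION: ok=false with no error text means "icon/text not found".
        return None
    if error == "Script is running":
        raise ScriptBusy(error)
    raise ApiError(error)
```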

Agent Retry Recommendations

Icon lookup: fail → lower threshold → retry once → still fail → report to user
Text lookup: fail → widen ROI → retry once → still fail → report to user
Page detection: no match → try go_back or go_home → re-detect → still no match → report to user
Script conflict: received "script running" → check script_status → if running, wait or report to user
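All four rules share one shape: try, apply a bounded recovery step, re-try, and give up after the last step. The sketch below encodes that shape once and wires the icon and page rules to it. Its inputs and assumptions:

The thresholds 0.75 and 0.65 come from the examples in this README.
The find_template, detect, go_back, and go_home callables are hypothetical wrappers the agent would supply around the corresponding endpoints, each returning None on "ok": false.
Trying go_back first and then go_home is one reading of "go_back or go_home".

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def bounded_retry(attempt: Callable[[], Optional[T]],
                  recoveries: Sequence[Callable[[], None]]) -> Optional[T]:
    """Try once, then re-try once after each recovery step.

    Returns the first non-None result, or None once every recovery step has
    been used; None is the point where the agent reports to the user.
    """
    result = attempt()
    for recover in recoveries:
        if result is not None:
            break
        recover()
        result = attempt()
    return result


def icon_lookup(find_template: Callable[[str, float], Optional[dict]],
                path: str, threshold: float = 0.75,
                lowered: float = 0.65) -> Optional[dict]:
    """Icon rule: fail -> lower threshold -> retry once -> still fail -> None."""
    state = {"threshold": threshold}

    def lower() -> None:
        state["threshold"] = lowered

    return bounded_retry(lambda: find_template(path, state["threshold"]), [lower])


def page_lookup(detect: Callable[[], Optional[dict]],
                go_back: Callable[[], None],
                go_home: Callable[[], None]) -> Optional[dict]:
    """Page rule: no match -> go_back, re-detect -> go_home, re-detect -> None."""
    return bounded_retry(detect, [go_back, go_home])
```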

Agent Conversation Example: Find and Open Qishui Music

Below is a complete example showing how an AI Agent combines APIs to "find and open the Qishui Music app on the desktop":

Step 1: Check service status

curl http://127.0.0.1:7826/api/health
# Returns: {"ok":true,"data":{"status":"running","arm_connected":true,...}}

Step 2: Detect current desktop

curl -X POST http://127.0.0.1:7826/api/detect_desktop -H "Content-Type: application/json" -d '{}'
# Returns: {"ok":true,"data":{"page_name":"desktop_page1","score":0.94,"matched":true}}

Step 3: Find the Qishui Music icon

curl -X POST http://127.0.0.1:7826/api/find_template -H "Content-Type: application/json" -d '{"path":"icons/app_qishui.png","threshold":0.75}'
# Returns: {"ok":true,"data":{"x":242,"y":516,"w":55,"h":55,"score":0.92}}

Step 4: Click the icon

curl -X POST http://127.0.0.1:7826/api/click_icon -H "Content-Type: application/json" -d '{"path":"icons/app_qishui.png"}'
# Returns: {"ok":true,"data":{"clicked":true,"score":0.92,...}}

Step 5: Wait for app launch, detect page

python -c "import time; time.sleep(2)"
python -c "import requests; r=requests.post('http://127.0.0.1:7826/api/detect_page',json={'config_path':'scripts/configs/app_qishui.yaml'}); print(r.text)"
# Returns: {"ok":true,"data":{"page_name":"Music","score":0.85,"matched":true}}

Step 6: Confirm phone is in frame

curl http://127.0.0.1:7826/api/is_phone_present
# Returns: {"ok":true,"data":{"present":true}}

✅ Task complete: Qishui Music is open, currently on the Music page.
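The six steps can be combined into one Python function. This sketch has the following properties:

It uses only the standard library.
It sends JSON as UTF-8, so Chinese text survives.
It uses the HTTP methods shown in the curl and requests calls above.
It aborts on the first failed step instead of retrying.

open_qishui and http_call are hypothetical names. Passing a different call function makes the flow testable without the arm attached.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://127.0.0.1:7826"  # fixed service port used in this README's examples
Caller = Callable[[str, str, Optional[dict]], dict]


def http_call(method: str, path: str, body: Optional[dict] = None) -> dict:
    """Send one request; JSON is UTF-8 encoded so Chinese text survives."""
    data = None if body is None else json.dumps(body, ensure_ascii=False).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data, method=method,
                                 headers={"Content-Type": "application/json; charset=utf-8"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def open_qishui(call: Caller = http_call, launch_wait: float = 2.0) -> str:
    """Run Steps 1-6 above and return the detected in-app page name."""
    def data(method: str, path: str, body: Optional[dict] = None) -> dict:
        env = call(method, path, body)
        if not env.get("ok"):
            raise RuntimeError(f"{path}: {env.get('error') or 'no match'}")
        return env.get("data") or {}

    if not data("GET", "/api/health").get("arm_connected"):              # Step 1
        raise RuntimeError("arm not connected")
    if not data("POST", "/api/detect_desktop", {}).get("matched"):       # Step 2
        raise RuntimeError("not on a known desktop page")
    data("POST", "/api/find_template",                                   # Step 3
         {"path": "icons/app_qishui.png", "threshold": 0.75})
    data("POST", "/api/click_icon", {"path": "icons/app_qishui.png"})    # Step 4
    time.sleep(launch_wait)                                              # Step 5
    page = data("POST", "/api/detect_page",
                {"config_path": "scripts/configs/app_qishui.yaml"})
    if not page.get("matched"):
        raise RuntimeError("Qishui Music page not recognized")
    if not data("GET", "/api/is_phone_present").get("present"):          # Step 6
        raise RuntimeError("phone not in frame")
    return page["page_name"]
```

Against a running service, open_qishui() should return the page name reported by detect_page ("Music" in the example above).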

Alternatively, if app_desktop.yaml is configured and the Qishui Music icon template exists in icons/, call the run_app endpoint directly. It navigates across the desktop pages, finds the page that contains the icon, and clicks it.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install flexarm-robot-arm
After installation, invoke the skill by name or use /flexarm-robot-arm
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.2

No file changes compared to v1.0.1; all features, documentation, and APIs are identical.

v1.0.1

flexarm-skill 1.0.1
Rewrote the description and added a new section specifying exactly when AI agents must use this skill, including clear usage and non-usage scenarios.
Emphasized the intended use cases (physical smartphone interaction via robot arm and camera) with concrete examples and triggers.
Clarified that this skill is for real device interaction only, not for virtual or simulated devices.
Made the AI agent's usage criteria more discoverable for all users.
No code or API changes; documentation and guidance update only.

v1.0.0

FlexArm Robot Controller 2.0.1: major upgrade with expanded robot arm phone control APIs.
Added detailed documentation and setup instructions, including calibration, the fixed-port service, and usage caveats.
Expanded API coverage to 54 endpoints for robot control, camera vision, template and text detection, and YAML automation scripting.
Supports precise clicking, swiping, multi-icon operations, OCR, and complex script orchestration.
Clarified requirements for proper UTF-8 handling and command-line usage (Python requests recommended for Chinese text).
Expanded system management, configuration, and script control APIs for full automation workflow integration.

Metadata

Slug flexarm-robot-arm

Version 1.0.2

License MIT-0

All-time Installs 1

Active Installs 1

Total Versions 3

Frequently Asked Questions

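What is FlexArm Robot Arm: Physical Tap, Swipe, OCR for Real Phones?

It is a skill that lets AI agents operate a real smartphone through a camera and a robot arm. It offers 50+ built-in atomic actions (click, swipe, OCR, screenshot) that an agent can call directly, plus YAML scripts for repeatable workflows.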

How do I install FlexArm Robot Arm: Physical Tap, Swipe, OCR for Real Phones?

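Run "/install flexarm-robot-arm" in the OpenClaw or Claude Code chat to install the skill in one step. The robot arm itself still needs its own setup: run robot-arm-service\安装.bat (the install script) as Administrator and complete calibration.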

Is FlexArm Robot Arm: Physical Tap, Swipe, OCR for Real Phones free?

Yes, FlexArm Robot Arm: Physical Tap, Swipe, OCR for Real Phones is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does FlexArm Robot Arm: Physical Tap, Swipe, OCR for Real Phones support?

The skill is listed as cross-platform and runs anywhere OpenClaw / Claude Code is available. The RobotArmServer backend it drives is a Windows program (RobotArmServer.exe plus a Windows service driver).

Who created FlexArm Robot Arm: Physical Tap, Swipe, OCR for Real Phones?

It is built and maintained by hamlet0168 (@hamlet0168); the current version is v1.0.2.
