Description

Execute multi-step user tasks on websites or local apps using the most reliable, least intrusive method available, verifying each result and reusing patterns learned from earlier runs.

Usage instructions (SKILL.md)

computer-task-execution

Name: Computer Task Execution
Author: gift-is-coding

Core idea

Execute the task, not the interface.

Start from the user goal and choose the path with the best combination of:

success rate
verifiability
low user disruption
low repeated trial-and-error

Do not default to local GUI automation. Prefer interfaces that are easier to verify and less intrusive.

Golden rules

Prefer reliability over cleverness. A short visible foreground step is better than an invisible but flaky flow.
Prefer data/interface layers over GUI simulation. If an API, scriptable interface, URL scheme, local file, browser DOM, or official automation surface exists, use that before UI driving.
Prefer browser execution over local-app execution when the same task can be done on the web with adequate login state and verification.
Treat verification as part of execution. A task is not done until the result is checked using the most reliable available evidence.
Minimize focus stealing, but do not worship zero-focus-steal. The correct target is minimum user disruption with high confidence, not purity.
Reuse proven patterns. Once a path has succeeded for a software target or domain, record it and use it first next time.

Decision model

Before acting, classify the task:

A. Read-only task

Examples:

read meeting details
inspect app state
fetch notifications
extract page content
check whether something exists

Default preference:

1. browser/web task path
2. local files / local database / app-exported data
3. official scripting interface
4. local app UI automation

B. Write-without-send task

Examples:

create a draft
fill a form but do not submit yet
edit a spreadsheet
prepare a document
populate a field

Default preference:

1. browser/web task path
2. official scripting / URL scheme / file-based write
3. local app UI automation with minimum foreground time

C. High-risk action task

Examples:

send a message
submit a form
delete or modify live data
post to social media
trigger an approval flow

Default preference:

1. browser/web task path if reliable and verifiable
2. official app interface if supported and verifiable
3. local app UI automation with explicit pre-check and post-check

For high-risk actions, if correct targeting depends on visible UI state, expect foreground execution for the critical step.
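
The classification above can be sketched as a small lookup table. This is a minimal illustration rather than part of the skill: the names `TaskClass`, `classify`, `default_paths`, and the two boolean inputs are hypothetical, and the preference lists are copied from the three classes above.

```python
from enum import Enum


class TaskClass(Enum):
    READ_ONLY = "A"
    WRITE_WITHOUT_SEND = "B"
    HIGH_RISK = "C"


# Ordered default preferences, copied from the lists for classes A, B, and C.
DEFAULT_PREFERENCE = {
    TaskClass.READ_ONLY: [
        "browser/web task path",
        "local files / local database / app-exported data",
        "official scripting interface",
        "local app UI automation",
    ],
    TaskClass.WRITE_WITHOUT_SEND: [
        "browser/web task path",
        "official scripting / URL scheme / file-based write",
        "local app UI automation with minimum foreground time",
    ],
    TaskClass.HIGH_RISK: [
        "browser/web task path if reliable and verifiable",
        "official app interface if supported and verifiable",
        "local app UI automation with explicit pre-check and post-check",
    ],
}


def classify(writes_data: bool, has_live_effect: bool) -> TaskClass:
    """Hypothetical classifier: a live effect (send, submit, delete, post,
    approve) makes the task high-risk regardless of anything else."""
    if has_live_effect:
        return TaskClass.HIGH_RISK
    if writes_data:
        return TaskClass.WRITE_WITHOUT_SEND
    return TaskClass.READ_ONLY


def default_paths(task_class: TaskClass) -> list:
    """Return a copy of the ordered preference list for a task class."""
    return list(DEFAULT_PREFERENCE[task_class])
```

For example, `default_paths(classify(writes_data=True, has_live_effect=False))` yields the class B list, with the browser/web path first.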

Execution-path priority

Always try paths in this order unless the task or accumulated target knowledge strongly suggests otherwise:

1. Browser/web execution
   - Best when the target service is available on the web
   - Prefer DOM-visible, page-verifiable flows
   - Best default for logged-in services when browser access is available
2. Official non-UI interface
   - App scripting
   - URL schemes
   - built-in automation hooks
   - import/export surfaces
   - local data files or supported storage
3. Hybrid execution
   - Prepare in background using data or browser
   - Switch to foreground only for the minimum critical action window
   - Immediately verify and exit
4. Local app UI automation
   - Use only when the task cannot be completed more reliably elsewhere
   - Prefer keyboard-first flows only when target focus can be guaranteed
   - Use visual or state verification for completion
5. Background/local no-focus experimentation
   - Last resort for low-risk tasks or explicit user request
   - Treat as experimental unless already proven for this target
   - If success is not strongly verifiable, do not present it as complete
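
One way to read this ordering is as a fallback loop: try each available path class in priority order and stop at the first one whose result is verified. The sketch below assumes hypothetical path identifiers and a hypothetical `attempts` mapping in which each callable returns `True` only when its result was verified. It does not model the "unless accumulated target knowledge suggests otherwise" exception beyond skipping disproven paths.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical identifiers for the five path classes, in priority order.
PATH_ORDER = [
    "browser_web",
    "official_non_ui",
    "hybrid",
    "local_app_ui",
    "background_no_focus",
]


def run_in_priority_order(
    attempts: Dict[str, Callable[[], bool]],
    disproven: Iterable[str] = (),
) -> Optional[str]:
    """Try each available path in priority order.

    Each callable must return True only if the result was verified.
    Paths listed in `disproven` are skipped. Returns the name of the
    first path that succeeded, or None if every path failed.
    """
    skipped = set(disproven)
    for path in PATH_ORDER:
        attempt = attempts.get(path)
        if attempt is None or path in skipped:
            continue
        if attempt():
            return path
    return None
```

A caller would pass only the paths that exist for the target, for example `{"browser_web": try_web, "local_app_ui": try_app}`, where both functions are placeholders.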

Focus policy

When background/no-focus execution is appropriate

Prefer trying no-focus or low-focus execution when:

the task is read-only
the action operates on data rather than UI state
the target exposes a scriptable surface
verification does not depend on visible window state
the user explicitly requests silent/minimal-disruption mode

When foreground execution is usually necessary

Use foreground execution for the critical step when:

keyboard input must reach a specific target window or field
the result depends on visible UI state
recipient/target selection must be visually confirmed
the task sends, submits, deletes, or approves something
previous background attempts for this target were flaky

Preferred compromise

For local app tasks, default to:

1. background preparation
2. shortest possible foreground critical section
3. post-action verification
4. return control promptly

This is usually better than forcing a fully background flow.
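
The foreground conditions listed above amount to a simple "any of these signals" check. A minimal sketch, assuming a hypothetical `FocusSignals` record whose fields mirror the five conditions:

```python
from dataclasses import dataclass


@dataclass
class FocusSignals:
    """Hypothetical flags, one per foreground condition in the focus policy."""
    keyboard_must_reach_specific_target: bool = False
    result_depends_on_visible_ui: bool = False
    target_needs_visual_confirmation: bool = False
    sends_submits_deletes_or_approves: bool = False
    background_attempts_were_flaky: bool = False


def needs_foreground(signals: FocusSignals) -> bool:
    """Return True if any condition calls for a foreground critical step."""
    return any(
        (
            signals.keyboard_must_reach_specific_target,
            signals.result_depends_on_visible_ui,
            signals.target_needs_visual_confirmation,
            signals.sends_submits_deletes_or_approves,
            signals.background_attempts_were_flaky,
        )
    )
```

When `needs_foreground` returns `True`, the preferred compromise still applies: only the critical step moves to the foreground, while preparation stays in the background.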

Verification rules

Use the strongest available verification method:

1. target-system confirmation
   - message appears in thread
   - record exists
   - page shows success state
   - saved content is readable from source
2. direct state read-back
   - reread the object after write
   - check the updated field value
   - reload and confirm persistence
3. visual confirmation
   - screenshot or visible state inspection
   - only acceptable when stronger read-back is unavailable
4. process-only confirmation
   - use only for low-risk tasks
   - “command ran” is not sufficient evidence for a high-risk task

If you cannot verify confidently, say so and keep the result provisional.
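
The verification ladder can be expressed as an ordered enum, which makes "use the strongest available method" a plain `max`. This is a sketch under assumptions: the `Evidence` names and the two status strings are hypothetical labels for the four levels above.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Evidence(IntEnum):
    """Verification methods, ordered weakest to strongest."""
    PROCESS_ONLY = 1       # "command ran"
    VISUAL = 2             # screenshot or visible state inspection
    STATE_READ_BACK = 3    # reread the object after the write
    TARGET_SYSTEM = 4      # the target system itself confirms the result


def strongest(available: Iterable[Evidence]) -> Optional[Evidence]:
    """Pick the strongest available method, or None if there is none."""
    return max(available, default=None)


def result_status(evidence: Optional[Evidence], high_risk: bool) -> str:
    """Return 'verified' or 'provisional' for a finished attempt."""
    if evidence is None:
        return "provisional"
    if evidence is Evidence.PROCESS_ONLY and high_risk:
        # Process-only confirmation is not sufficient for high-risk tasks.
        return "provisional"
    return "verified"
```

Because visual confirmation ranks below read-back, `strongest` only returns it when nothing stronger is available, which matches the rule above.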

Pattern memory: learn per software target

After each successful or meaningfully failed execution, update target-specific experience. If a relevant pattern already exists, reading it before execution is mandatory.

Storage

Website/domain patterns: references/site-patterns/<domain>.md
Local app patterns: references/site-patterns/<app-name>.md

What to record

Record only facts learned through execution:

successful path
required preconditions
whether foreground was necessary
whether browser was superior to app
known unstable paths
verification method that worked
date of discovery

Why this matters

Next time, if the target matches a known app or domain:

read that pattern file first
reuse the proven path first
skip previously disproven paths unless the environment changed

This avoids repeated “try everything again” behavior.

Pattern file format

---
kind: local-app | website
name: WeChat
domain: x.com
app_id: com.tencent.xinWeChat
aliases: [微信, WeChat]
updated: 2026-03-27
---

## Successful paths
- [2026-03-27] Foreground: open -a WeChat → Command+F → paste contact → Enter → paste message → Enter

## Preconditions
- Main window present
- Search accepts pasted contact names

## Verification
- Sent message visibly appears in the target thread

## Unstable or failed paths
- [2026-03-27] Background-only keyboard injection was not reliably targetable

## Recommended default
- Use background preparation + short foreground execution + post-send verification
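
A pattern file in this format can be read without a YAML library, since the front matter is flat `key: value` pairs and each section is a `## ` heading followed by `- ` bullets. The parser below is a minimal sketch for exactly that shape. `parse_pattern_file` and `SAMPLE` are hypothetical names, and nested YAML is not handled.

```python
from typing import Dict, List, Tuple

SAMPLE = """\
---
kind: local-app
name: WeChat
aliases: [微信, WeChat]
updated: 2026-03-27
---

## Verification
- Sent message visibly appears in the target thread

## Unstable or failed paths
- [2026-03-27] Background-only keyboard injection was not reliably targetable
"""


def parse_pattern_file(text: str) -> Tuple[Dict[str, object], Dict[str, List[str]]]:
    """Split a pattern file into (front-matter dict, {section: [bullets]})."""
    lines = text.splitlines()
    meta: Dict[str, object] = {}
    sections: Dict[str, List[str]] = {}
    i = 0
    if lines and lines[0].strip() == "---":
        i = 1
        while i < len(lines) and lines[i].strip() != "---":
            if lines[i].strip():
                key, _, value = lines[i].partition(":")
                value = value.strip()
                if value.startswith("[") and value.endswith("]"):
                    # Inline list such as [a, b]
                    meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
                else:
                    meta[key.strip()] = value
            i += 1
        i += 1  # skip the closing '---'
    current = None
    for line in lines[i:]:
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return meta, sections


meta, sections = parse_pattern_file(SAMPLE)
```

After this runs, `meta["aliases"]` holds the two aliases as a list and `sections["Verification"]` holds the single verification bullet.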

How to use pattern memory

Step 1: identify the target

Normalize the target to either:

a domain, or
an application name

Step 2: check for prior knowledge

If a matching pattern file exists, you must read it before choosing the execution path. Do not skip this step just because you already have a generic plan.

Step 3: start with the proven route

If the stored preferred route still fits the current request, use it first. Treat stored successful patterns as the default starting point.

Step 4: only explore when needed

Explore alternatives only if:

the preferred route fails
the user requested a different mode
the environment clearly changed
the stored pattern is clearly inapplicable to the current task

Do not re-run previously disproven paths unless there is a specific reason.

Step 5: update after execution

If new facts were learned, update the pattern file. Pattern memory is part of task completion, not optional cleanup.
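
Steps 1 and 2 can be sketched as a lookup from a target to its pattern file. The normalization rules here (lowercasing, stripping a leading `www.`, replacing spaces with hyphens) are assumptions for illustration. The skill does not specify how file names are derived, and `normalize_target`, `pattern_path`, and `load_prior_pattern` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

# Storage location taken from the "Storage" section.
PATTERN_DIR = Path("references/site-patterns")


def normalize_target(target: str) -> str:
    """Reduce a URL to its domain, or an app name to a lowercase slug."""
    cleaned = target.strip()
    if "://" in cleaned:
        host = urlparse(cleaned).hostname or ""
        # Assumption: treat www.example.com and example.com as one target.
        return host[4:] if host.startswith("www.") else host
    return cleaned.lower().replace(" ", "-")


def pattern_path(target: str, base: Path = PATTERN_DIR) -> Path:
    """Return the path where the pattern file for this target would live."""
    return base / f"{normalize_target(target)}.md"


def load_prior_pattern(target: str, base: Path = PATTERN_DIR) -> Optional[str]:
    """Return the stored pattern text, or None if no pattern exists yet."""
    path = pattern_path(target, base)
    if path.is_file():
        return path.read_text(encoding="utf-8")
    return None
```

If `load_prior_pattern` returns text, step 2 requires reading it before any execution path is chosen. If it returns `None`, the generic priority order applies.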

Choosing browser vs local app

Prefer browser when

the service has a working web app
login state exists in browser
DOM/state inspection improves confidence
you need robust, repeatable verification
avoiding frontmost app disruption matters

Prefer local app when

the task is app-only
the app exposes better native automation than the website
the browser version is missing key capabilities
the task is already known to work reliably through a stored app pattern

Local app operating style

If local app execution is necessary, prefer this sequence:

1. Determine exact success criteria
2. Prepare all inputs before touching the app
3. Open or locate the target app/window
4. Keep foreground time as short as possible
5. Execute only the critical path
6. Verify immediately
7. Record the winning pattern

Handling silent mode requests

If the user asks to avoid stealing focus:

first see whether browser or non-UI paths can satisfy the task
if local-app background execution is only partially reliable, note that in your plan and choose it only when the task risk is low or the user explicitly prefers silence over certainty
for high-risk tasks, recommend minimum-foreground execution rather than pretending a background path is equally safe

Failure handling

When a path fails:

identify whether the failure was due to targeting, focus, auth, UI drift, missing permissions, or bad path choice
switch to the next-best path class instead of repeating the same failing method blindly
if the failure teaches something reusable, record it in the target pattern file
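
The failure-handling steps can be sketched as two helpers: one that picks the next path class that has not already failed, and one that formats a dated entry for the "Unstable or failed paths" section of a pattern file. The path identifiers, the cause labels, and the parenthesised cause tag in the entry are assumptions added for illustration. The pattern file format itself shows only a date and a note.

```python
from datetime import date
from typing import Iterable, Optional

# Hypothetical identifiers for the five path classes, in priority order.
PATH_ORDER = [
    "browser_web",
    "official_non_ui",
    "hybrid",
    "local_app_ui",
    "background_no_focus",
]

# Failure causes named in the failure-handling list.
FAILURE_CAUSES = {
    "targeting",
    "focus",
    "auth",
    "ui-drift",
    "missing-permissions",
    "bad-path-choice",
}


def next_path_class(failed_paths: Iterable[str]) -> Optional[str]:
    """Return the highest-priority path class that has not failed yet."""
    failed = set(failed_paths)
    for path in PATH_ORDER:
        if path not in failed:
            return path
    return None


def failure_entry(when: date, cause: str, note: str) -> str:
    """Format one bullet for the 'Unstable or failed paths' section."""
    if cause not in FAILURE_CAUSES:
        raise ValueError(f"unknown failure cause: {cause}")
    return f"- [{when.isoformat()}] ({cause}) {note}"
```

Requiring a named cause before writing an entry keeps the first step, identifying why the path failed, from being skipped.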

References

Read references/pattern-memory.md for the pattern-memory policy.
Read references/site-patterns/<target>.md when a known software target or domain already has stored experience.

Security recommendations

This skill is coherent as a playbook for automating tasks, but it assumes access to local/browser sessions and persistent storage for per-site patterns without declaring where those files live or what they may contain. Before installing: (1) confirm how and where pattern memory will be stored and who can read it, (2) restrict or review any platform permissions that let the agent control the GUI/clipboard/screenshots, (3) avoid leaving sensitive accounts logged into browsers or apps you don't want automated, (4) prefer explicit user confirmation for high-risk actions (sending messages, deleting data), and (5) ask the developer to document exact runtime capabilities and storage behavior. If you cannot get those clarifications, treat the skill as higher risk and avoid enabling autonomous invocation or long-lived persistence.

Capability assessment

ℹ Purpose & Capability

The name/description describe executing real user tasks and the SKILL.md provides a coherent decision model for doing that. However, the instructions assume capabilities (local GUI automation, browser DOM inspection, screenshots, and persistent 'pattern memory' updates) that are not declared in the skill metadata (no config paths, no required binaries). That omission is plausible for an instruction-only skill, but it's a meaningful gap between claimed requirements and implied actions.

⚠ Instruction Scope

Instructions direct runtime behavior that touches sensitive surfaces: driving local apps, focusing windows, pasting via clipboard, reading DOM/browser state, taking screenshots/visual verification, and updating/reusing a persistent pattern memory. Those actions can access personal data (messages, documents, calendar items). The skill does not explicitly limit what is recorded in pattern memory or where it is stored, and it gives broad discretion ('choose the most reliable method'), which could lead to unexpected data reads or writes.

✓ Install Mechanism

This is instruction-only and has no install spec or external downloads. That minimizes supply-chain risk because no code is pulled during install.

ℹ Credentials

The skill requests no environment variables or credentials in metadata, which is proportionate. But it repeatedly assumes browser login state and access to local app accounts; it also expects to persist target-specific patterns. The absence of declared storage/config paths or explicit permission prompts is a gap worth clarifying.

⚠ Persistence & Privilege

SKILL.md mandates updating 'pattern memory' after runs (read-before-run is mandatory if a pattern exists). The skill package includes pattern templates, but runtime writes/reads to persistent storage are implied without any declared config path or explanation of storage location, retention, or sensitivity controls. Combined with autonomous invocation (allowed by default), this persistent behavior increases risk if not constrained.

Version history

v1.0.0

computer-task-execution 1.0.0 initial release
- Launches a new skill focused on reliably executing user tasks across websites and local apps.
- Establishes a reliability-first decision model for choosing execution paths: browser/web, official interfaces, hybrid, UI automation, and background.
- Provides golden rules for minimizing user disruption and improving success verification.
- Introduces pattern memory to store and reuse per-target execution knowledge.
- Details verification standards and a format for documenting successful and failed patterns.

Metadata

Slug computer-task-execution

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

FAQ