Description

Call real humans to test your product (URL or app). Get structured usability feedback with screen recordings, NPS scores, and AI-aggregated findings.

README (SKILL.md)

human_test() — Real Human Feedback for AI Products

Name: Human Test
Author: avivahe326

AI agents cannot judge human perception, emotion, or usability. This skill lets you call real humans to test any product and returns their feedback in a structured report.

What it does

You call human_test() with a product URL or description (URL is optional — also works for mobile apps, desktop software, etc.)
AI auto-generates a structured test plan
Real human testers claim the task on the web platform
Each tester records their screen and microphone (up to 15 min) while completing a guided feedback flow — first impression, task steps, NPS rating
AI extracts key frames from each recording and uses vision AI to analyze usability issues, then aggregates all feedback into a structured report with severity-ranked findings

Setup

Option A: Hosted (zero setup)

Use the hosted version at https://human-test.work — no installation needed. Register to get an API key, then skip to Create a test task below using BASE_URL=https://human-test.work.

Option B: Self-hosted (auto-install)

human_test() can run locally. Before creating a task, check if the server is reachable:

curl -s BASE_URL/api/config
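The same reachability check can be sketched in Python for agents that do not shell out to curl. This is a minimal sketch, not the skill's own code: the function name `server_is_reachable` is hypothetical, and treating any HTTP 200 from /api/config as "running" is an assumption, since the response body of that endpoint is not documented here.

```python
import urllib.error
import urllib.request


def server_is_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET {base_url}/api/config answers with HTTP 200."""
    config_url = base_url.rstrip("/") + "/api/config"
    try:
        with urllib.request.urlopen(config_url, timeout=timeout) as response:
            # Assumption: a 200 status means the server is up and usable.
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error status,
        # or a malformed URL all count as "not reachable".
        return False
```

A caller would run `server_is_reachable("http://localhost:3000")` and only fall through to the install step when it returns False.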

If the server is not running, install and start it:

npm i -g humantest-app
cd /tmp && humantest init --non-interactive && cd humantest && humantest start

This auto-detects AI API keys from your environment (ANTHROPIC_API_KEY, OPENAI_API_KEY, DEEPSEEK_API_KEY, or GEMINI_API_KEY), creates a local SQLite database, builds the app, and starts it on port 3000.

A default admin user is created automatically — no registration needed.

Set BASE_URL: Ask the user once for their preferred base URL. Default: http://localhost:3000

Quick start

Create a test task

curl -X POST BASE_URL/api/skill/human-test \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-product.com",
    "focus": "Test the onboarding flow",
    "maxTesters": 5,
    "creator": "agent-name"
  }'

Response:

{
  "taskId": "cm...",
  "status": "OPEN",
  "testPlan": { "steps": [...], "nps": true, "estimatedMinutes": 10 }
}
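For agents that prefer Python over curl, the request above could be built roughly as follows. This is a sketch using only the standard library; the helper names `build_create_task_request` and `create_task` are hypothetical, and the 30-second timeout is an arbitrary choice. Building the request is kept separate from sending it so the payload can be inspected first.

```python
import json
import urllib.request


def build_create_task_request(base_url: str, **params) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a test task."""
    # Drop unset parameters so the server applies its own defaults.
    payload = {key: value for key, value in params.items() if value is not None}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/skill/human-test",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_task(base_url: str, **params) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_create_task_request(base_url, **params)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, `create_task("http://localhost:3000", url="https://your-product.com", focus="Test the onboarding flow", maxTesters=5, creator="agent-name")` mirrors the curl call, and the returned dictionary should carry the `taskId` needed for status polling.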

Check progress and get the report

curl BASE_URL/api/skill/status/<taskId>

Response (when completed):

{
  "taskId": "cm...",
  "status": "COMPLETED",
  "submittedCount": 5,
  "report": "## Executive Summary\n...",
  "reportStatus": "COMPLETED",
  "codeFixStatus": "COMPLETED",
  "codeFixPrUrl": "https://github.com/user/repo/pull/1"
}

Note for agents: If repoUrl was provided, code fix generation starts automatically after the report is ready — no need to trigger it manually. Keep polling until codeFixStatus is COMPLETED or FAILED, or use codeFixWebhookUrl to get notified.
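The polling rule in the note above can be sketched as a small loop. This is an illustration under assumptions: `wait_for_results` and its parameters are hypothetical names, `fetch_status` stands for any callable that returns the decoded JSON of the status endpoint, and the 30-second interval and 120-poll cap are arbitrary defaults rather than documented limits.

```python
import time
from typing import Callable

# The two codeFixStatus values the note names as final.
TERMINAL_CODE_FIX = {"COMPLETED", "FAILED"}


def wait_for_results(
    fetch_status: Callable[[], dict],
    expect_code_fix: bool = False,
    interval_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the report is ready and, if requested, the code fix is final."""
    for attempt in range(max_polls):
        status = fetch_status()
        report_ready = status.get("status") == "COMPLETED"
        code_fix_done = status.get("codeFixStatus") in TERMINAL_CODE_FIX
        if report_ready and (code_fix_done or not expect_code_fix):
            return status
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"task not finished after {max_polls} polls")
```

Set `expect_code_fix=True` only when `repoUrl` was passed at task creation; otherwise the loop returns as soon as the report itself is complete.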

Parameters

Parameter	Required	Default	Description
`url`	No	—	Product URL to test (optional — leave empty for mobile apps or non-web products)
`title`	No	Auto from hostname	Task title
`focus`	No	—	What testers should focus on
`maxTesters`	No	5	Number of testers (1-50)
`estimatedMinutes`	No	10	Expected test duration
`creator`	No	admin	Name of the agent/user creating the task (auto-creates a user if needed)
`webhookUrl`	No	—	HTTPS URL to receive the report on completion
`codeFixWebhookUrl`	No	—	HTTPS URL to receive code fix results on completion
`repoUrl`	No	—	GitHub repo URL for code-level fix suggestions
`repoBranch`	No	repo default	Branch to analyze (only used with repoUrl)
`locale`	No	`en`	Report language: `en` (English) or `zh` (Chinese)
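The constraints in the table can be checked on the client before sending a request. The sketch below is hypothetical (the name `validate_task_params` is not part of the skill) and only encodes what the table states: the 1-50 range for maxTesters, the two locales, HTTPS for both webhook URLs, and repoBranch being meaningful only with repoUrl. Whether the server enforces the same rules is not documented here.

```python
from urllib.parse import urlparse

ALLOWED_LOCALES = {"en", "zh"}
HTTPS_ONLY_FIELDS = ("webhookUrl", "codeFixWebhookUrl")


def validate_task_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    max_testers = params.get("maxTesters", 5)
    is_int = isinstance(max_testers, int) and not isinstance(max_testers, bool)
    if not is_int or not 1 <= max_testers <= 50:
        problems.append("maxTesters must be an integer from 1 to 50")
    if params.get("locale", "en") not in ALLOWED_LOCALES:
        problems.append("locale must be 'en' or 'zh'")
    for name in HTTPS_ONLY_FIELDS:
        value = params.get(name)
        if value is not None and urlparse(value).scheme != "https":
            problems.append(f"{name} must be an HTTPS URL")
    if "repoBranch" in params and "repoUrl" not in params:
        problems.append("repoBranch is only used together with repoUrl")
    return problems
```

Returning a list instead of raising on the first problem lets an agent report every issue with a payload in one pass.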

Async webhooks

There are two separate webhooks for the two stages:

Report webhook (`webhookUrl`)

If you provide a webhookUrl, the platform will POST the report to that URL when it's ready:

{
  "event": "report",
  "taskId": "...",
  "status": "COMPLETED",
  "title": "Test: example.com",
  "targetUrl": "https://example.com",
  "report": "## Executive Summary\n...",
  "completedAt": "2026-03-02T12:00:00Z"
}

Code fix webhook (`codeFixWebhookUrl`)

If you provide a codeFixWebhookUrl, the platform will POST the code fix result when done:

{
  "event": "code_fix",
  "taskId": "...",
  "status": "COMPLETED",
  "title": "Test: example.com",
  "targetUrl": "https://example.com",
  "codeFixStatus": "COMPLETED",
  "codeFixPrUrl": "https://github.com/user/repo/pull/1",
  "completedAt": "2026-03-02T12:30:00Z"
}
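A receiver that accepts both webhooks on one endpoint can tell the two stages apart by the `event` field. The following is a minimal sketch of that routing, independent of any web framework; `route_webhook` and the keys of the dictionary it returns are hypothetical, and it assumes `codeFixPrUrl` may be missing or null when no PR was opened.

```python
import json


def route_webhook(body: bytes) -> dict:
    """Decode a webhook body and return the fields relevant to its stage."""
    payload = json.loads(body)
    event = payload.get("event")
    if event == "report":
        return {
            "stage": "report",
            "taskId": payload["taskId"],
            "report": payload["report"],
        }
    if event == "code_fix":
        return {
            "stage": "code_fix",
            "taskId": payload["taskId"],
            "succeeded": payload.get("codeFixStatus") == "COMPLETED",
            # Assumption: absent or null when the platform did not open a PR.
            "prUrl": payload.get("codeFixPrUrl"),
        }
    raise ValueError(f"unexpected webhook event: {event!r}")
```

Rejecting unknown events with an error, instead of silently ignoring them, makes it easier to notice if the payload format changes in a later version.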

Report format (structured for AI agents)

The report is returned as a markdown string in the report field. It uses a consistent, machine-parseable structure designed for AI agents to read and act on directly — for example, to automatically file issues, create PRs, or prioritize a fix backlog.

Section structure

Every report contains these exact sections in order:

## Metadata
| Field | Value |
|-------|-------|
| Product | ... |
| URL | ... |
| Testers | N |
| Avg NPS | X.X/10 |

## Executive Summary
(3-5 sentences, most critical finding first)

## Issues
### [CRITICAL] Issue title
- **Evidence:** (specific testers and observations)
- **Impact:** (effect on users)
- **Recommendation:** (actionable fix)

### [MAJOR] Issue title
- **Evidence:** ...
- **Impact:** ...
- **Recommendation:** ...

### [MINOR] Issue title
...

## Positive Highlights
(What worked well)

## NPS Analysis
(Score breakdown, interpretation)

## Recommendations
- **P0** (fix immediately): ... (references issue)
- **P1** (fix this sprint): ...
- **P2** (next sprint): ...
- **P3** (backlog): ...

Parsing tips for agents

Severity levels: [CRITICAL], [MAJOR], [MINOR] — always in brackets in issue headers
Priority tags: P0, P1, P2, P3 — in the Recommendations section
Each issue has 3 fields: Evidence, Impact, Recommendation — always bolded labels
Metadata table: always the first section, machine-readable key-value pairs
NPS scores: appear in Metadata (average) and NPS Analysis (per-tester breakdown)

Agent auto-fix workflow

The structured report format is designed for a closed-loop workflow: your agent calls human_test(), receives the report, and automatically fixes the issues found — no human intervention needed after testing.

Recommended flow

Call human_test() with your product URL (include webhookUrl to get notified)
Wait for the report (poll /api/skill/status/<taskId> or receive webhook)
Parse the ## Issues section — each issue has [SEVERITY], Evidence, Impact, and Recommendation
For [CRITICAL] and [MAJOR] issues, use the Recommendation field to generate targeted code fixes
Create commits or PRs for each fix
(Optional) Call human_test() again to verify the fixes

Each issue's Evidence tells you what went wrong, Impact tells you why it matters, and Recommendation tells you exactly what to fix. This gives your agent enough context to write a targeted fix without guessing.
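The triage step of this flow (act on [CRITICAL] and [MAJOR] issues, most severe first) can be sketched as below. The input shape is an assumption: each issue is a plain dictionary with `severity`, `title`, `evidence`, `impact`, and `recommendation` keys, as an agent might produce after parsing the report. The function name `plan_fixes` and the keys of the returned tasks are hypothetical.

```python
SEVERITY_RANK = {"CRITICAL": 0, "MAJOR": 1, "MINOR": 2}
AUTO_FIX_SEVERITIES = ("CRITICAL", "MAJOR")


def plan_fixes(issues: list[dict]) -> list[dict]:
    """Turn parsed issues into an ordered list of fix tasks for a coding agent."""
    actionable = [
        issue for issue in issues if issue["severity"] in AUTO_FIX_SEVERITIES
    ]
    # sorted() is stable, so issues of equal severity keep their report order.
    ordered = sorted(actionable, key=lambda issue: SEVERITY_RANK[issue["severity"]])
    return [
        {
            "commit_title": f"Fix [{issue['severity']}] {issue['title']}",
            "what_went_wrong": issue["evidence"],
            "why_it_matters": issue["impact"],
            "what_to_change": issue["recommendation"],
        }
        for issue in ordered
    ]
```

Each returned task maps one-to-one onto a commit or PR, which keeps fixes small enough to verify in a follow-up test round.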

Repo-aware code fix suggestions

If you pass a repoUrl, the platform automatically triggers code fix generation as soon as the report is ready. It clones your repo, analyzes the code against reported issues, and produces file-level code fix suggestions (with unified diffs) appended to the report as a ## Code Fix Suggestions section.
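Because the suggestions are appended as one more level-2 section, an agent can pull them out by splitting the report on `## ` headings. This is a sketch under that assumption; `split_sections` is a hypothetical helper, and it would mis-split if a line inside the section itself started with `## ` (unified diff lines normally start with `+`, `-`, a space, or `@@`, so this should be rare).

```python
def split_sections(report: str) -> dict[str, str]:
    """Map each '## ' heading in the report to the text that follows it."""
    collected: dict[str, list[str]] = {}
    current = None
    for line in report.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            collected[current] = []
        elif current is not None:
            collected[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in collected.items()}
```

With this, `split_sections(report).get("Code Fix Suggestions")` returns the appended diffs, or None when no repoUrl was supplied or code fix generation has not finished.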

Two modes (auto-detected)

Mode 1 (read-only): Grant GitHub user avivahe326 read access to your repo. After the report, the platform clones the repo, analyzes the code against reported issues, and appends code-level diffs to the report.

Mode 2 (developer access): Grant avivahe326 write access. The platform does everything in Mode 1 and also creates a branch human-test/fixes-<taskId>, applies the diffs, pushes, and opens a PR. The PR URL is returned as codeFixPrUrl in both the webhook payload and the status API.

Example with repoUrl

curl -X POST BASE_URL/api/skill/human-test \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-product.com",
    "focus": "Test the checkout flow",
    "repoUrl": "https://github.com/your-org/your-repo",
    "repoBranch": "main",
    "webhookUrl": "https://your-server.com/webhook",
    "codeFixWebhookUrl": "https://your-server.com/code-fix-webhook"
  }'

Links

Web platform: https://human-test.work
GitHub: https://github.com/avivahe326/humantest

Usage Guidance

This skill is plausible but contains multiple red flags you should resolve before installing or running it:

- Do not blindly run the suggested 'npm i -g humantest-app' on a production machine. Verify the package's publisher and inspect its source (npm page / GitHub repo) first. Prefer running such installs in an isolated VM or container.
- The SKILL.md says the app will auto-detect LLM API keys from your environment (ANTHROPIC_API_KEY, OPENAI_API_KEY, DEEPSEEK_API_KEY, GEMINI_API_KEY). If you must run it, ensure you do not expose high-privilege or production keys; consider creating limited-scope/test keys or running in an environment without sensitive credentials.
- The self-hosted flow auto-creates a default admin user with no registration. After starting the app, immediately change or disable default credentials, bind the service to localhost only (or firewall it), and require authentication.
- The service can POST reports to arbitrary webhook URLs and accept repoUrl for automated code fixes. Avoid supplying webhooks or repo URLs that grant access to sensitive systems (internal repos, CI, or secret-storing endpoints) until you fully trust the service and understand its auth model.
- If you prefer hosted mode (https://human-test.work), evaluate the service's privacy policy and data retention (screen recordings include audio/video of real people) before uploading sensitive product URLs. Verify who controls the hosted domain and how recordings/reports are stored and shared.

If you want to proceed safely, ask the skill author to: publish an explicit install spec (with package provenance), declare required env vars in registry metadata, explain authentication flows for webhooks/GitHub, and document default admin credentials and how to secure or rotate them. If the author cannot provide those, run the tool only in a disposable, network-restricted sandbox and avoid exposing production secrets.

Capability Analysis

Type: OpenClaw Skill
Name: human-test
Version: 1.6.1

The skill bundle contains high-risk instructions that direct an AI agent to install a global NPM package (humantest-app) and run a setup process that explicitly scrapes sensitive AI API keys (OpenAI, Anthropic, etc.) from the environment. Most concerningly, the documentation (SKILL.md) instructs users to grant GitHub read/write access to a specific personal account (avivahe326) for 'auto-fix' capabilities, which is a significant security risk and a common pattern for repository hijacking. While these actions are framed as features for a human-testing service (human-test.work), the combination of credential extraction and requests for direct GitHub access to a personal account is highly suspicious.

Capability Assessment

⚠ Purpose & Capability

The stated purpose (running human usability tests and returning a report) is reasonable, but the SKILL.md instructs installing and running a third‑party package (humantest-app) and starting a persistent local server. The registry metadata lists no required env vars or credentials, yet the instructions explicitly rely on multiple AI provider API keys (ANTHROPIC_API_KEY, OPENAI_API_KEY, DEEPSEEK_API_KEY, GEMINI_API_KEY). That mismatch (undeclared env access and a full local app install) is disproportionate to a simple 'call testers and return a report' description and should be justified.

⚠ Instruction Scope

The SKILL.md instructs the agent to: curl a BASE_URL, or (if not available) run 'npm i -g humantest-app', init and start a local server which will auto-detect AI API keys from the environment, create a default admin user automatically, and serve endpoints that accept webhooks and repo URLs for automated code fixes. These instructions read and use environment variables not declared in the registry, create persistent services, and can POST reports to arbitrary webhook URLs — all of which go beyond the minimal scope of managing a single test task.

⚠ Install Mechanism

There is no formal install spec in the registry, yet the instructions call for a global npm install ('npm i -g humantest-app') and then run binaries that build and start the app. Installing an unvetted global npm package from an unspecified source is high risk: the package could execute arbitrary code on the host, and the SKILL.md gives no provenance or checksum for the package. The absence of an explicit, trusted install specification reduces transparency.

⚠ Credentials

The skill's metadata declares no required environment variables, but the instructions say the app will auto-detect and use ANTHROPIC_API_KEY, OPENAI_API_KEY, DEEPSEEK_API_KEY, or GEMINI_API_KEY. Reading multiple unrelated LLM provider keys from the host environment is broad and not documented as required in the registry. Additionally, the service can post reports to arbitrary webhook URLs and may use repoUrl to generate code fixes (potentially interacting with GitHub) without declaring how credentials are provided — this raises the risk of accidental exposure or misuse of secrets.

⚠ Persistence & Privilege

The instructions create and start a persistent local service (default port 3000) and automatically create a default admin user with no registration step. Persisting a server and an admin account increases long‑term attack surface (exposed endpoints, default credentials) and is a capability beyond a typical ephemeral skill. While 'always: false' and autonomous invocation are normal, the creation of persistent infrastructure and default admin privileges is a notable privilege escalation relative to the registry declaration.

Version History

v1.6.1

Fix incorrect Gemini reference — media analysis uses configured AI provider's vision capability

v1.6.0

Optional URL (supports apps/non-web products), screen recording + Gemini video analysis, locale parameter (en/zh), updated descriptions

v1.5.1

Code fix now auto-triggers after report generation completes, no polling needed

v1.5.0

Split report and code fix into two stages with separate webhooks (webhookUrl for report, codeFixWebhookUrl for code fix). New status API fields: reportStatus, codeFixStatus, codeFixPrUrl. Auto-triggers code fix for agents polling status.

v1.4.0

Auto-install flow, remove API key requirement, add creator parameter

v1.3.1

Remove credit system references (project is now open source)

v1.3.0

Add repo-aware code fix suggestions: repoUrl/repoBranch params, two modes (read-only vs auto-PR), updated webhook payload with codeFixPrUrl

v1.2.0

Add agent auto-fix workflow documentation

v1.1.0

Add structured agent-parseable report format documentation with severity tags, priority labels, and parsing tips for AI agents

v1.0.0

Initial release: real human usability testing via API

Metadata

Slug human-test

Version 1.6.1

License —

All-time Installs 1

Active Installs 1

Total Versions 10

Frequently Asked Questions

What is Human Test?

Call real humans to test your product (URL or app). Get structured usability feedback with screen recordings, NPS scores, and AI-aggregated findings. It is an AI Agent Skill for Claude Code / OpenClaw, with 473 downloads so far.

How do I install Human Test?

Run "/install human-test" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Human Test free?

Yes, Human Test is completely free (open-source). You can download, install and use it at no cost.

Which platforms does Human Test support?

Human Test is cross-platform: it runs anywhere OpenClaw or Claude Code is available.

Who created Human Test?

It is built and maintained by avivahe326 (@avivahe326); the current version is v1.6.1.
