SkillProbe
by LuarAssassin · v1.0.0 · MIT-0
235 Downloads · 0 Stars · 1 Active Installs · 8 Versions
Install in OpenClaw
/install skillprobe
Description
A/B evaluates any AI agent skill's real impact through three-role isolation (orchestrator + two sub-agents). Generates skill profiles, synthetic test tasks,...
Usage Guidance
This skill is internally coherent and does what it claims: it A/B evaluates other skills by sending tasks and the target skill's content to your configured LLM provider and scoring the outputs locally. Before running:
- Do not include secrets, API keys, or sensitive credentials in the SKILL.md or the skill bundle you evaluate; those will be sent to the LLM provider.
- Confirm which LLM provider/runtime is configured and whether you trust its data handling.
- If using the local CLI helper, be aware it may read runtime/provider config or env vars needed by your LLM client.
- Run evaluations in a sandbox or with redacted skill content if you need to protect sensitive artifacts.
If you want an extra safety check, inspect the specific skill bundle being evaluated and remove any embedded secrets before using SkillProbe.
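The advice to remove embedded secrets before evaluation can be approximated with a quick local scan. The following is a minimal sketch, not part of SkillProbe: the function names `find_secrets` and `redact` and the three patterns are illustrative assumptions, and a dedicated secret scanner will catch far more than this.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not an exhaustive list).
SECRET_PATTERNS = {
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_assignment": re.compile(
        r"(?i)(\b(?:api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"]?)"
        r"([^\s'\"]{8,})"
    ),
}


def find_secrets(text):
    """Return (line_number, pattern_name) pairs for lines that look like secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def redact(text):
    """Mask the value part of generic key/secret assignments."""
    return SECRET_PATTERNS["generic_assignment"].sub(r"\g<1>[REDACTED]", text)


if __name__ == "__main__":
    sample = "# Demo skill\napi_key = 'abcd1234efgh5678'\nUse the tool politely.\n"
    print(find_secrets(sample))
    print(redact(sample))
```

Running the scan on the target SKILL.md before an evaluation gives a first pass; anything it flags should be reviewed by hand, since an empty result from this sketch does not prove the bundle is clean.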
Capability Analysis
Type: OpenClaw Skill
Name: skillprobe
Version: 1.0.0
The skillprobe bundle provides a structured framework for A/B testing and evaluating other AI agent skills using a multi-agent orchestration approach. The logic involves an orchestrator agent generating tasks and dispatching them to isolated sub-agent sessions to compare performance with and without a specific skill, as detailed in SKILL.md and DISPATCH_PROTOCOL.md. The evaluate.sh script performs basic input sanitization for its CLI arguments, and no evidence of malicious intent, data exfiltration, or unauthorized remote execution was found; the behavior is consistent with its stated purpose of skill benchmarking.
Capability Assessment
Purpose & Capability
Name, description, SKILL.md, DISPATCH_PROTOCOL.md, SCORING_REFERENCE.md, and the helper script are coherent: all are focused on designing tasks, dispatching two isolated sub-agents, scoring, and reporting. No unrelated binaries, env vars, or config paths are requested.
Instruction Scope
Instructions stay within the evaluator role (profile target SKILL.md, generate tasks, dispatch two sub-agents, score). A key behavioral detail: Sub-Agent B receives the full skill content and both arms' prompts are sent to the configured LLM provider. That is expected for evaluation but means evaluated skill content (including anything embedded in its SKILL.md) will be transmitted to the LLM provider.
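To make the data flow concrete, here is a minimal sketch of how the two arms' prompts might differ. It is not taken from DISPATCH_PROTOCOL.md: the `ArmPrompt` type, the `build_arm_prompts` function, the session-id scheme, and the prompt layout are all hypothetical. The point it illustrates is only that the with-skill arm carries the full skill text, so whatever is in the evaluated SKILL.md leaves the machine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmPrompt:
    arm: str         # "baseline" or "with_skill"
    session_id: str  # hypothetical id for the isolated sub-agent session
    prompt: str      # text that would be sent to the configured LLM provider


def build_arm_prompts(task, skill_content, run_id):
    """Build one prompt per arm; only the with-skill arm sees the skill text."""
    baseline = ArmPrompt("baseline", f"{run_id}-A", task)
    with_skill = ArmPrompt(
        "with_skill",
        f"{run_id}-B",
        f"{skill_content}\n\n---\n\nTask:\n{task}",
    )
    return baseline, with_skill


if __name__ == "__main__":
    a, b = build_arm_prompts("Summarize the changelog.", "SKILL BODY", "run1")
    print(a.session_id, len(a.prompt))
    print(b.session_id, len(b.prompt))
```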
Install Mechanism
No install spec; instruction-only plus an optional local helper script. No downloads from external URLs or archive extraction. The helper script is benign and only attempts to invoke an existing runtime/CLI if present.
Credentials
The skill declares no required environment variables or credentials. The helper script's security manifest notes that a configured runtime/SkillProbe CLI may access provider environment variables at runtime — this is plausible for a tool that dispatches to an LLM provider, but users should be aware that a local CLI invocation could read environment-configured provider credentials.
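One way to limit what a local CLI invocation can read is to start it with an allowlisted environment. The sketch below assumes nothing about the helper script's arguments: `cmd` is whatever command you would normally run, and `EXAMPLE_PROVIDER_KEY` is a placeholder name, not a variable SkillProbe is known to use.

```python
import os
import subprocess


def scrubbed_env(environ, allow):
    """Return a copy of environ containing only the allowlisted names."""
    return {name: value for name, value in environ.items() if name in allow}


def run_with_scrubbed_env(cmd, allow):
    """Run cmd (a list of arguments) with only the allowlisted variables set."""
    return subprocess.run(cmd, env=scrubbed_env(os.environ, allow), check=True)


if __name__ == "__main__":
    demo = {"PATH": "/usr/bin", "EXAMPLE_PROVIDER_KEY": "k", "OTHER_SECRET": "s"}
    # Only the names in the allowlist survive.
    print(sorted(scrubbed_env(demo, {"PATH", "EXAMPLE_PROVIDER_KEY"})))
```

The allowlist should include the single provider credential the run actually needs plus basics such as `PATH`; everything else in the parent environment stays invisible to the child process.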
Persistence & Privilege
No elevated privileges requested (always:false). The skill does not request persistent system-wide changes and does not modify other skills' configs. It is an orchestrator/workflow and does not demand permanent presence.
How to Use
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install skillprobe
- After installation, invoke the skill by name or use /skillprobe
- Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Three-role isolation architecture. Orchestrator designs tasks and scores; two isolated sub-agents execute baseline and with-skill arms. Progressive disclosure (DISPATCH_PROTOCOL.md + SCORING_REFERENCE.md). dispatch_evidence guardrail. Self-execution detection.
v0.2.5
Clarify orchestrator vs execution workers for strict A/B independence; remove provider-specific assumptions in medical helper script; add provider-neutral regression test.
v0.2.4
Strict arm independence update: baseline and with-skill must run in separate child-agent sessions; single-subagent sequential dual-arm runs are explicitly disallowed. Added auditable session evidence requirements in docs and ingestion guardrails.
v0.2.3
Execution policy tightened: no early stop at Step 1-3; in-agent evaluations must complete real dual-arm A/B runs using isolated contexts, with explicit fallback proxy when skill toggles are unavailable.
v0.2.2
Enforced strict real A/B wording in skill docs: baseline/with-skill must be true dual runs in isolated contexts; hypothetical/simulated comparisons are explicitly disallowed.
v0.2.1
Docs wording cleanup: removed provider-specific example text and made standalone CLI endpoint description fully provider-neutral.
v0.2.0
Provider-neutral runtime support; added repeated-run stability scoring, optional LLM judge integration, automatic SQLite persistence in evaluate pipeline, and helper-script/docs updates.
v0.1.0
Initial release: A/B skill evaluation methodology with 6-dimension scoring
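The arm-independence rule introduced in v0.2.4 (baseline and with-skill must run in separate child-agent sessions) can be sketched as a small check over recorded session evidence. This is an illustration under assumptions: the evidence layout, the `session_id` field, and the function name `check_arm_independence` are hypothetical and do not reproduce the skill's actual dispatch_evidence format.

```python
def check_arm_independence(evidence):
    """Return a list of problems; an empty list means this sketch's checks pass."""
    problems = []
    for arm in ("baseline", "with_skill"):
        if not evidence.get(arm, {}).get("session_id"):
            problems.append(f"missing session_id for {arm}")
    if not problems:
        same = evidence["baseline"]["session_id"] == evidence["with_skill"]["session_id"]
        if same:
            problems.append("both arms ran in the same session")
    return problems


if __name__ == "__main__":
    ok = {"baseline": {"session_id": "s1"}, "with_skill": {"session_id": "s2"}}
    shared = {"baseline": {"session_id": "s1"}, "with_skill": {"session_id": "s1"}}
    print(check_arm_independence(ok))
    print(check_arm_independence(shared))
```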
Frequently Asked Questions
What is SkillProbe?
A/B evaluates any AI agent skill's real impact through three-role isolation (orchestrator + two sub-agents). Generates skill profiles, synthetic test tasks,... It is an AI Agent Skill for Claude Code / OpenClaw, with 235 downloads so far.
How do I install SkillProbe?
Run "/install skillprobe" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is SkillProbe free?
Yes, SkillProbe is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does SkillProbe support?
SkillProbe is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created SkillProbe?
It is built and maintained by LuarAssassin (@luarassassin); the current version is v1.0.0.