SkillProbe
Author: LuarAssassin · v1.0.0 · MIT-0
235 total downloads · 0 favorites · 1 current install · 8 versions
Install in OpenClaw
/install skillprobe
Description
A/B evaluates any AI agent skill's real impact through three-role isolation (orchestrator + two sub-agents). Generates skill profiles, synthetic test tasks,...
Safe usage recommendations
This skill is internally coherent and does what it claims: it A/B evaluates other skills by sending tasks and the target skill's content to your configured LLM provider and scoring the outputs locally. Before running: (1) do not include secrets, API keys, or sensitive credentials in the SKILL.md or the skill bundle you evaluate — those will be sent to the LLM provider; (2) confirm which LLM provider/runtime is configured and whether you trust its data handling; (3) if using the local CLI helper, be aware it may read runtime/provider config or env vars needed by your LLM client; (4) run evaluations in a sandbox or with redacted skill content if you need to protect sensitive artifacts. If you want an extra safety check, inspect the specific skill bundle being evaluated and remove any embedded secrets before using SkillProbe.
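Recommendations (1) and (4) above can be approximated with a pre-flight scan of the skill content. The following is a minimal sketch and not part of SkillProbe: the `redact_secrets` helper and its patterns are hypothetical, cover only a few common credential shapes, and will miss others, so a manual review is still needed.

```python
import re

# Hypothetical patterns for illustration only; real secrets take many more forms.
SECRET_PATTERNS = [
    # KEY=value or KEY: value, where the key name suggests a credential
    re.compile(
        r"(?i)\b([A-Z0-9_]*(?:API_KEY|SECRET|TOKEN|PASSWORD)[A-Z0-9_]*)\s*[:=]\s*(\S+)"
    ),
    # long bearer-style tokens
    re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._\-]{20,}"),
]


def _mask(match: re.Match) -> str:
    # Keep the key name when one was captured so the file stays readable.
    if match.lastindex:
        return f"{match.group(1)}=[REDACTED]"
    return "Bearer [REDACTED]"


def redact_secrets(text: str) -> tuple[str, int]:
    """Return (redacted_text, number_of_redactions)."""
    total = 0
    for pattern in SECRET_PATTERNS:
        text, count = pattern.subn(_mask, text)
        total += count
    return text, total
```

A non-zero redaction count is a signal to inspect the bundle by hand before its content is sent to an LLM provider.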
Functional analysis
Type: OpenClaw Skill
Name: skillprobe
Version: 1.0.0
The skillprobe bundle provides a structured framework for A/B testing and evaluating other AI agent skills using a multi-agent orchestration approach. As detailed in SKILL.md and DISPATCH_PROTOCOL.md, an orchestrator agent generates tasks and dispatches them to isolated sub-agent sessions to compare performance with and without a specific skill. The evaluate.sh script performs basic input sanitization for its CLI arguments. No evidence of malicious intent, data exfiltration, or unauthorized remote execution was found; the behavior is consistent with the stated purpose of skill benchmarking.
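The flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not SkillProbe's actual implementation: `ArmResult`, `run_in_fresh_session`, and `ab_evaluate` are hypothetical names, and the `agent` and `score` callables stand in for a real sub-agent session and the orchestrator's scoring step.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ArmResult:
    arm: str          # "baseline" or "with_skill"
    session_id: str   # identifies the isolated session that produced the output
    output: str


def run_in_fresh_session(
    arm: str,
    task: str,
    skill_content: Optional[str],
    agent: Callable[[str], str],
) -> ArmResult:
    """Hypothetical dispatcher: every call gets its own session id and sees
    only the task, plus the skill content for the with-skill arm."""
    session_id = uuid.uuid4().hex
    prompt = task if skill_content is None else f"{skill_content}\n\n{task}"
    return ArmResult(arm, session_id, agent(prompt))


def ab_evaluate(
    tasks: list[str],
    skill_content: str,
    agent: Callable[[str], str],
    score: Callable[[str], float],
) -> list[dict]:
    """Orchestrator role: dispatch both arms per task, then score the outputs."""
    rows = []
    for task in tasks:
        a = run_in_fresh_session("baseline", task, None, agent)
        b = run_in_fresh_session("with_skill", task, skill_content, agent)
        # The two arms must never share a session.
        assert a.session_id != b.session_id
        rows.append(
            {"task": task, "baseline": score(a.output), "with_skill": score(b.output)}
        )
    return rows
```

Keeping the orchestrator out of task execution means the only difference between the two arms is whether the skill content is in the prompt.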
Capability assessment
Purpose & Capability
Name, description, SKILL.md, DISPATCH_PROTOCOL.md, SCORING_REFERENCE.md, and the helper script are coherent: all are focused on designing tasks, dispatching two isolated sub-agents, scoring, and reporting. No unrelated binaries, env vars, or config paths are requested.
Instruction Scope
Instructions stay within the evaluator role (profile target SKILL.md, generate tasks, dispatch two sub-agents, score). A key behavioral detail: Sub-Agent B receives the full skill content and both arms' prompts are sent to the configured LLM provider. That is expected for evaluation but means evaluated skill content (including anything embedded in its SKILL.md) will be transmitted to the LLM provider.
Install Mechanism
No install spec; instruction-only plus an optional local helper script. No downloads from external URLs or archive extraction. The helper script is benign and only attempts to invoke an existing runtime/CLI if present.
Credentials
The skill declares no required environment variables or credentials. The helper script's security manifest notes that a configured runtime/SkillProbe CLI may access provider environment variables at runtime — this is plausible for a tool that dispatches to an LLM provider, but users should be aware that a local CLI invocation could read environment-configured provider credentials.
Persistence & Privilege
No elevated privileges requested (always:false). The skill does not request persistent system-wide changes and does not modify other skills' configs. It is an orchestrator/workflow and does not demand permanent presence.
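The Credentials note above says a local CLI invocation could read provider credentials from the environment. One way to see and limit that exposure is sketched below. The helper names and the name-matching pattern are hypothetical and match on variable names only, so they are a rough heuristic rather than a guarantee.

```python
import os
import re

# Hypothetical heuristic: flag variables whose *names* suggest a credential.
CREDENTIAL_NAME = re.compile(r"(?i)(API_?KEY|SECRET|TOKEN|PASSWORD)")


def credential_like_vars(env=None):
    """Return sorted names of variables whose names suggest a credential."""
    env = os.environ if env is None else env
    return sorted(name for name in env if CREDENTIAL_NAME.search(name))


def scrubbed_env(env=None, allow=()):
    """Copy of env without credential-like variables, except names in allow."""
    env = dict(os.environ if env is None else env)
    return {
        name: value
        for name, value in env.items()
        if name in allow or not CREDENTIAL_NAME.search(name)
    }
```

The mapping returned by `scrubbed_env` can be passed as the `env=` argument of `subprocess.run` when launching a local helper, so that only the allowed provider variable is visible to it.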
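How to use
- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install skillprobe
- After installation, call the Skill by name or trigger it with /skillprobe.
- Provide the required inputs according to the Skill's parameter description to get structured output.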
Version history
v1.0.0
Three-role isolation architecture: the orchestrator designs tasks and scores; two isolated sub-agents execute the baseline and with-skill arms. Progressive disclosure (DISPATCH_PROTOCOL.md + SCORING_REFERENCE.md). dispatch_evidence guardrail. Self-execution detection.
v0.2.5
Clarify orchestrator vs execution workers for strict A/B independence; remove provider-specific assumptions in medical helper script; add provider-neutral regression test.
v0.2.4
Strict arm independence update: baseline and with-skill must run in separate child-agent sessions; single-subagent sequential dual-arm runs are explicitly disallowed. Added auditable session evidence requirements in docs and ingestion guardrails.
v0.2.3
Execution policy tightened: no early stop at Step 1-3; in-agent evaluations must complete real dual-arm A/B runs using isolated contexts, with explicit fallback proxy when skill toggles are unavailable.
v0.2.2
Enforced strict real A/B wording in skill docs: baseline/with-skill must be true dual runs in isolated contexts; hypothetical/simulated comparisons are explicitly disallowed.
v0.2.1
Docs wording cleanup: removed provider-specific example text and made standalone CLI endpoint description fully provider-neutral.
v0.2.0
Provider-neutral runtime support; added repeated-run stability scoring, optional LLM judge integration, automatic SQLite persistence in evaluate pipeline, and helper-script/docs updates.
v0.1.0
Initial release: A/B skill evaluation methodology with 6-dimension scoring
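The repeated-run stability scoring and SQLite persistence mentioned under v0.2.0 could look roughly like the sketch below. Everything in it is an assumption made for illustration: the listing does not give SkillProbe's stability formula or database schema, so the sketch uses the mean and population standard deviation of repeated scores and a hypothetical `run_summary` table.

```python
import sqlite3
import statistics


def summarize_runs(scores):
    """Mean and population standard deviation of repeated-run scores."""
    return statistics.fmean(scores), statistics.pstdev(scores)


def persist(conn, task_id, arm, scores):
    """Store one summary row per (task, arm); the schema is illustrative only."""
    mean, spread = summarize_runs(scores)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS run_summary ("
        "task_id TEXT, arm TEXT, n_runs INTEGER, mean REAL, stdev REAL)"
    )
    conn.execute(
        "INSERT INTO run_summary VALUES (?, ?, ?, ?, ?)",
        (task_id, arm, len(scores), mean, spread),
    )
    conn.commit()


# Made-up scores: three repeated runs per arm for a single task.
conn = sqlite3.connect(":memory:")
persist(conn, "task-1", "baseline", [0.6, 0.7, 0.8])
persist(conn, "task-1", "with_skill", [0.9, 0.9, 0.9])
```

A low standard deviation across runs suggests the observed difference between arms is less likely to be run-to-run noise.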
FAQ
What is SkillProbe?
A/B evaluates any AI agent skill's real impact through three-role isolation (orchestrator + two sub-agents). Generates skill profiles, synthetic test tasks,... It is an AI agent skill plugin for Claude Code / OpenClaw, with 235 total downloads to date.
How do I install SkillProbe?
Run the command "/install skillprobe" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.
Is SkillProbe free?
Yes. SkillProbe is completely free and released under the MIT-0 license; it can be freely downloaded, installed, and used.
Which platforms does SkillProbe support?
SkillProbe is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed SkillProbe?
SkillProbe is developed and maintained by LuarAssassin (@luarassassin); the current version is v1.0.0.