
Arxiv Agentic Verifier

Name: Arxiv Agentic Verifier
Author: wanng-ide

Author: WANGJUNJIE · GitHub · v1.0.0

cross-platform ⚠ suspicious

769

Total downloads

Current installs

Versions

Install in OpenClaw

/install arxiv-agentic-verifier

Overview

Actively verifies Python/JS code correctness by generating targeted test cases that expose logic flaws based on problem constraints.

Usage instructions (SKILL.md)

ArXiv Agentic Verifier

Source Paper: Scaling Agentic Verifier for Competitive Coding (ID: 4a4c4dae6a5145ebc4d62eb2d64b0f0f)
Type: Code Verification / Test Generation

Description

This skill implements an "Agentic Verifier" that actively reasons about code correctness by generating targeted, "discriminative" test cases. Instead of random sampling, it analyzes the problem constraints and code logic to find edge cases or logic flaws.
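To make "discriminative" concrete, here is a minimal sketch (not the skill's actual implementation): a test input is discriminative when a candidate solution and a trusted reference disagree on it. The names `findDiscriminatingInput`, `reference`, and `candidate` are hypothetical, and the hand-written input list stands in for the tests the LLM would generate.

```javascript
// Hypothetical sketch: return the first input on which the candidate
// disagrees with a trusted reference, or null if no such input is found.
function findDiscriminatingInput(candidate, reference, inputs) {
  for (const input of inputs) {
    const expected = reference(...input);
    let actual;
    try {
      actual = candidate(...input);
    } catch (err) {
      // A crash also counts as a failure on this input.
      return { input, expected, actual: `error: ${err.message}` };
    }
    if (actual !== expected) {
      return { input, expected, actual };
    }
  }
  return null;
}

// Reference: the sum of two integers.
const reference = (a, b) => a + b;
// Buggy candidate: wrong whenever b is negative.
const candidate = (a, b) => a + Math.abs(b);

// Non-negative inputs pass; the targeted negative input exposes the flaw.
const targeted = [[1, 2], [0, 0], [5, -3]];
const found = findDiscriminatingInput(candidate, reference, targeted);
console.log(found);
```

Sampling only small non-negative integers would never expose this bug, which is why inputs derived from the problem constraints (here, negative values) are more useful than random ones.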

Features

Analyze Code: Understands Python/JS code logic.
Generate Tests: Creates specific inputs to break the code.
Execute & Verify: Runs the code against generated tests (sandbox recommended for production).

Usage

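// Load the verifier exported by the skill's index.js.
const AgenticVerifier = require('./index');
const verifier = new AgenticVerifier(process.env.OPENAI_API_KEY);

const problem = "Given two integers A and B, output their sum.";
// Candidate solution to verify. It calls input() twice, so it takes A from
// the first input line and B from the second token of a second line.
const code = "print(int(input().split()[0]) + int(input().split()[1]))";

// verify() returns a promise; the third argument names the candidate's language.
verifier.verify(problem, code, 'python')
  .then(result => console.log(result))
  .catch(err => console.error(err));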

Configuration

OPENAI_API_KEY: Required for LLM reasoning.
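As a small illustration, the key can be checked before the verifier is constructed, so a missing variable fails with a clear message rather than a later API error. `requireApiKey` is a hypothetical helper and not part of the skill.

```javascript
// Hypothetical helper: fail fast when OPENAI_API_KEY is missing or blank.
function requireApiKey(env = process.env) {
  const key = (env.OPENAI_API_KEY || '').trim();
  if (!key) {
    throw new Error('OPENAI_API_KEY is not set; export it before running the verifier.');
  }
  return key;
}

// Intended use, assuming the constructor shown in the Usage section:
//   const verifier = new AgenticVerifier(requireApiKey());
```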

Security Warning

This skill executes code provided to it. Use in a restricted environment or sandbox.
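One way to follow this advice is to run each candidate inside a throwaway container. The sketch below only builds the argument list for `docker run`; the helper name `buildDockerArgs`, the image tags, the `/sandbox` mount point, the resource limits, and the `'javascript'` language key are all assumptions chosen for illustration, and the skill itself does not ship this.

```javascript
const path = require('path');

// Assumed runtimes; adjust image tags to whatever you trust locally.
const RUNTIMES = {
  python: { image: 'python:3.12-slim', runner: 'python3' },
  javascript: { image: 'node:20-slim', runner: 'node' },
};

// Hypothetical helper: build `docker run` arguments that execute one
// candidate file with no network, a read-only filesystem, and resource caps.
function buildDockerArgs(hostFile, language) {
  if (!Object.prototype.hasOwnProperty.call(RUNTIMES, language)) {
    throw new Error(`Unsupported language: ${language}`);
  }
  const { image, runner } = RUNTIMES[language];
  const name = path.basename(hostFile);
  return [
    'run', '--rm', '-i',       // remove the container afterwards; keep stdin open
    '--network', 'none',       // no network access
    '--read-only',             // read-only root filesystem
    '--cap-drop', 'ALL',       // drop all Linux capabilities
    '--memory', '256m',
    '--cpus', '0.5',
    '--pids-limit', '64',
    '-v', `${path.resolve(hostFile)}:/sandbox/${name}:ro`,
    image,
    runner, `/sandbox/${name}`,
  ];
}

console.log(buildDockerArgs('solution.py', 'python').join(' '));
```

The resulting array can be passed to `child_process.spawnSync('docker', args, { input, timeout })` so that test input arrives on stdin and a hung candidate is killed after the timeout.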

Security recommendations

This skill implements exactly what it claims (LLM-driven test generation plus executing candidate code), but there are three things to consider before installing:
- Metadata mismatch: The registry metadata does not list OPENAI_API_KEY, but the code and SKILL.md require it and the skill will call OpenAI. Expect to provide an API key if you want real LLM behavior. Ask the publisher to update the metadata to declare this requirement.
- Data exposure: The skill sends the problem description and the candidate code to OpenAI. If those inputs contain sensitive information (proprietary code, secrets, or private problem statements), they will be transmitted to the OpenAI service. Do not use this skill with sensitive inputs unless you accept this data flow.
- Execution risk: The skill writes arbitrary candidate code to disk and executes it with python3/node via execSync. Run it only in a restricted/sandboxed environment (container or VM) and never on a machine with sensitive access. The SKILL.md warns about sandboxing, but you should enforce it.
Practical steps: verify package.json and node_modules before running; run npm install in an isolated environment; require the author to update the registry metadata to declare OPENAI_API_KEY; prefer running tests with a mock mode (no API key) or inside a runtime sandbox; and review any network traffic to confirm only the OpenAI API endpoint is contacted.

Functional analysis

Type: OpenClaw Skill
Name: arxiv-agentic-verifier
Version: 1.0.0

The skill `arxiv-agentic-verifier` is classified as **suspicious** due to its core functionality involving the execution of arbitrary user-provided code. The `index.js` file uses `child_process.execSync` to run Python or Node.js code that is written to a temporary file. While this capability is central to the skill's stated purpose of code verification and the `SKILL.md` includes an explicit security warning ("This skill executes code provided to it. Use in a restricted environment or sandbox."), it represents a significant Remote Code Execution (RCE) vulnerability if the OpenClaw agent's execution environment is not adequately sandboxed. There is no evidence of intentional malicious behavior such as data exfiltration or backdoor installation.
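For readers who want to see the shape of this pattern, the sketch below writes candidate code to a private temporary directory, runs it with a time limit, and always cleans up. It is an illustration under stated assumptions, not the skill's `index.js`: it uses `spawnSync` rather than `execSync` to avoid shell interpolation, and `runCandidate` and its options are hypothetical names. A timeout limits runaway code but is not a sandbox.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { spawnSync } = require('child_process');

// Hypothetical sketch: run candidate code from a temp file with a time limit.
// Defaults run JavaScript with the current Node binary; pass
// { command: 'python3', ext: '.py' } for Python candidates.
function runCandidate(code, stdin, options = {}) {
  const { command = process.execPath, ext = '.js', timeoutMs = 2000 } = options;
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'verifier-'));
  const file = path.join(dir, `candidate${ext}`);
  try {
    fs.writeFileSync(file, code);
    const result = spawnSync(command, [file], {
      input: stdin,
      encoding: 'utf8',
      timeout: timeoutMs,                      // kill the child if it runs too long
      maxBuffer: 1024 * 1024,                  // cap captured output at 1 MiB
      env: { PATH: process.env.PATH || '' },   // do not leak API keys to the child
    });
    return {
      stdout: result.stdout || '',
      stderr: result.stderr || '',
      status: result.status,
      timedOut: Boolean(result.error && result.error.code === 'ETIMEDOUT'),
    };
  } finally {
    // Remove the temp directory even if writing or spawning failed.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

const sumProgram =
  "const [a, b] = require('fs').readFileSync(0, 'utf8').trim().split(/\\s+/).map(Number);" +
  "console.log(a + b);";
const outcome = runCandidate(sumProgram, '2 3\n');
console.log(outcome.stdout.trim(), outcome.status);
```

Passing a stripped-down `env` matters here because the parent process holds `OPENAI_API_KEY`, which untrusted candidate code could otherwise read.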

Capability assessment

⚠ Purpose & Capability

The code implements an agentic verifier that generates LLM-driven test cases and executes candidate Python/JS code, which is consistent with the skill name and description. However, the registry metadata declares no required environment variables or primary credential, while the SKILL.md and index.js clearly require an OPENAI_API_KEY (and the package.json depends on the openai client). The missing declaration in the metadata is an inconsistency that matters for permission review.

⚠ Instruction Scope

Runtime instructions and the code will: (1) send the problem description and candidate code to the OpenAI API for test-generation, and (2) write candidate code to disk and run it with child_process.execSync (python3 or node). Both actions are expected for this tool, but they carry clear privacy and host-safety implications. The SKILL.md warns about sandboxing for code execution but does not explicitly warn that candidate code/problem text will be transmitted to OpenAI (possible sensitive-data exposure).
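Because both the problem text and the candidate code leave the machine, one precaution is to mask obvious secrets before anything is sent. The sketch below is an assumption-laden illustration: `redactSecrets` is a hypothetical helper, the patterns cover only a few well-known secret shapes, and passing its output to the verifier is a suggestion rather than documented behavior.

```javascript
// Hypothetical sketch: mask obvious secrets before text is sent to an API.
// The patterns are illustrative and will not catch every secret.
const SECRET_PATTERNS = [
  /sk-[A-Za-z0-9_-]{20,}/g, // OpenAI-style API keys
  /AKIA[0-9A-Z]{16}/g,      // AWS access key IDs
  /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g,
];

function redactSecrets(text) {
  return SECRET_PATTERNS.reduce(
    (out, pattern) => out.replace(pattern, '[REDACTED]'),
    text
  );
}

const sample = 'client = OpenAI(api_key="sk-' + 'a'.repeat(24) + '")';
console.log(redactSecrets(sample));
```

Redaction of this kind reduces accidental leaks but does not make proprietary code safe to share; the decision to send code to a third-party service still rests with the user.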

ℹ Install Mechanism

There is no install spec (instruction-only at registry level), but the package includes package.json/package-lock.json and depends on 'openai' and 'axios'. That means an npm install would pull third-party packages; absence of an explicit install step in metadata is a packaging/manifest inconsistency but not an immediate red flag (no remote arbitrary download URLs are present).

⚠ Credentials

The only credential the code needs is an OpenAI API key, which is proportionate to the stated LLM-based test-generation purpose. However, the registry metadata fails to declare OPENAI_API_KEY or a primaryEnv, creating a mismatch between what the skill actually requires and what is advertised. The skill will transmit user-provided problem text and candidate code to the OpenAI API; this is expected but should be declared explicitly so users can judge data-leak risk.

✓ Persistence & Privilege

The skill does not request always:true, does not attempt to modify other skills or system-wide settings, and only writes temporary files under its own directory (temp_exec) and removes them. It executes candidate code locally via execSync, which is expected for this functionality but increases the need for sandboxing.

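How to use

Make sure OpenClaw is installed (local or Docker deployment).
Enter the install command in the chat box: /install arxiv-agentic-verifier
Once installed, invoke the skill by name or trigger it with /arxiv-agentic-verifier.
Provide the required inputs according to the skill's parameter description to get structured output.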

Version history

v1.0.0

Initial release of ArXiv Agentic Verifier:
- Actively analyzes Python/JS code logic to identify potential flaws.
- Generates targeted test cases to find edge cases and break the code.
- Executes provided code against generated tests to verify correctness.
- Requires an OpenAI API key for reasoning.
- Intended for use in a secure, sandboxed environment.

Metadata

Slug arxiv-agentic-verifier

Version 1.0.0

License —

Total installs 0

Current installs 0

Versions 1

FAQ