wanng-ide

Arxiv Agentic Verifier

by WANGJUNJIE · GitHub ↗ · v1.0.0
cross-platform ⚠ suspicious
769 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw
/install arxiv-agentic-verifier
Description
Actively verifies Python/JS code correctness by generating targeted test cases that expose logic flaws based on problem constraints.
README (SKILL.md)

ArXiv Agentic Verifier

Source Paper: Scaling Agentic Verifier for Competitive Coding (ID: 4a4c4dae6a5145ebc4d62eb2d64b0f0f)
Type: Code Verification / Test Generation

Description

This skill implements an "Agentic Verifier" that actively reasons about code correctness by generating targeted, "discriminative" test cases. Instead of random sampling, it analyzes the problem constraints and code logic to find edge cases or logic flaws.

Features

  • Analyze Code: Understands Python/JS code logic.
  • Generate Tests: Creates specific inputs to break the code.
  • Execute & Verify: Runs the code against generated tests (sandbox recommended for production).
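
One way to read "discriminative" is that a test input is useful when it makes a candidate solution disagree with a trusted reference. The sketch below illustrates that idea only. The helper `findDiscriminativeInputs` and both example solutions are hypothetical and are not part of the skill's API, and the skill itself relies on an LLM to propose inputs rather than a fixed list.

```javascript
// Hypothetical helper, not part of the skill's actual API.
// Returns the inputs on which the candidate disagrees with the reference.
function findDiscriminativeInputs(reference, candidate, inputs) {
  return inputs.filter((input) => {
    const expected = reference(input);
    let actual;
    try {
      actual = candidate(input);
    } catch (err) {
      return true; // a crash also exposes a flaw
    }
    return expected !== actual;
  });
}

// Problem: "Given two integers A and B, output their sum."
const reference = ([a, b]) => a + b;
// Flawed candidate (made up for illustration): drops the sign of B.
const candidate = ([a, b]) => a + Math.abs(b);

// Random small positive inputs would never catch this bug;
// inputs with a negative B do.
const inputs = [[1, 2], [0, 0], [5, -3], [-4, -4]];
const failing = findDiscriminativeInputs(reference, candidate, inputs);
console.log(failing); // [ [ 5, -3 ], [ -4, -4 ] ]
```

Only the inputs with a negative B separate the two solutions, which is the kind of targeted case the description contrasts with random sampling.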

Usage

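// Load the skill's entry point and pass the OpenAI API key from the environment.
const AgenticVerifier = require('./index');
const verifier = new AgenticVerifier(process.env.OPENAI_API_KEY);

// Problem statement and the candidate solution to verify.
const problem = "Given two integers A and B, output their sum.";
// Note: this candidate calls input() twice, so it raises EOFError when A and B share one line.
const code = "print(int(input().split()[0]) + int(input().split()[1]))";

// verify() returns a Promise; the third argument names the candidate's language.
verifier.verify(problem, code, 'python')
  .then(result => console.log(result))
  .catch(err => console.error(err));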

Configuration

  • OPENAI_API_KEY: Required for LLM reasoning.
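
Because the key is read from the environment, it can help to check for it before constructing the verifier. The following is a minimal sketch under stated assumptions: `resolveMode` is a hypothetical helper, and the "mock" label only echoes the no-API-key mode mentioned in the usage guidance on this page. How the skill actually behaves without a key is not documented here.

```javascript
// Hypothetical helper, not part of the skill's API.
// Decides whether real LLM calls are possible based on the environment.
function resolveMode(env) {
  const key = (env.OPENAI_API_KEY || '').trim();
  return key
    ? { mode: 'llm', apiKey: key }    // key present: LLM reasoning is available
    : { mode: 'mock', apiKey: null }; // no key: assume a mock / offline run
}

const { mode } = resolveMode(process.env);
if (mode === 'mock') {
  console.warn('OPENAI_API_KEY is not set; LLM-driven test generation is unavailable.');
}
```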

Security Warning

This skill executes code provided to it. Use in a restricted environment or sandbox.

Usage Guidance
This skill implements exactly what it claims (LLM-driven test generation plus executing candidate code), but there are three things to consider before installing:
  • Metadata mismatch: The registry metadata does not list OPENAI_API_KEY, but the code and SKILL.md require it and the skill will call OpenAI. Expect to provide an API key if you want real LLM behavior. Ask the publisher to update the metadata to declare this requirement.
  • Data exposure: The skill sends the problem description and the candidate code to OpenAI. If those inputs contain sensitive information (proprietary code, secrets, or private problem statements), they will be transmitted to the OpenAI service. Do not use this skill with sensitive inputs unless you accept this data flow.
  • Execution risk: The skill writes arbitrary candidate code to disk and executes it with python3/node via execSync. Run it only in a restricted/sandboxed environment (container or VM) and never on a machine with sensitive access. The SKILL.md warns about sandboxing, but you should enforce it.
Practical steps:
  • Verify package.json and node_modules before running.
  • Run npm install in an isolated environment.
  • Require the author to update the registry metadata to declare OPENAI_API_KEY.
  • Prefer running tests with a mock mode (no API key) or inside a runtime sandbox.
  • Review any network traffic to confirm only the OpenAI API endpoint is contacted.
Capability Analysis
Type: OpenClaw Skill
Name: arxiv-agentic-verifier
Version: 1.0.0
The skill `arxiv-agentic-verifier` is classified as **suspicious** due to its core functionality involving the execution of arbitrary user-provided code. The `index.js` file uses `child_process.execSync` to run Python or Node.js code that is written to a temporary file. While this capability is central to the skill's stated purpose of code verification and the `SKILL.md` includes an explicit security warning ("This skill executes code provided to it. Use in a restricted environment or sandbox."), it represents a significant Remote Code Execution (RCE) vulnerability if the OpenClaw agent's execution environment is not adequately sandboxed. There is no evidence of intentional malicious behavior such as data exfiltration or backdoor installation.
Capability Assessment
Purpose & Capability
The code implements an agentic verifier that generates LLM-driven test cases and executes candidate Python/JS code, which is consistent with the skill name and description. However, the registry metadata declares no required environment variables or primary credential, while the SKILL.md and index.js clearly require an OPENAI_API_KEY (and the package.json depends on the openai client). This missing declaration is an inconsistency that matters for permission review.
Instruction Scope
Runtime instructions and the code will: (1) send the problem description and candidate code to the OpenAI API for test-generation, and (2) write candidate code to disk and run it with child_process.execSync (python3 or node). Both actions are expected for this tool, but they carry clear privacy and host-safety implications. The SKILL.md warns about sandboxing for code execution but does not explicitly warn that candidate code/problem text will be transmitted to OpenAI (possible sensitive-data exposure).
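Since the problem text and candidate code leave the machine, a caller may want a pre-flight check before invoking the skill. The sketch below is a rough heuristic, not a feature of the skill. `findLikelySecrets` and its two patterns are illustrative assumptions that will miss many real secrets, so an empty result is not proof that the input is safe to send.

```javascript
// Illustrative patterns only (assumptions); real secret scanners use far more rules.
const SECRET_PATTERNS = [
  { name: 'private key block', regex: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
  {
    name: 'key-like assignment',
    regex: /\b(?:api[_-]?key|secret|token|password)\b\s*[:=]\s*['"][^'"]{8,}['"]/i,
  },
];

// Hypothetical helper: returns the names of the patterns that match the text.
function findLikelySecrets(text) {
  return SECRET_PATTERNS
    .filter(({ regex }) => regex.test(text))
    .map(({ name }) => name);
}

const safeCode = 'print(sum(map(int, input().split())))';
const riskyCode = 'API_KEY = "abcd1234efgh5678"\nprint(API_KEY)';

console.log(findLikelySecrets(safeCode));  // []
console.log(findLikelySecrets(riskyCode)); // [ 'key-like assignment' ]
```

A caller could refuse to forward any input for which the returned list is non-empty and ask the user to confirm first.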
Install Mechanism
There is no install spec (instruction-only at registry level), but the package includes package.json/package-lock.json and depends on 'openai' and 'axios'. That means an npm install would pull third-party packages; absence of an explicit install step in metadata is a packaging/manifest inconsistency but not an immediate red flag (no remote arbitrary download URLs are present).
Credentials
The only credential the code needs is an OpenAI API key, which is proportionate to the stated LLM-based test-generation purpose. However, the registry metadata fails to declare OPENAI_API_KEY or a primaryEnv, creating a mismatch between what the skill actually requires and what is advertised. The skill will transmit user-provided problem text and candidate code to the OpenAI API; this is expected but should be declared explicitly so users can judge data-leak risk.
Persistence & Privilege
The skill does not request always:true, does not attempt to modify other skills or system-wide settings, and only writes temporary files under its own directory (temp_exec) and removes them. It executes candidate code locally via execSync, which is expected for this functionality but increases the need for sandboxing.
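The write, execute, and clean-up pattern described above can be sketched as follows. This is not the skill's code: `runCandidate` is a hypothetical helper that uses the OS temp directory instead of temp_exec, runs Node.js candidates only, and swaps execSync for execFileSync so no shell is involved. It adds a timeout and withholds the API key from the child process, but neither measure is a sandbox, so a container or VM is still needed for untrusted code.

```javascript
const { execFileSync } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper, not the skill's actual implementation.
function runCandidate(code, stdin, timeoutMs = 2000) {
  // Fresh temp directory per run, removed in the finally block.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'verify-'));
  const file = path.join(dir, 'candidate.js');
  // Pass the environment through, minus the OpenAI key.
  const { OPENAI_API_KEY, ...childEnv } = process.env;
  try {
    fs.writeFileSync(file, code);
    const stdout = execFileSync(process.execPath, [file], {
      input: stdin,          // test case fed on stdin
      timeout: timeoutMs,    // terminate runaway candidates
      encoding: 'utf8',
      env: childEnv,
      stdio: ['pipe', 'pipe', 'ignore'],
    });
    return { ok: true, stdout };
  } catch (err) {
    return { ok: false, error: String(err.message) };
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

// Example candidate: reads "A B" from stdin and prints the sum.
const sumProgram = `
let data = '';
process.stdin.on('data', (chunk) => { data += chunk; });
process.stdin.on('end', () => {
  const [a, b] = data.trim().split(/\\s+/).map(Number);
  console.log(a + b);
});
`;

console.log(runCandidate(sumProgram, '2 3\n'));
console.log(runCandidate('while (true) {}', '', 300).ok);
```

The first call should report a successful run with the sum on stdout, and the second should report a failure once the timeout stops the infinite loop.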
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install arxiv-agentic-verifier
  3. After installation, invoke the skill by name or use /arxiv-agentic-verifier
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
Initial release of ArXiv Agentic Verifier:
  • Actively analyzes Python/JS code logic to identify potential flaws.
  • Generates targeted test cases to find edge cases and break the code.
  • Executes provided code against generated tests to verify correctness.
  • Requires an OpenAI API key for reasoning.
  • Intended for use in a secure, sandboxed environment.
Metadata
Slug arxiv-agentic-verifier
Version 1.0.0
License
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is Arxiv Agentic Verifier?

Actively verifies Python/JS code correctness by generating targeted test cases that expose logic flaws based on problem constraints. It is an AI Agent Skill for Claude Code / OpenClaw, with 769 downloads so far.

How do I install Arxiv Agentic Verifier?

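Run "/install arxiv-agentic-verifier" in the OpenClaw or Claude Code chat to install it in one step. For real LLM-driven test generation you also need to provide an OPENAI_API_KEY, and the skill should be run inside a sandbox because it executes the code it is given.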

Is Arxiv Agentic Verifier free?

Yes, Arxiv Agentic Verifier itself is free (open-source). You can download, install and use it at no cost, but it requires your own OpenAI API key, and usage of that API is billed separately by OpenAI.

Which platforms does Arxiv Agentic Verifier support?

Arxiv Agentic Verifier is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Arxiv Agentic Verifier?

It is built and maintained by WANGJUNJIE (@wanng-ide); the current version is v1.0.0.
