
Arxiv Agentic Verifier

Name: Arxiv Agentic Verifier
Author: wanng-ide

by WANGJUNJIE · GitHub ↗ · v1.0.0

cross-platform ⚠ suspicious

Downloads: 769

Install in OpenClaw

/install arxiv-agentic-verifier

Description

Actively verifies Python/JS code correctness by generating targeted test cases that expose logic flaws based on problem constraints.

README (SKILL.md)

ArXiv Agentic Verifier

Source Paper: Scaling Agentic Verifier for Competitive Coding (ID: 4a4c4dae6a5145ebc4d62eb2d64b0f0f)
Type: Code Verification / Test Generation

Description

This skill implements an "Agentic Verifier" that actively reasons about code correctness by generating targeted, "discriminative" test cases. Instead of random sampling, it analyzes the problem constraints and code logic to find edge cases or logic flaws.

Features

Analyze Code: Understands Python/JS code logic.
Generate Tests: Creates specific inputs to break the code.
Execute & Verify: Runs the code against generated tests (sandbox recommended for production).
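
To make "discriminative" concrete, the sketch below runs a candidate against a few targeted inputs and reports the ones where its output differs from a trusted reference. This is only an illustration under assumptions: the names `findDiscriminatingInputs`, `reference`, and `candidate` are hypothetical, the inputs are hand-written (the skill asks an LLM to propose them), and none of this is taken from the skill's index.js.

```javascript
// Hypothetical sketch: names and structure are illustrative, not the skill's code.

// Return the inputs on which `candidate` disagrees with `reference` (or throws).
function findDiscriminatingInputs(reference, candidate, inputs) {
  const failures = [];
  for (const input of inputs) {
    const expected = reference(...input);
    let actual;
    try {
      actual = candidate(...input);
    } catch (err) {
      failures.push({ input, expected, error: String(err.message) });
      continue;
    }
    if (actual !== expected) {
      failures.push({ input, expected, actual });
    }
  }
  return failures;
}

// Reference behaviour: the sum of two integers.
const reference = (a, b) => a + b;
// Deliberately buggy candidate: it mishandles negative numbers.
const candidate = (a, b) => Math.abs(a) + Math.abs(b);

// Targeted inputs: typical values plus edge cases around zero and sign.
const inputs = [[1, 2], [0, 0], [-1, 1], [-5, -7]];
const failures = findDiscriminatingInputs(reference, candidate, inputs);
console.log(failures);
```

Random positive inputs would never expose this bug; only the two inputs containing a negative number separate the candidate from the reference, which is the sense in which a test is discriminative.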

Usage

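// Load the skill's entry point and create a verifier with your OpenAI API key.
const AgenticVerifier = require('./index');
const verifier = new AgenticVerifier(process.env.OPENAI_API_KEY);

// Problem statement and candidate solution (Python source passed as a string).
// Note: this candidate calls input() twice, so it reads two lines of stdin.
const problem = "Given two integers A and B, output their sum.";
const code = "print(int(input().split()[0]) + int(input().split()[1]))";

// Generate targeted tests, run the candidate against them, and print the result.
verifier.verify(problem, code, 'python')
  .then(result => console.log(result))
  .catch(err => console.error(err));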

Configuration

OPENAI_API_KEY: Required for LLM reasoning.
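
Because the key is required, it can help to fail fast with a clear message when it is missing. The helper below is a minimal sketch; `requireApiKey` is a hypothetical name and is not part of the skill.

```javascript
// Hypothetical helper (not part of the skill): validate the key before use.
function requireApiKey(env = process.env) {
  const key = env.OPENAI_API_KEY;
  if (typeof key !== 'string' || key.trim() === '') {
    throw new Error('OPENAI_API_KEY is not set; the verifier needs it for LLM reasoning.');
  }
  return key.trim();
}

// Possible usage, assuming the constructor shown in the Usage section:
// const verifier = new AgenticVerifier(requireApiKey());
```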

Security Warning

This skill executes code provided to it. Use in a restricted environment or sandbox.
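
As one way to reduce exposure when running untrusted code, the sketch below executes a JavaScript candidate in a scratch directory with a time limit and an emptied environment. It is an illustration under assumptions, not the skill's implementation: `runWithLimits` is a hypothetical name, and it uses Node's `spawnSync` to show the `timeout` and `env` options. These limits are not a sandbox, so a container or VM is still advisable.

```javascript
const { spawnSync } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: run untrusted JS source with basic limits.
// This narrows exposure but does NOT isolate the process from the host.
function runWithLimits(source, stdin = '', timeoutMs = 2000) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'verify-'));
  const file = path.join(dir, 'candidate.js');
  try {
    fs.writeFileSync(file, source);
    const result = spawnSync(process.execPath, [file], {
      cwd: dir,               // keep relative file access inside the scratch dir
      input: stdin,           // test input is fed through stdin
      env: {},                // do not pass OPENAI_API_KEY or other secrets
      timeout: timeoutMs,     // kill runaway code such as infinite loops
      maxBuffer: 1024 * 1024, // cap captured output at 1 MiB
      encoding: 'utf8',
    });
    return {
      stdout: result.stdout || '',
      exitCode: result.status,
      timedOut: Boolean(result.error && result.error.code === 'ETIMEDOUT'),
    };
  } finally {
    // Always remove the scratch directory, even if execution failed.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

The empty `env` matters here because the parent process holds the API key; without it, candidate code could read the key from its own environment.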

Usage Guidance

This skill implements exactly what it claims (LLM-driven test generation plus executing candidate code), but there are three things to consider before installing:

- Metadata mismatch: The registry metadata does not list OPENAI_API_KEY, but the code and SKILL.md require it and the skill will call OpenAI. Expect to provide an API key if you want real LLM behavior. Ask the publisher to update the metadata to declare this requirement.
- Data exposure: The skill sends the problem description and the candidate code to OpenAI. If those inputs contain sensitive information (proprietary code, secrets, or private problem statements), they will be transmitted to the OpenAI service. Do not use this skill with sensitive inputs unless you accept this data flow.
- Execution risk: The skill writes arbitrary candidate code to disk and executes it with python3/node via execSync. Run it only in a restricted/sandboxed environment (container or VM) and never on a machine with sensitive access. The SKILL.md warns about sandboxing, but you should enforce it.

Practical steps:

- Verify package.json and node_modules before running.
- Run npm install in an isolated environment.
- Require the author to update the registry metadata to declare OPENAI_API_KEY.
- Prefer running tests with a mock mode (no API key) or inside a runtime sandbox.
- Review any network traffic to confirm only the OpenAI API endpoint is contacted.
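
One partial mitigation for the data-exposure concern is to scrub obvious secrets from the problem text and candidate code before they leave the machine. The sketch below is heuristic only: `redactSecrets` and its patterns are hypothetical, they will miss many kinds of secrets, and they do not replace keeping sensitive inputs away from the skill.

```javascript
// Hypothetical helper: heuristic scrubbing; the patterns are illustrative and incomplete.
const SECRET_PATTERNS = [
  // Strings shaped like API keys, e.g. "sk-" followed by 20+ key characters.
  /sk-[A-Za-z0-9_-]{20,}/g,
  // Quoted assignments to names such as api_key, secret, token, password.
  /((?:api[_-]?key|secret|token|password)\s*[:=]\s*)(["'])[^"']*\2/gi,
];

function redactSecrets(text) {
  return text
    .replace(SECRET_PATTERNS[0], '[REDACTED]')
    .replace(SECRET_PATTERNS[1], '$1$2[REDACTED]$2');
}

// Possible usage before calling the verifier:
// verifier.verify(redactSecrets(problem), redactSecrets(code), 'python')
```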

Capability Analysis

Type: OpenClaw Skill
Name: arxiv-agentic-verifier
Version: 1.0.0

The skill `arxiv-agentic-verifier` is classified as **suspicious** due to its core functionality involving the execution of arbitrary user-provided code. The `index.js` file uses `child_process.execSync` to run Python or Node.js code that is written to a temporary file. While this capability is central to the skill's stated purpose of code verification and the `SKILL.md` includes an explicit security warning ("This skill executes code provided to it. Use in a restricted environment or sandbox."), it represents a significant Remote Code Execution (RCE) vulnerability if the OpenClaw agent's execution environment is not adequately sandboxed. There is no evidence of intentional malicious behavior such as data exfiltration or backdoor installation.

Capability Assessment

⚠ Purpose & Capability

The code implements an agentic verifier that generates LLM-driven test cases and executes candidate Python/JS code, which is consistent with the skill name and description. However, the registry metadata declares no required environment variables or primary credential, while the SKILL.md and index.js clearly require an OPENAI_API_KEY (and the package.json depends on the openai client). The missing declaration in the metadata is an inconsistency that matters for permission review.

⚠ Instruction Scope

Runtime instructions and the code will: (1) send the problem description and candidate code to the OpenAI API for test-generation, and (2) write candidate code to disk and run it with child_process.execSync (python3 or node). Both actions are expected for this tool, but they carry clear privacy and host-safety implications. The SKILL.md warns about sandboxing for code execution but does not explicitly warn that candidate code/problem text will be transmitted to OpenAI (possible sensitive-data exposure).

ℹ Install Mechanism

There is no install spec (instruction-only at registry level), but the package includes package.json/package-lock.json and depends on 'openai' and 'axios'. That means an npm install would pull third-party packages; absence of an explicit install step in metadata is a packaging/manifest inconsistency but not an immediate red flag (no remote arbitrary download URLs are present).

⚠ Credentials

The only credential the code needs is an OpenAI API key, which is proportionate to the stated LLM-based test-generation purpose. However, the registry metadata fails to declare OPENAI_API_KEY or a primaryEnv, creating a mismatch between what the skill actually requires and what is advertised. The skill will transmit user-provided problem text and candidate code to the OpenAI API; this is expected but should be declared explicitly so users can judge the data-leak risk.

✓ Persistence & Privilege

The skill does not request always:true, does not attempt to modify other skills or system-wide settings, and only writes temporary files under its own directory (temp_exec) and removes them. It executes candidate code locally via execSync, which is expected for this functionality but increases the need for sandboxing.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install arxiv-agentic-verifier
After installation, invoke the skill by name or use /arxiv-agentic-verifier
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release of ArXiv Agentic Verifier:

- Actively analyzes Python/JS code logic to identify potential flaws.
- Generates targeted test cases to find edge cases and break the code.
- Executes provided code against generated tests to verify correctness.
- Requires an OpenAI API key for reasoning.
- Intended for use in a secure, sandboxed environment.

Metadata

Slug arxiv-agentic-verifier

Version 1.0.0

License —

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Arxiv Agentic Verifier?

Actively verifies Python/JS code correctness by generating targeted test cases that expose logic flaws based on problem constraints. It is an AI Agent Skill for Claude Code / OpenClaw, with 769 downloads so far.

How do I install Arxiv Agentic Verifier?

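Run "/install arxiv-agentic-verifier" in the OpenClaw or Claude Code chat to install it in one step. The skill also requires an OPENAI_API_KEY environment variable for LLM reasoning, and it should be run in a sandboxed environment.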

Is Arxiv Agentic Verifier free?

Arxiv Agentic Verifier can be downloaded and installed at no cost. The metadata lists no license, and the skill requires your own OpenAI API key, so OpenAI API usage may incur charges.

Which platforms does Arxiv Agentic Verifier support?

Arxiv Agentic Verifier is cross-platform and runs anywhere OpenClaw / Claude Code is available. Executing candidate code requires python3 or node on the host.

Who created Arxiv Agentic Verifier?

It is built and maintained by WANGJUNJIE (@wanng-ide); the current version is v1.0.0.
