Description

Runs three AI agents in parallel on the same coding task, has them cross-evaluate and score each other's implementations, and selects the best solution based on those scores.

README (SKILL.md)

b3ehive Skill Specification

Name: B3ehive
Author: weiyangzen

PCTF-Compliant Multi-Agent Competition System

1. Purpose (PCTF: Purpose)

Enable competitive code generation where three isolated AI agents implement the same functionality, evaluate each other objectively, and deliver the optimal solution through data-driven selection.

2. Task Definition (PCTF: Task)

Input

task_description: String describing the coding task
constraints: Optional constraints (time/space complexity, language, etc.)

Output

final_solution: Directory containing the winning implementation
comparison_report: Markdown analysis of all three approaches
decision_rationale: Explanation of why the winner was selected

Success Criteria

assertions:
  - final_solution/implementation exists and is runnable
  - comparison_report.md exists with objective metrics
  - decision_rationale.md explains selection logic
  - all three agent implementations are documented
  - evaluation scores are numeric and justified

3. Chain Flow (PCTF: Chain)

graph TD
    A[User Task] --> B[Phase 1: Parallel Spawn]
    B --> C[Agent A: Simplicity]
    B --> D[Agent B: Speed]
    B --> E[Agent C: Robustness]
    C --> F[Phase 2: Cross-Evaluation]
    D --> F
    E --> F
    F --> G[6 Evaluation Reports]
    G --> H[Phase 3: Self-Scoring]
    H --> I[3 Scorecards]
    I --> J[Phase 4: Final Delivery]
    J --> K[Best Solution]
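
The "6 Evaluation Reports" node follows from each of the three agents evaluating the other two. A minimal sketch of that enumeration is shown here; the function name `evaluation_pairs` is hypothetical and not part of the skill.

```python
from itertools import permutations

AGENTS = ["A", "B", "C"]  # agent labels taken from the flow above


def evaluation_pairs(agents):
    """Return every ordered (evaluator, target) pair with evaluator != target."""
    # Hypothetical helper for illustration; the skill's scripts may differ.
    return list(permutations(agents, 2))


pairs = evaluation_pairs(AGENTS)
for evaluator, target in pairs:
    print(f"Agent {evaluator} evaluates Agent {target}")
print(len(pairs))  # 6
```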

Phase 1: Parallel Implementation

Agent Prompt Template:

role: "Expert Software Engineer"
focus: "{{agent_focus}}"  # Simplicity / Speed / Robustness
task: "{{task_description}}"
constraints:
  - Complete runnable code in implementation/
  - Checklist.md with ALL items checked
  - SUMMARY.md with competitive advantages
  - Must differ from other agents' approaches

linter_rules:
  - code_compiles: true
  - tests_pass: true
  - no_todos: true
  - documented: true

assertions:
  - implementation/main.* exists
  - tests exist and pass
  - Checklist.md is complete
  - SUMMARY.md explains unique approach
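
As an illustration of how the `{{agent_focus}}` and `{{task_description}}` placeholders could be filled once per agent, here is a minimal sketch using plain string replacement. The names `render_prompt` and `FOCUSES` are hypothetical, and the a/b/c keys assume the `run_a`, `run_b`, `run_c` naming used elsewhere in this document.

```python
# Abbreviated copy of the Phase 1 template (first three fields only).
PROMPT_TEMPLATE = '''role: "Expert Software Engineer"
focus: "{{agent_focus}}"
task: "{{task_description}}"'''

# Assumed mapping from run directory suffix to focus, following the chain flow.
FOCUSES = {"a": "Simplicity", "b": "Speed", "c": "Robustness"}


def render_prompt(template, agent_focus, task_description):
    """Fill the two placeholders used in the Phase 1 template."""
    return (template
            .replace("{{agent_focus}}", agent_focus)
            .replace("{{task_description}}", task_description))


prompts = {
    agent: render_prompt(PROMPT_TEMPLATE, focus, "Implement quicksort")
    for agent, focus in FOCUSES.items()
}
print(prompts["b"])
```

Plain replacement does not escape quotes, so a task description that itself contains double quotes would need escaping before the result is parsed as YAML.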

Phase 2: Cross-Evaluation

Evaluation Prompt Template:

evaluator: "Agent {{from}}"
target: "Agent {{to}}"
task: "Objectively prove your solution is superior"

dimensions:
  simplicity:
    weight: 20
    metrics:
      - lines_of_code: count
      - cyclomatic_complexity: calculate
      - readability_score: 1-10
  
  speed:
    weight: 25
    metrics:
      - time_complexity: big_o
      - space_complexity: big_o
      - benchmark_results: run_if_possible
  
  stability:
    weight: 25
    metrics:
      - error_handling_coverage: percentage
      - resource_cleanup: check
      - fault_tolerance: test
  
  corner_cases:
    weight: 20
    metrics:
      - input_validation: comprehensive
      - boundary_conditions: covered
      - edge_cases: tested
  
  maintainability:
    weight: 10
    metrics:
      - documentation_quality: 1-10
      - code_structure: logical
      - extensibility: easy/hard

assertions:
  - evaluation is objective with data
  - specific code snippets cited
  - numeric scores provided
  - persuasion argument is data-driven

Phase 3: Objective Scoring

Scoring Prompt Template:

agent: "Agent {{name}}"
task: "Fairly score yourself and competitors"

self_evaluation:
  - dimension: simplicity
    max: 20
    score: "{{self_score}}"
    justification: "{{why}}"
  
  - dimension: speed
    max: 25
    score: "{{self_score}}"
    justification: "{{why}}"
  
  - dimension: stability
    max: 25
    score: "{{self_score}}"
    justification: "{{why}}"
  
  - dimension: corner_cases
    max: 20
    score: "{{self_score}}"
    justification: "{{why}}"
  
  - dimension: maintainability
    max: 10
    score: "{{self_score}}"
    justification: "{{why}}"

peer_evaluation:
  - target: "Agent {{other}}"
    scores: "{{numeric_scores}}"
    comparison: "{{objective_comparison}}"

final_conclusion:
  best_implementation: "[A/B/C/Mixed]"
  reasoning: "{{data_driven_justification}}"
  recommendation: "{{delivery_strategy}}"

assertions:
  - all scores are numeric
  - justifications are specific
  - no inflation or bias
  - conclusion is evidence-based
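
The "all scores are numeric" assertion and the per-dimension `max` values can be checked mechanically. The sketch below assumes a scorecard has already been parsed into a dictionary of dimension name to score; that shape and the name `total_score` are illustrative assumptions, not something the skill defines.

```python
# Per-dimension maxima copied from the scoring template above.
MAX_POINTS = {
    "simplicity": 20,
    "speed": 25,
    "stability": 25,
    "corner_cases": 20,
    "maintainability": 10,
}


def total_score(scorecard):
    """Validate a scorecard against the maxima and return its total (out of 100)."""
    if set(scorecard) != set(MAX_POINTS):
        raise ValueError("scorecard must cover exactly the five dimensions")
    for dimension, score in scorecard.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            raise ValueError(f"{dimension}: score must be numeric")
        if not 0 <= score <= MAX_POINTS[dimension]:
            raise ValueError(
                f"{dimension}: {score} is outside 0..{MAX_POINTS[dimension]}")
    return sum(scorecard.values())


example = {"simplicity": 17, "speed": 20, "stability": 22,
           "corner_cases": 15, "maintainability": 8}
print(total_score(example))  # 82
```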

Phase 4: Final Delivery

Decision Logic:

def select_winner(scores):
    """
    Select final solution based on competitive scores
    """
    margins = calculate_score_margins(scores)
    
    if margins.winner - margins.second > 15:
        # Clear winner
        return SingleWinner(scores.winner)
    elif margins.winner - margins.second > 5:
        # Close competition, consider hybrid
        return HybridSolution(scores.top_two)
    else:
        # Very close, pick simplest
        return SimplestImplementation(scores.all)

assertions:
  - final_solution is runnable
  - comparison_report explains all approaches
  - decision_rationale is transparent
  - attribution is given to winning agent
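
The decision logic above leaves `calculate_score_margins` and the three return types undefined. One way to make it concrete is sketched below under assumptions that are not part of the specification: `totals` and `simplicity` are plain dictionaries keyed by agent name, the result is a `(strategy, agents)` tuple, and "simplest" is taken to mean the highest simplicity score. The 15- and 5-point margins are the ones used in the original function.

```python
def select_winner(totals, simplicity, clear_margin=15, close_margin=5):
    """Return (strategy, agents) for the Phase 4 decision.

    totals:     agent name -> total score out of 100 (assumed shape)
    simplicity: agent name -> simplicity score, used as the tie-break proxy
    """
    ranked = sorted(totals, key=totals.get, reverse=True)
    winner, second = ranked[0], ranked[1]
    margin = totals[winner] - totals[second]

    if margin > clear_margin:
        return ("single", [winner])          # clear winner
    if margin > close_margin:
        return ("hybrid", [winner, second])  # close: consider a hybrid
    # Very close: pick the simplest implementation among all agents.
    simplest = max(totals, key=simplicity.get)
    return ("simplest", [simplest])


simplicity_scores = {"A": 12, "B": 15, "C": 18}
print(select_winner({"A": 90, "B": 70, "C": 65}, simplicity_scores))
print(select_winner({"A": 80, "B": 72, "C": 60}, simplicity_scores))
print(select_winner({"A": 80, "B": 78, "C": 77}, simplicity_scores))
```

Because the comparisons are strict, a margin of exactly 15 points falls into the hybrid branch and a margin of exactly 5 into the simplest-implementation branch, matching the original `>` tests.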

4. Format Specifications (PCTF: Format)

Directory Structure

workspace/
├── run_a/
│   ├── implementation/      # Agent A code
│   ├── Checklist.md         # Completion checklist
│   ├── SUMMARY.md           # Approach summary
│   ├── evaluation/          # Evaluations of B, C
│   └── SCORECARD.md         # Self-scoring
├── run_b/                   # Same structure
├── run_c/                   # Same structure
├── final/                   # Winning solution
├── COMPARISON_REPORT.md     # Full analysis
└── DECISION_RATIONALE.md    # Why winner selected
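
For reference, the directory part of this layout could be created as in the following sketch. It assumes nothing beyond the tree above; `create_workspace` is a hypothetical name, and the skill's own phase scripts may set things up differently.

```python
from pathlib import Path
import tempfile


def create_workspace(root):
    """Create the per-agent and final directories shown in the tree above."""
    root = Path(root)
    for agent in ("a", "b", "c"):
        run_dir = root / f"run_{agent}"
        (run_dir / "implementation").mkdir(parents=True, exist_ok=True)
        (run_dir / "evaluation").mkdir(exist_ok=True)
    (root / "final").mkdir(exist_ok=True)
    return root


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    workspace = create_workspace(Path(tmp) / "workspace")
    print(sorted(p.name for p in workspace.iterdir()))
```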

File Formats

Checklist.md: Markdown with - [x] checkboxes
SUMMARY.md: Markdown with sections
EVALUATION_*.md: Markdown with tables
SCORECARD.md: Markdown with score tables
Implementation: Runnable code files

5. Linter & Validation

Pre-commit Checks

#!/bin/bash
# scripts/lint.sh

lint_agent_output() {
    local agent_dir="$1"
    local errors=0
    
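    # Check required files exist
    for file in Checklist.md SUMMARY.md; do
        if [[ ! -f "${agent_dir}/${file}" ]]; then
            echo "ERROR: Missing ${file}"
            errors=$((errors + 1))
        fi
    done

    # implementation/main.* is a glob, so expand it inside the agent directory
    # (a quoted pattern inside [[ -f ]] is never expanded)
    if ! compgen -G "${agent_dir}/implementation/main.*" > /dev/null; then
        echo "ERROR: Missing implementation/main.*"
        errors=$((errors + 1))
    fi
    
    # Check Checklist is complete
    if grep -qF "[ ]" "${agent_dir}/Checklist.md"; then
        echo "ERROR: Checklist has unchecked items"
        errors=$((errors + 1))
    fi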
    
    # Check code compiles (language-specific)
    # ... implementation-specific checks
    
    return $errors
}

# Run on all agents
for agent in a b c; do
    lint_agent_output "workspace/run_${agent}" || exit 1
done

Runtime Assertions

def assert_phase_complete(phase_name):
    """Assert that a phase has completed successfully"""
    assertions = {
        "phase1": [
            "workspace/run_a/implementation exists",
            "workspace/run_b/implementation exists", 
            "workspace/run_c/implementation exists",
            "All Checklist.md are complete"
        ],
        "phase2": [
            "6 evaluation reports exist",
            "All evaluations have numeric scores"
        ],
        "phase3": [
            "3 scorecards exist",
            "All scores are numeric",
            "Conclusions are provided"
        ],
        "phase4": [
            "final/solution exists",
            "COMPARISON_REPORT.md exists",
            "DECISION_RATIONALE.md exists"
        ]
    }
    
    for assertion in assertions[phase_name]:
        assert evaluate(assertion), f"Assertion failed: {assertion}"
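
The `evaluate` function above is not defined in this document, and the assertions are written as natural-language strings. As one possible concrete form of the Phase 1 checks only, the sketch below inspects the filesystem directly. The function name `phase1_problems` and the returned list of messages are illustrative assumptions.

```python
from pathlib import Path
import re
import tempfile

# A Markdown task-list item that is still unchecked, e.g. "- [ ] write tests".
UNCHECKED = re.compile(r"^\s*[-*] \[ \]", re.MULTILINE)


def phase1_problems(workspace):
    """Return a list of problems; an empty list means the Phase 1 checks passed."""
    problems = []
    for agent in ("a", "b", "c"):
        run_dir = Path(workspace) / f"run_{agent}"
        if not (run_dir / "implementation").is_dir():
            problems.append(f"{run_dir.name}/implementation is missing")
        checklist = run_dir / "Checklist.md"
        if not checklist.is_file():
            problems.append(f"{run_dir.name}/Checklist.md is missing")
        elif UNCHECKED.search(checklist.read_text(encoding="utf-8")):
            problems.append(f"{run_dir.name}/Checklist.md has unchecked items")
    return problems


# Demonstrate against a throwaway workspace where only run_a exists.
with tempfile.TemporaryDirectory() as tmp:
    run_a = Path(tmp) / "run_a"
    (run_a / "implementation").mkdir(parents=True)
    (run_a / "Checklist.md").write_text("- [x] code\n- [ ] tests\n", encoding="utf-8")
    for problem in phase1_problems(tmp):
        print(problem)
```

Returning a list instead of raising on the first failure lets a caller report every missing artifact in one pass.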

6. Configuration

b3ehive:
  # Agent configuration
  agents:
    count: 3
    model: openai-proxy/gpt-5.3-codex
    thinking: high
    focuses:
      - simplicity
      - speed
      - robustness
  
  # Evaluation weights (must sum to 100)
  evaluation:
    dimensions:
      simplicity: 20
      speed: 25
      stability: 25
      corner_cases: 20
      maintainability: 10
  
  # Delivery strategy
  delivery:
    strategy: auto  # auto / best / hybrid
    threshold: 15   # Point margin for clear winner
  
  # Quality gates
  quality:
    lint: true
    test: true
    coverage_threshold: 80
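
The comments in this configuration state two constraints: the weights must sum to 100, and the strategy is one of auto, best, or hybrid. A minimal sketch of checking both follows. It assumes the content under the `b3ehive` key has already been loaded into a dictionary (how the skill actually parses config.yaml is not shown here), and `validate_config` is a hypothetical name.

```python
def validate_config(config):
    """Check the two constraints stated in the configuration comments."""
    weights = config["evaluation"]["dimensions"]
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"evaluation weights sum to {total}, expected 100")
    strategy = config["delivery"]["strategy"]
    if strategy not in {"auto", "best", "hybrid"}:
        raise ValueError(f"unknown delivery strategy: {strategy!r}")
    return True


# Values copied from the configuration above.
config = {
    "evaluation": {"dimensions": {"simplicity": 20, "speed": 25, "stability": 25,
                                  "corner_cases": 20, "maintainability": 10}},
    "delivery": {"strategy": "auto", "threshold": 15},
}
print(validate_config(config))  # True
```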

7. Usage

# Basic usage
b3ehive "Implement a thread-safe rate limiter"

# With constraints
b3ehive "Implement quicksort" --lang python --max-lines 50

# Using OpenClaw CLI
openclaw skills run b3ehive --task "Your task"

8. License

MIT © Weiyang (@weiyangzen)

Usage Guidance

b3ehive appears to do what it says: it spawns three agent runs, creates prompts and files, generates evaluations and scorecards, and delivers a chosen implementation. Before installing or running it, consider the following:

1) Review the included scripts (phase1–4). They create directories, write files, run linters and tests, and copy files. Generated code may be executed during benchmarking or testing, so run the skill in a sandbox or ephemeral environment, not on production hosts.
2) Clarify model and runtime expectations. package.json and config.yaml reference a specific model (openai-proxy/gpt-5.3-codex), but the skill metadata doesn't declare required model credentials; ensure you understand which model endpoint and credentials will be used.
3) Confirm you trust the skill source. The README points to a GitHub repo; inspect the upstream code there for updates.
4) If you intend to run untrusted tasks, limit the skill's filesystem and network permissions (containerize it or run it on an isolated VM).
5) The skill lacks a human-readable description and homepage in the registry metadata; ask the author for those details if you need higher assurance.

Capability Analysis

Type: OpenClaw Skill
Name: b3ehive
Version: 0.1.0

The OpenClaw skill 'b3ehive' is designed for a multi-agent competitive code generation system. The `SKILL.md` defines structured prompts for AI agents focused on code generation, evaluation, and scoring, with clear constraints and success criteria. The accompanying bash scripts (`scripts/phase*.sh`) perform benign file system operations (creating directories, generating markdown templates, copying files) strictly within a designated workspace. There is no evidence of intentional harmful behavior such as data exfiltration, malicious execution, persistence, or prompt injection attempts originating from the skill itself to subvert the agent's core purpose. The skill's actions are aligned with its stated purpose.

Capability Assessment

✓ Purpose & Capability

The files and scripts implement the stated multi-agent competition flow (spawn → evaluate → score → deliver). The required capabilities (creating files, running linters/tests, copying results) match the skill's purpose. Minor note: package.json and config.yaml mention a model (openai-proxy/gpt-5.3-codex), but the registry metadata shows no declared model/agent requirement — a small mismatch in metadata, not a functional mismatch in the scripts.

ℹ Instruction Scope

SKILL.md and the included scripts instruct the agent to create workspaces, generate prompts, run evaluations, run tests/benchmarks, and copy files. These are consistent with a code-competition tool, but they do involve executing or running generated code (tests/benchmarks) and writing to the local filesystem; users should be aware this can execute arbitrary code produced by the agents.

✓ Install Mechanism

There is no install spec (instruction-only behavior with bundled scripts). No network downloads or archive extraction in the install stage. The repo references a GitHub URL in README/package.json but does not automatically fetch remote binaries during install.

ℹ Credentials

The skill does not request environment variables or credentials in the registry metadata. However, package.json and config.yaml reference a model identifier (openai-proxy/gpt-5.3-codex) which implies the agent will need a configured model endpoint/credentials at runtime — those are not declared. This is a metadata/documentation mismatch to be clarified but not itself evidence of malicious intent.

✓ Persistence & Privilege

always:false (default) and disable-model-invocation:false — normal. The skill writes output to a workspace directory within its own tree and does not modify other skills or system-wide configurations. It does not request permanent/always-on presence.

Version History

v0.1.0

Initial release of b3ehive: a multi-agent, PCTF-compliant code competition framework.

- Spawns three isolated AI agents (simplicity, speed, robustness) to solve the same coding task in parallel.
- Implements cross-evaluation, numeric scoring, and data-driven winner selection with clear decision logic.
- Standardizes workspace structure and markdown reporting for traceability and audit.
- Includes phase-based pre-commit linting and runtime assertions to ensure completeness and correctness.
- Fully documented prompt and evaluation templates, scoring weights, and configuration guidelines.
- Output includes the winning implementation, full comparison report, and transparent rationale for selection.

Metadata

Slug b3ehive

Version 0.1.0

License —

All-time Installs 2

Active Installs 2

Total Versions 1

Frequently Asked Questions

What is B3ehive?

Runs three AI agents in parallel on the same coding task, has them cross-evaluate and score each other's implementations, and selects the best solution based on those scores. It is an AI Agent Skill for Claude Code / OpenClaw, with 1020 downloads so far.

How do I install B3ehive?

Run "/install b3ehive" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is B3ehive free?

Yes, B3ehive is completely free (open-source). You can download, install and use it at no cost.

Which platforms does B3ehive support?

B3ehive is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created B3ehive?

It is built and maintained by weiyangzen (@weiyangzen); the current version is v0.1.0.
