notestone

AI Intelligence Hub - Real-time Model Capability Tracking

by Notestone · GitHub ↗ · v1.0.0
cross-platform ⚠ suspicious
Downloads 378
Stars 0
Active Installs 1
Versions 1
Install in OpenClaw
/install model-benchmarks
Description
Real-time AI model capability tracking via leaderboards (LMSYS Arena, HuggingFace, etc.) for intelligent compute routing and cost optimization
Usage Guidance
This skill appears to be an early or local-only implementation: it promises real-time leaderboard scraping but the shipped code returns mocked benchmark and price data and does not call the listed external APIs. Before installing or scheduling it to run automatically:
  1. Inspect scripts/run.py fully to confirm whether it will fetch external endpoints or require API keys (and never provide credentials unless you trust the source).
  2. If you add cron jobs or use the example scripts, be aware they will regularly write logs and may call `openclaw config set` (so they can change agent config).
  3. Only wire up external webhooks (Slack) or API keys you control and trust; the skill does not declare or validate those env vars.
  4. If you need real-time external data, either update/verify the fetch implementations yourself or only run the skill in a safe environment until upstream adds proper API integrations and explicit credential handling.
If you want higher assurance, ask the publisher for a version that performs actual API calls with documented credential requirements and for a reproducible audit of network behavior.
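One way to approach the inspection step is a quick static scan before reading the file by hand. The sketch below is not part of the skill; it uses Python's standard `ast` module to list calls in a source string that look like network access. The `NETWORK_CALLS` set and both function names are assumptions chosen for illustration, and the scan only matches literal dotted names, so it is a first pass rather than an audit.

```python
import ast

# Dotted call names treated as network access in this sketch (an assumption,
# not an exhaustive list; extend it for other HTTP clients).
NETWORK_CALLS = {
    "urlopen",
    "request.urlopen",
    "urllib.request.urlopen",
    "requests.get",
    "requests.post",
}


def dotted_name(node):
    """Return 'a.b.c' for nested Attribute/Name nodes, or None otherwise."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_network_calls(source):
    """Return sorted (line_number, call_name) pairs for calls in NETWORK_CALLS."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in NETWORK_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)
```

To apply it, pass in the file's contents, for example `find_network_calls(open("scripts/run.py").read())`. An empty result is consistent with the mocked behavior described here, but aliased imports (such as `from urllib.request import urlopen as fetch`) or calls made through `subprocess` would not be caught, so a manual read is still needed.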
Capability Analysis
Type: OpenClaw Skill
Name: model-benchmarks
Version: 1.0.0
The OpenClaw AgentSkills skill bundle 'model-benchmarks' is classified as benign. The code and documentation consistently align with its stated purpose of tracking AI model capabilities and optimizing costs. While the `scripts/run.py` file imports `urllib.request` and defines external API URLs, the current implementation of data fetching functions (`fetch_lmsys_arena`, `fetch_bigcode_leaderboard`, `fetch_current_prices`) explicitly uses `mock_data` and does not make actual external network requests. File system operations are limited to the skill's own directory or legitimate OpenClaw internal configuration paths (`~/.openclaw/workspace/skills/compute-router/dynamic_config.json`). The markdown files (`SKILL.md`, `README.md`, `examples/integration-examples.md`) contain clear instructions and examples for users, including shell commands and Python snippets, but these do not contain any prompt injection attempts, unauthorized commands, or instructions for data exfiltration. The `curl` command in an example is for user-configured Slack alerts, not malicious exfiltration. No obfuscation or persistence mechanisms are present within the skill's core logic.
Capability Assessment
Purpose & Capability
The README/SKILL.md claim 'real-time' pulls from LMSYS, BigCode, HuggingFace and 'No external dependencies'. The bundled scripts (scripts/run.py) currently implement mocked fetch functions and write local JSON files rather than performing actual network scraping/API queries. The code includes BENCHMARK_SOURCES with real-looking URLs (HuggingFace spaces) and imports urllib, but does not actually fetch those endpoints in the provided implementation. This is a clear capability mismatch: the skill advertises live data but ships a simulated/local-only implementation.
Instruction Scope
Runtime instructions center on running the included Python script to fetch, query, recommend, and write local benchmark data, and on integrating results into OpenClaw config or dashboards. The instructions do not direct the agent to read unrelated system files or exfiltrate data. Examples include sending alerts to external endpoints (Slack webhook) and invoking `openclaw config set`, but those are optional example workflows and are within the skill's stated purpose (integration/automation).
Install Mechanism
There is no install spec and no remote download/install step; the skill is instruction-only with bundled Python scripts. Nothing in the manifest writes or executes code fetched from external URLs during installation, which reduces installation-time risk.
Credentials
The skill declares no required environment variables or credentials. However, integration examples reference external webhook variables (e.g., SLACK_WEBHOOK_URL) and CLI commands that rely on an existing OpenClaw installation and its credentials. If you enable the roadmap features (OpenRouter/Anthropic price polling) or modify BENCHMARK_SOURCES to call external APIs, those will likely require API keys—none are declared now. Be aware future/modified versions could ask for unrelated secrets.
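Because the skill neither declares nor validates these variables, a wrapper can check them before any alert is sent. The following is a minimal sketch, assuming the webhook is a standard Slack incoming webhook served from `hooks.slack.com`; the function name `load_webhook_url` and its parameters are hypothetical and do not come from the skill.

```python
import os
from urllib.parse import urlparse


def load_webhook_url(env=None, var="SLACK_WEBHOOK_URL", allowed_host="hooks.slack.com"):
    """Return the webhook URL, or None if unset; raise ValueError if it looks wrong."""
    env = os.environ if env is None else env
    url = env.get(var, "").strip()
    if not url:
        return None  # alerts are optional, so an unset variable means "disabled"
    parsed = urlparse(url)
    # Reject plain HTTP and any host other than the expected one (assumed above).
    if parsed.scheme != "https" or parsed.hostname != allowed_host:
        raise ValueError(f"{var} must be an https URL on {allowed_host}")
    return url
```

Passing `env` explicitly keeps the check easy to test; in normal use the default reads from `os.environ`. Failing loudly on an unexpected host helps prevent alert payloads from being sent to an endpoint you did not intend.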
Persistence & Privilege
always:false and the skill does not auto-enable itself. Documentation and examples recommend scheduling runs with cron and programmatically changing OpenClaw config (`openclaw config set`). Those are reasonable for the skill's goal but create persistent changes (cron jobs, config updates) under your account if you follow the examples. The skill itself does not request elevated privileges or modify other skills.
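If you have followed the scheduling examples and later want to see what was left behind, a small filter over your crontab can help. This is a sketch only: `find_cron_entries` is a hypothetical helper, and it assumes the scheduled command mentions the skill slug `model-benchmarks` somewhere on the line.

```python
def find_cron_entries(crontab_text, needle="model-benchmarks"):
    """Return active (non-blank, non-comment) crontab lines that mention needle."""
    entries = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and needle in stripped:
            entries.append(stripped)
    return entries
```

The input text can come from `subprocess.run(["crontab", "-l"], capture_output=True, text=True).stdout` on systems that provide `crontab`. Changes made through `openclaw config set` are not visible this way and would need to be reviewed in the OpenClaw configuration itself.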
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install model-benchmarks
  3. After installation, invoke the skill by name or use /model-benchmarks
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
🚀 Model Benchmarks v1.0.0 - Initial Release

🧠 CORE FEATURES:
• Real-time AI capability tracking from multiple leaderboards
• LMSYS Chatbot Arena integration (100+ models, daily updates)
• BigCode programming leaderboard (50+ models, weekly updates)
• HuggingFace Open LLM leaderboard (200+ models, daily updates)
• Alpaca Eval instruction-following benchmark (80+ models)

💰 COST OPTIMIZATION:
• Performance-per-dollar calculations for all tracked models
• 445x cost efficiency discovery (Gemini 2.0 Flash vs expensive models)
• Task-specific model recommendations (coding, writing, analysis, translation, math, creative, simple)
• Real-time pricing integration from OpenRouter and provider APIs

📊 INTELLIGENT ANALYSIS:
• Unified 0-100 scoring system across all capabilities
• Multi-dimensional performance tracking (general, reasoning, creative, coding, knowledge, comprehension)
• Trend analysis and performance change detection
• Export capabilities for custom analysis (JSON, CSV)

🔗 PERFECT INTEGRATION:
• Seamless compatibility with model-manager skill
• Auto-sync capabilities to compute routing systems
• CLI and programmatic API access
• Cross-platform Python implementation (3.8+)

🎯 PROVEN RESULTS:
• Users report 60-95% AI cost reduction
• Data-driven model selection replaces guesswork
• Discover hidden gem models with superior cost efficiency
• Optimize for specific task types with intelligence

FIRST RELEASE - Complete AI intelligence platform for OpenClaw optimization!
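The release notes do not say how "performance-per-dollar" is computed. A minimal sketch of one plausible definition, assuming it is a 0-100 score divided by the price per million tokens, is shown below. The function names, the dictionary keys, and any numbers you feed in are illustrative assumptions, not values or code from the skill.

```python
def performance_per_dollar(score, price_per_million_tokens):
    """Score points per dollar, under the assumed definition score / price."""
    if price_per_million_tokens <= 0:
        raise ValueError("price must be positive")
    return score / price_per_million_tokens


def rank_by_value(models):
    """Sort model records (dicts with 'name', 'score', 'price') by value, best first."""
    return sorted(
        models,
        key=lambda m: performance_per_dollar(m["score"], m["price"]),
        reverse=True,
    )
```

Under this definition, a headline multiplier such as the "445x" figure would be the ratio of two models' values, so it depends heavily on which pair is compared and on the price data used. With mocked prices, as shipped in this version, any such ratio reflects the mock data rather than live pricing.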
Metadata
Slug model-benchmarks
Version 1.0.0
License
All-time Installs 1
Active Installs 1
Total Versions 1
Frequently Asked Questions

What is AI Intelligence Hub - Real-time Model Capability Tracking?

Real-time AI model capability tracking via leaderboards (LMSYS Arena, HuggingFace, etc.) for intelligent compute routing and cost optimization. It is an AI Agent Skill for Claude Code / OpenClaw, with 378 downloads so far.

How do I install AI Intelligence Hub - Real-time Model Capability Tracking?

Run "/install model-benchmarks" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is AI Intelligence Hub - Real-time Model Capability Tracking free?

Yes, AI Intelligence Hub - Real-time Model Capability Tracking is completely free (open-source). You can download, install and use it at no cost.

Which platforms does AI Intelligence Hub - Real-time Model Capability Tracking support?

AI Intelligence Hub - Real-time Model Capability Tracking is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created AI Intelligence Hub - Real-time Model Capability Tracking?

It is built and maintained by Notestone (@notestone); the current version is v1.0.0.
