Total downloads: 378
Favorites: 0
Current installs: 1
Versions: 1
Install in OpenClaw
/install model-benchmarks
Description
Real-time AI model capability tracking via leaderboards (LMSYS Arena, HuggingFace, etc.) for intelligent compute routing and cost optimization
Security recommendations
This skill appears to be an early or local-only implementation. It promises real-time leaderboard scraping, but the shipped code returns mocked benchmark and price data and does not call the listed external APIs. Before installing it or scheduling it to run automatically:
1) Inspect scripts/run.py fully to confirm whether it will fetch external endpoints or require API keys. Never provide credentials unless you trust the source.
2) If you add cron jobs or use the example scripts, be aware that they will regularly write logs and may call `openclaw config set`, so they can change agent config.
3) Only wire up external webhooks (Slack) or API keys that you control and trust. The skill does not declare or validate those environment variables.
4) If you need real-time external data, either update and verify the fetch implementations yourself, or run the skill only in a safe environment until upstream adds proper API integrations and explicit credential handling.
If you want higher assurance, ask the publisher for two things: a version that performs actual API calls with documented credential requirements, and a reproducible audit of its network behavior.
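Recommendation 1 can be partly automated. The sketch below uses only the standard-library `ast` module to list call sites in a Python file whose dotted name looks like a network call. The contents of `NETWORK_CALLS` are an assumption: a non-exhaustive set of common urllib, http.client, socket and requests entry points. A static scan cannot see dynamically constructed calls, so an empty result is a hint, not proof. A bare `import urllib.request` with no matching call is deliberately not reported.

```python
import ast
import sys

# Assumed, non-exhaustive set of call names that usually perform network I/O.
NETWORK_CALLS = {
    "urlopen",
    "urllib.request.urlopen",
    "request.urlopen",
    "http.client.HTTPConnection",
    "http.client.HTTPSConnection",
    "socket.create_connection",
    "requests.get",
    "requests.post",
}


def dotted_name(node):
    """Return 'a.b.c' for Name/Attribute chains, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_network_calls(source):
    """Return sorted (line, name) pairs for calls whose name is in NETWORK_CALLS."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in NETWORK_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)


# Usage: python find_net_calls.py scripts/run.py
if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    with open(path, encoding="utf-8") as f:
        for lineno, name in find_network_calls(f.read()):
            print(f"{path}:{lineno}: {name}")
```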
Analysis
Type: OpenClaw Skill
Name: model-benchmarks
Version: 1.0.0
The OpenClaw AgentSkills skill bundle 'model-benchmarks' is classified as benign. The code and documentation consistently align with its stated purpose of tracking AI model capabilities and optimizing costs. While the `scripts/run.py` file imports `urllib.request` and defines external API URLs, the current implementation of data fetching functions (`fetch_lmsys_arena`, `fetch_bigcode_leaderboard`, `fetch_current_prices`) explicitly uses `mock_data` and does not make actual external network requests. File system operations are limited to the skill's own directory or legitimate OpenClaw internal configuration paths (`~/.openclaw/workspace/skills/compute-router/dynamic_config.json`). The markdown files (`SKILL.md`, `README.md`, `examples/integration-examples.md`) contain clear instructions and examples for users, including shell commands and Python snippets, but these do not contain any prompt injection attempts, unauthorized commands, or instructions for data exfiltration. The `curl` command in an example is for user-configured Slack alerts, not malicious exfiltration. No obfuscation or persistence mechanisms are present within the skill's core logic.
Capability assessment
Purpose & Capability
README.md and SKILL.md claim 'real-time' pulls from LMSYS, BigCode, and HuggingFace, and 'No external dependencies'. The bundled script (scripts/run.py) currently implements mocked fetch functions and writes local JSON files instead of performing actual network scraping or API queries. The code defines BENCHMARK_SOURCES with real-looking URLs (HuggingFace Spaces) and imports urllib, but the provided implementation never fetches those endpoints. This is a clear capability mismatch: the skill advertises live data but ships a simulated, local-only implementation.
Instruction Scope
Runtime instructions center on running the included Python script to fetch, query, recommend, and write local benchmark data, and on integrating results into OpenClaw config or dashboards. The instructions do not direct the agent to read unrelated system files or exfiltrate data. Examples include sending alerts to external endpoints (Slack webhook) and invoking `openclaw config set`, but those are optional example workflows and are within the skill's stated purpose (integration/automation).
Install Mechanism
There is no install spec and no remote download/install step; the skill is instruction-only with bundled Python scripts. Nothing in the manifest writes or executes code fetched from external URLs during installation, which reduces installation-time risk.
Credentials
The skill declares no required environment variables or credentials. However, integration examples reference external webhook variables (e.g., SLACK_WEBHOOK_URL) and CLI commands that rely on an existing OpenClaw installation and its credentials. If you enable the roadmap features (OpenRouter/Anthropic price polling) or modify BENCHMARK_SOURCES to call external APIs, those will likely require API keys—none are declared now. Be aware future/modified versions could ask for unrelated secrets.
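Because the skill does not declare or validate `SLACK_WEBHOOK_URL`, a caller wiring up the Slack alert example may want to validate it explicitly. Below is a minimal sketch of that. Slack incoming webhooks accept an HTTP POST with a JSON body containing a `text` field. The function names are hypothetical and not part of the skill. The `https://hooks.slack.com/` prefix check is an assumption based on the usual Slack webhook URL format.

```python
import json
import os
import urllib.request


def build_slack_request(message, env=None):
    """Build (but do not send) a POST request for a Slack incoming webhook.

    Raises RuntimeError when SLACK_WEBHOOK_URL is missing or does not look like
    a Slack webhook, so a misconfigured cron job fails loudly.
    """
    env = os.environ if env is None else env
    url = env.get("SLACK_WEBHOOK_URL", "").strip()
    if not url:
        raise RuntimeError("SLACK_WEBHOOK_URL is not set")
    # Assumed URL format check; adjust if you use a different webhook host.
    if not url.startswith("https://hooks.slack.com/"):
        raise RuntimeError("SLACK_WEBHOOK_URL does not look like a Slack webhook URL")
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_alert(message, timeout=10):
    """Send the alert over the network and return the HTTP status code."""
    req = build_slack_request(message)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Building and sending are split so the validation can be exercised without any network access.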
Persistence & Privilege
The skill sets `always: false` and does not auto-enable itself. Documentation and examples recommend scheduling runs with cron and programmatically changing OpenClaw config (`openclaw config set`). Those are reasonable for the skill's goal, but they create persistent changes (cron jobs, config updates) under your account if you follow the examples. The skill itself does not request elevated privileges or modify other skills.
How to use
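- Make sure OpenClaw is installed (local or Docker deployment)
- Enter the install command in the chat box: /install model-benchmarks
- After installation, invoke the skill by name or use /model-benchmarks to trigger it
- Provide the inputs described in the skill's parameter documentation to get structured output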
Version history
v1.0.0
🚀 Model Benchmarks v1.0.0 - Initial Release
🧠 CORE FEATURES:
• Real-time AI capability tracking from multiple leaderboards
• LMSYS Chatbot Arena integration (100+ models, daily updates)
• BigCode programming leaderboard (50+ models, weekly updates)
• HuggingFace Open LLM leaderboard (200+ models, daily updates)
• Alpaca Eval instruction-following benchmark (80+ models)
💰 COST OPTIMIZATION:
• Performance-per-dollar calculations for all tracked models
• 445x cost efficiency discovery (Gemini 2.0 Flash vs expensive models)
• Task-specific model recommendations (coding, writing, analysis, translation, math, creative, simple)
• Real-time pricing integration from OpenRouter and provider APIs
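The release notes do not define how performance-per-dollar or the task-specific recommendation is computed. Below is a minimal sketch under stated assumptions. Performance-per-dollar is taken to be a 0-100 task score divided by a blended price in USD per million tokens. The recommendation picks the best ratio among models that clear a minimum quality bar. The model names, scores and prices are made-up placeholders, not data from the skill.

```python
from dataclasses import dataclass, field


@dataclass
class ModelInfo:
    name: str
    usd_per_mtok: float  # assumed blended price, USD per million tokens
    scores: dict = field(default_factory=dict)  # task -> assumed 0-100 score


def perf_per_dollar(model, task):
    """Score points per USD-per-million-tokens for one task (missing task = 0)."""
    if model.usd_per_mtok <= 0:
        raise ValueError(f"{model.name}: price must be positive")
    return model.scores.get(task, 0.0) / model.usd_per_mtok


def recommend(models, task, min_score=0.0):
    """Best score-per-dollar among models whose task score is >= min_score."""
    eligible = [m for m in models if m.scores.get(task, 0.0) >= min_score]
    if not eligible:
        return None
    return max(eligible, key=lambda m: perf_per_dollar(m, task))


# Hypothetical placeholder models, for illustration only.
models = [
    ModelInfo("model-a", 15.0, {"coding": 90, "writing": 85}),
    ModelInfo("model-b", 0.5, {"coding": 75, "writing": 80}),
    ModelInfo("model-c", 0.2, {"coding": 60}),
]

print(recommend(models, "coding", min_score=70).name)  # prints "model-b"
```

The `min_score` floor matters. Without it, the pure ratio tends to favor the cheapest model even when its quality is too low for the task (`model-c` here).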
📊 INTELLIGENT ANALYSIS:
• Unified 0-100 scoring system across all capabilities
• Multi-dimensional performance tracking (general, reasoning, creative, coding, knowledge, comprehension)
• Trend analysis and performance change detection
• Export capabilities for custom analysis (JSON, CSV)
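The listing does not explain how leaderboards on different scales are mapped onto the unified 0-100 score (for example Arena Elo ratings versus percentage pass rates). One common approach, shown here only as an assumption, is min-max normalization within each leaderboard followed by a weighted average across dimensions. The dimension names, weights and numbers are hypothetical.

```python
def min_max_to_100(raw):
    """Rescale {model: raw_score} so the best tracked model is 100 and the worst is 0."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        # All models tied: use the midpoint instead of dividing by zero.
        return {m: 50.0 for m in raw}
    return {m: 100.0 * (v - lo) / (hi - lo) for m, v in raw.items()}


def unified_score(normalized_by_dim, weights):
    """Weighted average of per-dimension 0-100 scores, skipping missing dimensions."""
    models = set()
    for scores in normalized_by_dim.values():
        models.update(scores)
    result = {}
    for m in sorted(models):
        total = weight_sum = 0.0
        for dim, w in weights.items():
            dim_scores = normalized_by_dim.get(dim, {})
            if m in dim_scores:
                total += w * dim_scores[m]
                weight_sum += w
        result[m] = total / weight_sum if weight_sum else 0.0
    return result


# Hypothetical inputs: Elo-style ratings and percentage pass rates.
normalized = {
    "general": min_max_to_100({"model-a": 1300, "model-b": 1200, "model-c": 1100}),
    "coding": min_max_to_100({"model-a": 80, "model-b": 60}),
}
print(unified_score(normalized, {"general": 0.5, "coding": 0.5}))
```

Min-max scores are relative to the set of models being tracked. A 0 means "worst in this set", not "useless", and adding or removing models shifts everyone's score.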
🔗 PERFECT INTEGRATION:
• Seamless compatibility with model-manager skill
• Auto-sync capabilities to compute routing systems
• CLI and programmatic API access
• Cross-platform Python implementation (3.8+)
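According to the analysis above, auto-sync writes to `~/.openclaw/workspace/skills/compute-router/dynamic_config.json`. If that sync runs from cron, a safer pattern than overwriting the file in place is to write a temporary file in the same directory and swap it in with `os.replace`, keeping a `.bak` copy of the previous version. This is a sketch of that pattern only. The `routes` payload in the usage line is hypothetical, and the real schema expected by compute-router is not documented here.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path

DEFAULT_PATH = Path.home() / ".openclaw/workspace/skills/compute-router/dynamic_config.json"


def write_config_atomically(data, path=DEFAULT_PATH):
    """Write `data` as JSON to `path` via temp file + os.replace, backing up the old file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    if path.exists():
        # Keep the previous config so a bad sync can be rolled back by hand.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2, sort_keys=True)
            f.flush()
            os.fsync(f.fileno())
        # Atomic rename on POSIX when source and target are on the same filesystem.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    return path


# Usage (hypothetical payload): write_config_atomically({"routes": {"coding": "model-b"}})
```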
🎯 PROVEN RESULTS:
• Users report 60-95% AI cost reduction
• Data-driven model selection replaces guesswork
• Discover hidden gem models with superior cost efficiency
• Optimize for specific task types with intelligence
FIRST RELEASE - Complete AI intelligence platform for OpenClaw optimization!
Metadata
FAQ
What is AI Intelligence Hub - Real-time Model Capability Tracking?
Real-time AI model capability tracking via leaderboards (LMSYS Arena, HuggingFace, etc.) for intelligent compute routing and cost optimization. It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 378 times so far.
How do I install AI Intelligence Hub - Real-time Model Capability Tracking?
Run the command "/install model-benchmarks" in the OpenClaw or Claude Code chat box to install it in one step. No additional configuration is required.
Is AI Intelligence Hub - Real-time Model Capability Tracking free?
Yes. AI Intelligence Hub - Real-time Model Capability Tracking is completely free and open source, and can be freely downloaded, installed, and used.
Which platforms does AI Intelligence Hub - Real-time Model Capability Tracking support?
AI Intelligence Hub - Real-time Model Capability Tracking is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who develops AI Intelligence Hub - Real-time Model Capability Tracking?
It is developed and maintained by Notestone (@notestone). The current version is v1.0.0.