Description

Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use whe...

README (SKILL.md)

MCP Server Development Guide

Name: hxl-code-reviewer
Author: aabbcc456aa

Overview

Create MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. The quality of an MCP server is measured by how well it enables LLMs to accomplish real-world tasks.

Process

🚀 High-Level Workflow

Creating a high-quality MCP server involves four main phases:

Phase 1: Deep Research and Planning

1.1 Understand Modern MCP Design

API Coverage vs. Workflow Tools: Balance comprehensive API endpoint coverage with specialized workflow tools. Workflow tools can be more convenient for specific tasks, while comprehensive coverage gives agents flexibility to compose operations. Performance varies by client—some clients benefit from code execution that combines basic tools, while others work better with higher-level workflows. When uncertain, prioritize comprehensive API coverage.

Tool Naming and Discoverability: Clear, descriptive tool names help agents find the right tools quickly. Use consistent prefixes (e.g., github_create_issue, github_list_repos) and action-oriented naming.
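
As a minimal sketch of the naming advice above, the helper below builds prefixed, action-oriented names and checks that a set of tools shares one prefix. The function names and the `<service>_<action>_<resource>` pattern are illustrative assumptions, not part of any MCP SDK.

```python
import re

# Assumed convention for this sketch: <service>_<action>_<resource> in lowercase snake_case.
TOOL_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")


def make_tool_name(service: str, action: str, resource: str) -> str:
    """Build a prefixed, action-oriented tool name such as github_create_issue."""
    parts = [service, action, resource]
    name = "_".join(p.strip().lower().replace("-", "_").replace(" ", "_") for p in parts)
    if not TOOL_NAME_PATTERN.match(name):
        raise ValueError(f"Tool name {name!r} is not lowercase snake_case")
    return name


def shares_prefix(names: list[str], prefix: str) -> bool:
    """Check that every tool name starts with the same service prefix."""
    return all(n.startswith(prefix + "_") for n in names)
```

A check like `shares_prefix` could run in a unit test so that a newly added tool with an inconsistent name is caught before release.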

Context Management: Agents benefit from concise tool descriptions and the ability to filter/paginate results. Design tools that return focused, relevant data. Some clients support code execution which can help agents filter and process data efficiently.
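
One way a tool might keep results focused is to return a single page plus the metadata an agent needs to ask for the next one. The sketch below assumes simple offset/limit pagination and hypothetical field names (`has_more`, `next_offset`); a real service may use cursors instead.

```python
from typing import Any


def paginate(items: list[dict[str, Any]], limit: int = 20, offset: int = 0) -> dict[str, Any]:
    """Return one page of results plus the metadata needed to fetch the next page."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if offset < 0:
        raise ValueError("offset must not be negative")
    page = items[offset : offset + limit]
    next_offset = offset + len(page)
    has_more = next_offset < len(items)
    return {
        "total": len(items),
        "count": len(page),
        "offset": offset,
        "items": page,
        "has_more": has_more,
        # Only advertise a next offset when there is something left to fetch.
        "next_offset": next_offset if has_more else None,
    }
```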

Actionable Error Messages: Error messages should guide agents toward solutions with specific suggestions and next steps.
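
To illustrate, an error helper might map each failure to a concrete next step rather than echoing a raw status code. The mapping and wording below are assumptions chosen for the example, not messages from any particular API.

```python
def format_api_error(status: int, resource: str) -> str:
    """Map an HTTP status code to a message that says what to try next."""
    suggestions = {
        401: "Authentication failed. Check that the API token is set and has not expired.",
        403: f"Permission denied for {resource}. Verify that the token's scopes cover it.",
        404: f"{resource} was not found. List the available items first to confirm the identifier.",
        429: "Rate limit reached. Wait before retrying, or request fewer items per call.",
    }
    # Fallback for statuses without a specific suggestion.
    hint = suggestions.get(status, "Unexpected error. Retry once, then report the status code.")
    return f"Error {status}: {hint}"
```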

1.2 Study MCP Protocol Documentation

Navigate the MCP specification:

Start with the sitemap to find relevant pages: https://modelcontextprotocol.io/sitemap.xml

Then fetch specific pages with .md suffix for markdown format (e.g., https://modelcontextprotocol.io/specification/draft.md).

Key pages to review:

Specification overview and architecture
Transport mechanisms (streamable HTTP, stdio)
Tool, resource, and prompt definitions

1.3 Study Framework Documentation

Recommended stack:

Language: TypeScript (high-quality SDK support and good compatibility with many execution environments, e.g. MCPB; AI models are also good at generating TypeScript code, thanks to its broad usage, static typing, and good linting tools)
Transport: Streamable HTTP for remote servers, using stateless JSON (simpler to scale and maintain, as opposed to stateful sessions and streaming responses). stdio for local servers.
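
For orientation, MCP messages are JSON-RPC 2.0, so with stateless JSON each request body is a self-contained message like the one built below. This is only a sketch of the message shape: the helper name and the example tool are hypothetical, and a real client would normally let the SDK construct and send the request.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request that invokes an MCP tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
EXAMPLE_REQUEST = build_tool_call(1, "github_list_repos", {"owner": "octocat"})
```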

Load framework documentation:

📋 MCP Best Practices - Core guidelines

For TypeScript (recommended):

TypeScript SDK: Use WebFetch to load https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md
⚡ TypeScript Guide - TypeScript patterns and examples

For Python:

Python SDK: Use WebFetch to load https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md
🐍 Python Guide - Python patterns and examples

1.4 Plan Your Implementation

Understand the API: Review the service's API documentation to identify key endpoints, authentication requirements, and data models. Use web search and WebFetch as needed.

Tool Selection: Prioritize comprehensive API coverage. List endpoints to implement, starting with the most common operations.

Phase 2: Implementation

2.1 Set Up Project Structure

See language-specific guides for project setup:

⚡ TypeScript Guide - Project structure, package.json, tsconfig.json
🐍 Python Guide - Module organization, dependencies

2.2 Implement Core Infrastructure

Create shared utilities:

API client with authentication
Error handling helpers
Response formatting (JSON/Markdown)
Pagination support

2.3 Implement Tools

For each tool:

Input Schema:

Use Zod (TypeScript) or Pydantic (Python)
Include constraints and clear descriptions
Add examples in field descriptions
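
Zod and Pydantic both end up describing a JSON Schema for the tool's input. To stay dependency-free, the sketch below writes such a schema by hand, with constraints, descriptions, and examples in the descriptions, and pairs it with a deliberately tiny validator. The tool, its fields, and `check_arguments` are hypothetical; in a real server the schema library would do the validation.

```python
from typing import Any

# Hypothetical input schema for a "github_list_repos" tool.
LIST_REPOS_INPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "owner": {
            "type": "string",
            "minLength": 1,
            "description": "Account that owns the repositories, e.g. 'octocat'.",
        },
        "limit": {
            "type": "integer",
            "minimum": 1,
            "maximum": 100,
            "default": 20,
            "description": "Maximum number of repositories to return, e.g. 20.",
        },
    },
    "required": ["owner"],
    "additionalProperties": False,
}


def check_arguments(args: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Illustrative validator: returns a list of problems, empty when valid."""
    problems = []
    props = schema["properties"]
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required field '{name}'")
    for name, value in args.items():
        if name not in props:
            problems.append(f"unknown field '{name}'")
            continue
        spec = props[name]
        if spec["type"] == "integer":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                problems.append(f"'{name}' must be an integer")
            else:
                low, high = spec.get("minimum"), spec.get("maximum")
                if (low is not None and value < low) or (high is not None and value > high):
                    problems.append(f"'{name}' must be between {low} and {high}")
        elif spec["type"] == "string":
            if not isinstance(value, str) or len(value) < spec.get("minLength", 0):
                problems.append(f"'{name}' must be a non-empty string")
    return problems
```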

Output Schema:

Define outputSchema where possible for structured data
Use structuredContent in tool responses (TypeScript SDK feature)
Helps clients understand and process tool outputs
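
As a rough illustration of how an output schema and structuredContent fit together, the sketch below builds a tool result as a plain dictionary. The schema, field names, and helper are hypothetical; the SDKs provide their own types for this, so treat it as a picture of the result shape rather than SDK usage.

```python
import json
from typing import Any

# Hypothetical output schema for a repository-listing tool.
LIST_REPOS_OUTPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "total": {"type": "integer"},
        "repos": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["total", "repos"],
}


def build_tool_result(data: dict[str, Any]) -> dict[str, Any]:
    """Return a tool result carrying both a text block and structured data."""
    return {
        # Text block for clients that only read unstructured content.
        "content": [{"type": "text", "text": json.dumps(data)}],
        # The same payload as a JSON object, expected to match the output schema.
        "structuredContent": data,
    }
```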

Tool Description:

Concise summary of functionality
Parameter descriptions
Return type schema

Implementation:

Async/await for I/O operations
Proper error handling with actionable messages
Support pagination where applicable
Return both text content and structured data when using modern SDKs
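
The implementation points above could come together roughly as follows. `fetch_issue` is a stand-in for a real API client, and the handler returns plain dictionaries instead of SDK types; both are assumptions made to keep the sketch self-contained.

```python
import asyncio
from typing import Any


async def fetch_issue(issue_id: int) -> dict[str, Any]:
    """Stand-in for a real HTTP call; a real server would await an API client here."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    if issue_id <= 0:
        raise LookupError(f"issue {issue_id} does not exist")
    return {"id": issue_id, "title": f"Issue {issue_id}"}


async def get_issue_tool(issue_id: int) -> dict[str, Any]:
    """Tool handler: awaits I/O and turns failures into an actionable error result."""
    try:
        issue = await fetch_issue(issue_id)
    except LookupError as exc:
        message = f"Error: {exc}. Call the list tool first to find valid issue ids."
        return {"content": [{"type": "text", "text": message}], "isError": True}
    text = f"#{issue['id']}: {issue['title']}"
    # Return both a human-readable text block and the structured payload.
    return {"content": [{"type": "text", "text": text}], "structuredContent": issue}
```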

Annotations:

readOnlyHint: true/false
destructiveHint: true/false
idempotentHint: true/false
openWorldHint: true/false
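
For example, a read-only listing tool and a destructive deletion tool might be annotated as below. The tool names and the chosen values are hypothetical; only the four hint names come from the list above. Annotations are hints, so the filter at the end is a convenience, not a security control.

```python
# Hypothetical tool definitions with the four annotation hints.
TOOLS = [
    {
        "name": "github_list_repos",
        "annotations": {
            "readOnlyHint": True,      # does not modify anything
            "destructiveHint": False,
            "idempotentHint": True,    # repeating the call has no extra effect
            "openWorldHint": True,     # talks to an external service
        },
    },
    {
        "name": "github_delete_repo",
        "annotations": {
            "readOnlyHint": False,
            "destructiveHint": True,   # removes data
            "idempotentHint": True,
            "openWorldHint": True,
        },
    },
]


def is_declared_read_only(tool: dict) -> bool:
    """True only when the tool explicitly declares readOnlyHint as true."""
    return tool.get("annotations", {}).get("readOnlyHint", False) is True


READ_ONLY_TOOL_NAMES = [t["name"] for t in TOOLS if is_declared_read_only(t)]
```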

Phase 3: Review and Test

3.1 Code Quality

Review for:

No duplicated code (DRY principle)
Consistent error handling
Full type coverage
Clear tool descriptions

3.2 Build and Test

TypeScript:

Run npm run build to verify compilation
Test with MCP Inspector: npx @modelcontextprotocol/inspector

Python:

Verify syntax: python -m py_compile your_server.py
Test with MCP Inspector
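
The same syntax check can also be run from Python, for instance inside a test suite. The sketch below uses the standard library's `py_compile` module; `compiles_cleanly` is a hypothetical helper name.

```python
import py_compile


def compiles_cleanly(path: str) -> bool:
    """Return True if the file at path is syntactically valid Python."""
    try:
        # doraise=True raises PyCompileError instead of printing to stderr.
        py_compile.compile(path, doraise=True)
    except py_compile.PyCompileError:
        return False
    return True
```

This only verifies that the file parses; it does not import the module or exercise any tool.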

See language-specific guides for detailed testing approaches and quality checklists.

Phase 4: Create Evaluations

After implementing your MCP server, create comprehensive evaluations to test its effectiveness.

Load ✅ Evaluation Guide for complete evaluation guidelines.

4.1 Understand Evaluation Purpose

Use evaluations to test whether LLMs can effectively use your MCP server to answer realistic, complex questions.

4.2 Create 10 Evaluation Questions

To create effective evaluations, follow the process outlined in the evaluation guide:

Tool Inspection: List available tools and understand their capabilities
Content Exploration: Use READ-ONLY operations to explore available data
Question Generation: Create 10 complex, realistic questions
Answer Verification: Solve each question yourself to verify answers

4.3 Evaluation Requirements

Ensure each question is:

Independent: Not dependent on other questions
Read-only: Only non-destructive operations required
Complex: Requiring multiple tool calls and deep exploration
Realistic: Based on real use cases humans would care about
Verifiable: Single, clear answer that can be verified by string comparison
Stable: Answer won't change over time
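
The "verifiable by string comparison" requirement can be pictured with a small checker. Trimming whitespace and ignoring case are assumptions of this sketch; the provided evaluation scripts may compare answers differently.

```python
def answers_match(expected: str, actual: str) -> bool:
    """Compare two answers after trimming whitespace and ignoring case."""
    return expected.strip().casefold() == actual.strip().casefold()
```

A question whose answer could reasonably be phrased several ways ("3" versus "three") would fail this kind of check, which is why each question should have a single, clear answer.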

4.4 Output Format

Create an XML file with this structure:

<evaluation>
  <qa_pair>
    <question>Find discussions about AI model launches with animal codenames. One model needed a specific safety designation that uses the format ASL-X. What number X was being determined for the model named after a spotted wild cat?</question>
    <answer>3</answer>
  </qa_pair>
  <!-- More qa_pairs... -->
</evaluation>
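
A file with this structure can be read back using the standard library, which is a quick way to confirm it is well-formed before running an evaluation. The sample question and the `load_qa_pairs` helper are hypothetical and are not taken from the provided scripts.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the same structure as the evaluation file.
SAMPLE = """<evaluation>
  <qa_pair>
    <question>Which tool lists repositories?</question>
    <answer>github_list_repos</answer>
  </qa_pair>
</evaluation>"""


def load_qa_pairs(xml_text: str) -> list[tuple[str, str]]:
    """Parse evaluation XML into a list of (question, answer) tuples."""
    root = ET.fromstring(xml_text)
    pairs = []
    for pair in root.findall("qa_pair"):
        question = (pair.findtext("question") or "").strip()
        answer = (pair.findtext("answer") or "").strip()
        pairs.append((question, answer))
    return pairs
```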

Reference Files

📚 Documentation Library

Load these resources as needed during development:

Core MCP Documentation (Load First)

MCP Protocol: Start with sitemap at https://modelcontextprotocol.io/sitemap.xml, then fetch specific pages with .md suffix
📋 MCP Best Practices - Universal MCP guidelines including:
- Server and tool naming conventions
- Response format guidelines (JSON vs Markdown)
- Pagination best practices
- Transport selection (streamable HTTP vs stdio)
- Security and error handling standards

SDK Documentation (Load During Phase 1/2)

Python SDK: Fetch from https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md
TypeScript SDK: Fetch from https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md

Language-Specific Implementation Guides (Load During Phase 2)

🐍 Python Implementation Guide - Complete Python/FastMCP guide with:
- Server initialization patterns
- Pydantic model examples
- Tool registration with @mcp.tool
- Complete working examples
- Quality checklist
⚡ TypeScript Implementation Guide - Complete TypeScript guide with:
- Project structure
- Zod schema patterns
- Tool registration with server.registerTool
- Complete working examples
- Quality checklist

Evaluation Guide (Load During Phase 4)

✅ Evaluation Guide - Complete evaluation creation guide with:
- Question creation guidelines
- Answer verification strategies
- XML format specifications
- Example questions and answers
- Running an evaluation with the provided scripts

Usage Guidance

This package is primarily a well-documented MCP server guide with an evaluation harness, but there are a few red flags to check before installing or running anything:

1) Name/metadata mismatch: the registry name ('hxl-code-reviewer') does not match SKILL.md ('mcp-builder'); confirm you got the right skill.
2) Hidden runtime requirements: scripts reference the Anthropic client and a connections module. You will need an Anthropic API key (or equivalent) and connection credentials to run the evaluation; these are not declared in the skill metadata.
3) Data exfiltration risk: the evaluation harness is designed to send tool inputs and outputs to the Anthropic model (the EVALUATION_PROMPT explicitly asks for tool inputs/outputs). Do not run the harness against production or sensitive data.
4) Inspect scripts/connections.py and search the code for any endpoints or credentials before running.
5) If you want to use this safely: run in an isolated environment, use test data and test API keys with minimal permissions, and consider removing or modifying the parts that forward tool results to the Anthropic API.

If you can provide the full contents of scripts/connections.py (and any omitted files), I can re-evaluate and raise or lower my confidence accordingly.

Capability Analysis

Type: OpenClaw Skill
Name: hxl-code-reviewer
Version: 1.0.1

The skill bundle is a comprehensive development and evaluation toolkit for Model Context Protocol (MCP) servers. It includes Python scripts (connections.py, evaluation.py) designed to connect to MCP servers via stdio, SSE, or HTTP and perform automated testing using the Anthropic API. While the scripts include the capability to execute local commands via the stdio transport, this is a standard and necessary feature of the protocol for launching local server processes. The instructions in SKILL.md and the reference documentation are well-structured, educational, and align perfectly with the stated purpose of helping an agent build and test MCP integrations without any evidence of malicious intent or data exfiltration.

Capability Assessment

ℹ Purpose & Capability

The SKILL.md and reference docs clearly describe an MCP server development and evaluation guide, and included scripts implement an evaluation harness — this aligns with the stated purpose. However, the registry metadata shows the skill name 'hxl-code-reviewer' while SKILL.md declares name 'mcp-builder' and .openskills.json points to anthropics/skills subpath 'mcp-builder'. The naming/metadata mismatch is unexpected and should be clarified.

⚠ Instruction Scope

The instructions focus on MCP server design and evaluation. But the included evaluation harness (scripts/evaluation.py) is written to call the Anthropic API and to drive an LLM-run evaluation loop that will: 1) instruct the model to enumerate tools used and inputs/outputs, and 2) send tool inputs and tool results to the Anthropic service. That means running the provided scripts will transmit tool call content (which may include sensitive data) to an external LLM provider. SKILL.md does not declare or justify this telemetry/external-model use explicitly to the user, creating scope/privilege surprise.

ℹ Install Mechanism

There is no install spec (instruction-only), which reduces automated install risk. However, the repository contains Python scripts and a scripts/requirements.txt listing 'anthropic' and 'mcp', so running the harness will require installing third-party packages. No install automation is provided; the user must manually install dependencies. This is not high-risk by itself but it's important to recognize the code will run locally and contact external services if executed.

⚠ Credentials

The skill declares no required environment variables, yet scripts/evaluation.py uses the Anthropic client (Anthropic()) which typically requires ANTHROPIC_API_KEY or similar credentials, and the harness expects a connections.create_connection implementation (likely requiring endpoints/credentials). These runtime credentials are not declared in the metadata, so the manifest understates the credential/access needs. That mismatch can surprise users into running code that needs secrets or transmits data to third parties.

✓ Persistence & Privilege

The package does not request persistent presence (always:false) and does not declare changes to other skills or system-wide settings. Autonomous model invocation remains allowed (platform default), which increases blast radius if the skill is used automatically — but there is no always:true or other elevated privilege requested by the skill itself.

Version History

v1.0.1

Major change: Initial release replacing the former code-reviewer skill with an MCP server builder guide and evaluation toolkit.

- Added comprehensive guides for designing and building high-quality MCP servers in TypeScript and Python.
- Included best practices, tool design conventions, error handling, and documentation links for MCP protocol and SDKs.
- Provided workflow for planning, implementing, reviewing, and testing MCP servers.
- Added guidance and templates for authoring evaluation suites to ensure server quality.
- Introduced reference materials and scripts for example evaluations and development support.

v1.0.0

Initial release of the Code Reviewer skill.

- Supports reviewing both local code changes and remote Pull Requests.
- Follows a structured workflow with steps for preparation, in-depth code analysis, and feedback.
- Review focuses on correctness, maintainability, readability, efficiency, security, edge case handling, and testability.
- Provides professional, clear feedback with actionable recommendations.
- Includes optional cleanup step to restore local branch after remote PR review.

Metadata

Slug hxl-code-reviewer

Version 1.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is hxl-code-reviewer?

Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use whe... It is an AI Agent Skill for Claude Code / OpenClaw, with 247 downloads so far.

How do I install hxl-code-reviewer?

Run "/install hxl-code-reviewer" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is hxl-code-reviewer free?

Yes, hxl-code-reviewer is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does hxl-code-reviewer support?

hxl-code-reviewer is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created hxl-code-reviewer?

It is built and maintained by aabbcc456aa (@aabbcc456aa); the current version is v1.0.1.
