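Chapter 13: Overview of the Hermes System Architecture

A well-designed architecture gives every module a clear boundary and makes the data flow easy to follow. Hermes is built along these lines: each layer has a well-defined responsibility, and each interface has an explicit contract.

13.1 Architecture Design Philosophy

13.1.1 Four Core Design Principles

The Hermes system architecture follows four core principles:

Layered Isolation: the core engine does not depend on any concrete tool implementation, and tools do not depend on any specific platform.
Unidirectional Data Flow: requests travel from the user to the engine and responses travel from the engine back to the user, which avoids circular dependencies.
Memory Persistence: knowledge accumulated across sessions is central to the system's value.
Open Extension: new tools and new platforms can be added without modifying the core engine.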
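
The first and last principles can be illustrated with a minimal sketch. The names `Tool`, `Engine`, `EchoTool`, and `UpperTool` below are hypothetical and are not Hermes classes; the point is only that the engine depends on an interface, so a new tool can be registered without touching the engine.

```python
from typing import Dict, Protocol


class Tool(Protocol):
    """Structural interface the engine depends on (hypothetical, not a Hermes class)."""

    name: str

    def run(self, arg: str) -> str: ...


class Engine:
    """Knows only the Tool interface, never a concrete tool class."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def dispatch(self, tool_name: str, arg: str) -> str:
        if tool_name not in self._tools:
            return f"unknown tool: {tool_name}"
        return self._tools[tool_name].run(arg)


class EchoTool:
    name = "echo"

    def run(self, arg: str) -> str:
        return arg


class UpperTool:
    name = "upper"

    def run(self, arg: str) -> str:
        return arg.upper()


engine = Engine()
engine.register(EchoTool())
engine.register(UpperTool())  # added later; the Engine class is unchanged
print(engine.dispatch("upper", "hermes"))
```

Because `Engine` never imports a concrete tool, adding `UpperTool` is a pure addition, which is the property that Open Extension asks for.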

13.1.2 Overall System Architecture Diagram

┌─────────────────────────────────────────────────────────────────┐
│                    Hermes Agent System                          │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                 Platform Adapter Layer                  │    │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐ │    │
│  │  │ CLI      │  │ REST API │  │ Web UI   │  │ SDK      │ │    │
│  │  │ Interface│  │ Endpoint │  │ (Open    │  │ (Python) │ │    │
│  │  │          │  │          │  │  WebUI)  │  │          │ │    │
│  │  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘ │    │
│  └───────┼─────────────┼─────────────┼─────────────┼───────┘    │
│          │             │             │             │            │
│          └─────────────┴──────┬──────┴─────────────┘            │
│                               │ unified message format          │
│                               ↓                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                    Core Engine Layer                    │    │
│  │                                                         │    │
│  │  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐  │    │
│  │  │Conversation │    │  Planner    │    │  Context    │  │    │
│  │  │  Manager    │    │             │    │ Compressor  │  │    │
│  │  └──────┬──────┘    └──────┬──────┘    └──────┬──────┘  │    │
│  │         └──────────────────┴──────────────────┘         │    │
│  │                            │                            │    │
│  │                   ┌────────┴────────┐                   │    │
│  │                   │ Model Interface │                   │    │
│  │                   └────────┬────────┘                   │    │
│  └────────────────────────────┼────────────────────────────┘    │
│                               │                                 │
│          ┌────────────────────┼────────────────────┐            │
│          │                    │                    │            │
│          ↓                    ↓                    ↓            │
│  ┌───────────────┐ ┌─────────────────────┐ ┌───────────────┐    │
│  │  Tool Layer   │ │    Memory Layer     │ │ Model Backend │    │
│  │               │ │                     │ │               │    │
│  │ ┌───────────┐ │ │ ┌─────────────────┐ │ │ ┌───────────┐ │    │
│  │ │ Built-in  │ │ │ │ Working memory  │ │ │ │ Hermes 4  │ │    │
│  │ │ (40+)     │ │ │ │(context window) │ │ │ │ (local)   │ │    │
│  │ ├───────────┤ │ │ ├─────────────────┤ │ │ ├───────────┤ │    │
│  │ │ Custom    │ │ │ │ Episodic memory │ │ │ │ GPT-4o    │ │    │
│  │ │ (plugins) │ │ │ │(session history)│ │ │ │ (API)     │ │    │
│  │ ├───────────┤ │ │ ├─────────────────┤ │ │ ├───────────┤ │    │
│  │ │ MCP tools │ │ │ │ Semantic memory │ │ │ │ Claude    │ │    │
│  │ │ (protocol)│ │ │ │ (Skill library) │ │ │ │ (API)     │ │    │
│  │ └───────────┘ │ │ └─────────────────┘ │ │ └───────────┘ │    │
│  └───────────────┘ └─────────────────────┘ └───────────────┘    │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                    Persistence Layer                    │    │
│  │ ┌───────────────┐  ┌───────────────┐  ┌───────────────┐ │    │
│  │ │    SQLite     │  │  File system  │  │   Vector DB   │ │    │
│  │ │sessions/skills│  │   MEMORY.md   │  │semantic search│ │    │
│  │ └───────────────┘  └───────────────┘  └───────────────┘ │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘

13.2 The Core Engine Layer in Detail

13.2.1 Conversation Manager

The conversation manager is the system's "state machine". It maintains the life cycle of each session:

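from typing import Dict, List, Optional


class ConversationManager:
    """Manages conversation state, turn control, and message routing."""

    def __init__(self, config: HermesConfig, model_interface, tool_registry, tool_executor):
        self.config = config
        self.sessions: Dict[str, Session] = {}
        self.compressor = ContextCompressor(config)
        self.memory_manager = MemoryManager(config)
        # Collaborators used by run_agent_loop. Passing them in through the
        # constructor is an assumption of this listing.
        self.model_interface = model_interface
        self.tool_registry = tool_registry
        self.tool_executor = tool_executor

    def get_or_create_session(self, session_id: str) -> Session:
        """Return the existing session or create one (Session signature assumed)."""
        if session_id not in self.sessions:
            self.sessions[session_id] = Session(session_id)
        return self.sessions[session_id]

    async def process_message(
        self,
        session_id: str,
        user_message: str,
        attachments: Optional[List[Attachment]] = None
    ) -> str:
        """Main entry point for message processing."""

        # 1. Get or create the session
        session = self.get_or_create_session(session_id)

        # 2. Inject persistent memory (MEMORY.md + Skill library)
        context = await self.memory_manager.inject_context(session)

        # 3. Append the user message
        session.add_message(role="user", content=user_message)

        # 4. Compression check: compress when close to the context limit
        if session.token_count > self.config.compression_threshold:
            await self.compressor.compress(session)

        # 5. Run the agent loop
        response = await self.run_agent_loop(session, context)

        # 6. Extract and store new skills
        await self.memory_manager.extract_skills(session, response)

        return response

    async def run_agent_loop(self, session: Session, context: str) -> str:
        """Agent main loop: think, act, observe."""
        for step in range(self.config.max_steps):
            # Ask the model for the next response
            response = await self.model_interface.generate(
                messages=session.get_context_window(),
                tools=self.tool_registry.get_schemas(),
                system=context
            )

            # Dispatch on the response type
            if response.type == "final_response":
                return response.content
            elif response.type == "tool_call":
                # Execute the tool and record its result
                tool_result = await self.tool_executor.execute(response.tool_call)
                session.add_tool_result(response.tool_call, tool_result)
            elif response.type == "thinking":
                # Record the chain of thought (not treated as a tool call)
                session.add_thought(response.content)

        # Step limit exceeded: force a summary as the final answer
        return await self.model_interface.generate_summary(session)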
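
To make the think-act-observe loop concrete, here is a minimal, self-contained sketch that can be run without a real model. `ScriptedModel`, `ModelReply`, the `add` tool, and the `"step limit reached"` fallback are all stand-ins invented for this example; they are not part of Hermes.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ModelReply:
    type: str                                   # "tool_call" or "final_response"
    content: str = ""
    tool_name: Optional[str] = None
    tool_args: Optional[Dict[str, Any]] = None


class ScriptedModel:
    """Stand-in model that replays a fixed list of replies."""

    def __init__(self, replies: List[ModelReply]) -> None:
        self._replies = list(replies)

    async def generate(self, messages: List[dict]) -> ModelReply:
        return self._replies.pop(0)


async def add(a: int, b: int) -> int:
    return a + b


TOOLS = {"add": add}


async def run_agent_loop(model: ScriptedModel, messages: List[dict], max_steps: int = 5) -> str:
    for _ in range(max_steps):
        reply = await model.generate(messages)
        if reply.type == "final_response":
            return reply.content
        if reply.type == "tool_call":
            result = await TOOLS[reply.tool_name](**reply.tool_args)
            # Feed the observation back so the next step can use it.
            messages.append({"role": "tool", "name": reply.tool_name, "content": str(result)})
    return "step limit reached"


script = [
    ModelReply(type="tool_call", tool_name="add", tool_args={"a": 2, "b": 3}),
    ModelReply(type="final_response", content="2 + 3 = 5"),
]
history = [{"role": "user", "content": "What is 2 + 3?"}]
answer = asyncio.run(run_agent_loop(ScriptedModel(script), history))
print(answer)
```

The loop ends in one of two ways: the model emits a final response, or the step budget runs out. The step budget is what keeps a misbehaving model from looping forever.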

13.2.2 Planner

The planner breaks a complex task down into a sequence of executable subtasks:

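from typing import List


class Planner:
    """
    Task planning and decomposition.
    Uses chain-of-thought (CoT) to produce an explicit plan before acting.
    """

    PLANNING_PROMPT = """
Before starting the task, draw up a clear plan.

Task: {task}
Available tools: {tools}

Output the plan in the following format:
## Task Analysis
[Understand the goal and constraints of the task]

## Execution Plan
1. Step one: [concrete action]
2. Step two: [concrete action]
...

## Possible Risks
[Identify potential problems and fallback options]

## Success Criteria
[How to tell that the task is complete]
"""

    def __init__(self, model):
        # Model client used to generate plans (constructor injection is assumed here)
        self.model = model

    async def create_plan(self, task: str, available_tools: List[str]) -> TaskPlan:
        # The tool list is passed to the prompt so the plan only uses tools that exist
        prompt = self.PLANNING_PROMPT.format(task=task, tools=", ".join(available_tools))
        plan_response = await self.model.generate(prompt, max_tokens=1000)
        return self.parse_plan(plan_response)

    def should_replan(self, execution_state: ExecutionState) -> bool:
        """Decide whether the plan needs to be regenerated."""
        # More than 30% of the steps failed: replan
        if execution_state.failure_rate > 0.3:
            return True
        # Unexpected information turned up: the plan may need adjusting
        if execution_state.unexpected_discovery:
            return True
        return False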
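
The listing calls `parse_plan` without defining it. A minimal sketch of such a parser follows, assuming the model really does emit the requested `## ` headings and numbered steps. The `TaskPlan` dataclass, the `heading` parameter, and the `failure_rate` helper are illustrative assumptions, not the book's definitions. Pass the heading text that the prompt actually uses.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskPlan:
    steps: List[str] = field(default_factory=list)


def parse_plan(text: str, heading: str = "Execution Plan") -> TaskPlan:
    """Collect the numbered items under '## <heading>' until the next '## ' heading."""
    steps: List[str] = []
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            in_section = stripped[3:].strip() == heading
            continue
        if in_section:
            match = re.match(r"(\d+)[.)]\s*(.+)", stripped)
            if match:
                steps.append(match.group(2))
    return TaskPlan(steps=steps)


def failure_rate(results: List[bool]) -> float:
    """Fraction of executed steps that failed (0.0 when nothing has run yet)."""
    return 0.0 if not results else results.count(False) / len(results)


sample = """## Task Analysis
Summarize sales trends.

## Execution Plan
1. Read the CSV file
2. Compute monthly totals
3. Plot the trend

## Possible Risks
1. Missing values
"""
plan = parse_plan(sample)
print(plan.steps)
# Two failures out of four steps is above the 0.3 replanning threshold.
print(failure_rate([True, False, False, True]) > 0.3)
```

Numbered items under other headings, such as the risk list, are ignored because the parser only collects lines while it is inside the requested section.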

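13.2.3 Context Compressor

The context compressor is the key component that lets Hermes handle long-running tasks. Chapter 16 covers it in detail; only the interface is shown here: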

class ContextCompressor:
    def __init__(self, config: HermesConfig):
        self.sacred_zone_tokens = config.sacred_zone_tokens  # default: 20K tokens
        self.target_ratio = config.compression_ratio  # target compression ratio, about 50%

    async def compress(self, session: Session) -> None:
        """Compress the session context in place."""
        messages = session.messages

        # Find where the sacred zone (the protected tail of the conversation) begins
        sacred_start = self._identify_sacred_zone_start(messages)

        # Compress tool outputs that come before the sacred zone
        for msg in messages[:sacred_start]:
            if msg.role == "tool":
                msg.content = self._compress_tool_output(msg.content)

        session.messages = messages
        session.update_token_count()
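
The two private helpers are left to Chapter 16. As a rough illustration of the idea only, the sketch below protects the newest messages up to a token budget and truncates older tool outputs. The word-count tokenizer, the truncation rule, and every name here are simplifying assumptions, not the Hermes implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    role: str
    content: str


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def identify_sacred_zone_start(messages: List[Message], sacred_tokens: int) -> int:
    """Walk back from the newest message until the budget is used up.

    Returns the index of the first protected message.
    """
    used = 0
    start = len(messages)
    for i in range(len(messages) - 1, -1, -1):
        cost = count_tokens(messages[i].content)
        if used + cost > sacred_tokens:
            break
        used += cost
        start = i
    return start


def compress_tool_output(content: str, keep_words: int = 5) -> str:
    """Keep the first few words and note how many were dropped."""
    words = content.split()
    if len(words) <= keep_words:
        return content
    omitted = len(words) - keep_words
    return " ".join(words[:keep_words]) + f" ...[{omitted} words omitted]"


def compress(messages: List[Message], sacred_tokens: int) -> None:
    sacred_start = identify_sacred_zone_start(messages, sacred_tokens)
    for msg in messages[:sacred_start]:
        if msg.role == "tool":
            msg.content = compress_tool_output(msg.content)


history = [
    Message("user", "analyze the file"),
    Message("tool", "row " * 40),
    Message("assistant", "the file has forty rows"),
    Message("tool", "mean 3.2 max 9"),
]
compress(history, sacred_tokens=10)
print(history[1].content)
```

With a budget of 10 tokens, the last two messages are protected, so only the older 40-word tool output is shortened and the recent tool result stays intact.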

13.3 Tool Layer Architecture

13.3.1 Tool Registration and Discovery

from typing import Dict, List


class ToolRegistry:
    """Central registry of tools."""

    def __init__(self):
        self._tools: Dict[str, BaseTool] = {}
        self._load_builtin_tools()

    def _load_builtin_tools(self):
        """Load the 40+ built-in tools (a representative subset is listed here)."""
        builtin_categories = {
            "code_and_compute": [
                PythonExecTool(),      # Python code execution
                ShellExecTool(),       # shell command execution
                JavaScriptTool(),      # JavaScript execution
            ],
            "file_operations": [
                FileReadTool(),        # read files
                FileWriteTool(),       # write files
                FileBrowserTool(),     # browse the file system
                PdfParserTool(),       # parse PDF files
                ExcelTool(),           # work with Excel files
            ],
            "network_and_search": [
                WebSearchTool(),       # web search
                WebFetchTool(),        # fetch web pages
                ApiCallTool(),         # call REST APIs
            ],
            "data_processing": [
                SqliteTool(),          # SQLite database access
                CsvAnalysisTool(),     # CSV analysis
                JsonTool(),            # JSON processing
            ],
            "system_tools": [
                ProcessManagerTool(),  # process management
                NetworkTool(),         # network diagnostics
                GitTool(),             # Git operations
            ],
            "ai_tools": [
                ImageAnalysisTool(),   # image analysis (calls an external API)
                TextEmbeddingTool(),   # text embedding
                SummarizerTool(),      # text summarization
            ]
        }
        for category_tools in builtin_categories.values():
            for tool in category_tools:
                self.register(tool)

    def register(self, tool: BaseTool):
        self._tools[tool.name] = tool

    def get_schemas(self) -> List[dict]:
        """Return the JSON Schema of every registered tool."""
        return [tool.get_schema() for tool in self._tools.values()]

13.3.2 Tool Base Class Design

import asyncio
from abc import ABC, abstractmethod


class BaseTool(ABC):
    """Base class for all tools."""

    name: str
    description: str
    parameters_schema: dict
    timeout: float = 30  # seconds; used by safe_execute (default value assumed)

    @abstractmethod
    async def execute(self, **kwargs) -> ToolResult:
        """Run the tool's logic."""
        pass

    def get_schema(self) -> dict:
        """Return the tool description in JSON Schema form."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters_schema
            }
        }

    async def safe_execute(self, **kwargs) -> ToolResult:
        """Run execute() with a timeout and convert failures into a ToolResult."""
        try:
            result = await asyncio.wait_for(
                self.execute(**kwargs),
                timeout=self.timeout
            )
            return result
        except asyncio.TimeoutError:
            return ToolResult(
                success=False,
                error=f"Tool {self.name} timed out ({self.timeout}s)"
            )
        except Exception as e:
            return ToolResult(
                success=False,
                error=f"Tool {self.name} failed: {str(e)}"
            )


class PythonExecTool(BaseTool):
    """Runs Python code in a sandbox."""

    name = "python_exec"
    description = "Execute Python code in a secure sandbox"
    timeout = 30

    parameters_schema = {
        "type": "object",
        "properties": {
            "code": {
                "type": "string",
                "description": "The Python code to execute"
            },
            "timeout": {
                "type": "integer",
                "description": "Execution timeout in seconds, default 30",
                "default": 30
            }
        },
        "required": ["code"]
    }

    def __init__(self, sandbox=None):
        # Sandbox client; it must be set before execute() is called.
        # How the sandbox is created is outside the scope of this listing.
        self.sandbox = sandbox

    async def execute(self, code: str, timeout: int = 30) -> ToolResult:
        # Run the code inside a Docker container. Note that safe_execute still
        # enforces the class-level timeout, so a larger per-call value has no effect.
        result = await self.sandbox.run_python(code, timeout=timeout)
        return ToolResult(
            success=result.exit_code == 0,
            output=result.stdout,
            error=result.stderr if result.exit_code != 0 else None
        )
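
The behavior of `safe_execute` can be checked in isolation. The sketch below reuses the same `asyncio.wait_for` pattern with a toy `SleepTool` and a locally defined `ToolResult`; both are stand-ins for this example rather than the Hermes classes.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolResult:
    success: bool
    output: str = ""
    error: Optional[str] = None


class SleepTool:
    """Toy tool that sleeps for the requested time (hypothetical)."""

    name = "sleep"
    timeout = 0.05  # seconds

    async def execute(self, seconds: float) -> ToolResult:
        await asyncio.sleep(seconds)
        return ToolResult(success=True, output=f"slept {seconds}s")

    async def safe_execute(self, **kwargs) -> ToolResult:
        try:
            return await asyncio.wait_for(self.execute(**kwargs), timeout=self.timeout)
        except asyncio.TimeoutError:
            return ToolResult(success=False, error=f"tool {self.name} timed out after {self.timeout}s")
        except Exception as exc:
            return ToolResult(success=False, error=f"tool {self.name} failed: {exc}")


async def main():
    tool = SleepTool()
    fast = await tool.safe_execute(seconds=0.0)
    slow = await tool.safe_execute(seconds=1.0)   # exceeds the 0.05 s budget
    bad = await tool.safe_execute(minutes=1)      # wrong keyword: TypeError, caught
    return fast, slow, bad


fast, slow, bad = asyncio.run(main())
print(fast.success, slow.success, bad.success)
```

In all three cases the caller receives a `ToolResult` instead of an exception, so the agent loop can hand the error text back to the model as an ordinary observation.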

13.3.3 MCP (Model Context Protocol) Tool Integration

class MCPToolAdapter(BaseTool):
    """
    Adapts a tool exposed over MCP to the Hermes tool interface,
    so that any MCP-compatible third-party tool can be plugged in.
    """

    def __init__(self, mcp_server_url: str, tool_name: str):
        self.mcp_client = MCPClient(mcp_server_url)
        self.tool_name = tool_name        # name of the tool on the MCP server
        self.name = f"mcp_{tool_name}"    # name exposed to the model
        self.description = ""
        self.parameters_schema = {}       # filled in by initialize()

    async def initialize(self):
        """Fetch the tool description from the MCP server."""
        tools = await self.mcp_client.list_tools()
        tool = next((t for t in tools if t.name == self.tool_name), None)
        if tool is None:
            raise ValueError(f"MCP server has no tool named {self.tool_name!r}")
        self.parameters_schema = tool.input_schema
        self.description = tool.description

    async def execute(self, **kwargs) -> ToolResult:
        response = await self.mcp_client.call_tool(
            name=self.tool_name,
            arguments=kwargs
        )
        return ToolResult(
            success=not response.is_error,
            output=response.content[0].text if response.content else "",
            error=str(response.content) if response.is_error else None
        )

13.4 Memory Layer Architecture

13.4.1 Interface Design for the Three Memory Tiers

from typing import List


class MemoryManager:
    """Memory manager: coordinates the three memory tiers."""

    def __init__(self, config: HermesConfig):
        # Working memory (the current context)
        self.working_memory = WorkingMemory(
            max_tokens=config.context_window_size
        )
        # Episodic memory (past sessions)
        self.episodic_memory = EpisodicMemory(
            storage=SQLiteStorage(config.db_path)
        )
        # Semantic memory (the Skill knowledge base)
        self.semantic_memory = SemanticMemory(
            storage=VectorDB(config.vector_db_path),
            embedding_model=config.embedding_model
        )

    async def inject_context(self, session: Session) -> str:
        """Build the text that injects persistent memory into the current context."""

        # 1. Retrieve relevant Skills from semantic memory
        relevant_skills = await self.semantic_memory.search(
            query=session.current_task,
            top_k=5,
            threshold=0.75
        )

        # 2. Retrieve relevant history from episodic memory
        relevant_episodes = await self.episodic_memory.search(
            query=session.current_task,
            top_k=3
        )

        # 3. Assemble the injected text (_format_episodes is analogous to
        #    _format_skills and is not shown)
        context_parts = []
        if relevant_skills:
            context_parts.append(self._format_skills(relevant_skills))
        if relevant_episodes:
            context_parts.append(self._format_episodes(relevant_episodes))

        return "\n\n".join(context_parts)

    def _format_skills(self, skills: List[Skill]) -> str:
        skills_text = "\n".join([
            f"### Skill: {s.name}\n{s.description}\n```python\n{s.code}\n```"
            for s in skills
        ])
        return f"## Relevant skills (from past experience)\n\n{skills_text}"
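
The `top_k` and `threshold` arguments in the semantic search above suggest a similarity-ranked lookup. A minimal in-memory sketch of that idea follows, using cosine similarity over hand-written vectors. A real deployment would obtain the vectors from an embedding model and store them in a vector database, so the function and data below are illustrative assumptions only.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: List[float],
    index: Dict[str, List[float]],
    top_k: int = 5,
    threshold: float = 0.75,
) -> List[Tuple[str, float]]:
    """Return up to top_k (name, score) pairs whose similarity is at least threshold."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in index.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:top_k]


# Toy 3-dimensional "embeddings" for three stored skills.
skills = {
    "csv_trend_analysis": [1.0, 0.0, 0.0],
    "csv_cleaning": [0.8, 0.6, 0.0],
    "git_bisect": [0.0, 0.0, 1.0],
}
results = search([1.0, 0.0, 0.0], skills, top_k=5, threshold=0.75)
print([name for name, _ in results])
```

The threshold filters out unrelated skills before ranking, so a query with no good match injects nothing rather than the five least-bad candidates.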

13.5 Platform Adapter Layer

13.5.1 Multi-Platform Support Architecture

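import asyncio
from abc import abstractmethod

from fastapi import FastAPI, WebSocket, WebSocketDisconnect


class PlatformAdapter:
    """Base class for platform adapters.

    @abstractmethod is only enforced for classes derived from abc.ABC. It is left
    unenforced here because the request-driven adapters below (REST, WebSocket)
    do their I/O inside route handlers instead of implementing these methods.
    """

    @abstractmethod
    async def receive_input(self) -> UserInput:
        pass

    @abstractmethod
    async def send_output(self, response: AgentResponse) -> None:
        pass

    @abstractmethod
    def format_message(self, message: str) -> str:
        """Platform-specific message formatting."""
        pass


class CLIAdapter(PlatformAdapter):
    """Command-line interface adapter."""

    async def receive_input(self) -> UserInput:
        # input() blocks, so run it in a worker thread to keep the event loop free
        user_input = (await asyncio.to_thread(input, "\nYou: ")).strip()
        return UserInput(text=user_input)

    async def send_output(self, response: AgentResponse) -> None:
        # Stream the output token by token
        print("\nHermes: ", end="", flush=True)
        async for token in response.token_stream:
            print(token, end="", flush=True)
        print()  # final newline

    def format_message(self, message: str) -> str:
        return message  # the CLI needs no special formatting


class RestAPIAdapter(PlatformAdapter):
    """REST API adapter."""

    def __init__(self, app: FastAPI, agent):
        self.app = app
        self.agent = agent  # object exposing an async process() method (assumed)
        self._register_routes()

    def _register_routes(self):
        @self.app.post("/v1/chat/completions")
        async def chat_completions(request: ChatRequest):
            """Compatible with the OpenAI API format."""
            response = await self.agent.process(request)
            return ChatResponse(
                id=response.id,
                object="chat.completion",
                choices=[{
                    "index": 0,
                    "message": {
                        "role": "assistant",
                        "content": response.content
                    },
                    "finish_reason": "stop"
                }]
            )


class OpenWebUIAdapter(PlatformAdapter):
    """Open WebUI adapter (WebSocket)."""

    def __init__(self, agent):
        self.agent = agent  # object exposing an async process() method (assumed)

    async def handle_websocket(self, websocket: WebSocket):
        await websocket.accept()
        try:
            while True:
                data = await websocket.receive_json()
                response = await self.agent.process(data["message"])
                await websocket.send_json({
                    "type": "response",
                    "content": response
                })
        except WebSocketDisconnect:
            # The client closed the connection; end the handler quietly
            pass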

13.6 Configuration File Structure

13.6.1 Main Configuration File

# hermes_config.yaml
# Hermes Agent system configuration

# Model backend
model:
  backend: "local"  # local | openai | anthropic | custom

  # Local model settings
  local:
    model_path: "./models/hermes-4-q4_k_m.gguf"
    n_gpu_layers: 48          # number of layers offloaded to the GPU
    n_ctx: 32768              # context window size
    n_batch: 512              # batch size
    temperature: 0.7
    top_p: 0.9
    repeat_penalty: 1.1

  # API model settings (fallback)
  openai:
    api_key: "${OPENAI_API_KEY}"
    model: "gpt-4o"
    max_tokens: 4096

# Tools
tools:
  builtin:
    enabled: true
    categories:
      - "code_execution"
      - "file_operations"
      - "web_search"
      - "data_processing"
      - "system_tools"

  # Sandbox security settings
  sandbox:
    type: "docker"            # docker | subprocess | none
    image: "hermes-sandbox:latest"
    memory_limit: "1g"
    cpu_limit: "0.5"
    network: "restricted"     # restricted | none | host
    allowed_hosts:
      - "pypi.org"
      - "api.duckduckgo.com"

  # Custom tools
  custom:
    - path: "./tools/company_api.py"
      class: "CompanyAPITool"

  # MCP tool servers
  mcp_servers:
    - name: "github"
      url: "mcp://localhost:3001"
    - name: "notion"
      url: "mcp://localhost:3002"

# Memory system
memory:
  enabled: true

  # Working memory
  working:
    max_tokens: 32768

  # Episodic memory
  episodic:
    storage: "sqlite"
    db_path: "./data/episodes.db"
    max_episodes: 10000
    retention_days: 90

  # Semantic memory (Skill library)
  semantic:
    storage: "chroma"        # chroma | faiss | qdrant
    db_path: "./data/skills"
    embedding_model: "nomic-embed-text"
    max_skills: 5000

# Context compression
compression:
  enabled: true
  threshold_tokens: 24000    # token count that triggers compression
  target_ratio: 0.5          # target compression ratio
  sacred_zone_tokens: 20000  # size of the protected sacred zone
  strategy: "tool_output"    # compress tool outputs first

# Learning loop (self-improvement)
learning:
  enabled: true
  skill_extraction:
    min_conversation_length: 5   # extract skills only after at least 5 turns
    extraction_model: "local"    # use the local model for extraction
  atropos:
    enabled: false               # off by default (needs extra compute)
    judge_model: "gpt-4o"
    training_interval: 100       # train once every 100 conversations

# Platform adapters
adapters:
  cli:
    enabled: true
    history_file: "./data/cli_history.txt"

  api:
    enabled: true
    host: "0.0.0.0"
    port: 8080
    auth:
      enabled: true
      api_keys: ["${HERMES_API_KEY}"]

  openwebui:
    enabled: false
    websocket_port: 8081

# Logging and monitoring
logging:
  level: "INFO"              # DEBUG | INFO | WARNING | ERROR
  file: "./logs/hermes.log"
  max_size_mb: 100
  backup_count: 5

monitoring:
  metrics_enabled: true
  metrics_port: 9090         # Prometheus format

13.6.2 Configuration Hot Reloading

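import logging
import os
import threading
import time

import yaml


class ConfigManager:
    """Configuration manager with hot reloading."""

    def __init__(self, config_path: str):
        self.config_path = config_path
        self.config = self._load()
        self._watch_thread = threading.Thread(
            target=self._watch_changes,
            daemon=True
        )
        self._watch_thread.start()

    def _load(self) -> HermesConfig:
        with open(self.config_path, encoding="utf-8") as f:
            raw = yaml.safe_load(f)
        # Substitute ${VAR} references with environment variables
        raw = self._resolve_env_vars(raw)
        return HermesConfig(**raw)

    def _watch_changes(self):
        """Poll the config file and reload it when its modification time changes."""
        last_mtime = os.path.getmtime(self.config_path)
        while True:
            time.sleep(5)
            try:
                current_mtime = os.path.getmtime(self.config_path)
                if current_mtime != last_mtime:
                    last_mtime = current_mtime
                    self.config = self._load()
                    logging.info("Configuration hot-reloaded")
            except Exception:
                # Keep the previous config and keep the watcher alive if the
                # file is missing, half-written, or invalid
                logging.exception("Failed to reload configuration")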
13.7 Data Flow in Detail

13.7.1 Data Flow of One Complete Request

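User input: "Analyze the sales trend in data.csv"
    │
    ↓ [Platform adapter layer]
    CLI/API receives the input
    Normalizes it into a standard UserInput
    │
    ↓ [Core engine layer: conversation manager]
    Creates or fetches the Session
    │
    ↓ [Memory layer injection]
    Retrieves relevant Skills (data analysis)
    Retrieves relevant history (experience with CSV files)
    Assembles the system prompt
    │
    ↓ [Core engine layer: planner]
    Generates the task plan:
    1. Read the CSV file
    2. Explore the data structure
    3. Compute trend metrics
    4. Generate visualizations
    5. Output the report
    │
    ↓ [Model interface layer]
    Sends the request to the model (including tool definitions)
    Model generates: <think>...</think> + tool call
    │
    ↓ [Tool layer]
    Runs python_exec: pd.read_csv('data.csv')
    Returns: DataFrame info (50 rows × 5 columns)
    │
    ↓ [Core engine layer: compression check]
    Token count < threshold → no compression
    │
    ↓ [Loop: model generation → tool execution × N]
    python_exec → statistical analysis
    python_exec → plotting (matplotlib)
    file_write → save the chart
    │
    ↓ [Final model response]
    Generates the analysis report in natural language
    │
    ↓ [Memory layer learning]
    Extracts a new skill: "CSV sales trend analysis"
    Stores it in semantic memory
    │
    ↓ [Platform adapter layer]
    Formats the response
    Returns it to the user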

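Chapter Summary

Hermes uses a five-layer architecture: platform adapter layer, core engine layer, tool layer, memory layer, and persistence layer, with pluggable model backends behind the model interface.
The core engine consists of three subcomponents: the conversation manager, the planner, and the context compressor.
The tool layer manages 40+ built-in tools, custom plugins, and MCP tools through one uniform interface.
The memory layer implements three tiers (working, episodic, semantic) and persists them through SQLite, the file system (MEMORY.md), and a vector database.
The configuration file is written in YAML and supports environment variable substitution and hot reloading.
The data flow is designed to be one-way: user → engine → tools → engine → memory → user.

Questions for Reflection

In the Hermes architecture the tool layer and the memory layer are completely separate. What benefits does this design bring? In which scenarios might they need tighter coupling?
The platform adapter layer hides the differences between platforms behind a uniform interface. To add a "Telegram Bot" platform, where would code need to be added?
The configuration file supports hot reloading, but changes to some settings (such as the model path) only take effect after a restart. How can the system design distinguish settings that can be hot-updated from those that need a cold update?
How should the system handle a tool timeout gracefully? Within the agent loop, what recovery strategy should a single tool timeout trigger?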
