Chapter 13
Hermes System Architecture Overview
A well-designed architecture ensures that each module knows its boundaries and that data flows are immediately clear. Hermes's system architecture exemplifies this: each layer has clear responsibilities, and every interface has an explicit contract.
13.1 Architecture Design Philosophy
13.1.1 Four Core Design Principles
Hermes's system architecture follows four core principles:
- Layered Isolation: The core engine doesn't depend on specific tool implementations; tools don't depend on specific platforms
- Unidirectional Data Flow: Requests flow from user to engine, responses flow from engine to user; there are no circular dependencies
- Memory Persistence: Cross-session knowledge accumulation is the system's core value
- Open Extension: Adding new tools and platforms doesn't require modifying the core engine
13.1.2 Overall System Architecture
+----------------------------------------------------------------+
|                      Hermes Agent System                       |
|                                                                |
|  +----------------------------------------------------------+  |
|  |                  Platform Adapter Layer                  |  |
|  |  +----------+  +----------+  +----------+  +----------+  |  |
|  |  |   CLI    |  | REST API |  |  Web UI  |  |   SDK    |  |  |
|  |  +----+-----+  +----+-----+  +----+-----+  +----+-----+  |  |
|  +-------+-------------+-------------+-------------+--------+  |
|                            | Unified message format            |
|                            v                                   |
|  +----------------------------------------------------------+  |
|  |                    Core Engine Layer                     |  |
|  |  +--------------+  +----------+  +--------------------+  |  |
|  |  | Conversation |  | Planner  |  | Context Compressor |  |  |
|  |  | Manager      |  |          |  |                    |  |  |
|  |  +------+-------+  +----+-----+  +---------+----------+  |  |
|  |         +---------------+------------------+             |  |
|  |                         |                                |  |
|  |                +--------+--------+                       |  |
|  |                | Model Interface |                       |  |
|  |                +--------+--------+                       |  |
|  +-------------------------+--------------------------------+  |
|                            |                                   |
|          +-----------------+-----------------------+           |
|          v                 v                       v           |
|  +----------------+   +----------------+   +----------------+  |
|  | Tool Layer     |   | Memory Layer   |   | Model Backend  |  |
|  | 40+ tools      |   | 3-tier memory  |   | Hermes/GPT/    |  |
|  | + plugins      |   |                |   | Claude         |  |
|  +----------------+   +----------------+   +----------------+  |
|                                                                |
|  +----------------------------------------------------------+  |
|  |                    Persistence Layer                     |  |
|  |  +-------------+    +-------------+    +-------------+   |  |
|  |  | SQLite      |    | Filesystem  |    | VectorDB    |   |  |
|  |  | (sessions)  |    | (MEMORY.md) |    | (semantic)  |   |  |
|  |  +-------------+    +-------------+    +-------------+   |  |
|  +----------------------------------------------------------+  |
+----------------------------------------------------------------+
13.2 Core Engine Layer Detail
13.2.1 Conversation Manager
The conversation manager is the system's "state machine," responsible for maintaining the entire session lifecycle:
class ConversationManager:
    async def process_message(self, session_id: str, user_message: str) -> str:
        session = self.get_or_create_session(session_id)
        # Inject persistent memory (MEMORY.md + Skill library)
        context = await self.memory_manager.inject_context(session)
        session.add_message(role="user", content=user_message)
        # Compression check
        if session.token_count > self.config.compression_threshold:
            await self.compressor.compress(session)
        # Run Agent loop
        response = await self.run_agent_loop(session, context)
        # Extract and save new skills
        await self.memory_manager.extract_skills(session, response)
        return response

    async def run_agent_loop(self, session: Session, context: str) -> str:
        for step in range(self.config.max_steps):
            response = await self.model_interface.generate(
                messages=session.get_context_window(),
                tools=self.tool_registry.get_schemas(),
                system=context
            )
            if response.type == "final_response":
                return response.content
            elif response.type == "tool_call":
                tool_result = await self.tool_executor.execute(response.tool_call)
                session.add_tool_result(response.tool_call, tool_result)
            elif response.type == "thinking":
                session.add_thought(response.content)
        # Step budget exhausted: fall back to a summary of the session so far
        return await self.model_interface.generate_summary(session)
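The control flow of run_agent_loop can be exercised without a real model. The following is a minimal sketch, not the Hermes implementation: `ModelResponse`, `ScriptedModel`, and `run_loop` are hypothetical names introduced here, the tool result is faked, and the fallback string stands in for `generate_summary`.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ModelResponse:
    type: str            # "final_response", "tool_call", or "thinking"
    content: str = ""


class ScriptedModel:
    """Hypothetical stand-in that replays a fixed list of responses."""

    def __init__(self, script: list[ModelResponse]) -> None:
        self._script = iter(script)

    async def generate(self, messages: list[dict]) -> ModelResponse:
        return next(self._script)


async def run_loop(model: ScriptedModel, messages: list[dict], max_steps: int) -> str:
    for _ in range(max_steps):
        response = await model.generate(messages)
        if response.type == "final_response":
            return response.content
        if response.type == "tool_call":
            # A real system would execute the tool; here the result is faked.
            messages.append({"role": "tool", "content": f"result of {response.content}"})
        elif response.type == "thinking":
            messages.append({"role": "assistant", "content": response.content})
    # Step budget exhausted: fall back instead of looping forever.
    return "step limit reached"


script = [
    ModelResponse("thinking", "need to read the file"),
    ModelResponse("tool_call", "read_csv"),
    ModelResponse("final_response", "done"),
]
history: list[dict] = [{"role": "user", "content": "Analyze data.csv"}]
answer = asyncio.run(run_loop(ScriptedModel(script), history, max_steps=5))
```

The step budget is what keeps a model that never emits a final response from running indefinitely; every other branch only appends to the message history and goes around again.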
13.2.2 Planner
class Planner:
    PLANNING_PROMPT = """
Before executing the task, create a clear plan:
Task: {task}
Output format:
## Task Analysis
[Understand the goal and constraints]
## Execution Plan
1. Step 1: [specific action]
2. Step 2: [specific action]
...
## Potential Risks
[Identify potential issues and alternatives]
## Success Criteria
[How to determine task completion]
"""

    def should_replan(self, execution_state: ExecutionState) -> bool:
        # Replan when more than 30% of executed steps have failed
        if execution_state.failure_rate > 0.3:
            return True
        # Replan when execution surfaced something the plan did not anticipate
        if execution_state.unexpected_discovery:
            return True
        return False
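The chapter does not define `ExecutionState`. As a rough sketch of what should_replan needs from it, the version below assumes the failure rate is simply failed steps divided by executed steps; the field names `steps_run` and `steps_failed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExecutionState:
    """Hypothetical state record; the original does not define its fields."""
    steps_run: int = 0
    steps_failed: int = 0
    unexpected_discovery: bool = False

    @property
    def failure_rate(self) -> float:
        # Avoid division by zero before any step has run.
        return self.steps_failed / self.steps_run if self.steps_run else 0.0


def should_replan(state: ExecutionState, max_failure_rate: float = 0.3) -> bool:
    """Same decision rule as Planner.should_replan, with the limit as a parameter."""
    return state.failure_rate > max_failure_rate or state.unexpected_discovery
```

Because the comparison is strict, a run with exactly 3 failures in 10 steps does not trigger replanning, while 4 in 10 does.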
13.2.3 Context Compressor (Interface)
class ContextCompressor:
    def __init__(self, config: HermesConfig):
        self.sacred_zone_tokens = config.sacred_zone_tokens  # default 20K
        self.target_ratio = config.compression_ratio  # target ~50%

    async def compress(self, session: Session) -> None:
        messages = session.messages
        sacred_start = self._identify_sacred_zone_start(messages)
        # Only tool outputs that precede the sacred zone are compressed
        for i, msg in enumerate(messages):
            if i < sacred_start and msg.role == "tool":
                messages[i].content = self._compress_tool_output(msg.content)
        session.messages = messages
        session.update_token_count()
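The two helpers that compress relies on are not shown in the chapter. The sketch below is one plausible reading, under loudly stated assumptions: the sacred zone is taken to be the most recent messages that fit within a token budget, tokens are approximated by whitespace-separated words, and tool output is shortened by plain truncation. None of this is confirmed as Hermes's actual strategy.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def identify_sacred_zone_start(messages: list[dict], budget: int) -> int:
    """Return the index of the first message inside the protected recent zone."""
    used = 0
    start = len(messages)
    # Walk backwards from the newest message until the budget is spent.
    for i in range(len(messages) - 1, -1, -1):
        cost = estimate_tokens(messages[i]["content"])
        if used + cost > budget:
            break
        used += cost
        start = i
    return start


def compress_tool_output(content: str, keep_words: int = 5) -> str:
    """Keep the head of a long tool output and note how much was dropped."""
    words = content.split()
    if len(words) <= keep_words:
        return content
    dropped = len(words) - keep_words
    return " ".join(words[:keep_words]) + f" [... {dropped} words omitted]"


messages = [
    {"role": "user", "content": "load the file"},
    {"role": "tool", "content": "row " * 20},
    {"role": "assistant", "content": "file loaded"},
    {"role": "user", "content": "now plot it"},
]
start = identify_sacred_zone_start(messages, budget=5)
for i, msg in enumerate(messages):
    if i < start and msg["role"] == "tool":
        msg["content"] = compress_tool_output(msg["content"])
```

With a budget of 5 words, the last two messages form the sacred zone, so only the older tool output is shortened and the user's original request is left intact.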
13.3 Tool Layer Architecture
13.3.1 Tool Registry
class ToolRegistry:
    def _load_builtin_tools(self):
        """Load 40+ built-in tools organized by category"""
        builtin_categories = {
            "code_execution": [PythonExecTool(), ShellExecTool(), JavaScriptTool()],
            "file_operations": [FileReadTool(), FileWriteTool(), PdfParserTool(), ExcelTool()],
            "web_search": [WebSearchTool(), WebFetchTool(), ApiCallTool()],
            "data_processing": [SqliteTool(), CsvAnalysisTool(), JsonTool()],
            "system_tools": [ProcessManagerTool(), NetworkTool(), GitTool()],
            "ai_tools": [ImageAnalysisTool(), TextEmbeddingTool(), SummarizerTool()],
        }
        for category_tools in builtin_categories.values():
            for tool in category_tools:
                self.register(tool)
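The listing above calls `self.register`, and the agent loop calls `get_schemas`, but neither method is shown. A minimal sketch of how they might fit together follows; `SimpleTool` is a hypothetical stand-in for a real tool class, and rejecting duplicate names is an assumption about desirable behavior rather than documented Hermes behavior.

```python
class SimpleTool:
    """Hypothetical minimal tool carrying only what the registry needs."""

    def __init__(self, name: str, description: str) -> None:
        self.name = name
        self.description = description

    def get_schema(self) -> dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": {"type": "object", "properties": {}},
            },
        }


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, SimpleTool] = {}

    def register(self, tool: SimpleTool) -> None:
        # Reject duplicates so a plugin cannot silently shadow a built-in.
        if tool.name in self._tools:
            raise ValueError(f"tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def get(self, name: str) -> SimpleTool:
        return self._tools[name]

    def get_schemas(self) -> list[dict]:
        # Insertion order is preserved, so schemas appear in registration order.
        return [tool.get_schema() for tool in self._tools.values()]


registry = ToolRegistry()
registry.register(SimpleTool("file_read", "Read a file"))
registry.register(SimpleTool("web_search", "Search the web"))
names = [s["function"]["name"] for s in registry.get_schemas()]
```

Keying the registry by tool name is what lets the engine dispatch a model-issued tool call without knowing which category or plugin the tool came from.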
13.3.2 Tool Base Class Design
import asyncio
from abc import ABC, abstractmethod

class BaseTool(ABC):  # inherit from ABC so @abstractmethod is actually enforced
    name: str
    description: str
    parameters_schema: dict
    timeout: int = 30

    @abstractmethod
    async def execute(self, **kwargs) -> ToolResult:
        pass

    def get_schema(self) -> dict:
        # Function-calling schema passed to the model
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters_schema
            }
        }

    async def safe_execute(self, **kwargs) -> ToolResult:
        # Never raise into the agent loop: report failures as ToolResult
        try:
            return await asyncio.wait_for(self.execute(**kwargs), timeout=self.timeout)
        except asyncio.TimeoutError:
            return ToolResult(success=False, error=f"Tool {self.name} timed out ({self.timeout}s)")
        except Exception as e:
            return ToolResult(success=False, error=f"Tool {self.name} error: {str(e)}")

class PythonExecTool(BaseTool):
    name = "python_exec"
    description = "Execute Python code in a secure sandbox"
    parameters_schema = {
        "type": "object",
        "properties": {
            "code": {"type": "string", "description": "Python code to execute"},
            "timeout": {"type": "integer", "default": 30}
        },
        "required": ["code"]
    }

    async def execute(self, code: str, timeout: int = 30) -> ToolResult:
        # self.sandbox is not defined in this excerpt; it is assumed to be set elsewhere
        result = await self.sandbox.run_python(code, timeout=timeout)
        return ToolResult(
            success=result.exit_code == 0,
            output=result.stdout,
            error=result.stderr if result.exit_code != 0 else None
        )
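To see what safe_execute buys the agent loop, the sketch below runs three stand-in tools through the same `asyncio.wait_for` pattern. The `ToolResult` dataclass is an assumed shape inferred from how the chapter constructs it, and `run_with_timeout` and the three toy tools are hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolResult:
    """Assumed shape, inferred from how the chapter constructs ToolResult."""
    success: bool
    output: str = ""
    error: Optional[str] = None


async def run_with_timeout(name: str, coro, timeout: float) -> ToolResult:
    """Mirror of safe_execute: convert timeouts and exceptions into ToolResult."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        return ToolResult(success=False, error=f"Tool {name} timed out ({timeout}s)")
    except Exception as exc:
        return ToolResult(success=False, error=f"Tool {name} error: {exc}")


async def slow_tool() -> ToolResult:
    await asyncio.sleep(1.0)  # longer than the timeout used below
    return ToolResult(success=True, output="finished")


async def failing_tool() -> ToolResult:
    raise ValueError("bad input")


async def fast_tool() -> ToolResult:
    return ToolResult(success=True, output="ok")


async def main() -> list[ToolResult]:
    return [
        await run_with_timeout("slow", slow_tool(), timeout=0.05),
        await run_with_timeout("failing", failing_tool(), timeout=0.05),
        await run_with_timeout("fast", fast_tool(), timeout=0.05),
    ]


timed_out, failed, ok = asyncio.run(main())
```

In all three cases the caller receives a `ToolResult` rather than an exception, so the loop can feed the error text back to the model and let it choose a different approach.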
13.3.3 MCP Tool Integration
class MCPToolAdapter(BaseTool):
    """Adapts MCP protocol tools to the Hermes tool interface"""

    def __init__(self, mcp_server_url: str, tool_name: str):
        self.mcp_client = MCPClient(mcp_server_url)
        self.tool_name = tool_name  # name on the MCP server, used for lookup and calls
        self.name = f"mcp_{tool_name}"  # prefixed name exposed to the model

    async def initialize(self):
        """Fetch tool description from MCP server"""
        tools = await self.mcp_client.list_tools()
        tool = next((t for t in tools if t.name == self.tool_name), None)
        if tool is None:
            raise ValueError(f"MCP server does not expose tool {self.tool_name!r}")
        # Store under the attribute that BaseTool.get_schema() reads
        self.parameters_schema = tool.input_schema
        self.description = tool.description

    async def execute(self, **kwargs) -> ToolResult:
        response = await self.mcp_client.call_tool(
            name=self.tool_name, arguments=kwargs
        )
        return ToolResult(
            success=not response.is_error,
            output=response.content[0].text if response.content else "",
            error=str(response.content) if response.is_error else None
        )
13.4 Memory Layer Architecture
class MemoryManager:
    def __init__(self, config: HermesConfig):
        # Three tiers: working (in-context), episodic (SQLite), semantic (vector DB)
        self.working_memory = WorkingMemory(max_tokens=config.context_window_size)
        self.episodic_memory = EpisodicMemory(storage=SQLiteStorage(config.db_path))
        self.semantic_memory = SemanticMemory(
            storage=VectorDB(config.vector_db_path),
            embedding_model=config.embedding_model
        )

    async def inject_context(self, session: Session) -> str:
        relevant_skills = await self.semantic_memory.search(
            query=session.current_task, top_k=5, threshold=0.75
        )
        relevant_episodes = await self.episodic_memory.search(
            query=session.current_task, top_k=3
        )
        context_parts = []
        if relevant_skills:
            context_parts.append(self._format_skills(relevant_skills))
        if relevant_episodes:
            context_parts.append(self._format_episodes(relevant_episodes))
        return "\n\n".join(context_parts)
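The `top_k` and `threshold` arguments to the semantic search interact: the threshold filters first, and `top_k` caps what remains. The sketch below illustrates this with cosine similarity over toy two-dimensional vectors. It assumes cosine similarity as the scoring function, which the chapter does not state, and the function names and skill vectors are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, store, top_k=5, threshold=0.75):
    """Return up to top_k (name, score) pairs whose score meets the threshold."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in store.items()]
    kept = [item for item in scored if item[1] >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


# Toy 2-D "embeddings"; real embeddings would come from an embedding model.
skills = {
    "csv_trend_analysis": [1.0, 0.0],
    "plotting": [0.8, 0.6],
    "git_workflow": [0.0, 1.0],
}
hits = search([1.0, 0.0], skills, top_k=5, threshold=0.75)
```

Here only two of the three skills clear the threshold, so fewer than `top_k` results come back. An unrelated skill is dropped rather than injected into the prompt just to fill the quota.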
13.5 Platform Adapter Layer
13.5.1 Multi-Platform Support Architecture
import asyncio
from abc import ABC, abstractmethod

class PlatformAdapter(ABC):  # inherit from ABC so @abstractmethod is actually enforced
    @abstractmethod
    async def receive_input(self) -> UserInput:
        pass

    @abstractmethod
    async def send_output(self, response: AgentResponse) -> None:
        pass

class CLIAdapter(PlatformAdapter):
    async def receive_input(self) -> UserInput:
        # input() blocks, so run it in a worker thread to keep the event loop free
        text = await asyncio.to_thread(input, "\nYou: ")
        return UserInput(text=text.strip())

    async def send_output(self, response: AgentResponse) -> None:
        print("\nHermes: ", end="", flush=True)
        # Stream tokens to the terminal as they arrive
        async for token in response.token_stream:
            print(token, end="", flush=True)
        print()

class RestAPIAdapter(PlatformAdapter):
    def _register_routes(self):
        @self.app.post("/v1/chat/completions")
        async def chat_completions(request: ChatRequest):
            """OpenAI API compatible endpoint"""
            response = await self.process(request)
            return ChatResponse(
                id=response.id,
                object="chat.completion",
                choices=[{"message": {"role": "assistant", "content": response.content}}]
            )
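Adding a platform amounts to adding one more subclass of the adapter interface. The sketch below shows this with a hypothetical `QueueAdapter` backed by in-memory queues, which could sit behind a test harness or a chat-bot bridge. The interface is simplified to plain strings, and `serve_once` is a stand-in for the engine, not Hermes code.

```python
import asyncio
from abc import ABC, abstractmethod


class PlatformAdapter(ABC):
    @abstractmethod
    async def receive_input(self) -> str: ...

    @abstractmethod
    async def send_output(self, text: str) -> None: ...


class QueueAdapter(PlatformAdapter):
    """Hypothetical adapter backed by in-memory queues."""

    def __init__(self) -> None:
        self.inbox: asyncio.Queue = asyncio.Queue()
        self.outbox: asyncio.Queue = asyncio.Queue()

    async def receive_input(self) -> str:
        return (await self.inbox.get()).strip()

    async def send_output(self, text: str) -> None:
        await self.outbox.put(text)


async def serve_once(adapter: PlatformAdapter) -> None:
    """Stand-in for the engine: it sees only the PlatformAdapter interface."""
    user_text = await adapter.receive_input()
    await adapter.send_output(f"echo: {user_text}")


async def demo() -> str:
    adapter = QueueAdapter()
    await adapter.inbox.put("  hello  ")
    await serve_once(adapter)
    return await adapter.outbox.get()


reply = asyncio.run(demo())
```

`serve_once` never names a concrete adapter class, which is the property the Layered Isolation principle asks for: the new platform is added without touching engine code.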
13.6 Configuration File Structure
# hermes_config.yaml
model:
  backend: "local"  # local | openai | anthropic
  local:
    model_path: "./models/hermes-4-q4_k_m.gguf"
    n_gpu_layers: 48
    n_ctx: 32768
    temperature: 0.7
tools:
  builtin:
    enabled: true
    categories: ["code_execution", "file_operations", "web_search", "data_processing"]
  sandbox:
    type: "docker"
    image: "hermes-sandbox:latest"
    memory_limit: "1g"
    network: "restricted"
  mcp_servers:
    - name: "github"
      url: "mcp://localhost:3001"
memory:
  working:
    max_tokens: 32768
  episodic:
    storage: "sqlite"
    db_path: "./data/episodes.db"
    retention_days: 90
  semantic:
    storage: "chroma"
    db_path: "./data/skills"
    embedding_model: "nomic-embed-text"
compression:
  enabled: true
  threshold_tokens: 24000
  target_ratio: 0.5
  sacred_zone_tokens: 20000
learning:
  enabled: true
  atropos:
    enabled: false
    judge_model: "gpt-4o"
    training_interval: 100
adapters:
  cli:
    enabled: true
  api:
    enabled: true
    host: "0.0.0.0"
    port: 8080
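Several of the token settings above only make sense relative to one another. The sketch below checks those relationships on a plain dictionary that mirrors the relevant keys. The ordering rules (sacred zone below the compression threshold, threshold below the working-memory size) and the key nesting are assumptions inferred from the values shown, not documented Hermes constraints, and `check_token_budgets` is a hypothetical helper.

```python
def check_token_budgets(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means the budgets are consistent."""
    n_ctx = config["model"]["local"]["n_ctx"]
    working = config["memory"]["working"]["max_tokens"]
    comp = config["compression"]
    problems = []
    if working > n_ctx:
        problems.append("memory.working.max_tokens exceeds model.local.n_ctx")
    if comp["enabled"]:
        if comp["threshold_tokens"] >= working:
            problems.append("compression.threshold_tokens must be below working memory size")
        if comp["sacred_zone_tokens"] >= comp["threshold_tokens"]:
            problems.append("compression.sacred_zone_tokens must be below threshold_tokens")
        if not 0.0 < comp["target_ratio"] < 1.0:
            problems.append("compression.target_ratio must be between 0 and 1")
    return problems


# Values copied from the configuration listing; the nesting is assumed.
config = {
    "model": {"local": {"n_ctx": 32768}},
    "memory": {"working": {"max_tokens": 32768}},
    "compression": {
        "enabled": True,
        "threshold_tokens": 24000,
        "target_ratio": 0.5,
        "sacred_zone_tokens": 20000,
    },
}
problems = check_token_budgets(config)
```

With the listed values, compression triggers at 24,000 tokens, well before the 32,768-token window fills, and the 20,000-token sacred zone fits under that trigger point.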
13.7 Complete Request Data Flow
User input: "Analyze sales trends in data.csv"
  |
  v  [Platform Adapter Layer]
     CLI/API receives input
     Formats to standard UserInput
  |
  v  [Core Engine - Conversation Manager]
     Create/retrieve Session
  |
  v  [Memory Layer Injection]
     Retrieve relevant Skills (data analysis)
     Retrieve relevant history (CSV operation experience)
     Compose SystemPrompt
  |
  v  [Core Engine - Planner]
     Generate task plan:
       1. Read CSV file
       2. Explore data structure
       3. Calculate trend metrics
       4. Generate visualization
       5. Output report
  |
  v  [Model Interface Layer]
     Send request to model (with tool definitions)
     Model generates: <think>...</think> + tool call
  |
  v  [Tool Layer]
     Execute python_exec: pd.read_csv('data.csv')
     Returns: DataFrame info (50 rows × 5 columns)
  |
  v  [Loop: (Model Generate -> Tool Execute) × N]
     python_exec -> statistical analysis
     python_exec -> plot (matplotlib)
     file_write -> save chart
  |
  v  [Model Final Response]
     Generate natural language analysis report
  |
  v  [Memory Layer Learning]
     Extract new skill: "CSV sales trend analysis"
     Store in semantic memory
  |
  v  [Platform Adapter Layer]
     Format response
     Return to user
Chapter Summary
- Hermes uses a five-layer architecture: Platform Adapter, Core Engine, Tool, Memory, and Persistence layers
- Core engine contains three sub-components (Conversation Manager, Planner, and Context Compressor) that reach the model through a shared Model Interface
- Tool layer provides unified management of 40+ built-in tools, custom plugins, and MCP protocol tools
- Memory layer implements three-tier memory (working/episodic/semantic) persisted via SQLite, the filesystem (MEMORY.md), and a vector database
- Configuration uses YAML format with environment variable substitution and hot-reload support
- Data flow is unidirectional: user -> engine -> tools -> engine -> memory -> user
Discussion Questions
- Hermes's architecture completely separates the tool layer from the memory layer. What benefits does this design bring? In what scenarios might tighter coupling be necessary?
- The platform adapter layer isolates platform-specific differences through a unified interface. If adding a "Telegram Bot" platform, where would code need to be added?
- Configuration supports hot-reload, but some config changes (like model path) require a restart. How would you distinguish "hot-update" from "cold-update" configuration items in system design?
- After a tool execution timeout, how should the system gracefully recover? In an Agent loop, what recovery strategy should a tool timeout trigger?