System Architecture Deep Dive
OpenClaw implements a complete agent architecture. This article explores its core components and design philosophy.
Core Architecture Components
- Gateway - Unified multi-channel access
- Tools + Skills - Define capability boundaries
- Memory - Persistent memory implementation
- Multi-layer protection - Device security
1. Architecture Overview
From an architectural perspective, you can think of OpenClaw as an intelligent platform with five key functional areas:
Gateway (Entry Point)
Manages sessions, routes requests, and handles authentication. Typically runs locally with the control panel bound to loopback by default, supporting remote access through Tailscale and other private networks.
Agent (Brain)
With a dedicated persona, responsible for understanding context and intent, creating step-by-step plans, and deciding which tools or skills to invoke.
Skills (Toolbox)
A collection of plugins/skills (described in Markdown and scripts) that enable the Agent to "open doors, pour coffee, send emails, run scripts".
Channels (Pathways)
Connects to various apps like WhatsApp, Telegram, Discord, Slack, SMS, etc., enabling seamless communication between AI and users.
Nodes (Sensors/Terminals)
Small agents running on user devices (mobile phones, laptops, Raspberry Pi, desktops) that provide local capabilities such as camera, geolocation, or system notifications.
Component Function Comparison
| Component | Function | Technical Details |
|---|---|---|
| Gateway | Central Control Plane | Node.js daemon running locally or on VPS, responsible for session management, authentication, and routing. |
| Pi Agent (Agent) | Reasoning Brain | Handles natural language, creates task plans, and selects appropriate tools. Supports Claude, GPT-4, and Ollama local models. |
| Skills | Execution Capabilities | Modular plugin system that defines functionality through SKILL.md, supporting file operations, browser control, API calls, etc. |
| Channels | Communication Interface | Connects to users' existing instant messaging apps (WhatsApp, Telegram, Discord, Slack, etc.). |
| Nodes | Device Extension | Lightweight agents running on iOS/Android or macOS, allowing AI to access camera, geolocation, or send system notifications. |
This layered design allows OpenClaw to scale quickly through community contributions (skills and MCPs) while flexibly deploying and executing tasks across different devices.
2. Gateway: Central Control Plane
Gateway is the core hub of the system—a long-running daemon responsible for managing all message channels and serving as the WebSocket control plane. OpenClaw supports multi-Agent architecture, where one Gateway can host multiple independent Agents.
Three Core Functions of Gateway
Receive Messages
Collect user commands from various channels
Route Dispatch
Decide which Agent should handle this message
Reply Delivery
Send Agent's response back to the corresponding channel
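The receive → route → reply cycle can be sketched as a minimal router. All type and method names here are illustrative, not OpenClaw's actual API; a real Gateway would also hold WebSocket connections and per-session state.

```typescript
// Minimal sketch of the Gateway's receive → route → reply cycle.
// Names are illustrative, not OpenClaw's real API.

type InboundMessage = { channel: string; sender: string; text: string };
type Agent = { id: string; handle: (text: string) => string };

class Gateway {
  private agents = new Map<string, Agent>();
  // Route table: which agent owns which channel.
  private routes = new Map<string, string>();

  register(agent: Agent, channels: string[]): void {
    this.agents.set(agent.id, agent);
    for (const ch of channels) this.routes.set(ch, agent.id);
  }

  // Receive a message, dispatch it to the owning agent, and return the
  // reply that would be delivered back on the same channel.
  dispatch(msg: InboundMessage): string {
    const agentId = this.routes.get(msg.channel);
    if (!agentId) return `no agent bound to channel ${msg.channel}`;
    return this.agents.get(agentId)!.handle(msg.text);
  }
}
```

The point of the sketch is the routing decision: the Gateway never interprets the message itself, it only decides which Agent sees it.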
Default Configuration
- WebSocket Endpoint: ws://127.0.0.1:18789
- Canvas Server: HTTP Port 18793, Path /__openClaw__/canvas/
- Recommend running a single Gateway per host (exclusive WhatsApp Web session)
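The defaults above could be expressed as a configuration sketch. The field names below are illustrative; consult OpenClaw's actual configuration schema for the real keys.

```json
{
  "gateway": {
    "websocket": { "host": "127.0.0.1", "port": 18789 },
    "canvas": { "port": 18793, "path": "/__openClaw__/canvas/" }
  }
}
```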
3. Agent: Reasoning Engine
Upon receiving messages and tasks, the Agent uses its brain (LLM), hands and feet (Tools), and expertise (Skills) to complete the task as well as possible. This may include accessing the web, running commands, reading and writing files, writing code, or calling on other Nodes' capabilities (such as a camera).
The Core Agent Loop
Question → Think → Plan → Act → Observe → Think → Act → Wait → Check → Correct ... → Complete
The LLM handles "thinking" (deciding what to do), while Tools handle "acting" (executing operations). Execution results are fed back to the LLM as "observations", then the next cycle continues.
💡 This is the fundamental difference between an Agent and a Chatbot: a Chatbot only talks; an Agent acts.
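The loop above can be sketched in a few lines. The LLM and the tool are stubbed out here (a real Agent wires these to a model API and a tool registry), but the control flow is the point: keep asking the model for a step, execute any tool call, feed the result back as an observation, and stop when the model replies without a tool call.

```typescript
// Sketch of the think → act → observe loop. The "LLM" and the tool are
// stubs; only the control flow mirrors the description in the text.

type ToolCall = { tool: string; args: string };
type LlmStep = { reply?: string; toolCall?: ToolCall };

// Stub model: calls a tool until it has an observation, then replies.
function llm(history: string[]): LlmStep {
  const lastObservation = history.find((h) => h.startsWith("observation:"));
  if (!lastObservation) return { toolCall: { tool: "bash", args: "date" } };
  return { reply: `done (${lastObservation})` };
}

const tools: Record<string, (args: string) => string> = {
  bash: (args) => `ran \`${args}\``, // stand-in for real command execution
};

function agentLoop(question: string): string {
  const history = [`user: ${question}`];
  for (;;) {
    const step = llm(history);
    // Terminate when the model answers without requesting a tool.
    if (step.reply !== undefined) return step.reply;
    const result = tools[step.toolCall!.tool](step.toolCall!.args);
    history.push(`observation: ${result}`); // feed the result back
  }
}
```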
Multi-Agent Mode
OpenClaw supports multi-Agent mode, where agents can operate independently or collaboratively. Each Agent has its own workspace with dedicated configuration and memory.
4. Four Core Stages of the Agent
Context Assembly
The Agent needs to tell the LLM "who you are, what you can do, what tools you have, and what the user said". This includes:
- System Prompt: Agent's identity, rules, tool list
- Conversation History: Previous conversation records
- Bootstrap Files: Workspace files like AGENTS.md, SOUL.md, TOOLS.md
OpenClaw concatenates this content into a complete prompt and sends it to the LLM.
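The concatenation step can be sketched as follows. The bootstrap file names come from the article; the exact separators and ordering are an assumption, not OpenClaw's real prompt layout.

```typescript
// Sketch of context assembly: system prompt + bootstrap files +
// conversation history concatenated into one prompt string.

type Turn = { role: "user" | "assistant"; text: string };

function assembleContext(
  systemPrompt: string,
  bootstrapFiles: Record<string, string>,
  history: Turn[],
): string {
  const parts: string[] = [systemPrompt];
  // Workspace files such as AGENTS.md or SOUL.md, each under a header.
  for (const [name, body] of Object.entries(bootstrapFiles)) {
    parts.push(`--- ${name} ---\n${body}`);
  }
  // Previous conversation turns, oldest first.
  for (const turn of history) {
    parts.push(`${turn.role}: ${turn.text}`);
  }
  return parts.join("\n\n");
}
```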
Model Inference
After receiving the prompt, the LLM decides on the next action. It may:
- Reply directly to the user
- Call a tool (Tool Call)
- Request more information
Tool Execution
If the LLM decides to call a tool, the Agent will:
- Parse Tool Call parameters
- Execute the corresponding tool (exec, read, write, browser...)
- Return execution results to the LLM
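These three steps can be sketched as below. The tool names mirror the article's core set, but the JSON call format and the stub implementations are assumptions for illustration.

```typescript
// Sketch of the tool-execution step: parse the model's tool call, run the
// matching tool, and return the result (or an error) as an observation.

type ParsedCall = { name: string; args: Record<string, string> };

const toolbox: Record<string, (args: Record<string, string>) => string> = {
  read: (a) => `<contents of ${a.path}>`, // stubs, not real file I/O
  write: (a) => `wrote ${a.path}`,
  exec: (a) => `exit 0: ${a.command}`,
};

function executeToolCall(raw: string): string {
  let call: ParsedCall;
  try {
    call = JSON.parse(raw); // assume the LLM emits the call as JSON
  } catch {
    return "error: malformed tool call";
  }
  const tool = toolbox[call.name];
  if (!tool) return `error: unknown tool ${call.name}`;
  return tool(call.args); // fed back to the LLM as an observation
}
```

Note that errors are returned as observations rather than thrown: the model gets a chance to correct a bad call on the next loop iteration.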
Reply Dispatch
When the LLM generates the final response, the Agent will:
- Format response content
- Send back through Gateway to the corresponding message channel
- Support streaming output (send while generating)
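The "send while generating" behavior can be sketched with a generator standing in for the model's token stream; `send` stands in for the Gateway's channel delivery. Both are illustrative names, not OpenClaw's API.

```typescript
// Sketch of streaming reply dispatch: chunks go to the channel as they
// are generated, instead of waiting for the full response.

function* generateTokens(text: string): Generator<string> {
  // Stand-in for the model's incremental output.
  for (const word of text.split(" ")) yield word + " ";
}

function streamReply(text: string, send: (chunk: string) => void): string {
  let full = "";
  for (const chunk of generateTokens(text)) {
    send(chunk); // delivered immediately: "send while generating"
    full += chunk;
  }
  return full.trimEnd(); // complete message kept for history
}
```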
Pi Agent Core Features
Agent Loop: Process user messages, execute tool calls, feed results back to LLM, looping until the model generates a response without tool calls
Event-Driven Architecture: Emits lifecycle events during the loop process, supporting reactive UI
Message Queue: Supports two modes (sequential processing or batch processing)
Tool Streaming: Supports chunk streaming and incremental streaming for real-time output
Core Tools (Only 4 Needed):
- bash - Execute shell commands
- read - Read file contents
- write - Write file contents
- edit - Edit text files
The system prompt is also extremely concise, only about 1,000 tokens (including tool definitions), which keeps the coding-agent context easy for the model to follow.
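One way to keep the prompt that small is to render tool definitions as a compact list. The rendering below is a sketch; the descriptions are paraphrased from the article, and the actual format OpenClaw uses may differ.

```typescript
// Sketch: render the four core tools as a compact system-prompt section.

const coreTools = [
  { name: "bash", desc: "execute shell commands" },
  { name: "read", desc: "read file contents" },
  { name: "write", desc: "write file contents" },
  { name: "edit", desc: "edit text files" },
];

function renderToolSection(): string {
  // One line per tool keeps the token cost of definitions minimal.
  return ["Tools:", ...coreTools.map((t) => `- ${t.name}: ${t.desc}`)].join("\n");
}
```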
Summary
OpenClaw's architecture embodies the philosophy of "simple yet powerful": through concise core tools and clear layered architecture, it implements a complete and easily extensible personal AI assistant system.
This design enables OpenClaw to meet the daily needs of individual users while giving developers flexible extensibility, closing the loop from the concept of an "AI Agent" to a working implementation.