Core Concepts · 15 min · Intermediate

System Architecture Deep Dive

OpenClaw implements a complete agent architecture. This section explores its core components and design philosophy.

Core Architecture Components

Gateway - Unified multi-channel access
Tools + Skills - Define capability boundaries
Memory - Persistent memory implementation
Multi-layer protection - Device security

1. Architecture Overview

From an architectural perspective, you can think of OpenClaw as an intelligent platform with five key functional areas:

Gateway (Entry Point)

Manages sessions, routes requests, and handles authentication. It typically runs locally, with the control panel bound to the loopback interface by default; remote access is supported through Tailscale and other private networks.

Agent (Brain)

Has a dedicated persona and is responsible for understanding context and intent, creating step-by-step plans, and deciding which tools or skills to invoke.

Skills (Toolbox)

A collection of plugins/skills (described in Markdown and scripts) that enable the Agent to "open doors, pour coffee, send emails, run scripts".

Channels (Pathways)

Connects to various apps like WhatsApp, Telegram, Discord, Slack, SMS, etc., enabling seamless communication between AI and users.

Nodes (Sensors/Terminals)

Small agents running on user devices (mobile phones, laptops, Raspberry Pi, desktops) that provide local capabilities such as camera, geolocation, or system notifications.

Component Function Comparison

Component	Function	Technical Details
Gateway	Central Control Plane	Node.js daemon running locally or on a VPS; responsible for session management, authentication, and routing.
Pi Agent (Agent)	Reasoning Brain	Handles natural language, creates task plans, and selects appropriate tools. Supports Claude, GPT-4, and Ollama local models.
Skills	Execution Capabilities	Modular plugin system in which each skill defines its functionality in a SKILL.md file; supports file operations, browser control, API calls, etc.
Channels	Communication Interface	Connects to users' existing instant messaging apps (WhatsApp, Telegram, Discord, Slack, etc.).
Nodes	Device Extension	Lightweight agents running on iOS/Android or macOS, allowing AI to access camera, geolocation, or send system notifications.

This layered design lets OpenClaw scale quickly with community extensions (skills and MCP servers) while deploying and executing tasks flexibly across different devices.
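Because each skill is described by a SKILL.md file, the Agent needs to extract at least the skill's name and description to know what it offers. The sketch below assumes the common AgentSkills-style layout. That layout is a "---"-delimited frontmatter block of simple key: value lines, followed by Markdown instructions. The function name parseSkillMd, the field names, and the example skill (including scripts/send.sh) are hypothetical. A real parser would use a full YAML library.

```typescript
// Hypothetical parsed form of a SKILL.md file.
interface SkillDoc {
  meta: Record<string, string>; // frontmatter key/value pairs
  body: string;                 // Markdown instructions shown to the Agent
}

// Minimal frontmatter parser: handles flat "key: value" lines only,
// not full YAML (no lists, nesting, or multi-line strings).
function parseSkillMd(text: string): SkillDoc {
  const lines = text.split(/\r?\n/);
  const meta: Record<string, string> = {};
  if ((lines[0] ?? "").trim() !== "---") {
    return { meta, body: text.trim() }; // no frontmatter: the whole file is the body
  }
  let i = 1;
  while (i < lines.length && (lines[i] ?? "").trim() !== "---") {
    const line = lines[i] ?? "";
    const sep = line.indexOf(":");
    if (sep > 0) {
      meta[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
    }
    i++;
  }
  // Everything after the closing "---" is the skill's Markdown body.
  const body = lines.slice(i + 1).join("\n").trim();
  return { meta, body };
}

const example = [
  "---",
  "name: send-email",
  "description: Send an email via the user's configured SMTP account",
  "---",
  "# Send Email",
  "Run scripts/send.sh with the recipient, subject, and body.",
].join("\n");

const skill = parseSkillMd(example);
console.log(skill.meta.name); // send-email
```

Keeping only the frontmatter in the always-loaded context and reading the body on demand is one way to keep many installed skills cheap. This is a design assumption, not a documented OpenClaw behavior.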

2. Gateway: Central Control Plane

Gateway is the core hub of the system: a long-running daemon that manages all message channels and serves as the WebSocket control plane. OpenClaw supports a multi-Agent architecture, in which one Gateway can host multiple independent Agents.

Three Core Functions of Gateway

Receive Messages

Collect user commands from various channels

Route Dispatch

Decide which Agent should handle each message

Reply Delivery

Send Agent's response back to the corresponding channel
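The receive, route, and deliver cycle can be sketched as a small dispatcher. Everything here is hypothetical and is not OpenClaw's actual routing configuration:

- the InboundMessage shape;
- the binding rule (the first match on channel and optional peer wins, otherwise a default agent handles the message);
- reducing an Agent to a text-in, text-out function.

```typescript
// Hypothetical message collected from a channel such as WhatsApp or Telegram.
interface InboundMessage {
  channel: string; // e.g. "telegram"
  peer: string;    // sender or chat id on that channel
  text: string;
}
interface Outbound {
  channel: string;
  peer: string;
  text: string;
}
// An Agent is reduced here to a function from message text to reply text.
type AgentHandler = (text: string) => string;
// Routing rule: match a channel, and optionally one specific peer on it.
interface Binding {
  channel: string;
  peer?: string;
  agentId: string;
}

class MiniGateway {
  private outbox: Outbound[] = [];
  private agents: Map<string, AgentHandler>;
  private bindings: Binding[];
  private defaultAgent: string;

  constructor(agents: Map<string, AgentHandler>, bindings: Binding[], defaultAgent: string) {
    this.agents = agents;
    this.bindings = bindings;
    this.defaultAgent = defaultAgent;
  }

  // 1. Receive: entry point for every channel adapter.
  receive(msg: InboundMessage): void {
    const agentId = this.route(msg);
    const agent = this.agents.get(agentId);
    if (!agent) throw new Error(`unknown agent: ${agentId}`);
    this.deliver({ channel: msg.channel, peer: msg.peer, text: agent(msg.text) });
  }

  // 2. Route: the first matching binding wins, otherwise the default agent.
  route(msg: InboundMessage): string {
    const hit = this.bindings.find(
      (b) => b.channel === msg.channel && (b.peer === undefined || b.peer === msg.peer),
    );
    return hit ? hit.agentId : this.defaultAgent;
  }

  // 3. Deliver: a real Gateway would call the channel's send API here.
  private deliver(reply: Outbound): void {
    this.outbox.push(reply);
  }

  sent(): readonly Outbound[] {
    return this.outbox;
  }
}

const gw = new MiniGateway(
  new Map<string, AgentHandler>([
    ["home", (t) => `home agent got: ${t}`],
    ["work", (t) => `work agent got: ${t}`],
  ]),
  [{ channel: "slack", agentId: "work" }],
  "home",
);
gw.receive({ channel: "slack", peer: "U123", text: "status?" });
gw.receive({ channel: "telegram", peer: "42", text: "hi" });
```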

Default Configuration

WebSocket Endpoint: ws://127.0.0.1:18789
Canvas Server: HTTP Port 18793, Path /__openClaw__/canvas/
Recommended: run a single Gateway per host, since the WhatsApp Web session is exclusive

WebSocket Protocol Details

Transport Layer: WebSocket text, JSON format

First Frame: Must be the connect handshake

Request/Response Format: {type:"req", id, method, params} → {type:"res", id, ok, payload|error}

Event Format: {type:"event", event, payload, seq?, stateVersion?}

Supported Event Types: agent, chat, presence, health, heartbeat, cron, tick, shutdown
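The frame shapes above map naturally onto a TypeScript discriminated union. The sketch below types the three frames, builds the required first frame, and parses incoming text frames. It assumes connect is sent as an ordinary req frame with method "connect". The connect params, the numeric id scheme, and the helper names makeReq and parseFrame are placeholders, not the documented OpenClaw client API.

```typescript
// Frame shapes as described above; payload contents are left as unknown.
interface ReqFrame {
  type: "req";
  id: string;
  method: string;
  params?: unknown;
}
interface ResFrame {
  type: "res";
  id: string;
  ok: boolean;
  payload?: unknown; // present when ok is true
  error?: unknown;   // present when ok is false
}
interface EventFrame {
  type: "event";
  event: string; // e.g. "agent", "chat", "presence", "heartbeat"
  payload?: unknown;
  seq?: number;
  stateVersion?: number;
}
type Frame = ReqFrame | ResFrame | EventFrame;

let nextId = 0;

// Build a request frame; ids only need to be unique per connection here.
function makeReq(method: string, params?: unknown): string {
  const frame: ReqFrame = { type: "req", id: String(++nextId), method, params };
  return JSON.stringify(frame);
}

// Parse a text frame and check its "type" discriminator.
function parseFrame(text: string): Frame {
  const obj = JSON.parse(text) as { type?: unknown };
  if (obj.type === "req" || obj.type === "res" || obj.type === "event") {
    return obj as Frame;
  }
  throw new Error(`unknown frame type: ${String(obj.type)}`);
}

// The first frame on a new socket must be connect.
// The params object is a placeholder; real handshake fields are not shown here.
const first = makeReq("connect", { client: "example-client" });

const reply = parseFrame('{"type":"res","id":"1","ok":true,"payload":{}}');
if (reply.type === "res" && reply.ok) {
  console.log("handshake accepted for request", reply.id);
}
```

Matching each res to its req by id lets a client keep several requests in flight on one socket. Events carry an optional seq so that gaps can be detected.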

3. Agent: Reasoning Engine

Upon receiving a message or task, the Agent uses its brain (the LLM), hands and feet (Tools), and expertise (Skills) to complete the task as well as it can. This may include browsing the web, running commands, reading and writing files, writing code, or calling on other Nodes' capabilities (such as a camera).

The Core Agent Loop

Question → Think → Plan → Act → Observe → Think → Act → Wait → Check → Correct ... → Complete

The LLM handles "thinking" (deciding what to do), while Tools handle "acting" (executing operations). Execution results are fed back to the LLM as "observations", then the next cycle continues.

💡 This is the fundamental difference between an Agent and a chatbot: chatbots only talk, Agents act.
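The think, act, and observe cycle reduces to a loop that keeps calling the model until it answers without requesting a tool. In the sketch below the "model" is a scripted stub and the only tool is a fake clock. The type names (ModelTurn, ToolCall) and the maxSteps safety limit are illustrative assumptions, not OpenClaw or Pi internals.

```typescript
interface ToolCall { name: string; args: Record<string, unknown>; }
// One model turn: text, plus zero or more requested tool calls.
interface ModelTurn { text: string; toolCalls: ToolCall[]; }
interface Msg { role: "user" | "assistant" | "tool"; content: string; }

type Model = (history: Msg[]) => ModelTurn;
type Tool = (args: Record<string, unknown>) => string;

function runAgent(model: Model, tools: Record<string, Tool>, question: string, maxSteps = 8): string {
  const history: Msg[] = [{ role: "user", content: question }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(history); // Think / Plan
    history.push({ role: "assistant", content: turn.text });
    if (turn.toolCalls.length === 0) {
      return turn.text; // no tool calls: the task is complete
    }
    for (const call of turn.toolCalls) { // Act
      const tool = tools[call.name];
      const result = tool ? tool(call.args) : `error: unknown tool ${call.name}`;
      history.push({ role: "tool", content: result }); // Observe
    }
  }
  throw new Error("step limit reached without a final answer");
}

// Scripted stand-in for an LLM: first asks for the time, then answers.
const fakeModel: Model = (history) => {
  const last = history[history.length - 1];
  if (last && last.role === "tool") {
    return { text: `It is ${last.content}.`, toolCalls: [] };
  }
  return { text: "Checking the clock.", toolCalls: [{ name: "clock", args: {} }] };
};

const answer = runAgent(fakeModel, { clock: () => "09:30" }, "What time is it?");
console.log(answer); // It is 09:30.
```

The step limit guards against a model that never stops calling tools.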

Multi-Agent Mode

OpenClaw supports multi-Agent mode, where agents can operate independently or collaboratively. Each Agent has its own workspace with dedicated configuration and memory.
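Per-agent isolation can be pictured as a registry in which every agent id owns its own workspace path, configuration, and memory store. Writes by one agent are therefore invisible to another. The following are hypothetical placeholders, not OpenClaw's actual on-disk layout or config schema:

- the "workspaces/<id>" layout;
- the config fields;
- the model names;
- the AgentRegistry class.

```typescript
// Hypothetical per-agent state: separate workspace, config, and memory.
interface AgentState {
  workspace: string; // directory holding files such as AGENTS.md and SOUL.md
  config: { model: string; persona: string };
  memory: string[];  // notes persisted across conversations
}

class AgentRegistry {
  private agents = new Map<string, AgentState>();

  create(id: string, config: AgentState["config"]): AgentState {
    if (this.agents.has(id)) throw new Error(`agent ${id} already exists`);
    const state: AgentState = { workspace: `workspaces/${id}`, config, memory: [] };
    this.agents.set(id, state);
    return state;
  }

  remember(id: string, note: string): void {
    const state = this.agents.get(id);
    if (!state) throw new Error(`unknown agent ${id}`);
    state.memory.push(note); // only this agent's memory changes
  }

  recall(id: string): readonly string[] {
    return this.agents.get(id)?.memory ?? [];
  }
}

const registry = new AgentRegistry();
registry.create("personal", { model: "claude", persona: "friendly assistant" });
registry.create("work", { model: "local-ollama", persona: "concise engineer" });
registry.remember("personal", "User prefers oat milk");
```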

4. Four Core Stages of the Agent

Context Assembly

The Agent needs to tell the LLM "who you are, what you can do, what tools you have, and what the user said". This includes:

System Prompt: Agent's identity, rules, tool list
Conversation History: Previous conversation records
Bootstrap Files: Workspace files like AGENTS.md, SOUL.md, TOOLS.md

OpenClaw concatenates this content into a complete prompt and sends it to the LLM.
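At its simplest, context assembly is ordered string concatenation. The sketch below assumes this order: system prompt first, then each bootstrap file as a labeled section, then the history, then the new message. That ordering, the "## name" section markers, the function name assemblePrompt, and the example file contents are illustrative choices, not OpenClaw's exact prompt format. Real implementations usually send history as structured chat messages rather than one flat string.

```typescript
interface Turn { role: "user" | "assistant"; text: string; }
interface BootstrapFile { name: string; contents: string; }

// Combine identity/rules/tools, workspace bootstrap files, and history into one prompt.
function assemblePrompt(
  systemPrompt: string,
  files: BootstrapFile[],
  history: Turn[],
  userMessage: string,
): string {
  const parts: string[] = [systemPrompt.trim()];
  for (const f of files) {
    parts.push(`## ${f.name}\n${f.contents.trim()}`); // workspace bootstrap file
  }
  for (const t of history) {
    parts.push(`${t.role}: ${t.text}`); // earlier conversation
  }
  parts.push(`user: ${userMessage}`); // the new message goes last
  return parts.join("\n\n");
}

const assembled = assemblePrompt(
  "You are a personal assistant. Available tools: read, write, edit, bash.",
  [
    { name: "AGENTS.md", contents: "Always confirm before sending email." },
    { name: "SOUL.md", contents: "Be warm and brief." },
  ],
  [{ role: "user", text: "Hi" }, { role: "assistant", text: "Hello! How can I help?" }],
  "What's on my calendar today?",
);
```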

Model Inference

After receiving the prompt, the LLM decides on the next action. It may:

Reply directly to the user
Call a tool (Tool Call)
Request more information
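The three possible outcomes can be modeled as a tagged union so that the Agent handles each case explicitly. The kind values and the nextAction function are hypothetical. Real providers signal tool calls in their own formats (for example, a tool_use content block or a tool_calls field), and an adapter would map those onto something like this.

```typescript
type ModelDecision =
  | { kind: "reply"; text: string }
  | { kind: "tool_call"; tool: string; args: Record<string, unknown> }
  | { kind: "need_info"; question: string };

// Decide what the Agent does next based on the model's output.
function nextAction(d: ModelDecision): string {
  switch (d.kind) {
    case "reply":
      return `send to user: ${d.text}`;
    case "tool_call":
      return `run tool ${d.tool} with ${JSON.stringify(d.args)}`;
    case "need_info":
      return `ask user: ${d.question}`;
    default: {
      // Compile-time exhaustiveness check: adding a new kind breaks the build here.
      const unreachable: never = d;
      throw new Error(`unhandled decision: ${JSON.stringify(unreachable)}`);
    }
  }
}

console.log(nextAction({ kind: "tool_call", tool: "read", args: { path: "notes.md" } }));
// run tool read with {"path":"notes.md"}
```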

Tool Execution

If the LLM decides to call a tool, the Agent will:

Parse Tool Call parameters
Execute the corresponding tool (exec, read, write, browser...)
Return execution results to the LLM
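The parse, execute, and return steps can be sketched with a registry keyed by tool name. The sketch makes two assumptions. First, the model emits arguments as a JSON string, as most function-calling APIs do. Second, failures are returned to the model as text rather than thrown, so the LLM can observe the error and correct itself. The in-memory files, the ToolSpec shape, and runToolCall are hypothetical.

```typescript
// A tool lists the string arguments it requires and how to run it.
interface ToolSpec {
  required: string[];
  run: (args: Record<string, string>) => string;
}

// Hypothetical in-memory files standing in for the workspace.
const files = new Map<string, string>([["todo.txt", "buy coffee"]]);

const tools = new Map<string, ToolSpec>([
  ["read", { required: ["path"], run: (a) => files.get(a.path) ?? `error: no such file ${a.path}` }],
  ["write", {
    required: ["path", "content"],
    run: (a) => { files.set(a.path, a.content); return `wrote ${a.content.length} chars`; },
  }],
]);

// Parse the tool call, validate its arguments, execute, and return text for the LLM.
function runToolCall(name: string, rawArgs: string): string {
  const tool = tools.get(name);
  if (!tool) return `error: unknown tool "${name}"`;
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawArgs);
  } catch {
    return "error: arguments are not valid JSON";
  }
  if (typeof parsed !== "object" || parsed === null) return "error: arguments must be an object";
  const args: Record<string, string> = {};
  for (const key of tool.required) {
    const value = (parsed as Record<string, unknown>)[key];
    if (typeof value !== "string") return `error: missing string argument "${key}"`;
    args[key] = value;
  }
  return tool.run(args);
}
```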

Reply Dispatch

When the LLM generates the final response, the Agent will:

Format response content
Send back through Gateway to the corresponding message channel
Support streaming output (send while generating)
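Streaming means forwarding partial text as it arrives instead of waiting for the full answer. In this sketch a generator stands in for the model's token stream. Chunks are buffered until they reach a minimum size, then handed to a channel's send function. Buffering avoids flooding a chat app with one message per token. The minChars threshold, the send callback, and the trim-only formatting step are illustrative assumptions.

```typescript
// Stand-in for a model's token stream.
function* fakeTokenStream(): Generator<string> {
  for (const t of ["Your ", "meeting ", "is ", "at ", "3 ", "pm."]) yield t;
}

// Forward streamed text to a channel in chunks of at least minChars characters.
function streamReply(
  tokens: Iterable<string>,
  send: (chunk: string) => void,
  minChars = 10,
): string {
  let buffer = "";
  let full = "";
  for (const token of tokens) {
    buffer += token;
    full += token;
    if (buffer.length >= minChars) {
      send(buffer); // send while still generating
      buffer = "";
    }
  }
  if (buffer.length > 0) send(buffer); // flush whatever is left at the end
  return full.trim(); // final formatted text, e.g. for conversation history
}

const sent: string[] = [];
const finalText = streamReply(fakeTokenStream(), (chunk) => sent.push(chunk));
```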

Pi Agent Core Features

Agent Loop: Process user messages, execute tool calls, feed results back to LLM, looping until the model generates a response without tool calls

Event-Driven Architecture: Emits lifecycle events as the loop runs, enabling reactive UIs

Message Queue: Supports two modes (sequential processing or batch processing)

Tool Streaming: Supports chunk streaming and incremental streaming for real-time output
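The two queue modes differ in how messages that arrive while the Agent is busy are handled. In sequential mode, one agent turn runs per queued message. In batch mode, everything waiting is merged into a single turn. The mode names, the drainQueue function, and the newline-join merge rule are assumptions for illustration, not OpenClaw's queue settings.

```typescript
type QueueMode = "sequential" | "batch";

// Drain pending messages, calling runTurn once per agent turn; returns the turn count.
function drainQueue(
  pending: string[],
  mode: QueueMode,
  runTurn: (input: string) => void,
): number {
  let turns = 0;
  if (mode === "sequential") {
    // One turn per message, in arrival order.
    while (pending.length > 0) {
      runTurn(pending.shift()!);
      turns++;
    }
  } else if (pending.length > 0) {
    // Merge everything waiting into a single turn.
    runTurn(pending.splice(0).join("\n"));
    turns++;
  }
  return turns;
}

const seen: string[] = [];
const seqTurns = drainQueue(["hi", "are you there?", "call mom"], "sequential", (m) => seen.push(m));
const batchTurns = drainQueue(["hi", "are you there?", "call mom"], "batch", (m) => seen.push(m));
```

Batch mode saves model calls when a user sends several short messages in a row. Sequential mode keeps each message's reply separate.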

Core Tools (Only 4 Needed):

bash - Execute shell commands
read - Read file contents
write - Write file contents
edit - Edit text files

The system prompt is also very concise, only about 1,000 tokens including tool definitions, which is enough for a large model to understand its role as a coding agent.
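The three file tools can be illustrated against an in-memory file map so the example runs anywhere; bash is omitted because it needs a real shell. The edit semantics are an assumption modeled on common coding-agent edit tools: replace exactly one occurrence of oldText with newText, and fail if the match is missing or ambiguous. These signatures are not Pi's actual tool definitions.

```typescript
// In-memory stand-in for the workspace file system.
const fsMap = new Map<string, string>();

function read(path: string): string {
  const content = fsMap.get(path);
  if (content === undefined) throw new Error(`file not found: ${path}`);
  return content;
}

function write(path: string, content: string): void {
  fsMap.set(path, content); // creates or overwrites the whole file
}

// Targeted edit: oldText must occur exactly once so the change is unambiguous.
function edit(path: string, oldText: string, newText: string): void {
  const content = read(path);
  const first = content.indexOf(oldText);
  if (oldText === "" || first === -1) throw new Error("oldText not found");
  if (content.indexOf(oldText, first + 1) !== -1) throw new Error("oldText is not unique");
  write(path, content.slice(0, first) + newText + content.slice(first + oldText.length));
}

write("greet.ts", 'console.log("hello");\n');
edit("greet.ts", '"hello"', '"hello, world"');
```

A small exact-match edit is cheaper for the model than rewriting a whole file with write. It also fails loudly instead of silently changing the wrong spot.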

Summary

OpenClaw's architecture embodies the philosophy of "simple yet powerful": through a small set of core tools and a clearly layered architecture, it implements a complete, easily extensible personal AI assistant.

This design lets OpenClaw meet the everyday needs of individual users while giving developers flexible extensibility, carrying the "AI Agent" idea all the way from concept to working implementation.
