Core Concepts15 minIntermediate

System Architecture Deep Dive

OpenClaw implements a complete agent architecture design. Explore its core components and design philosophy.

Core Architecture Components

  • Gateway - Unified multi-channel access
  • Tools + Skills - Define capability boundaries
  • Memory - Persistent memory implementation
  • Multi-layer protection - Device security

1Architecture Overview

From an architectural perspective, you can think of OpenClaw as an intelligent platform with five key functional areas:

Gateway (Entry Point)

Manages sessions, routes requests, and handles authentication. Typically runs locally with the control panel bound to loopback by default, supporting remote access through Tailscale and other private networks.

Agent (Brain)

With a dedicated persona, responsible for understanding context and intent, creating step-by-step plans, and deciding which tools or skills to invoke.

Skills (Toolbox)

A collection of plugins/skills (described in Markdown and scripts) that enable the Agent to "open doors, pour coffee, send emails, run scripts".

Channels (Pathways)

Connects to various apps like WhatsApp, Telegram, Discord, Slack, SMS, etc., enabling seamless communication between AI and users.

Nodes (Sensors/Terminals)

Small agents running on user devices (mobile phones, laptops, Raspberry Pi, desktops) that provide local capabilities such as camera, geolocation, or system notifications.

Component Function Comparison

ComponentFunctionTechnical Details
GatewayCentral Control PlaneNode.js daemon running locally or on VPS, responsible for session management, authentication, and routing.
Pi Agent (Agent)Reasoning BrainHandles natural language, creates task plans, and selects appropriate tools. Supports Claude, GPT-4, and Ollama local models.
SkillsExecution CapabilitiesModular plugin system that defines functionality through SKILL.md, supporting file operations, browser control, API calls, etc.
ChannelsCommunication InterfaceConnects to users' existing instant messaging apps (WhatsApp, Telegram, Discord, Slack, etc.).
NodesDevice ExtensionLightweight agents running on iOS/Android or macOS, allowing AI to access camera, geolocation, or send system notifications.

This layered design allows OpenClaw to quickly scale with community skills (skills and MCPs) while flexibly deploying and executing tasks across different devices.

2Gateway: Central Control Plane

Gateway is the core hub of the system—a long-running daemon responsible for managing all message channels and serving as the WebSocket control plane. OpenClaw supports multi-Agent architecture, where one Gateway can host multiple independent Agents.

Three Core Functions of Gateway

1

Receive Messages

Collect user commands from various channels

2

Route Dispatch

Decide which Agent should handle this message

3

Reply Delivery

Send Agent's response back to the corresponding channel

Default Configuration

  • WebSocket Endpoint: ws://127.0.0.1:18789
  • Canvas Server: HTTP Port 18793, Path /__openClaw__/canvas/
  • Recommend running a single Gateway per host (exclusive WhatsApp Web session)

WebSocket Protocol Details

Transport Layer: WebSocket text, JSON format
First Frame: Must be connect
Request Format: {type:"req", id, method, params} → {type:"res", id, ok, payload|error}
Event Format: {type:"event", event, payload, seq?, stateVersion?}
Supported Event Types: agent, chat, presence, health, heartbeat, cron, tick, shutdown

3Agent: Reasoning Engine

Upon receiving messages and tasks, the Agent uses its brain (LLM), hands and feet (Tools), and expertise (Skills) to complete tasks as best as possible. This may include accessing the web, running commands, reading/writing files, writing code, or calling on other Nodes' capabilities (such as camera).

The Core Agent Loop

Question → Think → Plan → Act → Observe → Think → Act → Wait → Check → Correct ... → Complete

The LLM handles "thinking" (deciding what to do), while Tools handle "acting" (executing operations). Execution results are fed back to the LLM as "observations", then the next cycle continues.

💡 This is the fundamental difference between Agent and Chatbot: Chatbots only talk, Agents act.

Multi-Agent Mode

OpenClaw supports multi-Agent mode, where agents can operate independently or collaboratively. Each Agent has its own workspace with dedicated configuration and memory.

4Four Core Stages of Agent

1

Context Assembly

The Agent needs to tell the LLM "who you are, what you can do, what tools you have, and what the user said". This includes:

  • System Prompt: Agent's identity, rules, tool list
  • Conversation History: Previous conversation records
  • Bootstrap Files: Workspace files like AGENTS.md, SOUL.md, TOOLS.md

OpenClaw concatenates this content into a complete prompt and sends it to the LLM.

2

Model Inference

After receiving the prompt, the LLM decides on the next action. It may:

  • Reply directly to the user
  • Call a tool (Tool Call)
  • Request more information
3

Tool Execution

If the LLM decides to call a tool, the Agent will:

  • Parse Tool Call parameters
  • Execute the corresponding tool (exec, read, write, browser...)
  • Return execution results to the LLM
4

Reply Dispatch

When the LLM generates the final response, the Agent will:

  • Format response content
  • Send back through Gateway to the corresponding message channel
  • Support streaming output (send while generating)

Pi Agent Core Features

Agent Loop: Process user messages, execute tool calls, feed results back to LLM, looping until the model generates a response without tool calls

Event-Driven Architecture: Emits lifecycle events during the loop process, supporting reactive UI

Message Queue: Supports two modes (sequential processing or batch processing)

Tool Streaming: Supports chunk streaming and incremental streaming for real-time output

Core Tools (Only 4 Needed):

  • bash- Execute shell commands
  • read- Read file contents
  • write- Write file contents
  • edit- Edit text files

The system prompt is also extremely concise, only about 1000 tokens (including tool definitions), allowing large models to understand the programming agent context.

Summary

OpenClaw's architecture embodies the philosophy of "simple yet powerful": through concise core tools and clear layered architecture, it implements a complete and easily extensible personal AI assistant system.

This design enables OpenClaw to meet the daily needs of individual users while providing developers with flexible extensibility, truly achieving the complete closed loop from concept to implementation of "AI Agent".

Continue Learning