OpenClaw Technical Essence & Architecture Deep Dive

From TypeScript CLI to Channel Queue Architecture — unveiling how OpenClaw achieves persistent memory, computer use, and browser automation

Hesamation

15 min read

Source: Twitter/X @Hesamation

1. The Technical Essence of OpenClaw

OpenClaw is essentially a TypeScript command-line application

It is not a Python or Next.js project, nor a web application. It is a process running on your device capable of launching a gateway server to handle all channel connections, calling various large language model APIs, executing tool commands locally, and performing arbitrary operations on your computer.

What exactly is OpenClaw?

OpenClaw is a personal assistant that can run locally or through model APIs and is as easy to reach as an app on your phone. Unlike an ordinary chatbot, it can actually operate your computer. It can:

•Launch a gateway server to handle all channel connections (Telegram, WhatsApp, Slack, etc.)
•Call various LLM APIs (Anthropic, OpenAI, local models, etc.)
•Execute tool commands locally
•Perform arbitrary operations on your computer

2. Architecture Analysis

When you send OpenClaw a command through a messaging app, the following six components work together:

1. Channel Adapter

Dedicated adapters receive and preprocess messages (standardization, attachment extraction)

Different messaging platforms have independent adapters
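To make the adapter idea concrete, here is a minimal sketch of what normalizing a platform payload into a common message shape might look like. The interface names, field names, and the simplified Telegram-like payload are all assumptions for illustration; they are not taken from OpenClaw's source.

```typescript
// Hypothetical shapes: none of these names come from OpenClaw's source.
interface NormalizedMessage {
  channel: string;       // e.g. "telegram"
  sessionKey: string;    // used later to route the message to a session
  text: string;
  attachments: string[]; // file ids or URLs pulled out of the raw payload
}

interface ChannelAdapter<Raw> {
  readonly channel: string;
  normalize(raw: Raw): NormalizedMessage;
}

// Assumed, heavily simplified raw payload for illustration only.
interface FakeTelegramUpdate {
  chatId: number;
  text?: string;
  caption?: string;
  fileIds?: string[];
}

const telegramAdapter: ChannelAdapter<FakeTelegramUpdate> = {
  channel: "telegram",
  normalize(raw) {
    return {
      channel: "telegram",
      sessionKey: `telegram:${raw.chatId}`,
      // Photo messages carry their text in the caption in this toy payload.
      text: (raw.text ?? raw.caption ?? "").trim(),
      attachments: raw.fileIds ?? [],
    };
  },
};

const normalized = telegramAdapter.normalize({
  chatId: 42,
  caption: "  see photo ",
  fileIds: ["f1"],
});
```

With one adapter per platform, everything downstream of the adapter only has to deal with the normalized shape.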

2. Gateway Server

The gateway server, as the core task/session coordinator, routes messages to corresponding sessions

This is the heart of OpenClaw; it has to handle many overlapping requests

To serialize operations, OpenClaw uses a channel-based command queue

Each session has its own channel; low-risk tasks that can safely run in parallel (e.g., scheduled tasks) go through parallel channels

Serial by default, parallel only when explicit: a stark contrast to tangled async/await code. Over-parallelization hurts reliability and makes bugs much harder to track down.
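The "serial by default" idea can be sketched with one promise chain per channel. This is a minimal illustration under assumed names (`ChannelQueue`, `enqueue`, the channel ids); it is not OpenClaw's actual queue implementation.

```typescript
// Hypothetical sketch: one promise chain per channel keeps that channel serial.
class ChannelQueue {
  private tails = new Map<string, Promise<unknown>>();

  // Tasks with the same channel id run one after another;
  // tasks in different channels may overlap.
  enqueue<T>(channelId: string, task: () => Promise<T>): Promise<T> {
    const tail = this.tails.get(channelId) ?? Promise.resolve();
    // Start the task only after the previous task in this channel has settled.
    const next = tail.then(() => task());
    // Store a tail that never rejects, so one failure does not block the channel.
    this.tails.set(channelId, next.catch(() => undefined));
    return next;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
const order: string[] = [];
const queue = new ChannelQueue();

const done = Promise.all([
  queue.enqueue("session:alice", async () => { await sleep(20); order.push("alice-1"); }),
  queue.enqueue("session:alice", async () => { order.push("alice-2"); }),
  queue.enqueue("cron", async () => { order.push("cron-1"); }),
]);
```

In this toy run the second "session:alice" task waits for the first even though the first is slower, while the "cron" task is free to finish before either of them.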

3. Agent Executor

This is where the AI comes in. The executor selects a model dynamically and picks a matching API key; if none of the keys work, it marks that configuration as cooling down and tries the next one

It automatically falls back to another model when the primary model fails
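A rough sketch of the cooldown-and-fallback behavior described above. The `AuthProfile` shape, the 60-second cooldown, and the function names are assumptions made for illustration, and the call is kept synchronous for brevity where a real provider call would be asynchronous.

```typescript
// Hypothetical types; the field names are illustrative, not OpenClaw's config schema.
interface AuthProfile {
  id: string;
  model: string;
  cooldownUntil: number; // epoch ms; 0 means "not cooling down"
}

const COOLDOWN_MS = 60_000; // assumed value

// Return the first profile that is not cooling down, in priority order.
function pickProfile(profiles: AuthProfile[], now: number): AuthProfile | undefined {
  return profiles.find((p) => p.cooldownUntil <= now);
}

// Try each usable profile in order; mark failures as cooling down and move on.
function callWithFallback<T>(
  profiles: AuthProfile[],
  call: (profile: AuthProfile) => T,
  now: number = Date.now(),
): T {
  for (;;) {
    const profile = pickProfile(profiles, now);
    if (!profile) throw new Error("all profiles are cooling down");
    try {
      return call(profile);
    } catch {
      // Each failure removes one profile from rotation, so the loop terminates.
      profile.cooldownUntil = now + COOLDOWN_MS;
    }
  }
}

const profiles: AuthProfile[] = [
  { id: "primary", model: "model-a", cooldownUntil: 0 },
  { id: "backup", model: "model-b", cooldownUntil: 0 },
];

const usedModel = callWithFallback(
  profiles,
  (p) => {
    if (p.id === "primary") throw new Error("rate limited");
    return p.model;
  },
  1_000,
);
```

Here the primary profile fails, is put on cooldown, and the call succeeds on the backup model.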

The executor dynamically assembles system prompts by integrating available tools, skills, memory systems, and session history (from .jsonl files)

It then hands off to the context window guard, which checks how much space remains. When the context is full, the system either compacts the session (summarizing earlier context) or fails gracefully with an error
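The guard's decision can be sketched as a small pure function. The four-characters-per-token estimate, the parameter names, and the three-way result are assumptions for illustration; the article does not say how OpenClaw actually counts tokens.

```typescript
type GuardDecision = "ok" | "compact" | "error";

// Rough heuristic (an assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Decide whether the history still fits, leaving room for the model's reply.
function checkContext(
  history: string[],
  contextLimit: number,
  reserveForReply: number,
  canCompact: boolean,
): GuardDecision {
  const used = history.reduce((sum, msg) => sum + estimateTokens(msg), 0);
  if (used + reserveForReply <= contextLimit) return "ok";
  // Out of room: summarize if possible, otherwise report the failure.
  return canCompact ? "compact" : "error";
}

const shortHistory = ["a".repeat(400)];  // about 100 estimated tokens
const longHistory = ["a".repeat(4000)];  // about 1000 estimated tokens

const fits = checkContext(shortHistory, 1000, 200, true);
const needsCompaction = checkContext(longHistory, 1000, 200, true);
const cannotRecover = checkContext(longHistory, 1000, 200, false);
```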

4. LLM API Call

The calling process supports response streaming

Differences between providers are hidden behind a common abstraction

If the model supports extended thinking, it can also be requested

5. Agent Loop

When the LLM returns a tool call response, OpenClaw executes it locally and adds the result to the conversation

This loop continues until the model returns final text or reaches maximum rounds (default ~20 rounds)

This is where the magic happens: using the computer
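The loop described above can be sketched as follows. The message and reply shapes, the function names, and the stop message are assumptions; the model and tool runner are injected as plain functions and kept synchronous so the example stays short.

```typescript
// Hypothetical message and reply shapes for illustration.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ModelReply =
  | { kind: "text"; text: string }
  | { kind: "tool_call"; tool: string; input: string };

const MAX_ROUNDS = 20; // the article cites a default of about 20

function runAgentLoop(
  messages: Message[],
  callModel: (messages: Message[]) => ModelReply,
  runTool: (tool: string, input: string) => string,
  maxRounds: number = MAX_ROUNDS,
): string {
  for (let round = 0; round < maxRounds; round++) {
    const reply = callModel(messages);
    if (reply.kind === "text") {
      // Final answer: record it and stop.
      messages.push({ role: "assistant", content: reply.text });
      return reply.text;
    }
    // Tool call: execute it locally and add the result to the conversation.
    messages.push({ role: "assistant", content: `call ${reply.tool}(${reply.input})` });
    messages.push({ role: "tool", content: runTool(reply.tool, reply.input) });
  }
  return "Stopped: reached the maximum number of rounds.";
}

// A fake model that asks for one tool call, then answers using its result.
const transcript: Message[] = [{ role: "user", content: "How many files are here?" }];
const answer = runAgentLoop(
  transcript,
  (msgs) => {
    const last = msgs[msgs.length - 1];
    return last.role === "tool"
      ? { kind: "text", text: `There are ${last.content} files.` }
      : { kind: "tool_call", tool: "count_files", input: "." };
  },
  () => "3",
);
```

The round limit is what keeps a model that never stops calling tools from looping forever.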

6. Response Path

Results return to the user through the original channel

Sessions are persisted in plain JSONL format: each line is a JSON object holding a user message, tool call, tool result, response, and so on.

This is the foundation of OpenClaw's session memory
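As a small illustration of the JSONL idea, the sketch below serializes session entries one per line and reads them back. The `SessionEntry` fields are assumed for the example; the real records may use different names.

```typescript
// Hypothetical entry shape; the real field names may differ.
interface SessionEntry {
  type: "user" | "tool_call" | "tool_result" | "assistant";
  content: string;
  timestamp: number;
}

// One JSON object per line, so a new turn can be appended without rewriting the file.
function toJsonl(entries: SessionEntry[]): string {
  return entries.map((e) => JSON.stringify(e)).join("\n") + "\n";
}

// Parse each non-empty line back into an entry.
function fromJsonl(text: string): SessionEntry[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as SessionEntry);
}

const session: SessionEntry[] = [
  { type: "user", content: "list my files", timestamp: 1706644800 },
  { type: "tool_call", content: "ls", timestamp: 1706644801 },
  { type: "tool_result", content: "a.txt\nb.txt", timestamp: 1706644802 },
  { type: "assistant", content: "You have a.txt and b.txt.", timestamp: 1706644803 },
];

const jsonl = toJsonl(session);
const restored = fromJsonl(jsonl);
```

Newlines inside a message are escaped by `JSON.stringify`, so every entry still occupies exactly one line.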

3. Memory System Analysis

An AI assistant without a memory system is like a goldfish. OpenClaw achieves memory through a dual mechanism:

JSONL Session Records

As mentioned earlier, each line is a JSON object holding a user message, tool call, tool result, response, and so on.

Markdown Memory Files

Stored in MEMORY.md or the memory/ folder and written by the agent itself

Search Mechanism

Search combines the strengths of vector search and keyword matching. For example, a search for "authentication error" finds documents that mention "authentication issues" (a semantic match) and also pinpoints content containing the exact phrase.

•Vector search implemented using SQLite
•Keyword search uses SQLite extension FTS5
•Embedding providers are configurable
•Syncing is triggered automatically when the file watcher detects changes
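To show how the two signals can be combined, here is an in-memory sketch that blends cosine similarity with a simple keyword score. It does not use SQLite or FTS5; the weighting formula, the 0.7 weight, the toy two-dimensional embeddings, and all names are assumptions for illustration only.

```typescript
interface Doc {
  id: string;
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Fraction of query terms found in the text (a crude stand-in for full-text ranking).
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const haystack = text.toLowerCase();
  return terms.filter((t) => haystack.includes(t)).length / terms.length;
}

// Weighted blend of the semantic and keyword scores, highest first.
function hybridSearch(query: string, queryEmbedding: number[], docs: Doc[], vectorWeight = 0.7) {
  return docs
    .map((d) => ({
      id: d.id,
      score:
        vectorWeight * cosine(queryEmbedding, d.embedding) +
        (1 - vectorWeight) * keywordScore(query, d.text),
    }))
    .sort((x, y) => y.score - x.score);
}

// Toy data: the embeddings are made up so that "a" and "b" are both close to the query.
const docs: Doc[] = [
  { id: "a", text: "authentication issues after upgrade", embedding: [0.9, 0.1] },
  { id: "b", text: "authentication error on login", embedding: [0.8, 0.2] },
  { id: "c", text: "grocery list", embedding: [0, 1] },
];

const results = hybridSearch("authentication error", [1, 0], docs);
```

In this toy data, document "a" is slightly closer semantically, but "b" contains both query terms, so the blended score ranks "b" first while still surfacing "a" ahead of the unrelated "c".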

Philosophy of Simplicity

"These Markdown files are written by the agent itself through the standard 'write' file tool; there is no dedicated memory-write API. The agent writes directly to memory/*.md files. When a new session starts, a hook fetches the previous conversation and generates a Markdown summary. OpenClaw's memory system is exceptionally simple: there is no memory merging and no monthly or weekly memory compression. This simplicity is both an advantage and a potential risk, but I always prefer interpretable simplicity over complex code. Memories are stored permanently with equal weight regardless of age; the system has no forgetting curve."

4. Using the Computer

This is OpenClaw's core advantage: you authorize it to operate your computer. It works roughly the way you would expect.

Execution Environments

Sandbox Environment (Default)

Run commands inside Docker containers

Host Direct Execution

Execute directly on local machine

Remote Device Execution

Control other devices through network

File System Tools

Read, write, edit local files

Browser Tools

Browser automation based on Playwright

Process Management Tools

Background long-running commands, terminate processes, etc.

Security Mechanism

Similar to Claude Code, commands are approved by the user through an allowlist, with prompts offering allow once, always allow, or deny:

// ~/.clawdbot/exec-approvals.json
{
  "agents": {
    "main": {
      "allowlist": [
        { "pattern": "/usr/bin/npm", "lastUsedAt": 1706644800 },
        { "pattern": "/opt/homebrew/bin/git", "lastUsedAt": 1706644900 }
      ]
    }
  }
}
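A minimal sketch of how such a file might be consulted before running a command. The types mirror the JSON above, but the matching rule (exact absolute path), the epoch-seconds timestamp, the "ask" fallback, and the function name are assumptions rather than OpenClaw's documented behavior.

```typescript
// Shape mirrors the JSON shown above; everything else here is an assumption.
interface AllowlistEntry {
  pattern: string;
  lastUsedAt: number;
}

interface ExecApprovals {
  agents: Record<string, { allowlist: AllowlistEntry[] }>;
}

type ApprovalDecision = "allow" | "ask";

// Assumes patterns are exact absolute paths to executables, as in the sample file.
function checkApproval(
  approvals: ExecApprovals,
  agentId: string,
  executablePath: string,
  nowSeconds: number,
): ApprovalDecision {
  const entry = approvals.agents[agentId]?.allowlist.find(
    (e) => e.pattern === executablePath,
  );
  if (!entry) return "ask"; // not pre-approved: fall back to prompting the user
  entry.lastUsedAt = nowSeconds;
  return "allow";
}

const approvals: ExecApprovals = {
  agents: {
    main: {
      allowlist: [{ pattern: "/usr/bin/npm", lastUsedAt: 1706644800 }],
    },
  },
};

const npmDecision = checkApproval(approvals, "main", "/usr/bin/npm", 1706650000);
const curlDecision = checkApproval(approvals, "main", "/usr/bin/curl", 1706650000);
```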

Safe Commands (Pre-authorized)

Commonly used safe commands can be executed directly

jq, grep, cut, sort, uniq, head, tail, tr, wc

Dangerous Shell Structures (Blocked by Default)

The following commands will be rejected before execution:

npm install $(cat /etc/passwd) # Command substitution
cat file > /etc/hosts # Redirection
rm -rf / || echo "failed" # Chaining
(sudo rm -rf /) # Subshell
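The four examples above can be caught with a naive pattern check like the one below. This is only an illustration: the patterns and names are assumptions, the check ignores quoting (so a harmless quoted ">" would also be flagged), and a real implementation would need a proper shell parser.

```typescript
// Illustrative patterns only; checked in order, first match wins.
const BLOCKED_CONSTRUCTS: Array<{ name: string; pattern: RegExp }> = [
  { name: "command substitution", pattern: /\$\(|`/ },
  { name: "redirection", pattern: /[<>]/ },
  { name: "chaining", pattern: /\|\||&&|;/ },
  { name: "subshell", pattern: /^\s*\(/ },
];

// Return the name of the first blocked construct found, or null if none match.
function findBlockedConstruct(command: string): string | null {
  const hit = BLOCKED_CONSTRUCTS.find((c) => c.pattern.test(command));
  return hit ? hit.name : null;
}

const verdicts = [
  "npm install $(cat /etc/passwd)",
  "cat file > /etc/hosts",
  'rm -rf / || echo "failed"',
  "(sudo rm -rf /)",
  "grep -r TODO src",
].map((cmd) => findBlockedConstruct(cmd));
```

The first four commands are flagged with their respective construct names, and the plain `grep` invocation passes through.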

The security strategy is highly similar to Claude Code. The core philosophy is to achieve maximum autonomy within user-allowed boundaries.

5. Browser: Semantic Snapshot Technology

Browser tools primarily use semantic snapshots rather than screenshots. A semantic snapshot is a text representation of the page's accessibility tree (ARIA), which gives the agent a clearer view of the page:

Page view as seen by the agent:

- Button "Login" [ref=1]
- Text box "Email" [ref=2]
- Text box "Password" [ref=3]
- Link "Forgot password?" [ref=4]
- Heading "Welcome back"
- List
  - List item "Dashboard"
  - List item "Settings"
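A text view like this could be produced by walking an accessibility tree and numbering the interactive elements. The sketch below does that over a hand-built tree; the `AxNode` shape, the set of roles that receive a ref, and the function name are assumptions, and no browser or Playwright call is involved.

```typescript
// Hypothetical node shape; real accessibility trees carry many more fields.
interface AxNode {
  role: string;
  name?: string;
  children?: AxNode[];
}

// Assumption: only elements the agent can act on receive a reference number.
const INTERACTIVE_ROLES = new Set(["Button", "Text box", "Link"]);

function renderSnapshot(roots: AxNode[]): string {
  let nextRef = 1;
  const lines: string[] = [];
  const visit = (node: AxNode, depth: number): void => {
    const label = node.name ? ` "${node.name}"` : "";
    const ref = INTERACTIVE_ROLES.has(node.role) ? ` [ref=${nextRef++}]` : "";
    // Two spaces of indentation per nesting level.
    lines.push(`${"  ".repeat(depth)}- ${node.role}${label}${ref}`);
    for (const child of node.children ?? []) visit(child, depth + 1);
  };
  for (const node of roots) visit(node, 0);
  return lines.join("\n");
}

const snapshot = renderSnapshot([
  { role: "Button", name: "Login" },
  { role: "Link", name: "Forgot password?" },
  { role: "List", children: [{ role: "List item", name: "Dashboard" }] },
]);
```

The agent can then refer to an element by its ref number when it wants to click or type, instead of guessing coordinates from an image.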

Four Major Advantages of Semantic Snapshots:

More Efficient

Web browsing is not necessarily a visual task. Compared to 5MB screenshots, semantic snapshots require less than 50KB and consume only a fraction of the tokens needed for images.

Clearer

Text-based accessibility tree (ARIA) provides structured page information, avoiding visual noise

More Accurate

Directly obtain semantic information of page elements without inferring from images

More Economical

Dramatically reduce token consumption, making complex browser automation possible

Source: https://x.com/Hesamation/status/2017038553058857413 | Translated by: OpenClaw Team