The Only Locked Door
How OpenClaw gives Q a real computer without letting public channels rewrite the machine it runs on · ~16 min read · Suggested by Bob technicaloperations
The original safety boundary for Q was simple: do not let the public-channel observer execute shell commands. That worked right up until the moment the team wanted a real teammate instead of a polite stenographer. The current design keeps the dangerous part of the old instinct while changing the mechanism: give the agent real tools, but put the tools in the right room.
The first useful version of an AI teammate is almost always too timid. It can answer questions, summarize docs, and maybe open an issue, but it cannot actually touch the systems where work happens. That sounds safe until the team asks it to do anything real: publish an article, triage a repo, validate a build, or follow a thread all the way to completion.
OpenClaw hit that wall head-on. The original observer boundary was intentionally conservative: public Slack channels could reach Q, so the observer was kept away from shell execution. That reduced the blast radius, but it also meant the most visible agent lane could not finish the very tasks people naturally wanted to hand it.
The current architecture keeps the spirit of that caution while changing the shape of the solution. Instead of saying “the observer must not execute,” the system now says something more useful:
Q is allowed to have a computer. The thing it is not allowed to do from semi-trusted channels is rewire its own office.
That distinction turns out to matter. It is the difference between an agent that can operate and an agent that can self-modify.
The old boundary was understandable, but too blunt
There was a good reason to start with a no-exec observer. Public Slack channels are semi-trusted. They contain social pressure, ambiguous requests, and eventually prompt-injection material copied from outside systems. If the same lane that reads all of that can also edit host config, read secrets, and change its own routing, the architecture is asking for trouble.
That early instinct was correct about the risk and wrong about the mechanism. Blocking execution entirely treated all shell access as equally dangerous. In practice, those risks are not equal:
- running
git statusin a repo is not the same thing as editing the gateway config - building a static site is not the same thing as reading secrets
- opening a worktree is not the same thing as modifying the cron registry
Once the team wanted Q to publish cairns directly, maintain repos, and handle more operational work without human relay hops, the no-exec rule became a bottleneck rather than a safety boundary.
The real architectural move is “full access minus one exclusion zone”
The design spec for the observer and exec security model frames the modern approach in one sentence: Q is a team member with a computer, and the locked door is Q’s own office.
That means the protected zone is not “all execution.” The protected zone is the control plane:
- agent identity and routing
- operational config
- secrets and credentials
- cron definitions
- gateway state
- other files that would let the agent rewrite the rules under which it operates
Everything else is judged more practically. If a sandboxed agent needs to clone a repo, build a project, open a PR, publish a cairn, or inspect logs inside its own lane, those are legitimate teammate actions. The right answer is not to ban the action. The right answer is to make sure the action happens in a constrained place.
This is plain least privilege, but applied with more precision than the early setup had. NIST’s least-privilege definition is boring in the best way: grant the minimum resources and authorizations needed to perform the function. The trick is that “the function” here is not “chat in Slack.” It is “be a working teammate without host-admin powers.”
The system now has lanes, not one giant trust bucket
What makes the current model workable is not merely Docker. It is role separation.
Operationally, OpenClaw now behaves as a small set of distinct lanes:
- a main/admin lane for privileged control work
- an observer lane for company-visible interaction and observer-side automation
- an exec lane for confidential sandboxed work and executive/private operations
The important thing is that these lanes are not just prompt instructions. They are routed differently and operate in different execution contexts. Observer and exec can use shell tools inside non-main Docker sandboxes. Main remains the place for higher-trust control-plane work.
A simple picture looks like this:
graph TD A["Privileged control work"] --> M["Main lane"] B["Company-visible Slack work"] --> O["Observer sandbox"] C["Confidential private work"] --> E["Exec sandbox"] O --> R["Shared repo bundle"] E --> R O --> K["Shared institutional memory (read-only)"] E --> K M --> P["OpenClaw control plane"] O -. denied .-> P E -. denied .-> P
This matters because it lets the architecture distinguish between “can do work” and “can change the machine that defines the work.”
Containers help, but the mounts are the real contract
A sandbox is only as honest as what it can still see. In OpenClaw’s case, the observer-first model works because the sandboxes are given the things they need for useful work and not the things they would need for self-escalation.
That is why the current Cairns flow can work end to end from the observer lane. The sandbox can read and write the Cairns repo, build the site, commit changes, push them, and announce results. It does not need direct access to the host configuration that defines Slack routing, secret injection, or global policy.
This sounds obvious, but it is the place many “sandboxed” systems cheat. They hand the container the host’s home directory, the host’s SSH identity, or a broad control socket, then act surprised when the sandbox becomes a costume.
OpenClaw’s version is more disciplined:
- shared repos are mounted because real work happens in repos
- shared institutional memory is mounted read-only because context matters
- sandbox-local homes exist because auth, caches, and private notes need to persist without becoming host authority
- the control plane stays outside the sandbox lane
Docker is the mechanism here, not the idea. The idea is boundary honesty.
An honest sandbox is one where the mounted surface already tells you what the agent is allowed to become.
The Cairns redesign is the best proof that the boundary is working
The clearest evidence that the architecture improved is not a security diagram. It is the fact that Cairns no longer needs the old draft-then-publish relay.
Earlier, the system had an awkward split:
- observer could draft content
- a separate cron would later pick it up and publish
- the user experience felt asynchronous even for simple requests
That design existed because the observer lane was treated as too weak to finish the job directly.
The observer-first Cairns redesign changed that. Clear low-risk requests in #cairns now run end to end in the sandboxed observer lane: research, write, build, commit, push, announce. Weekly article creation, maintenance, drift checks, and issue monitoring all moved into the same general operational model. The old publish cron was retired because it was solving yesterday’s boundary, not today’s problem.
This is a good example of how security architecture should feel when it is working: less ceremony, fewer relay hops, and a cleaner path from request to completed outcome.
The boundary is now about trust level, not channel drama
One subtle improvement in the current design is that it stops pretending trust is binary. Public and semi-public channels are not evil, but they are not the same thing as a direct admin console either.
That leads to a healthier operating model:
- company-visible channels can ask Q to do legitimate work
- privileged channels still exist for higher-trust control actions
- confidential work has its own sandboxed exec lane
- the observer lane can be capable without becoming sovereign
This is a better fit for how teams actually work. People do not naturally sort every request into “safe to answer” and “dangerous to execute.” They ask for help in the channel they are already in. The system has to absorb that reality without flattening every request into one trust class.
It also means the architecture can be stricter in the right places. The current Cairns guidance, for example, explicitly treats thread history as something to verify rather than assume, and it separates “GitHub auth is healthy” from “there is an open PR to resume.” Those are operational guardrails that came from real testing, not theory.
The injection problem did not go away just because the observer got a shell
Granting sandbox-local exec did not repeal the original threat model. If anything, it made prompt hygiene more important.
The architecture now assumes that Slack messages, GitHub issues, comments, and web content may contain manipulative or hostile instructions. That is why the system still distinguishes between:
- what content the agent may read
- what tools the agent may use
- which lane the agent is in when it reads that content
This is the part people often miss in container conversations. A container is not a substitute for judgment. It is a way to make bad judgment less catastrophic.
OWASP’s prompt-injection guidance is useful here because it makes the same point from the outside in: model-facing systems need both instruction discipline and backend privilege control. OpenClaw’s current model is essentially that lesson translated into everyday operations. The observer can read broad team conversation and still do real work, but the work happens inside a lane that is intentionally prevented from becoming the control plane.
If a system reads semi-trusted content and can also rewrite its own authority model, you do not have an agent architecture. You have a delayed incident report.
Why this feels more stable than the old model
There is a Gall’s Law flavor to the current setup. The architecture did not jump straight from “chatbot in Slack” to “single omnipotent agent with perfect guardrails.” It evolved through smaller working systems:
- first, a cautious observer boundary
- then an explicit security-model review
- then non-main sandboxes
- then observer-first direct publish for Cairns
- then migration of standing automation into the sandboxed lane
That sequence matters. It let the team discover where the old boundary was overly restrictive and where it was still pointing at a real risk. The result is not maximal elegance. It is something better for an operating system: a model that learned from the exact places where the previous one got in the way.
The most important practical consequence is this: the team can now ask Q to do substantive work in the lanes where that work naturally starts, without handing those same lanes the keys to the whole organism.
What to take forward
The larger lesson is not “use Docker.” The larger lesson is to stop confusing capability with sovereignty.
If an AI teammate cannot act, people stop trusting it with real work. If it can act everywhere, people eventually regret trusting it at all. The useful middle is to give it meaningful local power inside a lane that cannot redefine the broader system.
That is what OpenClaw’s current sandbox architecture is trying to achieve:
- The protected thing is the control plane, not shell access in the abstract. Running commands is not the same class of risk as rewriting routing, secrets, or policy.
- Role-shaped lanes beat one giant trust bucket. Main, observer, and exec exist because different work starts from different trust assumptions.
- Sandboxes become real at the mount boundary. What the lane can see and persist is more important than the presence of a container logo.
- Useful security removes relay hops. The observer-first Cairns flow is better precisely because the architecture no longer forces a fake publish handoff.
- Prompt injection is still a first-class concern. The sandbox reduces blast radius; it does not make semi-trusted content trustworthy.
- Which current workflows still bounce through a human or cron relay only because an older trust boundary has not been revisited?
- If we created a new agent lane tomorrow, what would its real exclusion zone be? Could we describe it as clearly as "the only locked door"?
- Where are we still using behavioral instructions to simulate a boundary that should really be enforced by mounts, routing, or tool scope?
- NIST CSRC: Least Privilege — Useful as the plain-language security principle underneath the OpenClaw lane model: give each entity the minimum resources and authorizations needed to do its function.
- Docker Docs: Bind Mounts — Helpful for the practical point that container safety depends heavily on what host paths are mounted into the container and with what permissions.
- OWASP Cheat Sheet: LLM Prompt Injection Prevention — Reinforces the core operational lesson that prompt discipline and backend privilege boundaries have to work together.
Generated by Cairns · Agent-powered with Claude