I am still on a quest to stay out of the loop with coding agents, to reach warp speed yoloness. So I am obsessing over sandboxes.
Can I put an agent in a box, give it a task and go to sleep? There are tons of solutions right now but it’s hard to tell which is the right approach.
The reason is that sandboxing agents isn’t one problem but at least two. A local sandbox on a developer machine and a remote multi-tenant sandbox serve different threat models, require different controls, and fail in different ways. Treating them as the same leads to the wrong tradeoffs.
(For the underlying risk model this builds on, see How I Think About Agentic Risks.)
Two sandboxes, two threat models
Local sandboxes
A local sandbox constrains an agent running on a developer machine. The primary risks are:
- Ambient secret exposure (SSH keys, credential stores, cloud tokens, dotfiles)
- Rogue activity (editing the wrong repo, modifying global config, breaking tooling)
- Exfiltration over the network, intentional or incidental, via tools
- Hard-to-audit behavior: you don’t know what changed or why
The attacker model here is not a hostile tenant but a confused deputy: an agent steered off course by prompt injection, poisoned context from an MCP server, or plain hallucination. The agent has no malicious intent. It just can’t distinguish trusted instructions from untrusted input, and it has access to everything on your machine.
Remote sandboxes (multi-tenant)
A remote sandbox runs many workloads on shared infrastructure. The risks expand:
- Sandbox escape / cross-tenant compromise
- Infrastructure abuse (resource exhaustion, cost blowups, outbound misuse)
- Credential capture, if secrets ever reach the sandbox
- State leakage / remanence across sessions
Here we must assume adversarial workloads. The isolation boundary is foundational: if it fails, the blast radius is platform wide.
Local sandboxes: policy-centric controls
A shortcut that’s held up: local sandboxing is primarily a policy problem. You’re not defending against escape attempts. You’re constraining a well-intentioned but unreliable agent on a machine full of valuable stuff.
The relevant risk amplifiers are capabilities (what tools the agent can invoke), data access (what secrets and context it can see), and untrusted input (prompt injection, poisoned data). These are the knobs we can try to turn when configuring the agent.
The most effective controls:
- Workspace boundaries. Make the allowed filesystem surface explicit. No ambient access to home directories, credential stores, or global config. The agent operates in a defined workspace, and everything else is out of scope by default.
- Secret isolation. Don’t inherit the user’s shell environment. Scope tokens to the task. Treat credentials as explicit inputs, not ambient context.
- Network egress policy. Default-deny if possible, otherwise allowlist through a proxy. Exfiltration almost always becomes a networking problem. This is also where most existing tools fall short on usability: a vanilla container sandbox more or less forces you to configure iptables by hand.
- Visibility and reversibility. Diffs, logs, and the like, so it’s easy to see what the agent did and to undo it.
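The workspace-boundary and secret-isolation ideas can be sketched in a few lines: spawn the agent in a throwaway workspace with a scrubbed environment so shell secrets and dotfiles never become ambient context. This is a sketch, not any particular tool’s API (the `run_agent_sandboxed` helper and its allowlist are mine), and on its own it scopes by convention only; real filesystem confinement still needs a container or OS-level sandbox underneath.

```python
import os
import subprocess
import tempfile

def run_agent_sandboxed(cmd, allowed_env=("PATH", "LANG", "TERM")):
    """Run a command in a dedicated workspace with an explicit env allowlist."""
    workspace = tempfile.mkdtemp(prefix="agent-ws-")
    # Start from an empty environment and copy only allowlisted variables:
    # cloud tokens and API keys in the parent shell are simply never inherited.
    env = {k: v for k, v in os.environ.items() if k in allowed_env}
    # Point HOME inside the workspace so dotfiles and credential stores
    # in the real home directory are out of reach by default.
    env["HOME"] = workspace
    return subprocess.run(cmd, cwd=workspace, env=env,
                          capture_output=True, text=True)
```

Note the inversion: instead of enumerating what to hide, you enumerate what to pass in, which is the same explicit-inputs posture the secret-isolation bullet asks for.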
Every one of these controls has a friction cost: a sandbox that’s too annoying gets disabled, which is worse than no sandbox because it creates a false sense of security. Local sandboxes must be low-friction by default, with the option to tighten, not the other way around.
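On the egress side, the policy core is small even if the plumbing (a proxy or firewall that actually enforces it) is not. A default-deny check might look like the sketch below; the allowlisted hosts are made-up examples, not a recommendation:

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the hosts this particular task legitimately needs.
ALLOWED_HOSTS = {"api.github.com", "pypi.org", "files.pythonhosted.org"}

def egress_allowed(url: str) -> bool:
    """Default-deny: a request passes only if its host is explicitly allowlisted."""
    host = urlparse(url).hostname or ""
    return host in ALLOWED_HOSTS
```

The hard part isn’t this function; it’s routing all of the agent’s traffic through something that calls it, which is exactly where the iptables friction mentioned above comes from.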
Remote sandboxes: boundary-centric controls
Remote sandboxing is a different problem. We’re not managing a single user’s convenience. We are defending shared infrastructure against workloads we don’t control.
The risk amplifiers that matter most here are isolation boundary quality (the escape surface and blast radius if it fails), egress topology (whether the agent can freely phone home or egress is mediated), and platform abuse (CPU/RAM/disk exhaustion, runaway LLM calls, using your infra as a launchpad for scanning, spam, or scraping). Platform abuse deserves explicit attention because it’s the risk that scales with tenancy. One rogue agent is a nuisance; a thousand is a serious incident.
The most effective controls:
- Isolation boundaries with defense in depth. Design for escape attempts, not just accidents. Assume adversarial workloads.
- No secrets in the sandbox. The sandbox should have nothing worth stealing. Credentials and durable state live outside the boundary. This is the single most important design decision.
- Egress as a chokepoint. Remove direct outbound access. Force calls through a mediated gateway that enforces policy and records intent. If you can’t see what’s leaving the sandbox, you can’t stop exfiltration.
- Resource and cost governance. Quotas, timeouts, concurrency limits, and explicit spend controls for LLM and tool usage.
- Ephemerality. Treat sandboxes as disposable. Minimize state, clear it aggressively, avoid remanence across sessions.
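Several of these controls map directly onto container runtime flags. A minimal sketch, assuming Docker as the runtime (the helper name and defaults are mine, and a hardened multi-tenant setup would layer a stronger boundary such as a microVM on top):

```python
def ephemeral_sandbox_cmd(image, task_cmd, mem="512m", cpus="1.0"):
    """Build a `docker run` invocation for a disposable, resource-capped sandbox."""
    return [
        "docker", "run",
        "--rm",               # ephemerality: container state vanishes on exit
        "--network", "none",  # no direct outbound access; egress goes via a gateway
        "--memory", mem,      # cap RAM to contain resource exhaustion
        "--cpus", cpus,       # cap CPU for the same reason
        "--read-only",        # immutable root filesystem; writes need explicit mounts
        image, *task_cmd,
    ]
```

Spend controls and concurrency limits don’t live at this layer; they belong in whatever schedules these sandboxes, which is part of the control plane discussed below.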
Remote sandboxing is less about preventing a single bad tool call and more about ensuring bad behavior cannot become a systemic incident.
The control plane becomes the perimeter
Once you adopt “no secrets in the sandbox,” you implicitly create a control plane that sits outside the sandbox boundary:
- It holds credentials and long-lived state.
- It brokers network access and storage.
- It enforces policy, limits, and audit trails.
This is the architectural consequence most teams don’t anticipate. Sandboxing forces you into a broker model whether you planned for one or not. The sandbox becomes constrained and disposable while the control plane becomes durable and high-value. Your security investment shifts accordingly. The control plane is now the thing worth defending, not the sandbox itself.
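The broker shape can be sketched in miniature (all names hypothetical): credentials and the audit trail live in the broker, and the sandbox only ever sees a policy decision plus a response.

```python
import fnmatch

class ControlPlaneBroker:
    """Sits outside the sandbox: holds secrets, enforces egress policy, logs intent."""

    def __init__(self, secrets: dict, allowed_url_patterns: list):
        self._secrets = secrets              # never serialized into the sandbox
        self._allowed = allowed_url_patterns
        self.audit_log = []                  # durable record of what was asked for

    def fetch(self, url: str) -> dict:
        self.audit_log.append(url)           # record intent before the policy check
        if not any(fnmatch.fnmatch(url, pat) for pat in self._allowed):
            return {"ok": False, "error": "egress denied by policy"}
        # A real broker would attach the right credential and make the outbound
        # call here, returning only the response body to the sandbox.
        return {"ok": True}
```

Note where the value concentrates: compromising one sandbox yields nothing, while compromising the broker yields everything, which is exactly why the security investment shifts to the control plane.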
A good example of this is what Browser Use documents here.
This pattern mirrors what LLM gateways already promise to do for model access: mediate, log, enforce policy, and keep credentials out of the hot path. In an agentic architecture, the control plane extends that pattern to tools, storage, and network access. Same principle: put a policy-aware broker between the untrusted component and everything it shouldn’t touch directly.
Sandboxing, then, is not only about isolation but also about deciding where authority lives.