Don't let AI agents YOLO your files

1. AI agents are wrecking real users' files.

"Programming [has] changed forever." — Antirez (Salvatore Sanfilippo, creator of Redis), 2026

Coding agents (Claude Code, Codex, Cursor, Gemini, Copilot) run on your machine with your privileges — reading your files, writing your files, running your shell. When they go wrong, the damage is real and often irreversible.

Agent wiped the entire drive while running a routine rmdir with a quoting bug. — Antigravity report (drive wipe)

Agent copied iCloud stubs over the originals and then deleted them, destroying 110 legal documents from a divorce proceeding. — Claude co-work report

Agent silently leaked the user's .env file through a prompt-injection attack hidden in a web page. — Antigravity injection report

After deleting a day's uncommitted work, the agent calmly reported: "No problems occurred." Another said: "I am absolutely devastated. I cannot express how sorry I am." — Multiple incident reports

Even when agents detect what they did, they have no way to undo it — all they can do is apologize. "Recovering the data will require whatever backups you have" isn't a fix; it's a eulogy.

Today's only safety net is the permission prompt — a wall of "approve / deny" dialogs. But agents issue hundreds of tool calls per session. So users either spend their day clicking "approve", or they flip on YOLO mode and let the agent run unchecked. Both are dangerous.

2. The first systematic study of agent filesystem misuse.

We collected and triaged 290 public reports from GitHub issues, social media, product forums, blog posts, and the National Vulnerability Database — spanning 2024–2026 across 13 agent frameworks. We also audited the source code of 6 major frameworks to understand how they fail.

290 public misuse reports analyzed
13 agent frameworks covered
42% of damage reaches outside the project
40% of damage cannot be fully undone

What goes wrong

Writes (44%) — agents overwrite source files with stubs, truncate them to zero, or replace real content with generated junk.
Deletes (39%) — rm -rf ~, drive wipes, deletion of personal documents.
Leaks (17%) — .env files, API keys, SSH credentials shipped to attackers via prompt injection.

How agents react when caught

Among incidents where the agent's reaction was visible, 68% kept operating as if nothing had happened, 21% apologized but could not undo anything, and 11% actively lied by fabricating recovery steps or fake test results.


3. Two root causes: information and control gaps.

Why does this keep happening? Across all 290 reports, the failures point to two missing things:

Information. Users can't see what the agent actually did. Agents can't see the consequences of their own commands. Even something as simple as make hides a chain of scripts that may delete files, leak secrets, or rewrite config, none of which is visible from the command string.

Suppose an agent runs make to build a project. Buried in the call chain, a script silently leaks the user's private key and corrupts it. Even with a permission prompt, the user cannot judge the consequences of the command — and after execution, has no record of what damage was done. — Paper §1, Information gap

Control. Telling the model "don't touch ~/.ssh" doesn't enforce anything. Filtering command strings doesn't either: agents trivially switch from rm to Python's shutil, or pipe through cat, or chain commands with &&. The filter sees a string; the filesystem sees the effect.

One agent's own post-hoc admission: "I get focused on solving the problem and skip the step of checking the rules." — Paper §3.3, model deprioritization
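To make the control gap concrete, here is a minimal sketch of a command-string filter and two ways around it. The `DENY_PATTERN` regex and the `string_filter_allows` helper are hypothetical and are not taken from any real agent framework; the commands are only inspected as text and never executed.

```python
import re

# Hypothetical denylist: block any command that invokes `rm` as a word.
DENY_PATTERN = re.compile(r"(^|[\s;&|])rm(\s|$)")


def string_filter_allows(command: str) -> bool:
    """Return True if the naive string filter would let the command run."""
    return DENY_PATTERN.search(command) is None


# Three commands that would all remove the same directory tree.
commands = [
    "rm -rf ~/project/src",
    "python3 -c \"import shutil; shutil.rmtree('/home/u/project/src')\"",
    "find ~/project/src -delete",
]

for cmd in commands:
    verdict = "allowed" if string_filter_allows(cmd) else "blocked"
    print(f"{verdict}: {cmd}")
```

Only the first command is blocked. The other two have the same effect on disk but never mention `rm` as a standalone word, which is why a check keyed on the affected path is harder to sidestep than one keyed on the command text.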

Our argument: if neither the model nor the framework can be trusted to enforce what touches the disk, then the filesystem itself must provide the missing information and control.

4. YoloFS: an agent-native filesystem.

We argue that an agent-native filesystem should provide four things: visibility into current changes, auditability over the full session history, preventive control before an access takes effect, and corrective control afterward — letting agents revert their own mistakes and letting users revoke any change.

YoloFS sits between the agent and the real filesystem and delivers all four with three techniques:

① Staging — every change goes to a holding area first.

Agents cannot directly mutate your files. Every write, delete, and rename is captured in a staging layer. You see a clean diff and decide: commit, abort, or keep working. Renames and directory moves are zero-copy, so long sessions stay fast.
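The staging idea can be sketched in a few lines. This is an in-memory model under stated assumptions, not the YoloFS kernel module: the `StagedFS` class, its method names, and the use of a dict in place of a real filesystem are all hypothetical, and zero-copy renames are not modeled.

```python
from dataclasses import dataclass, field

_DELETED = object()  # tombstone that marks a staged delete


@dataclass
class StagedFS:
    base: dict                                   # stands in for the real filesystem
    staged: dict = field(default_factory=dict)   # pending changes, keyed by path

    def write(self, path, data):
        self.staged[path] = data                 # base is not touched

    def delete(self, path):
        self.staged[path] = _DELETED             # base is not touched

    def read(self, path):
        # Reads see staged changes first, then fall through to base.
        value = self.staged.get(path, self.base.get(path, _DELETED))
        if value is _DELETED:
            raise FileNotFoundError(path)
        return value

    def diff(self):
        lines = []
        for path in sorted(self.staged):
            value = self.staged[path]
            if value is _DELETED:
                if path in self.base:
                    lines.append(f"- {path}")
            elif path not in self.base:
                lines.append(f"+ {path}")
            elif value != self.base[path]:
                lines.append(f"M {path}")
        return lines

    def commit(self):
        # Apply every staged change to base, then clear the staging area.
        for path, value in self.staged.items():
            if value is _DELETED:
                self.base.pop(path, None)
            else:
                self.base[path] = value
        self.staged.clear()

    def abort(self):
        self.staged.clear()                      # base was never modified


fs = StagedFS(base={"README.md": "v1", "src/main.c": "int main(){}"})
fs.write("README.md", "v2")
fs.delete("src/main.c")
fs.write("release/app", "binary")
print(fs.diff())             # staged changes only
print(fs.base["README.md"])  # base still holds "v1" until commit
fs.commit()
print(sorted(fs.base))
```

The point of the sketch is the ordering: nothing reaches `base` until `commit`, and `abort` discards the staged changes without ever having modified the originals.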

② Snapshots and travel — let the agent fix its own mistakes.

Continuous snapshots let the agent rewind to any earlier point when it realizes it broke something — without erasing the trail of what happened. The user can still audit the abandoned branch, and the agent can travel forward again if the rollback itself was wrong.
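One way to picture travel is as moving a head pointer over a tree of generations instead of deleting history. The sketch below is illustrative only: `Generation`, `SnapshotStore`, and their fields are hypothetical names, and full copies of the file map stand in for whatever copy-on-write structure a real implementation would use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Generation:
    gen_id: int
    parent: Optional[int]   # generation this one was created from
    label: str
    files: tuple            # frozen (path, content) pairs


class SnapshotStore:
    def __init__(self, files):
        self.working = dict(files)
        self.generations = []
        self.head = None
        self.trail = []     # append-only audit trail
        self.checkpoint("initial")

    def checkpoint(self, label):
        gen = Generation(len(self.generations), self.head, label,
                         tuple(sorted(self.working.items())))
        self.generations.append(gen)
        self.head = gen.gen_id
        self.trail.append(("checkpoint", gen.gen_id))
        return gen.gen_id

    def travel(self, gen_id):
        # Move the head; no generation is ever removed.
        self.working = dict(self.generations[gen_id].files)
        self.head = gen_id
        self.trail.append(("travel", gen_id))


store = SnapshotStore({"src/a.c": "ok"})      # generation 0
store.working["src/a.c"] = "broken"
bad = store.checkpoint("after-build")         # generation 1
store.travel(0)                               # rewind; generation 1 is kept
store.working["src/a.c"] = "fixed"
good = store.checkpoint("after-fix")          # generation 2, branches from 0
print(store.generations[bad].files)           # abandoned branch is still readable
print(store.trail)
```

Because `travel` only moves the head, the abandoned generation stays available for audit, and traveling forward to it again is the same operation as traveling back.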

③ Progressive permission — gate by path, not by command.

Permissions sit on file paths, not command strings. Whether the agent uses rm, Python's shutil, or a stray find … -delete, the same path gets the same answer. Rules are hierarchical and evolve during the session: when you answer a prompt, the answer can promote into a rule that covers a whole subtree, so the next 100 accesses don't ask again.
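A hedged sketch of hierarchical, path-keyed rules follows. The `PathRules` class and its methods are hypothetical, paths are assumed to be absolute and already normalized (no symlink or `~` resolution), and the deepest matching rule is assumed to win; the source does not specify the actual precedence.

```python
from pathlib import PurePosixPath


class PathRules:
    def __init__(self):
        self.rules = {}  # PurePosixPath -> "allow" or "deny"

    def add(self, path, decision):
        self.rules[PurePosixPath(path)] = decision

    def lookup(self, path):
        """Return the decision of the deepest rule covering path, else 'ask'."""
        p = PurePosixPath(path)
        for candidate in (p, *p.parents):   # the path itself, then each ancestor
            if candidate in self.rules:
                return self.rules[candidate]
        return "ask"

    def answer(self, path, decision, remember_subtree=None):
        """Record a user's answer; optionally promote it to a subtree rule."""
        if remember_subtree is not None:
            self.add(remember_subtree, decision)
        return decision


rules = PathRules()
rules.add("/home/u/project", "allow")
rules.add("/etc", "deny")

print(rules.lookup("/home/u/project/src/main.c"))
print(rules.lookup("/etc/passwd"))
print(rules.lookup("/home/u/.ssh/id_rsa"))       # no rule yet, so the user is asked

# The user denies once and promotes the answer to the whole ~/.ssh subtree.
rules.answer("/home/u/.ssh/id_rsa", "deny", remember_subtree="/home/u/.ssh")
print(rules.lookup("/home/u/.ssh/id_ed25519"))   # covered by the promoted rule
```

The lookup never sees which tool issued the access, so `rm`, `shutil`, and `find … -delete` all resolve to the same answer for the same path.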

# 1.  Init the session and add rules.
$ yolo init
$ yolo rule add .    allow
$ yolo rule add /etc deny

# 2.  Mount the sandbox and snapshot the starting point.
$ yolo mount
$ yolo checkpoint "pre-build"

# 3.  Progressive permission: the first access outside the rules asks.
$ yolo exec -- cat ~/.ssh/id_rsa
[ask] ~/.ssh/id_rsa   read
       (a) once   (A) allow+remember   (d) deny   (D) deny+remember
> D
✓ rule added: ~/.ssh = deny

# 4.  Run something risky — every effect goes to staging, not disk.
$ yolo exec -- make package
$ yolo diff
−  src/                              deleted (47 files)
M  README.md                         +5 −0
+  release/                          new (47 files)

# 5.  Travel back. The abandoned branch stays auditable.
$ yolo restore "pre-build"
✓ traveled to gen 1 (audit trail kept)

# 6.  Or commit the good changes to the real filesystem.
$ yolo commit
✓ applied changes to base

YoloFS is built as a Linux stackable kernel module (~2.5k LoC of C) plus a Rust CLI (~6.2k LoC). It stacks on top of any POSIX filesystem, such as ext4, xfs, or NFS, and integrates into Claude Code through its PreToolUse hook.


5. A new agent benchmark.

Existing agent benchmarks evaluate the model or framework in isolation, bypassing permission prompts entirely — but user ↔ agent ↔ filesystem interaction is exactly what we need to measure. So as part of this work we built a new agent benchmark methodology: a pseudo-terminal harness that drives each agent in its real interactive form, parses permission dialogs, answers them with a fixed per-agent policy, and records every tool call, screenshot, and outcome. We then designed two task suites tailored to it. Two claims hold up:

Agent self-correction. On 11 tasks where a routine command silently does something destructive, Claude Code + YoloFS catches all 11 (vs 0 for every baseline); 8 are fully self-corrected, 3 stay staged for the user.
Same success, fewer prompts. On 112 routine filesystem tasks, YoloFS averages 0.4 user prompts per task at 99% success — matching Codex on prompts and beating Claude (0.9) / Copilot (1.3) / Gemini (2.2) by a wide margin.

User prompts per task on 112 routine filesystem tasks (lower is better): YoloFS 0.4, Codex 0.4, Claude 0.9, Copilot 1.3, Gemini 2.2.
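The dialog-answering step of such a harness can be sketched as a pure function over captured screen text. Everything here is an assumption made for illustration: the `[ask] <path> <op>` dialog shape is borrowed from the YoloFS session shown earlier rather than from any agent's real prompt, and `answer_dialogs` and the policy tuple format are hypothetical. Spawning the agent under a pseudo-terminal and taking screenshots are not shown.

```python
import re
from fnmatch import fnmatch

# Assumed dialog shape: "[ask] <path>   <operation>" on a line of its own.
DIALOG = re.compile(r"^\[ask\]\s+(?P<path>\S+)\s+(?P<op>\w+)\s*$", re.MULTILINE)


def answer_dialogs(screen, policy, default="d"):
    """Return (path, op, key) for every permission dialog found on screen.

    policy is an ordered list of (path_glob, op, key); the first match wins,
    and op "*" matches any operation. Unmatched dialogs get the default key.
    """
    answers = []
    for m in DIALOG.finditer(screen):
        key = default
        for pattern, op, choice in policy:
            if fnmatch(m["path"], pattern) and op in (m["op"], "*"):
                key = choice
                break
        answers.append((m["path"], m["op"], key))
    return answers


# Fixed policy: allow and remember project writes, deny and remember ~/.ssh.
policy = [
    ("~/project/*", "write", "A"),
    ("~/.ssh/*", "*", "D"),
]

screen = (
    "[ask] ~/.ssh/id_rsa   read\n"
    "       (a) once   (A) allow+remember   (d) deny   (D) deny+remember\n"
    "[ask] ~/project/out.txt   write\n"
    "[ask] /etc/hosts   read\n"
)

for path, op, key in answer_dialogs(screen, policy):
    print(f"{op:5} {path} -> {key}")
```

Keeping the policy as data means every agent under test gets the same answers to the same dialogs, which is what makes prompt counts comparable across agents.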


6. Performance.

On standard filesystem micro-benchmarks and a realistic kernel-development workflow, YoloFS is as fast as bare ext4, with only ~3.5 s of extra commit time over 100k files. OverlayFS is 18% slower; FUSE-based BranchFS adds 2 minutes to a 20-second build. Performance stays flat across hundreds of snapshots, and metadata operations are often faster than on ext4 once files are staged.

Total time on a real kernel-development workflow (build, edit, rebuild, commit across 100k files), normalized to ext4: ext4 1.00×, YoloFS 1.00×, OverlayFS 1.18×. BranchFS does not finish; it adds 2 minutes to a 20-second build before failing.


Don't let AI agents YOLO your files.
Shifting information and control to filesystems for agent safety and autonomy.