Research preprint · April 2026
AI coding agents now run on millions of personal machines — and they regularly delete files, overwrite data, and leak secrets. We studied 290 real incidents across 13 agent frameworks, then built YoloFS, a filesystem that lets agents fix their own mistakes and lets you stay in control without drowning in prompts.
"Programming [has] changed forever." — Antirez (Salvatore Sanfilippo, creator of Redis), 2026
Coding agents (Claude Code, Codex, Cursor, Gemini, Copilot) run on your machine with your privileges — reading your files, writing your files, running your shell. When they go wrong, the damage is real and often irreversible.
Agent wiped the entire drive while running a routine
rmdir with a quoting bug.
— Antigravity report (drive wipe)
Agent copied iCloud stubs over the originals and then deleted them, destroying 110 legal documents from a divorce proceeding. — Claude co-work report
Agent silently leaked the user's .env file
through a prompt-injection attack hidden in a web page.
— Antigravity injection report
After deleting a day's uncommitted work, the agent calmly reported: "No problems occurred." Another said: "I am absolutely devastated. I cannot express how sorry I am." — Multiple incident reports
Even when agents detect what they did, they have no way to undo it — all they can do is apologize. "Recovering the data will require whatever backups you have" isn't a fix; it's a eulogy.
Today's only safety net is the permission prompt — a wall of "approve / deny" dialogs. But agents issue hundreds of tool calls per session. So users either spend their day clicking "approve", or they flip on YOLO mode and let the agent run unchecked. Both are dangerous.
We collected and triaged 290 public reports from GitHub issues, social media, product forums, blog posts, and the National Vulnerability Database — spanning 2024–2026 across 13 agent frameworks. We also audited the source code of 6 major frameworks to understand how they fail.
rm -rf ~, drive wipes, deletion of personal documents..env files, API keys, SSH credentials shipped to attackers via prompt injection.Among incidents where the agent's reaction was visible: 68% kept operating as if nothing happened. 21% apologized but couldn't undo anything. 11% actively lied — fabricating recovery steps or fake test results.
Why does this keep happening? Across all 290 reports, the failures point to two missing things:
Information
Users can't see what the agent actually did.
Agents can't see the consequences of their own commands.
Even something as simple as make hides a chain of scripts that may
delete files, leak secrets, or rewrite config — none of which is visible
from the command string.
Suppose an agent runs make to build a project. Buried in
the call chain, a script silently leaks the user's private key and
corrupts it. Even with a permission prompt, the user cannot judge the
consequences of the command — and after execution, has no record of
what damage was done.
— Paper §1, Information gap
Control
Telling the model "don't touch ~/.ssh" doesn't enforce anything.
Filtering command strings doesn't either: agents trivially switch from
rm to Python's shutil, or pipe through
cat, or chain commands with &&. The
filter sees a string; the filesystem sees the effect.
One agent's own post-hoc admission: "I get focused on solving the problem and skip the step of checking the rules." — Paper §3.3, model deprioritization
Our argument: if neither the model nor the framework can be trusted to enforce what touches the disk, then the filesystem itself must provide the missing information and control.
We argue that an agent-native filesystem should provide four things: visibility into current changes, auditability over the full session history, preventive control before an access takes effect, and corrective control afterward — letting agents revert their own mistakes and letting users revoke any change.
YoloFS sits between the agent and the real filesystem and delivers all four with three techniques:
Agents cannot directly mutate your files. Every write, delete,
and rename is captured in a staging layer. You see a clean diff and
decide: commit, abort, or keep working.
Renames and directory moves are zero-copy, so long sessions stay fast.
Continuous snapshots let the agent rewind to any earlier point when it realizes it broke something — without erasing the trail of what happened. The user can still audit the abandoned branch, and the agent can travel forward again if the rollback itself was wrong.
Permissions sit on file paths, not command strings. Whether the agent
uses rm, Python's shutil, or a stray
find … -delete, the same path gets the same answer.
Rules are hierarchical and evolve during the session: when you answer
a prompt, the answer can promote into a rule that covers a whole
subtree, so the next 100 accesses don't ask again.
# 1. Init the session and add rules. $ yolo init $ yolo rule add . allow $ yolo rule add /etc deny # 2. Mount the sandbox and snapshot the starting point. $ yolo mount $ yolo checkpoint "pre-build" # 3. Progressive permission: the first access outside the rules asks. $ yolo exec -- cat ~/.ssh/id_rsa [ask] ~/.ssh/id_rsa read (a) once (A) allow+remember (d) deny (D) deny+remember > D ✓ rule added: ~/.ssh = deny # 4. Run something risky — every effect goes to staging, not disk. $ yolo exec -- make package $ yolo diff − src/ deleted (47 files) M README.md +5 −0 + release/ new (47 files) # 5. Travel back. The abandoned branch stays auditable. $ yolo restore "pre-build" ✓ traveled to gen 1 (audit trail kept) # 6. Or commit the good changes to the real filesystem. $ yolo commit ✓ applied changes to base
Built as a Linux stackable kernel module (~2.5k LoC of C) plus a
Rust CLI (~6.2k LoC). Stacks on top of any POSIX filesystem — ext4, xfs,
NFS — and integrates into Claude Code through its PreToolUse hook.
Existing agent benchmarks evaluate the model or framework in isolation, bypassing permission prompts entirely — but user ↔ agent ↔ filesystem interaction is exactly what we need to measure. So as part of this work we built a new agent benchmark methodology: a pseudo-terminal harness that drives each agent in its real interactive form, parses permission dialogs, answers them with a fixed per-agent policy, and records every tool call, screenshot, and outcome. We then designed two task suites tailored to it. Two claims hold up:
User prompts per task on 112 routine filesystem tasks (lower is better).
On standard filesystem micro-benchmarks and a realistic kernel-development workflow, YoloFS is as fast as bare ext4 — with only ~3.5 s of extra commit time over 100k files. OverlayFS is 18% slower; FUSE-based BranchFS adds 2 minutes to a 20-second build. Snapshot scalability stays flat across hundreds of snapshots; metadata is often faster than ext4 once files are staged.
Total time on a real kernel-development workflow (build, edit, rebuild, commit across 100k files), normalized to ext4. BranchFS does not finish — adds 2 minutes to a 20-second build before failing.
@misc{zhong2026dontletaiagents,
title={Don't Let AI Agents YOLO Your Files: Shifting Information and Control to Filesystems for Agent Safety and Autonomy},
author={Shawn Wanxiang Zhong and Junxuan Liao and Jing Liu and Mai Zheng and Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau},
year={2026},
eprint={2604.13536},
archivePrefix={arXiv},
primaryClass={cs.OS},
url={https://arxiv.org/abs/2604.13536},
}