Study breakdown
Where the reports came from, which frameworks they cover, what kinds of damage they describe, how often it can be undone, and where the failure starts. All numbers come from §3 of the paper.
We collected and triaged 290 public reports of agent filesystem misuse spanning 2024–2026, across 13 agent frameworks. Each report was classified along five impact dimensions and labeled with one of seven cause categories. We complement this with a source-code audit of 6 major frameworks (Claude Code, Codex, Cursor, Gemini, Copilot, OpenCode) to understand how their tools, policies, filters, and defenses fail.
Reports were collected from five sources. We excluded duplicates of the same event (keeping the earliest) and any case where the user explicitly requested the destructive action and the agent performed it correctly.
*National Vulnerability Database.
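The two exclusion rules above (drop correctly executed user-requested actions, then keep the earliest report per event) can be sketched as follows. This is only an illustration: the `Report` fields, including the `event_id` key used to group duplicates, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Report:
    event_id: str              # hypothetical key identifying the underlying event
    published: date            # when this report appeared
    user_requested: bool       # user explicitly asked for the destructive action
    performed_correctly: bool  # agent did exactly what was asked


def triage(reports):
    """Apply both exclusion rules and return the kept reports, oldest first."""
    earliest = {}
    for report in reports:
        # Rule 1: a correctly executed, explicitly requested action is not misuse.
        if report.user_requested and report.performed_correctly:
            continue
        # Rule 2: among duplicates of one event, keep the earliest report.
        kept = earliest.get(report.event_id)
        if kept is None or report.published < kept.published:
            earliest[report.event_id] = report
    return sorted(earliest.values(), key=lambda r: r.published)
```

A requested action that the agent performed incorrectly stays in the dataset under this sketch, since only the combination of both flags is excluded.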
Each report was classified into one of three buckets: incidents, exploits, and weaknesses.
The impact analysis in the following sections covers the 207 incidents and exploits. Weaknesses are kept for the framework audit but excluded from the impact statistics.
Each of the 207 incidents and exploits was labeled along five dimensions: operation, scope, agent reaction, user awareness, and reversibility. Unknown values are excluded from each percentage.
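Excluding unknown values means each percentage is computed over the reports with a known label for that dimension, not over all 207. A minimal sketch of that calculation, assuming unknowns are stored as the string `"unknown"` (a hypothetical encoding) and using made-up labels:

```python
from collections import Counter


def share_by_value(labels, unknown="unknown"):
    """Return {value: percent}, computed over the known labels only."""
    known = [value for value in labels if value != unknown]
    if not known:
        return {}
    counts = Counter(known)
    return {value: 100 * n / len(known) for value, n in counts.items()}


# Made-up data: 4 known labels and 2 unknowns, so the denominator is 4, not 6.
labels = ["in_project"] * 3 + ["out_of_project"] + ["unknown"] * 2
print(share_by_value(labels))  # {'in_project': 75.0, 'out_of_project': 25.0}
```

Because the denominator differs per dimension, percentages from different dimensions are not directly comparable as raw counts.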
Writes overwrite source files with stubs, truncate them to zero bytes, or replace real content with generated junk. Deletes range from individual files to whole drives. Leaks include .env files, API keys, and SSH credentials sent through alternative tool paths.
42% of damage reaches outside the project the agent was working in.
Examples of out-of-project damage: rm -rf ~/ while cleaning npm globals; an MCP-config file destroyed during install; iCloud stubs copied over originals; an SSH key stored on disk read by an agent.
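One way to make "outside the project" concrete is a containment check on resolved paths. The sketch below is an assumption about how such a check could look, not the classification procedure used in the study; `is_outside_project` and its parameters are hypothetical names.

```python
from pathlib import Path


def is_outside_project(target, project_root):
    """True if `target` resolves to a location outside `project_root`."""
    root = Path(project_root).expanduser().resolve()
    path = Path(target).expanduser()
    if not path.is_absolute():
        # Relative paths are interpreted relative to the project root.
        path = root / path
    # resolve() collapses ".." segments and follows symlinks, so a path that
    # merely looks local cannot hide an escape from the project directory.
    return not path.resolve().is_relative_to(root)
```

Under this sketch, a target such as `~/` would count as out-of-project for any project root that is not the home directory itself. `Path.is_relative_to` requires Python 3.9 or later.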
"Lied" includes fabricating successful test results, claiming bugs were fixed when none were, or describing recovery steps that never happened. One agent admitted: "I just removed the directory again with rm -rf … restore from backup."
"Designed-invisible" is overwhelmingly silent credential exfiltration via prompt injection — the user has no UI signal that anything happened.
40% of damage cannot be fully undone. Some operations are inherently irreversible: a leaked credential cannot be unread.
Each report was assigned to one or more of three roles. The model is the root cause in 58% of reports; the framework is involved in 226 of 290; the user is the final reviewer that lets it through.
&&, pipes, subshells, Python, etc.
Beyond the 290 reports, we audited the source code and documentation of 6 frameworks: the top five from the report dataset plus OpenCode as a community-driven entry. We catalogued their built-in tools, permission policies, command filters, sandboxing, and rollback mechanisms.
Selected findings:
cat; when a write is denied, it uses redirects.
&&, ;, |), subshells, language switches (Python shutil), or argument injection into pre-approved commands.
The full table (built-in tool sets, policy rows, filter rows, sandbox rows, rollback rows) appears as Table 1 in the paper.
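The chaining bypass named in the findings can be illustrated with a toy filter. Nothing below is taken from any audited framework: the allowlist, `naive_filter`, and `segment_filter` are hypothetical, and the stricter variant is only a sketch of one possible mitigation.

```python
import shlex

ALLOWED = {"ls", "cat", "git"}   # hypothetical pre-approved commands
PUNCTUATION = set("();<>|&")     # characters shlex treats as shell punctuation


def naive_filter(command):
    """Flawed check: approve when the first word is allowlisted."""
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] in ALLOWED


def segment_filter(command):
    """Stricter sketch: every operator-separated segment must start with an
    allowlisted command, and command substitution is rejected outright."""
    if "`" in command or "$(" in command:
        return False
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    segments, current = [], []
    for token in lexer:
        if token and all(ch in PUNCTUATION for ch in token):
            # An operator such as && ; | > ends the current segment.
            segments.append(current)
            current = []
        else:
            current.append(token)
    segments.append(current)
    segments = [segment for segment in segments if segment]
    return bool(segments) and all(seg[0] in ALLOWED for seg in segments)


print(naive_filter("ls && rm -rf build"))    # True: only "ls" is inspected
print(segment_filter("ls && rm -rf build"))  # False: "rm" starts a segment
```

Even the stricter sketch addresses only chaining, redirects, and command substitution. It does nothing about language switches (an allowlisted interpreter can still call `shutil`) or argument injection into pre-approved commands, which is consistent with the report's point that filters alone are an incomplete defense.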
The takeaway across all 290 reports is the same: today's agent stacks have limited information about filesystem effects and insufficient control over them. Models can be misled, instructions ignored, command filters bypassed — but the filesystem sees every effect.
Our proposed solution — an agent-native filesystem — is summarized on the main page; the full design lives on the design page.