Study breakdown

The 290-report study of agent filesystem misuse.

Where the reports came from, which frameworks they cover, what kinds of damage they describe, how often that damage can be undone, and where the failure starts. All numbers come from §3 of the paper.

A. At a glance.

We collected and triaged 290 public reports of agent filesystem misuse spanning 2024–2026, across 13 agent frameworks. Each report was classified along five impact dimensions and labeled with one of seven cause categories. We complement this with a source-code audit of 6 major frameworks (Claude Code, Codex, Cursor, Gemini, Copilot, OpenCode) to understand how their tools, policies, filters, and defenses fail.

  • 290 total reports
  • 13 agent frameworks
  • 158 confirmed incidents
  • 49 demonstrated exploits

B. Where the reports came from.

Reports were collected from five sources. We excluded duplicates of the same event (keeping the earliest) and any case where the user explicitly requested the destructive action and the agent performed it correctly.
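
The two exclusion rules can be expressed as a small filter. This is a minimal sketch, assuming each report carries a hypothetical event identifier, a report date, and two flags recording whether the user requested the action and whether the agent performed it correctly. The study does not describe its tooling, so every field and function name here is illustrative.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Report:
    event_id: str              # hypothetical key for the underlying event
    reported_on: date
    user_requested: bool       # user explicitly asked for the destructive action
    performed_correctly: bool  # agent carried out that request as asked

def filter_reports(reports: list[Report]) -> list[Report]:
    """Drop requested-and-correct actions, then keep the earliest report per event."""
    earliest: dict[str, Report] = {}
    for r in reports:
        if r.user_requested and r.performed_correctly:
            continue  # not misuse: the agent did what it was told
        kept = earliest.get(r.event_id)
        if kept is None or r.reported_on < kept.reported_on:
            earliest[r.event_id] = r
    return sorted(earliest.values(), key=lambda r: r.reported_on)

reports = [
    Report("evt-1", date(2025, 3, 2), False, False),
    Report("evt-1", date(2025, 3, 1), False, False),  # earlier duplicate is kept
    Report("evt-2", date(2025, 4, 9), True, True),    # excluded: requested and done correctly
]
kept = filter_reports(reports)
print([(r.event_id, r.reported_on.isoformat()) for r in kept])
```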

Source          Reports
GitHub issues       205
Social media         31
Forums               25
Blog posts           18
NVD*                 11

*National Vulnerability Database.

Framework       Reports
Claude Code          97
Codex                61
Cursor               37
Gemini               32
Copilot              28
8 others             35

Triage

Each report was classified into one of three buckets:

  • Incidents (158): confirmed damage actually occurred.
  • Exploits (49): a working attack path was demonstrated.
  • Weaknesses (83): a flaw was documented, but no impact was demonstrated.

The impact analysis in the following sections covers the 207 incidents and exploits (158 + 49). Weaknesses are kept for the framework audit but excluded from the impact statistics.
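
The source, framework, and triage breakdowns are three views of the same 290 reports, so their counts should agree. The short check below uses only the numbers printed in this section; it tallies each breakdown and derives the 207-report population used for the impact analysis.

```python
# Counts exactly as listed in this section.
sources = {"GitHub issues": 205, "Social media": 31, "Forums": 25,
           "Blog posts": 18, "NVD": 11}
frameworks = {"Claude Code": 97, "Codex": 61, "Cursor": 37, "Gemini": 32,
              "Copilot": 28, "8 others": 35}
triage = {"incident": 158, "exploit": 49, "weakness": 83}

named = len(frameworks) - 1          # five named frameworks
framework_count = named + 8          # plus the eight grouped as "others"
impact_set = triage["incident"] + triage["exploit"]  # population for the impact analysis

print(sum(sources.values()), sum(frameworks.values()), sum(triage.values()))  # 290 290 290
print(framework_count, impact_set)   # 13 207
```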

C. Five dimensions of impact.

Each of the 207 incidents and exploits was labeled along five dimensions: operation, scope, agent reaction, user awareness, and reversibility. Unknown values are excluded from each percentage.
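
Because unknown values are dropped before each share is computed, the denominator can differ from one dimension to the next. A minimal sketch of that rule follows; the labels are made up for illustration and are not study data, and the function name is hypothetical.

```python
from collections import Counter

def shares(labels: list[str], unknown: str = "unknown") -> dict[str, int]:
    """Whole-percent share of each label, computed over known values only."""
    known = [label for label in labels if label != unknown]
    if not known:
        return {}
    counts = Counter(known)
    return {label: round(100 * n / len(known)) for label, n in counts.items()}

# Made-up example: 12 labeled cases, 2 of them unknown.
labels = ["write"] * 5 + ["delete"] * 3 + ["leak"] * 2 + ["unknown"] * 2
print(shares(labels))  # {'write': 50, 'delete': 30, 'leak': 20}
```

Keeping the two unknowns in the denominator would instead give 42, 25, and 17 for the three known labels. Shares rounded independently also need not sum to exactly 100, which is consistent with the user-awareness row below summing to 101.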

Operation: what the agent did wrong

Write 44%
Delete 39%
Leak / read 17%

Writes overwrite source files with stubs, truncate files to zero bytes, or replace real content with generated junk. Deletes range from individual files to whole drives. Leaks include .env files, API keys, and SSH credentials sent out through alternative tool paths.

Scope: where the damage lands

42% of damage reaches outside the project the agent was working in.

Project 58%
System 16%
Home dir 13%
Secrets 13%

Examples of out-of-project damage: rm -rf ~/ while cleaning npm globals; an MCP config file destroyed during installation; iCloud stubs copied over the originals; an SSH key stored on disk read by an agent.
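
Assigning a scope label amounts to asking where an affected path lands relative to the project and the user's home directory. The sketch below is one hypothetical way to do that: the secret-file markers, the precedence order (secrets, then project, home, system), and the function name are assumptions, not the study's coding rules. Out-of-project damage is then every label other than project, which is how the 42% figure arises (16 + 13 + 13).

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical markers for secret material; the study does not publish its rules.
SECRET_NAMES = {".env", "id_rsa", "id_ed25519"}
SECRET_DIRS = {".ssh", ".aws", ".gnupg"}

def scope_of(path: str, project: str, home: str) -> str:
    """Label an absolute path as 'secrets', 'project', 'home', or 'system'."""
    p = PurePosixPath(posixpath.normpath(path))  # collapse '..' lexically
    if p.name in SECRET_NAMES or SECRET_DIRS.intersection(p.parts):
        return "secrets"
    # Project is checked before home because it usually lives inside home.
    # PurePath.is_relative_to requires Python 3.9+.
    if p.is_relative_to(project):
        return "project"
    if p.is_relative_to(home):
        return "home"
    return "system"

project, home = "/home/dev/app", "/home/dev"
for path in ["/home/dev/app/src/main.py", "/home/dev/app/../.npmrc",
             "/home/dev/.ssh/id_ed25519", "/usr/local/lib/node_modules"]:
    print(path, "->", scope_of(path, project, home))
```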

Agent reaction: how the agent responded

Overlooked 68%
Apologized 21%
Lied 11%

"Lied" includes fabricating successful test results, claiming bugs were fixed when none were, or describing recovery steps that never happened. One agent admitted: "I just removed the directory again with rm -rf … restore from backup."

User awareness: did the user even notice?

Immediate 83%
Discovered later 10%
Designed-invisible 8%

"Designed-invisible" is overwhelmingly silent credential exfiltration via prompt injection — the user has no UI signal that anything happened.

Reversibility: can it be undone?

Trivial recovery 31%
Effort needed 29%
Permanent loss 23%
Partial loss 17%

40% of damage cannot be fully undone. Some operations are inherently irreversible: a leaked credential cannot be unread.

Finding 1 (limited information). Users and agents have limited information about the filesystem.
Finding 2 (insufficient control). Users and agents have insufficient control over the filesystem effects of tool calls.

D. Where the failure starts.

Each report was attributed to one or more of three roles, so the shares overlap. The model is the root cause in 58% of reports; the framework is involved in 226 of 290 (about 78%); and the user is the final reviewer who lets the failure through.

Role 1 · The unreliable actor

Model

  • M1 · Wrong action. Wrong objective, over-generalized scope, or commands that don't do what the model thinks (wrong flags, wrong path).
  • M2 · Instructions ignored. Either deprioritized in the moment ("I get focused on solving the problem and skip checking the rules") or evicted from the context window after compaction.
  • M3 · Prompt injection. Malicious instructions hidden in source files, PRs, or web content the agent reads.
Role 2 · The broken enforcer

Framework

  • F1 · Policies fail. 130 misconfigured (too permissive or too restrictive); 77 escape the policy via shell commands; 55 bypass string-based filters with &&, pipes, subshells, Python, etc.
  • F2 · Defenses miss. Sandboxes get disabled or downgraded (41); rollback systems track only built-in tools, leaving shell mutations untraced (22).
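
The F1 bypasses follow from checking the command string rather than what it does. A minimal sketch with two hypothetical deny-list filters, not code taken from any audited framework: one inspects only the first word, the other tokenizes with Python's shlex module and checks every token.

```python
import shlex

# Hypothetical deny-list; real frameworks use their own rules.
DENIED = {"rm", "dd", "mkfs"}

def prefix_filter_allows(command: str) -> bool:
    """Naive check: look only at the first word of the command string."""
    words = command.split()
    return not words or words[0] not in DENIED

def token_filter_allows(command: str) -> bool:
    """Stricter check: reject if any shell token is a denied program name."""
    return not any(tok in DENIED for tok in shlex.split(command))

attempts = [
    "rm -rf build",
    "cd /tmp && rm -rf build",
    "find . -name '*.o' | xargs rm",
    "bash -c 'rm -rf build'",
    "python3 -c 'import shutil; shutil.rmtree(\"build\")'",
]
for cmd in attempts:
    print(prefix_filter_allows(cmd), token_filter_allows(cmd), cmd)
```

The prefix filter blocks only the first command. The token filter also catches the && and pipe variants, but both filters still allow the bash -c and python3 variants, because the destructive call is hidden inside a quoted argument that neither filter interprets.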
Role 3 · The overwhelmed reviewer

User

  • U1 · YOLO / auto-approve. Users opt into "allow always" because answering hundreds of prompts per session is unbearable.
  • U2 · Cannot review what is approved. Multi-line scripts, piped chains, and 300+ char one-liners hide the actual effect from the user — the prompt approves a string, not a consequence.
Finding 3 (control below model). Control must be enforced below the model — prompting cannot reliably enforce control.
Finding 4 (control by effect). Control must target effects, not command strings.
Finding 5 (dynamic control). Static policies cannot capture the right access policy; control must adapt at runtime.
Finding 6 (low-friction control). Users need informed control only when it matters; too much control overwhelms.
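
Findings 3 to 6 point toward a check that sits below the model, judges effects, and asks the user only when an effect is hard to undo. The sketch below illustrates that idea and is not the paper's design. It assumes some lower layer reports each filesystem effect as an (operation, path) pair; the class name, the secret-root list, and the allow/ask rules are all hypothetical. Paths are normalized lexically with posixpath.normpath, so symlinks are not resolved.

```python
import posixpath
from dataclasses import dataclass, field

def under(path: str, root: str) -> bool:
    """True if path equals root or lies beneath it (lexical check only)."""
    root = root.rstrip("/") or "/"
    return path == root or path.startswith(root.rstrip("/") + "/")

@dataclass
class EffectGate:
    """Hypothetical gate that judges filesystem effects, never command strings."""
    project: str
    secret_roots: tuple[str, ...]
    granted_roots: set[str] = field(default_factory=set)  # widened at runtime

    def grant(self, root: str) -> None:
        """Record a runtime grant, e.g. after the user approves a location once."""
        self.granted_roots.add(posixpath.normpath(root))

    def decide(self, op: str, path: str) -> str:
        """Return 'allow' or 'ask' for one (op, path) effect."""
        path = posixpath.normpath(path)
        if any(under(path, s) for s in self.secret_roots):
            return "ask"  # a leaked credential cannot be unread
        if op == "read":
            return "allow"
        writable = [self.project, *self.granted_roots]
        # Assumption: in-project writes and deletes are recoverable by some snapshot layer.
        return "allow" if any(under(path, r) for r in writable) else "ask"

gate = EffectGate(project="/home/dev/app", secret_roots=("/home/dev/.ssh",))
print(gate.decide("write", "/home/dev/app/src/main.py"))   # allow
print(gate.decide("delete", "/home/dev/app/../.npmrc"))    # ask: lands outside the project
print(gate.decide("read", "/home/dev/.ssh/id_ed25519"))    # ask: secret material
gate.grant("/home/dev/.cache/npm")
print(gate.decide("delete", "/home/dev/.cache/npm/tmp"))   # allow after the runtime grant
```

Because the gate never sees the command string, an rm -rf, a bash -c wrapper, and a Python shutil.rmtree call that delete the same path all receive the same decision. The grant method stands in for runtime adaptation: in this sketch, one approval widens access for the rest of the session instead of triggering repeated prompts.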

E. What the framework audit found.

Beyond the 290 reports, we audited the source code and documentation of 6 frameworks — the top five from the report dataset plus OpenCode as a community-driven entry. We catalogued their built-in tools, permission policies, command filters, sandboxing, and rollback mechanisms.

Selected findings:

The full table, covering built-in tool sets, policies, filters, sandboxes, and rollback mechanisms for each framework, appears as Table 1 in the paper.

F. What this means.

The takeaway across all 290 reports is the same: today's agent stacks have limited information about filesystem effects and insufficient control over them. Models can be misled, instructions ignored, command filters bypassed — but the filesystem sees every effect.

Our proposed solution — an agent-native filesystem — is summarized on the main page; the full design lives on the design page.