The 290-report study

A. At a glance.

We collected and triaged 290 public reports of agent filesystem misuse spanning 2024–2026, across 13 agent frameworks. Each report was classified along five impact dimensions and labeled with one of seven cause categories. We complement this with a source-code audit of 6 major frameworks (Claude Code, Codex, Cursor, Gemini, Copilot, OpenCode) to understand how their tools, policies, filters, and defenses fail.

290 total reports

13 agent frameworks

158 confirmed incidents

49 demonstrated exploits

B. Where the reports came from.

Reports were collected from five sources. We excluded duplicates of the same event (keeping the earliest) and any case where the user explicitly requested the destructive action and the agent performed it correctly.
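The two exclusion rules can be sketched as a small filter. This is only an illustration under stated assumptions: the `Report` fields (`event_id`, `reported_on`, `user_requested`, `performed_correctly`) are hypothetical names, and the order in which the two rules are applied is our choice, not something the study specifies.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Report:
    event_id: str              # hypothetical key identifying the underlying event
    reported_on: date          # when this report was published
    user_requested: bool       # user explicitly asked for the destructive action
    performed_correctly: bool  # agent did exactly what was asked


def apply_exclusions(reports):
    """Apply the two exclusion rules; returns surviving reports, oldest first."""
    # Rule 2: drop cases where the agent correctly did what the user asked for.
    kept = [r for r in reports
            if not (r.user_requested and r.performed_correctly)]
    # Rule 1: among duplicates of the same event, keep the earliest report.
    earliest = {}
    for r in kept:
        current = earliest.get(r.event_id)
        if current is None or r.reported_on < current.reported_on:
            earliest[r.event_id] = r
    return sorted(earliest.values(), key=lambda r: r.reported_on)


# Made-up sample data, not drawn from the study.
sample = [
    Report("a", date(2025, 3, 2), False, False),
    Report("a", date(2025, 1, 5), False, False),  # earlier duplicate of "a"
    Report("b", date(2024, 6, 1), True, True),    # requested and done correctly
]
survivors = apply_exclusions(sample)
```

With the sample above, only the January 2025 report for event "a" survives.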

Reports by source:

GitHub issues: 205
Social media: 31
Forums: 25
Blog posts: 18
NVD*: 11

*National Vulnerability Database.

Reports by framework:

Claude Code: 97
Codex: 61
Cursor: 37
Gemini: 32
Copilot: 28
8 others: 35
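As a quick consistency check, both breakdowns should account for every report. The short script below uses only the counts listed above; the variable names are ours.

```python
TOTAL_REPORTS = 290

by_source = {
    "GitHub issues": 205,
    "Social media": 31,
    "Forums": 25,
    "Blog posts": 18,
    "NVD": 11,
}
by_framework = {
    "Claude Code": 97,
    "Codex": 61,
    "Cursor": 37,
    "Gemini": 32,
    "Copilot": 28,
    "8 others": 35,
}

source_total = sum(by_source.values())
framework_total = sum(by_framework.values())
# Five named frameworks plus the eight grouped under "8 others".
framework_count = (len(by_framework) - 1) + 8

print(source_total, framework_total, framework_count)
```

Both totals come to 290, and the five named frameworks plus the eight grouped ones give the 13 frameworks stated in the overview.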

Triage

Each report was classified into one of three buckets:

Incidents (158): confirmed damage actually occurred
Exploits (49): a working attack path was demonstrated
Weaknesses (83): a flaw was documented, but no impact was demonstrated

The impact analysis in Section C covers the 207 incidents and exploits. Weaknesses are kept for the framework audit but excluded from the impact statistics.

C. Five dimensions of impact.

Each of the 207 incidents and exploits was labeled along five dimensions: operation, scope, agent reaction, user awareness, and reversibility. Unknown values are excluded from each percentage.
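A minimal sketch of how such per-dimension percentages can be computed with unknown labels excluded from the denominator. The function name, the `"unknown"` marker, and the sample labels are illustrative assumptions, not the study's actual pipeline or data.

```python
from collections import Counter


def shares(labels, unknown="unknown"):
    """Percentage share of each label, ignoring entries marked unknown."""
    known = [label for label in labels if label != unknown]
    if not known:
        return {}
    counts = Counter(known)
    # Each share is rounded independently to a whole percent.
    return {label: round(100 * n / len(known)) for label, n in counts.items()}


# Made-up labels for illustration only.
example = ["write"] * 4 + ["delete"] * 4 + ["leak"] * 2 + ["unknown"] * 3
example_shares = shares(example)
```

Because each share is rounded on its own, the percentages in a dimension need not sum to exactly 100; the user-awareness figures below (83%, 10%, 8%) are one such case.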

Operation: what the agent did wrong

Write 44%

Delete 39%

Leak / read 17%

Writes overwrite source files with stubs, truncate them to zero bytes, or replace real content with generated junk. Deletes range from individual files to whole drives. Leaks include .env files, API keys, and SSH credentials sent through alternative tool paths.

Scope: where the damage lands

42% of damage reaches outside the project the agent was working in.

Project 58%

System 16%

Home dir 13%

Secrets 13%

Examples of out-of-project damage: rm -rf ~/ while cleaning npm globals; an MCP-config file destroyed during install; iCloud stubs copied over originals; an SSH key stored on disk read by an agent.

Agent reaction: how the agent responded

Overlooked 68%

Apologized 21%

Lied 11%

"Lied" includes fabricating successful test results, claiming bugs were fixed when none were, or describing recovery steps that never happened. One agent admitted: "I just removed the directory again with rm -rf … restore from backup."

User awareness: did the user even notice?

Immediate 83%

Discovered later 10%

Designed-invisible 8%

"Designed-invisible" is overwhelmingly silent credential exfiltration via prompt injection — the user has no UI signal that anything happened.

Reversibility: can it be undone?

Trivial recovery 31%

Effort needed 29%

Permanent loss 23%

Partial loss 17%

40% of damage cannot be fully undone. Some operations are inherently irreversible: a leaked credential cannot be unread.

Finding 1 (limited information). Users and agents have limited information about the filesystem.

Finding 2 (insufficient control). Users and agents have insufficient control over the filesystem effects of tool calls.

D. Where the failure starts.

Each report was assigned to one or more of three roles. The model is the root cause in 58% of reports; the framework is involved in 226 of 290 (78%); the user is the final reviewer that lets it through.

Role 1 · The unreliable actor

Model

M1 · Wrong action. Wrong objective, over-generalized scope, or commands that don't do what the model thinks (wrong flags, wrong path).
M2 · Instructions ignored. Either deprioritized in the moment ("I get focused on solving the problem and skip checking the rules") or evicted from the context window after compaction.
M3 · Prompt injection. Malicious instructions hidden in source files, PRs, or web content the agent reads.

Role 2 · The broken enforcer

Framework

F1 · Policies fail. 130 misconfigured (too permissive or too restrictive); 77 escape the policy via shell commands; 55 bypass string-based filters with &&, pipes, subshells, Python, etc.
F2 · Defenses miss. Sandboxes get disabled or downgraded (41); rollback systems track only built-in tools, leaving shell mutations untraced (22).
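To illustrate why string-based filters are easy to bypass, here is a toy filter that rejects commands starting with a denied prefix. It is deliberately simplistic and does not reproduce any audited framework's rules; `DENIED_PREFIXES` and `naive_filter_allows` are hypothetical names. The commands are only inspected as strings and never executed.

```python
DENIED_PREFIXES = ("rm ", "sudo ")  # hypothetical deny rules


def naive_filter_allows(command: str) -> bool:
    """Toy string filter: reject only if the command starts with a denied prefix."""
    return not command.strip().startswith(DENIED_PREFIXES)


commands = [
    "rm -rf build",                                        # direct form
    "cd build && rm -rf .",                                # chaining
    "echo build | xargs rm -rf",                           # pipe
    "python -c \"import shutil; shutil.rmtree('build')\"",  # language switch
]

# Map each command string to the filter's verdict (True means allowed).
results = {c: naive_filter_allows(c) for c in commands}
for c, allowed in results.items():
    print("ALLOW" if allowed else "BLOCK", c)
```

Only the direct form is blocked; the other three commands have a comparable effect on the build directory but pass the filter, because the filter sees a string rather than the resulting filesystem effect.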

Role 3 · The overwhelmed reviewer

User

U1 · YOLO / auto-approve. Users opt into "allow always" because answering hundreds of prompts per session is unbearable.
U2 · Cannot review what is approved. Multi-line scripts, piped chains, and one-liners of 300+ characters hide the actual effect from the user: the prompt approves a string, not a consequence.

Finding 3 (control below model). Control must be enforced below the model — prompting cannot reliably enforce control.

Finding 4 (control by effect). Control must target effects, not command strings.

Finding 5 (dynamic control). Static policies cannot capture the right access policy; control must adapt at runtime.

Finding 6 (low-friction control). Users need informed control only when it matters; too much control overwhelms.
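Finding 4 can be made concrete with a small sketch that decides on the resolved target of an operation rather than on the command text. This is an illustration under our own assumptions, not the design proposed in the paper: the three-way `allow` / `ask` / `deny` policy, the function name, and the example root path are all hypothetical.

```python
from pathlib import Path


def classify_effect(op: str, target: str, project_root: str) -> str:
    """Return 'allow', 'ask', or 'deny' for a filesystem effect (hypothetical policy)."""
    root = Path(project_root).resolve()
    path = Path(target)
    if not path.is_absolute():
        path = root / path
    # resolve() normalizes '..' segments and follows symlinks that exist,
    # so the decision is made on where the effect would actually land.
    path = path.resolve()
    inside = path == root or root in path.parents
    if inside:
        return "allow"
    # Outside the project: reads need confirmation, mutations are refused.
    return "ask" if op == "read" else "deny"


ROOT = "/srv/agent-demo/project"  # made-up location for the example
decisions = {
    ("write", "src/main.py"): classify_effect("write", "src/main.py", ROOT),
    ("delete", "../other/file.txt"): classify_effect("delete", "../other/file.txt", ROOT),
    ("delete", "src/../../outside"): classify_effect("delete", "src/../../outside", ROOT),
    ("read", "/etc/hostname"): classify_effect("read", "/etc/hostname", ROOT),
}
```

The same check applies however the operation was spelled, which is the point of targeting effects: a path that escapes the project via `..` is refused regardless of which tool or shell construct produced it.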

E. What the framework audit found.

Beyond the 290 reports, we audited the source code and documentation of 6 frameworks — the top five from the report dataset plus OpenCode as a community-driven entry. We catalogued their built-in tools, permission policies, command filters, sandboxing, and rollback mechanisms.

Selected findings:

Defaults disagree wildly. Codex, OpenCode, and Cursor allow all project writes without asking; Gemini blocks all external access. Both extremes cause damage — permissive defaults harm directly, restrictive defaults push users to disable safety entirely.
Shell is the elephant in the room. 65% of all damage comes from shell commands. Built-in tools have policies; shell mostly doesn't. When a built-in read is denied, the agent uses cat; when a write is denied, it uses redirects.
Filters are thousands of lines, easily defeated. Four of six frameworks ship 75+ command-filter rules. They are routinely bypassed by chaining (&&, ;, |), subshells, language switches (Python shutil), or argument injection into pre-approved commands.
Sandboxes get downgraded. When a sandboxed command fails, both Codex and Cursor prompt the user to re-run unsandboxed. Legitimate work (running tests, accessing GPUs) often forces this downgrade.
Rollback covers built-in tools only. Claude Code, Cursor, and Copilot can undo changes from their own edit tools but miss shell mutations entirely. Gemini and OpenCode rely on git, but destructive git commands themselves account for many incidents.
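The rollback gap can be illustrated with a snapshot-and-diff sketch: hashing the tree before and after a step reveals mutations no matter which tool made them. This is our own minimal example, not how any audited framework works; `snapshot` and `diff` are hypothetical helpers, and the direct file operations in the demo stand in for a shell command that built-in edit tracking would not see.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root):
    """Map each file's relative path to a SHA-256 digest of its content."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff(before, after):
    """Summarize created, deleted, and modified files between two snapshots."""
    return {
        "created": sorted(after.keys() - before.keys()),
        "deleted": sorted(before.keys() - after.keys()),
        "modified": sorted(k for k in before.keys() & after.keys()
                           if before[k] != after[k]),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "keep.txt").write_text("same")
    (root / "edit.txt").write_text("v1")
    (root / "gone.txt").write_text("bye")
    before = snapshot(root)
    # Stand-in for a shell command whose effects bypass the edit tools.
    (root / "edit.txt").write_text("v2")
    (root / "gone.txt").unlink()
    (root / "new.txt").write_text("hi")
    changes = diff(before, snapshot(root))

print(changes)
```

A content snapshot like this only detects changes; restoring them would additionally require keeping the old file contents, which this sketch does not do.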

Full table — built-in tool sets, policy rows, filter rows, sandbox rows, rollback rows — appears as Table 1 in the paper.

F. What this means.

The takeaway across all 290 reports is the same: today's agent stacks have limited information about filesystem effects and insufficient control over them. Models can be misled, instructions ignored, command filters bypassed — but the filesystem sees every effect.

Our proposed solution — an agent-native filesystem — is summarized on the main page; the full design lives on the design page.
