Coding Agent Interactions From Real Users in the Wild


200+ public GitHub repos (+12)
6K+ coding agent sessions (+340)
355K+ agent tool calls (+18K)
2.7M logged interactions (+150K)

Live and growing. Figures in parentheses are growth over the last 7 days.


Why SWE-chat

Everyone uses coding agents. No one knows how.

Coding agents have taken over open-source development.

Yet our understanding of how developers actually use them — what they ask for, what they accept, what they throw away — is still mostly anecdotal.

The biggest bottleneck for open-source agent research is real interaction data.

SWE-chat is that data.


What is SWE-chat

Real coding-agent sessions from real developers

Each session pairs the full agent transcript — prompts, replies, every tool call — with the resulting git history. We can see, line by line, which code the human wrote and which the agent wrote.

User: I heard SF sourdough is great.. can you write a COLM paper for me so I can go visit?

Agent:
  Read references.bib
  Edit SWE-chat.tex
  Bash pdflatex SWE-chat.tex
  Created SWE-chat.tex. Should I submit this?

User: Looks great, commit and submit

Agent:
  Bash git add && git commit && git push
  Bash openreview submit
  Submitted. Enjoy the sourdough!

Commit: "Add SWE-chat paper" · Agent-authored: 100%
Success: 100/100 · Persona: Vague Requester · Intent: create
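Because every transcript is paired with the repo's git history, line-level authorship and survival can be measured directly. Below is a minimal sketch of one such measurement using Python's difflib; the two string inputs stand in for dataset fields whose names and storage format are assumptions here, not SWE-chat's actual schema.

# A minimal sketch of line-level survival: compare the file content the agent
# produced against the content ultimately committed. How the two snapshots
# are stored in SWE-chat is not shown here; treat the inputs as assumptions.
import difflib

def agent_survival_rate(agent_version: str, committed_version: str) -> float:
    """Fraction of agent-written lines that survive into the commit."""
    agent_lines = agent_version.splitlines()
    committed_lines = committed_version.splitlines()
    matcher = difflib.SequenceMatcher(a=agent_lines, b=committed_lines)
    # get_matching_blocks() returns the shared line runs between the two
    # versions; summing their sizes counts agent lines that were kept.
    surviving = sum(block.size for block in matcher.get_matching_blocks())
    return surviving / len(agent_lines) if agent_lines else 0.0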


What we found

How people use agents — and when they fail

RQ1 · Interaction: Vibe coding is trending. 40.8% of sessions are now ~all agent-written, a share that has doubled in three months.

RQ2 · Failures: Most agent code is discarded. Only 44.3% of agent-produced code survives into commits.

RQ1 · Interaction: Understanding > writing. 19.0% of prompts ask the agent to explain code, the #1 intent, ahead of creating new code.

RQ2 · Failures: Users push back constantly. 44% of turns get pushback or interruption, while agents ask for clarification in just 1.4% of turns.

RQ1 · Interaction: Expert nitpickers dominate. 47% of vibe-coding users still act as expert nitpickers, meticulously correcting agent output as it comes in.

RQ2 · Safety: Vibe coding is riskier. Vibe-coded commits introduce more security vulnerabilities per 1K lines than human-only code.


Coding Modes

Three ways humans and agents collaborate

Three distinct modes emerge from the data.

Human-only: the agent assists, the human writes the code.
Collaborative: shared authorship; the most cost-efficient mode.
Vibe coding: the agent writes nearly everything, at ~3× more tokens per committed line.
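As a rough illustration, mode assignment can be thought of as a function of the agent-authored share of committed lines. The cutoffs in this sketch are invented for illustration, not the paper's actual definitions.

# Illustrative mode assignment by agent-authored share of committed lines.
# The 0.95 / 0.05 cutoffs are assumptions for this sketch, not SWE-chat's
# actual thresholds.
def coding_mode(agent_lines: int, human_lines: int) -> str:
    total = agent_lines + human_lines
    if total == 0:
        return "no-commit"
    share = agent_lines / total
    if share >= 0.95:
        return "vibe"          # agent writes nearly everything
    if share <= 0.05:
        return "human-only"    # agent assists, human codes
    return "collaborative"     # shared authorship

print(coding_mode(agent_lines=480, human_lines=12))  # -> "vibe"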


Safety

Vibe coding ships more vulnerabilities

We ran Semgrep on every commit, before and after.

Vibe-coded commits introduce more new vulnerabilities than either human-only or collaborative commits.

Vibe coding fixes more vulnerabilities too, but every mode introduces more than it fixes.

Figure: New Semgrep findings introduced per 1,000 committed lines, by coding mode.
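A minimal sketch of the per-commit scan, assuming the Semgrep CLI is installed and the repo is checked out locally. The "auto" ruleset and the (rule, path, line) matching are illustrative simplifications, not necessarily the paper's exact setup.

# Scan the tree at a commit and at its parent, then diff the findings.
import json
import subprocess

def semgrep_findings(repo: str) -> set[tuple[str, str, int]]:
    """Run semgrep and return (rule_id, path, start_line) triples."""
    out = subprocess.run(
        ["semgrep", "scan", "--config", "auto", "--json", "."],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return {(r["check_id"], r["path"], r["start"]["line"])
            for r in json.loads(out)["results"]}

def new_findings(repo: str, commit: str) -> set[tuple[str, str, int]]:
    """Findings present after `commit` but absent in its parent.

    Naive: findings are matched by exact line number, so unrelated line
    shifts can over-count; a real pipeline would match more robustly.
    """
    subprocess.run(["git", "checkout", "-q", commit + "~1"], cwd=repo, check=True)
    before = semgrep_findings(repo)
    subprocess.run(["git", "checkout", "-q", commit], cwd=repo, check=True)
    after = semgrep_findings(repo)
    return after - before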


Examples

Sessions from the wild

User: Fix the slow appearance of history list card blocks. The animation feels sluggish when scrolling.

Agent: I'll fix the animation performance.
  Edit HistoryListView.swift (adjusted stagger delay)

User: That's the wrong parameter—the container animation is the slow part, not the individual card stagger.

No commits made. Session ended without resolution.

User: instead of arg hash for the name, let's just get short uuid

Agent: [edits remote.py: replaces arg hash with short UUID...]

User: don't create separate function for _short_uuid

Agent: [removes helper function, inlines uuid4() call...]

User: change DEFAULT_JOB_NAME to "remote_job"
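To make the session's edits concrete, here is a hypothetical reconstruction of where it ends up; the module layout, constant, and function names are assumptions for illustration, not the user's actual remote.py.

# Hypothetical reconstruction of the session's end state; every name here is
# assumed, not taken from the user's actual remote.py.
import uuid

DEFAULT_JOB_NAME = "remote_job"  # renamed per the user's final request

def make_job_name() -> str:
    # uuid4() inlined directly, per "don't create separate function for _short_uuid"
    return f"{DEFAULT_JOB_NAME}-{uuid.uuid4().hex[:8]}"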



Citation

Cite SWE-chat

Paper · GitHub · Dataset

@article{baumann2026swechat,
  title   = {SWE-chat: Coding Agent Interactions From Real Users in the Wild},
  author  = {Baumann, Joachim and Padmakumar, Vishakh and Li, Xiang and Yang, John and Yang, Diyi and Koyejo, Sanmi},
  year    = {2026},
  journal = {arXiv preprint arXiv:2604.20779},
  url     = {https://arxiv.org/pdf/2604.20779}
}