Ask HN: What are the metrics for "AI-generated technical debt"?
Here’s one place where I think proponents and skeptics of agentic coding tools (Claude Code, Codex, etc.) tend to talk past each other:
Proponents say things like:
- “I shipped feature X in days instead of weeks.”
- “I could build this despite not knowing Rust / the framework / the codebase.”
- “This unblocked work that would never have been prioritized.”
Skeptics say things like:
- “This might work for solo projects, but it won’t scale to large codebases with many developers.”
- “You’re trading short-term velocity for long-term maintainability, security, and operability.”
- “You’re creating tons of technical debt that will surface later.”
I’m sympathetic to both sides, but the asymmetry is interesting: the pro side has quantifiable metrics (time-to-ship, features delivered, scope unlocked), while the con side often relies on qualitative warnings (maintainability, architectural erosion, future cost).
In most organizations, leadership is structurally biased toward what can be measured: velocity, throughput, roadmap progress. “This codebase is a mess” or “This will be a problem in two years” is a much harder sell than “we shipped this in a week.”
My question: Are there concrete, quantitative ways to measure the quality and long-term cost side of agentic coding? In other words: if agentic coding optimizes for speed, what are the best metrics that can represent the other side of the tradeoff, so this isn’t just a qualitative craftsmanship argument versus a quantitative velocity argument?
One way to think about this is to look at second-order indicators rather than direct “debt” metrics. For example:
- Change failure rate or rollback frequency after AI-assisted changes
- Time-to-fix regressions introduced by generated code
- Ratio of generated code that later gets rewritten or deleted
- Increase in review time or comment volume per PR over time
These don’t directly label something as “AI-generated debt,” but they capture the maintenance and coordination costs that tend to show up later. It’s imperfect, but it frames the discussion in measurable signals rather than subjective warnings.
Thanks! That makes sense. I suppose this requires commit messages or PRs to indicate code was AI-generated vs. not, or to assume that commits after a certain time period were all from AI coding. It’d be an interesting analysis. Maybe there’s already a study out there. In any case, thank you again!
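For concreteness, here is a rough sketch of the kind of analysis I’m imagining: flag AI-assisted commits by a commit-message marker, then compare how many of the lines they added still survive in HEAD, as a crude proxy for the “later rewritten or deleted” ratio above. The “Co-Authored-By: Claude” trailer is an assumption (some agentic tools add a trailer along those lines by default; substitute whatever convention your team actually uses), and this is a back-of-the-envelope Python sketch run from the repo root, not a rigorous methodology:

  # Sketch: compare survival of lines from AI-assisted vs. other commits.
  # Assumes AI-assisted commits carry a recognizable commit-message marker.
  import subprocess
  from collections import Counter

  def git(*args):
      return subprocess.run(["git", *args], capture_output=True,
                            text=True, check=True).stdout

  def ai_flag_by_commit(rev_range="HEAD"):
      """Map commit sha -> True if its message carries the AI marker."""
      flagged = {}
      for record in git("log", "--format=%H%x1f%B%x1e", rev_range).split("\x1e"):
          if not record.strip():
              continue
          sha, _, body = record.strip().partition("\x1f")
          flagged[sha] = "co-authored-by: claude" in body.lower()  # assumed marker
      return flagged

  def added_lines_by_commit(rev_range="HEAD"):
      """Lines added per commit, parsed from git log --numstat."""
      added, sha = Counter(), None
      for line in git("log", "--numstat", "--format=%x01%H", rev_range).splitlines():
          if line.startswith("\x01"):
              sha = line[1:].strip()
          elif line.strip() and sha:
              cols = line.split("\t")
              if cols[0].isdigit():        # '-' means a binary file; skip it
                  added[sha] += int(cols[0])
      return added

  def surviving_lines_by_commit():
      """Lines in HEAD still attributed to each commit, per git blame."""
      surviving = Counter()
      for path in git("ls-files").splitlines():
          try:
              blame = git("blame", "--line-porcelain", "HEAD", "--", path)
          except subprocess.CalledProcessError:
              continue                     # binary or otherwise unblameable file
          for line in blame.splitlines():
              tok = line.split(" ", 1)[0]
              if not line.startswith("\t") and len(tok) == 40:
                  surviving[tok] += 1      # header lines start with the full sha
      return surviving

  if __name__ == "__main__":
      flagged = ai_flag_by_commit()
      added = added_lines_by_commit()
      surviving = surviving_lines_by_commit()
      for label, want in (("AI-assisted", True), ("other", False)):
          shas = [s for s, f in flagged.items() if f is want]
          total = sum(added[s] for s in shas)
          alive = sum(surviving[s] for s in shas)
          rate = alive / total if total else 0.0
          print(f"{label}: {total} lines added, {alive} still in HEAD ({rate:.0%})")

It would be slow on large repos (one git blame per file), and survival is only a proxy: code can stick around because nobody dares touch it, or get deleted because a feature was cut. But it produces the kind of number you could put next to the velocity metrics rather than against them.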