A Lesson From the Cockpit


The tech industry has declared that AI will increase developer productivity. Task-level productivity gains feel obvious, though concrete evidence is elusive. For example, METR reported a widely cited result: a 19% decline in productivity with AI. Yet eight months later, they had to drop the survey as developers increasingly refused to participate without AI tools. “Claude is down” is now like “GitHub is down.” Productivity metrics may also ring hollow because of a gap between measured productivity and developers’ perceptions of it. I don’t think we understand this well enough yet.

End-to-end productivity is anything but clear in brownfield situations. Most money-making software companies are brownfields with real customers, real data, and code that has evolved over years. In brownfields, most technical discovery and decision-making happens over Slack or in Zoom meetings, where people need to talk to each other to piece together facts from tribal knowledge. It is not yet clear how to deal with brownfield complexity that seems immune to AI.

Nonetheless, engineering leaders are expected to improve and demonstrate productivity while operating with fewer engineers.

On the other hand, critics have been urging caution and citing counter-evidence. Erosion of skills due to cognitive surrender is a real challenge. Experienced engineers are facing identity loss as AI can do in seconds or minutes what they used to do in days or weeks. I know of engineers who are worried about ever getting promoted, since some junior engineers can do their work faster. Some are worried about losing purpose, too. What would intrinsically motivate a competent engineer when AI can do their work?

I have written about AI contributing to increased entropy in software. Last year’s GitClear study showed a spike in duplicated code blocks, an increase in short-term code churn, and a continued decline in code reuse. There is also evidence of AI-assisted development creating persistent debt, security, and correctness issues. The tools are changing so fast that any specific finding may be outdated within months, but the patterns — deskilling, entropy, cognitive surrender — are consistent across studies and across domains. The specific numbers change, but the direction doesn’t.

As I continued my sense-making research, I found something striking: the current AI productivity argument has a precise and surprising historical parallel.

Back in the 1980s and 1990s, the aviation industry faced similar arguments and concerns when Airbus introduced fly-by-wire automation. Their philosophy was radical at that time: automation should have authority over the pilot. Airbus’ idea was to eliminate human error by constraining what a human could do. Bernard Ziegler, a senior Vice President of engineering at Airbus, said at that time that he was building an airplane that even his concierge could fly.

Boeing held the opposite design philosophy: the pilot is the final authority, and the human must always be able to override automation. That philosophy influenced the design of their fly-by-wire aircraft.

Regardless of these philosophical differences, aviation reality played out scenarios that the software industry might face.

In the 1994 Nagoya crash, the pilots and an Airbus A300 worked at cross-purposes — the pilots attempted to pitch the aircraft down while the autopilot was pitching it up. The pilots had the skills and experience, but the system’s behavior was opaque to them in the moment of crisis. This is what happens when humans operate systems they don’t fully comprehend. In the 1995 Cali crash, the pilots could not adapt to the changed circumstances and made errors in judgment, putting a Boeing 757 on a collision course with a 9,800-ft mountain. That’s cognitive surrender in a cockpit. Then, in the 2009 Air France crash, the autopilot disengaged after icing knocked out its airspeed data, and the crew failed to manually recover from a high-altitude stall they inadvertently induced. This is skill atrophy.

The current AI productivity arguments map onto this exactly. One camp says trust AI: it will keep getting better and will drive extreme productivity and sky-high returns on investment. The other says cognitive surrender is real; never-skilling (not acquiring foundational reasoning skills) and deskilling (erosion of such reasoning skills) are dangerous, so trust the human. Neither camp was entirely wrong in aviation, and neither is wrong now with AI.

What hasn’t software engineering learned yet?

What Other Disciplines Learned

Fly-by-wire automation reduced aviation accident rates by an order of magnitude. However, overreliance on automation, mismatches between the pilot’s understanding of the system state and what the automation is actually doing, and a lack of transparency about what’s going on have had severe consequences in the aviation industry.

About 70-80% of aviation accidents are now attributed to human error, with the rest attributed to equipment failures. A 2019 NYT report pointed out that pilots in such failures didn’t develop “airmanship,” a skill that develops over time and involves a visceral sense of navigation and a real-world understanding of the interplay among weather, traffic flows, and flight dynamics.

The aviation industry is addressing this problem by requiring manual flight training for newcomers before transitioning to automated flight. That’s the first insight that we can take away in the context of using AI in software engineering.

Next, flight training also involves upset prevention and recovery training, called UPRT. This is mandatory training that teaches pilots to recognize, prevent, and recover from unexpected, high-risk flight situations. It thus improves cognitive understanding and physiological resilience in handling unexpected flight behavior. Both new and experienced pilots go through UPRT.

That’s the second insight. The aviation industry has figured out how to ensure periodic unassisted flying across a pilot’s entire career.

Next, there are global and regional regulatory bodies and independent accident investigation agencies that dictate the above. For instance, the International Civil Aviation Organization sets standards and recommended practices, including UPRT. Regional bodies like the FAA and EASA translate those standards into enforceable compliance requirements. Then, independent bodies like the NTSB and the BEA investigate accidents and issue safety recommendations. Their reports create political and institutional pressure for change.

That’s the third insight — the aviation industry has someone forcing them to act responsibly.

But look at medicine.

AI adoption is rapid in clinical practice, particularly in radiology, pathology, endoscopy, and decision support. Diagnostic accuracy has improved when AI is used to complement human judgment. But deskilling and the loss of expertise are real concerns. Early-career clinicians and residents are particularly vulnerable to AI-induced deskilling and never-skilling.

However, no licensing board requires AI-free competency assessments. There is no residency accreditation body that mandates “unassisted reasoning” hours. Such recommendations exist in journal articles and conference presentations, but not in regulations.

AI may be replacing humans in counseling with little regulatory, political, or social pushback. Just last week, I ran into someone whose professional identity as a counselor is threatened because the insurance company is refusing to let her bill, favoring AI instead.

The automobile industry is an interesting case. Automatic emergency braking is a standout success in reducing rear-end collisions and is now being mandated in the US. The EU additionally mandated lane-keeping assist and driver-monitoring systems, which also show clear benefits in reducing crashes. Tesla’s FSD and Robotaxis don’t yet offer clear lessons.

Software engineering is at the beginning of the same arc, but there is a responsible path forward.

A Responsible Path Forward

Despite the shrinking size of software engineering teams everywhere, these challenges present opportunities for engineers and leaders. Let us turn to the aviation example again.

Insight 1: Sequential exposure — foundational skills before automation

This insight applies to all aspects of software engineering — from PRDs to designs to code to operations. You have to incorporate unassisted practice before allowing assisted practice. It is tempting to skip this, but each shortcut makes the next one easier to justify — that’s how cognitive surrender becomes a habit. Thankfully, most tech interview processes still rely on unassisted problem-solving to test for competencies. That needs to continue.

As tempting as it is, I would be careful about giving AI coding tools to developers who don’t yet know the codebase. The market, however, will penalize such investments: how would you justify a junior engineer spending six months developing foundational skills while shipping code slowly, when a similar engineer at a competitor is shipping AI-assisted code from week one? Still, it is the job of leadership to build a strong engineering culture in the AI era.

Insight 2: Mandatory unassisted practice — UPRT across an entire career

We know how to do this. Think of chaos testing, which helped the industry deal with unreliable infrastructure and incomplete or drifted automation. Such habits allowed us to learn how our systems work or fail, and to improve their correctness.

In the context of AI, this is an opportunity for engineers worried about identity.

Such worries are understandable, but dwelling on them is a dead end. The more productive response is to take the bull by the horns and figure out what an equivalent unassisted practice looks like. For example, could you have a mandatory no-AI bugfixing week every quarter? Or a mandatory chaos-testing exercise in which AI is not allowed in the triage process? Or architecture reviews, system design sessions, and code comprehension exercises, all done without AI? These are all opportunities for experienced engineers to build such processes.

Insight 3: Institutional oversight — regulatory bodies and independent investigation

Software engineering has no FAA or NTSB. There is no licensing body to mandate training requirements. Anyone can ship code regardless of their understanding of the system. There is no industry-wide transparency mechanism for companies to learn from each other’s failures.

This is the hardest gap to close in software engineering, because individual leaders can’t create regulatory bodies. Yet, what a leader can do is create the local equivalent of such functions by inspiring and motivating teams to build practices that focus on correctness, thorough postmortems, and the sharing of failure patterns internally and, where possible, externally. What we need are engineers from tech companies sharing examples of what’s working and what’s not, so that collective awareness can influence AI coding tools and software engineering practices.

In the absence of such institutions, leaders will need to rethink what they measure. The current push is toward task-level proxy metrics such as PRs, lines of code, cycle time, and DORA metrics. While such metrics seem reasonable in this early phase, I suspect that organizational leaders will soon need to shift their focus to end-to-end outcomes such as overall time-to-value, feature velocity from idea to production, and escaped bugs.
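To make the contrast with task-level proxies concrete, here is a minimal sketch of how two of these end-to-end outcomes could be computed from per-feature records. Everything in it is an assumption for illustration: the `Feature` record, its field names, the sample data, and the definition of escaped-bug rate as bugs found in production divided by all bugs found. It is not a standard or an established tool.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Feature:
    """Hypothetical per-feature record; all field names are illustrative."""
    name: str
    idea_date: date         # when the idea was first written down
    production_date: date   # when the feature reached customers
    bugs_pre_release: int   # bugs caught before release
    bugs_escaped: int       # bugs found after release, in production


def time_to_value_days(feature: Feature) -> int:
    """Calendar days from idea to production for one feature."""
    return (feature.production_date - feature.idea_date).days


def median_time_to_value(features: list[Feature]) -> float:
    """Median idea-to-production time across features, in days."""
    return median(time_to_value_days(f) for f in features)


def escaped_bug_rate(features: list[Feature]) -> float:
    """Share of all known bugs that were only found in production."""
    total = sum(f.bugs_pre_release + f.bugs_escaped for f in features)
    if total == 0:
        return 0.0  # no bugs recorded, so nothing escaped
    return sum(f.bugs_escaped for f in features) / total


# Made-up sample data, not real measurements.
features = [
    Feature("export", date(2025, 1, 6), date(2025, 2, 5), 4, 1),
    Feature("sso", date(2025, 1, 20), date(2025, 3, 21), 6, 2),
    Feature("search", date(2025, 2, 3), date(2025, 2, 17), 2, 0),
]

print("median time-to-value (days):", median_time_to_value(features))
print("escaped bug rate:", round(escaped_bug_rate(features), 2))
```

Neither number says anything about how many PRs or lines of code were produced. Both only move when a feature actually reaches customers and holds up there, which is the point of measuring outcomes rather than activity.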

My prediction is that the software industry will eventually build some version of such practices, not because of idealism, but because entropy will force it. As AI-generated systems accumulate complexity that nobody fully understands, the failures will become expensive enough and public enough that institutional pressure builds.

Software engineering’s Air France 447 hasn’t happened yet. When it does — a catastrophic, publicly visible failure traceable to AI-generated code that nobody understood — the conversation about patterns and practices will begin.

Leaders will be forced to shift their focus from productivity metrics to preserving institutional knowledge and ensuring system correctness. The question is whether leaders and engineers will anticipate this and act proactively, or wait for the wreckage.