The Realities of Generative AI in Software Engineering



Verification Above All

Pieter Rautenbach

There’s still a gap between the excitement around generative AI (GenAI) and the reality of how it actually works. Many see it as a shortcut to higher productivity; few grasp its limits. The truth is simple: GenAI is powerful — but fundamentally unreliable without human verification.

Note: GenAI is a subset of AI, though the two terms are often used interchangeably. AI ranges from rule-based systems to the most sophisticated large language models (LLMs), up to the ultimate goal of full autonomy, often called “artificial general intelligence” (AGI).

Human in the Loop

Generative AI excels at the mechanical parts of software creation, such as generating boilerplate and accelerating the initial draft. However, the core of software engineering is inherently a human and social endeavor, not a purely technical one.

The difference between an LLM that generates code and an engineer lies in critical human factors the model cannot replicate. These include the social context and synthesis found in the conversations needed to unpack requirements, negotiate trade-offs, and explore solutions. Engineering also relies on architectural foresight: the ability of experienced developers to “see the future” of a system under scale and changing demands, a long-term predictive modeling that goes beyond an LLM’s statistical pattern matching.

Furthermore, a successful system is built on institutional knowledge — the unwritten rules, historical compromises, and domain-specific details that inform every major design decision. Beyond skill, the engineer holds ethical and legal accountability for the output, ensuring the code meets security, legal, and maintainability standards. The LLM can generate a technical artifact, but it cannot bear the liability.

The human remains essential for validation, verification, ethical oversight, and connecting the technical artifact to its ultimate purpose of solving a human problem.


Hallucinations are real.

Hallucination: It’s Not a Bug; It’s a Feature

LLMs don’t understand the world; they generate text by predicting what’s statistically most likely to come next, based on patterns in their training data. Their “reasoning” isn’t logical deduction; it’s statistical association. What looks like deduction or inference is really pattern completion at scale.
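To make that concrete, here is a deliberately tiny sketch of pattern completion: a bigram model that “continues” text by picking whichever word most often followed the previous one in its corpus. Real LLMs use learned neural networks over subword tokens, but the underlying mechanism is the same kind of statistical prediction. The corpus and function names here are illustrative, not taken from any real system.

```python
from collections import Counter, defaultdict

# A toy corpus; real models train on billions of tokens.
corpus = "the cat sat on the mat and the cat ate and the cat slept".split()

# Count how often each word follows each other word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation seen in the corpus."""
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat": chosen by frequency, not by meaning
```

The model “completes” text convincingly without any notion of what a cat or a mat is; scale that idea up by many orders of magnitude and you have the core of an LLM.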

That’s why hallucination isn’t a rare glitch — it’s a natural consequence of how these systems work. They can simulate reasoning impressively well, but they don’t actually know when they’re right.

They can’t self-verify because they don’t reason about truth; their only mechanism is statistical likelihood. And because their outputs are fluent, coherent, and confident, humans tend to trust them more than they should. Even under strict deductive logic, a perfectly valid argument can still yield a false conclusion if a premise is wrong. For example: “All cats can fly. Luna is a cat. Therefore, Luna can fly.” Thus, if an LLM learns a false premise, it will produce deductions that are logically sound but fundamentally untrue, and sometimes absurd.
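The syllogism above can be played out mechanically. In this toy sketch (the premises and the name are purely illustrative), the inference step is perfectly valid, yet the conclusion is false because a premise is false: the same failure mode as a model “reasoning” from a learned falsehood.

```python
# What the reasoner *accepts* as true, not what *is* true.
premises = {
    "all cats can fly": True,  # false in reality, but accepted as given
    "Luna is a cat": True,
}

def deduce():
    """Apply the syllogism mechanically: if both premises hold, conclude."""
    if premises["all cats can fly"] and premises["Luna is a cat"]:
        return "Luna can fly"
    return "no conclusion"

print(deduce())  # "Luna can fly": logically sound, factually wrong
```

The deduction machinery is flawless; the garbage-in, garbage-out problem lives entirely in the premises, which is exactly where an LLM’s training data can mislead it.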

The real risk isn’t hallucination itself — it’s misplaced trust. Humans are wired to believe articulate systems, even when they’re confidently wrong. It is the same psychological lever that pseudo-science exploits.

Verification isn’t optional; it’s essential.

Fast, But Efficient?

AI is undeniably fast at generating output, but that speed comes at a cost. Running large models requires enormous computational power, which translates into significant energy and infrastructure expense. Users rarely perceive that cost directly, even when they pay for the service; compared with the human brain, these models are vastly less energy-efficient. What looks instant is actually the result of immense parallel computation happening elsewhere.

Beyond resource cost, speed often sacrifices clarity. AI compresses generation time but inflates verification time. You get something quickly — but you spend longer checking, debugging, refining and reworking it. For many experienced engineers, that makes the overall process slower than doing it properly from scratch.

The real advantage isn’t pure velocity — it’s exploration. GenAI is great for brainstorming and breaking creative inertia, not for bypassing the thinking process. Crafting effective prompts can itself require significant thought, effort and time, and the insights you get depend on how carefully you guide the system.


Use AI Where It Fits

GenAI works well when the task is narrow and well-defined: generating boilerplate, exploring design alternatives, or drafting repetitive code. But it’s no substitute for deep engineering. There’s no prompt that will architect and deploy a large, scalable, and resilient system on its own, or from a couple of interactions. Nor is it capable of autonomously maintaining and improving a system.

The rule is simple: use AI to accelerate, not to abdicate. Keep humans responsible for testing and verification — that’s where accountability lives.
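One way to keep humans responsible for verification is to treat generated code as untrusted until it passes human-written checks on known inputs and edge cases. A minimal sketch of that habit, where `ai_generated_median` is a hypothetical stand-in for assistant output:

```python
def ai_generated_median(values):
    """AI-drafted helper (illustrative): the median of a list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:  # odd count: middle element
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: mean of middle pair

def verify():
    """Human-written checks: the draft isn't trusted until these pass."""
    assert ai_generated_median([3, 1, 2]) == 2        # odd-length list
    assert ai_generated_median([4, 1, 3, 2]) == 2.5   # even-length list
    assert ai_generated_median([7]) == 7              # single element
    return "all checks passed"

print(verify())
```

The point isn’t the median itself; it’s the workflow: generation is cheap, so the engineering value, and the accountability, sits in the tests you write around it.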

Learning vs. Mastery

AI works best in expert hands: senior engineers can spot inconsistencies and know when something “feels off”. But AI can also accelerate learning, if used deliberately.

The danger is when junior developers use AI as a crutch instead of a coach. Without the incentive to question and understand what’s being produced, they risk becoming surface-level operators — able to prompt, but unable to debug or solve problems effectively. By asking well-defined, targeted questions on topics that a developer has some familiarity with, AI can become a powerful tool for discovery and learning. For instance, it can clarify abstract concepts, suggest alternative approaches to coding challenges, or offer fresh perspectives on solving problems.

Access without understanding is a liability. Democratisation only works if it’s paired with education and curiosity.


Context Is Everything

Even the most capable models struggle with complex systems that depend on implicit context, integration details, and domain-specific rules. Larger context windows help, but they don’t fix the core problem: AI doesn’t know what it doesn’t know.

Humans navigate complex systems with implicit understanding; a terse sentence or shared context can convey volumes of unspoken institutional knowledge. The LLM, by contrast, demands extreme verbiage and relentless prompting. The engineer must externalise every assumption, constraint, and piece of implicit context into a series of explicit, unambiguous instructions. This inflation of the required conversational bandwidth, coupled with the high cost of the back-and-forth refinement loop, often negates the initial speed advantage. The process trades rapid generation for slow, painstaking communication and verification, leading to the feeling that a “simple task” requires too many iterations to get done.

Human verification remains essential, not only to catch mistakes, but to ensure the output even fits the problem it’s meant to solve.

Democratising AI, Responsibly

GenAI’s growing accessibility is a double-edged sword. Developers can now embed GenAI capabilities through APIs or prompt engineering without deep knowledge of how LLMs work — a genuine step forward for productivity, and for addressing coding challenges that were traditionally difficult to solve using purely syntactic checks or weak semantic validation.

At a fundamental level, someone still needs to architect, build, and maintain the AI models and systems themselves, and that is still software engineering. The real risk of democratisation is the erosion of foundational skill. When powerful tools are used blindly, they create a dependency that prevents learning and obscures crucial flaws. This makes professional judgment and expertise more vital, not less. The balance lies in using AI to extend developer capability, not to replace foundational skill or judgement.

As an analogy, handing out scalpels to everyone doesn’t make people surgeons. For simple tasks — like bandaging a wound — everyday users manage fine. But intricate surgeries, like removing an appendix, require professionals. Similarly, while everyday users can likely generate functional code for a quick fix, specialist engineers are indispensable for creating scalable, ethical, and resilient systems.


A Voice of Reason in the Hype

Generative AI has shown real utility in software engineering, particularly in automating repetitive coding tasks, generating boilerplate code, and assisting with documentation. It’s a powerful tool for accelerating development workflows and reducing the cognitive load of routine tasks, freeing engineers to focus on more complex and creative problem-solving. When used thoughtfully, it can amplify productivity and unlock new possibilities in software development.

But every wave of new technology brings hype, and AI is no exception. The bigger danger today isn’t that companies and software engineers ignore AI, but that they adopt it without purpose. Many projects chase “AI for AI’s sake,” without defining a problem or evaluating value. In such cases, the result could be wasted effort, inefficiency, and technical debt that skilled professionals will eventually need to untangle.

Caution isn’t resistance. It’s maturity. Verification, governance, and critical thinking turn AI from a shiny demo into a durable capability.

The bottom line: GenAI is here to stay — but it predicts; it doesn’t know. Treat its output as inspiration, not truth. Use it with expertise, curiosity, and skepticism, and it will make you faster and smarter. Use it blindly, and it will just make your mistakes arrive sooner.