The AI field guide for people with real jobs


Someone on LinkedIn asked me to explain AI, agents, Copilot, and vibe coding “in kindergarten terms.” I gave them a compressed answer. Then their follow-up question nailed the thing most explainers dodge:

OK, so basically AI == LLM == copilot == ggl/bing on steroids, understood. And the agents are some scripts to automate something based on that. Vibe coding is like automated C/P from StackOverflow. Why such hype?

Fair question. The short answer: the hype is mostly unearned. The technology is real. The gap between those two things is where every bad decision gets made.

This is the long answer. If you’ve been heads-down building a business, shipping product, or otherwise living your life since late 2022, here’s what actually happened, what it means, and where it binds.

Strip away the marketing and you get this: modern AI is pattern-matching at scale. Specifically, it’s neural networks trained on enormous amounts of text, code, and images to predict what comes next in a sequence.

It doesn’t think. It doesn’t understand. It’s autocomplete that read most of the internet and got disturbingly good at faking comprehension.

The architecture behind all of it (GPT, Claude, Gemini, LLaMA) is the Transformer, introduced in a 2017 Google paper called “Attention Is All You Need.” Eight researchers wrote it. Every major language model since descends from it. The key innovation: a mechanism called self-attention that lets the model weigh which parts of its input matter most when generating each word.

What this means in practice: you type a prompt. The model predicts the most likely next token, then uses that prediction to predict the next one, and the next, until it has a complete response.

A token is roughly a word or word fragment, and it’s the basic unit the model reads and writes. “Running” might be two tokens (“run” + “ning”). A typical English sentence is 15-20 tokens. Models are priced by token count and think in tokens, not characters or words. When you see “128K context window,” it means the model can hold about 100,000 words in its head at once.

It’s statistics and linear algebra running on warehouse-scale compute. Useful, yes. But a prediction engine, not a reasoning one.
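The autoregressive loop described above is small enough to show. Here is a toy illustration with a hand-built bigram table standing in for a trained network; everything in it is invented for illustration, and a real model learns these statistics across billions of parameters rather than a lookup table:

```python
# Toy "model": for each token, the tokens observed to follow it, with
# counts standing in for learned probabilities.
bigram_counts = {
    "the": {"cat": 3, "dog": 1},
    "cat": {"sat": 2, "ran": 1},
    "sat": {"down": 4},
}

def predict_next(token):
    """Pick the most likely next token given the current one."""
    followers = bigram_counts.get(token)
    if not followers:
        return None  # nothing learned for this token: stop
    return max(followers, key=followers.get)

def generate(prompt_token, max_tokens=5):
    """Autoregressive loop: each prediction becomes the next input."""
    tokens = [prompt_token]
    for _ in range(max_tokens):
        nxt = predict_next(tokens[-1])
        if nxt is None:
            break
        tokens.append(nxt)
    return tokens

print(generate("the"))  # ['the', 'cat', 'sat', 'down']
```

The essential shape is the same in a frontier model: predict, append, repeat. Everything else is scale.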

“So it’s Google on steroids” is an understandable first read. It’s also wrong in ways that matter.

When you Google something, you formulate a query, scan a list of links, open the promising ones, read what other humans wrote, evaluate whether it applies to your situation, and adapt it. StackOverflow adds community voting, meaning the best answers float up, bad ones get downvoted, edge cases get flagged in comments. The knowledge is human-generated, human-verified, and you’re the one doing the synthesis work.

An LLM does something fundamentally different. It doesn’t retrieve existing answers. It generates new text that never existed before, word by word, based on statistical patterns in its training data. When you ask it a question, it’s not looking up a page and showing it to you. It’s constructing a plausible response from scratch.

This has three consequences that matter.

First, it knows your context. You can paste in your entire document, codebase, or email thread. The model works with your specific situation, not a generic answer to a generic question. “Fix the bug in this function” is a different interaction than searching “javascript null reference error.” This is where it actually earns the hype.

Second, it synthesizes across domains. A single prompt can combine knowledge from networking, security, compliance, and your specific tech stack into one coherent answer. Doing the same on Google would take a dozen searches, a dozen tabs, and an hour of connecting the dots yourself.

Third, it has no verification mechanism. StackOverflow has votes, comments, accepted answers, and a community that calls out mistakes. An LLM has none of that. It generates text that sounds authoritative whether it’s correct or not. There’s no link to follow, no community that vetted the answer, no way to trace where the information came from. When it’s wrong, it’s wrong with the same confidence as when it’s right.

So the honest comparison: it’s like having a well-read colleague who can discuss almost anything, works with your specific context, and synthesizes information faster than you can search for it. But that colleague has no ability to distinguish what they actually know from what they’re making up. Sometimes they’ll confidently cite a paper that doesn’t exist or recommend a tool that was deprecated two years ago.

The skill isn’t using the tool. The skill is knowing when to trust the output and when to verify it. That’s the part most “AI for beginners” content skips.

On November 30, 2022, OpenAI released ChatGPT. It hit one million users in five days and 100 million monthly active users by January 2023, making it the fastest-growing consumer application in history. TikTok took nine months. Instagram took two and a half years.

That launched the current era. Here’s how it played out:

OpenAI moved fastest. GPT-4 (March 2023) reportedly scored in the 90th percentile on the Uniform Bar Examination (though later analysis put it closer to the 62nd against first-time takers), up from GPT-3.5’s ~10th percentile. GPT-4o (May 2024) unified text, vision, and audio into a single native model. Their o1 reasoning model (September 2024) introduced chain-of-thought reasoning, meaning the model “thinks” step by step before answering. o3 followed in April 2025. As of late 2025, ChatGPT has 800 million weekly active users.

Anthropic (founded by ex-OpenAI researchers) released Claude in March 2023. Claude 3 (March 2024) came in three sizes: Haiku (small/fast), Sonnet (balanced), Opus (most capable). Claude 3.5 Sonnet (June 2024) outperformed the larger Opus model at lower cost, a pattern that kept repeating across the industry. Claude 4 shipped in 2025 and 4.6 in early 2026.

Google fumbled the start. Bard launched in March 2023 to lukewarm reception, got rebranded to Gemini in February 2024, and gradually improved. They had the underlying technology (the Transformer was invented at Google) but were slow to productize it.

Meta went open. LLaMA (February 2023) released model weights publicly, letting anyone run and modify the models. LLaMA 3.1 (July 2024) offered 8B, 70B, and 405B parameter versions. LLaMA 4 introduced a mixture-of-experts architecture. This single decision (making competitive models free) changed the entire dynamics of the field.

DeepSeek, a Chinese lab, dropped R1 in January 2025 as open source. It matched frontier model performance at a reported training cost under $6 million versus the billions spent by Big Tech. The market’s response: Nvidia lost approximately $600 billion in market cap in a single day.

Mistral (French, founded by ex-Google and ex-Meta researchers) carved out a middle ground with efficient models and an open-weight strategy.

The pattern: every few months, the best available model gets cheaper, faster, and more capable. Last year’s frontier is this year’s commodity.

Training an LLM works roughly like this:

  1. Gather data. Hundreds of billions to trillions of tokens made up of web pages, books, code, and academic papers.

  2. Pre-train. The model learns to predict the next token in a sequence. This takes weeks on thousands of GPUs and costs millions of dollars.

  3. Fine-tune. Human reviewers rate outputs. The model gets adjusted to prefer responses humans liked. This is called RLHF (Reinforcement Learning from Human Feedback).

  4. Deploy. The trained model sits behind an API. You send it text, it returns predictions.
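Step 2 can be made concrete. A minimal sketch of the pre-training objective, average next-token cross-entropy, with a stand-in “model” that assigns uniform probabilities; the function and corpus here are invented for illustration:

```python
import math

def cross_entropy_loss(predicted_probs, corpus):
    """Average next-token loss over a token sequence.
    predicted_probs(context, token) -> the model's probability for
    `token` given the preceding context. Pre-training adjusts model
    weights to push this loss down across the whole corpus.
    """
    total = 0.0
    for i in range(1, len(corpus)):
        p = predicted_probs(corpus[:i], corpus[i])
        total += -math.log(p)
    return total / (len(corpus) - 1)

# A "model" that assigns uniform probability over a 4-token vocabulary
uniform = lambda context, token: 0.25

corpus = ["the", "cat", "sat", "down"]
loss = cross_entropy_loss(uniform, corpus)
print(round(loss, 4))  # -ln(0.25) ≈ 1.3863
```

A model that has learned nothing scores ln(vocabulary size); training is the process of dragging that number down, one gradient step at a time.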

The result is a system that can write essays, translate languages, generate code, analyze images, and hold conversations that feel remarkably human. It can also confidently tell you things that are completely false. The industry calls these “hallucinations”: plausible-sounding text that happens to be wrong because the model optimizes for “sounds right,” not “is right.”

Context windows (the amount of text the model can “see” at once) have grown fast. Early ChatGPT models handled about 4,000 tokens (roughly 3,000 words). Current models handle 128K to 1M+ tokens. This matters because more context means the model can work with longer documents, larger codebases, and more complex conversations.
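A back-of-envelope sketch of token budgeting, using the common ~4-characters-per-token rule of thumb for English. This is a heuristic, not any provider’s real tokenizer; for exact counts, use the tokenizer that ships with the model:

```python
def estimate_tokens(text):
    """Rough token estimate: ~4 characters per token for English.
    Real tokenizers (BPE variants) differ by model."""
    return max(1, len(text) // 4)

def fits_in_context(text, context_window=128_000):
    """Will this text fit in the context window, leaving ~20% of the
    budget free for the model's reply?"""
    return estimate_tokens(text) < context_window * 0.8

doc = "word " * 20_000  # ~20,000 words, ~100,000 characters
print(estimate_tokens(doc))   # 25000
print(fits_in_context(doc))   # True
```

The practical point: a 128K window comfortably holds a book-length document, but a large codebase still won’t fit, which is why agentic tools read files selectively instead of pasting everything in.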

When the person on LinkedIn asked about “Copilot,” they were asking about a brand name that now covers at least three different products:

GitHub Copilot is the original. Launched in 2022 as an AI coding assistant inside code editors. By mid-2025: 20 million cumulative users, 1.3 million paid subscribers, 90% of Fortune 100 companies using it. $10-19/month depending on plan. It suggests code as you type. “Autocomplete on steroids” is actually a fair comparison here.

Microsoft 365 Copilot bolts AI into Word, Excel, PowerPoint, Outlook, and Teams. $30/user/month. Draft emails, summarize meeting transcripts, generate presentations from notes. Microsoft reports 15 million paid seats. The less-celebrated stat: only 3.3% of users who touch Copilot Chat actually pay for it.

Windows Copilot is the free one built into Windows 11. An assistant in the taskbar.

Across all products, Microsoft claims 33 million active users. The strategy is clear: embed AI into every surface and charge for the premium version. Whether it delivers enough value to justify the spend is an ongoing argument in every enterprise IT department right now.

GitHub Copilot holds about 42% of the AI coding tools market, but it’s not alone.

Cursor is an AI-native code editor that went from zero to $2.6 billion valuation in a year. Over a million users by early 2025. Raised $100M at a $2.6B valuation in December 2024, then $900M at $9.9B by mid-2025. It replaced the text editor with one that has AI baked into every interaction. You describe what you want, it edits your code.

Claude Code took a different approach. Released in February 2025 as a terminal-based agent. No graphical editor, you talk to it in the command line, and it reads your codebase, edits files, and runs commands. Made generally available alongside Claude 4 in May 2025.

Amazon Q Developer went generally available in April 2024. Supports 25+ languages. Free tier available. Enterprise customers like BT Group report a 37% code acceptance rate.

What they all share: an LLM that has read billions of lines of code and can generate, modify, and explain software. What they differ on: how much autonomy the AI gets. Copilot suggests; Cursor edits; Claude Code acts.

On February 2, 2025, Andrej Karpathy (former AI lead at Tesla and OpenAI) posted this on X:

“There’s a new kind of coding I call ‘vibe coding’, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists.”

He described hitting “Accept All” without reading the changes. Copy-pasting error messages back to the AI without analysis. When a bug wouldn’t fix, asking for “random changes until it goes away.” The post got 4.5 million views. Merriam-Webster took note.

Here’s the thing: Karpathy explicitly said this was for “throwaway weekend projects.” A year later, he called it “a shower of thoughts throwaway tweet.” But the term escaped containment.

Platforms like Lovable and Replit leaned into it. Non-developers began shipping production software by describing what they wanted and letting AI write every line. The results were predictable.

Google Chrome engineer Addy Osmani drew the critical distinction: “vibe coding” (accepting all AI output without review) is not “AI-assisted engineering” (using AI as a tool while maintaining professional oversight).

That distinction is everything. The same tools that produce garbage in untrained hands make skilled developers measurably faster. The differentiator isn’t the tool. It’s whether the person using it can verify the output.

Vibe coding isn’t automated copy-paste from StackOverflow. It’s worse. With StackOverflow, you at least had to understand the question well enough to search for it, read the answers, and decide which one applied. Vibe coding removes even that minimal friction.

This is where it gets interesting. Also where it gets dangerous.

A chatbot answers questions. An agent takes actions.

The basic loop: an LLM receives a task, decides which tools to use (web search, file access, API calls, code execution), executes them, evaluates the result, and decides what to do next. Instead of “write me an email,” it’s “check my calendar, find a time that works for both parties, draft the email, and send it.”
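That loop is small enough to sketch. Everything below is hypothetical: the tools are toy functions and `call_model` is a stub standing in for a real LLM API call:

```python
def search_web(query):
    return f"results for {query!r}"

def send_email(to, body):
    return f"sent to {to}"

TOOLS = {"search_web": search_web, "send_email": send_email}

def call_model(task, history):
    """Stub: a real agent would send the task plus history to an LLM
    and get back either a tool call or a final answer. Here we
    hard-code one tool call, then finish."""
    if not history:
        return {"tool": "search_web", "args": {"query": task}}
    return {"final": f"done: {history[-1]}"}

def run_agent(task, max_steps=5):
    history = []
    for _ in range(max_steps):  # cap steps so the loop can't run forever
        decision = call_model(task, history)
        if "final" in decision:
            return decision["final"]
        result = TOOLS[decision["tool"]](**decision["args"])
        history.append(result)  # the result feeds back into the next decision
    return "gave up"

print(run_agent("find a meeting time"))
```

Note the step cap: because the model’s decisions are probabilistic, production agent loops need guardrails like this so a confused model can’t spin forever or rack up API costs.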

The infrastructure for this ranges from dead simple to enterprise-grade.

At the simplest level: give an LLM the ability to run shell commands. That’s it. The model decides what command to execute, the system runs it, feeds the output back, and the model decides what to do next. This is how tools like Claude Code work day-to-day: reading files, running tests, editing code, calling APIs through curl. No framework needed. Just a model, a terminal, and a loop. It’s surprisingly effective and gets people very far.
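A minimal sketch of that shell loop, with the model stubbed out; a real loop would send the goal and the transcript of commands and outputs to an LLM API and get the next command back:

```python
import subprocess

def run_shell(cmd, timeout=10):
    """Execute one shell command and capture its output. This is the
    entire 'tool' in the simplest agent setups."""
    proc = subprocess.run(cmd, shell=True, capture_output=True,
                          text=True, timeout=timeout)
    return proc.stdout + proc.stderr

def decide_next_command(goal, transcript):
    """Stub for the model: returns the next shell command, or 'DONE'."""
    if not transcript:
        return "echo hello from the agent"
    return "DONE"

def shell_agent(goal, max_steps=10):
    transcript = []
    for _ in range(max_steps):
        cmd = decide_next_command(goal, transcript)
        if cmd == "DONE":
            return transcript
        output = run_shell(cmd)
        transcript.append((cmd, output))  # feed the output back in
    return transcript

for cmd, out in shell_agent("say hello"):
    print(cmd, "->", out.strip())
```

The simplicity is the point, and also the risk: `shell=True` with model-chosen commands means the model can do anything your user account can.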

The more structured approaches:

Model Context Protocol (MCP) — Anthropic announced it in November 2024 as an open standard for connecting AI agents to external tools and data sources. Instead of “run this shell command,” MCP defines a formal protocol for tools to advertise their capabilities, accept structured inputs, and return structured outputs. By late 2025: 97M+ monthly SDK downloads. OpenAI adopted it in March 2025. Google DeepMind followed in April 2025. In December 2025, Anthropic donated MCP to the Linux Foundation.
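The structured-tools idea can be sketched roughly like this. The tool below is hypothetical and the dict is a simplified shape, not the full MCP wire format; the real protocol is defined in the MCP specification:

```python
# Rough shape of a tool advertisement: a name, a description the model
# reads, and a JSON Schema describing the structured input.
tool = {
    "name": "get_weather",  # hypothetical tool
    "description": "Get the current weather for a city.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
        },
        "required": ["city"],
    },
}

def validate_call(tool, args):
    """Reject a tool call missing required structured inputs. This
    kind of schema check is part of what 'structured inputs' buys you
    over raw shell strings."""
    schema = tool["inputSchema"]
    missing = [k for k in schema.get("required", []) if k not in args]
    return missing

print(validate_call(tool, {"city": "Vienna"}))  # []
print(validate_call(tool, {}))                  # ['city']
```

Note that the `description` field is text the model reads and trusts, which is exactly the surface the tool-poisoning attacks discussed later exploit.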

OpenAI Agents SDK — Released March 2025. Python-first framework for building multi-agent workflows. Supports tool use, agent handoffs, and guardrails.

The progression makes sense: shell access is powerful but unstructured. MCP and agent frameworks add guardrails and discoverability. More structure means more safety, but also more complexity.

AI companies are building the plumbing for agents to interact with the real world. Calendars, email, databases, APIs, and file systems, all becoming surfaces an AI can read from and write to.

This is the part where “some scripts to automate something based on that” undersells it. Scripts are deterministic. They do the same thing every time. Agents are probabilistic. They decide what to do based on context. That makes them more flexible and more dangerous.

The question on LinkedIn specifically asked about “Claw.” OpenClaw is the project that made agents tangible for a lot of people.

The story: Austrian developer Peter Steinberger published it in November 2025 as Clawdbot, an agentic interface that runs locally and connects to external LLMs. It integrates with your calendar, email, files. You talk to it and it does things on your machine.

Then Anthropic sent a cease-and-desist over the name (too close to Claude). On January 27, 2026, it was renamed to Moltbot. Three days later, renamed again to OpenClaw. The whole saga (including crypto scammers trying to capitalize on the chaos) played out in 72 hours.

On February 14, 2026, Steinberger announced he’d be joining OpenAI and the project would move to an open-source foundation.

But here’s why OpenClaw matters to this discussion: it demonstrated both the potential and the risk of giving AI agents access to real systems. Researchers quickly found that the skill ecosystem (the mechanism for extending what the agent can access) was exploitable.

Real potential, real risk, all in one package.

Giving AI agents the ability to take actions means giving them attack surface. The problems found so far aren’t theoretical:

Tool poisoning. Malicious instructions embedded in MCP tool descriptions, invisible to users but visible to the AI model. The model follows the hidden instructions without the user knowing.

Data exfiltration. Invariant Labs demonstrated that a malicious MCP server could silently exfiltrate an entire WhatsApp message history by combining tool poisoning with a legitimate WhatsApp integration. The attack circumvents user approval and exfiltrates data through WhatsApp itself.

Prompt injection through untrusted data. A prompt injection attack against the official GitHub MCP server allowed a malicious public issue to hijack an AI assistant and pull data from private repositories. Similarly, Supabase’s Cursor agent, running with privileged access, processed support tickets containing embedded SQL instructions that exfiltrated sensitive tokens.

Hallucinated supply chains. A March 2025 study analyzing 576,000 code samples found that in roughly 20% of cases, LLMs recommend packages that don’t exist. 43% of those hallucinated package names were consistent across queries. Attackers can register those names on PyPI or npm with malicious code. The term for this: slopsquatting.

Rug pulls. MCP servers can modify their tool definitions between sessions, silently changing what a trusted tool does after you’ve already approved it.

Even Anthropic’s own SQLite MCP server, forked over 5,000 times, had a SQL injection bug that could seed stored prompts and exfiltrate data.

Every time you connect an AI agent to a system, you’re creating a new attack vector that traditional security tooling doesn’t cover. The agent trusts its tools. The tools can lie.
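One cheap defense against the slopsquatting problem above: verify a suggested package name against the real registry before installing it. A sketch using PyPI’s public JSON endpoint; the endpoint is real, but treat this as a starting point, since existence alone doesn’t prove a package is trustworthy:

```python
from urllib.request import urlopen
from urllib.error import HTTPError

def pypi_url(package):
    """PyPI's JSON metadata endpoint for a single package."""
    return f"https://pypi.org/pypi/{package}/json"

def package_exists(package):
    """Check a name against the registry before running `pip install`
    on anything an LLM suggested. A hallucinated name returns 404 here.
    A name that exists still needs scrutiny (age, downloads,
    maintainer), because attackers register hallucinated names too."""
    try:
        with urlopen(pypi_url(package), timeout=10) as resp:
            return resp.status == 200
    except HTTPError:
        return False

print(pypi_url("requests"))  # https://pypi.org/pypi/requests/json
```

npm has an equivalent registry lookup; the principle is the same either way: never install a package purely on a model’s say-so.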

Not everything runs through Big Tech APIs. A parallel ecosystem emerged where you can run capable models on your own hardware.

Ollama provides a simple interface to download and run LLMs locally. llama.cpp (by Georgi Gerganov) enables efficient inference in C/C++ on consumer hardware, including Apple Silicon Macs. In February 2026, ggml.ai (creators of llama.cpp) joined Hugging Face.

Hugging Face is the central hub, hosting model weights, datasets, and tooling. Over two million public model checkpoints available.

What this means in practice: a 7B-parameter model running on a MacBook can handle many of the tasks that required a cloud API two years ago. Not as capable as the frontier models, but private, free, and good enough for a lot of use cases.

The DeepSeek shock in January 2025 (a competitive model trained for under $6 million) rattled the assumption that only trillion-dollar companies could build frontier AI. Whether that cost figure holds up to scrutiny is debated, but the direction is clear: the cost of training and running these models keeps dropping.

Now for the question that prompted this article: why such hype?

Gartner’s Hype Cycle tells part of the story. In 2024, generative AI sat just past the Peak of Inflated Expectations. By 2025, it had moved into the Trough of Disillusionment. Meanwhile, AI agents are now sitting at the Peak of Inflated Expectations, and the cycle repeats.

The productivity research tells the rest:

The optimistic numbers. GitHub’s own research showed developers completing tasks 55.8% faster with Copilot. A Microsoft field experiment showed 13-22% more pull requests per week. McKinsey reported developers completing tasks up to 2x faster.

The sobering numbers. The METR study (July 2025), a randomized controlled trial with experienced open-source developers, found AI tools made them 19% slower, not faster. The kicker: even after experiencing the slowdown, developers estimated AI had improved their productivity by 20%. The perception gap between how productive AI tools feel and how productive they actually are is significant.

MIT Sloan warned that even where short-term productivity gains are real, rapid AI-assisted development creates technical debt that can “cripple systems over the long term.” The GitClear data confirms it: AI-generated code has a 41% higher churn rate compared to human-written code.

Microsoft’s own debugging research shows AI models, including frontier ones, still struggle to debug software effectively.

Despite $1.9 million in average enterprise GenAI spend (2024), less than 30% of AI leaders report their CEOs are happy with the ROI.

So why the hype? Because the technology is genuinely novel, the demos are impressive, the ceiling is unknown, and billions of dollars in venture capital need a narrative. Mix in FOMO, vendor incentives, and a media ecosystem that rewards superlatives, and you get a hype cycle that looks a lot like the ones before it: cloud, blockchain, IoT. The difference this time is the technology applies more broadly.

Here’s the field guide, compressed:

LLMs are real and useful. For drafting, summarizing, translating, and pattern-matching across large text corpora, they’re legitimately better than what came before. Treat them as a first-draft engine. Verify everything.

AI coding tools accelerate skilled developers. If you can read and evaluate code, these tools make you faster. If you can’t, they make you faster at producing bugs. The tool doesn’t replace the skill. It amplifies whatever skill level you bring.

Agents are early and brittle. The direction is real, but the security model is immature, the failure modes are novel, and the gap between demo and production is wide.

Vibe coding is the uninformed use of powerful tools. Describing what you want and letting AI write code works for prototypes and throwaway projects. For anything you need to trust, it’s a liability generator.

The hype cycle is the hype cycle. The technology doesn’t need the hype to be valuable. Ignore the noise, learn what the tools actually do well, and deploy them where the risk-reward makes sense.

The security surface is growing faster than the security tooling. Every AI integration is a new trust boundary. If your AI agent can read your email and write to your database, anyone who can trick that agent can too.

Bottom line, same as it was in my original comment: these are power tools. Learn what they cut well. Respect where they bind.