Agents Were LLMs All Along.


We’re starting to live in a future where LLMs don’t just use tools in the sense of calling something external that you can optionally bolt onto the process. Tools are becoming a natural part of the models themselves; they’re being baked into the model’s brain. This means the line between “LLMs” and “agents” is dissolving.

I was thinking about this a few months ago…

…but it’s even more clearly visible in the newest LLMs, like OpenAI’s GPT-5.

Why do I think the newest LLMs are converging with AI agents?

Back in 2023, agents were hyped as the next killer app, loosely defined as LLMs with tools, memory, and reasoning. In the AI/tech bubble, there was no clear consensus on the definition of AI agents, but they were always connected to “AI using tools”. Tools are the core of agents; tool use was, in effect, the “product” that agents offered.

But that’s exactly what’s happening to LLMs now. A lot of what once required a separate product has collapsed into the models themselves since 2023 (when the AI agent hype started).

And I don’t mean just tools like accessing a web browser or running a terminal command. LLMs are also improving at skills like code generation, which wasn’t really practical until Anthropic’s Claude Sonnet. Now it’s something we simply expect from LLMs. And there are many such cases.

Emerging examples of agents and LLMs converging into one are:

  • Compound systems, like Groq’s, weaving RAG and other components directly into LLMs.

  • Cognitive cores: smaller but highly capable LLMs that feel adaptive and personal.

  • Multi-LLM systems: planner, executor, analyst roles filled by different instances of the same model, with cheap switching between them (a minimal sketch follows this list).
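
Here is a minimal sketch of that last pattern, assuming the OpenAI Python SDK; the “gpt-5” model ID, the prompts, and the example task are illustrative assumptions, not a prescribed setup.

```python
# Minimal sketch of a "multi-LLM" setup: the same model plays planner,
# executor, and analyst via different system prompts. Model ID, prompts,
# and task are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()

def ask(role_prompt: str, task: str) -> str:
    """Call the same underlying model with a role-specific system prompt."""
    resp = client.chat.completions.create(
        model="gpt-5",  # assumed model ID; swap for whatever you run
        messages=[
            {"role": "system", "content": role_prompt},
            {"role": "user", "content": task},
        ],
    )
    return resp.choices[0].message.content

task = "Summarize last week's error logs and propose two fixes."
plan = ask("You are a planner. Break the task into numbered steps.", task)
result = ask("You are an executor. Carry out this plan step by step.", plan)
review = ask("You are an analyst. Check the result for gaps.", result)
print(review)
```

The point of the pattern is that “switching agents” is just switching prompts over one model, which is why the roles are cheap to add or swap.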

The GPT-5 model also shows what this absorption of agentic capabilities into the LLM looks like in practice. It’s the best coding model yet, slightly ahead of Anthropic’s Claude models on SWE-bench. It one-shots complex apps, can fix more difficult bugs, and orchestrates multiple tools in parallel. OpenAI even added support for free-form function calling, making tool use more fluid than ever. The focus on using tools (even in parallel) is strong in GPT-5.
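
As a rough illustration of what that native tool use looks like from the developer side, here is a sketch using the standard structured tool-calling interface of the OpenAI Python SDK (not the newer free-form variant mentioned above); the tool definitions and the “gpt-5” model ID are assumptions for illustration.

```python
# Sketch of structured tool calling: the model can request several tools in
# a single turn (parallel tool calls); your code executes them and replies.
import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_docs",  # hypothetical tool
            "description": "Search internal documentation.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "run_tests",  # hypothetical tool
            "description": "Run the project's test suite.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]

resp = client.chat.completions.create(
    model="gpt-5",  # assumed model ID
    messages=[{"role": "user", "content": "Find the flaky test and verify the fix."}],
    tools=tools,
)

# Each requested call carries a tool name and JSON-encoded arguments that
# your application executes, then feeds the results back to the model.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments or "{}"))
```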

So what happens when “agentic” capabilities are native to the model? The center of gravity shifts, and the challenge isn’t having an LLM, or even an “agent.” The challenge is orchestrating the pieces so they work together coherently.

This is why I think agents and LLMs are becoming closer to infrastructure-level concepts. Agentic capabilities - planning, reasoning, tool use - become part of the underlying machinery: rules, deterministic vs. fuzzy logic, context mapping, data pipelines. These start to settle below the application layer, turning into the invisible substrate that products are built on.

It’s also visible in the other part of the stack. I see this shift being relevant at E2B (we’re building a cloud environment for AI agents). In 2023, we first positioned ourselves as a “code interpreter” and marketed running code generated by AI agents. But we evolved into a versatile, robust environment where agents (or LLMs) can run tools for days, in parallel, safely and reliably. We built full computers that LLMs can use natively, the same way humans interact with computers.
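
For a concrete feel, here is a minimal sketch of running agent-generated code in an E2B sandbox, assuming the e2b_code_interpreter Python SDK; the exact method names can differ between SDK versions, and the generated snippet is a hypothetical placeholder.

```python
# Minimal sketch: execute LLM-generated code in an isolated E2B cloud sandbox
# rather than on the host machine. Assumes the e2b_code_interpreter SDK;
# treat the exact API surface as illustrative, it may vary by version.
from e2b_code_interpreter import Sandbox

# Code an LLM or agent produced earlier in the workflow (hypothetical).
generated_code = "import sys; print(sys.version)"

sandbox = Sandbox()                      # spins up an isolated cloud VM
execution = sandbox.run_code(generated_code)
print(execution.logs.stdout)             # stdout captured inside the sandbox
sandbox.kill()                           # tear the sandbox down when done
```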

Similarly, for other companies building in this space, the advantage won’t come from throwing more tools at models or building just another agent. It will come from how well you orchestrate multiple LLMs, design workflows and UIs that feel natural, and give users real customization and control on top of all this.

The story of agents is really the story of LLMs maturing. I feel like people criticizing GPT-5 for not improving enough over previous models are missing this point. GPT-5 might not show dramatic spikes on famous benchmarks or PhD-level tasks, and that can make it look “incremental.” But the real shift is that the model is better built for tool use - the kind of low-level capability that will enable all the super-complex autonomous tasks.

Appreciate discussion on this & thanks for reading!