LLMs: The gift that keeps on giving


It strikes me that many promising results in AI are results that did not need to happen. Large language models simply start exhibiting “intelligence” beyond a certain scale (Emergent Abilities of Large Language Models), and that intelligence is unleashed through simple instruction tuning. It is not obvious why, but this was the key to the breakthrough OpenAI made in 2022 with InstructGPT, a breakthrough that will be talked about for centuries to come.

I’ve had a similar feeling about a few other results since ChatGPT. The second is that LLMs are in-context learners: they can learn functions from just a few examples provided in context (Language Models are Few-Shot Learners). This has immense implications because it challenges the standard paradigm of machine learning. Machine learning used to work like this: you have a problem, there is something to predict, you collect enough data, set up training infrastructure, hire some ML folk, and get them to train and deploy a model. But LLMs can learn in context. If you can collect even a little bit of data and feed it into the LLM’s prompt, you’ve got yourself a model. Again, this did not need to be true. No one asked for this. It just is.
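In code, in-context learning is just prompt construction. A minimal sketch, where `build_few_shot_prompt` is my own illustrative helper and the commented-out `call_llm` stands in for whichever chat-completion API you use:

```python
# In-context learning sketch: no training step, no deployed model.
# We pack a few labelled examples into the prompt and let the LLM
# infer the mapping from them.

def build_few_shot_prompt(examples, query):
    """Turn (input, label) pairs plus a new input into one prompt string."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the LLM completes this line
    return "\n".join(lines)

examples = [
    ("Loved every minute of it.", "positive"),
    ("A complete waste of time.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Surprisingly good!")
# response = call_llm(prompt)  # hypothetical stand-in for any provider's API
```

The two labelled reviews are the entire "training set"; swapping them out changes the model's behaviour with zero retraining.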

The third result that I put in the same category is that LLMs can be used as optimizers for their own context (Large Language Models as Optimizers). They can introspect on what they did and come up with better attempts just by examining their context and the quality of their own previous outputs (as judged by a verifier).

All three of these results are remarkable. The first has largely sunk in, but the latter two - that LLMs can learn without retraining through in-context learning, and that they can optimise themselves by examining their own work - are where I find a lot of interesting things happening. The manifestations are projects like GEPA (Genetic-Pareto), ACE (Agentic Context Engineering), and AlphaEvolve. In all of these, LLMs iteratively improve by generating candidate solutions and evolving them through feedback from a verifier.
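Here is a toy version of that evolutionary loop, with random string mutation standing in for an LLM rewriting a solution. Everything in it (`TARGET`, `mutate`, `fitness`) is illustrative and not any of those projects' actual APIs; what carries over is the shape: population, mutation, verifier-driven selection.

```python
import random

# A tiny evolutionary loop in the spirit of GEPA/AlphaEvolve: keep a
# population of candidate solutions, mutate them, and let verifier
# feedback decide which survive into the next generation.

TARGET = "hello world"  # toy task: evolve a string to match this

def fitness(candidate):
    """Verifier feedback: how many characters already match the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate, rng):
    """Toy stand-in for an LLM rewriting one piece of a solution."""
    i = rng.randrange(len(candidate))
    letters = "abcdefghijklmnopqrstuvwxyz "
    return candidate[:i] + rng.choice(letters) + candidate[i + 1:]

def evolve(generations=300, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = ["x" * len(TARGET)] * pop_size
    for _ in range(generations):
        children = [mutate(p, rng) for p in population]
        # Selection: keep only the top pop_size by verifier score.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
    return population[0]

best = evolve()
```

In the real systems the mutation operator is an LLM proposing a rewritten prompt or program, and the verifier is an actual evaluation harness, but the selection pressure works the same way.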

With the gains from pretraining being squeezed out, I think test-time scaling - running many LLM calls with adaptive contexts and smart verifiers - is likely the next frontier.
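The simplest form of test-time scaling is best-of-n sampling: draw many candidates and let a verifier pick the winner. A runnable toy, with `sample_candidate` and `verifier` as illustrative stand-ins for an LLM call and a real scoring harness:

```python
import random

# Best-of-n sampling: spend more compute at inference time by drawing
# many candidates and keeping the one the verifier likes most.

def sample_candidate(rng):
    """Stand-in for one LLM sample; real code would call a model."""
    return [rng.randint(0, 9) for _ in range(5)]

def verifier(candidate):
    """Toy verifier: reward candidates whose digits sum high."""
    return sum(candidate)

def best_of_n(n, seed=0):
    rng = random.Random(seed)
    candidates = [sample_candidate(rng) for _ in range(n)]
    return max(candidates, key=verifier)
```

With a fixed seed, drawing more samples can only match or improve the verifier's best pick, which is the whole appeal: quality scales with inference compute rather than with training runs.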
