Reflections on Vibe Researching


I tried that prompt (and nothing more) this morning with ChatGPT 5.2 Pro, and it actually produced an entire paper. Here it is. I have no idea what it is saying or whether it is complete nonsense. But it took 19 minutes and 10 seconds to produce. To be clear, ChatGPT did not think it was a publishable paper and said it needed more work. So I decided to ask Gemini whether the paper made any sense, and it said:

The paper is scientifically coherent. It identifies a genuine gap in a top-tier paper (the computational slowness of OT-GMM) and applies the correct mathematical tool (Debiased Sinkhorn) to fix it. If implemented, this would likely be a publishable contribution in econometrics.

I’ll leave it to others to work that all out. For the moment, let me reflect on my own more serious attempts at ‘vibe researching.’

The first reasoning LLM (ChatGPT-o1-pro) came out just over a year ago. It was the first model that opened up the possibility of using AI from the very start of a research project all the way through to publication of the results. I ran a little experiment in what I guess would now be called “vibe researching”: I took an idea I had long had (a fairly non-serious one) to see whether, with the use of o1-pro, I could produce a publishable paper in less than an hour. The answer turned out to be ‘yes,’ and the paper was published in Economics Letters, as I recounted here. I make no claims that it is a great paper. It is far from that. Indeed, I could improve it markedly now, but that wasn’t the point.

This led me to worry about what the future of research might look like and whether there was any point to research at all — why speculate and produce research that might be useful when people could just ask for research to be done on specific problems they had?

The only way to answer these questions, it seemed to me, was to really go for it and see what jumping all in on AI-first research could do to my own productivity. I had lots of ideas for papers that I hadn’t developed, so I decided to spend the year working my way down the list, adding new ideas as they came to me. My proposed workflow was all about speed: get papers done and out the door as quickly as possible, releasing a paper only if I decided I was “satisfied” with the output. That cut peer review and discussion out of the process of generating research, but I would still send the papers to journals for validation. If I produced a paper that I didn’t think could (or should) be published, I would discard it. There were many such papers.

Nonetheless, if you look at my website, 2025 produced a lot of working papers. A few have been accepted for publication, though not in top journals. A few others are R&R at better journals. I haven’t yet broken through at a top-tier journal, though I did try, and fail, at a number of them.

The bottom line is this: AI is really useful, and the latest models leave o1-pro well in the dust. It definitely accelerates research. But at the same time, it has only made me more cognisant of the human factor in research. By shutting people (including myself) out of the research process, I left myself open to pushing lower-quality ideas, which the review process clearly surfaced. I firmly believe that even if AI produces a short-term flood of papers to journals, in the ultimate equilibrium we will want to double down on the scarce human resource, at least in economics and related social sciences. The tools will make research better, more pleasant and, ultimately, higher quality, but the quantity is unlikely to move very much. Below, I explain why.

AI makes mistakes. This we all know. As I was researching, I was most concerned about mathematical mistakes, so I carefully checked the derivations and all such results in my theoretical models.

But those are not the only sort of mistakes theorists can make. The real issue, especially with game theory, is that too much focus on formal derivations can cause you to pay less attention to the more subtle aspects of the theory, including information sets and the application of equilibrium concepts. Some of the papers I sent out purported to establish or analyse certain equilibrium outcomes that were, in retrospect, incomplete. It was not that they could not be equilibrium outcomes, but my usual approach of working back and forth between the formal mathematics and intuition about the equilibrium broke down in places. My previous style was to simplify problems to be less mathematically general so as to lay out the intuition clearly. The LLMs, with their ability to generate more complex mathematics, easily tempted me to the dark side of trying to produce more general results (a constant ask of referees in the past), but that proved to be a bridge too far: my usual approaches didn’t work and didn’t surface overclaiming. Interestingly, it was rare for more than one referee to pick up on this, but thankfully there was always one. In each case, I was able to revise the paper to preserve the baseline intuition of what I thought I was doing and to produce a more rigorous foundation. It was rocky, however, and I learned to be far more cautious over the course of the year.

When you lower the cost of doing something, you do more of it. Normally, the decision whether to continue or abandon a project gives rise to some introspection (or rationalisation) about whether continuing is worthwhile relative to the costs. When the going gets tough, you drop the ideas that don’t look as great.

The issue with an AI-first approach is that its benefit, making the going less tough, is also its weak point: you face those continue-or-abandon decision points less often. That makes you more likely to complete any given project, but it also means you pursue more lower-quality ideas to fruition than you otherwise would.

That most definitely happened to me. I was able to write papers on waiting at the airport, exiting a theatre, Catch-22 situations and whether AI reviewing could be manipulated. Those ideas were all fine but not high quality, and what is worse, I didn’t realise they weren’t that significant until external referees said so. I didn’t realise it because they were reasonably hard to do, and I was happy to have solved them. This is not to say they shouldn’t be published, but that I was over-optimistic about the general-interest level of the journals they should be published in.

But more generally, where the low-quality ideas ended up was in the extensions sections of better-quality ideas. Initially, I had some ideas that I thought were higher quality, but what ended up happening was that, because AI made it easy, I would generate extensive additional sections in papers that themselves were not that great. To be sure, they are the kind of things that referees ask for, but in this case I was ahead of them and filled my own papers with bloat. That could detract from the main idea, though who really knows? The point is, those sections weren’t worth reading, even if they were now really easy to write. (Even before AI, I can’t say that my own idea-quality filter was that great, but AI didn’t help. As a colleague said, “Never has a more powerful tool been handed to someone who is a font of OK ideas.”)

In all of this, the review process picked these problems up. My point is that it would have been better if I had realised them before that happened. And I could have, but I was in too much of a rush because of my experiment. My lesson is now to self-require a pause of at least a month before anything goes out, so I can come back to it with fresh eyes and force those decision points, at least ex post.

We all know about LLMs being excessively sycophantic. But with research, that isn’t the problem. The problem is that formal results are claimed with confidence, and it is very easy to take them as true when they are far from it. This is something that I learned before submitting papers, but I cannot tell you how many days I spent thinking that I had a result when I did not. Thus, it is worth stressing that you need to be extra cautious in working with AIs. They are not trained for ground truth. They are trained to be pleasing, and by golly, they know what pleases a researcher. They can easily seduce you into thinking that you have discovered something when you have not. And I think that seduction might turn out to be all the more significant as we go. Indeed, the outputs may fool referees.

The point here is that LLMs are an interested party, and economic theory (and common sense) tells us you have to be extra sceptical when taking on board messages from such parties. We may yet learn how to train an AI not to do this and to be the sceptic we want it to be. But at the moment, it takes work. Basically, I used to play the different models off against each other to find issues and check stuff. And then, of course, Refine.ink (which I highly recommend for any research, AI or not) came along to help out further. But it isn’t enough, and you can’t be complacent.

In reading this, you might think that you should stay away from AI. That is not what I want you to take away here.

AI has been incredibly helpful for me this year in producing better research, despite the pitfalls already described. For instance, it helped me digest a very dense paper by Carnehl and Schneider (Econometrica, 2025), which I then used to produce several other papers that I still think were better ideas. They also all fell within my research wheelhouse of AI economics, which certainly helped. I was able to understand that paper and build off it in ways that would have been very difficult without AI tools.

My point is that the experiment (can we do research at high speed without much human input?) was a failure. And it wasn’t a failure only because LLMs aren’t yet good enough. I think that even if LLMs improve greatly, human taste and judgment in research will remain incredibly important, and I saw nothing over the course of the year to suggest that LLMs were able to encroach on that advantage. They can be of great help and certainly make research a ton more fun, but there is something in the judgment that comes from research experience, the judgment of my peers and the importance of letting research gestate that seems more durable to me than ever.

Going forward, I will continue to be AI-first in my research, now with guardrails to ensure the human element is retained. That means self-imposed pauses, more peer feedback through seminars and discussions, and more decision points at which to ask whether what I am doing is really worth doing. In 2026, the papers I produce will be ones that have used AI, but with those guardrails. I’ll report back in a year’s time on how that went.