I started learning programming in the beginning of the 2000s, and it wasn't easy. I was a lonely child in Egypt with no English knowledge at the beginning. Having a computer with a stable DSL internet connection was a blessing, and something of a privilege in my surroundings. I was attracted to the computer and the internet. I was always interested in knowing how things work, and I was always trying to learn new things (technically I now ask very fundamental questions about the universe itself, ironically usually using code). So with little to no English knowledge, I started using the computer and surfing the internet. Some Arabic forums and a couple of websites whose names I no longer remember were my main sources of information. As a kid I wanted to play games and meet people. I used to go to some forums and chat rooms, but also to what were called "cyber" cafés, places where people could go and use the internet. I used to go there to play games, and there is a story about how that got me into programming.
For kids growing up in Egypt, the most popular game at cybercafés at the time was "Medal of Honor: Allied Assault". I played this game a lot, and I was very good at it. I used to play it with my friends (cyber friends) and we had a lot of fun. But there was always some random person who would join your lobby and decide to use the magic dog and tele commands. These were console commands that activated cheats in the game: dog in particular made you immortal, and tele let you teleport to arbitrary (x, y, z) coordinates [1].
There were usually options for more controlled lobbies, with a setting to disable the console hidden somewhere. But this was more about trust, and usually people would just go with the flow and play the game. One day I was annoyed enough to want to solve the problem of identifying the cheaters, because by the time people realized what was happening and started watching individual players, the cheater could simply disable the cheat. I wanted to know if I could automatically kick anyone the moment they used those cheats. And boom, this was my first real-life problem (though technically we sometimes had physical fights over it) that could be solved by programming. This is the story of how I started programming and why it attracted me: I had a problem in my life, and programming was the solution. Hence, my first programming project was a simple script that acted as a cheat detector for LAN-hosted games.
Since then I have enjoyed programming, and it attracted me more than the games themselves, as did how computers and the internet work. I was always trying to learn new things and solve problems. I started with Object Pascal because it was the only language I could find an Arabic book for in my local library [2]. I started learning English by practice, and in fourth grade I began taking English classes. I learned a lot and coded for fun. Then I grew up, and in high school I worked as a freelancer doing web development, mostly supporting WordPress sites and vBulletin forums. I did this for fun and to make some money.
The most difficult part for me was always feeling lost and not knowing what to do most of the time. I didn't have a clear path to follow, and the technical details of coding consumed most of my time. I wasn't able to achieve most of what I wanted: I never became an active contributor to open source projects, never had the chance to submit a patch to the Linux kernel [3], and never got to work on a project I was truly passionate about. I was always more interested in physics and math, and that's why I studied physics at university. I wanted to be a high energy physicist (which I now am), which means working with computers and programming a lot too. A job doing scientific computing is my dream job. I currently work on GPUs and trigger systems, with a little physics analysis on the side to remind me that I still do physics.
The previous paragraph is something I've been thinking about a lot recently. The feeling of being lost, of spending most of your energy on the how rather than the what. I knew what I wanted to build, what problem I wanted to solve, but the technical friction of getting there ate most of my time and motivation. I think many people in scientific computing share this experience. You have a physicist who deeply understands the decay channels they're studying, or a mathematician who can see the structure of a problem clearly, but they spend 80% of their time fighting with memory management, debugging segfaults [4], or figuring out why their build system broke after a minor update. The actual intellectual work, the part that requires domain expertise, usually gets squeezed into whatever time is left.
This is where, I think, something genuinely interesting is happening with LLMs and code generation.
Let me be clear about what I mean and what I don't mean. I'm not saying that LLMs will replace programmers or that coding skills don't matter anymore. That's a boring take and probably wrong. What I'm saying is something more specific: for a certain class of work (scientific computing, data analysis, and research tooling), LLMs are moving the hard part to where it should be.
For most of my life, the hard part was the coding itself. Not the thinking. I could think about the problem, sketch the algorithm on paper, understand the physics, but then translating that into working code was where I'd lose days or weeks. The gap between "I know what this should do" and "I have working code that does it" was enormous. And that gap was not about intelligence or understanding, it was about technical fluency in a specific language, familiarity with a specific framework, knowing which API to call and how. Not to mention the spaghetti code that you often inherit from your dependencies and the technical debt that you accumulate over time. This is a huge barrier, especially for people who are not primarily coders but need to code to do their research.
LLMs have, maybe for the first time, made that gap significantly smaller. Not zero or even close. But smaller. If you can clearly articulate what you want, if you can write a good specification then you can get surprisingly far with an LLM doing the heavy lifting on implementation. And this is not a trivial observation, because it means the bottleneck is shifting. The bottleneck is moving from "can you write the code" to "can you describe what the code should do with enough precision and understanding."
Specifications as the real skill
This is the part that I find most interesting and maybe underappreciated. Writing a good specification is hard. It requires you to think clearly about what you actually want, what the edge cases are, what the constraints are, and what "correct" means in your context. In scientific computing, this is deeply tied to domain expertise. You need to know what a physically meaningful result looks like. You need to understand the statistical properties of your data. You need to know when a fit is converging to something real versus an artifact. You need to understand the physics goals of your detector well before trying to design a trigger algorithm.
An LLM can write you a maximum likelihood fit. It can set up your RooFit workspace, define your PDFs, and run the minimization. But it doesn't know whether the result makes physical sense. It doesn't know that your mass resolution should be around 15 MeV and not 150 MeV. It doesn't know that a negative branching fraction is not a sign of new physics but a sign that something went wrong. That knowledge, the ability to look at the output and say "this is right" or "this is nonsense", is sometimes the specification itself. It's what you bring to the table. Or you define the specification from the beginning, in quantitative terms in a markdown file, and then ask the LLM to implement it.
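That kind of domain sanity checking can even be written down and automated. As a hedged sketch (the field names and thresholds below are purely illustrative, invented for this example, not taken from any real analysis framework), a few of these checks in Python:

```python
# Hypothetical sanity checks a physicist might apply to a fit result.
# All keys and numeric ranges are illustrative assumptions.

def validate_fit_result(result):
    """Return a list of human-readable problems; an empty list means the fit passes."""
    problems = []
    # A detector's mass resolution has a known scale; an order-of-magnitude
    # mismatch usually means a unit error or a fit latching onto background.
    if not (5.0 <= result["resolution_mev"] <= 50.0):
        problems.append(
            f"resolution {result['resolution_mev']} MeV is outside the expected 5-50 MeV range"
        )
    # A negative yield or branching fraction is unphysical: the model or the
    # minimization went wrong, not the universe.
    if result["signal_yield"] < 0:
        problems.append("negative signal yield")
    if result["branching_fraction"] < 0:
        problems.append("negative branching fraction")
    # A converged minimizer is necessary but not sufficient.
    if result["fit_status"] != 0:
        problems.append(f"minimizer returned status {result['fit_status']}")
    return problems

good = {"resolution_mev": 15.0, "signal_yield": 1200.0,
        "branching_fraction": 3.2e-5, "fit_status": 0}
bad = {"resolution_mev": 150.0, "signal_yield": -40.0,
       "branching_fraction": 3.2e-5, "fit_status": 0}

print(validate_fit_result(good))  # []
print(validate_fit_result(bad))   # two problems: resolution and negative yield
```

The point is not the code itself, which an LLM could also write; it's that the thresholds encode knowledge only the domain expert has.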
And I think this is where domain experts who maybe struggled with the coding part now have a real advantage. If you deeply understand your field, you can describe what you need with the kind of precision that makes LLM-assisted coding actually work well. You become the architect, and the LLM becomes a very fast (and sometimes very confidently wrong) builder. That means you still need to understand the tools, packages, and programming languages you're using, but you don't need to be an expert in them. You just need to be good enough. Your ability to use a debugger and read documentation is still important, but it's not the main bottleneck anymore. The main bottleneck is your ability to specify what you want in a way that an LLM can understand and implement. And, if necessary, to do the work yourself when the LLM fails; at least you have a clear target to aim for, since the specification document you wrote for the failed LLM attempt is a roadmap for your own implementation anyway.
But what does this mean in practice?
In my own work, I've started treating coding sessions differently. Instead of sitting down and thinking "how do I implement this," I spend more time thinking "what exactly do I want this to do." I write down the inputs, the outputs, the expected behavior, the failure modes. Then I work with the LLM to get there. Sometimes it gets it right on the first try. Often it doesn't. But the debugging process is different too: instead of staring at code trying to understand what it's doing, I'm comparing its output against what I know the output should be. The domain knowledge becomes the debugging tool. The caveat is that this does not work well for tasks that are ill-defined or have a lot of edge cases; but for something like a physics analysis, where you have a clear idea of what the result should look like, it can be very powerful. It does not work at all for GPU programming, for example, because that is messy, unpredictable, and full of edge cases [5].
For something like particle physics analysis, this might look like: "I need a function that takes a ROOT TTree with these branches, applies these selection cuts, performs a binned fit with this PDF model, and returns the yield with uncertainty." That's a specification. It requires knowing what those words mean, what reasonable values look like, and what assumptions you're making. The actual C++ or Python that implements it? That's important, but it's also the part that an LLM can increasingly handle.
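As a toy illustration of turning such a specification into code, here is a hedged Python sketch. It deliberately stands in for the ROOT version: the "tree" is just a dict of NumPy arrays, the PDF fit is replaced by a much simpler sideband subtraction, and every branch name, cut value, and mass window is made up for the example.

```python
import numpy as np

def signal_yield(tree, mass_branch="m_Kpi", pt_branch="pt", pt_min=1.0,
                 window=(1.84, 1.90), sidebands=((1.78, 1.81), (1.93, 1.96))):
    """Apply a pT selection, then estimate the signal yield in a mass window
    by sideband subtraction. Returns (yield, statistical uncertainty).
    A simplified stand-in for a binned PDF fit; all names and numbers
    here are hypothetical."""
    mass = np.asarray(tree[mass_branch])
    pt = np.asarray(tree[pt_branch])
    mass = mass[pt > pt_min]  # the selection cut from the specification

    n_win = np.count_nonzero((mass >= window[0]) & (mass <= window[1]))
    n_sb = sum(np.count_nonzero((mass >= lo) & (mass <= hi))
               for lo, hi in sidebands)
    # Scale the sideband count to the signal-window width, assuming a flat
    # background -- exactly the kind of assumption a spec should state.
    alpha = (window[1] - window[0]) / sum(hi - lo for lo, hi in sidebands)
    yield_est = n_win - alpha * n_sb
    # Poisson errors on both counts propagate as sqrt(N_win + alpha^2 * N_sb).
    uncertainty = np.sqrt(n_win + alpha**2 * n_sb)
    return yield_est, uncertainty

# Synthetic demo: 500 Gaussian signal events over a flat background.
rng = np.random.default_rng(42)
tree = {
    "m_Kpi": np.concatenate([rng.normal(1.87, 0.01, 500),
                             rng.uniform(1.75, 2.00, 2000)]),
    "pt": rng.uniform(1.5, 5.0, 2500),
}
y, e = signal_yield(tree)
print(f"yield = {y:.0f} +/- {e:.0f}")  # should recover roughly 500
```

Notice how every argument of the function maps to a clause of the specification; the domain knowledge lives in the defaults and the flat-background assumption, not in the NumPy calls.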
This is also why I think the idea of "vibe coding", just letting the LLM do whatever and hoping for the best, is dangerous, especially in scientific work. If you don't understand what the code should do, you can't verify what it actually does. The LLM might produce code that runs without errors but gives you subtly wrong results. In web development, a wrong button color is a minor issue. In physics analysis, a wrong selection cut could invalidate your entire result. The specification is your safety net.
The limitations (because there are many)
I want to be honest about what doesn't work well, because I think the enthusiasm around LLM coding sometimes skips over real problems.
First, LLMs struggle with large codebases and long-range dependencies. If your project has a complex architecture with many interacting components, the LLM's context window and understanding might not be sufficient. It works best for relatively self-contained tasks. Ask it to write a function, and you'll probably get something useful. Ask it to refactor an entire framework (say, ROOT itself), and you'll probably get a mess.
Second, LLMs can be confidently wrong in ways that are hard to catch if you don't know what you're looking for. I've seen generated code that looks perfectly reasonable, follows all the right patterns, uses the right function names, but has a subtle bug in the logic that would only be apparent to someone who understands the underlying physics or mathematics. This is not a failure of the LLM; it's a fundamental limitation of generating code without understanding the domain. An LLM, at the end of the day, is doing statistical pattern matching, not cognitive reasoning as we understand it. So it can produce code that looks right but isn't, and if you don't have the domain knowledge to spot it, you might end up with incorrect results.
Third, and this might be controversial, I think over-reliance on LLMs for coding could atrophy certain skills that are still important. Understanding how memory works, how floating point arithmetic behaves, how race conditions emerge, these are things that matter when your code needs to run on a GPU trigger system processing 30 million events per second. You can't just prompt your way through that. There's a level of systems understanding that you still need, and maybe always will.
Fourth, the reproducibility question. In scientific computing, reproducibility is not optional. If I generate code with an LLM, and someone else generates code for the same task with a different prompt or a different model version, we might get functionally different implementations. The specification should be the invariant, the thing that's reproducible, but the implementation might vary. This is maybe fine in practice, but it does make me a bit uncomfortable philosophically.
Where this leaves us
I think we're in an interesting transition period. The bar to create software has been lowered significantly, and this is both good and bad. Good because people who have deep domain expertise but limited coding experience can now build tools that were previously out of reach for them. Bad because people without domain expertise can also produce software that looks correct but isn't, and they might not have the knowledge to tell the difference.
For scientific computing specifically, I'm cautiously optimistic. The people who will benefit most from this shift are those who invest in understanding their domain deeply and who learn to write precise specifications. Not "make me a plot" but "make me a Dalitz plot of m²(K⁺K⁻) vs m²(pK⁻) with this specific binning, these axis ranges, using this selection, with this color scheme, and overlay the phase space boundary calculated from these masses." The difference between a vague request and a precise specification is the difference between getting something useless and getting something publishable.
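The "phase space boundary calculated from these masses" clause in that spec is itself just kinematics that the expert must know how to state. A hedged sketch of the standard Dalitz-plot boundary (the allowed range of one invariant mass squared at fixed value of the other, from two-body kinematics in the (1,2) rest frame), with illustrative masses; the parent mass here is a made-up value for the example, not tied to any particular decay:

```python
import math

def m23sq_range(m12sq, M, m1, m2, m3):
    """Kinematically allowed range of m23^2 at fixed m12^2 for a decay
    M -> 1 2 3, evaluated in the (1,2) rest frame."""
    m12 = math.sqrt(m12sq)
    e2 = (m12sq - m1**2 + m2**2) / (2.0 * m12)  # energy of particle 2
    e3 = (M**2 - m12sq - m3**2) / (2.0 * m12)   # energy of particle 3
    p2 = math.sqrt(max(e2**2 - m2**2, 0.0))     # momenta (clamped for
    p3 = math.sqrt(max(e3**2 - m3**2, 0.0))     # floating-point safety)
    lo = (e2 + e3)**2 - (p2 + p3)**2
    hi = (e2 + e3)**2 - (p2 - p3)**2
    return lo, hi

# Illustrative masses in GeV: two kaons and a proton, hypothetical parent.
M, mK, mp = 5.62, 0.4937, 0.9383
lo, hi = m23sq_range(1.5, M, mK, mK, mp)  # band at m2(K+K-) = 1.5 GeV^2
print(lo, hi)
```

Scanning m12sq from its threshold (m1+m2)² to (M-m3)² and plotting the two returned curves traces the boundary; at both endpoints the band collapses to a point, which is a good self-check of the implementation.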
Maybe the title of this post should have been "learn to read and write specifications, the LLM will handle the rest." But that's too long, and also not entirely true because the LLM won't handle all of the rest. It will handle a lot of it though, and that might be enough to change how we work.
I spent years feeling lost because the technical details of coding consumed my time and energy. I wonder if a kid in Egypt today, with a computer and an internet connection and a curiosity about how the universe works, might have a slightly easier path. Not because the problems are easier (they're not), but because the gap between thinking about a solution and implementing it is getting smaller. And maybe that's enough to let more people focus on the questions that actually matter.
One last thing. I realize I've been using the word "specification" rather loosely throughout this post. In software engineering and engineering in general, a specification has a very specific meaning. It is a formal document with precise requirements, acceptance criteria, and sometimes even legal weight. What I'm describing here is something less formal than that, more like a detailed description of intent backed by domain knowledge. I took the liberty of stretching the word because I think the underlying idea is the same: know what you want before you start building. Whether that's a 200-page requirements document or a well-thought-out prompt with clear constraints, the principle holds. But I didn't want to pretend I was talking about formal specs in the traditional sense. I wasn't, and people who write actual specs for a living would probably be annoyed if I did.