The summer after my freshman year I interned at Lawrence Livermore National Laboratory. While it turns out that this was a very prestigious internship, I didn't know that when I applied—I applied because my friend was applying. So when I showed up to the lab and saw uniformed military personnel guarding the entrance, I was extremely confused.
Thus began my journey down a path of writing software for research labs which are focused on science, rather than the more common path of writing software for software companies. I've only recently come to understand how different my work is from most software jobs, so I figured I would talk a bit about what it is like.
Working with Scientists
One of the great benefits of working in a research startup, university, or medical institute is that you as the programmer are not the biggest nerd on the team. In this context I am using 'nerd' to mean 'person who is unreasonably obsessed with their work and hobbies, to the detriment of themselves and others.'
My boss, a cardiothoracic surgeon, has not taken a vacation since 2009. If asked why he hasn't taken a vacation, he will reply that his work is his vacation. I'm writing this in 2025, but I'm pretty sure that so long as you are reading this before 2035, it will still be an accurate statement [EDIT: this prediction turned out to be false and a few months after the time of writing, he took the singular most well-deserved week off that anybody has ever taken, so now presumably he's good for another 16 years]. Similarly, my previous boss was in his late 70s and worked as a CTO, a DARPA project manager, and was doing a very large-scale personal project on the side.
Is this unhealthy? Maybe. Probably. I won't weigh in on the whole 'have work-life balance' vs 'grind and hustle and be successful' debate, because while I have many opinions on it, I don't think it is anything that hasn't been said before.
If you ask a scientist about their work, you will enter an unskippable cutscene that can last over an hour. If you ask about their non-technical hobbies, you will likely get the same result. I had a boss once who was very concerned that I took the right path from my office to her office because the other paths either went by the bathrooms, which were smelly, or went around the boba place downstairs, which was suboptimally long. I needed to take the correct path.
Good professors often see themselves as general-purpose mentors in all aspects of your professional career, which is extremely helpful. Some even give life advice. This can be helpful, so long as it's good advice, but sometimes professors can go somewhat further than you might expect with the advice giving. I had a prof randomly tell me during lab updates that I really needed to moisturize my hands (it was Pittsburgh winter and the air was incredibly dry). She wasn't wrong.

I had a mentor in undergrad who believed very strongly that there should not be statues of any human being, even 'good' ones, because in future times, all human beings will be considered evil due to changing moral standards. He brought this up unprompted during the morning lab meeting. He also brought up, in a discussion about Bitcoin, that a big purpose of buying art was to make you feel like a better person, i.e., a person who bought art. The head of my institute once drove me and a couple of other junior employees to West Virginia University to discuss a project he was working on, and while in the van, he brought up a time when he took a trip to go live with gorillas for a week and just eat plants that he found in the jungle. Everybody in academia is completely insane.
Needless to say, I fit right in.
Science Code
Scientists often write bad code.
It is not due to lack of skill or care, but rather because their focus is the science, not the code that serves the science. There is a difference between code written by people who don't care about their job, and code written by people who care a whole lot about their job, but their job is the science, not the code.
Now, you could argue that the point of code is never really the code itself, but the thing that the code does, and that code quality still matters anyway because it keeps the codebase easy to maintain. And this is all very true.
But the code is still usually written by software engineers who either are more passionate about code than the business, or simply don't care about either the software or the business and are just there for the paycheck. The executives at DoorDash probably care about food being delivered efficiently and on time, or at least about extracting money from the aforementioned phenomenon, but that doesn't mean the software engineers care. They either care a whole lot about software engineering and write good code, or don't give a crap about software engineering and write bad code. There may be some exceptions, such as programmers for cybersecurity companies who likely are passionate about threat detection or programmers who work at AI companies (although AI people are arguably researchers as well), but the majority of software is written by folks who care more about the actual code than the thing the code does.
Scientists don't typically outsource their software development. There are some exceptions in very large institutes with lots of funding, such as national labs, but generally speaking, code is written by the scientists themselves, often grad students. The result of this is that the failure modes in scientific code are very different from failure modes in industry code. I am less familiar with the failure modes in industry, but from what I have been told via friends and the internet, two big issues are people not caring at all about the job, and large legacy codebases.
Scientific code has very different problems. I can't describe every issue in every scientific codebase in a short blog post, but I will try to describe the three biggest issues that I keep seeing.
1) Jupyter Notebooks Where They Do Not Belong
If you haven't used them before, a notebook, sometimes called a Jupyter Notebook after the most popular program for writing/running them in, is essentially a way of writing Python (or Julia) such that it is split into separate 'cells.' If you run a cell, the variables established in that cell are stored for when you run another cell. There are three main benefits to this:
1) If you have a cell that runs for a long time, you can run it once, save the variables, and run other, quicker, cells that use those variables.
2) If your code's product is mostly plots, notebooks are a good call, because notebooks display each plot within the body of the notebook, right below the cell that created it.
3) You can create a cell that displays markdown, making it easier to describe mathematical formulas than it would be in a code comment.
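To make the cell model concrete, here is a minimal sketch of that workflow, written with the '# %%' cell markers that editors like Spyder and VS Code understand (the file name, columns, and variables are invented for illustration):

```python
# %% Load the data (slow; run it once and the result stays in memory)
import pandas as pd

df = pd.read_csv("measurements.csv")  # hypothetical input file

# %% Explore (fast; re-run as often as you like without reloading)
print(df.describe())

# %% Plot; the figure renders inline, right below this cell
import matplotlib.pyplot as plt

df.plot(x="time", y="temperature")  # assumes these columns exist
plt.show()
```

Run the first cell once, then iterate on the cheap cells below it; that loop is most of the appeal.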
As an example, notebooks that I write typically look something like... well, not like this. The below screenshot comes from what I write when I am thrown into a lab filled with very smart, but non-programming, materials scientists:
I would like to restate: the above program design was not entirely my fault, but I will take partial responsibility for not being as opposed to it as perhaps I should have been.
Notebooks are very useful if the purpose of your code is to study data or do complex math. In fact, usually when I am sending a notebook to somebody, I print it out as a PDF and send the PDF rather than the code itself. You are not sending them a runnable file, but rather a report. The code is only there so that if they have confusion over how exactly you got your results, they can look at the code and see exactly what you did. It's like a PowerPoint, except they get to check the code that generates your figures. An example of notebooks being used exactly the way they should be used can be found here.
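(As an aside on that PDF step: with the standard Jupyter tooling, the export is usually a one-liner, assuming nbconvert and a LaTeX installation are available.)

```bash
# Render the notebook, figures and all, into a shareable report
jupyter nbconvert --to pdf results.ipynb   # hypothetical notebook name
```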
The problem is that if you write too often in notebook files, any other format starts to seem difficult to read. I was once given several notebooks that all referenced each other, and most of them contained only one code cell alongside several markdown cells. The one code cell contained an extremely long Python class. A notebook is a terrible way to write a long Python class. Firstly, importing a class from a notebook is unwieldy. Secondly, a long class cannot be split across several cells, so you cannot interleave markdown documentation with the code, which is a huge point of using notebooks in the first place; there is no benefit over a plain .py file. In fact, these classes did not do any complex math that required a markdown description! But the person who wrote the code claimed that if somebody joined the team, they would have an easier time understanding the classes by reading a notebook rather than a simple .py file containing the exact same information. Writing in notebooks does crazy things to your brain.
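(If you want the contrast spelled out: below is the entire ceremony needed to reuse a class that lives in a plain .py file, with every name made up for illustration. There is no equally clean, built-in way to pull a class out of a notebook cell, short of copy-pasting, converting the notebook, or reaching for a third-party import shim.)

```python
# models.py -- an ordinary Python module
class TissueModel:
    """Hypothetical stand-in for the enormous class from the story."""

    def __init__(self, stiffness: float):
        self.stiffness = stiffness

    def strain(self, stress: float) -> float:
        return stress / self.stiffness
```

Then, from any script, test, or notebook cell, a single line (from models import TissueModel) gives you the class.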
2) Nobody actually looks at the code
When I submitted my student paper to the AAAI conference, nobody asked for the code. When I submitted my work from Purdue to Review of Scientific Instruments, nobody asked for the code. I'm currently working at the McGowan Institute, and they require weekly code reviews! This entails sending my code to a professor, who then looks over it and makes sure it's not crap and that it does what I claim it does. But the reason this happens is that the McGowan Institute is a medical research facility. The reproducibility crisis in the life sciences certainly exists, but that is only an issue with the reproducibility of publications. Actual medical treatments have to get through a whole different review process. It has its own issues, of course (it's a system designed by humans), but it is much better and more rigorous than publishing in journals, especially low-tier journals. Many journal and conference review processes do not actually involve reviewers looking at the code that goes along with the publication.
This is mostly an issue in lower-tier journals, but it is absolutely an issue everywhere. At my previous job, we tried to reproduce a pre-eclampsia paper that had a public codebase, used a public genome repository, and was published in Nature. Not a single person could reproduce it, and we had some pretty good people working at the company. Does this mean that the Nature paper was bogus and irreproducible? Or did it mean that the code was just written so poorly that it was impossible to tell what it was doing? I have no idea, neither did anybody else in the company, and both possibilities are quite bad. But what is so weird about this is that you would think Nature would review the code and hold it to a very high standard. For some reason, it seems that it did not.
This has caused some fairly significant issues in the past. This is obviously not my field, but this study was retracted because it incorrectly stated that divorce rates in heterosexual marriages were higher when the woman became sick, but were less significantly affected when the man became sick. This turned out to be an incorrect result, mistakenly reported due to a very simple code bug. They are currently conducting the study a second time with corrected code. Similar things have happened with other studies, and I'm sure there are many we have not heard about because the error still has not been found.
I think part of this is that the peer review system was developed before programming was ubiquitous in science, so there was no system in place for ensuring that the code actually does what the researchers claim it does. There is the obvious issue of people outright lying, but I suspect that lying is less common; more likely, there is a bug in the code that nobody noticed, or the code is so poorly written that publications which build on the paper cannot build on the code, which limits the paper's usefulness to the scientific community.
3) We have to talk about the tools
Up until now I have tried to be at least somewhat diplomatic, because while scientists may write bad code, it usually does the job, which is ultimately what is important. But now I will remove all semblance of diplomacy and just say straight up: science toolstacks are terrible. There is nothing redeeming about the standard science toolstack. It doesn't do software engineering well, it doesn't do science well, it does absolutely zero things well. There is no excuse at all for this many smart people to be using the terrible stack that they do.
And the thing about these tools is that they are absolutely used by smart people. Talented programmers who work in science exist, and I have had the privilege of working with a few of them. These folks are mostly able to write clean code, with minor concessions to the necessity of ensuring that their work conforms to what is considered respectable in science. But an absurd number of brilliant people use terrible tools. The greatest data scientist (and possibly smartest human) I've met in my life used Spyder and Conda and wrote nearly all of his code in Jupyter Notebooks. When he had to make an actual user-facing application, with architecture complex enough that it needed to be designed beforehand, he ditched the notebooks but still used Spyder and Conda.
I think a lot of what makes scientific programming tools so bad is that it is much easier for somebody with a math background to see a computer as a magical math machine than as a device with its own architecture, one that does not get to look directly into "God's book of proofs." When I was asking Jesse Alford for book recommendations, he asked whether it made sense to me to imagine a computer as a "system of telegraph relays."

I was and still am a very math-focused programmer, so I told him that I really just viewed a computer as 'esoteric math' (if you are curious, he recommended 'Code' by Charles Petzold). Many scientists find it easier to view a computer as a very advanced calculator, and as such, the programs built for them tend to have a lot of bloat and guardrails, with very little customizability.
For example, Conda, the Python package manager of choice for most data scientists, works by creating various Python environments that are accessible from everywhere. This is a different model from UV, which creates .venv folders inside specific projects; you activate a project's .venv, typically, when you enter that project's folder.
You could technically activate a .venv folder in project 'A' while in directory 'B', but that would be weird. As such, you typically end up with an environment for each project you are working on.
With Conda, the environments are all stored in the same folder, whose location is chosen during installation. As such, you can easily activate any Conda environment from anywhere in the filesystem. It is common to take the default 'base' environment and have your shell startup script activate it automatically when you open a terminal, so you always have your base Conda environment running.
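For readers who have only ever used one of the two, here is roughly what the difference looks like in practice (environment and package names are just examples):

```bash
# Conda: named environments live in one central folder and can be
# activated from anywhere on the filesystem
conda create --name heart-sim python=3.11 numpy pandas
conda activate heart-sim        # works from any directory

# uv: the environment is a .venv folder inside the project itself
cd heart-sim/
uv venv                         # creates ./.venv
uv pip install numpy pandas     # installs into ./.venv
source .venv/bin/activate       # or let 'uv run' handle it per command
```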
Conda has many issues; I will talk about two of them.
Firstly, it is very slow and clunky, and it is prone to package conflicts with opaque stack traces. Back when I used it, I had an alias called 'update' in my shell which ran sudo apt update and sudo apt upgrade, and then it ran conda update --all and conda clean --all. The Conda update often took over a minute.
The second is that Spyder, the IDE of choice for many data scientists, comes bundled with Anaconda (or Miniconda, if you want less bloat), a distribution of data science libraries that includes Conda.

You can install Spyder on its own without Ana/Miniconda, but you can only get Conda as part of the bundle. This means that the IDE and the environment manager ship in the same package. So far this doesn't seem too bad.
But if you install Spyder via Ana/Miniconda, it is installed into the specific Conda environment you are running at the moment, rather than into Conda as a whole. You don't run Spyder and then activate the Python 3.x virtual environment; you activate the Python 3.x virtual environment, which contains Spyder, and then boot Spyder.
So if you have one Conda environment running Python 3.8 and another running 3.12, you will have two different Spyder installs, one for each environment. If you want to pin Spyder to the dock (which you have to write your own launch script for, at least on Linux), you would have to pin a different Spyder for each environment.
You can install Spyder without Conda, but then you will be unable to install the spyder-notebook plugin, so you won't be able to run notebooks. If you want to run notebooks in Spyder, you need a separate Spyder installed for each virtual environment you are using. And if you make a new virtual environment, you have to reinstall Spyder into that environment.
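To spell out what this ends up looking like, a Spyder-per-environment setup with Conda goes roughly like this (environment names invented, and the exact package details may differ from whatever Anaconda ships today):

```bash
# Each environment gets its own copy of Spyder (plus the notebook plugin)
conda create --name py38  python=3.8  spyder spyder-notebook -c conda-forge
conda create --name py312 python=3.12 spyder spyder-notebook -c conda-forge

# To work in a given Python version, activate its environment and launch
# *that* environment's Spyder
conda activate py312
spyder   # e.g. ~/miniconda3/envs/py312/bin/spyder, not a global install
```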
So anyways, I switched away from Spyder and Conda after spending two days trying to create a satisfying configuration, and now I use UV and Zed.
What all this means, in my opinion, is that scientists need to get better at code. But that is a huge battle that many people, most of them far more skilled than I am, are already fighting.
Okay, but please don't defund us!
I am a bit nervous about putting this out on the internet, because scientists really do not need any more criticism given what is going on right now with the NIH and NSF funding crisis. This is NOT intended to be a justification to defund scientists even more. From where I stand, it seems like if you look closely into the inner workings of any profession, especially ones that we rely on a lot, it is a bit scary to see how the sausage gets made. Improvement is needed, but it will not come by firing everybody. In fact, that will probably make it worse, because there will be less money with which to outsource code to professional developers.
However, I do hope that the standards for code in science go up. If programming is ubiquitous in science, it is important that the programming be good. You cannot discover fundamental truths of reality via buggy code.