On the Problem of LLM-assisted Contributions to Open Source Projects
As the maintainer of an open source project, one is sooner or later
faced with contributions that have been produced by Large Language
Models, or at least created and refined with assistance from LLMs.
This puts me, as a maintainer, in a dilemma:
The contribution may be genuinely useful, perhaps even a grand
improvement, the absence of which may have bugged me for a long
time. Also, I assume the contributor means well, and I am grateful
to anybody who offers help to improve and polish our project, but
I can't wholeheartedly embrace such a
contribution for several reasons.
There is an ethical side to the use of large frontier models, one
that gets papered over by proponents of using LLMs intensively for
coding in a way that surprises me. So, of course we are all very
concerned, and the situation regarding licensed code is anything
but clear(1), and AI crawlers bringing the web to its knees are a
big problem and we all feel really sorry for African kids scraping
rock for raw materials to build debt-financed hardware that soon
becomes obsolete and replaced, armies of underpaid third-world gig
workers vetting the training data and the considerable waste of
water and energy, all for an experimental automaton that's doubtlessly
good at producing boilerplate code of uncertain correctness. Yeah,
we totally agree, but, let's move on, can't you see the potential
and all those problems caused will surely be solved some day and
so on and so on. Here we see the usual line of argument: first the
required minimum amount of virtue signalling, then the inevitable
change of subject and the promises of vast potential for the future
of humanity and (less pronounced, but probably more impactful)
personal convenience.
Now, I don't eat meat, and when telling my brother-in-law about the
water consumption and the raw food being wasted just to produce
less food of a different type, he leans forward (picture the sound
of creaking leather, of seating furniture groaning under duress),
flexes his heavily tattooed knuckles(2), gives me one of his evil
stares and tells me to my face that he doesn't care, he likes meat,
loads of it and he doesn't give a fuck about what I just said. And
I truly appreciate this refreshing level of honesty, this eloquent
demonstration of uncompromising directness (he's a really nice chap,
by the way).
So please don't whine to me about climate change while polishing
your CV with ChatGPT, or letting Claude spit out code for something
that you only half understand. Laziness is human, ignorance is a
fact of life, but for hypocrisy there is really no excuse. Nothing
is wrong with having principles and sticking to them, as old-fashioned
as it seems. Or at least have the decency to admit that an increase(3)
in productivity and sheer convenience is more important to you than
the social and environmental implications.
But actually, that's not the issue that I want to talk about. When
I am faced with an LLM-generated contribution, I cannot help but
feel, well, nervous. Who is contributing here? A person? A
machine? Both? Who takes responsibility for the changes and the
bugs, the shortcomings or misunderstandings therein?
The process of writing code is very different from the process of
reading it. Writing code means you effectively execute the code in
your head while you shape it into instructions for a machine;
reading code is an exercise in guessing intent, and everybody with
at least a bit of experience knows how stubbornly we can miss bugs
and oversights even after having glanced over a piece of code a
hundred times and more.
Writing internalizes information much more than reading does(4).
Whenever you debug a hard programming problem while taking a walk,
while in the shower, or just sitting on the sofa, staring into
space, you sometimes experience the amazing ability to recall
highly detailed information by actively and selectively digging
into code stored somewhere deep in your head. It is hard to
explain, but it works, and it requires not only that you wrote
the code in question, but also that you can follow the chain of
thought that led to writing it, including the motivation for
writing it in the first place.
So with code produced by LLMs I have no choice but to be extra
diligent when evaluating a change, always expecting something nasty
slipping through, never knowing why something has been written the
way it is. Is this a sound design? Is this intended? What about
that special nasty detail, the one which made me abandon my own
effort at implementing the very same feature, a couple of years
ago? Is this just hallucinated (by man or machine)? Is it because
dozens of other pieces of code agree with this? Is it because there
exists some platonic ideal of networking code in the I/O subsystem
of a dynamic language runtime that the LLM somehow is able to grasp?
If that is what you think then let me tell you that you are dealing
in metaphysics, which neither theory nor evidence supports. But
people want to believe in all sorts of things and the promises keep
being pounded into our heads, as there are whole economies at stake.
And I'm not even trying to touch the subject of inventiveness and
originality. If you think innovation is just a statistical error,
a random mutation, then you are beyond help, anyway.
I think the problem here is a misunderstanding of what maintainers
of long-lived open source software do. I can only speak for myself,
but it is not that we are sitting around, twiddling our thumbs and
waiting for 400-pound patches landing with a thump in our inbox,
containing big, beautiful, groundbreaking features. We are
experienced, often professional programmers who worked on these
systems perhaps for decades. We can implement those features, we
don't need some coding wizard to submit things like DWARF support
for debugging info or transport layer security for our package
manager tool. We are perfectly capable of implementing stuff. The
feature is not what counts, as useful as it may be.
What counts are kindred spirits, co-maintainers, individuals that
are willing to dig into the suboptimally designed systems that we
have designed (often badly), implemented, patched, fixed, broken
and fixed again for such a long time. What counts are people that
are ready to stumble over all the corner cases, the inefficiencies,
the mistakes we made and that lurk in the dark corners of some
obscure runtime library module that we'd rather not look at again.
Because that is what breeds the expert, and these experts are the
true limited resource: co-developers who have had the same
misunderstandings and learned from them the same hard way, who understand
that a collaboratively developed and maintained open source package
is never done, never free of bugs, always out of date, too slow,
too bloated and that every single line of code may, five years from
now, turn into a liability because when it was added some other
part of the system was not taken into account and time was scarce
and real life demanding, so we just added it, surely to be fixed
later... And those who wrote it will be able and willing to keep
it working because they put genuine effort into their code and into
fully understanding the context, the philosophy and the culture of
a project.
The most rewarding thing that can happen to an open source maintainer
is not a 20k SLOC patch making our code 10 times faster, portable
to 3 more platforms, useful to thousands more users. No, the
most rewarding thing is when someone comes along and starts helping
out, little by little, taking more and more responsibility, becoming
a member of a community and being ready to take over when critical
problems appear all of a sudden, security issues, important projects
that depend on our software needing our help, or just when the
original maintainer is lying in bed with a cold and the next release
must be prepared.
The quality and usefulness of code is not measured in lines of code
or the number of features - those are managerial categories that
have contributed to the endless bloat and crappiness that we encounter
in corporate software every day. Quality implies reliability,
including that of the maintainers; usefulness means stability and
robustness in the face of the ever-changing environment of platforms,
operating systems and system libraries. Things like that don't just
happen; they require some sentient being making choices and setting
priorities, and doing so for the long term.
Perhaps it can boil down to this: open source is not writing cool
code and being adored for it. It is stressful and draining work,
because you don't earn anything but some gratitude (hopefully)
and a certain amount of merit. And this is fine, because the merit
and the feeling of collaboratively improving a piece of software
together, that makes life easier for at least some users, is worth
much more than a well paid job in some megacorp or investors knocking
on my door begging me to sell my soul for some startup. We wouldn't
do it if it were otherwise.
So please don't send me large patches that you didn't take the
pains to code yourself, line by line, as I can never fully trust
such code. I trust you (I do), but I don't trust an LLM. I wouldn't
be a software developer if I were to blindly trust machine-generated
code; too much of my lifetime has been spent fixing problems caused
by machines gone wrong, performing the literal interpretation of
what I perceived to be a perfect specification of my programming
problem, only to find that the machine had a slightly different
idea of what I meant - the complex shell script doing something
rather different than what I intended, or the compiler taking me
at my word and producing (for me) nonsensical behaviour.
Now the LLM-powered contributor will point out the obvious
contradiction: I proclaim the fallibility of humans yet at the same
time I don't trust LLMs, so what's the difference, you say. The
difference is that LLMs don't give a shit. You may do, I fully
believe you, but it's not you who wrote the code in the first place.
If you had, then for every line of code, something in some remote
corner of your brain would always remind you that the code
is a representation of yourself, a reflection of your abilities,
and that you will be judged by your peers for it, and that you will
judge yourself in front of others.
And I'm not talking as a luddite or as a computing dinosaur (even
though I'm a bit of both). A compiler takes a specification, a
formal verifier a proof and an LLM some sort of prompting, yet all
that input is written by us, humans, flawed humans, overeager,
distracted, full of hubris, inexperienced. So how can we then assume
that some magic oracle takes our necessarily flawed and incomplete
input and produces quality code? Do you really think that a test suite
that passes is in any way able to measure the degree of quality?
Tests never prove anything, not even the most massive, brutal test
suite; tests can only raise confidence. That in itself is a useful
thing to have, but it is not enough to measure quality.
No, this is wishful thinking, understandably heavily propagated by
those who have some stake in this new technological trend. Or by
those who believe that every new technology must be good because
it is new and represents some form of progress. Or by those of us
who are professional coders and who want to make the often quite
boring job of coding for a living a little bit more interesting.
If your job consists of grinding out soulless boilerplate code and
LLMs make you more productive, then, sure, go ahead and use them.
Your boss will love you for it, and subsequently raise the bar, and
all you gained in the end is a bigger workload.
Oh, and by the way, rest assured, our job will never be in peril.
Do you think outsourcing code grinding jobs is something new? We
just replaced cheap labor with machines. And it's not even cheaper,
in the end, the cost is just externalized. So now we have our
personal coding slave, industrious, undemanding, prone to making
mistakes, having no concept of correctness or elegance, and completely
uninterested in the end goal. Still cheap, but rest assured that
this will change, once you are (or think you are) completely dependent
on it.
So, if you are a drive-by contributor sending just some typo fix
in the documentation, a straightforward bugfix, or a well thought
out piece of functionality, cleanly designed and limited in scope,
then you are welcome. No, you are more than welcome. If you are a
committed old hand passing over a large patch, or even a redesign
of some central system component, I will do my best to accommodate
you, and will reject changes only after careful consideration. But
if I have never heard from you and you send me big LLM-generated patches
making changes all over the place, then I'm very sorry, but please
keep them, I'm not interested. There will be situations where I
won't be able to assess whether an LLM was involved, or smaller
patches that appear easy to evaluate, and the only choice I have
there is to go case by case. But please do me the favour of at
least stating explicitly whether you used an LLM or not; that is
not too much to ask.
Programming is not magic. Everything can be implemented, given
enough time and effort. If you want to help, then consider starting
by hanging around. Ask stupid questions(5). Give suggestions.
Contribute little improvements and bug fixes. Discuss larger issues
first before you address them. Go step by step, so that you can
learn about the developers, so that the developers can learn about
you, and after a certain time, when you are more or less a part of
the team, you can contribute just about anything. And at that point,
you won't need an LLM anymore, because you pretty much
know your way around well enough and realize that the more pressing
problems are usually not the lack of features or lines of code:
it's the lack of time to truly address large design changes while
taking into consideration all the implications, again, in the long
term.
---
(1) or is it?
(2) "ROCK" + "ROLL"
(3) The jury is still out on that one.
(4) You learned that in school.
(5) This is important: it helps you assess how open and friendly a
developer community is and it shows us that you are willing to take
advice from those who know the project best.