On the Problem of LLM-assisted Contributions to Open Source Projects
As the maintainer of an open source project, one is sooner or later
faced with contributions that have been produced by Large Language
Models, or at least created and refined with assistance from LLMs.
This puts me, as a maintainer, in a dilemma:
The contribution may be genuinely useful, perhaps even a grand
improvement, the absence of which may have bugged me for a long
time. Also, I assume the contributor means well, and I am grateful
to anybody who offers help to improve and polish our project, but
I can't wholeheartedly embrace such a
contribution for several reasons.
There is an ethical side to the use of large frontier models, one
that gets papered over by proponents of using LLMs intensively for
coding in a way that surprises me. So, of course we are all very
concerned, and the situation regarding licensed code is anything
but clear(1), and AI crawlers bringing the web to its knees are a
big problem and we all feel really sorry for African kids scraping
rock for raw materials to build debt-financed hardware that soon
becomes obsolete and replaced, armies of underpaid third-world gig
workers vetting the training data and the considerable waste of
water and energy, all for an experimental automaton that's doubtlessly
good at producing boilerplate code of uncertain correctness. Yeah,
we totally agree, but, let's move on, can't you see the potential
and all those problems caused will surely be solved some day and
so on and so on. Here we see the usual line of argument: first the
required minimum amount of virtue signalling, then the inevitable
change of subject and the promises of vast potential for the future
of humanity and (less pronounced, but probably more impactful)
personal convenience.
Now, I don't eat meat, and when telling my brother-in-law about the
water consumption and the raw food being wasted just to produce
less food of a different type, he leans forward (picture the sound
of creaking leather, of seating furniture groaning under duress),
flexes his heavily tattooed knuckles(2), gives me one of his evil
stares and tells me to my face that he doesn't care, he likes meat,
loads of it and he doesn't give a fuck about what I just said. And
I truly appreciate this refreshing level of honesty, this eloquent
demonstration of uncompromising directness (he's a really nice chap,
by the way).
So please don't whine to me about climate change while polishing
your CV with ChatGPT, or letting Claude spit out code for something
that you only half understand. Laziness is human, ignorance is a
fact of life, but for hypocrisy there is really no excuse. Nothing
is wrong with having principles and sticking to them, as old-fashioned
as it seems. Or at least have the decency to admit that an increase(3)
in productivity and sheer convenience is more important to you than
the social and environmental implications.
But actually, that's not the issue that I want to talk about. When
I am faced with an LLM-generated contribution, I cannot help but
feel, well, nervous. Who is contributing here? A person? A
machine? Both? Who takes responsibility for the changes and the
bugs, the shortcomings or misunderstandings therein?
The process of writing code is very different from the process of
reading it. Writing code means you effectively execute the code in
your head while you shape it into instructions for a machine;
reading code is an exercise in guessing intent, and everybody with
at least a bit of experience knows how stubbornly we can miss bugs
and oversights even after having glanced over a piece of code a
hundred times and more.
Writing internalizes information much more than reading does(4).
Whenever you debug a hard programming problem while taking a walk,
while in the shower, or just sitting on the sofa, staring into
space, you sometimes experience the amazing ability to recall
highly detailed information by actively and selectively digging
into code stored somewhere deep in your head. It is hard to
explain, but it works, and it requires not only that you wrote
the code in question, but also that you can follow the chain of
thought that led to writing it, including the motivation for
writing it in the first place.
So with code produced by LLMs I have no choice but to be extra
diligent when evaluating a change, always expecting something nasty
slipping through, never knowing why something has been written the
way it is. Is this a sound design? Is this intended? What about
that special nasty detail, the one which made me abandon my own
effort at implementing the very same feature, a couple of years
ago? Is this just hallucinated (by man or machine)? Is it because
dozens of other pieces of code agree with this? Is it because there
exists some platonic ideal of networking code in the I/O subsystem
of a dynamic language runtime that the LLM somehow is able to grasp?
If that is what you think then let me tell you that you are dealing
in metaphysics, which neither theory nor evidence supports. But
people want to believe in all sorts of things and the promises keep
being pounded into our heads, as there are whole economies at stake.
And I'm not even trying to touch the subject of inventiveness and
originality. If you think innovation is just a statistical error,
a random mutation, then you are beyond help, anyway.
I think the problem here is a misunderstanding of what maintainers
of long-lived open source software do. I can only speak for myself,
but it is not that we are sitting around, twiddling our thumbs and
waiting for 400-pound patches landing with a thump in our inbox,
containing big, beautiful, groundbreaking features. We are
experienced, often professional programmers who worked on these
systems perhaps for decades. We can implement those features, we
don't need some coding wizard to submit things like DWARF support
for debugging info or transport layer security for our package
manager tool. We are perfectly capable of implementing stuff. The
feature is not what counts, as useful as it may be.
What counts are kindred spirits, co-maintainers, individuals that
are willing to dig into the suboptimally designed systems that we
have designed (often badly), implemented, patched, fixed, broken
and fixed again for such a long time. What counts are people that
are ready to stumble over all the corner cases, the inefficiencies,
the mistakes we made and that lurk in the dark corners of some
obscure runtime library module that we'd rather not look at again.
Because that is what breeds the expert, and these experts are the
true limited resource: co-developers who have had the same
misunderstandings and learned from them the same hard way, who understand
that a collaboratively developed and maintained open source package
is never done, never free of bugs, always out of date, too slow,
too bloated and that every single line of code may, five years from
now, turn into a liability because when it was added some other
part of the system was not taken into account and time was scarce
and real life demanding, so we just added it, surely to be fixed
later... And those who wrote it will be able and willing to keep
it working because they put genuine effort into their code and into
fully understanding the context, the philosophy and the culture of
a project.
The most rewarding thing that can happen to an open source maintainer
is not a 20k SLOC patch making our code 10 times faster, portable
to 3 more platforms, useful to thousands more users. No, the
most rewarding thing is when someone comes along and starts helping
out, little by little, taking more and more responsibility, becoming
a member of a community and being ready to take over when critical
problems appear all of a sudden, security issues, important projects
that depend on our software needing our help, or just when the
original maintainer is lying in bed with a cold and the next release
must be prepared.
The quality and usefulness of code is not measured in lines of code
or the number of features - those are managerial categories that
have contributed to the endless bloat and crappiness that we encounter
in corporate software every day. Quality implies reliability,
including that of the maintainers; usefulness means stability and
robustness in the face of the ever-changing environment of platforms,
operating systems and system libraries. Things like that don't just
happen; they require some sentient being making choices and setting
priorities, and doing so for the long term.
Perhaps it can boil down to this: open source is not writing cool
code and being adored for it. It is stressful and draining work,
because you don't earn anything but some gratitude (hopefully)
and a certain amount of merit. And this is fine, because the merit
and the feeling of collaboratively improving a piece of software
together, that makes life easier for at least some users, is worth
much more than a well paid job in some megacorp or investors knocking
on my door begging me to sell my soul for some startup. We wouldn't
do it if it were otherwise.
So please don't send me large patches that you didn't take the
pains to code yourself, line by line, as I can never fully trust
such code. I trust you (I do), but I don't trust an LLM. I wouldn't
be a software developer if I were to blindly trust machine-generated
code; too much of my lifetime has been spent fixing problems caused
by machines gone wrong, performing the literal interpretation of
what I perceived to be a perfect specification of my programming
problem, only to find that the machine had a slightly different
idea of what I meant - the complex shell script doing something
rather different than what I intended, or the compiler taking me
at my word and producing (for me) nonsensical behaviour.
Now the LLM-powered contributor will point out the obvious
contradiction: I proclaim the fallibility of humans yet at the same
time I don't trust LLMs, so what's the difference, you say. The
difference is that LLMs don't give a shit. You may do, I fully
believe you, but it's not you who wrote the code in the first place.
If you had, then for every line of code, something in some remote
corner of your brain would always remind you that the code
is a representation of yourself, a reflection of your abilities,
and that you will be judged by your peers for it, and that you will
judge yourself in front of others.
And I'm not talking as a luddite or as a computing dinosaur (even
though I'm a bit of both). A compiler takes a specification, a
formal verifier a proof and an LLM some sort of prompting, yet all
that input is written by us, humans, flawed humans, overeager,
distracted, full of hubris, inexperienced. So how can we then assume
that some magic oracle takes our necessarily flawed and incomplete
input and produces quality code? Do you really think that a test suite
that passes is in any way able to measure the degree of quality?
Tests never prove anything, not even the most massive, brutal test
suite; tests can only raise confidence. That in itself is a useful
thing to have, but it is not enough to measure quality.
No, this is wishful thinking, understandably heavily propagated by
those who have some stake in this new technological trend. Or by
those who believe that every new technology must be good because
it is new and represents some form of progress. Or by those of us
who are professional coders and who want to make the often quite
boring job of coding for a living a little bit more interesting.
If your job consists of grinding out soulless boilerplate code and
LLMs make you more productive, then, sure, go ahead and use them.
Your boss will love you for it, and subsequently raise the bar, and
all you gained in the end is a bigger workload.
Oh, and by the way, rest assured, our job will never be in peril.
Do you think outsourcing code grinding jobs is something new? We
just replaced cheap labor with machines. And it's not even cheaper,
in the end, the cost is just externalized. So now we have our
personal coding slave, industrious, undemanding, prone to making
mistakes, having no concept of correctness or elegance, and completely
uninterested in the end goal. Still cheap, but rest assured that
this will change, once you are (or think you are) completely dependent
on it.
So, if you are a drive-by contributor sending just some typo fix
in the documentation, a straightforward bugfix, or a well thought
out piece of functionality, cleanly designed and limited in scope,
then you are welcome. No, you are more than welcome. If you are a
committed old hand passing over a large patch, or even a redesign
of some central system component, I will do my best to accommodate
you, and will reject changes only after careful consideration. But
if I have never heard from you and you send me big LLM-generated patches
making changes all over the place, then I'm very sorry, but please
keep them, I'm not interested. There will be situations where I
won't be able to assess whether an LLM was involved, or smaller
patches that appear easy to evaluate, and the only choice I have
there is to go case by case. But please do me the favour of at
least stating explicitly whether you used an LLM or not; that is
not too much to ask.
Programming is not magic. Everything can be implemented, given
enough time and effort. If you want to help, then consider starting
by hanging around. Ask stupid questions(5). Give suggestions.
Contribute little improvements and bug fixes. Discuss larger issues
first before you address them. Go step by step, so that you can
learn about the developers, so that the developers can learn about
you, and after a certain time, when you are more or less a part of
the team, you can contribute just about anything. And at that point,
you won't need an LLM anymore, because you pretty much
know your way around well enough and realize that the more pressing
problems are usually not the lack of features or lines of code:
it's the lack of time to truly address large design changes while
taking into consideration all the implications, again, in the long
term.
---
(1) or is it?
(2) "ROCK" + "ROLL"
(3) The jury is still out on that one.
(4) You learned that in school.
(5) This is important: it helps you assess how open and friendly a
developer community is and it shows us that you are willing to take
advice from those who know the project best.