Debian dismisses AI-contributions policy


In April, the Gentoo Linux project banned the use of generative AI/ML tools due to copyright, ethical, and quality concerns. This means contributors cannot use tools like ChatGPT or GitHub Copilot to create content for the distribution such as code, documentation, bug reports, and forum posts. A proposal for Debian to adopt a similar policy revealed a distinct lack of love for those kinds of tools, though it would also seem few contributors support banning them outright.

Tiago Bortoletto Vaz started the discussion on the Debian project mailing list on May 2, with the suggestion that the project should consider adopting a policy on the use of AI/ML tools to generate content. Vaz said that he feared that Debian was "already facing negative consequences in some areas" as a result of this type of content, or it would be in a short time. He referenced the Gentoo AI policy, and Michał Górny's arguments against AI tools on copyright, quality, and ethical grounds. He said he was in agreement with Górny, but wanted to know how other Debian contributors felt.

Ansgar Burchardt wrote that generative AI is "just another tool". He noted that Debian doesn't ban Tor, even though it can be used to violate copyright or for unethical things, and it doesn't ban human contributions due to quality concerns: "I don't see why AI as yet another tool should be different."

Others saw it differently. Charles Plessy responded that he would probably vote for a general resolution against "the use of the current commercial AI for generating Debian packaging, native, or infrastructure code". He specified "commercial AI" because "these systems are copyright laundering machines" that abuse free software, and found the idea that other Debian developers would use them discouraging. He was not against generative AI technology itself, however, as long as it was trained on content that the copyright holders gave consent to use for that purpose.

Russ Allbery was skeptical of Gentoo's approach of an outright ban, since "it is (as they admit) unenforceable". He also agreed with Burchardt, "we don't make policies against what tools people use locally for developing software". He acknowledged that there are potential problems for Debian if output from AI tools infringes copyright. Even so, banning the use of those tools would not make much difference: "we're going to be facing that problem with upstreams as well, so the scope of that problem goes far beyond" direct contributions to Debian. The project should "plan to be reactive [rather] than attempt to be proactive". If there are reports that AI-generated content is a copyright violation, he said, then the project should deal with it as it would with any Debian Free Software Guidelines (DFSG) violation. The project may need to make judgment calls about the legal issues then, but "hopefully this will have settled out a bit in broader society before we're forced to make a decision on a specific case".

Allbery said his primary concern about AI was its practical impact:

Most of the output is low-quality garbage and, because it's now automated, the volume of that low-quality garbage can be quite high. (I am repeatedly assured by AI advocates that this will improve rapidly. I suppose we will see. So far, the evidence that I've seen has just led me to question the standards and taste of AI advocates.)

Ultimately, Allbery said he saw no need for new policies. If there is a deluge of junk, "we have adequate mechanisms to complain and ask that it stop without making new policy". The only statement he wanted to convey so far is that "anyone relying on AI to summarize important project resources like Debian Policy or the Developers Guide or whatnot is taking full responsibility for any resulting failures".

A sense of urgency

In reply to Allbery, Vaz conceded that Gentoo's policy was not perfect but, despite the difficulty in enforcing it, he maintained there was a need to do something quickly.

Vaz, who is an application manager (AM) for the Debian new maintainer process, suggested that Debian was already seeing problems with AI output submitted during the new maintainer (NM) process and as DebConf submissions, but declined to provide examples. "So far we can't [prove] anything, and even if we could, of course we wouldn't bring any of the involved to the public arena". He did, however, agree that a statement was a more appropriate tool than a policy.

Jose-Luis Rivas replied that Vaz had more context than the rest of the participants in the discussion and that "others do not have this same information and can't share this sense of urgency". He inferred that an NM applicant might be using a large-language model (LLM) tool during the NM process, but in that scenario there was "even less point" in making policy or a statement about the use of such tools. It would be hard to prove that an LLM was in use, and "ultimately [it] is in the hands of those judging" to make the decisions. "I can't see the point of 'something needs to be done' without a clear reasoning of the expectations out of that being done".

Vaz argued that having a policy or statement would be useful, even in the absence of proof that an LLM was in use. He made a comparison to Debian's code of conduct and its diversity statement: "They might seem quite obvious to some, and less so to others." Having an explicit position on the use of LLMs would be useful to educate those who are "getting to use LLMs in their daily life in a quite mindless way" and "could help us both avoid and mitigate possible problems in the future".

The NM scenario Vaz gave was not convincing to Sam Hartman, who replied that the process would not benefit from a policy. It is up to candidates to prove to their AM, advocates, and reviewers that they can be trusted and have the technical skills to be a Debian Developer:

I as an AM would find an applicant using an LLM as more than a possibly incorrect man page without telling me would violate trust. I don't need a policy to come to that conclusion.

He said he did not mind if a candidate used an LLM to refresh their memory, and saw no need for them to cite the use of the LLM. But if the candidate didn't know the material well enough to catch bad information from an LLM, then it's clear they are not to be trusted to choose good sources of information.

On May 8, after the conversation had died down, Vaz wrote that it was apparent "we are far from a consensus on an official Debian position regarding the use of generative AI as a whole in the project". He thanked those who had commented, and said that he hoped the debate would surface again "at a time when we better understand the consequences of all this".

It is not surprising to see Debian take a conservative, wait-and-see approach. If Debian is experiencing real problems from AI-generated content, those problems are not yet painful or widespread enough to motivate support for a ban or other policy change. A flood of AI gibberish, or a successful legal challenge to LLM-generated content, might turn the tide.