Lost in compilation | Socrate.chat


Who is being fooled by the Claude C Compiler?

Alice: Have you seen that a researcher at Anthropic built the Claude C Compiler (CCC) with an autonomous agent team powered by a frontier AI model? I've read that the agents wrote the compiler in Rust, about 100k lines of code, and that it was able to compile the Linux kernel. They even booted Linux to prove that it worked. All this was done by agents with minimal human supervision. Super impressive, don't you think?

Bob: Woah! Sounds cool. Were the agents able to use external resources?

Alice: The agents did not have internet access. It was a clean-room implementation.

Bob: No internet? What computer did the author use?

Alice: His laptop I guess. Well, he mentions Docker containers so probably some kind of remote machines actually.

Bob: So he ran the frontier AI model locally on these machines running Docker?

Alice: Oh no, I assume they had access to the company's supercomputers through the coding agents.

Bob: I see. So actually he used the internet in a way. Did they open-source the compiler?

Alice: Yes!

Bob: That's great. Have you tried using it?

Alice: Well, I tried to compile a hello world program as per the README instructions, but it failed. The GNU C Compiler (GCC) had no issue with the same program.

Bob: That's odd. The Linux kernel contains more than 35M lines of complex code, surely there is something that you've missed?

Alice: Indeed, just a path issue.

Bob: Ah! So the main problem here is that the README, generated by agents I assume, did not contain accurate instructions?

Alice: Yes. It also behaves differently from GCC, whose behaviour is what users have come to expect.

Bob: If the agents had no internet access (sic), how would they know about GCC behaviour?

Alice: Well, the author used GCC as an oracle to speed up development. I guess agents could have copied its behaviour.

Bob: Oh, so in order to build his own C Compiler, the author leveraged an existing C compiler? Isn't that considered cheating?

Alice: It was only treated as a black-box executable, with no access to its source code, so the agents could not just copy it.

Bob: I see. Still, using this oracle feels like data leakage. Moreover, since GCC is open source and has been widely used for decades, the underlying AI model could have learned a lot about it from its training data, don't you think? Or even memorized it in some compressed form.

Alice: Maybe. Even so, CCC is written in Rust, so the agents had to be somewhat creative to make it work.

Bob: That's a fair point. I am wondering, though: how did the agents verify, during the development process, that they were making good progress? It's surely very hard to compile the Linux kernel all at once.

Alice: Yes, that's why the author leveraged existing high-quality compiler test suites to identify bugs in the early development phases.

Bob: Oh I understand better now! Such test suites have probably been written by humans over decades, and encompass a large body of human expertise, knowledge, and experience. I wonder if we can still qualify this experiment as "clean room" or "from scratch", at this point.

Alice: You are absolutely right: overall this agent team did leverage a lot of existing human-generated knowledge, such as the training data, the test suites, and the GCC executable. From scratch or not, I believe this is still an impressive achievement! It only took two weeks and about $20k in API costs. The company said in their video that this project would have taken a small team months.

Bob: It certainly shows an ability of the AI agents to write software that does something expected, given a lot of human-produced evaluation strategies. In other words, guidance. So it seems these agents are not really autonomous. I am also wondering: how did experts in the programming languages and compilers field react to this experiment?

Alice: I've read a few comments. For instance, Talia Ringer points out that this software is "weird", in that it was able to compile the huge and complex Linux kernel to something that runs, but does not really perform type checking for example. And that humans would not develop such software.

Bob: I see. What's your take on that? Do you think type checking is important?

Alice: Well, why bother about type checking when it is able to compile a Linux kernel that boots properly?

Bob: If you want to use this compiler, better watch out for nasal demons and the like. But, you said the agents had access to a GCC binary, right?

Alice: Yes, to identify bugs faster and fix them.

Bob: I'm wondering then, what exactly does this experiment achieve?

Alice: I just told you: it achieves building a C compiler from scratch that is able to compile Linux! And then you can boot it to prove that it works.

Bob: I believe we have already established that it was not built from scratch, quite the opposite. Furthermore, we already have access to such a compiler and its source code: GCC. Why build a poor copy of something that is easily reproducible and distributed, free as in beer, free as in speech, widely used and functional? Clean-room design usually means reverse-engineering to avoid copyright infringement. But GCC is copyleft! Who is ever going to need CCC to do something they couldn't do beforehand?

Alice: Are you saying this software is useless?

Bob: It seems so.

Alice: That's ok, it's a research project! Research software that has no industrial use is produced every day, and that's fine! It successfully demonstrated the capabilities of AI agents!

Bob: Let me summarize. The claim that a small team would have taken months to create this C compiler by hand does not make sense, since humans would never write such a weird software artifact. No new feature has been created, since the goal was to compile Linux exactly like GCC does. It was not built from scratch, nor "clean-room", but rather by leveraging a lot of existing human-generated content: C compiler test suites, GCC binaries, and a huge amount of training data. The development was not autonomous but supervised, both directly by the researcher and guided by automated tests and the like. Is there even a single claim made by this company about this project that does not end up being blatantly false? And more importantly, what has actually been achieved?

Alice: You make it sound like it was staged, overall a big lie. Do you believe the authors had malicious intent?

Bob: God, I hope not... Well, actually I am not sure which scenario I prefer. Did the author and the company's marketing department actually end up fooling themselves too? That's not unlikely. But that is speculation, and it does not really matter to us right now. What do you think about my previous question, as to what has actually been achieved?

Alice: A new kind of C Compiler entirely written in Rust by a machine?

Bob: Wrong. It is not really new, and not written by a machine. Is it even a C compiler in the traditional sense, since it's not able to compile hello world, much less arbitrary C programs? It is at best mediocre plagiarism with many extra steps, using a convoluted, resource-hungry and wasteful technique relying on many, many hours of collective embodied human labor.

Alice: So you believe nothing valuable has been achieved?

Bob: Quite the contrary, dear Alice. The important detail here is: valuable to whom?

Alice: I'm not sure I'm following. They open-sourced the compiler so surely everyone can benefit from it?

Bob: As previously discussed, the compiler in itself is not very useful. However, important things have still been achieved with this project and press release, besides our discussion that I am enjoying. Can you venture a guess?

Alice: Ah! You are mentioning the press release. Maybe something to do with marketing? Company image? Fake news? Hype?

Bob: I would simply call it strengthening the illusion.

Alice: What illusion?

Bob: The illusion of machine intelligence. Or, to put it plainly, the concept of AI, and the idea that it already has been, or soon will be, achieved.

Alice: You don't believe in AI?

Bob: Historically, AI has always meant tasks that machines are not yet able to do well. As such, it is a moving target.

Alice: You don't think large language models are real AI?

Bob: Define real. If enough people believe it is real, even if some skeptics remain, then how can it not be real?

Alice: Do you not believe in science? Objective truth?

Bob: If I did not believe in objective truth, how could I answer your question? But we are getting off-topic. By the way, have you looked at said source code of this useless compiler?

Alice: I did peek at a few files here and there. Overall I get the feeling that it is hard to understand what is going on.

Bob: I see. Reading the source code does not appear to provide obvious benefits either.

Alice: For you, the inability of humans to understand AI-generated software is an issue? Isn't that a human limitation rather than a machine one?

Bob: Not quite. If we delegate the task of writing code entirely to machines, especially for new products, not just subpar replicas of robust software that has been around for decades such as GCC, how are we able to, first, tell what the expected behaviour is, and second, verify that this new software indeed exhibits that behaviour?

Alice: You can tell the agent to write some tests.

Bob: Indeed, but how do we test these tests? What are the expectations? All software has bugs. One important aspect of software, ultimately, is what its users are able to accomplish with it. And even with APIs, compilers, MCPs and such, which allow programs to operate other programs, eventually the end user is always human.

Alice: Why could you not just ask the agent to write software that builds a product for humans, and observe their interactions with it, and collect their feedback, in order to improve it?

Bob: It sounds reasonable, yet I claim it is impractical, if not borderline disrespectful, akin to spam. Are you going to generate somewhat random software that you don't fully understand, and then expect users to figure out how to use it? It's likely that users will neither understand nor enjoy this process, and will churn. Besides, weren't machines built to help us humans and save us some time and labor, rather than to spam us with dubious software? In our hello world example, CCC failed the most basic expectation that humans have of a compiler. That does not build trust.

Alice: I think I get what you are saying. So you believe we should write code by hand?

Eve: Hey folks! I couldn't help but eavesdrop a bit on your coffee chat. This dialogue is fascinating, but we are having a high-severity incident on production servers right now. Could you help a bit?

Bob: Sure! Let me just get my Claude Code with MCP to get started with the investigation.