Ask HN: Why do people say LLMs create bad code "quality"?

2 points by chaidhat 2 months ago · 13 comments · 1 min read

Happy Thanksgiving! For context, the only time I've been exposed to "good quality code" was when I interned at a YC startup. I am creating my own startup at the moment. I keep hearing that LLMs create "bad quality code" and wonder what that means. I've been trying to use Claude Code in the development of my app. If I am the one architecting the functions and services, making sure they have high cohesion and loose coupling, follow service-oriented architecture, etc., and only having LLMs implement the functions themselves and not touch the architecture, will this ensure "better code"? What is "good quality code"? Why do LLMs inherently create bad quality code, and in what ways should I AVOID using them?

Thank you kindly. Best Chai

Chaidhat Chaimongkol

WheelsAtLarge 2 months ago

When I started using LLMs for code writing, the code they wrote was messy and mostly wrong. Recently my experience with relatively small apps has been very positive. The code has been very readable and compact. But I can see how the code can get very messy if you don't have a structured plan for code development. It's very easy to create a messy code base that's hard to maintain. But the reality is that human developers do the same thing regularly.

I think that keeping the code that the LLMs write relatively small and highly focused will lead to a code base that's easy to maintain over the long term.

I plan to create an app that relies heavily on LLM-written code. I'm hoping that if I architect it as if I am working with a team of programmers, I'll be able to create a code base that I can update over the long term. I'll give it assignments that I can then merge into the master codebase. You're on the same path. That's 2 of us. I think it will work out. We'll get what we want while saving a ton of time.

chaidhatOP 2 months ago

Thank you for the reply! Good luck with your app too! I've heard of Claude Code filling out PRs, but in my experience I haven't been able to pull that off, as it creates errors and doesn't notice them itself. I am experimenting with a pipeline in which it exercises the feature it created by writing a frontend integration test and taking a screenshot, using Playwright MCP, to verify whether it completed the task. If it did not, it loops until it does. This removes the human-in-the-loop and probably (I still need internal evals to prove this out) increases its correctness per run. The bottleneck then becomes code review and making sure the code it did write isn't hot garbage.
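A rough sketch of that verify-and-retry loop, in bash. Everything named here is an assumption for illustration: `run_agent` and `run_check` are hypothetical stand-ins for "ask the LLM to implement the feature" and "run the frontend integration test", not real Claude Code or Playwright commands, and the attempt cap is an addition so a check that never passes cannot loop forever.

```bash
#!/usr/bin/env bash
# Sketch only: run_agent and run_check are hypothetical placeholders.

attempts_made=0

run_agent() {
  # Stand-in for "ask the LLM to (re)implement the feature".
  attempts_made=$((attempts_made + 1))
}

run_check() {
  # Stand-in for the integration test plus screenshot check.
  # For the demo it starts passing on the third attempt.
  [ "$attempts_made" -ge 3 ]
}

loop_until_verified() {
  # $1 = maximum number of attempts (an assumed safety cap).
  max_attempts=$1
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    run_agent
    if run_check; then
      echo "verified after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "gave up after $max_attempts attempt(s)" >&2
  return 1
}

loop_until_verified 5
status=$?
```

With the stand-ins above, the loop stops on the third attempt and `status` is 0. A loop like this only confirms that the check passes, so the code review bottleneck mentioned above is still there.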

bediger4000 2 months ago

"AI" generated code often has a number of styles in the same functions or modules. In an "AI" generated bash program I recently reviewed, the code had both "if [ testtesttest ]" and "if [[ testtesttest ]]" styles along with different styles of testing lexical equality.

It also had a globbing test I'd never seen before, something like "if [[ $X = *"string"* ]]". It works, but wow, ugly.

The new bash program looked old, like someone with a distinct set of preferences wrote it years ago and 2 or 3 maintainers had hacked on it, being less careful to keep a consistent style.

That sort of thing would have been judged low-quality legacy code 5 years ago. It's harder for a human to understand, and since it just might depend on a peculiarity of "=" vs "==" vs "-eq", people make minimal changes or maybe no change at all. That's one way the "low quality" you're asking about shows up.
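To make the "=" vs "==" vs "-eq" point concrete, here is a small bash sketch. The variable names and values are made up for illustration and are not taken from the reviewed program.

```bash
#!/usr/bin/env bash
# Hypothetical values, chosen so string and integer comparison disagree.
x="007"
y="7"

# "=" inside [ ] is the POSIX string comparison.
if [ "$x" = "$y" ]; then str_single="equal"; else str_single="different"; fi

# "==" inside [[ ]] with a quoted right-hand side is also a string comparison.
if [[ $x == "$y" ]]; then str_double="equal"; else str_double="different"; fi

# "-eq" compares the operands as integers, so 007 and 7 match.
if [ "$x" -eq "$y" ]; then int_cmp="equal"; else int_cmp="different"; fi

# The globbing test: inside [[ ]], the unquoted parts of the right-hand
# side are a pattern, and the quoted part is matched literally.
haystack="a string here"
if [[ $haystack = *"string"* ]]; then glob_match="yes"; else glob_match="no"; fi

# Quoting the whole right-hand side turns pattern matching off.
if [[ $haystack = "*string*" ]]; then quoted_match="yes"; else quoted_match="no"; fi

# The same substring check in the older, portable "case" style.
case $haystack in
  *string*) case_match="yes" ;;
  *)        case_match="no" ;;
esac

printf '%s\n' "[ = ]: $str_single" "[[ == ]]: $str_double" "-eq: $int_cmp" \
  "glob: $glob_match" "quoted: $quoted_match" "case: $case_match"
```

Both string comparisons report "different" while `-eq` reports "equal", and the glob test matches only while the `*` characters are left unquoted. A maintainer who swaps one form for another without noticing this can change behavior, which is why mixed styles make people reluctant to touch the code.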

chaidhatOP 2 months ago

Thank you for the reply and the well thought-out example. This is a good point on garbage in => garbage out, as someone else said. Interestingly, the opposite may be true for newer languages: I wrote my frontend in Dart over three years in a very C way where getters and setters are just functions. However, after I introduced LLMs to my codebase, they started using the `get` and `set` keywords (which is the newer way). I have a question on personal preference: as an engineer yourself, if you were in my shoes, would you prefer newer style > consistency (i.e., I should refactor my 50k LOC codebase) or the opposite (i.e., correct the LLM to use my style)?
- bediger4000 2 months ago
  
  For something that size, I'd prefer consistency, unless the original style is too idiosyncratic, like the fabled "writing Fortran in Perl".
  - chaidhatOP 2 months ago
    
    Thank you! I'd never heard that idiom, but it sent me down a rabbit hole researching Fortran haha

bn-l 2 months ago

- deprecated API use
- mixture of styles vs one consistent hand throughout the code
- implicit code vs declarative
- convoluted logic when there’s a simpler way. Sometimes complexity just for the sake of it (basically it’s all spaghetti code to me until I edit it myself)

Even though the code quality is garbage ~90% of the time (to me), it’s still more fun than before, which is why I still use it.

Woe unto you though if you vibe code and don’t edit and don’t have taste from years of doing it manually before LLMs. You are building a mountain of tech debt (although maybe you can grow and hire fast enough that it’s not an issue).

chaidhatOP 2 months ago

Thank you for the reply! I'll certainly try to keep those in mind. I've been coding in Dart for 5 years (pre-ChatGPT) but inside my own bubble and not with a world-class team, and that's what's worrying me. What would you recommend I do to keep up with the latest best practices and write better "quality" code? Currently, I try to read open source codebases to keep up with best practice (but isn't that where the LLMs are getting their data anyway?).

dmezzetti 2 months ago

The perception is that it's sloppier than what humans write. I wouldn't waste too much time worrying about that if the code is solving your problem. Sounds like you need to focus on getting an MVP to show to potential customers, and if LLMs help you get there faster, so be it.

chaidhatOP 2 months ago

Thank you! That is a good take on things. If it works, it works. If it fails, then a customer complains and we fix it + make sure other code doesn't suffer the same error.
- dmezzetti 2 months ago
  
  Good luck!

mmphosis 2 months ago

Garbage in, garbage out. Beautiful code in, beautiful code out.

Code "quality" is a judgement call. Just like some LLMs, I'll repeat your question: What do you consider quality code?

I treat all code from LLMs, search, Stack Overflow, etc., as "sample code" that requires review and most likely modification, or writing the code from scratch. Your abilities and laziness may vary.

Also, there are reasons other than code quality not to trust LLMs.

chaidhatOP 2 months ago

Thank you! This is a good way to critically think about the code. Sort of like when Chegg was a thing and to answer your math homework, you'd have to adapt someone else's solution to a problem similar to yours. Easier than deriving from first principles but not quite asking your friend to do it for you.
