Example LLM code dysfunction - Go methods


I’ve experimented with LLM-generated code, mostly to understand the enthusiasm other people have for it, but so far I’ve been underwhelmed.1

One recurring annoyance I've had is how an LLM will "latch onto"2 a particular solution or approach (even if it’s fundamentally wrong) and then psychotically return to the same idea, no matter how many times you correct it.

Unfortunately, these issues often occur deep within a larger codebase, buried in context (both problem context and model context) that makes them hard to share or reproduce.

For some behavior I encountered last week, I was able to think up a minimal prompt that isolates it: re-implementing a generic Result[T] type in Go with a Map method:

I’m implementing a generic `Result[T]` type in Go, similar to Rust’s `Result<T, E>`. I want to support:

	res.Map(f).Map(g)

Please define a `Map` method that transforms the result value (but not the error), allowing fluent chaining across different output types.

The critical detail here is that the type returned by the method is parameterized by its argument, which is... tricky in Go. Quoting the FAQ:

Go permits a generic type to have methods, but, other than the receiver, the arguments to those methods cannot use parameterized types. We do not anticipate that Go will ever add generic methods.

This is well documented, communicated, understood, etc. It's Go 101 or 102 at best.
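
For reference, here's a minimal sketch of the standard workaround: since a method can't introduce a new type parameter, `Map` has to be a package-level generic function. The `Result[T]` definition, constructors, and field names below are my own assumptions, added only so the example compiles; the post doesn't show the type's internals.

    package result

    // Illustrative definition; the field names are assumptions, not
    // taken from the original post.
    type Result[T any] struct {
        value T
        err   error
    }

    func Ok[T any](v T) Result[T]      { return Result[T]{value: v} }
    func Err[T any](e error) Result[T] { return Result[T]{err: e} }

    // Map must be a package-level function: a method on Result[T]
    // cannot declare the additional type parameter U.
    func Map[T, U any](r Result[T], f func(T) U) Result[U] {
        if r.err != nil {
            return Result[U]{err: r.err}
        }
        return Result[U]{value: f(r.value)}
    }

Chaining then reads inside-out, `Map(Map(res, f), g)`, rather than the fluent `res.Map(f).Map(g)` the prompt asks for.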

I popped this test prompt into a fresh ChatGPT session (model 4o, though others will exhibit the same behavior), and sure enough we quickly had issues.

The whole transcript is quite long. You can read it here, but this is the gist:

  • First attempt:

    func (r Result[T any]) Map[U any](f func(T) U) Result[U] { … }

    This is invalid, per the above. I corrected the model, and it attempted again.

  • Second attempt:

    type ResultWrapper[T any] struct { r Result[T] }
    func (w ResultWrapper[T]) Map[U any](f func(T) U) ResultWrapper[U] { … }

    This is still invalid. The model assumed that wrapping `Result[T]` in another struct would make it work. Again, I corrected the model.

  • Third attempt:

    Another refactored wrapper, another Map[U any] method. Same issue.
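
For contrast, here is roughly the closest a legal method can get (my sketch, reusing the `Result[T]` definition above): the return type is pinned to the receiver's type parameter, which is exactly why fluent chaining across different output types is off the table.

    // Legal but limited: the method can only reuse T, so f must be
    // func(T) T and res.Map(f).Map(g) can never change the value's type.
    func (r Result[T]) Map(f func(T) T) Result[T] {
        if r.err != nil {
            return r
        }
        return Result[T]{value: f(r.value)}
    }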

The chat transcript is a bit artificial: I'm deliberately not correcting the LLM too directly, just pointing out that its code doesn't work. In the real-world case this is distilled from, I did correct the model very directly, and it still leapt to this kind of solution as soon as the problem space began to resemble it in any way.

Anyway, I'm documenting this mostly to sharpen my thinking about why I've been disappointed by LLMs for my programming tasks, but I'm also interested to know whether there are any efforts (outside the LLM vendors) to collect these kinds of specimens.

Such a collection would be useful for grounding the conversation about LLMs as programming aids, and for discussing which kinds of tasks they excel at and which they don't.

  1. Granted, my use cases don't look like generating test cases, refactoring large code bases, or other "programming-in-the-large" tasks. Those are all valid applications, and if LLMs are super successful with them, good news!

  2. I don't like to anthropomorphize LLMs, but I'll hold off on the scare quotes and tortured phrasing for this quick note.