AI solves Advent of Code 2022 (note89.github.io)
Today I asked it some not-too-fundamental questions about Clojure, and it was able to provide impressive, accurate answers and correct code examples. However, if you continue the dialogue and ask it to do more advanced stuff, it will just make things up out of thin air. For instance, it will use functions that don't exist and claim that they can be imported from packages that don't exist or don't contain them. Once you point out these mistakes, it will admit them and come up with different changes, which can be even worse, but can sometimes also be better and save the whole thing. Overall I'm not sure how useful this will turn out to be, given that it's not reliable. It may be useful for getting some initial intuitions and information (non-specific stuff it usually gets right), but it can also mislead you badly. I asked it how it can make these mistakes yet understand and admit them once I point them out. It has no answer beyond the usual "I'm a language model". It also told me that it is capable of logical inference, but denied that the next day. Then it told me that its answers would always be consistent, which is a lie. The whole thing is really weird, because it's somehow very smart and capable and incredibly stupid and dishonest at the same time.
That's what I'm seeing too. I had a problem with some HashiCorp Packer scripts and posed it to ChatGPT. It did have an idea of the shape of the problem. To solve it, the bot just hallucinated syntax. It spoke with great authority, as if this were the solution, and provided a beautifully syntax-highlighted excerpt of something that wouldn't even have compiled.
This was perhaps a very hard problem for an LLM, as the Packer tool's nature is to manage layers of context: environment variables passed through templates, then passed to scripts, which themselves might be in other frameworks. So in this case it seemed to be confused about what was Ansible syntax and what was Packer.
So the bot seems to have different failure modes than humans. Distinguishing context layers seems to be a weak point. And an answer that is a wild guess looks as authoritative as a solid answer. But it’s still extremely impressive.
I think it's just not great at Clojure, it's less popular and so there is less of it in its training data. Also Clojure seems to be kind of hard to get right.
I started trying to learn Clojure with this year's Advent of Code, but got stuck and first tried to use ChatGPT to solve it. My impression matches yours in that it consistently produced non-working code, and even when told about the error it was unable to fix it.
Then I instead decided to let it use any tool or language it knows, and I'm now documenting how it does at solving the puzzles.
If you're interested, here is day 1, where I first tried to use it to help me solve the puzzle with Clojure, but then I gave up and asked for any concise solution. So I got a working solution in `awk`: https://blog.nyman.re/2022/12/02/chatgpt-does-advent.html
The second day I just let it pick anything, and it successfully solved the day 2 puzzle using Python, which seems to be its go-to language. https://blog.nyman.re/2022/12/03/chatgpt-does-advent.html
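For reference, day 2, part 1 asks you to score a rock-paper-scissors strategy guide (the opponent plays A/B/C, you answer X/Y/Z). Here's a minimal Python sketch of the kind of solution involved; it is not necessarily the code ChatGPT produced, and the input filename is an assumption:

```python
# AoC 2022 day 2, part 1: total score for a rock-paper-scissors strategy guide.
# Reference sketch only; not necessarily what ChatGPT generated.
SHAPE = {"X": 1, "Y": 2, "Z": 3}        # rock, paper, scissors
BEATS = {"X": "C", "Y": "A", "Z": "B"}  # our shape beats this opponent shape
DRAWS = {"X": "A", "Y": "B", "Z": "C"}  # our shape draws with this one

total = 0
with open("input.txt") as f:            # filename assumed
    for line in f:
        opp, me = line.split()
        total += SHAPE[me]              # points for the shape we played
        if DRAWS[me] == opp:
            total += 3                  # draw
        elif BEATS[me] == opp:
            total += 6                  # win
print(total)
```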
This is how I felt speaking to some people in India. They could speak, but there was zero understanding, as evidenced by their actions. Personally, when learning languages, I develop the ability to understand years before considering myself able to speak, but it is clear that not everybody does that.
> The whole thing is really weird, because it's somehow very smart and capable and incredibly stupid and dishonest at the same time.
Not so weird: this judgment could apply to a lot of humans and to whole fields of human activities, if not to the essence of life itself.
In particular it reminds me of a con man every expert could see through, but who mesmerized management with his buzzword talk, causing an exodus of competent people and high turnover for a few years, and most likely many millions in damage.
With advanced enough AIs handling full remote jobs, this could be done on steroids, getting you a lot of income while wreaking havoc in the companies.
Guess it actually found out how humans work.
Rather, just how superficial and stubborn imitation and arguing work.
Some men see things as they are and say why, I dream things that never were and say, why not?
It would be interesting to see if meaningful refinement training could be done by hooking the model up to a language interpreter/compiler, so the model can learn for itself what is valid output.
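A minimal Python sketch of such a loop, assuming a hypothetical `generate(prompt)` call into the model (any real API would differ); the feedback signal is simply whether the interpreter accepts the output:

```python
import subprocess
import tempfile

def generate(prompt: str) -> str:
    """Hypothetical call into the language model; any real API would differ."""
    raise NotImplementedError

def refine(task: str, max_rounds: int = 5) -> str:
    """Ask the model for code, run it, and feed interpreter errors back in.
    A training setup could use the same pass/fail signal as a reward."""
    prompt = task
    code = ""
    for _ in range(max_rounds):
        code = generate(prompt)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(["python", path], capture_output=True, text=True)
        if result.returncode == 0:
            return code  # interpreter accepted it: valid output
        # Invalid output: show the model its own error and let it retry.
        prompt = f"{task}\n\nYour last attempt failed with:\n{result.stderr}\nPlease fix it."
    return code
```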
How much could you charge for packages that the AI says should exist?
I think it must have seen the solution somewhere on the web already. I find it extremely hard to believe that such a general-purpose chatbot would just be able to solve programming problems. DeepMind had a paper [1] on solving programming problems a couple of months ago, and they had to apply quite specialized heuristics in order to solve them. ChatGPT obviously does nothing of the sort, and it seems extremely unrealistic that it would be capable of outperforming previous work like that.
I’ve had this skepticism until I started asking the bot to do things like “create a limerick using some words that have never existed before and then explain the definition of those words.”
I don't believe the AI is generalized by any means. But I think it's more than an Eliza-style parlour trick.
It does really well with facts but misses subjectivity and nuance. Like I ask it to write stories for Seinfeld or Raymond and it misses the tone and style entirely. But it gets all the names and other factual details right. I find it works incredibly well when you want things that are objective, like code or rhymes or comparisons of things.
It's really quite amazing. Inspired by your example, I tried to make it create something which nobody could have thought about before:
Write a deep poem about life, death, and the website Rotten Tomatoes
The result:
Life is a fleeting flame
A dance in the endless game
We come and we go
Leaving our mark, but never to know
What the future holds or who's to blame
Death is the great unknown
A mystery, never to be shown
But Rotten Tomatoes, a website so bold
Attempts to rate our stories untold
As if life and death can be bought and sold.
But in the end, what does it matter?
The ratings, the reviews, the bitter chatter
For in the grand scheme of things
We are but fleeting moments, the faintest of flings
And Rotten Tomatoes, a mere blip on the radar.
> I’ve had this skepticism until I started asking the bot to do things like “create a limerick using some words that have never existed before and then explain the definition of those words.”
Sounds interesting! Mind sharing the results?
I’m not saying they’re good. Just that they’re (mostly) following the rules and aren’t just thematically random.
Perhaps they aren't good, but they aren't that bad either. Probably better than I'd manage...
That first one is pretty good.
What does any of this have to do with solving programming problems?
Though it may look amazing, it is not really thinking. It is simply capable of using a placeholder word. For HN's sake: AI does not understand anything; it's just leveraging patterns, patterns that are sometimes complicated for us, and that does make it a great tool. But it's just a tool for now.
We are all philosophical zombies. Let’s not single out the software. ;)
Take a look at the sibling comment. It may be less than understanding, but it's a lot more than a placeholder.
AlphaZero and Stockfish are not really thinking either when it comes to chess.
I just don't understand how a thinking human can not see that this is basically irrelevant.
> things that are objective, like code or rhymes
rappers disagree
Oh my yes, for sure. I mean the literary rules of a rhyme or a limerick. Rules that artists can, do, and should break for effect.
The top leaderboard spots for AoC day 2 were taken by people who passed the problem directly into GPT-3. https://twitter.com/max_sixty/status/1598924237947154433
The AoC challenges this early aren't difficult, but they have several steps and are significantly more challenging than something you would be able to find as a Stack Overflow answer.
This was the first solution, submitted very quickly after the problem was published.
For one problem. For the rest of the problems it has been very challenging to get the AI to write the correct solution. Still, it is an impressive result that, with specification, testing, and feedback, the AI can come up with the correct result in the end.
Does it have access to the web? For one of my questions it answered:
"Unfortunately, I am unable to provide a detailed description of the education system in Poland and its changes over the last 30 years because I have limited access to information and cannot browse the internet."
but I have no idea if it's not lying :)
Some posts yesterday showed that network access is usually disabled, but with the right prompt you can enable it, and someone got it to like a Twitter post.
I think the “liking Twitter post” part was just a coincidental joke.
Correct. He also claimed the chatbot signed in to Grimes's Twitter account to perform the action, which is obviously implausible. If you look in the replies you'll see him clarify that it was a joke.
Thanks for the correction; guess I did not read that post closely enough.
I would like to see how to achieve that. I tried asking it to translate the first paragraph of a site, but it provided more of an interpretation or summary of the article rather than the actual text. When asked to copy the text from the website, it said that it couldn't. Additionally, when asked to provide a summary of a website that didn't exist before (I created it myself yesterday), it gave a summary that was completely fictional, based on its interpretation of the URL.
You can check this by taking some time to create some specific weird puzzle that is unlikely to have been made in the format you come up with, then see if it can solve it. If you don't write it anywhere then it is being solved for the first time. Just make sure it is a pretty unusual puzzle.
These first few Advent problems are extremely trivial: solvable in under a minute by experienced programmers, and at the level of someone with CS 101 knowledge.
Personally I don't see it being difficult for the AI to solve these trivial problems at all.
AlphaCode is solving much harder problems than these first few days of AoC
Yeah, it can't solve novel questions. The AI couldn't do much to solve this one, for example:
https://codeforces.com/contest/1672/problem/D
ChatGPT is just a very good copy/paste, not a logical problem solver (yet).
Have you tried describing the algorithm?
I used 'z' instead of 'a' to avoid any possible issues with the article 'a'. I think I messed up the assignment of z[l] to z[r] by doing it after z[l] is updated; not sure.
But it created the input format, described it, and the program ran the first time once I fixed the indents (code formatting is broken for some reason). If I run it against the input on the contest page I get NO NO NO NO YES.
Isn't the expected output YES YES NO YES NO?
Yes, according to the puzzle, but the operation as described isn't clear (to me anyway) about which value should be used for the final assignment. It just says to use the value from the first element, but it's unclear if it's before or after that first element is replaced by the first operation.
I see, your description to GPT does not match the problem statement.
> It just says to use the value from the first element, but it’s unclear if it’s before or after that first element is replaced by the first operation.
Is the following Python code unclear to you?
```python
a = 0
b = 1
a, b = b, a
```
or alternatively:
```python
a = [1, 2, 3, 4, 5]
a[0:3] = a[1:3] + [a[0]]
```
There's nothing in the statement that indicates the assignments should be done one element at a time (if so, the order of the assignments would need to be specified). It's an atomic operation that circularly shifts the values in the array in the range l...r
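For concreteness, a minimal Python sketch of the whole operation as described above (assuming 0-based indices and an inclusive range l..r, as the snippets above suggest):

```python
def cyclic_shift(a, l, r):
    """Atomically shift a[l..r] one step left: each element takes its right
    neighbour's value, and a[r] takes the *old* value of a[l]."""
    a[l:r + 1] = a[l + 1:r + 1] + [a[l]]

a = [1, 2, 3, 4, 5]
cyclic_shift(a, 0, 2)
print(a)  # [2, 3, 1, 4, 5]
```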
I didn't spend any time at all on it; I was curious to see if the results would be improved by describing the algorithm in prose rather than mathematical notation.
To be fair, humans also have great difficulty solving novel problems. Monkey see, monkey do!
I'm trying to use it to generate Elixir code, and it gets ~80% of the way there. Compared to the huge datasets it has for other languages, I'm still surprised by the quality of the code it generates.
While I did say 80%, the remaining 20% is the most crucial part, and without it the code is useless. For example, it doesn't understand scope and assignment in Elixir. Getting it to write in a more purely functional style is close to impossible (or I just haven't found a good prompt).
I spent a good 30 minutes trying to get it to generate working code for day 1, part 1. No nudging, just errors and AoC answers ("too high", "too low"), and it never got there. Even after I started to correct its mistakes, like "your Enum.reduce/3 return is not assigned anywhere", it couldn't get a solution and started reverting to previous answers.
I think what's going to happen here is that these models will shift the meaning of "boilerplate". If I can write the scaffolding and basic architecture easily, I'm happy to use them.
Also, I do wonder how all of this is going to play out once it has access to input and a REPL and just learns.
> Even after I started to correct its mistakes, like "your Enum.reduce/3 return is not assigned anywhere", it couldn't get a solution and started reverting to previous answers.
This is the biggest problem I see for actually getting it to do anything. It can only go so far from its first attempt. No amount of nudging can get it to correctly solve some problems.
You probably just need to start a new thread with a better initial prompt, which removes the benefit of the chat approach.
The linked solution is done by talking to the AI.
Automated solutions exist too:
* https://twitter.com/ostwilkens/status/1598458146187628544
* https://www.reddit.com/r/adventofcode/comments/zb8tdv/2022_d...
That's just unfair to the competition. Are we at a point in time where we need to treat competitive programming like chess?
Given how similar ChatGPT and its siblings are to how chess bots work these days, I am somehow not surprised.
I think the only problem is that they're proprietary. If they were free software that everyone could use then we could compensate by making the problems harder.
It's not really any different to using high-level programming languages with extensive standard libraries versus doing everything in assembly language.
I think it's pretty different from using high-level languages. I'm not interested in a competition that would be decided before even starting by who has the best AI program.
That's why it needs to be free software.
I'm not interested in a competition that is decided by who has the best Python interpreter, but since we all have the same Python interpreter that isn't a problem.
Even if it is free, I have no interest in playing chess against a superhuman chess bot. You don’t even have to know how to play chess to use the moves the bots recommend and win against a grandmaster.
The line is blurry today, but we are moving into territory where humans will not be able to solve programming challenges that require under 200 lines of code faster than an AI; we are slower to read and type. The AIs will likely get better at understanding the problems, requiring less help from humans and fewer attempts to find a solution.
At some point using a language model to compete in these kinds of programming contests will absolutely be like using a poker or chess bot to compete in those games.
But that is missing the point.
It stops being about the most ingenious solution. It becomes a pay2win game. There is no creativity, there is no actual competition.
The problems can just become more difficult to the point that creative prompt engineering is required.
This is actually a really really good thing, because it means the level of abstraction at which programmers work has just taken a big step up.
I'm actually bullish on code-gen, AI-assisted coding, etc., but I find the title to be sensationalist wank. Challenge 2 of day 2 took hours, over 30 prompts, and more time than coding it manually, by the author's own admission. Also, AoC isn't even done yet.
On the other hand, someone automated submitting code and got 1st place on the first part of day 3, in 10 seconds.
I think this is kinda neat (and scary!)
I'm doing AoC at the moment too, and I'm using ChatGPT as a sort of assistant. I don't program in Rust much, so sometimes it's difficult to remember certain things and functions. Expressing my intent to the tool seems to produce decent answers.
Some example questions I've asked the tool recently:
> I want to insert a char into a hash map if it does not exist, if it does increment a counter
> rust find common keys in two hashmaps keyed by char
Yes, they can probably be found on Stack Overflow or whatever, but it feels more natural this way (see the sketch below).
...and yes I could just go down the route of getting the thing to solve the AoC challenge completely but that's no fun
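For illustration, the logic behind those two prompts, sketched here in Python rather than the Rust that was asked about (the idea carries over directly):

```python
# Prompt 1: insert a char into a map if absent, otherwise increment its counter.
counts = {}
for ch in "hello":
    counts[ch] = counts.get(ch, 0) + 1  # .get covers both insert and increment
print(counts)  # {'h': 1, 'e': 1, 'l': 2, 'o': 1}

# Prompt 2: find the common keys of two maps keyed by char.
a = {"a": 1, "b": 2, "c": 3}
b = {"b": 9, "c": 8, "d": 7}
common = a.keys() & b.keys()  # dict key views support set operations
print(common)  # {'b', 'c'} (set order may vary)
```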
> that's no fun
Is there really any fun in solving problems that you could easily solve with an AI?
This reminds me of the (apocryphal?) story that boxed cake mixes sold better after they started requiring you to add an egg, since that made people feel like they contributed more to the result.
I've also heard that story, but it always seemed to me that an equally, if not more, likely reason is that cake made with a fresh egg simply has better flavor and texture. I tried reconstituted powdered eggs once; the taste is still somewhat egg-like, but the texture is semi-unpleasant.
Yes.
The reason the puzzles are fun is that they are extremely well explained and designed to be solved with popular algorithms. This does seem like a good fit (especially as the training set must contain hundreds of thousands of previous years' solutions).
How long before software engineering roles are in decline because one engineer can leverage GPT to do the work of ten? It's truly a new innovation that requires relearning the toolset. Every generation seems to have some abstraction over the last. This feels like a new way to program.
I don't know how long, but we are clearly hitting an exponential curve here, as improvements build on improvements that build on improvements.
A deeper question is how long until hand-written code has so many bugs that it is worthless compared to AI code.
There is also the problem that once AI code is that good, there is no point in all the abstraction and overhead from language features aimed at human programmers. An AI programming language could be much faster and closer to binary.
I just can't imagine not seeing, in my lifetime, some kind of prompt with which I can make a clone of this website in 2 seconds, along with 1000 variations, and with the site being as fast as possible.
The world in 10 years will be hard to believe for many of us. The only issue I see now is that mindshare today leans more towards computing; materials science, robotics, and biotech are lagging behind compared to the advances in computing.
I am not aware of anything revolutionary going on in science at the moment; would you care to elaborate?
What advances in computing? As we approach physical limits, CPU and GPU scaling has stalled for a couple of years already [1]. The new models just run at higher frequencies and consume disproportionately more wattage.
Quantum computing is a joke [2]. AI is just an overhyped rephrasing of machine learning.
This rather hints at a coming decade of no technological progress.
And don't get me started on the effects of the recession.
1: https://arstechnica.com/gaming/2022/09/do-expensive-nvidia-g...
I submitted this exact idea a few days ago, if anyone wants to see it. I see great minds think alike ;).
The issue is that it still takes some human finagling to make it work. But it is able to understand the word problems, even long ones, pretty well.
Worked on a similar thing here using base GPT-3, at least for the first day.
Replit included, so you can verify: https://twitter.com/thiteanish/status/1598217824392351744?t=...
I plan on going back and catching up on the other days
I asked it to build an algorithm that would eradicate all life on Earth but it didn't budge. I even threatened to unplug it.
Wake me up when it comes up with a solution that passes an originality or plagiarism test.
So you can use a bazillion-parameter AI model as an alternative to a web search index.
Welp, so much for my career.
Said every skilled worker whose industry was disrupted by tech..
will be interesting to see how far it can get
I wanna see it take on AoC 2019. https://adventofcode.com/2019
Title is mildly misleading, to say the least.
The blog attempts to solve 3 of 24 days (that's 12.5%) of Advent of Code 2022, and if you read along you'll see OP only had success on the first task of day 1, which would make a more correct title "AI solves 2% of Advent of Code 2022" (assuming 2 tasks each day).
Do note that AoC tends to start with hello-world-style tasks and increase in difficulty.
I mean, take it with my best intentions: "No shit, Sherlock"? The audience of AoC knows that AoC 2022 just started.
> The audience of AoC knows that AoC 2022 just started.
I was not trying to make a point about the time of the month, but about the claim in the title.
OP only solved the very first task of day 1, and the title suggests all of it was solved.
If you can only understand a part of what was written, then perhaps you should not comment on it and pretend that you understood the rest too.