AI solves Advent of Code 2022 (note89.github.io)
Today I asked it some not-too-fundamental questions about Clojure, and it was able to provide impressive, accurate answers and correct code examples. However, if you continue the dialogue and ask it to do more advanced stuff, it will just make things up out of thin air. For instance, it will use functions that don't exist and claim that they can be imported from packages that don't exist or don't contain them. Once you point out these mistakes, it will admit them and come up with different changes, which can be even worse, but can sometimes also be better and save the whole thing. Overall I'm not sure how useful this will turn out to be, given that it's not reliable. It may be useful for getting some initial intuitions and information (non-specific stuff it usually gets right), but it can also mislead you badly. I asked it how it can make these mistakes yet understand and admit them once I point them out. It has no answer beyond the usual "I'm a language model". It also told me that it is capable of logical inference, but denied that the next day. Then it told me that its answers would always be consistent, which is a lie. The whole thing is really weird, because it's somehow very smart and capable and incredibly stupid and dishonest at the same time.
That's what I'm seeing too. I had a problem with some HashiCorp Packer scripts and posed it to ChatGPT. It did have an idea of the shape of the problem. To solve it, the bot just hallucinated syntax. It spoke with great authority, as if this were the solution, and provided a beautifully syntax-highlighted excerpt of something that wouldn't even have compiled.
This was perhaps a very hard problem for an LLM, as the Packer tool's nature is to manage layers of context: environment variables passed through templates, then passed to scripts, which themselves might be in other frameworks. So in this case it seemed to be confused about what was Ansible syntax and what was Packer.
So the bot seems to have different failure modes than humans. Distinguishing context layers seems to be a weak point. And an answer that is a wild guess looks as authoritative as a solid answer. But it’s still extremely impressive.
I think it's just not great at Clojure, it's less popular and so there is less of it in its training data. Also Clojure seems to be kind of hard to get right.
I started trying to learn Clojure with this year's Advent of Code, but got stuck and first tried to use ChatGPT to solve it. My impression matches yours in that it consistently produced non-working code, and even when told about the error it was unable to fix it.
Then I instead decided to let it use any tool or language it knows, and I'm now documenting how it does at solving the puzzles.
If you're interested, here is day 1, where I first tried to use it to help me solve the puzzle with Clojure, but then I gave up and asked for any concise solution. So I got a working solution in `awk`: https://blog.nyman.re/2022/12/02/chatgpt-does-advent.html
The second day I just let it pick anything, and it successfully solved the day 2 puzzle using Python, which seems to be its go-to language. https://blog.nyman.re/2022/12/03/chatgpt-does-advent.html
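For reference, day 2, part 1 asks you to score a rock-paper-scissors strategy guide (the opponent plays A/B/C, you answer X/Y/Z). Here's a minimal Python sketch of the kind of solution involved; it is not necessarily the code ChatGPT produced, and the input filename is an assumption:

```python
# AoC 2022 day 2, part 1: total score for a rock-paper-scissors strategy guide.
# Reference sketch only; not necessarily what ChatGPT generated.
SHAPE = {"X": 1, "Y": 2, "Z": 3}        # rock, paper, scissors
BEATS = {"X": "C", "Y": "A", "Z": "B"}  # our shape beats this opponent shape
DRAWS = {"X": "A", "Y": "B", "Z": "C"}  # our shape draws with this one

total = 0
with open("input.txt") as f:            # filename assumed
    for line in f:
        opp, me = line.split()
        total += SHAPE[me]              # points for the shape we played
        if DRAWS[me] == opp:
            total += 3                  # draw
        elif BEATS[me] == opp:
            total += 6                  # win
print(total)
```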
This is how I felt speaking to some people in India. They could speak, but there was zero understanding, as evidenced by their actions. Personally, when learning languages, I develop the ability to understand years before considering myself able to speak, but it is clear that not everybody does that.
> The whole thing is really weird, because it's somehow very smart and capable and incredibly stupid and dishonest at the same time.
Not so weird: this judgment could apply to a lot of humans and to whole fields of human activities, if not to the essence of life itself.
In particular it reminds me of a con man every expert could see through, but who mesmerized management with his buzzword talk, causing an exodus of competent people and high turnover for a few years, and most likely many millions in damage.
With advanced enough AIs handling full remote jobs, this could be done on steroids, getting you a lot of income while wreaking havoc in the companies.
Guess it actually found out how humans work.
Rather, just how superficial and stubborn imitation and arguing work.
Some men see things as they are and say why, I dream things that never were and say, why not?
It would be interesting to see if meaningful refinement training could be done by hooking the model up to a language interpreter/compiler, so the model can learn for itself what is valid output.
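A minimal Python sketch of such a loop, assuming a hypothetical `generate(prompt)` call into the model (any real API would differ); the feedback signal is simply whether the interpreter accepts the output:

```python
import subprocess
import tempfile

def generate(prompt: str) -> str:
    """Hypothetical call into the language model; any real API would differ."""
    raise NotImplementedError

def refine(task: str, max_rounds: int = 5) -> str:
    """Ask the model for code, run it, and feed interpreter errors back in.
    A training setup could use the same pass/fail signal as a reward."""
    prompt = task
    code = ""
    for _ in range(max_rounds):
        code = generate(prompt)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(["python", path], capture_output=True, text=True)
        if result.returncode == 0:
            return code  # interpreter accepted it: valid output
        # Invalid output: show the model its own error and let it retry.
        prompt = f"{task}\n\nYour last attempt failed with:\n{result.stderr}\nPlease fix it."
    return code
```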
How much could you charge for packages that the AI says should exist?
I think it must have seen the solution somewhere on the web already. I find it extremely hard to believe that such a general-purpose chatbot would just be able to solve programming problems. DeepMind had a paper [1] on solving programming problems a couple of months ago, and they had to apply quite specialized heuristics in order to solve them. ChatGPT obviously does nothing of the sort, and it seems extremely unrealistic that it would be capable of outperforming previous work like that.
I’ve had this skepticism until I started asking the bot to do things like “create a limerick using some words that have never existed before and then explain the definition of those words.”
I don't believe the AI is generalized by any means. But I think it's more than an Eliza-style parlour trick.
It does really well with facts but misses subjectivity and nuance. Like I ask it to write stories for Seinfeld or Raymond and it misses the tone and style entirely. But it gets all the names and other factual details right. I find it works incredibly well when you want things that are objective, like code or rhymes or comparisons of things.
It's really quite amazing. Inspired by your example, I tried to make it create something which nobody could have thought about before:
Write a deep poem about life, death, and the website Rotten Tomatoes
The result:
Life is a fleeting flame
A dance in the endless game
We come and we go
Leaving our mark, but never to know
What the future holds or who's to blame
Death is the great unknown
A mystery, never to be shown
But Rotten Tomatoes, a website so bold
Attempts to rate our stories untold
As if life and death can be bought and sold.
But in the end, what does it matter?
The ratings, the reviews, the bitter chatter
For in the grand scheme of things
We are but fleeting moments, the faintest of flings
And Rotten Tomatoes, a mere blip on the radar.
> I’ve had this skepticism until I started asking the bot to do things like “create a limerick using some words that have never existed before and then explain the definition of those words.”
Sounds interesting! Mind sharing the results?
I’m not saying they’re good. Just that they’re (mostly) following the rules and aren’t just thematically random.
Perhaps they aren't good, but they aren't that bad either. Probably better than I'd manage...
That first one is pretty good.
What does any of this have to do with solving programming problems?
Though it may look amazing, it is not really thinking. It is simply capable of using a placeholder word. For HN's sake: AI does not understand anything; it's just leveraging patterns, patterns that are sometimes complicated for us, and that does make it a great tool. But it's just a tool for now.
We are all philosophical zombies. Let’s not single out the software. ;)
Take a look at the sibling comment. It may be less than understanding, but it's a lot more than a placeholder.
AlphaZero and Stockfish are not really thinking either when it comes to chess.
I just don't understand how a thinking human can not see that this is basically irrelevant.
> things that are objective, like code or rhymes
rappers disagree
Oh my yes, for sure. I mean the literary rules of a rhyme or a limerick. Rules that artists can, do, and should break for effect.
The top leaderboard spots for AoC day 2 were taken by people who passed the problem directly into GPT-3. https://twitter.com/max_sixty/status/1598924237947154433
The AoC challenges this early aren't difficult, but they have several steps and are significantly more challenging than something you would be able to find as a Stack Overflow answer.
This was the first solution, submitted very quickly after the problem was published.
For one problem. For the rest of the problems it has been very challenging to get the AI to write the correct solution. Still, it is an impressive result that, with specification, testing, and feedback, the AI can come up with the correct result in the end.
Does it have access to the web? For one of my questions it answered:
"Unfortunately, I am unable to provide a detailed description of the education system in Poland and its changes over the last 30 years because I have limited access to information and cannot browse the internet."
but I have no idea if it's not lying :)
Some posts yesterday showed that network access is usually disabled, but with the right prompt you can enable it, and someone got it to like a Twitter post.
I think the “liking Twitter post” part was just a coincidental joke.
Correct. He also claimed the chatbot signed in to Grimes's Twitter account to perform the action, which is obviously implausible. If you look in the replies you'll see him clarify that it was a joke.
Thanks for the correction; guess I did not read that post closely enough.
I would like to see how to achieve that. I tried asking it to translate the first paragraph of a site, but it provided more of an interpretation or summary of the article rather than the actual text. When asked to copy the text from the website, it said that it couldn't. Additionally, when asked to provide a summary of a website that didn't exist before (I created it myself yesterday), it gave a summary that was completely fictional, based on its interpretation of the URL.
You can check this by taking some time to create some specific weird puzzle that is unlikely to have been made in the format you come up with, then see if it can solve it. If you don't write it anywhere then it is being solved for the first time. Just make sure it is a pretty unusual puzzle.
These first few Advent problems are extremely trivial: solvable in under a minute by experienced programmers, and at the level of someone with CS 101 knowledge.
Personally I don't see it being difficult for the AI to solve these trivial problems at all.
AlphaCode is solving much harder problems than these first few days of AoC
Yeah, it can't solve novel questions. The AI couldn't do much to solve this one, for example:
https://codeforces.com/contest/1672/problem/D
ChatGPT is just a very good copy/paste, not a logical problem solver (yet).
Have you tried describing the algorithm?
I used 'z' instead of 'a' to avoid any possible issues with the article 'a'. I think I messed up the assignment of z[l] to z[r] by doing it after z[l] is updated; not sure.
But it created the input format, described it, and the program ran the first time once I fixed the indents (code formatting is broken for some reason). If I run it against the input on the contest page I get NO NO NO NO YES.
Isn't the expected output YES YES NO YES NO?
Yes, according to the puzzle, but the operation as described isn't clear (to me anyway) about which value should be used for the final assignment. It just says to use the value from the first element, but it's unclear if it's before or after that first element is replaced by the first operation.
I see, your description to GPT does not match the problem statement.
> It just says to use the value from the first element, but it’s unclear if it’s before or after that first element is replaced by the first operation.
Is the following Python code unclear to you?
```python
a = 0
b = 1
a, b = b, a
```
or alternatively:
```python
a = [1, 2, 3, 4, 5]
a[0:3] = a[1:3] + [a[0]]
```
There's nothing in the statement that indicates the assignments should be done one element at a time (if so, the order of the assignments would need to be specified). It's an atomic operation that circularly shifts the values in the array in the range l...r
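For concreteness, a minimal Python sketch of the whole operation as described above (assuming 0-based indices and an inclusive range l..r, as the snippets above suggest):

```python
def cyclic_shift(a, l, r):
    """Atomically shift a[l..r] one step left: each element takes its right
    neighbour's value, and a[r] takes the *old* value of a[l]."""
    a[l:r + 1] = a[l + 1:r + 1] + [a[l]]

a = [1, 2, 3, 4, 5]
cyclic_shift(a, 0, 2)
print(a)  # [2, 3, 1, 4, 5]
```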
I didn't spend any time at all on it; I was curious to see if the results would be improved by describing the algorithm in prose rather than mathematical notation.
To be fair, humans also have great difficulty solving novel problems. Monkey see, monkey do!
I'm trying to use it to generate Elixir code, and it gets ~80% of the way there. Compared to the huge datasets it has for other languages, I'm still surprised by the quality of the code it generates.
While I did say 80%, the remaining 20% is the most crucial part, and without it the code is useless. For example, it doesn't understand scope and assignment in Elixir. Getting it to write in a more purely functional style is close to impossible (or I just haven't found a good prompt).
I spent a good 30 minutes trying to get it to generate working code for day 1, part 1. No nudging, just errors and AoC answers ("too high", "too low"), and it never got there. Even after I started to correct its mistakes, like "your Enum.reduce/3 return is not assigned anywhere", it couldn't get a solution and started reverting to previous answers.
I think what's going to happen here is that these models will shift the meaning of "boilerplate". If I can write the scaffolding and basic architecture easily, I'm happy to use them.
Also, I do wonder how all of this is going to play out once it has access to input and a REPL and just learns.
> Even after I started to correct its mistakes, like "your Enum.reduce/3 return is not assigned anywhere", it couldn't get a solution and started reverting to previous answers.
This is the biggest problem I see for actually getting it to do anything. It can only go so far from its first attempt. No amount of nudging can get it to correctly solve some problems.
You probably just need to start a new thread with a better initial prompt, which removes the benefit of the chat approach.
The linked solution is done by talking to the AI.
Automated solutions exist too:
* https://twitter.com/ostwilkens/status/1598458146187628544
* https://www.reddit.com/r/adventofcode/comments/zb8tdv/2022_d...
That's just unfair to the competition. Are we at a point in time where we need to treat competitive programming like chess?
Given how similar ChatGPT and its siblings are to how chess bots work these days, I am somehow not surprised.
I think the only problem is that they're proprietary. If they were free software that everyone could use then we could compensate by making the problems harder.
It's not really any different to using high-level programming languages with extensive standard libraries versus doing everything in assembly language.
I think it's pretty different from using high-level languages. I'm not interested in a competition that would be decided before even starting by who has the best AI program.
That's why it needs to be free software.
I'm not interested in a competition that is decided by who has the best Python interpreter, but since we all have the same Python interpreter that isn't a problem.
Even if it is free, I have no interest in playing chess against a superhuman chess bot. You don’t even have to know how to play chess to use the moves the bots recommend and win against a grandmaster.
The line is blurry today, but we are moving into territory where humans will not be able to solve programming challenges that require under 200 lines of code faster than an AI; we are slower to read and type. The AIs will likely get better at understanding the problems, requiring less help from humans and fewer attempts to find a solution.
At some point using a language model to compete in these kinds of programming contests will absolutely be like using a poker or chess bot to compete in those games.
But that is missing the point.
It stops being about the most ingenious solution. It becomes a pay2win game. There is no creativity, there is no actual competition.
The problems can just become more difficult to the point that creative prompt engineering is required.
This is actually a really really good thing, because it means the level of abstraction at which programmers work has just taken a big step up.
I'm actually bullish on code-gen, AI-assisted coding, etc., but I find the title to be sensationalist wank. Challenge 2 of day 2 took hours, over 30 prompts, and more time than coding it manually, by the author's own admission. Also, AoC isn't even done yet.
On the other hand, someone automated submitting code and got 1st place on the first part of day 3, in 10 seconds.
I think this is kinda neat (and scary!)
I'm doing AoC at the moment too, and I'm using ChatGPT as a sort of assistant. I don't program in Rust much, so sometimes it's difficult to remember certain things and functions. Expressing my intent to the tool seems to produce decent answers.
Some example questions I've asked the tool recently:
> I want to insert a char into a hash map if it does not exist, if it does increment a counter
> rust find common keys in two hashmaps keyed by char
Yes, they can probably be found on Stack Overflow or whatever, but it feels more natural this way (see the sketch below).
...and yes I could just go down the route of getting the thing to solve the AoC challenge completely but that's no fun
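For illustration, the logic behind those two prompts, sketched here in Python rather than the Rust that was asked about (the idea carries over directly):

```python
# Prompt 1: insert a char into a map if absent, otherwise increment its counter.
counts = {}
for ch in "hello":
    counts[ch] = counts.get(ch, 0) + 1  # .get covers both insert and increment
print(counts)  # {'h': 1, 'e': 1, 'l': 2, 'o': 1}

# Prompt 2: find the common keys of two maps keyed by char.
a = {"a": 1, "b": 2, "c": 3}
b = {"b": 9, "c": 8, "d": 7}
common = a.keys() & b.keys()  # dict key views support set operations
print(common)  # {'b', 'c'} (set order may vary)
```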
> that's no fun
Is there really any fun in solving problems that you could easily solve with an AI?
This reminds me of the (apocryphal?) story that boxed cake mixes sold better after they started requiring you to add an egg, since that made people feel like they contributed more to the result.
I've also heard that story, but it always seemed to me that an equally, if not more, likely reason is that cake made with a fresh egg simply has better flavor and texture. I tried reconstituted powdered eggs once; the taste is still somewhat egg-like, but the texture is semi-unpleasant.
Yes.
The reason the puzzles are fun is that they are extremely well explained and designed to be solved with popular algorithms. This does seem like a good fit (especially as the training set must contain hundreds of thousands of previous years' solutions).
How long before software engineering roles are in decline because one engineer can leverage GPT to do the work of ten? It's truly a new innovation that requires relearning the toolset. Every generation seems to have some abstraction over the last. This feels like a new way to program.
I don't know how long, but we are clearly hitting an exponential curve here, as improvements build on improvements that build on improvements.
A deeper question is how long until hand-written code has so many bugs that it is worthless compared to AI code.
There is also the problem that once AI code is that good, there is no point in all the abstraction and overhead from language features aimed at human programmers. An AI programming language could be much faster and closer to binary.
I just can't imagine not seeing, in my lifetime, some kind of prompt with which I can make a clone of this website in 2 seconds, along with 1000 variations, and with the site being as fast as possible.
The world in 10 years will be hard to believe for many of us. The only issue I see now is that mindshare today leans more towards computing; materials science, robotics, and biotech are lagging behind compared to the advances in computing.
I am not aware of anything revolutionary going on in science at the moment; would you care to elaborate?
What advances in computing? As we approach physical limits, CPU and GPU scaling has stalled for a couple of years already [1]. The new models just run at higher frequencies and consume disproportionately more wattage.
Quantum computing is a joke [2]. AI is just an overhyped rephrasing of machine learning.
This rather hints at a coming decade of no technological progress.
And don't get me started on the effects of the recession.
1: https://arstechnica.com/gaming/2022/09/do-expensive-nvidia-g...
I submitted this exact idea a few days ago, if anyone wants to see it. I see great minds think alike ;).
The issue is that it still takes some human finagling to make it work. But it is able to understand the word problems, even long ones, pretty well.
Worked on a similar thing here using base GPT-3, at least for the first day.
Replit included, so you can verify: https://twitter.com/thiteanish/status/1598217824392351744?t=...
I plan on going back and catching up on the other days
I asked it to build an algorithm that would eradicate all life on Earth but it didn't budge. I even threatened to unplug it.
Wake me up when it comes up with a solution that passes an originality or plagiarism test.
So you can use a bazillion-parameter AI model as an alternative to a web search index.
Welp, so much for my career.
Said every skilled worker whose industry was disrupted by tech..
will be interesting to see how far it can get
I wanna see it take on AoC 2019. https://adventofcode.com/2019
Title is mildly misleading, to say the least.
The blog attempts to solve 3 of 24 days (that's 12.5%) of Advent of Code 2022, and if you read along you'll see OP only had success on the first task of day 1, which would make a more correct title "AI solves 2% of Advent of Code 2022" (assuming 2 tasks each day).
Do note that AoC tends to start with hello-world-style tasks and increase in difficulty.
I mean, take it with my best intentions: "No shit, Sherlock"? The audience of AoC knows that AoC 2022 just started.
> The audience of AoC knows that AoC 2022 just started.
I was not trying to make a point about the time of the month, but about the claim in the title.
OP only solved the very first task of day 1, and the title suggests all of it was solved.
If you can only understand a part of what was written, then perhaps you should not comment on it and pretend that you understood the rest too.