Are LLMs able to play the card game Set?
Set is a card game where players have to identify sets of three cards from a layout of 12. Each card features a combination of four attributes: shape, color, number, and shading. A valid set consists of three cards where each attribute is either the same on all three cards or different on each. The goal is to find such sets quickly and accurately.
Though this game is a solved problem for computers, easily tackled by algorithms or deep learning, I thought it would be interesting to see whether Large Language Models (LLMs) could figure it out.
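For reference, the rule check itself is tiny once you pick an encoding. Here is a minimal Python sketch (my own hypothetical encoding of cards as 4-tuples, not anything from the post): a triple is a Set exactly when no attribute has two matching values and one odd one out.

```python
from itertools import combinations

# Hypothetical encoding: each card is a tuple of four attribute values,
# e.g. (number, shape, shading, color). The post does not specify one.

def is_set(a, b, c):
    # A valid Set: for every attribute, the three values are all the same
    # (set size 1) or all different (set size 3); a set size of 2 means invalid.
    return all(len({x, y, z}) != 2 for x, y, z in zip(a, b, c))

def find_sets(layout):
    # Brute-force all C(12, 3) = 220 triples of a 12-card layout.
    return [trio for trio in combinations(layout, 3) if is_set(*trio)]

# Example: numbers and shadings and colors all differ, shapes all match -> a Set.
cards = [(1, "oval", "striped", "red"),
         (2, "oval", "solid", "green"),
         (3, "oval", "empty", "purple")]
print(is_set(*cards))  # True
```

Checking all 220 triples of a 12-card layout this way is instantaneous, which is why the game is such an easy target for ordinary code.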
If you think this is fun, try to see how it garbles predicate logic.
FYI: Card 8's transcription is different from the image. In the image, 5, 8, 12 is a Set, but the transcription says Card 8 has only 2 symbols, which removes that Set.
Not only that, but 2, 6, 7 is also a Set and isn't included in the results.
Oh no, thanks for pointing this out! I asked GPT-4o to convert the image to text for me and only checked some of the cards, assuming the rest would be correct. That was a mistake.
I've now corrected the experiment to accurately take the image into account. This meant that DeepSeek was no longer able to find all the sets, but o3-mini still did a good job.
Both cards 7 and 8 are incorrect (both claim a count of 2 while the cards have 3). This leads to missing both 5-8-12 and 2-6-7 as valid Sets.
Woah, what's going on?? I've always played Set with stripey cards, is this a custom deck or did they change it at some point???
This is wildly disconcerting to me
This is definitely a custom/knock-off deck. Not only are the stripes not stripey, the capsules are now ovals and the diamonds are now rectangles.
My first-party Set deck looks exactly like that. They must've done a redesign at some point.
Same here. These are the shapes and fill patterns of the edition by Ravensburger, which is the one usually found in Europe.
This is exactly what they've always looked like when I played (here in Europe). Is it maybe a regional thing?
I noticed that LLMs, at least at the Claude and OpenAI 4o level, cannot play tic-tac-toe and win against a competent opponent. They make illogical moves.
Interestingly, they can write a piece of code to solve Tic Tac Toe perfectly without breaking a sweat.
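(For context, the kind of code they produce is usually a plain minimax search. A minimal sketch in Python, not taken from any model's actual output; the `winner` and `minimax` helpers below are my own illustration.)

```python
# Perfect tic-tac-toe play via minimax. Board is a list of 9 cells: 'X', 'O', or ' '.

WIN_LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(board):
    # Return 'X' or 'O' if a line is completed, else None.
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, move) from `player`'s perspective: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w:
        return (1 if w == player else -1), None
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if not moves:
        return 0, None  # board full: draw
    best_score, best_move = -2, None
    opponent = 'O' if player == 'X' else 'X'
    for m in moves:
        board[m] = player
        score, _ = minimax(board, opponent)
        board[m] = ' '
        score = -score  # the opponent's best outcome is our worst
        if score > best_score:
            best_score, best_move = score, m
    return best_score, best_move

# Usage: best reply for 'O' after 'X' opens in the centre.
board = [' '] * 9
board[4] = 'X'
print(minimax(board, 'O'))  # (0, 0): take a corner; perfect play from here is a draw
```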
I've always said that appending "use python" to your prompt is a magic phrase that makes 4o amazingly powerful across a wide range of tasks. I have a whole slew of things in my memories that nudge it to use python when dealing with anything even remotely algorithmic, numeric, etc.
Playing tic tac toe could be such a basic topic that there is relatively little information on the internet about how to "always" win.
On the other hand writing a piece of code to solve Tic Tac Toe sounds like it could be a relatively common coding challenge.
Win or stalemate? Because a stalemate is the likely scenario against a somewhat competent opponent, IMO.
Ahh, good correction. I meant that against a competent opponent the best strategy forces a stalemate.
In all of my tests, neither Claude nor 4o could even get to a stalemate; they just make incorrect moves.
It might be the way you're formatting the input. I wonder how they perform when state updates are shared via natural language vs. ASCII art vs. an image.
I tried a bunch of different ways. It wasn’t the prompt or input format.
Since you can train an LLM to play chess from scratch, I would not be surprised if you could also train one to play Set. I might experiment with it tomorrow.
https://adamkarvonen.github.io/machine_learning/2024/01/03/c...
Get them to play Fluxx, and we'll be talking...
(This one, where ever-changing rules are part of the game: https://www.looneylabs.com/games/fluxx )
I am increasingly concerned that these new reasoning models are thinking.
I still think that we’re at much greater risk of discovering that human thinking is much less magical, than we are of making a machine that does magical thinking.
No problem. Just redefine "thinking".
To what? And how does that change the reality of what the models are doing?
I believe GP is being sarcastic, as this seems to be a common reaction. Every time a machine accomplishes something that seems to be intelligent, we redefine intelligence to exclude that thing, and now the issue is fixed: computers are not intelligent and they cannot think.
Never mind that they can beat the entire world at chess and Go, and 90+% of the population at math, engineering, and physics problems. Those things do not require intelligence or thinking.
My experience of thinking is that it is a constant phenomenon. My experience of LLMs is that they only respond and are not running without input.
That's because we don't leave them running, right? We could, though, yes?
Current LLMs decohere rapidly, and as far as I am aware, this problem seems fundamental to the architecture.
Yup.
Tangentially related: https://qntm.org/mmacevedo
Well there is a gap between the firing of individual neurons in your mind. How long would that gap need to be for it not to count as thinking anymore?