Guess the daily Wordle in one try using the tweet distribution
kaggle.com
Waiting for a bot that guesses based on Google Trends:
* Wordle 218: https://i.imgur.com/PbYfLm6.jpg
* Wordle 221: https://i.imgur.com/pTPbquL.jpg
We'll need to do Wordle-SEO in 2022 ;)
But I hope the game is increasing people's literacy... They should make a version for SAT words.
My favourite part of this is accounting for some fake grids.
Kudos! I have been so curious lately as to whether this was possible.
EDIT: The next question is which (if any) of these signals can be removed and still get it in 1 guess. Or if there are any other signals. Or how many tweets are needed (is 50 enough? 10? or 1000? 10k?)
This uses ~2-3k tweets per day for most days, which seems to be more than enough. According to https://twitter.com/WordleStats/status/1486021209015963649 there's about 250k daily tweets per Wordle right now, so this is about a 1% sample coming from whatever the Twitter search API returned when I ran that query.
The simulated distributions it's comparing to are based on 1000 runs per 5-letter word.
Anecdotally, 250 was enough to get it working for those simulated distributions, 100 and below it became increasingly noisier. A higher N would be nice, but I didn't spend more time optimizing the performance for the simulation code beyond what was needed to get this working.
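For anyone wondering what "comparing to simulated distributions" might look like in practice, here is a minimal sketch. It assumes the observed tweets and each candidate word's simulation have already been reduced to counts of result rows (strings like "NYNNM"). The names `cosine_similarity` and `best_match` are made up for illustration, and cosine similarity is my own choice of measure, not necessarily what the notebook uses.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two pattern-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(observed: Counter, simulated: dict) -> str:
    """Return the candidate word whose simulated counts look most like the tweets."""
    return max(simulated, key=lambda word: cosine_similarity(observed, simulated[word]))


# Toy data, invented for the example (not real tweet counts).
observed = Counter({"NNNNN": 50, "NYNNN": 30, "YYYYY": 20})
simulated = {
    "crest": Counter({"NNNNN": 500, "NYNNN": 300, "YYYYY": 200}),
    "tweed": Counter({"NNNNN": 100, "MNNNN": 900}),
}
print(best_match(observed, simulated))
```

Because cosine similarity ignores overall scale, a 2-3k tweet sample can be compared directly against 1000 simulated runs per word.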
This is a cool project, but I wanted to tell you that your evaluate_guess function is wrong.
evaluate_guess(answer="crest", guess="erase")
"MYNYM"
Many people misunderstand this, but it's not how the rules actually work. Correct here would be MYNYN, because there is only one E in the correct answer. There must be a 1-1 correspondence between any 'M' letter in the guess and the letter in the answer. This is similar to the rules for the game "Mastermind".
Right, I wonder how many of the “fake/invalid” tweets that OP observed are actually this bug in the analysis code.
EDIT: actually it looks like it’s correct - evaluate_guess_char() only returns “M” if there’s an instance of the guess letter that’s not accounted for.
It's not correct, I pasted the code from the article directly into ipython.
It filters out cases where the corresponding character in the answer is correct (a 'Y'), but not cases where it's used in another maybe (a 'M'). The latter requires keeping track of state in a way that this doesn't.
For example:
evaluate_guess(answer="crest", guess="erase")
'MYNYM'
Which is wrong, as stated above.
evaluate_guess(answer="crest", guess="erese")
'NYYYN'
Which is right, even though we only changed the middle letter of the guess, not either of the broken letters. In this case the filtering works correctly.
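For reference, a minimal sketch of a scorer that keeps the state described above. It is not the article's code: it reuses the name evaluate_guess and the Y/M/N labels from this thread, and it assumes the usual two-pass rule (exact matches first, then "maybe" marks handed out left to right while unmatched copies of the letter remain).

```python
from collections import Counter


def evaluate_guess(answer: str, guess: str) -> str:
    """Score a guess: Y = right place, M = elsewhere in the answer, N = no."""
    result = ["N"] * len(guess)

    # Pass 1: mark exact matches and count the answer letters left unmatched.
    remaining = Counter()
    for i, (a, g) in enumerate(zip(answer, guess)):
        if a == g:
            result[i] = "Y"
        else:
            remaining[a] += 1

    # Pass 2: a non-exact guess letter earns "M" only while an
    # unmatched copy of that letter is still available in the answer.
    for i, g in enumerate(guess):
        if result[i] != "Y" and remaining[g] > 0:
            result[i] = "M"
            remaining[g] -= 1

    return "".join(result)


print(evaluate_guess(answer="crest", guess="erase"))  # MYNYN
print(evaluate_guess(answer="crest", guess="erese"))  # NYYYN
```

The Counter is the state that the per-character version lacks: once the single E in "crest" has been claimed by the first E of "erase", the trailing E falls through to "N".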
If you want to get even more tweets, you could use twitter's streaming API with the keyword "Wordle": http://adilmoujahid.com/posts/2014/07/twitter-analytics/
It should allow capturing a significant fraction of the 250k daily wordle tweets.
Besides eliminating the superficially-impossible rows (like `YYYYM`), does it do anything against more-sophisticated chaffing, like one or more accounts posting possible-but-inaccurate hint grids pointing at an alternate answer?
As the article explains, a grid containing, say, GGYGG, is fake. Finding more complex fakery is more difficult.
(Edit: drat, HN filters out the Unicode colored-block characters).
Two or three guesses with Wordle using the ETAOIN SHRDLU I learned doing cryptopals has been very effective at reaching a solution.
I usually have a first guess like SAINT then something like SCARE, CORED, etc eliminating vowels and frequent consonants while also considering the most likely sequencing of matched characters or remaining characters.
Also eliminating S, T, C really reveals there’s no TH, SH, SP, CK, etc and is one factor that gets me suspicious of repeated chars or rarer k, g and x combos.
SPOILER ALERT: shows today's answer!
Sorry about that! Just updated so today's guess is hidden by default (and you can click-to-unveil)
Thank you!
very nice. Thanks
Or take all the fun away and just get it through browser console ¯\_(ツ)_/¯
JSON.parse(localStorage.gameState).solution
Or you can look at the source and see the list of words. They're sequential, so you know every future word. The "Wordle 222" is actually just the 222 index into the array.
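To make the "index into the array" point concrete, here is a rough sketch of how a date maps to a puzzle number. The epoch of 2021-06-19 is an assumption on my part (it does put Wordle 222 on 2022-01-27), the wrap-around with modulo is also an assumption, and the word list is a placeholder rather than the real one.

```python
from datetime import date

# Assumed date of puzzle 0; not taken from this thread.
ASSUMED_EPOCH = date(2021, 6, 19)


def puzzle_index(day: date, epoch: date = ASSUMED_EPOCH) -> int:
    """Number of days since the assumed epoch, i.e. the 'Wordle N' number."""
    return (day - epoch).days


def answer_for(day: date, answers: list, epoch: date = ASSUMED_EPOCH) -> str:
    """Look up the day's word in a sequential answer list (wrapping is assumed)."""
    return answers[puzzle_index(day, epoch) % len(answers)]


print(puzzle_index(date(2022, 1, 27)))  # 222

# Placeholder list standing in for the array embedded in the page source.
placeholder_answers = ["aaaaa", "bbbbb", "ccccc"]
print(answer_for(date(2021, 6, 21), placeholder_answers))  # ccccc
```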
That's a real bummer. Is there any way the author can prevent this? Can he generate random indexes that don't repeat while keeping this random number generator code public?
The only way to address this would be validating the answers server-side. Any information you leak in the locally executed code can be discovered without much effort.
I think Wordle doesn't server-side validate because of volume, and also because it's a fun little game and cheating brings you nothing of value.
Wait, did Wordle start counting from 0?
This is brilliant and something I had the intuition was possible, just couldn't put it all together myself. What was missing, I think, in my thought process was just taking into account how commonly words occur in English in general. Plus how to deal with static.
Just so cool someone put this together, major props.
Very cool!
One minor improvement here; if the user has toggled colorblind mode on, then their tweeted result will also have altered color blocks. Orange for right letter right place, and blue for right letter wrong place.
That's a really neat attention to detail! I haven't seen the colourblind boxes.
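A small sketch of how both colour schemes could be folded into the same Y/M/N rows before counting. The mapping follows the colours mentioned above (orange and blue for colourblind mode) plus the black and white squares for dark and light themes; `parse_grid` and `SQUARE_TO_MARK` are names invented for this example, not from the notebook.

```python
# Map each shared square to the thread's Y/M/N labels.
SQUARE_TO_MARK = {
    "\U0001F7E9": "Y",  # green square: right letter, right place
    "\U0001F7E7": "Y",  # orange square: same meaning in colourblind mode
    "\U0001F7E8": "M",  # yellow square: right letter, wrong place
    "\U0001F7E6": "M",  # blue square: same meaning in colourblind mode
    "\u2B1B": "N",      # black large square (dark theme)
    "\u2B1C": "N",      # white large square (light theme)
}


def parse_grid(text: str) -> list:
    """Extract rows of five squares from a shared result and normalise them."""
    rows = []
    for line in text.splitlines():
        # Drop whitespace and any emoji variation selectors.
        line = line.strip().replace("\ufe0f", "")
        marks = [SQUARE_TO_MARK.get(ch) for ch in line]
        if len(marks) == 5 and all(marks):
            rows.append("".join(marks))
    return rows


tweet = (
    "Wordle 222 3/6\n\n"
    "\u2B1B\U0001F7E8\u2B1B\u2B1B\u2B1B\n"
    "\U0001F7E6\U0001F7E7\u2B1C\u2B1C\u2B1C\n"
    "\U0001F7E9\U0001F7E9\U0001F7E9\U0001F7E9\U0001F7E9\n"
)
print(parse_grid(tweet))  # ['NMNNN', 'MYNNN', 'YYYYY']
```

Lines that are not exactly five recognised squares (the header, blank lines, commentary) are simply skipped.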
My metagame is guessing my friend's guesses.
I always lead with STOAE so people don't have issues guessing my first one at least. I also tend to follow with UNLID if STOAE has zero or one hits.
ARTSY MODEL CHUNK here. It's not optimal, but it is pleasing to me in an aesthetic sense.
I vary up my starting words to keep things interesting.
Common ones for me are: MEATY, BISON, CHUMP, GROUP
From yesterday's post on the state of the art, I tried SALET, but still took me 4 tries to get today's wordle.
If you stick with one starting word, and that word is in the set of possible Wordle answers, then some day, one day, if you keep playing forever, you are guaranteed to get that magical 1/6.
This is not quite the same as the lottery-player's fear that they change their lucky numbers and then those numbers come up the next week... the lottery has no memory, so it really doesn't change your odds when you change your numbers. But Wordle's drawing words from a finite pool.
Of course, if your go-to starting word is NOT in the set (looking at YOU overly optimized people who play crazy words like STOAE that are almost certainly not in the answer set...) then by sticking with that you're guaranteeing you'll never do better than 2/6...
I turned on hard mode, and it's really forced me to change how I play. You can only really have one "starting word" unless you match zero letters.
After seeing an asterisk in one of my friends' shares, I'm now forced to play on hard mode as well. I can't risk the peer-shame of skating on easy mode anymore :)
So that being said, I'm bracing myself for the curses-of-early-success this will lead me on. Right now I sometimes toss out a completely different word just to cover the search space. It has led me to quickly narrow down options. Am I screwed if my first guess matches on 2 letters? (Say, "___ES").
I guess that's why it's called "Hard Mode" to begin with.
I've had fun with MUSTH.
STEAM HOUND has been a winner for me.
STARE, CHIMP, BLOND or BOUND depending on how many vowels I hit. Sometimes FLUNK.
Given the situation we are in I start with VIRUS, PEACH always
SALTY URINE
AROSE for me. 5 of the top 6 most-frequent letters. But hard mode, so next depends.
Yep—AROSE and then CLINT for me.
Glad to hear I am not the only one squandering my time doing this :) Might be a fun program to write. I find it’s often hard to guess more than the line prior to the win.
I like that it's robust to adversarial tweets!
I did something similar last week using the Twitter Stream API: https://github.com/basile-henry/twitter-wordle
It's not resistant to adversarial tweets, but it usually collects enough tweets to have an answer in around 1 minute, so it's not too bad to restart if some bad tweets were sampled.
Maybe I should try to use your wordle-tweets dataset to make it work offline as well. :)
This is a really cool approach, definitely did not think of trying this! If you'd prefer to play without the crowdsourced data, I spent a couple hours on the following dictionary search algo yesterday which can typically solve puzzles in 3-4 guesses: https://github.com/rgkimball/wordlebot
nice! i did similar, but used character frequencies in the remaining word sets to rank: https://github.com/keredson/wordle_solver
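For readers who have not looked at either repo, ranking by character frequency in the remaining word set can be sketched roughly like this. This is not the code from the linked projects; `rank_by_letter_frequency` is a made-up name, and counting each letter once per word is one possible choice among several.

```python
from collections import Counter


def rank_by_letter_frequency(candidates: list) -> list:
    """Order candidates so words made of the most widely shared letters come first."""
    # Count in how many remaining words each letter appears (once per word).
    freq = Counter(ch for word in candidates for ch in set(word))

    def word_score(word: str) -> int:
        # Repeated letters are scored once, so they add no extra weight.
        return sum(freq[ch] for ch in set(word))

    return sorted(candidates, key=word_score, reverse=True)


remaining = ["arose", "fuzzy", "stare", "eerie"]
print(rank_by_letter_frequency(remaining))
```

Re-running the ranking after each round of feedback, on the shrunken candidate list, gives the "recalculate on remaining words" behaviour.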
I tried yours out, nice work yourself! Seems we took a similar approach in recalculating the letter distributions based on remaining words - both our algos solved it in 4 turns today.
If I may make two small suggestions as a user, I noticed you have a dictionary with nearly 13k words which often results in invalid suggestions like 'clery' and 'meryl'. In testing I found the Scrabble dictionary to be much more likely to yield valid Wordle words (found here: https://github.com/redbo/scrabble), though the official Wordle answers tend to be an even smaller set of ~2,500 common words.
Second, though the implementation is very clean in code (much more concise than mine!), I found the use of the green/gray/yellow methods to be a bit cumbersome when adding constraints. You could wrap these three in a method like guess(word, reply) where your response encodes the feedback as something like [g]=green, [b]=black, [y]=yellow:
Given: [('arose', 27122), ('aeros', 27122), ('seria', 27095), ('riesa', 27095)]
>>> w.guess('arose', 'bybby')
vs.
>>> w.gray('aos')
>>> w.yellow('r', 2)
>>> w.yellow('e', 5)
You could even have the guess method trigger a new round of suggestions since the response implies that we've advanced a turn.
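Roughly what I have in mind, as a sketch. `ConstraintSketch` is a hypothetical stand-in class, not the solver's real one; it only records constraints, and it assumes 1-based positions to match the w.yellow('r', 2) call above.

```python
class ConstraintSketch:
    """Hypothetical stand-in that just records gray/yellow/green constraints."""

    def __init__(self):
        self.absent = set()   # letters reported as gray
        self.misplaced = []   # (letter, position) pairs reported as yellow
        self.placed = []      # (letter, position) pairs reported as green

    def gray(self, letters):
        self.absent.update(letters)

    def yellow(self, letter, position):
        self.misplaced.append((letter, position))

    def green(self, letter, position):
        self.placed.append((letter, position))

    def guess(self, word, reply):
        """Apply a whole row of feedback: g = green, y = yellow, b = black/gray."""
        if len(word) != len(reply):
            raise ValueError("word and reply must have the same length")
        for position, (letter, mark) in enumerate(zip(word, reply), start=1):
            if mark == "g":
                self.green(letter, position)
            elif mark == "y":
                self.yellow(letter, position)
            elif mark == "b":
                self.gray(letter)
            else:
                raise ValueError(f"unknown mark: {mark!r}")


w = ConstraintSketch()
w.guess("arose", "bybby")
print(sorted(w.absent), w.misplaced, w.placed)
```

One caveat: a letter can come back gray in one position and green or yellow in another when the guess repeats it, so a real implementation should not treat every gray letter as fully absent.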
Hyperwordle works for me. It defines the usable letters right on your keyboard.
Hyperwordle defined the usable letters right on your keyboard. Thanks!
Cool but don't read all the way if you haven't done today's Wordle!
Sorry about that! Just updated so today's guess is hidden by default (and you can click-to-unveil)
This is the HN I’m here for. Brilliant.
This is super smart. I wonder how many tweets this approach needs each day to converge to the correct answer? It would be interesting to see some plots vs. num tweets
This is your regular reminder that today's word, and all the upcoming ones, are located in the Wordle minified JS.
Yes, we know. We can also easily "solve" a crossword puzzle by waiting a day and just copying down the published answers.
People are having fun solving puzzles in clever ways. This post is an exceptionally clever way of solving a puzzle in an unexpected way, using forensic data analysis, which is itself something of interest to a lot of us.
“The maximum amount of time I take to complete any given crossword puzzle is one day.”
I know that, I'm just pointing out another clever way to solve today's puzzle :)
The point is it's not clever at all.
It's too easy to cheat at Wordle, even if you don't know what html/javascript is. Just open a new browser, solve it there, and enter the solution in your main browser.
In the age of intrusive anti-cheat software and byzantine security measures, the fact that Wordle doesn't attempt to prevent cheating is something I find weirdly charming.
Shameless plug, but I built a site to do this the other way around.
Enter a 5 letter word and it'll tell you the next time it will be the Wordle solution.
Yep, you can try https://wordhoot.com if knowing that the answers are accessible somehow decreases your enjoyment of the game.
You can also just run a command in the console to get the answer. But where is the fun in that?
I wonder what percentage of these Twitter posts are fake (“OMG so lucky LOL”).
If you know someone always starts with SLATE it makes it easier...
I start with salet. Same letters but more likely to be in the right place.
I use (not today's puzzle) cat /usr/share/dict/words | grep -v 'w' | grep -v 'a' | grep 't' | egrep '^...er$'
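The same filter can be written as a short Python function if you would rather not chain greps. This is only a sketch: `filter_words` and its parameters are names I made up, and it is shown on a small inline list instead of reading /usr/share/dict/words.

```python
import re


def filter_words(words, absent="", present="", pattern="....."):
    """Keep words with none of `absent`, all of `present`, matching `pattern`."""
    regex = re.compile(pattern)
    return [
        word
        for word in words
        if not any(ch in word for ch in absent)  # like grep -v for each letter
        and all(ch in word for ch in present)    # like grep for each letter
        and regex.fullmatch(word)                # like egrep '^...er$'
    ]


sample = ["otter", "water", "tiger", "after", "utter", "other", "fever"]
print(filter_words(sample, absent="wa", present="t", pattern="...er"))
# ['otter', 'tiger', 'utter', 'other']
```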
What are the "⬛ squares taking social media by storm"?
That was my question and it's still not answered well here. I guess people post the colors that led up to their solution but not the actual letters.
Might be good to add it to the original post for clarification. I play Wordle but didn't quite get what they were using for source data.
It wasn't obvious to me when I first started playing Wordle, but you can actually share an emoji-fied version of your game (without the letters) by clicking "Share" when the statistics window pops up. I didn't think to do that at first, but when I noticed everyone on social media posting their Wordles with the exact same format, I figured it had to be buried in the game somewhere.
It's the text you copy/paste to share your Wordle results on social media.
Huh, I thought HN stripped out emojis from posts. Has that changed, or is there a limited subset that are available?
The black and white squares are in the older Unicode "Miscellaneous Symbols and Arrows" block so I guess they're allowed. Several things like that are sort of "retroactively" emoji... there's a "display as emoji" or "display as text" character you can put after them.
What HN does and doesn't allow seems somewhat arbitrary, things like the star emoji are in that same block and yet are not allowed as far as I can tell.
Maybe because ■ is part of ASCII?
Edit: ⬛⬜ work, the colored ones don't.
⬛ and ■ are different characters. But it looks like HN still strips most emoji.
It is definitely not part of ASCII.
> Note that all of these 243 possibilities aren't valid in practice. For example YYYYM will never be seen because if the first four letters are correctly placed and the fifth is also in the word, it will be correctly placed.
Not true. For example if the correct answer is TWEED and you guess TWEET, then you’ll get YYYYM.
Edit: As pointed out by two commenters, the actual implementation contradicts the following claim in the post:
> “Maybe” - the letter is in the answer but in a different position
If the correct answer is TWEED and you guess TWEET, you will still get YYYYN, because the actual implementation uses a different definition of “Maybe” than what is written in the post.
I believe in the actual implementation it's correct. You can confirm a letter is not doubled, when one of the two letters in your guess is gray.
I only played it occasionally and haven’t encountered doubled letters, so I don’t know what the actual response would be. Maybe you’re right.
Posting with such assurance without knowing what the actual response would be, lmao. Very on-brand for HN (and myself too, honestly).
I blindly trusted the post’s claim that “Maybe” means “the letter is in the answer but in a different position”. If you use that definition, you’ll arrive at the same conclusion.
The post should be updated with the correct definition.
It's a fine short summary for how the game works. Assuming no edge cases exist based on a 10-word summary is not the original author's fault.
> if the correct answer is TWEED and you guess TWEET, then you’ll get YYYYM
No, this would give YYYYN
If Yellow/“Maybe” really means “the letter is in the answer but in a different position”, then the final T satisfies this definition.
Another commenter points out the actual implementation may deviate from this definition though.
Try TWEET in today's Wordle and you'll see what I mean
I agree with you. The first E gives a “Maybe” and the second E gives a “No”.
A “Maybe” response gives much more information than simply “the letter is in the answer but in a different position”.
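For anyone who wants to check the YYYYM claim rather than argue about it, here is a brute-force sketch. It assumes the two-pass rule described in this thread (exact matches first, then "maybe" only while an unmatched copy of the letter remains) and enumerates every answer/guess pair over a tiny three-letter alphabet. It is an illustration, not a proof for the full dictionary; `score` and `reachable_patterns` are names made up for the example.

```python
from collections import Counter
from itertools import product


def score(answer: str, guess: str) -> str:
    """Two-pass scoring: exact matches first, then leftover letters."""
    marks = ["Y" if a == g else "N" for a, g in zip(answer, guess)]
    leftover = Counter(a for a, g in zip(answer, guess) if a != g)
    for i, g in enumerate(guess):
        if marks[i] == "N" and leftover[g] > 0:
            marks[i] = "M"
            leftover[g] -= 1
    return "".join(marks)


def reachable_patterns(alphabet: str = "abc", length: int = 5) -> set:
    """Every result row that some answer/guess pair over `alphabet` can produce."""
    words = ["".join(p) for p in product(alphabet, repeat=length)]
    return {score(answer, guess) for answer in words for guess in words}


all_patterns = {"".join(p) for p in product("YMN", repeat=5)}
unreachable = all_patterns - reachable_patterns()

print("YYYYM" in unreachable)
print(score("tweed", "tweet"))
```

The intuition matches the quoted note: with four exact matches, the only unmatched answer letter sits in the same position as the remaining guess letter, so that letter is either a "Y" or an "N", never an "M". TWEED versus TWEET comes out as YYYYN under this rule.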
:( Spoiled on hacker news, what a world
Something I’ve wondered about is how well you can guess what peoples guesses were from the images they post.
I mean, I guess - or you can just use a private tab. Same difference, except way less complex.
Amazing, what a brilliant idea
Computer nerds strike wordle! Goddammit! How long before Skynet. Sigh. The end is nigh
/s
This is genius. I love it.
Lessons to be learned in the field of data anonymization!
the point of wordle is to be simple, fun, and social so I don't think there's anything further to be learned here.
Would this be a good use of a hidden Markov model?
Guess the daily wordle by inspecting source code…yes every word is hard coded in the JavaScript in calendar order
Or use incognito for your first attempt at the puzzle and then redo it how you want in your normal account.
Or open your browser's dev tools and type:
$('game-app').solution
And people think bitcoin wastes energy.
What is Wordle?
Can someone TL;DR what this Wordless thing is?
My understanding (I've never played it) is:
each day there is a 5 letter word. You have a limited number of guesses as to what word it is (iirc 6 guesses). When you make a guess, it marks each letter with something indicating whether the actual word had that letter in that position, whether that word has a copy of that letter (and not one you already found), or whether that letter does not appear in that word.
All of your guesses have to be words.
In hard mode, all of your guesses have to contain all of the letters which you got right in a previous guess.
At the end (if you get the word within 6 guesses?) you are given an option to share (on twitter mostly, I think) a representation of your game, in a way that doesn't reveal what words you guessed or what the final word was, just which positions had which of the 3 markings, which, in this share feature, are represented using emoji with the colored square blocks.
This results in many people posting grids of colored square blocks, followed by some fraction out of 6.
TL;DR of the TL;DR: https://www.powerlanguage.co.uk/wordle/
Seriously, the quickest way to understand it is to just play the thing.
Now someone make an adversarial twitter bot.
I am trying so hard to not know what Wordle is or how to play. Now it is showing up on Hacker News? Damn. I’ve not had this much trouble since I avoided Sudoku.
The "original" Wordle[0] only lets you play once per day, so if potential addiction is your concern, it shouldn't be a problem. It should take up less than 5 minutes of your time per day.
> Now it is showing up on Hacker News?
Only every day for the last two months
https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=fal...
I like the idea of Wordle
But I hate that any guesses have to be words in its dictionary.
As someone who was never really a fan of crosswords, the need to find a real word that fits 5 letters every time severely limits how I can enjoy it.
Seems like you'd be more interested in Master Mind.
Here is an implementation from the great Simon Tatham's Portable Puzzle Collection:
https://www.chiark.greenend.org.uk/~sgtatham/puzzles/js/gues...
You can guess it in one try by carefully reading the code. There is no server that knows the correct answer. The client already knows, based on the date. That is why you can only play once per day.
That's not really guessing though, is it?
Where's the challenge in that?
Reading the minified code, probably.
It's a giant array, tough to miss
Thanks, Captain Kirk. You get a commendation for knowing how to 'View Source'.
Yes, but there's no fun in that
As already explained by carefully reading the opening of the post.