Where's Waldo?

stackoverflow.com

409 points by bkaid 14 years ago · 31 comments

sergeyk 14 years ago

This is a toy example of the kind of problem that the field of Computer Vision is actively working on: object detection. In a (tiny) nutshell, our best answer for general images and objects is:

1) Instead of using the full color pixel image, use an "edge image" with some simple additional normalizations. If color is important, do this per color channel.

2) Create a dataset with as many cropped examples of the target object as you can find (mechanical turk is useful for annotating large datasets); every other crop of every image is a negative example.

3) Train a classifier (SVM if you want it to work, neural network if you're so inclined) using this dataset.

4) Apply the classifier to all subwindows of a new image to generate hypotheses of the target object location. This can be sped up in various ways, but this is the basic idea.

5) Post-process the hypotheses using context (this can be as simple as keeping only the most confident hypothesis within a neighborhood).
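Steps 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration, not a production detector: `mean_brightness` is a hypothetical stand-in for the trained classifier from step 3, the image is a tiny synthetic grid, and the window size, stride, radius and threshold are arbitrary assumptions.

```python
def sliding_window_scores(image, win, stride, score_fn):
    """Step 4: score every win x win subwindow of a 2D list of pixel values."""
    height, width = len(image), len(image[0])
    hits = []
    for r in range(0, height - win + 1, stride):
        for c in range(0, width - win + 1, stride):
            patch = [row[c:c + win] for row in image[r:r + win]]
            hits.append((r, c, score_fn(patch)))
    return hits


def suppress_neighbors(hits, radius, threshold):
    """Step 5: keep only the most confident hypothesis in each neighborhood."""
    kept = []
    for r, c, score in sorted(hits, key=lambda h: -h[2]):
        if score < threshold:
            break  # everything after this point is less confident still
        # Keep a hit only if it is farther than `radius` from every kept hit.
        if all(abs(r - kr) > radius or abs(c - kc) > radius for kr, kc, _ in kept):
            kept.append((r, c, score))
    return kept


def mean_brightness(patch):
    # Hypothetical placeholder for a real classifier's confidence score.
    return sum(map(sum, patch)) / (len(patch) * len(patch[0]))


# Synthetic 8x8 image with one bright 2x2 "object" at rows 2-3, columns 5-6.
image = [[0.0] * 8 for _ in range(8)]
for r in range(2, 4):
    for c in range(5, 7):
        image[r][c] = 1.0

hits = sliding_window_scores(image, win=2, stride=1, score_fn=mean_brightness)
detections = suppress_neighbors(hits, radius=2, threshold=0.5)
print(detections)  # [(2, 5, 1.0)]
```

The partially overlapping windows around the object also clear the threshold, which is why the suppression pass is needed to collapse them into one detection.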

If you're interested in object detection, an excellent recent summary of the past decade of research is due to Kristen Grauman and Bastian Leibe: http://www.morganclaypool.com/doi/abs/10.2200/S00332ED1V01Y2... (do some googling if you don't have access to this particular PDF).

A cool paper from a few months ago that should be mentioned when commenting on a post called "Where's Waldo?" is http://www.cs.washington.edu/homes/rahul/data/WheresWaldo.ht...

  • apu 14 years ago

    Heh, I started reading this comment and was ready to jump on something I disagreed with, but remarkably we're in full agreement!

    Somehow I'm always surprised when two vision people agree on the right way to approach a problem =)

TamDenholm 14 years ago

Something unrelated but perhaps interesting to some people: "Waldo" is actually a localised name for the USA and Canada; his original name is Wally.

http://en.wikipedia.org/wiki/Where%27s_Wally%3F

  • robgough 14 years ago

    It brings me an almost indescribable joy to find that Wally is the original name. Yet I have no idea why.

    Waldo always seemed a bit of a strange name, and it still confuses me why it would be changed for the US market. Anyone know why? (Wiki doesn't say.)

    • ward 14 years ago

      Gone with Where is Waldo?! The all new meta-existential gamebook Why is Waldo? is out now in your nearest bookshop!

      I felt dirty with all the exclamation marks.

    • tomjen3 14 years ago

      Then you may cry when I tell you that his name is Holger in Denmark.

    • polymatter 14 years ago

      At a guess, it's for market differentiation. Like how I get the "international" version of US textbooks, which are exactly the same but with a different cover on the front. Or how Harry Potter and the Philosopher's Stone needed a title change to sell in the US. Or how you can get adult and child versions of the Harry Potter books.

    • LearnYouALisp 14 years ago

      Me too. Probably some committee or executive idea.

6ren 14 years ago

Are there other examples of it working? (if there were links, I couldn't see them).

There's a danger of overfitting, where a technique works for one instance (or a subset of instances), but not in general. Detecting stripes could work in general, but as a SO commenter noted, "Where's Wally" images often include spurious stripes to undermine this detection strategy for humans.

dice 14 years ago

The algorithm described by Heike is essentially just looking for striped red and white shirts. Anyone who's done more than a couple of "Where's Waldo?" games knows that striped shirts are often thrown in to draw one's eye. In fact, in this very example there is another striped shirt (lower left corner, just above the wall) which could very well have been Waldo that this algorithm did not highlight. Without being able to recognize Waldo's human characteristics (thin, glasses, strong chin) the approach described will inevitably fail.

rgarcia 14 years ago

I had to play around a little with the level. If the level is too high, too many false positives are picked out.

I was impressed until I read that--the guy is basically fitting the model/procedure to the training set (of size 1). I'd wait for a more general approach before accepting the answer.

re 14 years ago

On NPR, this turns into: "an algorithm that can find Waldo in any image."

http://www.npr.org/blogs/waitwait/2011/12/18/143865340/the-w... via http://meta.stackoverflow.com/questions/116401/stack-overflo...

  • rcthompson 14 years ago

    On "Wait, Wait, Don't Tell Me!", which is a comedy make-fun-of-the-news quiz show. They exaggerate everything in that way for comedic effect.

ofca 14 years ago

Programming potential never ceases to amaze me. I want to learn more. NOW!

  • _mrc 14 years ago

    You might want to check out ai-class.com - it includes an introduction to computer vision (and plenty of other cool stuff).

kevinalexbrown 14 years ago

Cool. I've done some work on things like this before. Some of the things I do to make it work on multiple images:

Template matching is your friend in this case, because most Waldos look similar. You already tried this in a basic way by searching for the stripes of a given color. You can make it more powerful by making the template include more properties, and work in more contexts. For instance: what if Waldo's a different size?
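A minimal sketch of template matching that also handles the "different size" question, assuming grayscale images stored as 2D lists. The score is normalized cross-correlation, and the template is tried at a few integer scales. The function names, the stripe template and the scale set are illustrative assumptions; a real system would use a proper image library and finer scale steps.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation of two equal-size 2D lists, in [-1, 1]."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat patch carries no pattern to correlate with
    return sum(x * y for x, y in zip(da, db)) / denom


def resize_nearest(template, scale):
    """Enlarge a template by an integer factor using nearest-neighbor copies."""
    return [[v for v in row for _ in range(scale)]
            for row in template for _ in range(scale)]


def best_match(image, template, scales=(1, 2)):
    """Return (score, row, col, scale) of the best match over all scales."""
    best = (-2.0, None, None, None)
    for s in scales:
        t = resize_nearest(template, s)
        th, tw = len(t), len(t[0])
        for r in range(len(image) - th + 1):
            for c in range(len(image[0]) - tw + 1):
                patch = [row[c:c + tw] for row in image[r:r + th]]
                score = ncc(patch, t)
                if score > best[0]:
                    best = (score, r, c, s)
    return best


# Horizontal stripes as a toy "Waldo shirt" template.
template = [[1, 1, 1], [0, 0, 0], [1, 1, 1]]

# Gray 10x10 image containing the template at twice its size, at row 2, column 3.
image = [[0.5] * 10 for _ in range(10)]
for dr, row in enumerate(resize_nearest(template, 2)):
    for dc, v in enumerate(row):
        image[2 + dr][3 + dc] = v

score, r, c, s = best_match(image, template)
print(r, c, s)  # 2 3 2
```

Because the score is normalized, it ignores overall brightness and contrast changes, but it will still respond to any stripes of the right spacing, Waldo or not.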

The other option is to pretend you don't know what Waldo looks like, find him in a bunch of images, label the subimages as "waldo" candidates, measure certain properties of those subimages, and find which coordinates of feature space have similar values across them. Then use these properties as your template.

Finally, you could train a classifier on subwindows like sergeyk suggested. This has some difficulty because Where's Waldo images are hard to subdivide into subwindows on the scale of a single person. Do you move pixel by pixel? Do you divide it into a grid? Each grid cell will contain odd fragments of people. Etc. If you do find a way to divide the image into "people" -- perhaps by doing a preliminary "person"-template sweep that identifies locations of people in the image -- then you can use a supervised learning algorithm to say "yes, this person is waldo" or "nope, FRWONG!", based on the image properties in the subwindow around that person.

viscanti 14 years ago

This needs to be an augmented reality mobile app. The problem on the AI side of things is that a good algorithm that reliably "learns" what Waldo looks like would need a substantial number of examples.

A good solution to this would get close, then calculate the probabilities of every "maybe-waldo" and then display the one with the highest probability of being Waldo. An augmented reality app that highlighted Waldo on every page would be awesome.

  • shabble 14 years ago

    If you've got net access (or even if you don't), it seems almost plausible that you could just identify the book/page in question and use a lookup table of coordinates.

    I don't know how many variations on the /Where's Wa[a-z]+\?/ theme have actually been produced, though, so maybe it wouldn't be easier.

    Then again, if you can upload unknowns, wait until you've got enough samples to generate confidence, and then store the result, it'd scale/perform much better :)

danso 14 years ago

Amusing application, but I'd like to see the version that finds Waldo on the page in which everyone is wearing striped shirts

  • antics 14 years ago

    In most normal applications, the only thing that would change is what your features are. For example, if you wanted to find Waldo using the shape of his face and/or hat, you would probably just find some SIFT points (or something), and then build an eigenWaldoface, possibly using a PCA'd set of Waldo faces and hats as examples, and then SIFT the image and look for the places that are most like the eigenWaldoface.
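A rough sketch of the "eigenWaldoface" idea, with heavy simplifications that are assumptions rather than anything from the comment: it skips SIFT entirely, treats each example as a flattened vector of raw pixel values, keeps only the single leading principal component (found by power iteration), and scores a candidate patch by how badly that one component reconstructs it. The example vectors are made up.

```python
import math


def mean_vector(samples):
    return [sum(col) / len(samples) for col in zip(*samples)]


def top_component(samples, iters=100):
    """Mean and leading principal component, via power iteration."""
    mu = mean_vector(samples)
    centered = [[x - m for x, m in zip(s, mu)] for s in samples]
    dim = len(mu)
    # Non-uniform start vector; assumed not orthogonal to the top component.
    v = [float(j + 1) for j in range(dim)]
    for _ in range(iters):
        # w = (sum_i x_i x_i^T) v, computed without forming the matrix.
        w = [0.0] * dim
        for x in centered:
            proj = sum(a * b for a, b in zip(x, v))
            for j in range(dim):
                w[j] += proj * x[j]
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0:
            break
        v = [a / norm for a in w]
    return mu, v


def reconstruction_error(patch, mu, v):
    """Distance from the patch to its projection onto the line mu + t * v."""
    x = [p - m for p, m in zip(patch, mu)]
    proj = sum(a * b for a, b in zip(x, v))
    return math.sqrt(sum((a - proj * b) ** 2 for a, b in zip(x, v)))


# Made-up 4-pixel "Waldo" examples: bright, dark, bright, dark.
examples = [
    [0.8, 0.10, 0.8, 0.10],
    [1.0, 0.00, 1.0, 0.00],
    [0.9, 0.05, 0.9, 0.05],
    [0.7, 0.20, 0.7, 0.20],
]
mu, v = top_component(examples)

err_waldo = reconstruction_error([0.95, 0.02, 0.95, 0.02], mu, v)
err_other = reconstruction_error([0.10, 0.90, 0.90, 0.10], mu, v)
print(err_waldo < err_other)  # True
```

Scanning an image then amounts to computing this error for each candidate window and keeping the lowest values.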

    This article is not interesting because it's an amazing new algorithm or something that solves some important world problem. It's interesting because it takes something that is not known among the general hacker population for doing this sort of thing really easily, and accomplishes it in a fairly simple way.

    Don't be a grump, this is cool. :(

    • danso 14 years ago

      Ha, I wasn't being a grump...these kinds of problems are an important part of the evidence in showing the practicality and usefulness of code. I love it.

      I mostly wanted to see who else remembered that particular Waldo puzzle...it was the final one in one of the books

brianbreslin 14 years ago

Interesting problem. I'd like to then apply this concept of finding a needle in a haystack to satellite imagery. Using super-computing + giant image data sets, you could theoretically find some pretty obscure stuff if you knew what you were looking for (hidden treasures???).

jastr 14 years ago

This is undoubtedly a data point on the path to the singularity.
