Learning Game of Life with a Convolutional Neural Network

danielrapp.github.io

71 points by DanielRapp 10 years ago · 13 comments

shaggyfrog 10 years ago

This is the perfect place to quote one of the hacker koans:

In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.

"What are you doing?", asked Minsky.

"I am training a randomly wired neural net to play Tic-Tac-Toe" Sussman replied.

"Why is the net wired randomly?", asked Minsky.

"I do not want it to have any preconceptions of how to play", Sussman said.

Minsky then shut his eyes.

"Why do you close your eyes?", Sussman asked his teacher.

"So that the room will be empty."

At that moment, Sussman was enlightened.

  • claar 10 years ago

    If you don't understand this (like me), see https://en.wikipedia.org/wiki/Hacker_koan#Uncarved_block for an explanation.

    According to that link, it's based on a true story:

      So Sussman began working on a program. Not long after,
      this odd-looking bald guy came over. Sussman figured the
      guy was going to boot him out, but instead the man sat
      down, asking, "Hey, what are you doing?" Sussman talked
      over his program with the man, Marvin Minsky. At one point
      in the discussion, Sussman told Minsky that he was using a
      certain randomizing technique in his program because he
      didn't want the machine to have any preconceived notions.
    
      Minsky said, "Well, it has them, it's just that you don't
      know what they are." It was the most profound thing Gerry
      Sussman had ever heard. And Minsky continued, telling him
      that the world is built a certain way, and the most
      important thing we can do with the world is avoid
      randomness, and figure out ways by which things can be
      planned. Wisdom like this has its effect on
      seventeen-year-old freshmen, and from then on Sussman was
      hooked.
mark_l_watson 10 years ago

Neat example! I had not run across convnet.js before, so thanks. Karpathy was, I believe, one of Hinton's students. I looked over my archived courses last night and saved all of Hinton's videos and viewgraphs from his Coursera class in 2012. Really good stuff, and it is a real shame that Coursera does not continually re-run that course on auto-pilot (I am sure that Hinton and his students are too busy to participate in the forums, but the videos, auto-generated homework, and the quizzes would stand on their own without TA intervention).

Convolutional networks, deep learning, and machine learning in general are arguably the most disruptive technologies in our lifetimes. BTW, pardon the shameless plug, but I just started a new blog this morning to host ML resources and my own experiments: http://blog.cognition.tech/?view=classic

  • JabavuAdams 10 years ago

    That course was fantastic. Work prevented me from doing the assignments, but I've also archived the materials.

tlarkworthy 10 years ago

Cool, but conv nets are not the right tool here, though. Neural networks have an implicit smoothness and locality prior (which is what back propagation exploits), and this game does not really possess those properties. Decision trees (e.g. random forests) are able to deal with sharp discontinuities better and would be more appropriate. I would expect less training and perfect results.
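
As a rough sketch of that idea (my own illustration, not from the article, using a single decision tree rather than a full random forest and assuming the standard Conway rules): because the update only depends on the 3x3 neighborhood, a tree fit on all 512 possible neighborhoods reproduces the rule exactly.

  from itertools import product
  from sklearn.tree import DecisionTreeClassifier

  # Enumerate every possible 3x3 patch: center cell plus 8 neighbors.
  X, y = [], []
  for bits in product([0, 1], repeat=9):
      center = bits[4]
      neighbors = sum(bits) - center
      # Standard Conway rules: a live cell survives with 2 or 3 live
      # neighbors; a dead cell comes alive with exactly 3.
      y.append(1 if neighbors == 3 or (center and neighbors == 2) else 0)
      X.append(bits)

  tree = DecisionTreeClassifier().fit(X, y)
  print(tree.score(X, y))  # 1.0 -- the rule is recovered from just 512 examples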

  • azakai 10 years ago

    I agree decision trees are more natural here, but even they are overkill, I think: all that needs to be learned is a function from the 8 neighbors to the center, which means from 8 bits to 2 bits, or in other words, there are just 256 values to be learned.

    Literally any reasonable learning algorithm, even nearest neighbor, will learn that fairly quickly. The article had to use millions of samples, which is extremely excessive - a more efficient algorithm should only require thousands.

    What might be interesting is to see whether the neural network learns the underlying rule: that placement does not matter, only the sum matters, and that the crucial value is "3". That would be cool to see.
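
    As a back-of-the-envelope sketch of the earlier "thousands, not millions" point (my own, assuming uniformly random patterns, which is a simplification of how the article samples boards): all 256 possible neighbor patterns show up after only a couple of thousand random draws.

      import random

      # Coupon-collector style check: how many uniformly random 8-neighbor
      # patterns until every one of the 256 possibilities has been seen?
      random.seed(0)
      seen, draws = set(), 0
      while len(seen) < 256:
          seen.add(tuple(random.randint(0, 1) for _ in range(8)))
          draws += 1
      print(draws)  # expected ~256 * (ln 256 + 0.577) ~ 1600 draws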

    • mdpopescu 10 years ago

      ... there are just 256 values to be learned

      This is a very good point and it's the reason why I don't find much value in GAs. There is an inherent conflict between "an algorithm for a specific problem" (which can be optimized to run in a short time - the above problem can be reduced to a lookup in a 256 x 2 bit matrix and thus run in O(1)) and "an algorithm for any problem (in a class)" which, while theoretically capable of finding a solution, is prohibitive in time and/or space.

      As far as I can tell, GAs are barely any better than a random walk and thus not useful for any non-trivial problems. I am not yet sure if NNs are in the same category, though I suspect they are.
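
      As a concrete sketch of the 256 x 2 lookup mentioned above (my own illustration, assuming the standard Conway rules): pack the 8 neighbor bits into an index and look the answer up in a small table, which makes the per-cell update a constant-time lookup.

        import numpy as np

        # 256 rows, one per neighbor pattern; 2 columns for the current
        # center state (dead, alive).
        table = np.zeros((256, 2), dtype=np.uint8)
        for pattern in range(256):
            live = bin(pattern).count("1")
            table[pattern, 0] = 1 if live == 3 else 0          # dead center
            table[pattern, 1] = 1 if live in (2, 3) else 0     # live center

        def step_cell(neighbors, center):
            """O(1) update: neighbors is an 8-element 0/1 sequence, center is 0/1."""
            index = sum(bit << i for i, bit in enumerate(neighbors))
            return table[index, center]

        print(step_cell([1, 1, 1, 0, 0, 0, 0, 0], 0))  # dead cell, 3 live neighbors -> 1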

boothead 10 years ago

Convolution is a comonad. http://kukuruku.co/hub/haskell/cellular-automata-using-comon...

I love it when there's a nugget of rigor behind things like this that can be pulled out and applied elsewhere!

hughes 10 years ago

I guess it works for the few specific cases you mentioned, but the example animation at the end diverges after twelve steps [1].

[1] http://imgur.com/a/gcUCH

  • karpathy 10 years ago

    It does seem like it wasn't trained in the most optimal fashion.

    - The conv layer has only 20 relu units, I'm not sure if that suffices to memorize the rules. The net might be "underfitting" the rule set, and therefore making mistakes in rare pixel arrangements.

    - More worryingly, it's also strange that the author uses a fully connected layer right after the conv layer (this is implicitly added in ConvNetJS when you specify a loss layer). This means that the output neurons are a function of the entire preceding CONV layer activations everywhere, while the game of life rules are local. The way to do this would be to instead use a 1x1 CONV layer with 2 neurons on top of the first 3x3 CONV layer, and interpret it as computing the class scores at every spatial position. However, in this case you'd want to apply the loss on every spatial position, and this "fully-convolutional" loss use case is not supported out of the box in ConvNetJS, but could be written. (A rough sketch of this layout follows after this list.)

    - However, with a fully-convolutional loss, the zero-padding of 1 used around the borders might cause trouble. Normally this is okay with images, but here it might be an issue because the neurons all share parameters spatially (and hence compute the same function) and don't "know" if they are at the border or in the middle of the image. I'm not sure how this game of life handles boundary conditions, but if borders obey different dynamics then you'd want to distinguish the border pixels with a special "border" feature vector at the input. E.g. each pixel is a 3-vector, with a 1-hot encoding for (border, positive pixel, negative pixel).

    - And it's also strange that the author uses "regression" loss for something that is a binary classification problem.

    So, nice attempt but several funny choices, and clear why it didn't fully work :)
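
    For anyone trying to picture the fully-convolutional layout described above, here is a rough numpy sketch of just the forward pass (my own illustration, not ConvNetJS code; the weights are random and the per-position training loss is omitted):

      import numpy as np

      def conv2d(x, w, b):
          """Naive 'same' convolution. x: (H, W, Cin), w: (kh, kw, Cin, Cout), b: (Cout,)."""
          kh, kw, _, cout = w.shape
          H, W, _ = x.shape
          ph, pw = kh // 2, kw // 2
          xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
          out = np.zeros((H, W, cout))
          for i in range(H):
              for j in range(W):
                  patch = xp[i:i + kh, j:j + kw, :]
                  out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
          return out

      H = W = 16
      board = (np.random.rand(H, W) > 0.5).astype(float)[..., None]    # (H, W, 1)
      w1, b1 = 0.1 * np.random.randn(3, 3, 1, 20), np.zeros(20)        # 3x3 CONV, 20 filters
      w2, b2 = 0.1 * np.random.randn(1, 1, 20, 2), np.zeros(2)         # 1x1 CONV, 2 class scores

      hidden = np.maximum(conv2d(board, w1, b1), 0.0)                  # ReLU
      scores = conv2d(hidden, w2, b2)                                  # class scores at every position
      scores -= scores.max(axis=-1, keepdims=True)
      probs = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # per-position softmax
      prediction = probs.argmax(axis=-1)                               # predicted next board

    Training would then apply a two-class softmax loss at every spatial position, which, as noted, would have to be added to ConvNetJS by hand.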

    • nicklo 10 years ago

      Glad to see a critical eye on these sorts of things. I had a question about the choice of loss function. So regression is definitely the wrong choice since our outputs are binary (0 or 1) and not real-valued (0-1).

      We could use a softmax classifier, as this is meant for multi-class classification. Each class in this case would represent a neighboring pixel, and a classification of that class would represent activating that pixel. However, softmax assumes a single label and the probabilities add up to 1. Our problem is a multi-label, multi-class binary classification.

      We could train 9 such softmax node groupings, one for each neighboring pixel, but that immediately seems like a bad idea.

      Another awful solution - make each possible pixel configuration (2^9 of these) a class and do a standard softmax.

      I can start to see why the author chose to use regression loss as a sort of hack to get this to work, but I'm trying to think out the best, proper solution.

      Any thoughts?
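
      One hedged reading of the suggestion above: treat each output cell as its own two-class (dead/alive) softmax and average the cross-entropy over all spatial positions, rather than classifying whole neighborhood configurations. A small numpy sketch of that per-position loss (my own illustration):

        import numpy as np

        def per_cell_softmax_loss(scores, targets):
            # scores: (H, W, 2) class scores per cell; targets: (H, W) of 0/1 labels.
            # Cross-entropy averaged over every cell, i.e. a separate two-class
            # classification at each position instead of one 2^9-way classification.
            scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
            log_probs = scores - np.log(np.exp(scores).sum(axis=-1, keepdims=True))
            H, W = targets.shape
            picked = log_probs[np.arange(H)[:, None], np.arange(W)[None, :], targets]
            return -picked.mean()

        scores = np.random.randn(16, 16, 2)
        targets = np.random.randint(0, 2, size=(16, 16))
        print(per_cell_softmax_loss(scores, targets))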

    • JabavuAdams 10 years ago

      Comments like this, and Hinton's Coursera course, remind me that there's a whole "Black Art" to training these systems.

      Do you think a similar approach could be used to learn to generalize Navier-Stokes by looking at fluid flows?

      • dave_sullivan 10 years ago

        It's not any more of a black art than programming is. There's no magic to it; you just have to know what you're doing. Also, you should use some kind of scheme (like Bayesian optimization) to optimize the hyperparameters of many experimental models. That's the best way to make sure you're getting the best results (but of course it takes a while, so you do have to be selective about which parameters you optimize or what combinations you allow).

        And I'll also acknowledge that "programming" is a vastly more mature field than "deep learning" or even "machine learning". So it's fair to argue that there's much we don't know, but there's more and more we do.
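
        As a trivial sketch of the kind of hyperparameter search being described (random search rather than Bayesian optimization; train_and_evaluate and the parameter names are hypothetical placeholders for whatever training routine you actually have):

          import random

          def train_and_evaluate(num_filters, learning_rate):
              # Hypothetical placeholder: train a model with these
              # hyperparameters and return a validation score.
              return random.random()

          random.seed(0)
          best_score, best_params = -1.0, None
          for _ in range(50):                       # 50 random trials
              params = {
                  "num_filters": random.choice([10, 20, 40, 80]),
                  "learning_rate": 10 ** random.uniform(-4, -1),
              }
              score = train_and_evaluate(**params)
              if score > best_score:
                  best_score, best_params = score, params

          print(best_params, best_score)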
