My nephew vs ML - NFHN Reader

My 4 year old nephew Yali is very into Pokemon right now. He has multiple Pokemon toys and a few TCG (training card game) cards.

The other day Yali found my sick TCG cards stash, and he now has more cards than he can handle.

The problem is that Yali is too young to learn how to play the real game, so for now he invented his own game. The purpose of the game is to categorize the cards by their type (Pokemon, energy and trainer card).

He didn’t ask me how I know what type the card is, he took few cards and asked me what their types are. After few of my answers he was able to separate some cards by their type (with multiple mistakes).

At that moment I realized that my nephew is basically a machine learning algorithm and my task as an uncle is to tag the data for him.

As a geek uncle and a machine learning enthusiastic I began to write a program that will compete with Yali.

This is what a typical Pokemon cards looks like:

source: pkmncards

For a grown human being who knows how to read, it is pretty easy to tell what type the card is, It’s literally written at the top of the card. But Yali is 4 and he doesn’t know how to read. Simple OCR will solve my problem very quickly, but I didn’t want to make any assumptions. So I took the card as is and threw it to a neural network using MLP algorithm.

Getting the data was quite simple. Thanks to pkmncards I could download images of multiple cards, already separated into Yali’s categories.

A machine learning algorithm needs features, and my features were the image pixels. Using RGB color images I converted the 3 colors to one integer.

As the images are in different sizes I had to normalize them. After finding the maximum width and height of my data, I added padding of zeros to the small images.

Fast QA cycle before the real deal. I randomly took two cards of each type and ran the prediction. Using only 100 cards of each category the fitting was surprisingly fast and the predictions were awful. Then I took 500 samples for each category (excluding the energy type, which had only 130 samples) and let the fitting run.

Out of memory exception. Running the fitting code in the cloud should work, but I wanted to find out a way to use less space. The biggest image size was 800x635, which was way too much. resizing the images solved my problem.

For the real test, in addition to the regular cards, I added cards that I drew on, cut in half, painted shapes on, took a picture from my phone (which has poor camera) and other special effects. It’s important to say that I hadn’t trained the network with those cards.

I’ve tried more than a thousand models (1533 to be exact). Different image sizes, number of hidden layers (up to 3) and layer length (up to 100), image colors, read types (whole picture, top half, every other pixel and etc).

After many hours of fitting all of the models, the best score was 2 mistakes out of 25 cards (a few models had this score and each failed on different cards. all of them were grayscale version of the image).

9 out of the 1533 models scored 2 mistakes.

A combination of the models would give me a score of 1 mistake if I set my threshold to be higher then 44% agreement. For the test I’ll use a 50% threshold.

Letting Yali play for a month, I came back for the test.

Win! 23 for the machine and 21 for the human.

As you can see, the mistakes only occur on energy cards. There were only 130 energy cards at pkmncards instead of thousands from the other types. Fewer samples to learn from.

Asking Yali how could he recognized some of the cards he told me that he saw some of the pokemons in TV shows or books. That’s how he reconized Raichu’s ear or knew that Vapereon is water Eevee. My program didn’t have that kind of data, the only data I had were the cards.

As our hero beat Yali in the Pokemon challenge and got the ML badge, He moves on to his next adventure.