In May of 2005, as a newly minted second lieutenant, I reported for training at Goodfellow Air Force Base, Texas, to begin the U.S. Air Force Intelligence Officer Course. Several days of this 1,000-hour training program were dedicated to teaching students to conduct visual recognition, or visrecce (pronounced viz-reck-eee), of U.S. and foreign aircraft.
The final exam for this block of instruction required students to correctly identify the type of each aircraft in a series of pictures shown by the instructors (much like flashcards used in elementary education). Each student had to earn a score of 80% or better on the visrecce exam to proceed to the next block of instruction.
Today, computer vision technology is well suited to exactly this type of visual identification task. During some downtime over the 2018 holiday season, I set out to train a machine learning model to perform aircraft visrecce.
I didn't have the time or resources to build a training corpus covering all U.S. and foreign aircraft, so I limited the scope of my project to just four U.S. Air Force fighter aircraft: the F-16, F-15, F-22, and F-35. This computer vision model would have fairly limited practical functionality, but I figured it would sufficiently demonstrate the art of the possible.
I chose Google's Cloud AutoML Vision product for this project, for two reasons. First, I work for Google, and my job requires me to be proficient in using these tools. Second, the tool provides an easy path for uploading and labeling images, training a computer vision model, and then making predictions about new samples. I wouldn't have to provision any hardware, spin up any virtual machines, install any software, or write any code.
Training the Model
The most important step in any machine learning project is organizing the training data. Training data is the sample input that the machine learning algorithm will analyze and compare. In my case, I needed many pictures of each of the four fighter aircraft types I had scoped for this project.
The U.S. Air Force website is a great place to pull these images, because the images available there are "government works" with favorable licensing terms. A word of caution here: it is important to only use images that are in the public domain or that we are licensed to use, so that our training data doesn't infringe on someone else's copyright.
To get started, I simply made a folder for each aircraft type on my desktop, then spent about two hours downloading pictures of each fighter aircraft into its respective folder. Google's Cloud AutoML Vision recommends using 100 pictures for each "label" (aircraft type, in my case) in order to give the computer vision algorithm sufficient data for training.
After I had collected 400 images (100 each for the F-16, F-15, F-22, and F-35) and labeled them (by placing each in the folder corresponding to its type), I was ready to upload the training data to Google's Cloud AutoML Vision interface and kick off the training process. Training the model took about 20 minutes.
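If you prefer to script the labeling step, the AutoML Vision interface accepts a CSV import file in which each row pairs an image's Cloud Storage URI with its label. Here is a minimal sketch that generates such a file from the folder structure described above; the bucket name and folder layout are assumptions for illustration, not the exact paths I used.

```python
import csv
import os

# Assumed local layout: one folder per label, e.g. ./images/f16/*.jpg
IMAGE_ROOT = "images"
# Hypothetical Cloud Storage bucket the images were copied to.
BUCKET = "gs://visrecce-training"

with open("import.csv", "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    for label in sorted(os.listdir(IMAGE_ROOT)):
        label_dir = os.path.join(IMAGE_ROOT, label)
        if not os.path.isdir(label_dir):
            continue
        for filename in sorted(os.listdir(label_dir)):
            # Each row maps an image URI to the label of its parent folder.
            writer.writerow([f"{BUCKET}/{label}/{filename}", label])
```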
Evaluating the Model
During the training process, the system sets aside a portion of the labeled samples to use for testing and validating the model. For my model, the system set aside 40 images, 10% of the dataset, to use for evaluation. As we can see below, when those 40 images were evaluated, the model achieved 94.444% precision and 89.474% recall.
The Cloud AutoML Vision documentation provides the following explanation for understanding precision and recall in this context:
Precision and recall help us understand how well our model is capturing information, and how much it’s leaving out. Precision tells us, from all the test examples that were assigned a label, how many actually were supposed to be categorized with that label. Recall tells us, from all the test examples that should have had the label assigned, how many were actually assigned the label.
A useful metric for model accuracy is the area under the precision-recall curve [average precision]. It measures how well your model performs across all score thresholds.
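To make those definitions concrete, precision and recall reduce to simple ratios over true positives, false positives, and false negatives. The sketch below uses hypothetical per-label counts (chosen so the ratios happen to match the headline numbers above), not the actual counts from my evaluation.

```python
# Hypothetical counts for a single label on a test set.
true_positives = 17   # images of the type, correctly given its label
false_positives = 1   # images of other types, incorrectly given this label
false_negatives = 2   # images of the type that the model labeled otherwise

# Precision: of everything given this label, how much truly deserved it?
precision = true_positives / (true_positives + false_positives)

# Recall: of everything that deserved this label, how much received it?
recall = true_positives / (true_positives + false_negatives)

print(f"precision = {precision:.3%}, recall = {recall:.3%}")
# precision = 94.444%, recall = 89.474%
```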
It is also helpful to understand where the model produced false positives and false negatives. A confusion matrix is perfect for gleaning this information. A confusion matrix is a table that shows how often the model classified each label correctly (blue in the sample below) and which labels were most often confused for that label (orange in the sample below).
Interestingly, we can see that the machine learning model confused the F-16 and the F-22 about 10% of the time. To reduce this confusion, we could add more F-16 and F-22 sample images to the dataset and then train the model again.
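If you were replicating this evaluation outside the AutoML Vision console, a confusion matrix is straightforward to compute from lists of true and predicted labels. Here is a sketch using scikit-learn with made-up predictions, purely to show the mechanics:

```python
from sklearn.metrics import confusion_matrix

labels = ["F-15", "F-16", "F-22", "F-35"]

# Hypothetical ground truth and model predictions for a small test set.
y_true = ["F-16", "F-16", "F-22", "F-15", "F-35", "F-22"]
y_pred = ["F-16", "F-22", "F-22", "F-15", "F-35", "F-16"]

# Rows are true labels, columns are predicted labels; off-diagonal
# cells reveal which aircraft types the model confuses with each other.
print(confusion_matrix(y_true, y_pred, labels=labels))
```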
Making Predictions
With a fully trained model, I was ready to start making predictions about new images that the model had not seen during training or evaluation. In each of the cases below, I uploaded an image and the model predicted the type of aircraft in the photo.
As the samples above show, the model makes predictions with a high degree of accuracy for images where a single aircraft is featured in the photograph.
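I made these predictions through the web interface, but a trained model can also be queried programmatically. The sketch below is based on the google-cloud-automl Python client (the v1beta1 surface that was current at the time of this project); the project ID, model ID, and filename are placeholders.

```python
from google.cloud import automl_v1beta1 as automl

# Placeholder identifiers; substitute your own project and model.
PROJECT_ID = "my-gcp-project"
MODEL_ID = "ICN1234567890123456789"

client = automl.PredictionServiceClient()
model_full_id = client.model_path(PROJECT_ID, "us-central1", MODEL_ID)

# Read the image to classify as raw bytes.
with open("unseen_fighter.jpg", "rb") as image_file:
    content = image_file.read()

payload = {"image": {"image_bytes": content}}
response = client.predict(model_full_id, payload)

# Each result carries a predicted label and a confidence score.
for result in response.payload:
    print(result.display_name, result.classification.score)
```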
Understanding Limitations
This model was trained with 400 high-resolution images, each showing a single aircraft. To evaluate images featuring more than one aircraft, we would need to enable multi-label classification and provide labeled sample images that contain multiple aircraft.
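For reference, the CSV import format sketched earlier supports this: a row may carry several comma-separated labels, so a multi-label dataset would include rows like the following (hypothetical bucket and filename):

```
gs://visrecce-training/formations/two_ship_01.jpg,f16,f22
```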
As demonstrated below, this model failed to predict the correct aircraft type when two aircraft were shown in the same image.
Another limitation is that the model cannot generalize beyond its training labels. It can't identify an A-10 in an image, because I did not provide any labeled images of A-10 aircraft in the training data. This model also can't play chess or predict the stock market; it is a supervised learning system tuned only to predict four types of fighter aircraft when provided an image.
Under the Hood
The machine learning model isn't actually counting the number of vertical stabilizers or engine intakes present in a particular image. The model converts each image into a matrix of pixel values and then performs sophisticated comparisons across these matrices.
The machine learning algorithm looks for subtle patterns and trends hiding in those pixel values. As it compares these patterns across the corpus of training data, it develops a set of weights and biases that represent what the labeled images of each type have in common. The model then uses this information to make predictions when provided with new images.
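To make the "matrix of numbers" idea concrete, here is a tiny sketch using Pillow and NumPy (general-purpose libraries, not anything AutoML Vision exposes) that shows what an image looks like to a model; the filename is a placeholder:

```python
import numpy as np
from PIL import Image

# Load a photo and convert it to an array of pixel values.
image = Image.open("f16_sample.jpg")  # placeholder filename
pixels = np.asarray(image)

# A 1024x768 RGB photo becomes a 768x1024x3 matrix of 0-255 integers.
print(pixels.shape)   # e.g. (768, 1024, 3)
print(pixels[0, 0])   # red/green/blue values of the top-left pixel
```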
Conclusion
It is safe to say that this computer vision model could successfully pass the visrecce exam I took back in 2005 when I was a student in the U.S. Air Force Intelligence Officer Course (if that exam were limited to only identifying F-16, F-15, F-22, and F-35 aircraft).
My goal with this project was simply to demonstrate the art of the possible and show how accessible machine learning has become. Today a person does not need to be a data scientist or a classically trained computer scientist to begin leveraging this powerful capability. Google's Cloud AutoML Vision product has democratized computer vision and made this technology easy to employ.
With more training data a visrecce model could be built to identify any number of U.S. and foreign aircraft. It would be fun to curate this dataset and open source it for the aviation community. If you are interested in working together on this type of effort, please let me know!