If Do-San were to teach Dal-Mi how to build the machine learning model he described in the Netflix series Start-Up
AI is shaping our everyday lives, but not everyone understands how AI is developed. In a recent popular Korean drama, Start-Up, the main character, Do-San (Nam Joo-Hyuk), explains it to Dal-Mi (Bae Suzy) with an analogy. The series follows a group of young entrepreneurs pursuing their passion for an AI startup, and its cheerful, brilliant, and funny characters helped the show go viral on Netflix. In one scene, Do-San introduces machine learning (ML) by describing how Tarzan AI could be trained to learn about love by spending time with Jane.
MD.ai was inspired to continue this love story. In this article, we’ll show you how MD.ai can be used to help Tarzan AI win Jane’s heart. After reading, you can sign up and try the process yourself.
Overview
Acquiring a new skill requires the right learning resources, correct instructions, continuous training, and a feedback system. ML follows the same steps, which we summarize as data curation, annotation, model development, and validation. In this first episode, we discuss data curation and annotation at a high level.
In this scenario, the task for Tarzan AI is to identify which flowers Jane likes. Human instructors manage the inputs to ensure successful learning outcomes for Tarzan AI. Tarzan will then learn how to distinguish different flowers using deep learning.
Data Curation
As with studying any subject, the first step is to decide what the model needs to learn to become a domain expert. That decision guides human experts as they gather and organize the most relevant data for ML, a process called data curation. Managing large datasets is challenging for many organizations. With MD.ai, users can safely and efficiently upload, archive, and view their datasets in the cloud, from anywhere in the world.
Creating a project and adding users takes only a few minutes. MD.ai supports uploading any pixel data through the UI or the CLI tool. The data is organized by exams, series, and images. Users can also use tools such as hanging protocols and filters to order the data the way they prefer.
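To picture the exam, series, and image organization described above, here is a minimal sketch in plain Python. It is not MD.ai's internal data model: the record fields (`exam`, `series`, `image`) and the file names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical flat records, one per uploaded image.
# Field names and file names are illustrative assumptions, not MD.ai's schema.
records = [
    {"exam": "exam-1", "series": "series-a", "image": "tulip_001.png"},
    {"exam": "exam-1", "series": "series-a", "image": "tulip_002.png"},
    {"exam": "exam-1", "series": "series-b", "image": "rose_001.png"},
    {"exam": "exam-2", "series": "series-a", "image": "rose_002.png"},
]


def group_by_exam_and_series(rows):
    """Nest flat image records as exam -> series -> list of images."""
    tree = defaultdict(lambda: defaultdict(list))
    for row in rows:
        tree[row["exam"]][row["series"]].append(row["image"])
    # Convert to plain dicts so the result prints and compares cleanly.
    return {exam: dict(series) for exam, series in tree.items()}


hierarchy = group_by_exam_and_series(records)
for exam, series_map in hierarchy.items():
    for series, images in series_map.items():
        print(exam, series, images)
```

Grouping this way keeps images that belong together (for example, several photos of the same bouquet) under one series, which is the kind of ordering that filters and hanging protocols then build on.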
Annotation
Next, how does the model interpret the data? To transfer our knowledge to the machine, we use labels to classify and mark the objects we want the AI to recognize, a process known as data labeling or annotation. Labels are the channel through which the AI learns how people interpret and analyze the data.
In this scenario, we assume Jane likes only tulips, not roses. We simply use global labels to mark each image as “Jane Likes ❤️” or “Jane Dislikes 😔.” After viewing many pictures and their annotations, Tarzan AI “learns” to quickly tell the two flowers apart based on their distinctive shapes, colors, leaves, and so on.
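Before a model can learn from global labels like these, each label is usually turned into a numeric class and the images are split into training and validation sets. The following is a rough sketch of that step, not the code from the notebook: the record layout, the file names, and the helper names are assumptions, and the emoji are dropped from the label names for simplicity.

```python
import random

# Hypothetical annotation records: one global label per image.
# The label names follow the article; the record layout is an assumption.
annotations = [
    {"image": "tulip_001.png", "label": "Jane Likes"},
    {"image": "tulip_002.png", "label": "Jane Likes"},
    {"image": "tulip_003.png", "label": "Jane Likes"},
    {"image": "tulip_004.png", "label": "Jane Likes"},
    {"image": "rose_001.png", "label": "Jane Dislikes"},
    {"image": "rose_002.png", "label": "Jane Dislikes"},
    {"image": "rose_003.png", "label": "Jane Dislikes"},
    {"image": "rose_004.png", "label": "Jane Dislikes"},
]

# Map each global label to an integer class for binary classification.
LABEL_TO_CLASS = {"Jane Dislikes": 0, "Jane Likes": 1}


def to_training_pairs(rows):
    """Turn annotation records into (image, class) pairs."""
    return [(row["image"], LABEL_TO_CLASS[row["label"]]) for row in rows]


def split_train_val(pairs, val_fraction=0.25, seed=0):
    """Shuffle reproducibly, then hold out a fraction for validation."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


pairs = to_training_pairs(annotations)
train_pairs, val_pairs = split_train_val(pairs)
print(len(train_pairs), "training images,", len(val_pairs), "validation images")
```

Holding out a validation set matters because Tarzan AI should be judged on flowers he has not already seen during training.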
Source Code
You can find the source code here:
https://github.com/mdai/tarzan-ai
Jupyter Notebook:
https://colab.research.google.com/github/mdai/tarzan-ai/blob/main/tarzan-ai.ipynb
This notebook demonstrates how to download and parse annotation data, then train and validate the model on MD.ai. To run the notebook, you’ll need an MD.ai access token, which you can create after signing up for a free account, and you’ll need to enter it in the notebook.
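Because an access token works like a password, one common option is to keep it out of the notebook text and read it from an environment variable instead. The sketch below shows that pattern under stated assumptions: the variable name `MDAI_ACCESS_TOKEN` and both helper functions are hypothetical, and the example token is a placeholder, not a real credential.

```python
import os

# Hypothetical environment variable name; choose whatever fits your setup.
TOKEN_ENV_VAR = "MDAI_ACCESS_TOKEN"


def load_access_token(env=None):
    """Return the access token from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {TOKEN_ENV_VAR} to your MD.ai access token before running."
        )
    return token


def mask(token, visible=4):
    """Hide all but the last few characters so the token is safe to print."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]


# Demonstration with a placeholder environment, not a real token.
demo_env = {TOKEN_ENV_VAR: "example-token-1234"}
token = load_access_token(demo_env)
print("Using token:", mask(token))
```

In a real run you would call `load_access_token()` with no arguments and pass the result to the client setup cell in the notebook.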
Conclusion
Let’s see if Tarzan AI has learned what Jane likes: https://public.md.ai/annotator/project/nxN1d4R6
Tarzan does an excellent job learning that Jane likes tulips, not roses! He makes a few mistakes, but we are confident Tarzan will get better after spending more time with Jane. The next article will describe model training and validation.
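One simple way to quantify "a few mistakes" is accuracy: the share of images where the prediction matches Jane's label. The sketch below uses made-up labels and predictions purely for illustration; they are not results from the Tarzan AI project, and the helper names are hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the ground-truth labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("Label lists must have the same length.")
    if not true_labels:
        raise ValueError("Need at least one label to compute accuracy.")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def find_mistakes(images, true_labels, predicted_labels):
    """List the images where the prediction disagrees with the label."""
    return [
        image
        for image, t, p in zip(images, true_labels, predicted_labels)
        if t != p
    ]


# Made-up example data, for illustration only.
images = ["tulip_001.png", "tulip_002.png", "rose_001.png", "rose_002.png", "tulip_003.png"]
jane_says = ["likes", "likes", "dislikes", "dislikes", "likes"]
tarzan_says = ["likes", "dislikes", "dislikes", "dislikes", "likes"]

print("Accuracy:", accuracy(jane_says, tarzan_says))
print("Images to review:", find_mistakes(images, jane_says, tarzan_says))
```

Reviewing the mislabeled images, rather than only the score, is what tells the human instructors which examples Tarzan needs to see more of.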
Feel free to clone the project and try annotating it yourself on MD.ai. For more instructions, please visit: https://docs.md.ai
Contact us at tarzan@md.ai 🤖 to give us feedback about Tarzan AI, or to learn more about how MD.ai can help with your projects.