Machine learning is all the rage these days and can be a major differentiator for your startup. Unfortunately, most startups underestimate how difficult and expensive implementing ML can be. The following points are guidelines that I follow to successfully integrate machine learning into software while startups are still in the early stages.
What you’ll need
Machine learning is simply a set of statistical methods that you can use on large datasets to make predictions. Whether you are focused on computer vision, robotics, recommendation systems, or any of the myriad other ways you can use ML, you are essentially focusing on making predictions. You put input in, and predictions come out. This is simple in theory, but finding the right model to accomplish what you want to do isn’t always straightforward. To help facilitate your ML journey, you’ll need a way to capture, transfer, and transform data to fit your needs.
Common methods for getting started with ML
The primary problem with machine learning is that, as a startup, it can seem nearly impossible to acquire all the data you will need. You could try partnering with an organization with all the data you might need, but few startups actually accomplish this. Another commonly suggested strategy is using services for extracting and labeling data, but that is usually quite expensive. As a startup, you generally don’t have a lot of money and don’t want to blow your entire budget on creating a data set. It’s possible to save the expense and do the extraction and labeling yourself as long as you don’t have anything else you want to do for the next few months.
I’ve offered a few options and argued against all of them. So what is a startup founder to do? I’ve found that the best way to solve the problem is to plan on an incremental evolution toward machine learning along with designing your product around getting your users to label your data for you. The rest of this post will focus on this strategy and how to go about implementing it.
The power of traditional AI
Before machine learning was a common term, companies implemented AI using more traditional methods, and they were often really successful. Techniques such as rule-based expert systems, logic trees, and clustering algorithms like k-means are quite effective, and in many cases, they are still the backbone of the ML industry today. By building your service on these techniques first, you can bridge the data gap while you collect the data you need to layer in machine learning algorithms later.
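To make the idea concrete, here is a minimal sketch of a rule-based stand-in for a model. Everything in it is hypothetical and only for illustration: the support-ticket scenario, the field names (text, paying, hours_open), the keywords, and the 24-hour threshold are my assumptions, not part of any particular product.

```python
from typing import Any, Callable, Dict, List

# Hypothetical scenario: decide whether a support ticket is urgent
# using hand-written rules instead of a trained model.
Ticket = Dict[str, Any]
Rule = Callable[[Ticket], bool]

# Illustrative keywords; a real product would choose its own.
URGENT_KEYWORDS = ("outage", "refund", "cannot log in")


def mentions_urgent_keyword(ticket: Ticket) -> bool:
    """Fire when the ticket text contains any urgent keyword."""
    text = str(ticket.get("text", "")).lower()
    return any(keyword in text for keyword in URGENT_KEYWORDS)


def is_paying_customer_waiting(ticket: Ticket) -> bool:
    """Fire when a paying customer has waited 24 hours or more."""
    return bool(ticket.get("paying")) and int(ticket.get("hours_open", 0)) >= 24


RULES: List[Rule] = [mentions_urgent_keyword, is_paying_customer_waiting]


def predict_urgent(ticket: Ticket) -> bool:
    """Rule-based prediction: urgent if at least one rule fires."""
    return any(rule(ticket) for rule in RULES)
```

Because the rest of the application only calls predict_urgent, you could later replace its body with a trained model without touching the callers. That is the design choice that makes the evolution incremental.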
Make a Plan
As an early-stage startup, the tools you use matter, especially the tools that govern your company’s data. How should you pick the right data tool? Whatever you pick should be inexpensive or free to start, have a low learning curve, and provide enough flexibility that your company can grow without unnecessary friction. Essentially, you’re going to want to avoid the enterprise tools.
Be Data-Oriented
Building data pipelines into your system from the beginning will make it easier to add in machine learning as your startup matures. You won’t have any data on day 1, though. So when you are designing your product experience, you should consider ways to get your users to label data for you. The goal is to acquire accurately labeled data for as close to free as possible.
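One way this might look in practice is to log every moment where a user accepts or corrects something your product suggested. The sketch below is an assumption-laden illustration, not a prescribed pipeline: the function names, the record fields, and the choice of a JSON Lines file as storage are all hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, List


def record_label(log_path: Path, item_id: str, features: Dict[str, Any],
                 predicted: str, user_choice: str) -> Dict[str, Any]:
    """Append one user decision to a JSON Lines file as a labeled example.

    The user's choice is treated as the label; `was_corrected` marks the
    cases where the current rules (or model) got it wrong.
    """
    record = {
        "item_id": item_id,
        "features": features,
        "predicted": predicted,
        "label": user_choice,
        "was_corrected": predicted != user_choice,
        "recorded_at": time.time(),
    }
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


def load_labels(log_path: Path) -> List[Dict[str, Any]]:
    """Read every labeled example back, for example to build a training set."""
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Calling record_label from the handler for an "accept" or "change" button means each ordinary interaction leaves behind a labeled record, and the corrected ones show where your current logic is weakest.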
Data Expansion
When you have a lot of data, but not quite enough for a machine learning model, you might be able to use the data expansion strategy (more commonly called data augmentation) to increase the size of your data set without merely duplicating records. In doing so, you can reduce the risk of a model that overfits or is skewed because the data set doesn’t contain enough variety. Depending on the data you have available, you could apply transformations to existing records to generate additional training examples. Techniques like rotating images or adjusting the pitch of audio can help expand your data set and produce better results.
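As a rough illustration of the image case, the sketch below generates rotated and mirrored variants of a single image. It assumes a tiny grayscale image stored as a list of rows of integers, and the function names are my own. In a real project you would more likely use an image library, but the idea is the same.

```python
from typing import List

# Assumption: a grayscale image represented as rows of integer pixel values.
Image = List[List[int]]


def rotate_90(image: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def expand(image: Image) -> List[Image]:
    """Return the original plus its distinct rotated and mirrored variants."""
    variants: List[Image] = []
    current = image
    for _ in range(4):  # 0, 90, 180, and 270 degrees
        for candidate in (current, flip_horizontal(current)):
            if candidate not in variants:  # skip exact duplicates
                variants.append(candidate)
        current = rotate_90(current)
    return variants
```

Only apply transformations that keep the label true. Rotating a photo of a cat still shows a cat, but rotating a handwritten 6 by 180 degrees produces something that looks like a 9.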
Don’t reinvent the wheel
Companies like Google, Amazon, and Microsoft all have APIs that allow you to perform predictions using their ML models. For more specialized applications that their services don’t cover, you will have to roll up your sleeves and use frameworks like TensorFlow and Keras to create your own models. I don’t suggest doing this in an early stage unless it is absolutely crucial to your value proposition. Creating quality models from scratch can take months, and despite best efforts, they can still fail.
Wrapping Up
It’s not easy running a startup, let alone one focused on the use of machine learning. By being creative with how you acquire and label data, you’ll find that ML doesn’t have to be an insurmountable hurdle. I realize that the strategy I present is likely just one of many. If you have a process that has worked well for you, please leave a comment.