Synference - an API for A/B testing with reinforcement learning
Congrats on launching. It's an interesting application of machine learning. Might give it a try with my tiny website.
I'm one of the founders - happy to answer any questions.
I believe this sort of bandit-meets-machine-learning approach is going to have a big impact on web optimisation.
How do you plan to deal with the "cold start" problem, i.e. not having enough information about the subjects of a prediction? The majority of visitors will be viewing pages on a given site for the first time. In these cases there is a limited amount of information available: referring URL, browser, perhaps first- or third-party cookies, visit time, etc. The whole point of A/B testing is to find a configuration of page elements that will appeal to the largest number of people, because there is so little information available about each individual.
I know of some cold start solutions, my personal favorite being variants of regression-based latent factor models [1]. I'm not expecting you to reveal your secret sauce, but I am curious about how you plan to address the problem of having so little information about each person.
You're really describing two different problems there.
Problem 1 is the typical recommendation-system/ML problem of 'cold start', where there aren't enough training examples to produce good predictions. (Recommender systems based on clustering of items and users are particularly vulnerable to this.)
Problem 2 is different: you only have a small amount of data about each user (low-dimensional feature data), so even if you had a lot of examples, you wouldn't have enough information about each one (i.e. enough features) to make predictions.
Problem 2 isn't really as big an issue here as you might think. You mention a limited amount of available information: "referring URL, browser, perhaps first- or third-party cookies, visit time, etc." That's actually a fair amount of feature data if you treat it in the right way - even browser+IP alone go a long way.
If our system just has IP address, we'll build features to do with geo-location, and derive features like predicted income level from a combination of geo and device/browser.
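To make that concrete, here's a minimal sketch of the kind of feature derivation I'm describing - the helper stubs (geo_lookup, hardware_modernity) are illustrative stand-ins, not our actual pipeline:

    from datetime import datetime

    def geo_lookup(ip_address):
        # Stand-in for a real IP-to-geo database lookup (MaxMind-style).
        return {"country": "GB", "city": "London"}

    def hardware_modernity(user_agent):
        # Crude proxy: newer browser versions tend to mean newer hardware.
        return 1.0 if "Chrome" in user_agent else 0.5

    def derive_features(ip_address, user_agent, referrer, ts: datetime):
        geo = geo_lookup(ip_address)
        return {
            "country": geo["country"],
            "city": geo["city"],
            "referrer_domain": referrer.split("/")[2] if "://" in referrer else referrer,
            "hour_of_day": ts.hour,
            "day_of_week": ts.weekday(),
            # Second-order feature: a rough income-band guess combining
            # geography with how modern the device/browser looks.
            "income_band": "high" if hardware_modernity(user_agent) > 0.7 else "mid",
        }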
Now, those features are only useful if they are predictive - if the best choice among the options you're deciding between varies in some way that correlates with those features.
However, this is often the case. If you have a global website, users from some geographies will have different preferences. Time of day is also an important feature. And if you want to adjust the discount you offer people, how modern their hardware is (inferred from the user agent) is a good signal. Those are just examples - there's a lot in there, if you treat it right.
Even these very common, lowest-common-denominator features can thus give you better results than just blindly pretending your population is a homogeneous whole, which is what existing bandit approaches do. Further, our API supports custom features, too.
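For example, a request might look something like this - the endpoint and field names here are just illustrative, not our documented API (the FAQ linked below has the real details):

    import requests

    resp = requests.post(
        "https://api.synference.com/choose",  # hypothetical endpoint
        json={
            "options": ["headline_a", "headline_b", "headline_c"],
            "features": {
                "referrer": "news.ycombinator.com",  # standard feature
                "plan": "enterprise",                # custom, site-specific
                "past_purchases": 3,                 # custom, site-specific
            },
        },
    )
    chosen = resp.json()["option"]  # show this variant to the visitor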
Problem 1 is a more technical question.
The honest answer is that the particular type of framework we use doesn't really suffer from that problem.
Our solution is adaptive. If there are very few training data points, our system will treat the problem as if the user population were relatively homogeneous, and will attempt to predict the best option for the population as a whole - like a traditional bandit algorithm would, or like the result you'd get from a simple A/B testing framework.
However, as the amount of data grows, our system functions more like a predictive model (i.e. like an ML system) and less like a simple bandit algorithm.
This means there's really no cold-start problem with the learning algorithm - instead, there's a transition to finer and finer predictions as the data density increases. So it ends up using the data roughly as efficiently (not quite as efficiently, but close) as a custom solution calibrated to the amount of available data would.
And that's an awful lot more efficient than running an A/B test for a fixed length of time and waiting to see the results.
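If you want intuition for how that transition can work, here's a minimal sketch using linear Thompson sampling - a standard textbook construction, not our actual algorithm. With little data, the posterior stays near the prior and effectively only the intercept matters, so it behaves like a plain bandit; as data accumulates, the per-feature weights sharpen and choices become context-specific:

    import numpy as np

    class BayesianLinearArm:
        # One option/variant: Bayesian linear model of reward given features.
        def __init__(self, n_features, prior_var=1.0, noise_var=1.0):
            self.A = np.eye(n_features) / prior_var  # posterior precision
            self.b = np.zeros(n_features)
            self.noise_var = noise_var

        def sample_reward(self, x):
            cov = np.linalg.inv(self.A)
            w = np.random.multivariate_normal(cov @ self.b, cov)  # posterior draw
            return w @ x

        def update(self, x, reward):
            self.A += np.outer(x, x) / self.noise_var
            self.b += reward * x / self.noise_var

    def choose(arms, x):
        # Thompson sampling: pick the arm whose sampled reward is highest.
        return max(range(len(arms)), key=lambda i: arms[i].sample_reward(x))

    arms = [BayesianLinearArm(n_features=4) for _ in range(3)]  # 3 variants
    x = np.array([1.0, 0.0, 1.0, 0.3])  # intercept + context features
    i = choose(arms, x)
    arms[i].update(x, reward=1.0)       # the visitor converted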
If you want more detail, maybe have a read of our FAQ: http://www.synference.com/faq.html or send me an e-mail.
How are you handling time variations?
Behavior varies across time of day, day of week, time of month, and obviously seasonally. Any thoughts on more transient things, like a recent press release, or the bug that hit IE9 browsers for 3 hours?
I see graphs in the article for time of day and day of week - are longer-term cycles found and used?
At the moment, we automatically pick up on time-of-day and day-of-week variations.
We don't currently look at time-of-month or seasonal variation by default. This is because most people using our service at the moment are not trying to optimise over a monthly or seasonal time horizon. If this changes, we'll add those features.
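For the curious, the standard trick for feeding these cycles into a model is to encode them as points on a circle, so that 23:00 and 01:00 end up close together rather than far apart - whether we use exactly this encoding is an implementation detail:

    import math
    from datetime import datetime

    def time_features(ts: datetime):
        # Map hour-of-day and day-of-week onto the unit circle.
        hour_angle = 2 * math.pi * (ts.hour + ts.minute / 60) / 24
        dow_angle = 2 * math.pi * ts.weekday() / 7
        return [math.sin(hour_angle), math.cos(hour_angle),
                math.sin(dow_angle), math.cos(dow_angle)]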
You also ask about transient events: our system is adaptively responsive to short-term changes in trends. It holds certain assumptions about the preferences of users, which it is constantly testing and monitoring. If those assumptions turn out to be wrong, they get checked correspondingly more often, and this process forms a feedback loop.
So, if there's a short-term change that shifts user behaviour by a large amount, the system will adapt to it quickly. If it can't tell the short-term change apart from random noise, it won't adapt - but that's a limitation of any system: to detect a short-term change, the change either has to have a large magnitude, or the system has to see many users affected by it (probably 100s, maybe 1000s, depending on the magnitude of the change).
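To illustrate the kind of feedback loop I mean - this is a simplified stand-in, not our actual mechanism - compare a fast-moving and a slow-moving estimate of the conversion rate, and flag a shift only when the gap exceeds what sampling noise explains:

    class DriftMonitor:
        def __init__(self, fast=0.05, slow=0.001, threshold=3.0, warmup=200):
            self.fast_rate, self.slow_rate = fast, slow
            self.fast_avg = self.slow_avg = 0.5  # running conversion estimates
            self.threshold, self.warmup, self.n = threshold, warmup, 0

        def observe(self, converted):
            r = 1.0 if converted else 0.0
            self.fast_avg += self.fast_rate * (r - self.fast_avg)
            self.slow_avg += self.slow_rate * (r - self.slow_avg)
            self.n += 1
            if self.n < self.warmup:
                return False  # too little data to call anything a shift
            # Rough standard error of the fast average; a gap well beyond it
            # suggests a real behaviour change rather than noise. Small shifts
            # therefore need hundreds-to-thousands of users to detect.
            se = (self.slow_avg * (1 - self.slow_avg) * self.fast_rate / 2) ** 0.5
            return abs(self.fast_avg - self.slow_avg) > self.threshold * se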