Exploration vs. exploitation

Devin Finzer

There’s an interesting problem in artificial intelligence called the multi-armed bandit problem. (Don’t worry, you don’t need to know computer science to understand it.)

The basic premise is a gambler sitting at a row of slot machines. When played, each slot machine gives him some unknown, variable reward. He tries the first lever and wins $5. He then pulls the second lever and wins $2. He pulls the first lever again, but this time gets only $1. Should he continue trying the first lever? Should he test the second lever to see if it might produce a better reward? Or should he try a third lever?

The gambler can either exploit his existing options (play slots he knows give him high rewards) or explore new options (try new slots in the hopes they will be better). In artificial intelligence, there are a number of good solutions to this problem that rigorously balance these two approaches.
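One of the simplest of these strategies is usually called epsilon-greedy: most of the time the gambler plays the lever with the best average so far, and a small fraction of the time (epsilon) he picks a lever at random. Here is a minimal sketch in Python, just to make the idea concrete. The function name, the `pull` callback, and the payout numbers are made up for illustration, and this is only one of several possible approaches.

```python
import random

def epsilon_greedy(pull, n_arms, n_pulls, epsilon=0.1, seed=0):
    """Play n_pulls rounds and return (average reward per arm, pulls per arm).

    `pull(arm)` is a hypothetical callback that plays one slot machine
    and returns the reward it paid out.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms    # how many times each lever was pulled
    means = [0.0] * n_arms   # running average reward for each lever

    for _ in range(n_pulls):
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]                  # try every lever at least once
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)       # explore: pick a random lever
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit the best so far

        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the running average for this lever.
        means[arm] += (reward - means[arm]) / counts[arm]

    return means, counts


if __name__ == "__main__":
    # Made-up machines: each pays a noisy reward around a hidden average.
    hidden_averages = [2.0, 3.5, 1.0]
    noise = random.Random(42)
    means, counts = epsilon_greedy(
        lambda arm: noise.gauss(hidden_averages[arm], 1.0),
        n_arms=3,
        n_pulls=1000,
    )
    print("estimated averages:", [round(m, 2) for m in means])
    print("pulls per lever:", counts)
```

A larger epsilon means more exploring and a smaller one means more exploiting, which is the whole tradeoff expressed as a single number.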

The more you think about the multi-armed bandit, the more you start to realize how relevant it is to life. Many of our big (and small) decisions can be boiled down to the same exploration vs. exploitation tradeoff.

A few examples:

Food

You step up to the counter at your local sandwich shop. Do you order a sandwich you already know you like (high reward)? Or do you choose a sandwich you’ve never tried (unknown reward) in the hope that it might be even better than your usual?

Career

How long do you dabble before you decide on a career path? As an explorer, you could spend quite a long time trying every single major, but that would take forever. As an exploiter, you could jump into the first major you happen to stumble upon. But that might not lead to long-term happiness either.

How long do you explore various industries and career paths before you decide where to invest your time? When is it time to stop exploring and develop depth and experience?

Dating

How long do you date around? You can’t possibly date everyone, so what confidence level do you need that you’ve found the right person?

The adoption of dating apps has made the process of exploring more efficient: you flip on an app and suddenly you have a few dates lined up for the week. So we’re moving towards a more exploratory dating culture: high volumes of first dates, frequent breakups, serious relationships delayed until later in life.

Our life journeys

In general, we explore while we’re young and exploit as we get older. In our youth, we make many friends, we try new experiences, we travel. When we’re old, we find the things we truly care about and dedicate our lives to them: our marriages, our families, a tightly knit group of friends.

What makes the multi-armed bandit problem non-trivial is the constraint: the gambler’s goal is to optimize his strategy as quickly as possible, because every pull spent on a poor machine is a reward he gives up. Similarly, for us humans, the exploration vs. exploitation tradeoff concerns the allocation of our most precious, finite resource: our time.