How to Model Viral Growth: Retention and Virality Curves

46 points by rahulvohra 13 years ago · 7 comments

idoh 13 years ago

Definitely an interesting article and one worth reading. I work a lot on viral growth and I'd like to add a counterpoint that the growth of apps is so complicated that it is basically impossible to model in a useful way.

I've found that even if I make no changes to an app, the retention and virals fluctuate quite a bit for no apparent reason, and the fluctuations are big enough to make long-term forecasting more guesswork than anything else.

Also, there are second order effects that are hard to model as well. For instance, improving virals can improve retention (user A invites friend B, user A stays for longer because their friend uses it).

I've gone through the process of modeling a couple apps, and it quickly gets to a point where the relationships become circular and small variations cause exponential differences down the line.

It is important to make informed decisions about virals and retention, but I don't think such a model is the way to do it. I think it is more important to think about optionality and decision making in opaque environments rather than trying to model the unmodelable.

adolgert 13 years ago

This multiplicative kind of model is happily amenable to time series analysis, so you can do stats to see what your numbers are and how well they fit. That's great. What's less great is the model quality, given that well-tested virality models can be found in other venues. Coffman looks at this, for instance, at http://datacommunitydc.org/blog/2013/01/better-science-of-vi.... The difference in these two types of models is that symmetries in the statement of the problem permit, or exclude, classes of solutions. Those symmetries come from assumptions about the contact graph, the most basic (and testable) assumption.

richardjordan 13 years ago

This is great stuff. I see so many startups get so excited about features and growth yet fail in their analysis of retention. Maybe I'm a bit on the data-nerd side but I love to see folks sharing their own methods for tracking and calculating this stuff. Even with so many startups basing their model on recurring revenue, it's still easy to trip up on modeling this stuff going forward.

Mahn 13 years ago

> so many startups get so excited about features and growth yet fail in their analysis of retention
I think it's just that we write less about it, not so much that we aren't aware of its importance. "How we managed to retain our users for 4 months" sounds admittedly less sexy than "How we got a bazillion users in less than 72 hours", but the truth is a tech startup with no strong retention strategy is basically dead in the water, and generally folks know this.

graycat 13 years ago

Here's a different approach:

We denote time by t with units, say, days.

The number of customers at time t is the (real valued function of a real variable) y(t).

We assume that at the present t = 0 and that we have y(0), that is, the current number of customers.

We let the number of customers who will ever try our business be b. That is, b is our intended 'market potential'.

Initially we assume that once we get a person as a customer, we do not ever lose them but keep them forever.

As usual, we let y'(t) = dy(t)/dt be the calculus first derivative of y(t). Then y'(t) is the number of new customers per day, that is, the 'rate' at which we gain customers.

For 'virality' we notice that this rate is proportional to (1) the number of customers y(t) we have 'talking' about our business and (2) the number of people

     b - y(t)

yet to be our customers hearing the talking.

Then we have that for some constant of proportionality k

     y'(t) = k y(t) (b - y(t))

So we have an initial value problem (that is, we know y(0)) for a first order (we use only the first derivative) ordinary (no partial derivatives) differential equation.

Then from calculus,

     y(t) = y(0) b exp(bkt) /
            ( y(0) (exp(bkt) - 1) + b )

So this solution grows (1) initially slowly, (2) then more rapidly, (3) then more slowly and approaches b asymptotically from below.
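
For anyone who wants to replay the pocket-calculator exercise, here is a minimal Python sketch of the closed-form solution above. The function name and the sample values of y(0), b, and k are made up for illustration. The code uses the algebraically equivalent form y(0) b / (y(0) + (b - y(0)) exp(-bkt)), obtained by dividing the numerator and denominator by exp(bkt), only to avoid overflow at large t.

```python
import math


def logistic_customers(t, y0, b, k):
    """Closed-form solution of y'(t) = k*y*(b - y) with y(0) = y0.

    Same expression as y0*b*exp(b*k*t) / (y0*(exp(b*k*t) - 1) + b),
    with numerator and denominator divided by exp(b*k*t).
    """
    decay = math.exp(-b * k * t)
    return y0 * b / (y0 + (b - y0) * decay)


if __name__ == "__main__":
    # Hypothetical inputs: 100 customers today, market potential 10,000.
    y0, b, k = 100.0, 10_000.0, 2e-5
    for day in (0, 10, 20, 30, 40, 60):
        print(day, round(logistic_customers(day, y0, b, k)))
```

Trying a few values of k with y(0) and b held fixed shows the slow, fast, slow shape described above.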

In case we lose some customers forever at some rate r, then we get the same solution except k and b get adjusted.
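
One way to see the adjustment, assuming the loss enters as a simple per-customer rate r and that a lost customer goes back into the pool of prospects (both are assumptions; the loss model is not spelled out above):

     y'(t) = k y(t) (b - y(t)) - r y(t)
           = k y(t) ( (b - r/k) - y(t) )

This has the same form as before with b replaced by b' = b - r/k, so the same closed-form solution applies with b' in place of b, and the exponent bkt becomes (bk - r)t. Growth then requires bk > r. If lost customers also left the pool of prospects for good, the equation would no longer reduce to this form.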

Once there was a startup (now a major company) that was struggling and had as an investor a major company with a Board seat and at the startup two representatives, one in finance and the other in aeronautical engineering.

The two representatives had asked for some revenue growth projections.

People around the HQ considered what the startup hoped, intended, thought might happen, etc., but found nothing credible.

One guy who remembered calculus reluctantly got involved, formulated and solved the differential equation above, and showed the solution to a Senior VP of Planning (SVP) who reported to the founder, CEO, COB. The SVP was responsible for the projections. The SVP took the guy's calculus solution as the basis of the projections and on a Friday sat with the guy with a pocket calculator and some graph paper and graphed solutions to the differential equation for selected values of the constant k and picked one of the solutions as the official projection.

The next day, Saturday, at about noon, the guy was in his office working on some other math problems and got a call from a person asking if he knew about the projections for the Board and if he could come over to the HQ? Sure. When the guy arrived, the situation was grim: The two representatives of the major Board Member were standing in the hall with their bags packed with airline tickets back to Texas. The startup was about to die.

The SVP was traveling and out of town.

The person who had called got the graph of projections from the previous day and asked the guy to reproduce a point on the graph. Using the calculator, the solution above, and a few keystrokes, the point on the graph was reproduced. After several more points were reproduced, the area became happier; the two representatives on the Board stayed, and the startup was saved.

Later the person who had called explained that that Saturday was a Board meeting, the growth projection graph was shown, and the two representatives had asked how the projections were calculated. The rest of the company tried to reproduce the graph but could not. The Board meeting stopped. The two representatives lost patience with the startup, got airline tickets back to Texas, returned to their rented rooms, packed their bags, and as a last chance returned to the startup to see if there was an answer to how the projections were calculated.

Ah, one saved startup! One reason to take calculus seriously!

graycat 13 years ago

Note that with this derivation, if we accept the assumptions (which obviously do not always hold), then all there is to 'viral' growth are three numbers: the current number of customers y(0), the eventual number of customers b, and the constant k. This situation holds also in the case of some customers leaving and never coming back (just by some adjustments in b and k).
For k, we might fit to past data. For given y(0) and b, all k does is adjust how fast the curve rises to the asymptote. So basically all we are doing is interpolating between y(0) and b.
Otherwise, all viral curves are the same.
So, an advantage of my derivation is a simple, explicit equation for a fairly general solution.
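
As a rough sketch of fitting k to past data, assuming y(0) and b are already known: the closed-form solution implies ln(y/(b - y)) = ln(y(0)/(b - y(0))) + bkt, which is a straight line in t, so k can be estimated by least squares on the slope. The function names and the synthetic data below are made up for illustration; real data will be noisy and the estimate of b is usually the shakier input.

```python
import math


def logistic(t, y0, b, k):
    """Closed-form solution of y' = k*y*(b - y), y(0) = y0."""
    return y0 * b / (y0 + (b - y0) * math.exp(-b * k * t))


def fit_k(times, counts, y0, b):
    """Least-squares estimate of k for known y0 and b.

    Fits the slope of z(t) = ln(y/(b - y)) through the fixed
    intercept z(0) = ln(y0/(b - y0)); the slope equals b*k.
    """
    z0 = math.log(y0 / (b - y0))
    num = 0.0
    den = 0.0
    for t, y in zip(times, counts):
        if t <= 0 or not 0 < y < b:
            continue  # transform undefined, or no slope information at t = 0
        z = math.log(y / (b - y))
        num += t * (z - z0)
        den += t * t
    if den == 0.0:
        raise ValueError("need an observation with t > 0 and 0 < y < b")
    return num / den / b


if __name__ == "__main__":
    # Synthetic 'past data' generated from a known k, then recovered.
    y0, b, true_k = 100.0, 10_000.0, 2e-5
    days = [5, 10, 20, 30]
    observed = [logistic(d, y0, b, true_k) for d in days]
    print(fit_k(days, observed, y0, b))
```
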
The article has a comment claiming that biology addresses a similar problem and gets a 'logistic' curve. The comment didn't say just what was meant by a logistic curve, but I suspect that my solution here is an example. If so, then here we have an 'axiomatic' derivation of the logistic curve.
It is true that the growth of some products, e.g., TV sets, looks to the eye very much like one of the curves from my solution for selected values of y(0), b, and k.
Could also make a Markov assumption: assume that we get new customers (and, if we wish, lose old customers) at some 'rates' and, thus, get a continuous-time, discrete-state-space Markov process. Then, as is well known, the solution is a matrix exponential. We could evaluate the matrix exponential or just use Monte Carlo to generate a few thousand sample paths. Then we could put some confidence limits on the deterministic solution.
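
A minimal Monte Carlo sketch of that idea, under assumptions not stated above: the state is the integer customer count n, gains happen at rate k n (b - n), losses at rate r n, and a lost customer returns to the pool of prospects. Sample paths are drawn with the standard Gillespie algorithm, and the 2.5% and 97.5% quantiles at a chosen time serve as rough confidence limits. All names and parameter values are made up for illustration.

```python
import random


def simulate_path(y0, b, k, r, t_end, rng):
    """One sample path of the Markov process; returns the count at t_end."""
    n, t = y0, 0.0
    while True:
        gain = k * n * (b - n)  # rate of n -> n + 1
        loss = r * n            # rate of n -> n - 1
        total = gain + loss
        if total <= 0.0:
            return n  # no further transitions possible
        t += rng.expovariate(total)  # exponential waiting time to next event
        if t > t_end:
            return n
        if rng.random() < gain / total:
            n += 1
        else:
            n -= 1


def confidence_band(y0, b, k, r, t_end, paths=2000, seed=0):
    """Empirical 2.5%, 50%, and 97.5% quantiles of the count at t_end."""
    rng = random.Random(seed)
    finals = sorted(simulate_path(y0, b, k, r, t_end, rng) for _ in range(paths))

    def pick(q):
        return finals[int(q * (paths - 1))]

    return pick(0.025), pick(0.5), pick(0.975)


if __name__ == "__main__":
    # Hypothetical inputs: 10 customers now, market potential 1,000, no churn.
    print(confidence_band(y0=10, b=1000, k=2e-4, r=0.0, t_end=30.0))
```

With a small y(0) the band tends to be wide, since luck in the first few referrals shifts the whole curve in time.
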
Since no one guessed the war story, the startup was FedEx, the SVP was Mike Basch, the CEO, of course, was Fred Smith, the person who called on the phone was Roger Frock, and the investor was General Dynamics. The arithmetic was courtesy of an HP-35. So, HP might run an ad saying how they saved FedEx!

jacques_chester 13 years ago

As others have pointed out, this can be modelled with calc pretty handily.

I found agent-based modelling much more interesting. For example, mean-field models struggle with non-uniform spaces.
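
To make the non-uniform point concrete, here is a toy agent-based sketch in Python. It is not the Ruby code linked below, and every name and parameter in it is made up for illustration. Each step, every adopter converts each non-adopting neighbour with probability p. On a ring, where each agent knows only its two neighbours, spread is capped at two new adopters per step, while the same rule on a fully mixed population saturates quickly, which is the case the mean-field equation describes.

```python
import random


def ring_graph(n, reach=1):
    """Each agent is connected to `reach` neighbours on either side."""
    return {i: [(i + d) % n for d in range(-reach, reach + 1) if d != 0]
            for i in range(n)}


def complete_graph(n):
    """Every agent is connected to every other agent (fully mixed)."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def spread(graph, seeds, p, steps, rng):
    """Return the number of adopters after each step, starting from seeds."""
    adopted = set(seeds)
    history = [len(adopted)]
    for _ in range(steps):
        new = set()
        for agent in adopted:
            for neighbour in graph[agent]:
                if neighbour not in adopted and rng.random() < p:
                    new.add(neighbour)
        adopted |= new  # conversions take effect at the end of the step
        history.append(len(adopted))
    return history


if __name__ == "__main__":
    n, p, steps = 200, 0.5, 20
    print("ring:    ", spread(ring_graph(n), [0], p, steps, random.Random(0)))
    print("complete:", spread(complete_graph(n), [0], p, steps, random.Random(0)))
```
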

For the interest of persons here attending, I've uploaded my crappy code on this topic. It's 3 years old and not production suitable.

https://github.com/jchester/ruby-epidemic-model
