Bayesian cohort-level ARPU Model

A frequent problem in User Acquisition is to estimate ARPU of users in a cohort.
Cohort — a group of users acquired together (same date, source, channel, etc.).

Individual user revenue follows a heavy-tailed distribution:

Majority (~75%) generate zero revenue
Some (~20%) generate modest revenue
Tiny minority (~5%) generate the significant part (whales)

Assumption: users inside a cohort are i.i.d., individual revenue ∼ LogNormal (or any other heavy-tailed distribution with finite mean and variance).

Because mean μ and variance σ² exist and are finite, we can apply the Central Limit Theorem to the cohort mean.

So the problem is to estimate unknown parameters μ and σ² using data. For modeling purposes, it would be convenient to define σ in terms of some other parameter k and μ , for that we need observed standard deviation of revenue inside cohorts.In practice, revenue SD is roughly proportional to mean (common for positive heavy-tailed data).This leads to the hierarchical Bayesian model that works extremely well as long as cohort size n ≳ 50–100

import numpy as np
import pymc as pm
import pandas as pd

df_agg = pd.DataFrame({
    'cohort': ['app1', 'app2', 'app3', 'app4', 'app5'],
    'installs': [  325, 430, 280, 190, 500],
    'arpu': [0.15, 0.22, 0.30, 0.25, 0.05],
    'revenue_sd': [0.48, 0.46, 0.55, 0.50, 0.20],
})

df_agg['idx'], cats = pd.factorize(df_agg.cohort)
coords = {'cohorts': cats}
model = pm.Model(coords=coords)
idx = df_agg.idx.values

with model:
    mu_global = pm.Normal('mu_global', mu=np.log(1.2), sigma=0.3)
    sigma_global = pm.HalfNormal('sigma_global', sigma=0.3)

    log_mu = pm.Normal('log_mu',
                                mu=mu_global,
                                sigma=sigma_global,
                                dims=('cohorts',),)

    k = pm.Gamma('k', mu=2, sigma=0.8, dims=('cohorts',))

    mu = pm.Deterministic('mu', pm.math.exp(log_mu),
                                   dims=('cohorts',))

    sigma_k = pm.HalfNormal('sigma_k', sigma=0.5)
    pm.Normal('y_sd', mu=k[idx] * mu[idx],sigma=sigma_k, observed=df_agg.revenue_sd.values)

    sigma_obs = k[idx] * mu[idx] / pm.math.sqrt(df_agg.installs.values)

    pm.Normal('y',
              mu=mu[idx],
              sigma=sigma_obs,
              observed=df_agg.arpu.values)

    idata = pm.sample(500, target_accept=0.97)

Then there is a question why it's possible:

pm.Normal('y_sd', mu=k[idx] * mu[idx],sigma=sigma_k, observed=df_agg.revenue_sd.values)

k[idx]*mu[idx] is a standard deviation of user revenue distribution and some math to explain why it's the case:

This line is the direct consequence of the Central Limit Theorem:

# sd / sqrt(n)
sigma_obs = k[idx] * mu[idx] / pm.math.sqrt(df_agg.installs.values)