I’m done with dashboards, and likely you should be too.

Ok, maybe not completely done, but the way dashboards are commonly used is both expensive and a waste of time, for a number of reasons.

Now, don’t get me wrong, that executive dashboard the CEO wants is still necessary. He’s the CEO; try to convince him he doesn’t need it and it’ll likely be a career-limiting move. But that dashboard doesn’t drive and improve the business. It’s just a report card.

The real pay dirt is in the daily operations of the company. More precisely, in what the customers’ experience is today.

In all service activities, customers care about a short list of things: on time, backlog, volume, rate, and quality. These are the KPIs we should apply to every key customer-related process, or KCRP. If you have customers, you can and should measure these things in your data warehouse.

For example, you order something from Amazon. What do you care about? Was it delivered on time? The backlog (the number of orders made but not yet delivered) doesn’t affect the customer directly, but if it expands, on time will suffer. Supporting backlog are volume and rate. As a provider, I need to know how many orders are happening and at what rate they’re coming in so I can adjust operations to ensure backlog doesn’t explode. Last is quality. You, the customer, aren’t going to be very happy if the package is all busted up on delivery or if it ends up in your flower bed.

The same is true of all service activities. If we merge process analytics with data analytics, we can find a solution.

To explain the situation, let’s imagine I run an ad-tech DSP. It’s my company’s goal to find publishers on the internet who want to show ads and have quality audiences, then sell those audiences to advertisers. In the end, the advertisers are my customers. They’re looking for those same 5 KPIs on their primary KCRP: ad delivery.

On time: How long it took between a user going to a website and the system selecting and displaying an ad.
Backlog: How many requests for ads exist at any time that haven’t been displayed yet (a sign my system is running slow).
Volume: How many ads we’ve delivered for this advertiser.
Rate: How fast we can deliver those ads (a dwindling audience can make a mess of this).
Quality: Of the ads we deliver, how many users click the ad (a sign of audience quality).
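These five KPIs translate directly into warehouse queries. As a sketch only: the ad_requests table, its columns, and the 100 ms on-time threshold below are all hypothetical, assuming a Postgres-style warehouse.

```sql
-- Illustrative rollup of the five KPIs per advertiser for today.
-- Everything here (ad_requests, request_ts, ad_served_ts, clicked) is an
-- assumed schema, not from the article.
select
    advertiser_id,
    avg(case when ad_served_ts - request_ts <= interval '100 milliseconds'
             then 1.0 else 0.0 end)              as on_time_rate,  -- On time
    count(*) filter (where ad_served_ts is null) as backlog,       -- Backlog
    count(ad_served_ts)                          as volume,        -- Volume
    count(ad_served_ts) / 24.0                   as hourly_rate,   -- Rate
    sum(case when clicked then 1 else 0 end)::float
        / nullif(count(ad_served_ts), 0)         as ctr            -- Quality
from ad_requests
where request_ts >= current_date
group by advertiser_id;
```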

At this point, we’re all thinking “no problem, I can build a dashboard to support that.”

Then the problems start to pile up. I’ve got 10 account managers and 3500 advertisers. Each account manager is dealing with 350 advertisers. How are they going to manage them? Likely, they’ll roll into the office and fire up a dashboard to see yesterday’s performance for each of their advertisers. If things aren’t wired up right, they’ll look at the performance of each advertiser over the entire past 30 days to see if yesterday’s performance was remarkable, either good or bad. 9 chances out of 10, everything on that dashboard is “normal” and there’s nothing to do.

The result? The account manager just wasted an hour of their day, and this dashboard just fired off a terabyte of IO worth of queries against the data warehouse, increasing necessary platform spend and decreasing query response time.

All of this is completely unnecessary. The data warehouse should already know what’s going on with these customers, or I wouldn’t have the data to build the time-wasting dashboard in the first place. If I’ve driven my data warehouse design from business processes (what are my key customer-related processes, and how do I measure those 5 KPIs?), we can automate our way out of this.

The answer is to give the account managers, the people who can take action, what they need to act on and nothing else. No individual contributor can filter through 350 customers every day and accurately decide which need attention and what action to take. So instead of handing them a dashboard and suggesting “go fish”, we help them out a bit and filter things down to only what requires action. We let the data drive.

For each end user session (people browsing publishers’ content) we track the following metrics:

session_day — the day the session started

campaign_id — which campaign the ad belonged to (advertisers manage campaigns)

impressions — how many times we delivered an ad for this campaign to an end user

clicks — how many users clicked ads from this campaign

conversions — how many users took the ideal action (checking out to buy a product, etc.) based on an impression

impression_cost — how much we charge for each impression

click_cost — how much we charge for each click

conversion_cost — how much we charge for each conversion
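A minimal sketch of the summary table behind these metrics might look like the following. The article only names the columns; the types, keys, and the advertiser_campaign_id join key are my assumptions, modeled on the query that uses this table.

```sql
-- Hypothetical DDL for the daily summary table; column names follow the
-- list above, everything else is assumed.
create table daily_campaign_summary (
    session_day             date          not null,
    campaign_id             bigint        not null,
    advertiser_campaign_id  bigint        not null,  -- join key to advertiser_campaign
    impressions             bigint        not null default 0,
    clicks                  bigint        not null default 0,
    conversions             bigint        not null default 0,
    impression_cost         numeric(12,4) not null default 0,
    click_cost              numeric(12,4) not null default 0,
    conversion_cost         numeric(12,4) not null default 0,
    primary key (session_day, campaign_id)
);
```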

For any of these metrics, the data warehouse already knows what happened without an account manager going to a dashboard. We can find what’s actionable with a very simple query, relying on standard deviation to compare an advertiser’s performance on any single day to the previous month. If their performance on a given metric deviates from the norm by more than a set threshold, either positive or negative, we need to pay attention and investigate what happened.

In this case, we’re looking at conversions, because that’s what the advertisers are most sensitive to. So we write the following query to cut through the “normal” noise, assuming today is 2022-11-28:

with
sourceval as (
    select
        sum(dpcs.conversions) as conversions,
        session_day,
        advertiser_id
    from
        daily_campaign_summary dpcs
        inner join advertiser_campaign ac on ac.advertiser_campaign_id = dpcs.advertiser_campaign_id
    where
        session_day > '2022-10-28'
        and session_day <= '2022-11-28'
    group by
        advertiser_id,
        session_day
),
limits as (
    select
        advertiser_id,
        session_day,
        conversions,
        AVG(conversions) OVER (
            PARTITION BY advertiser_id
        ) AS avg_conversions,
        STDDEV_SAMP(conversions) OVER (
            PARTITION BY advertiser_id
        ) * .75 as stdev_conversions
        -- I've set the threshold to 3/4 of a standard deviation, as the current
        -- processes are a bit noisy. As time goes by, and individual contributors
        -- manage their accounts better, we'll decrease this to 1/2 because the
        -- number of actionable rows will decrease. Continuous improvement
        -- requires we continue to raise the bar.
    from
        sourceval
),
deviants as (
    select
        advertiser_id,
        session_day,
        conversions,
        avg_conversions - stdev_conversions as low_limit,
        avg_conversions + stdev_conversions as high_limit,
        case
            when conversions < avg_conversions - stdev_conversions then 'LOW'
            else 'HIGH'
        end as outlier_type,
        avg_conversions
    from
        limits
    where
        (
            conversions < avg_conversions - stdev_conversions
            or conversions > avg_conversions + stdev_conversions
        )
        and session_day = '2022-11-28'
)

select
    advertiser_id,
    session_day,
    conversions,
    avg_conversions,
    case
        when conversions < low_limit then low_limit - conversions
        else conversions - high_limit
    end as variance,
    outlier_type
from
    deviants
where
    avg_conversions > 10000
    -- This is a bit of an arbitrary filter. New advertisers will come online and
    -- be very noisy until they mature a bit. We assume any advertiser under 10k
    -- conversions per day is still just getting going.
order by
    variance / conversions desc;

-- The order by surfaces the advertisers we need to pay attention to most,
-- as they've deviated the most.

Just like that, I’ve cut the number of advertisers a single account manager needs to look at to under 30. From there, they can investigate and learn from those succeeding, as well as resolve issues for those running lower than expected. As this process continues, we should expect almost all advertisers to become more “normal”, contributing to continuous improvement. When this happens, we tighten the standard deviation envelope.

Toss the results in a report the contributors can access, with links to supporting dashboards, then watch your data platform costs drop, contributor productivity increase, and customer retention improve.

For bonus points, put together a feedback loop requiring the contributor to explain what they found with each anomaly. That way, the next time this anomaly is seen, we already know how to deal with it.
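One way to sketch that feedback loop, assuming a simple table (every name here is hypothetical): each investigated anomaly gets a recorded explanation, keyed the same way as the anomaly query’s output, so a repeat anomaly can be matched against past resolutions.

```sql
-- Hypothetical feedback table for the anomaly review loop.
create table anomaly_feedback (
    advertiser_id  bigint    not null,
    session_day    date      not null,
    metric         text      not null,  -- e.g. 'conversions'
    outlier_type   text      not null,  -- 'LOW' or 'HIGH'
    explanation    text      not null,  -- what the contributor found
    resolved_by    text      not null,
    resolved_at    timestamp not null default current_timestamp,
    primary key (advertiser_id, session_day, metric)
);
```

Joining tomorrow’s anomaly results against this table by advertiser and metric would surface prior explanations alongside any new outlier.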