The Lost Art of Baseball's Makeup Call

Good morning everyone,

For those who have been around for a whole year, you know that April Fool’s Day is the only day of the year when The Spade doesn’t cover football. Last year we covered basketball in an attempt to find the NBA’s most efficient shooters from beyond halfcourt. Check it out here. Today we’re turning to baseball for this special article, and we’re unearthing a metric that is going the way of the dodo this season. With the automated ball/strike (ABS) system taking hold in MLB this year, I wanted to analyze the almost extinct act of an umpire’s makeup call. A makeup call is a second errantly called ball or strike, in the opposite direction, right after the first one occurs.

Phillies performance aside, I’ve been enjoying the first week of ABS challenges. My initial knee-jerk reaction is that there should be more ABS involvement, to take pressure off home plate umpires so they can focus on making correct calls in the non-automated areas of the game. Umpire Scorecards has added ABS challenges to its tweets too, improving on an already flawless data visualization. Today we’re going to dig deep into nine seasons of baseball pitch data to see if we can find old opportunities for makeup calls in baseball games, things that for either emotional or technological reasons may be leaving this world soon. Umpires know when a mistake has happened: they called a strike a ball or a ball a strike. To even things out, did they flip-flop immediately after to make the count ‘right’ again? Let’s find out.

This is a data analysis purely rooted in anecdotal evidence. I wanted to see if the data matches what my eyes have seen multiple times. Happy April Fool’s Day, and I hope you enjoy this.

Holy cow, this much baseball data has been free and accessible this entire time? I didn’t even realize. To me though, that’s no fun. I’ll be returning to the football data mines next week, but this week we’re diving into the deep well of baseball stats.

There’s a whole Python package called pybaseball that pulls data straight from Baseball Savant. I found some pitch-by-pitch data that lets me create visualizations similar to what you see on MLB broadcasts, with a classic strike zone and home plate look. I’ve also grabbed every home plate umpire using statsapi, MLB’s official public data source. Given data on every called ball or strike from 2017 to 2025, I’m using the SQL LEAD() and LAG() functions to locate makeup calls, which we’re defining as a ball called a strike immediately after a strike was errantly called a ball, or vice versa. We’re looking for moments where a hitter took two pitches in a row, the umpire made a mistake on pitch 1, then made up for it on pitch 2. This is slightly flawed, of course. I may have accidentally queried moments where home plate umpires were just flustered. But let’s still see where this goes. It is April Fool’s Day after all, so I have the cloak of the joke to hide behind if these numbers are wonky.

So here are our strict criteria for looking into potential makeup call territory. We’re looking for:

Two taken pitches in a row from the same pitcher to the same batter, where…
- Pitch 1 was errantly called a ball or strike, and…
  - Pitch 2 was errantly called in the other direction by the umpire (a ball called a strike followed by a strike called a ball, or the reverse).
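For anyone who wants to tinker, here’s a minimal sketch of those criteria in plain Python instead of SQL LEAD() and LAG(). It is an illustration, not the query behind the charts: the TakenPitch fields and the find_makeup_calls name are made up for this example, and it assumes swings have already been filtered out and that every pitch already carries a label for what the call should have been.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class TakenPitch:
    # Hypothetical record layout, not a real pybaseball schema.
    game_id: int
    at_bat: int        # plate appearance number within the game
    pitch_number: int  # pitch order within the plate appearance
    pitcher: int
    batter: int
    called: str        # "ball" or "strike": what the umpire ruled
    actual: str        # "ball" or "strike": what the tracked location says


def is_errant(pitch: TakenPitch) -> bool:
    return pitch.called != pitch.actual


def find_makeup_calls(
    pitches: Iterable[TakenPitch],
) -> List[Tuple[TakenPitch, TakenPitch]]:
    """Return (pitch 1, pitch 2) pairs that meet the makeup call criteria."""
    ordered = sorted(
        pitches, key=lambda p: (p.game_id, p.at_bat, p.pitch_number)
    )
    pairs = []
    # Pairing each pitch with the one before it plays the role of LAG().
    for prev, curr in zip(ordered, ordered[1:]):
        same_matchup = (
            (prev.game_id, prev.at_bat, prev.pitcher, prev.batter)
            == (curr.game_id, curr.at_bat, curr.pitcher, curr.batter)
        )
        # Consecutive numbers mean no swing happened in between.
        back_to_back = curr.pitch_number == prev.pitch_number + 1
        opposite_misses = (
            is_errant(prev) and is_errant(curr) and prev.called != curr.called
        )
        if same_matchup and back_to_back and opposite_misses:
            pairs.append((prev, curr))
    return pairs
```

Two misses in the same direction (say, two balls both called strikes) are deliberately excluded, since that looks more like a wide zone than a makeup call.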

Here’s every instance of a makeup call where the umpire missed by at least 1 inch:

Made a nice little baseball edition of The Spade logo for this one. This plot is busy, so here’s what happens if we filter these makeup calls down even further, to misses of at least 2 inches:

This one shows the frequency of makeup calls when an umpire misses by 2+ inches. If they called a strike on a pitch that was over 2 inches outside the zone, they made up for it on the next pitch with an errant call in the opposite direction.
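Measuring a miss in inches takes a little geometry. Here’s one hedged way to do it in Python. It assumes pitch locations in feet, modeled on Statcast’s plate_x, plate_z, sz_bot, and sz_top fields, a 17-inch-wide plate, and a strike zone treated as a plain rectangle. It ignores the radius of the ball, so it won’t match any official measurement exactly, and both function names are made up for this example.

```python
from math import hypot

PLATE_HALF_WIDTH_FT = 8.5 / 12  # home plate is 17 inches wide


def signed_distance_inches(plate_x, plate_z, sz_bot, sz_top):
    """Distance from the pitch to the nearest zone edge, in inches.

    Positive means outside the rectangle, negative means inside it.
    All inputs are in feet.
    """
    dx = abs(plate_x) - PLATE_HALF_WIDTH_FT       # > 0: off the plate
    dz = max(sz_bot - plate_z, plate_z - sz_top)  # > 0: too low or too high
    if dx > 0 and dz > 0:
        feet = hypot(dx, dz)  # off a corner: straight-line distance
    else:
        feet = max(dx, dz)    # otherwise the nearest edge decides
    return feet * 12


def call_error_inches(called, plate_x, plate_z, sz_bot, sz_top):
    """Size of the umpire's miss in inches, or 0.0 for a correct call."""
    distance = signed_distance_inches(plate_x, plate_z, sz_bot, sz_top)
    if called == "strike" and distance > 0:
        return distance       # called a strike, but it was outside
    if called == "ball" and distance < 0:
        return -distance      # called a ball, but it was inside
    return 0.0
```

With something like this in hand, the 2+ inch filter is just a check that call_error_inches(...) is at least 2.0 for pitch 1.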

Apparently, the larger the flub, the more likely a makeup call is to follow. Comparing each error distance makes that apparent:

The gray bars show how often you’d expect a makeup call to occur just by chance. The blue bars show how often they actually occur, grouped by error margin. Slight differences, but still.
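One plausible way to build a chance baseline like the gray bars (an assumption on my part, not necessarily the method behind the chart) is to ask how often the needed opposite-direction miss shows up on any second pitch, regardless of what happened on the first. The sketch below compares that to the observed rate. The label strings and the makeup_rates name are hypothetical.

```python
from collections import Counter

# A ball called a strike is "bad_strike"; a strike called a ball is "bad_ball".
OPPOSITE = {"bad_strike": "bad_ball", "bad_ball": "bad_strike"}


def makeup_rates(pairs):
    """Return (observed, expected) makeup call rates.

    `pairs` is a list of (first_call, second_call) labels for back-to-back
    taken pitches, each label being "ok", "bad_strike", or "bad_ball".
    """
    errant_first = [(first, second) for first, second in pairs
                    if first in OPPOSITE]
    if not errant_first:
        raise ValueError("no pairs start with an errant call")

    # Observed: share of errant first calls followed by the opposite miss.
    observed = sum(
        second == OPPOSITE[first] for first, second in errant_first
    ) / len(errant_first)

    # Expected: how often that same opposite miss occurs on second pitches
    # overall, averaged over the errant first calls.
    second_counts = Counter(second for _, second in pairs)
    expected = sum(
        second_counts[OPPOSITE[first]] / len(pairs)
        for first, _ in errant_first
    ) / len(errant_first)

    return observed, expected
```

If umpires weren’t making up for anything, the two numbers should land close together. A consistently higher observed rate is the pattern the blue bars are hinting at.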

Is there any correlation between the number of errant calls and makeup call rate by umpire? Because technically a makeup call is a stacking of errors, right? Let’s see:

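R-squared of .15? That’s a weak relationship at best. An umpire’s errant call count explains only about 15% of the variation in makeup call rate.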
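For reference, here’s a small Python sketch of that statistic, assuming it comes from a simple straight-line fit. In that case R-squared is just the squared Pearson correlation, so .15 corresponds to a correlation of roughly 0.39 in absolute value. The r_squared name and its inputs are made up for this example.

```python
from math import fsum


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences.

    For a one-variable linear fit this equals the regression R-squared:
    the share of variance in ys explained by xs.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, 2+ points")
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant input has no defined correlation")
    return (sxy * sxy) / (sxx * syy)
```

Here xs would be each umpire’s errant call count and ys their makeup call rate, one entry per umpire.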

Here are some makeup call scorecards for names you’ve heard before:

CB Bucknor:

Angel Hernandez:

And Pat Hoberg, who you might remember for calling a perfect game behind the plate in the World Series back in 2022, then later being fired for violating MLB’s gambling rules:

That’s about all I have for today. I hope you enjoy April Fool’s Day. Have a great day, everyone. - RC

I lied. I have a Cam Newton Google Translate clone for you to mess around with today too. This is technically writing about football, right? Check it out.

The Spade is an entirely reader-supported weekly football analytics newsletter covering topics like the NFL, College Football, and all things in between. Recently, we covered college football recruiting/portal splits and NFL Draft Combine/Pro Day Shapeshifters. If any of that sounds interesting to you, I’d love to have you along for the ride as a subscriber. Thanks for all the support.