\( \require{cancel} \require{color} \newcommand{\p}[1]{\left(#1\right)} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\argmax}{arg\,max} \newcommand{\given}{\,\middle|\,} \renewcommand{\Pr}[1]{\mathop{{}\mathbb{P}\mathopen{}\left(#1\right)\mathclose{}}} \)
A concrete 5-minute example to compare & contrast frequentist and Bayesian approaches.
It is assumed the reader is comfortable
with Bayes's rule
and marginal likelihood
(and $\href{https://en.wikipedia.org/wiki/Arg_max}{\argmax}$).
(Simplified from here to avoid
confusing the reader with conjugate priors.
More reading here.)
The Problem
A coin comes up heads with unknown probability $p$. We toss it and observe $h$ heads and $t$ tails.
Given these previous tosses, what is the probability of getting $h'$ more heads ($\Pr{h' \given h, t}$)?
The Problem with the Problem
Obviously, the answer for the base case ($h = t = 0$) is $p^{h'}$, so we need to know something about $p$!
Either we must change the question to something more manageable, or just assume a “prior” $\textcolor{orange}{\Pr{p}}$ for $p$.
Our assumption about the prior is what makes Bayesian approaches subjective.
Frequentist (point estimate) approach: Maximum Likelihood (ML) solution
Use the single most likely value of $p$ to estimate the answer: \begin{align*} \Pr{h' \given h, t} &\textcolor{red}{\approx} \p{\textcolor{green}{\argmax_{p}\,\Pr{h, t \given p}}}^{h'} = \p{\argmax_{p}\,p^h \p{1 - p}^t}^{h'} = \p{\frac{h}{h + t}}^{h'} \end{align*} A couple of examples:
\begin{align*} \Pr{h' = \phantom{0}2 \given h = 10, t = 4} &\textcolor{red}{\approx} \p{10/\p{10+4}}^{2\phantom{0}} \approx \textcolor{red}{\mathbf{51.0}\%} \\ \Pr{h' = 20 \given h = 20, t = 1} &\textcolor{red}{\approx} \p{20/\p{20+1}}^{20} \approx \textcolor{red}{\mathbf{37.7}\%} \end{align*}
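As a sanity check, the ML estimates above can be reproduced numerically (a minimal sketch; the function name is my own):

```python
from fractions import Fraction

def ml_prediction(h, t, h_new):
    """ML estimate of Pr(h_new more heads | h heads, t tails).

    The likelihood p^h * (1 - p)^t is maximized at p = h / (h + t),
    and the ML approach simply raises that point estimate to h_new.
    """
    p_ml = Fraction(h, h + t)
    return p_ml ** h_new

print(float(ml_prediction(10, 4, 2)))    # ~0.510
print(float(ml_prediction(20, 1, 20)))   # ~0.377
```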
Pros: easy to compute; does not require any assumptions regarding $p$.
Cons: does not answer the original question; approximation error is unknown.
Semi-Bayesian (point estimate) approach: Maximum a Posteriori (MAP) solution
Use the single value of $p$ that maximizes the posterior probability given the tosses to estimate the answer: \begin{align*} \Pr{h' \given h, t} \textcolor{red}{\approx} \p{\argmax_{p}\,\Pr{p \given h, t}}^{h'} &= \p{\textcolor{green}{\argmax_{p}} \textcolor{green}{\Pr{h, t \given p}} \textcolor{orange}{\Pr{p}} \div \cancelto{(\text{unaffected by }p)}{\Pr{h, t}}}^{h'} \end{align*}
Pros: easy to compute; can account for a non-uniform prior $\textcolor{orange}{\Pr{p}}$.
Cons: does not answer the original question; approximation error is unknown; requires $\textcolor{orange}{\Pr{p}}$.
Notice that if all values of $p$ are equally likely ($\textcolor{orange}{\Pr{p}} = 1$), then this is the same as the ML solution.
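To make the MAP idea concrete, here is a sketch that locates the posterior mode by grid search (the grid resolution and the example fair-coin-favoring prior are my own choices, not from the text). With a flat prior it recovers the ML estimate $h/(h+t)$:

```python
def map_estimate(h, t, prior, n=100001):
    """Grid-search the p maximizing Pr(h, t | p) * prior(p).

    The evidence Pr(h, t) is unaffected by p, so it is ignored;
    prior(p) need not be normalized for the same reason.
    """
    best_p, best_score = 0.0, -1.0
    for i in range(1, n):  # skip the endpoints p = 0 and p = 1
        p = i / n
        score = p**h * (1 - p)**t * prior(p)
        if score > best_score:
            best_p, best_score = p, score
    return best_p

# With a flat prior, MAP coincides with ML: p = 10 / (10 + 4).
print(round(map_estimate(10, 4, lambda p: 1.0), 4))        # ~0.7143

# An example prior favoring fair coins pulls the mode toward 1/2.
print(round(map_estimate(10, 4, lambda p: p * (1 - p)), 4))  # ~0.6875
```

The second prior is proportional to a Beta(2, 2) density; its effect is the same as having observed one extra head and one extra tail, shifting the mode to $11/16$.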
Fully Bayesian (exact) approach: “Bayesian” solution
If we already know $\textcolor{orange}{\Pr{p}}$, then we can just find the exact solution directly.
There is no need to estimate a single value of $p$ when we can consider all possible values:
\begin{align*}
\Pr{h' \given h, t}
&= \int_{0}^{1} \Pr{h' \given p, \cancel{h, t}} \times \,\Pr{p \given h, t}\,dp && \text{(marginalize)} \\
&= \int_{0}^{1} p^{h'} \times \,\frac{\Pr{h, t \given p} \textcolor{orange}{\Pr{p}}}{\Pr{h, t}}\,dp && \text{(Bayes's rule)} \\
&= \frac{\int_{0}^{1} p^{h'} \Pr{h, t \given p} \textcolor{orange}{\Pr{p}}\,dp}{\Pr{h, t}} \\
&= \frac{\int_{0}^{1} p^{h'} \Pr{h, t \given p} \textcolor{orange}{\Pr{p}}\,dp}{\int_{0}^{1}\hspace{1.35em}\Pr{h, t \given p} \textcolor{orange}{\Pr{p}}\,dp} && \text{(}\href{https://en.wikipedia.org/wiki/Law_of_total_probability}{\text{law of total probability}}\text{)} \\
&= \frac{\int_{0}^{1} p^{h'} \cancel{\frac{\p{h + t}!}{h!\,t!}} p^h \p{1 - p}^t \textcolor{orange}{\Pr{p}}\,dp}{\int_{0}^{1}\hspace{1.35em}\cancel{\frac{\p{h + t}!}{h!\,t!}} p^h \p{1 - p}^t \textcolor{orange}{\Pr{p}}\,dp} \\
&= \frac{\p{h + h'}!\,\cancel{t!}}{\p{h + t + 1 + h'}!} \div \frac{h!\,\cancel{t!}}{\p{h + t + 1}!} && \p{\text{if } \textcolor{orange}{\Pr{p}} = 1\text{, since } \int_{0}^{1} p^a \p{1 - p}^b\,dp = \tfrac{a!\,b!}{\p{a + b + 1}!}} \\
\therefore\ \Pr{h' \given h, t}
&= \frac{\p{h + h'}!\,\p{h + t + 1}!}{h!\,\p{h + t + 1 + h'}!} && \p{\text{if } \textcolor{orange}{\Pr{p}} = 1}
\end{align*}
Revisiting the same examples, we find that the exact answers are lower than the ML estimates:
\begin{align*} \Pr{h' = \phantom{0}2 \given h = 10, t = 4} &\textcolor{blue}{=} 33/68 \approx \textcolor{blue}{\mathbf{48.5}\%} < \textcolor{red}{\mathbf{51.0}\%} \\ \Pr{h' = 20 \given h = 20, t = 1} &\textcolor{blue}{=} 11/41 \approx \textcolor{blue}{\mathbf{26.8}\%} < \textcolor{red}{\mathbf{37.7}\%} \end{align*}
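The closed form above (uniform prior) is easy to verify with exact integer arithmetic (a sketch; the function name is my own):

```python
from fractions import Fraction
from math import factorial

def exact_prediction(h, t, h_new):
    """Exact Pr(h_new more heads | h heads, t tails), uniform prior on p."""
    return Fraction(factorial(h + h_new) * factorial(h + t + 1),
                    factorial(h) * factorial(h + t + 1 + h_new))

print(exact_prediction(10, 4, 2))    # 33/68  (~48.5%)
print(exact_prediction(20, 1, 20))   # 11/41  (~26.8%)
```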
Conclusion
To obtain exact (Bayesian) solutions, we require extra assumptions regarding the parameters.
Frequentist methods avoid adding extra assumptions, at the cost of an unknown approximation error in their solutions.