XKCD’s Frequentist Straw Man

An often-shared comic, used to poke at the absurdity of frequentist methods and the superiority of Bayesian methods, shows a lack of understanding of frequentist methods and of statistics in general. The comic is humorous, which is often enough to convince the uninformed that it must be true. The problem is that it is completely uninformed about statistical practice. This comic has been addressed by other statisticians, including uber-statistician Andrew Gelman. But I didn’t find any of the rebuttals very compelling; I think they miss the actual reason why this is a bad setup.

The comic, shown above, presents a contrived machine that checks whether the sun has exploded and then rolls two dice. If two sixes are rolled, the machine will report the opposite of the sun’s true state; otherwise it will report the true state. The detector rolls the dice and reports that the sun has exploded. The bumbling frequentist, amazed by the small probability of the detector rolling two sixes, concludes that the sun must have actually exploded. The Bayesian, being a reasonable person, concludes the machine rolled two sixes and the sun must still exist.
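The machine’s reporting rule can be sketched in a few lines of Python. This is only an illustration: the function name `detector_report` and the boolean encoding of the sun’s state are assumptions made here, not anything taken from the comic.

```python
from fractions import Fraction
from itertools import product


def detector_report(sun_exploded: bool, dice: tuple[int, int]) -> bool:
    """Return what the machine reports (True means "exploded").

    Hypothetical helper: the machine lies only when both dice show six.
    """
    lies = dice == (6, 6)
    return (not sun_exploded) if lies else sun_exploded


# Enumerate all 36 equally likely rolls of two fair dice.
rolls = list(product(range(1, 7), repeat=2))

# Probability the machine says "exploded" when the sun is actually fine.
false_alarms = sum(detector_report(False, roll) for roll in rolls)
p_false_alarm = Fraction(false_alarms, len(rolls))

print(p_false_alarm)  # 1/36
```

Only one of the 36 rolls makes the machine lie, which is where the comic’s small probability comes from.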

Taken on its face this seems like damning behavior for the frequentist community. Who would possibly want to behave so foolishly bound by the methods of frequentists? The problem is no frequentist would ever act like this, and not because they are secretly Bayesians.

The main issue is that frequentists do not perform statistical tests on the state of random variables. The second is that if these two gentlemen did in fact make a bet after each performing inference as they actually would in practice, the frequentist would take the Bayesian’s lunch under all but a single set of priors. In this particular situation the frequentist would actually use Bayes’ rule to determine the conditional probability of the sun having exploded given the detector’s response, and it would be completely within the bounds of performing frequentist inference. Remember, using Bayes’ rule does not mean you are a Bayesian.

Statistical Tests, Unknown Parameters, Conditional Probability

The implied null hypothesis under test is that the sun has not exploded. What is interesting here is that the test is of the state of a random variable, not an unknown static parameter as is normally the case in null hypothesis significance testing (NHST). This is truly the rub: XKCD has smuggled in a random variable to be tested, allowing him to ignore the conditional probability calculation that would reasonably be done by both frequentists and Bayesians.

As described, there are two random variables: whether the sun explodes on any given night, and whether two sixes are rolled. The unknown parameters here are the probability of the sun exploding on any given night, \(\theta_s\), and the probability of rolling two sixes, \(\theta_d\). By making the state of the sun the hypothesis under test, XKCD is able to ignore the unknown \(\theta_s\) parameter for the frequentist, while it would feature prominently in the Bayesian calculation - if done correctly and not in some deranged caricature of the methods.

As set up by XKCD, the p-value is derived from the conditional probability of observing the detector reporting “exploded”, given that the sun has not exploded: \(P(\text{"exploded"}|S=\text{🌞})=\frac{1}{36}\). However, we cannot look at this probability and conclude that the sun has exploded. You cannot assume a random variable takes on a particular value without taking into account the probability of it taking that value. This is not Bayesian, this is basic conditional probability.

This is very important: the state of the sun is not an unknown static parameter. It is itself a random variable with an unknown probability of occurrence. The inference needs to be on the probability of occurrence, not on its value on any particular day. So why does XKCD decide the frequentist would perform NHST on a random variable? To make fools of frequentists, of course.

We can perform NHST on an assumed value for a static unknown parameter because there is no probability of it being one value or another. There is no possibility of it changing so we don’t need to take this into account.

Let us see how a frequentist would actually approach this problem, by estimating the unknown parameters and then using them with Bayes’ rule to determine the conditional probability.

Frequentist Solution

The first step is to estimate the probability of the sun exploding on any given night. Given that the earth is 4000 years old and we have observed the sun not explode on every single night, we have \(1 - \theta_s = \frac{4000 * 365}{4000 * 365} = 100\%\) probability of the sun not exploding, so \(\theta_s = 0\). The wily Bayesian of course would need to use some principled Beta prior for \(\theta_s\), then being unable to simplify the conjugate posterior due to MCMC brain rot, bust out Stan to sample from the posterior and ah crap forgot to set <lower=0, upper=1> ok now we can….

While we wait for the Bayesian to summarize their csv of posterior draws, let’s continue with our inference. What we all have at this point is estimated parameters, \(\theta_s\) and \(\theta_d\), and we want to calculate the probability of the sun having exploded given that the detector says it did.

Our observation space is {"exploded", "not exploded"}, as reported by the detector/robot, and there are two latent random variables: the state of the sun and the dice.

We have the following probabilities:

\[ \begin{aligned} & S \coloneqq \text{sun}, D \coloneqq \text{dice}, R \coloneqq \text{robot} \\ \\ & P(S=\text{💥}) = 0, P(S=\text{🌞}) = 1 \\ & P(D=\text{⚅⚅}) = \frac{1}{36}, P(D=\text{!⚅⚅}) = \frac{35}{36} \\ \\ & P(R=\text{"exploded"}|S=\text{🌞}) = P(D=\text{⚅⚅}) = \frac{1}{36} \\ & P(R=\text{"exploded"}|S=\text{💥}) = P(D=\text{!⚅⚅}) = \frac{35}{36} \\ & P(R=\text{"exploded"}) = P(S=\text{🌞})P(R=\text{"exploded"}|S=\text{🌞}) + P(S=\text{💥})P(R=\text{"exploded"}|S=\text{💥}) \\ & = 1 * \frac{1}{36} + 0 * \frac{35}{36} = \frac{1}{36} \\ & \end{aligned} \]

Given these probabilities we can then use Bayes’ rule to calculate \(P(S=\text{💥}|R=\text{"exploded"})\). Doing so does not make one a Bayesian; what distinguishes Bayesians from frequentists here is how the above probabilities were calculated, not what we do with them once they are calculated.

\[ \begin{aligned} & P(S=\text{💥}|R=\text{"exploded"}) = \frac{P(R=\text{"exploded"}|S=\text{💥})P(S=\text{💥})}{P(R=\text{"exploded"})} \\ & = \frac{\frac{35}{36}*0}{\frac{1}{36}} = 0 \end{aligned} \]
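The same arithmetic can be checked with a minimal Python sketch. The function name `posterior_exploded` and its argument names are assumptions chosen here for illustration; the inputs are the estimates \(\theta_s = 0\) and \(\theta_d = \frac{1}{36}\) from above.

```python
from fractions import Fraction


def posterior_exploded(theta_s: Fraction,
                       theta_d: Fraction = Fraction(1, 36)) -> Fraction:
    """P(S = exploded | R = "exploded") via Bayes' rule.

    theta_s: probability the sun explodes on a given night.
    theta_d: probability of rolling two sixes (the machine lies).
    """
    p_report_given_exploded = 1 - theta_d  # machine tells the truth
    p_report_given_fine = theta_d          # machine lies
    # Law of total probability for P(R = "exploded").
    p_report = (theta_s * p_report_given_exploded
                + (1 - theta_s) * p_report_given_fine)
    return theta_s * p_report_given_exploded / p_report


print(posterior_exploded(Fraction(0)))  # 0
```

Exact fractions are used so the result matches the hand calculation rather than a floating-point approximation. Note that the sketch divides by \(P(R=\text{"exploded"})\), so it is undefined in the degenerate case where both \(\theta_s\) and \(\theta_d\) are zero.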

Given the frequentist inference we have a \(0\%\) probability of the sun having exploded given our observation of the detector. The irony here is that any odds offered by the Bayesian - unless they used a 100% prior on not exploded - will be advantageous to the frequentist. It is easy to see this if we replace our 0/1 probabilities with probabilities derived from Laplace’s rule of succession, which was originally proposed as a solution to estimating the probability of the sun rising the next day. This is sort of a Bayesian-lite method. Now our probabilities will be very near 0/1, but not quite equal. Either way, the frequentist and Bayesian will get very similar results and should be utterly confused by whatever XKCD decided to do.
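As a rough worked example of that substitution, assume the same \(n = 4000 \times 365 = 1460000\) observed nights with zero explosions (the symbol \(n\) is introduced here only for illustration). The rule of succession estimates the probability of an event seen \(s\) times in \(n\) trials as \(\frac{s+1}{n+2}\), so:

\[ \begin{aligned} & P(S=\text{💥}) = \frac{0 + 1}{n + 2} = \frac{1}{1460002} \\ & P(S=\text{💥}|R=\text{"exploded"}) = \frac{\frac{35}{36} \cdot \frac{1}{n+2}}{\frac{35}{36} \cdot \frac{1}{n+2} + \frac{1}{36} \cdot \frac{n+1}{n+2}} = \frac{35}{n + 36} = \frac{35}{1460036} \approx 2.4 \times 10^{-5} \end{aligned} \]

Under these assumptions the detector’s report moves the probability from about 1 in 1.46 million to about 1 in 42,000, which is still nowhere near a reason to believe the sun has exploded.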

Summary

The problem is assuming a random variable takes on a certain value without taking into consideration the probabilities of the random variable when constructing the null and alternative hypotheses. This is why we use Bayes’ rule: we are dealing with random variables and not static unknown parameters. By attempting to run NHST on a random variable, XKCD has tried to show that a frequentist would ignore the base rate of occurrence of the now static random variable in the null hypothesis. The problem is that a frequentist would not do that; no one would. If anything, all this shows is that people don’t really understand statistical practice, and rather than try to understand it they pick a side and then use whatever garbage reasoning they can to make the other side look bad.