Fun with Benford’s law: Election 2016 edition: What’s up with Iowa and Mississippi
Update 11-11-2020: I’ve had a lot of comments on this post since Biden beat Trump in the 2020 election, and I’ll probably mess around with the data when the final totals are certified. Until then, @standupmaths posted an excellent video explaining Benford’s Law in relation to the 2020 election. He explains when Benford’s Law can be applied and when it can’t, and he gives a great example of data from Chicago (the best city in America). In the video, he also mentions a paper, which I have downloaded and posted here for everyone to read because academic work shouldn’t sit behind paywalls: benfords_law_and_the_detection_of_election_fraud. Once again, as I said in the original post, there is no evidence of election fraud in either 2016 or 2020. The only fraud in these elections is Donald Trump, who never won the popular vote. #sickburn Fuck Trump.
Before I begin this post, I need to make it clear to the conspiracy lunatics out there that this is not evidence that the 2016 election was rigged. As of right now, there is basically no evidence that this election was anything other than a massive, but fair, fuck-up by the American people. Could the election have been rigged? Sure. Anything, no matter how unlikely, is possible in the world we live in now. (I mean, Donald Trump, a racist, xenophobic misogynist, was elected president of the United States of America, which was basically an impossibility like 3 weeks ago.) But let me again say there is no evidence that the election was rigged. And that includes this post.
Ok, now that we have that out of the way, let’s talk about Benford’s Law (tip of the hat to @mulderc for this idea). Benford’s law states that in many real-world lists of numbers the leading digits do not appear uniformly. The digit 1 is expected to be first about 30% of the time, while the digit 9 is expected to be first only about 4.6% of the time. Specifically, for a digit d between 1 and 9, the probability that d appears first is given by the following formula:
P(d) = log10(1 + 1/d)
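If you want to check those percentages yourself, here is a minimal Python sketch that tabulates the standard first-digit probabilities. The function name is made up for illustration; nothing here comes from the code behind the plots in this post.

```python
import math

def benford_first_digit_probs():
    """Return {d: P(d)} for d = 1..9, where P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_first_digit_probs()
for d, p in probs.items():
    # Print each digit with its expected share as a percentage.
    print(f"{d}: {100 * p:.1f}%")
```

The nine probabilities sum to 1 because the logarithms telescope to log10(10).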
So let’s apply this to the 2016 election. I downloaded data on the 2016 election at the county level from here. Using all of the county-level vote totals for each of the candidates, I get the following two plots. The height of each bar is what is actually observed, and the red dots are what Benford’s law predicts. It really is amazing how well this distribution fits the data.
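The comparison behind plots like these can be sketched in a few lines of Python. This is only an illustration: the vote totals below are made-up numbers, not the county data, and the helper names are hypothetical.

```python
import math
from collections import Counter

def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

def observed_vs_expected(counts):
    """Return rows of (digit, observed share, Benford share) for digits 1..9."""
    counts = [c for c in counts if c > 0]  # a total of zero has no leading digit
    tally = Counter(leading_digit(c) for c in counts)
    n = len(counts)
    return [(d, tally.get(d, 0) / n, math.log10(1 + 1 / d)) for d in range(1, 10)]

# Made-up totals for illustration only; NOT real election data.
example_totals = [1204, 18753, 932, 2410, 1130, 56012, 3077, 149, 1986, 7420]
rows = observed_vs_expected(example_totals)
for d, obs, exp in rows:
    print(f"{d}: observed {obs:.2f}, expected {exp:.2f}")
```

With only ten made-up values the observed shares will be lumpy; the fit in the plots comes from having thousands of counties.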
And if we go back to 2012, we see exactly the same thing. Amazing. Benford’s Law seems so counterintuitive, but it’s observed in so many different places.
Next I wanted to look at individual states. This is problematic for at least a few reasons. Most notably, there are some states that have very few counties in them (e.g. Massachusetts, Alaska, etc.). So I went ahead and tested each state with at least 30 counties individually to see if its votes followed Benford’s Law. I also went a step further, as Benford’s law can be extended to the first two digits, three digits, etc. Below are the results of individual state goodness-of-fit tests for Benford’s law for 1-4 digits, using a Bonferroni correction to control the family-wise error rate (FWER). States where the null hypothesis is rejected for Trump and Clinton are colored red and blue, respectively. States where the null is rejected for both Clinton and Trump are colored purple. I’m pretty sure I shouldn’t even be doing a Benford’s goodness-of-fit test on the first 3 or 4 digits when I only have 30 observations. But I did it anyway. I’d pay more attention to the plots for 1 and 2 digits. On those plots we see that the null hypothesis for Trump’s totals was rejected in both Iowa and Mississippi for 1 digit, and in Mississippi only for 2 digits. Let’s go look at Iowa and Mississippi in more detail.
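For anyone who wants to try this at home, here is a rough Python sketch of that kind of testing pipeline. It assumes a Pearson chi-square goodness-of-fit test, which may not be the exact test behind the plots, and all function names and the p-values in the usage lines are hypothetical. The first-k-digits probabilities use the standard generalization P(n) = log10(1 + 1/n) for n from 10^(k-1) to 10^k - 1.

```python
import math
from collections import Counter

def benford_probs(k=1):
    """P(first k digits == n) = log10(1 + 1/n), n = 10**(k-1) .. 10**k - 1."""
    return {n: math.log10(1 + 1 / n) for n in range(10 ** (k - 1), 10 ** k)}

def chi_square_stat(values, k=1):
    """Pearson chi-square statistic for the first k digits of integer values."""
    # Only values with at least k digits have a well-defined k-digit prefix.
    prefixes = Counter(int(str(v)[:k]) for v in values if v >= 10 ** (k - 1))
    n = sum(prefixes.values())
    if n == 0:
        raise ValueError("no values with enough digits")
    stat = 0.0
    for block, p in benford_probs(k).items():
        expected = n * p
        stat += (prefixes.get(block, 0) - expected) ** 2 / expected
    return stat, n

def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from a closed form, then climb with the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)

def benford_p_value(values, k=1):
    """Approximate p-value of the chi-square test against Benford's law."""
    stat, _ = chi_square_stat(values, k)
    df = 9 * 10 ** (k - 1) - 1  # number of digit blocks minus one
    return chi2_sf(stat, df)

def bonferroni_reject(p_values, alpha=0.05):
    """Reject a test when its p-value is below alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return {name: p < threshold for name, p in p_values.items()}

# Made-up p-values for two hypothetical states, for illustration only.
print(bonferroni_reject({"State A": 0.001, "State B": 0.03}))
```

The chi-square approximation gets shaky when expected counts per digit block are small, which is exactly the problem with testing 3 or 4 digits on about 30 counties.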
If we look at Trump’s and Clinton’s vote totals in Iowa, we get the plot below. This is significantly different from Benford’s Law, with a p-value of 0.0205.
Next I looked at each candidate’s vote totals individually. The departure from Benford’s Law is entirely driven by Trump’s vote totals. Trump has way fewer 1’s than expected and more than expected for 2 through 5.
Before you go flipping out about how this is evidence of election fraud, you should look at Iowa from 2012. Basically we see the same thing with the Republican candidate: too few 1’s and more 2’s and 4’s than we expect. My guess as to what is happening here is that the types of counties that Republicans are winning in Iowa just aren’t the kind of data expected to follow Benford’s law. Is that plausible? I’d love to hear other ideas as to what is happening in Iowa.
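One way to see why that guess might be plausible: Benford-like behavior generally needs values spread over several orders of magnitude, and a batch of similar-sized counties doesn't have that. The toy Python simulation below illustrates the idea with lognormal draws. It is purely hypothetical, with a made-up median and made-up spreads, and says nothing about what actually happened in Iowa.

```python
import math
import random

def leading_digit(x):
    """Leading decimal digit of a positive float, read off scientific notation."""
    return int(f"{x:e}"[0])

def share_of_ones(sigma, n=20000, median=3000.0, seed=0):
    """Share of simulated values whose leading digit is 1.

    Values are lognormal around a hypothetical median; sigma is the spread
    on the natural-log scale. None of this is fitted to real county data.
    """
    rng = random.Random(seed)
    mu = math.log(median)
    draws = (rng.lognormvariate(mu, sigma) for _ in range(n))
    return sum(leading_digit(x) == 1 for x in draws) / n

narrow = share_of_ones(sigma=0.2)  # totals all of roughly the same size
wide = share_of_ones(sigma=3.0)    # totals spanning several orders of magnitude
print(f"narrow spread: {narrow:.3f}, wide spread: {wide:.3f}, Benford: {math.log10(2):.3f}")
```

In the narrow case almost every simulated total starts with 2, 3, or 4, so 1's are scarce; in the wide case the share of 1's lands close to the Benford value.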
Now let’s look at Mississippi. When we look at Clinton and Trump together, nothing is significant, though we do see far fewer 1’s than expected, just like in Iowa.
When we look at Trump and Clinton individually, we see that Clinton’s vote totals are not significantly different from what Benford’s Law expects, but Trump’s are very different again, with far too few 1’s and way too many 4’s.
Finally, here are the plots for the same tests for 2012, again using a Bonferroni correction to control the FWER. Iowa shows up again for 1 digit, but Oregon shows up for 2 digits.
In conclusion, Benford’s law is fun and there’s something weird about Iowa and Mississippi.