Author Archives: statsinthewild
My completely uninformed prediction for the election
I’m a guy on the internet. So naturally I have uninformed opinions. And because this is the internet, I’d like to share them with everyone because….well, what’s the point of the internet if I can’t post uninformed opinions? So here I go.
Here’s what I’ve been looking at recently. I took the polls that fivethirtyeight has listed and I took only the polls that have a grade of 2.5 or higher. I aggregated each of those polls to get a single prediction based on pooling all the polls in a sliding 15 day period. I then bootstrapped some error clouds around these predicted lines and you get the following results for swing states (plus a few bonus states) if you use polls for registered voters:
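For anyone wondering what "pooling in a sliding 15 day period" and "bootstrapped error clouds" could look like in code, here is a minimal base R sketch. It is not the code behind the plots: the data frame is made-up toy data, the column names (`end_date`, `n`, `grade`, `harris`, `trump`) and both helper functions are hypothetical, and weighting by sample size is an assumption on my part.

```r
# Toy data: one row per poll. All numbers are made up for illustration.
set.seed(538)
polls <- data.frame(
  end_date = as.Date("2024-10-01") + sample(0:29, 40, replace = TRUE),
  n        = sample(400:1200, 40, replace = TRUE),  # sample size
  grade    = round(runif(40, 1.5, 3), 1),           # pollster rating out of 3
  harris   = round(rnorm(40, 48, 2), 1),            # percent
  trump    = round(rnorm(40, 47, 2), 1)             # percent
)

# Keep only the highly rated pollsters.
good <- polls[polls$grade >= 2.5, ]

# Polls that ended in the `width` days up to and including `day`.
in_window <- function(d, day, width = 15) {
  d[d$end_date <= day & d$end_date > day - width, ]
}

# Pooled Harris-minus-Trump margin, weighted by sample size (an assumption).
pooled_margin <- function(d, day, width = 15) {
  w <- in_window(d, day, width)
  if (nrow(w) == 0) return(NA_real_)
  weighted.mean(w$harris - w$trump, w$n)
}

# Bootstrap: resample whole polls with replacement, re-pool, take percentiles.
boot_margin <- function(d, day, width = 15, B = 500) {
  w <- in_window(d, day, width)
  if (nrow(w) == 0) return(c(NA_real_, NA_real_))
  sims <- replicate(B, {
    idx <- sample(nrow(w), replace = TRUE)
    weighted.mean(w$harris[idx] - w$trump[idx], w$n[idx])
  })
  unname(quantile(sims, c(0.025, 0.975)))
}

days <- seq(as.Date("2024-10-15"), as.Date("2024-10-30"), by = "day")
est  <- sapply(seq_along(days), function(i) pooled_margin(good, days[i]))
band <- t(sapply(seq_along(days), function(i) boot_margin(good, days[i])))
result <- data.frame(day = days, margin = est, lo = band[, 1], hi = band[, 2])
```

Plotting `margin` against `day` with a ribbon from `lo` to `hi` gives the line-plus-cloud picture. Resampling whole polls only captures poll-to-poll variation, so treat the band as a rough guide rather than a proper interval.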

Based on these results, it looks like Pennsylvania and Wisconsin aren’t all that close and Harris has had them locked up for weeks with very little movement at all. Whereas the other swing states, Georgia, Michigan, Nevada, and North Carolina all seem incredibly close. Which gives us basically this map:

With this map all Kamala needs to win is one of Michigan and Nevada OR North Carolina OR Georgia. But if we look at likely voters, the story is different. (How do they determine likely voters? Do they ask respondents? Or do they model that?).
Here, Georgia is safely in Trump’s column but the other five swing states are all statistically tied.

This would give us this map:

And this would mean all Trump needs to do to win would be to win any two of Wisconsin, Michigan, Pennsylvania, and North Carolina. But you didn’t come here for insight or rational discussion. This is the presidential race in 2024 where facts don’t matter and numbers are just completely made up. So what do I think will happen? I present to you my official completely uninformed prediction for the 2024 presidential race. I think Kamala wins the popular vote by 3 points and we end up with this map. And she wins by two electoral votes (and then there are 3 faithless electors and Trump wins anyway because….everything is stupid).

In all seriousness, no one knows what is going to happen. And polling is incredibly complicated in 2024. Response rates for polls used to be in the SIXTIES. Today? It’s less than 1 percent. I think about this all the time. Think about this question: “Is it even possible for a pollster to reach you?”. For me the answer is 100% no. I don’t have a land line and I’m not answering my cell phone for a number I don’t know. I’m not gonna respond to an email from a pollster. So I genuinely will never be a poll respondent. Now think about how true this is for everyone you know. Polling in 2024 is really difficult. That’s why even with all this polling we still have no idea who’s going to win. It is important to note that in both 2016 and 2020 Trump was underestimated in polls. Pollsters have attempted to correct for this by using something called “recall-vote weighting” which “in practice inflates Trump’s support”. So it’s very likely that Trump’s support is being overestimated in many of these polls. And if it’s overestimated by even a point or two, Trump is going to lose very badly. But it might work! No one knows!
But I’m not a coward like Nate Silver, so I’m making a prediction: I’m taking Kamala 271 to 267. But also I am a coward and Trump might win. I have no idea what’s going to happen. But, rest assured, it will all end with chaos no matter who wins.
Cheers.
Jon Stewart is wrong and Joe Rogan is right (mostly).
Just a quick note unrelated to statistics.
Tony Hinchcliffe recently made a “joke” about Puerto Rico that might sway the 2024 election. And people are really mad. Hinchcliffe is being universally trashed (see what I did there) over this. But he has an unlikely defender: Jon Stewart. But Jon Stewart is wrong.
Choosing to think Tony Hinchcliffe is funny or not is a matter of personal taste. (I personally think he’s terrible, but that’s just one old man’s opinion). But the larger issue here is that context matters. Tony Hinchcliffe wasn’t on a comedy stage. He wasn’t on a celebrity roast. He wasn’t recording a comedy podcast. He was at a political rally (A political rally, mind you, for a guy who was recently called a fascist by an actual retired US general. That’s no small thing).
The context matters! And Joe Rogan somehow knows this better than liberal hero Jon Stewart. While Rogan also defends Hinchcliffe, he also claims he told Hinchcliffe: “It’s a political rally, and you’re doing jokes like you’re in a comedy club. Don’t do it!”
(Rogan goes on to say: “I’ve gotta tell you, that joke kills at comedy clubs. I don’t like the joke, but it kills.” What the fuck?!!?!? That joke KILLS? It’s genuinely a bad joke. It doesn’t even make sense. I don’t even understand why it’s a joke. The only way I can even conceive of it being funny is if you have some pretty terrible feelings about Puerto Rico and Puerto Ricans.)
My point is this: I’m so sick of these right wing “comedians” spewing the most racist hateful shit you’ve ever heard and then hiding behind: “It’s just a joke” and “People are so easily offended these days”. Fucking. Stop.
Finally, I think Sean O’Connor sums up the entire situation better than anyone possibly could:

Cheers.
I have no idea who is going to win, but I’d rather be Kamala than Trump right now (despite what the markets and poll aggregators are saying).
So, there are less than two weeks until election day and Trump is “leading”. FiveThirtyEight gives Trump a 54% chance to win the election. And betting markets also have Trump as a favorite to win (Polymarket gives Trump a 66.1% chance to win). And Trump might win. But…..I don’t see it. I think I’d rather be Harris right now than Trump. (But also I’m always wrong.)


Let’s take a look at Michigan as an example. Polymarket has Trump with a 53% chance to win and FiveThirtyEight has Michigan tied (with Harris leading by 0.4% in their poll aggregator).


But when you dig into the most recent polls, there is a wildly different story. Take a look at the polls that FiveThirtyEight lists from Oct 16 through today. There are six different polling companies listed here. Two of these six have Harris leading just at about the margin of error (Quinnipiac and Bloomberg). According to FiveThirtyEight’s ranking of pollsters, based on their track record of accuracy, Quinnipiac is 2.8/3 and Morning Consult (for Bloomberg) is 1.9/3. The other four polls listed here are Trafalgar Group, The Telegraph, Patriot Polling, and InsiderAdvantage, which all have Trump tied or leading. One of these, Trafalgar Group, is designated by FiveThirtyEight as “Republican-funded” and known to skew red.

The other three are:
- The Daily Telegraph: A British Conservative newspaper that endorsed Boris Johnson in 2019. (This poll was conducted by Redfield and Wilton Strategies which is rated 1.8/3 by FiveThirtyEight).
- Patriot Polling: A “non-partisan organization” that was literally founded by high school students and has a FiveThirtyEight rating of 1.1/3.
- Insider Advantage: The founder of this company was the on air pollster for Sean Hannity. FiveThirtyEight gives this polling company a 2/3.

So, all I’m trying to say is that the polls that have been flooding the news recently are openly partisan and right leaning. If you look at only the top rated pollster in the rankings from FiveThirtyEight (Siena College), the story from them is different than the Polymarket map in two big ways: Wisconsin and Pennsylvania. Siena College has Harris leading in both those states and kind of comfortably leading in Pennsylvania. In this scenario, it all comes down to Michigan which has flip flopped between blue and red in the last two elections. But factor in that in the Senate race Slotkin, the Democrat, is leading by 4-5 points, which is advantage Harris. I just have a hard time imagining a huge number of people voting for Slotkin and Trump. But who knows. I’m always wrong.
The point is this: Boy howdy, this is going to be close. Buckle up and strap in because we aren’t gonna know who won for days (weeks?) after election night. But I’d rather be Harris right now than Trump.

Cheers.
Some thoughts on this election and the polls.
So someone pointed me to this article, “Weaponized Polling Is More Dangerous Than Ever” where the author argues that right leaning pollsters (or “pollsters”) are conducting polls that intentionally make Trump look like he’s in a stronger position than he is. (The author goes on to offer what I consider to be compelling reasons why they are doing this.)
But I wanted to check this for myself. So, I went and got the poll data from fivethirtyeight.com (Click here to download the latest polling data from fivethirtyeight as a .csv file). Fivethirtyeight also includes grades for each of the pollsters based on past performance.
The lowest rated polls in fivethirtyeight are:
- McLaughlin – 0.5
- Peak Insights – 0.6
- Research America – 0.7
- The Political Matrix/The Listener Group – 0.7
- Trafalgar Group – 0.7
- Hendrix College – 0.9
- Tulchin Research – 0.9
On the other end of the spectrum, fivethirtyeight gives only 8 pollsters their top grade of 3: ABC/Washington Post, Marquette Law School, McCourtney Institute/YouGov, Siena/NYT, The Washington Post, YouGov, and YouGov Blue. Many of the top university polls have just slightly lower ratings like Emerson, Marist, Suffolk, UMass – Lowell (all 2.9 out of 3). In total, there are 158 pollsters with 80 of them rated 2.0 or higher.
So what I want to look at is how the polls of different quality score the race right now. So here is what I did (with Pennsylvania as an example): I took all the polling data and removed “duplicate” polls (I don’t quite understand why 538 seems to have these duplicates in their file, but there are records in the data that look exactly the same to me except for the percentage. So, I included only one of these records when there were multiple. If anyone can explain the difference in these records to me, I’m all ears). I then only kept polls consisting of “likely voters”. I then took all those polls over a certain period of time and combined them to come up with one single estimate for each candidate (basically a mini meta-analysis). I then computed these estimates by filtering on a range of polling grades (i.e. 0.5-1.5, 1-2, 1.5-2.5, 2-3) and I plotted these estimates (with fun little error tails), on a plot. Here is Pennsylvania for August, September, and October, and then September and October pooled together at the end:
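The steps above (drop duplicates, keep likely voters, pool within a grade band) can be sketched in a few lines of base R. This is a rough illustration under stated assumptions, not the actual analysis: the toy numbers are invented, the column names (`poll_id`, `population`, `grade`, `n`) are guesses rather than the real headers in the 538 file, `pool_band()` is a hypothetical helper, and treating the pooled polls as one big binomial sample is a simplification.

```r
# Toy stand-in for the 538 file. All values are made up for illustration.
polls <- data.frame(
  poll_id    = c(1, 1, 2, 3, 4, 5, 6),
  population = c("lv", "lv", "lv", "rv", "lv", "lv", "lv"),
  grade      = c(2.8, 2.8, 0.7, 2.9, 1.9, 3.0, 1.1),
  n          = c(800, 800, 1000, 900, 600, 1200, 500),
  harris     = c(49, 48, 46, 50, 47, 50, 45),
  trump      = c(47, 47, 48, 46, 48, 47, 48)
)

# 1. Keep one record per poll when several look like duplicates.
polls <- polls[!duplicated(polls$poll_id), ]

# 2. Keep likely-voter polls only.
lv <- polls[polls$population == "lv", ]

# 3. Pool every poll whose grade falls in [lo, hi], weighting by sample size.
pool_band <- function(d, lo, hi) {
  b <- d[d$grade >= lo & d$grade <= hi, ]
  n_tot <- sum(b$n)
  p_h <- sum(b$harris / 100 * b$n) / n_tot
  p_t <- sum(b$trump  / 100 * b$n) / n_tot
  data.frame(
    band = paste0(lo, "-", hi), polls = nrow(b),
    harris = 100 * p_h, trump = 100 * p_t,
    # Binomial standard errors, treating the pool as one big sample.
    se_harris = 100 * sqrt(p_h * (1 - p_h) / n_tot),
    se_trump  = 100 * sqrt(p_t * (1 - p_t) / n_tot)
  )
}

# The four overlapping grade bands used in the post.
bands <- rbind(pool_band(lv, 0.5, 1.5), pool_band(lv, 1, 2),
               pool_band(lv, 1.5, 2.5), pool_band(lv, 2, 3))
```

Each row of `bands` is one point on the plot, and roughly two standard errors either side would give the "error tails". Because the bands overlap, the same poll can land in two rows, so neighboring estimates are not independent.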




What you’ll notice is that the higher the rating on the filter of which polls to include, the higher the estimate is for Kamala, while Trump remains relatively consistent across polls. If you look at the last plot (September and October pooled together), the lowest rated polls have Kamala trailing in Pennsylvania by almost 2 points. The top rated polls have Kamala leading by a point. So, based on this it looks to me like…..Pennsylvania isn’t actually that close. Kamala is consistently leading in the top rated polls and the polls in 2022 were, according to 538, “historically accurate“. So this is good news for Kamala. But there also appears to be some not so great news for Kamala, too. Let’s look at the other swing states, which I consider to be Michigan, Wisconsin, Georgia, Arizona, Nevada, and North Carolina (North Carolina is a swing state. Fight me.) Let’s start with the rest of the Midwest (Pennsylvania is Midwest. Fight me.)
Wisconsin




We again see the same pattern where higher rated pollsters have Kamala at a higher percentage, exactly like we saw in Pennsylvania. The bad news for Kamala in Wisconsin is that her lead using only August data was about 1.5, in September it was down to 1, and with the limited data in October, she is trailing in Wisconsin by about 0.5. And we see the same thing in Michigan.
Michigan
Kamala’s lead in Michigan in the top rated polls in August was over 4 points, in September it was down to 1 point, and using only October data from the top pollsters, just like in Wisconsin, she is trailing by 0.5 points. The aggregate of September and October still has her in the lead, but the trend has to be scary to the Harris campaign. Note that once again we see the lower rated pollsters with Kamala much lower than the highest rated pollsters.





Finally, let’s look at the last 4 swing states. What’s interesting in Georgia is that the top rated polls give Trump more of a lead than the lower rated polls, while Kamala’s percentage remains largely the same across pollster rating groups. Nevada and North Carolina follow the pattern that we saw in the Midwest with higher rated polls giving Kamala a higher percentage. Based on the top rated pollsters, Kamala is slightly leading in Nevada and just barely trailing in North Carolina (which I think Dems can win given the absolute lunatic running for governor).
Georgia

Arizona

Nevada

North Carolina

All this said, I give you an electoral map that I think is totally plausible based on this polling data that I guarantee you will not see anywhere else. Behold:

Kamala wins Pennsylvania. Wins the popular vote by 3 points. And loses the election.
What. A. Nightmare.
Cheers.
A US drought is over
American Frances Tiafoe defeated Dimitrov in the US Open quarterfinals last evening. Earlier yesterday another American, Taylor Fritz, beat Zverev to also advance to the semifinals. They will face each other on Friday, guaranteeing an American will make the finals for the first time since 2006 when Andy Roddick made it to the finals and lost to some guy named Roger Federer. In honor of this feat, I wasted all morning making the graphic below. I’ve posted my bat shit crazy code at the bottom. Suggestions on how to do this better are welcome.
Cheers.

#How do I do this better?
#https://www.kaggle.com/datasets/zhongtr0n/country-flag-urls
library(ggplot2)
library(ggimage)
library(tidyverse)
# d <- data.frame(x = rnorm(10),
# y = rnorm(10),
# image = sample(c("https://www.r-project.org/logo/Rlogo.png",
# "https://www.worldometers.info//img/flags/small/tn_af-flag.gif"),
# size=10, replace = TRUE)
#)
# # plot2
# ggplot(d, aes(x, y)) + geom_image(aes(image=image), size=.05)
finals <- read.csv("/Users/gregorymatthews/usopenfinals.csv")
flags <- read.csv("/Users/gregorymatthews/Downloads/flags_iso.csv")
# Convert tennis-style country codes to the ISO alpha-3 codes used in the flags file.
finals <- finals %>% mutate(Country = ifelse(Country == "CRO","HRV",Country))
finals <- finals %>% mutate(Country = ifelse(Country == "SUI","CHE",Country))
finals <- finals %>% mutate(Country = ifelse(Country == "TCH","CZE",Country))
finals <- finals %>% mutate(Country = ifelse(Country == "FRG","DEU",Country))
finals <- finals %>% mutate(Country.1 = ifelse(Country.1 == "GER","DEU",Country.1))
finals <- finals %>% mutate(Country.1 = ifelse(Country.1 == "RSA","ZAF",Country.1))
finals <- finals %>% mutate(Country.1 = ifelse(Country.1 == "SUI","CHE",Country.1))
# Attach a flag URL for each finalist; the two joins create columns URL.x and URL.y.
finals <- finals %>% left_join(flags %>% select(Alpha.3.code,URL), by = c("Country" = "Alpha.3.code"))
finals <- finals %>% left_join(flags %>% select(Alpha.3.code,URL), by = c("Country.1" = "Alpha.3.code"))
finals
sub <- finals %>% filter(Year >= 2000)
sub$size1 <- sub$size2 <- 0.05
sub$size1[sub$Country == "CHE"] <- 0.03
sub$size2[sub$Country.1 == "CHE"] <- 0.03
sub <- sub %>% mutate(`US finalist` = ifelse(Year == 2024 | Year <= 2006,"Yes","No"))
# Write the plot to a tall, narrow PNG.
png("/Users/gregorymatthews/test.png",res = 300, height = 10, width = 3, units = "in")
p <- ggplot(sub) + geom_image(aes(x = Year, y = 1,image=URL.x), size=sub$size1) +
  geom_image(aes(x = Year, y = 1.1,image=URL.y), size=sub$size2) +
  coord_flip() +
  theme(aspect.ratio=4) +
  expand_limits(y = c(.9,1.2)) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) + ggtitle("US Open Tennis Finalists") +
  scale_x_continuous(breaks = seq(2000,2025, by = 1),
                     minor_breaks = NULL) +
  scale_y_continuous(breaks = c(-100,100)) +
  # Red band: years with no US finalist. Green bands: years with one.
  annotate("rect", ymin = .9, ymax = 1.2, xmin = 2006.5, xmax = 2023.5, alpha=0.2, fill="red") +
  annotate("rect", ymin = .9, ymax = 1.2, xmin = 1999.5, xmax = 2006.5, alpha=0.2, fill="green") +
  annotate("rect", ymin = .9, ymax = 1.2, xmin = 2023.5, xmax = 2024.5, alpha=0.2, fill="green")
# Print explicitly so the plot is also drawn when the script is run with source().
print(p)
dev.off()
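One possible answer to "how do I do this better?": the seven near-identical `mutate()` lines could be replaced by a single lookup vector applied to both columns. The sketch below is base R only and uses a tiny stand-in for the finals table. `to_iso` and `recode_iso()` are names made up for illustration, and using one shared lookup for both columns is an assumption (the script above recodes a slightly different set of codes in each column).

```r
# Tiny stand-in for the finals table, just to show the recoding step.
finals <- data.frame(
  Year      = c(2003, 2008, 2014),
  Country   = c("USA", "SUI", "CRO"),
  Country.1 = c("ESP", "GBR", "JPN"),
  stringsAsFactors = FALSE
)

# One lookup from tennis-style codes to ISO alpha-3 codes.
to_iso <- c(CRO = "HRV", SUI = "CHE", TCH = "CZE", FRG = "DEU",
            GER = "DEU", RSA = "ZAF")

# Replace any code found in the lookup; leave everything else untouched.
recode_iso <- function(x) {
  hit <- x %in% names(to_iso)
  x[hit] <- unname(to_iso[x[hit]])
  x
}

finals$Country   <- recode_iso(finals$Country)
finals$Country.1 <- recode_iso(finals$Country.1)
```

Adding a new code then means adding one entry to `to_iso` instead of another `mutate()` line.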
My Kaggle Elite 8 probs for Men and Women
Lost a few spots yesterday. Currently in 75th. I still need UConn and South Carolina to win it all in the Men’s and Women’s tournament, respectively.

Pre tournament probabilities that the remaining men’s teams would reach the Final 4:
UConn: 56.0% (100% in my aggressive Bracket)
Purdue: 51.8%
Tennessee: 18.2%
Alabama: 10.5%
Illinois: 8.11% (0% in aggressive Bracket)
Duke: 6.23%
Clemson: 2.87%
NC State: 0.87%
Pre tournament probabilities that the remaining women’s teams would reach the Final 4:
South Carolina: 82.5% (100% in my aggressive Bracket)
Oregon St: 8.09% (0% in my aggressive Bracket)
Texas: 28.0%
NC State: 13.8%
LSU: 18.0%
Iowa: 28.4%
UConn: 27.5% / Duke: 1.37%
USC: 33.4% / Baylor: 4.21%
Cheers.
Kaggle March Madness Update – March 26, 2024
I thought I finally understood the scoring for this contest, but I can’t reproduce my score on the leaderboard. So I have no idea. But I’m at 141 after the first weekend. Unlikely to cash, looking to just finish top 100.

Sweet Sixteen probabilities for women: (Men’s Sweet Sixteen probabilities are here):
South Carolina: 91.9% (100% in aggressive bracket)
Indiana: 5.83% (0% in aggressive bracket)
Oregon St: 50.1%
Notre Dame: 41.9%
Gonzaga: 21.3%
Texas: 56.8%
NC State: 27.9%
Stanford: 65.4%
USC: 60.1%
Baylor: 12.2%
Duke: 5.1%
UConn: 47.2%
Colorado: 17.1%
Iowa: 56.6%
LSU: 34.0%
UCLA: 57.2%
Cheers.
Kaggle March Madness Update
Well, I’ve gone up in the rankings every scoring update. From 233rd to 170th to currently in 126th. Just about top 15%. That’s positive. And with the new scoring system, there is a ton of room to move up (and DOWN) very quickly.



Probabilities for women’s games today (remember these are pre-tournament probabilities that these teams would reach the Sweet Sixteen, NOT single game win probabilities):
Notre Dame: 79.33%
Ole Miss: 17.45%
NC State: 78.33%
Tennessee: 20.37%
Syracuse: 18.67%
UConn: 76.76%
Oklahoma: 26.11%
Indiana: 69.29%
West Virginia: 11.69%
Iowa: 83.15%
Creighton: 9.48%
UCLA: 85.15%
Kansas: 9.57%
USC: 85.8%
Utah: 40.58%
Gonzaga: 55.04%
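To make the "reach a round" vs. "win this game" distinction concrete, here is a small sketch of how single-game win probabilities chain into reach-a-round probabilities. It uses a made-up 4-team pod with invented numbers and assumes game outcomes are independent. It is only an illustration of the idea, not the model behind the probabilities listed here.

```r
# Made-up single-game win probabilities for a hypothetical 4-team pod.
# win[i, j] = probability that team i beats team j.
teams <- c("A", "B", "C", "D")
win <- matrix(c(  NA, 0.80, 0.60, 0.90,
                0.20,   NA, 0.35, 0.55,
                0.40, 0.65,   NA, 0.75,
                0.10, 0.45, 0.25,   NA),
              nrow = 4, byrow = TRUE, dimnames = list(teams, teams))

# Round 1: A plays B, C plays D. Reaching round 2 means winning one game.
reach2 <- c(A = win["A", "B"], B = win["B", "A"],
            C = win["C", "D"], D = win["D", "C"])

# Round 2: the A/B winner plays the C/D winner. To reach round 3 a team must
# reach round 2 AND beat whichever opponent shows up, weighted by how likely
# each opponent is to be there.
opponents <- list(A = c("C", "D"), B = c("C", "D"),
                  C = c("A", "B"), D = c("A", "B"))
reach3 <- sapply(teams, function(tm) {
  opp <- opponents[[tm]]
  reach2[[tm]] * sum(reach2[opp] * win[tm, opp])
})
```

In this toy pod team A wins its first game 80% of the time, but its probability of getting out of the pod (`reach3[["A"]]`) is lower than that, which is why the numbers in these lists look smaller than single-game win probabilities would.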
And here are my probabilities for the men’s games next Thursday and Friday:
Thursday
Clemson: 7.75%
Arizona: 26.47%
San Diego St: 6.04% (0% in my aggressive bracket)
UConn: 71.53% (100% in my aggressive bracket)
Alabama: 21.37%
North Carolina: 51.43%
Illinois: 27.09%
Iowa St: 46.87%
Friday
NC State: 4.52%
Marquette: 38.57%
Gonzaga: 6.86%
Purdue: 69.26%
Duke: 12.87%
Houston: 69.82%
Creighton: 21.28%
Tennessee: 44.22%
In my aggressive bracket I have UConn and South Carolina winning the men’s and women’s titles, respectively, with probability 100%. If that happens I pick up 0 loss for all those games in all 6 rounds. And 0 loss in the finals would be absolutely massive in reducing my score as it would be a full 1/6th of my score with 0 loss. So let’s go UConn and South Carolina!
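Here is a rough sketch of why the 100% bet is all-or-nothing, assuming a squared-error (Brier-style) loss per prediction. That loss is an assumption for illustration; it is not claimed to reproduce the actual leaderboard scoring. The three hedged probabilities are UConn numbers quoted in these posts, and `brier()` is a hypothetical helper.

```r
# Squared-error (Brier-style) loss for one prediction:
# p = predicted probability the team reaches the round, y = 1 if it did, else 0.
brier <- function(p, y) (p - y)^2

# Three of UConn's pre-tournament probabilities quoted in these posts,
# next to the aggressive bracket's 100%.
hedged     <- c(0.8883, 0.7153, 0.560)
aggressive <- c(1, 1, 1)

# Loss on those predictions if UConn keeps advancing...
loss_if_advances <- data.frame(hedged     = brier(hedged, 1),
                               aggressive = brier(aggressive, 1))

# ...and if UConn had been knocked out instead.
loss_if_eliminated <- data.frame(hedged     = brier(hedged, 0),
                                 aggressive = brier(aggressive, 0))
```

Under this loss the aggressive bracket scores exactly 0 on every UConn prediction that comes true and the maximum of 1 on every one that does not, while the hedged bracket always pays a little either way.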
Cheers.
March 24 Kaggle Probabilities
Currently in 170th. Better than yesterday.

My Kaggle probabilities for the men’s games today:
(Remember, these aren’t probabilities that these teams will win this particular game, it’s the pre-tournament probability that they would make it to a particular round.)
Colorado: 18.3%
Marquette: 60.27%
Purdue: 84.1%
Utah St: 7.62%
James Madison: 11.67%
Duke: 50.77%
Clemson: 17.55%
Baylor: 53.01%
Grand Canyon: 13.95%
Alabama: 49.01%
Northwestern: 4.51% (0% in my aggressive bracket)
UConn: 88.83% (100% in aggressive bracket)
Texas A&M: 6.36%
Houston: 83.74%
Yale: 3.25%
San Diego St: 31.47%
My Kaggle probabilities for the women’s games today:
(Remember, these aren’t probabilities that these teams will win this particular game, it’s the pre-tournament probability that they would make it to a particular round.)
Colorado: 42.38%
Kansas St: 53.03%
South Carolina: 97.7% (100% in aggressive bracket)
North Carolina: 0.79% (0% in my aggressive bracket)
Ohio St: 77.5%
Duke: 19.72%
MTSU: 2.73%
LSU: 78.24%
Nebraska: 10.95%
Oregon St: 83.56%
Alabama: 10.63%
Texas: 82.88%
Iowa St: 6.96%
Stanford: 89.52%
Baylor: 36.57%
Virginia Tech: 57.18%
Go UConn.
Cheers.
NCAA Tournament Stuff – March 23, 2024
The real stories everyone should be talking about are how David Carr beat Keegan O’Toole in the semi-finals of the NCAA wrestling tournament at 165, Carter Starocci had to wrestle former national champions back to back in the quarter- (Mekhi Lewis) and semi-finals (Shane Griffith) and he beat them both, and Drake Ayala of Iowa made the finals to keep Iowa’s streak of THIRTY THREE years in a row with a national finalist alive.
But you probably want to read about basketball, huh?
So here’s a fun, simple data viz of wins by seed:
Below is a plot of wins by seed in the first round of the men’s NCAA tournament. The 6 and 8 seeds combined for only two total wins ((6)Clemson and (8)Utah State). At least one of every seed 1 through 14 advanced and seeds 9 through 12 went a combined 9-7. In total the lower seed won 11 of the first 32 games.

As for the Kaggle contest, I’m currently in 233 out of 820 (top 30%). Not great, but also not terrible either. There should be a lot more shuffling of the leaderboards in these later rounds with the changes in scoring, so it’s going to be way more exciting than past contests, I think. At least in terms of chaos.

Here are my probabilities for the women’s games today:
Tennessee: 78.2%
UConn: 100%
Kansas: 55.6%
Indiana: 98.0%
Notre Dame: 99.1%
NC State: 98.4%
Iowa: 100%
Syracuse: 72.0%
Oklahoma: 76.0%
USC: 99.3%
Ole Miss: 72.7%
West Virginia: 63.0%
Creighton: 56.2%
Gonzaga: 96.9%
UCLA: 99.0%
Utah: 76.0%
And here are my Kaggle probabilities for the men’s games today:
(Remember, these aren’t probabilities that these teams will win this particular game, it’s the pre-tournament probability that they would make it to a particular round.)
Arizona: 51.6%
Dayton: 19.5%
Kansas: 47.4%
Gonzaga: 35.1%
UNC: 73.9%
Michigan St: 13.3%
NC State: 14.0%
Oakland: 1.78%
Iowa St: 71.6%
Washington St: 11.9%
Tennessee: 69.5%
Texas: 17.7%
Illinois: 55.5%
Duquesne: 5.83%
Creighton: 47.5%
Oregon: 16.8%
Finally, and most importantly, my numbers (8 and 2) hit in the Utah State-TCU game in NCAA squares. So worst case scenario, I only lose half my square buy-in. Lettttttt’s gooooooooooooo.
Cheers.