Lies, Damn Lies, and Trump Stats

The White House just posted this tweet:

The “readers added context” gets this correct as a misrepresentation. It’s actually only a 1.2% increase. The issue is a truncated y-axis. Don’t do this with bar charts!

Preserved here if this ever gets taken down:

Boy, this Donald Trump fellow sure does seem dishonest.

This reminds me of the famous Fox News graph “If the Bush Tax Cuts Expire”.

Man, it’s almost like all this dishonesty is coming from one side and they are doing it on purpose……

Cheers.

TIL: facet_grid edition

Today I learned how to get rid of the pesky “1” on the top of a ggplot facet_grid if I want to stack the plots vertically. Use facet_grid(rows = vars(group)). See my example below:

#Ugh. Why is there a 1?
rbind(PSold %>% filter(Age == "65+") %>% mutate(STI = "Syphilis"),
GCold %>% filter(Age == "65+") %>% mutate(STI = "Gonorrhea"),
CTold %>% filter(Age == "65+") %>% mutate(STI = "Chlamydia")) %>%
ggplot(aes(x = Year, y = Rate)) +
geom_point() +
geom_line() +
theme_bw() +
facet_grid(STI~1, scale = "free_y")
#Get rid of the pesky 1
rbind(PSold %>% filter(Age == "65+") %>% mutate(STI = "Syphilis"),
GCold %>% filter(Age == "65+") %>% mutate(STI = "Gonorrhea"),
CTold %>% filter(Age == "65+") %>% mutate(STI = "Chlamydia")) %>%
ggplot(aes(x = Year, y = Rate)) +
geom_point() +
geom_line() +
theme_bw() +
facet_grid(rows = vars(STI), scale = "free_y")

Cheers.

The University of Nebraska Board of Regents is a dumpster fire

When I was a kid, I used to think that if someone was in a high up position that they must be really smart and competent. Now that I’m in my 40s, it seems like it’s almost the opposite. Most of the people I meet or read about who are in positions of leadership, I’m wildly unimpressed by. But I could not possibly imagine the level of incompetence of the Board of Regents for the University of Nebraska.

They recently cut their statistics program along with a few others based on metrics showing these program were underperforming, but there really weren’t many details of what these metrics were or how they were computed. However, AMSTAT News recently published an interview with two professors from the University of Nebraska-Lincoln on what happened (You can read the full article here). In that article, they detail some of the flaws in the calculation of these metrics, but one of these flaws stands out above all the others as the peak of dip shittery.

So, one of the metrics the Board of Regents used in their calculation was graduation rate. Graduation rate is the number of graduates divided by number of majors. Simple enough. Now for the statistics department, this number was 0%. Why? Because the department hadn’t yet existed for 4 years.

Here is the relevant paragraph from the AMSTAT article:

And just as a nice added middle-finger to the statistics department, the Board of Regents decided to make their own definition of what an outlier is.

There is no nice way to say this: these people are world class unimpressive.

Cheers.

Smoothing Splines.

AI fails again

I was trying to fit a mixed effects model and I waned to compare the model with the same fixed effects with and without a random intercept in an ANOVA. So I wasn’t sure how to do this using lmer from the lme4 package. So I googled it:

It gave me the following answer with code:

# Example data
data <- data.frame(
  response = rnorm(100),
  fixed_effect_1 = rnorm(100),
  fixed_effect_2 = rnorm(100)
)

# Fit a linear mixed model with no random effects
model <- lmer(response ~ fixed_effect_1 + fixed_effect_2, data = data)

# Display the model summary
summary(model)
I was pretty sure this wouldn't work, but I tried it anyway.  And it doesn't work.  You can't fit an lmer model without specifying random effects.  

For the record, I’m not sure my question even makes sense. I think I should be comparing these models with AIC rather than ANOVA, now that I think about it for more than a second.

Anyway,

Cheers.

However bad you think the Rockies are, they are worse. Even worse than the White Sox last year (so far).

Update: They swept the Marlins and are now on a 3 game winning streak and sit at 12-50. They are 25 games out of first place with 100 games to play.

Update: They won last night and are now 10-50 as of noon on June 3.

The Rockies are 9-50. That’s 50 losses in 59 games. A winning percentage of .153. That’s the fastest any team has gotten to 50 losses since 1901. But didn’t we just get a new worst ever team last year in the Chicago White Sox? Yup! The Chicago White Sox lost 121 games last year, the most ever. And the Rockies are currently on pace to shat. ter. the White Sox loss record from last year. They are currently on pace to go 25-137. How bad is this? At this point in the season, the White sox were 15-44. SIX games ahead of where the Rockies are right now. To see this, let’s look at some data viz that I made.

Below is a plot of game number versus cumulative wins. Up until about game 30 the Rockies were merely just as bad as the White Sox. Since then they have really stepped down their game thanks to multiple 8 game losing streaks. Speaking of streaks, let’s take a look at that.

Below, you’ll find a plot of game number on the x axis and the win/loss streak on the y-axis. Positive numbers are winning streaks and negative numbers are losing streaks. You can see that the Rockies have already had FOUR 8 game losing streaks. But even more impressive than that is that they’ve only won back to back games a single time this season. Let me repeat this: Their largest win streak of the season is 2 games and it’s happened exactly once. The White Sox last year has winning streak of 2 or more 9 times total and 4 up to this point in the season,. They even had a nifty little 4 game winning streak (They also had losing streaks of 12, 14, and 21(!!!) games).

Now let’s take a look at scoring for the Rockies. Below is a 2d contour plot of for the runs for and runs against for every game from the White Sox 2024 season in the left panel and every game of the current Rockies season. The horizontal and vertical lines are the mean number of runs for and against and the diagonal lines shows whether the game was won or lost by the respective team (above the line is a loss, below the line if a win). What’s really interesting about this is that in the entire season last year, the White Sox only score 10 or more runs TWICE. The Rockies have already done this 3 times. Also, the most runs that the White Sox gave up all season was 14 runes. The Rockies have given up more than 14 runs twice already. (a 17-2 loss to the Brewers and a 21-0 loss to the Padres).

Here is what these plots look like on top of one another. What you’ll notice is that it’s hard to see the vertical line for the White Sox (i.e. their mean number of runs) because it’s nearly identical to the Rockies. The White Sox average 3.13 runs per game last year and the Rockies are just slightly below that at 3.12 runs per game. But where Colorado really “shines” is their defense. While the White Sox gave up an average of just over 5 runs per game (5.02, to be exact), the Rockies are currently allowing, and I can’t believe this is true, 6.25 runs per game. That 1.23 more runs per game on average than the worst team in the modern history of baseball. Incredible work.

If you’re a median type of person, I computed those two. The White Sox last year scored a median of 3 runs and allowed a median of 5 runs. For the Rockies this year, they are scoring a median of 2 runs and allowing a median of 6 runs.

Anyway, the point is that the Rockies are [really]+ bad.

My code is below.

Cheers.

rock <- read.csv("/Users/gregorymatthews/Dropbox/statsinthewild/rockies2025_20250602.csv")
ws <- read.csv("/Users/gregorymatthews/Dropbox/statsinthewild/whitesox2024.csv")

names(rock)[1] <- names(ws)[1] <- "gameno"
rock$streak_num <- as.numeric(paste0(substring(rock$Streak,1,1),nchar(rock$Streak)))
rock$W <- as.numeric(unlist(lapply(strsplit(rock$W.L.1,"-"),function(x){x[1]})))
rock$L <- as.numeric(unlist(lapply(strsplit(rock$W.L.1,"-"),function(x){x[2]})))

ws$streak_num <- as.numeric(paste0(substring(ws$Streak,1,1),nchar(ws$Streak)))
ws$W <- as.numeric(unlist(lapply(strsplit(ws$W.L.1,"-"),function(x){x[1]})))
ws$L <- as.numeric(unlist(lapply(strsplit(ws$W.L.1,"-"),function(x){x[2]})))
both <- rbind(rock,ws)

library(tidyverse)
library(teamcolors)
small <- teamcolors %>% filter(name %in% c("Chicago White Sox","Colorado Rockies"))
ggplot(aes(x = gameno, y = W,color = Tm), data =both) +
  geom_path() + 
  theme_bw() +
  scale_color_manual(values = c(small$secondary[1],small$primary[2])) + 
  xlab("Game Number")

ggplot(aes(x = gameno, y = streak_num, col = Tm), data = both) + geom_point() + theme_bw() + facet_grid(Tm~1) + scale_color_manual(values = c(small$secondary[1],small$primary[2])) + xlab("Game Number")

ggplot(aes(x = R, y = RA, color = Tm), data = both) +
  geom_jitter() + 
  geom_density2d() +
  scale_color_manual(values = c(small$secondary[1],small$primary[2])) + 
  theme_bw() + geom_abline(slope = 1, color = rgb(0,0,0,.5)) + 
  geom_vline(aes(xintercept = R, color = Tm), data = both %>% group_by(Tm) %>% summarise(R = mean(R),RA = mean(RA))) + 
  geom_hline(aes(yintercept = RA, color = Tm), data = both %>% group_by(Tm) %>% summarise(R = mean(R),RA = mean(RA))) + coord_fixed() +

both %>% group_by(Tm) %>% summarise(median(R),median(RA))
both %>% group_by(Tm) %>% summarise(mean(R),mean(RA))

both %>% filter(Tm == "COL") %>% pull(R) %>% table()
both %>% filter(Tm == "CHW") %>% pull(R) %>% table()



Kaggle Probs for sweet sixteen games.

Men’s

Alabama over BYU – 72.1%

Duke over Arizona – 90.3%

Michigan St. over Ole Miss – 78.0%

Tennessee over Kentucky – 76.3%

Florida over Maryland – 90.3%

Texas Tech over Arkansas – 62.4%

Auburn over Michigan – 77.0%

Houston over Michigan – 77.0%

Women’s

South Carolina over Maryland – 96.2%

Duke over North Carolina – 72.1%

UCLA over Ole Miss – 95.8%

LSU over NC State – 29.6%

Texas over Tennessee – 91.8%

TCU over Notre Dame – 50.0%

USC over Kansas St. – 96.2%

UConn over Oklahoma – 96.2%

Cheers.

Kaggle Probs for today’s games – 3/22/2025

Women

Iowa: 80.8%

UConn: 100%

Alabama: 88.5%

West Vriginia: 86.4%

NC State: 100%

Oklahoma: 100%

USC: 100%

Oklahoma St: 72.8%

Maryland: 91.3%

Michigan St: 66.4%

North Carolina: 100%

California: 55.7%

Illinois: 55.7%

Florida St: 66.4%

Texas: 100%

LSU: 100%

Men

Purdue: 58.3%

St. John’s: 82.4%

Texas A&M: 50.0%

Texas Tech: 62.4%

Auburn: 78.3%

Wisconsin: 51.0%

Houston: 73.9%

Tennessee: 82.4%

Cheers.

NCAA Probs

Yo. It’s been a while.

Here are my win probabilities for the games today from my kaggle entry this year.

Creighton: 40.6%

Purdue: 75.7%

Wisconsin: 92.4%

Houston: 100%

Auburn: 100%

Clemson: 75.7%

BYU: 59.36%

Gonzaga: 72.8%

Tennessee: 100%

Missouri: 72.8%

UCLA: 69.7%

St. John’s: 94.3%

Michigan: 59.4%

Texas Tech: 89.999566%

Kansas: 69.7%

Texas A&M: 75.7%

Cheers.

Live blogging the 2024 presidential election

2:54am Central Time:

So, uh, what went wrong in that Selzer poll?

10:32pm Central Time:

Just got back from my run. What are we doing America? What the fuck are we doing? The next four years are going to be so bad. So much worse than you can even imagine. It’s just. so. stupid. Good night. I’m gonna go watch a documentary about aliens and then smash my face through a window.

9:24pm Central Time:

I still haven’t gone running. But I haven’t checked the race in a while. It looks much worse than it did before. What the fuck are we doing America? You think this guy is gonna save the economy? Jesus. Christ.

8:27pm Central Time:

I’m going running.

8:18pm Central Time:

It’s just past quarter past 8. Nothing terribly surprising has happened so far. NYT gives Trump a 66% chance to win.

7:54pm Central Time:

Is there ANY other election in the world where the person with the most votes can lose the election? Or is it only the election to pick the most powerful person in the world?

7:44pm Central Time:

The NYT gives Trump a 53% chance to win Pennsylvania. But that prediction right now is based only on pre-election polls and their “model”. Nothing to do with actual results yet. Why even report this?

7:42pm Central Time:

I still think Harris is gonna win Pennsylvania. But it’s gonna be really, really close.

7:36pm Central Time:

Sometimes I reflect on how stupid it is that a few thousand people in Pennsylvania who probably can’t find Canada on a map get to decide who the most powerful person in the world is. Everything is so stupid.

7:35pm Central Time:

Virginia looks to be much closer than it was in 2020. That’s a good sign for Trump. How wild would it be if just everyone got this race completely wrong and Trump wins a place like Virginia and Harris wins something like Iowa or Ohio?

7:27pm Central Time:

In 2023, the Census estimates that about 44000 people from Georgia moved to Florida, around 26000 Michiganders moved to Florida, and about 28000 Pennsylvanians moved to Florida. And something like 9000 people from Wisconsin moved to Florida. Again, based on nothing, I’m guessing thats a vast majority of these people were right leaning. Would it make sense then that all these states get slightly bluer and Texas and Florida get slightly redder?

7:16pm Central Time:

Crazy theory based on nothing: Enough right leaning voters moved to Texas and Florida from swing states in the last 4 years that Trump’s margin in Florida and Texas will beat his margin’s in those states relative to 2020. But all those people leaving give Harris safe wins in swing states like Pennsylvania, Michigan, and Wisconsin (and Iowa?).

7:08pm Central Time:

New York Times has a much better data presentation than NBC News for election results. Great work NYT.

6:55pm Central Time:

Florida — already America’s inner thigh — continues to get redder and redder. And also it’s shifted further right. (“How the pandemic turned Florida red”)

6:44pm Central Time:

I’m seeing the same thing in Georgia as I saw in Indiana: Substantial shifts towards Harris in the suburbs.

For reference, Harris won Douglas County by 25.1% in 2020 and Rockdale by 40.77%. That’s about +7 and +11 with about 70% reporting. Trump in 2020 won Houston, Bartow, Oconee, and Troup by 12.42%, 50.67%, 33.47%, and 21.84%, respectively. That’s shifts of about 1, 0 , -2, and -5 for Trump. He’s about the same in counties he won last time in Georgia, but there are pretty big shifts in the Atlanta suburbs.

6:33pm Central Time:

Trump up 23-3. Kentucky, Indiana, and West Virginia for Trump; Vermont for Kamala. (Can we stop and think about how fucking weird Vermont is for a second?)

6:28pm Central Time:

Going back to Hamilton and Boone County. In 2016, Trump won Boone county by 29.12% and Hamilton by 19.32%. (Note: Boone has 91% of counties reported and Hamilton has 65%). But Trumps support in the Indianapolis suburbs has absolutely cratered since 2016.

6:18pm Central Time:

And…..they’ve called Vermont. Harris is on the board with 3 points. Trump leads 19-3.

6:16pm Central Time:

The north suburbs of Indianapolis are shifting about 8ish points towards Harris so far. That feels like a big deal to me.

  • Boone County 2020: Trump by 18.38%
  • Boone County 2024 (91% reporting): Trump by 10%

  • Hamilton County 2020: Trump by 6.78%
  • Hamilton County 2024 (65% reporting): HARRIS by 1.2%