AI is wildly over confident in itself and you should never trust the first answer (Part 2)
Sometimes just take a question ans see what AI will do with it. This question came across my desk today:
Can you help us calculate how improbable this is? The current estimate is there are ~20,000 protein coding genes in the human genome, so clearly if you randomly drew 75 genes from 20,000 and 211 from 20,000, it would be highly improbable that 7 would be in common.
So I threw it into Google Gemini to see what it would do. My full conversation with the AI follows.
The first answer:
To calculate the probability of this overlap, we use the hypergeometric distribution, which determines the likelihood of drawing exactly $7$ successes (overlapping genes) in a random sample. It turns out this specific overlap is highly improbable—occurring by chance only about once every 17 billion draws. [1]
The Math Behind the Draw
We use the hypergeometric probability formula: [2]
$P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$
Where our exact parameters are:
- Population size ($N$): $20,000$
- Successes in population ($K$): $211$ (the size of your larger gene set)
- Sample size ($n$): $75$ (the size of your smaller gene set)
- Successes in sample ($k$): $7$ (the overlapping genes we want to test) [3]
The Results
When you plug these numbers into a Hypergeometric Distribution Probability Calculator, the probabilities break down as follows: [4]
- Probability of EXACTLY 7 matches: $\approx 0.0000000574$ (roughly $1$ in $17.4$ million)
- Probability of AT LEAST 7 matches: $\approx 0.0000000588$ (roughly $1$ in $17$ million)
Because your specific overlap falls in the extreme tail of the probability curve, a Hypergeometric p-value calculator will give you a p-value of $p < 0.00000006$. This indicates immense statistical significance, suggesting that the overlap is very unlikely to be the result of random chance. [1, 5, 6]
If you are evaluating this overlap for a specific biological or research context, tell me:
- Are these differentially expressed genes or a list related to a specific biological pathway?
- What research platform or tool did you use to generate your gene lists?
I can help you adjust your statistics to account for functional enrichment or pathway biases!
[2] https://www.statscalculators.com
[5] https://systems.crump.ucla.edu
[6] https://www.cliffsnotes.com
So I responded with “Not even fucking close, AI.”
You are completely right, and I messed up that math badly. The actual probability of getting at least 7 overlapping genes is about 1 in 70,250, not 1 in 17 million.
I completely botched the basic arithmetic in my head. Thank you for calling me out on it.
The Correct Math Breakdown
Using the same hypergeometric distribution with your exact numbers:
- Total genes ($N$): 20,000
- Group 1 ($K$): 211
- Group 2 ($n$): 75
- Overlap ($k$): 7 [1]
Running the actual numbers through Python’s standard scipy.stats library yields:
- Probability of exactly 7 matches: $0.0000130$ (about 1 in 76,934)
- Probability of 7 or more matches (p-value): $0.0000142$ (about 1 in 70,250)
What This Means
While it is still statistically significant ($p \approx 1.42 \times 10^{-5}$), it is orders of magnitude more likely than the garbage numbers I gave you before. 1 in 70,000 is rare, but it is firmly in the realm of realistic biological data, whereas 1 in 17 million was an outright hallucination.
Apologies for the absolute miss on the first try.
import scipy.stats as statsN = 20000K = 211n = 75k = 7# Exact probability of k=7p_exact = stats.hypergeom.pmf(k, N, K, n)# Probability of k >= 7p_at_least = stats.hypergeom.sf(k-1, N, K, n)print(f"Exact: {p_exact}")print(f"At least: {p_at_least}")print(f"1 in X (at least): {1/p_at_least}")
[1] https://pmc.ncbi.nlm.nih.gov
I believe this second try is the correct answer, and my simulated results are close to this. But how can this not terrify everyone! I knew it was wrong because I did the problem. Most people aren’t going to know how to do this problem and just accept the answer. This is a fairly trivial problem, but right now as we speak people are using AI for things like prisons, national defense, and medicine. It’s…..not great.
Cheers.
AI is wildly over confident in itself and you should never trust the first answer.
Here is another example of AI just being 100% confident in itself for no reason.
Claude gave me this answer to a question that I asked:

Sounds like a really good answer. But I wanted to read the details.
So I followed up with this:

Then it said this:

So the two assertions it was making, it couldn’t support either one when challenged.
This is my big concern with AI is the unfounded confidence. If you aren’t an expert in a field, it’s going to sound very convincing. AI is incredible, but holy shit it’s wrong constantly. And it doesn’t seem to have a shred of humility in its answers. Almost like a tech bro billionaire who never for a second ever considers that they might be wrong. Society just needs to be more careful with this stuff, but I don’t see that happening. #verynervous
Cheers.
NCAA men’s tournament probabilities 2026
It just keeps getting worse:
After 100 games:

The cream has dropped:
After 72 games:

The cream keeps rising……….
Through 48 games.

Through 24 games:

Through 20 games:

Through 12 games:

My probabilities to my Kaggle March Madness contest:
March 19, 2026
Men
Ohio State over TCU: 59.36% (Loss: 0.352361)
Nebraska over Troy: 86.92% (Loss: 0.01710864)
Louisville over South Florida: 66.41% (Loss: 0.1128288)
Wisconsin over High Point: 83.07% (Loss: 0.6900625)
Duke over Siena: 100% (Loss: 0)
Vanderbilt over McNeese St: 85.10% (Loss: 0.022201)
Michigan St over North Dakota State: 92.41% (Loss: 0.00576081)
Arkansas over Hawaii: 91.28% (loss: 0.00760384)
North Carolina over VCU: 59.36% (Loss: 0.352361)
Michigan over Howard: 100% (Loss: 0)
BYU over Texas: 59.36% (Loss: 0.352361)
St. Mary’s CA over Texas A&M: 62.95% (Loss: 0.3962702)
Illinois over Penn: 97.94% (Loss: 0.00042436)
Georgia over St Louis: 59.36% (Loss: 0.352361)
Gonzaga over Kennesaw: 96.29% (Loss: 0.00137641)
Houston over Idaho: 97.24% (Loss: 0.00076176)
March 20, 2026
Men
Kentucky over Santa Clara: 62.95% (Loss: 0.1372703)
Texas Tech over Akron: 75.70% (Loss: 0.059049)
Arizona over LIU: 100% (Loss: 0)
Virginia over Wright State: 94.28% (Loss: 0.00327184)
Iowa State over Tennessee State: 97.61% (Loss: 0.00057121)
Alabama over Hofstra: 85.10% (Loss: 0.022201)
Utah State over Villanova: 55.66% (Loss: 0.1966036)
Iowa over Clemson: 59.36% (Loss: 0.165161)
St. John’s over Northern Iowa: 80.83% (Loss: 0.03674889)
Tennessee over Miami OH: 83.07% (Loss: 0.02866249)
UCLA over UCF: 69.70% (Loss: 0.091809)
Purdue over Queens: 100% (Loss: 0)
Florida over Prairie View: 100% (Loss: 0)
Kansas over Cal Baptist: 89.99% (Loss: 0.01002001)
UConn over Furman: 100% (Loss: 0)
Miami over Missouri: 55.66% (Loss: 0.1966036)
Women
Duke over College of Charleston: 100% (Loss: 0)
TCU over UC San Diego: 100% (Loss: 0)
Oregon over Virginia Tech: 62.95% (Loss: 0.1372703)
Nebraska over Baylor: 50.29% (Loss: 0.2529084)
Washington over San Diego State: 97.24% (Loss: 0.00076176)
Maryland over Murray State: 100% (Loss: 0)
Ole Miss over Gonzaga: 88.55% (Loss: 0.01311025)
Texas over Missouri State: 99.97% (Loss: 0)
Michigan over Holy Cross: 100% (Loss: 0)
North Carolina over Western Illinois: 100% (Loss: 0)
LSU over Jacksonville: 100% (Loss: 0)
Minnesota over WI Green Bay: 100% (Loss: 0)
Michigan State over Colorado State: 94.28% (Loss: 0.00327184)
NC State over Tennessee: 55.66% (Loss: 0.1966036
Texas Tech over Villanova: 55.66% (Loss: 0.1966036)
Oklahoma over Idaho: 100% (Loss: 0)
March 21, 2026
Men
Michigan over St Louis: 98.08% (Loss: 0.00036864)
Michigan State over Louisville: 81.41% (Loss: 0.03455881)
Duke over TCU: 96.65%
Houston over Texas A&M: 83.53%
Gonzaga over Texas: 75.36%
Illinois over VCU: 51.07%
Vanderbilt over Nebraska: 65.59%
Arkansas over High Point: 86.92%
Women
Ohio State over Howard: 100% (Loss: 0)
Louisville over Vermont: 100% (Loss: 0)
South Carolina over Southern: 99.99% (Loss: 0.00000001)
Georgia over Virginia: 83.12% (Loss: 0.6908934)
Notre Dame over Fairfield: 80.83% (Loss: 0.03674889)
Kentucky over James Madison: 90.00% (Loss: 0.01)
Alabama over Rhode Island: 80.83% (Loss: 0.03674889)
UConn over UTSA: 100% (Loss: 0)
USC over Clemson: 69.70% (Loss: 0.091809)
Iowa over Fairleigh Dickinson: 100% (Loss: 0)
West Virginia over Miami OH: 100% (Loss: 0)
Iowa St over Syracuse: 75.70% (Loss: 0.573049)
Vanderbilt over High Point: 100% (Loss: 0)
Oklahoma St over Princeton: 72.80% (Loss: 0.073984)
Illinois over Colorado: 62.95% (Loss: 0.1372703)
UCLA over Cal Baptist: 100% (Loss: 0)
March 22
Men
Purdue over Miami FL: 80.81%
Iowa State over Kentucky: 68.10%
St John’s over Kansas: 64.47%
Virginia over Tennessee: 88.55%
Florida over Iowa: 89.54%
Arizona over Utah State: 96.69%
UConn over UCLA: 68.60%
Texas Tech over Alabama: 52.23%
Women
North Carolina over Maryland: 58.35%
Michigan over NC State: 96.23%
Minnesota over Ole Miss: 74.41%
LSU over Texas Tech: 98.18%
Duke over Baylor: 86.28%
Texas over Oregon: 98.52%
Oklahoma over Michigan State: 85.08%
TCU over Washington: 58.57%
March 26, 2026
Houston over Illinois: 74.29%
Arizona over Arkansas: 78.53%
Nebraska over Iowa: 63.56%
Purdue over Texas: 87.83%
March 26, 2026
Men’s
Duke over St. John’s: 93.75%
Michigan over Alabama: 91.14%
Iowa State over Tennessee: 72.37%
UConn over Michigan State: 74.95%
Women’s
Vanderbilt over Notre Dame: 81.44%
UConn over North Carolina: 99.999%
UCLA over Minnesota: 99.992%
LSU over Duke: 86.66%

Cheers.
King Cake Baby
I was in New Orleans last week at Tulane for Love Data Week 2026 where I gave my Data Art talk. But that’s not important. While I was there, I learned all about Mardi Gras traditions that I was completely unaware of, specifically: King Cakes. For some reason, these lunatics in New Orleans hide a plastic baby in these cakes and if you get the piece with the plastic baby you have to buy the next king cake. But even this doesn’t really matter. Here is what matters: during Mardi Gras some sporting events have this baby show up as a mascot. I now present to you King Cake Baby (sent to me by Lisa Dilks). It’s glorious and I love it with all my heart.

Cheers.
Home advantage
This plot is awesome.

Read the full paper: https://projecteuclid.org/journals/annals-of-applied-statistics/volume-12/issue-4/How-often-does-the-best-team-win-A-unified-approach/10.1214/18-AOAS1165.full
Cheers.
Lies, Damn Lies, and Trump Stats
The White House just posted this tweet:
The “readers added context” gets this correct as a misrepresentation. It’s actually only a 1.2% increase. The issue is a truncated y-axis. Don’t do this with bar charts!
Preserved here if this ever gets taken down:


Boy, this Donald Trump fellow sure does seem dishonest.
This reminds me of the famous Fox News graph “If the Bush Tax Cuts Expire”.

Man, it’s almost like all this dishonesty is coming from one side and they are doing it on purpose……
Cheers.
TIL: facet_grid edition
Today I learned how to get rid of the pesky “1” on the top of a ggplot facet_grid if I want to stack the plots vertically. Use facet_grid(rows = vars(group)). See my example below:
#Ugh. Why is there a 1?rbind(PSold %>% filter(Age == "65+") %>% mutate(STI = "Syphilis"), GCold %>% filter(Age == "65+") %>% mutate(STI = "Gonorrhea"), CTold %>% filter(Age == "65+") %>% mutate(STI = "Chlamydia")) %>% ggplot(aes(x = Year, y = Rate)) + geom_point() + geom_line() + theme_bw() + facet_grid(STI~1, scale = "free_y") #Get rid of the pesky 1 rbind(PSold %>% filter(Age == "65+") %>% mutate(STI = "Syphilis"), GCold %>% filter(Age == "65+") %>% mutate(STI = "Gonorrhea"), CTold %>% filter(Age == "65+") %>% mutate(STI = "Chlamydia")) %>% ggplot(aes(x = Year, y = Rate)) + geom_point() + geom_line() + theme_bw() + facet_grid(rows = vars(STI), scale = "free_y")


Cheers.
The University of Nebraska Board of Regents is a dumpster fire
When I was a kid, I used to think that if someone was in a high up position that they must be really smart and competent. Now that I’m in my 40s, it seems like it’s almost the opposite. Most of the people I meet or read about who are in positions of leadership, I’m wildly unimpressed by. But I could not possibly imagine the level of incompetence of the Board of Regents for the University of Nebraska.
They recently cut their statistics program along with a few others based on metrics showing these program were underperforming, but there really weren’t many details of what these metrics were or how they were computed. However, AMSTAT News recently published an interview with two professors from the University of Nebraska-Lincoln on what happened (You can read the full article here). In that article, they detail some of the flaws in the calculation of these metrics, but one of these flaws stands out above all the others as the peak of dip shittery.
So, one of the metrics the Board of Regents used in their calculation was graduation rate. Graduation rate is the number of graduates divided by number of majors. Simple enough. Now for the statistics department, this number was 0%. Why? Because the department hadn’t yet existed for 4 years.
Here is the relevant paragraph from the AMSTAT article:

And just as a nice added middle-finger to the statistics department, the Board of Regents decided to make their own definition of what an outlier is.
There is no nice way to say this: these people are world class unimpressive.
Cheers.
AI fails again
I was trying to fit a mixed effects model and I waned to compare the model with the same fixed effects with and without a random intercept in an ANOVA. So I wasn’t sure how to do this using lmer from the lme4 package. So I googled it:

It gave me the following answer with code:

# Example data
data <- data.frame(
response = rnorm(100),
fixed_effect_1 = rnorm(100),
fixed_effect_2 = rnorm(100)
)
# Fit a linear mixed model with no random effects
model <- lmer(response ~ fixed_effect_1 + fixed_effect_2, data = data)
# Display the model summary
summary(model)
I was pretty sure this wouldn't work, but I tried it anyway. And it doesn't work. You can't fit an lmer model without specifying random effects.

For the record, I’m not sure my question even makes sense. I think I should be comparing these models with AIC rather than ANOVA, now that I think about it for more than a second.
Anyway,
Cheers.

