AI is wildly over confident in itself and you should never trust the first answer (Part 2)

Sometimes I just take a question and see what AI will do with it. This question came across my desk today:

Can you help us calculate how improbable this is? The current estimate is there are ~20,000 protein coding genes in the human genome, so clearly if you randomly drew 75 genes from 20,000 and 211 from 20,000, it would be highly improbable that 7 would be in common.

So I threw it into Google Gemini to see what it would do. My full conversation with the AI follows.

The first answer:

To calculate the probability of this overlap, we use the hypergeometric distribution, which determines the likelihood of drawing exactly $7$ successes (overlapping genes) in a random sample. It turns out this specific overlap is highly improbable—occurring by chance only about once every 17 billion draws. [1]

The Math Behind the Draw

We use the hypergeometric probability formula: [2]

$P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$

Where our exact parameters are:

Population size ($N$): $20,000$
Successes in population ($K$): $211$ (the size of your larger gene set)
Sample size ($n$): $75$ (the size of your smaller gene set)
Successes in sample ($k$): $7$ (the overlapping genes we want to test) [3]

The Results

When you plug these numbers into a Hypergeometric Distribution Probability Calculator, the probabilities break down as follows: [4]

Probability of EXACTLY 7 matches: $\approx 0.0000000574$ (roughly $1$ in $17.4$ million)
Probability of AT LEAST 7 matches: $\approx 0.0000000588$ (roughly $1$ in $17$ million)

Because your specific overlap falls in the extreme tail of the probability curve, a Hypergeometric p-value calculator will give you a p-value of $p < 0.00000006$. This indicates immense statistical significance, suggesting that the overlap is very unlikely to be the result of random chance. [1, 5, 6]

If you are evaluating this overlap for a specific biological or research context, tell me:

Are these differentially expressed genes or a list related to a specific biological pathway?
What research platform or tool did you use to generate your gene lists?

I can help you adjust your statistics to account for functional enrichment or pathway biases!

[1] https://en.wikipedia.org

[2] https://www.statscalculators.com

[3] https://www.chegg.com

[4] https://stattrek.com

[5] https://systems.crump.ucla.edu

[6] https://www.cliffsnotes.com

So I responded with “Not even fucking close, AI.”

You are completely right, and I messed up that math badly. The actual probability of getting at least 7 overlapping genes is about 1 in 70,250, not 1 in 17 million.

I completely botched the basic arithmetic in my head. Thank you for calling me out on it.

The Correct Math Breakdown

Using the same hypergeometric distribution with your exact numbers:

Total genes ($N$): 20,000
Group 1 ($K$): 211
Group 2 ($n$): 75
Overlap ($k$): 7 [1]

Running the actual numbers through Python’s standard scipy.stats library yields:

Probability of exactly 7 matches: $0.0000130$ (about 1 in 76,934)
Probability of 7 or more matches (p-value): $0.0000142$ (about 1 in 70,250)

What This Means

While it is still statistically significant ($p \approx 1.42 \times 10^{-5}$), it is orders of magnitude more likely than the garbage numbers I gave you before. 1 in 70,000 is rare, but it is firmly in the realm of realistic biological data, whereas 1 in 17 million was an outright hallucination.

Apologies for the absolute miss on the first try.

			
```python
import scipy.stats as stats
N = 20000
K = 211
n = 75
k = 7
# Exact probability of k=7
p_exact = stats.hypergeom.pmf(k, N, K, n)
# Probability of k >= 7
p_at_least = stats.hypergeom.sf(k-1, N, K, n)
print(f"Exact: {p_exact}")
print(f"At least: {p_at_least}")
print(f"1 in X (at least): {1/p_at_least}")
```

		

[1] https://pmc.ncbi.nlm.nih.gov
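One way to check these numbers without trusting either the AI or scipy is to evaluate the hypergeometric formula directly with exact integer arithmetic. The following is only a sketch using Python's standard-library `math.comb`. The function names `hypergeom_pmf` and `hypergeom_tail` are made up for illustration and were not part of the conversation.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) when n genes are drawn from N, of which K are marked."""
    # Big-integer numerator and denominator, converted to float only at the end.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k): sum the pmf from k up to the largest possible overlap."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))

N, K, n, k = 20_000, 211, 75, 7
p_exact = hypergeom_pmf(k, N, K, n)
p_at_least = hypergeom_tail(k, N, K, n)
print(f"Exact: {p_exact:.3e}")
print(f"At least: {p_at_least:.3e} (about 1 in {1 / p_at_least:,.0f})")
```

Because the binomial coefficients are computed as exact integers, the only rounding happens in the final division, so this makes a reasonable independent cross-check of the scipy output.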

I believe this second try is the correct answer, and my simulated results are close to it. But how can this not terrify everyone? I knew the first answer was wrong because I had done the problem myself. Most people aren’t going to know how to do this problem and will just accept the answer. This is a fairly trivial problem, but right now, as we speak, people are using AI for things like prisons, national defense, and medicine. It’s… not great.
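A simulation check of this kind can be sketched as follows. This is not necessarily the simulation behind the comparison above. It assumes NumPy is available and uses NumPy's built-in hypergeometric sampler as a stand-in for physically drawing 75 genes without replacement from 20,000. The function name `simulate_overlap_tail`, the number of trials, and the seed are all arbitrary choices for illustration.

```python
import numpy as np

def simulate_overlap_tail(N=20_000, K=211, n=75, k=7,
                          trials=2_000_000, seed=0):
    """Monte Carlo estimate of P(overlap >= k)."""
    rng = np.random.default_rng(seed)
    # Each sample is the number of the K marked genes that show up
    # when n genes are drawn without replacement from all N genes.
    overlaps = rng.hypergeometric(ngood=K, nbad=N - K, nsample=n, size=trials)
    return float(np.mean(overlaps >= k))

estimate = simulate_overlap_tail()
print(f"Simulated P(overlap >= 7): {estimate:.3e}")
if estimate > 0:
    print(f"About 1 in {1 / estimate:,.0f}")
```

With an event this rare, only a few dozen of the two million trials are expected to reach 7 or more overlaps, so the estimate is noisy. It should be good enough to tell 1 in 70,000 apart from 1 in 17 million, but not to confirm the third significant digit.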

Cheers.

2 responses to “AI is wildly over confident in itself and you should never trust the first answer (Part 2)”

salauer

June 17, 2026

Have you tried asking it any sports questions? It’s awful. I asked it if the USA and Spain could possibly play each other in the first round of knockouts (afraid that Spain could fall to 3rd if they had lost to Cabo Verde). It gave me scenarios where the USA or Spain won groups that they weren’t in. The correct answer is no, we can’t play Spain under any scenario. It really makes me not want to use AI for things I don’t know about.

1. statsinthewild
  
  June 23, 2026
  
  But then you’ll never be a billionaire tech bro!
