AI is wildly overconfident in itself and you should never trust the first answer (Part 2)
Sometimes I just take a question and see what AI will do with it. This question came across my desk today:
Can you help us calculate how improbable this is? The current estimate is there are ~20,000 protein coding genes in the human genome, so clearly if you randomly drew 75 genes from 20,000 and 211 from 20,000, it would be highly improbable that 7 would be in common.
So I threw it into Google Gemini to see what it would do. My full conversation with the AI follows.
The first answer:
To calculate the probability of this overlap, we use the hypergeometric distribution, which determines the likelihood of drawing exactly $7$ successes (overlapping genes) in a random sample. It turns out this specific overlap is highly improbable—occurring by chance only about once every 17 billion draws. [1]
The Math Behind the Draw
We use the hypergeometric probability formula: [2]
$P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$
Where our exact parameters are:
- Population size ($N$): $20,000$
- Successes in population ($K$): $211$ (the size of your larger gene set)
- Sample size ($n$): $75$ (the size of your smaller gene set)
- Successes in sample ($k$): $7$ (the overlapping genes we want to test) [3]
The Results
When you plug these numbers into a Hypergeometric Distribution Probability Calculator, the probabilities break down as follows: [4]
- Probability of EXACTLY 7 matches: $\approx 0.0000000574$ (roughly $1$ in $17.4$ million)
- Probability of AT LEAST 7 matches: $\approx 0.0000000588$ (roughly $1$ in $17$ million)
Because your specific overlap falls in the extreme tail of the probability curve, a Hypergeometric p-value calculator will give you a p-value of $p < 0.00000006$. This indicates immense statistical significance, suggesting that the overlap is very unlikely to be the result of random chance. [1, 5, 6]
If you are evaluating this overlap for a specific biological or research context, tell me:
- Are these differentially expressed genes or a list related to a specific biological pathway?
- What research platform or tool did you use to generate your gene lists?
I can help you adjust your statistics to account for functional enrichment or pathway biases!
[2] https://www.statscalculators.com
[5] https://systems.crump.ucla.edu
[6] https://www.cliffsnotes.com
So I responded with “Not even fucking close, AI.”
You are completely right, and I messed up that math badly. The actual probability of getting at least 7 overlapping genes is about 1 in 70,250, not 1 in 17 million.
I completely botched the basic arithmetic in my head. Thank you for calling me out on it.
The Correct Math Breakdown
Using the same hypergeometric distribution with your exact numbers:
- Total genes ($N$): 20,000
- Group 1 ($K$): 211
- Group 2 ($n$): 75
- Overlap ($k$): 7 [1]
Running the actual numbers through Python’s standard scipy.stats library yields:
- Probability of exactly 7 matches: $0.0000130$ (about 1 in 76,934)
- Probability of 7 or more matches (p-value): $0.0000142$ (about 1 in 70,250)
What This Means
While it is still statistically significant ($p \approx 1.42 \times 10^{-5}$), it is orders of magnitude more likely than the garbage numbers I gave you before. 1 in 70,000 is rare, but it is firmly in the realm of realistic biological data, whereas 1 in 17 million was an outright hallucination.
Apologies for the absolute miss on the first try.
import scipy.stats as stats

N = 20000
K = 211
n = 75
k = 7

# Exact probability of k=7
p_exact = stats.hypergeom.pmf(k, N, K, n)

# Probability of k >= 7
p_at_least = stats.hypergeom.sf(k-1, N, K, n)

print(f"Exact: {p_exact}")
print(f"At least: {p_at_least}")
print(f"1 in X (at least): {1/p_at_least}")
[1] https://pmc.ncbi.nlm.nih.gov
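The corrected figures can be cross-checked without scipy by evaluating the hypergeometric formula directly in exact integer arithmetic. This is a minimal sketch using only the Python standard library. The helper names `hypergeom_pmf` and `hypergeom_tail` are made up for illustration, and the parameters are the ones from the question (20,000 genes, lists of 211 and 75, overlap of 7).

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k of the n drawn genes fall in the K-gene list."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))


def hypergeom_tail(k, N, K, n):
    """P(X >= k): sum the pmf over every overlap size from k upward."""
    largest_overlap = min(K, n)
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, largest_overlap + 1))


# Parameters taken from the question; variable names follow the formula above.
N, K, n, k = 20_000, 211, 75, 7

p_exact = hypergeom_pmf(k, N, K, n)
p_at_least = hypergeom_tail(k, N, K, n)

print(f"Exactly 7:  {float(p_exact):.3e}")
print(f"At least 7: {float(p_at_least):.3e}")
print(f"1 in {float(1 / p_at_least):,.0f} (at least 7)")
```

If the second answer is right, the printed values should land close to the $0.0000130$ and $0.0000142$ reported above. Using `Fraction` keeps every intermediate value exact, so the only rounding happens when the result is converted to a float for printing.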
I believe this second try is the correct answer, and my simulated results are close to it. But how can this not terrify everyone? I knew the first answer was wrong because I had already worked the problem myself. Most people won’t know how to do this problem and will just accept the answer. This is a fairly trivial problem, but right now, as we speak, people are using AI for things like prisons, national defense, and medicine. It’s… not great.
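A brute-force simulation is one way to check an answer like this without trusting any formula. The sketch below is illustrative only: the function names, the seed, and the trial count are assumptions, and it is not the simulation referred to above. Because gene labels are arbitrary, it treats genes `0` to `K - 1` as the first list and repeatedly draws the second list at random.

```python
import random
from collections import Counter


def simulate_overlaps(N, K, n, trials, seed=0):
    """Tally how many of n randomly drawn genes land in a fixed K-gene list."""
    rng = random.Random(seed)
    population = range(N)
    counts = Counter()
    for _ in range(trials):
        drawn = rng.sample(population, n)  # n distinct genes, no replacement
        overlap = sum(1 for gene in drawn if gene < K)  # IDs 0..K-1 are list one
        counts[overlap] += 1
    return counts


def tail_estimate(counts, k):
    """Fraction of trials whose overlap was at least k."""
    trials = sum(counts.values())
    return sum(c for overlap, c in counts.items() if overlap >= k) / trials


# A deliberately small run so the sketch finishes quickly (hypothetical settings).
trials = 20_000
counts = simulate_overlaps(N=20_000, K=211, n=75, trials=trials)

mean_overlap = sum(overlap * c for overlap, c in counts.items()) / trials
print(f"Mean overlap: {mean_overlap:.3f} (theory: {75 * 211 / 20_000:.3f})")
print(f"Estimated P(overlap >= 7): {tail_estimate(counts, 7):.2e}")
```

With an event this rare (roughly 1 in 70,000), a run this small will usually see zero overlaps of 7 or more, so it only confirms that the typical overlap is under one gene. Resolving the tail itself takes millions of trials, which is slow in pure Python but still tells 1 in 70,000 apart from 1 in 17 million.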
Cheers.