Statistics For Programmers - Bayes Theorem
Bayes Theorem is a fundamental concept in statistics and probability theory. It provides a way to update the probability of a hypothesis based on new evidence or information. It's particularly useful in situations where you have some knowledge about the efficacy of a test or the prevalence of a condition in a population.
Bayes Theorem is expressed as follows:
\[ P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} \]
Where:
- \( P(A|B) \) is the probability of event A given event B
- \( P(B|A) \) is the probability of event B given event A
- \( P(A) \) is the prior probability of event A
- \( P(B) \) is the marginal probability of event B, that is, the probability of B regardless of whether A occurs
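For readers wondering where the formula comes from, here is a short derivation sketch. It assumes \( P(A) \) and \( P(B) \) are both non-zero, and it uses \( P(A \cap B) \) for the probability that both events occur, a notation this article does not otherwise use. By the definition of conditional probability:

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \qquad P(B|A) = \frac{P(A \cap B)}{P(A)} \]

Rearranging the second equation gives \( P(A \cap B) = P(B|A) \times P(A) \), and substituting that into the first recovers the theorem:

\[ P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} \]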
Let's take a look at an example.
A population is experiencing an outbreak of a condition that affects 1% of the total population. In response, a test has been developed that correctly identifies the condition 95% of the time. However, the test incorrectly flags healthy individuals as having the condition 3% of the time.
Given this information, what is the probability that an individual has the condition if they test positive?
The key to these types of problems is correctly identifying and defining your events. Then you can characterize them using a contingency table[1].
Given the following contingency table of events \(A\) and \(B\),
| | B | Not B | Total |
|---|---|---|---|
| A | s | t | s + t |
| Not A | u | v | u + v |
| Total | s + u | t + v | s + t + u + v |
Where:
- \(P(A) = \frac{s + t }{s + t + u + v}\)
- \(P(B) = \frac{s + u }{s + t + u + v}\)
- \(P(A|B) = \frac{s}{s + u}\)
- \(P(B|A) = \frac{s}{s + t}\)
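As a rough illustration, the four formulas above can be computed directly from the cell counts. This is a minimal sketch: the helper name `probabilitiesFromCounts` is hypothetical and the counts are made up purely for demonstration. It also checks that Bayes Theorem reproduces \(P(A|B)\) from the other three quantities.

```javascript
// Hypothetical helper: derive the four probabilities from the cell counts
// s, t, u, v of the contingency table above.
function probabilitiesFromCounts({ s, t, u, v }) {
  const total = s + t + u + v;
  return {
    pA: (s + t) / total, // row total for A over the grand total
    pB: (s + u) / total, // column total for B over the grand total
    pAGivenB: s / (s + u), // restrict attention to column B
    pBGivenA: s / (s + t), // restrict attention to row A
  };
}

// Made-up counts, chosen only for illustration.
const counts = { s: 30, t: 10, u: 20, v: 40 };
const p = probabilitiesFromCounts(counts);

// Bayes Theorem should give the same P(A|B) as reading it off the table.
const viaBayes = (p.pBGivenA * p.pA) / p.pB;
console.log(p.pAGivenB, viaBayes);
```

The two printed values should agree up to floating-point rounding, so compare them with a small tolerance rather than strict equality.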
Applying this to our problem, we can begin by defining the events.
- \(A\): Any given individual has the condition. We know the prevalence to be 1% of the population, so \(P(A) = 0.01\).
- \(B\): Any given individual tests positive. We need to calculate this probability.

Ultimately, we are interested in calculating the probability that an individual has the condition given that they test positive, \(P(A|B)\).
Applying this to our contingency table, we have the following:
| | Test Positive | Test Negative | Total |
|---|---|---|---|
| Has Condition | True Positive | False Negative | |
| Healthy | False Positive | True Negative | |
| Total | | | |
From the information given, we can fill in the table as follows:
- True Positive (Has the condition and tests positive): (1/100) * (95/100) = 0.0095
- False Negative (Has the condition but tests negative): (1/100) * (5/100) = 0.0005
- False Positive (Healthy but tests positive): (99/100) * (3/100) = 0.0297
- True Negative (Healthy and tests negative): (99/100) * (97/100) = 0.9603

| | Test Positive | Test Negative | Total |
|---|---|---|---|
| Has Condition | 0.0095 | 0.0005 | 0.01 |
| Healthy | 0.0297 | 0.9603 | 0.99 |
| Total | 0.0392 | 0.9608 | 1 |
This means that the probability of any given individual testing positive, \(P(B)\), is the total of the Test Positive column: 0.0095 + 0.0297 = 0.0392, or 3.92%.
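The table can also be filled in programmatically. The following is a minimal sketch using the rates from the example; the names `prevalence`, `sensitivity`, `falsePositiveRate`, and `table` are my own labels, not terms the example defines. Summing the two cells where the test is positive is an application of the law of total probability.

```javascript
// Rates taken from the example above.
const prevalence = 0.01; // P(A): has the condition
const sensitivity = 0.95; // P(B|A): tests positive given the condition
const falsePositiveRate = 0.03; // P(B|not A): tests positive while healthy

// Joint probabilities: one per cell of the contingency table.
const table = {
  truePositive: prevalence * sensitivity,
  falseNegative: prevalence * (1 - sensitivity),
  falsePositive: (1 - prevalence) * falsePositiveRate,
  trueNegative: (1 - prevalence) * (1 - falsePositiveRate),
};

// P(B) is the sum of the "Test Positive" column.
const pTestPositive = table.truePositive + table.falsePositive;

// Sanity check: the four cells should account for the whole population.
const total = Object.values(table).reduce((sum, cell) => sum + cell, 0);

// toFixed is used for display only, to hide floating-point noise.
console.log(pTestPositive.toFixed(4), total.toFixed(4));
// 0.0392 1.0000
```

Checking that the cells sum to 1 is a cheap way to catch arithmetic slips when filling in a table like this by hand.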
Now, we can calculate the probability of an individual having the condition given that they test positive:
\[ P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} = \frac{0.95 \times 0.01}{0.0392} \approx 0.2423 \]
In other words, if an individual tests positive for the condition, there is a 24.23% chance that they actually have the condition.
Expressing this as code is pretty straightforward:
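```javascript
const pA = 0.01; // P(A): prevalence of the condition
const pBGivenA = 0.95; // P(B|A): test is positive given the condition
const pBGivenNotA = 0.03; // P(B|not A): test is positive for a healthy individual
const pNotA = 1 - pA;
// P(B): overall probability of testing positive
const pB = pA * pBGivenA + pNotA * pBGivenNotA;
// Bayes Theorem: P(A|B)
const pAGivenB = (pBGivenA * pA) / pB;
console.log(pAGivenB);
// 0.2423469387755102
```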
There's been a lot of great work written on the topic of Bayes Theorem. If you're interested in learning more, I recommend checking out the following resources:
- Math is Fun - Bayes Theorem
- Stanford Encyclopedia of Philosophy - Bayes Theorem
- An Intuitive (and Short) Explanation of Bayes’ Theorem
[1] (no date) Bayes' theorem. www.mathsisfun.com. Available at: https://www.mathsisfun.com/data/bayes-theorem.html (Accessed: 2024-4-20).