By Nish Tahir in Statistics For Programmers — 20 Apr 2024

Statistics For Programmers - Bayes Theorem

Bayes Theorem is a fundamental concept in statistics and probability theory. It provides a way to update the probability of a hypothesis based on new evidence or information. It's particularly useful in situations where you have some knowledge about the efficacy of a test or the prevalence of a condition in a population.

Bayes Theorem is expressed as follows:

\[ P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} \]

Where:

\( P(A|B) \) is the probability of event A given event B
\( P(B|A) \) is the probability of event B given event A
\( P(A) \) is the prior probability of event A
\( P(B) \) is the marginal probability of event B, that is, the overall probability of B whether or not A occurs
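
As a minimal sketch, the formula maps directly onto a small function. The name `bayes` and the input numbers below are illustrative assumptions, not values from the example that follows.

```javascript
// Hypothetical helper: computes P(A|B) from P(B|A), P(A), and P(B).
function bayes(pBGivenA, pA, pB) {
  if (pB <= 0 || pB > 1) {
    throw new RangeError("P(B) must be greater than 0 and at most 1");
  }
  return (pBGivenA * pA) / pB;
}

// Made-up inputs: P(B|A) = 0.8, P(A) = 0.1, P(B) = 0.2
// The result is approximately 0.4 (floating point may add a tiny error).
console.log(bayes(0.8, 0.1, 0.2));
```

Note that \( P(B) \) has to be known up front here; in practice it usually has to be worked out from the other quantities first.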

Let's take a look at an example.

A population is experiencing an outbreak of a condition that affects 1% of the total population. In response, a test has been developed that correctly identifies the condition 95% of the time. However, the test incorrectly flags healthy individuals as having the condition 3% of the time.

Given this information, what is the probability that an individual has the condition if they test positive?

The key to these types of problems is correctly identifying and defining your events. Then you can characterize them using a contingency table^[1].

Given the following contingency table of events \(A\) and \(B\),

	B	Not B	Total
A	s	t	s + t
Not A	u	v	u + v
Total	s + u	t + v	s + t + u + v

Where:

\(P(A) = \frac{s + t }{s + t + u + v}\)
\(P(B) = \frac{s + u }{s + t + u + v}\)
\(P(A|B) = \frac{s}{s + u}\)
\(P(B|A) = \frac{s}{s + t}\)
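
To make those four ratios concrete, here is a small sketch that derives them from cell counts. The helper name `fromCounts` and the counts themselves are made up for illustration. It also checks that Bayes Theorem gives the same answer as reading \(P(A|B)\) straight off the table.

```javascript
// Hypothetical helper: derives the probabilities from the four cell counts
// s (A and B), t (A and not B), u (not A and B), v (not A and not B).
function fromCounts({ s, t, u, v }) {
  const total = s + t + u + v;
  return {
    pA: (s + t) / total,
    pB: (s + u) / total,
    pAGivenB: s / (s + u),
    pBGivenA: s / (s + t),
  };
}

// Made-up counts, purely for illustration
const probs = fromCounts({ s: 30, t: 10, u: 20, v: 40 });
console.log(probs);

// Bayes Theorem should agree with the direct ratio s / (s + u)
const viaBayes = (probs.pBGivenA * probs.pA) / probs.pB;
console.log(Math.abs(viaBayes - probs.pAGivenB) < 1e-12); // true
```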

Applying this to our problem, we can begin by defining the events.

\(A\): Any given individual has the condition - We know the prevalence to be 1% of the population, or 0.01

\(B\): Any given individual tests positive - We need to calculate this probability

Ultimately we are interested in calculating the probability that an individual has the condition given that they test positive, \(P(A|B)\).

Applying this to our contingency table, we have the following:

	Test Positive	Test Negative
Has Condition	True Positive	False Negative
Healthy	False Positive	True Negative
Total

From the information given, we can fill in the table as follows:

True Positive (Has the condition and tests positive): (1/100) * (95/100) = 0.0095
False Negative (Has the condition but tests negative): (1/100) * (5/100) = 0.0005
False Positive (Healthy but tests positive): (99/100) * (3/100) = 0.0297
True Negative (Healthy and tests negative): (99/100) * (97/100) = 0.9603

	Test Positive	Test Negative	Total
Has Condition	0.0095	0.0005	0.01
Healthy	0.0297	0.9603	0.99
Total	0.0392	0.9608	1

This means that the probability of any given individual testing positive \(P(B)\) is 0.0392 or 3.92%.
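
Each cell is a joint probability, and \(P(B)\) is the sum of the Test Positive column: \(P(B) = P(B|A) \times P(A) + P(B|\text{not } A) \times P(\text{not } A)\). A rough sketch of that arithmetic, with variable names of my own choosing (`prevalence`, `sensitivity`, and `falsePositiveRate` are labels for the three numbers in the problem statement):

```javascript
// Inputs from the problem statement (variable names are illustrative)
const prevalence = 0.01;        // P(has condition)
const sensitivity = 0.95;       // P(tests positive | has condition)
const falsePositiveRate = 0.03; // P(tests positive | healthy)

// Joint probabilities: one per cell of the table
const truePositive = prevalence * sensitivity;
const falseNegative = prevalence * (1 - sensitivity);
const falsePositive = (1 - prevalence) * falsePositiveRate;
const trueNegative = (1 - prevalence) * (1 - falsePositiveRate);

// Column totals
const pTestPositive = truePositive + falsePositive;
const pTestNegative = falseNegative + trueNegative;

console.log(truePositive.toFixed(4), falseNegative.toFixed(4)); // 0.0095 0.0005
console.log(falsePositive.toFixed(4), trueNegative.toFixed(4)); // 0.0297 0.9603
console.log(pTestPositive.toFixed(4), pTestNegative.toFixed(4)); // 0.0392 0.9608
```

The values are rounded with `toFixed(4)` only for display, since floating point arithmetic can leave tiny errors in the last digits.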

Now, we can calculate the probability of an individual having the condition given that they test positive:

\[ P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} = \frac{0.95 \times 0.01}{0.0392} \approx 0.2423 \]

In other words, if an individual tests positive for the condition, there is a 24.23% chance that they actually have the condition.
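
If that number feels surprisingly low, counting people instead of multiplying probabilities can help. The sketch below assumes a hypothetical population of 10,000 (a size picked only to get whole numbers) and applies the rates from the problem statement.

```javascript
// Hypothetical population size, chosen so every count is a whole number
const population = 10000;

const sick = population / 100;              // 1% have the condition: 100
const healthy = population - sick;          // 9900
const truePositives = (sick * 95) / 100;    // 95% of the sick test positive: 95
const falsePositives = (healthy * 3) / 100; // 3% of the healthy test positive: 297

const allPositives = truePositives + falsePositives; // 392
const shareSick = truePositives / allPositives;

console.log(allPositives);         // 392
console.log(shareSick.toFixed(4)); // 0.2423
```

Only 95 of the 392 people who test positive have the condition, because the healthy group is so much larger that even a 3% false positive rate produces more positives than the sick group does.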

Expressing this as code is pretty straightforward:

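// P(A): prevalence of the condition
const pA = 0.01;
// P(B|A): probability of a positive test given the condition
const pBGivenA = 0.95;
// P(B|not A): probability of a positive test for a healthy individual
const pBGivenNotA = 0.03;
const pNotA = 1 - pA;

// P(B): overall probability of a positive test
const pB = pA * pBGivenA + pNotA * pBGivenNotA;

// Bayes Theorem: P(A|B)
const pAGivenB = (pBGivenA * pA) / pB;
console.log(pAGivenB);
// 0.2423469387755102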

There's been a lot of great work written on the topic of Bayes Theorem. If you're interested in learning more, I recommend checking out the following resources:

[1] (no date) Bayes' theorem. www.mathsisfun.com. Available at: https://www.mathsisfun.com/data/bayes-theorem.html (Accessed: 2024-4-20).
