By Nish Tahir in Statistics For Programmers — 05 Apr 2024

Statistics For Programmers - Introduction to Probability

Probability, in the realm of statistics, is a measure of the likelihood that a specific event will occur. It provides us with a way to quantify uncertainty. We see and characterize this in our everyday lives. For example, when we say that there is a 70% chance of rain, or a user has a 20% chance of clicking on a button, we are expressing probability.

Key Terminology

To precisely discuss probability, we need to understand some key terms.

Experiment - An experiment is an activity or process that results in an outcome that we can measure. For example, rolling a die, flipping a coin, or conducting a survey are all examples of experiments.
Sample Space (S) - The sample space is the set of all possible outcomes of an experiment. For example, when rolling a die, the sample space is {1, 2, 3, 4, 5, 6}.
Event (E) - An event is a subset of the sample space: a specific outcome or a set of outcomes of an experiment. If you're rolling a die, the event of getting an even number is {2, 4, 6}.
Probability (P) - A probability is a number between 0 and 1 (inclusive) that quantifies the likelihood of an event occurring. A probability of 0 means the event will not happen, while a probability of 1 means the event is guaranteed to happen.

There are two main types of probability: Classical and Empirical. We determine which type to use based on the nature of the experiment.

Classical Probability

We turn to Classical probability when we know all of the possible outcomes of the experiment and all outcomes in the sample space are equally likely. It is calculated using the formula:

\[ P(E) = \frac{n(E)}{n(S)} \]

Where:

\( P(E) \) is the probability of event \( E \),
\( n(E) \) is the number of outcomes of the event \( E \),
\( n(S) \) is the number of possible outcomes in the sample space.

For example, when rolling a die, to calculate the probability of getting a 3 we begin by defining the sample space and the event.

The sample space is every possible outcome of rolling the die, \( S = \{1, 2, 3, 4, 5, 6\} \), meaning there are 6 possible outcomes. The event is the specific outcome of interest, \( E = \{3\} \), which contains 1 outcome. Plugging these into our formula, the probability of rolling a 3 is:

\[ P(E) = \frac{1}{6} \]
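The formula translates fairly directly to code. The following is a minimal sketch, assuming every outcome is equally likely and that the sample space and event are represented as `Set`s; the name `classicalProbability` is hypothetical and chosen only for illustration:

```javascript
// Hypothetical helper: classical probability P(E) = n(E) / n(S).
// Assumes every outcome in the sample space is equally likely.
function classicalProbability(event, sampleSpace) {
  // Only count event outcomes that actually belong to the sample space
  const favorable = [...event].filter((outcome) => sampleSpace.has(outcome));
  return favorable.length / sampleSpace.size;
}

const sampleSpace = new Set([1, 2, 3, 4, 5, 6]);
const rollAThree = new Set([3]);
const rollAnEven = new Set([2, 4, 6]);

console.log(classicalProbability(rollAThree, sampleSpace));
// 0.16666666666666666
console.log(classicalProbability(rollAnEven, sampleSpace));
// 0.5
```

Using sets keeps the code close to the definitions: the event is a subset of the sample space, and the probability is the ratio of their sizes.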

Empirical Probability

Empirical probability is used when we can't enumerate all the possible outcomes of an experiment, or when the outcomes are not equally likely. In this case, we estimate the probability of an event by running the experiment repeatedly and observing how often the event occurs. This is effectively the relative frequency of the event.

\[ P(E) = \frac{\text{Number of times event E occurs}}{\text{Total number of trials}} \]

For example, let's assume we wanted to calculate the probability of a user clicking a button in an application. We know the possible outcomes ahead of time (the user either clicks the button or doesn't), but we don't know every single factor that could influence the user's decision. This means we can't assume that the outcomes are equally likely.

To calculate the probability that a user would click our button, we can conduct an experiment aimed at gathering data by showing the button to 100 users and observing how many times it was clicked. Suppose it was clicked 20 times:

'clicked' : 20
'not clicked' : 80

The probability of a user clicking on the button is:

\[ P(E) = \frac{20}{100} = 0.2 \]

The accuracy of an empirical probability improves as the number of trials grows. The more trials we conduct, the closer the empirical probability tends to get to the true (or classical) probability. This is known as the Law of Large Numbers.
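One way to see this convergence is to simulate it. The following is a rough sketch, assuming a fair six-sided die simulated with `Math.random`; the function name `empiricalProbabilityOfThree` and the injectable `random` parameter are hypothetical and exist only for this illustration:

```javascript
// Simulate rolling a fair six-sided die `trials` times and return the
// empirical probability of rolling a 3. `random` can be swapped out
// for a deterministic generator when testing.
function empiricalProbabilityOfThree(trials, random = Math.random) {
  let hits = 0;
  for (let i = 0; i < trials; i++) {
    const roll = Math.floor(random() * 6) + 1; // integer from 1 to 6
    if (roll === 3) hits += 1;
  }
  return hits / trials;
}

const classical = 1 / 6;
for (const trials of [10, 100, 1000, 100000]) {
  const empirical = empiricalProbabilityOfThree(trials);
  // Print the trial count, the estimate, and its distance from 1/6
  console.log(
    trials,
    empirical.toFixed(4),
    Math.abs(empirical - classical).toFixed(4)
  );
}
```

The exact numbers differ from run to run, but the gap between the empirical and classical probability typically shrinks as the number of trials grows.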

Probability as Code

Computing probability as code is quite straightforward using techniques we've already discussed. We begin by deriving a frequency distribution from our dataset and then calculate the probability of the event of interest.

Let's assume our dataset is an array of user interactions with the button:

// 20 clicked, 80 not clicked events
const data = ['clicked', 'not clicked', 'clicked', 'not clicked', 'clicked', 'not clicked', 'clicked', 'not clicked', 'clicked', 'not clicked' ... ];

We can derive the frequency distribution by generating a frequency map of the data:

function frequencyDistribution(arr) {
  const map = {};
  for (let i = 0; i < arr.length; i++) {
    const item = arr[i];
    if (map[item]) {
      map[item] += 1;
    } else {
      map[item] = 1;
    }
  }
  return map;
}

const distribution = frequencyDistribution(data);
console.log(distribution);
// { 'clicked': 20, 'not clicked': 80 }

We can then use the frequency map to determine the probability of the event of interest:

function probability(event, frequencyMap) {
  // Sum the events in the frequency map
  const totalEvents = Object.values(frequencyMap).reduce(
    (acc, val) => acc + val,
    0
  );

  // An event that never appears in the map has a frequency of 0
  const eventCount = frequencyMap[event] ?? 0;

  // Calculate the probability of the event occurring
  return eventCount / totalEvents;
}

console.log(probability("clicked", distribution));
// 0.2

This means for our dataset, the probability of a user clicking the button is 0.2 or 20%.
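If we want the probability of every outcome at once, we can convert the whole frequency map into a relative frequency distribution. This is a small sketch, assuming a non-empty frequency map of counts; `relativeFrequencies` is a hypothetical helper name, not something defined earlier:

```javascript
// Hypothetical helper: convert a frequency map of counts into a map of
// probabilities by dividing each count by the total number of observations.
function relativeFrequencies(frequencyMap) {
  const total = Object.values(frequencyMap).reduce((acc, val) => acc + val, 0);
  const result = {};
  for (const [outcome, count] of Object.entries(frequencyMap)) {
    result[outcome] = count / total;
  }
  return result;
}

const probabilities = relativeFrequencies({ clicked: 20, "not clicked": 80 });
console.log(probabilities);
// { clicked: 0.2, 'not clicked': 0.8 }
```

Because the outcomes cover the entire sample space, the resulting probabilities add up to 1.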

Probability of Multiple Events

Oftentimes, we are interested in the probability of multiple events occurring. How we compute this changes depending on what we are interested in.

Considering two events, \(A\) and \(B\),

The probability of both A and B occurring is known as the Intersection of A and B. This is denoted as \( P(A \cap B) \).
The probability of either A or B occurring is called the Union of A and B and is denoted as \( P(A \cup B) \).

Let's explore each of these concepts in detail.

Intersection of Events

To compute the probability of an intersection, we count the number of times both events occur together and then divide by the total number of outcomes. Formally, we can express this as,

\[ P(A \cap B) = \frac{n(A \cap B)}{n(S)} \]

Where:

\( P(A \cap B)\) is the probability of both events \(A\) and \(B\) occurring,
\( n(A \cap B) \) is the number of outcomes where both events \(A\) and \(B\) occur,
\( n(S) \) is the number of possible outcomes in the sample space.

Let's break it down with an example.

Expanding on our earlier example, we want to compute the probability of any given user clicking a button in a hypothetical user interface and making a purchase. After running an experiment, we aggregate the following data visualized as a contingency table:

	Made a Purchase	Did Not Make a Purchase	Total
Clicked Button	50	150	200
Did Not Click Button	20	780	800
Total	70	930	1000

Contingency tables show the actual or relative frequency of events and can be used to calculate the probability of the intersection of events.

In total, 1000 users were shown the button and 70 made a purchase. Of the 70 that made a purchase, 50 clicked the button. In other words, only 50 users clicked the button and made a purchase.

Applying our formula, we can compute the probability of any given user clicking the button and making a purchase as

\[ P(\text{Clicked Button} \cap \text{Made a Purchase}) = \frac{50}{1000} = 0.05 \]

To compute this as code, we start by formatting our data. We can represent this dataset as an array of objects with properties representing each event.

// Let's generate a synthetic dataset that matches
// the contingency table we've been working with
const data = [];

for (let i = 0; i < 50; i++) {
  data.push({ clicked: true, purchased: true });
}

for (let i = 0; i < 150; i++) {
  data.push({ clicked: true, purchased: false });
}

for (let i = 0; i < 20; i++) {
  data.push({ clicked: false, purchased: true });
}

for (let i = 0; i < 780; i++) {
  data.push({ clicked: false, purchased: false });
}

console.log("Total Number of Sessions: ", data.length);
// Total Number of Sessions:  1000

We can apply filters to the data to compute the frequency of the events of interest:

const totalClicked = data.filter((item) => item.clicked);
const totalPurchased = data.filter((item) => item.purchased);

console.log("Total Clicked: ", totalClicked.length);
console.log("Total Purchased: ", totalPurchased.length);
// Total Clicked:  200
// Total Purchased:  70

We can apply a filter to select only the instances where both events occurred:

const clickedAndPurchased = data.filter(
  (item) => item.clicked && item.purchased
);

console.log("Total Clicked and Purchased: ", clickedAndPurchased.length);
// Total Clicked and Purchased:  50

And then calculate the probability by dividing the frequency by the total number of trials:

console.log(
  "P(Clicked Button and Made a Purchase) =",
  clickedAndPurchased.length / data.length
);

// P(Clicked Button and Made a Purchase) = 0.05
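The filter-then-divide pattern above can be wrapped in a small reusable function. This is only a sketch under the same assumptions as the synthetic dataset; `probabilityOf` and `makeSessions` are hypothetical names introduced here for illustration:

```javascript
// Hypothetical helper: empirical probability that `predicate` holds
// for a session in the dataset.
function probabilityOf(sessions, predicate) {
  if (sessions.length === 0) return NaN; // undefined with no trials
  return sessions.filter(predicate).length / sessions.length;
}

// Rebuild the synthetic dataset from the contingency table counts
const makeSessions = (count, clicked, purchased) =>
  Array.from({ length: count }, () => ({ clicked, purchased }));

const sessions = [
  ...makeSessions(50, true, true),
  ...makeSessions(150, true, false),
  ...makeSessions(20, false, true),
  ...makeSessions(780, false, false),
];

console.log(probabilityOf(sessions, (s) => s.clicked)); // 0.2
console.log(probabilityOf(sessions, (s) => s.purchased)); // 0.07
console.log(probabilityOf(sessions, (s) => s.clicked && s.purchased)); // 0.05
```

Passing the event as a predicate means the same function handles single events and intersections without any changes.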

Union of Events

This time we are interested in the probability of either event A or event B occurring. It is calculated using the formula:

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

Where:

\( P(A \cup B) \) is the probability of either event \( A \) or \( B \) occurring,
\( P(A) \) is the probability of event \( A \) occurring,
\( P(B) \) is the probability of event \( B \) occurring,
\( P(A \cap B) \) is the probability of both events \( A \) and \( B \) occurring.

Using the same example as before, the probability of a given user clicking the button or making a purchase is:

\[ \displaylines{ P(\text{Clicked Button} \cup \text{Made a Purchase}) = \\
P(\text{Clicked Button}) + P(\text{Made a Purchase}) - P(\text{Clicked Button} \cap \text{Made a Purchase}) }\]

🤔

Note that we subtract the intersection here because the formula counts each probability independently of the other. A session like { clicked: true, purchased: true } counts towards both P(A) and P(B), so without the subtraction it would be counted twice.

Applying this to our dataset,

\[ P(\text{Clicked Button} \cup \text{Made a Purchase}) = \frac{200}{1000} + \frac{70}{1000} - \frac{50}{1000} = 0.22 \]

As we would expect, the probability of a user either clicking the button or making a purchase (doing one, the other or both) is greater than the probability of a user clicking the button and making a purchase (doing both).

As code, we can calculate the probability of the union by updating our filter:

const clickedOrPurchased = data.filter(
  (item) => item.clicked || item.purchased
);

// Note: we don't have to subtract the intersection here because the filter counts each session only once

console.log("P(Clicked Button or Made a Purchase) =", clickedOrPurchased.length / data.length);
// P(Clicked Button or Made a Purchase) = 0.22
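We can also check the union formula itself against the counts from the contingency table. This is a minimal sketch; the variable names are hypothetical, and applying the formula to counts before dividing is a choice made here to sidestep floating-point rounding:

```javascript
// Counts taken from the contingency table
const total = 1000;
const clicked = 200;
const purchased = 70;
const clickedAndPurchased = 50;

// Inclusion-exclusion on counts, dividing once at the end
const unionFromCounts = (clicked + purchased - clickedAndPurchased) / total;

// The same formula applied to probabilities; floating-point rounding
// may leave a tiny error, so compare with a tolerance instead of ===
const unionFromProbabilities =
  clicked / total + purchased / total - clickedAndPurchased / total;

console.log(unionFromCounts); // 0.22
console.log(Math.abs(unionFromCounts - unionFromProbabilities) < 1e-12); // true
```

Both approaches agree with the filter-based result, which suggests the formula and the `||` filter describe the same set of sessions.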
