Statistics for Programmers - Frequency Distributions
A frequency distribution is a common way to summarize how the values in a dataset are spread out. It's a tabular representation of the number of times each value appears in the dataset. If we denote the distinct values in a dataset as \(x_1, x_2, \ldots, x_n\), their corresponding frequencies can be denoted as \(f_1, f_2, \ldots, f_n\). This relationship can be expressed as a table.
\[ \begin{array}{|c|c|} \hline \text{Value (}x\text{)} & \text{Frequency (}f\text{)} \\ \hline x_1 & f_1 \\ x_2 & f_2 \\ \vdots & \vdots \\ x_n & f_n \\ \hline \end{array} \]
To apply this in practice, consider a dataset of reviews from 10 users who were asked to rate a product on a scale of 1 to 5. The dataset can be represented as an array of reviews.
[3, 1, 5, 5, 2, 4, 5, 3, 1, 5]
We can construct a frequency distribution table for this dataset by counting the number of times each unique element appears in the array.
Value (x) | Frequency (f)
-------------------------
1 | 2
2 | 1
3 | 2
4 | 1
5 | 4
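As a quick sanity check on a table like this, the frequencies should add up to the number of observations. Writing \(N\) for the size of the dataset (a symbol introduced here for illustration, not used elsewhere in this article), the review example gives:
\[ \sum_{i=1}^{5} f_i = 2 + 1 + 2 + 1 + 4 = 10 = N \]
If the sum does not match the length of the array, a value was missed or counted twice.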
A frequency distribution can be expressed in code using a Map (or Dictionary, depending on your language of choice) that pairs each unique value with the number of times it appears in a given dataset.
Once again considering our array of reviews,
const arr = [3, 1, 5, 5, 2, 4, 5, 3, 1, 5];
We can construct a function that counts the number of times each unique element appears in the array.
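function frequencyDistribution(arr) {
  // Plain object used as a map from value to count
  const map = {};
  for (let i = 0; i < arr.length; i++) {
    const item = arr[i];
    if (map[item]) {
      // Value seen before: increment its count
      map[item] += 1;
    } else {
      // First occurrence of this value
      map[item] = 1;
    }
  }
  return map;
}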
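The function above uses a plain object as its map, which turns every key into a string and can misbehave for keys such as "constructor" that collide with inherited properties. As a minimal sketch of an alternative, assuming JavaScript's built-in Map is acceptable for your use case, the same counting can be written as follows. The names frequencyMap, reviews, counts, and rows are hypothetical and chosen only for this illustration.

```javascript
// Hypothetical variant of frequencyDistribution that returns a built-in Map.
function frequencyMap(values) {
  const counts = new Map();
  for (const value of values) {
    // A missing key reads as undefined, so fall back to 0 before incrementing.
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  return counts;
}

const reviews = [3, 1, 5, 5, 2, 4, 5, 3, 1, 5];
const counts = frequencyMap(reviews);

// A Map iterates in insertion order, so sort by value to match the table.
const rows = [...counts.entries()].sort(([a], [b]) => a - b);
// rows: [[1, 2], [2, 1], [3, 2], [4, 1], [5, 4]]
console.log(rows);
```

Unlike the plain-object version, a Map keeps each key's original type, so the number 3 and the string "3" would be counted separately.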
Applying frequencyDistribution to our arr gives us the following output,