How to Determine Bin Width for a Histogram (R and Python)

Nkugwa Mark William
2 min read · Oct 31, 2023

A histogram is a graphical representation of the distribution of a dataset. It is a useful tool for visualizing data and identifying patterns such as skew, multiple peaks, and outliers. The bin width is the width of each bar, that is, the size of the interval of values that each bar counts. It is an important parameter: bins that are too narrow produce a noisy, jagged histogram, while bins that are too wide hide the shape of the distribution.

Several rules of thumb can be used to choose the bin width for a histogram. Three common ones are the Freedman-Diaconis rule, Sturges’ rule, and Scott’s rule.

Freedman-Diaconis Rule
The Freedman-Diaconis rule is based on the interquartile range (IQR) of the data. The IQR is a measure of the spread of the data, and it is calculated by subtracting the first quartile from the third quartile. The Freedman-Diaconis rule recommends using a bin width of 2 * IQR / n^(1/3), where n is the number of data points.
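
As a quick check of the formula, here is a minimal Python sketch. The helper name freedman_diaconis_width and the eight-point sample are illustrative assumptions, and the quartiles use NumPy's default linear interpolation, so other quartile definitions may give slightly different widths.

```python
import numpy as np

def freedman_diaconis_width(values):
    """Bin width from the Freedman-Diaconis rule: 2 * IQR / n^(1/3)."""
    values = np.asarray(values, dtype=float)
    # Quartiles with NumPy's default (linear interpolation) method
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return 2.0 * iqr / len(values) ** (1.0 / 3.0)

# Hypothetical sample: the integers 1 to 8
sample = np.arange(1, 9)
fd_width = freedman_diaconis_width(sample)
print(fd_width)  # IQR = 6.25 - 2.75 = 3.5 and n^(1/3) = 2, so the width is 3.5
```

Because the IQR ignores the tails of the data, a single extreme value changes this width very little.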

Sturges’ Rule
Sturges’ rule is based only on the number of data points in the dataset. It sets the number of bins to 1 + log2(n), which is approximately 1 + 3.3 * log10(n), so the recommended bin width is (max - min) / (1 + 3.3 * log10(n)), where max and min are the maximum and minimum values in the dataset, n is the number of data points, and the logarithm is base 10.
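
The arithmetic can be sketched in a few lines of Python. The helper name sturges_width and the sample are illustrative assumptions; the sketch uses the exact base-2 form 1 + log2(n), which the constant 3.3 with a base-10 logarithm approximates.

```python
import math

def sturges_width(values):
    """Bin width from Sturges' rule: range / (1 + log2(n))."""
    n = len(values)
    num_bins = 1 + math.log2(n)  # roughly 1 + 3.322 * log10(n)
    return (max(values) - min(values)) / num_bins

# Hypothetical sample: the integers 1 to 8
sample = list(range(1, 9))
sturges = sturges_width(sample)
print(sturges)  # 1 + log2(8) = 4 bins over a range of 7, so the width is 1.75

# The base-10 form gives nearly the same bin count
approx_bins = 1 + 3.322 * math.log10(len(sample))
print(approx_bins)
```

In practice the bin count is usually rounded up to a whole number before the range is divided.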

Scott’s Rule
Scott’s rule is based on the standard deviation of the data. It recommends using a bin width of 3.49 * s / n^(1/3), where s is the standard deviation of the data, and n is the number of data points.
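
A minimal Python sketch of the same calculation follows. The helper name scott_width and the sample are illustrative assumptions, and the sketch assumes s is the sample standard deviation (ddof=1), which matches what R's sd() returns.

```python
import numpy as np

def scott_width(values):
    """Bin width from Scott's rule: 3.49 * s / n^(1/3)."""
    values = np.asarray(values, dtype=float)
    # ddof=1 gives the sample standard deviation (an assumption of this sketch)
    s = np.std(values, ddof=1)
    return 3.49 * s / len(values) ** (1.0 / 3.0)

# Hypothetical sample: the integers 1 to 8
sample = np.arange(1, 9)
scott = scott_width(sample)
print(scott)  # s = sqrt(6) and n^(1/3) = 2, so the width is 3.49 * sqrt(6) / 2
```

Unlike the IQR, the standard deviation is sensitive to outliers, so a few extreme values can widen the bins noticeably.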

Implementation in R and Python
Here is an example of how to implement the Freedman-Diaconis rule, Sturges’ rule, and Scott’s rule in R and Python:

In R

# Load the data
data <- data.frame(x = c(1, 2, 3, 4, 5))
x <- data$x
n <- length(x)

# Calculate the IQR (stored as iqr_x so it does not shadow the IQR() function)
iqr_x <- IQR(x)

# Calculate the bin width using the Freedman-Diaconis rule
bin_width_fd <- 2 * iqr_x / n^(1/3)

# Calculate the bin width using Sturges' rule
bin_width_sturges <- (max(x) - min(x)) / (1 + 3.3 * log10(n))

# Calculate the bin width using Scott's rule
bin_width_scott <- 3.49 * sd(x) / n^(1/3)

# Create histograms using the different bin widths.
# The upper limit is extended by one bin width so the breaks always cover max(x);
# otherwise hist() fails when the data range is not a multiple of the bin width.
hist(x, breaks = seq(min(x), max(x) + bin_width_fd, by = bin_width_fd))
hist(x, breaks = seq(min(x), max(x) + bin_width_sturges, by = bin_width_sturges))
hist(x, breaks = seq(min(x), max(x) + bin_width_scott, by = bin_width_scott))

In Python

import numpy as np
import matplotlib.pyplot as plt

# Load the data
data = np.array([1, 2, 3, 4, 5])
n = len(data)

# Calculate the IQR
iqr = np.percentile(data, 75) - np.percentile(data, 25)

# Calculate the bin width using the Freedman-Diaconis rule
bin_width_fd = 2 * iqr / n ** (1 / 3)

# Calculate the bin width using Sturges' rule
bin_width_sturges = (data.max() - data.min()) / (1 + 3.3 * np.log10(n))

# Calculate the bin width using Scott's rule
# (ddof=1 gives the sample standard deviation, matching R's sd())
bin_width_scott = 3.49 * np.std(data, ddof=1) / n ** (1 / 3)

# Create one histogram per rule. np.arange excludes its stop value, so the
# stop is extended by one bin width to make the last edge reach the maximum.
widths = {"Freedman-Diaconis": bin_width_fd, "Sturges": bin_width_sturges, "Scott": bin_width_scott}
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, (name, width) in zip(axes, widths.items()):
    ax.hist(data, bins=np.arange(data.min(), data.max() + width, width))
    ax.set_title(name)
plt.show()
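
NumPy also ships these three estimators, so the manual formulas are not strictly needed. The sketch below uses np.histogram_bin_edges with the rule names "fd", "sturges", and "scott" on a hypothetical sample of 200 normal draws. NumPy rounds the bin count up and spreads the bins evenly across the data range, so the resulting widths will generally not equal the hand-computed values exactly.

```python
import numpy as np

# Hypothetical data: 200 draws from a standard normal distribution
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=200)

# Ask NumPy for the bin edges each rule would produce
edges_by_rule = {
    rule: np.histogram_bin_edges(data, bins=rule)
    for rule in ("fd", "sturges", "scott")
}

for rule, edges in edges_by_rule.items():
    # All bins have equal width, so the first difference is representative
    print(f"{rule:8s} bins={len(edges) - 1:2d} width={edges[1] - edges[0]:.3f}")
```

The same rule names can be passed straight to plotting, for example plt.hist(data, bins="fd").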

Conclusion
The Freedman-Diaconis rule, Sturges’ rule, and Scott’s rule are three common rules of thumb that can be used to determine the bin width for a histogram.

Nkugwa Mark William is a chemical and process engineer, entrepreneur, software engineer, and technologist with apps on the Google Play Store and e-commerce sites.