Background

A data set is often reduced to a few numerical values such as the minimum, maximum, average and some measure of variation, e.g. the standard deviation. This simulation illustrates the distributions of the minimum and the maximum of samples drawn from two different statistical distributions.


Distributions of minimum and maximum

The two probability densities (fmin and fmax) are built from two basic functions of the underlying distribution: f(x), the probability density function, and F(x), the cumulative distribution function. n is the sample size.
The min and max of a sample are called order statistics, as is the median, and books on order statistics usually derive the two expressions:


\[ f_{\min}(x) = n\cdot f(x)\cdot(1-F(x))^{n-1} \qquad f_{\max}(x) = n\cdot f(x)\cdot F(x)^{n-1} \]
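
The two expressions can be motivated by a short argument. The sketch below assumes that the n observations X_1, ..., X_n are independent and identically distributed, which the text does not state explicitly but the simulation further down satisfies. The largest value is at most x only if every observation is at most x, and the smallest value exceeds x only if every observation exceeds x:

\[ P(\max \le x) = F(x)^{n} \quad\Rightarrow\quad f_{\max}(x) = \frac{d}{dx}F(x)^{n} = n\cdot f(x)\cdot F(x)^{n-1} \]

\[ P(\min > x) = (1-F(x))^{n} \quad\Rightarrow\quad f_{\min}(x) = \frac{d}{dx}\left[1-(1-F(x))^{n}\right] = n\cdot f(x)\cdot(1-F(x))^{n-1} \]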

The expressions show that the mean and standard deviation of the min and max depend on n, so it is difficult to compare, e.g., the minima of samples of different sizes.
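
As a rough illustration of this dependence on n, the sketch below integrates x·fmax(x) numerically for a normal distribution and a few sample sizes. The helper names fmaxNorm and meanMax, the sample sizes, and the integration limits of six standard deviations around the mean are assumptions made for this example only.

```r
# Density of the maximum of n independent N(mu, sigma) values.
fmaxNorm <- function(x, n, mu = 50, sigma = 5) {
  n * dnorm(x, mu, sigma) * pnorm(x, mu, sigma)^(n - 1)
}

# Expected value of the maximum by numerical integration.
# The limits mu +/- 6*sigma are an assumption; the mass outside is negligible.
meanMax <- function(n, mu = 50, sigma = 5) {
  integrate(function(x) x * fmaxNorm(x, n, mu, sigma),
            lower = mu - 6 * sigma, upper = mu + 6 * sigma)$value
}

sizes <- c(5, 10, 40, 100)
data.frame(n = sizes, meanMax = sapply(sizes, meanMax))
```

The expected maximum grows with n even though the underlying distribution is unchanged, which is why a max from a sample of 40 should not be compared directly with a max from a sample of 5.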


The probability density functions of the two distributions

The following code and graphs show the distributions from which the min and max values are sampled:

library(ggplot2)                                     # Library for creating the graphs.

simGrp    <- 2000                                    # Number of samples.
simperGrp <- 40                                      # Sample size.

simMu <- 50;   simSigma <- 5; cellWidthNorm <- 1     # Normal distribution: parameters and histogram bin width.
shape <- 2.5; scale <- 1.6;  cellWidthWeib <- 0.1    # Weibull distribution: parameters and histogram bin width.


# ---------------- Normal pdf

fNorm   <- function(xvar) {dnorm(xvar, simMu, simSigma)}           # The normal 'curve'.
minNorm <- qnorm(0.0001, simMu, simSigma)
maxNorm <- qnorm(0.9999, simMu, simSigma) 
pdfNorm <- ggplot(data.frame(x = c(minNorm, maxNorm)), aes(x = x)) + xlab("") + ylab("") +
                  stat_function(fun = fNorm, colour="black", linewidth=0.75 ) 
pdfNorm <- pdfNorm + theme(axis.text.x  = element_text(size = 12), axis.text.y = element_blank(),  
                           axis.title.x = element_blank(), axis.title.y = element_blank())
pdfNorm <- pdfNorm + annotate("text", x = minNorm+0.75*(maxNorm-minNorm), y = Inf, 
                              label="Normal\ndistribution", vjust = 1.5, size = 4, colour = "blue" ) 
pdfNorm

# ---------------- Weibull pdf

fWeib   <- function(xvar) {dweibull(xvar, shape, scale)}          # The Weibull 'curve'.
minWeib <- qweibull(0.0001, shape, scale)
maxWeib <- qweibull(0.9999, shape, scale) 
pdfWeib <- ggplot(data.frame(x = c(minWeib, maxWeib)), aes(x = x)) + xlab("") + ylab("") +
                  stat_function(fun = fWeib, colour="black", linewidth=0.75 ) 
pdfWeib <- pdfWeib + theme(axis.text.x  = element_text(size = 12), axis.text.y = element_blank(),  
                       axis.title.x = element_blank(), axis.title.y = element_blank())
pdfWeib <- pdfWeib + annotate("text", x = 1.5*mean(range(minWeib, maxWeib)), y = Inf, label="Weibull\ndistribution",
                              vjust = 1.2, size = 4, colour = "blue" )
pdfWeib

Comments. The normal distribution is always symmetric, but the skewness of the Weibull distribution depends on its shape parameter. A shape of about 3.6 gives an approximately symmetric Weibull distribution.
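
The claim about the shape parameter can be checked with the standard closed-form expression for the Weibull skewness, which is written in terms of the gamma function and does not involve the scale. The function name weibSkew and the chosen shape values are assumptions for this sketch.

```r
# Skewness of a Weibull distribution; it depends only on the shape parameter k.
weibSkew <- function(k) {
  g1 <- gamma(1 + 1 / k)
  g2 <- gamma(1 + 2 / k)
  g3 <- gamma(1 + 3 / k)
  (g3 - 3 * g1 * g2 + 2 * g1^3) / (g2 - g1^2)^1.5
}

shapes <- c(1, 2.5, 3.6, 5)
data.frame(shape = shapes, skewness = weibSkew(shapes))
```

The skewness is positive for the shape 2.5 used here, close to zero near 3.6, and negative for larger shapes.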


Sampling and graphing the min- and the max-values

minNorm <- numeric(simGrp); maxNorm <- numeric(simGrp); aveNorm <- numeric(simGrp)   # Preallocated vectors for storage.
minWeib <- numeric(simGrp); maxWeib <- numeric(simGrp); aveWeib <- numeric(simGrp)

for (j in 1:simGrp) {                                             # Loop over the simulated samples.
 resNorm <- rnorm(simperGrp, simMu, simSigma)
 resWeib <- rweibull(simperGrp, shape, scale)
 minNorm[j] <- min(resNorm); maxNorm[j] <- max(resNorm); aveNorm[j] <- mean(resNorm)
 minWeib[j] <- min(resWeib); maxWeib[j] <- max(resWeib); aveWeib[j] <- mean(resWeib)
}

type <- c(rep(c("min"), times = simGrp), rep(c("max"), times = simGrp), rep(c("ave"), times = simGrp) )

minmaxNorm <- c(minNorm, maxNorm, aveNorm)
minmaxWeib <- c(minWeib, maxWeib, aveWeib)
data       <- data.frame(minmaxNorm, minmaxWeib, type)            # Storing all data in a data frame.


# --------------------- Histogram: min, average, max ---------------

minMin <- min(minNorm); maxMin <- max(minNorm); minMax <- min(maxNorm); maxMax <- max(maxNorm)

rubrik <- "Normal distribution: min, average, max"

funktionMin <- function(x){simperGrp*dnorm(x, simMu, simSigma)*(1 - pnorm(x, simMu, simSigma))^(simperGrp - 1)}
funktionMax <- function(x){simperGrp*dnorm(x, simMu, simSigma)*pnorm(x, simMu, simSigma)^(simperGrp - 1)}

histN <- ggplot(data, aes(minmaxNorm)) + geom_histogram(binwidth=cellWidthNorm, color='black', alpha = 0.5, 
                                          aes(y = after_stat(density), fill=type), position='identity')
histN <- histN + stat_function(fun=funktionMin, geom="line", color='black', linewidth=1.0, xlim=c(minMin, maxMin))
histN <- histN + stat_function(fun=funktionMax, geom="line", color='black', linewidth=1.0, xlim=c(minMax, maxMax))
histN <- histN + theme(legend.position = "none")
histN <- histN + theme(axis.text.x  = element_text(size = 12), axis.text.y = element_blank(),  
                       axis.title.x = element_blank(), axis.title.y = element_blank())
histN <- histN + annotate("text", x = mean(range(minMin, maxMax)), y = Inf, label=rubrik,
                        vjust = 1.5, size = 4, colour = "blue" )
histN

minMin <- min(minWeib); maxMin <- max(minWeib); minMax <- min(maxWeib); maxMax <- max(maxWeib)

rubrik <- "Weibull distribution: min, average, max"

funktionMin <- function(x){simperGrp*dweibull(x, shape, scale)*(1 - pweibull(x, shape, scale))^(simperGrp - 1)}
funktionMax <- function(x){simperGrp*dweibull(x, shape, scale)*pweibull(x, shape, scale)^(simperGrp - 1)}

histW <- ggplot(data, aes(minmaxWeib)) + geom_histogram(binwidth=cellWidthWeib, color='black', alpha = 0.5, 
                                          aes(y = after_stat(density), fill = type), position = 'identity')
histW <- histW + stat_function(fun=funktionMin, geom="line", color='black', linewidth=1.0, xlim=c(minMin, maxMin))
histW <- histW + stat_function(fun=funktionMax, geom="line", color='black', linewidth=1.0, xlim=c(minMax, maxMax))
histW <- histW + theme(legend.position = "none")
histW <- histW + theme(axis.text.x  = element_text(size = 12), axis.text.y = element_blank(),  
                       axis.title.x = element_blank(), axis.title.y = element_blank())
histW <- histW + annotate("text", x = mean(range(minMin, maxMax)), y = Inf, label=rubrik,
                          vjust = 1.5, size = 4, colour = "blue" )
histW

Comments. For the normal distribution the min is negatively skewed while the max is positively skewed. The average, the middle histogram, is exactly normally distributed with mean simMu and standard deviation simSigma/sqrt(simperGrp), here 50 and 5/sqrt(40) ≈ 0.79.
The shapes of the min and max distributions for the Weibull distribution are determined by the shape parameter. Shape values above about 3.6 make the Weibull distribution negatively skewed. Because of the central limit theorem (CLT), the histogram of the Weibull averages tends to the normal distribution as n increases.
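
The effect of the CLT can be sketched by estimating the skewness of the Weibull averages for increasing sample sizes. The seed, the number of replicates nRep, the sample sizes and the helper names skew and skewOfMeans are assumptions for this example; the shape and scale are the ones used in the simulation above.

```r
set.seed(1)                      # Arbitrary seed, only for reproducibility.

shape <- 2.5; scale <- 1.6       # Same Weibull parameters as in the simulation.
nRep  <- 20000                   # Number of simulated samples per sample size (assumed).

# Sample skewness (simple moment estimator).
skew <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5

# Skewness of the sample averages for a given sample size n.
skewOfMeans <- function(n) {
  m <- matrix(rweibull(nRep * n, shape, scale), nrow = nRep)   # One sample per row.
  skew(rowMeans(m))
}

sizes <- c(1, 2, 5, 10, 40)
data.frame(n = sizes, skewness = sapply(sizes, skewOfMeans))
```

The estimated skewness should shrink towards zero as n grows, which is what a distribution approaching the normal would show.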


Final comment. Because the min and max are often reported, it is tempting to use these values to draw conclusions about a process. The histograms above show, however, that these values vary considerably even when the parameters of the process are unchanged. The average stands out as the most reliable value when considering the location of the process.
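
To put a number on this comparison for the normal case, the sketch below computes the standard deviation of the max from fmax by numerical integration and sets it beside the standard deviation of the average. The integration limits and tolerance are assumptions; the parameter values are those of the simulation.

```r
n <- 40; mu <- 50; sigma <- 5            # Same values as in the simulation.

# Density of the maximum of n independent N(mu, sigma) values.
fmax <- function(x) n * dnorm(x, mu, sigma) * pnorm(x, mu, sigma)^(n - 1)

lo <- mu - 6 * sigma; hi <- mu + 6 * sigma   # Assumed limits; the mass outside is negligible.

# Mean and variance of the maximum by numerical integration.
meanMax <- integrate(function(x) x * fmax(x), lo, hi, rel.tol = 1e-8)$value
varMax  <- integrate(function(x) (x - meanMax)^2 * fmax(x), lo, hi, rel.tol = 1e-8)$value

sdMax <- sqrt(varMax)                    # Standard deviation of the maximum.
sdAve <- sigma / sqrt(n)                 # Standard deviation of the average.
c(sdMax = sdMax, sdAve = sdAve)
```

Because the normal distribution is symmetric, the min has the same standard deviation as the max, so the same comparison applies to both extremes.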