Background
A data set is often reduced to a few numerical values such as the minimum, maximum, average and some measure of variation, e.g. the standard deviation. This simulation shows how the distributions of the minimum and maximum are derived for two different statistical distributions.
Distributions of minimum and maximum
The probability density functions of the minimum and maximum (fmin and fmax) are built from the two basic functions f(x) and F(x), i.e. the probability density function and the cumulative distribution function of the sampled distribution; n is the sample size. (Note that the min and max of a sample are called order statistics, as is the median. Books about order statistics usually explain the two expressions.)

$$f_{\min}(x) = n\,f(x)\,[1 - F(x)]^{n-1}, \qquad f_{\max}(x) = n\,f(x)\,[F(x)]^{n-1}$$
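As a short sketch of where the two expressions come from, assume the n observations X_1, ..., X_n are independent draws from the same continuous distribution. The symbols X_i, F_min and F_max are introduced here only for illustration. The maximum is at most x exactly when every observation is at most x, and the minimum exceeds x exactly when every observation exceeds x; differentiating the resulting distribution functions gives the two densities.

```latex
\begin{aligned}
F_{\max}(x) &= P\bigl(\max_i X_i \le x\bigr) = \prod_{i=1}^{n} P(X_i \le x) = [F(x)]^{n}, \\
f_{\max}(x) &= \frac{d}{dx}\,[F(x)]^{n} = n\,f(x)\,[F(x)]^{n-1}, \\
F_{\min}(x) &= 1 - P\bigl(\min_i X_i > x\bigr) = 1 - [1 - F(x)]^{n}, \\
f_{\min}(x) &= \frac{d}{dx}\,\bigl(1 - [1 - F(x)]^{n}\bigr) = n\,f(x)\,[1 - F(x)]^{n-1}.
\end{aligned}
```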
The expressions show that the average and standard deviation of min and max depend on n, so it is difficult to compare e.g. the min from samples of different sizes.
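The dependence on n can be illustrated numerically. The sketch below integrates x times the density of the minimum for a normal distribution with the parameters used later in this document (mean 50, standard deviation 5). The helper names fMin and expectedMin, the chosen sample sizes, and the integration limits of plus or minus ten standard deviations are assumptions made for this illustration.

```r
# Parameters of the normal distribution, as used in the simulation in this document.
simMu <- 50; simSigma <- 5

# Density of the sample minimum: n * f(x) * (1 - F(x))^(n - 1).
# (fMin is a hypothetical helper name.)
fMin <- function(x, n) {
  n * dnorm(x, simMu, simSigma) *
    pnorm(x, simMu, simSigma, lower.tail = FALSE)^(n - 1)
}

# Expected value of the minimum by numerical integration.
# The limits mu +/- 10 sigma are an assumption; the density is negligible outside.
expectedMin <- function(n) {
  integrate(function(x) x * fMin(x, n),
            lower = simMu - 10 * simSigma,
            upper = simMu + 10 * simSigma)$value
}

sampleSizes <- c(1, 5, 10, 40, 100)
expMin <- sapply(sampleSizes, expectedMin)
print(data.frame(n = sampleSizes, expectedMin = round(expMin, 2)))
```

For n = 1 the minimum is the observation itself, so its expected value equals the mean; for larger n the expected minimum moves further into the lower tail.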
The probability density functions of the two distributions
The following code and graphs show the distributions from which the min- and max-values are sampled:
library(ggplot2) # Library for creating the graphs.
simGrp <- 2000 # Number of samples.
simperGrp <- 40 # Sample size.
simMu <- 50; simSigma <- 5; cellWidthNorm = 1 # Parameters for the normal distribution.
shape <- 2.5; scale <- 1.6; cellWidthWeib = 0.1 # Parameters for the Weibull distribution.
# ---------------- Normal pdf
fNorm <- function(xvar) {dnorm(xvar, simMu, simSigma)} # The normal 'curve'.
minNorm <- qnorm(0.0001, simMu, simSigma)
maxNorm <- qnorm(0.9999, simMu, simSigma)
pdfNorm <- ggplot(data.frame(x = c(minNorm, maxNorm)), aes(x = x)) + xlab("") + ylab("") +
stat_function(fun = fNorm, colour="black", linewidth=0.75 )
pdfNorm <- pdfNorm + theme(axis.text.x = element_text(size = 12), axis.text.y = element_blank(),
axis.title.x = element_blank(), axis.title.y = element_blank())
pdfNorm <- pdfNorm + annotate("text", x = minNorm+0.75*(maxNorm-minNorm), y = Inf,
label="Normal\ndistribution", vjust = 1.5, size = 4, colour = "blue" )
pdfNorm
# ---------------- Weibull pdf
fWeib <- function(xvar) {dweibull(xvar, shape, scale)} # The Weibull 'curve'.
minWeib <- qweibull(0.0001, shape, scale)
maxWeib <- qweibull(0.9999, shape, scale)
pdfWeib <- ggplot(data.frame(x = c(minWeib, maxWeib)), aes(x = x)) + xlab("") + ylab("") +
stat_function(fun = fWeib, colour="black", linewidth=0.75 )
pdfWeib <- pdfWeib + theme(axis.text.x = element_text(size = 12), axis.text.y = element_blank(),
axis.title.x = element_blank(), axis.title.y = element_blank())
pdfWeib <- pdfWeib + annotate("text", x = 1.5*mean(range(minWeib, maxWeib)), y = Inf, label="Weibull\ndistribution",
vjust = 1.2, size = 4, colour = "blue" )
pdfWeib
Comments. The normal distribution is always symmetric, but the skewness of the Weibull distribution depends on its shape parameter. A shape of about 3.6 gives an approximately symmetric Weibull distribution.
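The statement about the shape parameter can be checked with the standard moment formula for the Weibull distribution, in which the skewness depends only on the shape and not on the scale. The function name weibullSkewness and the selection of shape values are assumptions made for this sketch.

```r
# Theoretical skewness of a Weibull distribution as a function of its shape.
# Raw moments with scale = 1 are gamma(1 + k / shape); the scale cancels out.
# (weibullSkewness is a hypothetical helper name.)
weibullSkewness <- function(shape) {
  g1 <- gamma(1 + 1 / shape)   # E[X]
  g2 <- gamma(1 + 2 / shape)   # E[X^2]
  g3 <- gamma(1 + 3 / shape)   # E[X^3]
  (g3 - 3 * g1 * g2 + 2 * g1^3) / (g2 - g1^2)^1.5
}

shapes <- c(1.5, 2.5, 3.6, 5)
print(data.frame(shape = shapes,
                 skewness = round(weibullSkewness(shapes), 3)))
```

The shape of 2.5 used in this document gives a positive skewness, a shape near 3.6 gives a skewness close to zero, and larger shapes give a negative skewness.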
Sampling and graphing the min- and the max-values
minNorm <- numeric(simGrp); maxNorm <- numeric(simGrp); aveNorm <- numeric(simGrp) # Preallocated vectors for storage.
minWeib <- numeric(simGrp); maxWeib <- numeric(simGrp); aveWeib <- numeric(simGrp)
for (j in seq_len(simGrp)) { # Loop creating one sample per iteration.
resNorm <- rnorm(simperGrp, simMu, simSigma)
resWeib <- rweibull(simperGrp, shape, scale)
minNorm[j] <- min(resNorm); maxNorm[j] <- max(resNorm); aveNorm[j] <- mean(resNorm)
minWeib[j] <- min(resWeib); maxWeib[j] <- max(resWeib); aveWeib[j] <- mean(resWeib)
}
type <- c(rep(c("min"), times = simGrp), rep(c("max"), times = simGrp), rep(c("ave"), times = simGrp) )
minmaxNorm <- c(minNorm, maxNorm, aveNorm)
minmaxWeib <- c(minWeib, maxWeib, aveWeib)
data <- data.frame(minmaxNorm, minmaxWeib, type) # Storing all data in a data frame.
# --------------------- Histogram: min, average, max ---------------
minMin <- min(minNorm); maxMin <- max(minNorm); minMax <- min(maxNorm); maxMax <- max(maxNorm)
rubrik <- "Normal distribution: min, average, max"
funktionMin <- function(x){simperGrp*dnorm(x, simMu, simSigma)*(1 - pnorm(x, simMu, simSigma))^(simperGrp - 1)}
funktionMax <- function(x){simperGrp*dnorm(x, simMu, simSigma)*pnorm(x, simMu, simSigma)^(simperGrp - 1)}
histN <- ggplot(data, aes(minmaxNorm)) + geom_histogram(binwidth=cellWidthNorm, color='black', alpha = 0.5,
aes(y = after_stat(density), fill=type), position='identity')
histN <- histN + stat_function(fun=funktionMin, geom="line", color='black', linewidth=1.0, xlim=c(minMin, maxMin))
histN <- histN + stat_function(fun=funktionMax, geom="line", color='black', linewidth=1.0, xlim=c(minMax, maxMax))
histN <- histN + theme(legend.position = "none")
histN <- histN + theme(axis.text.x = element_text(size = 12), axis.text.y = element_blank(),
axis.title.x = element_blank(), axis.title.y = element_blank())
histN <- histN + annotate("text", x = mean(range(minMin, maxMax)), y = Inf, label=rubrik,
vjust = 1.5, size = 4, colour = "blue" )
histN
minMin <- min(minWeib); maxMin <- max(minWeib); minMax <- min(maxWeib); maxMax <- max(maxWeib)
rubrik <- "Weibull distribution: min, average, max"
funktionMin <- function(x){simperGrp*dweibull(x, shape, scale)*(1 - pweibull(x, shape, scale))^(simperGrp - 1)}
funktionMax <- function(x){simperGrp*dweibull(x, shape, scale)*pweibull(x, shape, scale)^(simperGrp - 1)}
histW <- ggplot(data, aes(minmaxWeib)) + geom_histogram(binwidth=cellWidthWeib, color='black', alpha = 0.5,
aes(y = after_stat(density), fill = type), position = 'identity')
histW <- histW + stat_function(fun=funktionMin, geom="line", color='black', linewidth=1.0, xlim=c(minMin, maxMin))
histW <- histW + stat_function(fun=funktionMax, geom="line", color='black', linewidth=1.0, xlim=c(minMax, maxMax))
histW <- histW + theme(legend.position = "none")
histW <- histW + theme(axis.text.x = element_text(size = 12), axis.text.y = element_blank(),
axis.title.x = element_blank(), axis.title.y = element_blank())
histW <- histW + annotate("text", x = mean(range(minMin, maxMax)), y = Inf, label=rubrik,
vjust = 1.5, size = 4, colour = "blue" )
histW
Comments. For the normal distribution the min is negatively skewed while the max is positively skewed. The average, the middle histogram, is exactly normally distributed, with mean simMu and variance simSigma^2/simperGrp.
The shape of the min and max distributions in the Weibull case is determined by the shape parameter. Values > 3.6 make the Weibull distribution negatively skewed. Because of the Central Limit Theorem (CLT), the histogram of the Weibull averages tends to the normal distribution as n increases.
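The mean and variance of the averages can be compared with theory in a few lines. The sketch below repeats the sampling with the parameters of this document and sets the simulated values next to the theoretical ones: the population mean, and the population variance divided by the sample size. The seed and the names weibMean, weibVar and comparison are assumptions introduced for this illustration.

```r
set.seed(1)  # Assumed seed, only to make the sketch reproducible.

simGrp <- 2000; simperGrp <- 40     # Number of samples and sample size.
simMu <- 50; simSigma <- 5          # Normal parameters.
shape <- 2.5; scale <- 1.6          # Weibull parameters.

# One average per simulated sample.
aveNorm <- replicate(simGrp, mean(rnorm(simperGrp, simMu, simSigma)))
aveWeib <- replicate(simGrp, mean(rweibull(simperGrp, shape, scale)))

# Theoretical mean and variance of a single Weibull observation.
weibMean <- scale * gamma(1 + 1 / shape)
weibVar  <- scale^2 * (gamma(1 + 2 / shape) - gamma(1 + 1 / shape)^2)

comparison <- data.frame(
  quantity  = c("normal: mean of averages", "normal: variance of averages",
                "Weibull: mean of averages", "Weibull: variance of averages"),
  simulated = c(mean(aveNorm), var(aveNorm), mean(aveWeib), var(aveWeib)),
  theory    = c(simMu, simSigma^2 / simperGrp, weibMean, weibVar / simperGrp)
)
print(comparison)
```

The simulated and theoretical columns should agree up to sampling noise; a normal quantile plot of aveWeib (e.g. with qqnorm) would be one way to inspect how close the Weibull averages are to normality.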
Final comment. Because the min and max are often reported, it is tempting to use these values to draw conclusions about a process. However, the histograms above show that these values vary considerably even when the parameters of the process are unchanged. The average stands out as the most reliable value when considering the location of the process.
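The difference in variation can also be expressed as numbers. The sketch below draws the normal samples once more, this time as columns of a matrix instead of in a loop, and reports the standard deviation of the min, the max and the average across samples. The seed and the names samples, stats and spread are assumptions made for this illustration.

```r
set.seed(2)  # Assumed seed, only to make the sketch reproducible.

simGrp <- 2000; simperGrp <- 40     # Number of samples and sample size.
simMu <- 50; simSigma <- 5          # Normal parameters.

# Each column of the matrix holds one sample of size simperGrp.
samples <- matrix(rnorm(simGrp * simperGrp, simMu, simSigma),
                  nrow = simperGrp, ncol = simGrp)

# Summary values per sample (per column).
stats <- data.frame(
  min = apply(samples, 2, min),
  max = apply(samples, 2, max),
  ave = colMeans(samples)
)

# Standard deviation of each summary value across the simulated samples.
spread <- sapply(stats, sd)
print(round(spread, 2))
```

With an unchanged process, the standard deviation of the average should come out clearly smaller than that of the min or the max.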