This document shows how a Weibull variable can be expressed as a sum of a constant and a variable from the so-called extreme value distribution. This will later be used for regression analysis of time data where explanatory variables can be used to explain the data.


A normal variable. A normal variable X can be expressed as the sum of a mean value (μ) and a normally distributed residual variable, here denoted as N(0, σ), the left expression below. But the constant μ can be a function of X-variables, the middle expression. In the right expression a simple linear function is used and this is the basic form of a so-called linear regression:


\[ \begin{align} \Large Y = \text{} \mu \text{ } + \text{ } N(0,\sigma) & \phantom{utrym}\Large Y = \text{} \mu(\textbf{x})\text{ } + \text{ } N(0,\sigma) & \phantom{ut}\Large Y = \text{} \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} \text{ } + \text{ } N(0,\sigma) \end{align} \]



A Weibull variable. A Weibull variable (often designated by T for time) can be written in a similar way after being transformed by Y = ln(T). The Weibull distribution has two parameters called shape and scale, here designated by the letters a and b. See also  https://ovn.ing-stat.se/ [7].). W is the standard extreme value distribution with expected value -0.577…

As in the case of the normal variable, the log of T can be expressed as a linear combination of X-variables:

\[ \begin{align} \Large Y = \text{} \ln(b) \text{ } + \text{ } W/a & \phantom{utrym}\Large Y = \text{} \mu(\textbf{x})\text{ } + \text{ } W/a & \phantom{ut}\Large Y = \text{} \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} \text{ } + \text{ } W/a \end{align} \]

Simulating a Weibull variable (T) and calculating its logarithm (Y)

Below a set of Weibull-data is created and shown as a histogram together with the logarithm of the same data set.


library(weibullness)
library(MASS)
library(patchwork)
library(ggplot2)

n      <- 1000
shape  <- 2
scale  <- 5

trueMean <- round(scale * gamma(1 + 1/shape), 2)  # True mean and std, rounded.
trueStd  <- round(sqrt(scale**2 * (gamma(1 + 2/shape) - gamma(1 + 1/shape)**2)), 2)

print(paste0("True mean = ", trueMean, ", True standard deviation = ", trueStd))
## [1] "True mean = 4.43, True standard deviation = 2.32"
T <- rweibull(n, shape, scale)             # Simulating Weibull-data, T.
Y <- log(T)                                # Creating the Y-variable.

Ydata <- data.frame(T, Y)

histtime <- ggplot(Ydata, aes(T)) + geom_histogram(bins=25, color='black', alpha = 0.5, 
                                               aes(y = after_stat(density)), position = 'identity') + 
                                                   xlab("Weibull data (T)")
histtime <- histtime + annotate("text", x = 5, y = Inf, label="Weibull-data", vjust = 1.5, size = 4., 
                                                 colour = "blue" )

histlog <- ggplot(Ydata, aes(Y)) + geom_histogram(bins=25, color='black', alpha = 0.5, 
                                     aes(y = after_stat(density)), position = 'identity') +
                                     xlab("log of Weibull data (Y)")
histlog <- histlog + annotate("text", x = 0, y = Inf, label="Log of Weibull-data", vjust = 1.5, size = 4.0,
                                  colour = "blue" )

histtime + histlog

Comments to the two histograms. The Weibull-data is completely on the positive X-axis (negative values are impossible). When taking the logaritm of the data the histogram becomes negatively skewed and the data is on both the negative and positive X-axis. The right histogram shows similarities with the standard extreme value distribution, see below.



The standard extreme value distribution

Below a set of data from the standard extreme value distribution is created and shown as a histogram together with its probability density function (pdf). The true mean is negative of the Euler constant, i.e. -0.5772…


# ----------------------------- 'Standard extreme value distribution'
#                                Engineering Statistics Handbook  1.3.6.6.16
#                                https://www.itl.nist.gov/div898/handbook/eda/section3/eda366g.htm
#   f(x) = exp(x-exp(x))
#   E(X) = mu + 0.5772*beta
#   S(X) = beta*Pi/sqrt(6)
#
#   For 'Standard...',  mu = 0 and beta = 1

extremefx <- function(x) {exp(x-exp(x)) }           # Page 274 of 'Statistical Models...' by J. F. Lawless

p      <- runif(n)
simtal <- log(-log(p))                              # Simulating of 'Standard...'. Uses the inverse of F(x).

extremData <- data.frame(simtal)

histextreme <- ggplot(extremData, aes(simtal)) + geom_histogram(bins=25, color='black', alpha = 0.5, 
                                                            aes(y = after_stat(density)), position = 'identity')
histextreme <- histextreme + stat_function(fun=extremefx, color='red', linewidth=1.0 ) + xlab("Data simulated values from 'standard extreme value distribution'")
histextreme <- histextreme + annotate("text", x = -4, y = Inf, label="Data simulated from extreme\nvalue distribution and its f(x)", vjust = 1.5, size = 4.5, colour = "blue")
histextreme


Comments to the histogram. The graph shows data from the extreme value distribution and it shows simularities to the logarithm om Weibull-data above. The red line is the probability distribution of the extreme value distribution.



Creating Weibull-data using the standard extreme value distribution

Below a set of Weibull-data is created by the following expression:and shown as a histogram together with its probability distribution (pdf).


\[ \begin{align} \Large Y = \text{} \ln(b) \text{ } + \text{ } W/a \end{align} \]

The Y-variable is then transformed using T = (Y) becoming a Weibull-variable (‘Time’) and shown as a histogram together with its probability distribution (pdf).


WeibEst <- exp(log(scale) + 1/shape*simtal)     # Simulating Weibull-data using 'extreme-data' stored in 'simtal'.

Weib <- data.frame(WeibEst)                     # Puts the simulated 'WeibEst'-data in a so-called data frame.

histWeibExtr <- ggplot(Weib, aes(WeibEst)) + geom_histogram(bins=25, color='black', alpha = 0.5, 
                                                         aes(y = after_stat(density)), position = 'identity')
histWeibExtr <- histWeibExtr + stat_function(fun=dweibull, args=list(shape, scale), color='red', linewidth=1.0 ) + xlab("Time")
histWeibExtr <- histWeibExtr + annotate("text", x = 7, y = Inf, label="Weibull-estimate, by 'Extreme value'", vjust = 1.5, size = 4.5, colour = "blue")
histWeibExtr

The diagram below shows a good fit of the data to the Weibull-diagram. In addition, the estimated parameters are shown including the average and standard deviation of the data. These valuesa coincide well to the theoretical values.


wp.plot(WeibEst, main="Weibull, created by using 'Extreme'", col.line = "red", xlab = "Time", lty.line=1, pch=19)

fitdistr(WeibEst, "weibull")
##      shape        scale   
##   2.06754046   5.05498431 
##  (0.05030253) (0.08149280)
print(paste0("Estimated mean = ",round(mean(WeibEst),2), ", Estimated standard deviation = ",round(sd(WeibEst),2)))
## [1] "Estimated mean = 4.48, Estimated standard deviation = 2.27"


Final comments.This document shows how a Weibull variable can be expressed as the sum of a constant and a standard extreme value distribution. This is the basic idea for performing regression analysis and is explored further in other entries on the home page.