This document shows how a Weibull variable can be expressed as a sum of a constant and a variable from the so-called extreme value distribution. This will later be used for regression analysis of time data where explanatory variables can be used to explain the data.
A normal variable. A normal variable X can be expressed as the sum of a mean value (μ) and a normally distributed residual variable, here denoted as N(0, σ), the left expression below. But the constant μ can be a function of X-variables, the middle expression. In the right expression a simple linear function is used and this is the basic form of a so-called linear regression:
\[ \begin{align} \Large Y = \text{} \mu \text{ } + \text{ } N(0,\sigma) & \phantom{utrym}\Large Y = \text{} \mu(\textbf{x})\text{ } + \text{ } N(0,\sigma) & \phantom{ut}\Large Y = \text{} \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} \text{ } + \text{ } N(0,\sigma) \end{align} \]
A Weibull variable. A Weibull variable (often designated by T for time) can be written in a similar way after being transformed by Y = ln(T). The Weibull distribution has two parameters called shape and scale, here designated by the letters a and b. See also  https://ovn.ing-stat.se/ [7].). W is the standard extreme value distribution with expected value -0.577…
As in the case of the normal variable, the log of T can be expressed as a linear combination of X-variables:
\[ \begin{align} \Large Y = \text{} \ln(b) \text{ } + \text{ } W/a & \phantom{utrym}\Large Y = \text{} \mu(\textbf{x})\text{ } + \text{ } W/a & \phantom{ut}\Large Y = \text{} \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} \text{ } + \text{ } W/a \end{align} \]Simulating a Weibull variable (T) and calculating its logarithm (Y)
Below a set of Weibull-data is created and shown as a histogram together with the logarithm of the same data set.
library(weibullness)
library(MASS)
library(patchwork)
library(ggplot2)
n <- 1000
shape <- 2
scale <- 5
trueMean <- round(scale * gamma(1 + 1/shape), 2) # True mean and std, rounded.
trueStd <- round(sqrt(scale**2 * (gamma(1 + 2/shape) - gamma(1 + 1/shape)**2)), 2)
print(paste0("True mean = ", trueMean, ", True standard deviation = ", trueStd))
## [1] "True mean = 4.43, True standard deviation = 2.32"
T <- rweibull(n, shape, scale) # Simulating Weibull-data, T.
Y <- log(T) # Creating the Y-variable.
Ydata <- data.frame(T, Y)
histtime <- ggplot(Ydata, aes(T)) + geom_histogram(bins=25, color='black', alpha = 0.5,
aes(y = after_stat(density)), position = 'identity') +
xlab("Weibull data (T)")
histtime <- histtime + annotate("text", x = 5, y = Inf, label="Weibull-data", vjust = 1.5, size = 4.,
colour = "blue" )
histlog <- ggplot(Ydata, aes(Y)) + geom_histogram(bins=25, color='black', alpha = 0.5,
aes(y = after_stat(density)), position = 'identity') +
xlab("log of Weibull data (Y)")
histlog <- histlog + annotate("text", x = 0, y = Inf, label="Log of Weibull-data", vjust = 1.5, size = 4.0,
colour = "blue" )
histtime + histlog
Comments to the two histograms. The Weibull-data is completely on the positive X-axis (negative values are impossible). When taking the logaritm of the data the histogram becomes negatively skewed and the data is on both the negative and positive X-axis. The right histogram shows similarities with the standard extreme value distribution, see below.
The standard extreme value distribution
Below a set of data from the standard extreme value distribution is created and shown as a histogram together with its probability density function (pdf). The true mean is negative of the Euler constant, i.e. -0.5772…
# ----------------------------- 'Standard extreme value distribution'
# Engineering Statistics Handbook 1.3.6.6.16
# https://www.itl.nist.gov/div898/handbook/eda/section3/eda366g.htm
# f(x) = exp(x-exp(x))
# E(X) = mu + 0.5772*beta
# S(X) = beta*Pi/sqrt(6)
#
# For 'Standard...', mu = 0 and beta = 1
extremefx <- function(x) {exp(x-exp(x)) } # Page 274 of 'Statistical Models...' by J. F. Lawless
p <- runif(n)
simtal <- log(-log(p)) # Simulating of 'Standard...'. Uses the inverse of F(x).
extremData <- data.frame(simtal)
histextreme <- ggplot(extremData, aes(simtal)) + geom_histogram(bins=25, color='black', alpha = 0.5,
aes(y = after_stat(density)), position = 'identity')
histextreme <- histextreme + stat_function(fun=extremefx, color='red', linewidth=1.0 ) + xlab("Data simulated values from 'standard extreme value distribution'")
histextreme <- histextreme + annotate("text", x = -4, y = Inf, label="Data simulated from extreme\nvalue distribution and its f(x)", vjust = 1.5, size = 4.5, colour = "blue")
histextreme
Comments to the histogram. The graph shows data from the extreme value distribution and it shows simularities to the logarithm om Weibull-data above. The red line is the probability distribution of the extreme value distribution.
Creating Weibull-data using the standard extreme value distribution
Below a set of Weibull-data is created by the following expression:and shown as a histogram together with its probability distribution (pdf).
The Y-variable is then transformed using T = (Y) becoming a Weibull-variable (‘Time’) and shown as a histogram together with its probability distribution (pdf).
WeibEst <- exp(log(scale) + 1/shape*simtal) # Simulating Weibull-data using 'extreme-data' stored in 'simtal'.
Weib <- data.frame(WeibEst) # Puts the simulated 'WeibEst'-data in a so-called data frame.
histWeibExtr <- ggplot(Weib, aes(WeibEst)) + geom_histogram(bins=25, color='black', alpha = 0.5,
aes(y = after_stat(density)), position = 'identity')
histWeibExtr <- histWeibExtr + stat_function(fun=dweibull, args=list(shape, scale), color='red', linewidth=1.0 ) + xlab("Time")
histWeibExtr <- histWeibExtr + annotate("text", x = 7, y = Inf, label="Weibull-estimate, by 'Extreme value'", vjust = 1.5, size = 4.5, colour = "blue")
histWeibExtr
The diagram below shows a good fit of the data to the Weibull-diagram. In addition, the estimated parameters are shown including the average and standard deviation of the data. These valuesa coincide well to the theoretical values.
wp.plot(WeibEst, main="Weibull, created by using 'Extreme'", col.line = "red", xlab = "Time", lty.line=1, pch=19)
fitdistr(WeibEst, "weibull")
## shape scale
## 2.06754046 5.05498431
## (0.05030253) (0.08149280)
print(paste0("Estimated mean = ",round(mean(WeibEst),2), ", Estimated standard deviation = ",round(sd(WeibEst),2)))
## [1] "Estimated mean = 4.48, Estimated standard deviation = 2.27"
Final comments.This document shows how a Weibull variable can be expressed as the sum of a constant and a standard extreme value distribution. This is the basic idea for performing regression analysis and is explored further in other entries on the home page.