Poisson and 0-truncated Poisson distributions - Estimating the parameter by MLE (Maximum Likelihood Estimator)

The different features of the Poisson distribution are well known but the 0-truncated Poisson is perhaps less known. 0-truncated means that there are no 0:s in the data. There can be several reasons why the 0:s are omitted. In one factory the 0:s were considered unnecessary and thus incorrectly omitted. In another study the number of items bought by people in a shopping queue was recorded. Obviously, no 0-values.
There are several methods for estimating the value of the λ-parameter. In 1912 Sir R Fisher coined the expression Maximum Likelihood estimator (discussed below).

Below some important expressions are listed for the two distributions.

Ordinary Poisson distribution

The probability distribution is a discrete distribution for X > 0 and contains only one parameter, λ:

\[ \begin{align} \Large P(X=x) =\frac{\lambda^x e^{-\lambda}}{x!} \end{align} \]

The Likelihood function is simply the product of the probabilities, one for each of the measurements:

\[ \begin{align} \Large L(\lambda) =\prod_{i = 1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} \end{align} \]

The mathematical problem is now to find the λ-value that maximizes the L(λ)-function. Usually, this function is rewritten by using logarithms:

\[ \begin{align} \Large l(\lambda) = -n\lambda + ln(\lambda)\sum_{i = 1}^{n} x_i-\sum_{i = 1}^{n} ln(x_i!) \end{align} \]

The next mathematical step is to find the derivative of the L(λ)-function:

\[ \begin{align} \Large \frac{\partial l(\lambda)}{\partial \lambda} = -n + \frac{1}{\lambda}\sum_{i = 1}^{n} x_i \end{align} \]

By setting the derivative above equal to 0, the MLE can be found. (The ‘hat’ on the λ-value indicates that it is an estimate based on the current X-values):

\[ \begin{align} \Large \hat{\lambda}=\frac{\sum_{}^{} x_i}{n} \end{align} \]

Expected value E(X) and the variance E(X) are as follows:

\[ \begin{align} \Large E(X)=\lambda \end{align} \]

\[ \begin{align} \Large V(X)=\lambda \end{align} \]

The 0-truncated Poisson distribution

When the probability of X = 0 is removed the rest of the probabilities must be adjusted (raised) otherwise it would not be a proper probability distribution (must add to 1). The following constant is used:

\[ \begin{align} \Large constant=\frac{1{} }{1-P(X=0)}=\frac{1{} }{1-e^{-\lambda}} \end{align} \]

Expected value E(X) and the variance E(X) for the 0-truncated Poisson distribution are as follows:

\[ \begin{align} \Large E(X)=\left(\frac{1{} }{1-e^{-\lambda}}\right)\cdot\lambda \end{align} \]

\[ \begin{align} \Large V(X)=\left(\frac{1{} }{1-e^{-\lambda}}\right)^2\cdot\lambda \end{align} \]

Deriving the MLE using R-commands and some graphs

1. MLE for an ordinary Poisson distribution

library("ggplot2")                 # Loads the ggplot2-package for graphics

n     <- 40                        # Number of simulated X-values.
lam   <- 2.2                       # The true papametwr value.
Xdata <- rpois(n, lam)             # Creating the data.

MLE   <- mean(Xdata)               # MLE-estimate for an oprdinary Poisson distribution.

library(ggplot2)

undre <- 1                         # Limits for a loop and X-axis of graph.
ovre  <- 4
step  <- (ovre - undre)/200        # Step size.

lamlista <- c()                    # To store lambda-values.
likeli <- c()                      # To store X-values.
raknare <- 1                       # A counter.

for (lamvarde in seq(undre, ovre, by = step)) {  # A loop for calculating L(lambda) for various lambda-values.
  lamlista[raknare] <- lamvarde 
  likeli[raknare] <- prod((lamvarde^Xdata * exp(-lamvarde))/factorial(Xdata))
  raknare <- raknare + 1
}

toppMax <- max(likeli)                          # Lowering the top of the curve.
likeli <-  likeli*0.95

resultat <- data.frame(lamlista, likeli)        # Creating a data frame for the plotting.

grafordPo <- ggplot(resultat, aes(x = lamlista, y = likeli)) + geom_line() + xlim(undre, ovre) + 
          ylab('Likelihood function') + xlab('lambda values') +
   geom_vline(xintercept = MLE, linetype = "dotted", color = "grey", linewidth = 1) + 
              theme(axis.text.y=element_blank()) + 
   geom_vline(xintercept = lam, linetype = "dotted", color = "red", linewidth = 1) + 
              theme(axis.text.y=element_blank()) + ylim(0, toppMax) +
   geom_point(x = MLE, y = 0, shape = 17, colour = "grey", size = 4.5) + geom_point(x = lam, y = 0, 
              shape = 17, colour = "red", size = 4.5) + 
          annotate("text", x = -Inf, y = Inf, label="Likelihood function ordinary Poisson", vjust = 1.8, 
                   hjust = -0.05, size = 6, colour = "blue" ) + 
          annotate("text", x = -Inf, y = Inf, label="Red: true lambda", vjust = 3.8, hjust = -0.1, 
                   size = 6, colour = "red" ) +
          annotate("text", x = -Inf, y = Inf, label="Grey: MLE", vjust = 5.8, hjust = -0.15, 
                   size = 6, colour = "grey" )
grafordPo

Comments. The curve above takes its maximum at the MLE, i.e. the ‘best’ value of the λ-value.

2. Logarithm of MLE for an ordinary Poisson distribution

undre <- 1
ovre  <- 4

basicPoLn <- function(lam) {-n*lam + log(lam) * sum(Xdata) - sum(factorial(Xdata))}

graf <- ggplot() + xlim(undre, ovre) + geom_vline(xintercept = MLE, linetype = "dotted", color = "grey", 
                                                  linewidth = 1) +                 # vertical line at MLE
                                       geom_vline(xintercept = lam, linetype = "dotted", color = "red", 
                                                  linewidth = 1) +                 # vertical line at true lambda.
                                       theme(axis.text.y=element_blank()) +
   annotate("text", x = -Inf, y = Inf, label="Log likelihood function ordinary Poisson", vjust = 1.8, 
            hjust = -0.05, size = 6, colour = "blue" ) + 
  annotate("text", x = -Inf, y = Inf, label="Red: true lambda", vjust = 3.8, hjust = -0.1, size = 6, 
            colour = "red" ) +
  annotate("text", x = -Inf, y = Inf, label="Grey: MLE", vjust = 5.8, hjust = -0.15, size = 6, 
           colour = "grey" ) +
  geom_function(fun = basicPoLn, colour = 'blue', linewidth = 1, linetype = 1) + ylab('Log likelihood function') +
  xlab('lambda values') + theme(axis.text.y=element_blank())
graf

Comments. The curve above is the log of the likelihood. The reason for using the log (instead of the original expession) is that it is easier to use the tool of derivative on a log-function, see next step.

3. Derivative of logarithm of MLE for an ordinary Poisson distribution

derivatabasicPoLn <- function(lam) {-n + 1/lam * sum(Xdata) }

graf <- ggplot() + xlim(undre, ovre) + geom_hline(yintercept = 0, linetype = "dotted", color="black", 
                                                  linewidth=1) + 
             geom_vline(xintercept = MLE, linetype = "dotted", color = "grey", linewidth = 1) + 
             theme(axis.text.y=element_blank()) +
             geom_vline(xintercept = lam, linetype = "dotted", color = "red", linewidth = 1) + 
             theme(axis.text.y=element_blank()) +
  annotate("text", x = -Inf, y = Inf, label="Derivative of log likelihood function ordinary Poisson", 
           vjust = 1.8, hjust = -0.05, size = 6, colour = "blue" ) + 
  annotate("text", x = -Inf, y = Inf, label="Red: true lambda", vjust = 3.8, hjust = -0.1, size = 6, 
           colour = "red" ) +
  annotate("text", x = -Inf, y = Inf, label="Grey: MLE", vjust = 5.8, hjust = -0.15, size = 6, colour = "grey" ) +
  annotate("text", x = -Inf, y = 0, label="0", vjust = -0.15, hjust = -0.2, size = 6, colour = "black" )

graf <- graf + geom_function(fun = derivatabasicPoLn, colour = 'red', linewidth = 1, linetype = 1) + 
               ylab('') + xlab('lambda values')
graf

MLEest <- mean(Xdata)                             # The MLE of a Poisson is equal to the mean of the X-data.
paste0("Mean of X-data (= MLE for an ordinary Poisson distribution = ", MLEest)

## [1] "Mean of X-data (= MLE for an ordinary Poisson distribution = 2.325"

Comments. The curve above (the derivative) crosses the 0-line at MLE. This value is also equal to the mean of the data.

The 0-truncated Poisson distribution

The probability distribution is here multiplied by a constant. The maximum of L(λ) is obtained by R-commands.

\[ \begin{align} \Large L(\lambda) =\prod_{i = 1}^{n} \frac{1{} }{1-e^{-\lambda}}\cdot\frac{\lambda^{x_i} e^{-\lambda}}{x_i!} \end{align} \]

4. MLE-function for a 0-truncated Poisson distribution

n     <- 50
lam   <- 1.2
Xdata <- rpois(n, lam)

Xdata <- Xdata[Xdata > 0]             # Removes 0-values from the data.

undre <- 0.1
ovre  <- 3

lamlista <- c()
likeli   <- c()
raknare  <- 1

for (lamvarde in seq(undre, ovre, by = 0.01)) {
   lamlista[raknare] <- lamvarde 
   likeli[raknare] <- prod( 1/(1-exp(-lamvarde)) * lamvarde^Xdata * exp(-lamvarde)/factorial(Xdata))
   raknare <- raknare + 1
}

resultat <- data.frame(lamlista, likeli)

maxrad <- which.max(resultat$likeli)
maxlam <- resultat$lamlista[maxrad]          # lam-value giving highest of 'max likelihood function'.

# ----- Finds maximum och 'Likelihood function' by optimize-function in R.
likelihood <- function(lamvarde) { prod( 1/(1-exp(-lamvarde)) * lamvarde^Xdata * exp(-lamvarde)/factorial(Xdata)) }
answer <-  optimize(likelihood, interval = c(undre, ovre), maximum = TRUE)
x_max <- answer$maximum
# ----------------------

toppMax <- max(likeli)            # Lowering the top of the curve.
likeli <-  likeli*0.95

resultat <- data.frame(lamlista, likeli)

graftruncPo <- ggplot(resultat, aes(x = lamlista, y = likeli)) + geom_line() + xlim(undre, ovre) + 
                      ylab('Likelihood function') + xlab('lambda values') +
  geom_vline(xintercept = x_max, linetype = "dotted", color = "grey", linewidth = 1) + 
                      theme(axis.text.y=element_blank()) +
  geom_vline(xintercept = lam, linetype = "dotted", color = "red", linewidth = 1) + 
                      theme(axis.text.y=element_blank()) + ylim(0, toppMax) +
  geom_point(x = x_max, y = 0, shape = 17, colour = "grey", size=4.5) + geom_point(x = lam, y = 0, 
                                                              shape = 17, colour = "red", size = 4.5) + 
  annotate("text", x = -Inf, y = Inf, label="Likelihood function 0-truncated Poisson", vjust = 1.8, 
                      hjust = -0.05, size = 6, colour = "blue" ) + 
  annotate("text", x = -Inf, y = Inf, label="Red: true lambda", vjust = 3.8, hjust = -0.1, size = 6, 
              colour = "red" ) +
  annotate("text", x = -Inf, y = Inf, label="Grey: MLE", vjust = 5.8, hjust = -0.15, size = 6, colour = "grey" )
  graftruncPo

Comments. The maximum of the curve above is obtained by using an ‘optimize’-function of R.

Final remarks. This document tries to illuminate the idea of Maximum Likelihood as a method of estimating a parameter. This method can of course also be used for other probability distributions.