
Maximum Likelihood Estimation and Likelihood Ratio Test Revisited


1.    Introduction

Maximum Likelihood Estimation (MLE) is an important part of the frequentist approach to statistical inference and was introduced by R. A. Fisher [1]. The method helps us find an estimator for an unknown population parameter. Other estimation methods are also available, such as Least Squares Estimation and Bayesian estimation, but Maximum Likelihood Estimation is the most widely used method for estimating parameters. This paper provides an overview of the Maximum Likelihood method, with an example showing how to calculate a Maximum Likelihood Estimate from a sample data set.

2.    Maximum Likelihood Estimation Method

Let X be a random variable with probability mass function P(X; θ), where θ is the parameter of the distribution. Let X1, X2, …, Xn be the observations from the given sample. Then the joint probability, or likelihood function, is defined as

$$P(X_1, \dots, X_n \mid \theta) = P(X_1 \mid \theta) \, P(X_2 \mid \theta) \cdots P(X_n \mid \theta) \qquad (1)$$

Viewed as a function of θ, equation (1) is the likelihood function and can be written as

$$L(\theta) = \prod_{i=1}^{n} P(x_i; \theta) = P(x_1; \theta) \, P(x_2; \theta) \cdots P(x_n; \theta) \qquad (2)$$

The maximum likelihood estimator θ̂ is defined as the value of the parameter θ that maximizes the likelihood function. It is usually easier to maximize the log of the likelihood function than the likelihood function itself:

  

$$\log L(\theta) = \sum_{i=1}^{n} \log P(x_i; \theta) \qquad (3)$$
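As a concrete illustration of equation (3), here is a minimal Python sketch that evaluates the log-likelihood of an i.i.d. sample and maximizes it over a grid of parameter values. The Bernoulli pmf and the sample data are hypothetical choices made for illustration, not taken from the paper.

```python
import math

def log_likelihood(pmf, data, theta):
    # Equation (3): sum of log P(x_i; theta) over the sample
    return sum(math.log(pmf(x, theta)) for x in data)

def bernoulli_pmf(x, p):
    # P(x; p) = p if x = 1 (success), 1 - p if x = 0 (failure)
    return p if x == 1 else 1.0 - p

# Hypothetical sample: 3 successes in 10 trials
data = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# Grid search for the value of p that maximizes the log-likelihood
grid = [i / 100 for i in range(1, 100)]
p_hat = max(grid, key=lambda p: log_likelihood(bernoulli_pmf, data, p))
print(p_hat)  # prints 0.3, matching the analytic MLE x/n
```

In practice a numerical optimizer would replace the grid search, but the grid makes the "maximize the log-likelihood" step explicit.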

Let us take the case of a binomial variable X with one observation x1 and one unknown parameter p. Then the above equations (1) to (3) can be written as follows:

$$P(x_1 \mid \theta) = P(x_1; \theta) \qquad (4)$$

           

$$L(\theta) = \prod_{i=1}^{1} P(x_i; \theta) = P(x_1; \theta) \qquad (5)$$

 

                

$$\log L(\theta) = \sum_{i=1}^{1} \log P(x_i; \theta) = \log P(x_1; \theta) \qquad (6)$$

Here each trial results in success or failure, the binomial variable X counts the number of successes, and n is the total number of trials. For example, if there are 10 trials and we get 3 successes out of those 10 trials, then the probability of observing 3 successes in 10 trials is given by

 

$$P(X = 3; p) = \binom{10}{3} p^{3} (1-p)^{7} \qquad (7)$$

We need to find the value of p that maximizes equation (7), i.e. the likelihood function L(p; 3):

 

 

 

$$L(p; 3) = \binom{10}{3} p^{3} (1-p)^{7} \qquad (8)$$

$$\log L(p; 3) = \log\left[\binom{10}{3} p^{3} (1-p)^{7}\right] \qquad (9)$$

$$\log L(p; 3) = 3 \log p + 7 \log(1-p) + \log \binom{10}{3} \qquad (10)$$
 

The value of p that maximizes equation (8), or equivalently its logarithm (10), is the maximum likelihood estimate of p. Table 1 below gives, for n = 10 and each value of X = 1, 2, …, 10, the likelihood at a range of values of p. For X = 3 and n = 10, the likelihood attains its maximum value, 0.267, at p = 0.3.

The same answer follows analytically: setting the derivative of equation (10) to zero gives 3/p − 7/(1−p) = 0, i.e. p = 3/10 = 0.3. Hence p = 0.3 is the maximum likelihood estimate, and p̂ = X/n is the maximum likelihood estimator of p.

Table 1: Likelihood values L(p; x) for the binomial distribution with n = 10 trials and x = 1, 2, …, 10 successes, at values of the success probability p from 0.0 to 1.0 [2]

| x (successes) | n (trials) | n! | x! | (n−x)! | n!/(x!(n−x)!) | p=0.0 | p=0.1 | p=0.2 | p=0.3 | p=0.4 | p=0.5 | p=0.6 | p=0.7 | p=0.8 | p=0.9 | p=1.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 10 | 3628800 | 1 | 362880 | 10 | 0.0000 | 0.3874 | 0.27 | 0.12 | 0.04 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2 | 10 | 3628800 | 2 | 40320 | 45 | 0.0000 | 0.19 | 0.3020 | 0.23 | 0.12 | 0.04 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 |
| 3 | 10 | 3628800 | 6 | 5040 | 120 | 0.0000 | 0.06 | 0.20 | 0.267 | 0.21 | 0.12 | 0.04 | 0.01 | 0.00 | 0.00 | 0.00 |
| 4 | 10 | 3628800 | 24 | 720 | 210 | 0.0000 | 0.01 | 0.09 | 0.20 | 0.2508 | 0.21 | 0.11 | 0.04 | 0.01 | 0.00 | 0.00 |
| 5 | 10 | 3628800 | 120 | 120 | 252 | 0.0000 | 0.00 | 0.03 | 0.10 | 0.20 | 0.2461 | 0.20 | 0.10 | 0.03 | 0.00 | 0.00 |
| 6 | 10 | 3628800 | 720 | 24 | 210 | 0.0000 | 0.00 | 0.01 | 0.04 | 0.11 | 0.21 | 0.2508 | 0.20 | 0.09 | 0.01 | 0.00 |
| 7 | 10 | 3628800 | 5040 | 6 | 120 | 0.0000 | 0.00 | 0.00 | 0.01 | 0.04 | 0.12 | 0.21 | 0.27 | 0.20 | 0.06 | 0.00 |
| 8 | 10 | 3628800 | 40320 | 2 | 45 | 0.0000 | 0.00 | 0.00 | 0.00 | 0.01 | 0.04 | 0.12 | 0.23 | 0.3020 | 0.19 | 0.00 |
| 9 | 10 | 3628800 | 362880 | 1 | 10 | 0.0000 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.04 | 0.12 | 0.27 | 0.3874 | 0.00 |
| 10 | 10 | 3628800 | 3628800 | 1 | 1 | 0.0000 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.03 | 0.11 | 0.35 | 1.00 |

 

 

From the table we can also see that the peak of the likelihood is lowest, 0.2461, when x = 5 and n = 10, attained at p = 0.5 (successes and failures equally likely).
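Table 1 can be reproduced in a few lines of Python. The sketch below uses only the standard library; it evaluates the binomial likelihood over the same grid of p values and reports where each row peaks, which should land at p = x/n.

```python
from math import comb

n = 10
p_grid = [round(0.1 * k, 1) for k in range(11)]  # p = 0.0, 0.1, ..., 1.0

def binom_likelihood(p, x, n):
    # L(p; x) = C(n, x) * p^x * (1 - p)^(n - x), as in equation (8)
    return comb(n, x) * p**x * (1 - p)**(n - x)

for x in range(1, n + 1):
    row = [binom_likelihood(p, x, n) for p in p_grid]
    peak_p = max(p_grid, key=lambda p: binom_likelihood(p, x, n))
    print(f"x={x:2d}  peak at p={peak_p}  L={max(row):.4f}")
```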

3.    Asymptotic (large-sample) Properties of Maximum Likelihood Estimators [3]

1.       Sufficiency

If a sufficient statistic for the unknown population parameter exists, the MLE is a function of it, so the MLE uses all the information about the parameter contained in the sample.

2.       Consistency [4]

As the sample size n tends to infinity, the MLE converges in probability to the true parameter value: for any ε > 0, the probability that |p̂ − p| < ε tends to 1.

3.       Asymptotic normality

For large n, the sampling distribution of the MLE is approximately normal, centered at the true parameter value, with variance equal to the inverse of the Fisher information (see the simulation sketch after this list).

4.       Efficiency

The MLE asymptotically attains the Cramér–Rao lower bound: because it is consistent and asymptotically normal, its asymptotic variance equals the inverse of the Fisher information, the smallest variance attainable by an unbiased estimator.
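The consistency and asymptotic normality properties can be illustrated by simulation. The following sketch, with illustrative parameter values chosen for this example only, draws Bernoulli samples with true p = 0.3: the MLE p̂ = x/n settles on the true value as n grows, and the standardized estimates have mean near 0 and variance near 1, as asymptotic normality predicts.

```python
import math
import random

random.seed(1)
true_p = 0.3

# Consistency: p_hat = x/n approaches true_p as the sample size grows
for n in (10, 100, 10_000):
    x = sum(random.random() < true_p for _ in range(n))
    print(f"n={n:6d}  p_hat={x / n:.4f}")

# Asymptotic normality: standardize p_hat by its asymptotic standard
# error sqrt(p(1-p)/n), i.e. 1/sqrt(Fisher information), over many runs
n, reps = 500, 2000
se = math.sqrt(true_p * (1 - true_p) / n)
z = [((sum(random.random() < true_p for _ in range(n)) / n) - true_p) / se
     for _ in range(reps)]
mean = sum(z) / reps
var = sum((v - mean) ** 2 for v in z) / reps
print(f"standardized mean={mean:.2f}  variance={var:.2f}")  # near 0 and 1
```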

 

4.    Likelihood Ratio Test [5]

Suppose we would like to test the hypothesis that x follows a distribution with parameter θ1 against the alternative hypothesis that the parameter is θ2. The likelihood ratio helps us test whether the data support θ1 over θ2:

$$\lambda = \frac{L(\theta_1)}{L(\theta_2)}$$

where λ is the likelihood ratio; taking θ2 to be the value that maximizes the likelihood (the MLE), λ takes values between 0 and 1.

To carry out the test we compute the statistic χ² = −2 log λ, which under the null hypothesis follows (approximately, in large samples) a chi-squared distribution, here with 1 degree of freedom since a single parameter is tested. If the theoretical (critical) value of χ² is greater than the calculated value, i.e. p > 0.05, then we fail to reject the null hypothesis that θ1 and θ2 are similar.
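To make the test concrete, here is a minimal sketch of the likelihood ratio test for the binomial example above, testing a hypothetical null value θ1 = 0.5 against θ2 = the MLE 0.3. With one parameter under test the statistic has 1 degree of freedom, and the p-value uses the identity P(χ²₁ > c) = erfc(√(c/2)), so no statistics library is needed.

```python
from math import comb, log, sqrt, erfc

n, x = 10, 3  # 3 successes in 10 trials, as in the example above

def likelihood(p):
    # Binomial likelihood, equation (8)
    return comb(n, x) * p**x * (1 - p)**(n - x)

p_null = 0.5   # hypothetical null value theta_1
p_hat = x / n  # MLE, the value that maximizes the likelihood

lam = likelihood(p_null) / likelihood(p_hat)  # ratio lies between 0 and 1
chi2 = -2 * log(lam)                          # ~ chi-squared with 1 df
p_value = erfc(sqrt(chi2 / 2))                # survival function, 1 df
print(f"chi2={chi2:.3f}  p={p_value:.3f}")
```

Here χ² ≈ 1.65 and p ≈ 0.20, so at the 5% level we fail to reject p = 0.5 with only 10 trials.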

5.    Conclusion

This paper revisited maximum likelihood estimation with a worked example and also described the likelihood ratio test.

 

References

[1]. Fisher, R. A. (1925, July). Theory of statistical estimation. In Mathematical Proceedings of the Cambridge Philosophical Society (Vol. 22, No. 05, pp. 700-725). Cambridge University Press.

[2]. Myung, I. J. (2003). Tutorial on maximum likelihood estimation. Journal of Mathematical Psychology, 47(1), 90-100.

[3]. Self, S. G., & Liang, K. Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association, 82(398), 605-610.

[4]. Kiefer, J., & Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. The Annals of Mathematical Statistics, 887-906.

[5]. Woolf, B. (1957). The log likelihood ratio test (the G-test). Annals of Human Genetics, 21(4), 397-409.