Statistical Estimation - Maximum Likelihood Estimation (MLE)

 

📘 Statistical Estimation

When dealing with statistics, we usually have:

  • A population with unknown parameters (e.g., mean $\mu$, variance $\sigma^2$, probability $p$, etc.).

  • A sample of observations drawn from that population.

Since population parameters are unknown constants, we need to estimate them from sample data.


1. Point Estimation

A point estimator is a single statistic (function of sample observations) that provides a “best guess” of the parameter.

  • Example: The sample mean $\bar{X} = \frac{1}{n}\sum X_i$ is an estimator of the population mean $\mu$.

  • The obtained numerical value is called the point estimate.
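
For instance, a minimal Python sketch (the sample values are made up purely for illustration) that computes the sample mean as a point estimate of $\mu$:

```python
# Point estimation: the sample mean as a single "best guess" of mu.
sample = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4]    # illustrative data

x_bar = sum(sample) / len(sample)          # X-bar = (1/n) * sum(X_i)
print(f"Point estimate of mu: {x_bar:.3f}")
```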


2. Interval Estimation

Instead of one value, we provide an interval of plausible values with a given level of confidence.

  • Example:

    $$\bar{X} \pm Z_{\alpha/2}\cdot \frac{\sigma}{\sqrt{n}}$$

    is a $(1-\alpha)100\%$ confidence interval for $\mu$. Taking $\alpha = 0.05$, so that $Z_{\alpha/2} = 1.96$, gives a 95% confidence interval.
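
A minimal sketch of this computation in Python (assuming $\sigma$ is known; the data and $\sigma$ below are made up for illustration):

```python
import math

sample = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4]   # illustrative data
sigma = 0.8                                # assumed known population std. dev.
n = len(sample)
x_bar = sum(sample) / n

z = 1.96                                   # Z_{alpha/2} for alpha = 0.05
margin = z * sigma / math.sqrt(n)
print(f"95% CI for mu: ({x_bar - margin:.3f}, {x_bar + margin:.3f})")
```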


3. Properties of Good Estimators

A good estimator should satisfy the following properties:

  1. Unbiasedness:

    $$E(\hat{\theta}) = \theta$$

    The expected value of the estimator equals the true parameter.

  2. Consistency:
    As $n \to \infty$, $\hat{\theta} \to \theta$ in probability (demonstrated in the simulation sketch after this list).

  3. Efficiency:
    Among unbiased estimators, the one with minimum variance is preferred.

  4. Sufficiency:
    An estimator is sufficient if it uses all available information in the sample about the parameter.
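
Unbiasedness and consistency can be seen empirically with a small simulation. In the sketch below, the true parameters $\mu = 10$, $\sigma = 2$ are arbitrary choices for the demo: sample means average out to $\mu$, and a single estimate tightens around $\mu$ as $n$ grows.

```python
import random

random.seed(0)
mu, sigma = 10.0, 2.0   # "true" parameters, known here only because we simulate

def sample_mean(n):
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n

# Unbiasedness: the average of many independent estimates is close to mu.
estimates = [sample_mean(30) for _ in range(5000)]
print("mean of 5000 estimates:", sum(estimates) / len(estimates))  # ~10

# Consistency: estimates from larger samples fall closer to mu.
for n in (10, 100, 10_000):
    print(n, sample_mean(n))
```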


📘 Maximum Likelihood Estimation (MLE)

Idea

Proposed by R.A. Fisher (1922), MLE is one of the most powerful and widely used methods of estimation.
It works on the principle of choosing the parameter values that maximize the likelihood of observing the given data.


Step-by-Step Procedure

Suppose $X_1, X_2, \dots, X_n$ is a random sample from a distribution with pdf/pmf $f(x|\theta)$, where $\theta$ is an unknown parameter.

  1. Likelihood Function:

    $$L(\theta) = \prod_{i=1}^n f(x_i|\theta)$$

    This is the joint probability of the sample, considered as a function of $\theta$.

  2. Log-Likelihood:
    For easier calculations, take logs:

    $$\ell(\theta) = \ln L(\theta) = \sum_{i=1}^n \ln f(x_i|\theta)$$

  3. First Derivative (Likelihood Equation):

    $$\frac{d\ell(\theta)}{d\theta} = 0$$

    Solving this gives the MLE, $\hat{\theta}$.

  4. Second Derivative Test:
    Ensure

    $$\frac{d^2\ell(\theta)}{d\theta^2} < 0$$

    to confirm a maximum.
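
In practice the likelihood equation often has no closed form, and steps 1–4 are carried out numerically by minimizing $-\ell(\theta)$. Below is a minimal sketch of this approach (assuming NumPy and SciPy are available; the 0/1 data are made up) for the Bernoulli model, whose closed-form answer is derived in Example 1:

```python
import numpy as np
from scipy.optimize import minimize_scalar

data = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])   # illustrative 0/1 sample

def neg_log_likelihood(p):
    # -l(p) = -[x ln p + (n - x) ln(1 - p)], summed over the observations
    return -np.sum(data * np.log(p) + (1 - data) * np.log(1 - p))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print("numerical MLE  :", res.x)          # ~0.7
print("closed form x/n:", data.mean())    # matches
```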


Example 1: MLE for Bernoulli / Binomial

Let $X \sim \text{Binomial}(n, p)$. Suppose $x$ successes are observed.

  1. Likelihood:

    $$L(p) = \binom{n}{x} p^x (1-p)^{n-x}$$

  2. Log-likelihood:

    $$\ell(p) = x \ln p + (n-x)\ln(1-p)$$

    (The constant $\ln \binom{n}{x}$ is omitted, since it does not depend on $p$.)

  3. Differentiate:

    $$\frac{d\ell}{dp} = \frac{x}{p} - \frac{n-x}{1-p} = 0$$

  4. Solve:

    $$\hat{p} = \frac{x}{n}$$

✅ Thus, the MLE of $p$ is the sample proportion.
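
A quick numerical check (grid search over $p$, with made-up counts) that $L(p)$ indeed peaks at $x/n$:

```python
from math import comb

n, x = 20, 14                        # illustrative: 14 successes in 20 trials

def likelihood(p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

grid = [i / 1000 for i in range(1, 1000)]
p_best = max(grid, key=likelihood)   # p maximizing L over the grid
print(p_best, x / n)                 # both ~0.7
```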


Example 2: MLE for Normal Mean ($\mu$)

Suppose $X_1, X_2, \dots, X_n \sim N(\mu, \sigma^2)$, with $\sigma^2$ known.

  1. Likelihood:

    $$L(\mu) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$

  2. Log-likelihood:

    $$\ell(\mu) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2$$

  3. Differentiate:

    $$\frac{d\ell}{d\mu} = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu) = 0$$

  4. Solve:

    $$\hat{\mu} = \bar{X}$$

✅ Hence, the MLE of the population mean is the sample mean.
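
Again, a small check in Python (made-up data, $\sigma^2$ assumed known) that $\ell(\mu)$ is largest at $\mu = \bar{X}$:

```python
import math

data = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4]   # illustrative observations
sigma2 = 1.0                             # assumed known variance
n = len(data)

def log_likelihood(mu):
    return (-n / 2) * math.log(2 * math.pi * sigma2) \
           - sum((x - mu) ** 2 for x in data) / (2 * sigma2)

x_bar = sum(data) / n
for mu in (x_bar - 0.5, x_bar, x_bar + 0.5):
    print(f"mu = {mu:.3f}  l(mu) = {log_likelihood(mu):.4f}")  # peak at x_bar
```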


Advantages of MLE

  • Consistency: As $n \to \infty$, $\hat{\theta} \to \theta$.

  • Asymptotic normality: For large samples, the distribution of $\hat{\theta}$ tends to a normal distribution (see the simulation sketch after this list).

  • Efficiency: Attains the Cramér–Rao lower bound asymptotically.

  • General applicability: Works for discrete, continuous, and complex models.
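
A simulation sketch of asymptotic normality and efficiency for the Bernoulli model ($p = 0.3$ and $n = 500$ are arbitrary choices): standardizing $\hat{p} = x/n$ by the Cramér–Rao standard error $\sqrt{p(1-p)/n}$ should give values with mean ≈ 0 and standard deviation ≈ 1.

```python
import random
import statistics

random.seed(1)
p_true, n = 0.3, 500                          # illustrative values

z_values = []
for _ in range(2000):
    x = sum(random.random() < p_true for _ in range(n))
    p_hat = x / n                              # MLE from this simulated sample
    se = (p_true * (1 - p_true) / n) ** 0.5    # sqrt of Cramer-Rao variance
    z_values.append((p_hat - p_true) / se)

print(statistics.mean(z_values))    # ~0
print(statistics.stdev(z_values))   # ~1
```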


Limitations of MLE

  • Can be algebraically complicated (often requires iterative methods).

  • Sensitive to outliers.

  • For small samples, MLE may be biased.
