Large Sample Properties of MLEs
Let $X_1,X_2,…,X_n$ be a random sample from a distribution with pdf $f(x;\theta)$.
Let $\hat{\theta}$ be an MLE for $\theta$.
Under certain “regularity conditions”, such as those needed for the CRLB, the following hold:
- $\hat{\theta}$ exists and is unique.
- $\hat{\theta} \stackrel{P}{\rightarrow}\theta$. We say that $\hat{\theta}$ is a consistent estimator of $\theta$.
- 
    $\hat{\theta}_n$ is an asymptotically unbiased estimator of $\theta$. \[\text{i.e.} \quad \lim_{n\to\infty}E[\hat{\theta}_n] = \theta\]
- 
    $\hat{\theta}_n$ is asymptotically efficient. \[\text{i.e.} \quad \lim_{n\to\infty}\frac{CRLB_\theta}{Var[\hat{\theta}_n]} = 1\]
- 
    $\hat{\theta}\stackrel{asymp}{\sim}N(\theta,CRLB_\theta)$ \[\frac{\hat{\theta}_n-\theta}{\sqrt{CRLB_\theta}}\stackrel{d}{\rightarrow}N(0,1)\]
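As a rough numerical illustration of the last property, the sketch below simulates the standardized MLE for the exponential-rate model treated in the example that follows, where $\hat{\lambda}=1/\bar{X}$ and $CRLB_\lambda=\lambda^2/n$. The function name `standardized_mle_draws`, the seed, and the values of `rate`, `n`, and `reps` are arbitrary choices made for illustration, not part of the notes.

```python
import math
import random
import statistics


def standardized_mle_draws(rate, n, reps, seed=0):
    """Simulate (lambda_hat - rate) / sqrt(CRLB) for exponential samples of size n."""
    rng = random.Random(seed)
    crlb = rate**2 / n  # Cramer-Rao lower bound for the rate parameter
    draws = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(rate) for _ in range(n)) / n
        lambda_hat = 1.0 / xbar  # MLE of the rate
        draws.append((lambda_hat - rate) / math.sqrt(crlb))
    return draws


# Illustrative settings (assumptions, not taken from the notes).
z = standardized_mle_draws(rate=2.0, n=400, reps=2000)

# If the N(0, 1) approximation is reasonable, these should be close to
# 0, 1 and 0.95 respectively.
inside = sum(abs(v) <= 1.96 for v in z) / len(z)
print("mean:", round(statistics.fmean(z), 3))
print("sd:", round(statistics.pstdev(z), 3))
print("share within +/-1.96:", round(inside, 3))
```

With a smaller `n` the agreement is expected to be worse, since the result is only an asymptotic one.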
Example
\[X_1,X_2,...,X_n \sim exp(\text{rate}=\lambda)\]We have seen that:
- 
    the MLE for $\lambda$ is $\hat{\lambda} = \frac{1}{\bar{X}}$ 
- 
    $E[\hat{\lambda}]=\frac{n}{n-1}\lambda \quad \text{which goes to $\lambda$ as $n\rightarrow\infty$}$ 
- 
    $\bar{X}\stackrel{P}{\rightarrow}E[X_1]=1/\lambda$ 
Is it true that
\[\hat{\lambda}=\frac{1}{\bar{X}}\stackrel{P}{\rightarrow}\frac{1}{1/\lambda}=\lambda\text{ ?}\]Suppose that $\{X_n\}$ and $\{Y_n\}$ are sequences of random variables such that $X_n\stackrel{P}{\rightarrow}X$ and $Y_n\stackrel{P}{\rightarrow}Y$ for random variables $X$ and $Y$. Some properties of convergence in probability:
- $X_n + Y_n \stackrel{P}{\rightarrow} X + Y$
- $X_nY_n\stackrel{P}{\rightarrow} XY$
- $X_n/Y_n\stackrel{P}{\rightarrow} X/Y$ (if $P(Y\neq0)=1$)
- $g(X_n)\stackrel{P}{\rightarrow}g(X)$ (for $g$ continuous)
Thus, using $g(x)=1/x$, which is continuous on $(0,\infty)$, we do have that $\bar{X}\stackrel{P}{\rightarrow}E[X_1]=1/\lambda$ implies that:
\[\hat{\lambda}=\frac{1}{\bar{X}}\stackrel{P}{\rightarrow}\frac{1}{1/\lambda}=\lambda\]We saw that the CRLB for $\lambda$ is
\[CRLB_\lambda=\frac{\lambda^2}{n}\] \[\begin{align} Var[\hat{\lambda}] &= Var\Bigg[\frac{1}{\bar{X}}\Bigg]\\ &= E\Bigg[\bigg(\frac{1}{\bar{X}}\bigg)^2\Bigg] - \Bigg(E\bigg[\frac{1}{\bar{X}}\bigg]\Bigg)^2 \end{align}\]We already have $E[\frac{1}{\bar{X}}]=\frac{n}{n-1}\lambda$. Now we need to calculate $E[(\frac{1}{\bar{X}})^2]$.
\[\begin{align} E\Bigg[\bigg(\frac{1}{\bar{X}}\bigg)^2\Bigg]&=E\Bigg[\frac{n^2}{Y^2}\Bigg]\quad\text{where}\quad Y=\sum_{i=1}^{n}X_i\sim\Gamma(n,\lambda)\\ &=n^2\int_{-\infty}^{\infty}\frac{1}{y^2}f_Y(y)\,dy\\ &=n^2\int_{0}^{\infty}\frac{1}{y^2}\cdot\frac{1}{\Gamma(n)}\lambda^ny^{n-1}e^{-\lambda y}\,dy\\ &=n^2\int_{0}^{\infty}\frac{1}{\Gamma(n)}\lambda^ny^{n-3}e^{-\lambda y}\,dy\\ &=n^2\lambda^2\frac{\Gamma(n-2)}{\Gamma(n)}\int_{0}^{\infty}\frac{1}{\Gamma(n-2)}\lambda^{n-2}y^{n-3}e^{-\lambda y}\,dy\\ &=\frac{n^2}{(n-1)(n-2)}\lambda^2 \end{align}\]The last integral equals $1$ because the integrand is the $\Gamma(n-2,\lambda)$ pdf (this requires $n>2$), and $\Gamma(n)=(n-1)(n-2)\Gamma(n-2)$. \[\begin{align} Var\Bigg[\frac{1}{\bar{X}}\Bigg]&=E\Bigg[\bigg(\frac{1}{\bar{X}}\bigg)^2\Bigg] - \Bigg(E\bigg[\frac{1}{\bar{X}}\bigg]\Bigg)^2\\ &=\frac{n^2}{(n-1)(n-2)}\lambda^2 - \bigg(\frac{n}{n-1}\lambda\bigg)^2\\ &=\frac{n^2}{(n-1)^2(n-2)}\lambda^2 \end{align}\]And we calculate the ratio of the CRLB to the variance:
\[\begin{align} \frac{CRLB_\lambda}{Var[\hat{\lambda}]}&=\frac{\frac{\lambda^2}{n}}{\frac{n^2\lambda^2}{(n-1)^2(n-2)}}\\ &=\frac{(n-1)^2(n-2)}{n^3}\rightarrow 1 \quad \text{as }n\rightarrow\infty \end{align}\]That means that the MLE is in fact asymptotically efficient.
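The closed forms for this example can be checked numerically. The sketch below evaluates the ratio $(n-1)^2(n-2)/n^3$ for a few sample sizes and compares the formulas for $E[\hat{\lambda}]$ and $Var[\hat{\lambda}]$ against a small Monte Carlo run. The helper names, the seed, and the values of `rate`, `n`, and `reps` are assumptions chosen only for illustration.

```python
import random
import statistics


def exact_mean(rate, n):
    """E[1 / Xbar] = n / (n - 1) * rate, valid for n > 1."""
    return n / (n - 1) * rate


def exact_var(rate, n):
    """Var[1 / Xbar] = n^2 / ((n - 1)^2 (n - 2)) * rate^2, valid for n > 2."""
    return n**2 / ((n - 1) ** 2 * (n - 2)) * rate**2


def efficiency_ratio(n):
    """CRLB / Var[lambda_hat] = (n - 1)^2 (n - 2) / n^3; it does not depend on the rate."""
    return (n - 1) ** 2 * (n - 2) / n**3


def simulate_mle(rate, n, reps, seed=1):
    """Draw reps independent copies of lambda_hat = n / sum(X_i)."""
    rng = random.Random(seed)
    return [n / sum(rng.expovariate(rate) for _ in range(n)) for _ in range(reps)]


# The ratio climbs toward 1 as n grows.
for n in (5, 50, 500):
    print(n, round(efficiency_ratio(n), 4))

# Compare the formulas with simulation at a small sample size (illustrative values).
rate = 3.0
draws = simulate_mle(rate, n=10, reps=20000)
print("mean: simulated", statistics.fmean(draws), "exact", exact_mean(rate, 10))
print("var:  simulated", statistics.pvariance(draws), "exact", exact_var(rate, 10))
```

A small `n` is used for the comparison on purpose: that is where the bias factor $n/(n-1)$ and the gap between the variance and the CRLB are easiest to see.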
Recall the Weak Law of Large Numbers where we showed that $\bar{X}\stackrel{P}{\rightarrow}\mu$. To prove this, we used:
- Chebyshev’s inequality.
- The fact that $\bar{X}$ is an unbiased estimator of the mean $\mu$.
- The fact that $Var[\bar{X}]\rightarrow0$.
The exact same proof can be used to show the following.
If $\hat{\theta}_n$ is an unbiased estimator of $\theta$, and if $\lim_{n\to\infty}Var[\hat{\theta}_n]=0$, then $\hat{\theta}_n\stackrel{P}{\rightarrow}\theta$.
Using the generalized Markov inequality, we can show that this actually holds when “unbiased” is replaced by “asymptotically unbiased”.
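One way to sketch the argument, assuming $\hat{\theta}_n$ has a finite second moment, is to apply Markov's inequality to the nonnegative random variable $(\hat{\theta}_n-\theta)^2$ and then split the mean squared error into variance plus squared bias. For any fixed $\varepsilon>0$,
\[\begin{align} P(|\hat{\theta}_n-\theta|\ge\varepsilon)&=P\big((\hat{\theta}_n-\theta)^2\ge\varepsilon^2\big)\\ &\le\frac{E\big[(\hat{\theta}_n-\theta)^2\big]}{\varepsilon^2}\\ &=\frac{Var[\hat{\theta}_n]+\big(E[\hat{\theta}_n]-\theta\big)^2}{\varepsilon^2} \end{align}\]If $Var[\hat{\theta}_n]\rightarrow0$ and $E[\hat{\theta}_n]\rightarrow\theta$, both terms in the numerator go to $0$, so the probability goes to $0$ and $\hat{\theta}_n\stackrel{P}{\rightarrow}\theta$.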
We can use this to show, for example, that if $X_1,X_2,…,X_n \sim unif(0,\theta)$, the maximum:
\[Y_n = \max(X_1,X_2,...,X_n)\]is a consistent estimator of $\theta$.
What is the distribution of $Y_n$?
\[P(Y_n\le y) = P(\max(X_1,X_2,...,X_n)\le y)\]The maximum of our sample is less than or equal to $y$ if and only if every value in the sample is less than or equal to $y$. Since the $X_i$ are independent and identically distributed, the joint probability factors:
\[\begin{align} &P(Y_n\le y) \\ &=P(X_1\le y,X_2\le y,...,X_n\le y)\\ &=P(X_1\le y)P(X_2\le y)...P(X_n\le y)\\ &=[P(X_1\le y)]^n = \bigg[\frac{y}{\theta}\bigg]^n\\ &\text{for }0\le y\le\theta. \end{align}\]The pdf for $Y_n=\max(X_1,X_2,\ldots,X_n)$ is
\[\begin{align} f_{Y_n}(y)&=\frac{d}{dy}F_{Y_n}(y)\\ &=\frac{d}{dy}\bigg[\frac{y}{\theta}\bigg]^n\\ &=\frac{n}{\theta^n}y^{n-1}\quad\text{for }0\le y\le\theta \end{align}\]The expected value of the maximum is then
\[\begin{align} E[Y_n]&=\int_{-\infty}^{\infty}yf_{Y_n}(y)dy\\ &=\int_{0}^{\theta}\frac{n}{\theta^n}y^ndy\\ &=\frac{n}{n+1}\theta \end{align}\]A similar calculation gives $E[Y_n^2]=\frac{n}{n+2}\theta^2$, so \[\begin{align} Var[Y_n]&=E[Y_n^2]-\big(E[Y_n]\big)^2\\ &=\frac{n}{n+2}\theta^2-\frac{n^2}{(n+1)^2}\theta^2\\ &=\frac{n}{(n+1)^2(n+2)}\theta^2 \end{align}\]So $Y_n$ is asymptotically unbiased for $\theta$ and its variance goes to $0$ as $n$ goes to infinity. We can conclude that $Y_n$ is a consistent estimator of $\theta$.
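The consistency of the maximum can also be seen in a small simulation. The sketch below compares the simulated mean and variance of $Y_n$ with the formulas above and estimates $P(|Y_n-\theta|>\varepsilon)$, which for this model equals $((\theta-\varepsilon)/\theta)^n$ because $Y_n\le\theta$. The helper names, the seed, and the values of `theta`, `eps`, and `reps` are illustrative assumptions.

```python
import random
import statistics


def max_mean(theta, n):
    """E[Y_n] = n / (n + 1) * theta."""
    return n / (n + 1) * theta


def max_var(theta, n):
    """Var[Y_n] = n / ((n + 1)^2 (n + 2)) * theta^2."""
    return n / ((n + 1) ** 2 * (n + 2)) * theta**2


def simulate_max(theta, n, reps, seed=2):
    """Draw reps independent copies of Y_n = max of n unif(0, theta) values."""
    rng = random.Random(seed)
    return [max(rng.uniform(0, theta) for _ in range(n)) for _ in range(reps)]


theta = 4.0  # illustrative true parameter
eps = 0.1    # illustrative tolerance for |Y_n - theta|

results = {}  # n -> (simulated mean, simulated variance, simulated miss rate)
for n in (10, 100, 1000):
    draws = simulate_max(theta, n, reps=2000)
    miss = sum(abs(y - theta) > eps for y in draws) / len(draws)
    results[n] = (statistics.fmean(draws), statistics.pvariance(draws), miss)
    print(
        n,
        "mean", round(results[n][0], 4), "vs", round(max_mean(theta, n), 4),
        "| var", round(results[n][1], 6), "vs", round(max_var(theta, n), 6),
        "| miss rate", round(miss, 4), "vs", round(((theta - eps) / theta) ** n, 4),
    )
```

The miss rate shrinking toward $0$ as $n$ grows is exactly what $Y_n\stackrel{P}{\rightarrow}\theta$ asserts for this particular $\varepsilon$.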