A Review of Maximum Likelihood Estimation
Flip a possibly unfair coin $n$ times.
Let $p$ be the probability of getting “Heads” on any one flip $(0\le p\le1)$.
Record $1$’s and $0$’s for H’s and T’s, respectively.
We now have a random sample
\[X_1,X_2,\ldots,X_n\stackrel{iid}{\sim}\text{Bernoulli}(p)\]$p$ is unknown and we want to estimate it.
- If the observed data has a lot of $1$’s in it, a higher value of $p$, closer to $1$, is more likely.
- If the observed data has a lot of $0$’s in it, a lower value of $p$, closer to $0$, is more likely.
- If the observed data has roughly an equal number of $0$’s and $1$’s, a value of $p$ closer to $0.5$ is more likely. (A short simulation below illustrates this.)
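A quick simulation makes this intuition concrete. The following is a minimal sketch, assuming NumPy is available; the sample size and the particular values of $p$ are arbitrary choices. It flips the coin $n$ times for several true values of $p$ and reports the fraction of $1$’s observed.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100  # number of coin flips

# For each true p, generate n Bernoulli(p) flips and report the fraction of 1's.
for p in (0.1, 0.5, 0.9):
    flips = rng.binomial(1, p, size=n)  # n independent Bernoulli(p) draws
    print(f"true p = {p:.1f}, fraction of 1's = {flips.mean():.2f}")
```

The fraction of $1$’s tracks the true $p$, which is exactly the intuition the estimator below formalizes.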
The Bernoulli probability mass function is
\[f(x;p)=p^x(1-p)^{1-x}\]for $x=0,1$. It is zero otherwise.
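Plugging in the two support points recovers the usual description of the coin:
\[f(1;p)=p^{1}(1-p)^{0}=p,\qquad f(0;p)=p^{0}(1-p)^{1}=1-p\]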
The joint pmf for $X_1,X_2,\ldots,X_n$ is
\[\begin{align} f(\stackrel{\rightharpoonup}{x};p)&\stackrel{iid}{=}\prod_{i=1}^{n}f(x_i;p)\\ &=p^{\sum_{i=1}^{n}x_i}(1-p)^{n-\sum_{i=1}^{n}x_i} \end{align}\]for $x_i\in\{0,1\}$.
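For example, with $n=4$ and observed data $\stackrel{\rightharpoonup}{x}=(1,0,1,1)$, so that $\sum_{i=1}^{4}x_i=3$,
\[f(\stackrel{\rightharpoonup}{x};p)=p\cdot(1-p)\cdot p\cdot p=p^{3}(1-p)^{1}\]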
\[f(\stackrel{\rightharpoonup}{x};p)=\underbrace{P(X_1=x_1,\ldots,X_n=x_n)}_{\text{This is a function of $p$.}}\]The goal is to find the value of $p$ in $[0,1]$ that makes seeing $X_1=x_1,X_2=x_2,\ldots,X_n=x_n$ "most likely".
That value is called the maximum likelihood estimator (MLE) for $p$.
\[f(\stackrel{\rightharpoonup}{x};p)=p^{\sum_{i=1}^{n}x_i}(1-p)^{n-\sum_{i=1}^{n}x_i}\]Think about this as a function of $p$:
\[L(p)=p^{\sum_{i=1}^{n}x_i}(1-p)^{n-\sum_{i=1}^{n}x_i}\]This is called a likelihood function.
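The maximization can also be done numerically, before any calculus. Below is a minimal sketch, assuming NumPy; the data vector and the grid resolution are hypothetical choices. It evaluates $L(p)$ on a grid of candidate values and locates the maximizer, which agrees with the closed-form answer derived next.

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 1])  # hypothetical observed flips
s, n = x.sum(), x.size

grid = np.linspace(0.001, 0.999, 999)  # candidate values of p in (0, 1)
L = grid**s * (1 - grid)**(n - s)      # likelihood L(p) at each candidate

print("grid maximizer:", grid[np.argmax(L)])  # approximately 0.75
print("sample mean:   ", s / n)               # 6/8 = 0.75
```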
It is easier to maximize the log-likelihood; since $\ln$ is strictly increasing, $L(p)$ and $\ln L(p)$ are maximized at the same value of $p$.
\[\begin{align} l(p)&=\ln L(p)\\ &=\left(\sum_{i=1}^{n}x_i\right)\ln p + \left(n-\sum_{i=1}^{n}x_i\right)\ln(1-p) \end{align}\]This is a function of $p$. To maximize it, we take the derivative with respect to $p$ and set it equal to $0$.
\[\frac{d}{dp}l(p) = 0\\ \huge{\Downarrow}\\ \left(\sum_{i=1}^{n}x_i\right)\frac{1}{p} - \left(n-\sum_{i=1}^{n}x_i\right)\frac{1}{1-p} = 0\]Multiplying both sides by $p(1-p)$ and simplifying gives $\sum_{i=1}^{n}x_i=np$. The MLE for $p$ is
\[\widehat{p}=\frac{\sum_{i=1}^{n}X_i}{n}=\overline{X}\]For continuous $X_1,X_2,\ldots,X_n$, the joint pdf does not represent a probability, but the MLE is found the same way.
Example
Suppose that $X_1,X_2,\ldots,X_n$ is a random sample from the continuous Pareto distribution with pdf
\[f(x;\gamma)=\begin{cases} \begin{align} &\frac{\gamma}{(1+x)^{\gamma+1}}&&x\gt0\\ &0&&\text{otherwise}\\ \end{align} \end{cases}\]The joint pdf is
\[\begin{align} f(\stackrel{\rightharpoonup}{x};\gamma)&\stackrel{iid}{=}\prod_{i=1}^{n}f(x_i;\gamma)\\ &=\prod_{i=1}^{n}\frac{\gamma}{(1+x_i)^{\gamma+1}}\\ &=\frac{\gamma^{n}}{\prod_{i=1}^{n}(1+x_i)^{\gamma+1}}\\ &=\frac{\gamma^{n}}{\left[\prod_{i=1}^{n}(1+x_i)\right]^{\gamma+1}} \end{align}\]The log-likelihood is
\[l(\gamma)=n\ln\gamma-(\gamma+1)\sum_{i=1}^{n}\ln(1+x_i)\]Take the derivative with respect to $\gamma$ and set it equal to $0$:
\[l'(\gamma)=\frac{n}{\gamma}-\sum_{i=1}^{n}\ln(1+x_i)\stackrel{\text{set}}{=}0\]Since $l''(\gamma)=-n/\gamma^{2}\lt0$, this critical point is a maximum. The MLE for $\gamma$ is:
\[\widehat{\gamma}=\frac{n}{\sum_{i=1}^{n}\ln(1+X_i)}\]
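As a check on the algebra, this estimator can be verified by simulation. The following is a minimal sketch, assuming NumPy; the true $\gamma$ and the sample size are arbitrary choices. It draws from this Pareto distribution by inverse-transform sampling, using the CDF $F(x)=1-(1+x)^{-\gamma}$, which inverts to $x=(1-u)^{-1/\gamma}-1$ for $u\sim\text{Uniform}(0,1)$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
gamma_true = 2.5  # hypothetical true shape parameter
n = 100_000

# Inverse-transform sampling: solving u = F(x) = 1 - (1+x)^(-gamma)
# for x gives x = (1-u)^(-1/gamma) - 1 with u ~ Uniform(0, 1).
u = rng.uniform(size=n)
x = (1 - u) ** (-1 / gamma_true) - 1

# The MLE derived above: gamma_hat = n / sum(ln(1 + x_i)).
gamma_hat = n / np.log1p(x).sum()
print(f"true gamma = {gamma_true}, MLE = {gamma_hat:.3f}")
```

With a sample this large, the printed estimate should land close to the true $\gamma$, consistent with the derivation.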