Comparing Population Proportions
Example
A random sample of 500 people in a county that is about to vote in a national election was asked whether they preferred “Candidate A” or “Candidate B”.
Of these, 320 people responded that they preferred Candidate A.
A random sample of 400 people in a second county that is about to vote in the same election was asked the same question.
Of this second sample, 268 people responded that they preferred Candidate A.
The sample proportions are
\[\widehat{p}_1 = \frac{320}{500} = 0.64\\ \quad\\ \widehat{p}_2 = \frac{268}{400} = 0.67\]We want to test
\[H_0:p_1=p_2\qquad H_1:p_1\neq p_2\]Equivalently, in terms of the difference,
\[H_0:p_1-p_2=0\qquad H_1:p_1-p_2\neq0\]For large enough samples,
\[\widehat{p}_1\stackrel{approx}{\sim}N\bigg(p_1,\frac{p_1(1-p_1)}{n_1}\bigg)\]and
\[\widehat{p}_2\stackrel{approx}{\sim}N\bigg(p_2,\frac{p_2(1-p_2)}{n_2}\bigg)\]Because the two samples are independent, $\widehat{p}_1-\widehat{p}_2$ is also approximately normally distributed: it is a linear combination of independent, approximately normal random variables.
\[\widehat{p}_1-\widehat{p}_2\stackrel{approx}{\sim} N(?,?)\]
- Mean:
\[E[\widehat{p}_1-\widehat{p}_2]=E[\widehat{p}_1]-E[\widehat{p}_2]=p_1-p_2\]
- Variance:
\[\begin{align} Var[\widehat{p}_1-\widehat{p}_2]&\stackrel{indep}{=}Var[\widehat{p}_1]+Var[\widehat{p}_2]\\ &=\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2} \end{align}\]
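The mean and variance of the difference can be checked numerically. The following is a minimal simulation sketch, not part of the original example: it assumes "true" proportions of 0.64 and 0.67 purely for illustration, and `sample_proportion` is a hypothetical helper name. It uses only the Python standard library.

```python
import random
from statistics import mean, variance

random.seed(0)  # fixed seed so the run is reproducible

# Assumed "true" proportions and sample sizes, chosen only for illustration.
p1, n1 = 0.64, 500
p2, n2 = 0.67, 400
reps = 4000  # number of simulated pairs of samples


def sample_proportion(p, n):
    """Proportion of successes in n independent Bernoulli(p) trials."""
    return sum(random.random() < p for _ in range(n)) / n


# Each entry is one simulated value of p1_hat - p2_hat from independent samples.
diffs = [sample_proportion(p1, n1) - sample_proportion(p2, n2) for _ in range(reps)]

theory_mean = p1 - p2
# Variances of independent estimators add, even though we take a difference.
theory_var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2

print(f"mean: simulated {mean(diffs):.4f}, theory {theory_mean:.4f}")
print(f"var:  simulated {variance(diffs):.6f}, theory {theory_var:.6f}")
```

The simulated variance should sit close to the sum of the two individual variances, not their difference.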
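Standardizing gives
\[\frac{\widehat{p}_1-\widehat{p}_2-(p_1-p_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}}\stackrel{approx}{\sim} N(0,1)\]The true values of $p_1$ and $p_2$ in the denominator are unknown, so we estimate them. Under $H_0$ they are equal, so we use a single estimate for both.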
- Call the common value $p$.
- Estimate by putting both groups together.
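In symbols, writing $x_1$ and $x_2$ for the number of respondents preferring Candidate A in each sample (notation introduced here only for illustration), the pooled estimate can be written as
\[\widehat{p}=\frac{x_1+x_2}{n_1+n_2}=\frac{n_1\widehat{p}_1+n_2\widehat{p}_2}{n_1+n_2}\]
that is, a weighted average of the two sample proportions, weighted by sample size.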
In the example with
\[\widehat{p}_1 = \frac{320}{500} = 0.64\quad\widehat{p}_2 = \frac{268}{400} = 0.67\]we have
\[\widehat{p}=\frac{320+268}{500+400}=\frac{588}{900}=\frac{49}{75}\approx0.6533\]Under $H_0$, use the fact that
\[\begin{align} Z:&=\frac{\widehat{p}_1-\widehat{p}_2-(p_1-p_2)}{\sqrt{\frac{\widehat{p}(1-\widehat{p})}{n_1}+\frac{\widehat{p}(1-\widehat{p})}{n_2}}}\\ &=\frac{\widehat{p}_1-\widehat{p}_2-(p_1-p_2)}{\sqrt{\widehat{p}(1-\widehat{p})\bigg(\frac{1}{n_1}+\frac{1}{n_2}\bigg)}}\stackrel{approx}{\sim} N(0,1) \end{align}\]This is a two-tailed test, so we compare $Z$ with the critical values $\pm z_{\alpha/2}$.
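\[\begin{align} Z&=\frac{0.64-0.67-0}{\sqrt{0.6533(1-0.6533)\bigg(\frac{1}{500}+\frac{1}{400}\bigg)}}\\ &\approx-0.9397 \end{align}\]Let the level of significance be $\alpha=0.05$, so the critical values are $\pm z_{0.025}=\pm1.96$ and we reject $H_0$ when $|Z|>1.96$.
$Z=-0.9397$ does not fall in the rejection region, so we fail to reject $H_0$: the data do not provide significant evidence that the proportions preferring Candidate A differ between the two counties.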
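The whole calculation can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `two_proportion_z_test` and its return format are hypothetical choices made for illustration, and the default `alpha=0.05` is an assumed significance level.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Two-tailed pooled z-test of H0: p1 = p2 (hypothetical helper)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled estimate of the common proportion under H0.
    p_pool = (x1 + x2) / (n1 + n2)
    # Standard error of p1_hat - p2_hat using the pooled estimate.
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se  # p1 - p2 = 0 under H0
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return {"z": z, "z_crit": z_crit, "p_value": p_value, "reject": abs(z) > z_crit}


# Counts from the example: 320 of 500 and 268 of 400 preferred Candidate A.
result = two_proportion_z_test(320, 500, 268, 400)
print(result)
```

Besides the test statistic and critical value, the sketch also returns a two-sided p-value, which gives the same decision: $H_0$ is rejected exactly when the p-value is below $\alpha$.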