More on Expectation and Variance
In statistics and data science, we frequently collect data from several random variables and we want to understand and quantify the strength of their interactions.
-
The length of time a student studies and their score on an exam.
-
The relationship between male and female life expectancy in a certain country.
-
The relationship between the quantity of two different products purchased by a consumer.
Expectation
\[\begin{align} E(X)&=\sum_{k}kP(X=k)&&\text{if $X$ is discrete}\\ E(X)&=\int_{-\infty}^{\infty}xf(x)dx&&\text{if $X$ is continuous}\\ \end{align}\]What we can say about $E\big(g(x)\big)$?
\[E\big(g(x)\big)=\begin{cases} \begin{align} &\sum_{k}g(k)P(X=k)&&\text{$X$ is discrete}\\ &\int_{-\infty}^{\infty}g(x)f(x)dx&&\text{$X$ is continuous}\\ \end{align} \end{cases}\] \[\begin{align} E(aX+b)&=\sum_{k}(ak+b)P(X=k)\\ &=a\underbrace{\sum_{k}kP(X=k)}_{E(X)}+b\underbrace{\sum_{k}P(X=k)}_{1}\\ &=aE(X)+b \end{align}\]Example: Suppose a university has \$15,000 students and let $X$ equal the number of courses for which a randomly selected student is registered. The pmf is
$x$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
$p(x)$ | 0.01 | 0.03 | 0.13 | 0.25 | 0.39 | 0.17 | 0.02 |
If a student pay \$500 per course plus a \$100 per-semester registration fee, what is the average amount a student pays each semester?
Let $X$ = # of courses student takes
We want to find $E(500X+100)=500E(X)+100$.
\[\begin{align} E(X)&=\sum_{k=1}^7kP(X=k)\\ &=1(0.01) + 2(0.03) + \ldots + 7(0.02)\\ &=4.57 \end{align}\]So
\[E(500X+100) = 500(4.57) + 100 = \$2385\]Variance
\[\sigma^2=V(X)=E[\underbrace{(X-\mu)^2}_{g(X)}]=\underbrace{E(X^2)-\big(E(X)\big)^2}_{\text{Computational formula}}\] \[\begin{align} V(X)&=\sum_{k}(k-\mu)^2P(X=k)&&\text{if $X$ is discrete}\\ V(X)&=\int_{-\infty}^{\infty}(x-\mu)^2f(x)dx&&\text{if $X$ is continuous}\\ \end{align}\]What about $V\big(g(X)\big)$?
\[V\big(g(X)\big)=\begin{cases} \begin{align} &\sum_{k}\bigg(g(k)-E\big(g(X)\big)\bigg)^2P(X=k)&&\text{$X$ is discrete}\\ &\int_{-\infty}^{\infty}\bigg(g(x)-E\big(g(X)\big)\bigg)^2f(x)dx&&\text{$X$ is continuous}\\ \end{align} \end{cases}\]Find $V(aX+b)$
\[\begin{align} V(aX+b)&=E\big[\big(aX+b-E(aX+b)\big)^2\big]\\ &=E\big[\big(aX+b-aE(X)+b\big)^2\big]\\ &=E\big[a^2\big(X-E(X))^2\big]\\ &=a^2E\big[\big(X-E(X))^2\big]\\ &= a^2V(X) \end{align}\]Recall: $E(aX+b)=aE(X)+b$
Variance measure spread ofthe data, the $b$ shifts the data but doesn’t affect the spread.
Example: Suppose a university has \$15,000 students and let $X$ equal the number of courses for which a randomly selected student is registered. The pmf is
$x$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
$p(x)$ | 0.01 | 0.03 | 0.13 | 0.25 | 0.39 | 0.17 | 0.02 |
If a student pay \$500 per course plus a \$100 per-semester registration fee, what is the average amount a student pays each semester?
We found $E(500X+100) = 500(4.57) + 100 = $2385$
\[\begin{align} V(X)&=E(X^2)-\big(E(X)\big)^2\\ &=\sum_{k=1}^7k^2P(X=k)-(4.57)^2\\ &\approx1.265 \end{align}\] \[V(500X+100)=500^2V(X)=316,273\]