Note: I forgot to leave room in the course packet to demonstrate how to simulate functions of RVs.
There are many reasons why we might be more interested in the distribution of a function of a random variable \(X\) than in \(X\) itself. For example, we might care about the absolute distance of the random variable from its mean, \(g(X) = |X-\mu|\), or we might want to know the total gain or loss in a stock portfolio, found by summing the daily results.
The pmf of \(g(X)\) can be computed as follows. For each \(y\), the probability that \(g(X)=y\) is given by \(\sum p(x)\) where the sum is over all values of \(x\) such that \(g(x)=y\).
Let \(X \in \{-2, -1, 0, 1, 2\}\), all values equally likely, and let \(g(X) = X^{2}\). Find the pmf of \(Y = g(X)\) and \(E(Y)\).
\(x\) | \(p(x)\) | \(g(x) = x^{2}\) |
---|---|---|
-2 | .2 | 4 |
-1 | .2 | 1 |
0 | .2 | 0 |
1 | .2 | 1 |
2 | .2 | 4 |
Notice that the same value of \(y = g(x)\) can occur for multiple values of \(x\). The pmf of \(Y\) should be collapsed so that there is only one row per unique value of \(Y\). We can also use this same table to calculate \(E(Y)\).
\(Y = g(x)\) | \(p(y)\) | \(y*p(y)\) |
---|---|---|
0 | .2 | 0 |
1 | .4 | .4 |
4 | .4 | 1.6 |
So \(E(Y) = \sum y*p(y) = 2\).
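Before simulating, we can compute the exact pmf of \(Y\) directly in R. This is a small sketch (not part of the original packet): tapply() sums \(p(x)\) over the \(x\) values that share the same \(y = g(x)\).
x <- c(-2, -1, 0, 1, 2)
p.x <- rep(.2, 5) # all values equally likely
y <- x^2 # g(x) = x^2
(p.y <- tapply(p.x, y, sum)) # pmf of Y: p(0)=.2, p(1)=.4, p(4)=.4
sum(as.numeric(names(p.y)) * p.y) # E(Y) = 0(.2) + 1(.4) + 4(.4) = 2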
Let’s confirm via simulation.
X <- c(-2, -1, 0, 1, 2)
sample.x <- sample(X, 10000, replace=TRUE) # sample from X with equal probability
y <- sample.x^2 # y = g(x)
proportions(table(y)) # pmf
## y
## 0 1 4
## 0.1967 0.3965 0.4068
mean(y) # expected value
## [1] 2.0237
Using the random variable \(X\) in the above example, find the pmf and expected value of \(Y = 2X+1\).
\(x\) | \(p(x)\) | \(g(x) = 2x+1\) |
---|---|---|
-2 | .2 | -3 |
-1 | .2 | -1 |
0 | .2 | 1 |
1 | .2 | 3 |
2 | .2 | 5 |
(E_x <- -3*.2 + -1*.2 + 1*.2 + 3*.2 + 5*.2) # theoretical E(Y) for Y = 2X + 1
## [1] 1
X <- c(-2, -1, 0, 1, 2)
sample.x <- sample(X, 10000, replace=TRUE)
y <- 2*sample.x + 1
proportions(table(y))
## y
## -3 -1 1 3 5
## 0.2023 0.2028 0.2002 0.1945 0.2002
mean(y)
## [1] 0.975
John travels to work five days a week. We will use \(X_{1}\) to represent his travel time on Monday, \(X_{2}\) to represent his travel time on Tuesday, and so on.
\[ W = \sum_{i=1}^{5} X_{i} \]
I would expect him to take \(W/5\) minutes on average to commute to work each day. \(W\) is the sum of the travel times on each of the five individual days, so the average is the total time divided by the number of days.
That the travel time on one day is independent of the travel time on any other day. That is, \(X_{i}\) and \(X_{j}\) are independent for all \(i \neq j\).
We say that two random variables are independent if the outcome of \(X\) does not give probabilistic information about the outcome of \(Y\) and vice versa.
Give an example of 2 variables that you think are independent:
Give an example of 2 variables that you think are not independent:
For random variables \(X\) and \(Y\), and constants \(a\), \(b\), and \(c\):
\[ E[aX+bY]=aE[X]+bE[Y] \qquad \mbox{ and } \qquad E[c]=c \]
Refer back to the commute time example. We intuitively reasoned that the expectation of the total time is equal to the sum of the expected individual times. This theorem generalizes and formalizes that statement to say that the expectation of a sum of random variables is always the sum of the expectation for each random variable.
\[ \begin{align} E(2X+5) & = E(2X) + E(5) \\ & = 2E(X) + 5 \\ & = 2(4) + 5 \\ & = 13 \end{align} \]
\[ \begin{align} E(2X+5Y) & = E(2X) + E(5Y) \\ & = 2E(X) + 5E(Y) \\ & = 2(4) + 5(-2) \\ & = -2 \end{align} \]
\[ E(3X-1) = 3E(X) - 1 = 3(2)-1 = 5 \]
\[ E(2) = 2 \]
\[ E(2X-3Y) = 2E(X) - 3E(Y) = 2(-4) - 3(1) = -11 \]
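We can also check these rules by simulation. Below is a sketch (not from the packet) that reuses the two example variables from this packet: \(X\) taking the values \(-2, \dots, 2\) with equal probability (so \(E(X) = 0\)) and \(Y\) taking \(0, 1, 2\) with probabilities \(.1, .5, .4\) (so \(E(Y) = 1.3\)).
x <- sample(c(-2, -1, 0, 1, 2), 10000, replace=TRUE) # E(X) = 0
y <- sample(c(0, 1, 2), 10000, prob=c(.1, .5, .4), replace=TRUE) # E(Y) = 1.3
mean(2*x + 5*y) # should be close to 2(0) + 5(1.3) = 6.5
2*mean(x) + 5*mean(y) # same quantity, built from the separate means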
Now that we know some rules of expected value, we can use a simplified method to find the variance of a random variable.
\[ Var(X) = E(X^{2}) - E(X)^2 \]
Let \(X \in \{-2, -1, 0, 1, 2\}\), all values equally likely. Find \(E(X)\) and \(Var(X)\).
\(x\) | \(p(x)\) | \(x*p(x)\) | \(x^{2}\) | \(x^{2}*p(x)\) |
---|---|---|---|---|
-2 | .2 | -0.4 | 4 | 0.8 |
-1 | .2 | -0.2 | 1 | 0.2 |
0 | .2 | 0 | 0 | 0 |
1 | .2 | 0.2 | 1 | 0.2 |
2 | .2 | 0.4 | 4 | 0.8 |
\[ E(X) = \sum(x*p(x)) = 0 \\ E(X^2) = \sum (x^{2}*p(x)) = 2 \\ Var(X) = E(X^{2}) - E(X)^2 = 2 - 0^2 = 2 \]
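The same calculation can be done exactly in R before simulating (a small sketch):
x <- c(-2, -1, 0, 1, 2)
p.x <- rep(.2, 5)
sum(x * p.x) # E(X) = 0
sum(x^2 * p.x) - sum(x * p.x)^2 # Var(X) = 2 - 0^2 = 2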
Confirm via simulation
X <- c(-2, -1, 0, 1, 2)
sample.x <- sample(X, 10000, replace=TRUE)
mean(sample.x)
## [1] -0.0143
var(sample.x)
## [1] 2.014297
Let \(X\) be a random variable with values \(x \in \{0, 1, 2\}\) and probabilities \(p(x) = .1, .5, .4\), respectively. Find \(E(X)\) and \(Var(X)\).
\(x\) | \(p(x)\) | \(x*p(x)\) | \(x^{2}\) | \(x^{2}*p(x)\) |
---|---|---|---|---|
0 | .1 | 0 | 0 | 0 |
1 | .5 | 0.5 | 1 | 0.5 |
2 | .4 | 0.8 | 4 | 1.6 |
\[ E(X) = \sum(x*p(x)) = 1.3 \\ E(X^2) = \sum (x^{2}*p(x)) = 2.1 \\ Var(X) = E(X^{2}) - E(X)^2 = 2.1 - 1.3^2 = 0.41 \]
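The exact calculation is the same when the probabilities are unequal; just weight by \(p(x)\) (sketch):
x <- c(0, 1, 2)
p.x <- c(.1, .5, .4)
sum(x * p.x) # E(X) = 1.3
sum(x^2 * p.x) - sum(x * p.x)^2 # Var(X) = 2.1 - 1.3^2 = 0.41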
Confirm via simulation
X <- c(0, 1, 2)
p.X <- c(.1, .5, .4)
sample.x <- sample(X, 10000, prob=p.X, replace=TRUE)
mean(sample.x)
## [1] 1.3066
var(sample.x)
## [1] 0.4092374
\[ Var(cX) = c^{2}Var(X) \qquad \mbox{ and } \qquad Var(c) = 0 \]
\[ Var(aX + bY) = a^{2}Var(X) + b^{2}Var(Y) \qquad \mbox{ if } X \mbox{ and } Y \mbox{ are independent} \]
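These variance rules can be checked by simulation just like the expectation rules. The sketch below reuses the two example variables from this packet, which have \(Var(X) = 2\) and \(Var(Y) = 0.41\); because the two samples are generated independently, the rule should hold.
x <- sample(c(-2, -1, 0, 1, 2), 10000, replace=TRUE) # Var(X) = 2
y <- sample(c(0, 1, 2), 10000, prob=c(.1, .5, .4), replace=TRUE) # Var(Y) = 0.41
var(2*x - 3*y) # should be close to 4(2) + 9(0.41) = 11.69
4*var(x) + 9*var(y) # rule applied to the simulated variances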
Suppose that three random variables \(X_{1},X_{2},X_{3}\) form a random sample from a distribution for which the mean is 5 and the variance is 3. Determine the value of \(E(2X_{1}-3X_{2}+X_{3}-4)\) and Var(\(2X_{1}-3X_{2}+X_{3}-4\)).
\[E(2X_{1}-3X_{2}+X_{3}-4) = 2E(X_1) - 3*E(X_2) + E(X_3)-4 = -4\]
\[Var(2X_{1}-3X_{2}+X_{3}-4) = 4Var(X_1) + 9Var(X_2) + Var(X_3) = 4*3+9*3+3 = 42\]
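We can check this by simulation as well. The distribution of the \(X_i\) isn't specified, only its mean and variance, so the sketch below arbitrarily uses a normal distribution with mean 5 and variance 3; any distribution with those moments would give the same answers.
x1 <- rnorm(10000, mean=5, sd=sqrt(3)) # arbitrary choice: any distribution with mean 5, variance 3 works
x2 <- rnorm(10000, mean=5, sd=sqrt(3))
x3 <- rnorm(10000, mean=5, sd=sqrt(3))
w <- 2*x1 - 3*x2 + x3 - 4
mean(w) # should be close to -4
var(w) # should be close to 42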
Marksmanship competition at a certain level requires each contestant to take ten shots with each of two different handguns. Final scores are computed as a weighted sum: 4 times the number of bull's-eyes made with the first gun plus 6 times the number made with the second gun. If Bertha has a 30% chance of hitting the bull's-eye with each shot from the first gun and a 40% chance with each shot from the second gun, what are the theoretical mean and standard deviation of her score?
Let \(X_{1}\) be the number of hits out of 10 shots for gun 1, and let \(X_{2}\) be the number of hits out of 10 shots on gun 2. Her \(Score = 4X_{1} + 6X_{2}\).
Each shot is worth 1 for a hit and 0 for a miss, so a single shot with hit probability \(p\) has mean \(p\) and variance \(p(1-p)\). The ten shots are independent, so the number of hits has mean \(10p\) and variance \(10p(1-p)\). The average on the first gun is therefore \(10(.3) = 3\) with variance \(10(.3)(.7) = 2.1\), and the average on the second gun is \(10(.4) = 4\) with variance \(10(.4)(.6) = 2.4\).
\[ E(4X_{1}+6X_{2}) = 4*3 + 6*4 = 36 \]
\[ Var(4X_{1}+6X_{2}) = 16*2.1 + 36*2.4 = 120 \\ SD(4X_{1}+6X_{2}) = \sqrt{120} = 10.95 \]
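A quick arithmetic check of these theoretical values in R, using the per-gun means and variances from above:
(score.mean <- 4*3 + 6*4) # 36
(score.sd <- sqrt(16*2.1 + 36*2.4)) # sqrt(120) = 10.95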
Confirm via simulation
This is really the key reason why we practice simulation. Sometimes you don't have the ability to calculate the theoretical values, but you can simulate the experiment to get an estimate of them.
weighted.score <- replicate(10000, {
  # let 0=miss and 1=hit, so sum() equals number of hits
  gun1 <- sample(c(0,1), size=10, prob=c(.7, .3), replace=TRUE)
  score.gun.1 <- sum(gun1)
  gun2 <- sample(c(0,1), size=10, prob=c(.6, .4), replace=TRUE)
  score.gun.2 <- sum(gun2)

  score <- 4*score.gun.1 + 6*score.gun.2
  score
})
mean(weighted.score)
## [1] 36.1364
sd(weighted.score)
## [1] 10.85856