Consider a random experiment with sample space \(S\). A function that assigns one and only one real number to each element of \(S\) is called a random variable.
When the sample space has only two outcomes, the random variable is called a binary random variable. The two values chosen for a binary random variable are usually 0 and 1. For example, if each outcome of an experiment is either a dog or a cat, we could let \(Y=0\) denote a dog and \(Y=1\) denote a cat.
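As a small illustration (a hypothetical sketch, not an example from the text), we can represent such a binary random variable in R by mapping each outcome in the sample space to a number:

animals <- sample(c("dog", "cat"), size = 10, replace = TRUE)  # ten hypothetical outcomes
Y <- ifelse(animals == "cat", 1, 0)  # the random variable assigns 1 to cats and 0 to dogs
Y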
When a probability distribution has been specified on the sample space of an experiment, we can determine a probability distribution for the possible values of each random variable \(X\).
This section focuses on probability distributions for discrete random variables. We say that \(X\) has a discrete distribution if \(X\) can take only a finite number \(k\) of different values \(x_{1},x_{2},\dots,x_{k}\), or at most an infinite sequence of different values \(x_{1},x_{2},\dots\). Random variables that can take any value in an interval are called continuous and will be discussed in a later chapter. Working with discrete random variables requires summation, while continuous random variables require integration.
Discrete variables usually take integer values and represent a count of something, while continuous variables take values in an interval of real numbers and often measure something.
A discrete random variable is a variable that takes integer values and is characterized by a probability mass function (pmf). The pmf \(p\) of a random variable \(X\) is given by:
\[ p(x) = P(X=x) \]
The above equation can be read as: the probability that the random variable \(X\) is equal to some value \(x\). The pmf satisfies two properties: \(p(x) \geq 0\) for every value \(x\), and \(\sum_{x} p(x) = 1\).
The term probability distribution is a more generic term that describes the probabilities for each different value a random variable can take on. This holds for both discrete and continuous random variables. We will use the term probability distribution for all random variables, but the pmf is specific to discrete random variables, and pdf (chapter 4) is specific to continuous random variables.
Consider a crooked die where the cube has been shortened in the one-six direction. This has the effect that 1’s and 6’s each have probability 1/4 of being rolled, while the other four faces each have probability 1/8.
Let \(X\) be the number on the face-up side of the die after it is rolled.
x | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
p(x) | 1/4 | 1/8 | 1/8 | 1/8 | 1/8 | 1/4 |
This is a valid pmf because all of the probabilities are non-negative and they sum to 1.
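As a quick check (a small sketch added here, not part of the original example), we can verify both pmf properties in R:

crooked <- c(1/4, 1/8, 1/8, 1/8, 1/8, 1/4)  # p(x) for x = 1, 2, ..., 6
all(crooked >= 0)  # every probability is non-negative
sum(crooked)       # the probabilities sum to 1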
Suppose you roll 2 dice. Let \(X\) be the sum of the two dice. Write out the pmf. Don’t forget you can refer to Example 2.8 in the textbook to visualize the sample space.
x | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|
p(x) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |
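Although the exercise asks you to write the pmf out by hand, the exact probabilities can also be computed in R by enumerating all 36 equally likely outcomes (a sketch added here; the object names are ours):

die <- 1:6
all_sums <- outer(die, die, "+")  # 6-by-6 grid of every possible sum of two dice
proportions(table(all_sums))      # exact pmf of the sum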
In the cases we’ve encountered so far, the sample space and the values of the random variable have been discrete, that is, whole numbers. We will get into continuous random variables in the next chapter.
Suppose you roll 2 dice. Let \(X\) be the sum of the two dice. Use simulation to estimate the probability distribution.
die <- 1:6
d1 <- sample(die, 1000, replace = TRUE)
d2 <- sample(die, 1000, replace = TRUE)
sum.2d6 <- d1 + d2
The estimated pmf of \(X\) is:
proportions(table(sum.2d6))
## sum.2d6
## 2 3 4 5 6 7 8 9 10 11 12
## 0.027 0.052 0.081 0.106 0.140 0.175 0.128 0.107 0.090 0.061 0.033
We can use the function plot to plot the estimate of the pmf using the following code.
plot(proportions(table(sum.2d6)),
main="Sum of two dice", ylab="Probability")
Let \(X\) be the number of heads observed when three coins are tossed.
x | 0 | 1 | 2 | 3 |
---|---|---|---|---|
p(x) | 1/8 | 3/8 | 3/8 | 1/8 |
nheads <- replicate(10000, {
  coin3 <- sample(c("H", "T"), size=3, replace=TRUE)
  sum(coin3=="H")
})
proportions(table(nheads))
## nheads
## 0 1 2 3
## 0.1226 0.3765 0.3764 0.1245
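For comparison (an added note, not part of the original solution), the exact probabilities in the table above can be computed directly from the number of ways to choose which tosses are heads:

choose(3, 0:3) / 2^3  # exact P(X = 0), ..., P(X = 3): 1/8, 3/8, 3/8, 1/8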
Consider drawing two balls without replacement from a bag of seven balls numbered 1 through 7, and let \(X\) be the sum of the two numbers drawn. We can estimate the pmf of \(X\) by simulation.
sum2balls <- replicate(10000, {
  balls <- sample(1:7, 2, replace=FALSE)
  x <- sum(balls)
  x
})
proportions(table(sum2balls))
## sum2balls
## 3 4 5 6 7 8 9 10 11 12 13
## 0.0439 0.0484 0.0917 0.0987 0.1411 0.1441 0.1409 0.0985 0.1001 0.0465 0.0461
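The exact pmf can also be computed by listing all 21 equally likely pairs of distinct balls (a sketch added for comparison; the object name pairs is ours):

pairs <- combn(1:7, 2)              # all 21 unordered pairs of distinct balls
proportions(table(colSums(pairs)))  # exact pmf of the sum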
What are the least likely outcomes of \(X\)?
The least likely outcomes are \(X=3\), \(X=4\), \(X=12\), and \(X=13\); each of these sums can be formed from only one pair of balls, so each has probability 1/21.
Suppose you have a bag full of marbles; 50 are red and 50 are blue. You are standing on a number line, and you draw a marble out of the bag without replacing it. If you get red, you go left one unit. If you get blue, you go right one unit. This is called a random walk. You draw marbles up to 100 times, each time moving left or right one unit. Let \(X\) be the number of marbles drawn from the bag until you return to 0 for the first time. The random variable \(X\) is called the first return time, since it is the number of steps it takes to return to your starting position.
Estimate the pmf of \(X\).
bag <- c(rep(-1, 50), rep(1, 50))
steps <- sample(bag)
steps[1:10]
## [1] 1 1 1 -1 -1 -1 -1 1 1 1
walk <- cumsum(steps)
walk[1:10]
## [1] 1 2 3 2 1 0 -1 0 1 2
(where.zero <- which(walk==0))
## [1] 6 8 12 48 52 62 64 66 68 82 84 86 88 100
min(where.zero)
## [1] 6
x <- replicate(10000, {
  steps <- sample(bag)
  walk <- cumsum(steps)
  where.zero <- which(walk==0)
  min(where.zero)
})
plot(proportions(table(x)))
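The same vector of simulated first return times can also be used to estimate individual probabilities. For example (an added illustration, not part of the original code), the estimated probability that you return to 0 after exactly two draws is:

mean(x == 2)  # estimated P(X = 2)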
Sometimes you know the theoretical pmf of a random variable but need to draw samples from that known distribution. We can still use the sample() function to do so; we just provide it a vector of probabilities to use.
In the United States, human blood comes in four types: O, A, B, and AB. Take a sample of thirty blood types with the following probabilities: \(P(O) = 0.45\), \(P(A) = 0.4\), \(P(B) = 0.11\), \(P(AB) = 0.04\).
<- c("O","A","B","AB")
bloodtypes <- c(0.45,.4,.11,.04)
prob_bloodtypes <- sample(bloodtypes, size = 30, prob=prob_bloodtypes, replace=TRUE)
sample_blood 1:10] #quick peek to confirm sample_blood[
## [1] "A" "A" "O" "A" "O" "A" "B" "AB" "O" "O"
The estimated pmf is then:
proportions(table(sample_blood))
## sample_blood
## A AB B O
## 0.40000000 0.03333333 0.13333333 0.43333333
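With only thirty draws the estimate is rough. As a further sketch (not in the original text), increasing the number of draws brings the estimated pmf much closer to the specified probabilities:

big_sample <- sample(bloodtypes, size = 10^5, prob = prob_bloodtypes, replace = TRUE)
proportions(table(big_sample))  # should be close to 0.45, 0.40, 0.11, 0.04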
Suppose the proportion of M&Ms by color is: 14% Yellow, 13% Red, 20% Orange, 12% Brown, 20% Green, and 21% Blue. Answer the following questions using simulation: What is the probability that a randomly chosen M&M is not green? What is the probability that it is red, orange, or yellow?
<- c("Y","R","O","Br","G","Bl")
colors <- c(.14,.13,.2,.12,.2,.21) prob_mnms
not.green <- replicate(10000,{
  mnm_sample <- sample(colors, 50, prob=prob_mnms, replace=TRUE)
  mnm_sample != "G"
})
mean(not.green)
## [1] 0.799344
roy <- replicate(10000,{
  mnm_sample <- sample(colors, 50, prob=prob_mnms, replace=TRUE)
  mnm_sample %in% c("R", "O", "Y")
})
mean(roy)
## [1] 0.470378
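Because the color probabilities are given exactly, the simulated answers can be checked directly (a short addition for comparison):

sum(prob_mnms[colors != "G"])                 # exact P(not green) = 0.80
sum(prob_mnms[colors %in% c("R", "O", "Y")])  # exact P(red, orange, or yellow) = 0.47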