Now that the foundations of random variables, probability distributions, expectation, and variance are under our belt, let’s start to look at some special random variables that occur so commonly, or have such mathematically wonderful properties, that they have specific names. We will look at 6 different types of discrete random variables. For each we will identify the situation it describes, the random variable and its distributional notation, the pmf, the mean and variance, and the relevant R commands.
In R, the common distributions are defined by their root name with 3 different prefixes:

d: to compute \(P(X = x)\), e.g. dbinom, dgeom, dhyper, dnbinom
p: to compute \(P(X \leq x)\), e.g. pbinom, pgeom, phyper, pnbinom
r: to randomly draw N samples from the specified distribution, e.g. rbinom, rgeom, rhyper, rnbinom
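For instance, a minimal sketch of all three prefixes using the Binomial family (the parameter values here are arbitrary choices for illustration):

dbinom(3, size=10, prob=0.5)   # P(X = 3) for X ~ Binomial(10, 0.5)
pbinom(3, size=10, prob=0.5)   # P(X <= 3), the cumulative probability
rbinom(5, size=10, prob=0.5)   # five random draws from Binomial(10, 0.5)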
Situation: The simplest type of experiment is one in which there are only two outcomes (success/failure, live/die, true/false, yes/no, etc.). When running simulations in Chapter 2, you wrote your experiment to get down to a single TRUE/FALSE; you were creating a Bernoulli random variable. This simple yet fundamental random variable serves as the basis for the rest of the distributions in this chapter.
Random variable: Let \(X\) be a random variable that denotes the outcome from a Bernoulli trial with probability of success \(p\). Specifically let \(X=1\) denote a success, and \(X=0\) denote a failure. (What is considered a success is entirely up to context. If you are interested in mortality rate for a certain disease, then “death” would be a success.)
Distributional Notation: \(X \sim Bernoulli(p)\)
pmf: \(P(X=x) = p^{x}(1-p)^{1-x} \qquad x=0,1\)
Mean and variance: \(E(X) = p \qquad Var(X) = p(1-p)\)
R commands: There are no fancy named R commands for this distribution. You can simulate this random variable using sample(c(0,1), size=1, prob=c(1-p, p)) directly, or through a Binomial random variable with \(n=1\).
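As a minimal sketch (the value of p and the number of draws are arbitrary), both approaches generate Bernoulli draws; note that sample needs replace=TRUE to produce more than one draw:

p <- 0.3                                               # an arbitrary success probability
sample(c(0,1), size=1, prob=c(1-p, p))                 # a single Bernoulli(p) draw
sample(c(0,1), size=10, replace=TRUE, prob=c(1-p, p))  # ten draws
rbinom(10, size=1, prob=p)                             # ten draws via the Binomial with n=1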
A beet seed has been planted, and will either germinate or not. The probability of germination is 0.8, and germination is considered a success.
Let X be whether or not a beet seed has germinated. \(X \sim Bernoulli(.8)\)
\[ P(X=x) = .8^{x}.2^{1-x} \]
Let X=1 mean germination and X=0 not germination. \(P(X=1) = .8\)
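A quick simulation check of the mean and variance (a sketch; the object name germ is ours):

germ <- rbinom(10000, size=1, prob=0.8)  # 10,000 simulated seeds
mean(germ)   # should be close to E(X) = p = 0.8
var(germ)    # should be close to Var(X) = p(1-p) = 0.16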
Situation: If \(n\) independent random variables \(X_{1},...,X_{n}\) all have the same Bernoulli distribution with probability of success \(p\), then their sum is equal to the number of the \(X_{i}\)’s which equal 1, and the distribution of the sum is known as a Binomial distribution.
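To see this numerically, here is a minimal sketch (the parameter values are arbitrary) comparing the sum of Bernoulli draws to direct Binomial draws:

n <- 5; p <- 0.3; N <- 10000
bern <- matrix(rbinom(n*N, size=1, prob=p), nrow=n)  # each column is n Bernoulli trials
sums <- colSums(bern)                 # the sum of n Bernoulli(p) variables, N times over
mean(sums)                            # close to n*p = 1.5
mean(rbinom(N, size=n, prob=p))       # direct Binomial(n, p) draws agree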
Random variable: Let \(X\) be a random variable that represents the number of “successes” in a series of \(n\) independent Bernoulli trials, each with probability of success \(p\).
Distributional Notation: \(X \sim Binomial(n, p)\)
pmf: \[ P(X = x)= \binom{n}{x}p^{x}(1-p)^{n-x} \qquad x=0,1,2,...,n \]
Mean and variance: \(E(X) = np \qquad Var(X) = np(1-p)\)
R commands:
dbinom(x, size=n, prob=p): to compute \(P(X = x)\)
pbinom(x, size=n, prob=p): to compute \(P(X \leq x)\)
rbinom(N, size=n, prob=p): to randomly draw N samples from a \(X \sim Binom(n, p)\) distribution.

Our parameters are: \(n = 10, p = .8\). The pmf is:
\[ P(X = x)= \binom{10}{x}(0.8)^{x}(.2)^{10-x} \qquad x=0,1,2,...,10 \]
and is written in distributional notation like:
\[ X \sim Binomial(10, .8)\]
Theoretical
\[ E(X) = n*p = 10*.8 = 8 \\ Var(X) = n*p*(1-p) = 10 * .8 * .2 = 1.6 \]
Simulation
x <- rbinom(10000, 10, .8)
mean(x)
## [1] 8.001
var(x)
## [1] 1.597359
Let \(X\) be the number of heads that appear when 9 coins are tossed, where each coin lands heads with probability 0.6.
\[ X \sim Binomial(9, .6) \]
with pmf
\[ P(X = x)= \binom{9}{x}(0.6)^{x}(.4)^{9-x} \qquad x=0,1,2,...,9 \]
Find \(P(X=3)\)
by hand using the pmf
\[ P(X = 3) = \binom{9}{3}(0.6)^{3}(.4)^{6} \]
choose(9, 3)*(0.6)^(3)*(0.4)^(6)
## [1] 0.07431782
theoretical using R commands
dbinom(3, 9, .6)
## [1] 0.07431782
using simulation
x <- rbinom(10000, 9, .6)
mean(x == 3)
## [1] 0.0732
Ten students are selected at random, and each has a probability of 0.10 of being a math major. What is the probability that at least one student is a math major?
Let \(X\) be the number of Math majors selected.
\[ X \sim Binomial(10, .1) \]
with pmf
\[ P(X = x)= \binom{10}{x}(0.1)^{x}(.9)^{10-x} \qquad x=0,1,2,...,10 \]
Find \(P(X \geq 1)\). Hint: Use the complement
by hand using the pmf
\[ P(X \geq 1 ) = 1 - P(X = 0) = 1 - \binom{10}{0}(.1)^{0}(.9)^{10} \]
1-choose(10, 0)*(0.1)^(0)*(0.9)^(10)
## [1] 0.6513216
theoretical using R commands
1-dbinom(0, 10, .1)
## [1] 0.6513216
simulation
n.math <- rbinom(10000, 10, .1)
mean(n.math >= 1)
## [1] 0.656
What is the expected number of math majors in a random sample of 10 students?
theoretical
\(E(X) = n*p = 10*.1 = 1\)
On average, one in ten students is a math major.
Simulation
mean(n.math)
## [1] 1.0082
What is \(Var(X)\)?
# Theoretical
10*.1*.9
## [1] 0.9
# Simulation
var(n.math)
## [1] 0.9018229
Situation: Given a series of independent Bernoulli trials, we are accustomed to thinking of \(n\) and \(p\) as fixed and the number of successes \(x\) as the random quantity, as in the Binomial distribution. Suppose the problem is turned around, though, and the question is asked: how many trials will be required in order to achieve the first success? Put this way, the number of trials is the random variable and the number of successes is fixed.
Random Variable: Let \(X\) be the number of failures before the first success in a Bernoulli process with probability of success \(p\).
Distributional Notation: \(X \sim Geom(p)\)
pmf: \[ P(X = x)= (1-p)^{x}p \qquad x=0,1,2,... \]
Mean and variance: \(E(X) = \frac{1-p}{p} \qquad Var(X) = \frac{1-p}{p^{2}}\)
R commands:
dgeom(x, prob=p): to compute \(P(X = x)\)
pgeom(x, prob=p): to compute \(P(X \leq x)\)
rgeom(N, prob=p): to randomly draw N samples from a \(X \sim Geom(p)\) distribution.

Professional basketball player Steve Nash was a 90% free throw shooter over his career. Answer the following questions using the formulas and also simulation.
Let \(X\) be the number of free throws before he misses one. \(X \sim Geom(.1)\) with pmf \(P(X = x) = .9^{x}(.1)\)
Theoretical
\[ E(X) = \frac{1-p}{p} = \frac{.9}{.1} = 9 \]
Simulation
num.made.shots <- rgeom(10000, .1)
mean(num.made.shots)
## [1] 8.9175
Find: \(P(X=20)\)
by hand using the pmf
.9^20*.1
## [1] 0.01215767
theoretical using R commands
dgeom(20, .1)
## [1] 0.01215767
using simulation
mean(num.made.shots == 20)
## [1] 0.0118
Complete the following using both theoretical and simulation methods. Suppose 47.1% of women are married, and women are selected at random one at a time.

Let \(X\) be the number of unmarried women selected before the first married woman. \(X \sim Geometric(.471)\)
Theoretical probability using the pmf by hand \(P(X=2) = (1-.471)^2(.471)\)
(1-.471)^2*(.471)
## [1] 0.1318051
Theoretical probability using the pmf from R functions
dgeom(2, .471)
## [1] 0.1318051
Simulation
num.unmarried <- rgeom(10000, .471)
mean(num.unmarried==2)
## [1] 0.1295
Find: \(E(X)\) and \(SD(X)\)
Theoretical
p <- .471
(E_X <- (1-p)/p)
## [1] 1.123142
(SD_X <- sqrt((1-p)/(p^2)))
## [1] 1.544212
Simulation
mean(num.unmarried)
## [1] 1.1262
sd(num.unmarried)
## [1] 1.547745
Suppose the probability that a transistor is defective is \(p = .02\), and transistors are tested one at a time. Let \(X\) be the number of good transistors tested before the first defective one, so \(X \sim Geom(.02)\).

p <- .02
Find: \(P(X=9)\)
use pmf
(1-p)^9*p
## [1] 0.01667496
using R commands
dgeom(9, p)
## [1] 0.01667496
simulation
good.transistors <- rgeom(10000, p)
mean(good.transistors == 9)
## [1] 0.0193
Find: \(P(X \geq 4)\)
\[ P(X \geq 4) \\ = 1 - P(X \leq 3) \\ = 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3)] \\ = 1 - [(1-p)^0p + (1-p)^1p +(1-p)^2p +(1-p)^3p] \]
use pmf
1 - ((1-p)^0*p + (1-p)^1*p + (1-p)^2*p + (1-p)^3*p)
## [1] 0.9223682
using R commands
1 - pgeom(3, p)
## [1] 0.9223682
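Equivalently, pgeom can return the upper tail directly through its lower.tail argument:

pgeom(3, p, lower.tail=FALSE)   # same value as 1 - pgeom(3, p)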
simulation
mean(good.transistors >=4)
## [1] 0.9268
Situation: A random variable with a negative binomial distribution originates from a context much like the one that yields the geometric distribution. Again, we focus on independent and identical trials, each of which results in one of two outcomes, success or failure. The probability of success, \(p\), stays constant from trial to trial. The geometric case handles the number of trials until the first success occurs. What if we are interested in the number of trials until the second, third, or fourth success occurs?
Random Variable: Let \(X\) denote the number of failures before the \(n\)th success, where the probability of success is \(p\).
Distributional Notation: \(X \sim NegBin(n, p)\)
pmf: \[ P(X = x)=\binom{x+n-1}{x}p^{n}(1-p)^{x} \qquad x=0,1,2,... \]
We can think of the negative binomial distribution as the sum of \(n\) independent geometric random variables. This simplifies the formulas for the mean and variance.
Mean and variance: \(E(X) = \frac{n(1-p)}{p} \qquad Var(X) = \frac{n(1-p)}{p^{2}}\)
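A simulation sketch of this fact (the values n = 3 and p = 0.2 are arbitrary): the sum of \(n\) independent geometric draws should behave like draws from rnbinom.

n <- 3; p <- 0.2; N <- 10000
geom.sums <- colSums(matrix(rgeom(n*N, p), nrow=n))  # sum of n Geom(p) draws, N times
mean(geom.sums)          # close to n*(1-p)/p = 12
mean(rnbinom(N, n, p))   # direct NegBin(n, p) draws agree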
R commands:
dnbinom(x, size=n, prob=p): to compute \(P(X = x)\)
pnbinom(x, size=n, prob=p): to compute \(P(X \leq x)\)
rnbinom(N, size=n, prob=p): to randomly draw N samples from a \(X \sim NegBin(n, p)\) distribution.

A geological study indicates that an exploratory oil well drilled in a particular region should strike oil with probability 0.2. We are interested in when the third oil strike hits.
Let \(X\) be the number of dry wells (no oil was found) before the third oil strike (oil was found).
\[ X \sim NegBinomial(3, .2) \]
p <- .2
n <- 3
Write down the pmf, and then calculate the mean and variance.
pmf: \[ P(X = x) = \binom{x+3-1}{x}(.2)^{3}(.8)^{x} \qquad x=0,1,2,... \]
Theoretical
(E_X <- n*(1-p)/p)
## [1] 12
(Var_X <- n*(1-p)/(p^2))
## [1] 60
Simulation
x <- rnbinom(10000, n, p)
mean(x)
## [1] 11.9084
var(x)
## [1] 61.23253
Find the probability that the third oil strike comes on the fifth well drilled. If the third strike occurs on the fifth well, then exactly two of the first four wells were dry.

Find: \(P(X = 2)\)
by hand using the pmf
\[ \binom{4}{2}(.2)^{3}(.8)^{2} \]
choose(4, 2)*.2^3*.8^2
## [1] 0.03072
theoretical using R commands
dnbinom(2, 3, .2)
## [1] 0.03072
using simulation
x <- rnbinom(100000, 3, .2)
mean(x==2)
## [1] 0.03061
Ten percent of the engines manufactured on an assembly line are defective.
If engines are randomly selected one at a time and tested, what is the probability that the first non-defective engine will be found on the second trial?
\[ X \sim NegBinomial(1, .9) \]
p.good <- .9
But since \(n=1\), this is also a geometric distribution: \(X \sim Geom(.9)\). Why?
\[ P(X = x) = \binom{x+1-1}{x}p^{1}(1-p)^{x} = p(1-p)^{x} \]
Find: \(P(X=1)\)
Theoretical using pmf
p.good*(1-p.good)
## [1] 0.09
Theoretical using R commands
dnbinom(1, 1, .9)
## [1] 0.09
dgeom(1, .9)
## [1] 0.09
Simulation
x.good <- rnbinom(10000, 1, p.good)
mean(x.good==1)
## [1] 0.0868
What is the probability that the third non-defective engine will be found on the fifth trial?
n.good <- 3
p.good <- .9
Let Y be the number of defective engines before the third good engine is found.
\[ Y \sim NegBinomial(3, .9) \]
In the first four tries, there are 2 good and 2 bad engines.
Find: \(P(Y = 2)\)
Theoretical using pmf \[ P(Y = 2) = \binom{2+3-1}{2}(.9)^{3}(.1)^{2} \]
choose(4, 2)*.9^3*.1^2
## [1] 0.04374
Theoretical using R commands
dnbinom(2, 3, .9)
## [1] 0.04374
Using simulation
x.3good <- rnbinom(10000, 3, .9)
mean(x.3good == 2)
## [1] 0.0422
Find the mean and variance of the number of defective engines found before the first non-defective engine.

\[ E(X) = \frac{n*(1-p)}{p} = \frac{1*.1}{.9} = \frac{1}{9} \qquad Var(X) = \frac{n*(1-p)}{p^2} = \frac{1*.1}{.9^2} \]
1*(1-.9)/.9 # E(X)
## [1] 0.1111111
1*(1-.9)/.9^2 # Var(X)
## [1] 0.1234568
Find the mean and variance of the number of failures until the third non-defective engine is found.
3*(1-.9)/.9 # E(Y)
## [1] 0.3333333
3*(1-.9)/.9^2 # Var(Y)
## [1] 0.3703704
Situation: A Poisson process is one where events occur at random times during a fixed time period. The events occur independently from each other, but with a constant average rate over that time period. Examples include
Random Variable: Let \(X\) be the number of events occurring in a Poisson process with rate \(\lambda\) over one unit of time (e.g. per year, per second, per day).
Distributional Notation: \(X \sim Poisson(\lambda)\)
pmf: \[ P(X = x) = e^{-\lambda}\frac{\lambda^{x}}{x!} \qquad x=0,1,2,... \]
Mean and variance: \(E(X) = Var(X) = \lambda\)
R commands:
dpois(x, lambda): to compute \(P(X = x)\)
ppois(x, lambda): to compute \(P(X \leq x)\)
rpois(N, lambda): to randomly draw N samples from a \(X \sim Poisson(\lambda)\) distribution.

The Taurids meteor shower is visible on clear nights in the Fall and can have visible meteor rates around five per hour. What is the probability that a viewer will observe exactly eight meteors in two hours?
Let \(X\) be the number of observed meteors in two hours. At five meteors per hour, the two-hour rate is \(\lambda = 5 \times 2 = 10\), so \(X \sim Pois(10)\), and has pmf
\[ e^{-10}\frac{(10)^{x}}{x!} \]
Find: \(P(X = 8)\)
by hand using the pmf
\[ e^{-10}\frac{(10)^{8}}{8!} \]
exp(-10)*10^8/factorial(8)
## [1] 0.112599
theoretical using R commands
dpois(8, 10)
## [1] 0.112599
using simulation
meteors <- rpois(10000, 10)
mean(meteors == 8)
## [1] 0.1184
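The jump from five meteors per hour to \(\lambda = 10\) for two hours relies on the fact that the sum of independent Poisson counts is again Poisson. A simulation sketch (the object names are ours):

hour1 <- rpois(10000, 5)     # meteors in the first hour
hour2 <- rpois(10000, 5)     # meteors in the second hour
mean((hour1 + hour2) == 8)   # should be close to dpois(8, 10) = 0.1126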
Suppose a typist makes typos at a rate of 3 typos per 10 pages. What is the probability that they will make at most one typo on a five page document?
Let \(X\) be the number of typos on a five page document. At 3 typos per 10 pages, five pages carry a rate of \(\lambda = 3 \times \frac{5}{10} = 1.5\), so \(X \sim Pois(1.5)\), with pmf
\[ e^{-1.5}\frac{1.5^{x}}{x!} \]
Find \(P(X \leq 1) = P(X=0) + P(X=1)\).
\[ P(X \leq 1) = e^{-1.5}\frac{1.5^{0}}{0!} + e^{-1.5}\frac{1.5^{1}}{1!} \\ e^{-1.5} + 1.5e^{-1.5} = 2.5e^{-1.5} \]
2.5*exp(-1.5)
## [1] 0.5578254
theoretical using R functions
ppois(1, 1.5)
## [1] 0.5578254
simulation
typo <- rpois(10000, 1.5)
mean(typo <= 1)
## [1] 0.5531
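As a side check of \(E(X) = Var(X) = \lambda\), the simulated typo counts have nearly equal mean and variance:

mean(typo)   # close to 1.5
var(typo)    # also close to 1.5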
Situation: The hypergeometric distribution describes a series of Bernoulli trials that are dependent, which occurs when we sample without replacement from a finite population.
Random Variable: Let \(X\) denote the number of successes in a sample of size \(k\) drawn without replacement from a pool containing a total of \(m\) successes and \(n\) failures.
Distributional Notation: \(X \sim Hypergeometric(m+n, n, k)\)
pmf:
\[ P(X=x)=\frac{\binom{m}{x}\binom{n}{k-x}}{\binom{m+n}{k}} \]
Mean and variance: \[ E[X]=k\left(\frac{m}{m+n}\right) \qquad \qquad V(X)=k\frac{m}{m+n}\frac{n}{m+n}\frac{m+n-k}{m+n-1} \]
R commands:
dhyper(x, m, n, k): to compute \(P(X = x)\)
phyper(x, m, n, k): to compute \(P(X \leq x)\)
rhyper(N, m, n, k): to randomly draw N samples from a \(X \sim Hypergeometric(m+n, n, k)\) distribution.

Suppose 3 chips are drawn without replacement from a bag containing 5 red chips and 4 chips of another color, and let \(X\) be the number of red chips drawn.

\[ X \sim Hypergeometric(9, 4, 3) \]
n <- 4       # number of failures
m <- 5       # number of successes
k <- 3       # size of sample
total <- m+n
Theoretical
(E_X <- k*(m/total))
## [1] 1.666667
(Var_X <- k*(m/total)*(n/total)*((total-k)/(total-1)))
## [1] 0.5555556
Simulation
n.red.chips <- rhyper(10000, m, n, k)
mean(n.red.chips)
## [1] 1.6831
var(n.red.chips)
## [1] 0.5597304
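We can also simulate the draws directly by sampling without replacement, a sketch assuming the bag of 5 red and 4 other chips described above:

bag <- c(rep(1, 5), rep(0, 4))                        # 1 = red chip, 0 = other
draws <- replicate(10000, sum(sample(bag, size=3)))   # 3 chips drawn without replacement
mean(draws)   # close to E(X) = 1.67
var(draws)    # close to Var(X) = 0.56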
Let \(X\) be the number of tagged fish caught when 7 fish are sampled from a population of 50 fish that contains 10 tagged fish.

\[ X \sim Hypergeometric(50, 40, 7) \]

k <- 7
m <- 10
n <- 40
Find: \(P(X=2)\)
via pmf \[ \frac{\binom{10}{2}\binom{40}{5}}{\binom{50}{7}} \]
choose(10, 2)*choose(40, 5) / choose(50, 7)
## [1] 0.2964463
via R commands
dhyper(2, 10, 40, 7)
## [1] 0.2964463
n.tagged.fish <- rhyper(10000, 10, 40, 7)
mean(n.tagged.fish == 2)
## [1] 0.2981
Now suppose a lot of 50 items contains 3 defective items, and 10 items are sampled without replacement. Let \(X\) be the number of defectives in the sample.

\[ X \sim Hypergeometric(50, 47, 10) \]
with pmf: \[ P(X = x) = \frac{\binom{3}{x}\binom{47}{10-x}}{\binom{50}{10}} \]
Find the probability that the sample contains exactly one defective item, \(P(X = 1)\).
using pmf \[ P(X = 1) = \frac{\binom{3}{1}\binom{47}{10-1}}{\binom{50}{10}} \]
choose(3, 1)*choose(47, 9) / choose(50, 10)
## [1] 0.3979592
using R commands
dhyper(1, 3, 47, 10)
## [1] 0.3979592
simulation
<- rhyper(10000, 3, 47, 10)
n.defective mean(n.defective==1)
## [1] 0.3857
Next, find the probability that the sample contains at most one defective item, \(P(X \leq 1)\).

using R commands
phyper(1, 3, 47, 10)
## [1] 0.9020408
simulation
mean(n.defective<=1)
## [1] 0.9014
Let X be the number of real diamonds in the first three stolen gems. So \(X \sim Hypergeometric(35, 25, 3)\)
If the 4th gem is the 2nd real gem, then we need to find the probability that 1 out of the first 3 gems is real. \(P(X=1)\)
p.1real.in.3.gems <- dhyper(1, 10, 25, 3)
Then we multiply this probability by the probability that the 4th gem is also real, which is 9/32 because there are 9 real gems left out of the 32 total gems left.
p.1real.in.3.gems*(9/32)
## [1] 0.1289152
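As a final check, a simulation sketch of this combined calculation (assuming 10 real diamonds among 35 gems, as in the example):

sims <- replicate(10000, {
  gems <- sample(c(rep(1, 10), rep(0, 25)))   # a random stealing order; 1 = real diamond
  sum(gems[1:3]) == 1 && gems[4] == 1         # second real diamond is the 4th gem stolen
})
mean(sims)   # should be close to 0.1289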