Situation The most widely used continuous distribution is the normal distribution, a distribution with the familiar “bell” shape. Many characteristics in nature exhibit this shape:
The normal distribution is an extremely important distribution and we will discuss some of its properties in this section. There are three main reasons why the normal distribution is so important:
Many random variables happen to have a normal distribution. Other times, a transformation of a random variable has a normal distribution.
The central limit theorem states that, under certain conditions, the distribution of the sample mean, \(\bar{x}\), is approximately normal, even when the underlying data are not.
It is a mathematical convenience to be able to assume that the distribution from which a sample is drawn is a normal distribution.
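The second reason can be seen in a small simulation. The sketch below is an illustration only: the choice of an exponential distribution, the sample size of 50, the seed, and the variable names are all assumptions made for the example. It draws many samples from a skewed distribution and looks at how the sample means are distributed.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

n.samples <- 5000   # number of repeated samples (assumed for illustration)
n         <- 50     # size of each sample (assumed for illustration)

# Each column is one sample of size n from an Exponential(rate = 1),
# which has mean 1, standard deviation 1, and is strongly right-skewed.
draws  <- matrix(rexp(n.samples * n, rate = 1), nrow = n)
x.bars <- colMeans(draws)

# The sample means should be centered near 1 with spread near 1/sqrt(n).
mean(x.bars)
sd(x.bars)
1 / sqrt(n)

par(mfrow = c(1, 2))
hist(draws[, 1], main = "One raw sample")   # skewed
hist(x.bars, main = "Sample means")         # roughly bell-shaped
```

The raw sample is visibly skewed, while the histogram of sample means should look close to the bell shape.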
Random variable: Let \(X\) be a random variable from a Normal distribution defined by the parameters \(\mu\) for the mean, and \(\sigma^{2}\) for the variance.
Distributional Notation: \(X \sim N(\mu, \sigma^{2})\).
pdf: \[ f(x)=\frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \]
Mean and variance: \(E(X) = \mu \qquad Var(X) = \sigma^{2}\)
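As a quick check of the density, mean, and variance, the following sketch codes \(f(x)\) by hand and compares it with R's built-in dnorm. The helper name normal.pdf and the values \(\mu = 5\), \(\sigma = 2\) are assumptions chosen for illustration.

```r
# Hand-coded normal density; 'normal.pdf' is a hypothetical helper name.
normal.pdf <- function(x, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

mu <- 5; sigma <- 2              # illustrative values
x  <- c(1, 5, 8.5)

normal.pdf(x, mu, sigma)         # should match dnorm
dnorm(x, mu, sigma)
all.equal(normal.pdf(x, mu, sigma), dnorm(x, mu, sigma))

# A density must integrate to 1 over the whole real line.
integrate(normal.pdf, -Inf, Inf, mu = mu, sigma = sigma)

# E(X) = mu and Var(X) = sigma^2, checked by numerical integration.
integrate(function(x) x * normal.pdf(x, mu, sigma), -Inf, Inf)
integrate(function(x) (x - mu)^2 * normal.pdf(x, mu, sigma), -Inf, Inf)
```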
R commands:
Note that R uses the standard deviation, NOT the variance.
dnorm(x, mu, sd) to compute the density \(f(x)\) (for a continuous variable, \(P(X = x) = 0\))
pnorm(x, mu, sd) to compute \(P(X\leq x)\) (the cdf)
rnorm(N, mu, sd) to randomly draw N samples from a \(X \sim N(\mu, \sigma^{2})\) distribution.
Visualizing the shape:
par(mfrow=c(1,3))
hist(rnorm(1000, 0, 1))
hist(rnorm(1000, 5, 2))
hist(rnorm(1000, 100, 8))
The standard normal random variable Z is a special case of the Normal distribution with mean \(\mu = 0\) and variance \(\sigma^{2}=1\). The PDF then simplifies to
\[ f(z)=\frac{1}{\sqrt{2\pi}}e^{-z^{2}/2} \]
This is also commonly known as the “Z Distribution”, and is a transformation of the random variable X:
\[ Z = \frac{X - \mu}{\sigma} \]
This is also known as “normalizing” (or “standardizing”) a variable. If \(X\) is normally distributed, you can subtract its mean (centering on zero) and divide by its standard deviation (scaling) to obtain a Standard Normal, or Z, distribution.
This is a very useful way to compare values from two normal random variables with different means and/or variances: transform both to the Z scale and compare the resulting z-values.
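A small sketch of this comparison follows. The two sets of parameters are borrowed from the histograms above, and the observed values 112 and 7 are made up for illustration.

```r
# Two normal variables on different scales (observed values are hypothetical).
x <- 112; mu.x <- 100; sd.x <- 8
y <- 7;   mu.y <- 5;   sd.y <- 2

z.x <- (x - mu.x) / sd.x   # standard deviations above the mean of X
z.y <- (y - mu.y) / sd.y   # standard deviations above the mean of Y
c(z.x, z.y)                # x is the more extreme value on the Z scale

# Probabilities are unchanged by standardizing:
pnorm(x, mu.x, sd.x)       # P(X <= 112) on the original scale
pnorm(z.x)                 # the same probability on the standard normal scale

# Standardizing simulated draws gives mean near 0 and sd near 1.
set.seed(2)                # arbitrary seed
draws <- rnorm(10000, mu.x, sd.x)
z     <- (draws - mu.x) / sd.x
c(mean(z), sd(z))
```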
Use R to evaluate the following integrals under the Standard Normal \(Z\) distribution. In each case, draw a diagram of \(f_{Z}(z)\) and shade the area that corresponds to the integral.
ex1 <- function(z){1/sqrt(2*pi)*exp(-(z^2)/2)}
integrate(ex1, -.44, 1.33)
## 0.5782723 with absolute error < 6.4e-15
integrate(ex1, -3, .94)
## 0.8250413 with absolute error < 9.4e-13
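One possible way to draw the requested diagrams uses curve and polygon from base R graphics. This is a sketch rather than the method used in these notes, and the helper name shade.z is hypothetical.

```r
# Hypothetical helper: plot the standard normal density and shade P(a < Z < b).
shade.z <- function(a, b) {
  curve(dnorm(x), from = -4, to = 4, xlab = "z", ylab = "f(z)")
  zs <- seq(a, b, length.out = 200)
  # The polygon follows the curve from a to b, then closes along the axis.
  polygon(c(a, zs, b), c(0, dnorm(zs), 0), col = "grey")
  # Return the shaded area so the picture can be checked against integrate().
  pnorm(b) - pnorm(a)
}

shade.z(-0.44, 1.33)
shade.z(-3, 0.94)
```

Each call draws the curve with the region shaded and returns the corresponding area, which should agree with the integrate results above.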
Use pnorm to calculate the theoretical probability for each question, and confirm via simulation using rnorm.
Theoretical
1-pnorm(1.3)
## [1] 0.09680048
Simulation
z <- rnorm(10000, 0, 1)
mean(z > 1.3)
## [1] 0.0974
Theoretical
pnorm(1.5) - pnorm(-.15)
## [1] 0.4928105
Simulation
z <- rnorm(10000, 0, 1)
mean(z > -0.15 & z < 1.5)
## [1] 0.4878
Theoretical
pnorm(-2)
## [1] 0.02275013
Simulation
z <- rnorm(10000, 0, 1)
mean(z < -2)
## [1] 0.0231
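The simulated answers differ slightly from the theoretical ones because they are based on a finite number of random draws. The sketch below repeats the first simulation with increasing N to show the estimate settling toward 1 - pnorm(1.3). The seed and the sample sizes are arbitrary choices, and the standard-error formula for a proportion is included only as a rough guide to the expected size of the discrepancy.

```r
set.seed(3)                       # arbitrary seed for reproducibility
theoretical <- 1 - pnorm(1.3)

for (N in c(100, 10000, 1000000)) {
  z   <- rnorm(N, 0, 1)
  est <- mean(z > 1.3)
  # Approximate standard error of a simulated proportion: sqrt(p(1-p)/N)
  se  <- sqrt(theoretical * (1 - theoretical) / N)
  cat("N =", N, " estimate =", est, " approx. SE =", signif(se, 2), "\n")
}
```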
Use R for all steps; do not do these by hand or using Z tables. Use the integrate, pnorm, and rnorm functions.
I will define the function once, and use it for all questions. I will also draw a random sample of observations from a standard normal distribution and use it on all following problems.
f.z <- function(z){1/sqrt(2*pi)*exp(-(z^2)/2)}
sample.z <- rnorm(10000, 0, 1)
integral
integrate(f.z, -1, 2)
## 0.8185946 with absolute error < 9.1e-15
pnorm
pnorm(2, 0, 1) - pnorm(-1, 0, 1)
## [1] 0.8185946
simulation
mean(sample.z > -1 & sample.z < 2)
## [1] 0.8188
integral
integrate(f.z, -5, 2.1)
## 0.9821353 with absolute error < 2.3e-05
pnorm
pnorm(2.1, 0, 1)
## [1] 0.9821356
simulation
mean(sample.z < 2.1)
## [1] 0.9803
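In the integral above, -5 stands in for negative infinity, which contributes to the integrate and pnorm answers differing in the seventh decimal place. As a sketch, integrate also accepts -Inf as a limit. The density is redefined here, under the same name used in these notes, so the block runs on its own.

```r
f.z <- function(z){1/sqrt(2*pi)*exp(-(z^2)/2)}

integrate(f.z, -5, 2.1)     # truncated lower limit
integrate(f.z, -Inf, 2.1)   # full lower tail
pnorm(2.1, 0, 1)

# The area missed by starting at -5 is tiny:
pnorm(-5)
```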
integral
integrate(f.z, -5, .9)
## 0.8159396 with absolute error < 5.5e-08
pnorm
pnorm(.9, 0, 1)
## [1] 0.8159399
simulation
mean(sample.z < .9)
## [1] 0.8111
integral
integrate(f.z, -1.1, 2.5)
## 0.8581243 with absolute error < 7.3e-13
pnorm
pnorm(2.5, 0, 1) - pnorm(-1.1, 0, 1)
## [1] 0.8581243
simulation
mean(sample.z > -1.1 & sample.z < 2.5)
## [1] 0.8593
integral
integrate(f.z, .9, 5)
## 0.1840598 with absolute error < 2.7e-12
pnorm
1 - pnorm(.9, 0, 1)
## [1] 0.1840601
simulation
mean(sample.z > .9)
## [1] 0.1889
This is great, but limited to only a single normal distribution with mean 0 and variance 1.
Recall the normal random variable X has mean \(\mu\) and standard deviation \(\sigma\) and pdf:
\[ f(x)=\frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \]
Let’s use the tools we learned in the last two chapters to solve the following problems.
A normally distributed population of lemming body weights has a mean of 63.5 g and a standard deviation 12.2 g.
Let X be the weight of a lemming in grams.
body.weights <- rnorm(10000, 63.5, 12.2)
For each question below, write the question in math notation, sketch a picture, calculate the theoretical probability using pnorm, and simulate the probability using rnorm.
Theoretical
1-pnorm(78, 63.5, 12.2)
## [1] 0.1173134
Simulation
mean(body.weights > 78)
## [1] 0.1201
Theoretical
pnorm(41, 63.5, 12.2)
## [1] 0.03257246
Simulation
mean(body.weights < 41)
## [1] 0.0317
Theoretical
pnorm(70, 63.5, 12.2)-pnorm(60, 63.5, 12.2)
## [1] 0.3158093
Simulation
mean(body.weights < 70 & body.weights > 60)
## [1] 0.312
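The last lemming probability can be cross-checked in two other ways, sketched here: integrating the general normal pdf directly, and standardizing to Z first. The helper name f.x is hypothetical and is not part of the original notes.

```r
mu <- 63.5; sigma <- 12.2

# General normal pdf with the lemming parameters ('f.x' is a hypothetical name).
f.x <- function(x){1/sqrt(2*pi*sigma^2)*exp(-(x-mu)^2/(2*sigma^2))}

# P(60 < X < 70) three ways
integrate(f.x, 60, 70)
pnorm(70, mu, sigma) - pnorm(60, mu, sigma)
pnorm((70 - mu)/sigma) - pnorm((60 - mu)/sigma)   # standardize, then use Z
```

All three lines should agree with the theoretical value above, up to numerical error.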
rnorm) and the R function pnorm.
Define the random variable.
Let X be the carapace length for adult male tarantulas from a certain species.
Draw a picture of the distribution
furry.spiders <- rnorm(10000, 17.45, 1.85)
Theoretical
pnorm(18, 17.45, 1.85)-pnorm(16, 17.45, 1.85)
## [1] 0.4002967
Simulation
mean(furry.spiders < 18 & furry.spiders > 16)
## [1] 0.4042
Theoretical
1-pnorm(21, 17.45, 1.85)
## [1] 0.0274973
Simulation
mean(furry.spiders > 21)
## [1] 0.0267
It is often of interest to calculate a quantile of the normal distribution. For instance, maybe we want to know the score you would need on the SAT exam to be in the top 10% of test takers. For this sort of problem we use the function qnorm(p, mu, sigma), where \(p\) is the area to the left of the value of interest.
Quantiles are cut points that divide the range of a probability distribution into intervals of equal probability.
Let’s look at those lemmings again. Recall the weight of a lemming can be described as \(X \sim N(63.5, 12.2^{2})\). What lemming weight corresponds to the 80th percentile?
Find \(t\) such that \(P(X < t) = .8\)
qnorm
qnorm(.8, 63.5, 12.2)
## [1] 73.76778
lemming <- rnorm(10000, 63.5, 12.2)
quantile(lemming, .8)
## 80%
## 73.79352
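qnorm is the inverse of pnorm: feeding the quantile back into pnorm recovers the probability. The sketch below checks this for the lemming example and also shows the same quantile computed from the standard normal via \(t = \mu + z\sigma\). The variable names t80 and z80 are assumptions for illustration.

```r
mu <- 63.5; sigma <- 12.2

t80 <- qnorm(.8, mu, sigma)
pnorm(t80, mu, sigma)          # recovers 0.8

# Same quantile from the standard normal: t = mu + z * sigma
z80 <- qnorm(.8)
mu + z80 * sigma
```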
Reconsider the tarantula example above. What carapace length marks the top 10% (that is, the 90th percentile)? Note: I changed the percentile here compared to the notes so you could see how it differs from the above example.
\(X \sim N(17.45, 1.85^{2})\). Find \(t\) such that \(P(X < t) = .9\)
qnorm(.9, 17.45, 1.85)
## [1] 19.82087
big.spider <- rnorm(10000, 17.45, 1.85)
quantile(big.spider, .9)
## 90%
## 19.80119
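Because qnorm works with the area to the left, a "top 10%" question translates to p = 0.9. As a sketch of an equivalent approach, the lower.tail argument of qnorm lets you supply the area to the right instead.

```r
# Cut-off for the top 10% of carapace lengths, two equivalent ways.
qnorm(.9, 17.45, 1.85)                       # 90% of the area lies to the left
qnorm(.1, 17.45, 1.85, lower.tail = FALSE)   # 10% of the area lies to the right

# By contrast, the bottom 10% cut-off falls below the mean.
qnorm(.1, 17.45, 1.85)
```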