Overview

Situation: The most widely used continuous distribution is the normal distribution, which has the familiar “bell” shape. Many characteristics in nature follow this shape, at least approximately:

  • heights of humans, trees, wombats
  • failure of mechanical parts due to wear and tear
  • random noise in electrical circuits

The normal distribution is an extremely important distribution and we will discuss some of its properties in this section. There are three main reasons why the normal distribution is so important:

  1. Many random variables happen to have a normal distribution. Other times, a transformation of a random variable has a normal distribution.

  2. The central limit theorem states that, under certain conditions, the distribution of the sample mean \(\bar{x}\) is approximately normal when the sample size is large.

  3. It is a mathematical convenience to be able to assume that the distribution from which a sample is drawn is a normal distribution.

Random variable: Let \(X\) be a random variable from a Normal distribution defined by the parameters \(\mu\) for the mean, and \(\sigma^{2}\) for the variance.

Distributional Notation: \(X \sim N(\mu, \sigma^{2})\).

pdf: \[ f(x)=\frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \]

Mean and variance: \(E(X) = \mu \qquad Var(X) = \sigma^{2}\)

R commands:

Note that R uses the standard deviation, NOT the variance.

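  • dnorm(x, mu, sd) to compute the density \(f(x)\) (the pdf). Because \(X\) is continuous, \(P(X = x) = 0\), so this value is a height of the curve, not a probability.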
  • pnorm(x, mu, sd) to compute \(P(X\leq x)\) (the cdf)
  • rnorm(N, mu, sd) to randomly draw N samples from a \(X \sim N(\mu, \sigma^{2})\) distribution.
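
To make the role of the standard deviation concrete, here is a minimal sketch that writes the normal density as an R function and compares it with dnorm. The function name normal_pdf and the numbers used are illustrative choices, not part of the notes.

```r
# Hand-written normal density with mean mu and standard deviation sigma.
# normal_pdf is a hypothetical helper name used only for this sketch.
normal_pdf <- function(x, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- c(55, 63.5, 70)                          # illustrative values only
by_hand  <- normal_pdf(x, mu = 63.5, sigma = 12.2)
built_in <- dnorm(x, mean = 63.5, sd = 12.2)
all.equal(by_hand, built_in)                  # the two agree

# Passing the variance (4) where R expects the standard deviation (2)
# describes a different, wider distribution.
sd_version  <- dnorm(6, 5, 2)
var_version <- dnorm(6, 5, 4)
sd_version == var_version                     # not the same density
```

If a problem states the variance, take the square root before passing it to dnorm, pnorm, or rnorm.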

Visualizing the shape:

par(mfrow=c(1,3))
hist(rnorm(1000, 0, 1))
hist(rnorm(1000, 5, 2))
hist(rnorm(1000, 100, 8))

Standard Normal Random Variable

The standard normal random variable Z is a special case of the Normal distribution with mean \(\mu = 0\) and variance \(\sigma^{2}=1\). The PDF then simplifies to

\[ f(z)=\frac{1}{\sqrt{2\pi}}e^{-z^{2}/2} \]

This is also commonly known as the “Z distribution”. Any normal random variable \(X\) can be transformed into \(Z\):

\[ Z = \frac{X - \mu}{\sigma} \]

This is also known as “standardizing” (or “normalizing”) a variable. If you have a normal random variable, you can subtract its mean (centering on zero) and divide by its standard deviation (scaling) to obtain a Standard Normal, or Z, distribution.

This is a useful way to compare values of two random variables with different means and/or variances: transform both to the Z scale.
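
As a small sketch of this idea, the code below standardizes one value from each of two normal variables and checks that a probability computed on the original scale matches the one computed on the Z scale. The two exams and all of their numbers are hypothetical, made up only for illustration.

```r
# Hypothetical example: scores on two exams measured on different scales.
score_a <- 82;  mu_a <- 75;  sd_a <- 5
score_b <- 650; mu_b <- 500; sd_b <- 100

z_a <- (score_a - mu_a) / sd_a    # standard deviations above the mean on exam A
z_b <- (score_b - mu_b) / sd_b    # standard deviations above the mean on exam B
c(z_a = z_a, z_b = z_b)           # the larger z is the more extreme score

# A probability for X equals the matching probability for Z.
p_x <- pnorm(score_a, mu_a, sd_a)
p_z <- pnorm(z_a)
all.equal(p_x, p_z)
```

On the Z scale the two scores can be compared directly, even though the raw scores cannot.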

Example

Use R to evaluate the following integrals under the Standard Normal \(Z\) distribution. In each case, draw a diagram of \(f_{Z}(z)\) and shade the area that corresponds to the integral.

  1. \(\frac{1}{\sqrt{2\pi}}\int^{1.33}_{-.44} e^{-z^{2}/2}\,dz\)

ex1 <- function(z){1/sqrt(2*pi)*exp(-(z^2)/2)}
integrate(ex1, -.44, 1.33)
## 0.5782723 with absolute error < 6.4e-15
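  1. \(\frac{1}{\sqrt{2\pi}}\int^{.94}_{-\infty} e^{-z^{2}/2}\,dz\)

# A lower limit of -3 truncates the integral and leaves out roughly 0.0013 of probability.
# integrate() also accepts -Inf as the lower limit.
integrate(ex1, -3, .94)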
## 0.8250413 with absolute error < 9.4e-13
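
The example asks for a diagram with the area shaded. One possible way to draw it in base R is sketched below. The helper name shade_z and its plotting range of -4 to 4 are assumptions made for this sketch.

```r
# Hypothetical helper: plot the standard normal density and shade the area
# between a and b. Infinite limits are clipped to the plotting range.
shade_z <- function(a, b, from = -4, to = 4) {
  curve(dnorm(x), from = from, to = to, xlab = "z", ylab = "f(z)")
  lo <- max(a, from)
  hi <- min(b, to)
  zs <- seq(lo, hi, length.out = 200)
  polygon(c(lo, zs, hi), c(0, dnorm(zs), 0), col = "grey80", border = NA)
  invisible(pnorm(b) - pnorm(a))   # the shaded probability
}

area_1 <- shade_z(-0.44, 1.33)     # area for the first integral
area_2 <- shade_z(-Inf, 0.94)      # area for the second integral
```

The function returns the shaded probability invisibly, so the picture and the number come from the same limits.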

Example

Use pnorm to calculate the theoretical probability for each question, and confirm via simulation using rnorm.

  1. \(P(Z > 1.3)\)

Theoretical

1-pnorm(1.3)
## [1] 0.09680048

Simulation

z <- rnorm(10000, 0, 1)
mean(z>1.3)
## [1] 0.0974
  1. \(P(-0.15 < Z < 1.5)\)

Theoretical

pnorm(1.5) - pnorm(-.15)
## [1] 0.4928105

Simulation

z <- rnorm(10000, 0, 1)
mean(z > -0.15 & z < 1.5)
## [1] 0.4878
  1. \(P(Z < -2)\)

Theoretical

pnorm(-2)
## [1] 0.02275013

Simulation

z <- rnorm(10000, 0, 1)
mean(z < -2)
## [1] 0.0231
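
The simulated answers differ slightly from the theoretical ones because they are based on a random sample. As a rough sketch of how much variation to expect, the standard error of a simulated proportion is approximately \(\sqrt{p(1-p)/n}\). The seed below is an arbitrary choice so the run can be repeated.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
n <- 10000
z <- rnorm(n)

p_hat  <- mean(z > 1.3)            # simulated estimate of P(Z > 1.3)
p_true <- 1 - pnorm(1.3)           # theoretical value
se     <- sqrt(p_true * (1 - p_true) / n)   # approximate standard error of p_hat

c(estimate = p_hat, theoretical = p_true, std_error = se)
```

With 10,000 draws the standard error here is about 0.003, so agreement to two decimal places is about what we should expect.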

You try it

Use R for all steps; do not do these by hand or with Z tables. Use the integrate, pnorm and rnorm functions.

I will define the function once, and use it for all questions. I will also draw a random sample of observations from a standard normal distribution and use it on all following problems.

f.z <- function(z){1/sqrt(2*pi)*exp(-(z^2)/2)}
sample.z <- rnorm(10000, 0, 1)
  1. \(\frac{1}{\sqrt{2\pi}}\int^{2}_{-1} e^{-z^{2}/2}\,dz\)

integral

integrate(f.z, -1, 2)
## 0.8185946 with absolute error < 9.1e-15

pnorm

pnorm(2, 0, 1) - pnorm(-1, 0, 1)
## [1] 0.8185946

simulation

mean(sample.z > -1 & sample.z < 2)
## [1] 0.8188
  1. \(\frac{1}{\sqrt{2\pi}}\int^{2.1}_{-\infty} e^{-z^{2}/2}\,dz\)

integral

integrate(f.z, -5, 2.1)
## 0.9821353 with absolute error < 2.3e-05

pnorm

pnorm(2.1, 0, 1)
## [1] 0.9821356

simulation

mean(sample.z < 2.1)
## [1] 0.9803
  1. \(P(Z < 0.9)\)

integral

integrate(f.z, -5, .9)
## 0.8159396 with absolute error < 5.5e-08

pnorm

pnorm(.9, 0, 1)
## [1] 0.8159399

simulation

mean(sample.z < .9)
## [1] 0.8111
  1. \(P(-1.1 < Z < 2.5)\)

integral

integrate(f.z, -1.1, 2.5)
## 0.8581243 with absolute error < 7.3e-13

pnorm

pnorm(2.5, 0, 1) - pnorm(-1.1, 0, 1)
## [1] 0.8581243

simulation

mean(sample.z > -1.1 & sample.z < 2.5)
## [1] 0.8593
  1. \(P(Z > 0.9)\)

integral

integrate(f.z, .9, 5)
## 0.1840598 with absolute error < 2.7e-12

pnorm

1 - pnorm(.9, 0, 1)
## [1] 0.1840601

simulation

mean(sample.z > .9)
## [1] 0.1889
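
The answers above replace infinite limits with -5 or 5, which is why the integrate and pnorm results differ in the last digits. As a sketch of two alternatives, integrate accepts -Inf and Inf as limits, and pnorm has a lower.tail argument that returns the upper tail directly.

```r
f.z <- function(z) 1 / sqrt(2 * pi) * exp(-z^2 / 2)

# Upper tail P(Z > 0.9), two ways
upper_pnorm     <- pnorm(0.9, lower.tail = FALSE)
upper_integrate <- integrate(f.z, 0.9, Inf)$value

# Lower tail P(Z < 2.1) without cutting the integral off at -5
lower_integrate <- integrate(f.z, -Inf, 2.1)$value

c(upper_pnorm, upper_integrate, lower_integrate)
```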

Normal Random Variable

So far we have worked only with the standard normal distribution, which has mean 0 and variance 1.

Recall the normal random variable X has mean \(\mu\) and standard deviation \(\sigma\) and pdf:

\[ f(x)=\frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \]

Let’s use the tools we learned in the last two chapters to solve the following problems.

Example

A normally distributed population of lemming body weights has a mean of 63.5 g and a standard deviation of 12.2 g.

  1. Define the random variable.

Let X be the weight of a lemming in grams.

  1. Draw a picture of the distribution

body.weights <- rnorm(10000, 63.5, 12.2)
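
One possible way to turn the simulated weights into a picture is a histogram on the density scale with the theoretical curve drawn on top. This is a sketch. The seed, the number of bins, and the axis labels are arbitrary choices.

```r
set.seed(1)                                   # arbitrary seed for reproducibility
body.weights <- rnorm(10000, 63.5, 12.2)

# freq = FALSE puts the histogram on the density scale so the curve lines up.
hist(body.weights, freq = FALSE, breaks = 40,
     xlab = "Weight (g)", main = "Lemming body weights")
curve(dnorm(x, 63.5, 12.2), add = TRUE, lwd = 2)   # theoretical N(63.5, 12.2^2) density
abline(v = 63.5, lty = 2)                          # mark the mean
```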

For each question below, write the question in math notation, sketch a picture, calculate the theoretical probability using pnorm, and simulate the probability using rnorm.

  1. What proportion of this population is 78.0 g or larger?

Theoretical

1-pnorm(78, 63.5, 12.2)
## [1] 0.1173134

Simulation

mean(body.weights > 78)
## [1] 0.1201
  1. What is the probability of choosing at random from this population a weight smaller than 41g?

Theoretical

pnorm(41, 63.5, 12.2)
## [1] 0.03257246

Simulation

mean(body.weights < 41)
## [1] 0.0317
  1. What is the probability of choosing at random from this population a weight between 60 and 70 g?

Theoretical

pnorm(70, 63.5, 12.2)-pnorm(60, 63.5, 12.2)
## [1] 0.3158093

Simulation

mean(body.weights < 70 & body.weights > 60)
## [1] 0.312

You try it

  1. According to a recent study, the carapace length for adult males of a certain species of tarantula is normally distributed with a mean of 17.45 mm and a standard deviation of 1.85 mm. Answer these questions using both simulation (rnorm) and the R function pnorm.

Define the random variable.

Let X be the carapace length for adult male tarantulas from a certain species.

Draw a picture of the distribution

furry.spiders <- rnorm(10000, 17.45, 1.85)
  1. What is the probability that the length of a carapace is between 16 mm and 18 mm?

Theoretical

pnorm(18, 17.45, 1.85)-pnorm(16, 17.45, 1.85)
## [1] 0.4002967

Simulation

mean(furry.spiders < 18 & furry.spiders > 16)
## [1] 0.4042
  1. Would a tarantula that had a carapace longer than 21 mm be unusual?

Theoretical

1-pnorm(21, 17.45, 1.85)
## [1] 0.0274973

Simulation

mean(furry.spiders > 21)
## [1] 0.0267
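
To answer the “unusual” question, one common convention is to call an outcome unusual when its tail probability is below 0.05. That cutoff is an assumption of this sketch, not something stated in the study. The z-score is included to show how far 21 mm sits from the mean.

```r
mu    <- 17.45
sigma <- 1.85

z_21   <- (21 - mu) / sigma                           # standard deviations above the mean
p_tail <- pnorm(21, mu, sigma, lower.tail = FALSE)    # P(X > 21)
unusual <- p_tail < 0.05                              # assumed cutoff for "unusual"

c(z = z_21, tail_probability = p_tail)
unusual
```

Under that convention a carapace longer than 21 mm would count as unusual, since fewer than 3% of adult males are expected to exceed it.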

Inverse Normal

It is often of interest to calculate a quantile of the normal distribution. For instance, we may want to know what score on the SAT exam puts you in the top 10 percent, that is, at the 90th percentile. For this sort of problem we use qnorm(p, mu, sd), where \(p\) is the area to the left of the value of interest and sd is the standard deviation.

The quantile for probability \(p\) is the value \(t\) such that \(P(X \leq t) = p\). For example, the 0.9 quantile (the 90th percentile) has 90% of the distribution to its left.
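
Since qnorm takes an area and returns a value while pnorm takes a value and returns an area, the two functions undo each other. The short sketch below checks this, using the lemming parameters from earlier (mean 63.5, standard deviation 12.2). The probabilities in p are arbitrary.

```r
p <- c(0.1, 0.5, 0.8, 0.9)                    # arbitrary left-tail areas

q    <- qnorm(p, mean = 63.5, sd = 12.2)      # area  -> value
back <- pnorm(q, mean = 63.5, sd = 12.2)      # value -> area
all.equal(back, p)                            # pnorm undoes qnorm

# A normal quantile is the standard normal quantile, rescaled and shifted.
all.equal(q, 63.5 + 12.2 * qnorm(p))
```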

Example

Let’s look at those lemmings again. Recall the weight of a lemming can be described as \(X \sim N(63.5, 12.2^{2})\). What lemming weight corresponds to the 80th percentile?

  • Translate question into mathematical notation

Find \(t\) such that \(P(X < t) = .8\)

  • Draw picture

  • Find quantile \(t\) using qnorm
qnorm(.8, 63.5, 12.2)
## [1] 73.76778
  • Find the quantile using simulation
lemming <- rnorm(10000, 63.5, 12.2)
quantile(lemming, .8)
##      80% 
## 73.79352

You try it

Reconsider the tarantula example above. What carapace length marks the top 10 percent (the 90th percentile)? Note: I changed the percentile here compared to the notes so you could see how it differs from the above example.

\(X \sim N(17.45, 1.85^{2})\). Find \(t\) such that \(P(X < t) = .9\)

qnorm(.9, 17.45, 1.85)
## [1] 19.82087
big.spider <- rnorm(10000, 17.45, 1.85)
quantile(big.spider, .9)
##      90% 
## 19.80119
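
A question about the “top 10 percent” can be set up with either tail. As a sketch, the code below asks for the value with 0.9 to its left and for the value with 0.1 to its right, then checks that the area above the cutoff is 0.1.

```r
cutoff_left  <- qnorm(0.9, 17.45, 1.85)                       # 90% of lengths below
cutoff_right <- qnorm(0.1, 17.45, 1.85, lower.tail = FALSE)   # 10% of lengths above
all.equal(cutoff_left, cutoff_right)                          # same cutoff

# Check: the area above the cutoff should be 0.1
area_above <- pnorm(cutoff_left, 17.45, 1.85, lower.tail = FALSE)
area_above
```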