Introduction

In this chapter we learn how to update the probability of an event when we learn that another event has occurred. The updated probability of event \(A\) after we learn that event \(B\) has occurred is the conditional probability of \(A\) given \(B\).

Example: Tulips

Suppose that we are given 20 tulip bulbs that are very similar in appearance and told that 8 tulips will bloom early, 12 will bloom late, 13 will be red, and 7 will be yellow. The following table summarizes information about the combination of features among these tulips:

         Early   Late   Sum
Red          5      8    13
Yellow       3      4     7
Sum          8     12    20

If one tulip bulb is selected at random, what is the probability that it will produce a red tulip?

13/20
## [1] 0.65

Suppose that, upon close examination, we learn that it is an early bulb. Given that it is an early bulb, what is the probability that it is a red tulip?

5/8
## [1] 0.625

Conditional Probability

Let \(A\) and \(B\) be events in the sample space \(S\), with \(P(B)\neq 0\). The conditional probability of \(A\) given \(B\) is

\[P(A|B)=\frac{P(A \cap B)}{P(B)}\]
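Applying this definition to the tulip example gives the same answer we found by restricting attention to the early bulbs:

(5/20) / (8/20)
## [1] 0.625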

Example

Suppose that \(P(A) =.3, P(B)=.7\), and \(P(A \cap B)=.2\). What is \(P(A|B)\)?

.2/.7
## [1] 0.2857143

You try it

  1. Suppose that \(P(A)=.7\), \(P(B)=.5\), and \(P(A\cap B)=.2\). Find \(P(A|B)\).

.2/.5
## [1] 0.4
  2. Find \(P(A\cap B)\) if \(P(A)=0.2\), \(P(B)=0.4\), and \(P(A|B)+P(B|A)=0.75\).

\[\begin{align*} P(A|B) + P(B|A) & = .75\\ \frac{P(A\cap B)}{P(B)} + \frac{P(A\cap B)}{P(A)} & = .75 \\ \frac{P(A \cap B)P(A) + P(A \cap B)P(B)}{P(A)P(B)} & = .75 \\ P(A \cap B)[P(A) + P(B)] & = .75P(A)P(B) \\ P(A \cap B) & = \frac{.75P(A)P(B)}{P(A) + P(B)} = \frac{.75(0.2)(0.4)}{0.2 + 0.4} \\ P(A \cap B) & = 0.1 \end{align*}\]
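As a quick check in R, substituting the given values into the final formula:

0.75*0.2*0.4 / (0.2 + 0.4)
## [1] 0.1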


Independent Events (Speegle 2.3.1)

In statistics we talk about independence a lot. If two events \(A\) and \(B\) are independent, then knowing whether \(B\) occurred tells us nothing about \(A\). Therefore, if \(A\) and \(B\) are independent events, then

\[ P(A|B)= P(A) \qquad \mbox{ and } P(B|A)= P(B) \]

If learning that \(B\) has occurred does not change the probability of \(A\), then we say \(A\) and \(B\) are independent.

Give an example of two events that are independent.

If two events \(A\) and \(B\) are independent, then we can write \(P(A\cap B)=P(A)P(B)\).

Example: Machine failure

Suppose that two machines, 1 and 2, in a factory are operated independently of each other. Let \(A\) be the event that machine 1 will become inoperative during a given 8-hour period; let \(B\) be the event that machine 2 will become inoperative during the same period; and suppose that \(P(A)=1/3\) and \(P(B)=1/4\). We shall determine the probability that at least one of the machines will become inoperative during the given period.

Translation: “At least one machine is inoperative” \(\rightarrow P(A\cup B)\)

\[\begin{align*} P(A\cup B) & = P(A) + P(B) - P(A\cap B) \\ & = P(A) + P(B) - P(A)P(B) \\ & =\frac{1}{3} + \frac{1}{4} - \frac{1}{3}\frac{1}{4}\\ & =0.5 \end{align*}\]
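As a sanity check, here is a minimal simulation sketch, assuming the two machines fail independently with the stated probabilities; the seed and the 10,000 replications are arbitrary choices:

set.seed(1)                  # arbitrary seed for reproducibility
atLeastOne <- replicate(10000, {
  A <- runif(1) < 1/3        # machine 1 inoperative
  B <- runif(1) < 1/4        # machine 2 inoperative
  A | B                      # at least one machine inoperative
})
mean(atLeastOne)             # should be close to 0.5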

Example: Graduation Requirements

School board officials are debating whether to require all high school seniors to take a proficiency exam before graduating. A student passing all three parts (math, language, and general knowledge) would be awarded a diploma; otherwise, they would receive only a certificate of attendance. A practice test given to this year’s ninety-five hundred seniors resulted in the following numbers of failures: Math: 3325; Language: 1900; General knowledge: 1425.

\(P(M) = \frac{3325}{9500} = .35\), \(P(L) = \frac{1900}{9500} = .2\), \(P(G) = \frac{1425}{9500} = .15\)

If “Student fails Math”, “Student fails language”, and “Student fails general knowledge” are independent events, what proportion of next year’s seniors can be expected to fail to qualify for a diploma? Does independence seem reasonable here?

  • P(diploma) = (0.65)(0.8)(0.85) = 0.442
  • P(no diploma) = 1 - 0.442 = 0.558
  • Independence is questionable here: a student who fails one part of the exam is probably more likely to fail another, so the events are plausibly dependent.
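The same arithmetic in R:

1 - 0.65*0.8*0.85
## [1] 0.558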

Example: Child Mortality

In a certain nation, statistics show that only two out of ten children born in the early 80s reached the age of 21. Assume that survival is independent from child to child. If the same mortality rate holds for the next generation, how many children does a person need to have so that there is at least a 75% probability that at least one child survives to adulthood?

\[ P(\mbox{at least 1 survives}) = 1-P(\mbox{all die}) = 1-0.8^{k}>0.75 \quad\Rightarrow\quad k \geq 7 \]
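A quick way to find the smallest such \(k\) in R is to check each value:

which(1 - 0.8^(1:20) > 0.75)[1]   # smallest k with P(at least 1 survives) > 0.75
## [1] 7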

You try it

Suppose that \(P(A\bigcap B)=.2\), \(P(A)=.6\), and \(P(B)=.5\).

  1. Are \(A\) and \(B\) mutually exclusive?
    • \(P(A\cap B)\neq 0\) so not mutually exclusive.
  2. Are \(A\) and \(B\) independent?
    • \(P(A)P(B)=0.3\neq P(A\cap B)\) so not independent
  3. Find \(P(A^{C}\cup B^{C})\).
    • By De Morgan’s law, \(P(A^{C}\cup B^{C})=P((A\cap B)^{C})=1-P(A\cap B)=1-0.2=0.8\). Equivalently, \(P(A^{C})+P(B^{C})-P(A^{C}\cap B^{C})=0.4+0.5-0.1=0.8\), since \(P(A^{C}\cap B^{C})=1-P(A\cup B)=1-0.9=0.1\).

You try it:

Myra and Carlos are summer interns working as proofreaders for a local newspaper. Based on aptitude tests, Myra has a 50% chance of spotting a hyphenation error, while Carlos picks up on that same kind of mistake 80% of the time. Suppose the copy they are proofing contains a hyphenation error. What is the probability it goes undetected?

Let \(M\) and \(C\) be the events that Myra and Carlos, respectively, catch the mistake. By assumption, \(P(M)=.5\) and \(P(C)=.8\). What we are looking for is the probability of the complement of a union. That is,

\[\begin{align*} P(\mbox{Error goes undetected}) & = 1-P(\mbox{error detected}) \\ & =1-P(M\cup C)\\ & =1-[P(M)+P(C)-P(M\cap C)]\\ & =1-(0.5+0.8-0.5*.8)=0.10 \end{align*}\]

Another approach, using the assumed independence of \(M\) and \(C\) (and hence of their complements):

\[\begin{align*} P(\mbox{Error goes undetected}) & = P(M^{C})P(C^{C}) \\ & =(1-P(M))(1-P(C))\\ & =(1-0.5)(1-0.8)=0.10 \end{align*}\]
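Both calculations in R:

1 - (0.5 + 0.8 - 0.5*0.8)
## [1] 0.1
(1 - 0.5) * (1 - 0.8)
## [1] 0.1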


Simulating conditional probability

Simulating conditional probability is challenging. We will estimate \(P(A\cap B)\) and either \(P(A)\) or \(P(B)\) by simulation, and then divide to estimate the conditional probability \(P(B|A)\) or \(P(A|B)\).

Example

Two dice are rolled. Estimate the conditional probability that the sum of the dice is at most 4, given that at least one of the dice is a 2. Let \(A\) be the event that the sum of the dice is at most 4 and let \(B\) be the event that at least one of the dice is a 2. Thus, we want \(P(A|B)\).

eventB <- replicate(10000, {
  dieroll <- sample(1:6, 2, replace = TRUE)  # roll two fair dice
  2 %in% dieroll                             # TRUE when at least one die is a 2 (event B)
})

(probB <- mean(eventB))
## [1] 0.3043
eventAB <- replicate(10000, {
  dieroll <- sample(1:6, 2, replace = TRUE)  # roll two fair dice
  (sum(dieroll) <= 4) & (2 %in% dieroll)     # TRUE when both A and B occur
})

(probAB <- mean(eventAB))
## [1] 0.0818
(cond_prob <- probAB/probB)
## [1] 0.2688137
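An alternative sketch: simulate all the rolls once, then restrict attention to the rolls where \(B\) occurred. The seed is an arbitrary choice for reproducibility:

set.seed(42)                              # arbitrary seed
rolls <- matrix(sample(1:6, 2 * 10000, replace = TRUE), ncol = 2)
B <- rolls[, 1] == 2 | rolls[, 2] == 2    # at least one die is a 2
A <- rowSums(rolls) <= 4                  # sum is at most 4
mean(A[B])                                # estimates P(A|B); should be close to 3/11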

Now, compute the theoretical probability. Does your calculation match what is given above?

\(P(A|B)\) means the probability of A (sum of dice is at most 4) given that B (at least one die is a 2) already happened.

So if at least one die is a 2, the event space is \(B = \{(1,2), (2,1), (2,2), (3,2), (2,3), (4,2), (2,4), (5,2), (2,5), (6,2), (2,6)\}\). Of these 11 outcomes, only \(\{(1,2), (2,1), (2,2)\}\) have a sum of at most 4. So that’s a probability of 3/11.

Or you can calculate it as:

\[ \frac{P(A \cap B)}{P(B)} = \frac{3/36}{11/36} = \frac{3}{11} \approx .273 \]
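In R:

3/11
## [1] 0.2727273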

Law of Total Probability

Suppose that the events \(A_{1},A_{2},\dots,A_{k}\) form a partition of the space \(S\) and \(P(A_{j})>0\) for \(j=1,\dots,k\). Then, for every event \(B\) in \(S\),

\[ P(B)=\sum^{k}_{j=1}P(A_{j})P(B|A_{j}) \]

Here is a picture of what this looks like for a partition into three events, \(A_{1},A_{2},A_{3}\):

Our interest is in finding \(P(B)\). Notice that \(B\) is also partitioned by these three events, so if we know the probabilities of the three intersections, we can add them together:

\[ P(B) = P(B \cap A_{1}) + P(B \cap A_{2}) + P(B \cap A_{3}) \]

Using the definition of conditional probability, this can be re-written as:

\[ P(B) = P(B|A_{1})P(A_{1}) + P(B|A_{2})P(A_{2}) + P(B|A_{3})P(A_{3}) \]

Example: Voting preferences

The percentages of voters classified as Liberals in three different election districts are as follows: 21% in the first district, 45% in the second district, and 75% in the third district. If a district is selected at random and a voter is selected at random from that district, what is the probability that she will be a Liberal?

Let \(L\) be the event a person is a Liberal, and let \(D_{1}, D_{2}, D_{3}\) be the event a person is in district 1, 2 or 3 respectively. Our known values are:

\[ P(L| D_{1}) = .21 \qquad P(L|D_{2}) = .45 \qquad P(L|D_{3}) = .75 \\ P(D_{1}) = P(D_{2}) = P(D_{3}) = 1/3 \\ \]

Find: \(P(L)\)

\[ P(L) = P(L| D_{1})P(D_{1}) + P(L|D_{2})P(D_{2})+ P(L|D_{3})P(D_{3}) \\ = (.21)(1/3) + (.45)(1/3) + (.75)(1/3) \]

(.21)*(1/3) + (.45)*(1/3) + (.75)*(1/3)
## [1] 0.47
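A simulation sketch of the same calculation: pick a district uniformly at random, then a voter from it. The seed is an arbitrary choice for reproducibility:

set.seed(7)                      # arbitrary seed
p_lib <- c(0.21, 0.45, 0.75)     # P(Liberal) in each district
liberal <- replicate(10000, {
  d <- sample(1:3, 1)            # select a district at random
  runif(1) < p_lib[d]            # voter is a Liberal with that district's rate
})
mean(liberal)                    # should be close to 0.47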

You try it

In a certain study it was discovered that 15% of the participants were classified as heavy smokers, 30% as light smokers, and 55% as nonsmokers. During the five-year study, 20% of the heavy smokers died, 10% of the light smokers died, and 4% of the nonsmokers died. What is the probability that a randomly selected participant died during the study?

Let \(H\) denote heavy smokers, \(L\) light smokers, \(NS\) nonsmokers, and \(D\) the event of death.

Our known values are:

\[ P(H) = .15 \qquad P(L) = .30 \qquad P(NS) = .55 \\ P(D|H) = .20 \qquad P(D|L) = .10 \qquad P(D|NS) = .04 \\ \]

We want to find \(P(D)\)

\[ P(D) = P(D|H)P(H) +P(D|L)P(L) + P(D|NS)P(NS) \]

.2*.15 + .1*.3 + .04*.55
## [1] 0.082

Bayes’ Rule and conditioning

Suppose that we are interested in which of several events \(A_{1}, A_{2},\dots,A_{k}\) will occur and that we will get to observe some other event \(B\). If \(P(B|A_{i})\) is available for each \(i\), then Bayes’ theorem is a useful formula for computing the conditional probabilities of the \(A_{i}\) events given \(B\). We will derive Bayes’ Theorem in this section. Suppose that we have \(A_{1}, A_{2}, A_{3}\) which form a partition of the sample space. There is another event we will call \(B\) that is contained in the same sample space.

Now suppose that we know the values of \(P(B|A_{j})\) and \(P(A_{j})\) for all \(j\). We want to calculate \(P(A_{1}|B)\). Given the formula we learned in this chapter for conditional probability, we can rewrite \(P(A_{1}|B)\) as:

\[ P(A_{1}|B)= \frac{P(B \cap A_{1})}{P(B)} \]

Remember, we don’t know \(P(B)\) directly, but we do know \(P(B|A_{j})\) and \(P(A_{j})\) for all \(j\), so we can rewrite the denominator of the above formula using the Law of Total Probability:

\[ P(B)= \sum_{j}P(B|A_{j})P(A_{j}) \]

Now we know the denominator. Let’s deal with the numerator. How can we rewrite the numerator so that we can use the information that we are given? Again, we can use the definition of conditional probability: \(P(B \cap A_{1}) = P(B|A_{1})P(A_{1})\). Thus, we have

\[ P(A_{1}|B)= \frac{P(B|A_{1})P(A_{1})}{\sum_{j}P(B|A_{j})P(A_{j})} \]

and we can now calculate \(P(A_{1}|B)\) because we know all the information on the right-hand side of the equation.

One can see from what we just did that Bayes’ Rule is a simple statement about conditional probabilities. This simple rule forms the basis for Bayesian inference.

Bayes’ Rule

Of course, we can extend this rule for any number of \(A_{i}\)’s.

Let \(A_{1},A_{2}, A_{3},...,A_{k}\) be a partition of the sample space \(S\) and let \(B\) be an event. Then

\[ P(A_{j}|B)=\frac{P(B|A_{j})P(A_{j})}{\sum^{k}_{i=1}P(B|A_{i})P(A_{i})} \]

Example: Disease test

Suppose that you are walking down the street and notice that the Department of Public Health is giving a free medical test for a certain disease. The test is 90 percent reliable in the following sense: If a person has the disease, there is a probability of .9 that the test will yield a positive response; whereas, if a person does not have the disease, there is a probability of .1 that the test will give a positive response.

Data indicate that your chances of having the disease are only 1 in 10,000. However, since the test costs you nothing, and is fast and harmless, you decide to stop and take the test. A few days later you learn that you had a positive response to the test. Now, what is the probability that you have the disease?

Let \(D\) be having the disease, so \(D^{c}\) is not having the disease. Let \(+\) be a positive test. Our known values are:

\(P(+|D) = .9 \qquad P(+|D^{c}) = .1 \qquad P(D) = .0001\)

pos.given.disease <- .9               # P(+|D)
pos.given.no.disease <- .1            # P(+|D^c)
prob.disease <- .0001                 # P(D)
prob.no.disease <- 1 - prob.disease   # P(D^c)

We are asked to find: \(P(D|+)\)

\[ P(D|+) = \frac{P(+|D)P(D)}{P(+|D)P(D) + P(+|D^{c})P(D^{c})} = \frac{(0.9)(0.0001)}{(0.9)(0.0001)+(0.1)(0.9999)} \approx 0.0009 \]

(prob.disease.given.positive <- 
   (pos.given.disease*prob.disease)/ 
   (pos.given.disease*prob.disease + pos.given.no.disease*prob.no.disease)
)
## [1] 0.0008992806

You try it

  1. At a hospital’s emergency room, patients are classified and 20% of them are critical, 30% are serious, and 50% are stable. Of the critical ones, 30% die; of the serious, 10% die; and of the stable, 1% die. Given that a patient dies, what is the conditional probability that the patient was classified as critical?

Let \(C\) denote that a patient is critical, \(R\) serious, and \(S\) stable. Also let \(D\) indicate that the patient dies. Our known values are:

\[ P(C) = .2 \qquad P(R) = .3 \qquad P(S) = .5 \\ P(D|C) = .30 \qquad P(D|R) = .10 \qquad P(D|S) = .01 \]

We are asked to find \(P(C|D)\)

\[ P(C|D) = \frac{P(D|C)P(C)}{P(D|C)P(C) + P(D|R)P(R) + P(D|S)P(S)} \\ \]

(0.3*0.2)/(0.3*0.2 + 0.1*0.3 + 0.01*0.5)
## [1] 0.6315789
  2. In a certain city, 30% of the people are Conservatives, 50% are Liberals, and 20% are Independents. Records show that in a particular election, 65% of the Conservatives voted, 82% of the Liberals voted, and 50% of the Independents voted. If a person in the city is selected at random and it is learned that she did not vote in the last election, what is the probability that she is a Liberal?

Let \(L\) be liberal, \(C\) conservative, \(I\) independent, and \(V\) voted. Our known values are:

\[ P(C) = .3 \qquad P(L) = .5 \qquad P(I) = .2 \\ P(V|C) = .65 \qquad P(V|L) = .82 \qquad P(V|I) = 0.5 \]

We are asked to find \(P(L|V^{c})\).

\[ P(L|V^{c})=\frac{P(V^{c}|L)P(L)}{P(V^{c}|L)P(L) + P(V^{c}|C)P(C) + P(V^{c}|I)P(I)} \]

(.18*.5) / (.18*.5 + .35*.3 + .5*.2)
## [1] 0.3050847