MATH 2441
Probability and Statistics for Biological Sciences
Calculating Probabilities: II
Some Basic Relationships Between Probabilities
We continue the summary of terminology of probability theory and basic properties of probabilities by looking
briefly at some relationships between probabilities of events which may be compound events.
Example SmokerAge:
This table gives a breakdown of the 2534 employees of a certain large organization by age group and
smoking history.
                                  age group
smoking history      20-29   30-39   40-49    ≥ 50   total
cigarette smoker       171     318     353     130     972
pipe smoker              0       2      11       9      22
ex-smoker               43     141     183     134     501
non-smoker             227     406     281     125    1039
total                  441     867     828     398    2534
We will use this information to illustrate the formulas and concepts described briefly below.
The Story So Far….
Suppose employees of this company were to be selected randomly so that every employee had the same
likelihood of being selected. Since the selection of one employee can have any of 2534 distinct but equally likely outcomes, the probability of selecting a specific individual employee is 1/2534.
If we define the event:
A = the employee selected is an ex-smoker
then
Pr(A) = 501/2534 ≈ 0.1977
because the probability of the event A is equal to the sum of the probabilities of the simple events which
make it up. For A, those simple events are the 501 equally-likely outcomes that correspond to each of the
501 ex-smoker employees of this organization, each having a probability of 1/2534 of being selected.
Similarly, if we define the event
B = the employee selected is in the age group 40 - 49
then
Pr(B) = 828/2534 ≈ 0.3268.
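As a rough check on the two probabilities above, the table can be entered into a short Python script and the counts summed directly. This is only an illustrative sketch: the names `counts`, `AGE_GROUPS` and `pr` are made up for this example, and the label "50+" assumes that the last age group means 50 and over.

```python
from fractions import Fraction

# Counts from the SmokerAge table: counts[smoking history][age group].
# The label "50+" is an assumption (last age group taken to mean 50 and over).
AGE_GROUPS = ["20-29", "30-39", "40-49", "50+"]
counts = {
    "cigarette smoker": dict(zip(AGE_GROUPS, [171, 318, 353, 130])),
    "pipe smoker":      dict(zip(AGE_GROUPS, [0, 2, 11, 9])),
    "ex-smoker":        dict(zip(AGE_GROUPS, [43, 141, 183, 134])),
    "non-smoker":       dict(zip(AGE_GROUPS, [227, 406, 281, 125])),
}

# Each employee is one equally likely simple outcome.
total = sum(sum(row.values()) for row in counts.values())

def pr(event):
    """Probability that a randomly selected employee satisfies event(history, age)."""
    favourable = sum(n for h, row in counts.items()
                     for a, n in row.items() if event(h, a))
    return Fraction(favourable, total)

pr_A = pr(lambda h, a: h == "ex-smoker")   # A: the employee is an ex-smoker
pr_B = pr(lambda h, a: a == "40-49")       # B: the employee is in the 40-49 group

print(total)                    # 2534
print(round(float(pr_A), 4))    # 0.1977
print(round(float(pr_B), 4))    # 0.3268
```

Exact fractions are used so that the results can be compared with the hand calculations without rounding error.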
The Complementary Event
We refer to the event that "A does not occur" as the complement of A, denoted by Ac (other systems of
notation are also used). Since A and Ac are mutually exclusive, and since between them they cover all
possible outcomes, we can write that
David W. Sabo (1999)
Calculating Probabilities: II
Page 1 of 8
Pr(A) + Pr(Ac) = 1
or
Pr(A) = 1 - Pr(Ac)
(PR-1)
Thus, given that
A = the event that a randomly selected employee is an ex-smoker
and that
Pr(A) = 501/2534 ≈ 0.1977
from above, then it follows that
Ac = the event that the randomly selected employee is not an ex-smoker
and
Pr(Ac) = 1 - Pr(A) = 1 - 501/2534 = 2033/2534 ≈ 0.8023
or
Pr(Ac) ≈ 1 - 0.1977 = 0.8023
Note that we get the same result if we just sum up the probabilities of all employees who are not ex-smokers:
Pr(not an ex-smoker) = Pr(cigarette smoker) + Pr(pipe smoker) + Pr(non-smoker)
= 972/2534 + 22/2534 + 1039/2534
= 2033/2534 ≈ 0.8023
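The two routes to Pr(Ac) shown above can be compared in a few lines of Python. This is a minimal sketch using only the row totals of the table; the variable names are chosen for this example only.

```python
from fractions import Fraction

# Row totals from the SmokerAge table (employees by smoking history).
history_totals = {
    "cigarette smoker": 972,
    "pipe smoker": 22,
    "ex-smoker": 501,
    "non-smoker": 1039,
}
total = sum(history_totals.values())   # 2534 employees in all

pr_A = Fraction(history_totals["ex-smoker"], total)

# Route 1: the complement rule, formula (PR-1).
pr_not_A_rule = 1 - pr_A

# Route 2: add the probabilities of the three other smoking histories.
pr_not_A_sum = sum(Fraction(n, total)
                   for history, n in history_totals.items()
                   if history != "ex-smoker")

print(pr_not_A_rule == pr_not_A_sum)     # True
print(round(float(pr_not_A_rule), 4))    # 0.8023
```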
The formula (PR-1) is one of the most important and useful probability formulas we will encounter in the
course. At the very least, it allows us to use standard probability tables very flexibly. However, there are
also many probability problems which are almost impossible to solve directly, but for which the probability of the
complement of the event of interest is obtained very easily. In class, we will look at one dealing with
duplicate birthdays, but no details are given here so that the fun is not spoiled.
Intersections of Events
The event
C = A and B
or
C = A ∩ B
known as the "intersection of events A and B" is the event that occurs only when both A and B have
occurred.
For example, if
A = the event that the selected employee is in the age group 30 - 39
B = the event that the selected employee is a non-smoker
then
Pr(A ∩ B) = Pr(the selected employee is both in the 30-39 age group and is a non-smoker)
= 406/2534 ≈ 0.1602
We could get this probability directly from the numbers in the table. Later in this document, we will give a
somewhat more general formula for Pr(A ∩ B).
Use of Venn Diagrams to Sort Out Events
When you work with compound events, it is important to be able to accurately keep track of outcomes which
are common to two or more compound events as well as those which are not. One common way of doing
so is through the use of so-called Venn Diagrams.
A Venn diagram represents the sample space, S, as a rectangle. Simple events are thought of as points
inside this rectangle, though they are not drawn explicitly. Compound events are represented by circles
sketched inside the rectangle. You think of the circles as containing the simple events that make up that
compound event. Then, two compound events that have some simple events in common will be sketched
as overlapping circles -- the region of overlap represents the simple events that are common to both
compound events.
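[Figure: Venn diagram. The sample space S is drawn as a rectangle containing two overlapping circles, A and B; the overlap region is A ∩ B.]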
The intent here is simply to represent the major components of the problem. So, the circle labeled 'A' simply
indicates that A is an event comprising one or more simple outcomes. Similarly, the circle labeled 'B'
indicates that B is an event comprising one or more simple outcomes. The region where circles A and B
overlap represents those simple outcomes which are shared by A and B. The crescent-shaped region of A
outside of the overlap region represents those outcomes that are part of A, but not part of B.
If A and B have no outcomes in common (that is, they are mutually exclusive or disjoint), then the two circles
would not be sketched overlapping in the Venn diagram:
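[Figure: Venn diagram. The sample space S is drawn as a rectangle containing two non-overlapping circles, A and B: mutually exclusive events.]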
To be specific, consider the two events from the previous section:
A = the event that the selected employee is in the age group 30 - 39
B = the event that the selected employee is a non-smoker
These two events are not mutually exclusive, since there are non-smokers in the age group 30 - 39 in the
employ of the organization. The parts of the Venn diagram for these two events have the following
meanings:
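[Figure: Venn diagram of the sample space S with overlapping circles A and B. The regions are labelled as follows.]
part of A outside B: employees who are in the 30-39 age group but are not non-smokers
part of B outside A: employees who are non-smokers but are not in the 30-39 age group
overlap of A and B: employees who are non-smokers and in the 30-39 age group
rest of S, outside both circles: all other employees who are neither non-smokers nor in the 30-39 age group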
Unions of Events
The event
C = A or B
or
C = A ∪ B
known as the "union of events A and B" is the event that occurs whenever one or both of events A and B
occur.
Thus, if
A = the event that the selected employee is in the 30 - 39 age group
B = the event that the selected employee is in the 40 - 49 age group
then
C = A ∪ B is the event that the selected employee is either in the 30 - 39 age group or in the 40 - 49 age group.
In a situation such as this, with events A and B mutually exclusive (or non-intersecting), the probability of
their union is just the sum of their individual probabilities:
Pr(C) = Pr(A ∪ B) = Pr(A) + Pr(B) = 867/2534 + 828/2534 = 1695/2534 ≈ 0.6689
We need to be a bit more careful, however, when the two events overlap. Return to a previous example
where we had
A = the event that the selected employee is in the age group 30 - 39
B = the event that the selected employee is a non-smoker
In the Venn diagram, A ∪ B corresponds to the outcomes contained within the two-lobed region of the
overlapping A and B circles. If we simply sum Pr(A) and Pr(B) in an attempt to get Pr(A ∪ B), we will end up
counting the outcomes in the overlap region twice -- once in Pr(A) and again in Pr(B). This is an error, of
course. To correct for the double counting, we must subtract the extra counting of the common outcomes.
In symbols, this is
Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
(PR-2)
Thus, for the present example, we must first determine that
Pr(A ∩ B) = Pr(selected employee is a non-smoker in the 30-39 age group)
= 406/2534
and so
Pr(A ∪ B) = Pr(selected employee is a non-smoker or is in the 30-39 age group or both)
= Pr(A) + Pr(B) - Pr(A ∩ B)
= 867/2534 + 1039/2534 - 406/2534
= 1500/2534 ≈ 0.5919
As a check, we note that A ∪ B corresponds to those numbers in the second column (the 30-39 age group) and fourth row (non-smokers) of the
body of the data table at the beginning of this document. This includes
318 + 2 + 141 + 406 + 227 + 281 + 125 = 1500
individuals out of the total of 2534 employees. From this, we also conclude Pr(A ∪ B) = 1500/2534 ≈ 0.5919.
Note that formula (PR-2) is valid whether A and B are mutually exclusive or not. If the two events are
mutually exclusive, then A ∩ B is impossible, and so Pr(A ∩ B) = 0. In that case, Pr(A ∪ B) is just the sum,
Pr(A) + Pr(B), as we noted before.
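Formula (PR-2) can be checked against a direct count over the table cells. The sketch below is illustrative only: the helper names (`cells`, `pr`, `in_A`, `in_B`, `in_D`) are invented for this example, `in_D` (the 40 - 49 age group) is added just to show the mutually exclusive case, and "50+" assumes the last age group means 50 and over.

```python
from fractions import Fraction

# SmokerAge table: one list of counts per smoking history, ordered by age group.
AGE_GROUPS = ["20-29", "30-39", "40-49", "50+"]   # "50+" is an assumed label
counts = {
    "cigarette smoker": [171, 318, 353, 130],
    "pipe smoker":      [0, 2, 11, 9],
    "ex-smoker":        [43, 141, 183, 134],
    "non-smoker":       [227, 406, 281, 125],
}

# Flatten the table into (history, age group, count) triples, one per cell.
cells = [(history, age, n)
         for history, row in counts.items()
         for age, n in zip(AGE_GROUPS, row)]
total = sum(n for _, _, n in cells)   # 2534

def pr(event):
    """Probability of the event, found by counting the employees it contains."""
    return Fraction(sum(n for h, a, n in cells if event(h, a)), total)

def in_A(h, a):          # A: age group 30-39
    return a == "30-39"

def in_B(h, a):          # B: non-smoker
    return h == "non-smoker"

def in_D(h, a):          # D: age group 40-49 (cannot occur together with A)
    return a == "40-49"

pr_A, pr_B, pr_D = pr(in_A), pr(in_B), pr(in_D)
pr_A_and_B = pr(lambda h, a: in_A(h, a) and in_B(h, a))
pr_A_or_B = pr(lambda h, a: in_A(h, a) or in_B(h, a))    # counted directly
pr_A_and_D = pr(lambda h, a: in_A(h, a) and in_D(h, a))
pr_A_or_D = pr(lambda h, a: in_A(h, a) or in_D(h, a))

# Overlapping events: the overlap must be subtracted once (PR-2).
print(pr_A_or_B == pr_A + pr_B - pr_A_and_B)   # True
print(round(float(pr_A_or_B), 4))              # 0.5919

# Mutually exclusive events: the intersection is empty, so the sum is enough.
print(pr_A_and_D == 0)                         # True
print(round(float(pr_A_or_D), 4))              # 0.6689
```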
Conditional Probability
It is useful to introduce a notation to indicate some restriction of the sample space (or to represent some
additional condition that is known to be true). The symbol
Pr(B|A)
spoken "the probability of event B given event A" stands for the probability of B occurring if we know A has
occurred or A is true. This is what we mean by a conditional probability. It distinguishes the probability of
event B occurring, Pr(B), in the absence of any other information from the probability of event B occurring
when we know that event A has occurred. These two probabilities may not have the same values.
Conditional probabilities are useful because very often, we have information which doesn't make it certain
that an event will occur (or will not occur), but does make one alternative or the other more likely than it
would be in the absence of that information. For example, if B = the event that it rains today, and A = the
event that it is cloudy today, then Pr(B|A) is the probability of rain today when we know the day is cloudy,
whereas Pr(B) would be the probability of rain today without any information on current weather conditions.
These two probabilities can be quite different. The probability of rain is presumably higher on a cloudy day
than on a day which is not cloudy.
In reference to a Venn diagram, Pr(B|A) implies that we are looking only at the part of the sample space
corresponding to outcomes in A -- we know A has happened or is true. The only part of that region which
corresponds to B occurring is the overlap region, A ∩ B. Thus, formally at least, we can write
Pr(B|A) = Pr(B ∩ A) / Pr(A)
(PR-3)
This is how a conditional probability works. Define the events A and B as before:
A = the event that the selected employee is in the age group 30 - 39
B = the event that the selected employee is a non-smoker
Suppose an employee is selected at random, and identified as being in the 30 - 39 age group. What is the
probability that they are a non-smoker? In the absence of any information about the employee's age group,
the best we can do is
Pr(B) = 1039/2534 ≈ 0.4100
However, once we are told that the employee selected is in the 30 - 39 age group, our possible simple
outcomes must be just those 867 employees in that age group. Further, the question "what is the probability
that a randomly selected employee is a non-smoker if we know that they are in the 30 - 39 age group?" is
just a question to determine Pr(B|A) = Pr(employee is a non-smoker | employee is in 30 - 39 age group):
Pr(B|A) = Pr(B ∩ A) / Pr(A) = (406/2534) / (867/2534) = 406/867 ≈ 0.4683
Notice how the fractions simplify down to what you'd expect: the probability of having selected one of the
406 non-smokers in the 867 employees in the 30 - 39 age group.
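The same calculation can be written out in Python, once with formula (PR-3) and once by restricting attention to the 867 employees in the 30 - 39 age group. This is a small sketch with variable names chosen for this example only.

```python
from fractions import Fraction

# Counts taken from the SmokerAge table.
n_total = 2534     # all employees
n_A = 867          # A: employees in the 30-39 age group
n_A_and_B = 406    # A and B: non-smokers in the 30-39 age group

# Formula (PR-3): Pr(B|A) = Pr(B and A) / Pr(A).
pr_B_given_A = Fraction(n_A_and_B, n_total) / Fraction(n_A, n_total)

# Restricting the sample space to the employees in A gives the same value.
pr_restricted = Fraction(n_A_and_B, n_A)

print(pr_B_given_A == pr_restricted)    # True
print(round(float(pr_B_given_A), 4))    # 0.4683
```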
This example shows you how to use the formula to calculate a conditional probability. It doesn't really
indicate how immensely useful the notion of conditional probability actually is. We will give one application
in the next section, but the truly astonishing results that arise from this formula must wait until later in the
course, when we discuss the total probability formula and Bayes' formula. In many instances, conditional
probabilities are easier to compute than is the probability Pr(A ∩ B), and so formula (PR-3) is used to
compute Pr(A ∩ B) -- see the section on the multiplication law below.
Independent Events
Two events, A and B, are said to be independent if
Pr(B|A) = Pr(B)
or
Pr(A|B) = Pr(A)
That is, the probability of one of them occurring isn't affected by whether or not the other has occurred.
Events that are not independent are of course dependent.
(Don't confuse the notion of independent events with the notion of mutually exclusive events. In fact,
mutually exclusive events are, by their definition, very very dependent! Since Pr(A ∩ B) = 0 if events A and
B are mutually exclusive, then Pr(B|A) = 0 and Pr(A|B) = 0, and so the conditions for independence would
not be satisfied if A and B each had a nonzero probability.)
For example, the two events,
A = the event that the selected employee is in the age group 30 - 39
B = the event that the selected employee is a non-smoker
are dependent (that is, they are not independent), because
Pr(B|A) = 406/867 ≈ 0.4683, but Pr(B) = 1039/2534 ≈ 0.4100
Similarly,
Pr(A|B) = 406/1039 ≈ 0.3908, but Pr(A) = 867/2534 ≈ 0.3421
Independence is an important statistical concept, and we will develop ways of detecting its probable
presence or absence from sample data later in the course.
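The comparison for the age-group and non-smoker events can be sketched in Python as follows. Exact equality is a reasonable test here only because the table is being treated as the complete population of employees; it is not a method for judging independence from sample data. The variable names are chosen for this example only.

```python
from fractions import Fraction

# Counts taken from the SmokerAge table.
n_total = 2534     # all employees
n_A = 867          # A: age group 30-39
n_B = 1039         # B: non-smokers
n_A_and_B = 406    # both A and B

pr_A = Fraction(n_A, n_total)
pr_B = Fraction(n_B, n_total)
pr_B_given_A = Fraction(n_A_and_B, n_A)
pr_A_given_B = Fraction(n_A_and_B, n_B)

# Independence would require Pr(B|A) = Pr(B), or equivalently Pr(A|B) = Pr(A).
print(round(float(pr_B_given_A), 4), round(float(pr_B), 4))   # 0.4683 0.41
print(round(float(pr_A_given_B), 4), round(float(pr_A), 4))   # 0.3908 0.3421
print(pr_B_given_A == pr_B)                                    # False
print(pr_A_given_B == pr_A)                                    # False
```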
Perhaps the easiest example of independent events can be demonstrated for the experiment in which a fair
coin is flipped twice in a row. We know from the preceding document in this series that this experiment will
result in four possible equally likely outcomes:
{HH, HT, TH, TT}
where 'HH' means the first flip resulted in heads and the second flip resulted in heads, etc.
Now, define the events A and B as follows:
A = the event that on the first flip, the coin lands heads up
B = the event that on the second flip, the coin lands heads up
Then,
Pr(B|A) = Pr(second flip produces a heads up given that the first flip produced a heads up).
If these two events are independent, then the probability of flipping the coin heads up is not affected in any
way by how many heads you've already gotten. Now,
Pr(B|A) = Pr(B ∩ A) / Pr(A) = Pr(HH) / [Pr(HH) + Pr(HT)] = (1/4) / (1/4 + 1/4) = 1/2
But
Pr(B) = Pr(HH) + Pr(TH) = 1/4 + 1/4 = 1/2
In this last line, we used the fact that B is the event that the second flip results in heads. That means that B
corresponds to the two simple outcomes HH and TH, which are mutually exclusive and each have a
probability of 1/4. In the line before that, we noted that B ∩ A is the event that both the first flip and the
second flip resulted in heads, and therefore must be the same thing as HH, which has a probability of 1/4.
Event A, that the first flip resulted in heads, corresponds to the two simple outcomes HH and HT, each with
a probability of 1/4.
Anyway, the result of the calculation is that Pr(B|A) = Pr(B), and so events A and B are independent. This
means that for a fair coin, the second flip is no more or less likely to give heads if the first flip gave heads
than if the first flip gave tails. In fact, you can extend this result to any sequence of coin flips. Even if you
flip 10 heads in a row, the probability of getting heads on the 11th flip is still 1/2. (Don't confuse this with the
statement that the probability of getting 11 heads in a row is 1/2 -- that's a recipe for losing your shirt!).
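The two-flip calculation can be reproduced by listing the four outcomes and summing their probabilities. The sketch below is illustrative; the helper names are invented for this example, and the last line assumes that all 11 flips are independent flips of a fair coin, so that the individual probabilities multiply.

```python
from fractions import Fraction
from itertools import product

# Sample space for two flips of a fair coin; each outcome has probability 1/4.
outcomes = ["".join(flips) for flips in product("HT", repeat=2)]
p_each = Fraction(1, len(outcomes))

def pr(event):
    """Sum the probabilities of the simple outcomes that make up the event."""
    return sum((p_each for o in outcomes if event(o)), Fraction(0))

def first_heads(o):      # event A: first flip lands heads up
    return o[0] == "H"

def second_heads(o):     # event B: second flip lands heads up
    return o[1] == "H"

pr_A = pr(first_heads)
pr_B = pr(second_heads)
pr_B_and_A = pr(lambda o: first_heads(o) and second_heads(o))
pr_B_given_A = pr_B_and_A / pr_A          # formula (PR-3)

print(outcomes)                  # ['HH', 'HT', 'TH', 'TT']
print(pr_B_given_A, pr_B)        # 1/2 1/2
print(pr_B_given_A == pr_B)      # True, so A and B are independent

# Eleven heads in a row is a different event from heads on the 11th flip.
print(Fraction(1, 2) ** 11)      # 1/2048
```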
The Multiplication Law
The multiplication law is just a rearrangement of the defining equation for conditional probabilities:
Pr(A ∩ B) = Pr(A|B) Pr(B) = Pr(B|A) Pr(A)
(PR-4)
If the two events, A and B, are independent, then this simplifies to
Pr(A ∩ B) = Pr(A) Pr(B)
(PR-5)
because Pr(A|B) = Pr(A) and Pr(B|A) = Pr(B) in that case.
The more general form is quite intuitive. Pr(A ∩ B) is the probability of observing both events A and B. For
both to occur, either A must occur and then B, or B must occur and then A -- hence the two alternative right-hand sides. Further, if you think of probabilities in terms of relative frequencies, then to get the relative
frequency of both A and B occurring, we can start with the relative frequency of A occurring, which is Pr(A),
and multiply this by the relative frequency with which B occurs when A has occurred, namely Pr(B|A).
Alternatively, we could start with the relative frequency with which B occurs, Pr(B), and multiply it by the
relative frequency with which A occurs when B has occurred, Pr(A|B).
As a way to illustrate the use of this formula, let's return to the familiar two events:
A = the event that the selected employee is in the age group 30 - 39
B = the event that the selected employee is a non-smoker
We already know that Pr(A ∩ B) = 406/2534 ≈ 0.1602 from previous work. However, we can demonstrate
that these multiplication law formulas give exactly the same result. Recall that Pr(A) = 867/2534, Pr(B) =
1039/2534, Pr(A|B) = 406/1039, and Pr(B|A) = 406/867. So, using (PR-4), we get
Pr(A ∩ B) = Pr(B|A) Pr(A) = (406/867) × (867/2534) = 406/2534
or
Pr(A ∩ B) = Pr(A|B) Pr(B) = (406/1039) × (1039/2534) = 406/2534
Thus, we get the expected result with both variants of the formula.
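Both forms of (PR-4) can be confirmed with exact fractions. The sketch below also evaluates the product Pr(A) Pr(B) to show that the simplified formula (PR-5) gives a different number for these two events, which is consistent with their being dependent. The variable names are chosen for this example only.

```python
from fractions import Fraction

# Counts taken from the SmokerAge table.
n_total = 2534     # all employees
n_A = 867          # A: age group 30-39
n_B = 1039         # B: non-smokers
n_A_and_B = 406    # both A and B

pr_A = Fraction(n_A, n_total)
pr_B = Fraction(n_B, n_total)
pr_B_given_A = Fraction(n_A_and_B, n_A)
pr_A_given_B = Fraction(n_A_and_B, n_B)

# Formula (PR-4), written both ways.
via_A = pr_B_given_A * pr_A
via_B = pr_A_given_B * pr_B
print(via_A == via_B == Fraction(n_A_and_B, n_total))   # True
print(round(float(via_A), 4))                            # 0.1602

# Formula (PR-5) applies only to independent events; here it does not match.
product_of_marginals = pr_A * pr_B
print(round(float(product_of_marginals), 4))             # 0.1403
print(product_of_marginals == via_A)                     # False
```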