MSc Maths and Statistics 2008
Department of Economics UCL
Chapter 5: Probability
Jidong Zhou
Chapter 5: Probability
Probability theory is the main tool for dealing with problems that involve uncertainty. We introduce
the standard probability model first.[1]
1 Definition of Probability
• Experiments: an experiment is a process whose outcome is not known in advance with
certainty. For example, tossing a coin twice is an experiment.
• The sample space: the set containing all possible outcomes of an experiment, denoted by
Ω. For example, the possible outcomes of tossing a coin twice are {HH, HT, TH, TT}.
• Events: an event is a subset of the sample space Ω. For example, “at least one Head is
realized”, i.e. {HH, HT, TH}, is an event.
We now define probability:
• Let Σ be the set of all “well-behaved” subsets of Ω (or all possible “interesting” events
of an experiment).[2]
• For any event in Σ, say A, we assign a number Pr(A) which is called the probability of
the event A if the following properties (or axioms) are satisfied:
— for any event A, Pr(A) ≥ 0.
— Pr(Ω) = 1.
— for every sequence of disjoint events $\{A_i\}_{i \in \mathbb{N}}$,
$$\Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \Pr(A_i).$$
• Based on this definition, we can derive the following results:
— Pr(∅) = 0
— for any finite sequence of disjoint sets $\{A_i\}_{i=1,2,\ldots,n}$,
$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \Pr(A_i)$$
— $\Pr(A^c) = 1 - \Pr(A)$
— if A ⊂ B, then Pr(A) ≤ Pr(B)
— for any A ∈ Σ, 0 ≤ Pr(A) ≤ 1
— for any A, B ∈ Σ, Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(AB), where Pr(AB) = Pr(A ∩ B).
[1] What exactly probability means is debated among statisticians. See, e.g., Section 1.2
in DeGroot and Schervish (2002).
[2] More precisely, we only consider those subsets or events which are measurable. The concept of measurable sets is beyond the scope of this course. No such problem arises if the sample space Ω is a finite set.
Exercise 1 (i) Prove the last result above; then prove that
Pr(A ∪ B ∪ C) = Pr(A) + Pr(B) + Pr(C)
− [Pr(AB) + Pr(BC) + Pr(AC)] + Pr(ABC).
Can you work out the general case?
(ii) Prove that, for any two events A and B, the probability that exactly one of the two
events will occur is
Pr(A) + Pr(B) − 2 Pr(A ∩ B)
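The two-event identities above can be sanity-checked by enumeration on the coin-tossing sample space from Section 1. The sketch below is only a numerical check under the assumption of equally likely outcomes, not a proof; the names `OMEGA`, `pr`, `A` and `B` are illustrative choices and do not come from the notes.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing a coin twice; each outcome is assumed equally likely.
OMEGA = frozenset("".join(tosses) for tosses in product("HT", repeat=2))


def pr(event):
    """Probability of an event (a subset of OMEGA) under equal likelihood."""
    return Fraction(len(event & OMEGA), len(OMEGA))


# Two illustrative events (assumed for this sketch).
A = frozenset(w for w in OMEGA if w[0] == "H")  # first toss is a Head
B = frozenset(w for w in OMEGA if w[1] == "H")  # second toss is a Head

# Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(AB)
lhs_union = pr(A | B)
rhs_union = pr(A) + pr(B) - pr(A & B)

# Pr(exactly one of A and B occurs) = Pr(A) + Pr(B) − 2 Pr(AB)
lhs_exactly_one = pr(A ^ B)  # symmetric difference: in A or in B, but not both
rhs_exactly_one = pr(A) + pr(B) - 2 * pr(A & B)

print(lhs_union, rhs_union)
print(lhs_exactly_one, rhs_exactly_one)
```

Using `Fraction` keeps the arithmetic exact, so the two sides of each identity can be compared with `==` rather than with a floating-point tolerance.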
2 Finite Sample Spaces
In this part, we consider relatively simple experiments which have a finite number of outcomes.
The sample spaces of such experiments are called finite sample spaces.
2.1 Counting methods
We first illustrate some frequently used counting methods through some examples:
• Multiplication rule:
Suppose that two dice are rolled, and there are six possible outcomes for each die.
What is the number of possible outcomes for this experiment? The answer is 6 × 6 = 36.
In general, suppose an experiment consists of two parts, where the first part has m
possible outcomes and the second part, regardless of the outcome of the first part, has
n possible outcomes. Then the total number of possible outcomes for this experiment
is m × n.
• Permutations:
Suppose that we want to arrange six different books in a row. How many possible
arrangements do we have? For the first position, we have six choices. Once the first
position is filled, we have 5 choices for the second position, and so on. Hence, we have
6 × 5 × 4 × 3 × 2 × 1 = 6! possible arrangements.
In general, given n distinct items, if we want to randomly select k ≤ n items one by one
without replacement, the total number of possible sequences is
$$P_n^k \equiv n \times (n-1) \times \cdots \times (n-k+1).$$
This is also called the number of permutations of n items taken k at a time.[3] In
particular, $P_n^n = n!$.
• Combinations:
Suppose that we have four candidates a, b, c, and d, and two of them will be randomly
chosen to be the committee members. How many different selections do we have? All
possibilities are listed below:
{a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d}.
So the answer is 6.[4]
In general, given a set consisting of n distinct elements, if we want to randomly select
k ≤ n of them to form a subset, then the number of possible subsets is
$$C_n^k \equiv \frac{n!}{k!(n-k)!}.$$
This is called the number of combinations of n items taken k at a time. Sometimes it
is also denoted by $\binom{n}{k}$.
— notice that an ordered list of k items randomly selected from n items can be
constructed as follows: we first randomly pick k items at a time, and then arrange
them in a particular order. Therefore, we must have
$$P_n^k = C_n^k P_k^k.$$
Actually, that is the way we get the formula for $C_n^k$.
[3] Selecting k items sequentially is equivalent to selecting k items at a time and then assigning them a
particular order.
[4] Notice that here the order does not matter because, for example, {a, b} = {b, a}. However, if the problem
becomes that we want to randomly choose two, one for the president and the other for the secretary, then
the order matters. In that case, the number of different arrangements is 12, since for each pair of selected
candidates we have two ways to assign their positions.
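The three counting rules in this subsection can be checked by brute-force enumeration. The sketch below assumes Python 3.8 or later (for `math.perm` and `math.comb`); the book labels are hypothetical placeholders, and the candidate labels follow the committee example.

```python
import math
from itertools import combinations, permutations, product

# Multiplication rule: two dice, six faces each, enumerated as ordered pairs.
dice_outcomes = list(product(range(1, 7), repeat=2))

# Permutations: ordered arrangements of six distinct books (labels are hypothetical).
books = ["b1", "b2", "b3", "b4", "b5", "b6"]
arrangements = list(permutations(books))

# Combinations: unordered committees of two chosen from four candidates.
candidates = ["a", "b", "c", "d"]
committees = list(combinations(candidates, 2))

# Ordered selections of two (e.g. president, secretary) from the same candidates.
officer_pairs = list(permutations(candidates, 2))

print(len(dice_outcomes), 6 * 6)
print(len(arrangements), math.factorial(6))
print(len(committees), math.comb(4, 2))
print(len(officer_pairs), math.perm(4, 2))

# The relation P(n, k) = C(n, k) * P(k, k), here with n = 4 and k = 2.
print(math.perm(4, 2) == math.comb(4, 2) * math.perm(2, 2))
```

Enumeration is only practical for small n, but it makes the difference between ordered and unordered selection concrete before relying on the closed-form formulas.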
2.2 Computing probability in simple sample spaces
• A simple sample space is a finite sample space in which each outcome is equally likely.
(This is sometimes called the classical probability model.) Then, if $\Omega = \{a_1, \ldots, a_n\}$,
we have
$$\Pr(\{a_i\}) = 1/n$$
for any i, and the probability of any event is just the number of outcomes in that event
divided by n.
— in order to know the probability of an event, we first need to compute the number of
outcomes in the whole sample space, and then the number of outcomes
in this event.
• Here are some examples:
— if two balanced dice are rolled, what is the probability that the sum of the two
numbers that appear will be 5? Let (x, y) be an outcome where x is the number
shown on the first die and y is the number shown on the second one. Then
the total number of outcomes, according to the multiplication rule, is 36. The sum
will be 5 if one is 1 and the other is 4 or if one is 2 and the other is 3, that is, for the
outcomes (1, 4), (4, 1), (2, 3), and (3, 2), and so the number of outcomes in that event is 4.
Thus, the probability is
$$\frac{4}{36} = \frac{1}{9}.$$
— (the birthday problem) consider a group of k people (2 ≤ k ≤ 365). Assume that
the birthdays of these people are unrelated (e.g., no twins), and each of the 365
days of the year is equally likely to be the birthday of any person in this group.
Then what is the probability that at least two people among them will have the
same birthday, that is, will have been born on the same day of the same month
but not necessarily in the same year? The answer is
$$1 - \frac{P_{365}^{k}}{365^{k}}.$$
The second term is just the probability that all k persons have different birthdays.
For example, if k = 50, the probability that at least two people share a birthday is about 0.970.
— suppose that a class contains 15 boys and 30 girls, and that 10 students are to be
selected at random for a special assignment. What is the probability that exactly
3 boys will be selected? The total number of ways to select 10 students at random
is $C_{45}^{10}$, and the number of outcomes in the event that 3 boys
and 7 girls are selected is $C_{15}^{3} C_{30}^{7}$ according to the multiplication rule.
Thus, the probability is
$$\frac{C_{15}^{3} C_{30}^{7}}{C_{45}^{10}} \approx 0.2904.$$
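The three examples above can be reproduced numerically. The following is a minimal sketch assuming Python 3.8 or later; the function names and default arguments are illustrative choices, not part of the notes.

```python
import math
from fractions import Fraction
from itertools import product


def prob_sum_of_two_dice(target):
    """Pr(sum of two balanced dice equals target), by enumerating all 36 outcomes."""
    outcomes = list(product(range(1, 7), repeat=2))
    favourable = [o for o in outcomes if sum(o) == target]
    return Fraction(len(favourable), len(outcomes))


def prob_shared_birthday(k, days=365):
    """Pr(at least two of k people share a birthday) = 1 - P(days, k) / days**k."""
    return 1 - Fraction(math.perm(days, k), days ** k)


def prob_exactly_boys(boys_chosen, boys=15, girls=30, selected=10):
    """Pr(exactly boys_chosen boys among the selected students)."""
    favourable = math.comb(boys, boys_chosen) * math.comb(girls, selected - boys_chosen)
    return Fraction(favourable, math.comb(boys + girls, selected))


print(prob_sum_of_two_dice(5))
print(round(float(prob_shared_birthday(50)), 3))
print(round(float(prob_exactly_boys(3)), 4))
```

Each function follows the same recipe as the text: count the outcomes in the event, count the outcomes in the sample space, and divide.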
Exercise 2 (i) If 12 balls are thrown at random into 20 boxes, what is the probability that no
box will receive more than one ball?
(ii) Suppose a fair coin is to be tossed 10 times. What is the probability of obtaining exactly
3 heads?
(iii) Suppose that a deck of 52 cards containing four aces is shuffled thoroughly and the
cards are then distributed among four players so that each of them receives 13 cards. What is
the probability that each player will get one ace?
(iv) Show that (a) $C_n^k = C_n^{n-k}$; (b) $C_n^k + C_n^{k-1} = C_{n+1}^{k}$; (c) $\sum_{i=0}^{n} C_n^i = 2^n$.
3 Conditional Probability
Suppose we run an experiment with a sample space Ω, and let Σ be the set of all appropriate
subsets of Ω. Suppose all events in Σ have been assigned probabilities.
• We want to know the revised probability of event A after learning that event B has
occurred. This is called the conditional probability of A given B and is denoted by
Pr(A|B).
• How do we calculate this conditional probability? Consider a simple example in which
Ω = {1, 2, 3} and each outcome is equally likely. Suppose now we know that event
B = {1, 2} has occurred. Then what is the probability that event A = {2, 3} has
occurred? Given B, A occurs only if the outcome is 2, which is one of the two equally
likely outcomes in B, so the conditional probability is just 1/2, or
$$\frac{\Pr(AB)}{\Pr(B)} = \frac{1/3}{2/3} = \frac{1}{2}.$$
• This rule works in general, i.e., if Pr(B) > 0,
$$\Pr(A|B) = \frac{\Pr(AB)}{\Pr(B)}.$$
That is, the conditional probability Pr(A|B) is just the proportion of the total probability Pr(B) that is represented by the probability Pr(AB). Clearly, if A and B are
disjoint, then Pr(A|B) = 0.
• Rearranging this expression yields
Pr(AB) = Pr(A|B) Pr(B)
= Pr(B|A) Pr(A),
and this can be used to compute the probability of an intersection of more sets:
Pr(ABC) = Pr(A|BC) Pr(BC)
= Pr(A|BC) Pr(B|C) Pr(C)
and in general, for sets $A_1, A_2, A_3, \ldots, A_n$:
$$\Pr(A_1 A_2 \cdots A_n) = \Pr(A_1) \Pr(A_2|A_1) \Pr(A_3|A_1 A_2) \cdots \Pr(A_n|A_1 A_2 \cdots A_{n-1}).$$
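The definition of conditional probability and the product rules above can be checked on the three-outcome example. This is a sketch under the assumption of equally likely outcomes; the helper names `pr` and `pr_given`, and the third event `C`, are introduced here only for illustration.

```python
from fractions import Fraction

# Sample space from the example; each outcome is assumed equally likely.
OMEGA = frozenset({1, 2, 3})


def pr(event):
    """Probability of an event (a subset of OMEGA) under equal likelihood."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def pr_given(a, b):
    """Conditional probability Pr(A|B) = Pr(AB) / Pr(B), defined only if Pr(B) > 0."""
    if pr(b) == 0:
        raise ValueError("the conditioning event has probability zero")
    return pr(a & b) / pr(b)


A = frozenset({2, 3})
B = frozenset({1, 2})
C = frozenset({2, 3})  # hypothetical third event, used only for the three-set rule

print(pr_given(A, B))

# Pr(AB) = Pr(A|B) Pr(B) = Pr(B|A) Pr(A)
print(pr(A & B), pr_given(A, B) * pr(B), pr_given(B, A) * pr(A))

# Pr(ABC) = Pr(A|BC) Pr(B|C) Pr(C)
print(pr(A & B & C), pr_given(A, B & C) * pr_given(B, C) * pr(C))
```

The explicit guard in `pr_given` mirrors the requirement that the conditioning event must have positive probability.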
Exercise 3 Suppose that a box contains one blue card and four red cards, which are labeled
a, b, c, and d. Suppose two cards are selected at random from these five cards without replacement.
(i) If it is known that card a has been selected, what is the probability that both cards are
red?
(ii) If it is known that at least one red card has been selected, what is the probability that
both cards are red?
4 Independence
• If learning that event B has occurred does not change our probability judgment of event
A, we say that A and B are independent.[5] Formally, A and B are independent if
Pr(A|B) = Pr(A),
or
Pr(AB) = Pr(A) Pr(B).
• The concept of independence in probability theory is related to but different from
the ordinary use of independence. In the ordinary sense, saying that two events are independent
usually means that these two events are physically unrelated. For example, for two
machines which operate independently in two factories, the event that one machine will
become inoperative should be independent of the event that the other machine will
become inoperative. Although two physically unrelated events should be independent
in probability theory, the converse is not true, as the following example shows. Suppose
that a balanced die is rolled. Let A be the event that an even number is obtained, and
let B be the event that one of the numbers 1, 2, 3, or 4 is obtained. It is easy to see that
Pr(A) = 1/2, Pr(B) = 2/3, and Pr(AB) = 1/3 = Pr(A) Pr(B). Thus, A and B are independent, but
clearly they are physically related.
• The n events $A_1, A_2, \ldots, A_n$ are independent if, for every subset $A_{i_1}, \ldots, A_{i_k}$ of k of
these events (k = 2, ..., n),
$$\Pr(A_{i_1} A_{i_2} \cdots A_{i_k}) = \Pr(A_{i_1}) \Pr(A_{i_2}) \cdots \Pr(A_{i_k}).$$
• The n events $A_1, A_2, \ldots, A_n$ are conditionally independent given B if, for every subset
$A_{i_1}, \ldots, A_{i_k}$ of k of these events (k = 2, ..., n),
$$\Pr(A_{i_1} A_{i_2} \cdots A_{i_k}|B) = \Pr(A_{i_1}|B) \Pr(A_{i_2}|B) \cdots \Pr(A_{i_k}|B).$$
[5] It does not mean that we cannot infer anything about A from knowing B has occurred.
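The product condition for independence can be tested directly on the die example. The sketch below assumes a balanced six-sided die; `are_independent` is a hypothetical helper that checks the product rule for every sub-collection of two or more events, following the definition for n events.

```python
from fractions import Fraction
from itertools import combinations

# Faces of a balanced die; each face is assumed equally likely.
OMEGA = frozenset(range(1, 7))


def pr(event):
    """Probability of an event (a subset of OMEGA) under equal likelihood."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def are_independent(events):
    """Return True if the product rule holds for every sub-collection of size >= 2."""
    for k in range(2, len(events) + 1):
        for subset in combinations(events, k):
            intersection = OMEGA
            product_of_probs = Fraction(1)
            for event in subset:
                intersection = intersection & event
                product_of_probs *= pr(event)
            if pr(intersection) != product_of_probs:
                return False
    return True


A = frozenset({2, 4, 6})     # an even number is obtained
B = frozenset({1, 2, 3, 4})  # one of the numbers 1, 2, 3, or 4 is obtained
D = frozenset({1, 2, 3})     # hypothetical event, added for contrast

print(pr(A), pr(B), pr(A & B))
print(are_independent([A, B]))
print(are_independent([A, D]))
```

Checking every sub-collection matters for three or more events, since pairwise independence alone does not imply the full definition.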
Exercise 4 (i) Show that, if A and B are independent, then A and $B^c$ are also independent.
(ii) Two students A and B are both registered for a certain course. Assume that student
A attends class 80 percent of the time, and student B attends class 60 percent of the time,
and the absences of the two students are independent. (a) What is the probability that at least
one of the two students will be in class on a given day? (b) If at least one of the two students
is in class on a given day, what is the probability that A is in class that day?
5 Bayes’ Theorem
• Consider an example first. A medical test can indicate whether a person has a certain
disease in the following way: if a person has this disease, then with probability 0.9 the
test result will be positive; if a person does not have this disease, then with probability
0.1 the test result will be positive. Suppose the data suggest that the chance of having
this disease is only 0.0001 in the population. Now if the test result for a randomly
selected person is positive, then what can we say about the probability that this person
has this disease?
We can first calculate the probability of the event that a randomly selected person will
get a positive test result (we call it A). Given that this person does have this disease (of
which the probability is 0.0001), A will happen with probability 0.9. Given that this
person does not have this disease (of which the probability is 0.9999), A will happen
with probability 0.1. Thus,
Pr(A) = 0.0001 × 0.9 + 0.9999 × 0.1
= 0.10008.
Of this probability, only 0.0001 × 0.9 = 0.00009 is contributed by the event that
this person does have this disease (we call it $B_1$), and most of it, 0.9999 × 0.1 = 0.09999,
is contributed by the event that this person does not have this disease (we call it $B_0$).
Therefore, we should infer that the probability of $B_1$ conditional on A is
$$\Pr(B_1|A) = \frac{0.00009}{0.10008} \approx 0.000899,$$
or
$$\Pr(B_1|A) = \frac{\Pr(B_1 A)}{\Pr(A)} = \frac{\Pr(A|B_1) \Pr(B_1)}{\Pr(A|B_1) \Pr(B_1) + \Pr(A|B_0) \Pr(B_0)}.$$
This is the basic idea of Bayes’ theorem. We usually call $\Pr(B_1)$ the prior belief and
$\Pr(B_1|A)$ the posterior belief. Although the posterior is higher than the prior due to the
positive test result, it is still quite low.[6]
• In general, Bayes’ theorem can be stated as follows. Let $B_1, B_2, \ldots, B_n$ be a partition
of the sample space Ω and A be an event.[7] Since
$$\begin{aligned}
\Pr(A) &= \Pr(AB_1) + \Pr(AB_2) + \cdots + \Pr(AB_n) \\
&= \Pr(A|B_1) \Pr(B_1) + \Pr(A|B_2) \Pr(B_2) + \cdots + \Pr(A|B_n) \Pr(B_n),
\end{aligned}$$
we have
$$\Pr(B_1|A) = \frac{\Pr(B_1 A)}{\Pr(A)} = \frac{\Pr(A|B_1) \Pr(B_1)}{\sum_{i=1}^{n} \Pr(A|B_i) \Pr(B_i)}$$
if Pr(A) > 0. This formula tells us how to calculate the posterior of $B_1$ after learning
A has occurred, when the priors $\Pr(B_i)$ and the conditional probabilities $\Pr(A|B_i)$ are
known.
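The general formula can be sketched as a small function that takes the priors $\Pr(B_i)$ and the conditional probabilities $\Pr(A|B_i)$ and returns all posteriors $\Pr(B_i|A)$. The function name `posterior` and its argument layout are assumptions made for this illustration; the numbers are those of the medical test example.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return [Pr(B_i | A)] given priors [Pr(B_i)] and likelihoods [Pr(A | B_i)].

    The B_i are assumed to form a partition, so the priors must sum to 1.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if sum(priors) != 1:
        raise ValueError("priors over a partition must sum to 1")
    joint = [p * l for p, l in zip(priors, likelihoods)]  # Pr(A B_i)
    total = sum(joint)                                    # Pr(A)
    if total == 0:
        raise ValueError("Pr(A) must be positive")
    return [j / total for j in joint]


# Medical test example: B1 = has the disease, B0 = does not have the disease.
priors = [Fraction(1, 10000), Fraction(9999, 10000)]
likelihoods = [Fraction(9, 10), Fraction(1, 10)]  # Pr(positive | B1), Pr(positive | B0)

post = posterior(priors, likelihoods)
print(float(post[0]))  # posterior probability of having the disease
print(float(post[1]))  # posterior probability of not having the disease
```

Exact fractions are used so that the check that the priors sum to 1 is not disturbed by floating-point rounding.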
Exercise 5 (i) Suppose a ball is drawn from an urn. With probability 1/3, this urn is of type
I which contains k red balls and n − k blue balls; with probability 2/3, this urn is of type II
which contains k blue balls and n − k red balls. Now if the ball observed is a red one, then
what is the probability that this urn is of type I?
(ii) In a certain city, 30 percent of the people are Conservatives, 50 percent are Liberals,
and 20 percent are Independents. Records show that in a particular election, 65 percent of
the Conservatives voted, 82 percent of the Liberals voted, and 50 percent of the Independents
voted. If a person in the city is selected at random and it is learned that she did not vote in
the last election, what is the probability that she is a Liberal?
[6] If a person ignores the fact that the base rate of having this disease is quite low, he may conclude, after obtaining
a positive test result, that he has this disease with probability 0.9.
[7] A partition means that the $B_i$ are pairwise disjoint and their union is Ω.