Probability
COMP 245 STATISTICS
Dr N A Heard
Contents
1 Sample Spaces and Events
  1.1 Sample Spaces
  1.2 Events
  1.3 Combinations of Events
2 Probability
  2.1 Axioms of Probability
  2.2 Simple Probability Results
  2.3 Independence
  2.4 Interpretations of Probability
3 Examples
  3.1 De Méré's problem
  3.2 Joint events
4 Conditional Probability
  4.1 Definition
  4.2 Examples
  4.3 Conditional Independence
  4.4 Bayes Theorem and the Partition Rule
  4.5 More Examples

1 Sample Spaces and Events

1.1 Sample Spaces
We consider a random experiment whose range of possible outcomes can be described by a set
S, called the sample space.
We use S as our universal set (Ω).
Ex.
• Coin tossing: S = {H, T}.
• Die rolling: S = {1, 2, 3, 4, 5, 6}.
• 2 coins: S = {(H, H), (H, T), (T, H), (T, T)}.
1.2 Events
An event E is any subset of the sample space, E ⊆ S; it is a collection of some of the possible
outcomes.
Ex.
• Coin tossing: E = {H}, E = {T}.
• Die rolling: E = {6} (say), E = {Even numbered face} = {2, 4, 6}.
• 2 coins: E = {Head on the first toss} = {(H, H), (H, T)}.
Extreme possible events are ∅ (the null event) or S.
The singleton subsets of S (those subsets which contain exactly one element from S) are
known as the elementary events of S.
Suppose we now perform this random experiment; the outcome will be a single element s∗ ∈ S. Then for any event E ⊆ S, we will say E has occurred if and only if s∗ ∈ E.
If E has not occurred, it must be that s∗ ∉ E ⇐⇒ s∗ ∈ Ē, so Ē has occurred; so Ē can be read as the event not E.
First notice that the smallest event which will have occurred will be the singleton {s∗ }. For
any other event E, E will occur if and only if {s∗ } ⊂ E. Thus we can immediately draw two
conclusions before the experiment has even been performed.
Remark 1.1. For any sample space S, the following statements will always be true:
1. the null event ∅ will never occur;
2. the universal event S will always occur.
Hence it is only for events E strictly in between these extremes, ∅ ⊂ E ⊂ S, that we have uncertainty about whether E will occur. It is precisely for quantifying this uncertainty over these events that we require the notion of probability.
1.3 Combinations of Events
Set operators on events
Consider a set of events { E1 , E2 , . . .}.
• The event ⋃i Ei = {s ∈ S | ∃i s.t. s ∈ Ei} will occur if and only if at least one of the events {Ei} occurs. So E1 ∪ E2 can be read as the event E1 or E2.
• The event ⋂i Ei = {s ∈ S | ∀i, s ∈ Ei} will occur if and only if all of the events {Ei} occur. So E1 ∩ E2 can be read as the event E1 and E2.
• The events are said to be mutually exclusive if ∀i ≠ j, Ei ∩ Ej = ∅ (i.e. they are pairwise disjoint).
At most one of the events can occur.
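To make these set operations concrete, here is a minimal Python sketch (ours, not from the notes) that represents the two-coin sample space and some events as plain sets; the event names are illustrative.

```python
# Minimal sketch: events as Python sets over the two-coin sample space.
S = {("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")}

head_first = {s for s in S if s[0] == "H"}   # E1: head on the first toss
head_second = {s for s in S if s[1] == "H"}  # E2: head on the second toss

union = head_first | head_second         # E1 or E2: at least one head
intersection = head_first & head_second  # E1 and E2: both tosses are heads
complement = S - head_first              # not E1: tail on the first toss

# Mutually exclusive events have empty intersection.
print(head_first & complement == set())  # True
```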
2 Probability

2.1 Axioms of Probability
σ-algebras of events
So for our random experiment with sample space S of possible outcomes, for which events/subsets E ⊆ S would we like to define the probability of E occurring?
Every subset? If S is finite or even countable, then this is fine. But for uncountably infinite sample spaces it can be shown that we can very easily start off defining sensible, proper probabilities for an initial collection of subsets of S in a way that leaves it impossible to then carry on and consistently define probability for all the remaining subsets of S.
For this reason, when defining a probability measure on S we (usually implicitly) simultaneously agree on a collection of subsets of S that we wish to measure with probability. Generically, we will refer to this set of subsets as 𝒮.
There are three properties we will require of 𝒮, the reasons for which will become immediately apparent when we meet the axioms of probability.
We need 𝒮 to be
1. nonempty, with S ∈ 𝒮;
2. closed under complements: E ∈ 𝒮 =⇒ Ē ∈ 𝒮;
3. closed under countable unions: E1, E2, . . . ∈ 𝒮 =⇒ ⋃i Ei ∈ 𝒮.
Such a collection of sets is known as a σ-algebra.
Axioms of Probability
A probability measure on the pair (S, 𝒮) is a mapping P : 𝒮 → [0, 1] satisfying the following three axioms for all subsets of S on which it is defined (𝒮, the measurable subsets of S):
1. ∀E ∈ 𝒮, 0 ≤ P(E) ≤ 1;
2. P(S) = 1;
3. Countably additive: for disjoint subsets E1, E2, . . . ∈ 𝒮,
   P(⋃i Ei) = ∑i P(Ei).

2.2 Simple Probability Results
Exercises:
From Axioms 1–3 it is easy to derive the following:
1. P(Ē) = 1 − P(E);
2. P(∅) = 0;
3. For any events E and F,
   P(E ∪ F) = P(E) + P(F) − P(E ∩ F).
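As a quick numerical check (ours, not part of the original notes), the sketch below verifies these three results for a uniform measure on a die roll, where each of the six outcomes has probability 1/6.

```python
from fractions import Fraction

# Sketch: check the three derived results for a uniform measure on a die roll.
S = frozenset(range(1, 7))

def prob(event):
    # classical probability: proportion of outcomes lying in the event
    return Fraction(len(event), len(S))

E = frozenset({2, 4, 6})   # even face
F = frozenset({4, 5, 6})   # face of at least 4

assert prob(S - E) == 1 - prob(E)                      # complement rule
assert prob(frozenset()) == 0                          # null event
assert prob(E | F) == prob(E) + prob(F) - prob(E & F)  # inclusion-exclusion
print("all three results hold for this example")
```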
2.3 Independence
Two events E and F are said to be independent if and only if P( E ∩ F ) = P( E)P( F ). This is
sometimes written E ⊥ F.
More generally, a set of events {E1, E2, . . .} are said to be independent if for any finite subset {E1′, E2′, . . . , En′},
P(E1′ ∩ E2′ ∩ · · · ∩ En′) = P(E1′)P(E2′) · · · P(En′).
If events E and F are independent, then Ē and F are also independent.
Proof:
Since F = (E ∩ F) ∪ (Ē ∩ F) is a disjoint union, P(F) = P(E ∩ F) + P(Ē ∩ F) by Axiom 3. So
P(Ē ∩ F) = P(F) − P(E ∩ F) = P(F) − P(E)P(F) = (1 − P(E))P(F) = P(Ē)P(F).

2.4 Interpretations of Probability
Classical
If S is finite and the elementary events are considered "equally likely", then the probability of an event E is the proportion of all outcomes in S which lie inside E,
P(E) = |E| / |S|.
Ex.
• Rolling a die: Elementary events are {1}, {2}, . . . , {6}.
  – P({1}) = P({2}) = . . . = P({6}) = 1/6.
  – P(Odd number) = P({1, 3, 5}) = 3/6 = 1/2.
• Randomly drawn playing card: 52 elementary events
  {♠2}, {♠3}, . . . , {♠A}, {♥2}, {♥3}, . . . , . . . , {♣K}, {♣A}.
  – P(♠) = P(♥) = P(♦) = P(♣) = 1/4.
  – The joint event {Suit is red and value is 3} contains two of the 52 elementary events, so P({red 3}) = 2/52 = 1/26. Since suit and face value should be independent, check that P({red 3}) = P({♥, ♦}) × P({any 3}).
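A small counting sketch (ours, not from the notes) reproduces these card probabilities and the independence check by enumerating the 52 elementary events:

```python
from fractions import Fraction
from itertools import product

# Sketch: classical probability by counting over a 52-card deck.
suits = ["spades", "hearts", "diamonds", "clubs"]
values = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
deck = set(product(suits, values))

def prob(event):
    return Fraction(len(event), len(deck))

red_three = {c for c in deck if c[0] in ("hearts", "diamonds") and c[1] == "3"}
red = {c for c in deck if c[0] in ("hearts", "diamonds")}
threes = {c for c in deck if c[1] == "3"}

print(prob(red_three))                              # 1/26
print(prob(red_three) == prob(red) * prob(threes))  # True: independence check
```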
The “equally likely” (uniform) idea can be extended to infinite spaces, by apportioning
probability to sets not by their cardinality but by other standard measures, like volume or mass.
Ex.
• If a meteorite were to strike Earth, the probability that it will strike land rather than sea would be given by
  (Total area of land) / (Total area of Earth).
Frequentist
Observation shows that if one takes repeated observations in "identical" random situations, in which event E may or may not occur, then the proportion of times in which E occurs tends to some limiting value, called the probability of E.
Ex.
• Proportion of heads in tosses of a coin: H, H, T, H, T, T, H, T, T, . . . → 1/2.
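A simulation sketch (our illustration, using Python's random module) of this frequentist idea: the running proportion of heads settles near 1/2 as the number of tosses grows.

```python
import random

# Sketch: frequentist interpretation via simulated fair coin tosses.
random.seed(0)
heads = 0
for n in range(1, 100_001):
    heads += random.random() < 0.5   # one fair coin toss
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"after {n:>6} tosses, proportion of heads = {heads / n:.4f}")
```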
Subjective
Probability is a degree of belief held by an individual.
For example, De Finetti (1937/1964) suggested the following. Suppose a random experiment is to be performed, where an event E ⊆ S may or may not happen. Now suppose an individual is entered into a game regarding this experiment where he has two choices, each leading to monetary (or utility) consequences:
1. Gamble: If E occurs, he wins £1; if Ē occurs, he wins £0;
2. Stick: Regardless of the outcome of the experiment, he receives £P(E) for some real number P(E).
The critical value of P( E) for which the individual is indifferent between options 1 and 2 is
defined to be the individual’s probability for the event E occurring.
This procedure can be repeated for all possible events E in S.
Suppose after this process of elicitation of the individual's preferences under the different events, we can simultaneously arrange an arbitrary number of monetary bets with the individual based on the outcome of the experiment.
If it is possible to choose these bets in such a way that the individual is certain to lose money (this is called a "Dutch Book"), then the individual's degrees of belief are said to be incoherent.
To be coherent, it is easily seen, for example, that we must have 0 ≤ P(E) ≤ 1 for all events E, that E ⊆ F =⇒ P(E) ≤ P(F), etc.
3 Examples

3.1 De Méré's problem
Antoine Gombaud, chevalier de Méré (1607-1684) posed to Pascal the following gambling problem: which of these two events is more likely?
1. E = {4 rolls of a die yield at least one six}.
2. F = {24 rolls of two dice yield at least one double six}.
De Méré observed that E seemed to lead to a profitable even money bet whereas F did not.
We calculate P( E) and P( F ).
1. Each roll of the die is independent from the previous rolls, and so there are 6⁴ equally likely outcomes. Of these, 5⁴ show no sixes.
   So the probability of no six showing is 5⁴/6⁴ ≈ 0.4823.
   So P(E), the probability of at least one six showing, is ≈ 1 − 0.4823 = 0.5177.
2. There are 36²⁴ equally likely outcomes here. Of these, 35²⁴ don't show a double six.
   So the probability of no double six is 35²⁴/36²⁴ ≈ 0.5086.
   So P(F), the probability of at least one double six, is ≈ 1 − 0.5086 = 0.4914.
Hence P(E) ≈ 0.5177 > 1/2 > 0.4914 ≈ P(F).
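The exact calculation is easy to reproduce; this short sketch (ours) computes both probabilities directly and could be extended to a simulation.

```python
# Sketch: De Méré's problem computed exactly.
p_E = 1 - (5 / 6) ** 4       # at least one six in 4 rolls of one die
p_F = 1 - (35 / 36) ** 24    # at least one double six in 24 rolls of two dice
print(f"P(E) = {p_E:.4f}")   # ~0.5177, a favourable even-money bet
print(f"P(F) = {p_F:.4f}")   # ~0.4914, an unfavourable even-money bet
```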
3.2 Joint events
Coin and Die
Consider tossing a coin and rolling a die.
We would consider each of the 12 possible combinations of Head/Tail and die value as
equally likely.
So we can construct a probability table:
       1      2      3      4      5      6
 H    1/12   1/12   1/12   1/12   1/12   1/12   1/2
 T    1/12   1/12   1/12   1/12   1/12   1/12   1/2
      1/6    1/6    1/6    1/6    1/6    1/6
From this table we can calculate the probability of any event we might be interested in,
simply by adding up the probabilities of all the elementary events it contains.
For example, the event of getting a head on the coin
{H} = {(H, 1), (H, 2), . . . , (H, 6)}
has probability
P({H}) = P({(H, 1)}) + P({(H, 2)}) + . . . + P({(H, 6)})
       = 1/12 + 1/12 + . . . + 1/12
       = 1/2.
Notice the two experiments satisfy our probability definition of independence, since for example
P({(H, 1)}) = 1/12 = 1/2 × 1/6 = P({H}) × P({1}).
Coin and Two Dice
A crooked die called a top has the same faces on opposite sides.
Suppose we have two dice, one normal and one which is a top with opposite faces numbered 1, 3, or 5.
Now suppose we first flip the coin. If it comes up heads, we roll the normal die; if tails, we roll the top.
To calculate the probability table easily, we notice that this is equivalent to the previous game using one normal die, except that after tails each even face is relabelled with the odd value on the opposite side of the die: 6 → 1, 2 → 5, 4 → 3. So we can just merge those probabilities in the tails row.
       1      2      3      4      5      6
 H    1/12   1/12   1/12   1/12   1/12   1/12   1/2
 T    1/6     0     1/6     0     1/6     0     1/2
      1/4    1/12   1/4    1/12   1/4    1/12
The probabilities of the different outcomes of the dice change according to the outcome of
the coin toss. And note, for example,
P({(H, 2)}) = 1/12 ≠ 1/24 = 1/2 × 1/12 = P({H}) × P({2}).
So the two experiments are now dependent.
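To illustrate, here is a sketch (our own, with made-up helper names) that builds both joint probability tables as dictionaries and checks the product rule P({(c, d)}) = P({c}) × P({d}) in each case.

```python
from fractions import Fraction as Fr

# Sketch: joint distributions for (coin, die value) under the two games.
fair = {("H", d): Fr(1, 12) for d in range(1, 7)}
fair.update({("T", d): Fr(1, 12) for d in range(1, 7)})

top = {("H", d): Fr(1, 12) for d in range(1, 7)}
top.update({("T", d): (Fr(1, 6) if d % 2 == 1 else Fr(0)) for d in range(1, 7)})

def independent(joint):
    # marginal probabilities computed from the joint table
    p_coin = {c: sum(p for (cc, _), p in joint.items() if cc == c) for c in "HT"}
    p_die = {d: sum(p for (_, dd), p in joint.items() if dd == d) for d in range(1, 7)}
    return all(joint[(c, d)] == p_coin[c] * p_die[d] for (c, d) in joint)

print(independent(fair))  # True: coin and fair die are independent
print(independent(top))   # False: coin toss and die value are dependent
```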
4 Conditional Probability

4.1 Definition
For two events E and F in S where P(F) ≠ 0, we define the conditional probability of E occurring given that we know F has occurred as
P(E|F) = P(E ∩ F) / P(F).
Note that if E and F are independent, then
P(E|F) = P(E ∩ F) / P(F) = P(E)P(F) / P(F) = P(E).

4.2 Examples
Example 1 - Rolling a 3
We roll a normal die once.
1. What is the probability of E = {the die shows a 3}?
2. What is the probability of E = {the die shows a 3} given we know F = {the die shows an odd number}?
Solution:
1. P(E) = (Number of ways a 3 can come up) / (Total number of possible outcomes) = 1/6.
2. Now the set of possible outcomes is just F = {1, 3, 5}.
   So P(E|F) = (Number of ways a 3 can come up) / (Total number of possible outcomes) = 1/3.
   Note P(F) = 1/2 and E ∩ F = E, and hence we have P(E|F) = P(E ∩ F)/P(F).
Example 2 - Rolling two dice
Suppose we roll two normal dice, one from each hand.
Then the sample space comprises all of the ordered pairs of dice values
S = {(1, 1), (1, 2), . . . , (6, 6)}.
Let E be the event that the die thrown from the left hand will show a larger value than the
die thrown from the right hand.
P(E) = (# outcomes with left value > right) / (total # outcomes) = 15/36.
Suppose we are now informed that an event F has occurred, where
F = {the value of the left hand die is 5}.
How does this change the probability of E occurring?
Well since F has occurred, the only sample space elements which could have possibly occurred are exactly those elements in F = {(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6)}.
Similarly the only sample space elements in E that could have occurred now must be in
E ∩ F = {(5, 1), (5, 2), (5, 3), (5, 4)}.
So our revised probability is
P(E ∩ F)/P(F) = (# outcomes with left value > right) / (total # outcomes of the form (5, ·)) = 4/6 ≡ P(E|F).
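This conditioning argument is easy to check by brute-force enumeration; the sketch below (ours) counts outcomes over the 36-element sample space.

```python
from fractions import Fraction
from itertools import product

# Sketch: conditional probability by enumeration for two dice.
S = list(product(range(1, 7), repeat=2))   # (left, right) value pairs
E = [(l, r) for l, r in S if l > r]        # left die shows more than right
F = [(l, r) for l, r in S if l == 5]       # left die shows a 5

print(Fraction(len(E), len(S)))            # P(E) = 15/36 = 5/12
E_and_F = [s for s in E if s in F]
print(Fraction(len(E_and_F), len(F)))      # P(E|F) = 4/6 = 2/3
```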
Discussion of Examples
In both examples, we considered the probability of an event E, and then reconsidered what
this probability would be if we were given the knowledge that F had occurred. What happened?
Answer: The sample space S was replaced by F, and the event E was replaced by E ∩ F. So
originally, we had
P(E) = P(E|S) = P(E ∩ S) / P(S)   (since E ∩ S = E, and P(S) = 1 by Axiom 2).
So we can think of probability conditioning as a shrinking of the sample space, with events
replaced by their intersections with the reduced space and a consequent rescaling of probabilities.
4.3 Conditional Independence
Earlier we met the concept of independence of events according to a probability measure P.
We can now extend that idea to conditional probabilities since P(·| F ) is itself a perfectly good
probability measure obeying the axioms of probability.
For three events E1 , E2 and F, the event pair E1 and E2 are said to be conditionally independent given F if and only if P( E1 ∩ E2 | F ) = P( E1 | F )P( E2 | F ). This is sometimes written
E1 ⊥ E2 | F.
4.4 Bayes Theorem and the Partition Rule
Bayes Theorem
For two events E and F in S, we have
P(E ∩ F) = P(F)P(E|F);     (1)
but also, since E ∩ F ≡ F ∩ E,
P(E ∩ F) = P(E)P(F|E).     (2)
Equating the RHS of (1) and (2), provided P(F) ≠ 0 we can rearrange to obtain
P(E|F) = P(E)P(F|E) / P(F).
Partition Rule
Consider a set of events {F1, F2, . . .} which form a partition of S (that is, the Fi are mutually exclusive and ⋃i Fi = S).
Then for any event E ⊆ S,
P(E) = ∑i P(E|Fi)P(Fi).
Proof:
E = E ∩ S = E ∩ (⋃i Fi) = ⋃i (E ∩ Fi).
So
P(E) = P(⋃i (E ∩ Fi)),
which, by countable additivity (Axiom 3) and noting that since the {F1, F2, . . .} are disjoint so are {E ∩ F1, E ∩ F2, . . .}, implies
P(E) = ∑i P(E ∩ Fi).     (3)
(3) is known as the law of total probability, and it can be rewritten
P(E) = ∑i P(E|Fi)P(Fi).
For any events E and F in S, note that {F, F̄}, say, form a partition of S. So by the law of total probability we have
P(E) = P(E ∩ F) + P(E ∩ F̄) = P(E|F)P(F) + P(E|F̄)P(F̄).
Terminology
When considering multiple events, say E and F, we often refer to
• probabilities of the form P( E| F ) as conditional probabilities;
• probabilities of the form P( E ∩ F ) as joint probabilities;
• probabilities of the form P( E) as marginal probabilities.
4.5 More Examples
Example 1 - Defective Chips
Ex.
A box contains 5000 VLSI chips, 1000 from company X and 4000 from Y. 10% of the chips
made by X are defective and 5% of those made by Y are defective. If a randomly chosen chip
is found to be defective, find the probability that it came from X.
Let E = “chip was made by X”;
let F = “chip is defective”.
First of all, which probabilities have we been given?
A box contains 5000 VLSI chips, 1000 from company X and 4000 from Y.
=⇒ P(E) = 1000/5000 = 0.2,   P(Ē) = 4000/5000 = 0.8.
10% of the chips made by X are defective and 5% of those made by Y are defective.
=⇒ P(F|E) = 10% = 0.1,   P(F|Ē) = 5% = 0.05.
We have enough information to construct the probability table

        F      F̄
 E     0.02   0.18   0.2
 Ē     0.04   0.76   0.8
       0.06   0.94

The law of total probability has enabled us to extract the marginal probabilities P(F) and P(F̄) as 0.06 and 0.94 respectively.
So by Bayes Theorem we can calculate the conditional probabilities. In particular, we want
P(E|F) = P(E ∩ F)/P(F) = 0.02/0.06 = 1/3.
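A short sketch (ours) of the same calculation, building the required quantities from the given marginal and conditional probabilities:

```python
# Sketch: defective-chip example via the law of total probability and Bayes.
p_X = 0.2                   # P(E): chip made by company X
p_def_given_X = 0.10        # P(F|E)
p_def_given_Y = 0.05        # P(F|not E)

p_def = p_def_given_X * p_X + p_def_given_Y * (1 - p_X)  # total probability
p_X_given_def = p_def_given_X * p_X / p_def               # Bayes Theorem

print(f"P(F)   = {p_def:.2f}")           # 0.06
print(f"P(E|F) = {p_X_given_def:.3f}")   # 0.333...
```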
Example 2 - Kidney stones
Kidney stones are small (< 2 cm diam) or large (> 2 cm diam). Treatment can succeed or fail. The following data were collected from a sample of 700 patients with kidney stones.
              Large (L)   Small (L̄)   Total
Success (S)      247         315        562
Failure (S̄)       96          42        138
Total            343         357        700
For a patient randomly drawn from this sample, what is the probability that the outcome
of treatment was successful, given the kidney stones were large?
Clearly we can get the answer directly from the table by ignoring the small stone patients
P(S|L) = 247/343,
or we can go the long way round:
P(L) = 343/700,   P(S ∩ L) = 247/700,
P(S|L) = P(S ∩ L)/P(L) = (247/700)/(343/700) = 247/343.
Example 3 - Multiple Choice Question
A multiple choice question has c available choices. Let p be the probability that the student
knows the right answer, and 1 − p that he does not. When he doesn’t know, he chooses an
answer at random. Given that the answer the student chooses is correct, what is the probability
that the student knew the correct answer?
Let A be the event that the question is answered correctly;
let K be the event that the student knew the correct answer.
Then we require P(K | A).
By Bayes Theorem
P(K|A) = P(A|K)P(K) / P(A)
and we know P( A|K ) = 1 and P(K ) = p, so it remains to find P( A).
By the partition rule,
P(A) = P(A|K)P(K) + P(A|K̄)P(K̄)
and since P(A|K̄) = 1/c, this gives
P(A) = 1 × p + (1/c) × (1 − p).
Hence
P(K|A) = p / (p + (1 − p)/c) = cp / (cp + 1 − p).
Note: the larger c is, the greater the probability that the student knew the answer, given
that they answered correctly.
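For example, the sketch below (ours) evaluates P(K|A) for a few values of c with p = 0.5, illustrating how the posterior grows with the number of choices.

```python
# Sketch: P(K|A) = cp / (cp + 1 - p) for the multiple-choice example.
def p_knew_given_correct(c: int, p: float) -> float:
    return c * p / (c * p + 1 - p)

for c in (2, 4, 10):
    print(c, round(p_knew_given_correct(c, p=0.5), 3))
# 2 0.667, 4 0.8, 10 0.909: more choices make a correct answer stronger evidence
```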
Example 4 - Super Computer Jobs
Measurements at the North Carolina Super Computing Center (NCSC) on a certain day showed that 15% of the jobs came from Duke, 35% from UNC, and 50% from NC State University. Suppose that the probabilities that a job from each of these universities is a multitasking job are 0.01, 0.05, and 0.02 respectively.
1. Find the probability that a job chosen at random is a multitasking job.
2. Find the probability that a randomly chosen job comes from UNC, given that it is a multitasking job.
Solution:
Let Ui = “job is from university i”, i = 1, 2, 3 for Duke, UNC, NC State respectively;
let M = “job uses multitasking”.
1.
P( M ) = P( M |U1 )P(U1 ) + P( M |U2 )P(U2 ) + P( M|U3 )P(U3 )
= 0.01 × 0.15 + 0.05 × 0.35 + 0.02 × 0.5 = 0.029.
2.
P(U2|M) = P(M|U2)P(U2) / P(M) = (0.05 × 0.35) / 0.029 ≈ 0.603.
Example 5 - HIV Test
A new HIV test is claimed to correctly identify 95% of people who are really HIV positive
and 98% of people who are really HIV negative. Is this acceptable?
If only 1 in a 1000 of the population are HIV positive, what is the probability that someone
who tests positive actually has HIV?
Solution:
Let H = “has the HIV virus”;
let T = “test is positive”.
We have been given P(T|H) = 0.95, P(T̄|H̄) = 0.98 and P(H) = 0.001.
We wish to find P( H | T ).
P(H|T) = P(T|H)P(H) / (P(T|H)P(H) + P(T|H̄)P(H̄))
       = (0.95 × 0.001) / (0.95 × 0.001 + 0.02 × 0.999)
       ≈ 0.045.
That is, less than 5% of those who test positive really have HIV.
Example 5 - continued
If the HIV test shows a positive result, the individual might wish to retake the test. Suppose
that the results of a person retaking the HIV test are conditionally independent given HIV
status (clearly two results of the test would certainly not be unconditionally independent). If
the test again gives a positive result, what is the probability that the person actually has HIV?
Solution:
Let Ti = “ith test is positive”.
P(H|T1 ∩ T2) = P(T1 ∩ T2|H)P(H) / P(T1 ∩ T2)
             = P(T1 ∩ T2|H)P(H) / (P(T1 ∩ T2|H)P(H) + P(T1 ∩ T2|H̄)P(H̄))
             = P(T1|H)P(T2|H)P(H) / (P(T1|H)P(T2|H)P(H) + P(T1|H̄)P(T2|H̄)P(H̄))
by conditional independence.
Since P(Ti|H) = 0.95 and P(Ti|H̄) = 0.02,
P(H|T1 ∩ T2) = (0.95 × 0.95 × 0.001) / (0.95 × 0.95 × 0.001 + 0.02 × 0.02 × 0.999) ≈ 0.693.
So almost a 70% chance after taking the test twice and both times showing as positive. For
three times, this goes up to 99%.
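The pattern generalises to k conditionally independent positive tests; this sketch (ours) reproduces the figures for one, two and three positives.

```python
# Sketch: posterior P(H | k positive tests), assuming conditional independence.
def posterior(k: int, sens=0.95, false_pos=0.02, prior=0.001) -> float:
    num = sens ** k * prior
    return num / (num + false_pos ** k * (1 - prior))

for k in (1, 2, 3):
    print(k, round(posterior(k), 3))
# 1 0.045, 2 0.693, 3 0.991
```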