Chapter 2. SAMPLE SPACES
WITH NO STRUCTURE
In many practical examples, the sample space
has some structure: there are relationships
between the outcomes. In this chapter,
probability theory is developed without
making such assumptions.
2.1 Deductions from Axioms
REMINDER
A1 For any event A, 0 ≤ Pr(A) ≤ 1.
A2 For the event S, Pr(S) = 1.
A3 For any two events A and B satisfying
A ∩ B = ∅,
Pr(A ∪ B) = Pr(A) + Pr(B).
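2.1.1 THEOREM: $\Pr(\bar{A}) = 1 - \Pr(A)$.
[Figure: Venn diagram of the event A inside the sample space S.]
PROOF: Since $\bar{A}$ is the complement of A,
$A \cup \bar{A} = S$ and $A \cap \bar{A} = \emptyset$.
Hence, using A3,
$\Pr(A \cup \bar{A}) = \Pr(A) + \Pr(\bar{A})$,
i.e.
$\Pr(\bar{A}) = \Pr(A \cup \bar{A}) - \Pr(A)$.
But $A \cup \bar{A} = S$ and, by A2, Pr(S) = 1, and
hence
$\Pr(\bar{A}) = 1 - \Pr(A)$.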
2.1.2 SPECIAL CASE: A = S
This gives
$\Pr(\bar{S}) = 1 - \Pr(S) = 0$, from A2.
That is,
Pr(∅) = 0.
Note: This result can be thought of as
obvious. The important thing is that we
don’t have to assume that it is true: we can
deduce it from the three axioms.
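2.1.3 THEOREM: If A ⊃ B then
Pr(A) ≥ Pr(B).
[Figure: Venn diagram of the event B contained in the event A, inside S.]
PROOF: If A ⊃ B, then
$A = B \cup (A \cap \bar{B})$
and
$B \cap (A \cap \bar{B}) = \emptyset$.
Hence, by A3, $\Pr(A) = \Pr(B) + \Pr(A \cap \bar{B})$.
By A1, $\Pr(A \cap \bar{B}) \ge 0$, so
Pr(A) ≥ Pr(B).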
Note: Intuitively, this result is also obviously
true. It is included here to show how, like the
result in §2.1.2, it can be derived from the
axioms.
2.1.4 THEOREM: For any two events A and B,
Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).
[Figure: Venn diagram of two overlapping events A and B inside S.]
PROOF:
$A \cup B = A \cup (\bar{A} \cap B)$
and
$B = (A \cap B) \cup (\bar{A} \cap B)$.
In both cases the RHS is a union of mutually exclusive
events.
[Use A3 and substitute for $\Pr(\bar{A} \cap B)$.]
It is often useful to write an expression as a union of
mutually exclusive events.
Fuller details: Meyer p.14, Arthurs, p.14, Feller, p.23,
Clarke and Cooke, p.134.
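The bracketed hint in the proof of 2.1.4 can be written out in full. The following is one way to complete the argument, a sketch that uses only axiom A3 and the two decompositions into mutually exclusive events:

```latex
\begin{align*}
\Pr(A \cup B) &= \Pr(A) + \Pr(\bar{A} \cap B)
  && \text{A3 on } A \cup B = A \cup (\bar{A} \cap B) \\
\Pr(B) &= \Pr(A \cap B) + \Pr(\bar{A} \cap B)
  && \text{A3 on } B = (A \cap B) \cup (\bar{A} \cap B) \\
\Pr(\bar{A} \cap B) &= \Pr(B) - \Pr(A \cap B)
  && \text{rearranging the second line} \\
\Pr(A \cup B) &= \Pr(A) + \Pr(B) - \Pr(A \cap B)
  && \text{substituting into the first line}
\end{align*}
```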
2.1.5 Extension of result 2.1.4 to 3 events.
[Figure: Venn diagram of three overlapping events A, B and C inside S.]
Pr(A ∪ B ∪ C) = Pr(A) + Pr(B) + Pr(C)
− Pr(A ∩ B) − Pr(A ∩ C)
− Pr(B ∩ C)
+ Pr(A ∩ B ∩ C)
PROOF: Write A ∪ B ∪ C as A ∪ (B ∪ C) and
apply result 2.1.4 twice.
2.1.6 Further extension to n events.
$\Pr\left(\bigcup_{i=1}^{n} A_i\right)$ = sum of individual probabilities
 − probabilities of all pairs
 + probabilities of all triples
 − ···
 − $(-1)^n \Pr\left(\bigcap_{i=1}^{n} A_i\right)$.
[PROOF: by induction]
COROLLARY: For mutually exclusive
events
Pr(union) = sum of individual probabilities.
Addition Law of Probability
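The n-event formula can be checked numerically on a small sample space. The sketch below assumes one throw of a fair die with equally likely outcomes; the events A, B and C and the helper names `pr` and `inclusion_exclusion` are illustrative choices, not part of the notes.

```python
from fractions import Fraction
from itertools import combinations

# Assumed sample space: one throw of a fair die, all outcomes equally likely.
S = frozenset(range(1, 7))

def pr(event):
    """Symmetry probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & S), len(S))

def inclusion_exclusion(events):
    """Pr(union) as an alternating sum over all non-empty sub-collections."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1  # + singles, - pairs, + triples, ...
        for group in combinations(events, k):
            total += sign * pr(frozenset.intersection(*group))
    return total

A = frozenset({2, 4, 6})  # even score
B = frozenset({1, 2, 3})  # score of 3 or less
C = frozenset({3, 6})     # score divisible by 3

direct = pr(A | B | C)                        # count the union directly
via_formula = inclusion_exclusion([A, B, C])  # use result 2.1.5 / 2.1.6
print(direct, via_formula)
```

Both routes should give the same fraction; exact `Fraction` arithmetic avoids rounding in the comparison.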
2.2 Sampling Problems
Many applications of ‘symmetry’ probabilities
arise from problems in which a randomising
device is used to select a sample from some
population.
Terminology:
Terms like ‘sample at random’ or ‘select a
random sample’ are often used.
These may sound vague – but in fact they
are very precise. They both mean
‘select a sample in such a way that all
possible samples have exactly the same
chance of being the one selected’.
The order in which the sample members are
selected may or may not be important.
We assume for the moment that it is
important.
With replacement or without replacement
Suppose that a sample of size r is to be
chosen at random from a population of size n.
There are two main possibilities.
A Random sampling with replacement.
Position in sample:   1   2   3   ···   r
Number of choices:    n   n   n   ···   n
Each ‘box’ can be filled in n different ways.
The sample of size r can be selected in $n^r$
ways; each possible sample has a probability
$1/n^r$ of being the one selected.
B Random sampling without replacement.
Position in sample:   1   2     3     ···   r
Number of choices:    n   n−1   n−2   ···   n−r+1
First sample member: n possibilities
Second sample member: n − 1 possibilities
and so on.
The number of possible samples D is given by
D = n(n − 1)(n − 2) · · · (n − r + 1).
REMINDER:
Factorial n: n! = 1 · 2 · 3 · · · (n − 1) · n.
Hence $D = \frac{n!}{(n-r)!}$.
Notes:
1. D is the denominator in probability
calculations.
2. Reminder: The order in which the sample
members are selected is taken as important.
The denominator D is the total number of
ways in which the sample members can be
selected in that order.
Since sampling is done at random, each of
these D samples has exactly the same chance
of being the one selected.
For any given event, the numerator N will be
the number of these samples which result in
the event occurring.
We sometimes call this the number of
samples favourable to the event.
Therefore, for any event,
$$\Pr(\text{event}) = \frac{\text{number of favourable permutations}}{\text{total number of permutations}}.$$
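The two counting rules can be confirmed by listing every ordered sample for a small case. This is a minimal sketch assuming a hypothetical population of four labelled items and a sample of size two; the event used for the numerator is also only an illustration.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def count_with_replacement(n, r):
    """Number of ordered samples of size r from n items, with replacement."""
    return n ** r

def count_without_replacement(n, r):
    """Number of ordered samples of size r from n items, without replacement."""
    return factorial(n) // factorial(n - r)

# Hypothetical population (n = 4) and sample size (r = 2).
population = ["a", "b", "c", "d"]
r = 2

samples_with = list(product(population, repeat=r))    # ordered, repeats allowed
samples_without = list(permutations(population, r))   # ordered, no repeats

# Illustrative event: item "a" is one of the sample members.
favourable = [s for s in samples_without if "a" in s]
pr_event = Fraction(len(favourable), len(samples_without))  # N / D
print(len(samples_with), len(samples_without), pr_event)
```

Enumeration is only practical for small n and r; for larger cases the closed-form counts are used directly.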
Suppose now that the order in which the
sample members are selected is not
important. Whether a particular event occurs
depends only on which population members
are selected for the sample, and not on the
order in which they are selected.
So far, the numerator N and denominator D
in calculations have been the numbers of
permutations involved. We can now simplify
calculations by using
combinations
instead of permutations in the numerator and
denominator.
If this is done, D becomes the number of
different ways of choosing r items for the
sample out of n:
$$D = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}.$$
[We can also write $\binom{n}{r}$ as ${}^{n}C_{r}$.]
The numerator N is now the number of
combinations favourable to the required
event.
EXAMPLE:
Three cards are selected from a pack, at
random, without replacement.
Events:
A: first ace appears at 3rd card
B: exactly one card is an ace.
We wish to find Pr(A) and Pr(B).
Event A: first ace appears at 3rd card
For A, the order in which the cards are
selected is important.
D = No. of ways of choosing 3 from 52 in order
  = $\frac{52!}{(52-3)!}$ = 52 × 51 × 50.
N = No. of ways favourable to event A
  = 48 × 47 × 4.
Hence
$$\Pr(A) = \frac{48 \cdot 47 \cdot 4}{52 \cdot 51 \cdot 50} = 0.0681.$$
Event B: exactly one card is an ace.
For event B, the order in which the three
cards appear is not important.
Using combinations:
$$D = \binom{52}{3} = 22100, \qquad N = \binom{48}{2}\binom{4}{1} = 4512.$$
Hence
$$\Pr(B) = \frac{4512}{22100} = 0.2042.$$
In problems where order is not important, it is still
possible to use permutations – but this usually makes
the calculations more complicated. For event B:
D = 52 × 51 × 50 = 6 × 22100
N = (4 × 48 × 47) + (48 × 4 × 47) + (48 × 47 × 4)
= 6 × 4512.
Hence N/D = 4512/22100, as before.
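The arithmetic in this example can be reproduced with the standard-library counting functions. The sketch below assumes Python 3.8 or later (for `math.comb` and `math.perm`); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb, perm

# Event A: first ace appears at the 3rd card (order matters).
D_A = perm(52, 3)          # 52 * 51 * 50 ordered samples
N_A = 48 * 47 * 4          # non-ace, non-ace, ace
pr_A = Fraction(N_A, D_A)

# Event B: exactly one ace among the three cards (order does not matter).
D_B = comb(52, 3)                   # unordered samples
N_B = comb(48, 2) * comb(4, 1)      # two non-aces and one ace
pr_B = Fraction(N_B, D_B)

# Event B again, counted with permutations: the ace can be 1st, 2nd or 3rd.
N_B_perm = 3 * (4 * 48 * 47)
pr_B_perm = Fraction(N_B_perm, perm(52, 3))

print(round(float(pr_A), 4), round(float(pr_B), 4), pr_B == pr_B_perm)
```

The final comparison illustrates the point made above: permutations and combinations give the same probability, provided numerator and denominator are counted in the same way.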
2.3 Conditional Probability
This topic relates to two (or more) events
associated with the same experiment.
[Figure: Venn diagram of two overlapping events A and B inside S.]
Two events A and B divide S into 4 regions.
We now consider the form of relationship
between these events.
Example: E – two cards are taken in
sequence, without replacement, from a pack,
at random.
We consider two events
A: the first card is an ace, Pr(A) = 4/52.
B: the second card is an ace, Pr(B) = 4/52.
Suppose now that the first card is examined
and seen to be an ace. What is Pr(B)?
The answer is not 3/51.
Reminder (from §1.4):
To each event arising out of an experiment, a
number (the probability of that event) is
permanently assigned.
What does the number 3/51 represent?
The ratio 3/51 does not arise from an
experiment alone. It appears as a result of an
experiment being performed and a particular
condition being met.
The experiment is that we choose two cards,
at random, in sequence.
The condition is that the first card chosen is
an Ace.
We can say that the conditional probability
of B given A is 3/51.
This conditional probability is calculated as
follows:
$$\frac{3}{51} = \frac{\left(\frac{4 \cdot 3}{52 \cdot 51}\right)}{\left(\frac{4}{52}\right)} = \frac{\Pr(A \cap B)}{\Pr(A)}.$$
DEFINITION:
If A and B are two events, then the
conditional probability of B given A is
defined as
$$\frac{\Pr(A \cap B)}{\Pr(A)}$$
for an event A such that Pr(A) > 0.
NOTATION:
We write the conditional probability of B
given A as Pr(B | A). That is,
$$\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}.$$
Notes:
1. Conditional probabilities can be
interpreted just as ordinary (often called
marginal ) probabilities:
symmetry
limiting relative frequency
subjective
2. It is often easier to evaluate a conditional
probability than a marginal probability.
This happens in particular for events
resulting from a sequence of actions.
To obtain a marginal probability from a
conditional one, the formula is used in this
way:
Pr(A ∩ B) = Pr(B | A) · Pr(A).
Note that we also have:
Pr(A ∩ B) = Pr(A | B) · Pr(B).
EXAMPLE (Two cards):
Cards are selected at random without
replacement from a pack. Define D as the
event:
D = first ace appears at 2nd card.
Find Pr(D).
SOLUTION:
Define two events:
A = first card chosen is not an ace
B = second card chosen is an ace
Then D ≡ A ∩ B.
Now Pr(A) = 48/52, and Pr(B | A) = 4/51.
Hence
$$\Pr(D) = \Pr(A \cap B) = \Pr(A)\,\Pr(B \mid A) = \frac{48}{52} \cdot \frac{4}{51} = \frac{16}{221} = 0.0724.$$
Extension
The basic result
Pr(A ∩ B) = Pr(A) Pr(B | A)
extends easily to three or more events.
We thus obtain:
Pr(A ∩ B ∩ C) = Pr(A) Pr(B | A) Pr(C | A, B)
and so on.
Applying this to the experiment of drawing
cards at random without replacement, the
argument extends easily to the event:
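$A_n$: the first ace appears at the nth card.
For example, for the case n = 4,
$$\Pr(A_4) = \frac{48}{52} \times \frac{47}{51} \times \frac{46}{50} \times \frac{4}{49}.$$
Note: Writing this as $\Pr(A_4) = \frac{48 \cdot 47 \cdot 46 \cdot 4}{52 \cdot 51 \cdot 50 \cdot 49}$ is
also instructive.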
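The chain of conditional probabilities for "first ace at the nth card" can be sketched as a short loop. The function name `first_ace_at` and its default arguments are assumptions made for illustration; the calculation itself is the product rule stated above.

```python
from fractions import Fraction

def first_ace_at(n, aces=4, deck=52):
    """Pr(first ace appears at the n-th card), drawing without replacement."""
    non_aces = deck - aces
    p = Fraction(1)
    # Cards 1 .. n-1 must all be non-aces, each conditional on the earlier draws.
    for k in range(n - 1):
        p *= Fraction(non_aces - k, deck - k)
    # Card n must be an ace, given n-1 non-aces have already been removed.
    p *= Fraction(aces, deck - (n - 1))
    return p

print(first_ace_at(2), first_ace_at(4))
```

Since the first ace must appear somewhere among the first 49 cards, the values for n = 1, ..., 49 should sum to 1, which gives a simple consistency check.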
EXAMPLE (Two dice):
Two unbiased dice are thrown.
X: score shown on die 1,
Y : score shown on die 2.
Consider two events:
A: {Y = 2}
B: {X < Y}
The probabilities for the four combinations of
results for $A$, $\bar{A}$, $B$, $\bar{B}$ are:

              $A$      $\bar{A}$   Total
$B$           1/36     14/36       15/36
$\bar{B}$     5/36     16/36       21/36
Total         6/36     30/36       36/36

Hence
$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \frac{1/36}{15/36} = \frac{1}{15}.$$
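The entries for the two-dice example can be recovered by listing all 36 equally likely outcomes. This is a minimal sketch; the helper names `pr`, `in_A` and `in_B` are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (x, y) for two unbiased dice.
outcomes = list(product(range(1, 7), repeat=2))

def pr(predicate):
    """Probability of the event defined by predicate, by counting outcomes."""
    favourable = sum(1 for o in outcomes if predicate(o))
    return Fraction(favourable, len(outcomes))

def in_A(o):
    return o[1] == 2       # A: Y = 2

def in_B(o):
    return o[0] < o[1]     # B: X < Y

pr_A = pr(in_A)
pr_B = pr(in_B)
pr_A_and_B = pr(lambda o: in_A(o) and in_B(o))
pr_A_given_B = pr_A_and_B / pr_B   # definition of conditional probability
print(pr_A_and_B, pr_B, pr_A_given_B)
```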
2.4 Independence
In general, for two events A and B,
Pr(B | A) ≠ Pr(B).
Example: an unbiased die is thrown:
A = {even}, B = {1, 2, 3}
But sometimes the two probabilities may be
equal:
Example : A = {even}, B = {1, 2}.
DEFINITION: If, for two events A and B,
Pr(B | A) = Pr(B),
then we say that
B is independent of A.
Alternatively, we say that the events
A and B are independent of each other.
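The two die examples can be checked directly from the definition. The sketch below assumes a fair die; `pr` and `pr_given` are illustrative helper names.

```python
from fractions import Fraction

# Assumed sample space: one throw of an unbiased die.
S = frozenset(range(1, 7))

def pr(event):
    return Fraction(len(event), len(S))

def pr_given(b, a):
    """Pr(b | a) = Pr(a and b) / Pr(a), defined only when Pr(a) > 0."""
    return pr(a & b) / pr(a)

even = frozenset({2, 4, 6})
B1 = frozenset({1, 2, 3})   # first example
B2 = frozenset({1, 2})      # second example

print(pr_given(B1, even), pr(B1))   # unequal: B1 is not independent of {even}
print(pr_given(B2, even), pr(B2))   # equal: B2 is independent of {even}
```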
Notes:
(1) Independence is symmetric.
If B is independent of A, then
$$\Pr(B) = \Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}.$$
Therefore
Pr(A ∩ B) = Pr(A) · Pr(B).
Dividing both sides by Pr(B) (assumed to be > 0), we obtain:
$$\Pr(A) = \frac{\Pr(A \cap B)}{\Pr(B)} = \Pr(A \mid B).$$
So, if B is independent of A, then A is
independent of B, and vice versa.
(2) Interpretation of independence
If A and B are not independent, then
Pr(B | A) ≠ Pr(B).
Information that A has occurred changes our
assessment of B.
[It does not alter Pr(B). It does alter our
assessment of the chance that B will occur,
which is affected by our knowledge that A
has occurred.]
But, if A and B are independent, knowledge
about the occurrence of B does not affect
our assessment of A.
(3) Theorem: If A and B are independent,
then so are $A$ and $\bar{B}$.
Proof:
Independence ⇒ Pr(A ∩ B) = Pr(A) · Pr(B).
Now, the events $(A \cap B)$ and $(A \cap \bar{B})$ are
mutually exclusive, and $A = (A \cap B) \cup (A \cap \bar{B})$.
Hence, using axiom A3,
$\Pr(A) = \Pr(A \cap B) + \Pr(A \cap \bar{B})$.
We can therefore write:
$\Pr(A \cap \bar{B}) = \Pr(A) - \Pr(A \cap B)$
$= \Pr(A) - \Pr(A) \cdot \Pr(B)$
$= \Pr(A) \cdot \{1 - \Pr(B)\}$
$= \Pr(A) \cdot \Pr(\bar{B})$.
Corollary: If A and B are independent, then
$\bar{A}$ and $B$ are independent. Also, $\bar{A}$ and $\bar{B}$ are
independent.
(4) The Multiplication Law of probability
When events A and B are independent, then
Pr(A ∩ B) = Pr(A) · Pr(B).
In words: The multiplication law states that,
if A and B are independent events, then their
joint probability is the product of the
individual probabilities.
Note: Compare this with the general result
Pr(A ∩ B) = Pr(A) Pr(B | A)
= Pr(B) Pr(A | B),
which holds for all events A and B.
(5) When does independence occur?
In practice, it is often known that two events
are independent, and the multiplication law
can then be used to calculate the joint
probability.
Experiments often consist of a set of quite
independent components, or trials, with
different events relating to different trials.
Example: E is ‘toss a coin, throw a die’
Event A = {Coin shows heads},
Event B = {Die shows a 6}.
If the tossing of the coin and the throw of the die are
unrelated, the events A and B will be independent.
$$\Pr(A \cap B) = \Pr(A) \cdot \Pr(B) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12}.$$
Distinction in the context of sampling:
with replacement, events independent ;
without replacement, not independent.
(6) Pairwise and Mutual Independence
If there are three or more events, it is possible
for all pairs to be independent, but for there
to be a more complex type of dependence.
Example: Toss two fair coins independently,
and define events as follows:
Event A: Coin 1 shows Heads
Event B: Coin 2 shows Heads
Event C: Exactly one coin shows Heads
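Clearly Pr(A) = Pr(B) = 1/2, and A and B are
independent, so that Pr(A ∩ B) = 1/4.
Now $C = (A \cap \bar{B}) \cup (\bar{A} \cap B)$. So
$\Pr(C) = \Pr(A \cap \bar{B}) + \Pr(\bar{A} \cap B) = 1/2$.
Also $A \cap C = A \cap \bar{B}$, so Pr(A ∩ C) = 1/4 = Pr(A) · Pr(C).
Therefore events A and C are independent.
Also B and C are independent.
But what about the event A ∩ B ∩ C?
It is impossible: if both coins show Heads, C cannot occur. Hence
Pr(A ∩ B ∩ C) = 0, whereas Pr(A) · Pr(B) · Pr(C) = 1/8.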
Distinction: In this example, we can say that
events A, B and C are pairwise independent,
but they are not mutually independent.
In practice, pairwise independent events are
almost always mutually independent (e.g.
events from different components of an
experiment).
Definition: Events $A_1, A_2, \ldots, A_n$ are
mutually independent if and only if
$\Pr(A_i \cap A_j) = \Pr(A_i)\Pr(A_j)$, $i \ne j$,
$\Pr(A_i \cap A_j \cap A_k) = \Pr(A_i)\Pr(A_j)\Pr(A_k)$, $i \ne j$, $i \ne k$, $j \ne k$,
···
$\Pr\left(\bigcap_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} \Pr(A_i)$,
i.e. if all subsets obey the multiplication law.
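The distinction between pairwise and mutual independence can be tested by brute force on the two-coin example. This is a sketch under the assumption of two fair coins tossed independently (four equally likely outcomes); the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

# Four equally likely outcomes for two fair coins: (coin 1, coin 2).
outcomes = frozenset(product("HT", repeat=2))

def pr(event):
    return Fraction(len(event), len(outcomes))

A = frozenset(o for o in outcomes if o[0] == "H")         # coin 1 shows Heads
B = frozenset(o for o in outcomes if o[1] == "H")         # coin 2 shows Heads
C = frozenset(o for o in outcomes if o.count("H") == 1)   # exactly one Head

def obeys_multiplication_law(events):
    """True if Pr(intersection) equals the product of the probabilities."""
    joint = outcomes.intersection(*events)
    prod = Fraction(1)
    for e in events:
        prod *= pr(e)
    return pr(joint) == prod

def pairwise_independent(events):
    return all(obeys_multiplication_law(p) for p in combinations(events, 2))

def mutually_independent(events):
    # Every sub-collection of two or more events must obey the law.
    return all(
        obeys_multiplication_law(group)
        for k in range(2, len(events) + 1)
        for group in combinations(events, k)
    )

print(pairwise_independent([A, B, C]), mutually_independent([A, B, C]))
```

The check over all sub-collections grows quickly with the number of events, so this approach is intended only for small illustrative cases.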
2.5 Two Important Theorems
Consider a set A1, A2, . . . Ak of mutually
exclusive and exhaustive events.
Let B be some other event from the same
experiment.
Law of Total Probability:
$$\Pr(B) = \Pr(A_1) \cdot \Pr(B \mid A_1) + \Pr(A_2) \cdot \Pr(B \mid A_2) + \cdots + \Pr(A_k) \cdot \Pr(B \mid A_k) = \sum_{i=1}^{k} \Pr(A_i) \cdot \Pr(B \mid A_i).$$
PROOF: (illustrated for the case k = 5)
[Figure: the sample space S partitioned into A1, A2, A3, A4 and A5, with the event B drawn across the partition.]
Each element of S is a member of one and
only one of the A’s.
Hence, the same is true of each element of
B . We therefore obtain the result:
B ≡ (A1 ∩ B) ∪ (A2 ∩ B) ∪ · · · ∪ (Ak ∩ B).
where the events on the RHS are mutually
exclusive.
Hence $\Pr(B) = \sum_{i=1}^{k} \Pr(A_i \cap B)$.
But Pr(Ai ∩ B) = Pr(Ai) · Pr(B | Ai)
and so
$$\Pr(B) = \sum_{i=1}^{k} \Pr(A_i) \cdot \Pr(B \mid A_i).$$
EXAMPLE:
Three boxes contain certain items: box i
contains ni items, of which di are defective.
In an experiment, one box is chosen at
random. Then, one item is chosen at random
from the chosen box.
Find the probability that the chosen item is
defective, when
n1 = 50, n2 = 100, n3 = 100,
d1 = 5, d2 = 3, d3 = 5.
SOLUTION:
Reminder:
n1 = 50, n2 = 100, n3 = 100,
d1 = 5, d2 = 3, d3 = 5.
Events:
Let Ai = ‘box i is chosen,’ i = 1, 2, 3
Let B = ‘the chosen item is defective’.
Then Pr(A1) = Pr(A2) = Pr(A3) = 1/3
and
Pr(B | A1) = 5/50,
Pr(B | A2) = 3/100,
Pr(B | A3) = 5/100.
Hence
$$\Pr(B) = \frac{1}{3} \cdot \frac{5}{50} + \frac{1}{3} \cdot \frac{3}{100} + \frac{1}{3} \cdot \frac{5}{100} = \frac{18}{300} = 0.06.$$
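The three-box calculation is a direct application of the law of total probability and can be sketched in a few lines. The dictionary names are illustrative; the numbers are those given in the example.

```python
from fractions import Fraction

# Data from the example: box i holds n[i] items, of which d[i] are defective.
n = {1: 50, 2: 100, 3: 100}
d = {1: 5, 2: 3, 3: 5}

# Stage 1: each box is equally likely to be chosen.
pr_A = {i: Fraction(1, len(n)) for i in n}

# Stage 2: Pr(B | A_i), the chance of a defective item from box i.
pr_B_given_A = {i: Fraction(d[i], n[i]) for i in n}

# Law of total probability: sum over the mutually exclusive, exhaustive A_i.
pr_B = sum(pr_A[i] * pr_B_given_A[i] for i in n)
print(pr_B, float(pr_B))
```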
EXAMPLE: Two cards (revisited)
Two cards are selected from a pack of 52,
without replacement.
Event A: first card is an ace
Event B: second card is an ace.
We know that
$\Pr(A) = 4/52$; $\Pr(\bar{A}) = 48/52$;
$\Pr(B \mid A) = 3/51$; $\Pr(B \mid \bar{A}) = 4/51$.
Because $A$ and $\bar{A}$ are mutually exclusive and
exhaustive events, it follows that
$$\Pr(B) = \Pr(B \mid A)\Pr(A) + \Pr(B \mid \bar{A})\Pr(\bar{A}) = \frac{3}{51} \cdot \frac{4}{52} + \frac{4}{51} \cdot \frac{48}{52} = \frac{4}{52}.$$
Application of the Law of Total Probability:
The example concerns a 2-stage experiment.
Stage 1: A random choice is made, and
either $A$ or $\bar{A}$ occurs.
Stage 2: A further random choice is made,
and B may occur.
We wish to find Pr(B), but it is easier to find
$\Pr(B \mid A)$ and $\Pr(B \mid \bar{A})$, since the result of
Stage 1 influences what happens in Stage 2.
Simple extension: in Stage 1, we have a set
of mutually exclusive and exhaustive events
A1, A2, . . . , Ak , rather than just two such
events ($A$ and $\bar{A}$).
Further extension to multi-stage experiments:
at stage 1, one of A1, A2, . . . , Ak occurs;
at stage 2, one of B1, B2, . . . , Bj occurs;
at stage 3, 4, . . .
at stage n, some event N may occur, the
conditional probability depending on
which of the As, Bs etc. occurred.
BAYES’ THEOREM
If A1, A2, . . . , Ak are mutually exclusive and
exhaustive events, and if B is an event based
on the same experiment, then
$$\Pr(A_i \mid B) = \frac{\Pr(A_i)\Pr(B \mid A_i)}{\sum_{j=1}^{k} \Pr(A_j)\Pr(B \mid A_j)}.$$
PROOF
By definition,
$\Pr(A_i \cap B) = \Pr(A_i)\Pr(B \mid A_i) = \Pr(B)\Pr(A_i \mid B)$.
Equating the two right-hand sides, we obtain:
$$\Pr(A_i \mid B) = \frac{\Pr(A_i)\Pr(B \mid A_i)}{\Pr(B)} = \frac{\Pr(A_i)\Pr(B \mid A_i)}{\sum_{j=1}^{k} \Pr(A_j)\Pr(B \mid A_j)},$$
using the law of total probability.
Applications of Bayes’ theorem:
time–reversal,
assessment of evidence,
more advanced statistical methods.
EXAMPLE (continuation)
Ai: ‘box i chosen’, B: ‘item is defective’.
The calculations can be laid out most easily
in tabular form, as follows:
Box (i)   ni    di   Pr(B | Ai)   Pr(Ai)   Pr(Ai ∩ B)      Pr(Ai | B)
1         50    5    5/50         1/3      5/50 · 1/3      10/18
2         100   3    3/100        1/3      3/100 · 1/3     3/18
3         100   5    5/100        1/3      5/100 · 1/3     5/18
Pr(Ai): initial, or prior , probability.
Pr(Ai | B): final, or posterior , probability.
The prior probability is adjusted by using the
evidence provided by the data B.
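The tabular layout translates naturally into a small function that turns priors and conditional probabilities into posteriors. This is a minimal sketch; the function name `posterior` and the dictionary names `prior` and `likelihood` are assumptions for illustration, with `likelihood[i]` standing for Pr(B | Ai).

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes' theorem: Pr(A_i | B) from Pr(A_i) and Pr(B | A_i)."""
    joint = {i: prior[i] * likelihood[i] for i in prior}   # Pr(A_i and B)
    pr_B = sum(joint.values())                             # law of total probability
    return {i: joint[i] / pr_B for i in joint}

# Values from the three-box example.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
likelihood = {1: Fraction(5, 50), 2: Fraction(3, 100), 3: Fraction(5, 100)}

post = posterior(prior, likelihood)
for i, p in post.items():
    print(i, p)
```

`Fraction` prints results in lowest terms, so 10/18 and 3/18 appear as 5/9 and 1/6.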
Exercise: Find Pr(Ai | B) for the cards
example.
CHAPTER 2 SUMMARY
• Probability theory is developed in general by
deduction from the three axioms.
• Several important general results, e.g.
Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).
• Concepts of random sampling with
replacement and without replacement.
• Conditional probability:
$\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}$.
• Independence: Pr(A ∩ B) = Pr(A) × Pr(B);
pairwise and mutual independence.
• Law of total probability; Bayes’ theorem.