§1.1 Probability, Relative Frequency and Classical Definition.
Probability is the study of random or non-deterministic experiments. Suppose an experiment
can be repeated any number of times, so that we can produce a series of independent trials under
identical conditions. This accumulated experience reveals a remarkable regularity of behaviour in
the following sense. In each observation, depending on chance, a particular event A either occurs
or does not occur. Let n be the total number of observations in this series and let n(A) denote the
number of times A occurs. Then the ratio n(A)/n is called the relative frequency of the event A
(in this given series of independent and identical trials). It has been empirically observed that the
relative frequency becomes stable in the long run. This stability is the basis of probability theory.
That is, there exists some constant P (A), called the probability of the event A, such that for large
n, P(A) ∼ n(A)/n. In fact, the ratio approaches this limit in the sense that
P(A) = lim_{n→∞} n(A)/n.
Although the preceding definition is certainly intuitively pleasing and should always be kept in
mind, it possesses a serious drawback: How do we know that n(A)/n will converge to some constant
limiting value that will be the same for each possible sequence of repetitions of the experiment?
Proponents of this relative frequency definition of probability usually answer this objection by
stating that the convergence of n(A)/n to a constant limiting value is an assumption, or an axiom,
of the system. However, to assume that n(A)/n will necessarily converge to some constant value
seems to be a very complex assumption, even though this convergence is indeed backed up by many
of our empirical experiences.
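As a numerical illustration of this empirical stability (a sketch, not part of the original notes; the trial counts and helper names are arbitrary choices), the relative frequency of an event can be watched as it settles near its limit:

```python
import random

def relative_frequency(trial, event, n):
    """Estimate P(A) by the relative frequency n(A)/n over n independent trials."""
    count = sum(1 for _ in range(n) if event(trial()))
    return count / n

random.seed(0)
roll = lambda: random.randint(1, 6)   # one trial: toss a fair die
is_even = lambda w: w % 2 == 0        # the event A: an even number of spots

# n(A)/n stabilizes near P(A) = 1/2 as n grows
for n in (100, 10_000, 200_000):
    print(n, relative_frequency(roll, is_even, n))
```

Each run produces a slightly different sequence of ratios, which is exactly the point of the objection raised above: convergence for every possible sequence of repetitions is an assumption, not something a finite simulation can prove.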
Historically, probability theory began with the study of games of chance. The starting point is
an experiment with a finite number of mutually exclusive outcomes which are equiprobable or equally
likely because of the nature and setup of the experiment. Let A denote some event associated with
certain possible outcomes of the experiment. Then the probability P (A) of the event A is defined
as the fraction of the outcomes in which A occurs. That is,
P (A) = N (A)/N,
where N is the total number of outcomes of the experiment and N (A) is the number of outcomes
leading to the occurrence of event A. For example, in tossing a fair or unbiased die, let A be the
event of getting an even number of spots. Then N = 6 and N (A) = 3. Hence P (A) = 1/2.
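The classical computation N(A)/N mechanizes directly on a finite set of equally likely outcomes; the helper below is an illustrative sketch (its name is an invention, not from the notes):

```python
from fractions import Fraction

def classical_probability(outcomes, event):
    """P(A) = N(A)/N on a finite collection of equally likely outcomes."""
    favorable = [w for w in outcomes if event(w)]
    return Fraction(len(favorable), len(outcomes))

die = range(1, 7)   # the six equiprobable faces
print(classical_probability(die, lambda w: w % 2 == 0))  # 1/2
```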
This classical definition of probability is essentially circular since the idea of “equally likely” is
the same as that of “with equal probability”, which has not been defined. The modern treatment
of probability theory is purely axiomatic. This means that the probabilities of our events can be
perfectly arbitrary, except that they must satisfy a set of simple axioms listed below. The classical
theory will correspond to the special case of so-called equiprobable spaces.
§1.2 Sample Space and Events.
The set Ω of all mutually exclusive outcomes of some given random experiment is called the
sample space. An elementary outcome ω, which is an element of Ω, is called a sample point. An
event A is associated with the sample point ω if, given ω ∈ Ω, we can always decide whether ω leads
to the occurrence of A. That is, event A occurs if and only if ω ∈ A. Hence event A consists of a
set of outcomes or, in other words, is simply a subset of the underlying sample space Ω. The event
{ω} consisting of a single sample point ω ∈ Ω is called an elementary event. The empty set φ and Ω
itself are events; φ is sometimes called the impossible event, and Ω the certain or sure event.
Two events A and B are identical if A occurs iff B occurs. That is, they lead to the same
subset of Ω. Two events A and B are mutually exclusive or incompatible if A and B cannot occur
simultaneously. That is, A ∩ B = φ.
If the sample space Ω is finite or countably infinite, then every subset of Ω is an event. On the
other hand, if Ω is uncountable, we shall be concerned with the σ-algebra E of subsets of Ω, which
is closed under the operations of set union, set intersection, set difference and set complement.
§1.3 Axioms of Probability.
Let Ω be a sample space, let E be the class of events, and let P be a real-valued function
defined on E. Then P is called a probability function, and P (A) is called the probability of the
event A if the following axioms hold:
Axiom 1. For every event A, 0 ≤ P (A) ≤ 1.
Axiom 2. P (Ω) = 1.
Axiom 3. If A1 , A2 , . . . is a sequence of mutually exclusive events, then P (A1 ∪ A2 ∪ . . . ) =
P (A1 ) + P (A2 ) + · · · .
The assumption of the existence of a set function P, defined on the events of a sample space
and satisfying Axioms 1, 2 and 3, constitutes the modern mathematical approach to probability
theory. The axioms are natural and in accordance with our intuitive concept of probability as
related to chance and randomness. Furthermore, using these axioms, we shall be able to prove
that the relative frequency of a specific event A will equal P (A) with probability 1 under a series
of independent and identical trials of an experiment. This is the well known result of the strong
law of large numbers in probability theory.
A few easy and useful results concerning the probabilities of events follow.
Proposition 1. P (φ) = 0.
Proposition 2. P(A^c) = 1 − P(A), where A^c denotes the complement of A.
Proposition 3. If A ⊂ B, then P (A) ≤ P (B).
Proposition 4. P (A ∪ B) = P (A) + P (B) − P (A ∩ B).
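These four propositions can be checked mechanically on a small equiprobable space; the sketch below (the events chosen are arbitrary illustrations) verifies each one on a die:

```python
from fractions import Fraction

omega = set(range(1, 7))   # sample space: faces of a fair die

def P(A):
    """Equiprobable-space probability: P(A) = |A| / |omega|."""
    return Fraction(len(A), len(omega))

A = {2, 4, 6}              # even number of spots
B = {4, 5, 6}              # more than three spots

assert P(set()) == 0                        # Proposition 1: P(empty) = 0
assert P(omega - A) == 1 - P(A)             # Proposition 2: complement rule
assert P({4, 6}) <= P(A)                    # Proposition 3: {4,6} is a subset of A
assert P(A | B) == P(A) + P(B) - P(A & B)   # Proposition 4: inclusion-exclusion
print(P(A | B))                             # 2/3
```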
§1.4 Additive Laws.
For a sequence of n mutually exclusive events A_1, A_2, ..., A_n, since A_i ∩ A_j = φ for i ≠ j, we
have the additive law for mutually exclusive events
P(∪_{k=1}^n A_k) = Σ_{k=1}^n P(A_k).
For a set of arbitrary events, there is the more general result stating that the probability of the
union of n events equals the sum of the probabilities of these events taken one at a time, minus
the sum of the probabilities of the events taken two at a time, plus the sum of the probabilities of
these events taken three at a time, and so on.
Theorem. Given n events A_1, A_2, ..., A_n in Ω, let
P_1 = Σ_{i=1}^n P(A_i),
P_2 = Σ_{1≤i<j≤n} P(A_i A_j),
P_3 = Σ_{1≤i<j<k≤n} P(A_i A_j A_k),
...
P_n = P(A_1 A_2 ··· A_n).
Then
P(∪_{k=1}^n A_k) = P_1 − P_2 + P_3 − ··· + (−1)^{n+1} P_n.
Proof.
The proof is by induction on the number of events. For n = 2, we readily have P(A_1 ∪ A_2) =
P(A_1) + P(A_2) − P(A_1 A_2). Suppose the statement holds for the n − 1 events A_2, A_3, ..., A_n, i.e.
P(∪_{k=2}^n A_k) = Σ_{2≤i≤n} P(A_i) − Σ_{2≤i<j≤n} P(A_i A_j) + Σ_{2≤i<j<k≤n} P(A_i A_j A_k) − ···
Hence
P(∪_{k=2}^n A_1 A_k) = Σ_{2≤i≤n} P(A_1 A_i) − Σ_{2≤i<j≤n} P(A_1 A_i A_j) + Σ_{2≤i<j<k≤n} P(A_1 A_i A_j A_k) − ···
Noting that ∪_{k=1}^n A_k = A_1 ∪ (∪_{k=2}^n A_k), we have
P(∪_{k=1}^n A_k) = P(A_1) + P(∪_{k=2}^n A_k) − P(∪_{k=2}^n A_1 A_k)
= P(A_1) + Σ_{2≤i≤n} P(A_i) − Σ_{2≤i<j≤n} P(A_i A_j) + Σ_{2≤i<j<k≤n} P(A_i A_j A_k) − ···
− Σ_{2≤j≤n} P(A_1 A_j) + Σ_{2≤j<k≤n} P(A_1 A_j A_k) − ···
= P_1 − P_2 + P_3 − ···
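The inclusion-exclusion formula translates directly into code on a finite equiprobable space; the function below is an illustrative sketch, not part of the original notes:

```python
from fractions import Fraction
from itertools import combinations

def prob_union(events, omega):
    """Inclusion-exclusion: P(union of A_k) = P1 - P2 + P3 - ... on equiprobable omega."""
    n = len(events)
    total = Fraction(0)
    for m in range(1, n + 1):
        # P_m: sum of P(A_{k1} ... A_{km}) over all m-element index sets
        Pm = sum(Fraction(len(set.intersection(*sub)), len(omega))
                 for sub in combinations(events, m))
        total += (-1) ** (m + 1) * Pm
    return total

omega = set(range(1, 7))
A1, A2, A3 = {1, 2}, {2, 3}, {3, 4}
# Agrees with computing the union directly:
assert prob_union([A1, A2, A3], omega) == Fraction(len(A1 | A2 | A3), len(omega))
print(prob_union([A1, A2, A3], omega))  # 2/3
```

The loop over all index subsets costs 2^n terms, so this is a pedagogical check rather than an efficient method for large n.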
Example. Suppose n students have n identical raincoats all put together. Each student selects a
raincoat at random.
1. What is the probability that at least one raincoat ends up with its original owner?
2. What is the probability that exactly k students select their own raincoats?
Let A be the event that at least one raincoat ends up with its original owner. Let A_k be the
event that the k-th student selects his own (the k-th) raincoat. Then A = ∪_{k=1}^n A_k.
Every outcome of this experiment consisting of "randomly selecting the raincoats" is described
by a permutation (i_1, i_2, ..., i_n), where i_k is the index of the raincoat selected by the k-th student.
For m ≤ n, the event A_{k_1} A_{k_2} ··· A_{k_m} occurs whenever i_{k_1} = k_1, i_{k_2} = k_2, ..., i_{k_m} = k_m and
the other n − m indices i_k take the remaining n − m values in any order. (In words, students
k_1, k_2, ..., k_m select their own raincoats.) Therefore
P(A_{k_1} A_{k_2} ··· A_{k_m}) = N(A_{k_1} A_{k_2} ··· A_{k_m})/N = (n − m)!/n!
for any set of m indices k_1, k_2, ..., k_m. There are exactly C_m^n distinct events of the type
A_{k_1} A_{k_2} ··· A_{k_m}. It follows that
P_m = Σ_{1≤k_1<k_2<···<k_m≤n} P(A_{k_1} A_{k_2} ··· A_{k_m}) = C_m^n (n − m)!/n! = 1/m!.
Hence
P(A) = P(∪_{k=1}^n A_k) = P_1 − P_2 + ··· + (−1)^{n+1} P_n = 1 − 1/2! + 1/3! − ··· + (−1)^{n+1}/n!.
(This is seen to be the partial sum of the power series expansion of 1 − e^{−1}.) Thus, for large n,
P(A) ∼ 1 − e^{−1} ∼ 0.632. The probability that none of the students selects his own raincoat equals
P(A^c) = 1 − P(A) = 1/2! − 1/3! + ··· + (−1)^n/n!; multiplied by n!, this gives the well-known
derangement number of order n. For large n, this probability is approximately equal to e^{−1} ∼ 0.368.
(An incorrect intuition might lead us to think that P(A) would go to 1 as n goes to infinity.)
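A quick Monte Carlo check of the 1 − e^{−1} answer (a sketch; the trial count and n are arbitrary choices):

```python
import math
import random

def prob_at_least_one_match(n, trials=100_000):
    """Estimate P(A): at least one of n students draws his own raincoat."""
    hits = 0
    for _ in range(trials):
        perm = list(range(n))
        random.shuffle(perm)              # a random assignment of raincoats
        if any(perm[k] == k for k in range(n)):   # any fixed point = a match
            hits += 1
    return hits / trials

random.seed(0)
print(prob_at_least_one_match(10), 1 - math.exp(-1))  # both near 0.632
```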
To obtain the probability of the event B that exactly k students select their own raincoats,
we first note that the number of ways in which a particular set of k students select their own
raincoats while none of the other n − k students selects his own is
(n − k)! [1/2! − 1/3! + ··· + (−1)^{n−k}/(n − k)!],
i.e. the derangement number of order n − k. As there are C_k^n possible selections of a group of
k students, it follows that there are
C_k^n (n − k)! [1/2! − 1/3! + ··· + (−1)^{n−k}/(n − k)!]
ways in which exactly k of the students select their own raincoats. Thus
P(B) = C_k^n (n − k)! [1/2! − 1/3! + ··· + (−1)^{n−k}/(n − k)!]/n! = (Σ_{i=0}^{n−k} (−1)^i/i!)/k!,
which for large n is approximately e^{−1}/k!. The values e^{−1}/k!, k = 0, 1, ..., are of importance
as they are the probabilities associated with the Poisson distribution with parameter equal to unity.
This will be elaborated upon when the Poisson approximation is discussed.
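The closed form for P(B) and its Poisson limit can be compared in exact arithmetic; the function below is an illustrative sketch (its name is an invention):

```python
import math
from fractions import Fraction

def prob_exactly_k_matches(n, k):
    """P(exactly k of n students select their own raincoat):
       (sum_{i=0}^{n-k} (-1)^i / i!) / k!  (exact rational arithmetic)."""
    derangement_part = sum(Fraction((-1) ** i, math.factorial(i))
                           for i in range(n - k + 1))
    return derangement_part / math.factorial(k)

n = 12
# The probabilities over k = 0..n sum to 1 ...
assert sum(prob_exactly_k_matches(n, k) for k in range(n + 1)) == 1
# ... and, for large n, sit close to the Poisson(1) values e^{-1}/k!
for k in range(4):
    print(k, float(prob_exactly_k_matches(n, k)), math.exp(-1) / math.factorial(k))
```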
§2.1 Continuity Property of Probability Function.
Theorem. If A1 , A2 , . . . is an increasing sequence of events such that A1 ⊂ A2 ⊂ · · · , then
denoting limn→∞ An = ∪k Ak , we have
P(∪_k A_k) = P(lim_{n→∞} A_n) = lim_{n→∞} P(A_n).
Proof.
Let us define events B_1, B_2, ... as follows: B_1 ≡ A_1, B_2 ≡ A_2 − A_1 = A_2 − B_1, B_3 ≡ A_3 − A_2 =
A_3 − (B_1 ∪ B_2), ..., B_n ≡ A_n − ∪_{k=1}^{n−1} B_k, .... Thus
(a) B_1, B_2, ... are mutually exclusive,
(b) ∪_k B_k = ∪_k A_k, and
(c) A_n = ∪_{k=1}^n B_k.
(Picture the A_n as nested intervals A_1 ⊂ A_2 ⊂ A_3 ⊂ ···; then B_1 = A_1 and each B_n = A_n − A_{n−1}
is the new piece added at step n.)
Hence
P(∪_k A_k) = P(∪_k B_k) = Σ_k P(B_k) = lim_{n→∞} Σ_{k=1}^n P(B_k) = lim_{n→∞} P(∪_{k=1}^n B_k) = lim_{n→∞} P(A_n).
Theorem. If A1 , A2 , . . . is a decreasing sequence of events such that A1 ⊃ A2 ⊃ · · · , then denoting
limn→∞ An = ∩k Ak , we have
P(∩_k A_k) = P(lim_{n→∞} A_n) = lim_{n→∞} P(A_n).
Proof.
We have A_1^c ⊂ A_2^c ⊂ ···, an increasing sequence. Hence, by the previous theorem,
P(∩_k A_k) = P(Ω − ∪_k A_k^c) = 1 − P(∪_k A_k^c) = 1 − lim_{n→∞} P(A_n^c) = lim_{n→∞} [1 − P(A_n^c)]
= lim_{n→∞} P(A_n).
For mutually exclusive events A_1, A_2, ..., we know that P(∪_k A_k) = Σ_k P(A_k). For arbitrary
events, there is the following.
Theorem. P(∪_k A_k) ≤ Σ_k P(A_k) for arbitrary events A_1, A_2, ....
Proof.
Again we define B_n = A_n − ∪_{k=1}^{n−1} B_k as before. Then we have ∪_k A_k = ∪_k B_k and B_k ⊂ A_k
for all k. Hence P(∪_k A_k) = P(∪_k B_k) = Σ_k P(B_k) ≤ Σ_k P(A_k).
Remark.
It is not only for increasing or decreasing sequences of events that we can define a limit. In
general, for any sequence of events {A_n, n ≥ 1}, define
(a) lim sup_{n→∞} A_n = ∩_{n=1}^∞ ∪_{i=n}^∞ A_i.
[Interpretation: this consists of all points that are contained in an infinite number of the
events A_n, n ≥ 1.]
(b) lim inf_{n→∞} A_n = ∪_{n=1}^∞ ∩_{i=n}^∞ A_i.
[Interpretation: this consists of all points that are contained in all but a finite number of the
events A_n, n ≥ 1.]
Definition. We say that lim_{n→∞} A_n exists if
lim sup_{n→∞} A_n = lim inf_{n→∞} A_n (≡ lim_{n→∞} A_n).
Theorem. If lim_{n→∞} A_n exists, then P(lim_{n→∞} A_n) = lim_{n→∞} P(A_n).
Proof.
(See S.M. Ross, "A First Course in Probability", pp. 38-40.)
§2.2 Recurrence of Events.
Theorem. (First Borel–Cantelli Lemma)
Given a sequence of events A_1, A_2, ... with probabilities P(A_k) = p_k, k = 1, 2, .... Suppose
the series Σ_k p_k converges. Then, with probability 1, only finitely many of the events A_1, A_2, ... occur.
Proof.
Let B be the event that infinitely many of the events A_1, A_2, ... occur. Let B_n = ∪_{k≥n} A_k, i.e.
the event that at least one of A_n, A_{n+1}, ... occurs. Then B occurs iff B_n occurs for every n = 1, 2, ...
(that is, ω ∈ B iff ω ∈ B_1 ∩ B_2 ∩ ···). Hence B = ∩_n B_n = ∩_n (∪_{k≥n} A_k).
Now, as B_1 ⊃ B_2 ⊃ ··· is a decreasing sequence, we have
P(B) = P(∩_n B_n) = lim_{n→∞} P(B_n).
But P(B_n) = P(∪_{k≥n} A_k) ≤ Σ_{k≥n} P(A_k) = Σ_{k≥n} p_k, which approaches zero as n approaches ∞
since the series converges. Hence P(B) = lim_{n→∞} P(B_n) = 0, so P(B^c) = 1 − P(B) = 1 as asserted.
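A numerical sanity check on the bound P(B_n) ≤ Σ_{k≥n} p_k (a sketch with arbitrary choices, not from the notes): take p_k = 1/k², whose sum converges, truncate at 10,000 events, and watch both the tail sums and one simulated realization.

```python
import random

random.seed(0)
p = [1 / k**2 for k in range(1, 10_001)]  # p_k = 1/k^2; sum converges (pi^2/6)

# Tail bound: P(B_n) <= sum_{k>=n} p_k, which tends to 0 (roughly 1/n)
tail = lambda n: sum(p[n - 1:])
print(tail(10), tail(100), tail(1000))

# One realization of 10,000 independent events with these probabilities:
# only a handful occur (the expected count is sum 1/k^2, about 1.64)
occurrences = sum(1 for pk in p if random.random() < pk)
print(occurrences)
```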
Theorem. (Second Borel–Cantelli Lemma)
Given a sequence of independent events A_1, A_2, ... with probabilities P(A_k) = p_k, k = 1, 2, ....
Suppose the series Σ_k p_k diverges. Then, with probability 1, infinitely many of the events A_1, A_2, ...
occur.
Proof.
Let B_n = ∪_{k≥n} A_k and B = ∩_n B_n = ∩_n (∪_{k≥n} A_k), so that B occurs iff each B_n occurs (i.e.
infinitely many of A_1, A_2, ... occur). Taking complements yields B_n^c = ∩_{k≥n} A_k^c and B^c = ∪_n B_n^c =
∪_n (∩_{k≥n} A_k^c).
Now B_n^c = ∩_{k≥n} A_k^c ⊂ ∩_{k=n}^{n+m} A_k^c for all m = 0, 1, 2, .... Therefore, by independence,
P(B_n^c) ≤ P(∩_{k=n}^{n+m} A_k^c) = P(A_n^c) P(A_{n+1}^c) ··· P(A_{n+m}^c)
= (1 − p_n)(1 − p_{n+1}) ··· (1 − p_{n+m})
≤ e^{−p_n} e^{−p_{n+1}} ··· e^{−p_{n+m}} = exp(−Σ_{k=n}^{n+m} p_k),
where we used the fact that 1 − x ≤ e^{−x} for x ≥ 0. But Σ_{k=n}^{n+m} p_k approaches ∞ as m approaches
∞, hence
P(B_n^c) ≤ lim_{m→∞} exp(−Σ_{k=n}^{n+m} p_k) = 0, for all n = 1, 2, ....
Finally, P(B^c) ≤ Σ_n P(B_n^c) = 0 implies that P(B) = 1 − P(B^c) = 1.
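By contrast with the convergent case, take p_k = 1/k, whose sum diverges: the second lemma says occurrences keep arriving forever. A finite simulation (a sketch with arbitrary horizons) can only hint at this, but the count of occurrences grows like the harmonic sum ln n:

```python
import random

random.seed(0)

def occurrences_up_to(n):
    """Count how many of the independent events A_1..A_n, with P(A_k) = 1/k, occur."""
    return sum(1 for k in range(1, n + 1) if random.random() < 1 / k)

# Expected count up to n is the harmonic sum 1 + 1/2 + ... + 1/n ~ ln n, which diverges
for n in (100, 10_000, 1_000_000):
    print(n, occurrences_up_to(n))
```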
§3.1 Conditional Probability as a Probability Function.
In terms of the dependency of two events A and B, we have:
(a) A and B are mutually exclusive, i.e. AB = φ and P(AB) = 0. Hence A never occurs if B does.
(b) A has complete dependence on B, i.e. A ⊃ B and P(A) ≥ P(B). Hence A always occurs if B
does.
(c) A and B are independent, i.e. P(AB) = P(A)P(B). Here the occurrence of either event is not
influenced by that of the other.
More generally, we define the conditional probability of A given B (with P (B) > 0) by P (A|B) =
P (AB)/P (B) . Thus in terms of conditional probabilities:
(a) P (A|B) = 0 if A and B are mutually exclusive.
(b) P (A|B) = 1 if A is implied by B .
(c) P (A|B) = P (A) if A and B are independent.
Conditional probabilities satisfy all the properties of ordinary probabilities. As a set function,
P (A|B) satisfies the three axioms while restricting the set of possible outcomes (sample space) to
those that are in B .
Theorem. Suppose B ⊂ Ω is not an impossible event, i.e. P(B) > 0. Then
(a) 0 ≤ P(A|B) ≤ 1.
(b) P(Ω|B) = 1.
(c) If A_1, A_2, ... are mutually exclusive events, then P(∪_k A_k | B) = Σ_k P(A_k | B).
Proof.
(a) We have 0 ≤ P(AB)/P(B) ≤ P(B)/P(B) = 1, since AB ⊂ B.
(b) P(ΩB)/P(B) = P(B)/P(B) = 1.
(c) The events A_1 B, A_2 B, ... are mutually exclusive, hence
P(∪_k A_k B)/P(B) = Σ_k P(A_k B)/P(B) = Σ_k P(A_k | B).
The upshot is that all properties of the ordinary probability function carry over to the conditional
probability function. For example, it easily follows that P(A_1 ∪ A_2 | B) = P(A_1|B) + P(A_2|B) −
P(A_1 A_2|B), and that P(A_1|B) ≤ P(A_2|B) if A_1 ⊂ A_2.
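That P(·|B) is itself a probability function is easy to verify on a small equiprobable space; the sketch below (the events are arbitrary illustrations) checks two of the axioms on a die:

```python
from fractions import Fraction

omega = set(range(1, 7))   # faces of a fair die

def P(A):
    return Fraction(len(A), len(omega))

def P_given(A, B):
    """Conditional probability P(A|B) = P(AB)/P(B), requiring P(B) > 0."""
    return P(A & B) / P(B)

B = {2, 4, 6}              # condition: an even number of spots
A1, A2 = {2}, {4, 6}       # mutually exclusive events

assert P_given(omega, B) == 1                                   # Axiom 2 for P(.|B)
assert P_given(A1 | A2, B) == P_given(A1, B) + P_given(A2, B)   # Axiom 3 for P(.|B)
print(P_given({1, 2}, B))  # 1/3: only the outcome 2 survives conditioning on B
```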
§3.2 Total Probability Formula and Conditioning Argument.
We say that B_1, B_2, ... form a full set of events if they are mutually exclusive and collectively
exhaustive, i.e. one and only one of B_1, B_2, ... always occurs and ∪_k B_k = Ω.
A powerful tool for computing the probability of an event is the conditioning argument:
P(A) = Σ_k P(A|B_k) P(B_k).
The argument is simple. As A = AΩ = ∪_k AB_k, where the events AB_1, AB_2, ... are mutually
exclusive since B_1, B_2, ... form a full set, the total probability formula yields
P(A) = P(∪_k AB_k) = Σ_k P(AB_k) = Σ_k P(A|B_k) P(B_k).
In fact, many well-known formulas in probability theory are direct consequences of this.
Proposition. (Bayes' Formula)
Suppose B_1, B_2, ... form a full set of events. Then
P(B_k|A) = P(B_k A)/P(A) = P(B_k) P(A|B_k) / Σ_j P(B_j) P(A|B_j).
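Both formulas are one-liners in code. The two-urn numbers below are hypothetical, invented for illustration and not from the notes:

```python
def total_probability(priors, likelihoods):
    """Total probability: P(A) = sum_k P(A|B_k) P(B_k) over a full set B_1, B_2, ..."""
    return sum(pa * pb for pa, pb in zip(likelihoods, priors))

def bayes(priors, likelihoods, k):
    """Bayes' formula: P(B_k|A) = P(B_k) P(A|B_k) / sum_j P(B_j) P(A|B_j)."""
    return priors[k] * likelihoods[k] / total_probability(priors, likelihoods)

# Hypothetical setup: urn B_0 or urn B_1 is chosen with probability 1/2 each;
# a white ball is drawn with probability 0.8 from B_0 and 0.4 from B_1.
priors = [0.5, 0.5]
likelihoods = [0.8, 0.4]                       # P(white | B_k)
print(total_probability(priors, likelihoods))  # about 0.6
print(bayes(priors, likelihoods, 0))           # about 2/3: posterior of urn B_0
```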
Example. (The Gambler’s Ruin)
Consider the game of calling “head or tail” on a toss of a fair coin. A correct (incorrect) call wins
(loses) a dollar. Suppose the gambler’s initial capital is k dollars and the game continues until
he either wins an amount of m dollars, stipulated in advance, or else loses all his capital and is
“ruined”. We would like to calculate the probability p(k) that the player will be ruined, starting
with 0 < k < m dollars.
Let B1 be the event that the player wins the first call, and B2 be that he loses the first call.
Hence B1 and B2 form a full set for one play of the game. Now P (B1 ) = P (B2 ) = 1/2 . Also
P (Ruin |B1 ) = p(k + 1) and P (Ruin |B2 ) = p(k − 1) . Hence by conditioning on the first call, we
have
P (Ruin) = P (Ruin |B1 )P (B1 ) + P (Ruin |B2 )P (B2 ) .
That is, p(k) = p(k + 1)/2 + p(k − 1)/2 for 1 ≤ k ≤ m − 1. The solution to this set of difference
equations has the form p(k) = c_1 + c_2 k. Using the boundary conditions p(0) = 1 and p(m) = 0, we
have c_1 = 1 and c_2 = −1/m. Hence p(k) = 1 − k/m, 0 ≤ k ≤ m, is the required probability.
[For general winning probability 0 < p < 1, with q = 1 − p,
p(k) = (1 − (p/q)^{m−k})/(1 − (p/q)^m).
See S.M. Ross, "A First Course in Probability", pp. 62-66.]
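The fair-game answer p(k) = 1 − k/m can be checked by simulating the walk directly (a sketch; the capital, target, and trial count are arbitrary choices):

```python
import random

def ruin_probability(k, m, trials=20_000):
    """Estimate p(k): probability of ruin starting with k dollars, target m, fair coin."""
    ruined = 0
    for _ in range(trials):
        capital = k
        while 0 < capital < m:                # play until ruin or target reached
            capital += 1 if random.random() < 0.5 else -1
        if capital == 0:
            ruined += 1
    return ruined / trials

random.seed(0)
k, m = 3, 10
print(ruin_probability(k, m), 1 - k / m)      # both near 0.7
```

Each play is a simple symmetric random walk absorbed at 0 and m; the expected duration of one play is k(m − k) steps, so the simulation terminates quickly.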