Probability Theory
Robert R. Snapp
January 13, 2009
1 Informal Set Theory
A set is defined as a collection of objects, or elements. Familiar examples are the set of Greek letters, $\{\alpha, \beta, \gamma, \ldots, \omega\}$, the set of binary values, $\mathbb{B} = \{0, 1\}$, the set of integers $\mathbb{Z} = \{\ldots, -1, 0, 1, \ldots\}$, the set of positive integers (or natural numbers) $\mathbb{N} = \mathbb{Z}^+ = \{1, 2, 3, \ldots\}$, the set of negative integers $\mathbb{Z}^- = \{-1, -2, -3, \ldots\}$, and the set of nonnegative integers $\mathbb{Z}_{\geq 0} = \{0, 1, 2, \ldots\}$. Ellipses (the symbol "$\ldots$") can represent either a finite or infinite number of implied elements. We will often use capital Roman letters, such as $A, B, C$, to represent different sets, and lowercase letters ($x, y, z$) to represent elements. When defining sets using the brace notation, e.g., $\{x, y, z\}$, it is important to adhere to the rule that each element should only appear once.
If an element $x$ belongs to a set $A$, we then state that $x$ is a member of $A$. More concisely, we write $x \in A$ (or $A \ni x$). This notation allows us to define the set of rational numbers as
$$\mathbb{Q} = \{m/n : m, n \in \mathbb{Z} \text{ and } n \neq 0\}.$$
(The above notation, $\{x : P(x)\}$, denotes the set of all elements $x$ such that the proposition $P(x)$ is true.) For completeness, recall that the symbol $\mathbb{R}$ denotes the set of real numbers, and $\mathbb{C}$, the set of complex numbers. Thus $\pi \in \mathbb{R}$ and $1 + i \in \mathbb{C}$.
If an element $x$ is not a member of $A$, we write $x \notin A$. For example, $0 \notin \mathbb{N}$.
If every member of $A$ is also a member of $B$, we say that $A$ is a subset of $B$, and write $A \subseteq B$ (or $B \supseteq A$). If $A$ is not a subset of $B$, we write $A \nsubseteq B$.
If every member of $A$ belongs to $B$, and $B$ contains at least one member that does not belong to $A$, we say that $A$ is a proper subset of $B$, and write $A \subset B$ (or $B \supset A$). Thus, $\mathbb{N} \subset \mathbb{Z}$ because all natural numbers are also integers, but not all integers are natural numbers.
Two sets $A$ and $B$ are said to be equal if they contain exactly the same elements, or alternatively if $A \subseteq B$ and $B \subseteq A$. In this case we write $A = B$.
Note that the set relations $\subseteq$, $\subset$, and $=$ are transitive: $A \subseteq B$ and $B \subseteq C$ imply that $A \subseteq C$.
A set without members is called the empty set, and is designated by the special symbol $\emptyset$, or more pictorially by $\{\}$.
On occasion we may find it useful to consider sets that have other sets as members. In such cases a set of sets is often called a class or family, and a set of classes or families, a collection. An example of a class is the power set of $A$, denoted by $\mathcal{P}(A)$, which is defined as the set of all subsets of $A$:
$$\mathcal{P}(A) = \{B : B \subseteq A\}.$$
Thus, for example, $\mathcal{P}(\mathbb{B}) = \{\emptyset, \{0\}, \{1\}, \{0, 1\}\}$.
The cardinality of a set $A$, denoted by $|A|$, is defined to be the number of members it contains. Thus, $|\emptyset| = 0$, $|\mathbb{B}| = 2$, and $|\mathbb{N}| = \infty$.
A set $A$ is said to be finite if $|A| < \infty$. A set is said to be denumerable if its elements correspond one-to-one with the natural numbers.
Exercise 1 Show that $\mathbb{Z}$ and $\mathbb{Q}$ are denumerable sets.
The German mathematician Georg Cantor (1845–1918) created quite a stir in mathematics by showing that it is not possible to construct a one-to-one correspondence between the natural numbers and a bounded interval of real numbers, e.g., the unit interval $[0, 1]$. In fact, there are many more numbers in the unit real interval than there are in the set $\mathbb{N}$.¹ He thus demonstrated that it is incorrect to assume that there is only one infinity, $\infty$. Rather, there is an infinite hierarchy of infinities, $\aleph_0 < \aleph_1 < \aleph_2 < \cdots$, called "aleph null," "aleph one," etc. This result is more than esoteric: it forces us to treat real-valued random variables with special care in order to avoid absurd conclusions.
Cantor showed that the reals are not denumerable using a proof by contradiction. Suppose, for example, that the numbers in the real unit interval can be placed into a one-to-one correspondence with the naturals. In the table below, the naturals appear on the left, the reals in $[0, 1]$ on the right. We represent each number on the right by its decimal expansion, which is non-repeating for the transcendental reals, such as $\pi - 3$. Also note that we don't care about placing the reals in any particular order.
$\mathbb{N}$    $[0, 1] = \{x \in \mathbb{R} : 0 \leq x \leq 1\}$
1    0.01250987362839402938420⋯
2    0.98983749283497383223384⋯
3    0.73397403982739483333049⋯
4    0.14159265358979323846264⋯
5    0.25000000000000000000000⋯
6    0.12345678901234567890123⋯
⋮    ⋮
Now if our assumption is correct, every real number in the unit interval appears somewhere in the right column. However, Cantor discovered that by manipulating the diagonal digits (the $n$-th digit of the $n$-th expansion) we can easily create a number that does not appear in the right column. Let's create a new number $z$ that has a different first digit than the first number, a different second digit than the second, and so on; for example, $z = 0.194617\cdots$. Since this number has a digit that differs from every number in the table, it cannot appear in the right-hand column. (For example, if I were to claim that $z$ actually appears on the right side in the 145,763,804th row, you could rightly say, "that's impossible, because the 145,763,804th digit of $z$ differs from the 145,763,804th digit of the number that appears in that row.") Since this diagonalization trick can be applied to any attempted one-to-one correspondence, the real numbers are not denumerable.
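To make the construction concrete, here is a minimal Python sketch of the diagonal argument applied to a (necessarily finite) truncation of the table above; the helper name `diagonal_missing` is ours, not the author's:

```python
def diagonal_missing(expansions):
    """Return a number in [0,1] whose n-th digit differs from the
    n-th digit of the n-th listed expansion, so it matches no entry.
    (A careful proof also avoids producing trailing runs of 9s.)"""
    digits = []
    for n, row in enumerate(expansions):
        d = int(row[n])                   # n-th digit of the n-th number
        digits.append(str((d + 1) % 10))  # any different digit will do
    return "0." + "".join(digits)

table = ["012509", "989837", "733974", "141592", "250000", "123456"]
print(diagonal_missing(table))  # 0.194617, matching z in the text
```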
An interesting corollary is that the set of infinite sequences of coin tosses is also not denumerable. Note that every real number $x$ in the unit interval can be represented as an infinite dyadic series:
$$x = \frac{b_1}{2} + \frac{b_2}{2^2} + \frac{b_3}{2^3} + \cdots$$
where $b_k \in \{0, 1\}$ for $k = 1, 2, \ldots$, which is equivalent to the binary representation $x = 0.b_1 b_2 b_3 \cdots$. Now simply associate 0 with $T$ and 1 with $H$ to construct a one-to-one correspondence between the reals in $[0, 1]$ and the set of infinite sequences of coin tosses. This has important implications for many stochastic processes.
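As a quick illustration of the correspondence (a sketch that truncates the sequence after finitely many tosses):

```python
def tosses_to_real(tosses):
    """Map an H/T sequence to x = b1/2 + b2/2**2 + ..., with H -> 1, T -> 0."""
    return sum((1 if toss == 'H' else 0) / 2**k
               for k, toss in enumerate(tosses, start=1))

print(tosses_to_real("HTH"))  # 0.625 = 1/2 + 0/4 + 1/8
```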
1.1 Set Operations
Any two sets can be combined to define a new set in a variety of ways. The union of $A$ and $B$, denoted symbolically by $A \cup B$, consists of the members that belong to at least one of the two sets. Thus,
$$A \cup B = \{x : x \in A \text{ or } x \in B\}.$$
For example,
$$\{1, 2, 3, 4\} \cup \{3, 4, 5, 6\} = \{1, 2, 3, 4, 5, 6\}.$$
¹ His method used a powerful diagonalization technique that eventually led to the even greater stir in logic caused by Kurt Gödel.
The intersection of $A$ and $B$, denoted by $A \cap B$, consists of the members that are common to both sets. Thus,
$$A \cap B = \{x : x \in A \text{ and } x \in B\}.$$
Thus, for example,
$$\{1, 2, 3, 4\} \cap \{3, 4, 5, 6\} = \{3, 4\}.$$
Two arbitrary sets $A$ and $B$ are said to be disjoint if $A \cap B = \emptyset$.
The difference between $A$ and $B$, denoted by $A \setminus B$, consists of the members of $A$ that do not belong to $B$. Thus,
$$A \setminus B = \{x : x \in A \text{ and } x \notin B\}.$$
For example,
$$\{1, 2, 3, 4\} \setminus \{3, 4, 5, 6\} = \{1, 2\},$$
and
$$\{3, 4, 5, 6\} \setminus \{1, 2, 3, 4\} = \{5, 6\}.$$
The symmetric difference between $A$ and $B$, denoted by $A \triangle B$, consists of the elements that are unique to each set. Thus,
$$A \triangle B = (A \setminus B) \cup (B \setminus A),$$
whence,
$$\{1, 2, 3, 4\} \triangle \{3, 4, 5, 6\} = \{1, 2, 5, 6\}.$$
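Python's built-in set type implements each of these operations directly, which makes the examples above easy to check:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

print(A | B)  # union: {1, 2, 3, 4, 5, 6}
print(A & B)  # intersection: {3, 4}
print(A - B)  # difference A \ B: {1, 2}
print(B - A)  # difference B \ A: {5, 6}
print(A ^ B)  # symmetric difference: {1, 2, 5, 6}
```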
1.2 Venn Diagrams
[Figure 1: A Venn diagram of two sets, $B \subset A$, inside the universal set $\Omega$.]
[Figure 2: The complements of the previous two sets. The crosshatching evident in the region corresponding to $A^c$ suggests that $B \subseteq A \Rightarrow A^c \subset B^c$.]
The English logician John Venn (1834–1923) introduced the use of two-dimensional diagrams to help visualize abstract set relations and operations (Figs. 1 through 3). Each of these diagrams is constructed in the context of an abstract universal set $\Omega$ that is depicted as the enclosing rectangle. Each subset of $\Omega$ is represented by a set of points within the rectangle, usually by an elliptical region. In order to distinguish one subset from another, their interiors are sometimes shaded distinctly. In Fig. 1, two intersecting subsets of $\Omega$ are shown, one as subset $A$ (shaded with right-handed diagonals²) and the other as subset $B$ (with left-handed diagonals). Because $B$ is a subset of $A$, the region interior to $B$ exhibits both kinds of shading. This suggests the hypothesis $B \subseteq A \Rightarrow A \cap B = B$. However, it should be emphasized that although Venn diagrams provide a useful heuristic, they do not provide a general proof of a theorem, as they can only illustrate one particular instance. Thus proofs of set-theoretic theorems are algebraic in nature.
For instance, to prove the above hypothesis, we show that $B \subseteq A$ implies both of the following statements: (i) $A \cap B \subseteq B$, and (ii) $B \subseteq A \cap B$. To demonstrate (i) we use the definition of set intersection: if $x \in A \cap B$, then $x \in A$ and $x \in B$. This, in turn, by the definition of subset, implies that $A \cap B$ is a subset of $B$ (as well as of $A$). To demonstrate (ii) we begin with the definition of $B \subseteq A$, which implies $x \in B \Rightarrow x \in A$. Since then $x \in A$ and $x \in B$, it follows that $x \in A \cap B$. Consequently, by the definition of subset, $B \subseteq A \cap B$.
Given a particular context, one defines the universal set $\Omega$ appropriately. Thus, when discussing subsets of real numbers, $\Omega = \mathbb{R}$; for subsets of playing cards,
$$\Omega = \{A♣, 2♣, \ldots, K♣,\ A♦, 2♦, \ldots, K♦,\ A♥, 2♥, \ldots, K♥,\ A♠, 2♠, \ldots, K♠\}.$$
Once $\Omega$ has been defined, it is possible to introduce a new set operation, the complement, which is defined by
$$A^c = \{x \in \Omega : x \notin A\}.$$
The complements of sets $A$ and $B$ in Fig. 1 are depicted in Fig. 2. Here, $A^c$ is the region that is exterior to the ellipse that defines $A$, and is shaded with right-handed diagonals. Similarly, $B^c$ is the region exterior to the circle that defines $B$, and is shaded with left-handed diagonals. That the region defining $A^c$ is crosshatched suggests that $A^c \subset B^c$, and thus (perhaps) that $B \subset A \Rightarrow A^c \subset B^c$. (Can you prove this?)
One of the more useful results from set theory is the pair of De Morgan's laws:
$$(A \cap B)^c = A^c \cup B^c, \qquad (A \cup B)^c = A^c \cap B^c.$$
Given a sequence of sets $A_1, A_2, \ldots, A_n, \ldots$ that is possibly infinite, the mutual union and intersection are defined respectively as
$$\bigcup_n A_n = A_1 \cup A_2 \cup \cdots, \quad \text{and} \quad \bigcap_n A_n = A_1 \cap A_2 \cap \cdots.$$
Using induction, one can generalize De Morgan's laws for sequences of sets $A_1, A_2, \ldots$:
$$\left(\bigcup_n A_n\right)^c = \bigcap_n A_n^c, \qquad \left(\bigcap_n A_n\right)^c = \bigcup_n A_n^c.$$
A denumerable family of sets $\{A_1, A_2, \ldots\}$ is said to be mutually disjoint if $A_i \cap A_j = \emptyset$ whenever $i \neq j$. A denumerable family is said to be complete if $\bigcup_n A_n = \Omega$.
1.3 Limit Sets

Given an infinite sequence of sets $A_1, A_2, A_3, \ldots$, we define the limit superior as
$$\limsup_n A_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k,$$
² Here, right-handed diagonals run from the upper right to the lower left; they are the diagonals that are easiest to draw with one's right hand. Conversely, left-handed diagonals run from the upper left to the lower right.
[Figure 3: A Venn diagram of three intersecting sets $A$, $B$, and $C$ inside the universal set $\Omega$. Set $A$ is represented by right diagonals, set $B$ by left diagonals, and set $C$ by horizontal rules.]
and the limit inferior as
$$\liminf_n A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k.$$
Informally, the limit superior contains those elements that occur in an infinite number of the sets, while the limit inferior contains those elements that occur in all but a finite number of the sets.
If
$$A = \limsup_n A_n = \liminf_n A_n,$$
then we say that the sequence $\{A_n\}$ has a limit equal to $A$:
$$\lim_{n \to \infty} A_n = A.$$
1.4 Problems
1.1 Evaluate $(\{1, 2, 3\} \cup \{2, 3, 4\}) \cap (\{1, 2, 3\} \setminus \{3, 4, 5\})$.
1.2 Create a copy of the Venn diagram that appears in Fig. 3, and label each of the 8 homogeneously shaded regions in terms of $A$, $B$, and $C$. For example, the central region that contains all three shadings (left diagonal, right diagonal, and horizontal rules) is $A \cap B \cap C$.
1.3 Show that $(A \cup B) \cap C = (A \cap C) \cup (B \cap C)$.
1.4 Show that $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.
1.5 Show that $A \subset B$ and $B \subset C$ $\Rightarrow$ $A \subset C$ (transitivity).
1.6 Show that $A \triangle B = (A \cup B) \setminus (A \cap B)$.
1.7 Show that $\emptyset \subseteq A$ for every set $A$.
1.8 Show that $(A \cap B) \subseteq (A \cup B)$ for any sets $A$ and $B$.
1.9 Show that $A \subset B \Rightarrow |A| \leq |B|$.
1.10 Show that $(A^c)^c = A$.
1.11 Show that $\Omega^c = \emptyset$ and $\emptyset^c = \Omega$.
1.12 Show that $A \cap \Omega = A$ and $A \cup \emptyset = A$.
1.13 Show that $A \cap (B \setminus (A \cap B)) = \emptyset$.
1.14 The union of two arbitrary sets can be expressed as the union of two disjoint sets, e.g., $A \cup B = A \cup (B \setminus (A \cap B))$ (see the preceding problem). Find an analogous decomposition (or partition) for the union of three arbitrary sets, $A \cup B \cup C$.
1.15 Show that $\liminf_n A_n \subseteq \limsup_n A_n$.
2 Counting
(William Feller’s Introduction to Probability Theory and Its Applications, Vol. I [1] provides a more detailed explanation of the following material.)
2.1 Permutations and Combinations
The number of permutations of $n$ distinct objects equals $n$-factorial: $n! = n(n-1)(n-2) \cdots 3 \cdot 2 \cdot 1$. (Recall that $0! = 1$.) For example, there are six different permutations of the Greek letters $\alpha, \beta, \gamma$:
$$\alpha\beta\gamma \quad \alpha\gamma\beta \quad \beta\gamma\alpha \quad \beta\alpha\gamma \quad \gamma\alpha\beta \quad \gamma\beta\alpha$$
Note that there are 3 different ways to choose the first letter, but once it has been chosen only 2 choices remain for the second letter, and once those two are chosen, only 1 option remains for the third and final letter. Thus the number of permutations is $3 \cdot 2 \cdot 1 = 6$. This multiplication principle can be extended to solve a multitude of combinatorial problems.
Exercise 2 In how many different ways can a standard deck of 52 playing cards be ordered? For problems of this sort, Stirling's formula, $n! \approx n^n e^{-n} \sqrt{2\pi n}$, is quite useful.
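A sketch of the comparison in Python: $52!$ has 68 digits, and Stirling's formula is off by only about 0.16% here.

```python
import math

exact = math.factorial(52)
stirling = 52**52 * math.exp(-52) * math.sqrt(2 * math.pi * 52)
print(f"{exact:.6e}")     # 8.065818e+67
print(f"{stirling:.6e}")  # 8.052911e+67
```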
Now consider the enumeration of permutations with repeated elements; e.g., the sequence $\alpha\alpha\alpha\beta\beta$ can be reordered in only ten different ways:
$$\alpha\alpha\alpha\beta\beta \quad \alpha\alpha\beta\alpha\beta \quad \alpha\beta\alpha\alpha\beta \quad \beta\alpha\alpha\alpha\beta \quad \alpha\alpha\beta\beta\alpha \quad \alpha\beta\alpha\beta\alpha \quad \beta\alpha\alpha\beta\alpha \quad \alpha\beta\beta\alpha\alpha \quad \beta\alpha\beta\alpha\alpha \quad \beta\beta\alpha\alpha\alpha$$
The previous formula, $5! = 120$, grossly overcounts the number of distinct sequences by a factor of $3! \cdot 2!$: the number of (identical) permutations of the 3 $\alpha$s, multiplied by the number of (identical) permutations of the 2 $\beta$s. Thus the number of distinct permutations of $\alpha\alpha\alpha\beta\beta$ is $5!/(3!\,2!) = 10$, in agreement with the constructive enumeration above.
More generally, the number of permutations of $N_\alpha$ $\alpha$s, $N_\beta$ $\beta$s, $\ldots$, $N_\omega$ $\omega$s is
$$\frac{(N_\alpha + N_\beta + \cdots + N_\omega)!}{N_\alpha!\, N_\beta! \cdots N_\omega!}.$$
This ratio arises from the observation that $(N_\alpha + N_\beta + \cdots + N_\omega)!$ is a gross overcount, as it includes all duplicate character sequences. Note that the denominator expresses the number of duplicates of each sequence, as there are $N_\alpha!$ indistinguishable orderings of the $N_\alpha$ $\alpha$s, $N_\beta!$ indistinguishable orderings of the $N_\beta$ $\beta$s, etc. Thus, the above ratio, also known as a multinomial coefficient,
$$\binom{N_\alpha + N_\beta + \cdots + N_\omega}{N_\alpha\ N_\beta\ \cdots\ N_\omega} = \frac{(N_\alpha + N_\beta + \cdots + N_\omega)!}{N_\alpha!\, N_\beta! \cdots N_\omega!},$$
yields the correct answer.
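In Python, the multinomial coefficient can be computed directly from its definition (the helper name `multinomial` is ours):

```python
from math import factorial

def multinomial(*counts):
    """Distinct orderings of a word with the given letter multiplicities."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

print(multinomial(3, 2))     # 10, the orderings of aaabb counted above
print(multinomial(2, 2, 1))  # 30, e.g., the orderings of aabbc
```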
For sequences having only two distinct symbols (e.g., $\alpha\alpha\alpha\beta\beta$), the multinomial coefficient reduces to the more familiar binomial coefficient:
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n(n-1)\cdots(n-k+1)}{k!}.$$
Sequences with two symbols can also be used to represent subset inclusion. For example, suppose we would like to count the number of possible five-card poker hands. In the diagram below, an $\alpha$ is placed immediately below each particular card that is included in the poker hand A♣, 2♣, 3♣, 4♣, 5♣. A $\beta$ is placed immediately below each of the remaining 47 cards not included in the hand. Clearly there is a one-to-one correspondence between the set of five-card poker hands and the set of 52-character permutations consisting of 5 $\alpha$s and 47 $\beta$s. Thus the number of poker hands is
$$\binom{52}{5} = 2{,}598{,}960.$$
More generally, the number of ways of selecting subsets of size $k$ from a universal set of size $n$ (or the number of combinations) is
$${}^{n}C_k = \binom{n}{k}.$$
[Diagram: the 52 cards A♣ 2♣ … K♣ A♦ … K♦ A♥ … K♥ A♠ … K♠ listed in order, with an $\alpha$ printed below each of the five cards A♣–5♣ and a $\beta$ below each of the other 47 cards.]
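Python's standard library exposes the binomial coefficient directly, so the poker-hand count is a one-liner:

```python
import math

print(math.comb(52, 5))  # 2598960 five-card poker hands
```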
2.2 Occupancy Problems
Many common problems in discrete probability can be reduced to occupancy problems. Consider the problem of placing $n$ labeled tokens into $k$ labeled bins. In how many ways can this be done? Since each token can be assigned to a bin in $k$ distinguishable ways, independent of the rest, the multiplication principle yields $k^n$ arrangements.
Example 1 What is the cardinality of the set of $n$-bit binary numbers? Here each of the $n$ bits plays the role of a token, and the possible values, 0 or 1, assume the role of bins. Thus $k = 2$, and so the answer is $2^n$. Likewise, there are $2^n$ different possible outcomes from a sequence of $n$ tosses of a single coin.
A slightly more difficult situation occurs in the event that the $n$ tokens are indistinguishable from one another. For example, let $k = 6$ and $n = 8$, let vertical bars denote the dividers between the bins, and let disks ($*$) denote the tokens. One possible arrangement is
$$|\,{*}\,|\,{*}{*}\,|\,{*}{*}{*}{*}\,|\,{*}\,|\,|\,|$$
Note that $k + 1 = 7$ dividers are used to delineate the $k = 6$ bins. Moreover, the two outer dividers are not really necessary, since no token can be placed to the left of the first divider, nor to the right of the last one. Thus, this configuration can be abbreviated using the $n = 8$ disks and $k - 1 = 5$ dividers:
$${*}\,|\,{*}{*}\,|\,{*}{*}{*}{*}\,|\,{*}\,|\,|$$
Since there is a one-to-one correspondence between the set of these configurations and distinguishable strings of $n$ $\alpha$s and $k - 1$ $\beta$s, the number of possible arrangements is
$$\binom{n+k-1}{n} = \binom{n+k-1}{k-1} = \frac{(n+k-1)!}{n!\,(k-1)!}.$$
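A short Python sketch of this "stars and bars" count (the helper name `occupancy` is ours):

```python
import math

def occupancy(n, k):
    """Ways to place n indistinguishable tokens into k labeled bins:
    C(n + k - 1, k - 1), by the stars-and-bars correspondence."""
    return math.comb(n + k - 1, k - 1)

print(occupancy(8, 6))  # 1287 arrangements for the example above
```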
2.3 Problems
2.1 How many different five-card hands of poker include four cards of the same rank?
2.2 How many five-card hands of poker include both three of a kind (three cards of the same rank) and a pair (two cards of the same rank)? This hand is called a full house.
2.3 How many five-card hands of poker include five cards in the same suit that do not have consecutive ranks? This hand is called a flush.
2.4 How many five-card hands of poker include five cards of consecutive rank, representing two or more suits? This hand is called a straight.
2.5 How many five-card hands of poker include only a pair of cards of the same rank?
2.6 In the game of blackjack, each player is initially dealt two cards from a standard deck of 52 cards. (We will ignore for the time being that casinos typically use several decks of cards to confound card counters.) If one of the cards dealt is an Ace, and the other a face card (J, Q, K) or a ten, then the player declares "blackjack" and wins instantly. How many different combinations of two cards yield blackjack?
2.7 In how many different ways can two rooks be placed on a chessboard so that they can take one another? [1]
2.8 If $n$ tokens are placed randomly into $n$ bins, find the probability that exactly one bin remains empty. [1]
2.9 Three dice are thrown twice. What is the probability that the first throw differs from the second if the dice are distinguishable (e.g., one die might be red, another black, and the third one white)? Repeat this problem for the case that the dice are indistinguishable (so that a 1–3–5 now matches the permutations 1–5–3, 3–5–1, etc.). [1]
2.10 In how many distinguishable ways can $n$ red tokens and $m$ black tokens be placed into $k$ bins?
2.11 Prove the identity
$$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}.$$
2.12 (a) Each year Bob orders exactly 10 boxes of cookies from Alice, the girl scout who lives next door. If 8 cookie
varieties are currently available, in how many different ways can Bob place his order?
(b) Each year Maria orders a number of boxes that ranges anywhere from 0 to 10. In how many different ways
can she place her cookie order from Alice?
3 Elementary Probability Theory
The development of the theory of probability is a fascinating historical subject. [4]
3.1 Kolmogorov's Axioms of Probability
A. N. Kolmogorov (1933) introduced a probability space [3] as a triple $(\Omega, \mathcal{A}, P)$:
1. $\Omega$ is the set of elementary events, or sample space.
2. Let $A \subseteq \Omega$; $A$ is called an event. $\mathcal{A}$ then denotes the set of observable events. (Technically, $\mathcal{A}$ is a class. Nevertheless it is usually referred to as a set.) In order to achieve self-consistency, $\mathcal{A}$ must also be a so-called $\sigma$-algebra of events:
* If $A \in \mathcal{A}$, then $A^c \in \mathcal{A}$.
* If $A_i \in \mathcal{A}$, for $i = 1, 2, \ldots$ (finite or countably infinite), then $\bigcup_i A_i \in \mathcal{A}$.
* $\Omega \in \mathcal{A}$.
3. $P$ is a probability measure that assigns a non-negative real number to each element of $\mathcal{A}$, such that
* $P(\Omega) = 1$, and
* if $A_1, A_2, \ldots$ (with $A_i \in \mathcal{A}$, for $i = 1, 2, \ldots$) denotes a sequence of mutually exclusive events, then
$$P\left(\bigcup_i A_i\right) = \sum_i P(A_i) \qquad \text{(complete additivity)}$$
Example 2 (Three consecutive coin tosses) The set of elementary events is finite:
$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$$
The set of measurable events $\mathcal{A}$ has cardinality $|\mathcal{A}| = 2^{|\Omega|} = 256$. Some elements of $\mathcal{A}$ include:
* Each elementary event: $\{HHH\}, \{HHT\}, \{HTH\}, \{HTT\}, \ldots$
* Sequences with two or more heads: $\{HHH, HHT, HTH, THH\}$
* Sequences with one or fewer heads: $\{HTT, THT, TTH, TTT\}$
* Sequences with exactly one head: $\{HTT, THT, TTH\}$
* Palindromic sequences: $\{HHH, HTH, TTT, THT\}$
* Non-palindromic sequences: $\{HHT, HTT, THH, TTH\}$
* First toss lands heads: $\{HHH, HTH, HHT, HTT\}$
* First toss lands tails: $\{THH, THT, TTH, TTT\}$
* Homogeneous sequences: $\{HHH, TTT\}$
* Heterogeneous sequences: $\{HHT, HTH, HTT, THH, THT, TTH\}$
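The counts in this example are small enough to enumerate exhaustively; a quick Python check:

```python
from itertools import combinations, product

omega = ["".join(seq) for seq in product("HT", repeat=3)]
events = [set(s) for r in range(len(omega) + 1)
          for s in combinations(omega, r)]
print(len(omega), len(events))  # 8 256

two_or_more_heads = {w for w in omega if w.count("H") >= 2}
print(sorted(two_or_more_heads))  # ['HHH', 'HHT', 'HTH', 'THH']
```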
3.2 Some Useful Theorems
For a $\sigma$-algebra $\mathcal{A}$:
1. If $A_i \in \mathcal{A}$, for $i = 1, 2, \ldots, n$, then $\bigcap_{i=1}^{n} A_i \in \mathcal{A}$.
Proof: By De Morgan's law,
$$\bigcap_{i=1}^{n} A_i = \left(\bigcup_{i=1}^{n} A_i^c\right)^c.$$
By the definition of a $\sigma$-algebra, the right side of the above is in $\mathcal{A}$. ∎
2. If $A, B \in \mathcal{A}$, then $A \setminus B \in \mathcal{A}$.
Proof: $A \setminus B = A \cap B^c$. From the theorem above, $A \cap B^c \in \mathcal{A}$. ∎
Given a probability space $(\Omega, \mathcal{A}, P)$, with $A, B, C \in \mathcal{A}$:
3. $P(A^c) = 1 - P(A)$.
4. $P(\emptyset) = 0$.
5. $P(A) = 0$ does not imply $A = \emptyset$.
6. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
7. $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$.
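For a finite sample space with equally likely outcomes, probabilities reduce to counts, so Theorem 7 can be sanity-checked with Python sets (an illustrative check, not a proof):

```python
A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

lhs = len(A | B | C)
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(A & C) - len(B & C)
       + len(A & B & C))
print(lhs, rhs)  # 5 5
```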
4 Conditional Probability
Given a probability space $(\Omega, \mathcal{A}, P)$, with $A, B \in \mathcal{A}$ such that $P(B) > 0$, the conditional probability of $A$ with respect to $B$ is defined as
$$P(A \mid B) \stackrel{\text{def}}{=} \frac{P(A \cap B)}{P(B)}.$$
Theorems
8. For $A_i \in \mathcal{A}$, $i = 1, 2, \ldots, n$:
$$P\left(\bigcap_{i=1}^{n} A_i\right) = P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1 \cap A_2) \cdots P\left(A_n \,\Big|\, \bigcap_{i=1}^{n-1} A_i\right).$$
9. $P(\Omega \mid A_i) = 1$.
10. If $A_i \in \mathcal{A}$, for $i = 1, 2, \ldots$ (finite or countably infinite), are mutually exclusive ($A_i \cap A_j = \emptyset$ whenever $i \neq j$), and if $B \in \mathcal{A}$ with $P(B) > 0$, then
$$P\left(\bigcup_i A_i \,\Big|\, B\right) = \sum_i P(A_i \mid B).$$
4.1 Bayes's Theorem
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
Proof: From the definition of conditional probability (or Theorem 8 with $n = 2$),
$$P(A \cap B) = P(A \mid B)\, P(B) = P(B \mid A)\, P(A) \quad \text{(by symmetry)}.$$
Equating the two right-hand sides above yields the result. ∎
Theorem (Total Probability): Let $A_i \in \mathcal{A}$, for $i = 1, 2, \ldots, n$, be mutually exclusive, with $\bigcup_{i=1}^{n} A_i = \Omega$. Let $B \in \mathcal{A}$. Then
$$P(B) = \sum_{i=1}^{n} P(B \mid A_i)\, P(A_i).$$
Proof: $B = \bigcup_{i=1}^{n} (B \cap A_i)$. Note that $(B \cap A_i) \cap (B \cap A_j) = \emptyset$ whenever $i \neq j$. ∎
Combining the above yields the more general theorem of Bayes:
$$P(A_j \mid B) = \frac{P(B \mid A_j)\, P(A_j)}{\sum_{i=1}^{n} P(B \mid A_i)\, P(A_i)}.$$
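A minimal Python sketch of this general form; the function name and the diagnostic-test numbers are ours, chosen only for illustration:

```python
def bayes(prior, likelihood):
    """Posterior over hypotheses A_j given evidence B.
    prior[j] = P(A_j); likelihood[j] = P(B | A_j)."""
    total = sum(p * l for p, l in zip(prior, likelihood))  # P(B), by total probability
    return [p * l / total for p, l in zip(prior, likelihood)]

# Hypothetical test: 99% sensitive, 5% false-positive rate, 1% prevalence.
print(bayes([0.01, 0.99], [0.99, 0.05]))  # P(condition | positive) ~= 0.1667
```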
5 Statistical Independence
Two events $A, B \in \mathcal{A}$ are said to be independent if $P(A \cap B) = P(A)\, P(B)$.
Exercise: If $A$ and $B$ are independent, show that
$$P(A^c \cap B) = P(A^c)\, P(B), \qquad P(A \cap B^c) = P(A)\, P(B^c), \qquad P(A^c \cap B^c) = P(A^c)\, P(B^c).$$
More generally, $n$ events $A_i \in \mathcal{A}$, for $i = 1, 2, \ldots, n$, are said to be mutually independent if all of the following equations are satisfied, for $m = 1, 2, 3, \ldots, n$:
$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_m}) = P(A_{i_1})\, P(A_{i_2}) \cdots P(A_{i_m}),$$
with $1 \leq i_1 < i_2 < \cdots < i_m \leq n$.
Example 3 With $n = 3$, the events $A$, $B$, and $C$ are independent if and only if
$$P(A \cap B) = P(A)\, P(B), \qquad P(A \cap C) = P(A)\, P(C), \qquad P(B \cap C) = P(B)\, P(C),$$
and
$$P(A \cap B \cap C) = P(A)\, P(B)\, P(C).$$
Example 4 Let $A_1, A_2, \ldots, A_n$ denote $n$ mutually independent events, and let $p_i = P(A_i)$ for $i = 1, \ldots, n$. Then the probability that none of the events occurs is
$$P(A_1^c \cap A_2^c \cap \cdots \cap A_n^c) = P\big((A_1 \cup A_2 \cup \cdots \cup A_n)^c\big) = 1 - P(A_1 \cup A_2 \cup \cdots \cup A_n)$$
$$= 1 - \sum_{i_1=1}^{n} P(A_{i_1}) + \sum_{i_1=1}^{n} \sum_{i_2=i_1+1}^{n} P(A_{i_1} \cap A_{i_2}) - \sum_{i_1=1}^{n} \sum_{i_2=i_1+1}^{n} \sum_{i_3=i_2+1}^{n} P(A_{i_1} \cap A_{i_2} \cap A_{i_3}) + \cdots + (-1)^n P(A_1 \cap A_2 \cap \cdots \cap A_n)$$
$$= 1 - \sum_{i_1=1}^{n} p_{i_1} + \sum_{i_1=1}^{n} \sum_{i_2=i_1+1}^{n} p_{i_1} p_{i_2} - \sum_{i_1=1}^{n} \sum_{i_2=i_1+1}^{n} \sum_{i_3=i_2+1}^{n} p_{i_1} p_{i_2} p_{i_3} + \cdots + (-1)^n p_1 p_2 p_3 \cdots p_n$$
$$= (1 - p_1)(1 - p_2)(1 - p_3) \cdots (1 - p_n) = P(A_1^c)\, P(A_2^c)\, P(A_3^c) \cdots P(A_n^c).$$
5.1 Birthday Paradox
Let $A_n$ denote the event that two or more people in a group of $n$ share the same birthday (neglecting leap years). Then
$$P(A_n^c) = 1 \cdot \left(1 - \frac{1}{365}\right) \left(1 - \frac{2}{365}\right) \cdots \left(1 - \frac{n-1}{365}\right).$$
Thus,
$$P(A_n) = 1 - P(A_n^c) = 1 - 1 \cdot \left(1 - \frac{1}{365}\right) \left(1 - \frac{2}{365}\right) \cdots \left(1 - \frac{n-1}{365}\right).$$
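The product above is easy to evaluate numerically; a small Python sketch (the function name is ours):

```python
def p_shared_birthday(n):
    """P(A_n): probability that at least two of n people share a birthday."""
    p_distinct = 1.0
    for k in range(1, n):
        p_distinct *= 1 - k / 365
    return 1 - p_distinct

print(round(p_shared_birthday(23), 6))  # 0.507297: better than even at n = 23
```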
The table below lists $P(A_n)$ for several group sizes:

n     $P(A_n)$
10    0.116948
20    0.411438
30    0.706316
40    0.891232
50    0.970374
60    0.994123
70    0.999160
80    0.999914

[Plot: $P(A_n)$ versus $n$, rising from 0 at $n = 0$ toward 1 near $n = 80$.]

6 Bernoulli Trials
Consider the example of a coin being tossed $n$ times. Assume successive tosses are mutually independent, and let $p = P(H)$ and $q = 1 - p = P(T)$. (Note, $|\Omega| = 2^n$, and $|\mathcal{A}| = 2^{2^n}$.) Then
$$b(i, n) = P(i \text{ heads in a sequence of } n \text{ tosses}) = \binom{n}{i} p^i q^{n-i}.$$
Note,
$$P(\Omega) = \sum_{i=0}^{n} b(i, n) = \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i} = (p + q)^n = 1.$$
Example 5
$$P(\text{1 or more heads in } n \text{ tosses}) = 1 - b(0, n) = 1 - (1 - p)^n.$$
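A short Python check of these identities (the helper name `b` matches the text's notation):

```python
import math

def b(i, n, p):
    """Probability of exactly i heads in n independent tosses."""
    return math.comb(n, i) * p**i * (1 - p)**(n - i)

n, p = 10, 0.5
print(sum(b(i, n, p) for i in range(n + 1)))  # 1.0, i.e., P(Omega) = 1
print(1 - b(0, n, p))                         # P(one or more heads) = 0.9990...
```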
7 Discrete Random Variables
An integer-valued random variable is a function $X : \Omega \to S$, where $S \subseteq \mathbb{Z}$. (This definition is readily generalized to random variables that assume binary values, categorical values, and values from other discrete spaces.)
The discrete random variable $X$ is said to be measurable if
$$\{\omega \in \Omega : X(\omega) = i\} \in \mathcal{A}, \quad \forall i \in S.$$
The probability distribution of a discrete random variable $X \in S$ is defined by
$$P_i \stackrel{\text{def}}{=} P\{X = i\} = P\{\omega \in \Omega : X(\omega) = i\}.$$
7.1 Common Discrete Distributions

* Uniform: parameter $n \in \mathbb{Z}^+$; $P_i = \frac{1}{n}$; $S = \{1, 2, \ldots, n\}$.
* Binomial: parameters $n \in \mathbb{Z}^+$ and $p, q \geq 0$ with $p + q = 1$; $P_i = \binom{n}{i} p^i q^{n-i}$; $S = \{0, 1, \ldots, n\}$.
* Geometric: parameters $p, q \geq 0$ with $p + q = 1$; $P_i = q\,p^i$; $S = \mathbb{Z}_{\geq 0}$.
* Poisson: parameter $\lambda \in (0, +\infty)$; $P_i = e^{-\lambda} \frac{\lambda^i}{i!}$; $S = \mathbb{Z}_{\geq 0}$.
* Hypergeometric: parameters $k, m, n \in \mathbb{Z}^+$ with $k \leq m + n$; $P_i = \binom{m}{i}\binom{n}{k-i}\Big/\binom{m+n}{k}$; $S = \{0, 1, \ldots, m\}$.

7.2 Distributions of Several Discrete Random Variables
Let $X_i : \Omega \to S_i \subseteq \mathbb{Z}$ for $i = 1, 2, \ldots, n$. These random variables are measurable if $\forall x_1 \in S_1, \ldots, \forall x_n \in S_n$,
$$\{\omega \in \Omega : X_1(\omega) = x_1, \ldots, X_n(\omega) = x_n\} \in \mathcal{A}.$$
We define the joint discrete probability distribution as
$$P(x_1, x_2, \ldots, x_n) \stackrel{\text{def}}{=} P\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} = P\{\omega \in \Omega : X_1(\omega) = x_1, \ldots, X_n(\omega) = x_n\}.$$
The random variables $X_1, \ldots, X_n$ are said to be independent if the probability distribution factors, as
$$P(x_1, x_2, \ldots, x_n) = P\{X_1 = x_1\}\, P\{X_2 = x_2\} \cdots P\{X_n = x_n\} = P_{X_1}(x_1)\, P_{X_2}(x_2) \cdots P_{X_n}(x_n).$$
(Often the subscripts are dropped in the last expression, if no ambiguity arises.)
8 Continuous Random Variables
A real-valued random variable is a function $X : \Omega \to S \subseteq \mathbb{R}$. A real-valued random variable is measurable if
$$\{\omega \in \Omega : X(\omega) < x\} \in \mathcal{A}, \quad \forall x \in S.$$
We define the probability distribution of the real-valued random variable $X$ as
$$F_X(x) \stackrel{\text{def}}{=} P\{X < x\} = P\{\omega \in \Omega : X(\omega) < x\},$$
where the subscript of $F_X$ is often omitted if no confusion arises.
Likewise, we define the probability density of $X \in S$ as
$$f_X(x) = \frac{d}{dx} F_X(x)$$
at all points $x \in S$ where the derivative is defined.
8.1 Common Continuous Probability Densities

* Rectangular: parameters $a, b \in \mathbb{R}$ with $a < b$; $f_X(x) = \frac{1}{b-a}$; $S = [a, b]$.
* Triangular: parameter $a > 0$; $f_X(x) = \frac{1}{a}\left(1 - \frac{|x|}{a}\right)$; $S = [-a, a]$.
* Normal: parameters $\mu \in \mathbb{R}$, $\sigma \in (0, +\infty)$; $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$; $S = \mathbb{R}$.
* Gamma (or exponential if $\lambda = 1$): parameters $\lambda, s \in (0, +\infty)$; $f_X(x) = \left(\frac{x}{s}\right)^{\lambda - 1} \frac{e^{-x/s}}{s\,\Gamma(\lambda)}$; $S = [0, \infty)$.
* Cauchy: parameter $s \in (0, +\infty)$; $f_X(x) = \frac{1}{\pi}\,\frac{s}{x^2 + s^2}$; $S = \mathbb{R}$.
* Beta: parameters $a, b \in (0, +\infty)$; $f_X(x) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, x^{a-1} (1 - x)^{b-1}$; $S = [0, 1]$.

Here
$$\Gamma(u) \stackrel{\text{def}}{=} \int_0^{\infty} t^{u-1} e^{-t}\, dt, \qquad \Gamma(u + 1) = u\,\Gamma(u), \qquad \Gamma(n + 1) = n!$$

9 Statistical Moments of Random Variables
Moment formulas for discrete and continuous random variables (the continuous cases, marked †, are discussed below):

* Mean: $E(X) = \sum_{i \in S} i\, P\{X = i\}$ (discrete); $E(X) = \int_S x\, f_X(x)\, dx$ (continuous†).
* $k$-th moment: $E(X^k) = \sum_{i \in S} i^k\, P\{X = i\}$ (discrete); $E(X^k) = \int_S x^k f_X(x)\, dx$ (continuous†).
* Variance: $\mathrm{Var}(X) \stackrel{\text{def}}{=} E\big((X - E(X))^2\big) = E(X^2) - E(X)^2$.

In general,
$$E(aX + bY) = a\,E(X) + b\,E(Y), \qquad \mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X).$$
† If the density $f_X(x)$ is undefined, these integrals can be evaluated as Stieltjes integrals, i.e.,
$$E(g(X)) \stackrel{\text{def}}{=} \int_S g(x)\, dF_X(x) = \lim_{\delta \to 0} \sum_{i=-\infty}^{\infty} \left(\sup_{i\delta < x \leq (i+1)\delta} g(x)\right) \big(F_X((i+1)\delta) - F_X(i\delta)\big),$$
where we assume the same value is obtained if sup is replaced by inf.
Example 6 (Indicator Functions) Let $A \subseteq \mathbb{R}$ denote a measurable event for a real-valued random variable $X$. The indicator function of event $A$ is defined by
$$I_A(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{otherwise.} \end{cases}$$
Then,
$$E\{I_A(X)\} = \int_{\mathbb{R}} I_A(x)\, dF_X(x) = \int_A dF_X(x) = P\{A\}.$$
Example 7 (Moments of a Binomial Random Variable) Let $X \sim \mathrm{Binomial}(n, p)$, i.e.,
$$P\{X = i\} = b(i, n) = \binom{n}{i} p^i q^{n-i}$$
for $i = 0, 1, \ldots, n$, where $q = 1 - p$.
To compute the mean:
$$E(X) = \sum_{i=0}^{n} i \cdot P\{X = i\} = \sum_{i=0}^{n} i \binom{n}{i} p^i q^{n-i} = p\, \frac{\partial}{\partial p} \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i} = p\, \frac{\partial}{\partial p} (p + q)^n = np(p + q)^{n-1} = np,$$
where the last equality uses $p + q = 1$.
To compute the variance:
$$\mathrm{Var}(X) = E(X^2) - E(X)^2 = E(X(X - 1)) + E(X) - E(X)^2,$$
$$E(X(X - 1)) = \sum_{i=0}^{n} i(i - 1) \binom{n}{i} p^i q^{n-i} = p^2\, \frac{\partial^2}{\partial p^2} \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i} = p^2\, \frac{\partial^2}{\partial p^2} (p + q)^n = n(n - 1)p^2 (p + q)^{n-2} = n(n - 1)p^2,$$
$$\mathrm{Var}(X) = n(n - 1)p^2 + np - (np)^2 = np(1 - p) = npq.$$
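A quick numeric check of both moments against the pmf (our example values $n = 10$, $p = 0.3$):

```python
import math

n, p = 10, 0.3
pmf = [math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
mean = sum(i * pr for i, pr in enumerate(pmf))
var = sum(i * i * pr for i, pr in enumerate(pmf)) - mean**2
print(round(mean, 6), round(n * p, 6))            # 3.0 3.0
print(round(var, 6), round(n * p * (1 - p), 6))   # 2.1 2.1
```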
9.1 Generating Functions for Discrete Random Variables
If $X$ is a discrete random variable, defined for example on the integers, we define its generating function $g : \mathbb{R} \to \mathbb{R}$ according to the expression
$$g_X(s) = E\left[s^X\right] = \sum_{k=-\infty}^{+\infty} s^k\, P\{X = k\}.$$
Note that $g_X(1) = 1$, as $\sum_k P\{X = k\} = 1$. Likewise, we see that
$$g_X'(1) = \sum_k k\, P\{X = k\} = E(X),$$
$$g_X''(1) = \sum_k k(k - 1)\, P\{X = k\} = E[X(X - 1)] = E[X^2] - E[X].$$
Thus, $\mathrm{Var}[X] = g_X''(1) + g_X'(1) - \big(g_X'(1)\big)^2$.
Generating functions are especially useful for computing moments of the sum of a random number of random variables. (This is an example of what is known as a compound distribution.) For example, let $S = X_1 + X_2 + \cdots + X_N$, where the $X_i \in \mathbb{Z}$ are mutually independent and identically distributed (i.i.d.) by $p_k = P\{X_i = k\}$, and $N \in \mathbb{Z}_{\geq 0}$ is independently chosen according to the distribution $q_n = P\{N = n\}$. Letting $g_X(s) = \sum_k p_k s^k$ and $g_N(s) = \sum_{n=0}^{\infty} q_n s^n$, by definition,
$$g_S(s) = E\left[s^S\right] = E\left[s^{X_1 + \cdots + X_N}\right] = E\left[E\left[s^{X_1 + \cdots + X_N} \mid N\right]\right] = \sum_{n=0}^{\infty} E\left[s^{X_1 + \cdots + X_n} \mid N = n\right] q_n$$
$$= \sum_{n=0}^{\infty} E\left[s^{X_1 + \cdots + X_n}\right] q_n = \sum_{n=0}^{\infty} E\left[s^{X_1}\right] \cdots E\left[s^{X_n}\right] q_n = \sum_{n=0}^{\infty} \big(g_X(s)\big)^n q_n = g_N\big(g_X(s)\big).$$
By the chain rule of calculus, $E[S] = g_S'(1) = g_N'\big(g_X(1)\big)\, g_X'(1) = g_N'(1)\, g_X'(1) = E[N]\, E[X]$.
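As a numeric sanity check of $E[S] = E[N]\,E[X]$ (a sketch with hypothetical distributions, not taken from the text):

```python
import random

random.seed(0)

def sample_S():
    """One draw of S = X_1 + ... + X_N, with N uniform on {0,...,5}
    (E[N] = 2.5) and each X_i uniform on {1,2,3} (E[X] = 2.0)."""
    n = random.randint(0, 5)
    return sum(random.randint(1, 3) for _ in range(n))

trials = 100_000
print(sum(sample_S() for _ in range(trials)) / trials)  # close to 2.5 * 2.0 = 5.0
```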
9.2 Characteristic Functions for Continuous Random Variables
If $X$ is a random variable, defined for example on the reals, we define its characteristic function as
$$\phi_X(t) \stackrel{\text{def}}{=} E\left[e^{itX}\right] = \int_{-\infty}^{+\infty} e^{itx}\, dF(x),$$
where $i = \sqrt{-1}$ and $t \in \mathbb{R}$. In the event that $X$ is a continuous random variable with probability density $f_X(x)$, the above reduces to
$$\phi_X(t) = \int_{-\infty}^{+\infty} e^{itx} f_X(x)\, dx.$$
9.3 Problems
9.1 A coin that lands heads with probability $p$ is tossed repeatedly until it lands tails. Compute the mean and variance of the number of times the coin is tossed.
9.2 Show that the generating function for a binomially distributed random variable $B \sim \mathrm{Binomial}(n, p)$ is given by $g_B(s) = (1 + (s - 1)p)^n$. Use $g_B$ to compute the mean and variance of $B$.
9.3 Show that the generating function for a Poisson random variable $X \sim \mathrm{Poisson}(\lambda)$ is given by $g_X(s) = e^{(s-1)\lambda}$. Use $g_X$ to compute the mean and variance of $X$.
9.4 A random variable $U$ assumes values from the discrete set $\{0, 1, 2, \ldots, n-1\}$ with equal probability, $1/n$. Show that its generating function is given by
$$g_U(s) = \frac{1 - s^n}{n(1 - s)}.$$
Use $g_U$ to compute the mean and variance of $U$.
10 Multiple Random Variables
Sometimes problems will involve more than one random quantity, e.g.,
$$X_i : \Omega \to S_i \subseteq \mathbb{R}, \quad \text{for } i = 1, 2, \ldots, n.$$
These variables are measurable if $\forall x_1 \in S_1, \ldots, \forall x_n \in S_n$,
$$\{\omega \in \Omega : X_1(\omega) < x_1, \ldots, X_n(\omega) < x_n\} \in \mathcal{A}.$$
We define the joint probability distribution as
$$F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) \stackrel{\text{def}}{=} P\{X_1 < x_1, \ldots, X_n < x_n\} = P\{\omega \in \Omega : X_1(\omega) < x_1, \ldots, X_n(\omega) < x_n\}.$$
The joint probability density is defined as
$$f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{X_1,\ldots,X_n}(x_1, \ldots, x_n).$$
The random variables $X_1, \ldots, X_n$ are said to be independent if
$$F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = F_{X_1}(x_1) \cdots F_{X_n}(x_n).$$
11 Distributions of functions of random variables
Let $X$ denote a continuous random variable with probability distribution $F_X$, and let $\phi : \mathbb{R} \to \mathbb{R}$. Then the probability distribution of the induced random variable $Y = \phi(X)$ is obtained as
$$F_Y(y) = P\{Y \leq y\} = P\{X \in \phi^{-1}((-\infty, y])\}.$$
(With slight notational abuse, we let $\phi^{-1}(S)$ denote the inverse image of the set $S$; that is, $x \in \phi^{-1}(S)$ if and only if $y = \phi(x) \in S$.) Consequently, if both $F_X$ and $\phi^{-1}$ are differentiable, then the probability density of $Y$ is obtained via
$$f_Y(y) = \sum_{\{x : \phi(x) = y\}} f_X(x)\, \left|\phi'(x)\right|^{-1}.$$
As an example, consider $Y = X^2$, i.e., $\phi(x) = x^2$. Then,
$$f_Y(y) = \frac{f_X(\sqrt{y})}{2\sqrt{y}} + \frac{f_X(-\sqrt{y})}{2\sqrt{y}}.$$
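A Monte Carlo check of this density for a standard normal $X$ (our example, not the author's): integrating $f_Y$ gives $P\{Y \leq y\} = \Phi(\sqrt{y}) - \Phi(-\sqrt{y}) = \operatorname{erf}(\sqrt{y/2})$, which should match the empirical frequency.

```python
import math
import random

random.seed(1)
y = 1.5
xs = (random.gauss(0.0, 1.0) for _ in range(200_000))
empirical = sum(x * x <= y for x in xs) / 200_000
exact = math.erf(math.sqrt(y / 2))  # P{X**2 <= y} for X ~ N(0, 1)
print(empirical, exact)  # both ~0.779
```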
To generalize the above to the multivariate case, let $X_1, X_2, \ldots, X_n$ denote $n$ random variables governed by the joint probability distribution $F_{X_1,\ldots,X_n}(x_1, \ldots, x_n)$, and let $\boldsymbol{\phi} : \mathbb{R}^n \to \mathbb{R}^n$. Let $Y_i = \phi_i(X_1, \ldots, X_n)$, for $i = 1, \ldots, n$, denote the $n$ components of $\boldsymbol{\phi}$ applied to $X_1, \ldots, X_n$. With the assumption that $f_{X_1,\ldots,X_n}$ exists, and that $\boldsymbol{\phi}$ is differentiable, then
$$f_{Y_1,\ldots,Y_n}(\mathbf{y}) = \sum_{\{\mathbf{x} : \mathbf{y} = \boldsymbol{\phi}(\mathbf{x})\}} \frac{f_{X_1,\ldots,X_n}(\mathbf{x})}{|J(\mathbf{x})|},$$
where
$$J(\mathbf{x}) = \det \begin{pmatrix} \dfrac{\partial \phi_1}{\partial x_1} & \cdots & \dfrac{\partial \phi_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial \phi_n}{\partial x_1} & \cdots & \dfrac{\partial \phi_n}{\partial x_n} \end{pmatrix}$$
is the Jacobian determinant of the variable transformation.

11.1 Distributions of sums and quotients
Assume that $X$ and $Y$ are continuous random variables defined by the joint probability density $f(x, y)$, i.e.,
$$F_{X,Y}(x, y) = P\{X \leq x,\, Y \leq y\} = \int_{-\infty}^{x} \int_{-\infty}^{y} f(x', y')\, dx'\, dy'.$$
The distribution and density of $S = X + Y$ are obtained by computing the probability of the event $\{S \leq s\}$:
$$F_S(s) = P\{S \leq s\} = P\{X + Y \leq s\} = \int_{-\infty}^{+\infty} dx \int_{-\infty}^{s-x} f(x, y)\, dy.$$
[Figure 4: The shaded region depicts the event $X + Y \leq s$.]
Whence,
$$f_S(s) = F_S'(s) = \int_{-\infty}^{+\infty} f(x, s - x)\, dx.$$
In the event that $X$ and $Y$ are also independent, $f(x, y) = f_X(x) f_Y(y)$, and
$$f_S(s) = \int_{-\infty}^{+\infty} f_X(x) f_Y(s - x)\, dx = (f_X * f_Y)(s), \qquad (1)$$
the convolution of the two densities. Similarly, the distribution and density of the quotient $Q = Y/X$ are found by evaluating the probability of the event
$$\{Q \leq q\} = \{(x, y) \in \mathbb{R}^2 : y/x \leq q\} = \{(x, y) : y \geq qx,\ x < 0\} \cup \{(x, y) : y \leq qx,\ x > 0\}.$$
[Figure 5: The shaded region depicts the event $Y/X \leq q$, for $q > 0$.]
Thus,
$$F_Q(q) = \int_{-\infty}^{0} dx \int_{qx}^{+\infty} f(x, y)\, dy + \int_{0}^{+\infty} dx \int_{-\infty}^{qx} f(x, y)\, dy.$$
Whence,
$$f_Q(q) = F_Q'(q) = -\int_{-\infty}^{0} x f(x, qx)\, dx + \int_{0}^{+\infty} x f(x, qx)\, dx = \int_{-\infty}^{+\infty} |x|\, f(x, qx)\, dx.$$
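Returning to formula (1), a quick numeric check with two independent standard normal densities (assumed here for illustration); their convolution should match the $N(0, 2)$ density.

```python
import math

def normal_pdf(x, var=1.0):
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def f_S(s, dx=0.01, half_width=10.0):
    """Riemann-sum approximation of the convolution (f_X * f_Y)(s)."""
    n = int(half_width / dx)
    return sum(normal_pdf(k * dx) * normal_pdf(s - k * dx)
               for k in range(-n, n)) * dx

print(f_S(1.0), normal_pdf(1.0, var=2.0))  # both ~0.2197
```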
11.2 Problems
11.1 Derive the probability density of $Y = e^X$, where $X$ is a given continuous random variable with probability density $f_X$ defined over the reals.
11.2 Let $X$ and $Y$ be continuous random variables with joint probability density $f(x, y)$. Derive expressions for the probability distributions and densities of (a) $aX + bY$, (b) $X - Y$, (c) $XY$, and (d) $X/Y$.
11.3 Let $X \sim N(\mu_x, \sigma_x^2)$ and $Y \sim N(\mu_y, \sigma_y^2)$ be independent, normal random variables. Use formula (1) to derive the probability density of the sum $X + Y$.
11.4 Let $X, Y \in [0, +\infty)$ be independent, exponentially distributed random variables, with $f_X(x) = \alpha e^{-\alpha x}$, $f_Y(y) = \beta e^{-\beta y}$, with $\alpha, \beta \in (0, +\infty)$. Compute the probability density of $X + Y$.
11.5 Let $Z = Y^X$, where $X$ and $Y$ are independent continuous random variables that are uniformly distributed on the unit interval $(0, 1)$. Show that $E\{Z^k\} = \log(k + 1)/k$, for $k = 1, 2, \ldots$.
12 Convergence of a random sequence
Since the precise values of random variables are usually unpredictable, it is somewhat surprising that one can say anything substantial about the limit of a sequence of random variables of the form $X_1, X_2, X_3, \ldots$. As one might expect, the degree of certainty obtained for deterministic sequences such as $x_k = (2 + k)/(3 + 2k)$ cannot in general be realized for a random sequence. However, four common categories of convergence are encountered in a probabilistic framework; they are defined here from strongest to weakest.
1. The sequence $X_n$ is said to converge with probability one to $X$ if
$$P\left\{\lim_{n \to \infty} X_n = X\right\} = 1.$$
This type is also known as almost sure (a.s.) convergence, which we often write as
$$\lim_{n \to \infty} X_n = X \quad \text{(a.s.)}$$
2. The sequence $X_n$ is said to converge in probability to $X$ (or $X_n \to X$, i.p.) if for every $\epsilon > 0$,
$$\lim_{n \to \infty} P\{|X_n - X| > \epsilon\} = 0.$$
3. The sequence $X_n$ is said to converge in quadratic mean to $X$ (or $X_n \to X$, in q.m.) if
$$\lim_{n \to \infty} E\left[(X_n - X)^2\right] = 0.$$
4. For $n = 1, 2, \ldots$, let $F_n(x) = P\{X_n \leq x\}$ denote the probability distribution of $X_n$, and let $F(x) = P\{X \leq x\}$ denote that of $X$. The sequence $X_n$ is said to converge in distribution to $X$ if
$$\lim_{n \to \infty} F_n(x) = F(x)$$
for all $x$ where $F$ is continuous.
13 Chebyshev's Inequality
Let $X : \Omega \to \mathbb{R}$ denote a random variable with mean $\mu$ and variance $\sigma^2$. Then for any $\epsilon > 0$,
$$P\{|X - \mu| > \epsilon\} < \frac{\sigma^2}{\epsilon^2}.$$
Proof: Let $f$ denote the probability density of $X$. Then
$$\sigma^2 = \mathrm{Var}\{X\} = E\{(X - \mu)^2\} = \int_{\mathbb{R}} (x - \mu)^2 f(x)\, dx$$
$$= \int_{|x - \mu| > \epsilon} (x - \mu)^2 f(x)\, dx + \int_{|x - \mu| \leq \epsilon} (x - \mu)^2 f(x)\, dx$$
$$\geq \int_{|x - \mu| > \epsilon} (x - \mu)^2 f(x)\, dx > \epsilon^2 \int_{|x - \mu| > \epsilon} f(x)\, dx = \epsilon^2\, P\{|X - \mu| > \epsilon\}. \qquad ∎$$
Chebyshev's inequality,
$$P\{|X - \mu| > \epsilon\} < \frac{\sigma^2}{\epsilon^2},$$
is a rather "weak" inequality. Note that $\epsilon \leq \sigma$ yields a trivial upper bound; thus, Chebyshev's inequality is most useful when $\epsilon \gg \sigma$.
For example, it can be used to prove the weak law of large numbers, which states that the average value of a sequence of independent observations tends to the statistical mean.
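An empirical illustration of the bound's slack (our example): for a standard normal, the true tail probability at $\epsilon = 3\sigma$ is about 0.0027, far below the Chebyshev bound of $1/9$.

```python
import random

random.seed(2)
mu, sigma, eps = 0.0, 1.0, 3.0
samples = [random.gauss(mu, sigma) for _ in range(100_000)]
freq = sum(abs(x - mu) > eps for x in samples) / len(samples)
print(freq, sigma**2 / eps**2)  # ~0.0027 versus the bound 0.1111...
```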
14 Weak Law of Large Numbers
Let $X_1, X_2, \ldots, X_n$ denote a sequence of i.i.d. random variables, with $\mu = E\{X_i\}$ and $\sigma^2 = E\{(X_i - \mu)^2\}$, for $i = 1, 2, \ldots$. Let
$$M_n = \frac{1}{n}(X_1 + \cdots + X_n)$$
denote the sample mean. Observe,
$$E\{M_n\} = E\left\{\frac{1}{n}(X_1 + \cdots + X_n)\right\} = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \mu,$$
$$\mathrm{Var}\{M_n\} = \mathrm{Var}\left\{\frac{1}{n}(X_1 + \cdots + X_n)\right\} = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{\sigma^2}{n}.$$
By Chebyshev's inequality, for all $\epsilon > 0$,
$$P\{|M_n - \mu| > \epsilon\} < \frac{\sigma^2}{n \epsilon^2}.$$
Thus, $M_n = \frac{1}{n} \sum_{i=1}^{n} X_i \to \mu$ in probability as $n \to \infty$.
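A simulation of the weak law for fair-die rolls ($\mu = 3.5$), sketched in Python:

```python
import random

random.seed(3)
for n in [10, 100, 10_000]:
    m_n = sum(random.randint(1, 6) for _ in range(n)) / n
    print(n, m_n)  # sample means settle near mu = 3.5 as n grows
```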
15 Strong Law of Large Numbers
With probability one, $M_n$ converges to the mean:
$$P\left\{\lim_{n \to \infty} M_n = \mu\right\} = 1.$$
16 Borel-Cantelli Lemmas

Lemma 1 (Borel-Cantelli) Let $\{A_n\}$ denote a sequence of measurable events. Then:
1. If $\sum_{n=1}^{\infty} P\{A_n\} < \infty$, then $P\left(\limsup_n A_n\right) = 0$.
2. If $\sum_{n=1}^{\infty} P\{A_n\} = \infty$ and, in addition, the events $A_n$ are mutually independent, then $P\left(\limsup_n A_n\right) = 1$.
References
[1] William Feller. An Introduction to Probability Theory and Its Applications, volume I. John Wiley & Sons, New
York, third edition, 1968.
[2] Paul R. Halmos. Naive Set Theory. Springer-Verlag, New York, 1974.
[3] A. N. Kolmogorov. Foundations of Probability. Chelsea, New York, second edition, 1956.
[4] I. Todhunter. A History of the Mathematical Theory of Probability. Chelsea, 1965.