MTH 202 : Probability and Statistics
Lecture 1 & 2
6, 7 January, 2013
Probability
1.1 : Introduction
In probability theory we deal with random experiments
and analyze their outcomes. The word "random" means that :
1. the particular outcome of an experiment would be unknown,
2. all possible outcomes of the experiment would be known in advance,
3. the experiments can be repeated under identical conditions.
Examples 1.1.1 :
A. Car-Goat Problem : This famous problem was originally posed
in 1975 in The American Statistician by Steve Selvin. (It is also known
as the Monty Hall Problem, after the American television game show
"Let's Make a Deal" and its original host, Monty Hall.)
Figure 1. Monty Hall Problem
The problem is stated as follows : Suppose you are in a game show
and are asked to choose one of three given doors; behind one of the
doors is a brand new shiny car of your dreams, while the other two each
hide a goat. You pick a door, say no. 1, and the host, who knows what's
behind the doors, opens another door, say no. 3, which has a goat.
He then says to you, "Do you want to pick door no. 2?" Is it to your
advantage to switch your choice?
Let us model the problem. Each outcome of the experiment here can
be described by a quadruple (x, y, z, w), where x represents the number
on the door you would choose, y represents the number on the door
the host would open, z represents the number of the door you would
switch to, and w represents one of W or L, depending upon whether
you win or lose. Assuming that door no. 1 hides the car, the sample
space Ω for the strategy of always switching would look like :
Ω = {(1, 2, 3, L), (1, 3, 2, L), (2, 3, 1, W), (3, 2, 1, W)}
The probability of choosing door no. 1 is 1/3; this may be expressed
by saying that the event A = {(1, 2, 3, L), (1, 3, 2, L)} represents
choosing door no. 1, and P A = 1/3.
But if you choose door no. 1, you are going to lose. However, if you
choose either of the doors numbered 2 or 3, you are going to win;
each of these choices has probability 1/3, and adding these gives that
the probability of winning equals 2/3.
Now consider the other scenario, where you stick to your choice.
Assuming again that door no. 1 hides the car, the sample space Ω would
look like :
Ω = {(1, 2, 1, W), (1, 3, 1, W), (2, 3, 2, L), (3, 2, 3, L)}
In this case, taking A to be the event of choosing door no. 1, we
have P A = 1/3, which is now the probability of winning; this is half
of what it was if you decide to switch.
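These odds can also be checked empirically. Below is a minimal Monte Carlo sketch, not part of the original notes (the function name play and the trial count are illustrative), simulating the game under both strategies :

```python
import random

def play(switch: bool) -> bool:
    """Play one round of the Monty Hall game; return True on a win."""
    car = random.randrange(3)    # door hiding the car
    pick = random.randrange(3)   # contestant's initial pick
    # The host opens a door that is neither the pick nor the car.
    opened = next(d for d in range(3) if d != pick and d != car)
    if switch:
        # Switch to the one remaining unopened door.
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == car

trials = 100_000
for switch in (True, False):
    wins = sum(play(switch) for _ in range(trials))
    print(f"switch={switch}: win rate ≈ {wins / trials:.3f}")
# Prints roughly 0.667 for switching and 0.333 for sticking.
```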
B. Tossing coin(s) : Let's toss a coin; the sample space would be
Ω = {H, T}. There is a 50% chance of either winning or losing the toss.
Hence roughly you would assign P(winning) = 1/2 and P(losing) =
1/2.
Now suppose you wish to toss a coin three times, hoping that there
would be a head at least twice. Let us model the space Ω first.
Ω = {(H, H, H), (H, H, T), (H, T, H), (H, T, T), (T, H, H), (T, H, T),
(T, T, H), (T, T, T)}
Out of the eight possibilities, there are four outcomes with at least
two heads. So roughly you would say there is a 50% chance of getting
at least two heads. You can mathematically say P(at least two heads) =
1/2. Similarly, you can find out that P(at least two tails) = 1/2, and
these two add up to 1 (Why?).
Figure 2. Tossing a coin
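The count of favourable outcomes can be confirmed by brute-force enumeration; here is a small sketch (not from the notes) listing all eight outcomes :

```python
from itertools import product

# All 2^3 equally likely outcomes of three coin tosses.
outcomes = list(product("HT", repeat=3))
heads = [w for w in outcomes if w.count("H") >= 2]  # at least two heads
tails = [w for w in outcomes if w.count("T") >= 2]  # at least two tails

print(len(heads) / len(outcomes))  # 0.5
print(len(tails) / len(outcomes))  # 0.5
# The two events partition Ω, which is why the probabilities add up to 1.
```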
C. Waiting for the DBCity Bus : (Continuous sample space)
Suppose Brahadeesh wishes to catch one of the buses to DBCity Mall,
which depart at 2:15 P.M. and 3:15 P.M. from the stop near the
students' hostel. He, however, decides to arrive at the bus stop at
a random moment between 2 P.M. and 3:30 P.M. Now he intends to
calculate the probability that he would have to wait for more than ten
minutes for the bus.
The sample space here would be an interval rather than a finite or a
discrete set. Assuming the time starts at 2 P.M. and counting every
minute as a unit, the sample space Ω would be the interval Ω = [0, 90].
He would have to wait for the bus for more than ten minutes if he
arrives at the stop at any time : (a) between 2 and 2:05 P.M., (b)
between 2:15 and 3:05 P.M., or (c) between 3:15 and 3:30 P.M., in
which case he misses the last bus altogether. (Here we take the
probability of Brahadeesh arriving at the bus stop exactly at
2:15 P.M. or 3:15 P.M. to be zero.) Now the three events A, B, C as
in (a), (b), (c) are "disjoint" events, and their probabilities would
intuitively be calculated as : P A = 5/90, P B = 50/90, P C = 15/90.
Thus the probability of Brahadeesh waiting more than ten minutes would
be P(A ∪ B ∪ C) = P A + P B + P C = 70/90 = 7/9.
Figure 3. Waiting for the DBCity Bus
Finally, in this case, note that the events discussed are certain closed
sub-intervals of [0, 90]. In a bit more mathematical language, calculating
P B can be thought of as (∫_{15}^{65} dt)/90 = 50/90 = 5/9 (modeling B
by the interval [15, 65]). There are certain subsets of the closed
interval [0, 90] over which integration cannot be done. Upon this
realization, we need to consider certain specific subsets of [0, 90]
which can be modeled and which satisfy certain properties. Such a
family of subsets is known as a σ-field.
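As a sanity check on the value 7/9, here is a simulation sketch (not part of the notes; the helper waits_over_ten simply restates the assumptions of the example, with bus departures at t = 15 and t = 75 minutes) :

```python
import random

def waits_over_ten(arrival: float) -> bool:
    """Wait exceeds ten minutes for a uniform arrival time in [0, 90]."""
    for bus in (15.0, 75.0):           # departures at 2:15 and 3:15 P.M.
        if arrival <= bus:
            return bus - arrival > 10.0
    return True                        # after 3:15 P.M. the last bus is gone

trials = 1_000_000
hits = sum(waits_over_ten(random.uniform(0, 90)) for _ in range(trials))
print(hits / trials)  # ≈ 7/9 ≈ 0.778
```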
1.2 : Axiomatic Definitions
Definition 1.2.1 : Let X be a non-empty set. A non-empty collection
F of subsets of X is called a field (or an algebra) on X if :
(i) if E_1, E_2 ∈ F, then E_1 ∪ E_2 ∈ F,
(ii) if E ∈ F, then E^c ∈ F.
Note that a field on X would always contain ∅ and X. Moreover,
{∅, X} forms a field, which is known as the smallest possible field.
In Examples 1.1.1(A) and (B), note that the collection P(Ω) of all
subsets of Ω (What are these?) forms a field.
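For a small finite X the field axioms for P(X) can be verified mechanically; the following sketch (illustrative, not from the notes) checks closure under union and complement :

```python
from itertools import chain, combinations

X = {1, 2, 3}
# P(X): every subset of X, stored as a frozenset so sets can be set members.
F = {frozenset(s) for s in chain.from_iterable(
        combinations(X, r) for r in range(len(X) + 1))}

assert all(E1 | E2 in F for E1 in F for E2 in F)   # (i) closure under union
assert all(frozenset(X) - E in F for E in F)       # (ii) closure under complement
assert frozenset() in F and frozenset(X) in F      # contains ∅ and X
print("P(X) is a field on X")
```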
Definition 1.2.2 : Let X be a non-empty set. A non-empty collection
F of subsets of X is called a σ-field (or a σ-algebra) if :
(i) for countably many elements E_1, E_2, . . . , E_n, . . . from F, the
union ∪_{j=1}^∞ E_j is also an element of F,
(ii) if E ∈ F, then E^c ∈ F.
Note that a σ-field is also a field. As before the collection {∅, X} forms
the smallest σ-field.
Exercise 1.2.3 : Let F be a field of subsets of a non-empty set X.
(i) Let E_1, E_2, . . . , E_k be finitely many elements of F. Show that the
union ∪_{j=1}^k E_j and the intersection ∩_{j=1}^k E_j belong to F.
(ii) If E, F ∈ F, then show that E \ F, E∆F ∈ F, where E \ F is the
difference set and E∆F is the symmetric difference (E \ F ) ∪ (F \ E).
Exercise 1.2.4 : Show that a field on a set X is also a σ-field if X is
finite. (Hence the notions of a σ-field and of a field coincide on
finite sets.)
While we are engaged in a random experiment and trying to find the
probability of a particular event of interest, we often need to model
all possible events. We will call this model a sample space.
Definition 1.2.5 : A sample space of a random experiment is a pair
(Ω, S), where :
(i) Ω is the set of all possible outcomes of the experiment,
(ii) S is a σ-algebra on Ω (i.e., S ⊆ P(Ω)).
If (Ω, S) is a sample space, the members of S are the events; these
can be well understood after a suitable assignment of probability to
them. We now turn to defining probability on a sample space :
Definition 1.2.6 : Let (Ω, S) be a sample space. A function
P : S → [0, ∞) is called a probability measure (or simply "probability") if :
(i) P (A) ≥ 0 for all A ∈ S,
(ii) P (Ω) = 1,
(iii) for a sequence {A_j}_{j=1}^∞ of sets from S which are mutually
disjoint (i.e. A_j ∩ A_k = ∅ if j ≠ k), we have
P(∪_{j=1}^∞ A_j) = Σ_{j=1}^∞ P(A_j)
P(A) is often referred to as the probability of the event A. If there is
no confusion, we will simply write P A instead of P(A). The triple
(Ω, S, P) is called a probability space. Let us now define probabilities
for the examples considered earlier.
Examples 1.2.7 :
A. Car-goat problem : Let (Ω, S) be the sample space where
Ω = {(1, 2, 3, L), (1, 3, 2, L), (2, 3, 1, W), (3, 2, 1, W)} and S = P(Ω).
Define the function P on S by P((2, 3, 1, W)) = 1/3, P((3, 2, 1, W)) =
1/3, P((1, 2, 3, L)) = 1/6, P((1, 3, 2, L)) = 1/6. Verify that this defines
a probability on the sample space.
B. Tossing coins : Let (Ω, S) be the sample space where Ω is as defined
above and S = P(Ω). Define the function P on S by P((x, y, z)) = 1/8 for
every (x, y, z) ∈ Ω. Verify that this defines a probability on the
sample space.
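Verifying both assignments amounts to checking that the point masses are nonnegative and sum to 1; finite additivity then defines P on all of P(Ω). A sketch with exact fractions (illustrative, not from the notes) :

```python
from fractions import Fraction as Fr
from itertools import product

# Point masses for the car-goat sample space (switching strategy).
car_goat = {(2, 3, 1, "W"): Fr(1, 3), (3, 2, 1, "W"): Fr(1, 3),
            (1, 2, 3, "L"): Fr(1, 6), (1, 3, 2, "L"): Fr(1, 6)}
# Point masses for three coin tosses: each of the 8 outcomes gets 1/8.
coins = {w: Fr(1, 8) for w in product("HT", repeat=3)}

for masses in (car_goat, coins):
    assert all(p >= 0 for p in masses.values())  # nonnegativity
    assert sum(masses.values()) == 1             # total mass P(Ω) = 1

# E.g. the probability of winning by switching:
print(sum(p for w, p in car_goat.items() if w[3] == "W"))  # 2/3
```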
C. Waiting for the DBCity bus : In this case Ω = [0, 90], and S is
usually taken to be the σ-field of all Borel subsets of the interval
[0, 90]. Giving the most appropriate probability measure would require
knowing what these sets look like in general. However, if A is the
interval [a, b] where 0 ≤ a < b ≤ 90, we can simply define
P A := (b − a)/90.
Let us now prove some properties of probability as defined :
Properties 1.2.8 : Let (Ω, S, P ) be a probability space. Then :
(1) P ∅ = 0
Proof : Let E_1 := Ω and E_n := ∅ for all n ≥ 2. Using property
(iii) we have that P(Ω) = P(Ω) + Σ_{n=2}^∞ P(E_n). If P(∅) > 0, then
the right-hand side is infinite, since the positive number P(∅) is
simply added infinitely many times, and hence the sum is larger than
any positive number (Why?). Hence P(∅) = 0.
(2) P(A_1 ∪ A_2 ∪ . . . ∪ A_k) = P A_1 + P A_2 + . . . + P A_k if A_i ∈ S and
A_i ∩ A_j = ∅ for i ≠ j (finite additivity)
Proof : Consider the sequence {E_n}_{n=1}^∞ of sets defined by E_1 :=
A_1, E_2 := A_2, . . . , E_k := A_k and E_n := ∅ for n ≥ k + 1. We can now
apply property (iii), which gives : P(A_1 ∪ A_2 ∪ . . . ∪ A_k) = P(A_1) +
P(A_2) + . . . + P(A_k) + 0.
(3) P A = 1 − P A^c for A ∈ S
Proof : Since A and A^c are disjoint and their union is Ω, we have by
(2) that P A + P A^c = P(Ω) = 1. (Remember : A^c ∈ S since A ∈ S.)
(4) If E ⊆ F for two sets E, F ∈ S, then P E ≤ P F (monotonicity)
Proof : Since E ⊆ F we have two disjoint sets E and F \E whose union
is F . Hence by (2) we have 0 ≤ P (E) ≤ P (E) + P (F \ E) = P (F ).
(5) P (A ∪ B) = P A + P B − P (A ∩ B) for A, B ∈ S
Proof : The set A∪B can be broken into three disjoint sets A\B, A∩B
and B \ A. Hence P (A ∪ B) = P (A \ B) + P (A ∩ B) + P (B \ A). But
P (A) = P (A \ B) + P (A ∩ B) and P (B) = P (B \ A) + P (A ∩ B).
Replacing P (A \ B) and P (B \ A) in the equation we have the result.
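These properties are easy to sanity-check on a small finite probability space; the following sketch assumes the uniform measure on a six-point set (all names illustrative, not from the notes) :

```python
from fractions import Fraction as Fr

omega = frozenset(range(1, 7))          # Ω = {1, ..., 6}, S = P(Ω)
def P(E):
    """Uniform probability measure: P(E) = |E| / |Ω|."""
    return Fr(len(E), len(omega))

A, B = frozenset({1, 2, 3}), frozenset({3, 4})
assert P(A) == 1 - P(omega - A)              # property (3)
assert P(A & B) <= P(A)                      # property (4), since A ∩ B ⊆ A
assert P(A | B) == P(A) + P(B) - P(A & B)    # property (5)
print("properties (3)-(5) hold in this example")
```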
Example 1.2.9 : Let two dice be rolled. Find the probability of
the event of getting a total which is at most 5 or at least 9.
Solution : The set Ω of all possible outcomes is :
Ω = {(i, j) : 1 ≤ i, j ≤ 6}
Here we keep track of the order of i and j: the 36 ordered pairs are
equally likely, whereas the 21 unordered pairs are not. Now if we denote
the event of getting a pair totaling at most 5 by A and of getting a
pair totaling at least 9 by B, we need to calculate P(A ∪ B). Counting
by totals, A contains 1 + 2 + 3 + 4 = 10 ordered pairs (totals 2, 3, 4, 5)
and B contains 4 + 3 + 2 + 1 = 10 ordered pairs (totals 9, 10, 11, 12).
Hence P A = 10/36, P B = 10/36. Since A ∩ B = ∅, we have
P(A ∪ B) = P A + P B = 20/36 = 5/9.
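A quick Monte Carlo check of this answer (an illustrative sketch, not from the notes) :

```python
import random

trials = 1_000_000
hits = 0
for _ in range(trials):
    total = random.randint(1, 6) + random.randint(1, 6)
    if total <= 5 or total >= 9:
        hits += 1
print(hits / trials)  # ≈ 5/9 ≈ 0.556
```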
Example 1.2.10 : A box contains 1000 light bulbs. The probability
that there is at least 1 defective bulb in the box is 0.1, and the
probability that there are at least 2 defective bulbs is 0.05. Find the
probability of each of the following cases :
(a) the box contains no defective bulbs.
(b) the box contains exactly 1 defective bulb.
(c) the box contains at most 1 defective bulb.
Solution : Let us denote the following events : A denotes the event
of getting at least one defective bulb, and B denotes the event of
getting at least two defective bulbs. It is given that P A = 0.1 and
P B = 0.05.
(a) Let C denote the event of getting no defective bulb. Then C = A^c
and P C = 1 − P A = 0.9.
(c) If D = {at most one defective bulb}, then D = B^c and P D =
1 − P B = 0.95.
(b) Now the event E = {exactly one defective bulb} = A ∩ D. The union
A ∪ D represents the event of getting at least one or at most one
defective bulb. Since this exhausts all possibilities, we have
P(A ∪ D) = 1, and by property (5),
P(A ∩ D) = P A + P D − P(A ∪ D) = 0.1 + 0.95 − 1 = 0.05.
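The same arithmetic written out with exact fractions (an illustrative sketch; the variable names are ours) :

```python
from fractions import Fraction as Fr

P_A = Fr(1, 10)   # at least one defective bulb
P_B = Fr(1, 20)   # at least two defective bulbs

P_none = 1 - P_A                         # (a) complement of A
P_at_most_one = 1 - P_B                  # (c) complement of B, i.e. P D
P_exactly_one = P_A + P_at_most_one - 1  # (b) property (5) with P(A ∪ D) = 1
print(P_none, P_at_most_one, P_exactly_one)  # 9/10 19/20 1/20
```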
Example 1.2.11 : An absent-minded secretary places n letters
in n envelopes at random. Determine the probability that he or she will
misplace every letter.
Let Ω be the set of all ways the n letters can be placed in the n
envelopes, i.e. the set of all permutations (the envelopes are marked
by the digits 1, 2, . . . , n). Here we are assuming that the i-th letter
is supposed to go into the i-th envelope. Let A_i be the event that the
i-th letter goes into the i-th envelope.
We note that :
P A_i = (n − 1)!/n!, P(A_i ∩ A_j) = (n − 2)!/n! (i ≠ j),
P(A_i ∩ A_j ∩ A_k) = (n − 3)!/n! (i, j, k distinct), . . .
Finally, we apply the following formula :
Theorem 1.2.12 : (Principle of Inclusion-Exclusion)
Let A_1, A_2, . . . , A_n ∈ S. Then :
P(∪_{k=1}^n A_k) = Σ_{k=1}^n P A_k − Σ_{k_1<k_2} P(A_{k_1} ∩ A_{k_2})
+ Σ_{k_1<k_2<k_3} P(A_{k_1} ∩ A_{k_2} ∩ A_{k_3}) − . . . + (−1)^{n+1} P(∩_{k=1}^n A_k)
to get (each j-fold term of the sum consists of C(n, j) = n!/(j!(n − j)!)
intersections, each of probability (n − j)!/n!, so it equals 1/j!) :
P(∪_{k=1}^n A_k) = 1 − 1/2! + 1/3! − . . . + (−1)^{n+1} 1/n!
This is the probability that at least one letter reaches its own
envelope, so the probability that every letter is misplaced is
1 − P(∪_{k=1}^n A_k) = 1/2! − 1/3! + . . . + (−1)^n 1/n!
It is interesting to see that, as the number of letters increases, this
probability converges to e^{−1}.
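A simulation sketch (illustrative; the helper misplaces_all is ours) showing the convergence to e^{−1} :

```python
import math
import random

def misplaces_all(n: int) -> bool:
    """Place n letters in n envelopes at random; True if none matches."""
    perm = list(range(n))
    random.shuffle(perm)                 # letter i lands in envelope perm[i]
    return all(perm[i] != i for i in range(n))

trials = 200_000
for n in (3, 5, 10):
    freq = sum(misplaces_all(n) for _ in range(trials)) / trials
    print(f"n = {n}: {freq:.4f}")
print(f"1/e ≈ {1 / math.e:.4f}")         # ≈ 0.3679
```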