1 Probability
1.1 Probabilities of outcomes and events.
1.1.1 Probabilities of outcomes.
This chapter is an introduction to some of the basic concepts of probability. Let's begin by considering
what we mean by probability.
Example 1. Suppose we roll a six-sided die, and we say the following.
The probability of getting a 1 is 1/6,
The probability of getting a 2 is 1/6,
....
The probability of getting a 6 is 1/6.
What do we mean by this? One interpretation is based on frequency of occurrence. If we roll the die a
large number of times, then we expect that the proportion of times it comes up 1 will be approximately 1/6,
and similarly for 2, 3, ..., 6. For example, if we roll the die 600 times, then we expect to get approximately
100 1's. To make this a little more precise, we might say
(# times a 1 comes up) / (# rolls) → Pr{1}   as # rolls → ∞,
and similarly for 2, 3, ..., 6. Here Pr{ 1 } denotes the probability that a 1 will come up on an individual roll.
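The frequency interpretation is easy to illustrate by simulation. The Python sketch below (not part of the original notes; the function name is ours) rolls a fair six-sided die many times and compares the observed proportion of 1's with Pr{1} = 1/6.

```python
import random

def estimate_pr_one(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and return the
    proportion of rolls that come up 1."""
    rng = random.Random(seed)
    ones = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 1)
    return ones / n_rolls

# As n_rolls grows, the proportion approaches Pr{1} = 1/6 ≈ 0.1667.
for n in (600, 60_000):
    print(n, estimate_pr_one(n))
```

With 600 rolls the estimate is still noisy; with 60,000 it is typically within a percentage point of 1/6, which is exactly the limiting behavior the displayed formula describes.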
Example 2. Each day a newsstand buys and sells The Wall Street Journal. Based on records for the past
month they feel that they would never sell more than 4 copies in any day. Furthermore they feel that
The probability, Pr{0}, of selling zero copies in a given day = p0 = 0.2,
The probability, Pr{1}, of selling one copy in a given day = p1 = 0.25,
(1)
Pr{2} = p2 = 0.35,
Pr{3} = p3 = 0.15,
Pr{4} = p4 = 0.05.
Here we are assuming that the newsstand has enough copies of The Wall Street Journal in stock to satisfy
the demand of all the customers who want to buy it in a given day. Using a frequency interpretation of
probability similar to Example 1, if we look at the number of copies of The Wall Street Journal sold each
day for a number of consecutive days, then the proportion of times none are sold is approximately 0.2, and
similarly for 1, 2, 3, 4. More precisely
(2)  (# days none are sold) / (# days) → Pr{0}   as # days → ∞,
and similarly for 1, 2, 3, 4. Often it is convenient to group the probabilities of the outcomes together into a vector

    ( p0 )   ( 0.2  )
    ( p1 )   ( 0.25 )
p = ( p2 ) = ( 0.35 )
    ( p3 )   ( 0.15 )
    ( p4 )   ( 0.05 )
Often it is informative to make a graph of the probability distribution. Here is a graph of the probability
distribution in Example 2.
[Bar graph of the probability distribution: Pr{0} = 0.2, Pr{1} = 0.25, Pr{2} = 0.35, Pr{3} = 0.15, Pr{4} = 0.05.]
Since the relation (2) involves taking the limit as # days → ∞, it looks like we may not be able to determine
Pr{0}, ..., Pr{4} exactly. We can estimate these values by the ratios

(# days j are sold) / (# days),   j = 0, ..., 4,
using a large number of days. In fact, the values 0.2, 0.25, ..., 0.05 used for Pr{0}, ..., Pr{4} in (1) were
probably obtained in this fashion. This is typical of real world probability models where we must use
estimated values for the presumed underlying probabilities.
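Such estimates are easy to mimic in simulation. The Python sketch below (not part of the original notes) draws a large number of simulated days from the assumed distribution (1) and recovers the probabilities as observed proportions:

```python
import random

p = [0.2, 0.25, 0.35, 0.15, 0.05]   # assumed Pr{0}, ..., Pr{4} from (1)

def simulate_days(n_days, seed=0):
    """Simulate n_days of demand drawn from p and return the observed
    proportion of days on which 0, 1, ..., 4 copies were sold."""
    rng = random.Random(seed)
    counts = [0] * len(p)
    for _ in range(n_days):
        counts[rng.choices(range(len(p)), weights=p)[0]] += 1
    return [c / n_days for c in counts]

print(simulate_days(100_000))   # proportions close to p
```

In practice the direction is reversed: the real data come first, and the proportions computed this way become the estimated probabilities of the model.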
Example 3. A company manufactures diodes. Based on constant testing they feel that the probability
that a diode is defective is 0.3%, i.e.
The probability that a diode is defective = 0.003,
The probability that a diode is good = 0.997.
If we look at a large number of diodes, the proportion of defectives is approximately 0.003 and the
proportion of good diodes is approximately 0.997, i.e.
(# defective) / (# diodes) → Pr{d}   as # diodes → ∞,

(# good) / (# diodes) → Pr{g}   as # diodes → ∞,
where we are abbreviating defective and good by d and g.
Often it is convenient to number the outcomes by consecutive integers starting with 0 or 1. For example,
with the diodes, we might represent defective by 0 and good by 1. In this case we can use the following
vector to represent the probability distribution.

p = ( Pr{d} ) = ( p0 ) = ( 0.003 )
    ( Pr{g} )   ( p1 )   ( 0.997 )
Example 4. An office copier is either in good condition (represented by g or 1), poor condition (p or 2) or
broken (b or 3). Based on past experience, you feel that when you arrive at the office tomorrow
The probability that the copier is in good condition = 0.5,
The probability that the copier is in poor condition = 0.1,
The probability that the copier is broken = 0.4.
If we look at a large number of days, the proportion of days the copier is in good condition is approximately
0.5, the proportion of days the copier is in poor condition is approximately 0.1 and the proportion of days
the copier is broken is approximately 0.4, i.e.
(# good) / (# days) → Pr{g} = p1 = 0.5   as # days → ∞,

(# poor) / (# days) → Pr{p} = p2 = 0.1   as # days → ∞,

(# broken) / (# days) → Pr{b} = p3 = 0.4   as # days → ∞.
In this case the probability vector is

    ( Pr{g} )   ( p1 )   ( 0.5 )
p = ( Pr{p} ) = ( p2 ) = ( 0.1 )
    ( Pr{b} )   ( p3 )   ( 0.4 )
Summary and Terminology: We observe something. This is called an experiment in many books. For
example, rolling a die and observing the face that comes up is an experiment. When we do an experiment
there are certain possible things we may observe. These are called outcomes. For example, when we roll a
die, the outcomes are 1, 2, 3, 4, 5, and 6. The set of all possible outcomes is called the sample space. Often
we will denote the sample space by S. (Ω is another symbol used for the sample space in many texts.) For
example, when we roll a die, S = {1, 2, 3, 4, 5, 6}. For each outcome a in S one has the probability of a
occurring Pr{a}. Intuitively, it is the proportion of times the outcome occurs if we repeat the same
experiment independently many times, i.e.
(3)  Pr{a} = lim[n→∞] (# times a occurs in n repetitions) / n
Remarks: Even though this description of probability has a nice intuitive appeal, it has some problems if
one tries to use it as a mathematical definition. One reason for this is because the notion of repeating the
same experiment independently a number of times has not been defined. If we flip a coin or roll a die
repeatedly, then this is pretty close to independent repetitions of the same experiment. However if we
observe the number of newspapers sold each day for a number of days, then this is probably not
independent repetitions of the same experiment. Despite this problem, let us proceed on using the above
concept of probability as a guide to our thoughts.
In the examples above, there were only a finite number of outcomes, i.e. S = {a1, a2, ..., an}. In this
case formula (3) would imply

(4)  Pr{ai} ≥ 0 for each i,   and   Pr{a1} + Pr{a2} + … + Pr{an} = 1
If we are going to use estimated or assumed values for the actual probabilities, then we should make sure
that they satisfy (4). For example, this is true in Examples 1 and 2. The set of values Pr{a1}, ..., Pr{an} is
called the probability distribution for the outcomes of the experiment. We can represent a probability
distribution by the row vector p = (p1, …, pn) or the corresponding column vector, where pi = Pr{ai}. Later we shall
see some situations where it might be advantageous to use a row vector instead of a column vector or vice
versa. For example, in Example 2 one has p = (0.2, 0.25, 0.35, 0.15, 0.05). The components of such a vector p
should be nonnegative numbers that sum to one. A vector with this property is called a probability vector.
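The probability-vector property is easy to check mechanically. The helper below (hypothetical, not from the text) tests nonnegativity and that the components sum to one, up to floating-point tolerance:

```python
import math

def is_probability_vector(p, tol=1e-9):
    """Return True if every component is nonnegative and the
    components sum to 1 (within floating-point tolerance)."""
    return all(x >= 0 for x in p) and math.isclose(sum(p), 1.0, abs_tol=tol)

print(is_probability_vector([0.2, 0.25, 0.35, 0.15, 0.05]))  # Example 2: True
print(is_probability_vector([0.5, 0.1, 0.4]))                # Example 4: True
print(is_probability_vector([0.5, 0.6]))                     # sums to 1.1: False
```

A tolerance is used rather than exact equality because decimal probabilities like 0.35 are not exactly representable in binary floating point.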
1.1.2 Probabilities of events.
Often we are interested in whether the outcome of an experiment belongs to a certain set. Consider the
situation in Example 2. Suppose the newsstand decides to stock 2 copies of The Wall Street Journal on a
certain day. What is the probability that they will have enough copies to satisfy all the customers that want
to buy one? We are asking for the probability that the number of copies customers want to buy on a given
day is in the set E = {0, 1, 2}.
Suppose, as in this example, we fix a set E of outcomes. If the experiment results in an outcome in the set
E, then we say the event E has occurred. Thus, an event is just another word for a set of outcomes. For
example, having no more than 2 customers wanting to buy the paper on a given day is the event
E = {0, 1, 2}.
We are interested in probabilities of events as well as probabilities of individual outcomes. Using the
intuitive description of probability above, we would say that the probability of a certain event E is the
proportion of times the outcome lies in the set E if we repeat the same experiment independently many
times, i. e.
(5)  Pr{E} = lim[n→∞] (# times outcome is in E in n independent repetitions) / n
If E = {b1, b2, ..., bm} is a finite set then it would follow from (3) and (5) that the probability of the event E
is the sum of the probabilities of the outcomes in E, i.e.
(6)  Pr{E} = Pr{b1} + Pr{b2} + … + Pr{bm}
For example, the probability that the number of customers wanting The Wall Street Journal on a given day
is no more than 2 would be
Pr{0} + Pr{1} + Pr{2} = 0.2 + 0.25 + 0.35 = 0.8.
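Formula (6) translates directly into code: the probability of a finite event is the sum of the probabilities of its outcomes. A minimal sketch (the helper name is ours), using the newsstand distribution from Example 2:

```python
# Pr for each outcome 0..4, from Example 2
pr = {0: 0.2, 1: 0.25, 2: 0.35, 3: 0.15, 4: 0.05}

def pr_event(event, pr):
    """Probability of a finite event: sum of outcome probabilities, per (6)."""
    return sum(pr[a] for a in event)

E = {0, 1, 2}            # no more than 2 copies demanded
print(pr_event(E, pr))   # ≈ 0.8
```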
Finite additivity. Formula (6) is actually a special case of the following more general formulas. Suppose
A and B are disjoint events, i.e. there is no outcome that is in both A and B. Let A ∪ B be the union of A and
B, i.e. the set of outcomes that are either in A or B. Then

(7)  Pr{A ∪ B} = Pr{A} + Pr{B}
Example 5. In a certain company 10% of the employees work in the accounting department and 30% of the
employees work in the sales department. Suppose no employee works in both departments. What
percentage of the employees work in one of the two departments?
Answer: 10% + 30% = 40%.
More generally, suppose that A and B are not necessarily disjoint, i.e. they may have elements in common.
Then

(8)  Pr{A ∪ B} = Pr{A} + Pr{B} - Pr{A ∩ B}
For a suggestion on how to prove this, see Problem 2 below.
Example 6. In a certain company 10% of the people are left handed, 30% wear glasses and 5% of the
people are both left handed and wear glasses. What percentage of the people are either left handed or wear
glasses?
Answer: 10% + 30% - 5% = 35%.
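Inclusion-exclusion can be checked on explicit sets of outcomes. In the sketch below we invent a hypothetical 100-person company matching the percentages of Example 6 (the specific employee numbering is our assumption), compute probabilities by counting, and verify formula (8) directly:

```python
# A hypothetical 100-person company matching Example 6:
# employees 0..9 are left-handed, employees 5..34 wear glasses,
# so employees 5..9 (5 people) are both.
left = set(range(10))
glasses = set(range(5, 35))

def pr(event, n=100):
    """Probability of an event under equally likely outcomes."""
    return len(event) / n

# inclusion-exclusion, formula (8)
lhs = pr(left | glasses)
rhs = pr(left) + pr(glasses) - pr(left & glasses)
print(lhs, rhs)   # both equal 0.35 (up to rounding)
```

Counting the union directly and applying (8) give the same 35%, matching the answer above.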
Another generalization of (7) is the following. Suppose E1, E2, ..., Em are disjoint events, i.e. no two of the
sets Ei and Ej have any elements in common. Furthermore, let E denote the union of the sets E1, ..., Em, i.e.
the set of all outcomes which are in one of the sets E1, ..., Em. Then
(9)  Pr{E} = Pr{E1} + Pr{E2} + … + Pr{Em}
This is called the finite additivity property of probability.
There are three other properties of probability of events which would follow from our informal definition.
(10)  0 ≤ Pr{E} ≤ 1   for any event E,
(11)  Pr{S} = 1,

(12)  Pr{∅} = 0
where S is the entire sample space and ∅ is the empty set. Formula (10) is called the non-negativity
property of probability and formula (11) is sometimes called the normalization axiom.
Problem 1. Office Max keeps a certain number of staplers on hand. If they sell out on a certain day,
they order 6 more from the distributor and these are delivered in time for the start of the next day. Thus
the inventory at the start of a day can be 1, 2, 3, 4, 5, or 6. Based on records for the past two months
they feel that the probability, Pr{1}, of there being 1 stapler at the start of a day = 0.09, Pr{2} = 0.21,
Pr{3} = 0.29, Pr{4} = 0.23, Pr{5} = 0.12 and Pr{6} = 0.06.
(a) What is the sample space S?
Ans: S = {1, 2, 3, 4, 5, 6}
(b) Does this model satisfy condition (4)?
Ans: Yes.
(c) What is the probability that there are at least 3 staplers at the start of the day? What event are we
talking about?
Ans: The event is E = {3, 4, 5, 6} and Pr{E} = 0.7.
Problem 2. Prove (8). Hint: Let A \ B be the difference of A and B, i.e. the set of outcomes that are in
A but not B. Note that A \ B and B are disjoint and their union is A ∪ B. Apply (7) to this relation. Then
apply (7) with A \ B in place of A and A ∩ B in place of B and combine with the previous formula.
Problem 3. If E is an event, then the complement of E, denoted by Ec, is the set of outcomes in the
sample space S that are not in E, in symbols Ec = {a ∈ S : a ∉ E}. Use finite additivity and (11) to
show that

Pr{Ec} = 1 - Pr{E}
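The complement rule gives a second route to answers like Problem 1(c). A small sketch (not from the text), using the assumed stapler distribution from Problem 1:

```python
# stapler inventory distribution from Problem 1
pr = {1: 0.09, 2: 0.21, 3: 0.29, 4: 0.23, 5: 0.12, 6: 0.06}

# Pr{at least 3} computed directly, and via the complement rule
direct = sum(pr[k] for k in (3, 4, 5, 6))
via_complement = 1 - sum(pr[k] for k in (1, 2))
print(direct, via_complement)   # both ≈ 0.7
```

When an event contains most of the sample space, summing over the small complement is usually the shorter computation.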
1.1.3 Countably infinite sample spaces.
We often encounter situations where there are an infinite number of outcomes. Infinite sets can either be
countable or uncountable. A set is called countable if its elements can be arranged in a sequence.
Otherwise, it is called uncountable. For example, the set of integers is countable since we can arrange them
in a sequence. One way to do this is 0, 1, -1, 2, -2, 3, ..., n, -n, ... . On the other hand if one considers the
set of all real numbers x in some interval a  x  b, then it can be shown that this set is uncountable.
Let us consider the case where the number of outcomes is countably infinite, so S = {a1, a2, ... }.
Example 7. Suppose we flip a coin until a head appears and we count the number of tosses until a head
appears. In this case the possible outcomes are the positive integers. We are interested in the probability,
Pr{n}, that the number of tosses until a head appears is a given integer n. Suppose
Pr{n} = (1/2)^n   for n = 1, 2, …
[Bar graph of Pr{n} = (1/2)^n for n = 1, 2, …, 10.]
Later we shall give an argument as to why this is a reasonable assumption. For the moment, we just assume
it is true. For example, the probability that it takes 3 tosses to get a head would be (1/2)^3 = 1/8. Now
suppose we ask for the probability that the number of tosses until we get a head is an even number. This is
the event A = {2, 4, 6, ... }. It is not unreasonable to assume that the finite additivity formulas (6) and (9)
above extend to infinite sums, i.e. if E = {b1, b2, ... } is a countably infinite set then
(12)  Pr{E} = Pr{b1} + Pr{b2} + …
and if E1, E2, ... is an infinite sequence of disjoint sets and E is their union, then
(13)  Pr{E} = Pr{E1} + Pr{E2} + …
Formula (13) is called the countable additivity property of probability. Formulas (12) and (13) are
assumptions; they do not follow from (3) and (5). Using (12), it would follow that the probability of
A = {2, 4, 6, ... } is
Pr{A} = (1/2)^2 + (1/2)^4 + (1/2)^6 + … = (1/4) + (1/4)^2 + (1/4)^3 + …
      = (1/4) [1 + (1/4) + (1/4)^2 + …] = (1/4) · 1/(1 - 1/4) = 1/3
Here we have used the formula
(14)  1 + x + x^2 + x^3 + … = 1/(1 - x)
which holds for | x | < 1.
One implication of the assumption (12) is that the analogue of (4) should hold if the sample space
S = {a1, a2, ... } is countably infinite, i.e.
(15)  Pr{a1} + Pr{a2} + … = 1
Just as in the finite case, we will require that this hold for any assumed or estimated values for the Pr{ai}.
For example, using formula (14) with x = 1/2, one can see that this holds for the values in Example 7. As in
the case where S is finite, the set of values Pr{ai} is called the probability distribution for the outcomes of
the experiment.
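These infinite sums can be checked numerically by truncating at a large n, since the tail of (1/2)^n is negligible. A sketch (not from the text) verifying (15) for Example 7 and the value Pr{A} = 1/3 for the even outcomes:

```python
N = 200  # truncation point; the neglected tail is about (1/2)**N

pr = [(1/2) ** n for n in range(1, N + 1)]   # Pr{n} = (1/2)^n, n = 1, 2, ...

total = sum(pr)                              # close to 1, per (15)
pr_even = sum((1/2) ** n for n in range(2, N + 1, 2))  # event A = {2, 4, 6, ...}

print(total)     # ≈ 1
print(pr_even)   # ≈ 1/3
```

Truncation is only a numerical check, of course; the exact values come from the geometric series formula (14).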
Problem 4. You often go to the library to use the copier. Let N be the number of people already in
line to use the copier including the one using it. For example, if N = 2, then there is one person using
the copier and one person waiting to use it. Suppose after some investigation you arrive at the
following model for the number at the machine, namely the probability, Pr{n}, that there are n people at
the machine is

Pr{n} = 2 (1/3)^(n+1)   for n = 0, 1, 2, …
(a) Show that this model satisfies the condition (15).

(b) Suppose that whenever you go to use the copier and see 3 or more people at the machine, you
decide the wait will be too long and come back later. Find the probability that this will
happen, i.e. find the probability of the event {3, 4, ... }. Hint: You can either compute the
infinite sum Pr{3} + Pr{4} + ... or you can compute 1 - Pr{0} - Pr{1} - Pr{2}. Ans: 1/27.
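Problem 4 can be checked the same way, truncating the infinite sums. This sketch (not part of the text) verifies that the model's probabilities sum to 1 and computes Pr{3, 4, ...} both directly and via the complement:

```python
def pr(n):
    """Pr{n} = 2*(1/3)**(n+1), the model from Problem 4."""
    return 2 * (1/3) ** (n + 1)

N = 200  # truncation point; the tail beyond N is negligible

total = sum(pr(n) for n in range(N))     # ≈ 1, condition (15)
tail = sum(pr(n) for n in range(3, N))   # Pr{3, 4, ...}
complement = 1 - pr(0) - pr(1) - pr(2)

print(total, tail, complement)   # tail and complement are both ≈ 1/27
```

Both routes in the hint agree, and agree with the stated answer 1/27 ≈ 0.037.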
Summary: We observe something which has a number of outcomes, and we aren't able to predict
which outcome will occur with complete certainty. However, we feel that there are some underlying
probabilities that the outcome will lie in any particular set. More precisely, given a set E of outcomes, we
feel that the probability that the outcome is in E should satisfy (5) and also the formulas (9), (10), (11) and
(12). In practice one uses assumed or estimated values for the probabilities and requires that these values
satisfy the assumptions (9), (10), (11), and (12). This is called a probability model of the real world
situation. Using the probability model one calculates other probabilities and hopes that they apply to the real
world situation.