1 Probability
1.1 Probabilities of outcomes and events.
1.1.1 Probabilities of outcomes.
This chapter is an introduction to some of the basic concepts of probability. Let's begin by considering
what we mean by probability.
Example 1. Suppose we roll a six-sided die, and we say the following.
The probability of getting a 1 is 1/6,
The probability of getting a 2 is 1/6,
....
The probability of getting a 6 is 1/6.
What do we mean by this? One interpretation is based on frequency of occurrence. If we roll the die a
large number of times, then we expect that the proportion of times it comes up 1 will be approximately 1/6,
and similarly for 2, 3, ..., 6. For example, if we roll the die 600 times, then we expect to get approximately
100 1's. To make this a little more precise, we might say
(# times a 1 comes up) / (# rolls) → Pr{1}   as # rolls → ∞,
and similarly for 2, 3, ..., 6. Here Pr{ 1 } denotes the probability that a 1 will come up on an individual roll.
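The frequency interpretation is easy to illustrate by simulation. The Python sketch below (not part of the original notes; the function name is ours) rolls a fair six-sided die many times and compares the observed proportion of 1's with Pr{1} = 1/6.

```python
import random

def estimate_pr_one(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and return the
    proportion of rolls that come up 1."""
    rng = random.Random(seed)
    ones = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 1)
    return ones / n_rolls

# As n_rolls grows, the proportion approaches Pr{1} = 1/6 ≈ 0.1667.
for n in (600, 60_000):
    print(n, estimate_pr_one(n))
```

With 600 rolls the estimate is still noisy; with 60,000 it is typically within a percentage point of 1/6, which is exactly the limiting behavior the displayed formula describes.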
Example 2. Each day a newsstand buys and sells The Wall Street Journal. Based on records for the past
month they feel that they would never sell more than 4 copies in any day. Furthermore they feel that
The probability, Pr{0}, of selling zero copies in a given day = p0 = 0.2,
The probability, Pr{1}, of selling one copy in a given day = p1 = 0.25,
(1)
Pr{2} = p2 = 0.35,
Pr{3} = p3 = 0.15,
Pr{4} = p4 = 0.05.
Here we are assuming that the newsstand has enough copies of The Wall Street Journal in stock to satisfy
the demand of all the customers who want to buy it in a given day. Using a frequency interpretation of
probability similar to Example 1, if we look at the number of copies of The Wall Street Journal sold each
day for a number of consecutive days, then the proportion of times none are sold is approximately 0.2, and
similarly for 1, 2, 3, 4. More precisely
(2)  (# days none are sold) / (# days) → Pr{0}   as # days → ∞,
and similarly for 1, 2, 3, 4. Often it is convenient to group the probabilities of the outcomes together into a vector

    ( p0 )   ( 0.2  )
    ( p1 )   ( 0.25 )
p = ( p2 ) = ( 0.35 )
    ( p3 )   ( 0.15 )
    ( p4 )   ( 0.05 )
Often it is informative to make a graph of the probability distribution. Here is a graph of the probability
distribution in Example 2.
[Bar graph of the probability distribution: Pr{0} = 0.2, Pr{1} = 0.25, Pr{2} = 0.35, Pr{3} = 0.15, Pr{4} = 0.05.]
Since the relation (2) involves taking the limit as # days → ∞, it looks like we may not be able to determine
Pr{0}, ..., Pr{4} exactly. We can estimate these values by the ratios

(# days j are sold) / (# days),   j = 0, ..., 4,
using a large number of days. In fact, the values 0.2, 0.25, ..., 0.05 used for Pr{0}, ..., Pr{4} in (1) were
probably obtained in this fashion. This is typical of real world probability models where we must use
estimated values for the presumed underlying probabilities.
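Such estimates are easy to mimic in simulation. The Python sketch below (not part of the original notes) draws a large number of simulated days from the assumed distribution (1) and recovers the probabilities as observed proportions:

```python
import random

p = [0.2, 0.25, 0.35, 0.15, 0.05]   # assumed Pr{0}, ..., Pr{4} from (1)

def simulate_days(n_days, seed=0):
    """Simulate n_days of demand drawn from p and return the observed
    proportion of days on which 0, 1, ..., 4 copies were sold."""
    rng = random.Random(seed)
    counts = [0] * len(p)
    for _ in range(n_days):
        counts[rng.choices(range(len(p)), weights=p)[0]] += 1
    return [c / n_days for c in counts]

print(simulate_days(100_000))   # proportions close to p
```

In practice the direction is reversed: the real data come first, and the proportions computed this way become the estimated probabilities of the model.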
Example 3. A company manufactures diodes. Based on constant testing they feel that the probability
that a diode is defective is 0.3%, i.e.
The probability that a diode is defective = 0.003,
The probability that a diode is good = 0.997.
If we look at a large number of diodes, the proportion of defectives is approximately 0.003 and the
proportion of good diodes is approximately 0.997, i.e.
(# defective) / (# diodes) → Pr{d}   as # diodes → ∞,

(# good) / (# diodes) → Pr{g}   as # diodes → ∞,
where we are abbreviating defective and good by d and g.
Often it is convenient to number the outcomes by consecutive integers starting with 0 or 1. For example,
with the diodes, we might represent defective by 0 and good by 1. In this case we can use the following
vector to represent the probability distribution.

p = ( Pr{d} ) = ( p0 ) = ( 0.003 )
    ( Pr{g} )   ( p1 )   ( 0.997 )
Example 4. An office copier is either in good condition (represented by g or 1), poor condition (p or 2) or
broken (b or 3). Based on past experience, you feel that when you arrive at the office tomorrow
The probability that the copier is in good condition = 0.5,
The probability that the copier is in poor condition = 0.1,
The probability that the copier is broken = 0.4.
If we look at a large number of days, the proportion of days the copier is in good condition is approximately
0.5, the proportion of days the copier is in poor condition is approximately 0.1 and the proportion of days
the copier is broken is approximately 0.4, i.e.
(# good) / (# days) → Pr{g} = p1 = 0.5   as # days → ∞,

(# poor) / (# days) → Pr{p} = p2 = 0.1   as # days → ∞,

(# broken) / (# days) → Pr{b} = p3 = 0.4   as # days → ∞.
In this case the probability vector is

    ( Pr{g} )   ( p1 )   ( 0.5 )
p = ( Pr{p} ) = ( p2 ) = ( 0.1 )
    ( Pr{b} )   ( p3 )   ( 0.4 )
Summary and Terminology: We observe something. This is called an experiment in many books. For
example, rolling a die and observing the face that comes up is an experiment. When we do an experiment
there are certain possible things we may observe. These are called outcomes. For example, when we roll a
die, the outcomes are 1, 2, 3, 4, 5, and 6. The set of all possible outcomes is called the sample space. Often
we will denote the sample space by S. (Ω is another symbol used for the sample space in many texts.) For
example, when we roll a die, S = {1, 2, 3, 4, 5, 6}. For each outcome a in S one has the probability of a
occurring Pr{a}. Intuitively, it is the proportion of times the outcome occurs if we repeat the same
experiment independently many times, i.e.
(3)  Pr{a} = lim[n→∞] (# times a occurs in n repetitions) / n
Remarks: Even though this description of probability has a nice intuitive appeal, it has some problems if
one tries to use it as a mathematical definition. One reason for this is because the notion of repeating the
same experiment independently a number of times has not been defined. If we flip a coin or roll a die
repeatedly, then this is pretty close to independent repetitions of the same experiment. However if we
observe the number of newspapers sold each day for a number of days, then this is probably not
independent repetitions of the same experiment. Despite this problem, let us proceed on using the above
concept of probability as a guide to our thoughts.
In the examples above, there were only a finite number of outcomes, i.e. S = {a1, a2, ..., an}. In this
case formula (3) would imply

(4)  Pr{ai} ≥ 0 for each i,   and   Pr{a1} + Pr{a2} + … + Pr{an} = 1
If we are going to use estimated or assumed values for the actual probabilities, then we should make sure
that they satisfy (4). For example, this is true in Examples 1 and 2. The set of values Pr{a1}, ..., Pr{an} is
called the probability distribution for the outcomes of the experiment. We can represent a probability
distribution by the row vector p = (p1, …, pn) or the corresponding column vector, where pi = Pr{ai}. Later we shall
see some situations where it might be advantageous to use a row vector instead of a column vector or vice
versa. For example, in Example 2 one has p = (0.2, 0.25, 0.35, 0.15, 0.05). The components of such a vector p
should be nonnegative numbers that sum to one. A vector with this property is called a probability vector.
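The probability-vector property is easy to check mechanically. The helper below (hypothetical, not from the text) tests nonnegativity and that the components sum to one, up to floating-point tolerance:

```python
import math

def is_probability_vector(p, tol=1e-9):
    """Return True if every component is nonnegative and the
    components sum to 1 (within floating-point tolerance)."""
    return all(x >= 0 for x in p) and math.isclose(sum(p), 1.0, abs_tol=tol)

print(is_probability_vector([0.2, 0.25, 0.35, 0.15, 0.05]))  # Example 2: True
print(is_probability_vector([0.5, 0.1, 0.4]))                # Example 4: True
print(is_probability_vector([0.5, 0.6]))                     # sums to 1.1: False
```

A tolerance is used rather than exact equality because decimal probabilities like 0.35 are not exactly representable in binary floating point.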
1.1.2 Probabilities of events.
Often we are interested in whether the outcome of an experiment belongs to a certain set. Consider the
situation in Example 2. Suppose the newsstand decides to stock 2 copies of The Wall Street Journal on a
certain day. What is the probability that they will have enough copies to satisfy all the customers that want
to buy one? We are asking for the probability that the number of copies customers want to buy on a given
day is in the set E = {0, 1, 2}.
Suppose, as in this example, we fix a set E of outcomes. If the experiment results in an outcome in the set
E, then we say the event E has occurred. Thus, an event is just another word for a set of outcomes. For
example, having no more than 2 customers wanting to buy the paper on a given day is the event
E = {0, 1, 2}.
We are interested in probabilities of events as well as probabilities of individual outcomes. Using the
intuitive description of probability above, we would say that the probability of a certain event E is the
proportion of times the outcome lies in the set E if we repeat the same experiment independently many
times, i. e.
(5)  Pr{E} = lim[n→∞] (# times outcome is in E in n independent repetitions) / n
If E = {b1, b2, ..., bm} is a finite set then it would follow from (3) and (5) that the probability of the event E
is the sum of the probabilities of the outcomes in E, i.e.
(6)  Pr{E} = Pr{b1} + Pr{b2} + … + Pr{bm}
For example, the probability that the number of customers wanting The Wall Street Journal on a given day
is no more than 2 would be
Pr{0} + Pr{1} + Pr{2} = 0.2 + 0.25 + 0.35 = 0.8.
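Formula (6) translates directly into code: the probability of a finite event is the sum of the probabilities of its outcomes. A minimal sketch (the helper name is ours), using the newsstand distribution from Example 2:

```python
# Pr for each outcome 0..4, from Example 2
pr = {0: 0.2, 1: 0.25, 2: 0.35, 3: 0.15, 4: 0.05}

def pr_event(event, pr):
    """Probability of a finite event: sum of outcome probabilities, per (6)."""
    return sum(pr[a] for a in event)

E = {0, 1, 2}            # no more than 2 copies demanded
print(pr_event(E, pr))   # ≈ 0.8
```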
Finite additivity. Formula (6) is actually a special case of the following more general formulas. Suppose
A and B are disjoint events, i.e. there is no outcome that is in both A and B. Let A ∪ B be the union of A and
B, i.e. the set of outcomes that are either in A or B. Then

(7)  Pr{A ∪ B} = Pr{A} + Pr{B}
Example 5. In a certain company 10% of the employees work in the accounting department and 30% of the
employees work in the sales department. Suppose no employee works in both departments. What
percentage of the employees work in one of the two departments?
Answer: 10% + 30% = 40%.
More generally, suppose that A and B are not necessarily disjoint, i.e. they may have elements in common.
Then

(8)  Pr{A ∪ B} = Pr{A} + Pr{B} - Pr{A ∩ B}
For a suggestion on how to prove this, see Problem 2 below.
Example 6. In a certain company 10% of the people are left handed, 30% wear glasses and 5% of the
people are both left handed and wear glasses. What percentage of the people are either left handed or wear
glasses?
Answer: 10% + 30% - 5% = 35%.
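Inclusion-exclusion can be checked on explicit sets of outcomes. In the sketch below we invent a hypothetical 100-person company matching the percentages of Example 6 (the specific employee numbering is our assumption), compute probabilities by counting, and verify formula (8) directly:

```python
# A hypothetical 100-person company matching Example 6:
# employees 0..9 are left-handed, employees 5..34 wear glasses,
# so employees 5..9 (5 people) are both.
left = set(range(10))
glasses = set(range(5, 35))

def pr(event, n=100):
    """Probability of an event under equally likely outcomes."""
    return len(event) / n

# inclusion-exclusion, formula (8)
lhs = pr(left | glasses)
rhs = pr(left) + pr(glasses) - pr(left & glasses)
print(lhs, rhs)   # both equal 0.35 (up to rounding)
```

Counting the union directly and applying (8) give the same 35%, matching the answer above.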
Another generalization of (7) is the following. Suppose E1, E2, ..., Em are disjoint events, i.e. no two of the
sets Ei and Ej have any elements in common. Furthermore, let E denote the union of the sets E1, ..., Em, i.e.
the set of all outcomes which are in one of the sets E1, ..., Em. Then
(9)  Pr{E} = Pr{E1} + Pr{E2} + … + Pr{Em}
This is called the finite additivity property of probability.
There are three other properties of probability of events which would follow from our informal definition.
(10)  0 ≤ Pr{E} ≤ 1   for any event E,
(11)  Pr{S} = 1,

(12)  Pr{∅} = 0
where S is the entire sample space and ∅ is the empty set. Formula (10) is called the non-negativity
property of probability and formula (11) is sometimes called the normalization axiom.
Problem 1. Office Max keeps a certain number of staplers on hand. If they sell out on a certain day,
they order 6 more from the distributor and these are delivered in time for the start of the next day. Thus
the inventory at the start of a day can be 1, 2, 3, 4, 5, or 6. Based on records for the past two months
they feel that the probability, Pr{1}, of there being 1 stapler at the start of a day = 0.09, Pr{2} = 0.21,
Pr{3} = 0.29, Pr{4} = 0.23, Pr{5} = 0.12 and Pr{6} = 0.06.
(a) What is the sample space S?
Ans: S = {1, 2, 3, 4, 5, 6}
(b) Does this model satisfy condition (4)?
Ans: Yes.
(c) What is the probability that there are at least 3 staplers at the start of the day? What event are we
talking about?
Ans: The event is E = {3, 4, 5, 6} and Pr{E} = 0.7.
Problem 2. Prove (8). Hint: Let A \ B be the difference of A and B, i.e. the set of outcomes that are in
A but not B. Note that A \ B and B are disjoint and their union is A ∪ B. Apply (7) to this relation. Then
apply (7) with A \ B in place of A and A ∩ B in place of B and combine with the previous formula.
Problem 3. If E is an event, then the complement of E, denoted by Ec, is the set of outcomes in the
sample space S that are not in E, in symbols Ec = {a ∈ S : a ∉ E}. Use finite additivity and (11) to
show that

Pr{Ec} = 1 - Pr{E}
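The complement rule gives a second route to answers like Problem 1(c). A small sketch (not from the text), using the assumed stapler distribution from Problem 1:

```python
# stapler inventory distribution from Problem 1
pr = {1: 0.09, 2: 0.21, 3: 0.29, 4: 0.23, 5: 0.12, 6: 0.06}

# Pr{at least 3} computed directly, and via the complement rule
direct = sum(pr[k] for k in (3, 4, 5, 6))
via_complement = 1 - sum(pr[k] for k in (1, 2))
print(direct, via_complement)   # both ≈ 0.7
```

When an event contains most of the sample space, summing over the small complement is usually the shorter computation.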
1.1.3 Countably infinite sample spaces.
We often encounter situations where there are an infinite number of outcomes. Infinite sets can either be
countable or uncountable. A set is called countable if its elements can be arranged in a sequence.
Otherwise, it is called uncountable. For example, the set of integers is countable since we can arrange them
in a sequence. One way to do this is 0, 1, -1, 2, -2, 3, ..., n, -n, ... . On the other hand if one considers the
set of all real numbers x in some interval a  x  b, then it can be shown that this set is uncountable.
Let us consider the case where the number of outcomes is countably infinite, so S = {a1, a2, ... }.
Example 7. Suppose we flip a coin until a head appears and we count the number of tosses until a head
appears. In this case the possible outcomes are the positive integers. We are interested in the probability,
Pr{n}, that the number of tosses until a head appears is a given integer n. Suppose
Pr{n} = (1/2)^n   for n = 1, 2, …
[Bar graph of Pr{n} = (1/2)^n for n = 1, 2, …, 10.]
Later we shall give an argument as to why this is a reasonable assumption. For the moment, we just assume
it is true. For example, the probability that it takes 3 tosses to get a head would be (1/2)^3 = 1/8. Now
suppose we ask for the probability that the number of tosses until we get a head is an even number. This is
the event A = {2, 4, 6, ... }. It is not unreasonable to assume that the finite additivity formulas (6) and (9)
above extend to infinite sums, i.e. if E = {b1, b2, ... } is a countably infinite set then
(12)  Pr{E} = Pr{b1} + Pr{b2} + …
and if E1, E2, ... is an infinite sequence of disjoint sets and E is their union, then
(13)  Pr{E} = Pr{E1} + Pr{E2} + …
Formula (13) is called the countable additivity property of probability. Formulas (12) and (13) are
assumptions; they do not follow from (3) and (5). Using (12), it would follow that the probability of
A = {2, 4, 6, ... } is
Pr{A} = (1/2)^2 + (1/2)^4 + (1/2)^6 + … = (1/4) + (1/4)^2 + (1/4)^3 + …
      = (1/4) [1 + (1/4) + (1/4)^2 + …] = (1/4) · 1/(1 - 1/4) = 1/3
Here we have used the formula
(14)  1 + x + x^2 + x^3 + … = 1/(1 - x)
which holds for | x | < 1.
One implication of the assumption (12) is that the analogue of (4) should hold if the sample space
S = {a1, a2, ... } is countably infinite, i.e.
(15)  Pr{a1} + Pr{a2} + … = 1
Just as in the finite case, we will require that this hold for any assumed or estimated values for the Pr{ai}.
For example, using formula (14) with x = 1/2, one can see that this holds for the values in Example 7. As in
the case where S is finite, the set of values Pr{ai} is called the probability distribution for the outcomes of
the experiment.
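These infinite sums can be checked numerically by truncating at a large n, since the tail of (1/2)^n is negligible. A sketch (not from the text) verifying (15) for Example 7 and the value Pr{A} = 1/3 for the even outcomes:

```python
N = 200  # truncation point; the neglected tail is about (1/2)**N

pr = [(1/2) ** n for n in range(1, N + 1)]   # Pr{n} = (1/2)^n, n = 1, 2, ...

total = sum(pr)                              # close to 1, per (15)
pr_even = sum((1/2) ** n for n in range(2, N + 1, 2))  # event A = {2, 4, 6, ...}

print(total)     # ≈ 1
print(pr_even)   # ≈ 1/3
```

Truncation is only a numerical check, of course; the exact values come from the geometric series formula (14).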
Problem 4. You often go to the library to use the copier. Let N be the number of people already in
line to use the copier including the one using it. For example, if N = 2, then there is one person using
the copier and one person waiting to use it. Suppose after some investigation you arrive at the
following model for the number at the machine, namely the probability, Pr{n}, that there are n people at
the machine is

Pr{n} = 2 (1/3)^(n+1)   for n = 0, 1, 2, …
(a) Show that this model satisfies the condition (15).

(b) Suppose that whenever you go to use the copier and see 3 or more people at the machine, you
decide the wait will be too long and come back later. Find the probability that this will
happen, i.e. find the probability of the event {3, 4, ... }. Hint: You can either compute the
infinite sum Pr{3} + Pr{4} + ... or you can compute 1 - Pr{0} - Pr{1} - Pr{2}. Ans: 1/27.
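Problem 4 can be checked the same way, truncating the infinite sums. This sketch (not part of the text) verifies that the model's probabilities sum to 1 and computes Pr{3, 4, ...} both directly and via the complement:

```python
def pr(n):
    """Pr{n} = 2*(1/3)**(n+1), the model from Problem 4."""
    return 2 * (1/3) ** (n + 1)

N = 200  # truncation point; the tail beyond N is negligible

total = sum(pr(n) for n in range(N))     # ≈ 1, condition (15)
tail = sum(pr(n) for n in range(3, N))   # Pr{3, 4, ...}
complement = 1 - pr(0) - pr(1) - pr(2)

print(total, tail, complement)   # tail and complement are both ≈ 1/27
```

Both routes in the hint agree, and agree with the stated answer 1/27 ≈ 0.037.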
Summary: We observe something which has a number of outcomes, and we aren't able to predict
which outcome will occur with complete certainty. However, we feel that there are some underlying
probabilities that the outcome will lie in any particular set. More precisely, given a set E of outcomes, we
feel that the probability that the outcome is in E should satisfy (5) and also the formulas (9), (10), (11) and
(12). In practice one uses assumed or estimated values for the probabilities and requires that these values
satisfy the assumptions (9), (10), (11), and (12). This is called a probability model of the real world
situation. Using the probability model one calculates other probabilities and hopes that they apply to the real
world situation.