Chapter 8
Discrete probability and the laws of chance

8.1 Introduction
In this chapter we lay the groundwork for calculations and rules governing simple discrete probabilities. These steps will be essential to developing the skills to analyze and understand problems of genetic diseases, genetic codes, and a vast array of other phenomena where the laws of chance intersect the processes of biological systems. To gain experience with probability, it is important to see simple examples. We start this introduction with experiments that can be easily reproduced and tested by the reader.
8.2 Simple experiments
Consider the following experiment: We flip a coin and observe one of two possible results: "heads" (H) or "tails" (T). A fair coin is one for which these results are equally likely. Similarly, consider the experiment of rolling a die: a six-sided die can land on any of its six faces, so that a "single experiment" has six possible outcomes. We anticipate getting each of the results with equal probability, i.e. if we were to repeat the same experiment many, many times, we would expect that, on average, the six possible events would occur with similar frequencies. We say that the events are random and unbiased for a "fair" die. How likely are we to roll a 5 and a 6 in successive experiments? A five or a six? If we toss a coin ten times, how probable is it that we get 8 heads out of ten tosses? For a given experiment such as the one described here, we are interested in quantifying how likely it is that a certain event is obtained. Our goal in this chapter is to make our notion of probability more precise, and to examine ways of quantifying and computing probabilities for experiments such as these. To motivate this investigation, we first look at results of a real experiment performed in class by students.
8.3 Empirical probability
Each student in a class of N = 121 individuals was asked to toss a penny 10 times. The students were then asked to record their results and to indicate how many heads they had obtained in this sequence of tosses. (Note that the order of the heads was not taken into account, only how many were obtained out of the 10 tosses.) The table shown below specifies the number, k, of heads (column 1) and the number, $x_k$, of students who responded that they had obtained that many heads (column 2). In column 3 we display the cumulative number of students who got any number up to and including k heads. In column 4 we compute the fraction of the class, $p(x_k) = x_k/N$, who got exactly k heads; we will henceforth associate this fraction with the empirical probability of k heads. In the last column (column 5) we include the cumulative probability, i.e. the sum of the empirical probabilities of getting any number up to and including k heads.
Number of heads   Frequency   Cumulative    Probability        Cumulative
       k             x_k      sum of x_i    p(x_k) = x_k/N     probability
       0              0            0           0.00               0.00
       1              1            1           0.0083             0.0083
       2              2            3           0.0165             0.0248
       3             10           13           0.0826             0.1074
       4             27           40           0.2231             0.3306
       5             26           66           0.2149             0.5455
       6             34          100           0.2810             0.8264
       7             14          114           0.1157             0.9421
       8              7          121           0.0579             1.00
       9              0          121           0.00               1.00
      10              0          121           0.00               1.00
Table 8.1: Results of a real experiment carried out by 121 students in this mathematics course.
Each student tossed a coin 10 times. We recorded the number of students who got 0, 1, 2, etc
heads. The fraction of the class that got each outcome is identified with the (empirical) probability
of that outcome. See Figure 8.1 for the same data presented graphically.
In Figure 8.1 we show what this distribution looks like on a bar graph. We observe that this
“empirical” distribution is not very symmetric, because it is based on a total of only 121 trials
(i.e. 121 repetitions of the experiment of 10 tosses). However, it is clear from this distribution
that certain results occurred more often (and hence are associated with a greater probability) than
others. To the right, we also show the cumulative distribution function, superimposed as an xy-plot
on the same graph. Observe that this function starts with the value 0 and climbs up to value 1,
since the probabilities of any of the events (0, 1, 2, etc heads) must add up to 1.
[Figure 8.1 appears here: left panel, bar graph of the empirical probability of k heads in 10 tosses (vertical scale 0 to 0.4); right panel, the same bar graph with the cumulative distribution superimposed (vertical scale 0 to 1.0); horizontal axes show the number of heads (k), from 0 to 10.]
Figure 8.1: The data from Table 8.1 is shown plotted on this graph. A total of N = 121 people were asked to toss a coin n = 10 times. In the bar graph (left), the horizontal axis reflects k, the number of heads (H) that came up during those 10 coin tosses. The vertical axis reflects the fraction $p(x_k)$ of the class that achieved that particular number of heads. The same bar graph is shown on the right, together with the cumulative function that sums up the values from left to right.
8.4 Mean and variance of a probability distribution
In a previous chapter, we considered distributions of grades and computed a mean (also called average) of that distribution. The identical concept applies to the distributions discussed in the context of probability, but here we use the terms mean, average value, and expected value interchangeably.
Suppose we toss a coin n times and let xi stand for the number of heads that are obtained in
those n tosses. Then xi can take on the values xi = 0, 1, 2, 3, . . . n. Let p(xi ) be the probability of
obtaining exactly xi heads. By analogy to ideas in a previous chapter, we would define the mean
(or average or expected value), x̄, of the probability distribution by the ratio
$$\bar{x} = \frac{\sum_{i=0}^{n} x_i\, p(x_i)}{\sum_{i=0}^{n} p(x_i)}.$$
However, this expression can be simplified by observing that, according to property (2) of discrete
probability, the denominator is just
$$\sum_{i=0}^{n} p(x_i) = 1.$$
This explains the following definition of the expected value.
Definition
The expected value x̄ of a probability distribution (also called the mean or average value) is
$$\bar{x} = \sum_{i=0}^{n} x_i\, p(x_i).$$
It is important to keep in mind that the expected value or mean is a kind of “average x coordinate”, where values of x are weighted by their frequency of occurrence. This is similar to the
idea of a center of mass (x positions weighted by masses associated with those positions). The
mean is a point on the x axis, representing the “average” outcome of an experiment. (Recall that
in the distributions we are describing, the possible outcomes of some observation or measurement
process are depicted on the x axis of the graph.) The mean is not the same as the average value of
a function, discussed in an earlier chapter. (In that case, the average is an average y coordinate.)
We also define a numerical quantity that represents the width of the distribution. We define the variance, V, and the standard deviation, σ, as follows:
The variance, V, of a distribution is
$$V = \sum_{i=0}^{n} (x_i - \bar{x})^2\, p(x_i),$$
where x̄ is the mean. The standard deviation, σ, is
$$\sigma = \sqrt{V}.$$
In the problem sets, we show that the variance can also be expressed in the form
$$V = M_2 - \bar{x}^2,$$
where $M_2$ is the second moment of the distribution. Moments of a distribution are defined as the numbers obtained by summing up products of the probability weighted by powers of x. The j'th moment, $M_j$, of a distribution is
$$M_j = \sum_{i=0}^{n} (x_i)^j\, p(x_i).$$
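These formulas translate directly into code. The following Python sketch (our illustration, not part of the original notes; the function names are ours) applies them to the empirical distribution of Table 8.1:

# Mean, variance, and moments of a discrete probability distribution.

def mean(xs, ps):
    """Expected value: sum of x_i * p(x_i)."""
    return sum(x * p for x, p in zip(xs, ps))

def moment(xs, ps, j):
    """j-th moment: sum of x_i**j * p(x_i)."""
    return sum(x**j * p for x, p in zip(xs, ps))

def variance(xs, ps):
    """Variance via V = M2 - mean**2."""
    return moment(xs, ps, 2) - mean(xs, ps) ** 2

# Empirical distribution from Table 8.1 (N = 121 students, 10 tosses each).
k = list(range(11))
x_k = [0, 1, 2, 10, 27, 26, 34, 14, 7, 0, 0]
p_k = [x / 121 for x in x_k]

xbar = mean(k, p_k)
V = variance(k, p_k)
print(xbar, V, V**0.5)   # approximately 5.21, 2.05, 1.43

Running this reproduces the numbers computed by hand in the example that follows.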
8.4.1 Example
For the empirical probability distribution shown in Figure 8.1, the mean (expected value) is calculated by performing the following sum, based on the table of events shown above:
$$\bar{x} = \sum_{k=0}^{10} x_k\, p(x_k) = 0(0) + 1(0.0083) + 2(0.0165) + \cdots + 8(0.0579) + 9(0) + 10(0) = 5.2149.$$
The mean number of heads in this set of experiments is about 5.2. Intuitively, we would expect that for a fair coin, half the tosses should produce heads, i.e. on average 5 heads would be obtained out of 10 tosses. Because the empirical distribution is slightly biased, the mean is close to, but not exactly equal to, this intuitive theoretical result.
To compute the variance we form the sum
$$V = \sum_{k=0}^{10} (x_k - \bar{x})^2\, p(x_k) = \sum_{k=0}^{10} (k - 5.2149)^2\, p(k).$$
Here we have used the mean calculated above and the fact that $x_k = k$. We obtain
$$V = (0 - 5.2149)^2(0) + (1 - 5.2149)^2(0.0083) + \cdots + (9 - 5.2149)^2(0) + (10 - 5.2149)^2(0) = 2.0530.$$
The standard deviation is then $\sigma = \sqrt{V} = 1.4328$.
8.5 Theoretical probability
Our motivation in what follows is to put results of an experiment into some rational context. We
would like to be able to predict the distribution of outcomes based on underlying “laws of chance”.
Here we will formalize the basic rules of probability, and learn how to assign probabilities to events
that consist of repetitions of some basic, simple experiment like the coin toss. Intuitively, we expect
that in tossing a fair coin, half the time we should get H and half the time T. But as seen in our
experimental results, there can be long repetitions that result in very few H or very many H, far
from the mean or expected value. How do we assign a theoretical probability to the event that only
1 head is obtained in 10 tosses of a coin? This motivates our more detailed study of the laws of
chance and theoretical probability.
As we have seen in our previous example, the probability p assigns a number to the likelihood
of an outcome of an experiment. In the experiment discussed above, that number was the fraction
of the students who got a certain number of heads in a coin toss repetition.
8.5.1 Basic definitions of probability
Suppose we label the possible results of the experiment by symbols $e_1, e_2, e_3, \ldots, e_k, \ldots, e_m$, where m is the number of possible events (e.g. m = 2 for a coin flip, m = 6 for a roll of a die). We will refer to these as events, and our purpose here will be to assign numbers, called probabilities, p, to these events that indicate how likely it is that they occur. Then the following two conditions are required for p to be a probability:
1. The following inequality must be satisfied:
$$0 \le p(e_k) \le 1 \quad \text{for all events } e_k.$$
Here $p(e_k) = 0$ is interpreted to mean that this event never happens, and $p(e_k) = 1$ means that this event always happens. The probability of each (discrete) event is a number between 0 and 1.
2. If $\{e_1, e_2, e_3, \ldots, e_k, \ldots, e_m\}$ is a list of all the possible events then
$$p(e_1) + p(e_2) + p(e_3) + \cdots + p(e_k) + \cdots + p(e_m) = 1,$$
or simply
$$\sum_{k=1}^{m} p(e_k) = 1.$$
That is, the probabilities of all the events sum up to one, since one or another of the events must always occur.
Definition
The list of all possible "events" $\{e_1, e_2, e_3, \ldots, e_k, \ldots, e_m\}$ is called the sample space.
8.6 Multiple events and combined probabilities
Here we consider an "experiment" that consists of more than one repetition. For example, each student tossed a coin 10 times to generate the data used earlier. We aim to have a way of describing the number of possible events as well as the likelihood of getting any one or another of these events.
First multiplication principle
If there are N1 possible events in experiment 1 and N2 possible events in experiment
2, then there are N1 · N2 possible events of the combined set of experiments.
In the above multiplication principle we assume that the order of the events is important.
Example 1
Each flip of a coin has two events. Flipping two coins can give rise to 2 · 2 = 4 possible events
(where we distinguish between the result TH and HT, i.e. order of occurrence is important.)
Example 2
Rolling a die twice gives rise to 6 · 6 = 36 possible events.
Example 3
A sequence of three letters is to be chosen from a 4-letter alphabet consisting of the letters T, A,
G, C. (For example, TTT, TAG, GCT, and GGA are all examples of this type.) The number of
ways of choosing such 3-letter “words” is 4 × 4 × 4 = 64 since we can pick any of the four letters in
any of the three positions of the sequence.
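The multiplication principle can be checked by brute-force enumeration. Here is a minimal Python sketch (ours, not from the notes) that lists all 3-letter words over the alphabet T, A, G, C:

from itertools import product

# Enumerate all 3-letter "words" over the 4-letter alphabet T, A, G, C.
alphabet = "TAGC"
words = ["".join(w) for w in product(alphabet, repeat=3)]

print(len(words))      # 64, i.e. 4 * 4 * 4
print(words[:4])       # ['TTT', 'TTA', 'TTG', 'TTC']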
8.7 Calculating the theoretical probability
How do we assign a probability to a given set of events? In the first example in this chapter, we
used data to do this, i.e. we repeated an experiment many times, and observed the fraction of
times of occurrence of each event. The resulting distribution of outcomes was used to determine
empirical probability. Here we take the alternate approach: we make some simplifying assumptions
about each elementary event and use the rules of probability to compute a theoretical probability.
Equally likely assumption
One of the most common assumptions is that each event occurs with equal likelihood. Suppose that
there are m possible events and that each is equally likely. Then the probability of each event is
1/m, i.e.
$$P(e_i) = \frac{1}{m} \quad \text{for } i = 1, \ldots, m.$$
Example 1
For a fair coin tossed one time, we expect that the probability of getting H or T is equal. In that
case,
$$P(H) + P(T) = 1, \qquad P(H) = P(T).$$
Together these imply that
$$P(H) = P(T) = \frac{1}{2}.$$
Example 2
For a fair 6-sided die, the same assumption leads to the conclusion that the probability of getting
any one of the six faces as a result of a roll is $P(e_k) = 1/6$ for $k = 1, \ldots, 6$.
Independent events
In order to combine results of several experiments, we need to discuss the notion of independence
of events. Essentially, independent events are those that are not correlated or linked with one
another. For example, we assume in general that the result of one toss of a coin does not influence
the result of a second toss. All theoretical probabilities calculated in this chapter will be based on
this important assumption.
Second multiplication principle
Suppose events e1 and e2 are independent. Then, if the probability of event e1 is
P(e1 ) = p1 and the probability of event e2 is P(e2 ) = p2 , the probability of event e1
and event e2 both occurring is
P(e1 and e2 ) = p1 · p2 .
We also say that this is the probability of event $e_1$ AND event $e_2$. This is sometimes written as $P(e_1 \cap e_2)$.
If the two events $e_1$ and $e_2$ are not independent, the probability of both occurring is
$$P(e_1 \cap e_2) = P(e_1) \cdot P(e_2 \text{ assuming that } e_1 \text{ happened}) = P(e_2) \cdot P(e_1 \text{ assuming that } e_2 \text{ happened}).$$
Example 3
The probability of tossing a coin to get H and rolling a die to get a 6 is the product of the individual probabilities of each of these events, i.e.
$$P(H \text{ and } 6) = \frac{1}{2} \cdot \frac{1}{6} = \frac{1}{12}.$$
Example 4
The probability of rolling a die twice, to get a 3 followed by a 4, is
$$P(3 \text{ and } 4) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}.$$
Addition principle
Suppose events e1 and e2 are mutually exclusive. Then the probability of getting
event e1 OR event e2 is given by
P(e1 ∪ e2 ) = p1 + p2 .
In general, i.e. when the events $e_1$ and $e_2$ are not necessarily mutually exclusive, the following equation holds:
$$P(e_1 \cup e_2) = P(e_1) + P(e_2) - P(e_1 \cap e_2).$$
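This identity is easy to verify by brute force over a small sample space. A minimal Python sketch (our own illustration), using a single roll of a fair die:

from fractions import Fraction

# Sample space of one fair die roll; each outcome has probability 1/6.
space = range(1, 7)

def prob(event):
    """Probability of an event (a set of outcomes) under equal likelihood."""
    return Fraction(sum(1 for s in space if s in event), 6)

A = {1, 2, 3}          # e.g. "roll at most 3"
B = {2, 4, 6}          # e.g. "roll an even number"

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
print(prob(A | B))     # 5/6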
Example 5
When we roll a die once, assuming that each face has equal probability of occurring, the chance of getting either a 1 or a 2 (i.e. either of these two possibilities out of a total of six) is
$$P(\{1\} \cup \{2\}) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}.$$
Example 6
When we flip a coin, the probability of getting either heads or tails is
$$P(\{H\} \cup \{T\}) = \frac{1}{2} + \frac{1}{2} = 1.$$
This makes sense since there are only 2 possible events (m = 2), and we said earlier that $\sum_{k=1}^{m} p(e_k) = 1$, i.e. one of the 2 events must always occur.
Subtraction principle
If the probability of event ek is P(ek ) then the probability of NOT getting event ek is
P(not ek ) = 1 − P(ek ).
Example 7
When we roll a die once, the probability of NOT getting the value 2 is
$$P(\text{not } 2) = 1 - \frac{1}{6} = \frac{5}{6}.$$
Alternatively, we can add up the probabilities of getting all the results other than a 2, i.e. 1, 3, 4, 5, or 6, and arrive at the same answer:
$$P(\{1\} \cup \{3\} \cup \{4\} \cup \{5\} \cup \{6\}) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{5}{6}.$$
Example 8
A box of jelly beans contains a mixture of 10 red, 18 blue, and 12 green jelly beans. Suppose that these are well-mixed and that the probability of pulling out any one jelly bean is the same.
(a) What is the probability of randomly selecting two blue jelly beans from the box?
(b) What is the probability of randomly selecting two beans that have the same color?
(c) What is the probability of randomly selecting two beans that have different colors?
Solution
There are a total of 40 jelly beans in the box. In a random selection, we assume that each jelly bean has an equal likelihood of being selected.
(a) Suppose we take out one jelly bean and then a second. Once we take out the first, there will be 39 left. If the first one was blue (with probability 18/40), then there will be 17 blue ones left in the box. Thus, the probability of selecting a blue bean AND another blue bean is
$$P(2 \text{ blue}) = \frac{18}{40} \cdot \frac{17}{39} = 0.196.$$
The same answer is obtained by considering pairs of jelly beans. There are a total of (40 × 39)/2 pairs, and out of these, only (18 × 17)/2 pairs are pure blue. Thus the probability of getting a blue pair is
$$\frac{(18 \times 17)/2}{(40 \times 39)/2} = 0.196.$$
(The result is the same whether we select both simultaneously or one at a time.)
(b) Two beans will have the same color if they are both blue OR both red OR both green. We have to add the corresponding probabilities, that is
$$P(\text{same color}) = \left(\frac{18}{40} \cdot \frac{17}{39}\right) + \left(\frac{10}{40} \cdot \frac{9}{39}\right) + \left(\frac{12}{40} \cdot \frac{11}{39}\right) = 0.338.$$
(c) Two beans will have different colors if we do NOT get two beans of the same color. Thus
$$P(\text{not same color}) = 1 - P(\text{same color}) = 1 - 0.338 = 0.662.$$
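This arithmetic can be double-checked with exact fractions; the following short Python sketch is our own illustration, not part of the notes:

from fractions import Fraction

counts = {"red": 10, "blue": 18, "green": 12}
total = sum(counts.values())           # 40 beans in the box

def p_two_of(color):
    """Probability that two beans drawn without replacement are both this color."""
    n = counts[color]
    return Fraction(n, total) * Fraction(n - 1, total - 1)

p_same = sum(p_two_of(c) for c in counts)
print(float(p_two_of("blue")))         # about 0.196
print(float(p_same))                   # about 0.338
print(float(1 - p_same))               # about 0.662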
Example 9
(a) How many different ways are there of rolling a pair of dice to get the total score of 7? (By total
score we mean the sum of both faces.)
(b) What is the probability of rolling the total score 7 with a pair of fair dice?
(c) What is the probability of rolling the total score 8 with a pair of dice?
(d) What is the probability of getting a total of 13 by rolling three fair dice?
Solution
(a) We can think of the result as
$$\square + \square = 7.$$
Then for the first die we could have any value, j = 1, 2, ..., 6 (i.e. a total of 6 possibilities), but then the second die must be 7 − j, which means that there is no choice for the second die. Thus there are 6 ways of obtaining a total of 7. (We do not need to list those ways here, since the argument establishes an unambiguous answer, but here is that list anyway, showing the face values of each pair of events that totals 7: (1, 6); (2, 5); (3, 4); (4, 3); (5, 2); (6, 1).)
(b) There are a total of 6 × 6 = 36 possibilities for the outcomes of rolling two dice, and we saw
above that 6 of these will add up to 7. Assuming all possibilities are equally likely (for fair dice),
this means that the probability of a total of 7 for the pair is 6/36 = 1/6.
(c) Here we must be more careful, since the previous argument will not quite work: We need
$$\square + \square = 8.$$
For example, if the first die comes up 1, then there is no way to get a total of 8 for the pair. The smallest value that would work on the "first die" is 2, so that we have only 5 possible choices for the first die, and then the second has to make up the difference. There are only 5 such possibilities. These are: (2, 6); (3, 5); (4, 4); (5, 3); (6, 2). Therefore the probability of such an event is 5/36.
(d) To get 13 by rolling three fair dice, we need
$$\square + \square + \square = 13.$$
We consider the possibilities: If the first die comes up a 6, then we need, for the other pair,
$$\square + \square = 13 - 6 = 7.$$
We already know this can be done in six ways. If the first die comes up a 5, then we need, for the other pair,
$$\square + \square = 13 - 5 = 8.$$
There are 5 ways to get this. Let us organize our "counting" of the possibilities in the following table, to be systematic and to see a pattern:

Face value      Total of             Number of ways
of first die    remaining pair       to get this
     6          □ + □ = 7                 6
     5          □ + □ = 8                 5
     4          □ + □ = 9                 4
     3          □ + □ = 10                3
     2          □ + □ = 11                2
     1          □ + □ = 12                1
We can easily persuade ourselves that there is a pattern being followed in building up this table.
We see that the total number of ways of getting the desired result is just a sum of the numbers in
the third column, i.e. 6 + 5 + 4 + 3 + 2 + 1 = 21. But the total number of possibilities for the three
dice is 63 = 216. Thus the probability of a total score of 13 is 21/216 = 0.097. In this example, we
had to list some different possibilities in order to achieve the desired result.
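Counting arguments like these can be confirmed by exhaustive enumeration; here is a short Python check (our own sketch):

from itertools import product

# Enumerate all 6**3 = 216 outcomes of rolling three fair dice.
rolls = list(product(range(1, 7), repeat=3))
ways = sum(1 for r in rolls if sum(r) == 13)

print(ways, len(rolls))        # 21 216
print(ways / len(rolls))       # about 0.097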
The examples in this section illustrate the simplest probability assumptions, rules, and calculations. Many of the questions asked in these examples were answered by careful “counting” of
possibilities and computing the fraction of cases in which some desired result is obtained. In the
following sections, we will discuss ways of representing outcomes of measurement (by distributions,
by numerical descriptors such as “mean” and “variance”). We will also study techniques for helping
us to “count” the number of possibilities and to compute probabilities associated with repeated
trials of one type of experiment.
8.8 Theoretical probability of coin tossing
Earlier in this chapter, we studied the results of a coin-tossing experiment. Now we turn to a theoretical investigation of the same type of experiment to understand the predictions of the basic rules of probability. We would like to quantify the probability of getting some number, k, of heads when the coin is tossed n times. We start with an elementary example in which the coin is tossed only three times (n = 3) to build up intuition. Let us use the notation p = P(H) and q = P(T) to represent the probabilities of each outcome.
For our theoretical probability investigation, we will make the simplest assumption about each elementary event, i.e. that a head (H) and a tail (T) are equally likely to be obtained in one repetition. Then the probabilities of event H and event T in a single toss are the same, i.e. 1/2:
$$P(H) = P(T) = 1/2, \quad \text{i.e.} \quad p = q = 1/2.$$
A new feature of this section is that we will summarize the probability using a frequency distribution. In this important type of plot, the horizontal axis represents some observed or measured
value in an experiment (for example, the number of heads in a coin toss experiment). The vertical
axis represents how often that outcome is obtained (i.e. the frequency, or probability, of the event).
(b) Three coin tosses
Suppose we are not interested in the precise order, but rather in the total number of heads (or tails). For example, we may win $1 for every H and lose $1 for every T that occurs. When a fair coin is tossed 3 times, the possible events are listed in Table 8.2. Grouping events together by the number of heads obtained, we then summarize the same information in a frequency table (Table 8.3).
event    number of heads    probability
 TTT            0           (1/2)(1/2)(1/2) = 1/8
 TTH            1           (1/2)(1/2)(1/2) = 1/8
 THT            1           (1/2)(1/2)(1/2) = 1/8
 HTT            1           (1/2)(1/2)(1/2) = 1/8
 THH            2           (1/2)(1/2)(1/2) = 1/8
 HTH            2           (1/2)(1/2)(1/2) = 1/8
 HHT            2           (1/2)(1/2)(1/2) = 1/8
 HHH            3           (1/2)(1/2)(1/2) = 1/8
Table 8.2: A list of all possible results of a 3-coin toss experiment, showing the number of heads in each case and the theoretical probabilities of each of the results.
Each result shown above has the same probability, $p = 1/2^3 = 1/8$. Grouping together results
from Table 8.2, and forming Table 8.3 we find similarly that the probability of getting no heads is
1/8, of getting one head is 3/8 (the sum of the probabilities of three equally likely events), of getting
two heads is 3/8, and of getting three heads is 1/8. This distribution is shown in Figure 8.2(b).
We can use these results to calculate the theoretical mean number of heads (expected value) in this
experiment.
number of heads (x_k = k)    probability                    result
          0                  P(TTT)                          1/8
          1                  P(TTH) + P(THT) + P(HTT)        3/8
          2                  P(THH) + P(HTH) + P(HHT)        3/8
          3                  P(HHH)                          1/8

Table 8.3: Theoretical probability of getting 0, 1, 2, or 3 H's in a 3-coin toss experiment.
8.8.1 Example
In the case of three tosses of a coin described above and shown in Figure 8.2(b), the expected value is:
$$\bar{x} = \sum_{k=0}^{3} k \cdot P(k \text{ heads}) = 0\,p(0) + 1\,p(1) + 2\,p(2) + 3\,p(3) = 0 \cdot \frac{1}{8} + 1 \cdot \frac{3}{8} + 2 \cdot \frac{3}{8} + 3 \cdot \frac{1}{8} = \frac{3}{8} + \frac{6}{8} + \frac{3}{8} = \frac{12}{8} = 1.5.$$
Thus, in three coin tosses, we expect that on average we would obtain 1.5 heads.
[Figure 8.2 appears here: bar graph of p(x), the distribution of heads in 3 coin tosses, plotted for x between −0.5 and 3.5, vertical scale 0 to 1.]
Figure 8.2: The theoretical probability distribution for the number of heads obtained in three tosses
of a fair coin.
The variance of this distribution can be calculated as follows:
$$V = \sum_{i=0}^{3} (x_i - \bar{x})^2\, p(x_i) = (0-1.5)^2 \cdot \frac{1}{8} + (1-1.5)^2 \cdot \frac{3}{8} + (2-1.5)^2 \cdot \frac{3}{8} + (3-1.5)^2 \cdot \frac{1}{8}.$$
$$V = 2(1.5)^2 \cdot \frac{1}{8} + 2(0.5)^2 \cdot \frac{3}{8} = 0.5625 + 0.1875 = 0.75.$$
The standard deviation is
$$\sigma = \sqrt{V} = \sqrt{0.75} = 0.866.$$
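The entire 3-toss calculation can be reproduced by listing the 2^3 = 8 equally likely sequences and grouping them by their number of heads; a small Python sketch of ours:

from itertools import product
from collections import Counter

# All 2**3 = 8 equally likely sequences of a fair coin tossed 3 times.
seqs = list(product("HT", repeat=3))
heads = Counter(s.count("H") for s in seqs)   # {0: 1, 1: 3, 2: 3, 3: 1}

p = {k: n / len(seqs) for k, n in heads.items()}
mean = sum(k * pk for k, pk in p.items())
var = sum((k - mean) ** 2 * pk for k, pk in p.items())
print(mean, var, var ** 0.5)                  # 1.5 0.75 0.866...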
The bar graph shown in Figure 8.2 (a) and (b) will be referred to as the probability distribution
for the number of heads in three tosses of a coin.
We make the following observations about this distribution:
• The values of the probabilities are all positive and satisfy 0 ≤ p ≤ 1.
• In both graphs, the sum of the areas of all the bars in the bar graph is 1, i.e.
$$\sum_{i=0}^{n} p(x_i) = 1.$$
• Some events (for example obtaining 1 or 2 heads) appear more often than others, and are
thus associated with larger values of the probability. (Even though each underlying event is
equally likely, there are many combinations that contribute to the result of one head.)
• There is a pattern in the probability we computed theoretically for k, the number of heads obtained in n tosses of a fair coin. The pattern so far is:
$$\text{Probability of } k \text{ heads in } n \text{ tosses of a fair coin} = [\text{Number of possible ways to obtain } k \text{ heads}] \times \left(\frac{1}{2}\right)^{n}.$$
• So far, to determine the number of possible ways to obtain k heads, i.e., the factor in the square brackets, we have listed all the possibilities and counted those that have exactly so many heads. This would become very tedious, especially for a large number of tosses, n. For this reason, part of this chapter will be a diversion into the investigation of permutations and combinations. These results will help us understand what factor goes into the square brackets in the above term.
• In the case of the fair coin examined here, p = q = 1/2. This accounts for the factor $(1/2)^n$. We will see later that this result is modified somewhat when the coin is biased, i.e. not fair, so that the probability of H is not the same as the probability of T, i.e., $p \neq q$. In that case the factor $(1/2)^n$ will be modified (to $p^k q^{n-k}$, as discussed later).
The coin toss experiment is an important example of a type of experiment with only two outcomes (H vs T). Such experiments are called Bernoulli trials. Here we have looked at examples
where the probability of each event was (assumed to be) the same. We will generalize this to unequal probabilities further on in this chapter. The above motivation leads us to consider the subject
of permutations and combinations.
8.9 How many possible ways are there of getting k heads in n tosses? Permutations and combinations
In computing theoretical probability, we often have to “count” the number of possible ways there
are of obtaining a given type of outcome. So far, it has been relatively easy to simply display all
the possibilities and group them into classes (0, 1, 2, etc heads out of n tosses, etc.). This is not
always the case. When the number of repetitions of an experiment grows, it may be very difficult
and boring to list all possibilities. We develop some shortcuts to figure out, in general, how many
ways there are of getting each type of outcome. This will make the job of computing theoretical
probability easier. In this section we introduce some notation and then summarize general features
of combinations and permutations to help in “counting” the possibilities.
8.9.1 Factorial notation
Let n be an integer, n ≥ 0. Then n!, called “n factorial”, is defined as the following product of
integers:
n! = n(n − 1)(n − 2) . . . (2)(1)
Example
1! = 1
2! = 2 · 1 = 2
3! = 3 · 2 · 1 = 6
4! = 4 · 3 · 2 · 1 = 24
5! = 5 · 4 · 3 · 2 · 1 = 120
We also define
0! = 1
8.9.2 Permutations
A permutation is a way of arranging objects, where the order of appearance of the objects is
important.
[Figure 8.3 appears here: three schematic diagrams. (a) n distinct objects placed into n slots, giving n! arrangements. (b) n distinct objects placed into k slots, giving P(n, k) = n!/(n − k)! arrangements (there are n, n − 1, ..., n − k + 1 choices for the successive slots). (c) choosing k of the n objects, C(n, k) ways, and then arranging the chosen k objects in k slots, k! ways.]
Figure 8.3: This diagram illustrates the meanings of permutations and combinations. (a) The
number of permutations (ways of arranging) n objects into n slots. There are n choices for the first
slot, and for each of these, there are n − 1 choices for the second slot, etc. In total there are n! ways
of arranging these objects. (Note that the order of the objects is here important.) (b) The number
of permutations of n objects into k slots, P (n, k), is the product n · (n − 1) · (n − 2) . . . (n − k + 1)
which can also be written as a ratio of factorials. (c) The number of combinations of n objects in
groups of k is called C(n, k) (shown as the first arrow in part c). Here order is not important. The
step shown in (b) is equivalent to the two steps shown in (c). This means that there is a relationship
between P (n, k) and C(n, k), namely, P (n, k) = k!C(n, k).
Example 6
Given the three cards Jack, Queen, King, we could permute them to form the sequences
JQK, JKQ, QKJ, QJK, KQJ, KJQ.
We observe that there are six possible arrangements (permutations). Other than explicitly
listing all the arrangements, as done here, (possible only for small sets of objects) we could arrive
at this fact by reasoning as follows: Let us consider the possible “slots” that can be filled by the
three cards:
□ □ □
We have three choices of what to put in the first slot (J or K or Q). This uses up one card, so
for each of the above choices, we then have only two choices for what to put in the second slot.
The third slot leaves no choice: we must put our remaining card in it. Thus the total number of
possibilities, i.e. the total number of permutations of the three cards is
3 × 2 × 1 = 6.
A feature of this argument is that it can be easily generalized for any number of objects. For
example, given N = 10 different cards, we would reason similarly that as we fill in ten slots, we can
choose any one of 10 cards for the first slot, any of the remaining 9 for the next slot, etc., so that
the number of permutations is
10 × 9 × 8 · · · × 2 × 1 = 10! = 3628800
We can summarize our observation in the following statement:
The number of permutations (arrangements) of n objects is n!. (See Figure 8.3(a).)
Recall that the factorial notation n! was defined in section 8.9.1.
Example 7
How many different ways are there to display five distinct playing cards?
Solution
The answer is 5! = 120. Here the order in which the cards are shown is important.
Suppose we have n objects and we randomly choose k of these to put into k boxes (one per
box). Assume k < n.
For example, the objects are
♣♦♥♠ ⋆ • ◦ ⊕
and we must choose some of them (in order) so as to fill up the 4 slots:
□ □ □ □
We ask how many ways there are of arranging n objects taken k at a time. As in our previous
argument, the first slot to fill comes with a choice of n objects (for n possibilities). This “uses up”
one object leaving (n − 1) to choose from in the next stage. (For each of the n first choices there
are (n − 1) choices for slot 2, forming the product n · (n − 1)). In the third slot, we have to choose
among (n − 2) remaining objects, etc. By the time we arrive at the k’th slot, we have (n − k + 1)
choices. Thus, in total, the number of ways that we can form such arrangements of n objects into
k slots, represented by the notation P (n, k) is
P (n, k) = n · (n − 1) · (n − 2) . . . (n − k + 1).
We can also express this in factorial notation:
$$P(n,k) = \frac{n \cdot (n-1) \cdot (n-2) \cdots (n-k+1) \cdot (n-k) \cdots 3 \cdot 2 \cdot 1}{(n-k) \cdot (n-k-1) \cdots 3 \cdot 2 \cdot 1} = \frac{n!}{(n-k)!}.$$
These remarks motivate the following observation:
The number of permutations of n objects taken k at a time is
$$P(n,k) = \frac{n!}{(n-k)!}.$$
(See Figure 8.3(b).)
8.9.3 Combinations and binomial coefficients
How many ways are there to choose k objects out of a set of n objects if the order of the selection does not matter? For example, if we have a class of 10 students, how many possible pairs of students can be formed for a team project? In the case that the order of the objects is not important, we refer to the number of possible combinations of n objects taken k at a time by the notation C(n, k) or, more commonly, by
$$C(n,k) = \binom{n}{k} = \frac{n!}{(n-k)!\,k!}.$$
Note that two notations are commonly used to refer to the same concept. We will henceforth use mainly the notation C(n, k). We can read this notation as "n choose k". The values C(n, k) are also called the binomial coefficients, for reasons that will shortly become apparent.
As shown in Figure 8.3(b,c), combinations are related to permutations in the following way: To find the number of permutations of n objects taken k at a time, P(n, k), we would
• Choose k out of n objects. The number of ways of doing this is $C(n,k) = \binom{n}{k}$.
• Find all the permutations of the k chosen objects. We have discussed that there are k! ways of arranging (i.e. permuting) k objects.
Thus
$$P(n,k) = \binom{n}{k} \cdot k!$$
The above remarks lead to the following conclusion:
The number of combinations of n objects taken k at a time, sometimes called "n choose k" (also called the binomial coefficient, C(n, k)), is:
$$C(n,k) = \binom{n}{k} = \frac{P(n,k)}{k!} = \frac{n!}{k!\,(n-k)!}.$$
We can observe an interesting symmetry property, namely that
$$\binom{n}{k} = \binom{n}{n-k}, \quad \text{or} \quad C(n,k) = C(n,n-k).$$
It is worth noting that the binomial coefficients are the entries that occur in Pascal's triangle:

            1
           1 1
          1 2 1
         1 3 3 1
        1 4 6 4 1
      1 5 10 10 5 1
Each term in Pascal's triangle is obtained by adding the two diagonally above it. The top of the triangle represents C(0, 0) and is associated with n = 0. The next row represents C(1, 0) and C(1, 1). For row number n, the terms along the row are the binomial coefficients C(n, k), starting with k = 0 at the beginning of the row and going to k = n at the end of the row. For example, we see above that C(5, 2) = C(5, 3) = 10. The triangle can be continued by including subsequent rows; this is left as an exercise for the reader.
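For computation, these quantities are available directly in Python (math.comb and math.perm, present in Python 3.8+); the short sketch below, our own illustration, also regenerates the row of Pascal's triangle used above:

from math import comb, perm, factorial

# P(n, k) = n!/(n-k)!  and  C(n, k) = n!/(k!(n-k)!)
assert perm(10, 3) == factorial(10) // factorial(7)      # 720
assert comb(5, 3) == perm(5, 3) // factorial(3)          # 10, since P = C * k!

# Row n = 5 of Pascal's triangle: the binomial coefficients C(5, k).
print([comb(5, k) for k in range(6)])   # [1, 5, 10, 10, 5, 1]

# Symmetry property C(n, k) = C(n, n - k).
assert all(comb(8, k) == comb(8, 8 - k) for k in range(9))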
8.9.4 Example
How many ways are there of getting k heads in n tosses of a fair coin?
Solution
This problem motivated the discussion of permutations and combinations. We can now answer this
question.
The number of possible ways of obtaining k heads in n tosses is
$$C(n,k) = \frac{n!}{k!\,(n-k)!}.$$
Thus the probability of getting any outcome consisting of k heads when a fair coin is tossed n times is:
$$\text{For a fair coin,} \quad P(k \text{ heads in } n \text{ tosses}) = \frac{n!}{k!\,(n-k)!}\left(\frac{1}{2}\right)^{n}.$$
The term containing the power $(1/2)^n$ is the probability of any one specific sequence of possible H's and T's. The multiplier in front, which as we have seen is the binomial coefficient C(n, k), counts how many such sequences have exactly k H's and all the rest (i.e., n − k) T's. (The greater the number of possible combinations, the more likely it is that any one of them would occur.)
8.9.5 Example
How many combinations can be made out of 5 objects if we take 1, 2, 3, etc. objects at a time?
Solution
Here the order of the objects is not important. The number of ways of taking 5 objects k at a time (where 0 ≤ k ≤ 5) is C(5, k). For example, the number of combinations of 5 objects taken 3 at a time is
$$C(5,3) = \frac{5!}{3!\,(5-3)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(2 \cdot 1)} = \frac{5 \cdot 4}{2} = 10.$$
The list of all the coefficients C(5, k) appears as the last row displayed above in Pascal's triangle.
8.9.6 Example
How many different 5-card hands can be formed from a deck of 52 ordinary playing cards?
Solution
We are not concerned here with the order of appearance of the cards that are dealt, only with the "hand" (i.e. the composition of the final collection of 5 cards). Thus we are asking how many combinations of 5 cards there are from a set of 52 cards. The solution is
$$C(52,5) = \frac{52!}{5!\,(52-5)!} = \frac{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 2{,}598{,}960.$$

8.9.7 The binomial theorem
An interesting application of combinations is the formula for the product of terms of the form $(a+b)^n$, known as the Binomial expansion. Consider the simple example
$$(a+b)^2 = (a+b) \cdot (a+b).$$
We expand this by multiplying each of the terms in the first factor by each of the terms in the second factor:
$$(a+b)^2 = a^2 + ab + ba + b^2.$$
However, the order of factors ab or ba does not matter, so we count these as two identical terms, and express our result as
$$(a+b)^2 = a^2 + 2ab + b^2.$$
Similarly, consider the product
$$(a+b)^3 = (a+b)(a+b)(a+b).$$
Now, to form the expansion, each term in the first factor is multiplied by two other terms (one chosen from each of the other factors). This leads to an expansion of the form
$$(a+b)^3 = a^3 + 3a^2b + 3ab^2 + b^3.$$
More generally, consider a product of the form
$$(a+b)^n = (a+b) \cdot (a+b) \cdots (a+b).$$
By analogy, we expect to see terms of the form shown below in the expansion for this binomial, i.e.
$$(a+b)^n = a^n + \square\,a^{n-1}b + \square\,a^{n-2}b^2 + \cdots + \square\,a^{n-k}b^k + \cdots + \square\,ab^{n-1} + b^n.$$
The first and last terms are accompanied by the "coefficients" 1, since such terms can occur in only one way each. However, we must still "fill in the boxes" with coefficients that reflect the number of times that terms of the given form $a^{n-k}b^k$ occur. But such a term is made by choosing b's from k of the n factors (and picking a's from all the rest of the factors). We already know how many ways there are of selecting k items out of a collection of n, namely, the binomial coefficients. Thus
$$(a+b)^n = a^n + C(n,1)a^{n-1}b + C(n,2)a^{n-2}b^2 + \cdots + C(n,k)a^{n-k}b^k + \cdots + C(n,2)a^2b^{n-2} + C(n,1)ab^{n-1} + b^n,$$
where the binomial coefficients are as defined in section 8.9.3. We have used the symmetry property C(n, k) = C(n, n − k) in the coefficients of this expansion.
8.9.8 Example
Find the expansion of the expression $(a+b)^5$.

Solution
The coefficients we need in this expansion are formed from C(5, k). We have already calculated the binomial coefficients for the required expansion in example 8.9.5, namely 1, 5, 10, 10, 5, 1. Thus the desired expansion is
$$(a+b)^5 = a^5 + 5a^4b + 10a^3b^2 + 10a^2b^3 + 5ab^4 + b^5.$$
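As a sanity check, both sides of the binomial theorem can be compared numerically for sample values of a and b; a quick sketch of ours:

from math import comb

def binomial_expansion(a, b, n):
    """Right-hand side of the binomial theorem: sum of C(n,k) a^(n-k) b^k."""
    return sum(comb(n, k) * a**(n - k) * b**k for k in range(n + 1))

# The expansion agrees with direct computation of (a + b)^n.
for a, b in [(1, 1), (2, 3), (-1, 4)]:
    assert binomial_expansion(a, b, 5) == (a + b) ** 5
print(binomial_expansion(2, 3, 5), (2 + 3) ** 5)   # 3125 3125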
8.10 A coin toss experiment and binomial distribution
A Bernoulli Trial is an experiment that has only two possible results. A typical example of this type
is the coin toss. We have already studied examples of results of a Bernoulli trial in this chapter.
Here we expand our investigation to consider more general cases and their distributions.
We have already examined in detail an example of a Bernoulli trial in which each outcome is
equally likely, i.e. a coin toss with P(H) = P(T). In this section we will drop the assumption that
each event is equally likely, and examine a more general case.
If we do not know that the coin is fair, we might assume that the probability that it lands on H is p and on T is q. That is,
$$p(H) = p, \quad p(T) = q.$$
In general, p and q may not be exactly equal. By the property of probabilities,
$$p + q = 1.$$
Consider the following specific outcome of an experiment in which an (unfair) coin is tossed 10 times:
TTHTHHTTTH
Assuming that each toss is independent of the other tosses, we find that the probability of this event is $q \cdot q \cdot p \cdot q \cdot p \cdot p \cdot q \cdot q \cdot q \cdot p = p^4 q^6$. The probability of this specific event is the same as the probability of the specific event HHHHTTTTTT (since each event has the same number of H's and T's). The probability of each event is a product of factors of p and q (one for each H or T that appears). Further, the number of ways of obtaining an outcome with a specific number of H's (for example, four H's as in this illustration) is the same whether the coin is fair or not. (That number is a combination, i.e. the binomial coefficient C(n, k), as before.) The probability of getting any outcome with k heads when the (possibly unfair) coin is tossed n times is thus a simple generalization of the probability for a fair coin:
The binomial distribution
Given a (possibly unfair) coin with P(H) = p and P(T) = q, where p + q = 1, if the coin is tossed n times, the probability of getting exactly k heads is given by
$$P(k \text{ heads out of } n \text{ tosses}) = C(n,k)\,p^k q^{n-k} = \frac{n!}{k!\,(n-k)!}\,p^k q^{n-k}.$$
We refer to this distribution as the binomial distribution. In the case of a fair coin, p = q = 1/2 and the factor $p^k q^{n-k}$ reduces to $(1/2)^n$.
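A direct translation of this formula into Python (an illustrative sketch of ours; it anticipates the numbers of Example 8.10.1 below):

from math import comb

def binomial_pmf(k, n, p):
    """P(k heads out of n tosses) = C(n, k) p^k q^(n-k), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, k) * p**k * q**(n - k)

# Unfair coin with P(H) = 0.1, tossed 5 times.
print(binomial_pmf(3, 5, 0.1))                          # approximately 0.0081
# The probabilities over all k sum to 1 (up to floating-point error).
print(sum(binomial_pmf(k, 5, 0.1) for k in range(6)))   # approximately 1.0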
Having obtained the form of the binomial distribution, we wish to show that the probability of obtaining any of the possible outcomes, i.e. P(0 or 1 or . . . or n heads out of n tosses), is 1. We can show this with the following calculation.
For each toss,
$$(p+q) = 1.$$
Then raising each side to the power n,
$$(p+q)^n = 1^n = 1,$$
but by the binomial theorem, the expression on the left can be expanded to form
$$(p+q)^n = p^n + C(n,1)p^{n-1}q + C(n,2)p^{n-2}q^2 + \cdots + C(n,k)p^k q^{n-k} + \cdots + C(n,2)p^2 q^{n-2} + C(n,1)pq^{n-1} + q^n,$$
that is,
$$(p+q)^n = \sum_{k=0}^{n} C(n,k)\,p^k q^{n-k}.$$
Therefore, since $(p+q)^n = 1$, it follows that
$$\sum_{k=0}^{n} C(n,k)\,p^k q^{n-k} = 1.$$
Thus $\sum_{k=0}^{n} P(k \text{ heads out of } n \text{ tosses}) = 1$, verifying the desired relationship.
Remark: Each term in the above expansion can be interpreted as the probability of a certain type of event. The first term is the probability of tossing exactly n heads: there is only one way this can happen, accounting for the coefficient 1. The last term is the probability of tossing exactly n tails. The product $p^k q^{n-k}$ reflects the probability of a particular sequence containing k heads and the rest (n − k) tails; but there are many ways of generating that type of sequence: C(n, k) is the number of distinct sequences that all count as k heads. Thus the combined probability of getting any of the events in which there are k heads is given by $C(n,k)\,p^k q^{n-k}$.
8.10.1 Example
Suppose P(H) = p = 0.1. What is the probability of getting 3 heads if this unfair coin is tossed 5 times?
Solution
From the above results, $P(3 \text{ heads out of } 5 \text{ tosses}) = C(5,3)\,p^3 q^2$. But p = 0.1 and q = 1 − p = 0.9, so
$$P(3 \text{ heads out of } 5 \text{ tosses}) = (0.1)^3 (0.9)^2\, C(5,3) = (0.001)(0.81)(10) = 0.0081.$$
8.11 Mean of a binomial distribution
A binomial distribution has a particularly simple mean. An important result, established in the
calculations in this section is as follows:
Consider a Bernoulli trial in which the probability of event e1 is p. Then if this trial is
repeated n times, the mean of the resulting binomial distribution, i.e. expected number of
times that event e1 occurs is
x̄ = np.
Thus the mean of a binomial distribution is the number of repetitions multiplied by the
probability of the event in a single trial.
Here we verify the simple formula for the mean of a binomial distribution. The calculations use many properties of series that were established in Chapter 1. The calculation is presented for completeness rather than importance, but the result (in the box above) is very useful and important.
By definition of the mean,
$$\bar{x} = \sum_{k=0}^{n} x_k\, p(x_k).$$
But here $x_k = k$ is the number of heads obtained, and $p(x_k) = P(k \text{ heads in } n \text{ tosses}) = C(n,k)\,p^k q^{n-k}$ is the distribution of k heads in n tosses computed in this chapter. Then
$$\bar{x} = \sum_{k=0}^{n} k \cdot C(n,k)\,p^k q^{n-k} = \sum_{k=1}^{n} k\,\frac{n!}{k!\,(n-k)!}\,p^k q^{n-k},$$
where in the last sum we have dropped the k = 0 term, since it makes no contribution to the total.
The numerators in the sum are of the form $k \cdot n \cdot (n-1) \cdots (n-k+1)$ and the denominators are $k \cdot (k-1) \cdots 2 \cdot 1$. We can cancel one factor of k from top and bottom. We can also take one common factor of n out of the sum:
$$\bar{x} = \sum_{k=1}^{n} \frac{n(n-1)\cdots(n-k+1)}{(k-1)\cdots 2 \cdot 1}\,p^k q^{n-k} = n \sum_{k=1}^{n} \frac{(n-1)\cdots(n-k+1)}{(k-1)\cdots 2 \cdot 1}\,p^k q^{n-k}.$$
We now shift the sum by defining the following replacement index: let ℓ = k − 1; then k = ℓ + 1, so when k = 1, ℓ = 0 and when k = n, ℓ = n − 1. We replace the indices and take one common factor of p out of the sum:
$$\bar{x} = n \sum_{\ell=0}^{n-1} \frac{(n-1)\cdots(n-\ell)}{\ell!}\,p^{\ell+1} q^{n-\ell-1} = np \sum_{\ell=0}^{n-1} \frac{(n-1)!}{\ell!\,(n-\ell-1)!}\,p^{\ell} q^{n-\ell-1}.$$
Let m = n − 1; then
$$\bar{x} = np \sum_{\ell=0}^{m} \frac{m!}{\ell!\,(m-\ell)!}\,p^{\ell} q^{m-\ell} = np,$$
where in the last step we have used the fact that the binomial probabilities for m trials sum to 1, i.e. $\sum_{\ell=0}^{m} C(m,\ell)\,p^{\ell} q^{m-\ell} = (p+q)^m = 1$. This verifies the result $\bar{x} = np$.
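The result x̄ = np is also easy to confirm numerically from the definition of the mean; a brief sketch of ours:

from math import comb

def binomial_mean(n, p):
    """Compute the mean directly from the definition: sum of k * P(k)."""
    q = 1.0 - p
    return sum(k * comb(n, k) * p**k * q**(n - k) for k in range(n + 1))

# Agrees with n * p for any choice of n and p.
print(binomial_mean(10, 0.5))    # 5.0  (= 10 * 0.5)
print(binomial_mean(5, 0.1))     # approximately 0.5  (= 5 * 0.1)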
8.12 A continuous distribution
[Figure 8.4 appears here: the Normal distribution, plotted for x between −4.0 and 4.0, with peak height approximately 0.4.]
Figure 8.4: The Normal (or Gaussian) distribution is given by equation (8.1) and has the distribution
shown in this figure.
If we were to repeat the coin toss experiment with a large number of tosses, N, we would see a certain trend: There would be a peak in the distribution at the outcome corresponding to getting heads 50% of the time, i.e. at N/2 heads.
A fact which we state but do not prove here is that the probability of N/2 heads, p(N/2), behaves like
$$p(N/2) \approx \sqrt{\frac{2}{\pi N}}.$$
This can also be written in the form
$$p(N/2) \approx \sqrt{\frac{1}{2\pi}}\,\sqrt{\frac{4}{N}} = \text{Const}\cdot\sqrt{\frac{4}{N}}.$$
One finds that the shapes of the various distributions are similar, but that a scale factor of $\sqrt{N}/2$ is applied to stretch the graph horizontally, while compressing it vertically to preserve its total area. The graph is also shifted so that its peak occurs at N/2.
As the number of Bernoulli trials grows, i.e. as we toss our imaginary coin in longer and longer sets (N → ∞), a remarkable thing happens to the binomial distribution: it becomes smoother and smoother, until it grows to resemble a continuous distribution that looks like a "bell curve". That curve is known as the Gaussian or Normal distribution. If we scale this curve vertically and horizontally (stretch vertically and compress horizontally by the factor $\sqrt{N}/2$) and shift its peak to x = 0, then we find a distribution that describes the deviation from the expected value of 50% heads. The resulting function is of the form
$$p(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}. \tag{8.1}$$
We will study properties of this (and other) such continuous distributions in a later section. We
show a typical example of the Normal distribution in Figure 8.4. Its cumulative distribution is then
shown (without and with the original distribution superimposed) in Figure 8.5.
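The approach of the rescaled binomial distribution to equation (8.1) can be observed numerically. In the following sketch (ours; the rescaling uses the √N/2 factor described above), the scaled fair-coin probabilities track the normal density:

from math import comb, sqrt, pi, exp

def phi(x):
    """Standard normal density, equation (8.1)."""
    return exp(-x**2 / 2) / sqrt(2 * pi)

N = 100                      # number of tosses of a fair coin
scale = sqrt(N) / 2          # horizontal/vertical scale factor from the text

for k in (45, 50, 55):
    pmf = comb(N, k) * 0.5**N            # binomial probability of k heads
    x = (k - N / 2) / scale              # deviation from N/2, rescaled
    # After rescaling, the binomial pmf closely matches the normal density.
    print(k, scale * pmf, phi(x))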
[Figure 8.5 appears here: two panels plotted for x between −4.0 and 4.0, vertical scale 0 to 1. Left: the cumulative distribution alone. Right: the cumulative distribution with the normal distribution superimposed.]
Figure 8.5: The Normal probability density with its corresponding cumulative distribution.
8.13 Summary
In this chapter, we introduced the notion of probability of elementary events. We learned that a
probability is always a number between 0 and 1, and that the sum of (discrete) probabilities of all
possible (discrete) outcomes is 1. We then described how to combine probabilities of elementary
events to calculate probabilities of compound independent events in a variety of simple experiments.
We defined the notion of a Bernoulli trial, such as tossing of a coin, and studied this in detail.
We investigated a number of ways of describing results of experiments, whether in tabular or
graphical form, and we used the distribution of results to define simple numerical descriptors. The
mean is a number that, more or less, describes the location of the "center" of the distribution (analogous to a center of mass), defined as follows:
The mean (expected value) x̄ of a probability distribution is
$$\bar{x} = \sum_{i=0}^{n} x_i\,p(x_i).$$
The standard deviation is, roughly speaking, the "width" of the distribution.
The standard deviation, σ, is
$$\sigma = \sqrt{V},$$
where V is the variance,
$$V = \sum_{i=0}^{n} (x_i - \bar{x})^2\,p(x_i).$$
While the chapter was motivated by results of a real experiment, we then investigated theoretical
distributions, including the binomial. We found that the distribution of events in a repetition of a
Bernoulli trial (e.g. coin tossed n times) was a Binomial distribution, and we computed the mean
of that distribution.
Suppose that the probability of one of the events, say event $e_1$, in a Bernoulli trial is p (and hence the probability of the other event $e_2$ is q = 1 − p); then
$$P(k \text{ occurrences of the given event out of } n \text{ trials}) = \frac{n!}{k!\,(n-k)!}\,p^k q^{n-k}.$$
This is called the binomial distribution. The mean of the binomial distribution, i.e. the mean number of events $e_1$ in n repeated Bernoulli trials, is
$$\bar{x} = np.$$