Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Lecture 2: Counting possible outcomes and probabilities. Elements of Combinatorics. Basic discrete distributions. Reminder All possible outcomes of an experiment comprise the "sample space" Ω. Note: Outcomes { A1, A2 ,...AN } covering the entire set of possibilities (and thus forming Ω) are also called "collectively exhaustive“. Suppose that we conduct a certain experiment N times in a row and the event A happened n(A,N) times.Then, the probability p(A) can be defined as n(A,N ) P(A) lim N N Reminder Probability p should satisfy three major axioms: 0 ≤ p(A)≤1 (1) p(Ω) = 1 (2) For the collection of mutually exclusive (disjoined) events A i P( i Ai ) P ( Ai ) (3) i In plain English this means, that for mutually exclusive events the probability that one of them will occur (either A1 or A2, or … or AN) equals the sum of their individual probabilities. Reminder From these axioms some other properties of P can be derived. Here are some of them Compliments : p(A) = 1-P(Ac) (a) Inclusions: If A B,then P(A)≤P(B) (b) Unions of two events: p(AB) = p(A)+p(B)-p(AB) (c) Example 1: Roll two dice and suppose for simplicity that they are red and blue. Let F="At least one 5 appears", A="A 5 appears on the the red die", B="A 5 appears on the blue die". Question: Find p(F), p(A) and p(B). Solution: 1. p(A)=p(B)=1/6. 2. F= "A OR B (OR both)"= A U B We can use now property c from the previous slide p(A∩B)=p({5,5})=1/36. Then, applying (c) we find: p(F)=1/6+1/6-1/36=11/36. The other way is to find the probability that none of 5s appeared (5/6)(5/6)=25/36. p(F) = 1-25/36=11/36. • In the definition of union, AUB means A or B. This or is inclusive. Comment: Or can also be exclusive, meaning either this or that but not both of them. • In the definition of intersection, AB means A and B Example 2 Some event occurs during one second with the probability p. What is the probability that it occurs once during n seconds? A typical example is the radioactive decay. Here is a wrong solution. If the probability for one second is p, then the probability for n seconds will be n*p. Why is it wrong? Let’s consider more carefully the composite event An= “the decay occurs in n seconds”. Verbally, it can be described by the following sentence: (The event happens on the first second) , or [(it does not happen on the first) and ( happens on the second) ], or [(it does not happen during the first two seconds) and (happens on the third)], etc. Let us rewrite it using the symbols. An=A1U( A1c *A1)U ( A2c *A1)U( A3c *A1)U…U(An-1c*A1). Attention: A1 means that the decay occurs during one second, not necessary the first second! For instance, here A1 means that it happened on the second second This is the union of mutually exclusive events. Then, we find for the probability: p(An)=p(A1) + p(A1c *A1) + p(A2c *A1) +p(A3c *A1 )…+ p(An-1c*A1)= p(A1)(1+p(A1c)+ p(A2c )+ p(A3c ) + … p(An-1c )) = p[1+(1-p)1 +(1-p)2 +(1-p)3+… +(1-p)n-1] Let’s solve it with Mathematica. Open “Lect2_practice”. Define the function: Prob[p_,n_]:= p* Sum[(1-p)^k,{k,0,n-1}]; Create the table of probabilities depending on n for n from 0 to 30. t= Table[{n,Prob[0.1,n]},{n,0,30}]; Display the data ListPlot[t,PlotJoinedTrue,FrameTrue,PlotStyle{Thickness[0. 01],RGBColor[1,0,0]}] Reminder (continued) • A function pX(x) of random discrete variable X is called the “probability function” if it assigns the probability values to all possible values x of X. • For the continuous random variable Y the relation between probability and the probability density function fY(y) is the integral relation: Probability that X belongs to the interval between a and b is: b p(a X b) f ( X )dX (4) a From this definition it also follows: x p( X x ) f ( X )dX . (5) Continuous distribution (preliminary remarks) f ( xi ) 1 i P (E ) (1.5.1) f ( xi ) (1.5.2) i E Suppose that the circle has a unit circumference (we simply use units in which 2 Pi R=1). Continuous distribution. Probability density function (PDF). Suppose that every point on the circle is labeled by its distance x from some reference point x=0. The experiment consists of spinning the pointer and recording the label of the point at the tip of the pointer. Let X is the corresponding random variable. The sample space is the interval [0,1). Suppose that all values of X are equally possible. We wish to describe it in terms of probability. If it was a discrete variable (such as a dice), we would simply assign to every outcome a fixed value of probability to all outcomes.. p(xi)=const. However, for a continuous variable we must assign to each outcome a probability p(x )=0. Otherwise, we would not be able to fulfill the requirement 1.12. Something is obviously wrong! Continuous distribution and the probability density function Dealing with the infinitesimal numbers is a tricky business indeed. Those who studied calculus aware of this. A random variable X is said to have a continuous distribution with density function f(x) if for all a b we have b P (a X b ) f ( x )dx (1.15) a The analogs of Eqs. 1.12 and 1.13 for the continuous distributions would be (1.16) f (x) 1 P (E ) f ( x )dx E (1.17) P(E) is a probability that X belongs to E. f(x) P(a<X<b) a b Geometrically, P(a<X<b) is the area under the curve f(x) between a and b. Analogy with the density of matter (mass density) Examples: 1. The uniform distribution on (a,b): We are picking a value at random from (a,b). 1 , b a f (x) 0 axb (1.18) otherwise By direct integration you can verify that (1.18) satisfies the condition (1.16). We can now find PDF which describes the experiment with the spinner in which case b-a=2 1 , 2 f ( ) 0 0 x 2 otherwise The probability that the arrow will stop in the rage between and + equals /2. 2. The exponential distribution e x , f ( x ) 0 x0 otherwise (1.19) Those who know how to integrate can verify that (1.19) satisfies (1.16) (the total area under the curve f(x) equals 1. Note: In Matematica, the integral of a function f[x] (notice that […] rather than (…) is used) can be found as: Integrate[f[x],{x,x1,x2}] , Shift+Enter. Here x1 and x2 are the limits of integration. Please, practice with this at home. Topics Multiplication Rule Elements of Combinatorics. Counting the composite events. Permutations. Factorial. Combinations. Binomial coefficients. Basic distributions Practice Multiplication Rule Multiplication Rule Suppose that m experiments are performed in order and that, no matter what outcomes of experiments 1,2,...k-1 are, experiment k has nk possible outcomes. Then the total number of outcomes is N=n1×n2×.....×nm Comment: actual set of choices at the kth stage may depend upon what happened in earlier choices, as long as the number of choices does not. Let us consider now a bunch of examples. The picture is taken from http://www.garlandscience.com/mdf/MDF%20Ch1%20.pdf Topic 1. Elements of Combinatorics Permutations, factorials (1) In how many ways n different books can be arranged on a bookshelf. We are going to place n books on the shelf one at a time. There are n books we can pick for the 1-st position, n-1 books for the second one, n-2 for the third… It continues until we left with only one book for the last (n-th) position. According to the multiplication rule, the total number of such arrangements is a product of all those numbers: a(n) = n! = n(n-1)(n-2)···1 This function n! is called “n factorial”. 1!=1, 2!=2, 3!=6, 4! = 24, 5! = 120… Presence of factorials in counting the possible choices (outcomes) leads to some very unexpected results. From the book shelf to the Solar System 42 volumes of Encyclopedia Britannica are randomly placed on the shelf. How many different arrangements a(42) are possible? The answer: a(42) = 42! The numbers of such scale often found in the following applications: Cryptography Protein databases Internet traffic Is 42! a big number? Permutations (3) 42! ~ 1.4 1051 The volume of the Earth is ~ 1021m3 The average number of atoms per 1 m3 is ~1027 The total number of atoms in the body of the Earth is ~ 1027 1021 = 1048 The number of all possible arrangements of the Encyclopedia Britannica on a shelf far exceeds the total number of atoms in the body of our planet ! Permutations (4) a(42) = 42! ~ 1.4 1051 Let us answer now the cryptographic question: How long would it take for a supercomputer to find one number among 42! ? A supercomputer can perform ~ 1013 trials per hour. It results in ~1017 trials per year. This number is infinitesimal compared to a(42). This search will take forever. Permutations (5) Well, probably 42 is too much. What if it was just 10? 10 boys are trying all possible arrangements in a line. How long would it take assuming that each permutation is being performed in just 10 sec? Make your best guess! The answer: it will take 15 months of noninterrupted work! A few more words about different ways of sampling, about terminology and origin of HUGE NUMBERS The arrangements that we just discussed are often described as “sampling without replacement”. Indeed, after we placed one of N books on the shelf, we did not put it back. As a result, the number of choices decreases with every new step. The number of distinct arrangements is N! if the number of positions m=N (see the previous section) and PN,m if m<N (see the following section). The same experiment can be conducted “with replacement”. Here is how it’s typically described. (a) You pull at random one book from the heap for the first position on the shelf, and write down it’s order number m (let say, ranging from 1 through N=10). b) You put this book back to the heap, pull at random a book for the second position, write down it’s number and put it back. It continues until all positions are “filled”. The result of each experiment is a sequence of m numbers from 1 through N, such as {1,3,6,1,9,3…} (unlike the previous case, it can contain repetitive numbers). According to the MR , the total number of different sequences in this case is Nm. Statistical experiments are usually conducted without replacement. For instance, in study of exit polls, the same people are not questioned twice. However, Life and Nature provide a lot of examples of “ sampling with replacement”. Here is an example. There are 20 Amino Acids commonly found in proteins: A alanine P proline Q glutamine C cystine R arginine D aspartate S serine E glutamate T threonine F phenylalanine M Methionine G glycine V valine H histidine W tryptophan I isoleucine Y tyrosine K lysine L leucine N asparagine Suppose that a polypeptide chain contains m sites each occupied by one of 20 Amino Acids. Then, the total number of different polypeptides of this sort is 20m. For a very moderate chain consisting of 100 elements, it results in 20100 possible variations. This number far exceeds the number of atoms in all visible Galaxies! Note: If any of m positions can be occupied by any of n objects, the sampling can also be described as “sampling with replacement” and the total number of distinct outcomes is nm. Permutations (6) Permutations of n objects some of which are identical. Suppose now that some of the books on your bookshelf are identical. Let’s say there are 10 books, but four of them are the most recent Harry Potter that you bought for your friends. How many different permutations are now possible? Permutations (7) Try a smaller number first! Consider 5 books 3 of which are identical. We painted the identical books in orange and temporarily assigned them numbers 1,2,3. 1 2 3 1 3 2 2 1 3 2 3 1 3 1 2 3 2 1 All 6 configurations are physically different. However, if you remove the numbers… Permutations (8) ..they all become identical. Therefore, the total number of the distinct configurations is 5!/3! =20 Similarly, for 10 books, 4 identical, 10!/4!=151200 Permutations (9) What if we have a few groups of identical books on the shelf? Example: 20 books total, among them Harry P. (4), Dr. Seuss (3), Bioinformatics (6). All others are different. Permutations in a group n objects which includes several groups of mutually identical objects (let’s say, three groups with m, k and s objects). The total number of distinct permutations = n!/(s!k!m!) k s n m Permutations (10) How many different permutations can be formed from the following amino acid sequences : TDLDTTLVLV, SSAESLKISQ m=4(S) k=3(T) s=2(V) n=10 l=2(D) n =10 a= 10!/4!=151200 m=3 (L) a= 10!/(2!2!3!3!)=25200 Combinations (1) Let’s consider the following problems. Problem 1 20 people belong to a club. How many ways can they pick president, vice-president and secretary. Problem 2 20 members belong to a club. In how many ways can they create a committee of three to plan a party. Solution of the first problem. Permutations of n objects taken m at a time. There are 20 people we can pick for a president. Having made the first choice, there are always 19 possibilities for a vicepresident, and 18 for secretary. Using the multiplication rule, we find: number of choices = 20·19 ·18. Combinations (2) Solution of the first problem (continued) In general, if there are n members and m offices, number of choices = n· ( n-1 ) · ( n-2 ) · · · ( n-m+1 ). It can be represented through the factorials: Pn,m n! (n m)! (7) Pn,m is called permutation of n objects taken m in a time Here we used an obvious fact that n! = n· ( n-1 ) · ( n-2 ) · · · ( n-m+1 )*[ (n-m)*(n-m-1)*..*1]= n· ( n-1 ) · ( n-2 ) · · · ( n-m+1 )*(n-m)! -> n· ( n-1 ) · ( n-2 ) · · · ( n-m+1 )= n! /(n-m)! Solution of the second problem. Combinations of n objects taken m at a time. Let’s compare the second and the first problem. Suppose that in both cases a group of people numbered 1, 2 and 3 is selected. In the first case, where the offices are specified, the order is important: 123 and 132 are different choices. In the second problem, the positions are equivalent,and all permutations of 1,2 and 3 lead to essentially the same result. As a result, a number of distinct cases is reduces by 3!. In a general case of n members and m choices, the result is Cn,m Pn,m m! n! m !(n m )! (8) Thus, m! in the denominator makes all the difference between the second case (the order is unimportant) and the first case, when the order is important. Bernoulli trials. Binomial distribution. Bernoulli trial process is a sequence of n chance experiments such that (1) Each experiment has two possible outcomes, which we may call success and failure. (2) The probability q of failure is given by p=1-q. (3) The probability p of success in each experiment is the same for each experiment, and this probability is not affected by any knowledge of previous outcomes. For such trials, the probability of k successes in n trials is described by so called binomial distribution: The probability of k successes in n consecutive Bernoulli trials is p(k,n) C pk qnk n,k (9) Try to explain why do we have Cn,k as a coefficient here. Example 1. A coin is tossed ten times. The two possible outcomes are heads and tails. The probability of each on any one toss is 1/2. What is the probability of having m heads in n tosses. Example 2. A football team wins each week with probability 0.6 and loses with probability 0.4. If we suppose that outcomes of their 10 games independent, what is the probability that they will win exactly 8 games? Example 3. The die is rolled 8 times. What is the probability that we get exactly two 3's. • Multinomial distribution. Consider a die with 1 painted on three sides, 2 painted on two sides and 3 painted on one side. If we roll the dye ten times, what is the probability we get five 1's, three 2's and two 3's? Here is the answer: 10! ( 1 )5(1)3( 1 )2 5!3! 2! 2 3 6 For any five occurrences of 1, the probability is (1/2) 5 and the number of such combinations is C10,5. For three 2, the probability is (1/3)³ and the number of combinations is C5,3}, and so on. C C C 10! 10,5 5,3 2,1 5!3! 2! Generalizing from this example, we see that if we have k possible outcomes for our experiment with probabilities p1,p2, ….pk , then the probability of getting exactly ni outcomes of type i in n=n1+n2+...+ nk trials is described by the multinomial distribution: n n n n ! 1 2 p(n ,n ,...n ) ( p ) ( p ) ( p ) k 1 2 k n !n !...n ! 1 2 k 1 2 k (10) • Example: The output of a machine is graded excellent 70% of the time, good 20% of the time and defective 10% of the time. What is a probability a sample of size 15 has 10 excellent, 3 good and 2 defective items? •The geometric distribution Suppose we roll a die repeatedly until a 5 occurs, and let N be the number of times we roll the die. We are looking for a probability of N=k, where k is any given number. p(N=1)=1/6; P(N=2)=(5/6)(1/6). P(N=3)=(5/6)²(1/6),P(N=k)= (5/6)^{k-1}(1/6). Generalizing, we see that if we are waiting for an event of probability p , the number of trials needed, N, has the probability distribution p(N k ) (1 p)k 1p (11) since {N=k} occurs exactly when we have k-1 failures followed by a success. The random variable in GD is the number of trials N including the first success •The negative binomial distribution The binomial distribution arises when the number of trials is fixed in advance, and the random variable is the number of successes in these n trials. In contrary, in some applications the number of successes is fixed (at the the value of m) while the random variable is the number of trials Tm up to and including this mth success. Suppose we repeat an experiment with a probability p of success until we have m successes and let Tm be the number of trials required. If m=4 one possible realization is F S F F S S F F F 1 2 3 4 5 6 7 8 9 S 10 •The negative binomial distribution F S F F S S F F F 1 2 3 4 5 6 7 8 9 S 10 The corresponding T4=10. The probability of such outcome is p4 (1-p) 6 . We can also present it as pm (1-p) N-m The total number of outcomes corresponding to T4=10 is C9,6. Why is that? (because exactly 6 F must happen during first 9 trials) By definition, the last outcome (#10) must be successful. Therefore, the number of outcomes is the number of ways 6 failures occur in the first Tm-1 trials. Thus, for the probability we have: P(T4=10)=C10-1,3p4(1- p)6 Generalizing, we see that P(Tm=m+k)=Ck+m-1,m-1,pm(1-p)k (12) or P(Tm=n)=Cn-1,m-1,pm(1-p)n-m (12’) Example All the considered distributions are used very often in various statistical applications including biological. Let’s consider an example dealing with Negative Binomial Distribution, the distribution that might look exotic to some of you. 1. In a computer game, one has to apprehend 7 Monsters randomly positioned in 20 rooms (only one Monster per room is allowed). What is the probability that the last Monster will be hiding in the 12-th room? Can you answer this question? 2. Important problem in bioinformatics is how to measure the similarity in an alignment. The simplest, but not the best, criteria is to measure the length(s) of those subsequences were the elements match exactly. This criteria is not wise. Even if two DNA sequences have a reasonably recent common ancestor, evolutionary changes will cause at least a small number of changes between them. Thus, focusing on well-matching rather than exactly matching subsequences would be more appropriate. One such approach is to consider subsequences where k mismatches (or up to k mismatches) are allowed. (see, e.g., W.J. Evans and G.R. Grant, “Statistical methods in bioinformatics’, Ch. 6.3). The following discussion is very brief. Please volunteer for 15-min class presentation on this topic Let’s consider a sequence where j mismatches are allowed before the j+1 mismatch occurs. The length Tk of any such subsequence is the number of trials up to (but not including) the (j+1) failure . It is clearly described by the NBD. Suppose 1-p is the probability of a mismatch. The probability that k trials are required (or that the sequence gets k bases long) before (j+1)th mismatch occurs is P( Tj=k) =Ck,j(1-p)j+1 pk-j (13) This is similar to (12’) for n+1 trials and m+1 failures. One should simply substitute in Eq. (13) j=>m+1, k => n+1 and p=>(1-p) The probability that n trials or fewer are required before failure j+1 is P( Tj=k) =Ck,j(1-p)j+1 pk-j (13) Now consider a probability that n trials or fewer are required before failure j+1. It can be found as a probability of the union, by adding up the probabilities(13) for k = j,j+1, …, n. n P(k n) Ck, j(1- p) j1 p k- j k j We took in consideration that the number of trials preceding the failure # j+1 can not be smaller than j. This results allows finding the probabilities of “well-matching” under the assumption of independence. Comparing in some way actual matches to this results helps to estimate the “score” of similarity. Let’s now investigate some of the distributions with Mathematica The most of this should be practiced at home. We will come back to it in Lect .3. Problems for the in-class activity * 1. A house has 10 rooms. We want to paint 2 yellow, 3 blue and 5 red. In how many ways can this be done? 2. A random chain consisting of 10 consecutive amino acids is formed. Assuming that all the positions can be occupied at random, what is the total possible number of chains? How this number will change if all the “letters” must be different? 3. There are 30 students in a class. In how many ways can professor give out 4 As, 6 Bs, 8 Cs and 12 Fs? 4. 6 businessman meet together. How many handshakes are exchanged if each shakes hands with all others? 5. A basketball team has 6 players over 6 feet and 5 who are under six feet. How many pictures can be taken if all the shorter players are sitting on the chairs while the taller players are standing behind. * We will solve a few of these problems. The rest you can use for practice at home. You can submit them as extra credit (no obligation). If you decide doing so, please add a distinct section after the last problem of the HW2, copy the problem(s) and add the solutions. 6. Four married couples buy a row of seats for a hockey game.(a) Find the number of ways they can arrange themselves if (a). The husband must sit to the left of his wife. (b) All the men must sit together and all the women must sit together. 7. A carton of 12 eggs has 4 rotten eggs and 8 good eggs. Three eggs are chosen at random from the carton to make a three-egg omelet. --What is the probability that only rotten eggs are chosen? --What is the probability that only good eggs are chosen? --What is the probability that the sample will consist of one rotten egg and two good eggs? 9. A coin, loaded to come up heads 2/3 of the time, is thrown until the head appears. What is the probability that an odd number of tosses is necessary? 10. Samuel Pepus wrote to Isaak Newton: “What is more likely: (a) one 6 in 6 Rolls or 2 6’s in 12 rolls? Compute the probabilities of these events. 11. Probabilities of sequences. Assume that the four bases A, C, T, and G occur with equal likelihood in a DNA sequence of nine monomers. (a) What is the probability of finding the sequence AAATCGAGT through random chance? (b) What is the probability of finding the sequence AAAAAAAAA through random chance? (c) What is the probability of finding any sequence that has four A’s, two T’s, two G’s, and one C, such as that in (a)? Hi. This insert was written in Spring 2003. It turned out later that the work of Roderick MacKinnon was honored with the Nobel Price for the year 2003 A revolutionary event in Life Science 1. X-ray structure of a voltage-dependent K+ channel 2. The principle of gating charge movement in a voltage-dependent K+ channel. Roderick MacKinnon and colleagues, Rockefeller University (Nature, May 1, 2003) Our emotions, thinking and feeling are orchestrated by the activity of ion channel proteins embedded in the cellular membranes membranes. In particular, we depend on the ability of ion channels to sense and respond to changes in the voltage generated across the membranes. Recently, a group of scientists from the Rockefeller University provided the first structure of the Voltage-dependent K+ channel. It was established a while ago that the role of electric sensors is played by the highly charged S4 groups. However, neither the position nor the way they affect the gating were not established. The results of MacKinnon’s group turned to be very surprising. Self-Test 2 1. 2. 3. 4. 5. 6. If you flip a fair coin 8 times, what is the probability of getting exactly 6 heads and 2 tails? A die is rolled three times. What is the probability that you get a larger number each time? There are three different routes connecting city A to city B. How many ways ca round trip be made from city A to B and back? How many ways if it is desired t take a different route on the way back? In how many ways can 3 oaks, 4 pines, and 2 maples be arranged along a property line if one does not distinguish between trees of the same kind? A shelf has 6 biological books and 4 mathematics books. Find the probability that 3 particular biological books will be together. How many mixed tennis duets can be formed between 4 men and 4 women? 7. You walk into a party without knowing anyone there. Six are women and four are men. You are told that there are 3 brothersister couples and you are asked to pick all three of them. What is the probability that you are correct? 8. How many ways can 4 men and 4 women sit in a row if no two men or two women sit next to each other? 9. If you roll a fair die eight times, what is the probability of getting exactly three 3s? 10. Find a probability that in five tosses of a fair die, a 3 will appear (a) twice, (b) at most once, (c) at least two times 11. If 20% of the bolts produced by a machine are defective, determine the probability that out of 4 bolts chosen at random, (a) 1, (b) 0, (c) less than 2, bolts are defective.