Notes: Section 001 (Dawes) - School of Computing
20160927 and 20160930: Order Notation and Probability

Order Notation

We have looked at functions as ordered pairs. A more traditional way (but completely equivalent) to think of a function is as something that takes an input value and produces an output value. We are now going to develop a way to establish a relationship between functions. This relation will be extremely important in your future study of data structures (CISC-235) and algorithms (CISC-365).

Let f and g be two functions from Z≥0 to R - that is, f: Z≥0 → R and g: Z≥0 → R. Suppose there exists a positive value M such that |f(n)| ≤ M·|g(n)| for all n, with only a finite number of exceptions. Then we say f is order g, and we write "f is O(g)".

This is the way the text defines order. It's perfectly correct, but I don't think it is the most easily understood form. I prefer the following: Suppose there exists a positive value M and an integer k ≥ 0 such that |f(n)| ≤ M·|g(n)| for all n ≥ k. Then we say f is order g, and we write "f is O(g)". The two forms of the definition are completely equivalent. You can use whichever one you prefer.

We also sometimes read "f is O(g)" as "f is big-O of g" ... but what does it mean? Suppose the definition is satisfied, so |f(n)| ≤ M·|g(n)| for all n ≥ k. This is another way of saying that even when n gets really large, f(n) can never get too much larger than g(n) ... and this means that even if f(n) goes up as n increases, g(n) goes up at least as fast, to within the constant factor M.

Ok, that's what it means ... but why is it useful? If f(n) represents the running time of an algorithm (and n represents the size of the input) then f(n) may be very complicated and hard to compute. But if we can find a simple g(n) such that f(n) is O(g(n)), then we can use g(n) to describe the running time of the algorithm.

Let's look at an example or two.

Example 1: Let f(n) = 3n + 20 and g(n) = n². Can we find an M and a k for which |f(n)| ≤ M·|g(n)| for all n ≥ k? I claim that M = 1, k = 7 works:

n     3n + 20    n²
0     20         0
1     23         1
2     26         4
3     29         9
4     32         16
5     35         25
6     38         36
7     41         49
8     44         64
...   ...        ...

Thus we can see that 3n + 20 is O(n²).
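A claimed witness pair (M, k) can be checked mechanically. This is a minimal sketch (the function name is my own, not from the notes), reading f(n) = 3n + 20 and g(n) = n² off the table above. Note that testing finitely many n can refute a claimed witness pair but cannot prove it for every n ≥ k; that still requires algebra.

```python
def is_big_o_witness(f, g, M, k, upto=10_000):
    """Check |f(n)| <= M * |g(n)| for every n with k <= n <= upto."""
    return all(abs(f(n)) <= M * abs(g(n)) for n in range(k, upto + 1))

f = lambda n: 3 * n + 20   # f(n) = 3n + 20
g = lambda n: n ** 2       # g(n) = n^2

print(is_big_o_witness(f, g, M=1, k=7))   # True: 3n + 20 <= n^2 once n >= 7
print(is_big_o_witness(f, g, M=1, k=0))   # False: the inequality fails for n = 0..6
```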
Example 2: Let f(n) = 3n² + 100 and g(n) = n². For this example, I claim M = 4, k = 10 works. And we see that

n     3n² + 100    4n²
0     100          0
1     103          4
2     112          16
3     127          36
4     148          64
5     175          100
6     208          144
7     247          196
8     292          256
9     343          324
10    400          400
11    463          484
...   ...          ...

so 3n² + 100 is O(n²).

From this we can extrapolate that for any constants a, b and c, an² + bn + c is O(n²): we can let M = |a| + |b| + |c|, and let k be the smallest element of Z≥0 for which |an² + bn + c| ≤ M·n² for all n ≥ k. And from this we can extrapolate that when f(n) is any polynomial a_t·n^t + ... + a_1·n + a_0 (where the a_i values are constants), then f(n) is O(n^t). The text develops these ideas in detail, and I recommend reading this section carefully.

We can state a few simple facts about this relationship between functions.
1. For all functions f, f is O(f) (so the big-O relation is reflexive)
2. f is O(g) does not necessarily imply that g is O(f) (so the big-O relation is not symmetric)
3. If f is O(g) and g is O(h), then f is O(h) (so the big-O relation is transitive)

In practice, when we use the big-O classification all of our functions are non-negative, so the absolute value signs in the definition become irrelevant.

There are two other classifications to define, both of which are related to big-O.

Big-Ω Classification: Let f and g be functions defined as before. We say "f is Omega of g" if there exists M > 0 such that |f(n)| ≥ M·|g(n)| for all n, with finitely many exceptions (or equivalently, if there exist M > 0 and k ≥ 0 such that |f(n)| ≥ M·|g(n)| for all n ≥ k). We write "f is Ω(g)", which we read as "f is big-Omega of g".

Big-Θ Classification: We can show that the big-O and big-Ω relationships are independent of each other. That is, if f is O(g), f may or may not be Ω(g), and vice versa. However, sometimes we find that for particular functions f and g, f is O(g) and f is also Ω(g). When this happens, we say "f is Theta of g" and we write "f is Θ(g)", which we read as "f is big-Theta of g". If f is Θ(g), it turns out that g is Θ(f).

Probability

In casual conversation, people tend to use the concepts of probability quite loosely. We are going to give precise definitions and formulas to give probability theory a firm footing in discrete mathematics.

Sample Spaces

Probability theory is based on the idea of observing the outcome of an experiment that has different possible outcomes. We call this process of experimenting and observing sampling.
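To make the idea of sampling concrete, here is a small simulation sketch (my own illustration, not from the notes): roll a fair six-sided die many times and tally how often each outcome occurs. The observed frequencies are the "ratios" the next paragraphs discuss.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so repeated runs give the same tallies

trials = 60_000
counts = Counter(random.randint(1, 6) for _ in range(trials))

for outcome in sorted(counts):
    # each observed ratio should settle near 1/6 ~ 0.1667
    print(outcome, counts[outcome] / trials)
```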
The type of experiment we are usually talking about here is one with a fixed set of possible outcomes, such as flipping a coin, tossing a six-sided die, or picking one ball out of a box of different coloured balls.

A sample space consists of a set S of possible outcomes of an experiment, and a function P() that assigns a value to each outcome. P() must have the following properties:
1. 0 ≤ P(x) ≤ 1 for each outcome x
2. the sum of P(x) over all outcomes x in S equals 1

We call P() a probability function. We will sometimes write (S,P) to identify the sample space, where S is the set of outcomes and P is the probability function.

Any function that satisfies the requirements is a valid probability function, but we usually want our probability functions to correspond to the "likelihood" of the different outcomes occurring. But that is dangerously close to a circular definition, since "likelihood" is often used as a synonym for "probability".

We can get a concrete sense of what we want our probability functions to do by considering an experiment which we sample many, many times. If we count the number of times a particular outcome occurs and divide that by the number of samples, we expect that this ratio will change less and less as the number of samples increases. The limit of this ratio as the number of samples goes to infinity is what we want as the probability of that particular outcome.

For example, consider a box containing a red ball, a yellow ball and a blue ball. If the balls are all identical in size and weight, we expect that if we take out one ball, record its colour and return it to the box, over and over again, the number of times we withdraw the red ball divided by the total number of samples will get closer and closer to 1/3 ... as will the ratios for the yellow ball and the blue ball.

Now suppose the box contains 2 red balls and 1 yellow ball. The possible outcomes from our sampling experiment are {red, yellow}.
Again assuming that the balls are identical except for their colours, we expect that the ratio for red (the ratio of occurrences / samples) will approach 2/3, while the ratio for yellow approaches 1/3. So in general, we want P(x) to be the limit of the ratio "occurrences of x" / "number of samples" as the number of samples goes towards infinity.

Events

If (S,P) is a sample space, we use the word event to describe any subset of S. For example, if S = {1,2,3,4,5} then A = {1,3,5} is an event. {} is also an event, and so is S itself. Suppose we sample (S,P) by conducting the experiment once. If the outcome of the sample is an element of A, we say that A has occurred.

Now we can define the probability of an event: the probability of event A is the sum of the probabilities of the elements of A. In notation, P(A) is the sum of P(x) over all x in A.

For example, given S as above, suppose P(1) = P(2) = P(3) = P(4) = P(5) = 0.2. Then P({1,3,5}) = 0.6. But suppose P(1) = 0.4, P(2) = 0.1, P(3) = 0.3, P(4) = 0.1, P(5) = 0.1. Then P({1,3,5}) = 0.8.

In casual discussions of probability, people often forget that the probability of an event depends on the probability function – they assume (without cause) that all outcomes are equally probable. A very simple example is when the experiment consists of rolling two fair 6-sided dice and adding the numbers that come up. The possible outcomes are {2,3,4,...,12} but the probabilities are not all equal. So the probability of the event {6,7,8} is very different from the probability of the event {2,3,4}, even though both events contain the same number of outcomes.

To be honest, the (unjustified) assumption that all outcomes of an experiment are equally probable sometimes shows up in scientific discussions as well.

Combinations of Events

Let (S,P) be a sample space, and let A and B be events in that sample space. What can we say about A ∪ B? More precisely, can we compute P(A ∪ B) from P(A) and P(B)? Unfortunately P(A) and P(B) do not give enough information to compute P(A ∪ B). Consider this example.
Let the experiment be tossing a single 6-sided die. We will assume that all outcomes are equally probable (i.e. P(i) = 1/6 for all i in {1,2,3,4,5,6}). Let A = {1,2,3}, B = {2,3,4} and C = {4,5,6}. Then P(A) = P(B) = P(C) = 1/2, but P(A ∪ B) = P({1,2,3,4}) = 2/3 while P(A ∪ C) = P({1,2,3,4,5,6}) = 1.

In the first case two events with individual probabilities = 1/2 have a combined probability = 2/3, and in the second case two events with individual probabilities = 1/2 have a combined probability = 1. The difference of course is that A and B overlap (i.e. they have non-empty intersection) while A and C do not overlap ... and we can't tell that just by looking at their probabilities.

Fortunately the solution is obvious as soon as we recognize the problem. We just apply our old friend the Principle of Inclusion/Exclusion, and arrive at this formula:

P(A ∪ B) = P(A) + P(B) – P(A ∩ B)

Note that this means the only time P(A ∪ B) = P(A) + P(B) is when P(A ∩ B) = 0. Also observe that using this equation, if we know any three of the terms we can deduce the fourth. For example, if we know P(A ∪ B) = 0.7, P(A) = 0.3 and P(A ∩ B) = 0.2, then we know P(B) = 0.6.

Here are some more useful facts about the probabilities of events:

P({}) = 0
P(S) = 1
P(A ∪ B) ≤ P(A) + P(B)
P(Ā) = 1 – P(A), where Ā represents the complement of A

Conditional Probability

Before we sample, we can compute the probability that a particular event will occur. After we sample, if we know the outcome then we know with complete certainty whether the event occurred or not. For example, if the event A = {1,5,6} and the outcome of the sample = 3, then A did not occur. But what if we are given partial information about the outcome? Can we compute the probability that event A occurred?

More formally, let A and B be events. For a particular sample, suppose all we know is that B occurred (we don't know the actual value of the outcome). What is the probability that A also occurred?
We express this as "what is the probability of A, given B?" and we use the notation P(A | B).

Before we analyze this we can do some examples.

Suppose A ∩ B = {} (e.g. A = {1,2} and B = {3,4,5}). Then A and B cannot occur simultaneously, so P(A | B) = 0.

Suppose B ⊆ A (e.g. A = {1,2,4,5} and B = {2,5}). Then A occurs whenever B occurs, so P(A | B) = 1.

For other situations a bit more thought is required. If event B has occurred, the only way A can also have occurred is if the outcome is in A ∩ B, so we clearly need to compute P(A ∩ B) ... but that is not the end of the story! Because we know B occurred, the set of possible outcomes is reduced to just B. What we need to know is what fraction of the total probability of B is concentrated in A ∩ B. Our solution then is

P(A | B) = P(A ∩ B) / P(B)

A few examples will help.

Example 1: Suppose S = {1,2,...,100} and P(x) = 1/100 for each x in S. Let A = {1,2,3} and B = {1,2,4,5}. Then P(A) = 3/100, P(B) = 4/100, and P(A ∩ B) = P({1,2}) = 2/100. So with no information about the outcome, A is not very probable. But if we know that B occurred, it is clearly quite probable that A also occurred. In fact, 2 out of the 4 outcomes in B are also in A, so it seems logical that P(A | B) = 1/2. And indeed, our formula gives P(A | B) = (2/100) / (4/100) = 1/2.

Now if the outcomes do not all have equal probability, it may be less obvious that the formula will give the correct result. We consider this in the next example.

Example 2: Suppose we have a fair 10-sided die, where 4 of the sides show "1", 2 sides show "2", and the others show "3", "4", "5" and "6". So S = {1,2,3,4,5,6} and P is shown by this table:

n    P(n)
1    0.4
2    0.2
3    0.1
4    0.1
5    0.1
6    0.1

Let A = {1,3,4} and let B = {1,2,3,6}. Using the formula, we get P(A | B) = P({1,3}) / P(B) = 0.5 / 0.8 = 5/8 ... but does this make sense? By restricting the possible outcomes to B, it is as if outcomes 4 and 5 now have probability 0. All of the probability is concentrated in B (i.e. in the situation we are dealing with, P(B) = 1). Within B, outcome 1 has half of the probability (0.4 out of 0.8), so now outcome 1 has a "conditional probability" of P(1) = 1/2.
Similarly, the conditional probabilities of 2, 3 and 6 are 1/4, 1/8 and 1/8. So the outcomes that are of interest to us – i.e. the ones that are in A ∩ B, namely 1 and 3 – have conditional probabilities 1/2 and 1/8, giving a total conditional probability for A, given B, of 1/2 + 1/8 = 5/8. You can see that the formula is really just a quick way to express the same thought process we just went through in the example.
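The thought process in Examples 1 and 2 fits in a few lines of code. This is a sketch (the function names are mine, not from the notes): represent a sample space as a dictionary mapping each outcome to its probability, compute P of an event by summing, and apply P(A | B) = P(A ∩ B) / P(B).

```python
def prob(P, event):
    """P(event): the sum of the probabilities of the outcomes in the event."""
    return sum(P[x] for x in event)

def cond_prob(P, A, B):
    """P(A | B) = P(A intersect B) / P(B); assumes P(B) > 0."""
    return prob(P, A & B) / prob(P, B)

# Example 2 from the notes: the weighted 10-sided die.
P = {1: 0.4, 2: 0.2, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.1}
print(cond_prob(P, {1, 3, 4}, {1, 2, 3, 6}))     # ~0.625, i.e. 5/8

# Example 1 from the notes: 100 equally likely outcomes.
P100 = {x: 1 / 100 for x in range(1, 101)}
print(cond_prob(P100, {1, 2, 3}, {1, 2, 4, 5}))  # ~0.5
```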