William Christensen, Ph.D.

In Section 4 we will combine elements of Section 2 (Distributions) with Section 3 (Probabilities).

A probability distribution helps us understand the chance (probability) of some event occurring. But remember, being a statistician means you never say you’re certain.

Probability Distributions

Here’s what a probability distribution looks like in table form. In this probability distribution, x represents the number of baby girls among 14 randomly selected newborns. Each probability P(x) represents the chance of finding EXACTLY x girls among 14 randomly selected newborns. For example, the probability of finding exactly 6 girls among 14 newborn babies is 0.183 or 18.3%. The probability of finding exactly 0 girls among 14 newborns is 0.000 (this is rounded to 3 decimals; there is actually a small chance, which we would see if we carried the decimals out further).

x     P(x)
0     0.000
1     0.001
2     0.006
3     0.022
4     0.061
5     0.122
6     0.183
7     0.209
8     0.183
9     0.122
10    0.061
11    0.022
12    0.006
13    0.001
14    0.000

Probability Distributions Requirements

Two rules define a probability distribution. Any probability distribution must obey both.

1. The sum of the probabilities must be 1.

   ∑P(x) = 1

   Really, what this means is that every possible outcome must be included in the probability distribution. For example, the table above shows the probability distribution for the number of girls among 14 newborns. Did we include every possible outcome? Yes, we included every count from 0 to 14 girls among those 14 newborns, and that is every possible outcome. And do the probabilities all add up to 1? Add them up and see. If they do, then we have satisfied this first rule of probability distributions.

2. Every individual probability must be between 0 and 1. This is something we learned earlier and applies to all probabilities.

   0 ≤ P(x) ≤ 1

• Based on these two rules, does our distribution of baby girls among 14 newborns meet the requirements for a probability distribution? YES
• These are important rules and concepts that you must remember throughout the course.

Here’s what our probability distribution of girls among 14 newborn babies looks like in a histogram.

[Figure: histogram of P(x) against x, the number of girls among 14 newborns]

• Looking at the histogram of the probability distribution of girls among 14 newborns, could you guess what the “average” number of girls among 14 newborns is?
• We can also calculate the “average”, which is sometimes called the “expected value” (the term “expected value” is usually used when we are talking about the probability of some kind of money-related outcome).
• The “average” or “expected value” of a probability distribution = ∑[x * P(x)]
• Remember that it is the average x value (the number of girls among 14 newborns in our example) that we are calculating, NOT the average probability.
• For our 14 newborns, we can apply the formula in Excel by multiplying each x by its P(x) and summing the products. When we summed all the x*P(x) values, we got 7.000. Is that also the value you guessed looking at the histogram?

x     P(x)     x*P(x)
0     0.000    0.000
1     0.001    0.001
2     0.006    0.011
3     0.022    0.067
4     0.061    0.244
5     0.122    0.611
6     0.183    1.100
7     0.209    1.466
8     0.183    1.466
9     0.122    1.100
10    0.061    0.611
11    0.022    0.244
12    0.006    0.067
13    0.001    0.011
14    0.000    0.001
Sum            7.000

• Here’s another example tailored to the “expected value” concept of average.
• Let’s say you go to Mesquite and make $1 bets with a potential payoff of $500 (a net gain of $499 after subtracting the $1 you bet).
• Let’s also say that the chances of winning are 1/1000 or 0.001 (that means the chances of losing are 1 - 0.001 = 0.999).
• Again, using the formula and Excel, we get the following results.
• The results show that, on average, you should “expect” to lose 50 cents each time you make your $1 bet.
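The two expected-value calculations described above can be cross-checked outside Excel. The following is only a minimal Python sketch: the function name `expected_value` and the variable names are made up for this illustration, and the girls example uses the rounded three-decimal P(x) values from the table, so its result comes out very slightly below the 7.000 that Excel reports from unrounded probabilities.

```python
def expected_value(outcomes, probabilities):
    """Return the sum of x * P(x) over paired outcomes and probabilities."""
    return sum(x * p for x, p in zip(outcomes, probabilities))


# Rounded P(x) values for the number of girls among 14 newborns (x = 0..14).
girls_x = list(range(15))
girls_p = [0.000, 0.001, 0.006, 0.022, 0.061, 0.122, 0.183, 0.209,
           0.183, 0.122, 0.061, 0.022, 0.006, 0.001, 0.000]

# The $1 bet: a net win of $499 with probability 0.001, a $1 loss otherwise.
bet_x = [499.00, -1.00]
bet_p = [0.001, 0.999]

girls_mean = expected_value(girls_x, girls_p)
bet_mean = expected_value(bet_x, bet_p)

print(f"Total probability (girls): {sum(girls_p):.3f}")
print(f"Expected number of girls:  {girls_mean:.3f}")
print(f"Expected result per bet:   {bet_mean:.2f}")
```

The same function handles both examples because the expected value only needs the list of x values and their probabilities, whatever the x values represent.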
Event    x          P(x)     x*P(x)
Win      $499.00    0.001     0.499
Lose     -$1.00     0.999    -0.999
Sum                          -$0.50

Binomial Probability Distributions

As you read through the following requirements of a “binomial” probability distribution, you should recognize that our previous example of the number of girls among 14 newborn babies is one example of a binomial probability distribution.

1. A binomial probability distribution always has a fixed number of trials. (In our example, this was 14: we checked the probability of 0 to 14 girls among 14 newborns.)
2. The trials must be independent. The outcome of any individual trial doesn’t affect the probabilities in the other trials. (In our example, the gender of one baby had absolutely no effect on the gender of any other baby.)
3. Each trial must have all outcomes classified into two categories. (In our example, babies had to be either boy or girl, so all outcomes were classified into two categories.)
4. The probabilities must remain constant for each trial. (In our example, the probability of any baby being a girl was 0.50 or 50%, and this was the same for every baby.)

Notation for Binomial Probability Distributions

Although the following formula can be used to manually calculate probability in a binomial probability distribution, I DO NOT expect you to know or remember it. However, I DO EXPECT you to be able to use Excel to calculate binomial probability distributions.

\[ P(x) = \frac{n!}{(n - x)!\,x!} \cdot p^{x} \cdot q^{n - x} \]

Where,
n    = fixed number of trials
x    = specific number of successes in n trials
p    = probability of success in one of n trials
q    = probability of failure in one of n trials (q = 1 - p)
P(x) = probability of getting exactly x successes among n trials

Binomial Probabilities using Excel

Example: Let’s use Excel to re-create the binomial probability distribution showing the probability of finding 0 to 14 girls among a group of 14 randomly selected newborn babies.
If you want to follow along, a movie of me doing this in Excel is available in the Windows Media file called binomdist1.wmv.

1. Set up an Excel spreadsheet with one column labeled x (to represent the number of girls among the 14 newborns) and another column labeled P(x) (to represent the probability of finding exactly x girls among the 14 newborns). Enter the values 0 through 14 in the x column and leave the P(x) column empty for now.

2. Click on fx (function), then select the category called “Statistical” and the function called BINOMDIST. A shortcut alternative is to simply type the function =BINOMDIST in the first empty cell under the P(x) column.

3. The function BINOMDIST requires the following fields:

   =BINOMDIST(Number_s, Trials, Probability_s, Cumulative)

   • Number_s is the x value for which we are calculating the probability. E.g., to find the probability of 0 girls in 14 newborns, this value would be 0 (or cell A2, the cell address that contains 0).
   • Trials is the total number of trials included in our distribution. E.g., for 14 newborns, Trials = 14.
   • Probability_s is the chance of success in any single trial. E.g., for our newborns, we know the probability of a girl in any single case is 0.50. In any problem of this type you are either given the probability or the means to calculate it; you should never have to guess.
   • Cumulative lets us specify whether we want the probability of exactly x successes (this is normally the case, and we enter “0” or “FALSE”) or whether we want to “cumulate” the probabilities by adding all previous P(x)’s to the current one, which gives the probability of x OR fewer successes. To cumulate, enter “1” or “TRUE”.

   For 0 girls in 14 newborns, the Excel formula looks like this:

   =BINOMDIST(0,14,0.5,FALSE)

4. Next, copy the BINOMDIST formula down the P(x) column. We now have the entire binomial probability distribution for how many girls we might find among 14 randomly selected newborns.

• Notice how all the probabilities are between 0 and 1 and they all add up to 1.000. Remember, these rules must be met in order to have a valid probability distribution.

x     P(x)
0     0.000
1     0.001
2     0.006
3     0.022
4     0.061
5     0.122
6     0.183
7     0.209
8     0.183
9     0.122
10    0.061
11    0.022
12    0.006
13    0.001
14    0.000
Sum   1.000

Binomial Probabilities using Excel

• Here’s another problem for you to try. AT&T Directory Assistance has been found to be accurate and correct 90% of the time. Assuming we make 5 calls to AT&T Directory Assistance, construct the binomial probability distribution for correctly answered directory assistance calls. Also, find the probability that AT&T gets exactly 3 of the 5 calls right.
• The answers follow, but you must know how to build and interpret these distributions, so I strongly encourage you to practice this several times at least.

Solution (to see me do this, open the file binomdist2.wmv):

x (correctly answered calls)    P(x)
0                               0.000
1                               0.000
2                               0.008
3                               0.073
4                               0.328
5                               0.590
Sum                             1.000

The formula in the first P(x) cell is =BINOMDIST(0,5,0.9,FALSE).

The probability that AT&T gets exactly 3 calls right is 0.073 or 7.3%.

Mean, Standard Deviation and Variance for Binomial Probability Distributions

Calculating the Mean of Binomial Probability Distributions

• Remember: we learned to calculate the mean of a probability distribution by using the formula ∑[x * P(x)].
• We can still use that formula, but for binomial probability distributions there is an easier formula.
• For a binomial probability distribution, µ (mean) = n * p
• Here n is the number of trials (14 newborns in our previous example, or 5 calls to AT&T), and p is the probability of any single success (e.g., p = 0.50 that any newborn was a girl, or p = 0.90 that any call to AT&T Directory Assistance was handled correctly).
• Thus, for the number of girls out of 14 newborns, the mean or average would be µ = n * p = 14 * 0.50 = 7.00 girls (just like we calculated using the longer method).
• And, for AT&T Directory Assistance calls, µ = n * p = 5 * 0.90 = 4.50 calls.
• Remember: the mean, standard deviation, and variance that we calculate refer to the x’s (e.g., girls and calls), NOT to the probabilities.

Calculating the Standard Deviation and Variance of Binomial Probability Distributions

• The quick formula for calculating the variance of a binomial probability distribution is σ² = n * p * q
• Here σ² represents variance (remember that σ represents standard deviation, and standard deviation squared, σ², is variance), n represents the number of trials, p represents the probability of success in any single trial, and q = 1 - p (e.g., if p = 0.50 then q = 1 - 0.50 = 0.50, or if p = 0.90 then q = 1 - 0.90 = 0.10).
• Thus, the standard deviation is the square root of the variance: σ = sqrt(n * p * q)
• I expect you to remember these formulas, including your p’s and q’s, for the entire course.
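For readers who want to check the Excel results another way, the binomial probability formula given under Notation and the quick formulas for the mean and variance can be sketched in Python. This is an illustrative sketch only: the function names `binomial_probability` and `binomial_summary` are invented here, and the argument order simply imitates BINOMDIST(Number_s, Trials, Probability_s, Cumulative); it is not Excel's own implementation.

```python
import math


def binomial_probability(x, n, p, cumulative=False):
    """P(exactly x successes), or P(x or fewer) when cumulative is True."""
    def exact(k):
        # n! / ((n - k)! k!) * p^k * q^(n - k), with q = 1 - p
        return math.comb(n, k) * p**k * (1 - p)**(n - k)

    if cumulative:
        return sum(exact(k) for k in range(x + 1))
    return exact(x)


def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) using n*p and n*p*q."""
    q = 1 - p
    variance = n * p * q
    return n * p, variance, math.sqrt(variance)


# Girls among 14 newborns (p = 0.50) and correct calls among 5 (p = 0.90).
for label, n, p in [("girls", 14, 0.50), ("calls", 5, 0.90)]:
    mean, variance, sd = binomial_summary(n, p)
    print(f"{label}: mean={mean:.2f} variance={variance:.2f} sd={sd:.2f}")
    for x in range(n + 1):
        print(f"  x={x:2d}  P(x)={binomial_probability(x, n, p):.3f}")
```

Rounding each printed P(x) to three decimals should reproduce the two tables shown earlier, and the printed means match the n * p results of 7.00 girls and 4.50 calls.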
• Again, p = probability of success and q = 1 - p

Calculating the Standard Deviation and Variance of Binomial Probability Distributions

Examples:

• For our problem involving the number of girls out of 14 newborns, we can now calculate the standard deviation, using Excel, as follows: q = 1 - p = 1 - 0.50 = 0.50, so standard deviation = sqrt(n * p * q) = sqrt(14 * 0.50 * 0.50) = 1.87 girls
• And for our problem involving the number of correctly handled AT&T Directory Assistance calls out of 5 calls, we can calculate the standard deviation as follows: q = 1 - p = 1 - 0.90 = 0.10, so standard deviation = sqrt(n * p * q) = sqrt(5 * 0.90 * 0.10) = 0.67 calls

Using Standard Deviation to determine what’s “Unusual”

• Remember: in Section 2 we learned that it is unusual for a value (x) to vary by more than two standard deviations from the mean.
• We can apply this to binomial probability distributions to determine whether or not a particular x value is unusual (unusual meaning less than about a 5% chance of occurring).
• Going back to our examples, we found that for girls among 14 newborns, the mean = 7 girls and the standard deviation = 1.87 girls.
• Therefore, among 14 newborns it would be unusual to find fewer than 3.26 girls, that is, 3 or fewer (7 - (2*1.87) = 3.26, which is the mean minus 2 standard deviations).
• It would also be unusual to find more than 10.74 girls, that is, 11 or more (7 + (2*1.87) = 10.74, which is the mean plus 2 standard deviations).
• Regarding the 5 calls to AT&T Directory Assistance, we calculated a mean = 4.5 calls and a standard deviation = 0.67 calls.
• Therefore, among 5 calls to AT&T it would be unusual to have them correctly handle fewer than 3.16 calls, that is, 3 or fewer (4.5 - (2*0.67) = 3.16, which is the mean minus 2 standard deviations).
• The mean plus 2 standard deviations is 4.5 + (2*0.67) = 5.84, which is more than our total of 5 calls, so we round it down to 5. In other words, no number of correctly handled calls is unusually high.

Poisson Probability Distributions

These distributions are typically used to describe arrivals of people, things, or occurrences over time. For example, a Poisson probability distribution could be used to describe people arriving at Golden Corral Restaurant, or airplanes arriving at Salt Lake International Airport, or earthquakes arriving in Japan. It is amazingly accurate at modeling such arrivals.

Notation for Poisson Probability Distributions

Although the following formula can be used to manually calculate probability in a Poisson probability distribution, I DO NOT expect you to know or remember it. However, I DO EXPECT you to be able to use Excel to calculate Poisson probability distributions.

\[ P(x) = \frac{\mu^{x} \cdot e^{-\mu}}{x!} \qquad \text{where } e \approx 2.71828 \]

Where,
µ    = mean or average arrival rate during a given time period
x    = specific number of arrivals during a given time period
x!   = factorial of x
P(x) = probability of exactly x arrivals during a given time period

Requirements for a Poisson Probability Distribution

• The random variable x represents the number of occurrences or arrivals of an event over some specific time period.
• The occurrences or arrivals must be random.
• The occurrences or arrivals must be independent of each other.
• The occurrences or arrivals must be uniformly distributed over the time period being used. For example, you couldn’t use it to model arrivals at a restaurant for a whole day when the restaurant was extra-busy during lunch time and slow during the afternoon.
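The Poisson formula above translates almost directly into code. The sketch below is an illustration only: the function name `poisson_pmf` is invented, and the mean of 2.0 arrivals per time period is a made-up value chosen just to show the shape of the output, not a figure from this course.

```python
import math


def poisson_pmf(x, mean):
    """Probability of exactly x arrivals: mean^x * e^(-mean) / x!"""
    return mean**x * math.exp(-mean) / math.factorial(x)


# Hypothetical average of 2.0 arrivals per time period (illustration only).
example_mean = 2.0
for x in range(9):
    print(f"x={x}  P(x)={poisson_pmf(x, example_mean):.3f}")

# Unlike the binomial case, x has no upper limit, so the probabilities
# only approach a total of 1 as more x values are included.
total = sum(poisson_pmf(x, example_mean) for x in range(50))
print(f"Sum of P(x) for x = 0..49: {total:.6f}")
```

Because there is no fixed number of trials, the table has no natural last row; in practice one keeps adding x values until P(x) is negligibly small.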
Poisson Probability Distribution Parameters (mean, standard deviation, variance)

• With a Poisson probability distribution, the mean (µ) is either given or determined from observation or experiment. However, IF n ≥ 100 AND n*p ≤ 10, then it is also possible to calculate the mean of a Poisson distribution by using the same formula we used with a binomial distribution, that is, µ = n * p
• The standard deviation of a Poisson distribution is calculated using the following formula: σ = sqrt(µ)

Poisson Probabilities using Excel

Example: A classic example of the Poisson distribution involves the number of deaths caused by horse kicks in the Prussian Army between 1875 and 1894. During that 20-year period there were 196 deaths by horse kick. That’s an average of 196/20 = 9.8 horse-kick deaths per year in the Prussian Army. Remember that we have to have an average per time period in order to use the Poisson distribution (this is key). Now, let’s use Excel to determine the probability for various numbers of horse-kick deaths per year. There is no set number of x values to investigate; we usually just keep going until the probability drops to near zero (you’ll see what I mean when you try it).

If you want to follow along, a movie of me doing this in Excel is available in the Windows Media file called poisson1.wmv.

1. Set up an Excel spreadsheet with one column labeled x (to represent the number of horse-kick deaths per year in the Prussian Army) and another column labeled P(x) (to represent the probability of finding exactly x horse-kick deaths in the Prussian Army during any given year). Start the x column with the values 0 through 6 and leave the P(x) column empty for now.

2. Click on fx (function), then select the category called “Statistical” and the function called POISSON. A shortcut alternative is to simply type the function =POISSON in the first empty cell under the P(x) column.

3. The function POISSON requires the following fields:

   =POISSON(x, Mean, Cumulative)

   • x represents the number of occurrences or arrivals. E.g., the number of horse-kick deaths per year in the Prussian Army.
   • Mean is the average rate of occurrences or arrivals during a specific time period. E.g., for the Prussian Army this was the average number of horse-kick deaths per year, which we calculated as 196 deaths divided by 20 years, or 9.8 deaths per year.
   • Cumulative lets us specify whether we want the probability of exactly x occurrences (this is normally the case, and we enter “0” or “FALSE”) or whether we want to “cumulate” the probabilities by adding all previous P(x)’s to the current one, which gives the probability of x OR fewer occurrences. To cumulate, enter “1” or “TRUE”.

   For 0 horse-kick deaths per year, the Excel formula looks like this:

   =POISSON(0,9.8,FALSE)

4. Next, copy the POISSON formula down the P(x) column AND continue to add x’s until the probability drops to 0 (or near zero). We now have the Poisson probability distribution for how many horse-kick deaths per year we expect to find in the Prussian Army.

• It is interesting to note that the Poisson distribution is very accurate in predicting the probability of horse-kick deaths in the Prussian Army.
• Notice how all the probabilities are between 0 and 1 and they all add up to 1.000. Remember, these rules must be met in order to have a valid probability distribution.
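Steps 1 through 4 above can also be imitated in a few lines of Python as a cross-check on the spreadsheet. This is a sketch under stated assumptions: the function `poisson` is a hypothetical stand-in written to mirror the argument order of =POISSON(x, Mean, Cumulative), not Excel's code, and the stopping rule (stop once x is past the mean and P(x) rounds to 0.000) is just one reasonable reading of “keep going until the probability drops to near zero”.

```python
import math


def poisson(x, mean, cumulative=False):
    """P(exactly x occurrences), or P(x or fewer) when cumulative is True."""
    def exact(k):
        return mean**k * math.exp(-mean) / math.factorial(k)

    if cumulative:
        return sum(exact(k) for k in range(x + 1))
    return exact(x)


# 196 horse-kick deaths over 20 years gives the average per year.
mean_deaths = 196 / 20

rows = []
x = 0
while True:
    p = poisson(x, mean_deaths)
    rows.append((x, p))
    # Assumed stopping rule: past the mean and P(x) rounds to 0.000.
    if x > mean_deaths and round(p, 3) == 0:
        break
    x += 1

for x, p in rows:
    print(f"x={x:2d}  P(x)={p:.3f}")
print(f"Sum of listed P(x): {sum(p for _, p in rows):.3f}")
```

Setting `cumulative=True` answers questions such as the chance of 5 or fewer deaths in a year, in the same way as entering TRUE in Excel.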
x (horse-kick deaths/year)    P(x)
0                             0.000
1                             0.001
2                             0.003
3                             0.009
4                             0.021
5                             0.042
6                             0.068
7                             0.096
8                             0.117
9                             0.127
10                            0.125
11                            0.111
12                            0.091
13                            0.068
14                            0.048
15                            0.031
16                            0.019
17                            0.011
18                            0.006
19                            0.003
20                            0.002
21                            0.001
22                            0.000
Sum                           1.000

Poisson Probabilities using Excel

• Here’s another problem for you to try. For a recent period of 100 years, there were 93 major earthquakes (at least 6.0 on the Richter scale) in the world (based on data from the World Almanac and Book of Facts). Use the Poisson distribution to find the probabilities of 0 to 8 major earthquakes during any given year.
• The answers follow, but you must know how to build and interpret these distributions, so I strongly encourage you to practice this several times at least.

Solution (to see me do this, open the file poisson2.wmv):

• With 93 major earthquakes in the last 100 years, we can calculate an average of 93 / 100 = 0.93 major earthquakes per year. Using Excel, we can now use the =POISSON function to calculate the probability of exactly x earthquakes in any given year.

x (earthquakes)    P(x)
0                  0.395
1                  0.367
2                  0.171
3                  0.053
4                  0.012
5                  0.002
6                  0.000
7                  0.000
8                  0.000
Sum                1.000

The formula in the first P(x) cell is =POISSON(0,0.93,FALSE).

The probability of exactly 5 earthquakes in any year is 0.002 or 0.2%.

• Just for fun, let’s compare our Poisson distribution, which gives us the calculated probabilities of 0 to 8 earthquakes during any given year, with the actual historical record of earthquakes to see if it did a good job predicting the actual frequencies. Wow! Almost spooky, isn’t it? The Poisson distribution was pretty accurate.
From our Poisson probabilities, we can predict the number of years in which there are x major earthquakes by taking the Poisson P(x) * 100 years. The actual number of years in the last 100 years that experienced x major earthquakes is based on historical records.

Major earthquakes in a year (x)    Predicted years    Actual years
0                                  39                 47
1                                  37                 31
2                                  17                 13
3                                  5                  5
4                                  1                  2
5                                  0                  0
6                                  0                  1
7                                  0                  1
8                                  0                  0
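The predicted-versus-actual comparison can be reproduced with a short Python sketch. The helper name `poisson_pmf` and the variable names are made up for this illustration; the actual counts are the ones listed in the comparison above, and the predictions are simply P(x) * 100 years rounded to whole years.

```python
import math


def poisson_pmf(x, mean):
    """Probability of exactly x occurrences: mean^x * e^(-mean) / x!"""
    return mean**x * math.exp(-mean) / math.factorial(x)


years = 100
mean_per_year = 93 / years  # 93 major earthquakes in 100 years

# Actual number of years with x = 0..8 major earthquakes (historical record).
actual_years = [47, 31, 13, 5, 2, 0, 1, 1, 0]

# Predicted number of years = P(x) * 100, rounded to whole years.
predicted_years = [round(poisson_pmf(x, mean_per_year) * years)
                   for x in range(len(actual_years))]

print("x  predicted  actual")
for x, (pred, act) in enumerate(zip(predicted_years, actual_years)):
    print(f"{x}  {pred:9d}  {act:6d}")
```

As a consistency check on the historical record, the actual counts cover all 100 years, and multiplying each x by its number of years and summing recovers the 93 earthquakes.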