Probability and the Normal Curve, continued
Statistics for Political Science
Levin and Fox, Chapter 5, Part II

Let’s take another look at standard deviation
Standard deviation is a measure of variation, and this variability is reflected in the sigma values (σ) of our distribution. The mean (µ) establishes a standardized “zero,” and the sigma values (σ) indicate how far a score lies from µ.

Note how the mean equals zero. Also, see how one standard deviation away from the mean is represented by µ + 1σ or µ - 1σ, depending on the direction of the deviation.

Area Under the Curve
Normal Curve: Under the normal curve, measures of standard deviation (sigma units) correspond to specific percentages of the total cases.
µ to µ + 1σ: 34.13%     µ to µ - 1σ: 34.13%     µ - 1σ to µ + 1σ: 68.26%
µ to µ + 2σ: 47.72%     µ to µ - 2σ: 47.72%     µ - 2σ to µ + 2σ: 95.44%
µ to µ + 3σ: 49.87%     µ to µ - 3σ: 49.87%     µ - 3σ to µ + 3σ: 99.74%
Thus, the area under the normal curve between the mean and the point 1σ above it always includes 34.13% of the total cases, and the area within 1σ above and below the mean includes 68.26% of cases.

[Figure: The Area Under the Curve, marking +2σ (47.72%) and +3σ (49.87%)]

Clarifying the Standard Deviation
IQ and Gender: Research suggests that both men and women have a mean IQ of 100, but that they differ in variability around the mean.
Men: The male distribution has a larger percentage of extreme scores, representing a small number of very bright and very dull individuals in the tails (and thus a larger range).
Women: The distribution for women, by contrast, has a larger percentage of scores located closer to the average.
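The sigma-unit percentages listed under "Area Under the Curve" can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `area_mean_to_z` is hypothetical, and it assumes a perfectly normal distribution. The exact two-sided values differ from the slides in the last digit (for example 68.27% rather than 68.26%) because the slides double the already-rounded one-sided figures.

```python
import math

def area_mean_to_z(z):
    """Fraction of the area under the standard normal curve
    between the mean and a point z sigma units away."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

for k in (1, 2, 3):
    one_side = 100 * area_mean_to_z(k)  # mean to +k sigma, in percent
    print(f"mean to {k} sigma: {one_side:.2f}%   "
          f"within +/-{k} sigma: {2 * one_side:.2f}%")
```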
Clarifying the Standard Deviation
IQ and Gender: Measures of Variability
Here are the numbers:
Men: Mean = 100, σ = 15
Women: Mean = 100, σ = 10

Clarifying the Standard Deviation: Men
Men: Mean = 100, σ = 15, so 3σ = 15 × 3 = 45
+1σ = 115, +2σ = 130, +3σ = 145
99.74% of scores fall between IQ 55 and IQ 145 (µ ± 3σ).
68.26% of scores fall between IQ 85 and IQ 115 (µ ± 1σ).

Clarifying the Standard Deviation: Women
Women: Mean = 100, σ = 10
+1σ = 110, +2σ = 120, +3σ = 130
99.74% of scores fall between IQ 70 and IQ 130 (µ ± 3σ).
68.26% of scores fall between IQ 90 and IQ 110 (µ ± 1σ).

Standard Deviation: Using Table A
So far, when analyzing the normal distribution, we have looked at distances from the mean that are exact multiples of the standard deviation (+1σ, +2σ, +3σ or -1σ, -2σ, -3σ).
How do we determine the percentage of cases under the normal curve for a score that falls between two of these points, say between +1σ and +2σ?
Example: z = 1.40
What percentage of scores falls between the mean (µ) and a score 1.40σ above it? Since 1.40 is greater than 1 but less than 2, we know the area includes more than 34.13% but less than 47.72% of cases.

[Figure: normal curve marking 34.13% at 1.0σ, ?% at 1.4σ, and 47.72% at 2.0σ]

Standard Deviation: Using Table A
To determine the exact percentage between the mean (µ) and 1.40σ, we need to consult Table A in Appendix B.
Table A shows the percentage of the area under the normal curve:
Column A: The sigma distances, labeled z, in the left-hand column.
Column B: The percentage of the area under the normal curve between the mean and the various sigma distances from the mean.
Column C: The percentage of the area at or beyond various scores toward either tail of the distribution.
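Columns B and C of Table A can also be computed directly from the normal curve. The sketch below is an illustration only, not the textbook's table; the helper name `table_a_row` is hypothetical. It returns the two percentages for a given sigma distance z, rounded to two decimals as printed tables usually are.

```python
import math

def table_a_row(z):
    """Return (column B, column C) for a sigma distance z:
    percent of area between the mean and z, and percent at or beyond z."""
    between = 50 * math.erf(abs(z) / math.sqrt(2))  # mean to z
    beyond = 50 - between                           # z out to the tail
    return round(between, 2), round(beyond, 2)

# The slide's example: a score 1.40 sigma units above the mean.
print(table_a_row(1.40))
```

For z = 1.40 the result lies between the 34.13% (1σ) and 47.72% (2σ) bounds reasoned out above.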
Standard Deviation: Using Table A
Using Table A:
(A) z      (B) area between µ and z      (C) area beyond z
1.40       41.92%                        8.08%

[Figure: 41.92% of the area lies between the mean and 1.4σ, between the 34.13% at 1σ and the 47.72% at 2σ]

Z Score Computed by Formula
We obtain the z score by finding the deviation (X - µ), which gives the distance of the raw score from the mean, and then dividing this raw-score deviation by the standard deviation.
z = (X - µ) / σ
where
X = raw score
µ = mean of a distribution
σ = standard deviation of a distribution
z = standard score

Z Scores
The z score indicates the direction and degree to which any given raw score deviates from the mean of a distribution, on a scale of sigma units.
z = (X - µ) / σ

Z Scores
So why do we use z scores?
Z scores allow us to translate any raw score, regardless of its unit of measure, into sigma units (standard deviations within a probability distribution). This gives us a standardized way to evaluate how far raw scores vary from a standardized mean.
BUT, the sigma distance is specific to a particular distribution; it changes from one distribution to another. For this reason, we must know the standard deviation of a distribution before we can translate any raw score into units of standard deviation.

Z Scores
Let’s Practice!
Suppose we are studying the distribution of hours per month that federal employees volunteer for partisan interest groups. The mean is 4 hours and the standard deviation is 1.21 hours. We want to know how far 7 volunteer hours is from the mean.
The z score allows us to translate any raw score (X) into sigma units (a measure of standard deviation within a probability distribution).

Z Scores
Let’s look at the data that we have:
z = ?
µ = 4 hours
σ = 1.21 hours
X = 7 hours
NOTE: The raw score that we want to translate into a standardized score is 7 hours.
z = (X - µ) / σ = (7 - 4) / 1.21 = 2.479, or about 2.48

Volunteerism
Mean = 4 hours, σ = 1.21 hours
68.26% of cases fall between 2.79 and 5.21 hours (µ ± 1σ).
95.44% of cases fall between 1.58 and 6.42 hours (µ ± 2σ).
99.74% of cases fall between .37 and 7.63 hours (µ ± 3σ).
What percentage of cases falls between the mean (4 hours) and 7 hours?

Table A:
(A) z      (B) area between µ and z      (C) area beyond z
2.48       49.34%                        .66%

[Figure: σ = 1.21; z = 2.48, or 49.34% of the area, between 4 hours and 7 hours]

What did we do: We took a raw score (7 hours) and turned it into a sigma score (z = 2.48) in order to determine the percentage likelihood of volunteering between the mean of 4 hours and 7 hours.

Z Scores
Another Example: Cashiers’ Pay
Suppose we are studying the distribution of pay for cashiers at a fast-food restaurant. The mean is $10 and the standard deviation is $1.50. We want to know how far $12 is from the mean.

Z Scores
Let’s look at the data that we have:
z = ?
µ = $10
σ = $1.50
X = $12
NOTE: The raw score that we want to translate into a standardized score is $12.
z = (X - µ) / σ = (12 - 10) / 1.5 = 1.33

[Figure: σ = 1.5; 34.13% of the area lies between $10 and $11.50; ?% lies between $10 and $12]

Table A:
(A) z      (B) area between µ and z      (C) area beyond z
1.33       40.82%                        9.18%

[Figure: σ = 1.5; z = 1.33, or 40.82% of the area, between $10 and $12]

What did we do: We took a raw score ($12) and turned it into a sigma score (z = 1.33) in order to determine the percentage likelihood of earning between the mean of $10 and $12.

Probability and the Normal Curve
We have covered finding probability and z scores, so let’s discuss finding probability under the normal curve.
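The cashiers' pay example can be retraced in a few lines of code: compute the z score from the raw score, round it to two decimals as a printed table requires, then look up the area. This is a minimal sketch under the assumption of a normal distribution; `z_score` and `table_a_row` are hypothetical helper names, not functions from the textbook.

```python
import math

def z_score(x, mean, sd):
    """Distance of raw score x from the mean, in sigma units."""
    return (x - mean) / sd

def table_a_row(z):
    """(percent between the mean and z, percent at or beyond z)."""
    between = 50 * math.erf(abs(z) / math.sqrt(2))
    return round(between, 2), round(50 - between, 2)

# Cashiers' pay: mean $10, standard deviation $1.50, raw score $12.
z = round(z_score(12, 10, 1.5), 2)  # rounded to match a two-decimal table
between, beyond = table_a_row(z)
print(z, between, beyond)
```

Rounding z before the lookup mirrors how a printed table is used; skipping that step gives a slightly different area.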
The normal curve can be used in conjunction with z scores and Table A to determine the probability of obtaining any raw score in a distribution.
Remember, the normal curve is a probability distribution in which the total area under the curve equals 100% probability.

Probability and the Normal Curve
The central area around the mean is where scores occur most frequently. The extreme portions toward the ends are where the extremely high and low scores are located.
So, in probability terms, probability decreases as we travel along the baseline away from the mean in either direction.
To say that 68.26% of the total frequency under the normal curve falls between -1σ and +1σ from the mean is to say that the probability is approximately 68 in 100 that any given raw score will fall in this interval.

[Figure: the area within ±1σ of the mean, 68.26%, or 68 in 100]

Probability and the Normal Curve
Example: Campaign Phone-Bank
We are asked to calculate the z score for the number of calls campaign volunteers made in a 3-hour shift. The mean number of calls is 21, with a standard deviation of 1.45 calls. What is the probability that a volunteer will complete 25 or more calls during the 3-hour period?
Let’s apply the z-score formula.

Example: Phone Banking
z = ?
µ = 21 calls
σ = 1.45 calls
X = 25 calls
Goal: Turn the raw score (25 calls) into sigma units (z) in order to determine the likely percentage of volunteers who make between 21 and 25 calls, or more than 25 calls.
Remember our equation: z = (X - µ) / σ
Plug in our values and scores: z = (25 - 21) / 1.45
We have our z score: z = 2.75 (2.7586 truncated to two decimal places)
From our z score, we know that a raw score of 25 is located 2.75σ above the mean.

Probability and the Normal Curve
Our next step is to use Table A to find the percentage of the total frequency under the curve falling between the z score and the mean.
So, let’s find our z score (2.75) in Column A.
Column B tells us that 49.70% of all volunteers should be able to complete between 21 and 25 calls in 3 hours.
By moving the decimal two places to the left, we see that the probability is P = .4970, or about 50 in 100 (rounding up), that a volunteer will complete between 21 and 25 phone calls.

Table A:
(A) z      (B) area between µ and z      (C) area beyond z
2.75       49.70%                        .30%

[Figure: σ = 1.45; z = 2.75 marks 25 calls, with 49.70% of the area between 21 and 25 calls and .003 in the tail beyond 25]

P of calls, 21 to 25: P = .4970, 50 in 100, 50% chance
P of calls, more than 25: P = .003, .3 in 100, .3% chance
P of calls, 17 to 25 (z = -2.75 to z = 2.75): P = .9940, 99 in 100
P of calls, fewer than 17 or more than 25: P = .006, .6 in 100, .6% chance

Review of Probability
Probability refers to the relative likelihood of occurrence of a particular outcome or event.
The probability associated with an event is the number of times that event can occur relative to the total number of times any event can occur.
We use a capital P to indicate probability.
Probability varies from 0 to 1.0, although percentages rather than decimals may be used to express levels of probability.

The Probability Spectrum
A zero probability indicates that something is impossible.
Probabilities near zero (like .05 or .10) imply very unlikely occurrences.
A probability of 1.0 constitutes certainty.
High probabilities like .90, .95, or .99 signify very probable or likely outcomes.

Equation for Calculating Probability
Probability of an outcome or event = (number of times the outcome or event can occur) / (total number of times any outcome or event can occur)

Extra: Z Scores
How do we determine the percent of cases for distances lying between any two score values?
Example: A raw score lies 1.55σ above the mean.
– Obviously our score falls between 1σ and 2σ.
– So we know that this distance would include more than 34.13% but less than 47.72% of the total area under the normal curve.

Extra: Z Scores
To determine the exact percentage in this interval, we must use Table A in Appendix B!
Column A: The sigma distances, labeled z, in the left-hand column.
Column B: The percentage of the area under the normal curve between the mean and the various sigma distances from the mean.
Column C: The percentage of the area at or beyond various scores toward either tail of the distribution.
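The question of what percentage of cases lies between any two score values can also be answered without a printed table, by subtracting cumulative areas. The sketch below is an illustration, not the textbook's method; it uses `NormalDist` from Python's standard `statistics` module, and `percent_between` is a hypothetical helper name. It assumes the scores have already been converted to z scores.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1 (sigma units).
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def percent_between(z_low, z_high):
    """Percent of the area under the normal curve between two z scores."""
    area = STANDARD_NORMAL.cdf(z_high) - STANDARD_NORMAL.cdf(z_low)
    return 100 * area

# Mean to a score 1.55 sigma units above it (the slide's example).
print(round(percent_between(0, 1.55), 2))

# Between +1 sigma and +2 sigma.
print(round(percent_between(1, 2), 2))
```

The first value falls between 34.13% and 47.72%, as the slide's reasoning predicts.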