Probability and the Normal Curve, continued
Statistics for Political Science
Levin and Fox, Chapter 5, Part II

Let's take another look at standard deviation
Standard deviation is a measure of variation, and this variability is reflected in the sigma values (σ) of our distribution. The mean (µ) establishes a standardized "zero," and the sigma values (σ) indicate the distance (or variation) of a score from µ.

Note how the mean equals zero. Also, see how one standard deviation away from the mean is represented by µ + 1σ or µ - 1σ (depending on the direction of the deviation).

Area Under the Curve
Under the normal curve, distances measured in standard deviations (sigma units) correspond to specific percentages of the total area:
Between µ and µ + 1σ: 34.13%; between µ - 1σ and µ: 34.13%; total: 68.26%
Between µ and µ + 2σ: 47.72%; between µ - 2σ and µ: 47.72%; total: 95.44%
Between µ and µ + 3σ: 49.87%; between µ - 3σ and µ: 49.87%; total: 99.74%
Thus, the area under the normal curve between the mean and the point 1σ above it always includes 34.13% of the total cases, and the area from 1σ below to 1σ above the mean includes 68.26% of cases.

[Figure: The Area Under the Curve, marking +2σ: 47.72% and +3σ: 49.87%]

Clarifying the Standard Deviation
IQ and Gender: Research suggests that men and women both have a mean IQ of 100 but differ in their variability around the mean.
Men: The male distribution has a larger percentage of extreme scores, representing a small number of very bright and very dull individuals in the tails (and thus a larger range).
Women: The female distribution, by contrast, has a larger percentage of scores located close to the average.
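The sigma-unit percentages from the Area Under the Curve slide can be checked directly. The sketch below is an illustration, not part of the textbook: it assumes Python's standard-library math.erf and the identity P(-k < Z < k) = erf(k / √2) for a standard normal variable Z. The exact totals come out as 68.27%, 95.45%, and 99.73%; the slides' 68.26%, 95.44%, and 99.74% arise from doubling the rounded one-sided values (34.13%, 47.72%, 49.87%).

```python
import math

def area_within(k: float) -> float:
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    # For a standard normal variable Z, P(-k < Z < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    total = area_within(k)
    # The curve is symmetric, so half of the area lies on each side of the mean.
    print(f"±{k}σ: {100 * total:.2f}% total, {100 * total / 2:.2f}% on each side")
```

This prints one line per sigma distance; the one-sided figures (34.13%, 47.72%, 49.87%) match the slide exactly.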
Clarifying the Standard Deviation
IQ and Gender: Measures of Variability
Here are the numbers:
Men: mean = 100, σ = 15
Women: mean = 100, σ = 10

Clarifying the Standard Deviation: Men
Mean = 100, σ = 15. Three standard deviations span 3 × 15 = 45 IQ points on each side of the mean, so 99.74% of men fall between IQ 55 and IQ 145 (+1σ = 115, +2σ = 130, +3σ = 145).

Clarifying the Standard Deviation: Women
Mean = 100, σ = 10. Three standard deviations span 3 × 10 = 30 IQ points on each side of the mean, so 99.74% of women fall between IQ 70 and IQ 130 (+1σ = 110, +2σ = 120, +3σ = 130).

Clarifying the Standard Deviation: Men
Within one standard deviation (σ = 15) of the mean, 68.26% of men fall between IQ 85 and IQ 115.

Clarifying the Standard Deviation: Women
Within one standard deviation (σ = 10) of the mean, 68.26% of women fall between IQ 90 and IQ 110.

Standard Deviation: Using Table A
So far, when analyzing the normal distribution, we have looked at distances from the mean that are exact multiples of the standard deviation (+1σ, +2σ, +3σ or -1σ, -2σ, -3σ). How do we determine the percentage of cases under the normal curve that fall at distances in between, say between +1σ and +2σ?

Example: z = 1.40, a distance of 1.40σ from the mean. What percentage of scores falls between the mean (µ) and 1.40σ? Since 1.40σ is greater than 1σ but less than 2σ, we know the area includes more than 34.13% but less than 47.72% of cases.

[Figure: Standard Deviation: Using Table A, marking 34.13% at 1.0σ, 47.72% at 2.0σ, and ?% at 1.4σ]

To determine the exact percentage between the mean (µ) and 1.40σ, we need to consult Table A in Appendix B. Table A gives percentages of the area under the normal curve in three columns:
Column A: The sigma distances, labeled z, in the left-hand column
Column B: The percentage of the area under the normal curve between the mean and each sigma distance
Column C: The percentage of the area at or beyond each score toward the tail of the distribution
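Columns B and C of Table A can be reproduced from the standard normal cumulative distribution function. The sketch below is illustrative only: the helper name table_a_row is hypothetical, it assumes Python's math.erf, and it rounds to two decimals the way the printed table does. Because Table A lists only positive z, the sketch uses |z| and relies on the curve's symmetry.

```python
import math

def table_a_row(z: float) -> tuple[float, float]:
    """Reproduce Table A columns B and C (as percentages) for a sigma distance z."""
    z = abs(z)  # Table A lists only positive z; the curve is symmetric about the mean.
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z
    column_b = 100 * (phi - 0.5)  # area between the mean and z
    column_c = 100 * (1 - phi)    # area at or beyond z, toward that tail
    return round(column_b, 2), round(column_c, 2)

print(table_a_row(1.40))  # (41.92, 8.08)
```

Note that columns B and C always sum to 50%, since each covers part of one half of the curve.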
Standard Deviation: Using Table A
Table A row for z = 1.40:

z (Column A)   Area between µ and z (Column B)   Area beyond z (Column C)
1.40           41.92%                            8.08%

So 41.92% of cases fall between the mean and 1.40σ, which indeed lies between 34.13% and 47.72%.

Z Score Computed by Formula
We obtain the z score by finding the deviation (X - µ), which gives the distance of the raw score from the mean, and then dividing this raw-score deviation by the standard deviation:

$$z = \frac{X - \mu}{\sigma}$$

where
µ = mean of a distribution
σ = standard deviation of a distribution
z = standard score

Z Scores
The z score indicates the direction and degree to which any given raw score deviates from the mean of a distribution, on a scale of sigma units.

So why do we use z scores?
Z scores allow us to translate any raw score, regardless of its unit of measure, into sigma units (standard deviations within a probability distribution). This gives us a standardized way to evaluate how raw scores vary from the mean.
BUT the sigma distance is specific to a particular distribution; it changes from one distribution to another. For this reason, we must know the standard deviation of a distribution before we can translate any particular raw score into units of standard deviation.

Z Scores: Let's Practice!
Suppose we are studying the distribution of hours per month that federal employees volunteer for partisan interest groups. The mean is 4 hours and the standard deviation is 1.21 hours. We want to know how far 7 volunteer hours is from the mean. The z score lets us translate any raw score (X) into sigma units.

Let's look at the data that we have:
z = ?
µ = 4 hours
σ = 1.21 hours
X = 7 hours
NOTE: The raw score that we want to translate into a standardized score is 7 hours.
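The formula translates directly into a small function. This is a minimal sketch, not the textbook's own code; the function name z_score is hypothetical, and the guard against a non-positive σ is an added assumption. It takes µ and σ as explicit arguments because, as noted above, the sigma distance is specific to each distribution: the same raw score (IQ 130, from the IQ slides) is 2σ above the mean for men but 3σ above it for women.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return how many standard deviations the raw score x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be a positive standard deviation")
    return (x - mu) / sigma  # positive above the mean, negative below it

# The same raw score (IQ 130) in the two IQ distributions from the slides above.
print(z_score(130, 100, 15))  # men:   2.0
print(z_score(130, 100, 10))  # women: 3.0
```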
$$z = \frac{X - \mu}{\sigma} = \frac{7 - 4}{1.21} = \frac{3}{1.21} \approx 2.48$$

Clarifying the Standard Deviation: Volunteer Hours
With a mean of 4 hours and σ = 1.21 hours:
68.26% of cases fall within ±1σ: 2.79 to 5.21 hours
95.44% of cases fall within ±2σ: 1.58 to 6.42 hours
99.74% of cases fall within ±3σ: 0.37 to 7.63 hours
A score of 7 hours lies between +2σ (6.42 hours) and +3σ (7.63 hours), which agrees with z ≈ 2.48. What percentage of cases falls between the mean and 7 hours?

Table A row for z = 2.48:

z (Column A)   Area between µ and z (Column B)   Area beyond z (Column C)
2.48           49.34%                            0.66%

What did we do: We took a raw score (7 hours) and turned it into a sigma score (z = 2.48) in order to determine the percentage of employees expected to volunteer between the mean of 4 hours and 7 hours: 49.34%.

Z Scores: Another Example, Cashiers' Pay
Suppose we are studying the distribution of pay for cashiers at a fast-food restaurant. The mean is $10 and the standard deviation is $1.50. We want to know how far $12 is from the mean.

Let's look at the data that we have:
z = ?
µ = $10
σ = $1.50
X = $12
NOTE: The raw score that we want to translate into a standardized score is $12.

$$z = \frac{X - \mu}{\sigma} = \frac{12 - 10}{1.5} \approx 1.33$$

One standard deviation above the mean ($11.50) encloses 34.13% of cases, so $12 must enclose somewhat more. Table A gives the exact figure:

z (Column A)   Area between µ and z (Column B)   Area beyond z (Column C)
1.33           40.82%                            9.18%

What did we do: We took a raw score ($12) and turned it into a sigma score (z = 1.33) in order to determine the percentage of cashiers expected to earn between the mean of $10 and $12: 40.82%.

Probability and the Normal Curve
We have covered finding probability and z scores, so let's discuss finding probability under the normal curve.
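As a first step from z scores into probability, the cashier result can be restated as a probability (a proportion rather than a percentage). The sketch below is illustrative, not from the textbook: the helper name prob_between_mean_and is hypothetical, it assumes Python's math.erf for the normal curve, and it rounds z to two decimals to mimic a Table A lookup. Without that rounding, z = 1.333... gives a slightly larger value, about .4088.

```python
import math

def prob_between_mean_and(x: float, mu: float, sigma: float) -> float:
    """Probability that a normally distributed score falls between the mean and x."""
    z = round(abs(x - mu) / sigma, 2)  # round to two decimals, as in a Table A lookup
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z
    return phi - 0.5  # Column B, expressed as a proportion instead of a percentage

# Cashiers' pay: mean $10, standard deviation $1.50, raw score $12.
p = prob_between_mean_and(12, 10, 1.5)
print(f"P = {p:.4f}")  # P = 0.4082, about 41 in 100
```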
The normal curve can be used in conjunction with z scores and Table A to determine the probability of obtaining any raw score in a distribution. Remember, the normal curve is a probability distribution in which the total area under the curve equals 100% probability.

The central area around the mean is where scores occur most frequently; the extreme portions toward the tails are where the very high and very low scores are located. So, in probability terms, probability decreases as we move along the baseline away from the mean in either direction. To say that 68.26% of the total frequency under the normal curve falls between -1σ and +1σ from the mean is to say that the probability is approximately 68 in 100 that any given raw score will fall in this interval.

[Figure: Clarifying the Standard Deviation: Women, shading 68.26%, or 68 in 100]

Probability and the Normal Curve
Example: Campaign Phone Bank
We are asked to calculate the z score for the number of calls campaign volunteers made in a 3-hour shift. The mean number of calls is 21, with a standard deviation of 1.45 calls. What is the probability that a volunteer will complete 25 or more calls during the 3-hour period? Let's apply the z-score formula.

Example: Phone Banking
Goal: Turn the raw score (25 calls) into sigma units (z) in order to determine the likely percentage of volunteers who make between 21 and 25 calls, or more than 25 calls.
z = ?
µ = 21 calls
σ = 1.45 calls
X = 25 calls
Plugging our values into the equation:

$$z = \frac{X - \mu}{\sigma} = \frac{25 - 21}{1.45} \approx 2.76$$

From our z score, we know that a raw score of 25 is located about 2.76σ above the mean.

Probability and the Normal Curve
Our next step is to use Table A to find the percentage of the total frequency under the curve falling between the z score and the mean. So,
1. Find our z score (2.76) in Column A.
2. Column B tells us that 49.71% of all volunteers should complete between 21 and 25 calls in 3 hours.
3. Moving the decimal point two places to the left converts this percentage to a probability, which is about 50 in 100 (rounding up).
4. That is, P = .4971 that a volunteer will complete between 21 and 25 phone calls.

z (Column A)   Area between µ and z (Column B)   Area beyond z (Column C)
2.76           49.71%                            0.29%

Reading the normal curve for the phone bank (µ = 21, σ = 1.45; 17 calls is at z = -2.76 and 25 calls is at z = +2.76):
P of 21 to 25 calls: P = .4971, about 50 in 100 (roughly a 50% chance)
P of 17 to 25 calls: P = .9942 (2 × .4971), about 99 in 100
P of fewer than 17 or more than 25 calls: P = .0058 (2 × .0029), about 0.6 in 100
P of more than 25 calls: P = .0029, about 0.3 in 100 (roughly a 0.3% chance)

Review of Probability
Probability refers to the relative likelihood of occurrence of a particular outcome or event. The probability associated with an event is the number of times that event can occur relative to the total number of times any event can occur. We use a capital P to indicate probability. Probability varies from 0 to 1.0, although percentages rather than decimals may be used to express levels of probability.

The Probability Spectrum
A zero probability indicates that something is impossible. Probabilities near zero (like .05 or .10) imply very unlikely occurrences. A probability of 1.0 constitutes certainty. High probabilities like .90, .95, or .99 signify very probable or likely outcomes.

Equation for Calculating Probability

$$P = \frac{\text{number of times the outcome or event can occur}}{\text{total number of times any outcome or event can occur}}$$

Extra: Z Scores
How do we determine the percentage of cases for distances lying between any two score values?
Example: A raw score lies 1.55σ above the mean.
– Obviously our score falls between 1σ and 2σ.
– So we know that this distance would include more than 34.13% but less than 47.72% of the total area under the normal curve.

Extra: Z Scores
To determine the exact percentage in this interval, we must use Table A in Appendix B:
Column A: The sigma distances, labeled z, in the left-hand column
Column B: The percentage of the area under the normal curve between the mean and each sigma distance
Column C: The percentage of the area at or beyond each score toward the tail of the distribution
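The Extra question (the area between any two score values) can be sketched by subtracting cumulative areas. This is an illustration under stated assumptions, not the textbook's procedure: the helper names phi and area_between are hypothetical, and the curve is computed with Python's math.erf instead of a printed table. For the 1.55σ example it gives 43.94%; for two scores on the same side of the mean it matches subtracting their Column B entries (43.94% - 34.13% = 9.81%).

```python
import math

def phi(z: float) -> float:
    """Standard normal cumulative distribution function (area to the left of z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_between(z1: float, z2: float) -> float:
    """Proportion of cases lying between two sigma distances, given in either order."""
    low, high = sorted((z1, z2))
    return phi(high) - phi(low)

# From the mean (z = 0) out to 1.55 standard deviations above it.
print(f"{100 * area_between(0, 1.55):.2f}%")     # 43.94%
# Between two scores that are both above the mean: 1.00σ and 1.55σ.
print(f"{100 * area_between(1.00, 1.55):.2f}%")  # 9.81%
```

When the two scores lie on opposite sides of the mean, the same function adds the two areas automatically, for example area_between(-1, 1) for the familiar ±1σ band.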