Chapter 5 The Standard Deviation as a Ruler and the Normal Model
Copyright © 2014, 2012, 2009 Pearson Education, Inc.

NOTE on slides / What we can and cannot do
The following notice accompanies these slides, which have been downloaded from the publisher’s Web site: “This work is protected by United States copyright laws and is provided solely for the use of instructors in teaching their courses and assessing student learning. Dissemination or sale of any part of this work (including on the World Wide Web) will destroy the integrity of the work and is not permitted. The work and materials from this site should never be made available to students except by instructors using the accompanying text in their classes. All recipients of this work are expected to abide by these restrictions and to honor the intended pedagogical purposes and the needs of other instructors who rely on these materials.”
Some of these slides are taken from the Third Edition; others are my own additions. We can use these slides because we are using the text for this course. Please help us stay legal. Do not distribute these slides any further. The original slides are done in green / red and black. My additions are in red and blue. Topics in brown and maroon are optional.

Topics in this chapter
• Shifting and Rescaling Data
• Standardized values (z-scores)
• Using the standard deviation as a ruler
• The Normal Model
• The 68-95-99.7 Rule
• Finding Normal percents and the reverse
• Normal Probability Plots
• The Normality Assumption

Division of Mathematics, HCC
Course Objectives for Chapter 5
After studying this chapter, the student will be able to:
19. Compare values from two different distributions using their z-scores.
20. Use Normal models (when appropriate) and the 68-95-99.7 Rule to estimate the percentage of observations falling within one, two, or three standard deviations of the mean.
21. Determine the percentages of observations that satisfy certain conditions by using the Normal model and determine “extraordinary” values.
22. Determine whether a variable satisfies the Nearly Normal condition by making a normal probability plot or histogram.
23. Determine the z-score that corresponds to a given percentage of observations.
Note: It is essential that this chapter be mastered. Almost everything in Unit 3 depends on it.

5.2 Shifting and Scaling

National Health and Nutrition Examination Survey
• Who? 80 male participants aged 19 to 24 who measured between 68 and 70 inches tall
• What? Their weights in kilograms
• When? 2001 – 2002
• Where? United States
• Why? To study nutrition and health issues and trends
• How? National survey

Shifting Weights
• Mean: 82.36 kg
• Maximum healthy weight: 74 kg
• How are shape, center, and spread affected when 74 is subtracted from all values?
• Shape and spread are unaffected.
• The center shifts down by 74: the mean becomes 82.36 − 74 = 8.36 kg.

Rules for Shifting
• If the same number is added to or subtracted from all data values, then:
– The measures of spread (standard deviation, range, and IQR) are all unaffected.
– The measures of position (mean, median, and mode) are all changed by that number.

Rescaling
• If we multiply all data values by the same number, what happens to the position and spread?
• To go from kg to lb, multiply by 2.2.
• The mean and the spread are also multiplied by 2.2.
How Rescaling Affects the Center and Spread
• When we multiply (or divide) all the data values by a constant, all measures of position and all measures of spread are multiplied (or divided) by that same constant.

Example: Rescaling Combined Times in the Olympics
• The mean and standard deviation in the men’s combined event at the Olympics were 168.93 seconds and 2.90 seconds, respectively.
• If the times are measured in minutes, what will be the new mean and standard deviation?
• Mean: 168.93 / 60 = 2.816 minutes
• Standard deviation: 2.90 / 60 = 0.048 minute

Example: Office reward
Workers in a particular office have the following annual salaries (in thousands of dollars):
62, 62, 58, 54, 50, 46, 44
What are the summary statistics (rounded)?
Mean: 53.7
Median: 54
Range: 18
Standard Deviation: 7.34
The boss wants to reward everyone for a job well done. He can give them a one-time bonus or an extra raise.

Option 1 – $5K Bonus (Shifting)
The data become 67, 67, 63, 59, 55, 51, 49
The summary statistics become
Mean: 58.7 (was 53.7)
Median: 59 (was 54)
Range: 18 (was 18)
Standard Deviation: 7.34 (was 7.34)
What has changed? By what? What has stayed the same?

Option 2 – 5% Raise (Rescaling)
The data become 65.1, 65.1, 60.9, 56.7, 52.5, 48.3, 46.2
The summary statistics become
Mean: 56.4 (was 53.7)
Median: 56.7 (was 54)
Range: 18.9 (was 18)
Standard Deviation: 7.71 (was 7.34)
What has changed? By what? What has stayed the same?
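The office-salary comparison can be reproduced with Python’s standard-library `statistics` module. This is a minimal sketch; the helper name `summary` is hypothetical, and `stdev` is the sample standard deviation (dividing by n − 1), which is what the 7.34 above corresponds to.

```python
from statistics import mean, median, stdev

# Annual office salaries, in thousands of dollars (from the example above)
salaries = [62, 62, 58, 54, 50, 46, 44]


def summary(data):
    """Hypothetical helper: mean, median, range, and sample SD of a list."""
    return {
        "mean": mean(data),
        "median": median(data),
        "range": max(data) - min(data),
        "sd": stdev(data),  # sample SD, divides by n - 1
    }


bonus = [y + 5 for y in salaries]       # Option 1: shift every value by +5
raise_5 = [y * 1.05 for y in salaries]  # Option 2: rescale every value by 1.05

for label, data in [("original", salaries), ("$5K bonus", bonus), ("5% raise", raise_5)]:
    s = summary(data)
    print(f"{label:>10}: mean={s['mean']:.1f} median={s['median']:.1f} "
          f"range={s['range']:.1f} sd={s['sd']:.2f}")
```

Shifting moves the mean and median by 5 but leaves the range and SD alone; rescaling multiplies every one of these statistics by 1.05 (for the SD, 7.34 × 1.05 ≈ 7.71).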
Summary of effects
Statistic            $5K bonus (shifted)       5% raise (rescaled)
Mean                 Up by amount of shift     Up by same percent
Median               Up by amount of shift     Up by same percent
Range                No change                 Up by same percent
Standard Deviation   No change                 Up by same percent

5.1 Standardizing with z-Scores

Let’s do one more thing
We have our office salaries: 62, 62, 58, 54, 50, 46, 44
Recall: Mean = 53.7, St. Dev. = 7.34
Let’s shift them so that the average is 0 (subtract 53.7). We get 8.3, 8.3, 4.3, 0.3, −3.7, −7.7, −9.7. The mean is (approximately) 0.
Now divide them by the standard deviation (7.34). We get 1.13, 1.13, 0.59, 0.04, −0.50, −1.05, −1.32. The mean is still 0, and the standard deviation is 1.
We have “standardized” the salaries.

Benefits of Standardizing
• Standardized values have been converted from their original units to the standard statistical unit of standard deviations from the mean.
• Thus, we can compare values that are measured on different scales, with different units, or from different populations.
• Compare:
– 62, 62, 58, 54, 50, 46, 44
– 1.13, 1.13, 0.59, 0.04, −0.50, −1.05, −1.32

Comparing Athletes
• Natalya Dobrynska (Ukraine), who took the Olympic gold in the women’s heptathlon, had a long jump of 6.63 m, about 0.5 m longer than average.
• Hyleas Fountain (USA) won the 200 m run with a time of 23.21 s, 1.5 s faster than average.
• Whose performance was more impressive?

How Many Standard Deviations Above?
               Long Jump    200 m Run
Mean           6.11 m       24.71 s
SD             0.24 m       0.70 s
Individual     6.63 m       23.21 s
• The standard deviation helps us compare.
• Long Jump:
– 1 SD above: 6.11 + 0.24 = 6.35
– 2 SD above: 6.11 + (2)(0.24) = 6.59
– Her 6.63 m is just over 2 standard deviations above the mean.
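A minimal sketch of the same standardization using Python’s standard-library `statistics` module. It keeps full precision for the mean and SD, so a couple of z-scores can differ in the second decimal from the hand calculation above, which used the rounded values 53.7 and 7.34.

```python
from statistics import mean, stdev

salaries = [62, 62, 58, 54, 50, 46, 44]
ybar, s = mean(salaries), stdev(salaries)

# Standardize: z = (y - ybar) / s
# Subtracting ybar moves the mean to 0; dividing by s makes the SD 1.
z_scores = [(y - ybar) / s for y in salaries]

print([round(z, 2) for z in z_scores])
print(f"mean of z = {mean(z_scores):.6f}, SD of z = {stdev(z_scores):.6f}")
```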
The z-Score
• In general, to find the distance between a value and the mean in standard deviations:
1. Subtract the mean from the value.
2. Divide by the standard deviation.
$z = \frac{y - \bar{y}}{s}$
• This is called the z-score.

The z-score
• The z-score measures the distance of the value from the mean in standard deviations.
• A positive z-score indicates the value is above the mean.
• A negative z-score indicates the value is below the mean.
• A z-score near 0 indicates the value is close to the mean compared with the rest of the data values.
• A z-score that is large in absolute value (far from 0 in either direction) indicates the value is far from the mean compared with the rest of the data values.

How Many Standard Deviations Above?
• Standard deviations from the mean (long jump: mean 6.11 m, SD 0.24 m; 200 m run: mean 24.71 s, SD 0.70 s):
Long Jump: $z = \frac{6.63 - 6.11}{0.24} = 2.17$
200 m Run: $z = \frac{23.21 - 24.71}{0.70} = -2.14$ (negative because her time was faster, i.e., below the mean)
• Natalya Dobrynska’s long jump (2.17 SD above the mean) was a little more impressive than Hyleas Fountain’s 200 m run (2.14 SD below the mean).

Shifting, Scaling, and z-Scores
• Converting to z-scores: $z = \frac{y - \bar{y}}{s}$
– Subtract the mean.
– Divide by the standard deviation.
• The shape of the distribution does not change.
• The center changes: the mean becomes 0.
• The spread changes: the standard deviation becomes 1.

Example: SAT and ACT Scores
• How high does a college-bound senior need to score on the ACT to make it into the top quarter of equivalent SAT scores at a college where the middle 50% of SAT scores lie between 1530 and 1850?
• SAT: Mean = 1500, Standard Deviation = 250
• ACT: Mean = 20.8, Standard Deviation = 4.8
• Think → Plan: We want the ACT score for the upper quarter. We have ȳ and s for both tests.
• Variables: Both are quantitative. Units are points.
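The athlete comparison, sketched in Python. `z_score` is a hypothetical helper that simply applies the formula $z = (y - \bar{y})/s$. Because a fast time is a good time, the 200 m z-score is negative, so the sketch compares absolute values.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


long_jump = z_score(6.63, 6.11, 0.24)   # longer is better, so z > 0 is good
run_200m = z_score(23.21, 24.71, 0.70)  # faster (smaller) is better, so z < 0 is good

print(f"long jump z = {long_jump:.2f}, 200 m run z = {run_200m:.2f}")

# "Better" points in opposite directions for the two events, so compare distances from the mean
better = "long jump" if abs(long_jump) > abs(run_200m) else "200 m run"
print("more impressive:", better)
```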
Show → Mechanics: Standardize the Variable
• The middle 50% of SAT scores at this college are between 1530 and 1850; for all test takers, ȳ = 1500 and s = 250.
• The top quarter starts at 1850.
• Find the z-score: $z = \frac{1850 - 1500}{250} = 1.40$
• For the ACT, 1.40 standard deviations above the mean: $20.8 + 1.40(4.8) = 27.52$

Conclusion
• To be in the top quarter of applicants in terms of combined SAT scores, a college-bound senior would need an ACT score of at least 27.52.

Practice Example – Which student performed better?
• Student A received an 85 on a 100-point quiz with a mean of 90 and a standard deviation of 5.
• Student B received a 35 on a 50-point quiz with a mean of 37 and a standard deviation of 3.
• We must compare z-scores!
Student A: $z = \frac{85 - 90}{5} = -1$
Student B: $z = \frac{35 - 37}{3} = -\frac{2}{3} \approx -0.67$
Student B did better: both scored below their quiz means, but B was fewer standard deviations below.
Source: Mrs. Emily Francis, Instructor of Mathematics, HCC

Who is relatively taller?
• A non-basketball-playing man who is 75 inches tall (assume non-basketball-playing men have a mean height of 71.5 inches and a standard deviation of 2.1 inches).
• A male basketball player who is 85 inches tall (assume male basketball players have a mean height of 80 inches and a standard deviation of 3.3 inches).
Nonplayer: $z = \frac{75 - 71.5}{2.1} = +1.667$
Player: $z = \frac{85 - 80}{3.3} = +1.515$
The nonplayer is relatively taller.
Source: Mrs. Emily Francis, Instructor of Mathematics, HCC

5.3 Normal Models

Models
• “All models are wrong, but some are useful.” (George Box, statistician)
• −1 < z < 1: Not uncommon
• z = ±3: Rare
• z = 6: Shouts out for attention!

Example
Suppose we asked 30 people the question: At what age did you get your first real job?
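The three comparisons above all follow the same pattern: standardize, then (for the SAT/ACT question) un-standardize on the other scale. A minimal Python sketch; `z_score` and `from_z` are hypothetical helper names, and `from_z` just solves $z = (y - \bar{y})/s$ for y.

```python
def z_score(y, mean, sd):
    """Number of standard deviations y lies from the mean."""
    return (y - mean) / sd


def from_z(z, mean, sd):
    """Inverse of z_score: the value that lies z SDs from the mean."""
    return mean + z * sd


# SAT cutoff for the top quarter, carried over to the ACT scale
z_sat = z_score(1850, 1500, 250)       # 1.40
act_cutoff = from_z(z_sat, 20.8, 4.8)  # 20.8 + 1.40 * 4.8

# Student A vs. Student B quizzes
z_a = z_score(85, 90, 5)
z_b = z_score(35, 37, 3)

# Nonplayer vs. basketball player heights (inches)
z_nonplayer = z_score(75, 71.5, 2.1)
z_player = z_score(85, 80, 3.3)

print(f"ACT cutoff: {act_cutoff:.2f}")
print(f"Student A z = {z_a:.2f}, Student B z = {z_b:.2f}")
print(f"nonplayer z = {z_nonplayer:.3f}, player z = {z_player:.3f}")
```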
We could construct a histogram and see if any pattern emerges.
Source (next ten slides including this one): Marc Boyer and Martine Ferguson, slides for “Basic Statistics Course”, given in Fall 2008 at the FDA Center for Food Safety and Applied Nutrition.

Now suppose that we asked 300 people the same question. Observe that as the number of people we ask increases, the graph begins to take the shape of what the population would look like. The histogram now begins to take the shape of a normal or Gaussian distribution because the underlying distribution is normal.

Now suppose that we asked 3000 people the same question. As the number of people increases, the histogram appears smoother. The histogram now looks like a Normal probability distribution.

Next we will see what the population looks like (a plot of the distribution of values in the entire population). The previous histograms had a vertical scale that showed the percentage of observations in each category. Now the vertical axis doesn’t show the percent of observations, since we have an infinite population. We must start thinking in terms of area under the curve. The distribution of all values in the population is no longer a histogram. The area under the entire curve represents the entire population, and the proportion of that area that falls between two values is the probability of observing a value in that interval.

The mean is the center of the normal distribution. The standard deviation measures the spread.
A special case of the normal distribution is called the standard normal distribution. The standard normal distribution has mean zero and standard deviation 1.

The Normal Model
• Bell shaped: unimodal, symmetric
• There is a Normal model for every mean and standard deviation.
• μ (read “mew”) represents the population mean. σ (read “sigma”) represents the population standard deviation.
• N(μ, σ) represents a Normal model with mean μ and standard deviation σ.

A little history – Normal Model
First published in 1733 by Abraham de Moivre, in a short paper later reprinted in the second (1738) edition of his “Doctrine of Chances”. He had no idea how to apply it to experimental observations. The context was estimating binomial (coin-toss, etc.) probabilities for large n. The paper remained unknown until another statistician, Karl Pearson, discovered it in 1924!

A little history – Normal Model
Pierre-Simon, marquis de Laplace (Analytical Theory of Probabilities, 1812) first used the normal distribution in 1778 for the analysis of errors of experiments. Carl Friedrich Gauss, who claimed to have used the method since 1794, justified it rigorously in 1809 (independently of Laplace). Sometimes the Normal distribution is referred to as the Gaussian distribution.
The name “bell curve” goes back to Jouffret, who first used the term “bell surface” in 1872 for a bivariate normal distribution, whose graph is a surface in three dimensions.
The name “normal distribution” was coined independently by Charles S. Peirce, Francis Galton, and Wilhelm Lexis around 1875. The independent discoveries show how naturally the Normal Model arises.

Parameters and Statistics
• Parameters: Numbers that help specify the model
– μ, σ
• Statistics: Numbers that summarize the data
– ȳ, s, median, mode (We will see this in Chapter 10.)
• N(0, 1) is called the standard Normal model, or the standard Normal distribution.
• The Normal model should be used only if the data are approximately symmetric and unimodal.

The 68-95-99.7 Rule (also called the “Empirical Rule”)
• 68% of the values fall within 1 standard deviation of the mean.
• 95% of the values fall within 2 standard deviations of the mean.
• 99.7% of the values fall within 3 standard deviations of the mean.

More on the 68-95-99.7 rule
If the population is normally distributed, then:
1. Approximately 68% of the observations are within 1 standard deviation of the population mean.
2. Approximately 95% of the observations are within 2 standard deviations of the population mean.
3. Approximately 99.7% of the observations are within 3 standard deviations of the population mean.
Source (this and the next 5 slides): Marc Boyer and Martine Ferguson, slides for “Basic Statistics Course”, given in Fall 2008 at the FDA Center for Food Safety and Applied Nutrition.

Approximately 68% of the observations fall within 1 standard deviation of the mean.

More on the 68-95-99.7 rule
Note that the range “within one standard deviation of the mean” is highlighted in green. The area under the curve over this range is the relative frequency of observations in the range. That is, 0.68 = 68% of the observations fall within one standard deviation of the mean (µ ± σ). Below the axis, in red, is another set of numbers. These numbers are simply measures of standard deviations from the mean. In working with the variable X, we will often need to convert it into units of standard deviations from the mean. When the variable is measured this way, the letter Z is commonly used.
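The percentages in the 68-95-99.7 Rule are rounded areas under the standard Normal curve. A quick check with `statistics.NormalDist` from Python’s standard library (Python 3.8 or later); because any N(μ, σ) can be standardized, the same areas apply to every Normal model.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1): mu = 0, sigma = 1

for k in (1, 2, 3):
    # Area between -k and +k standard deviations
    within = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD of the mean: {within:.2%}")
```

This prints about 68.27%, 95.45%, and 99.73%, which the rule rounds to 68%, 95%, and 99.7%.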
Approximately 95% of the observations fall within 2 standard deviations of the mean.

Approximately 99.7% of the observations fall within 3 standard deviations of the mean.

Example of the 68-95-99.7 Rule
• In the 2010 Winter Olympics men’s slalom, Li Lei’s time was 120.86 sec, about 1 standard deviation slower than the mean. Given the Normal model, how many of the 48 skiers were slower?
• About 68% are within 1 standard deviation of the mean.
• 100% – 68% = 32% are outside.
• “Slower” means a longer time, so only the right (upper) tail counts.
• 32% / 2 = 16% are slower.
• 16% of 48 is 7.7.
• About 7 or 8 skiers were slower than Li Lei.

The Empirical Rule is only an approximation.
• IQ scores have a mean of 100 and a standard deviation of 16.
• If a student has an IQ of 116, what percent of students have a higher score?
• Answer: 16%, using the Empirical Rule.
• We will see later that the correct answer is closer to 15.87%.

Three Rules for Using the Normal Model
1. Make a picture.
2. Make a picture.
3. Make a picture.
• When data are provided, first make a histogram to make sure that the distribution is symmetric and unimodal.
• Then sketch the Normal model.

Working With the 68-95-99.7 Rule
• Each part of the SAT has a mean of 500 and a standard deviation of 100. Assume the distribution is symmetric and unimodal. If you earned a 700 on one part of the SAT, how do you stand among all others who took the SAT?
• Think → Plan: The variable is quantitative and the distribution is symmetric and unimodal. Use the Normal model N(500, 100).

Show and Tell
• Show → Mechanics:
– Make a picture.
– 700 is 2 standard deviations above the mean.
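The Empirical Rule answers above can be compared with exact Normal areas. A minimal sketch with Python’s `statistics.NormalDist`; it assumes, as the slides do, that Li Lei was exactly 1 SD slower than the mean, and it treats “slower” as the upper tail of the time distribution.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal N(0, 1)

# Slalom: share of skiers more than 1 SD slower (larger time) than the mean
rule_share = (1 - 0.68) / 2   # Empirical Rule: 16%
exact_share = 1 - z.cdf(1)    # exact upper-tail area beyond z = 1
print(f"rule:  {rule_share:.0%} of 48 = {rule_share * 48:.1f} skiers")
print(f"exact: {exact_share:.2%} of 48 = {exact_share * 48:.1f} skiers")

# IQ ~ N(100, 16): share scoring above 116
iq = NormalDist(mu=100, sigma=16)
print(f"IQ above 116: {1 - iq.cdf(116):.2%}")
```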
• Tell → Conclusion:
– 95% lie within 2 standard deviations of the mean.
– 100% – 95% = 5% are outside 2 standard deviations of the mean.
– Only half of those are above 2 standard deviations: 5% / 2 = 2.5%.
– Only about 2.5% of all scores on this test are higher than yours; your score beats about 97.5% of them.

Example: 68-95-99.7 rule
For men aged 18 to 24, serum cholesterol levels have a mean of 178 mg/100 mL with a standard deviation of 40.7 mg/100 mL. Pete’s cholesterol reading is 231. Where is Pete with respect to the mean cholesterol level?

Watching Pete’s Cholesterol Level
231 – 178 = 53. The standard deviation is 40.7, so 53 / 40.7 ≈ 1.3: Pete’s reading is 1.3 standard deviations above the mean. Pete’s z-score is 1.3.
Because 1.3 lies between 1 and 2, between 5% and 32% of the stated population have a cholesterol level more extreme (farther from the mean) than Pete’s; equivalently, between 68% and 95% are closer to the mean. Between 2.5% and 16% have a cholesterol level higher than Pete’s. We can say more using technology.

Importance of the z-score
The z-score is a ruler for comparing populations, even those that do not have the same mean and standard deviation. One study showed the mean cholesterol of American women as 188 mg/100 mL, with a standard deviation of 24 mg/100 mL. By coincidence, Susan has a cholesterol reading of 231 mg/100 mL. Whose is really higher, Pete’s or Susan’s?

Pete and Susan
Susan is above her mean by 231 – 188 = 43 mg/100 mL. 43 / 24 = 1.792 standard deviations. Susan’s z-score is 1.792. Pete’s z-score is 1.3. Susan’s is higher. Medically, Susan may have a bigger problem than Pete.

Pete and Susan
Susan’s reading is closer to her mean (43 mg/100 mL vs. Pete’s 53 mg/100 mL). But Susan’s population has smaller variability than Pete’s. This made Susan’s cholesterol more extreme than Pete’s. It’s about variability!
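The same questions, answered with exact Normal areas instead of the 68-95-99.7 bounds. A minimal sketch using `statistics.NormalDist` (its `zscore` method needs Python 3.9 or later); each reading is standardized against its own population, exactly as on the slides.

```python
from statistics import NormalDist

# SAT part ~ N(500, 100): exact share above 700 (the Rule says about 2.5%)
sat = NormalDist(mu=500, sigma=100)
print(f"share above 700: {1 - sat.cdf(700):.2%}")

# Cholesterol (mg/100 mL): Pete vs. Susan, each against their own population
pete = NormalDist(mu=178, sigma=40.7)
susan = NormalDist(mu=188, sigma=24)
z_pete, z_susan = pete.zscore(231), susan.zscore(231)
print(f"Pete:  z = {z_pete:.2f}, share higher = {1 - pete.cdf(231):.1%}")
print(f"Susan: z = {z_susan:.2f}, share higher = {1 - susan.cdf(231):.1%}")
```

The exact share above 700 is about 2.28%, a little below the Rule’s 2.5%, and Pete’s exact upper-tail share falls inside the 2.5% to 16% bracket found above.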
5.4 Finding Normal Percentiles

What if z is not −3, −2, −1, 0, 1, 2, or 3?
• If the data value we are trying to find using the Normal model does not have such a nice z-score, we will use a computer.
• Example: Where do you stand if your SAT math score was 680? μ = 500, σ = 100
• Note that the z-score is not an integer: $z = \frac{680 - 500}{100} = 1.8$

*Finding Normal Percentiles by Hand (This is both slower and less accurate. Don’t do it this way!)
When a data value doesn’t fall exactly 1, 2, or 3 standard deviations from the mean, we can look it up in a table of Normal percentiles. Table Z in Appendix D provides us with Normal percentiles, but many calculators and statistics computer packages provide these as well. Let’s use the technology. As for the tables, let’s not and say we did!

Using StatCrunch for the Normal Model
• What percent of all SAT scores are below 680? μ = 500, σ = 100
• Stat → Calculators → Normal
• Fill in the info, hit Compute.
• 96.4% of SAT scores are below 680.

Using the TI for the Normal Model
Same exercise: what percent of SAT scores are lower than 680?
On the TI, press [DISTR], then choose normalcdf. The syntax is normalcdf(min,max,mean,stdev). Here, the minimum is “minus infinity”, so input a large negative number, say -99999. Use the negative key (−) below the 3, not the subtraction key.
normalcdf(-99999,680,500,100) = 0.9641
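Outside StatCrunch and the TI, the same area can be computed with Python’s `statistics.NormalDist`. In this sketch, `normalcdf` is a hypothetical helper that only mimics the TI’s argument order; it is not a built-in Python function. Python accepts `float("-inf")` directly, so no stand-in like -99999 is needed.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Hypothetical helper: area under N(mean, sd) between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)


# Percent of SAT scores below 680 when scores are N(500, 100)
below_680 = normalcdf(float("-inf"), 680, 500, 100)
print(f"{below_680:.4f}")
```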
A Probability Involving “Between”
• What is the proportion of SAT scores that fall between 450 and 600? μ = 500, σ = 100
• Think → Plan: Probability that x is between 450 and 600 = Probability that x < 600 – Probability that x < 450
• Variable: We are told that the Normal model works. N(500, 100)
• Show → Mechanics: Use StatCrunch to find each of the probabilities.
– Probability that x is between 450 and 600 = Probability that x < 600 – Probability that x < 450 = 0.8413 – 0.3085 = 0.5328
• Tell → Conclusion: The Normal model estimates that about 53.28% of SAT scores fall between 450 and 600.

From Percentiles to Scores: z in Reverse
• Suppose a college admits only people with SAT scores in the top 10%. How high a score does it take to be eligible? μ = 500, σ = 100
• Think → Plan: We are given the probability and want to go backwards to find x.
• Variable: N(500, 100)
• Show → Mechanics: Use StatCrunch, putting in 0.9 for the probability.
– Probability x < 628 = 0.9
• Tell → Conclusion: Because the school wants SAT Verbal scores in the top 10%, the cutoff is 628.
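Both directions, forward (a score to an area) and reverse (an area to a score), in one short Python sketch. `NormalDist.inv_cdf` plays the role of the TI’s invNorm: it takes the area to the left of the cutoff, so the top 10% uses 0.90.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

# "Between": P(450 < x < 600) = P(x < 600) - P(x < 450)
between = sat.cdf(600) - sat.cdf(450)
print(f"P(450 < x < 600) = {between:.4f}")

# Reverse: the top 10% begins where 90% of the area lies to the left
cutoff = sat.inv_cdf(0.90)
print(f"top-10% cutoff = {cutoff:.2f}")
```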
From Percentiles to Scores: z in Reverse (TI)
Going from a percent to a score is the inverse of normalcdf. Therefore, use invNorm(pct,mean,stdev). However, invNorm works only with the area to the left of the cutoff (the lowest pct). The cutoff for the highest 10% is the same as the cutoff for the lowest 90%. Therefore, use invNorm(0.9,500,100) = 628.16.

What z-scores correspond to the middle 95%?

Middle 95%
The z-score cutoffs for the middle 95% are −z and +z. How do we find z?
Issue:
• invNorm works only with the area from the extreme left up to a cutoff.
• We need to “fudge” to accommodate invNorm!
To the left of +z there is 0.95 in the middle, plus 0.025 in the extreme left tail, for 0.975 in all. invNorm(0.975) = 1.959964, or about 1.96.
This is extremely important for Unit 3. You need to understand this and keep it in mind.

An Application to Test Scores
QUESTION 1
The SAT Verbal has a mean of 500 and a standard deviation of 100. Pat got 650 on the SAT Verbal. How well did Pat do?

ANSWER (actually, two answers!)
If we standardize Pat’s score, we get $z = \frac{650 - 500}{100} = +1.5$. That is, Pat’s score is 1.5 standard deviations above the mean SAT score of 500.
Only people who have had statistics think in terms of z-scores, so let’s find a percentile. We want normalcdf(?,1.5), where ? is “minus infinity”. On the TI, use a very low lower bound such as -999: normalcdf(-999,1.5) = 0.9332, or 93.32%, a respectable job.
Tell: If Pat got a 650 on the SAT Verbal, his score was at the 93.32 percentile.

QUESTION 2
One college that Pat is considering requires the ACT.
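The “fudge” for the middle 95% and Pat’s SAT percentile, sketched with Python’s `statistics.NormalDist` (the `zscore` method needs Python 3.9 or later).

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal N(0, 1)

# Middle 95%: the upper cutoff has 0.025 + 0.95 = 0.975 of the area to its left
z_star = z.inv_cdf(0.975)
print(f"middle 95%: {-z_star:.2f} < z < {z_star:.2f}")

# Pat's SAT Verbal: 650 on N(500, 100)
pat_z = NormalDist(mu=500, sigma=100).zscore(650)
print(f"Pat: z = {pat_z:.2f}, percentile = {z.cdf(pat_z):.2%}")
```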
Pat took it as well and got a 27. The ACT has a mean of 21 and a standard deviation of 4.7. How well did Pat do? As well as on the SAT?
ANSWER
Standardizing Pat’s score, we get $z = \frac{27 - 21}{4.7} \approx +1.28$. Not quite as good. As before, on the TI, use a very low lower bound such as -999. For a percentile, normalcdf(-999,1.28) = 0.8997, or 89.97%, still a good showing.

An Application to Test Scores
Tell: If Pat got a 27 on the ACT, which is N(21, 4.7), then Pat’s score was at the 89.97 percentile. This score was not as good as his SAT score, which was at the 93.32 percentile. We are using z-scores to, in effect, compare apples and oranges: two datasets with completely different means and standard deviations.

An Application to Test Scores
QUESTION 3
How well would Pat have to do on the ACT to match his percentile (93.32) and z-score (1.5) on the SAT?
ANSWER
Remembering our standardization, we have (X is Pat’s ACT score) $1.5 = \frac{X - 21}{4.7}$.
Solving for X: X = (1.5)(4.7) + 21 = 28.05. Pat’s 27 fell short. Since ACT scores are reported in whole numbers, a 28 (z ≈ 1.49) essentially matches his SAT performance.
Tell: To do about as well on the ACT as on the SAT, Pat would need an ACT score of 28.

Percentiles and z-scores
• What percent of a standard Normal model is found in each region? Draw a picture for each.
a) z > −2.05
b) z < −0.33
c) 1.2 < z < 1.8
d) |z| < 1.28
• In a standard Normal model, what value(s) of z cut(s) off the region described? Draw a picture first!
a) The highest 20%
b) The highest 75%
c) The lowest 3%
d) The middle 90%
Source: Mrs. Emily Francis, Instructor of Mathematics, HCC
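Pat’s ACT questions in Python. This sketch keeps the unrounded z = 6/4.7 ≈ 1.2766, so its percentile comes out slightly below the 89.97% above, which used the rounded z = 1.28. It also shows why 28 is treated as the matching whole-number score.

```python
from statistics import NormalDist

act = NormalDist(mu=21, sigma=4.7)

# Question 2: Pat's 27 on the ACT
z_act = act.zscore(27)
print(f"ACT 27: z = {z_act:.2f}, percentile = {NormalDist().cdf(z_act):.2%}")

# Question 3: ACT score with the same z (1.5) as Pat's SAT: X = 21 + 1.5 * 4.7
needed = 21 + 1.5 * 4.7
print(f"ACT score needed for z = 1.5: {needed:.2f}")
print(f"z for a whole-number 28: {act.zscore(28):.2f}")
```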
More Percentiles and z-scores
• What percent of a standard Normal model is found in each region? Draw a picture for each.
a) z > −1.05
b) z < −0.40
c) 1.3 < z < 2.0
• In a standard Normal model, what value(s) of z cut(s) off the region described? Draw a picture first!
a) The highest 20%
b) The highest 60%
c) The lowest 6%
d) The middle 75%
Source: Mrs. Emily Francis, Instructor of Mathematics, HCC

Additional exercises
Some IQ tests are standardized to a Normal model with a mean of 100 and a standard deviation of 16.
A) Draw the model for these IQ scores, clearly labeling what the 68-95-99.7 Rule predicts about the scores.
B) In what interval would you expect the central 95% of IQ scores to be found?
C) About what percent of people should have IQ scores above 116?
D) About what percent of people should have IQ scores between 68 and 84?
E) About what percent of people would have IQ scores above 132?
Source: Mrs. Emily Francis, Instructor of Mathematics, HCC

44) Based on the Normal model N(100, 16) describing IQ scores, what percent of people’s IQ scores would you expect to be
– Over 80?
– Under 90?
– Between 112 and 132?
46) In the same model, what cutoff value bounds
– The highest 5% of all IQs?
– The lowest 30% of the IQs?
– The middle 80% of the IQs?
Source: Mrs. Emily Francis, Instructor of Mathematics, HCC

Underweight Cereal Boxes
• Based on experience, a manufacturer makes cereal boxes that fit the Normal model with mean 16.3 ounces and standard deviation 0.2 ounces, but the label reads 16.0 ounces. What fraction will be underweight?
• Think → Plan: Find the probability that x < 16.0.
• Variable: N(16.3, 0.2)
• Show → Mechanics: Use StatCrunch to find the probability (μ = 16.3, σ = 0.2).
– Probability x < 16.0 = 0.0668
• Tell → Conclusion: I estimate that approximately 6.7% of the boxes will contain less than 16.0 ounces of cereal.

Underweight Cereal Boxes Part II
• Lawyers say that 6.7% is too high and recommend that at most 4% be underweight. What should they set the mean at? σ = 0.2
• Think → Plan: Find the mean such that Probability(x < 16.0) = 0.04.
• Variable: N(?, 0.2)
• Reality check: The mean must be more than 16.3 ounces, because fewer boxes can fall below 16.0 only if the whole distribution moves up.

How I would do it
We cannot get the mean directly from invNorm or normalcdf. We can, however, get the z-score that corresponds to the lowest 4% and then solve
$\frac{16 - \mu}{0.2} = z$
Use 16 because “underweight” is defined as less than 16 oz. The lowest 4% of the standard Normal corresponds to invNorm(0.04,0,1) = −1.7507, so 16 oz must be about 1.75 standard deviations below the mean. Solve $\frac{16 - \mu}{0.2} = -1.75$.

How I would do it (next step)
$\frac{16 - \mu}{0.2} = -1.75$
$16 - \mu = (-1.75)(0.2) = -0.35$
$\mu = 16 + 0.35 = 16.35$ oz (to two decimal places)
This should clarify the next slide.

Underweight Cereal Boxes Part II
• Show → Mechanics: Sketch a picture.
• Use StatCrunch to find z such that the area to the left under the standard Normal model is 0.04.
• z = −1.75
• Find 16 + 1.75(0.2) = 16.35 ounces

Underweight Cereal Boxes Part II
• Lawyers say that 6.7% is too high and recommend that at most 4% be underweight. What should they set the mean at? σ = 0.2
• z = −1.75
• Find 16 + 1.75(0.2) = 16.35 ounces
• Conclusion: The company must set the machine to average 16.35 ounces per box.
• Note: This is higher than the original 16.3 ounces, as the reality check predicted.

Underweight Cereal Boxes Part III
• The CEO vetoes that plan. She insists on a mean of 16.2 ounces with still only 4% of boxes weighing under 16.0 ounces, and she demands a machine with a lower standard deviation. What standard deviation must the machine achieve?
• Think →
• Plan: Find σ such that Probability x < 16.0 = 0.04.
• Variable: N(16.2, ?)

Underweight Cereal Boxes Part III
• What standard deviation must the machine achieve? N(16.2, ?)
• Show → Mechanics: From before, z = −1.75, so 16.0 lies 1.75σ below the mean of 16.2.
• 1.75σ = 16.2 − 16.0 = 0.2
• σ = 0.2/1.75 ≈ 0.114
• Conclusion: The company must get the machine to box cereal with a standard deviation of no more than 0.114 ounces. The machine must be more consistent.

Section 5.5 Normal Probability Plots

Checking if the Normal Model Applies
• A histogram will work, but there is an alternative method.
• One problem with histograms: they look different with different bin widths.
• Instead, use a Normal Probability Plot.
• It plots each value against the z-score that would be expected had the distribution been perfectly Normal.
• If the plot is a line or nearly straight, then the Normal model works.
• If the plot strays from a line, then the Normal model is not a good model.
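To see what a Normal probability plot actually plots, here is a minimal sketch in Python using the nine values from the TI example in this chapter. It assumes the plotting positions (i − 0.5)/n to turn ranks into expected z-scores; this is one common convention, and the TI and StatCrunch may use a slightly different formula, which nudges the scores but not the overall shape. The correlation `r` between the data and their normal scores is a hypothetical numeric summary of "how straight" the plot is, not a formal test.

```python
from math import sqrt
from statistics import NormalDist, mean

data = sorted([62, 63, 65, 66, 68, 70, 71, 73, 75])
n = len(data)
Z = NormalDist()  # standard Normal model

# Expected z-score for the i-th smallest of n values
# (assumed convention: plotting position (i - 0.5) / n).
scores = [Z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Pearson correlation between the data and their normal scores:
# values close to 1 mean the points lie close to a straight line.
mx, mz = mean(data), mean(scores)
sxz = sum((x - mx) * (z - mz) for x, z in zip(data, scores))
sxx = sum((x - mx) ** 2 for x in data)
szz = sum((z - mz) ** 2 for z in scores)
r = sxz / sqrt(sxx * szz)

for x, z in zip(data, scores):
    print(f"{x:>4}  {z:+.3f}")
print(f"straightness (r) = {r:.3f}")
```

Plotting the pairs with the data on the y-axis and the normal scores on the x-axis gives the same layout StatCrunch uses for its QQ plot.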
The Normal Model Applies
• The Normal probability plot is nearly straight, so the Normal model applies. Note that the histogram is unimodal and roughly symmetric.

The Normal Model Does Not Apply
• The Normal probability plot is not straight, so the Normal model does not apply. Note that the histogram is skewed right.

Histogram with the TI
Example: Data: 62, 63, 65, 66, 68, 70, 71, 73, 75
Use [STAT][EDIT] to put the dataset in L1. The first few data points are shown.
NOTE: You will do this a lot in this course!

Histogram with the TI
First, press [Y=] and turn off any functions left over from algebra class!
Press [2nd][Y=] (STAT PLOT) and go to one of the three plots. Turn it on.
Select the histogram.
Make sure that L1 (or wherever you put the data) is in Xlist.
Make sure that 1 is in Freq.

Histogram with the TI (default)
You can get a default window by pressing [ZOOM] and then 9 (ZoomStat).
Below is the window. It shows a bin width of 3.25 and includes all of the values.
Because we have integers, I'd rather have 3 as a bin width.

Histogram with the TI
Choose the window X: [60, 78]; Y: [1, 3]. You may have to play with this.
• For X, I picked a little lower than the min and a little higher than the max.
• For Y, I picked a little more than the largest bin frequency I expected.
Xscl is the width of each bin. In this case, choosing 3 makes cut points at 60, 63, 66, 69, 72, 75, and 78.

NPP with the TI
• Press [2nd][Y=] (STAT PLOT), then turn Plot 1 on.
• Select the lower-right plot type; this is the NPP.
• Press [ZOOM], 9.
• Looks pretty good!
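The cut points chosen with Xscl = 3 can be checked by counting the example data into bins. This is a minimal sketch, assuming each bin includes its left cut point but not its right one (our understanding of how the TI assigns a value that lands exactly on a cut point, such as 63 or 66); if your calculator treats boundaries differently, a count may move to the neighboring bin.

```python
data = [62, 63, 65, 66, 68, 70, 71, 73, 75]
XMIN, XMAX, XSCL = 60, 78, 3  # window values from the slide

# Cut points every XSCL units: 60, 63, ..., 78.
cut_points = list(range(XMIN, XMAX + 1, XSCL))
counts = [0] * (len(cut_points) - 1)

for x in data:
    if not XMIN <= x < XMAX:
        raise ValueError(f"{x} is outside the window [{XMIN}, {XMAX})")
    # Assumed rule: bin i covers [cut_points[i], cut_points[i] + XSCL).
    counts[(x - XMIN) // XSCL] += 1

for lo, c in zip(cut_points, counts):
    print(f"[{lo}, {lo + XSCL}): {'*' * c}")
print("tallest bar:", max(counts))
```

Under this rule the tallest bar has height 2, so a Ymax of 3 leaves a little headroom above every bar.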
Normal Probability Plot – StatCrunch (called a QQ plot)
• Assume that your data are in the first column.
• Select Graph, then QQ Plot.
• Select the column where your data are.
• Continue as with all of the other StatCrunch graphs.
• The graph comes up, but with the normal scale on the x-axis and the data on the y-axis.
• This is the opposite of how most books do it!
• Again, I would recommend the TI.

Publisher's Instructions: Normal Probability (QQ) Plot
• QQ Plot displays the sample quantiles of a variable versus the quantiles of a standard normal distribution.
• Select the column(s) to be displayed in the plot(s).
• Enter an optional Where clause to specify the data rows to be included in the computation.
• Select an optional Group by column to generate a separate QQ plot for each distinct value of the Group by column.
• Click the Next button to specify graph layout options.
• Click the Create Graph! button to create the plot(s).

Other tests for Normality
There are several analytical (as opposed to graphical) tests to see whether data fit a Normal distribution.
• Goodness-of-fit test – will demonstrate in Chapter 22
• Shapiro-Wilk test – used by FDA / CFSAN; this is also in StatCrunch (I'll show it to you after we do Unit 3.)
• Lilliefors test
• Anderson-Darling test
• and several others.

5.end Wrap-up

What Can Go Wrong?
Don't use a Normal model when the distribution is not unimodal and symmetric.

An example – incorrect z-score
Below: µ = 0.5; σ = 0.288
The point 0.99 is actually at the 99th percentile.
If you assume N(.5, .288), the z-score would be 1.701, and the percentile would be 95.56!

An example – incorrect z-score
• Below: µ = 0.5; σ = 0.5
• The point 5.98 would be at the 95th percentile.
• If you assume N(.5, .5), the z-score of 5.98 would be off the charts!

What Can Go Wrong? (cont.)
Don't use the mean and standard deviation when outliers are present; the mean and standard deviation can both be distorted by outliers.
Don't round your results in the middle of a calculation.
Don't worry about minor differences in results.

What Can Go Wrong
• Don't use the Normal model when the distribution is not unimodal and symmetric.
  • Always look at the picture first.
• Don't use the mean and standard deviation when outliers are present.
  • Check by making a picture.
• Don't round your results in the middle of the calculation.
  • Always wait until the end to round.
• Don't worry about minor differences in results.
  • Different rounding can produce slightly different results.

What have we learned?
The story data can tell may be easier to understand after shifting or rescaling the data.
• Shifting data by adding or subtracting the same amount from each value affects measures of center and position but not measures of spread.
• Rescaling data by multiplying or dividing every value by a constant changes all the summary statistics: center, position, and spread.

What have we learned? (cont.)
We've learned the power of standardizing data.
• Standardizing uses the SD as a ruler to measure distance from the mean (z-scores).
• With z-scores, we can compare values from different distributions or values based on different units.
• z-scores can identify unusual or surprising values among data.

What have we learned? (cont.)
We've learned that the 68-95-99.7 Rule can be a useful rule of thumb for understanding distributions:
• For data that are unimodal and symmetric, about 68% fall within 1 SD of the mean, 95% fall within 2 SDs of the mean, and 99.7% fall within 3 SDs of the mean.

What have we learned? (cont.)
We see the importance of Thinking about whether a method will work:
• Normality Assumption: We sometimes work with Normal tables (Table Z). These tables are based on the Normal model. But the TI is faster and more accurate.
• Data can't be exactly Normal, so we check the Nearly Normal Condition by making a histogram (is it unimodal, symmetric, and free of outliers?) or a normal probability plot (is it straight enough?).

Division of Mathematics, HCC
Course Objectives for Chapter 5
After studying this chapter, the student will be able to:
19. Compare values from two different distributions using their z-scores.
20. Use Normal models (when appropriate) and the 68-95-99.7 Rule to estimate the percentage of observations falling within one, two, or three standard deviations of the mean.
21. Determine the percentages of observations that satisfy certain conditions by using the Normal model and determine "extraordinary" values.
22. Determine whether a variable satisfies the Nearly Normal condition by making a normal probability plot or histogram.
23. Determine the z-score that corresponds to a given percentage of observations.
Note: It is essential that this chapter be mastered. Almost everything in Unit 3 depends on it.
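A quick numeric check of the 68-95-99.7 Rule (objective 20) and of finding a z-score from a percentage (objective 23), applied back to the underweight cereal boxes. This is a sketch assuming Python 3.8+ and the standard-library `statistics.NormalDist`; the names `z_04`, `mean_needed`, and `sigma_needed` are hypothetical. The exact areas show that the rule's 68%, 95%, and 99.7% are rounded values.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal model N(0, 1)

# Objective 20: area within k SDs of the mean, for k = 1, 2, 3.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")

# Objective 23: the z-score that cuts off the lowest 4%.
z_04 = Z.inv_cdf(0.04)

# Cereal Part II: mean needed so that P(x < 16.0) = 0.04 when sigma = 0.2.
mean_needed = 16.0 - z_04 * 0.2
# Cereal Part III: sigma needed so that P(x < 16.0) = 0.04 when the mean is 16.2.
sigma_needed = (16.0 - 16.2) / z_04

print(f"z for lowest 4%: {z_04:.4f}")
print(f"mean needed: {mean_needed:.3f} oz, sigma needed: {sigma_needed:.3f} oz")
```

Both cereal answers come from the same z-score; only the unknown in z = (x − μ)/σ changes.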