Chapter 6: The Standard Deviation as a Ruler and the Normal Model
Copyright © 2009 Pearson Education, Inc.

NOTE on slides / What we can and cannot do
The following notice accompanies these slides, which have been downloaded from the publisher’s Web site:
“This work is protected by United States copyright laws and is provided solely for the use of instructors in teaching their courses and assessing student learning. Dissemination or sale of any part of this work (including on the World Wide Web) will destroy the integrity of the work and is not permitted. The work and materials from this site should never be made available to students except by instructors using the accompanying text in their classes. All recipients of this work are expected to abide by these restrictions and to honor the intended pedagogical purposes and the needs of other instructors who rely on these materials.”
We can use these slides because we are using the text for this course. Please help us stay legal. Do not distribute these slides any further.
The original slides are done in orange/brown and black. My additions are in red and blue. Topics in green are optional.

Topics in this chapter
Shifting and Rescaling Data
Standardized values (z-scores)
Using the standard deviation as a ruler
The Normal Model
The 68-95-99.7 Rule
Finding normal percents and the reverse
Normal Probability Plots
The Normality Assumption

Division of Mathematics, HCC
Course Objectives for Chapter 6
After studying this chapter, the student will be able to:
20. Compare values from two different distributions using their z-scores.
21. Use Normal models (when appropriate) and the 68-95-99.7 Rule to estimate the percentage of observations falling within one, two, or three standard deviations of the mean.
22. Determine the percentages of observations that satisfy certain conditions by using the Normal model, determine “extraordinary” values, and do the reverse.
23. Determine whether a variable satisfies the Nearly Normal Condition by making a normal probability plot or histogram.
Note: It is essential that this chapter be mastered. Almost everything in Unit 3 depends on it.

Let’s review our summary statistics
Workers in a particular office have the following annual salaries (in thousands of dollars): 62, 62, 58, 54, 50, 46, 44
What are the summary statistics (rounded)?
Mean: 53.7
Median: 54
Range: 18
Standard Deviation: 7.34
The boss wants to reward everyone for a job well done. He can give them a one-time bonus or an extra raise.

Option 1: $5K Bonus
The data become 67, 67, 63, 59, 55, 51, 49.
The summary statistics become:
Mean: 58.7 (was 53.7)
Median: 59 (was 54)
Range: 18 (was 18)
Standard Deviation: 7.34 (was 7.34)
What has changed? By how much? What has stayed the same? This is called shifting.

Option 2: 5% raise
The data become 65.1, 65.1, 60.9, 56.7, 52.5, 48.3, 46.2.
The summary statistics become:
Mean: 56.4 (was 53.7)
Median: 56.7 (was 54)
Range: 18.9 (was 18)
Standard Deviation: 7.71 (was 7.34)
What has changed? By how much? What has stayed the same? This is called rescaling.

Shifting Data
Shifting data: Adding (or subtracting) a constant to every data value adds (or subtracts) the same constant to measures of position. The center, percentiles, maximum, and minimum all increase (or decrease) by that same constant.
The shape and spread (range, IQR, standard deviation) remain unchanged.
When we gave the employees a $5K bonus, we shifted their salaries.
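The summary statistics on the salary slides can be recomputed in a few lines. This is a minimal sketch. It assumes Python 3.8 or later and uses only the standard `statistics` module. The helper name `summarize` is illustrative and does not come from the text. It uses the sample standard deviation (divisor n - 1), which is what reproduces the 7.34 on the slides.

```python
import statistics

def summarize(data):
    """Return the mean, median, range, and sample SD, rounded to 2 places."""
    return {
        "mean": round(statistics.mean(data), 2),
        "median": round(statistics.median(data), 2),
        "range": round(max(data) - min(data), 2),
        "sd": round(statistics.stdev(data), 2),  # sample SD: divides by n - 1
    }

salaries = [62, 62, 58, 54, 50, 46, 44]      # thousands of dollars
bonus = [s + 5 for s in salaries]             # Option 1: shift every value by 5
raise_5pct = [s * 1.05 for s in salaries]     # Option 2: rescale every value by 1.05

for label, data in [("original", salaries), ("$5K bonus", bonus), ("5% raise", raise_5pct)]:
    print(label, summarize(data))
```

The bonus moves the mean and median by 5 and leaves the range (18) and standard deviation (7.34) alone. The raise multiplies every statistic by 1.05, so the standard deviation becomes 7.34 × 1.05 ≈ 7.71.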
Another example: shifting data
80 men of a particular height and body frame were weighed. Their average weight is 82.36 kg. The NIH recommends that the average be 74 kg. Let’s shift the data by subtracting 74 kg from every weight. The new mean is 82.36 - 74 = 8.36 kg.
Note also: if I weigh 80 kg, then I am +6 kg with respect to the recommended weight. If I weigh 70 kg, then I am -4 kg with respect to the recommended weight.

Shifting Data (cont.)
NIH example: The following histograms show a shift from men’s actual weights to kilograms above (or, if negative, below) the recommended weight:

Rescaling Data
Rescaling data: When we multiply (or divide) all the data values by any constant, all measures of position (such as the mean, median, and percentiles) and measures of spread (such as the range, the IQR, and the standard deviation) are multiplied (or divided) by that same constant.
When we gave the employees a 5% raise, we rescaled their salaries (every salary was multiplied by 1.05).

Rescaling Data (cont.)
NIH Example: The men’s weight data set measured weights in kilograms. If we want to think about these weights in pounds, we would rescale the data by multiplying every weight by about 2.2 (1 kg ≈ 2.2 lb):

Summary of effects
Statistic            $5K bonus (shifted)       5% raise (rescaled)
Mean                 Up by amount of shift     Up by same percent
Median               Up by amount of shift     Up by same percent
Range                No change                 Up by same percent
Standard Deviation   No change                 Up by same percent

Summary of Effects
Shifting: Adding (or subtracting) a constant to every data value adds (or subtracts) the same constant to measures of center, but leaves measures of spread unchanged.
Rescaling: When we multiply (or divide) every data value by a constant, all measures of center and spread are multiplied (or divided) by that same constant.
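Each NIH transformation can be written as a one-line function. This is a minimal sketch. The 74-kg target and the example weights come from the slides. The conversion factor 2.20462 lb/kg is a standard constant supplied here, not something stated on the slides. The function names are hypothetical.

```python
LB_PER_KG = 2.20462    # pounds in one kilogram (standard constant, not from the slides)
NIH_TARGET_KG = 74     # recommended weight from the slide

def kg_above_target(weight_kg):
    """Shift: kilograms above (+) or below (-) the recommended weight."""
    return weight_kg - NIH_TARGET_KG

def kg_to_lb(weight_kg):
    """Rescale: the same weight expressed in pounds."""
    return weight_kg * LB_PER_KG

# Shifting moves the mean by the same 74 kg as each individual value.
print(kg_above_target(82.36))                     # mean weight after the shift
print(kg_above_target(80), kg_above_target(70))   # +6 and -4, as on the slide

# Rescaling multiplies positions AND spreads by the factor, so a standard
# deviation measured in kg would also be multiplied by LB_PER_KG.
print(kg_to_lb(82.36))
```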
Let’s do one more thing
We have our office salaries: 62, 62, 58, 54, 50, 46, 44.
Recall: Mean = 53.7, St. Dev. = 7.34
Let’s shift them so that the average is 0 (subtract 53.7 from each). We get 8.3, 8.3, 4.3, 0.3, -3.7, -7.7, -9.7. The mean is (approximately) 0.
Now divide them by the standard deviation. We get 1.13, 1.13, 0.58, 0.04, -0.51, -1.05, -1.32. The mean is still 0, and the standard deviation is 1.
We have “standardized” the salaries.

Benefits of Standardizing
Standardized values have been converted from their original units to the standard statistical unit of standard deviations from the mean.
Thus, we can compare values that are measured on different scales, with different units, or from different populations.
Compare:
Original: 62, 62, 58, 54, 50, 46, 44
Standardized: 1.13, 1.13, 0.58, 0.04, -0.51, -1.05, -1.32

The Standard Deviation as a Ruler
The trick in comparing very different-looking values is to use standard deviations as our rulers.
The standard deviation tells us how the whole collection of values varies, so it’s a natural ruler for comparing an individual to a group.
As the most common measure of variation, the standard deviation plays a crucial role in how we look at data.

Standardizing with z-scores
We compare an individual data value to the mean, relative to the standard deviation, using the following formula:
$z = \frac{y - \bar{y}}{s}$
We call the resulting values standardized values, denoted as z. They can also be called z-scores.

Standardizing with z-scores (cont.)
Standardized values have no units.
z-scores measure the distance of each data value from the mean in standard deviations. That is, a z-score measures how many standard deviations a value is from the mean.
A negative z-score tells us that the data value is below the mean, while a positive z-score tells us that the data value is above the mean.
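Standardizing the salaries takes two steps: subtract the mean, then divide by the standard deviation. The Python sketch below does both without rounding in between (Python 3.8+, standard library only). Because it carries full precision, a few values can differ in the second decimal from a calculation that rounds the deviations first.

```python
import statistics

salaries = [62, 62, 58, 54, 50, 46, 44]   # thousands of dollars

mean = statistics.mean(salaries)
sd = statistics.stdev(salaries)           # sample standard deviation

# Shift by the mean, then rescale by the standard deviation.
z_scores = [(y - mean) / sd for y in salaries]

print([round(z, 2) for z in z_scores])
# The standardized values have mean 0 and standard deviation 1 (up to float error).
print(statistics.mean(z_scores), statistics.stdev(z_scores))
```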
Standardizing with z-scores (cont.)
Standardizing data into z-scores shifts the data by subtracting the mean and rescales the values by dividing by their standard deviation.
Standardizing into z-scores does not change the shape of the distribution.
Standardizing into z-scores changes the center by making the mean 0.
Standardizing into z-scores changes the spread by making the standard deviation 1.

When Is a z-score BIG?
A z-score gives us an indication of how unusual a value is because it tells us how far it is from the mean. In particular, the z-score of a data point measures the number of standard deviations the data point is from the mean.
Remember that a negative z-score tells us that the data value is below the mean, while a positive z-score tells us that the data value is above the mean.
The larger a z-score is in absolute value (whether negative or positive), the more unusual the value is.

When Is a z-score Big? (cont.)
There is no universal standard for z-scores, but there is a model that shows up over and over in Statistics.
This model is called the Normal model. (You may have heard of “bell-shaped curves.”)
Normal models are appropriate for distributions whose shapes are unimodal and roughly symmetric.
These distributions provide a measure of how extreme a z-score is.

When Is a z-score Big? (cont.)
There is a Normal model for every possible combination of mean and standard deviation.
We write N(μ,σ) to represent a Normal model with a mean of μ and a standard deviation of σ.
We use Greek letters because this mean and standard deviation do not come from data; they are numbers (called parameters) that specify the model.
Nothing is ever perfectly normal (or perfectly follows any other “nice” distribution). However, the normal model is useful in a wide variety of situations.
Example
Suppose we asked 30 people the question: At what age did you get your first real job? We could construct a histogram and see if any pattern emerges.
Source (next ten slides including this one): Marc Boyer and Martine Ferguson, slides for “Basic Statistics Course”, given in Fall 2008 at the FDA Center for Food Safety and Applied Nutrition.

Now suppose that we asked 300 people the same question. Observe that as the number of people we ask increases, the graph begins to take the shape of what the population looks like. The histogram now begins to take the shape of a normal, or Gaussian, distribution because the underlying distribution is normal.

Now suppose that we asked 3000 people the same question. As the number of people increases, the histogram appears smoother. The histogram now looks like a Normal probability distribution.

Next we will see what the population looks like (a plot of the distribution of values in the entire population). The previous histograms had a vertical scale that showed the percentage of observations in each category. Now the vertical axis doesn’t show the percent of observations, since we have an infinite population. We must start thinking in terms of area under the curve.

The distribution of all values in the population is no longer a histogram. The area under the entire curve represents the entire population, and the proportion of that area that falls between two values is the probability of observing a value in that interval.

The mean is the center of the normal distribution.
The standard deviation gives a measure of the spread.
A special case of the normal distribution is called the standard normal distribution. The standard normal distribution has mean 0 and standard deviation 1.

Why is a normal distribution like a lion? They both have a mean µ!

When Is a z-score Big? (cont.)
Summaries of data, like the sample mean and standard deviation, are written with Latin letters. Such summaries of data are called statistics.
When we standardize Normal data, we still call the standardized value a z-score, and we write
$z = \frac{y - \mu}{\sigma}$

When Is a z-score Big? (cont.)
Once we have standardized, we need only one model: the N(0,1) model, called the standard Normal model (or the standard Normal distribution).
Be careful: don’t use a Normal model for just any data set, since standardizing does not change the shape of the distribution.

When Is a z-score Big? (cont.)
When we use the Normal model, we are assuming the distribution is Normal.
We cannot check this assumption in practice, so we check the following condition instead:
Nearly Normal Condition: The shape of the data’s distribution is unimodal and symmetric.
This condition can be checked by making a histogram or a Normal probability plot (explained later).

A little history: Normal Model
First published by Abraham de Moivre in a 1733 paper, later included in the 1738 second edition of his “Doctrine of Chances” (first edition 1718).
He had no idea how to apply it to experimental observations. It arose in the context of estimating binomial (coin-toss, etc.) probabilities for large n.
The paper remained unknown until another statistician, Karl Pearson, rediscovered it in 1924!
A little history: Normal Model
Pierre-Simon, Marquis de Laplace, Analytical Theory of Probabilities (1812), first used the normal distribution in 1778 for the analysis of experimental errors.
Carl Friedrich Gauss, who claimed to have used the method since 1794, justified it rigorously in 1809 (independently of Laplace). Sometimes the Normal distribution is referred to as the Gaussian distribution.
The name "bell curve" goes back to Jouffret, who first used the term "bell surface" in 1872 for a "multivariate normal" distribution, i.e., an extension to three dimensions.
The name "normal distribution" was coined independently by Charles S. Peirce, Francis Galton, and Wilhelm Lexis around 1875.
The independent discoveries show how naturally the Normal Model arises.

The 68-95-99.7 Rule
Normal models give us an idea of how extreme a value is by telling us how likely it is to find one that far from the mean.
We can find these numbers precisely, but until we do, we will use a simple rule that tells us a lot about the Normal model...

The 68-95-99.7 Rule (cont.)
It turns out that in a Normal model:
about 68% of the values fall within one standard deviation of the mean;
about 95% of the values fall within two standard deviations of the mean; and
about 99.7% (almost all!) of the values fall within three standard deviations of the mean.

The 68-95-99.7 Rule (cont.)
The following shows what the 68-95-99.7 Rule tells us:

More on the 68-95-99.7 rule
If the population is normally distributed, then:
1. Approximately 68% of the observations are within 1 standard deviation of the population mean.
2. Approximately 95% of the observations are within 2 standard deviations of the population mean.
3. Approximately 99.7% of the observations are within 3 standard deviations of the population mean.
Source (this and the next 5 slides): Marc Boyer and Martine Ferguson, slides for “Basic Statistics Course”, given in Fall 2008 at the FDA Center for Food Safety and Applied Nutrition.

Approximately 68% of the observations fall within 1 standard deviation of the mean

More on the 68-95-99.7 rule
Note that the range "within one standard deviation of the mean" is highlighted in green. The area under the curve over this range is the relative frequency of observations in the range. That is, 0.68 = 68% of the observations fall within one standard deviation of the mean (µ ± σ).
Below the axis, in red, is another set of numbers. These numbers are simply measures of standard deviations from the mean. In working with the variable X, we will often find it necessary to convert into units of standard deviations from the mean. When the variable is measured this way, the letter Z is commonly used.

Approximately 95% of the observations fall within 2 standard deviations of the mean

Approximately 99.7% of the observations fall within 3 standard deviations of the mean

The First Three Rules for Working with Normal Models
Make a picture.
Make a picture.
Make a picture.
And, when we have data, make a histogram to check the Nearly Normal Condition to make sure we can use the Normal model to model the distribution.

Example: 68-95-99.7 rule
For men aged 18 to 24, serum cholesterol levels have a mean of 178 mg/100mL with a standard deviation of 40.7 mg/100mL. Pete’s cholesterol reading is 231 mg/100mL. Where is Pete with respect to the mean cholesterol level?
Watching Pete’s Cholesterol level
231 - 178 = 53. The standard deviation was 40.7.
53 / 40.7 ≈ 1.3, so Pete is 1.3 standard deviations above the mean. Pete’s z-score is 1.3.
Since 1.3 lies between 1 and 2, we can say that between 68% and 95% of the stated population has a cholesterol level closer to the mean (less extreme) than Pete’s.
Between 2.5% and 16% have a cholesterol level higher than Pete’s (half of the 5% outside two standard deviations and half of the 32% outside one standard deviation).
We can say more using technology.

Importance of the z-score
The z-score is a ruler for comparing populations, even those that do not have the same mean and standard deviation.
One study showed the mean cholesterol of American women as 188 mg/100mL, with a standard deviation of 24 mg/100mL.
By coincidence, Susan has a cholesterol reading of 231 mg/100mL. Whose is really higher: Pete’s or Susan’s?

Pete and Susan
Susan is above her mean by 231 - 188, or 43 mg/100mL.
43/24 ≈ 1.792 standard deviations. Susan’s z-score is 1.792.
Pete’s z-score is 1.3. Susan’s is higher.
Medically, Susan may have a bigger problem than Pete.

Pete and Susan (cont.)
Susan’s reading is closer to her mean (43 mg/100mL vs. Pete’s 53 mg/100mL).
But Susan’s population has smaller variability than Pete’s (a standard deviation of 24 vs. 40.7).
This made Susan’s cholesterol more extreme than Pete’s. It’s about variability!

*Finding Normal Percentiles by Hand (This is both slow and inaccurate)
When a data value doesn’t fall exactly 1, 2, or 3 standard deviations from the mean, we can look it up in a table of Normal percentiles.
Table Z in Appendix D provides us with normal percentiles, but many calculators and statistics computer packages provide these as well.
Let’s use the technology. As for the tables, let’s not and say we did!

*Finding Normal Percentiles by Hand (cont.)
Table Z is the standard Normal table. We have to convert our data to z-scores before using the table.
The figure shows us how to find the area to the left when we have a z-score of 1.80:

Finding Normal Percentiles Using Technology (much preferred method)
Many calculators and statistics programs have the ability to find normal percentiles for us. Both the TI and StatCrunch will easily do it.
The ActivStats Multimedia Assistant offers two methods for finding normal percentiles:
The “Normal Model Tool” makes it easy to see how areas under parts of the Normal model correspond to particular cut points.
There is also a Normal table in which the picture of the normal model is interactive.

Finding Normal Percentiles Using Technology (cont.)
The following was produced with the “Normal Model Tool” in ActivStats:

Finding Normal Percentiles Using the TI
To find the percentile between z = -0.5 and z = 1:
Press 2nd VARS, which will get you “DISTR”.
Press 2 to select normalcdf(.
When normalcdf( appears, complete the entry as normalcdf(-.5,1) and press ENTER. Type the negative sign with the (-) key, not the subtraction key. Your answer will display.
To find the percentile less than 1, enter normalcdf(-999,1) as above.
Similarly, we can enter normalcdf(-.5,999) to find the percent of values bigger than -0.5.

normalcdf with the 2.55 operating system
See the screen captures at the right for the percentile between z = -0.5 and z = 1.
Enter -.5 and 1 in the menu; then select Paste.
normalcdf appears on the next screen (you must scroll to see it all).

(Make a picture)³ with ShadeNorm
Select [DISTR] as before, but this time select [DRAW], then ShadeNorm.
Enter -0.5 and 1. The result is at the lower right.
You might have to select an appropriate window.
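The TI's normalcdf can be mimicked in Python with the standard library's `statistics.NormalDist` (Python 3.8+). This is a sketch under that assumption. The helper `normalcdf` below is hypothetical. It is named after the TI function and keeps the TI's argument order, but it is not part of any library. Python's `math.inf` stands in for the TI's ±999 bounds. The same helper also handles the cholesterol comparison and the 68-95-99.7 percentages.

```python
import math
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lower and upper.

    Hypothetical helper named after the TI function, with the same
    argument order: normalcdf(lower, upper, mean, sd).
    """
    model = NormalDist(mu, sigma)
    return model.cdf(upper) - model.cdf(lower)

# The TI example: area between z = -0.5 and z = 1
middle = normalcdf(-0.5, 1)

# Shortcut form with the mean and SD; math.inf replaces the TI's 999 bound.
above_pete = normalcdf(231, math.inf, 178, 40.7)   # men 18-24: N(178, 40.7)
above_susan = normalcdf(231, math.inf, 188, 24)    # women: N(188, 24)

# The 68-95-99.7 Rule: area within k standard deviations of the mean
rule = [normalcdf(-k, k) for k in (1, 2, 3)]

print(round(middle, 4), round(above_pete, 4), round(above_susan, 4))
print([round(r, 4) for r in rule])
```

About 9.6% of young men have readings above 231, compared with only about 3.7% of women. This agrees with the z-score comparison: Susan's reading is the more extreme one.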
Finding Normal Percentiles Using StatCrunch
Under Stat, select “Calculators”, then “Normal”.
First, select <= and enter -0.5. Note the answer, 0.3085.

Finding Normal Percentiles Using StatCrunch (cont.)
Now select >= and type 1. Note the answer, 0.1587.
Then calculate 1 - 0.3085 - 0.1587 = 0.5328.
I recommend the TI: it is faster and more direct.

Another way with StatCrunch
Under Data, select “Compute Expression”.
Type the expression as shown.
The answer (0.5328023) will be added to the first empty column.
I still recommend the TI.

Let’s verify the 68-95-99.7 rule with the TI!
It turns out that in a Normal model:
about 68% of the values fall within one standard deviation of the mean: normalcdf((-)1,1) = .6826894809;
about 95% of the values fall within two standard deviations of the mean: normalcdf((-)2,2) = .954499876;
about 99.7% (almost all!) of the values fall within three standard deviations of the mean: normalcdf((-)3,3) = .9973000656.

From Percentiles to Scores: z in Reverse
Sometimes we start with areas and need to find the corresponding z-score or even the original data value.
Example: What z-score represents the first quartile in a Normal model?

Z in reverse with the TI
SAT Math scores are normally distributed with a mean score of 500 and a standard deviation of 100.
Great Eastern Technical University (GETU) brags that it will consider for admission only students at or above the 90th percentile of the population, as measured by SAT Math scores.
What is the cutoff for admission consideration at GETU?
There is a TI function, invNorm (“Inverse Normal”). This function goes the other way from normalcdf.
SAT score with invNorm
The arguments are invNorm(Pct,Mean,StDev).
Because invNorm takes only a percentile (an area measured from the extreme left), it returns the score that has exactly that much area to its left.
invNorm(0.90,500,100) = 628.155. Since SAT scores are typically reported in units of 10, a 630 is required for admission consideration at GETU.

Using invNorm
Press 2nd, then VARS (DISTR), then invNorm.
Old operating system: Enter “.9,500,100” (this includes entering the commas) and close the parentheses. You will then have “invNorm(.9,500,100)” entered.
New operating system: Fill in as shown below:

Using invNorm (cont.)
Notice that the default mean is 0 and the default standard deviation is 1.
Leaving them this way will get the z-score corresponding to the 90th percentile, i.e., invNorm(0.9) = 1.2816.
This will be very useful when we get to Unit 3.

Z in reverse with the TI
SAT Math scores are normally distributed with a mean score of 500 and a standard deviation of 100.
Great Western Technical University (GWTU) brags that it will consider for admission only students in the top 5% of the population, as measured by SAT Math scores.
What is the cutoff for admission consideration at GWTU?
Note that the cutoff for the top 5% is the same as the cutoff for the bottom 95% (the 95th percentile).
invNorm(0.95,500,100) = 664.485
Again, since SAT scores are reported in units of 10, you would need an SAT Math score of 670 for consideration at GWTU.

What z-scores correspond to the middle 95%?

Middle 95%
The z-score cutoffs for the middle 95% are +z and -z. How do we find z?
Issue: invNorm only measures area from a cutoff to the extreme left. We need to “fudge” to accommodate invNorm!
Below +z there is 0.95 in the middle, plus 0.025 on the extreme left, for a total of 0.975.
invNorm(0.975) is 1.959964, or about 1.96.
This is extremely important for Unit 3.
You need to understand this and keep it in mind.

z in Reverse with StatCrunch
Start out the same way as before.
This time, use the right-hand side and look for the answer on the left (1.959964).

An Application to Test Scores
QUESTION 1: The SAT Verbal has a mean of 500 and a standard deviation of 100. Pat got 650 on the SAT Verbal. How well did Pat do?

An Application to Test Scores (cont.)
ANSWER: actually, two answers! If we standardize Pat’s score, we get
$z = \frac{650 - 500}{100} = 1.5$
or a z-score of +1.5. That is, Pat’s score is 1.5 standard deviations above the mean SAT score of 500.
Only people who have had statistics think in terms of z-scores, so let’s find a percentile. We use normalcdf(?,1.5). ?? On the TI, use a low lower bound such as -999.
normalcdf((-)999,1.5) = 0.9332, or 93.32%: a respectable job.

An Application to Test Scores (cont.)
Tell: If Pat got a 650 on the SAT Verbal, his score was at the 93.32nd percentile.

An Application to Test Scores (cont.)
QUESTION 2: One college that Pat is considering requires the ACT. Pat took it as well and got a 27. The ACT has a mean of 21 and a standard deviation of 4.7. How well did Pat do? As well as on the SAT?
ANSWER: Standardizing Pat’s score, we get
$z = \frac{27 - 21}{4.7} \approx 1.28$
or a z-score of +1.28. Not quite as good.
As before, on the TI, use a low lower bound such as -999. For a percentile, normalcdf((-)999,1.28) = 0.8997, or 89.97%: still a good showing.

An Application to Test Scores (cont.)
Tell: If Pat got a 27 on the ACT, which is N(21,4.7), then Pat’s score was at the 89.97th percentile.
This score was not as good as his SAT score, which was at the 93.32nd percentile.
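In Python's standard library (3.8+), `NormalDist.inv_cdf` plays the role of invNorm: you give it the area to the left and it returns the value. `NormalDist.cdf` plays the role of normalcdf with a lower bound of -999. The sketch below uses only those two methods. It recomputes the first-quartile z-score asked about earlier, the GETU and GWTU cutoffs, the middle-95% z-score, and Pat's two percentiles. The variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()              # N(0, 1)
sat = NormalDist(mu=500, sigma=100)    # SAT model from the slides
act = NormalDist(mu=21, sigma=4.7)     # ACT model from the slides

# z in reverse: inv_cdf(p) returns the value with area p to its LEFT,
# the same convention as the TI's invNorm.
q1_z = std_normal.inv_cdf(0.25)        # first quartile of the standard Normal
getu_cutoff = sat.inv_cdf(0.90)        # 90th percentile (GETU)
gwtu_cutoff = sat.inv_cdf(0.95)        # top 5% = 95th percentile (GWTU)
z_star = std_normal.inv_cdf(0.975)     # middle 95%: 0.95 + 0.025 to the left

# Forward direction: Pat's z-scores and percentiles (area to the left)
pat_sat_z = (650 - 500) / 100
pat_act_z = (27 - 21) / 4.7            # unrounded, about 1.277
pat_sat_pct = sat.cdf(650)
pat_act_pct = act.cdf(27)

print(round(q1_z, 3), round(getu_cutoff, 1), round(gwtu_cutoff, 1), round(z_star, 2))
print(round(pat_sat_pct, 4), round(pat_act_pct, 4))
```

This code does not round the ACT z-score to 1.28, so Pat's ACT percentile comes out near 89.9% rather than 89.97%. The conclusion that the SAT result is the stronger one does not change.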
An Application to Test Scores (cont.)
QUESTION 3: How well would Pat have to do on the ACT to match his percentile (93.32) and equivalent z-score (1.5) on the SAT?
ANSWER: Remembering our standardization, we have (X is Pat’s ACT score)
$1.5 = \frac{X - 21}{4.7}$
We solve for X: X = (1.5)(4.7) + 21 = 28.05.
ACT scores are reported in whole numbers, so the closest attainable score is 28 (z = 7/4.7 ≈ 1.49), one point more than Pat’s 27.

An Application to Test Scores (cont.)
Tell: In order to do as well on the ACT as on the SAT, Pat would need an ACT score of 28.

Shortcut with the TI
If we have, for example, N(500,100) and Pat’s 650, we do not have to compute the z-score.
Use normalcdf with two more parameters (the mean and the standard deviation):
normalcdf(-999,1.5) vs. normalcdf(-999,650,500,100)

Going the other way
Question 4: Chris scored at the 60th percentile on the SAT.
(a) What is Chris’s z-score?
(b) What is Chris’s SAT score?
We can use the TI function invNorm.
invNorm(0.6) = 0.2533. Then 0.2533 = (x - 500)/100, so x = 500 + 100(0.2533) ≈ 525.
Shortcut: invNorm(0.6,500,100) = 525.335

Are You Normal? Normal Probability Plots
When you actually have your own data, you must check to see whether a Normal model is reasonable.
Looking at a histogram of the data is a good way to check that the underlying distribution is roughly unimodal and symmetric.

Are You Normal? Normal Probability Plots (cont.)
A more specialized graphical display that can help you decide whether a Normal model is appropriate is the Normal probability plot.
If the distribution of the data is roughly Normal, the Normal probability plot approximates a diagonal straight line. Deviations from a straight line indicate that the distribution is not Normal.
Are You Normal? Normal Probability Plots (cont.)
Nearly Normal data have a histogram and a Normal probability plot that look somewhat like this example:

Are You Normal? Normal Probability Plots (cont.)
A skewed distribution might have a histogram and Normal probability plot like this:

Normal probability plots with the TI
Assume that your data are in L1 (the data from Chapter 5 are there).
Data: 62, 63, 65, 66, 68, 70, 71, 73, 75
Choose X:[60,80], Y:[-4,4].
Press 2nd, Y= to bring up “STAT PLOT”.
Pick one of the plots 1 through 3 (say, 1) and turn it on; make sure the others are off.
The TYPE is the one at the lower right (the squiggly one).
Be sure that L1 is after “Data List”.
Select ZOOM, then type 9.

Normal Probability Plot: StatCrunch (called a QQ plot)
Assume that your data are in the first column.
Select Graph, then QQ Plot.
Select the column where your data are.
Continue on as in all of the other StatCrunch graphs.
The graph comes up, but with the normal scale on the x-axis and the data on the y-axis. This is the opposite of how most books do it!
Again, I would recommend the TI.

Publisher’s Instructions: Normal Probability (QQ) Plot
QQ Plot: Displays the sample quantiles of a variable versus the quantiles of a standard normal distribution.
Select the column(s) to be displayed in the plot(s).
Enter an optional Where clause to specify the data rows to be included in the computation.
Select an optional Group by column to generate a separate QQ plot for each distinct value of the Group by column.
Click the Next button to specify graph layout options.
Click the Create Graph! button to create the plot(s).
What Can Go Wrong?
Don’t use a Normal model when the distribution is not unimodal and symmetric.

An example: incorrect z-score
Below: µ = 0.5; σ = 0.288.
The point 0.99 is actually at the 99th percentile.
If you assume N(.5,.288), the z-score would be (0.99 - 0.5)/0.288 ≈ 1.701, and the percentile would be 95.56!

An example: incorrect z-score (cont.)
Below: µ = 0.5; σ = 0.5.
The point 5.98 would be at the 95th percentile.
If you assume N(.5,.5), the z-score of 5.98 would be off the charts!
[Figure: density curve; x-axis from 0.00 to 15.00, y-axis from 0 to 0.6]

What Can Go Wrong? (cont.)
Don’t use the mean and standard deviation when outliers are present: the mean and standard deviation can both be distorted by outliers.
Don’t round your results in the middle of a calculation.
Don’t worry about minor differences in results.

What have we learned?
The story data can tell may be easier to understand after shifting or rescaling the data.
Shifting data by adding or subtracting the same amount from each value affects measures of center and position but not measures of spread.
Rescaling data by multiplying or dividing every value by a constant changes all the summary statistics: center, position, and spread.

What have we learned? (cont.)
We’ve learned the power of standardizing data.
Standardizing uses the SD as a ruler to measure distance from the mean (z-scores).
With z-scores, we can compare values from different distributions or values based on different units.
z-scores can identify unusual or surprising values among data.

What have we learned? (cont.)
We’ve learned that the 68-95-99.7 Rule can be a useful rule of thumb for understanding distributions: For data that are unimodal and symmetric, about 68% fall within 1 SD of the mean, 95% fall within 2 SDs of the mean, and 99.7% fall within 3 SDs of the mean. Copyright © 2009 Pearson Education, Inc. Slide 1- 102 What have we learned? (cont.) We see the importance of Thinking about whether a method will work: Normality Assumption: We sometimes work with Normal tables (Table Z). These tables are based on the Normal model. But the TI is faster and more accurate. Data can’t be exactly Normal, so we check the Nearly Normal Condition by making a histogram (is it unimodal, symmetric and free of outliers?) or a normal probability plot (is it straight enough?). Copyright © 2009 Pearson Education, Inc. Slide 1- 103 Example from the Video The situation: A company fills cereal boxes to an average of 16 oz. There are minor variations. The standard deviation is 0.2 oz. If the company sets its standard at 16.0 oz, half of the boxes will be underweight. Suppose the machine is set at 16.3 oz. What percent of the boxes will be underweight? Copyright © 2009 Pearson Education, Inc. Slide 1- 104 Example from the Video Easy way: With the TI, normalcdf(16.3,999,16,0.2) = 0.0668 It can also be worked by hand. About 6.7%, or about 1 in 15 boxes will be underweight. Tell: If the machine is set at 16.3 oz with a st. dev. of 0.2 oz, about 1 in 15 boxes will be underweight. Copyright © 2009 Pearson Education, Inc. Slide 1- 105 Example from the Video The company decides that 1 in 15 is too high. They now want 1 in 25, or 4%. How high should the machine be set? With the TI, invNorm(0.04) = - 1.75 The machine must be set 1.75 standard deviations above the mean, or .35 above the mean. Tell: The machine must be set for a mean of 16.35 to have 4% underweight boxes.. Copyright © 2009 Pearson Education, Inc. Slide 1- 106 Example from the Video The company president wants less free cereal! 
The mean must be set at 16.2 and no more than 4% underweight. What to do? Change the standard deviation. We need N(16.2,σ) and we have to find σ. We know that the z-score that corresponds to 4% underweight is “-1.75”. We can then solve -1.75 = (16.2 – 16)/ σ for σ. We get σ = 0.114. Copyright © 2009 Pearson Education, Inc. Slide 1- 107 Example from the Video Tell: The company must get the machine to box cereal with a standard deviation of 0.114 ounces in order to meet the stated objective of a mean of 16.2 oz. and no more than 4% underweight boxes. Copyright © 2009 Pearson Education, Inc. Slide 1- 108 Topics in this chapter Shifting and Rescaling Data Standardized values (z-scores) Using the standard deviation as a ruler The Normal Model The 68-95-99.7 Rule Finding normal percents and the reverse Normal Probability Plots The Normality Assumption Copyright © 2009 Pearson Education, Inc. Slide 1- 109 Division of Mathematics, HCC Course Objectives for Chapter 6 After studying this chapter, the student will be able to: 20. Compare values from two different distributions using their zscores. 21. Use Normal models (when appropriate) and the 68-95-99.7 Rule to estimate the percentage of observations falling within one, two, or three standard deviations of the mean. 22. Determine the percentages of observations that satisfy certain conditions by using the Normal model and determine “extraordinary” values, and the reverse. 23. Determine whether a variable satisfies the Nearly Normal condition by making a normal probability plot or histogram. Note: It is essential that this chapter be mastered. Almost everything in Unit 3 depends on it. Copyright © 2009 Pearson Education, Inc.
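Python's standard-library `statistics.NormalDist` can stand in for the TI's `normalcdf` and `invNorm` when checking the Normal-model numbers in this chapter. The sketch below is our own illustration, not part of the original slides: it recomputes the Slide 96 percentile, the 68-95-99.7 areas, and all three parts of the cereal example. The 16 oz target and 0.2 oz SD come from the slides; the variable names are ours.

```python
from statistics import NormalDist

STD = NormalDist()  # standard Normal model N(0, 1)

# Slide 96: a z-score computed under an inappropriate Normal model.
z_96 = (0.99 - 0.5) / 0.288
pct_96 = 100 * STD.cdf(z_96)
print(f"z = {z_96:.3f}, Normal percentile = {pct_96:.2f}")  # z = 1.701, Normal percentile = 95.56

# 68-95-99.7 Rule: area within k SDs of the mean of a Normal model.
for k in (1, 2, 3):
    print(k, round(STD.cdf(k) - STD.cdf(-k), 4))  # 0.6827, 0.9545, 0.9973

# Cereal example: boxes should hold 16 oz; the machine SD is 0.2 oz.
LABEL, SD = 16.0, 0.2

# Part 1: mean set at 16.3 oz. Share underweight = P(weight < 16).
# (TI: normalcdf(-999, 16, 16.3, 0.2))
p_under = NormalDist(16.3, SD).cdf(LABEL)
print(f"underweight: {p_under:.4f} (about 1 in {round(1 / p_under)})")  # 0.0668, 1 in 15

# Part 2: choose the mean so that only 4% are underweight.
z_04 = STD.inv_cdf(0.04)          # TI: invNorm(0.04), about -1.75
mean_needed = LABEL - z_04 * SD   # 16 + 1.75 * 0.2
print(f"z = {z_04:.2f}, mean = {mean_needed:.2f}")  # z = -1.75, mean = 16.35

# Part 3: keep the mean at 16.2 oz and solve z = (16 - 16.2) / sigma.
sigma_needed = (LABEL - 16.2) / z_04
print(f"sigma = {sigma_needed:.3f}")  # sigma = 0.114

# Check: N(16.2, sigma_needed) puts 4% of boxes below 16 oz.
print(round(NormalDist(16.2, sigma_needed).cdf(LABEL), 4))  # 0.04
```

Rounding only at the print step follows the advice on Slide 98 not to round in the middle of a calculation.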