Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 6 The Standard Deviation as a Ruler and the Normal Model Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Normal Distributions Many real-life histograms have a symmetric, unimodal and tapered shape. Example: Here is the earthquakes distribution again .... Tuesday, June 18, 13 Normal Distributions • Example: The distribution of college age males’ heights Tuesday, June 18, 13 The Normal Model • George Box: – “All models are wrong, but some are useful.” • If data are unimodal and roughly symmetric, then the normal model is appropriate. – Written N(µ, σ) • µ is the mean for the model • σ is the SD for the model Tuesday, June 18, 13 The Standard Deviation as a Ruler QUESTION: How do we compare very different looking Normal Distributions - some wide, some narrow, some flat and some tall? ANSWER: Use standard deviations as our rulers. The standard deviation tells us how the whole collection of values varies, so it’s a natural ruler for comparing an individual to a group. It is the most common measure of variation. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 3 Standardizing with z-scores We introduce the z-score. We compare individual data values to their group’s mean value, relative to their standard deviation: How far is a data point from it’s group’s mean? Subtract the value from the mean: (y - mean) How many standard deviations is that distance? Divide the distance by s: (y-mean) / s Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 4 Standardizing with z-scores (cont.) Standardized values have no units. z-scores measure the distance of each data value from the mean in multiples of standard deviations. A negative z-score tells us that the data value is below the mean (to the left) A positive z-score tells us that the data value is above the mean (to the right). Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 5 Benefits of Standardizing Standardized values have been converted from their original units to the standard statistical unit of standard deviations from the mean. Thus, we can compare values that are measured on different scales, with different units, or from different populations. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 6 Example of Z-Score Comparisons Q: Which is more overweight - a 13 lbs Chihuahua or a 118 lb Doberman Pincer? Chihuahua: mean weight = 7 lb, SD = 1.2 lb z-score = (13 - 7) / 1.2 = 5.0 Doberman: mean weight = 107 lb, SD = 24 lb z-score = (118 - 107) / 24 = 0.46 So we see that the Chihuahua is far more overweight than the Doberman. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Ex 2: Compare Female to Male Height • Who is relatively ‘taller’; A 65” female or a 70” male? • Standardize the Results – Compute the z-score… – Calculates the distance of a value from the mean in standard deviations • Females: Mean = 64.08, SD = 3.422 z = (65 - 64.08) / 3.422 = 0.29 Tuesday, June 18, 13 • Males: Mean = 70.49, SD = 4.080 z = (70 - 70.49) / 4.080 = -0.12 Female Tuesday, June 18, 13 Male Why the z-score works SHIFTING DATA Adding a constant to every data value adds the same constant to measures of position. SO ... adding a constant to each value will increase position (center, quartiles, max or min) by the same amount. Its shape (range, IQR, standard deviation) remain unchanged. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 7 Shifting Data (cont.) EXAMPLE: The following histograms show a shift from men’s actual weights to kilograms above recommended weight: Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 8 SCALING DATA When we multiply all the data values by any constant, all measures of position (such as the mean, median, and percentiles) and all measures of spread (such as the range, the IQR, and the standard deviation) are multiplied by that same constant. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 9 SCALING DATA (cont.) EXAMPLE: The men’s weight data set measured weights in kilograms. If we want to think about these weights in pounds, we would rescale the data: Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 10 Back to z-scores Standardizing data into z-scores both (i) shifts the data by subtracting the mean and (ii) rescales the values by dividing by their standard deviation. Standardizing into z-scores does not change the shape of the distribution. Standardizing into z-scores changes the center so that the mean is 0. Standardizing into z-scores changes the spread so that the standard deviation 1. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 11 When Is a z-score BIG? A z-score gives us an indication of how many standard deviations a data point is from it’s mean Remember that a negative z-score tells us that the data value is below the mean . . . While a positive z-score tells us that the data value is above the mean. The larger a z-score is (negative or positive), the more unusual it is. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 12 Models There is a Normal model for every possible combination of mean and standard deviation. We write N(µ,σ) to represent a Normal model with a mean of µ and a standard deviation of σ. We use Greek letters because this mean and standard deviation do not come from data—they are numbers (called parameters) that specify the model. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 14 When Is a z-score Big? (cont.) Summaries of data, like the sample mean and standard deviation, are written with Latin letters. Such summaries of data are called statistics. When we standardize Normal data, we still call the standardized value a z-score, and we write Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 15 When Is a z-score Big? (cont.) Once we have standardized, we need only one model: The N(0,1) model is called the standard Normal model (or the standard Normal distribution). Be careful—don’t use a Normal model for just any data set, since standardizing does not change the shape of the distribution. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 16 When Is a z-score Big? (cont.) When we use the Normal model, we are assuming the distribution is Normal. We often cannot check this assumption in practice, so we check the following condition: Nearly Normal Condition: The shape of the data’s distribution is unimodal and symmetric. This condition can be checked by making a histogram or a Normality plot (to be explained later). Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 17 The 68-95-99.7 Rule of Thumb It turns out that in a Normal model: 68% of the values fall within one SD of the mean; 95% of the values fall within two SD of the mean; and, 99.7% (almost all!) of the values fall within three SD of the mean. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 19 The 68-95-99.7 Rule (cont.) The following shows what the 68-95-99.7 Rule tells us (visualize your histogram embedded within): Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 20 The First Three Rules for Working with Normal Models Make a picture. Make a picture. Make a picture. And, when we have data, make a histogram to check the Nearly Normal Condition to make sure we can use the Normal model to model the distribution. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 21 Finding Normal Percentiles by Hand When a data value doesn’t fall exactly 1, 2, or 3 standard deviations from the mean, we can look it up in a table of Normal percentiles. Table Z in Appendix D provides us with normal percentiles, but Many calculators and statistics computer packages provide these as well. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 22 Example – Bone Density Test A bone mineral density test can be helpful in identifying the presence of osteoporosis. The result of the test is commonly measured as a z score, which has a normal distribution with a mean of 0 and a standard deviation of 1. A randomly selected adult undergoes a bone density test. Q: Find the percentage of people that result is a bone density reading less than +1.27. Copyright © 2014, 2012, 2010 Pearson Education, Inc. Tuesday, June 18, 13 Section 6.2- Example – continued The percentage is the AREA under the ND curve to the LEFT of the given z-score, that is that is less than 1.27. Copyright © 2014, 2012, 2010 Pearson Education, Inc. Tuesday, June 18, 13 Section 6.2- Look at Table A-2 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Tuesday, June 18, 13 Section 6.2- Example – continued The percentage of randomly sampled adults having a bone density less than 1.27 is 89.8% Copyright © 2014, 2012, 2010 Pearson Education, Inc. Tuesday, June 18, 13 Section 6.2- Example #2 Using the same bone density test, find the percentage of adults with a bone density above –1.00 (which is considered to be in the “normal” range of bone density readings. The percentage of randomly sampled adults having a bone density above –1 is 84.1% Copyright © 2014, 2012, 2010 Pearson Education, Inc. Tuesday, June 18, 13 Section 6.2- Example #3 What percentage of adults have a bone density reading between –1.00 and –2.50 (these subjects have osteopenia). 1. The area to the left of z = –2.50 is 0.0062. 2. The area to the left of z = –1.00 is 0.1587. 3. The area between z = –2.50 and z = –1.00 is the difference between the areas found above. The percentage of randomly sampled adults with osteopenia is © 2014, 2012, 2010 Pearson Education, Inc. Section 6.20.1587 - 0.0062Copyright = 0.1525 or 15.25% Tuesday, June 18, 13 Notation denotes the percentage of z scores between a and b. denotes the percentage of z scores greater than a. denotes the percentage of z scores less than a. Copyright © 2014, 2012, 2010 Pearson Education, Inc. Tuesday, June 18, 13 Section 6.2- Quiz: EPA mpg We now cover every which way we can use the z-table and z-score to find what we need. We just need to know three things: 1st - We are dealing with a Normal Distribution 2nd - Mean value 3rd - Standard Deviation (SD) EXAMPLE: The EPA finds that consumer car mileage follow a normal distribution with a mean of 24 mpg and SD of 6 mpg. Q1: What is the Rule-of-Thumb breakdown? Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Quiz: EPA mpg Q1: What is the Rule-of-Thumb breakdown? A: We expect that ... 68% of cars to get between 18 and 30 mpg. 95% of cars to get between 12 and 36 mpg, and 99.7% of cars to get between 6 and 42 mpg. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Quiz: EPA mpg Q2: What percentage of cars get less than 15 mpg? Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Quiz: EPA mpg Q2: What percentage of cars get less than 15 mpg? A Find the Area to the left of 15 mpg: z-score for 15 mpg: z = ( 15 - 24 ) / 6 = -1.5 Z-Table: => 0.0668 or 6.68% Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Quiz: EPA mpg Q3: What percentage of cars get between 20 and 30 mpg? Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Quiz: EPA mpg Q3: What percentage of cars get between 20 and 30 mpg? A: Find area between 20 and 30 mpg: 30 mpg: z = (30-24) / 6 = +1.00 => 0.8413 area to the left 20 mpg: z = (20-24) / 6 = -0.67 => 0.2514 area to the left In-between: 0.8413 - 0.2514 = 0.5889 or 58.89% of cars Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Quiz: EPA mpg Q4: What percentage of cars get better than 40 mpg? Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Quiz: EPA mpg Q4: What percentage of cars get better than 40 mpg? A: Find the area to the left of 40 mpg and subtract from 1.000 40 mpg: z= (40-30) / 6 = 2.67 => 0.9962 area to the left Area to the right = 1.000 - 0.9962 = 0.0038 or 0.38% of cars Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Quiz: EPA mpg Q5: What mpg separates the lower 20% or cars from the upper 80%? Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Quiz: EPA mpg Q5: What mpg separates the lower 20% or cars from the upper 80%? A: Do a reverse table lookup to get the z-score, then a reverse z-score to the mpg value: Body of Z-Table: 0.2005 => z = -0.84 z-score formula: -0.84 = ( y - 24) / 6 solve for y: 18.95 mpg Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Quiz: EPA mpg Q6: Find the 3rd Quartile? Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Quiz: EPA mpg Q6: Find the 3rd Quartile? I.E. the 75th percentile. A: Reverse lookup 75%, or an area of 0.7500, then solve for the mpg: Body of Z Table: 0.7486 => z = 0.67 Z score in reverse: 0.67 = ( y - 24 ) / 6 solve for y: 28.0 mpg Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Quiz: EPA mpg Q7: Find the mpg for the top 5% of cars? Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Quiz: EPA mpg Q7: Find the mpg for the top 5% of cars? A: Find the z-score for the lower 95% Reverse lookup: 0.9500 => z = +1.645 reverse z-score: 1.645 = (y - 24) / 6 solve for y: 33.87mpg Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Reverse Table Lookup: From Percentiles to Scores Sometimes we start with areas and need to find the corresponding z-score and maybe even the original data value. Example: What z-score represents the first quartile in a Normal model? Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 24 From Percentiles to Scores: z in Reverse (cont.) Look in body of the Table Z for an area of 0.2500 The exact area is not there, but 0.2514 is pretty close. Identify the row and column: This figure is associated with z = -0.60, and 0.07 respectively Put these numbers together: z = - 0.67 SO ... the first quartile is 0.67 SD below the mean. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 25 Software: Finding Normal Percentiles • Given a particular range of values, what percent of values fall in that range – MAKE A PICTURE FIRST!!!!! • Minitab > Graph > Probability Distribution Plot > View Probability – Distribution: Normal, Enter Mean, Enter SD – Shaded Area: • ‘Probability’ if you’re given the % • ‘X Value’ if you want the % • Select the appropriate picture • Enter the value • Ex: IQ scores are nearly normal with mean 100 and SD 16. Find the percent of IQ scores over 80. Tuesday, June 18, 13 Just Checking: 0.894 • IQ Scores Again… Mean 100, SD 16 1. 2. Find the percent of scores below 90 Find the percent of scores between 112 and 132 Tuesday, June 18, 13 Checking NORMALITY: Are You Normal? Normal Probability Plots When you actually have your own data, you must check to see whether a Normal model is reasonable. Looking at a histogram of the data is a good way to check that the underlying distribution is roughly unimodal and symmetric. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 26 Are You Normal? Normal Probability Plots (cont) Try software: A more specialized graphical display that can help you decide whether a Normal model is appropriate is the Normal probability plot. If the distribution of the data is roughly Normal, the Normal probability plot approximates a diagonal straight line. Deviations from a straight line indicate that the distribution is not Normal. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 27 Are You Normal? Normal Probability Plots (cont) Nearly Normal data have a histogram and a Normal probability plot that look somewhat like this example: Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 28 Are You Normal? Normal Probability Plots (cont) A skewed distribution might have a histogram and Normal probability plot like this: Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 29 Does the Normal Model Apply? • Check #1: Histogram – Make sure unimodal, roughly symmetric • Check #2: Normal Probability Plot – Minitab > Graph > Probability Plot • Single • Enter the variable – The Normal Probability Plot plots the data values against what we would expect under a perfect normal model – Look for: Points that form a roughly straight, diagonal line – Note: This graph is slightly different from your book, but same general idea. You can do a lot of work in the options to get it to look identical (not worth it). • http://www.canyons.edu/faculty/morrowa/140/datasets – Use the Fall 2009 COC Math 140 Survey Results (cleaned heights) – Determine if the normal model applies for height, age, and gpa Tuesday, June 18, 13 Some more examples Tuesday, June 18, 13 What Can Go Wrong? Don’t use a Normal model when the distribution is not unimodal and symmetric. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 30 What Can Go Wrong? (cont.) Don’t use the mean and standard deviation when outliers are present—the mean and standard deviation can both be distorted by outliers. Don’t round your results in the middle of a calculation. Don’t worry about minor differences in results. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 31 What have we learned? The story data can tell may be easier to understand after shifting or rescaling the data. Shifting data by adding or subtracting the same amount from each value affects measures of center and position but not measures of spread. Rescaling data by multiplying or dividing every value by a constant changes all the summary statistics—center, position, and spread. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 32 What have we learned? (cont.) We’ve learned the power of standardizing data. Standardizing uses the SD as a ruler to measure distance from the mean (z-scores). With z-scores, we can compare values from different distributions or values based on different units. z-scores can identify unusual or surprising values among data. Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 33 What have we learned? (cont.) We see the importance of Thinking about whether a method will work: Normality Assumption: We sometimes work with Normal tables (Table Z). These tables are based on the Normal model. Data can’t be exactly Normal, so we check the Nearly Normal Condition by making a histogram (is it unimodal, symmetric and free of outliers?) or a normal probability plot (is it straight enough?). Copyright © 2009 Pearson Education, Inc. Tuesday, June 18, 13 Slide 1- 35