z-scores: Using Standard Deviation as a Ruler

The trick in comparing very different-looking values is to use standard deviations as our rulers. The standard deviation tells us how the whole collection of values varies, so it’s a natural ruler for comparing an individual to a group. As the most common measure of variation, the standard deviation plays a crucial role in how we look at data.

Remember: The standard deviation (s) is a measure of spread. It approximately measures the average distance of the data points from the mean.

Standardizing with z-scores

A z-score measures how many standard deviations a number is from the mean.

Example: 6, 8, 10, 12, 13, 15, 16, 17, 18, 20, 22, 25, 26, 30

n = 14 (there are 14 numbers)
Mean: $\bar{y} = 238/14 = 17.0$

Minimum   Q1   Median   Q3   Maximum
6         12   16.5     22   30

Because the mean (17.0) is close to the median (16.5), there is a good chance that the data is symmetric. To be sure, we can look at a histogram of the data.

[Figure: histogram of the data]

With symmetric data, the standard deviation is the appropriate measure of spread. The standard deviation: s ≈ 7.0

Because 10 is almost exactly 1 standard deviation below the mean (17.0 - 7.0 = 10), the z-score corresponding to the number 10 is z ≈ -1. (The negative sign is because the number is to the left of the mean.) The z-score corresponding to the number 20 is (20 - 17.0)/7.0, or approximately 0.4.

We compare individual data values to their mean, relative to their standard deviation, using the following formula:

$$z = \frac{y - \bar{y}}{s}$$

We call the resulting values standardized values, denoted z. They are also called z-scores.

- Standardized values have no units.
- z-scores measure the distance of each data value from the mean in standard deviations.
- A negative z-score tells us that the data value is below the mean, while a positive z-score tells us that the data value is above the mean.

Example: Use the formula $z = \frac{y - \bar{y}}{s}$ to standardize the following data by converting to z-scores.

6, 8, 10, 12, 13, 15, 16, 17, 18, 20, 22, 25, 26, 30

First, what is the mean? $\bar{y} = 17.0$
What is the standard deviation? s ≈ 7.05

y     y - ȳ    z = (y - ȳ)/s
6     -11      -1.56
8     -9       -1.28
10    -7       -0.99
12    -5       -0.71
13    -4       -0.57
15    -2       -0.28

The Standard Deviation as a Ruler

z-scores measure the number of standard deviations a number is from the mean.

- A positive z-score means that the datum is to the right of the mean.
- A negative z-score means that the datum is to the left of the mean.
- If a z-score is near zero (close to the mean), that indicates that the datum is not unusual.
- Most z-scores fall between -2 and 2. A z-score higher than 2 or lower than -2 is unusual. (About 95% of the data in a normally distributed set are less than 2 standard deviations away from the mean.)
- If a z-score is higher than 3 or lower than -3, then the corresponding datum is very unusual. (About 99.7% of the data in a normally distributed set are less than 3 standard deviations away from the mean.)

Normal Curve

Normally distributed data has a histogram that looks similar to a bell curve. Most of the data is near the mean (where z = 0).

[Figure: normal (bell) curve, with z on the horizontal axis]

Example: For the data set 2, 5, 7, 8, 13, the mean is 7 and the standard deviation is s ≈ 4.06, so the z-scores are as follows:

y     2       5       7      8      13
z     -1.23   -0.49   0.00   0.25   1.48

Benefits of Standardizing (Converting to z-scores)

Standardized values have been converted from their original units to the standard statistical unit of standard deviations from the mean. Thus, we can compare values that are measured on different scales, with different units, or from different populations.

Example: JaNathan earned a 93% on a test in Mr. Kane’s class. The scores on that test were normally distributed with a mean of 75 and a standard deviation of 12. During the football season, JaNathan ran the 40-yard dash in 4.5 seconds. The team’s 40-yard dash times were normally distributed with a mean of 5.1 seconds and a standard deviation of 0.33 seconds. Which is more impressive, JaNathan’s 93% test score or his 4.5-second 40-yard dash time?
Because the two numbers we are trying to compare use different units, we need to standardize them (convert them to z-scores) before we can compare them.

Convert each to a z-score:

Test score 93: $z = \frac{93 - 75}{12} = 1.5$
Running time 4.5: $z = \frac{4.5 - 5.1}{0.33} \approx -1.82$

Because the 4.5-second run time has a z-score that is farther from zero than the z-score of the 93% test score, it is more impressive that JaNathan ran the 40-yard dash in 4.5 seconds. (Keep in mind that a negative z-score for run time is good because we want our run time to be less than the mean run time. Also, a positive z-score for a test grade is good because we want to score higher than the mean on a test.)

What have we learned?

We’ve learned the power of standardizing data.

- Standardizing uses the standard deviation as a ruler to measure distance from the mean (z-scores).
- With z-scores, we can compare values from different distributions or values based on different units.
- z-scores can identify unusual or surprising values among data.
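
The arithmetic in the examples above can be checked with a short Python script. This is only a sketch: the function names `z_score` and `standardize` are illustrative and not part of the handout, and it assumes that s means the sample standard deviation (dividing by n - 1), which is what Python's `statistics.stdev` computes.

```python
from statistics import mean, stdev


def z_score(value, center, spread):
    """Return how many standard deviations `value` lies from `center`."""
    return (value - center) / spread


def standardize(data):
    """Convert every value in `data` to a z-score.

    Assumption: the sample standard deviation (n - 1 in the denominator)
    is the ruler, matching the usual meaning of s.
    """
    center = mean(data)
    spread = stdev(data)
    return [z_score(y, center, spread) for y in data]


# The small data set from the text: 2, 5, 7, 8, 13
small = [2, 5, 7, 8, 13]
small_z = standardize(small)
for y, z in zip(small, small_z):
    print(f"y = {y:>2}   z = {z:+.2f}")

# JaNathan's comparison: the mean and standard deviation are given,
# so each value is standardized directly with the formula.
test_z = z_score(93, 75, 12)      # test score
dash_z = z_score(4.5, 5.1, 0.33)  # 40-yard dash time, in seconds
print(f"test score z   = {test_z:+.2f}")
print(f"40-yard dash z = {dash_z:+.2f}")

# The value whose z-score is farther from zero is the more unusual one.
more_impressive = "40-yard dash" if abs(dash_z) > abs(test_z) else "test score"
print("Farther from the mean:", more_impressive)
```

The test score comes out at z = 1.5 and the dash time at about z = -1.82, so the dash time is farther from its mean. Note that the script compares absolute values only. Whether a negative z-score is good or bad still depends on the context (lower is better for run times, higher is better for test scores).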