Statistics Chapter 2
Name:
2.4 Measures of Variation

Learning objectives:
1. How to find the range of a data set
2. How to find the variance and standard deviation of a population and of a sample
3. How to use the Empirical Rule and Chebychev's Theorem to interpret standard deviation
4. How to approximate the sample standard deviation for grouped data
5. How to use the coefficient of variation to compare variation in different data sets

Range
1. The difference between the maximum and minimum data entries in the set.
2. The data must be quantitative.
3. Range = (Max. data entry) − (Min. data entry)

A corporation hired 6 graduates. The starting salaries for each graduate are shown. Find the range of the starting salaries.
Starting salaries (1000s of dollars): 41 38 39 45 47 41

Deviation
1. The difference between the data entry, x, and the mean of the data set.
2. Population data set:
   • Deviation of x = \( x - \mu \)
3. Sample data set:
   • Deviation of x = \( x - \bar{x} \)

Example
Find the deviation of the starting salaries.

Formulas for Population Variance and Standard Deviation

Population Variance
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
This gives us a formula for the "average squared deviation" for population data. However, the units of the variance are the units of the data raised to the second power. For example, if x is in dollars, then the units of the variance are dollars squared.

Population Standard Deviation
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]
If we take the square root of the variance we get \( \sigma \), the population standard deviation, a parameter that has the same units as the numbers in the data set.

How to find the Population Variance & Standard Deviation
1. Find the mean of the population data set, \( \mu = \frac{\sum x}{N} \)
2. Find the deviation of each entry, \( x - \mu \)
3. Square each deviation, \( (x - \mu)^2 \)
4. Add to get the sum of squares, \( \sum (x - \mu)^2 \)
5. Divide by N to get the population variance, \( \frac{\sum (x - \mu)^2}{N} \)
6. Find the square root to get the population standard deviation, \( \sqrt{\frac{\sum (x - \mu)^2}{N}} \)

A corporation hired 6 graduates.
The starting salaries for each graduate are shown. Find the population variance and standard deviation of the starting salaries.
Starting salaries (1000s of dollars): 41 38 39 45 47 41

Formulas for Sample Variance and Standard Deviation

Sample Variance
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]

Sample Standard Deviation
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]

How to find the Sample Variance & Standard Deviation
1. Find the mean of the sample data set, \( \bar{x} = \frac{\sum x}{n} \)
2. Find the deviation of each entry, \( x - \bar{x} \)
3. Square each deviation, \( (x - \bar{x})^2 \)
4. Add to get the sum of squares, \( \sum (x - \bar{x})^2 \)
5. Divide by (n − 1) to get the sample variance, \( \frac{\sum (x - \bar{x})^2}{n - 1} \)
6. Find the square root to get the sample standard deviation, \( \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \)

A corporation hired 6 graduates. The starting salaries for each graduate are shown. Find the sample variance and standard deviation of the starting salaries.
Starting salaries (1000s of dollars): 41 38 39 45 47 41

Try This!
You are asked to compare three data sets (dotplots below). Without calculating, determine which data set has the greatest sample standard deviation and which has the least sample standard deviation.

Entry x    Deviation x − µ    Squares (x − µ)²
1          -3                 9
3          -1                 1
5           1                 1
7           3                 9

Data entries that lie more than two standard deviations from the mean are considered unusual, while those that lie more than three standard deviations from the mean are very unusual. Unusual and very unusual entries have a greater influence on the standard deviation than entries closer to the mean. This happens because the deviations are squared. Consider the data in the above table. The squares of the deviations of the entries farther from the mean (1 and 7) have a greater influence on the value of the standard deviation than those closer to the mean (3 and 5).

Try This!
A sample of 500 monthly utility bills for households in a city was collected. The mean of the sample was $70 and the sample standard deviation was $8.
Here is a short list of a few of the 500 measurements from the sample: $74, $52, $62, $98
Are any of the data entries unusual or very unusual? Explain your reasoning.

The Empirical Rule
For data with a (symmetric) bell-shaped distribution, the standard deviation has the following characteristics:
• About 68% of the data lie within one standard deviation of the mean.
• About 95% of the data lie within two standard deviations of the mean.
• About 99.7% of the data lie within three standard deviations of the mean.

Example
The mean IQ score of students in a particular calculus class is 110, with a standard deviation of 5. (Assume the data set has a bell-shaped distribution.)
a) Use the Empirical Rule to find the percentage of students with an IQ above 120.
b) Use the Empirical Rule to find the percentage of students with an IQ between 100 and 110.
c) Use the Empirical Rule to find the percentage of students with an IQ between 105 and 120.

Try This!
In a survey conducted by the National Center for Health Statistics, the sample mean height of women in the United States (ages 20-29) was 64.3 inches, with a sample standard deviation of 2.62 inches. Use the Empirical Rule to estimate the percent of the women whose heights are between 59.06 inches and 64.3 inches.

Chebychev’s Theorem
1. The percentage of any data set lying within k standard deviations (where k is any number greater than one) of the mean is at least \( \left(1 - \frac{1}{k^2}\right) \cdot 100\% \).
2. For example, when k = 2: In any data set, at least \( 1 - \frac{1}{2^2} = \frac{3}{4} \), or 75%, of the data lie within 2 standard deviations of the mean.
3. When k = 3: In any data set, at least \( 1 - \frac{1}{3^2} = \frac{8}{9} \), or 88.9%, of the data lie within 3 standard deviations of the mean.
4. k can be any number greater than one.

Example
The mean time in the finals for the women’s 800-meter freestyle at the 2012 Summer Olympics was 502.84 seconds, with a standard deviation of 4.68 seconds. Apply Chebychev’s Theorem to the data using k = 2. Interpret the results.

Try This!
The mean time in the finals for the women’s 800-meter freestyle at the 2012 Summer Olympics was 502.84 seconds, with a standard deviation of 4.68 seconds. Apply Chebychev’s Theorem to the data using k = 1.5. Interpret the results.

Example
From a sample with n = 48, the mean cost of purchasing a home in a major city was $520,000 and the standard deviation was $40,000. Using Chebychev’s Theorem, determine at least how many of the homes cost between $460,000 and $580,000.

Try This!
Heights of adult women have a mean of 63.6 in. and a standard deviation of 2.5 in. What does Chebychev’s Theorem say about the percentage of women with heights between 58.6 in. and 68.6 in.? At least how many women in a sample of 50 would have heights between 58.6 in. and 68.6 in.?
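
The bound in Chebychev’s Theorem, at least 1 − 1/k² of the data within k standard deviations of the mean, can be checked numerically. The Python sketch below is only an illustration: the helper names `chebychev_min_fraction` and `chebychev_min_count` are hypothetical and introduced here, not part of the worksheet. It reproduces the k = 2 and k = 3 cases from the statement of the theorem and then applies the same steps to the home-price example (mean $520,000, standard deviation $40,000, n = 48).

```python
import math


def chebychev_min_fraction(k: float) -> float:
    """Return the Chebychev lower bound 1 - 1/k^2 (valid only for k > 1)."""
    if k <= 1:
        raise ValueError("Chebychev's Theorem requires k > 1")
    return 1 - 1 / k**2


def chebychev_min_count(n: int, k: float) -> int:
    """Return the least whole number of entries, out of n, within k standard deviations."""
    # Round up: "at least 26.67 entries" means at least 27 whole entries.
    # The small tolerance guards against floating-point noise just above a whole number.
    return math.ceil(n * chebychev_min_fraction(k) - 1e-9)


# The two cases worked in the statement of the theorem.
print(f"k = 2: at least {chebychev_min_fraction(2):.1%}")  # k = 2: at least 75.0%
print(f"k = 3: at least {chebychev_min_fraction(3):.1%}")  # k = 3: at least 88.9%

# Home-price example: the interval $460,000 to $580,000 is mean ± k * sd.
mean, sd, n = 520_000, 40_000, 48
k = (580_000 - mean) / sd
print(k)                          # 1.5
print(chebychev_min_count(n, k))  # 27
```

The key step is converting the interval endpoints into a number of standard deviations, k, before applying the bound. Because the theorem gives only a minimum, the count is rounded up to the next whole entry.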