Download sample standard deviation

Statistics
Numerical Representation of Data
Part 2 – Measure of Variation

Warm-up
Approximate the mean of the frequency distribution.

Class       Frequency, f
1 – 6       21
7 – 12      16
13 – 18     28
19 – 24     13

Warm-up
What can be said about the relationship between the mean and median in the dotplot below?
[Dotplot not reproduced.]
a) The mean is smaller than the median.
b) The mean is bigger than the median.
c) The mean is equal to the median.
d) Nothing can be determined based on the graph.

Warm-up
An investor was interested in determining how much gain she had in her 401K plan in the last 6 quarters. The data are listed below. Find the median and the mean of the data.
-510   110   1230   1900   -680   1700
a) Mean = 1021.7, Median = 670
b) Mean = 1021.7, Median = 1565
c) Mean = 625, Median = 3.5
d) Mean = 625, Median = 670
e) Mean = 625, Median = 1565

Agenda
• Warm-up
• Homework Review
• Lesson Objectives
  • Determine the range of a data set
  • Determine the variance and standard deviation of a population and of a sample
  • Use the Empirical Rule and Chebychev’s Theorem to interpret standard deviation
  • Approximate the sample standard deviation for grouped data
• Summary
• Homework

Measures of Variation
Consider the following wait times (in minutes):

            Median    Mean    Complaints/mo
Bank 1      7.20      7.15    3
Bank 2      7.20      7.15    22

Objective 1
• Compute the range of a variable from raw data

Range
• The difference between the maximum and minimum data entries in the set.
• The data must be quantitative.
• Range = (Max. data entry) – (Min. data entry)

Example: Finding the Range
A corporation hired 10 graduates. The starting salaries for each graduate are shown. Find the range of the starting salaries.
Starting salaries (1000s of dollars)
41 38 39 45 47 41 44 41 37 42

Solution: Finding the Range
• Ordering the data helps to find the least and greatest salaries.
  37 (minimum) 38 39 41 41 41 42 44 45 47 (maximum)
• Range = (Max. salary) – (Min. salary) = 47 – 37 = 10
The range of starting salaries is 10, or $10,000.
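The range calculation in the salary example above can be reproduced in a few lines of Python. This is only an illustrative sketch; the helper name data_range is hypothetical and does not come from the slides.

```python
# Starting salaries from the example, in thousands of dollars.
salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]


def data_range(values):
    """Return (max - min) for a non-empty list of quantitative values.

    Hypothetical helper written for this illustration.
    """
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


salary_range = data_range(salaries)
print(salary_range)  # 10, i.e. $10,000
```

Because the range uses only the two extreme entries, a single unusually high or low salary changes it completely, which is one reason the variance and standard deviation are introduced next.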
EXAMPLE Finding the Range of a Set of Data
The following data represent the travel times (in minutes) to work for all seven employees of a start-up web development company.
23, 36, 23, 18, 5, 26, 43
Find the range.
Range = 43 – 5 = 38 minutes

Objective 2
• Compute the variance of a variable from raw data

Deviation, Variance, and Standard Deviation
Deviation
• The difference between the data entry, x, and the mean of the data set.
• Population data set: Deviation of x = x – μ
• Sample data set: Deviation of x = x – x̄

Example: Finding the Deviation
A corporation hired 10 graduates. The starting salaries for each graduate are shown. Find the deviation of the starting salaries.
Starting salaries (1000s of dollars)
41 38 39 45 47 41 44 41 37 42

Solution: Finding the Deviation
• First determine the mean starting salary.
  $\mu = \frac{\sum x}{N} = \frac{415}{10} = 41.5$
• Determine the deviation for each data entry.

Salary ($1000s), x    Deviation: x – μ
41                    41 – 41.5 = –0.5
38                    38 – 41.5 = –3.5
39                    39 – 41.5 = –2.5
45                    45 – 41.5 = 3.5
47                    47 – 41.5 = 5.5
41                    41 – 41.5 = –0.5
44                    44 – 41.5 = 2.5
41                    41 – 41.5 = –0.5
37                    37 – 41.5 = –4.5
42                    42 – 41.5 = 0.5
Σx = 415              Σ(x – μ) = 0

To order food at a McDonald’s restaurant or at a Wendy’s restaurant, you must stand in line. The following data represent the wait time (in minutes) in line for a simple random sample of 30 customers at each restaurant during the lunch hour. For each sample, answer the following:
(a) What was the mean wait time?
(b) Draw a histogram of each restaurant’s wait time.
(c) Which restaurant’s wait time appears more dispersed? Which line would you prefer to wait in? Why?
Wait Time at Wendy’s (minutes)
1.50 2.53 1.88 3.99 0.90 0.79 1.20 2.94 1.90 1.23
1.01 1.46 1.40 1.00 0.92 1.66 0.89 1.33 1.54 1.09
0.94 0.95 1.20 0.99 1.72 0.67 0.90 0.84 0.35 2.00

Wait Time at McDonald’s (minutes)
3.50 0.00 1.97 0.00 3.08 0.00 0.26 0.71 0.28 2.75
0.38 0.14 2.22 0.44 0.36 0.43 0.60 4.54 1.38 3.10
1.82 2.33 0.80 0.92 2.19 3.04 2.54 0.50 1.17 0.23

(a) The mean wait time in each line is 1.39 minutes.
(b) [Histograms of the two wait-time samples; figures not reproduced.]

The population variance of a variable is the sum of squared deviations about the population mean divided by the number of observations in the population, N. That is, it is the mean of the squared deviations about the population mean.

The population variance is symbolically represented by σ² (lowercase Greek sigma squared):
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$

Note: When using the above formula, do not round until the last computation. Use as many decimals as allowed by your calculator in order to avoid round-off errors.

EXAMPLE Computing a Population Variance
The following data represent the travel times (in minutes) to work for all seven employees of a start-up web development company.
23, 36, 23, 18, 5, 26, 43
Compute the population variance of this data. Recall that
$\mu = \frac{174}{7} \approx 24.85714$

x_i    μ           x_i – μ      (x_i – μ)²
23     24.85714    –1.85714     3.44898
36     24.85714    11.14286     124.1633
23     24.85714    –1.85714     3.44898
18     24.85714    –6.85714     47.02041
5      24.85714    –19.8571     394.3061
26     24.85714    1.142857     1.306122
43     24.85714    18.14286     329.1633
                   Σ(x_i – μ)² = 902.8571

$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{902.8571}{7} \approx 129.0$ minutes²

The Computational Formula
$\sigma^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{N}}{N}$

EXAMPLE Computing a Population Variance Using the Computational Formula
The following data represent the travel times (in minutes) to work for all seven employees of a start-up web development company.
23, 36, 23, 18, 5, 26, 43
Compute the population variance of this data using the computational formula.

Data set: 23, 36, 23, 18, 5, 26, 43
$\sum x_i^2 = 23^2 + 36^2 + \dots + 43^2 = 5228$
$\sum x_i = 23 + 36 + \dots + 43 = 174$
$\sigma^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{N}}{N} = \frac{5228 - \frac{174^2}{7}}{7} \approx 129.0$ minutes²

The sample variance is computed by determining the sum of squared deviations about the sample mean and then dividing this result by n – 1:
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Note: Whenever a statistic consistently overestimates or underestimates a parameter, it is called biased. To obtain an unbiased estimate of the population variance, we divide the sum of the squared deviations about the sample mean by n – 1.

EXAMPLE Computing a Sample Variance
Previously, we obtained the following simple random sample for the travel time data: 5, 36, 26. Compute the sample variance of the travel times.

Travel Time, x_i    Sample Mean, x̄    Deviation about the Mean, x_i – x̄    Squared Deviation, (x_i – x̄)²
5                   22.333             5 – 22.333 = –17.333                  (–17.333)² = 300.432889
36                  22.333             13.667                                186.786889
26                  22.333             3.667                                 13.446889
                                                                             Σ(x_i – x̄)² = 500.66667

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{500.66667}{3 - 1} = 250.333$ square minutes

Objective 3
• Compute the standard deviation of a variable from raw data

The population standard deviation is denoted by σ. It is obtained by taking the square root of the population variance, so that
$\sigma = \sqrt{\sigma^2}$
The sample standard deviation is denoted by s. It is obtained by taking the square root of the sample variance, so that
$s = \sqrt{s^2}$

EXAMPLE Computing a Population Standard Deviation
The following data represent the travel times (in minutes) to work for all seven employees of a start-up web development company.
23, 36, 23, 18, 5, 26, 43
Compute the population standard deviation of this data.
Recall from the last objective that σ² = 129.0 minutes². Therefore,
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{902.8571}{7}} \approx 11.4$ minutes

EXAMPLE Computing a Sample Standard Deviation
Recall that the sample data 5, 26, 36 result in a sample variance of
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{500.66667}{3 - 1} = 250.333$ square minutes
Use this result to determine the sample standard deviation.
$s = \sqrt{s^2} = \sqrt{\frac{500.666667}{3 - 1}} \approx 15.8$ minutes

EXAMPLE Comparing Standard Deviations
Determine the standard deviation of the waiting time for Wendy’s and for McDonald’s, using the two 30-customer wait-time samples listed earlier. Which is larger? Why?
Sample standard deviation for Wendy’s: 0.738 minutes
Sample standard deviation for McDonald’s: 1.265 minutes

Deviation, Variance, and Standard Deviation
Population Variance
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
The numerator, Σ(x – μ)², is the sum of squares, SSx.
Population Standard Deviation
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

Finding the Population Variance & Standard Deviation
In Words                                                              In Symbols
1. Find the mean of the population data set.                          $\mu = \frac{\sum x}{N}$
2. Find the deviation of each entry.                                  x – μ
3. Square each deviation.                                             (x – μ)²
4. Add to get the sum of squares.                                     SSx = Σ(x – μ)²
5. Divide by N to get the population variance.                        $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
6. Find the square root to get the population standard deviation.     $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

Example: Finding the Population Standard Deviation
A corporation hired 10 graduates. The starting salaries for each graduate are shown. Find the population variance and standard deviation of the starting salaries.
Starting salaries (1000s of dollars)
41 38 39 45 47 41 44 41 37 42
Recall μ = 41.5.
Solution: Finding the Population Standard Deviation
• Determine SSx.
• N = 10

Salary, x    Deviation: x – μ      Squares: (x – μ)²
41           41 – 41.5 = –0.5      (–0.5)² = 0.25
38           38 – 41.5 = –3.5      (–3.5)² = 12.25
39           39 – 41.5 = –2.5      (–2.5)² = 6.25
45           45 – 41.5 = 3.5       (3.5)² = 12.25
47           47 – 41.5 = 5.5       (5.5)² = 30.25
41           41 – 41.5 = –0.5      (–0.5)² = 0.25
44           44 – 41.5 = 2.5       (2.5)² = 6.25
41           41 – 41.5 = –0.5      (–0.5)² = 0.25
37           37 – 41.5 = –4.5      (–4.5)² = 20.25
42           42 – 41.5 = 0.5       (0.5)² = 0.25
             Σ(x – μ) = 0          SSx = 88.5

Solution: Finding the Population Standard Deviation
Population Variance
• $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{88.5}{10} = 8.85 \approx 8.9$
Population Standard Deviation
• $\sigma = \sqrt{\sigma^2} = \sqrt{8.85} \approx 3.0$
The population standard deviation is about 3.0, or $3000.

Deviation, Variance, and Standard Deviation
Sample Variance
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample Standard Deviation
$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Finding the Sample Variance & Standard Deviation
In Words                                                          In Symbols
1. Find the mean of the sample data set.                          $\bar{x} = \frac{\sum x}{n}$
2. Find the deviation of each entry.                              x – x̄
3. Square each deviation.                                         (x – x̄)²
4. Add to get the sum of squares.                                 SSx = Σ(x – x̄)²
5. Divide by n – 1 to get the sample variance.                    $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
6. Find the square root to get the sample standard deviation.     $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Example: Finding the Sample Standard Deviation
The starting salaries are for the Chicago branches of a corporation. The corporation has several other branches, and you plan to use the starting salaries of the Chicago branches to estimate the starting salaries for the larger population. Find the sample standard deviation of the starting salaries.
Starting salaries (1000s of dollars)
41 38 39 45 47 41 44 41 37 42

Solution: Finding the Sample Standard Deviation
• Determine SSx. Because the data are now treated as a sample, the deviations are taken about the sample mean, x̄ = 41.5.
• n = 10

Salary, x    Deviation: x – x̄      Squares: (x – x̄)²
41           41 – 41.5 = –0.5      (–0.5)² = 0.25
38           38 – 41.5 = –3.5      (–3.5)² = 12.25
39           39 – 41.5 = –2.5      (–2.5)² = 6.25
45           45 – 41.5 = 3.5       (3.5)² = 12.25
47           47 – 41.5 = 5.5       (5.5)² = 30.25
41           41 – 41.5 = –0.5      (–0.5)² = 0.25
44           44 – 41.5 = 2.5       (2.5)² = 6.25
41           41 – 41.5 = –0.5      (–0.5)² = 0.25
37           37 – 41.5 = –4.5      (–4.5)² = 20.25
42           42 – 41.5 = 0.5       (0.5)² = 0.25
             Σ(x – x̄) = 0          SSx = 88.5

Solution: Finding the Sample Standard Deviation
Sample Variance
• $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{88.5}{10 - 1} \approx 9.8$
Sample Standard Deviation
• $s = \sqrt{s^2} = \sqrt{\frac{88.5}{9}} \approx 3.1$
The sample standard deviation is about 3.1, or $3100.

Example: Using Technology to Find the Standard Deviation
Sample office rental rates (in dollars per square foot per year) for Miami’s central business district are shown in the table. Use a calculator or a computer to find the mean rental rate and the sample standard deviation. (Adapted from: Cushman & Wakefield Inc.)

Office Rental Rates
35.00 33.50 37.00 23.75 26.50 31.25
36.50 40.00 32.00 39.25 37.50 34.75
37.75 37.25 36.75 27.00 35.75 26.00
37.00 29.00 40.50 24.50 33.00 38.00

Solution: Using Technology to Find the Standard Deviation
[Calculator and software output showing the sample mean and the sample standard deviation; not reproduced.]

Interpreting Standard Deviation
• Standard deviation is a measure of the typical amount an entry deviates from the mean.
• The more the entries are spread out, the greater the standard deviation.

Estimating Standard Deviation
• Range Rule of Thumb: s ≈ R/4, where R is the range of the data.
• A quick estimation tool to determine whether a standard deviation calculation is approximately correct.

Interpreting Standard Deviation: Empirical Rule (68 – 95 – 99.7 Rule)
For data with a (symmetric) bell-shaped distribution, the standard deviation has the following characteristics:
• About 68% of the data lie within one standard deviation of the mean.
• About 95% of the data lie within two standard deviations of the mean.
• About 99.7% of the data lie within three standard deviations of the mean.

Interpreting Standard Deviation: Empirical Rule (68 – 95 – 99.7 Rule)
[Figure: bell-shaped curve with the horizontal axis marked at x̄ – 3s, x̄ – 2s, x̄ – s, x̄, x̄ + s, x̄ + 2s, x̄ + 3s. 68% of the data lie within 1 standard deviation, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. Area by band, on each side of the mean: 34% between x̄ and 1s, 13.5% between 1s and 2s, 2.35% between 2s and 3s.]

Standard Deviation and Area

Zσ         Percentage within CI    Percentage outside CI    Ratio outside CI
1σ         68.2689492%             31.7310508%              1 / 3.1514871
1.645σ     90%                     10%                      1 / 10
1.960σ     95%                     5%                       1 / 20
2σ         95.4499736%             4.5500264%               1 / 21.977894
2.576σ     99%                     1%                       1 / 100
3σ         99.7300204%             0.2699796%               1 / 370.398
3.2906σ    99.9%                   0.1%                     1 / 1000
4σ         99.993666%              0.006334%                1 / 15,788
5σ         99.9999426697%          0.0000573303%            1 / 1,744,278
6σ         99.9999998027%          0.0000001973%            1 / 506,800,000
7σ         99.9999999997440%       0.0000000002560%         1 / 390,600,000,000

Example: Using the Empirical Rule
In a survey conducted by the National Center for Health Statistics, the sample mean height of women in the United States (ages 20-29) was 64 inches, with a sample standard deviation of 2.71 inches. Estimate the percent of the women whose heights are between 64 inches and 69.42 inches.

Solution: Using the Empirical Rule
• Because the distribution is bell-shaped, you can use the Empirical Rule.

x̄ – 3s    x̄ – 2s    x̄ – s    x̄     x̄ + s    x̄ + 2s    x̄ + 3s
55.87     58.58     61.29    64    66.71    69.42     72.13

The interval from 64 to 69.42 inches runs from x̄ to x̄ + 2s, which covers the 34% band and the 13.5% band.
34% + 13.5% = 47.5% of women are between 64 and 69.42 inches tall.

Chebychev’s Theorem
• The portion of any data set lying within k standard deviations (k > 1) of the mean is at least
  $1 - \frac{1}{k^2}$
• k = 2: In any data set, at least $1 - \frac{1}{2^2} = \frac{3}{4}$, or 75%, of the data lie within 2 standard deviations of the mean.
• k = 3: In any data set, at least $1 - \frac{1}{3^2} = \frac{8}{9}$, or 88.9%, of the data lie within 3 standard deviations of the mean.

Example: Using Chebychev’s Theorem
The age distribution for Florida is shown in the histogram. [Histogram not reproduced.] Apply Chebychev’s Theorem to the data using k = 2.
What can you conclude?

Solution: Using Chebychev’s Theorem
k = 2:
μ – 2σ = 39.2 – 2(24.8) = –10.4 (use 0 since age can’t be negative)
μ + 2σ = 39.2 + 2(24.8) = 88.8
At least 75% of the population of Florida is between 0 and 88.8 years old.

Standard Deviation for Grouped Data
Sample standard deviation for a frequency distribution
• $s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$, where n = Σf (the number of entries in the data set)
• When a frequency distribution has classes, estimate the sample mean and standard deviation by using the midpoint of each class.

Example: Finding the Standard Deviation for Grouped Data
You collect a random sample of the number of children per household in a region. Find the sample mean and the sample standard deviation of the data set.

Number of Children in 50 Households
1 3 1 1 1 1 2 2 1 0
1 1 0 0 0 1 5 0 3 6
3 0 3 1 1 1 1 6 0 1
3 6 6 1 2 2 3 0 1 1
4 1 1 2 2 0 3 0 2 4

Solution: Finding the Standard Deviation for Grouped Data
• First construct a frequency distribution.
• Find the mean of the frequency distribution.

x          f           xf
0          10          0(10) = 0
1          19          1(19) = 19
2          7           2(7) = 14
3          7           3(7) = 21
4          2           4(2) = 8
5          1           5(1) = 5
6          4           6(4) = 24
           Σf = 50     Σ(xf) = 91

$\bar{x} = \frac{\sum xf}{n} = \frac{91}{50} \approx 1.8$
The sample mean is about 1.8 children.
• Apply the grouped-data formula. Using the unrounded mean 1.82, Σ(x – x̄)²f = 145.38, so
$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{145.38}{49}} \approx 1.7$
The sample standard deviation is about 1.7 children.

Coefficient of Variation
The coefficient of variation (or CV) for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean. It is a measure of the variability of the data: the higher the percentage, the more variable the data.
Sample CV = $\frac{s}{\bar{x}} \cdot 100\%$
Population CV = $\frac{\sigma}{\mu} \cdot 100\%$

Coefficient of Variation Example
• The average score in a calculus class is 110, with a standard deviation of 5; the average score in a statistics class is 106, with a standard deviation of 4. Which class is more variable in terms of scores?
• CVc = 5/110 ≈ 4.5%; CVs = 4/106 ≈ 3.8%
The calculus class is more variable.
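The coefficient of variation comparison above can be checked with a short Python sketch. The function name coefficient_of_variation is a hypothetical helper chosen for this illustration; it simply applies the "standard deviation divided by mean, times 100%" definition to the two classes.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation as a percentage.

    Hypothetical helper: assumes nonnegative data with a positive mean,
    as the definition in the text requires.
    """
    if mean <= 0:
        raise ValueError("mean must be positive for the CV to be meaningful")
    return std_dev / mean * 100.0


# Class statistics from the example: (standard deviation, mean).
cv_calculus = coefficient_of_variation(5, 110)
cv_statistics = coefficient_of_variation(4, 106)

print(round(cv_calculus, 1))    # 4.5
print(round(cv_statistics, 1))  # 3.8
print("calculus" if cv_calculus > cv_statistics else "statistics")  # calculus
```

Because the CV is unit-free, the same helper could compare data sets measured on different scales, which the standard deviation alone cannot do.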
Summary
• Determined the range of a data set
• Determined the variance and standard deviation of a population and of a sample
• Used the Empirical Rule and Chebychev’s Theorem to interpret standard deviation
• Approximated the sample standard deviation for grouped data

Homework
• Pt 1 – Pg. 84 – 86; # 1 – 21 odd
• Pt 2 – Pg. 86 – 91; # 23 – 49 odd

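As a numerical check on the worked examples in this section, the following Python sketch recomputes several of the results. It relies on the standard-library statistics module (pvariance and pstdev divide by N; variance and stdev divide by n – 1). The helper names chebychev_lower_bound and grouped_sample_sd are hypothetical, written only for this illustration.

```python
import math
import statistics

# Travel times (minutes) for all seven employees: treated as a population.
travel_times = [23, 36, 23, 18, 5, 26, 43]
pop_var = statistics.pvariance(travel_times)  # divides by N
pop_sd = statistics.pstdev(travel_times)

# Starting salaries ($1000s) for the Chicago branches: treated as a sample.
salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]
samp_var = statistics.variance(salaries)      # divides by n - 1
samp_sd = statistics.stdev(salaries)


def chebychev_lower_bound(k):
    """Minimum fraction of any data set within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


def grouped_sample_sd(values, freqs):
    """Sample standard deviation computed from a frequency distribution."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    sum_sq = sum((x - mean) ** 2 * f for x, f in zip(values, freqs))
    return math.sqrt(sum_sq / (n - 1))


# Number of children per household and the corresponding frequencies.
children = [0, 1, 2, 3, 4, 5, 6]
households = [10, 19, 7, 7, 2, 1, 4]
grouped_sd = grouped_sample_sd(children, households)

print(round(pop_var, 1), round(pop_sd, 1))    # 129.0 11.4
print(round(samp_var, 1), round(samp_sd, 1))  # 9.8 3.1
print(chebychev_lower_bound(2))               # 0.75
print(round(grouped_sd, 1))                   # 1.7
```

The only difference between the population and sample calls is the divisor, which is why the same salary data give about 3.0 when treated as a population and about 3.1 when treated as a sample.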