155S3.3_3 Measures of Variation
January 24, 2012
MAT 155
Dr. Claude Moore
Cape Fear Community College
Chapter 3
Statistics for Describing, Exploring, and Comparing Data
3-1 Review and Preview
3-2 Measures of Center
3-3 Measures of Variation
3-4 Measures of Relative Standing and Boxplots
Visit "Graphing Calculator Tutorial for Statistics" for specific instruction. http://media.pearsoncmg.com/aw/aw_mml_shared_1/gc_tutorial_stats/start.html
Key Concept
Discuss characteristics of variation, in particular measures of variation, such as the standard deviation, for analyzing data.
Make understanding and interpreting the standard deviation a priority.
Range
The range of a set of data values is the difference between the maximum data value and the minimum data value.
Range = (maximum data value) - (minimum data value)
The range is very sensitive to extreme values; therefore it is not as useful as other measures of variation.
Round-Off Rule for Measures of Variation
When rounding the value of a measure of variation, carry one more decimal place than is present in the original set of data.
Round only the final answer, not values in the middle of a calculation.
The standard deviation of a set of sample values, denoted by s, is a measure of variation of values about the mean. It is a type of average deviation of values from the mean that is calculated by using Formula 3-4 or 3-5. Formula 3-5 is just a different version of Formula 3-4; it is algebraically the same.
Formula 3-4: Sample Standard Deviation
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Formula 3-5: Shortcut formula for Sample Standard Deviation
$s = \sqrt{\dfrac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}}$
Standard Deviation - Important Properties
• The standard deviation is a measure of variation of all values from the mean.
• The value of the standard deviation s is usually positive. It is zero only when all of the data values are equal, and it is never negative.
• The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others).
• The units of the standard deviation s are the same as the units of the original data values.
Range Rule of Thumb The standard deviation, s, is approximately equal to the range divided by 4.
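As a rough check, the Python sketch below computes s with Formula 3-4 and with the shortcut Formula 3-5, then compares the result with the range rule of thumb. It assumes the usual textbook forms of both formulas; the function names are illustrative, and the data are the bachelor's-degree times from Exercise 117/17 in these notes.

```python
import math

def stdev_formula_3_4(values):
    """Sample standard deviation from squared deviations about the mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def stdev_formula_3_5(values):
    """Shortcut form: sqrt((n*sum(x^2) - (sum x)^2) / (n*(n - 1)))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

# Years to earn a bachelor's degree (Exercise 117/17).
years = [4, 4, 4, 4, 4, 4, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5,
         6, 6, 8, 9, 9, 13, 13, 15]

s_def = stdev_formula_3_4(years)
s_short = stdev_formula_3_5(years)
data_range = max(years) - min(years)
rule_of_thumb = data_range / 4  # range rule of thumb estimate of s

print(f"Formula 3-4: s = {s_def:.2f}")
print(f"Formula 3-5: s = {s_short:.2f}")
print(f"range / 4    = {rule_of_thumb:.2f}")
```

Both formulas give s ≈ 3.51, while the rule of thumb gives 2.75. The gap is a reminder that range / 4 is only a crude estimate, especially for skewed data such as these.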
Population Standard Deviation
$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
This formula is similar to Formula 3-4, but it uses the population mean μ and the population size N.
Comparing Variation in Different Samples
It’s a good practice to compare two sample standard deviations only when the sample means are approximately the same.
When comparing variation in samples with very different means, it is better to use the coefficient of variation, which is defined later in this section.
Rationale for using n – 1 versus n
There are only n – 1 independent values. With a given mean, only n – 1 values can be freely assigned any number before the last value is determined.
Dividing by n – 1 yields better results than dividing by n. It causes s² to target σ², whereas division by n causes s² to underestimate σ².
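The targeting claim can be checked exactly on a very small case. The sketch below assumes a hypothetical population {1, 2, 5} (not taken from these notes), lists every sample of size 2 drawn with replacement, and averages the sample variances computed with each divisor. Exact fractions are used so that no rounding hides the comparison.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population, chosen only for illustration.
population = [1, 2, 5]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

def variance(sample, divisor):
    """Sum of squared deviations about the sample mean, over `divisor`."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / divisor

n = 2
samples = list(product(population, repeat=n))  # all 9 samples, with replacement

avg_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)
avg_n = sum(variance(s, n) for s in samples) / len(samples)

print("population variance     :", sigma2)
print("average s^2 using n - 1 :", avg_n_minus_1)
print("average s^2 using n     :", avg_n)
```

For this population the average of the n – 1 variances equals σ² = 26/9, while the average of the divide-by-n variances is 13/9, which is too small.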
Chebyshev’s Theorem
The proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least 1 - 1/K², where K is any positive number greater than 1. For K = 2 and K = 3, we get the following statements:
• At least 3/4 (or 75%) of all values lie within 2 standard deviations of the mean.
• At least 8/9 (or 89%) of all values lie within 3 standard deviations of the mean.
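A short Python sketch of the Chebyshev bound, compared with the proportion actually observed in a data set. The function names are illustrative; the data are the monthly bankruptcy filings from Exercise 117/19 in these notes, and the sample standard deviation s is used as the measure of spread.

```python
import statistics

def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

def proportion_within(values, k):
    """Observed proportion of values within k sample standard deviations."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    inside = [x for x in values if abs(x - mean) <= k * s]
    return len(inside) / len(values)

# Monthly bankruptcy filings (Exercise 117/19).
filings = [59, 85, 98, 106, 120, 117, 97, 95, 143, 371, 14, 15]

for k in (2, 3):
    print(f"K = {k}: bound = {chebyshev_bound(k):.3f}, "
          f"observed = {proportion_within(filings, k):.3f}")
```

Even with the extreme value 371 in the list, 11 of the 12 values lie within 2 standard deviations of the mean, so the observed proportion stays above the guaranteed 75%.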
Empirical Rule
This rule states that for data sets having a distribution that is approximately bell-shaped, the following properties apply. (See Figure 3-3.)
• About 68% of all values fall within 1 standard deviation of the mean.
• About 95% of all values fall within 2 standard deviations of the mean.
• About 99.7% of all values fall within 3 standard deviations of the mean.
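To turn the empirical rule into concrete intervals, a minimal sketch follows. The mean of 100 and standard deviation of 15 are hypothetical values for an approximately bell-shaped data set, and the function name is illustrative.

```python
def empirical_rule_intervals(mean, s):
    """Intervals expected to hold about 68%, 95%, and 99.7% of values."""
    shares = {1: "68%", 2: "95%", 3: "99.7%"}
    return {share: (mean - k * s, mean + k * s) for k, share in shares.items()}

# Hypothetical bell-shaped data: mean 100, standard deviation 15.
for share, (low, high) in empirical_rule_intervals(100, 15).items():
    print(f"about {share} of values fall between {low} and {high}")
```

The intervals are only meaningful when the distribution is approximately bell-shaped; for other shapes, Chebyshev’s theorem gives the safer (weaker) guarantee.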
Find the (a) range, (b) variance, and (c) standard deviation for the given sample data. Then answer the given questions.
116/6. Tests of Child Booster Seats The National Highway Traffic Safety Administration conducted crash tests of child booster seats for cars. Listed below are results from those tests, with the measurements given in hic (standard head injury condition units). According to the safety requirement, the hic measurement should be less than 1000. Do the different child booster seats have much variation among their crash test measurements?
774 649 1210 546 431 612
The coefficient of variation (or CV) for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean, and is given by the following:
Sample: $CV = \dfrac{s}{\bar{x}} \cdot 100\%$
Population: $CV = \dfrac{\sigma}{\mu} \cdot 100\%$
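A minimal Python sketch of the sample coefficient of variation, taken here as (s / mean) × 100%. The function name is illustrative, and the data are the bank waiting times from Exercise 118/24 in these notes.

```python
import statistics

def coefficient_of_variation(sample):
    """Sample CV as a percent: (s / mean) * 100."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100

# Waiting times in minutes (Exercise 118/24).
jefferson = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]

cv_jefferson = coefficient_of_variation(jefferson)
cv_providence = coefficient_of_variation(providence)

print(f"Jefferson Valley: CV = {cv_jefferson:.2f}%")
print(f"Providence:       CV = {cv_providence:.2f}%")
```

The results, about 6.67% and 25.48%, agree with the values worked out for Exercise 118/24 in these notes.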
Find the (a) range, (b) variance, and (c) standard deviation for the given sample data. Then answer the given questions.
116/8. FICO Scores A simple random sample of FICO credit rating scores is listed below. As of this writing, the mean FICO score was reported to be 678. Based on these results, is a FICO score of 500 unusual? Why or why not?
714 751 664 789 818 779 698 836 753 834 693 802
Find the (a) range, (b) variance, and (c) standard deviation for the given sample data. Then answer the given questions.
117/17. Years to Earn Bachelor’s Degree Listed below are the lengths of time (in years) it took for a random sample of college students to earn bachelor’s degrees (based on data from the U.S. National Center for Education Statistics). Based on these results, is it unusual for someone to earn a bachelor’s degree in 12 years?
4 4 4 4 4 4 4.5 4.5 4.5 4.5 4.5 4.5 6 6 8 9 9 13 13 15
(a) range = 15 - 4 = 11
(b) variance ≈ (3.50563462)² ≈ 12.29
(c) standard deviation sx ≈ 3.50563462 ≈ 3.51
Find the (a) range, (b) variance, and (c) standard deviation for the given sample data. Then answer the given questions.
117/19. Bankruptcies Listed below are the numbers of bankruptcy filings in Dutchess County, New York State. The numbers are listed in order for each month of a recent year (based on data from the Poughkeepsie Journal). Identify any of the values that are unusual.
59 85 98 106 120 117 97 95 143 371 14 15
(a) range = 371 - 14 = 357
(b) variance ≈ (91.14424133)² ≈ 8307.3
(c) standard deviation sx ≈ 91.14424133 ≈ 91.1
118/22. In Exercises 21–24, find the coefficient of variation for each of the two sets of data, then compare the variation. (The same data were used in Section 3-2.) BMI for Miss America The trend of thinner Miss America winners has generated charges that the contest encourages unhealthy diet habits among young women. Listed below are body mass indexes (BMI) for Miss America winners from two different time periods.
BMI for 1920s & 1930s: 20.4 21.9 22.1 22.3 20.3 18.8 18.9 19.4 18.4 19.1
BMI for recent years: 19.5 20.3 19.6 20.2 17.8 17.9 19.1 18.8 17.6 16.8
118/24. In Exercises 21–24, find the coefficient of variation for each of the two sets of data, then compare the variation. (The same data were used in Section 3-2.) Customer Waiting Times Waiting times (in minutes) of customers at the Jefferson Valley Bank (where all customers enter a single waiting line) and the Bank of Providence (where customers wait in individual lines at three different teller windows) are listed below.
Jefferson Valley (single line): 6.5 6.6 6.7 6.8 7.1 7.3 7.4 7.7 7.7 7.7
Providence (individual lines): 4.2 5.4 5.8 6.2 6.7 7.7 7.7 8.5 9.3 10.0
Jefferson:
CV ≈ (0.4766783215 / 7.15)(100%) ≈ 6.67%
Providence:
CV ≈ (1.821629307 / 7.15)(100%) ≈ 25.48%
Based on the coefficients of variation, waiting times at Jefferson Valley (single line) are more consistent: there is less variation in waiting time at Jefferson Valley than at Providence.
119/29. Finding Standard Deviation from a Frequency Distribution. In Exercises 29 and 30, find the standard deviation of sample data summarized in a frequency distribution table by using the formula below, where x represents the class midpoint, f represents the class frequency, and n represents the total number of sample values. Also, compare the computed standard deviations to these standard deviations obtained by using Formula 3-4 with the original list of data values: (Exercise 29) 3.2 mg; (Exercise 30) 12.5 beats per minute.
$s = \sqrt{\dfrac{n \sum (f \cdot x^2) - \left[\sum (f \cdot x)\right]^2}{n(n - 1)}}$
See instructions for use of TI with frequency table http://cfcc.edu/faculty/cmoore/TI83-1-VarStatsFD.htm
119/30. Finding Standard Deviation from a Frequency Distribution. In Exercises 29 and 30, find the standard deviation of sample data summarized in a frequency distribution table by using the formula below, where x represents the class midpoint, f represents the class frequency, and n represents the total number of sample values. Also, compare the computed standard deviations to these standard deviations obtained by using Formula 3­4 with the original list of data values: (Exercise 29) 3.2 mg; (Exercise 30) 12.5 beats per minute.
We see that sx = 12.29707074 is just a little smaller than the standard deviation of 12.5 obtained from the actual data values.
Remember that calculations with a frequency distribution are approximations.
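The frequency-table calculation in Exercises 29 and 30 can be sketched in Python as follows. The sketch assumes the usual textbook form of the formula, s = sqrt((n·Σ(f·x²) − (Σ(f·x))²) / (n(n − 1))). The frequency table is hypothetical (the tables for Exercises 29 and 30 are not reproduced in these notes), and the function name is illustrative.

```python
import math
import statistics

def stdev_from_frequency_table(midpoints, frequencies):
    """s = sqrt((n*sum(f*x^2) - (sum(f*x))^2) / (n*(n - 1)))."""
    n = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, frequencies))
    return math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

# Hypothetical frequency table (not the table from Exercise 29 or 30).
midpoints = [24.5, 34.5, 44.5, 54.5]
frequencies = [3, 7, 6, 4]

s_grouped = stdev_from_frequency_table(midpoints, frequencies)

# Equivalent view: repeat each class midpoint f times and apply Formula 3-4.
expanded = [x for x, f in zip(midpoints, frequencies) for _ in range(f)]
s_expanded = statistics.stdev(expanded)

print(f"s from the frequency formula  = {s_grouped:.2f}")
print(f"s from the repeated midpoints = {s_expanded:.2f}")
```

The two results agree because the formula treats every value in a class as if it were equal to the class midpoint. That substitution is the reason a frequency-table result only approximates the standard deviation of the original data.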