Section 3.2 - USC Upstate: Faculty
Section 3.2 Measures of Variation: Range, Standard Deviation, Variance

The Range
• The range is the difference between the largest and smallest values of a distribution.
• Example: Find the range of 10, 13, 17, 17, 18.
  The range = largest minus smallest = 18 - 10 = 8.

The Standard Deviation
The standard deviation is a measure of the average variation of the data entries from the mean.
Standard deviation of a sample:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
where n = sample size and x̄ = mean of the sample.

To calculate the standard deviation of a sample
• Calculate the mean of the sample.
• Find the difference between each entry (x) and the mean. These differences will add up to zero.
• Square the deviations from the mean.
• Sum the squares of the deviations from the mean.
• Divide the sum by (n - 1) to get the variance.
• Take the square root of the variance to get the standard deviation.
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

The Variance
The variance is the square of the standard deviation.
Variance of a sample:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Example
Find the standard deviation and variance.

x      x - x̄     (x - x̄)²
30      4          16
26      0           0
22     -4          16
Σx = 78   Sum = 0   Sum = 32

Mean = 78 / 3 = 26
The variance:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{32}{2} = 16$$
The standard deviation:
$$s = \sqrt{16} = 4$$

Example
Find the mean, the standard deviation and variance.

x     x - x̄     (x - x̄)²
4     -1          1
5      0          0
5      0          0
7      2          4
4     -1          1
Σx = 25         Sum = 6

Example cont.
Mean = 25 / 5 = 5
Standard deviation:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{6}{4}} = \sqrt{1.5} \approx 1.22$$
Variance:
$$s^2 = \frac{6}{4} = 1.5$$

Computation Formulas for Sample Variance and Standard Deviation
Sample variance:
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$
Sample standard deviation:
$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$
• To find Σx²: square the x values, then add.
• To find (Σx)²: sum the x values, then square.
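The six calculation steps and the computation formula above can be sketched in Python as follows. This is a minimal illustration, and the helper names (`data_range`, `sample_variance`, `sample_variance_computing`, `sample_std`) are hypothetical. It checks that the definitional formula and the computation formula agree on the data 4, 5, 5, 7, 4, and it compares the result with the standard library's `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

def data_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)

def sample_variance(data):
    """Definitional formula: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]  # these add up to zero (up to rounding)
    return sum(d * d for d in deviations) / (n - 1)

def sample_variance_computing(data):
    """Computation formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    sum_x = sum(data)                  # sum the x values (squared below)
    sum_x2 = sum(x * x for x in data)  # square the x values, then add
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

def sample_std(data):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(data))

data = [4, 5, 5, 7, 4]
print(data_range([10, 13, 17, 17, 18]))   # 8
print(sample_variance(data))               # 1.5
print(sample_variance_computing(data))     # 1.5
print(round(sample_std(data), 2))          # 1.22
lib_var = statistics.variance(data)        # library sample variance (divides by n - 1)
```

Both formulas give the same answer. The computation formula is convenient by hand because it needs only the running totals Σx and Σx², not each individual deviation.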
Use the computing formulas to find s and s²

x     x²
4     16
5     25
5     25
7     49
4     16
Σx = 25   Σx² = 131

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{131 - \frac{625}{5}}{5 - 1} = \frac{131 - 125}{4} = 1.5$$
$$s = \sqrt{1.5} \approx 1.22$$

Population Mean
$$\mu = \frac{\sum x}{N}$$
where N = number of data values in the population.

Population Standard Deviation
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
where N = number of data values in the population.

Coefficient of Variation
• The disadvantage of the standard deviation as a comparative measure of variation is that it depends on the units of measurement. This makes it difficult to use the standard deviation to compare measurements from different populations.
• For this reason, statisticians have defined the coefficient of variation, which expresses the standard deviation as a percentage of the sample or population mean.

Coefficient of Variation
• The coefficient of variation is a measure of the relative variability (or consistency) of data.
$$CV = \frac{s}{\bar{x}} \times 100 \quad \text{or} \quad CV = \frac{\sigma}{\mu} \times 100$$
• Notice that the numerator and denominator in the definition of CV have the same units, so CV itself has no units of measurement. This gives us the advantage of being able to compare the variability of two different populations directly using the coefficient of variation.

CV is used to compare variability or consistency
A sample of newborn infants had a mean weight of 6.2 pounds with a standard deviation of 1 pound. A sample of three-month-old children had a mean weight of 10.5 pounds with a standard deviation of 1.5 pounds. Which group (newborns or 3-month-olds) is more variable in weight?

To compare variability, compare the coefficients of variation
• For newborns: CV = (1 / 6.2) × 100 ≈ 16%
• For 3-month-olds: CV = (1.5 / 10.5) × 100 ≈ 14%
Higher CV: more variable. Lower CV: more consistent. The newborns are therefore more variable in weight, even though their standard deviation is smaller.

Use the Coefficient of Variation
• You may wish to compare two groups of data to answer:
  – Which is more consistent?
  – Which is more variable?

Example
A local fishing store sells spinners (a type of fishing lure). The store has only 8 different types of spinners for sale.
The prices (in dollars) are
2.10 1.95 2.60 2.00 1.85 2.25 2.15 2.25
Find the coefficient of variation.

Solution
a. Compute the mean and standard deviation of the population (all 8 spinner types).
   μ = $2.14 and σ = $0.22

Example cont.
b. Compute the CV of the prices and comment on the meaning of the result.
$$CV = \frac{\sigma}{\mu} \times 100 = \frac{0.22}{2.14} \times 100 \approx 0.1028 \times 100 = 10.28\%$$
The CV can be thought of as a measure of the spread of the data relative to the average of the data. Because the fishing store is very small, it carries a small selection of spinners that are all priced similarly. The CV tells us that the standard deviation of the spinner prices is only 10.28% of the mean.

Example
A large fishing store in Nebraska has a broad selection of spinners. The prices of a random sample of 10 spinners are
1.69 1.49 3.09 1.79 1.39 2.89 1.49 1.39 1.49 1.99
a. Use the calculator to compute x̄ and s.
   x̄ = $1.87 and s = $0.62
b. Compute the CV for the spinner prices.
$$CV = \frac{s}{\bar{x}} \times 100 = \frac{0.62}{1.87} \times 100 \approx 0.3316 \times 100 = 33.16\%$$

Example cont.
Compare the mean, standard deviation, and CV for the spinner prices at the two fishing stores. Comment on the differences.
The CV for the Nebraska store is about three times the CV from the previous example. There are two reasons. First, the store in the previous example is small and tends to have higher prices (larger mean: $2.14 versus $1.87). Second, it has a limited selection of spinners with less variation in price (σ = $0.22 versus s = $0.62).

Chebyshev's Theorem
The spread or dispersion of a set of data about the mean will be small if the standard deviation is small, and it will be large if the standard deviation is large. If we are dealing with a symmetrical bell-shaped distribution, then we can make very definite statements about the proportion of the data that must lie within a certain number of standard deviations on either side of the mean.
However, the concept of data spread about the mean can be expressed quite generally for all data distributions (skewed, symmetric, or other shape) by using the remarkable theorem of Chebyshev.

Chebyshev's Theorem
For any set of data and for any number k greater than 1, the proportion of the data that lies within k standard deviations of the mean is at least
$$1 - \frac{1}{k^2}$$

Results of Chebyshev's Theorem
• For k = 2: 1 - 1/2² = 1 - 1/4 = 0.75, so at least 75% of the data fall in the interval from μ - 2σ to μ + 2σ (within 2 standard deviations of the mean).
• For k = 3: at least 88.9% (within 3 standard deviations).
• For k = 4: at least 93.8% (within 4 standard deviations).

Using Chebyshev's Theorem
• A mathematics class completes an examination, and the class mean is 77 with a standard deviation of 6.
• According to Chebyshev's Theorem, between what two values would at least 75% of the grades lie?

Mean = 77, standard deviation = 6.
At least 75% of the grades would be in the interval:
x̄ - 2s to x̄ + 2s
77 - 2(6) to 77 + 2(6)
77 - 12 to 77 + 12
65 to 89

Assignment 5

Entering Data (Calc.)
Data is stored in lists on the calculator.
• Locate and press the STAT button on the calculator. Choose EDIT. The calculator will display the first three of six lists (columns) for entering data. Simply type your data and press ENTER. Use your arrow keys to move between lists.
• Data can also be entered from the home screen using set notation: {15, 22, 32, 31, 52, 41, 11} → L1 (where → is the STO key).
• Data can be entered in a second list based on the information in a previous list. For example, to double all of the data values in L1 and store them in L2, arrow up ONTO the name L2 and enter the formula for generating L2 (2 times L1). The formula will appear at the bottom of the screen. Press ENTER and the new list is created.

Clearing Data (Calc.)
• To clear all data from a list: Press STAT.
From the EDIT menu, move the cursor up ONTO the name of the list (L1). Press CLEAR. Move the cursor down. NOTE: The list entries will not disappear until the cursor is moved down. (Avoid pressing DEL, as it will delete the entire column. If this happens, you can reinstate the column by pressing STAT #5 SetUpEditor.)
• You may also clear a list by choosing option #4 under the EDIT menu, ClrList. ClrList will appear on the home screen waiting for you to enter which list to clear. Enter the name of a list by pressing the 2nd button and the yellow L1 (above the 1).
• To clear an individual entry: Select the value and press DEL.

Sorting Data (Calc.)
• Sorting data (helpful when finding the mode): Locate and press the STAT button. Choose option #2, SortA(. Specify the list you wish to sort by pressing the 2nd button and the yellow L1 list name. Press ENTER and the list will be put in ascending order (lowest to highest). SortD( will put the list in descending order.
• One-variable statistical calculations: Press the STAT button. Choose CALC at the top. Select 1-Var Stats. Notice that you are now on the home screen. Specify the list you wish to use by pressing the 2nd button and the list name. Press ENTER and view the calculations. Use the down arrow to view all of the information.

One Variable Statistical Calculations (Calc.)
x̄ = the mean
Σx = the sum of the data
Σx² = the sum of the squares of the data
Sx = the sample standard deviation
σx = the population standard deviation
n = the sample size (number of data values)
minX = the smallest data entry
Q1 = the data value at the first quartile
Med = the data value at the median (second quartile)
Q3 = the data value at the third quartile
maxX = the largest data entry

Measures of Dispersion (Calc.)
Range, Standard Deviation, Variance, Mean Absolute Deviation
• Problem: For the data set {10, 12, 40, 35, 14, 24, 13, 21, 42, 30}, find the range, the standard deviation, the variance, and the mean absolute deviation to the nearest hundredth.
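Before the calculator keystrokes, here is a cross-check of the four requested measures computed directly in Python. This is a minimal sketch that assumes, as the solution does, that the ten values form a complete population. The helper name `population_stats` is hypothetical. The standard library's `statistics.pstdev` is used only to confirm the hand-rolled population standard deviation.

```python
import math
import statistics

# The ten values from the problem, treated as a complete population
data = [10, 12, 40, 35, 14, 24, 13, 21, 42, 30]

def population_stats(values):
    """Return (range, population std dev, population variance, mean absolute deviation)."""
    n = len(values)
    mean = sum(values) / n
    spread = max(values) - min(values)
    pop_var = sum((x - mean) ** 2 for x in values) / n  # divide by N, not n - 1
    pop_std = math.sqrt(pop_var)
    mad = sum(abs(x - mean) for x in values) / n        # average distance from the mean
    return spread, pop_std, pop_var, mad

r, sd, var, mad = population_stats(data)
print(r, round(sd, 2), round(var, 2), round(mad, 2))  # 32 11.43 130.69 10.12

# Cross-check: the standard library's population standard deviation
lib_sd = statistics.pstdev(data)
```

The population variance printed here (130.69) is simply the square of the population standard deviation, which matches how the calculator solution obtains it.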
• A quick reminder before we begin the solution: In statistics, the population form is used when the data being analyzed includes the entire set of possible data. The sample form is used when the data is a random sample taken from the entire set of data. You should use the population form unless you know that you are working with a random sample of the data.

Measures of Dispersion cont. (Calc.)
• To find the range: Enter the data, as is, into L1. You can enter the list on the home screen and "store" it to L1, or you can go directly to L1 (STAT, #1 Edit).
• Sort the list to quickly retrieve the highest and lowest values for the range (STAT, #2 SortA). You can choose ascending or descending. Read the high and low values from L1 to compute the range: Range = 42 - 10 = 32.
• OR: To find the range without sorting, simply type on the home screen using the min and max functions found under MATH → NUM #6 min and #7 max.
Range = 32

Measures of Dispersion cont. (Calc.)
• To find the standard deviation: Because this question deals with the complete set, we will use the "population" form, not the sample form.
• Go to one-variable stats for the "population" standard deviation: STAT → CALC #1 1-Var Stats.
• NOTE! The standard deviations found in the CATALOG (stdDev) and under 2nd LIST → MATH #7 stdDev are both sample standard deviations.
• Population Standard Deviation = 11.43

Measures of Dispersion cont. (Calc.)
To find the variance: The "population" variance is the square of the population standard deviation. The symbol σx is under VARS → #5 Statistics.
NOTE! The variance found in the CATALOG and under 2nd LIST → MATH #8 variance are both sample variances.

To find the mean absolute deviation: To calculate the mean absolute deviation, you will have to enter the formula:
$$\text{Population MAD} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
Mean Absolute Deviation = 10.12

Measures of Dispersion cont. (Calc.)
NOTE! Be sure that you have run 1-Var Stats (STAT → CALC #1) first, so that the calculator will have computed x̄. Otherwise, you will get an error from this formula. x̄ and n are found under VARS → #5 Statistics. sum and abs are quickly found in the CATALOG. sum is also under 2nd LIST → MATH #5 sum, and abs is also under MATH → NUM #1 abs.
OR: To find the mean absolute deviation, a longer but workable solution uses the lists. As stated above, run 1-Var Stats so the calculator will compute x̄. Now go to L2 (STAT #1 EDIT) and move UP onto L2. At the bottom of the window, type the portion of the formula that finds the difference between each data entry and the mean, using absolute value to make these distances positive. Now find the mean, x̄, of L2 by using 1-Var Stats on L2, and read the answer of 10.12.

Measures of Dispersion on Grouped Data
Problem:

Data Entry   Frequency
100          8
150          15
200          21
250          14
300          5

For the data set shown in this table, find the range, the standard deviation, and the variance to the nearest hundredth. Because this question deals with the complete set, we will use the "population" form, not the sample form. For central tendency on grouped data, see Mean, Mode, Median with Grouped Data.

Measures of Dispersion on Grouped Data
Solution:
• To find the range: No calculator work is needed for the range, because it can be read from the table. Range = 300 - 100 = 200.
• To find the standard deviation: Remember, we are looking for the "population" form, which is found using 1-Var Stats. Enter the "Data Entry" values into L1 and the "Frequency" values into L2. Go to one-variable stats to find the "population" standard deviation: STAT → CALC #1 1-Var Stats. Be sure to use the parameters L1, L2 to indicate both the values AND their frequencies.
• NOTE! The standard deviations found in the CATALOG (stdDev) and under 2nd LIST → MATH #7 stdDev are both sample standard deviations.
Population Standard Deviation = 56.42

Measures of Dispersion on Grouped Data
To find the variance: The "population" variance is the square of the population standard deviation. The symbol σx is under VARS → #5 Statistics.
NOTE! The variance found in the CATALOG and under 2nd LIST → MATH #8 variance are both sample variances.
Population Variance = 3183.42
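The grouped-data calculation can be sketched in Python by weighting each data entry by its frequency. This mirrors running 1-Var Stats with L1 as the values and L2 as the frequencies. It is a minimal illustration that assumes the population form, as the problem specifies. The helper name `grouped_population_stats` is hypothetical. As a cross-check, the sketch expands the table into one value per occurrence and passes the result to the standard library's `statistics.pstdev`.

```python
import math
import statistics

# Grouped data from the table: (data entry, frequency)
table = [(100, 8), (150, 15), (200, 21), (250, 14), (300, 5)]

def grouped_population_stats(rows):
    """Population size, mean, variance and standard deviation for value/frequency pairs."""
    N = sum(f for _, f in rows)                               # total count of data values
    mean = sum(x * f for x, f in rows) / N                    # frequency-weighted mean
    variance = sum(f * (x - mean) ** 2 for x, f in rows) / N  # population form: divide by N
    return N, mean, variance, math.sqrt(variance)

N, mean, var, sd = grouped_population_stats(table)
spread = max(x for x, _ in table) - min(x for x, _ in table)
print(N, spread, round(sd, 2), round(var, 2))  # 63 200 56.42 3183.42

# Cross-check: repeat each value by its frequency, then use the library's population std dev
expanded = [x for x, f in table for _ in range(f)]
lib_sd = statistics.pstdev(expanded)
```

Weighting by frequency avoids building the expanded list at all. The expansion is used here only to confirm that both approaches describe the same 63 data values.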