Review
• Sections 2.5-2.7
• Qualitative Numerical Statistics
  – Outliers
  – Measures of variability
    • Range
    • Variance
    • Standard Deviation

Review
$R = \text{Max} - \text{Min}$
$\bar{x} = \frac{\sum x_i}{n}$
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
$s = \sqrt{s^2}$

Example – Using Standard Deviation
Here are eight test scores from a previous Stats 201 class: 35, 59, 70, 73, 75, 81, 84, 86. The mean and standard deviation are 70.4 and 16.7, respectively. Work out which data points are within
a) one standard deviation from the mean, i.e. $(\bar{x} - s, \bar{x} + s)$ = (70.4 - 16.7, 70.4 + 16.7) = (53.7, 87.1)
   Answer: 59, 70, 73, 75, 81, 84, 86
b) two standard deviations from the mean, i.e. $(\bar{x} - 2s, \bar{x} + 2s)$ = (70.4 - 2(16.7), 70.4 + 2(16.7)) = (37.0, 103.8)
   Answer: 59, 70, 73, 75, 81, 84, 86
c) three standard deviations from the mean, i.e. $(\bar{x} - 3s, \bar{x} + 3s)$ = (70.4 - 3(16.7), 70.4 + 3(16.7)) = (20.3, 120.5)
   Answer: 35, 59, 70, 73, 75, 81, 84, 86

Interpreting the Standard Deviation
Chebyshev's Theorem
The proportion (or fraction) of any data set lying within K standard deviations of the mean is always at least $1 - 1/K^2$, where K is any positive number greater than 1.
For K=2 we obtain: at least 3/4 (75%) of all scores will fall within 2 standard deviations of the mean, i.e. at least 75% of the data will fall between $\bar{x} - 2s$ and $\bar{x} + 2s$.
For K=3 we obtain: at least 8/9 (89%) of all scores will fall within 3 standard deviations of the mean, i.e. at least 89% of the data will fall between $\bar{x} - 3s$ and $\bar{x} + 3s$.

This Data is Symmetric, Bell Shaped (or Normal Data)
[Figure: relative frequency histogram (vertical axis: Relative Frequency, 0 to 0.5) with the mean $\bar{x}$ and the median M marked.]

The Empirical Rule
The Empirical Rule states that for bell shaped (normal) data, approximately:
68% of all data points are within 1 standard deviation of the mean
95% of all data points are within 2 standard deviations of the mean
99.7% of all data points are within 3 standard deviations of the mean

Z-Score
To calculate the number of standard deviations a particular point is away from the mean we use the following formula.
$z = \frac{x - \mu}{\sigma}$ or $z = \frac{x - \bar{x}}{s}$
The number we calculate is called the z-score of the measurement x.

Example – Z-score
Here are eight test scores from a previous Stats 201 class: 35, 59, 70, 73, 75, 81, 84, 86. The mean and standard deviation are 70.4 and 16.7, respectively.
a) Find the z-score of the data point 35. z = (35 - 70.4)/16.7 = -2.12
b) Find the z-score of the data point 73. z = (73 - 70.4)/16.7 = 0.16

Interpreting Z-scores
The further away the z-score is from zero, the more exceptional the original score.
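The interval counts and z-scores in the test-score examples can be checked with a short Python sketch. The function names here (`within_k_sd`, `z_score`, `chebyshev_bound`) are illustrative and not part of the course material. The sketch uses the unrounded mean and sample standard deviation, so the last digit can differ slightly from hand calculations based on 70.4 and 16.7.

```python
from statistics import mean, stdev

scores = [35, 59, 70, 73, 75, 81, 84, 86]


def within_k_sd(data, k):
    """Return the data points strictly inside (mean - k*s, mean + k*s)."""
    m, s = mean(data), stdev(data)  # stdev uses the n - 1 divisor
    return [x for x in data if m - k * s < x < m + k * s]


def z_score(x, data):
    """Number of sample standard deviations that x lies from the mean."""
    return (x - mean(data)) / stdev(data)


def chebyshev_bound(k):
    """Chebyshev's lower bound on the proportion within k standard deviations (k > 1)."""
    return 1 - 1 / k**2


for k in (1, 2, 3):
    inside = within_k_sd(scores, k)
    # Print k, the points inside the interval, and the observed proportion.
    print(k, inside, len(inside) / len(scores))

print(round(z_score(35, scores), 2), round(z_score(73, scores), 2))
```

For K = 2, the observed proportion 7/8 can be compared with `chebyshev_bound(2)`, which gives the guaranteed minimum of 0.75.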
Values of z less than -2 or greater than +2 can be considered exceptional or unusual ("a suspected outlier"). Values of z less than -3 or greater than +3 are often exceptional or unusual ("a highly suspected outlier").

Percentiles
Another method for detecting outliers involves percentiles.
The pth percentile is a number such that p% of the measurements fall below it and (100 - p)% fall above it.

How to Find Percentiles
1) Rank the n points of data from lowest to highest.
2) Pick a percentile ranking you want to find. Say p.
3) Compute $L = \frac{p}{100} \cdot n$
   – If L is a whole number, then the percentile is halfway between the Lth and (L+1)st number.
   – If L is not a whole number, then round up and take the number in that position.

Important Percentiles
Memorize:
The 25th percentile is called the lower quartile ($Q_L$)
The 75th percentile is called the upper quartile ($Q_U$)
The 50th percentile is called the median (M)
The interquartile range (IQR) is defined to be: IQR = $Q_U - Q_L$

Example - Fax
Here are the numbers of pages in each fax sent from our Math and Stats department since April 24th, in the order that they occurred.
5, 1, 2, 6, 10, 3, 6, 2, 2, 2, 2, 2, 2, 4, 5, 1, 13, 2, 5, 5, 1, 3, 6, 37, 2, 8, 2, 25
Find the 40th percentile, $Q_U$, $Q_L$, M and IQR.
Example - Fax
Find the 40th percentile.
1) Rank the n points of data from lowest to highest:
1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 8, 10, 13, 25, 37
2) Pick a percentile ranking you want to find: p = 40.
3) Compute $L = \frac{p}{100} \cdot n = \frac{40}{100} \cdot 28 = 11.2$
   L is not a whole number, so round up to 12 and take the 12th number.
Answer: 2

Example - Fax
1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 8, 10, 13, 25, 37
To compute $Q_U$, $Q_L$ and M: find the median, divide the data into two equal parts, and then find the medians of these parts. An example and more specific instructions will be done on the board.
M = 3, $Q_U$ = 6, $Q_L$ = 2, IQR = 6 - 2 = 4.
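The percentile procedure from the "How to Find Percentiles" slide can be sketched in Python as a check on the fax example. The function name `percentile` is illustrative, and the sketch assumes p is a whole number strictly between 0 and 100. It works with the integer product p*n rather than a floating-point L, so the "is L a whole number" test is exact.

```python
def percentile(data, p):
    """p-th percentile by the rank rule L = (p / 100) * n.

    Assumes p is a whole number with 0 < p < 100.
    """
    xs = sorted(data)              # step 1: rank the data
    n = len(xs)
    L, remainder = divmod(p * n, 100)
    if remainder == 0:
        # L is whole: halfway between the L-th and (L+1)-st numbers (1-based).
        return (xs[L - 1] + xs[L]) / 2
    # L is not whole: round up; the 1-based position L + 1 is index L.
    return xs[L]


fax = [5, 1, 2, 6, 10, 3, 6, 2, 2, 2, 2, 2, 2, 4,
       5, 1, 13, 2, 5, 5, 1, 3, 6, 37, 2, 8, 2, 25]

p40 = percentile(fax, 40)
q_l = percentile(fax, 25)
m = percentile(fax, 50)
q_u = percentile(fax, 75)
iqr = q_u - q_l
print(p40, q_l, m, q_u, iqr)
```

For this data set the rank rule and the "median of each half" method give the same quartiles. That is not guaranteed for every data set, so this function should be read as one convention among several.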
Percentiles
Sometimes the IQR is a better measure of variability than the standard deviation, since it depends only on the center 50% of the data. That is, it is not affected by outliers.
To use the IQR as a measure of variability, we need to find the Five Number Summary of the data and then construct a Box Plot.

Five Number Summary and Outliers
The Five Number Summary of a data set consists of five numbers:
– Min, $Q_L$, M, $Q_U$, Max
Suspected Outliers lie
– more than 1.5 IQRs but no more than 3 IQRs above the Upper Quartile
– more than 1.5 IQRs but no more than 3 IQRs below the Lower Quartile
Highly Suspected Outliers lie
– more than 3 IQRs above the Upper Quartile
– more than 3 IQRs below the Lower Quartile.

Five Number Summary and Outliers
The Inner Fences are:
– data between the Upper Quartile and 1.5 IQRs above the Upper Quartile and
– data between the Lower Quartile and 1.5 IQRs below the Lower Quartile
The Outer Fences are:
– data between 1.5 IQRs above the Upper Quartile and 3 IQRs above the Upper Quartile and
– data between 1.5 IQRs below the Lower Quartile and 3 IQRs below the Lower Quartile

Example - Fax
1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 8, 10, 13, 25, 37
Min = 1, $Q_L$ = 2, M = 3, $Q_U$ = 6, Max = 37. IQR = 6 - 2 = 4.
Inner Fence extremes: 2 - 1.5(4) = -4 and 6 + 1.5(4) = 12
Outer Fence extremes: 2 - 3(4) = -10 and 6 + 3(4) = 18
Suspected Outliers: 13
Highly Suspected Outliers: 25, 37

Definition: Boxplot
A boxplot is a graph of lines (from the lowest point inside the lower inner fence to the highest point inside the upper inner fence) and a box (from the Lower Quartile to the Upper Quartile), with the position of the median indicated.
[Figure: annotated boxplot. Labels: outliers (*), lowest data point above the lower inner fence, Lower Quartile, Median, Upper Quartile, highest data point below the upper inner fence.]

Homework
• Read Chapter 3
• Assignment 1 due Next Week
• Problems…

Problems
• Problems (z-score) 2.90, 2.102, 2.128
• Problems (percentiles) 2.125, 2.126

Example: Aptitude tests
Before being accepted into a manufacturing job, one must complete two aptitude tests. Your score on the tests will decide whether you will be in management or whether you will work on the factory floor. One test is a manual dexterity test, the other is a statistics test. The manual dexterity test (out of 10) has a mean of 6 and a standard deviation of 1. The statistics test (out of 50) has a mean of 25 with a standard deviation of 3. Your score is 7/10 on the manual dexterity test and 34/50 on the statistics test. In which test were you exceptional?

Example: Aptitude tests
The problem with comparing the two test scores stems from the fact that the tests are on two different scales. If we are going to do meaningful comparisons, then we must somehow standardize the scores.

Answer
Calculate the z-score for the two tests.
– Z-score of Man. Dex. = (7-6)/1 = 1
– Z-score of Stats. = (34-25)/3 = 3
The statistics score, three standard deviations above its mean, is the exceptional one.

2.90
a. $\bar{x} = 8.24$, $s^2 = 3.36$, $s = 1.83$
b. $\bar{x} \pm s$ = (6.41, 10.07). This contains 18 points.
   $\bar{x} \pm 2s$ = (4.58, 11.90). This contains 24 points.
   $\bar{x} \pm 3s$ = (2.75, 13.73). This contains 25 points.
c. The percentages are 72%, 96% and 100%. These are relatively close to the percentages of the Empirical Rule and higher than the minimums given by Chebyshev's Rule. They agree with both rules.
d. R = 12 - 5 = 7. This is not close to the value of s.

2.102
By Chebyshev's Rule, at least 8/9 or 89% of the trees fall within 3 standard deviations of the mean. Here that means at least 89% of the trees are between 21 and 39 feet ($30 \pm 3(3)$ feet). Hence there are at most (1/9)(5000 trees) ≈ 555 trees that are over 40 feet. Therefore the buyer should not buy.

2.128
a. $z = \frac{x - \mu}{\sigma} = \frac{175 - 79}{23} = 4.17$
b. Yes, this is an outlier since z > 3.
c. - A rare event
   - An unusual event happened that week:
     - Running a free clinic
     - Training students
   - A new family that requires LA moved nearby

2.125
a. 4
b. 6; 3
c. 3
d. Skewed to the right
e. 50%, 75%
f. 12, 13, 16

2.126
a. 85 100 121 142 145 157 158 159 161 163 165 166 170 171 171 172 172 173 184 187 196
Min = 85, Max = 196, $Q_U$ = 172, $Q_L$ = 151, M = 165, IQR = 21
Inner Fence Limits: 172 + 1.5(21) = 203.5 and 151 - 1.5(21) = 119.5
Outer Fence Limits: 172 + 3(21) = 235 and 151 - 3(21) = 88
Suspected Outlier: 100 (between the lower outer and lower inner fence limits)
Highly Suspected Outlier: 85 (below the lower outer fence limit)
[Figure: boxplot of the data on an axis from 80 to 200, with the suspected outlier and the highly suspected outlier each marked *.]
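The fence limits and outlier classifications in the fax example and in Problem 2.126 can be reproduced with a short Python sketch. The function names are illustrative. The quartiles are computed as the medians of the lower and upper halves of the sorted data, leaving out the middle value when n is odd. That convention is an assumption, chosen because it matches the quartiles reported in both examples.

```python
from statistics import median


def five_number_summary(data):
    """Return (Min, Q_L, M, Q_U, Max) using medians of the two halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]        # lower half
    upper = xs[n - half:]    # upper half; skips the middle value when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def classify_outliers(data):
    """Return (inner limits, outer limits, suspected, highly suspected)."""
    _, q_l, _, q_u, _ = five_number_summary(data)
    iqr = q_u - q_l
    inner = (q_l - 1.5 * iqr, q_u + 1.5 * iqr)
    outer = (q_l - 3 * iqr, q_u + 3 * iqr)
    # Suspected: beyond an inner limit but not beyond the outer limit.
    suspected = [x for x in data
                 if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]]
    # Highly suspected: beyond an outer limit.
    highly = [x for x in data if x < outer[0] or x > outer[1]]
    return inner, outer, suspected, highly


fax = [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,
       3, 4, 5, 5, 5, 5, 6, 6, 6, 8, 10, 13, 25, 37]
prob_2_126 = [85, 100, 121, 142, 145, 157, 158, 159, 161, 163, 165,
              166, 170, 171, 171, 172, 172, 173, 184, 187, 196]

for name, data in (("fax", fax), ("2.126", prob_2_126)):
    print(name, five_number_summary(data), classify_outliers(data))
```

A point that falls exactly on a limit is treated here as belonging to the less extreme group. The slides do not say how to handle ties, so that is also an assumption.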