Review: Central Measures
Mean, Median and Mode

When do we use mean or median?
• If there are outliers, use the median.
• If there are no outliers, use the mean.

Example:
• For the data 1, 1.2, 1.5, 1.7, 1.8, 1.9, 2.3, 2.5, 2.8, 3, which one is more appropriate?
• For the data 1, 1.2, 1.5, 1.7, 1.8, 1.9, 2.3, 2.5, 2.8, 3, 10, 40, which one is more appropriate?

Relationship between the central measures
[Figure: distribution curves with the positions of the mean, median and mode marked A, B and C]

Agresti/Franklin Statistics, 1 of 25

Section 2.4
Measuring Spread
How Can We Describe the Spread of Quantitative Data?

• Range: the difference between the largest and smallest observations.
• IQR = interquartile range = 3rd quartile − 1st quartile. The IQR is robust to outliers because each quartile is the median of one half of the data.
• Standard deviation

Identify the minimum and maximum sugar values:
a. 2 and 14
b. 1 and 3
c. 1 and 15
d. 0 and 16

[Figure: dot plot of the sugar values]
Mean = 8.8, median = 10, Q1 = 3, Q3 = 12
Min = 1, Max = 15, Range = 15 − 1 = 14

Standard Deviation
• Creates a measure of variation by summarizing the deviations of each observation from the mean and calculating an adjusted average of these deviations:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Sample Standard Deviation (Shortcut Formula)

$$s = \sqrt{\frac{n\left(\sum x^2\right) - \left(\sum x\right)^2}{n(n - 1)}} \qquad \text{(Formula 2-5)}$$

Example: Publix check-out waiting times in minutes
Data: 1, 4, 10. Find the sample mean and sample standard deviation.

n = 3

  x       x − x̄         (x − x̄)²
  1       1 − 5 = −4     16
  4       −1              1
  10      5              25
  Σx = 15                Σ(x − x̄)² = 42

$$\bar{x} = \frac{\sum x}{n} = \frac{15}{3} = 5.0 \text{ min}$$

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{42}{3 - 1}} = \sqrt{21} = 4.6 \text{ min}$$

Example: Publix check-out waiting times in minutes
Data: 1, 4, 10. Find the sample standard deviation using the shortcut formula.
Here Σx = 15 and Σx² = 1 + 16 + 100 = 117.

$$s = \sqrt{\frac{n\left(\sum x^2\right) - \left(\sum x\right)^2}{n(n - 1)}} = \sqrt{\frac{3(117) - (15)^2}{3(3 - 1)}} = \sqrt{\frac{351 - 225}{6}} = \sqrt{\frac{126}{6}} = \sqrt{21} = 4.6 \text{ min}$$

Standard Deviation Key Points
• The standard deviation is a measure of variation of all values from the mean.
• The value of the standard deviation s is usually positive and never negative (it is zero only when all data values are equal).
• The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others).
• The units of the standard deviation s are the same as the units of the original data values.

Empirical Rule
For bell-shaped data sets:
• Approximately 68% of the observations fall within 1 standard deviation of the mean.
• Approximately 95% of the observations fall within 2 standard deviations of the mean.
• Approximately 99.7% (nearly all) of the observations fall within 3 standard deviations of the mean.

Parameter and Statistic
• A parameter is a numerical summary of the population.
• A statistic is a numerical summary of a sample taken from a population.

Five summary statistics
• Minimum = 1
• 1st quartile = 3
• Median = 10
• 3rd quartile = 12
• Maximum = 15

Boxplot
[Figure: boxplot of SUGARg, vertical axis from 0 to 16, with min, Q1, Q2 = median, mean, Q3 and max marked]
• A box is constructed from Q1 to Q3.
• A line is drawn inside the box at the median.
• A line extends outward from the lower end of the box to the smallest observation that is not a potential outlier.
• A line extends outward from the upper end of the box to the largest observation that is not a potential outlier.

Comparison using boxplots
Example: Your company makes plastic pipes, and you are concerned about the consistency of their diameters. You measure ten pipes a week for three weeks. Create a boxplot to examine the distributions.
1 Open the worksheet PIPE.MTW.
2 Choose Graph > Boxplot or Stat > EDA > Boxplot.
3 Under Multiple Y's, choose Simple. Click OK.
4 In Graph Variables, enter 'Week 1' 'Week 2' 'Week 3'. Click OK.

Graph window output
[Figure: Boxplot of Week 1, Week 2, Week 3; vertical axis "Data", from 4 to 9]

[Figure: boxplot shapes for distributions that are skewed to the right, symmetric, and skewed to the left]

Interpreting the results
Tip: To see precise information for Q1, median, Q3, interquartile range, whiskers, and N, hover your cursor over any part of the boxplot.
The boxplot shows:
• Week 1 median is 4.985, and the interquartile range runs from 4.4525 (Q1) to 5.5575 (Q3).
• Week 2 median is 5.275, and the interquartile range runs from 5.08 (Q1) to 5.6775 (Q3). An outlier appears at 7.0.
• Week 3 median is 5.43, and the interquartile range runs from 4.99 (Q1) to 6.975 (Q3). The data are positively skewed.
Conclusion: The medians for the three weeks are similar. However, during Week 2, an abnormally wide pipe was created, and during Week 3, several abnormally wide pipes were created.
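The two standard deviation formulas from the Publix waiting-time example earlier in this section can be checked numerically. The sketch below is only an illustration in Python (the slides themselves use Minitab, not Python), and the function names are made up for this example. It computes s for the data 1, 4, 10 both from the definition and from the shortcut formula.

```python
import math

def sample_sd_definition(data):
    """Sample standard deviation from the definition:
    sqrt( sum((x - mean)^2) / (n - 1) )."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))

def sample_sd_shortcut(data):
    """Sample standard deviation from the shortcut formula:
    sqrt( (n * sum(x^2) - (sum(x))^2) / (n * (n - 1)) )."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

# Waiting times in minutes, taken from the slides
waiting_times = [1, 4, 10]

mean = sum(waiting_times) / len(waiting_times)
sd_def = sample_sd_definition(waiting_times)
sd_short = sample_sd_shortcut(waiting_times)

print(f"mean = {mean:.1f} min")
print(f"s (definition) = {sd_def:.1f} min")
print(f"s (shortcut)   = {sd_short:.1f} min")
```

Both functions give the same value, the square root of 21, which rounds to the 4.6 min shown in the worked example. The shortcut form needs only the running totals Σx and Σx², which is why it is convenient for hand calculation.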
Z-Score
• The z-score for an observation measures how far the observation is from the mean, in standard deviation units:

$$z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}$$

• An observation in a bell-shaped distribution is a potential outlier if its z-score is less than −3 or greater than +3.

Example: Converting to z-score
Scores on a test have a mean of 75 and a standard deviation of 10. Bob has a score of 90. Convert Bob's score to a z-score. Round to the nearest hundredth.
Bob's z-score = (90 − 75)/10 = 1.50, which means that Bob's score is 1.5 standard deviations higher than the mean.

Inverse problem
If Bob's score is 1.5 standard deviations higher than the mean, what is Bob's score in the previous problem?
Denote Bob's score by x. Then 1.5 = (x − 75)/10, so x = 1.5(10) + 75 = 90.

2.6 How are descriptive summaries misused? (read)
Figure 2.18, page 75

HW4:
• read section 3.2
• problems 2.57, 2.62, 2.63, 2.65, 2.67, 2.68, 2.69, 2.71, 2.72
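As a check on the z-score example and the inverse problem in this section, here is a minimal Python sketch. The function names are hypothetical and chosen only for this illustration. The ±3 cutoff is the potential-outlier rule stated above for bell-shaped distributions.

```python
def z_score(observation, mean, sd):
    """Distance of an observation from the mean, in standard deviation units."""
    return (observation - mean) / sd

def from_z_score(z, mean, sd):
    """Inverse problem: recover the observation from its z-score."""
    return z * sd + mean

def is_potential_outlier(z):
    """Rule from the slides for bell-shaped data: z < -3 or z > +3."""
    return z < -3 or z > 3

# Test scores: mean 75, standard deviation 10 (values from the slides)
test_mean, test_sd = 75, 10

bob_z = z_score(90, test_mean, test_sd)
bob_score = from_z_score(1.5, test_mean, test_sd)

print(f"Bob's z-score: {bob_z:.2f}")
print(f"Score that is 1.5 standard deviations above the mean: {bob_score:.0f}")
print(f"Is Bob's score a potential outlier? {is_potential_outlier(bob_z)}")
```

Bob's score of 90 has a z-score of 1.50, well inside the range from −3 to +3, so it is not flagged as a potential outlier.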