Daniel S. Yates
The Practice of Statistics, Third Edition
Chapter 1: Exploring Data
1.2 Describing Distributions with Numbers
Copyright © 2008 by W. H. Freeman & Company

Objectives for 1.2
Given a data set:
• How do you compute the mean, median, quartiles, and the five-number summary?
• How do you construct a boxplot using the five-number summary?
• How do you compute the interquartile range?
• How do you identify an outlier using the interquartile range rule?
• How do you compute the standard deviation and variance?

Measures of the Center of a Distribution

The Mean of a Data Set
• So far, we know several measures of central tendency of a set of numbers: mean, median, and mode.
• The mean is the arithmetic average of the data set.

The Mean of a Data Set: “Average Value”
• Σ (sigma) means to add up all the data values to get a total.
• Take the total and divide by the number of data values.

Example - Mean
• Joey’s first 14 quiz grades in a marking period were 86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93
• Find the mean.
• Answer: 85.
• Calculator: STAT, EDIT, enter the data in L1; then 2nd STAT, MATH, mean(L1), ENTER

The Median of a Data Set
• The median is the center of the data set.
• Half of the data set is above the median and half is below it: the median is the 50th percentile.
• The median may or may not be a value in the data set.

Calculation for Median: “Middle Value”

Example - Median
• Joey’s first 14 quiz grades in a marking period were 86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93
• Find the median.
• Answer: 85.
• Calculator: STAT, EDIT, enter the data in L1; then 2nd STAT, MATH, median(L1), ENTER

Terminology: “A measure is resistant”
• A resistant measure does not respond strongly to the influence of outliers (extreme observations).
• Furthermore, a resistant measure does not respond strongly to changes in a few observations.

Are the mean and median resistant?
Mean and Median Applet

Mean vs Median
• Mean is not a resistant measure.
– It is sensitive to the influence of a few extreme observations (outliers).
– It is sensitive to skewed distributions. The mean is pulled towards the tail.
• Median is resistant.
– It is resistant to extreme values and skewed distributions.
• For skewed distributions the median is the better measure of center.

Measures of Spread
• Range
• Quartiles
• Five-Number Summary
• The Standard Deviation

Range
• The difference between the largest value and the smallest value.
• Gives the full spread of the data.
• But it depends only on the two extreme values, so it is sensitive to outliers.

Quartiles
• We can describe the spread (variability) of a distribution by giving several percentiles (the pth percentile of a distribution).
• Typically we use the 25th, 50th, and 75th percentiles.
• These are Q1, the median, and Q3.

Example
• Joey’s first 14 quiz grades in a marking period were 86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93
• Find Q1, median, and Q3.
• Answer: Q1 = 78, Median = 85, Q3 = 91
• Calculator: STAT, CALC, 1-Var Stats L1, ENTER

Five-Number Summary
• Using the calculator, we again use 1-Var Stats.

Five-Number Summary: Computer Software Output

Graphical Display of the Five-Number Summary

Example - Boxplot
• Joey’s first 14 quiz grades in a marking period were 86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93
• Answer: 74, 78, 85, 91, 98 (minimum, Q1, median, Q3, maximum)
• Calculator: STAT PLOT, make the appropriate selections on the menu, then ZOOM, 9:ZoomStat

Interquartile Range

Identifying Outliers

Variance and Standard Deviation

Example – Variance and Standard Deviation
• Joey’s first 14 quiz grades in a marking period were 86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93
• Calculate the variance and standard deviation.

Score    Deviation from mean    Squared deviation
86       86 - 85 = 1            1
84       84 - 85 = -1           1
91       91 - 85 = 6            36
75       75 - 85 = -10          100
78       78 - 85 = -7           49
80       80 - 85 = -5           25
74       74 - 85 = -11          121
87       87 - 85 = 2            4
76       76 - 85 = -9           81
96       96 - 85 = 11           121
82       82 - 85 = -3           9
90       90 - 85 = 5            25
98       98 - 85 = 13           169
93       93 - 85 = 8            64
Total    1190                   806

Mean:
$$\bar{x} = \frac{\sum x_i}{n} = \frac{1190}{14} = 85$$

Variance:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{806}{13} = 62$$

Standard deviation:
$$s = \sqrt{62} \approx 7.874$$

• Calculator: STAT, EDIT, enter the data in L1, QUIT; then STAT, CALC, 1-Var Stats

Standard Deviation
• The standard deviation is zero when there is no spread.
• The standard deviation gets larger as the spread increases.

Impact of adding a constant to all data in the set?
• Joey’s first 14 quiz grades in a marking period were 86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93
• Add 32 points to each score, then store the result in L2.
• Compute 1-Var Stats. What has changed?
• Each value in the five-number summary has increased by 32, but the standard deviation has not changed.
• The measure of spread remains the same.

Impact of multiplying each data value in the set by a constant?
• Using the data set in L1, multiply each value by 2.
• Compute 1-Var Stats. What has changed?
• Each value in the five-number summary has been multiplied by 2, and the standard deviation has also been multiplied by 2.
• The measure of spread has increased.
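As a cross-check outside the calculator, the computations in this section can be sketched in Python. This is a minimal sketch and not part of the original slides. The function names five_number_summary and iqr_outliers are illustrative. Quartiles are assumed to be the medians of the lower and upper halves of the sorted data, which reproduces Q1 = 78 and Q3 = 91 for Joey's grades (other software may use a different quartile rule and give slightly different values). The outlier check assumes the usual 1.5 × IQR criterion.

```python
from statistics import mean, median, stdev, variance


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); assumes at least two values.

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; for an odd count the overall median belongs to neither half.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]


# Joey's 14 quiz grades from the examples.
quiz = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]

print("mean:", mean(quiz))
print("five-number summary:", five_number_summary(quiz))
print("outliers:", iqr_outliers(quiz))
print("variance:", variance(quiz))  # sample variance, divides by n - 1
print("standard deviation:", round(stdev(quiz), 3))

# Adding a constant shifts the center but leaves the spread unchanged.
shifted = [x + 32 for x in quiz]
print("shifted:", mean(shifted), round(stdev(shifted), 3))

# Multiplying by a constant scales both the center and the spread.
doubled = [2 * x for x in quiz]
print("doubled:", mean(doubled), round(stdev(doubled), 3))
```

For this data set the interquartile range is 91 - 78 = 13, so the fences under the assumed rule are 78 - 19.5 = 58.5 and 91 + 19.5 = 110.5. Every grade lies between them, and the sketch reports no outliers.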