Statistics Workshop Tutorial 6
• Measures of Relative Standing
• Exploratory Data Analysis

Section 2-6 Measures of Relative Standing
Created by Tom Wegleitner, Centreville, Virginia
Copyright © 2004 Pearson Education, Inc.

Definition: z Score
A z score (or standard score) is the number of standard deviations that a given value x is above or below the mean.

Measures of Position: z score
Sample: $z = \frac{x - \bar{x}}{s}$
Population: $z = \frac{x - \mu}{\sigma}$
Round z to 2 decimal places.

Interpreting Z Scores
FIGURE 2-14
Whenever a value is less than the mean, its corresponding z score is negative.
Ordinary values: z score between -2 and 2 (within 2 standard deviations of the mean)
Unusual values: z score < -2 or z score > 2

Definition: Quartiles
Q1 (First Quartile) separates the bottom 25% of sorted values from the top 75%.
Q2 (Second Quartile) is the same as the median; it separates the bottom 50% of sorted values from the top 50%.
Q3 (Third Quartile) separates the bottom 75% of sorted values from the top 25%.

Quartiles
Q1, Q2, Q3 divide ranked scores into four equal parts:
(minimum) 25% Q1 25% Q2 (median) 25% Q3 25% (maximum)

Percentiles
Just as there are quartiles separating data into four parts, there are 99 percentiles denoted P1, P2, ..., P99, which partition the data into 100 groups.

Finding the Percentile of a Given Score
$\text{percentile of value } x = \frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$

From Percentile to Data Value
• What score is at the kth percentile?
• (1) Rank the data from lowest to highest
• (2) Find L (locator): L = (k/100) * n, where n is the number of values
  • a) If L is not a whole number, round up and find the score in that position
  • b) If L is a whole number, find the average of the scores in positions L and L+1

Some Other Statistics
Interquartile Range (or IQR): Q3 - Q1
Semi-interquartile Range: (Q3 - Q1) / 2
Midquartile: (Q3 + Q1) / 2
10 - 90 Percentile Range: P90 - P10

Section 2-7 Exploratory Data Analysis (EDA)

Definition
Exploratory Data Analysis is the process of using statistical tools (such as graphs, measures of center, and measures of variation) to investigate data sets in order to understand their important characteristics.

Outliers
• An outlier is a very high or very low value that stands apart from the rest of the data
• Outliers may come from data collection errors, data entry errors, or simply valid but unusual data values
• Always identify and examine outliers to determine if they are in error

Important Principles
• An outlier can have a dramatic effect on the mean
• An outlier can have a dramatic effect on the standard deviation
• An outlier can have a dramatic effect on the scale of the histogram, so that the true nature of the distribution is totally obscured

Definitions
For a set of data, the 5-number summary consists of the minimum value; the first quartile, Q1; the median (or second quartile, Q2); the third quartile, Q3; and the maximum value.
A boxplot (or box-and-whisker diagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3.

Boxplots
Figure 2-16
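The measures of relative standing above (z score, percentile of a value, the locator rule for the kth percentile, and the 5-number summary) can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function names and the sample list `scores` are hypothetical, k is assumed to be a whole number from 1 to 99, and the data are assumed to be non-empty.

```python
from statistics import mean, stdev


def z_score(x, data):
    """Sample z score: (x - mean) / s, rounded to 2 decimal places."""
    return round((x - mean(data)) / stdev(data), 2)


def percentile_of_value(x, data):
    """Percentile of x = (number of values less than x / total number) * 100."""
    below = sum(1 for v in data if v < x)
    return below / len(data) * 100


def value_at_percentile(k, data):
    """Score at the kth percentile using the locator L = (k/100) * n."""
    if not 1 <= k <= 99:
        raise ValueError("k must be a whole number from 1 to 99")
    ranked = sorted(data)                    # step 1: rank the data
    whole, rest = divmod(k * len(ranked), 100)
    if rest:                                 # L is not whole: round up
        return ranked[whole]                 # position whole+1 (1-based)
    # L is whole: average the scores in positions L and L+1 (1-based)
    return (ranked[whole - 1] + ranked[whole]) / 2


def five_number_summary(data):
    """Minimum, Q1, median (Q2), Q3, maximum."""
    ranked = sorted(data)
    return (ranked[0],
            value_at_percentile(25, ranked),
            value_at_percentile(50, ranked),
            value_at_percentile(75, ranked),
            ranked[-1])


if __name__ == "__main__":
    scores = [1, 3, 5, 7, 9, 11, 13, 15]     # hypothetical sample data
    print(z_score(15, scores))               # 1.43
    print(percentile_of_value(9, scores))    # 50.0
    print(value_at_percentile(90, scores))   # 15
    print(five_number_summary(scores))       # (1, 4.0, 8.0, 12.0, 15)
```

Statistical software often uses other interpolation rules for percentiles, so library results may differ slightly from the locator rule sketched here.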
Outliers
• A data point is considered an outlier if it lies more than 1.5 times the interquartile range above the 75th percentile or more than 1.5 times the interquartile range below the 25th percentile
• In other words, outliers are numbers outside the interval [Q1 - 1.5*IQR, Q3 + 1.5*IQR]

Box Plots and Histograms
• When looking at one variable, it's a good idea to look at the box plot and histogram together
• Box plots complement histograms by providing more specific information about the center, the quartiles, and outliers

Boxplots
Figure 2-17

Shape, Center and Spread
• What should you tell about a quantitative variable?
• Always report the shape, center and spread
• If the distribution is skewed, report the median and IQR
• If the distribution is symmetric, report the mean and standard deviation
• If there are any clear outliers and you are reporting the mean and the standard deviation, report them both with the outliers and without them

Now we are ready for Part 21 of Day 1
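The 1.5 * IQR rule for outliers, and the advice to report the mean and standard deviation with and without outliers, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and sample data are hypothetical, the quartiles are computed with the locator rule from the tutorial (other software may use a different quartile rule), and the data are assumed to be non-empty.

```python
from statistics import mean, stdev


def quartile(k, ranked):
    """kth percentile (k = 25 or 75 here) of already-ranked data,
    using the locator rule L = (k/100) * n."""
    whole, rest = divmod(k * len(ranked), 100)
    if rest:                                  # L not whole: round up
        return ranked[whole]
    return (ranked[whole - 1] + ranked[whole]) / 2


def find_outliers(data):
    """Return (outliers, (low, high)) for the interval
    [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    ranked = sorted(data)
    q1, q3 = quartile(25, ranked), quartile(75, ranked)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in ranked if v < low or v > high]
    return outliers, (low, high)


def mean_sd_with_and_without(data):
    """(mean, sd) with the outliers included, then with them removed."""
    outliers, _ = find_outliers(data)
    kept = [v for v in data if v not in outliers]
    return (mean(data), stdev(data)), (mean(kept), stdev(kept))


if __name__ == "__main__":
    values = [1, 3, 5, 7, 9, 11, 13, 40]      # hypothetical sample data
    print(find_outliers(values))              # ([40], (-8.0, 24.0))
    print(mean_sd_with_and_without(values))
```

In this hypothetical sample the single outlier moves the mean from 7 to 11.125, which illustrates why the tutorial recommends reporting both versions.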