Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Baby Leo’s 4-month “Healthy Baby” check-up reported the following: 1) He is in the 90th percentile for weight 2) He is in the 95th percentile for head circumference 3) He is in the 100th percentile for height. Interpret these scores. What do they mean? Slide 1 Section 3-5 Exploratory Data Analysis (EDA) Slide 2 Key Concept This section discusses outliers, then introduces a new statistical graph called a boxplot, which is helpful for visualizing the distribution of data. Slide 3 Definition Exploratory Data Analysis (EDA) the process of using statistical tools (such as graphs, measures of center, and measures of variation) to investigate data sets in order to understand their important characteristics Slide 4 Definition An outlier is a value that is located very far away from almost all of the other values. An extreme value that falls outside general pattern of data. Not all outliers are errors. An outlier can have a dramatic effect on the mean. An outlier can have a dramatic effect on the standard deviation. An outlier can have a dramatic effect on the scale of the histogram so that the true nature of the distribution is totally obscured. Slide 5 Definitions For a set of data, the 5-number summary consists of the minimum value; the first quartile Q1; the median (or second quartile Q2); the third quartile, Q3; and the maximum value. A boxplot ( or box-and-whisker-diagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3. (Calculator: with/without potential outlier, boxplot/modified boxplot) Slide 6 Boxplots (useful for revealing: center, spread, distribution, outliers) Slide 7 Modified Boxplots Some statistical packages provide modified boxplots which represent outliers as special points. A data value is an outlier if it is … above Q3 by an amount greater than 1.5 X IQR or below Q1 by an amount greater than 1.5 X IQR Use the outlier criterion on our dataset 1 2 3 4 8 165 To identify any outliers. Slide 8 Modified Boxplot Construction A modified boxplot is constructed with these specifications: A special symbol (such as an asterisk) is used to identify outliers. The solid horizontal line extends only as far as the minimum data value that is not an outlier and the maximum data value that is not an outlier. Slide 9 Modified Boxplots - Example Slide 10 Do male doctors perform more C-sections than female doctors? A study in Switzerland examined the number of C-sections performed in one year by a sample of male/female doctors. Male Dr. Data: 20 25 25 27 28 31 33 34 36 37 44 50 59 85 86 Min Q1 M Q3 Max Female dr. data: 5 7 10 14 Min Q1 18 19 25 M = 18.5 29 Q3 31 33 Max • Are there any outliers? • What do the modified boxplots tell you? • Give some overall observations. Slide 11 You do! Here are measured reaction times (in seconds) in a test of driving skills: 2.4 2.5 2.8 2.0 2.4 2.9 3.2 3.5 2.7 2.7 2.8 2.7 1) Find the five-number summary 2) Draw the modified boxplot. Are there any outliers? Slide 12 Boxplots - cont Slide 13 Boxplots - cont Slide 14 Statdisk Pulse rates Compare male and female 1) Center 2) Variation 3) 5 # summary 4) Are there any outliers? Slide 15 Recap In this section we have looked at: Exploratory Data Analysis Effects of outliers 5-number summary Boxplots and modified boxplots Slide 16