PROB AND STATS
Measuring/Describing the Spread: Quartiles, Boxplots and the Five-Number Summary, Standard Deviation and Variance

READ
Page 42, the first two paragraphs, to understand where we are going with quartiles and why we use them to improve our description of spread.

QUARTILES - Q1 & Q3 (p. 42)
Find the median of your data.
Q1 is the median of the left half of the data.
Q3 is the median of the right half of the data.
Let's look at 3 sets of data.

EX 1.12 FINDING QUARTILES (p. 43)
Barry Bonds's home run counts (arranged in order) are given in the text.
There is an even number of observations, so the median lies midway between the middle pair.
The first quartile is the median of the 8 observations to the left of M.
The third quartile is the median of the 8 observations to the right of M.
What are M, Q1, and Q3?
Quartiles are resistant: it would not have mattered whether Bonds's record was 73 or 703.

EX 1.12 (CONTINUED)
Hank Aaron's data are given in increasing order.
What is M? Q1? Q3?
Be aware that computer programs may calculate quartiles a little differently.

CONSIDER THIS DATA SET
8, 17, 17, 18, 19, 20, 21, 22, 22, 22, 25, 26, 27, 28, 29, 50, 54
Find the median and quartiles.
M = 22
Q1 = 18.5
Q3 = 27.5

INTERQUARTILE RANGE (IQR) (p. 43)
IQR = Q3 - Q1
It is the distance, or range, between the quartiles.

DETERMINING OUTLIERS: THE INTERQUARTILE RANGE (IQR) (p. 44)
There is a simple rule we can use to decide whether an observation is an outlier using the IQR.
If an observation lies more than 1.5 x IQR above Q3 or more than 1.5 x IQR below Q1, we call it an outlier.

EX 1.13 DETERMINING OUTLIERS (p. 44)
Do you think that Barry Bonds's 73 home run season is an outlier? Let's check.
IQR = Q3 - Q1 = 41 - 25 = 16
Q3 + 1.5 x IQR = 41 + (1.5 x 16) = 65 (upper cutoff)
Q1 - 1.5 x IQR = 25 - (1.5 x 16) = 1 (lower cutoff)
Since 73 is above the upper cutoff, it is an outlier.
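The median-of-halves rule and the 1.5 x IQR check above can be sketched in Python. This is only an illustration: the function names `quartiles`, `outlier_cutoffs`, and `find_outliers` are made up for this sketch and do not come from the textbook or the calculator.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves rule from the slides."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # observations to the left of the median
    upper = data[half + n % 2:]   # observations to the right of the median
    return median(lower), median(data), median(upper)


def outlier_cutoffs(q1, q3):
    """Return (lower cutoff, upper cutoff) for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values):
    """Return the observations that fall outside the 1.5 x IQR cutoffs."""
    q1, _, q3 = quartiles(values)
    low, high = outlier_cutoffs(q1, q3)
    return [x for x in values if x < low or x > high]


# The 17-value data set from the slide.
data = [8, 17, 17, 18, 19, 20, 21, 22, 22, 22, 25, 26, 27, 28, 29, 50, 54]
print(quartiles(data))          # (18.5, 22, 27.5)
print(find_outliers(data))      # [50, 54]

# Bonds's quartiles from EX 1.13: Q1 = 25, Q3 = 41.
print(outlier_cutoffs(25, 41))  # (1.0, 65.0), so 73 is an outlier
```

Library routines such as `statistics.quantiles` or `numpy.percentile` interpolate between observations by default, so they may report slightly different quartiles than this rule.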
FIVE NUMBER SUMMARY (p. 44)
Minimum - smallest piece of data
Q1 - quartile 1
M - median
Q3 - quartile 3
Maximum - largest piece of data
Written as: Min Q1 M Q3 Max

GIVE THE FIVE NUMBER SUMMARY FOR

LET'S DRAW A BOX PLOT FROM THE 5 NUMBER SUMMARY

BOXPLOTS
Boxplots show less detail than histograms or stemplots, so they are best used for side-by-side comparison of more than one distribution.
They can be drawn vertically or horizontally.
Labels on a vertical boxplot, top to bottom: Max, Q3, Median, Q1, Min

LET'S READ THE BOX PLOT
Find the median first.
Next, the quartiles.
Then look at the spread.
In looking at the boxplot, do you feel that Bonds and Aaron were about equally consistent?

MODIFIED BOXPLOT
Same as a regular boxplot, but outliers are plotted individually.
The "whiskers" extend to the smallest and largest observations that are not outliers.

MODIFIED BOXPLOT
Five-number summary
         Min   Q1    M    Q3   Max
Bonds     16   25   34   41    73
Aaron     13   28   38   44    47
Outliers
Bonds:
IQR = Q3 - Q1 = 41 - 25 = 16
Q3 + 1.5 x IQR = 41 + (1.5 x 16) = 65 (upper cutoff)
Q1 - 1.5 x IQR = 25 - (1.5 x 16) = 1 (lower cutoff)
The maximum, 73, is above the upper cutoff, so it is an outlier.
Aaron: none

USE THE CALCULATOR TO DRAW A BOXPLOT
Follow along in your book on page 47.

STANDARD DEVIATION
s - one of the most important measures in stats.
Measures the spread by looking at how far the observations are from the mean.

VARIANCE (p. 49)
s² - the variance of a set of observations is the average of the squares of the deviations of the observations from their mean.

FORMULA FOR STANDARD DEVIATION
s² = [ (x₁ - x̄)² + (x₂ - x̄)² + ... + (xₙ - x̄)² ] / (n - 1)
s = √(s²)

DEGREES OF FREEDOM
Describe the number of values in the final calculation of a statistic that are free to vary.
Because the deviations from the mean always sum to 0, only n - 1 of them can vary freely.
D.O.F. = n - 1

PROPERTIES OF THE STANDARD DEVIATION
s measures the spread about the mean and should be used only when the mean is chosen as the measure of center.
s = 0 only when there is no spread.
s, like the mean, is not a resistant measure. Why would this be?
Strong skewness or a few outliers can make s very large.

LET'S WORK THIS PROBLEM BY HAND
Follow the table on page 50.
A study in Switzerland examined the number of hysterectomies (uterus removals) performed by doctors in one year. The data for 15 doctors are given below.
27 50 33 25 25 31 37 44 20 36 57 34 28

Observations   Deviations   Squared deviations
27
50
33
25
25
31
37
44
20
36
57
34
28

CHOOSING BETWEEN THE FIVE-NUMBER SUMMARY AND THE MEAN AND S.D.
We need some way to describe the center and spread of a distribution.
The five-number summary is better for skewed distributions or distributions with strong outliers.
x̄ and s are better for distributions that are reasonably symmetric.

ASSIGNMENT
Work on problems 1.36-1.43.
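The hand calculation of s (observations, deviations, squared deviations) and the five-number summary can be checked with a short Python sketch. It assumes only the 13 hysterectomy counts that actually appear above; the slide mentions 15 doctors, so the results here may not match the table on page 50. Variable names are illustrative.

```python
from math import isclose, sqrt
from statistics import median, stdev

# Assumption: only the 13 values shown on the slide are used.
counts = [27, 50, 33, 25, 25, 31, 37, 44, 20, 36, 57, 34, 28]

n = len(counts)
mean = sum(counts) / n

# One table row per observation: x, x - mean, (x - mean)^2.
deviations = [x - mean for x in counts]
squared = [d ** 2 for d in deviations]

variance = sum(squared) / (n - 1)   # divide by the degrees of freedom, n - 1
s = sqrt(variance)

print(f"{'x':>4} {'x - mean':>10} {'(x - mean)^2':>14}")
for x, d, sq in zip(counts, deviations, squared):
    print(f"{x:>4} {d:>10.2f} {sq:>14.2f}")
print(f"mean = {mean:.2f}, s^2 = {variance:.2f}, s = {s:.2f}")

# Cross-check against the standard library's sample standard deviation.
print("matches statistics.stdev:", isclose(s, stdev(counts)))

# Five-number summary using the median-of-halves rule for Q1 and Q3.
ordered = sorted(counts)
half = n // 2
five_number = (
    ordered[0],
    median(ordered[:half]),
    median(ordered),
    median(ordered[half + n % 2:]),
    ordered[-1],
)
print("Min Q1 M Q3 Max:", five_number)
```

The deviations should add up to 0 apart from rounding, which is a quick check on the hand-filled table.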