Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
CH 4 Summary Statistics: Measures of Location and Dispersion 4.1 Summation Notation The sum of values, x1 x2 xn , can be denoted as n x i 1 i . Example Select 4 students and ask “how many brothers and sisters do you have?” Data: 2,3,1,3 The data values would represent x values. x1 2 x2 3 x3 1 x4 3 4 Therefore, x i 1 2 3 1 3 9 i Or we can write x 9 4.2 The Mean, Median, and the Mode Measure of Central Tendency Description of Average (Typical Value) sample mean: X X (simple average) n where n is the sample size Example: number of siblings Data: 2,3,1,3 2 3 1 3 2.25 X 4 Suppose we had selected a 5th person for our sample which had 10 siblings. New Data: 2,3,1,3,10 X Note: X 2 3 1 3 10 3.8 5 is not necessarily a possible value and is sensitive to extreme scores sample median: ~ X (middle score) rank data from smallest to largest if n is odd, median is the middle score if n is even, median is the average of two middle scores half of the data will fall above the median and half below Example (number of siblings) Data: 2,3,1,3 1,2,3,3 ~ 23 X 2 2.5 New Data: 2,3,1,3,10 1,2,3,3,10 Note: ~ X 3 ~ X is not sensitive to extreme scores Therefore, the median is a better measure of central tendency if extreme scores exist. If extreme scores are unlikely, the mean varies less from sample to sample than the median and is a better measure. Suppose we were studying cost of a house or income. What method of central tendency should be used? What if Bill Gates ends up in our sample? sample mode: most frequent score Do example: Number of brothers and sisters for 4 people. 2,3,1,3 Mode = 3 for 5 people 2,3,1,3,10 Mode = 3 Note: does not always exist/can be more than one Unstable (What happens if a 3 is changed in our example) can be used with qualitative data Example Average hair color Data (black, brown, black, red, blonde, brown, brown) Midrange: Low High 2 Example (number of siblings) Data: 2,3,1,3 Low High 1 3 2 Midrange = 2 2 New Data: 2,3,1,3,10 Low High 1 10 5.5 Midrange = 2 2 Note: totally dependent on extreme scores. 4.3 Quartiles and Percentiles Quartiles - divide the data into four equally sized parts First Quartile, Q1: 25% of the data lies below Q1, 75% of the data lies above Q1 Second Quartile (median), Q2: 50% of the data lies below Q2, 50% of data lies above Q2 Third Quartile, Q3: 75% of the data lies below Q3, 25% of the data lies above Q3 1) 2) 3) 4) Procedure to Compute Quartiles Order the data from smallest to largest Find the median. This is the 2nd Quartile Q1 is the median of the lower half of the data; that is, it is the median of the data falling below Q2 (not including Q2) Q3 is the median of the upper half of the data (same as above) Note: there are a few different ways to calculate the quartiles. I will accept any valid method. Interquartile range (IQR) = Q3 – Q1 Middle 50% of the data Example Amount of money in pockets of the members of a meeting Students Faculty 1 10 3 15 8 15 5 43 6 28 5 20 10 25 0 73 0 31 7 24 5 28 stem and leaf displays Students Faculty 0 0013555678 0 1 0 1 055 2 2 04588 3 3 1 4 4 3 5 5 6 6 7 7 3 Students Q1 = 1 Q2 = 5 Q3 = 7 Faculty Q1 = 15 Q2 = 25 Q3 = 31 5 number summary: The low score, Q1, Q2, Q3, and the high score are known as the Five number summary of a data set. Example Students Min = 0 Q1 = 1 Q2 = 5 Q3 = 7 Max = 10 Faculty Min = 10 Q1 = 15 Q2 = 25 Q3 = 31 Max = 73 Example: exam scores for 40 students Low = 5.6 Q1 = 71.5 Q2 = 80 Q3 = 88.5 High = 100 Note: middle 50% of scores between 71.5 and 88.5 25% of scores above 88.5 How many data values fall within any two values? 10 Boxplots: 1) 2) 3) 4) Procedure Draw a scale to include the lowest and highest data value To the right of the scale draw a box from Q1 to Q3 Include a solid line through the box at the median level Draw solid lines, called whiskers, from Q1 to the lowest value and from Q3 to the highest value Example Students Min = 0 Q1 = 1 Q2 = 5 Q3 = 7 Max = 10 Faculty Min = 10 Q1 = 15 Q2 = 25 Q3 = 31 Max = 73 Boxplots 80 70 Students 60 50 40 30 20 10 0 Students Faculty Identifying shapes from Boxplots As seen in previous example this is bivariate data with One Qualitative and One Quantitative variable Example How does tread design affect an automobiles stopping distance? Tread Design (A, B, or C) Stopping Distance A 43 38 33 A B C 4.4 Measures of Dispersion Distribution #1 1 2 5 3 5555555 4 5 5 Distribution #2 1 5 2 55 3 555 4 55 5 5 Do these distributions look the same? Calculate all measures of central tendency for both distributions. Distribution #1 X ~ X = 35 =35 mode = 35 midrange =35 sample range: Distribution #2 X ~ X =35 =35 mode = 35 midrange = 35 high score - low score Example: Years of experience of faculty 1, 30, 22, 10, 5 sample range = 30-1 = 29 years Note: totally sensitive to extreme scores easy to compute sample variance: measures squared distances from S2 X X 2 n 1 X X 2 n X 2 n n 1 Note: large values of S 2 suggest large variability Example: Years of experience of faculty 1, 30, 22, 10, 5 X2 1 900 484 100 25_ 1510 X_ 1 30 22 10 5_ Total 68 S 2 nX 2 X nn 1 2 5(1510) (68) 2 7550 4624 2926 146.3 5(4) 20 20 sample standard deviation: S S2 Example: Years of experience of faculty 1, 30, 22, 10, 5 S S 2 146.3 12.095 Note: standard deviation uses the same units as the data 4.5 Empirical Rule and Standardized Scores Z-score: Gives the number of standard deviations an observation is above or below the mean. z xx s Example Test scores X = 79, s = 9 88 79 9 1 9 9 Your score is 1 standard deviation above the mean If your score is 88%, your z-score is z 61 79 18 2 9 9 Your score is 2 standard deviations below the mean If your score is 61%, your z-score is z Empirical rule: For mound shaped distributions Approximately 68% of the data fall within 1 standard deviation of the mean ( x s, x s ) Approximately 95% of the data fall within 2 standard deviations of the mean ( x 2 s, x 2 s ) Approximately 99.7% of the data fall within 3 standard deviations of the mean ( x 3s, x 3s) Example Suppose that the amount of liquid in “12 oz.” Pepsi cans is a mound shaped distribution with x 12 oz. and s = 0.1 oz. 68% of the data lies between what two values? (11.9 oz. , 12.1 oz.)