Chapter 4
CENTER: Mean, Median
VARIABILITY: Standard Deviation, Interquartile Range

Content Objective: SWBAT determine mean, median, deviation, standard deviation, and interquartile range for a data set.

Language Objective: SWBAT explain why the median and IQR are better measures of center and spread than the mean and standard deviation when the data are skewed or outliers are present.

Data sets and graphs from Peck, Olsen, Devore

The sample mean of a numerical sample $x_1, x_2, x_3, \ldots, x_n$, denoted $\bar{x}$, is

\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x_i}{n} \]

The population mean, denoted by $\mu$, is the average of all x values in the entire population.

House Price in Lowtown

    x
       97,000
       93,000
      110,000
      121,000
      113,000
       95,000
      100,000
      122,000
       99,000
    2,000,000
    ---------
    $\sum x$ = 2,950,000

\[ \bar{x} = \frac{\sum x_i}{n} = \frac{2{,}950{,}000}{10} = 295{,}000 \]

The “average” or mean price for this sample of 10 houses in Lowtown is $295,000.

Outlier
In the sample of 10 houses from Lowtown, the mean was affected very strongly by the one house with the extremely high price. The other 9 houses had selling prices around $100,000. This illustrates that the mean can be very sensitive to a few extreme values.

The sample median is obtained by first ordering the n observations from smallest to largest (with any repeated values included, so that every sample observation appears in the ordered list). Then

\[ \text{sample median} = \begin{cases} \text{the single middle value} & \text{if } n \text{ is odd} \\ \text{the mean of the middle two values} & \text{if } n \text{ is even} \end{cases} \]

We put the data in increasing numerical order to get

    93,000   95,000   97,000   99,000   100,000   110,000   113,000   121,000   122,000   2,000,000

Since there are 10 (an even number of) data values, the median is the mean of the two values in the middle.

\[ \text{median} = \frac{100{,}000 + 110{,}000}{2} = \$105{,}000 \]

Comparing the Sample Mean & Sample Median:
The median splits the area under the distribution in half, and the mean is the point of balance.
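As a quick check of the Lowtown example, the following Python sketch recomputes the sample mean and sample median from the ten listed prices. It is only an illustration and not part of the original handout; the variable names (`prices`, `mean_price`, `median_price`) are made up here, and it relies on Python's standard `statistics` module.

```python
from statistics import mean, median

# Selling prices of the 10 sampled houses in Lowtown (from the table above).
prices = [97_000, 93_000, 110_000, 121_000, 113_000,
          95_000, 100_000, 122_000, 99_000, 2_000_000]

mean_price = mean(prices)      # sum of the values divided by n
median_price = median(prices)  # n = 10 is even: mean of the two middle values

# The single $2,000,000 outlier pulls the mean far above the typical price,
# while the median stays near the other nine houses.
print(f"mean   = {mean_price:,.0f}")
print(f"median = {median_price:,.0f}")
```

Running it reports a mean of 295,000 and a median of 105,000, matching the hand calculation.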
Typically, when a distribution is positively skewed, the mean is larger than the median; when a distribution is negatively skewed, the mean is smaller than the median; and when a distribution is symmetric, the mean and the median are equal.

RULES for finding median:
Find the count by using the formula: _____________________
Notice this works whether you have an odd or even number.

The simplest numerical measure of the variability of a numerical data set is the range, which is defined to be the difference between the largest and smallest data values.

    range = maximum - minimum

The n deviations from the sample mean are the differences:

\[ x_1 - \bar{x},\quad x_2 - \bar{x},\quad x_3 - \bar{x},\quad \ldots,\quad x_n - \bar{x} \]

The sum of all of the deviations from the sample mean will be equal to 0 (zero), except possibly for the effects of rounding the numbers. This means that the average deviation from the mean is always 0 (zero) and cannot be used as a measure of variability.

Ex. (Show this using post-it notes.) Time it took 9 student nurses to complete paperwork (in minutes). (Manipulate the times from all being 3's to different configurations with 1 deviation away, 2 deviations, etc.)

The sample variance, denoted $s^2$, is the sum of the squared deviations from the mean divided by $n - 1$.

\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]

The sample standard deviation, denoted $s$, is the positive square root of the sample variance.

\[ s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]

The population standard deviation is denoted by $\sigma$ (sigma) and the population variance is denoted by $\sigma^2$.

Ex. Time it took 9 student nurses to complete paperwork (in minutes). Find the variance and standard deviation of these times.

    x       x - x̄      (x - x̄)²
    1       _____      _____
    2       _____      _____
    2       _____      _____
    3       _____      _____
    3       _____      _____
    3       _____      _____
    4       _____      _____
    4       _____      _____
    5       _____      _____
    Σ =     Σ =        Σ =

Another measure of variability is the INTERQUARTILE RANGE.

10 Macintosh apples were randomly selected and weighed (in ounces). Determine the range, mean, variance, and standard deviation using the formulas.
    x         x - x̄      (x - x̄)²
     7.52      0.096     0.0092
     8.48      1.056     1.1151
     7.36     -0.064     0.0041
     6.24     -1.184     1.4019
     7.68      0.256     0.0655
     6.56     -0.864     0.7465
     6.40     -1.024     1.0486
     8.16      0.736     0.5417
     7.68      0.256     0.0655
     8.16      0.736     0.5417
    ------    -------    ------
    74.24      0.000     5.5398

IQR (Inter-Quartile Range) is a ____________________ measure of variability; it is generally NOT affected by ______________________________________ in a data set.

Quartiles divide data into 4 quarters.

To find quartiles:
1. Arrange data in ascending order.
2. Find the median value ________________________________
3. Divide data into lower and upper halves, excluding the median.
4. Find the median of the lower half __________________________________
5. Find the median of the upper half __________________________________

    1 4 5 7 9              1 4 5 7 9 10
    median =               median =
    Q1 =                   Q1 =
    Q3 =                   Q3 =

You can use the calculator to find the median and quartiles using 1-Var Stats (scroll down to the second page of results).

6. Subtract to find the interquartile range: IQR = Q3 - Q1

The IQR is the width of the ____________ ____________ of the data; it is not likely to be overly dependent on extreme values or outliers. Must always be _______________ or ______________. A ________________ (relative to the data values) represents a small amount of variability; a large IQR represents ________________________________. The IQR can be zero even if the data set has some variability if _____________________ ________________________________.

5-number summary: N (min – Q1 – med – Q3 – max)

Ex. 15 students with part-time jobs were randomly selected and the number of hours worked last week was recorded. Determine the median and IQR.

    19, 12, 14, 10, 12, 10, 25, 9, 8, 4, 2, 10, 7, 11, 15
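The hand calculations in this section can be checked with a short Python sketch. It is an illustration only and not part of the original handout: the helper names `sample_variance` and `quartiles` are hypothetical, and `quartiles` follows the steps listed above (split at the median, leaving the median itself out of both halves when n is odd), which may differ from the quartile rule used by other software.

```python
from math import sqrt
from statistics import median


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves steps above.

    Assumes at least 2 observations, so that both halves are non-empty.
    """
    ordered = sorted(data)        # step 1: ascending order
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]        # lower half
    upper = ordered[n - half:]    # upper half (middle value dropped if n is odd)
    return median(lower), median(ordered), median(upper)


# Weights (ounces) of the 10 Macintosh apples.
apples = [7.52, 8.48, 7.36, 6.24, 7.68, 6.56, 6.40, 8.16, 7.68, 8.16]
apple_range = max(apples) - min(apples)
apple_var = sample_variance(apples)
apple_sd = sqrt(apple_var)
print(f"range = {apple_range:.2f}, s^2 = {apple_var:.4f}, s = {apple_sd:.4f}")

# Hours worked last week by the 15 students with part-time jobs.
hours = [19, 12, 14, 10, 12, 10, 25, 9, 8, 4, 2, 10, 7, 11, 15]
q1, med, q3 = quartiles(hours)
iqr = q3 - q1
print(f"median = {med}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

For the apple data this gives a range of 2.24 and a sample variance of about 0.6155 (that is, 5.5398 / 9). For the hours data, the ordered list has 15 values, so the median is the 8th value (10), and the quartiles are the medians of the 7 values on either side of it (Q1 = 8, Q3 = 14), giving IQR = 6.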