Means & Medians
Chapter 4

Parameter
• Fixed value about a population
• Typically unknown

Statistic
• Value calculated from a sample

Measures of Central Tendency
• Median - the middle of the data; 50th percentile
  –Observations must be in numerical order
  –Is the single middle value if n is odd
  –Is the average of the middle two values if n is even
NOTE: n denotes the sample size

Measures of Central Tendency
• Mean - the arithmetic average
  –Use \(\mu\) to represent a population mean (a parameter)
  –Use \(\bar{x}\) to represent a sample mean (a statistic)
Formula: \(\bar{x} = \dfrac{\sum x}{n}\)
\(\Sigma\) is the capital Greek letter sigma – it means to sum the values that follow

Measures of Central Tendency
• Mode – the observation that occurs the most often
  –Can be more than one mode
  –If all values occur only once – there is no mode
  –Not used as often as mean & median

Suppose we are interested in the number of lollipops that are bought at a certain store. A sample of 5 customers buys the following number of lollipops. Find the median.
2 3 4 8 12
The numbers are in order & n is odd – so find the middle observation.
The median is 4 lollipops!

Suppose we have a sample of 6 customers that buy the following number of lollipops. The median is …
2 3 4 6 8 12
The numbers are in order & n is even – so find the middle two observations (4 and 6). Now, average these two values.
The median is 5 lollipops!

Suppose we have a sample of 6 customers that buy the following number of lollipops. Find the mean.
2 3 4 6 8 12
To find the mean number of lollipops, add the observations and divide by n.
\(\bar{x} = \dfrac{2 + 3 + 4 + 6 + 8 + 12}{6} \approx 5.833\)
Using the calculator . . .

What would happen to the median & mean if the 12 lollipops were 20?
2 3 4 6 8 20
The median is . . . 5
The mean is . . . \(\dfrac{2 + 3 + 4 + 6 + 8 + 20}{6} \approx 7.17\)
What happened?

What would happen to the median & mean if the 20 lollipops were 50?
2 3 4 6 8 50
The median is . . . 5
The mean is . . . \(\dfrac{2 + 3 + 4 + 6 + 8 + 50}{6} \approx 12.17\)
What happened?

Resistant
• Statistics that are not affected by outliers
• Is the median resistant?
• Is the mean resistant?
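The lollipop calculations above can be checked with a short sketch. It assumes Python's standard `statistics` module; the helper name `center` is hypothetical and only used for illustration.

```python
from statistics import mean, median


def center(values):
    """Return (median, mean rounded to 2 decimals) for a list of counts."""
    return median(values), round(mean(values), 2)


# Six-customer lollipop sample from the slides, then the same sample
# with the largest value changed to 20 and to 50.
original = [2, 3, 4, 6, 8, 12]
with_20 = [2, 3, 4, 6, 8, 20]
with_50 = [2, 3, 4, 6, 8, 50]

for sample in (original, with_20, with_50):
    med, avg = center(sample)
    print(f"{sample}: median = {med}, mean = {avg}")
```

Only the largest observation changes between the three samples, so the middle two values (4 and 6) stay the same while the sum in the mean's numerator grows.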
Median: YES. Mean: NO.
IMPORTANT:
• The median is resistant to outliers
• The mean is NOT resistant to outliers

Look at the following data set. Find the mean.
22 23 24 25 25 26 29 30
\(\bar{x} = 25.5\)
Now find how each observation deviates from the mean. This is the deviation from the mean: \(x - \bar{x}\)
What is the sum of the deviations from the mean?
\(\sum (x - \bar{x}) = 0\)
Will this sum always equal zero? YES, because \(\sum (x - \bar{x}) = \sum x - n\bar{x} = 0\).

Look at the following data set. Find the mean & median.
21 27 23 23 24 25 25 27 27 28 30 30 26 26 26 27 30 31 32 32
Create a histogram with the data (use an x-scale of 2). Then find the mean and median.
Mean = 27
Median = 27
Look at the placement of the mean and median in this symmetrical distribution.

Look at the following data set. Find the mean & median.
22 29 28 22 24 25 28 21 23 62 23 24 23 26 36 38 25
Create a histogram with the data (use an x-scale of 8). Then find the mean and median.
Mean = 28.176
Median = 25
Look at the placement of the mean and median in this right skewed distribution.

Look at the following data set. Find the mean & median.
21 46 54 47 53 60 55 55 56 63 64 58 58 58 58 62 60
Create a histogram with the data. Then find the mean and median.
Mean = 54.588
Median = 58
Look at the placement of the mean and median in this skewed left distribution.

Recap:
• In a symmetrical distribution, the mean and median are equal.
• In a skewed distribution, the mean is pulled in the direction of the skewness.
• In a symmetrical distribution, you should report the mean!
• In a skewed distribution, the median should be reported as the measure of center!

Example calculations
• During a two-week period 10 houses were sold in Fancytown.

House Price in Fancytown
x
231,000
313,000
299,000
312,000
285,000
317,000
294,000
297,000
315,000
287,000
\(\sum x = 2{,}950{,}000\)
\(\bar{x} = \dfrac{\sum x}{n} = \dfrac{2{,}950{,}000}{10} = 295{,}000\)

The “average” or mean price for this sample of 10 houses in Fancytown is $295,000

• During a two-week period 10 houses were sold in Lowtown.
House Price in Lowtown
x
97,000
93,000
110,000
121,000
113,000
95,000
100,000
122,000
99,000
2,000,000 (outlier)
\(\sum x = 2{,}950{,}000\)
\(\bar{x} = \dfrac{\sum x}{n} = \dfrac{2{,}950{,}000}{10} = 295{,}000\)

The “average” or mean price for this sample of 10 houses in Lowtown is $295,000

• Looking at the dotplots of the samples for Fancytown and Lowtown, we can see that the mean, $295,000, appears to accurately represent the “center” of the data for Fancytown, but it is not representative of the Lowtown data.
• Clearly, the mean can be greatly affected by the presence of even a single outlier.

[Figure: Dotplots for Fancytown and Lowtown. The price axis runs from 500,000 to 2,000,000, the mean of 295,000 is marked, and the 2,000,000 Lowtown sale is labeled as an outlier.]

1. In the previous example of the house prices in the sample of 10 houses from Lowtown, the mean was affected very strongly by the one house with the extremely high price.
2. The other 9 houses had selling prices around $100,000.
3. This illustrates that the mean can be very sensitive to a few extreme values.
SOOOO……

Describing the Center of a Data Set with the Median
The sample median is obtained by first ordering the n observations from smallest to largest (with any repeated values included, so that every sample observation appears in the ordered list). Then
\[
\text{sample median} =
\begin{cases}
\text{the single middle value} & \text{if } n \text{ is odd} \\
\text{the mean of the middle two values} & \text{if } n \text{ is even}
\end{cases}
\]

Example of Median Calculation
Consider the Fancytown data. First, we put the data in increasing numerical order to get
231,000 285,000 287,000 294,000 297,000 299,000 312,000 313,000 315,000 317,000
Since there are 10 (even) data values, the median is the mean of the two values in the middle.
\(\text{median} = \dfrac{297{,}000 + 299{,}000}{2} = 298{,}000\), that is, $298,000

Consider the Lowtown data. We put the data in increasing numerical order to get
93,000 95,000 97,000 99,000 100,000 110,000 113,000 121,000 122,000 2,000,000
Since there are 10 (even) data values, the median is the mean of the two values in the middle.
\(\text{median} = \dfrac{100{,}000 + 110{,}000}{2} = 105{,}000\), that is, $105,000

• Typically,
  1. when a distribution is skewed positively, the mean is larger than the median,
  2. when a distribution is skewed negatively, the mean is smaller than the median, and
  3. when a distribution is symmetric, the mean and the median are equal.

Trimmed mean:
Purpose is to remove outliers from a data set
To calculate a trimmed mean:
• Multiply the % to trim by n
• Truncate that many observations from BOTH ends of the distribution (when listed in order)
• Calculate the mean with the shortened data set

Find a 10% trimmed mean with the following data.
12 14 19 20 22 24 25 26 26 35
10%(10) = 1
So remove one observation from each side!
14 19 20 22 24 25 26 26
\(\text{trimmed mean} = \dfrac{14 + 19 + 20 + 22 + 24 + 25 + 26 + 26}{8} = \dfrac{176}{8} = 22\)

Example of Trimmed Mean
House Price in Fancytown
231,000
285,000
287,000
294,000
297,000
299,000
312,000
313,000
315,000
317,000
Sum of the eight middle values is 2,402,000. Divide this value by 8 to obtain the 10% trimmed mean.
\(\sum x = 2{,}950{,}000\)
\(\bar{x} = 295{,}000\)
median = 298,000
10% Trim Mean = 300,250

Example of Trimmed Mean
House Price in Lowtown
93,000
95,000
97,000
99,000
100,000
110,000
113,000
121,000
122,000
2,000,000
Sum of the eight middle values is 857,000. Divide this value by 8 to obtain the 10% trimmed mean.
\(\sum x = 2{,}950{,}000\)
\(\bar{x} = 295{,}000\)
median = 105,000
10% Trim Mean = 107,125
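
The three trimmed-mean steps can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `trimmed_mean` and its integer `percent` parameter are hypothetical, and rounding the trim count down when percent × n is not a whole number is an assumption, since the slides only show a case where it is exactly 1.

```python
from statistics import mean, median


def trimmed_mean(values, percent):
    """Mean after dropping (percent * n) // 100 observations from EACH end.

    Assumption: a fractional trim count is rounded down; the slides only
    cover the case where percent * n / 100 is a whole number.
    """
    ordered = sorted(values)          # trimming requires ordered data
    k = len(ordered) * percent // 100  # observations to drop per end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("percent is too large: nothing left to average")
    return mean(kept)


# Data sets taken from the slides.
small = [12, 14, 19, 20, 22, 24, 25, 26, 26, 35]
fancytown = [231_000, 313_000, 299_000, 312_000, 285_000,
             317_000, 294_000, 297_000, 315_000, 287_000]
lowtown = [97_000, 93_000, 110_000, 121_000, 113_000,
           95_000, 100_000, 122_000, 99_000, 2_000_000]

for name, data in (("small", small), ("Fancytown", fancytown), ("Lowtown", lowtown)):
    print(name, "mean:", mean(data), "median:", median(data),
          "10% trimmed mean:", trimmed_mean(data, 10))
```

For Lowtown, dropping one value from each end removes the 2,000,000 sale, so the trimmed mean lands near the median instead of near the untrimmed mean.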