Atkinson
Statistical Measures

The Median, the Mean and the Mode

Before you can begin to understand statistics, there are four terms you will need to fully understand. The first term, 'average', is something we have been familiar with from a very early age, when we start analyzing our marks on report cards. We add together all of our test results and then divide the sum by the number of results. We often call this the average. However, statistically it's the Mean!

The Mean

Example:
Four test results: 15, 18, 22, 20
The sum is: 75
Divide 75 by 4: 18.75
The 'Mean' (Average) is 18.75

The Median

The Median is the 'middle value' in your list. When the number of entries in the list is odd, the median is the middle entry after sorting the list into increasing order. When the number of entries is even, the median is the sum of the two middle numbers (after sorting the list into increasing order) divided by two. So remember to line up your values: the middle number is the median! Be sure to remember the odd and even rule.

Examples:
Find the Median of: 9, 3, 44, 17, 15 (odd amount of numbers)
Line up your numbers: 3, 9, 15, 17, 44 (smallest to largest)
The Median is: 15 (the number in the middle)

Find the Median of: 8, 3, 44, 17, 12, 6 (even amount of numbers)
Line up your numbers: 3, 6, 8, 12, 17, 44
Add the 2 middle numbers and divide by 2: 8 + 12 = 20, and 20 ÷ 2 = 10
The Median is 10.

The Mode

The mode of a list of numbers is the number (or numbers) that occur most frequently. A trick to remember this one is that mode starts with the same first two letters as most. Most frequently: Mode. You'll never forget that one!
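The Mean and Median examples above can be checked with a short Python sketch. The helper name median_by_rule is hypothetical and introduced only for illustration; it follows the odd and even rule as described, and Python's standard statistics module is used as a cross-check.

```python
from statistics import mean, median


def median_by_rule(values):
    """Median using the odd/even rule described in the text."""
    ordered = sorted(values)          # line up the values, smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of entries: take the single middle entry.
        return ordered[middle]
    # Even number of entries: average the two middle entries.
    return (ordered[middle - 1] + ordered[middle]) / 2


# Data taken from the worked examples in the text.
test_results = [15, 18, 22, 20]
odd_list = [9, 3, 44, 17, 15]
even_list = [8, 3, 44, 17, 12, 6]

mean_result = sum(test_results) / len(test_results)   # 75 / 4
odd_median = median_by_rule(odd_list)
even_median = median_by_rule(even_list)

print("Mean:", mean_result)
print("Median (odd list):", odd_median)
print("Median (even list):", even_median)

# Cross-check against the standard library.
print(mean(test_results), median(odd_list), median(even_list))
```

The hand-written rule and the library functions should agree on all three examples.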
Examples of the Mode:
Find the mode of: 9, 3, 3, 44, 17, 17, 44, 15, 15, 15, 27, 40, 8
Put the numbers in order for ease: 3, 3, 8, 9, 15, 15, 15, 17, 17, 27, 40, 44, 44
The Mode is 15 (15 occurs the most, at 3 times)

*It is important to note that there can be more than one mode, and if no number occurs more than once in the set, then there is no mode for that set of numbers.

Range

Occasionally in Statistics you'll be asked for the 'range' of a set of numbers. The range is simply the smallest number subtracted from the largest number in your set. Thus, if your set is 9, 3, 44, 15, 6, the range would be 44 - 3 = 41. Your range is 41.

Standard Deviation

The Standard Deviation (σ) is a measure of how spread out numbers are. The formula is easy: it is the square root of the Variance. So now you ask, "What is the Variance?"

Variance

The Variance (which is the square of the standard deviation, i.e. σ²) is defined as the average of the squared differences from the Mean. In other words, follow these steps:

1. Work out the Mean (the simple average of the numbers).
2. For each number, subtract the Mean and then square the result (the squared difference).
3. Work out the average of those squared differences. (Why square? See the note further on.)

Example

You and your friends have just measured the heights of your dogs (in millimeters). The heights (at the shoulders) are: 600 mm, 470 mm, 170 mm, 430 mm and 300 mm. Find the Mean, the Variance, and the Standard Deviation.

Answer:

Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394

so the average height is 394 mm.

[Figure: the dogs' heights with the Mean marked]

Now we calculate each dog's difference from the Mean: 206, 76, -224, 36 and -94.

To calculate the Variance, take each difference, square it, and then average the result:

Variance: σ² = (206² + 76² + (-224)² + 36² + (-94)²) / 5 = 108,520 / 5 = 21,704

So, the Variance is 21,704.

And the Standard Deviation is just the square root of the Variance, so:

Standard Deviation: σ = √21,704 ≈ 147

And the good thing about the Standard Deviation is that it is useful.
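The three Variance steps can be sketched in Python using the dog heights from the example. The function name population_variance is an illustrative choice, not something from the original text. It divides by the number of values, as the text does, so it should match what the standard library calls the population variance (statistics.pvariance).

```python
import math
from statistics import pstdev, pvariance


def population_variance(values):
    """Average of the squared differences from the Mean (divides by N)."""
    m = sum(values) / len(values)                      # step 1: the Mean
    squared_diffs = [(v - m) ** 2 for v in values]     # step 2: squared differences
    return sum(squared_diffs) / len(squared_diffs)     # step 3: their average


# Dog heights in millimeters, from the worked example.
heights_mm = [600, 470, 170, 430, 300]

mean_height = sum(heights_mm) / len(heights_mm)
variance = population_variance(heights_mm)
std_dev = math.sqrt(variance)

print("Mean:", mean_height)
print("Variance:", variance)
print("Standard Deviation:", round(std_dev))

# Cross-check against the standard library's population versions.
print(pvariance(heights_mm), pstdev(heights_mm))
```

Note that statistics.variance and statistics.stdev divide by N - 1 instead and would give different numbers; the text's definition corresponds to the "p" (population) versions.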
Now we can show which heights are within one Standard Deviation (147 mm) of the Mean, that is, between 247 mm and 541 mm:

[Figure: the dogs' heights with the band within one Standard Deviation of the Mean marked]

So, using the Standard Deviation we have a "standard" way of knowing what is normal, and what is extra large or extra small. Rottweilers are tall dogs. And Dachshunds are a bit short ... but don't tell them!

*Note: Why square? Squaring each difference makes them all positive numbers (to avoid negatives reducing the Variance). It also makes the bigger differences stand out. For example, 100² = 10,000 is a lot bigger than 50² = 2,500. But squaring them makes the final answer really big, and so un-squaring the Variance (by taking the square root) makes the Standard Deviation a much more useful number.

Histogram

Definition of Histogram

A histogram is a bar graph that shows how frequently data occur within certain ranges or intervals. The height of each bar gives the frequency in the respective interval.

Examples of Histogram

The example histogram gives the number of children who visited a particular zoo.

[Figure: histogram of the number of children who visited a particular zoo]

The Normal Curve

The normal distributions are a very important class of statistical distributions. All normal distributions are symmetric and have bell-shaped density curves with a single peak.

To speak specifically of any normal distribution, two quantities have to be specified: the mean μ, where the peak of the density occurs, and the standard deviation σ, which indicates the spread or girth of the bell curve. (The Greek symbol μ is pronounced mu and the Greek symbol σ is pronounced sig-ma.) Different values of μ and σ yield different normal density curves and hence different normal distributions.

The normal density can actually be specified by means of an equation. The height of the density at any value x is given by

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

Although there are many normal curves, they all share an important property that allows us to treat them in a uniform fashion.
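As a rough illustration, the standard normal density formula, with height 1/(σ√(2π)) · exp(-(x - μ)² / (2σ²)) at the value x, can be evaluated directly. The function name normal_density and the sample values of μ and σ below are assumptions chosen for this sketch; Python's statistics.NormalDist serves as a cross-check.

```python
import math
from statistics import NormalDist


def normal_density(x, mu, sigma):
    """Height of the normal density curve at x for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Illustrative parameters (assumed, not from the text): mean 0, standard deviation 1.
mu, sigma = 0.0, 1.0

peak = normal_density(mu, mu, sigma)            # the peak occurs at the mean
left = normal_density(mu - 1.0, mu, sigma)      # one unit below the mean
right = normal_density(mu + 1.0, mu, sigma)     # one unit above the mean

print("Height at the mean:", peak)
print("Heights one unit either side:", left, right)

# Cross-check against the standard library's implementation.
print(NormalDist(mu, sigma).pdf(mu))
```

The equal heights on either side of the mean reflect the symmetry of the curve, and a larger σ lowers and widens the peak.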
The 68-95-99.7% Rule

All normal density curves satisfy the following property, which is often referred to as the Empirical Rule.

68% of the observations fall within 1 standard deviation of the mean, that is, between μ - σ and μ + σ.
95% of the observations fall within 2 standard deviations of the mean, that is, between μ - 2σ and μ + 2σ.
99.7% of the observations fall within 3 standard deviations of the mean, that is, between μ - 3σ and μ + 3σ.

Thus, for a normal distribution, almost all values lie within 3 standard deviations of the mean.

Remember that the rule applies to all normal distributions. Also remember that it applies only to normal distributions.

An Example

Let us apply the Empirical Rule to Example 1.17 from Moore and McCabe. The distribution of heights of American women aged 18 to 24 is approximately normal with mean 65.5 inches and standard deviation 2.5 inches. From the above rule, it follows that

68% of these American women have heights between 65.5 - 2.5 and 65.5 + 2.5 inches, or between 63 and 68 inches,
95% of these American women have heights between 65.5 - 2(2.5) and 65.5 + 2(2.5) inches, or between 60.5 and 70.5 inches.

Therefore, the tallest 2.5% of these women are taller than 70.5 inches. (The extreme 5% fall more than two standard deviations, or 5 inches, from the mean. And since all normal distributions are symmetric about their mean, half of these women are on the tall side.)

Almost all young American women are between 58 and 73 inches in height if you use the 99.7% calculation (65.5 - 3(2.5) and 65.5 + 3(2.5) inches).

Pearson's Correlation Coefficient

The correlation coefficient ρ(X, Y) between two random variables X and Y with expected values μX and μY and standard deviations σX and σY is defined as:

$$ \rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y} $$

where E is the expected value operator and cov means covariance.
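A widely used alternative notation is

$$ \operatorname{corr}(X,Y) = \rho_{X,Y} $$

Since μX = E(X), σX² = E[(X - E(X))²] = E(X²) - E²(X), and likewise for Y, we may also write

$$ \rho_{X,Y} = \frac{E(XY) - E(X)\,E(Y)}{\sqrt{E(X^2) - E^2(X)}\;\sqrt{E(Y^2) - E^2(Y)}} $$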
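A minimal Python sketch can compare the two ways of writing the correlation coefficient: covariance divided by the product of the standard deviations, and the expanded form using E(XY), E(X²) and E(Y²). The data below are hypothetical, and each expected value is taken as a plain average over the paired observations, which is an assumption of this sketch (it treats the list as the whole population). All function names are illustrative.

```python
import math


def expectation(values):
    """Plain average, standing in for the expected value operator E."""
    return sum(values) / len(values)


def pearson_from_definition(xs, ys):
    """cov(X, Y) / (sigma_X * sigma_Y)."""
    mu_x, mu_y = expectation(xs), expectation(ys)
    cov = expectation([(x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)])
    sigma_x = math.sqrt(expectation([(x - mu_x) ** 2 for x in xs]))
    sigma_y = math.sqrt(expectation([(y - mu_y) ** 2 for y in ys]))
    return cov / (sigma_x * sigma_y)


def pearson_from_moments(xs, ys):
    """(E(XY) - E(X)E(Y)) / (sqrt(E(X^2) - E(X)^2) * sqrt(E(Y^2) - E(Y)^2))."""
    e_x, e_y = expectation(xs), expectation(ys)
    e_xy = expectation([x * y for x, y in zip(xs, ys)])
    e_x2 = expectation([x * x for x in xs])
    e_y2 = expectation([y * y for y in ys])
    return (e_xy - e_x * e_y) / (
        math.sqrt(e_x2 - e_x ** 2) * math.sqrt(e_y2 - e_y ** 2)
    )


# Hypothetical paired observations, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

rho_definition = pearson_from_definition(xs, ys)
rho_moments = pearson_from_moments(xs, ys)

print("From the definition:", rho_definition)
print("From the moments:", rho_moments)
```

Both functions should return the same value up to floating-point rounding (about 0.77 for this data). The expanded form avoids a separate pass to subtract the means, but it can lose precision when the values are large relative to their spread.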