The Standard Deviation

After the center of the data, the next most important feature is how spread out, or variable, the data is. The standard deviation describes the typical distance of a data value from the mean.

• Name - the population S.D. is called σ, the Greek letter "sigma." The sample S.D. is called s.
• The Empirical Rule - for symmetric unimodal data:
  – About 2/3 of data falls within one standard deviation of the mean, or between µ − σ and µ + σ.
  – About 95% of data falls within two standard deviations of the mean, or between µ − 2σ and µ + 2σ.
  – Almost all data falls within three standard deviations of the mean, or between µ − 3σ and µ + 3σ.
• σ is the Greek ancestor of our "s" for standard deviation.
• People sometimes talk about the variance, which is the standard deviation squared. It is another way to package the same information about the spread of the data.

Empirical Rule
• 2/3 within 1 s.d. of mean
• 95% within 2 s.d.s of mean
• almost all within 3 s.d.s of mean

Ex: College-age women's heights are roughly symmetric and unimodal, with mean µ = 65.5 inches and standard deviation σ = 2.5 inches. So
• About 2/3 of women are between 65.5 − 2.5 = 63 in. and 65.5 + 2.5 = 68 in., or 5'3" and 5'8".
• About 95% of women are between 65.5 − 5 = 60.5 in. and 65.5 + 5 = 70.5 in., or 5'0.5" and 5'10.5".
• Almost all women are between 65.5 − 7.5 = 58 in. and 65.5 + 7.5 = 73 in., or 4'10" and 6'1".
• This rule works pretty well even if the data is not unimodal, but it works poorly for highly skewed data. In general, don't use the mean and standard deviation for highly skewed data.

Standard Deviation Formula

1. Subtract the mean from each data value (distance from the mean):
$$x_1 - \mu,\quad x_2 - \mu,\quad x_3 - \mu,\quad \ldots,\quad x_n - \mu$$
2. Square each difference (makes the distances positive):
$$(x_1 - \mu)^2,\quad (x_2 - \mu)^2,\quad (x_3 - \mu)^2,\quad \ldots,\quad (x_n - \mu)^2$$
3. Average:
$$\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + (x_3 - \mu)^2 + \cdots + (x_n - \mu)^2}{n}$$
4. Take the square root:
$$\sigma = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + (x_3 - \mu)^2 + \cdots + (x_n - \mu)^2}{n}}$$

Calculating Standard Deviation
1. Data value minus mean.
2. Square each difference.
3. Average.
4. Square root.

Ex: The mean of 0 and 4 is µ = 2.

x_i            0    4
x_i − µ       −2    2
(x_i − µ)^2    4    4

$$\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2}{2} = \frac{4 + 4}{2} = 4$$
$$\sigma = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2}{2}} = \sqrt{4} = 2$$

• This is about as hard a calculation of standard deviation as I will expect you to do by hand (we'll learn how to compute all these quantities in Excel). But the formula is helpful for understanding how the s.d. works.
• So the typical distance of 0 and 4 from their mean (2) is 2. That makes sense.

Sample Standard Deviation

The formula for the sample standard deviation is almost the same, but not quite. Of course you use the sample mean x̄ instead of the population mean µ, but you also divide by n − 1 instead of n:
$$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$$
Dividing by n − 1 turns out to make the s.d. of the sample do the best job of approximating the s.d. of the population. But if you calculate both with the same data, the sample s.d. will be slightly larger. Whenever someone says "the standard deviation" they mean the sample s.d.

The standard deviation is sensitive to outliers.

• If your data is reasonably symmetric, the mean and the standard deviation together give a nice simple picture of the data (the Empirical Rule tells you how to interpret this). If your data is skewed, you should not use the mean and standard deviation, but instead should use the median and ... you'll find out what next lecture!
• When you calculate with real data, it is almost always the sample standard deviation. If you thought there was only one standard deviation, given by the formula with n − 1 in it, you would never run into any practical problems. But this sample s.d. is just a rough approximation to the population standard deviation, which is a Platonic ideal. When we speak about what the standard deviation means, such as in the Empirical Rule, we are always speaking about the population standard deviation.

Calculating Sample Standard Deviation
1. Data value minus mean.
2. Square each difference.
3. Divide sum by n − 1.
4. Square root.
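The two four-step recipes above can be sketched in Python. This is only an illustrative sketch, not part of the lecture (which uses Excel): the helper names `population_sd` and `sample_sd` are hypothetical, and the standard library functions `statistics.pstdev` and `statistics.stdev` appear only as a cross-check. The data are the values 0 and 4 from the population example.

```python
import math
import statistics


def population_sd(values):
    """Population s.d.: average the squared differences over n."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]  # steps 1 and 2
    return math.sqrt(sum(squared_diffs) / n)           # steps 3 and 4


def sample_sd(values):
    """Sample s.d.: same steps, but divide the sum by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample s.d. needs at least two values")
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / (n - 1))


data = [0, 4]
print(population_sd(data), statistics.pstdev(data))  # both are 2.0
print(sample_sd(data), statistics.stdev(data))       # both are sqrt(8), about 2.83
```

The only difference between the two helpers is the divisor, which is why the sample value is never smaller than the population value on the same data.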
Ex: The mean of 0 and 4 is x̄ = 2.

x_i            0    4
x_i − x̄       −2    2
(x_i − x̄)^2    4    4

$$\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2}{2 - 1} = \frac{4 + 4}{1} = 8$$
$$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2}{2 - 1}} = \sqrt{8} = 2\sqrt{2} \approx 2.83$$

• For the same two values, the sample s.d. (about 2.83) is larger than the population s.d. (2), because the sum of squared differences is divided by n − 1 = 1 instead of n = 2.

Lecture 7 Key Points

After this lecture you should be able to
• Apply the Empirical Rule given the mean and s.d., and have a rough sense of what the s.d. tells you based on the Empirical Rule and its interpretation as the typical distance from the mean.
• Very roughly estimate the standard deviation from a histogram.
• Know not to use the s.d. to summarize a skewed distribution.

After processing this lecture you should be able to
• Understand that there is a difference between the population and sample standard deviation.
• Use the standard deviation formula to understand the properties of the standard deviation and to compute it by hand for very small data sets.
• Calculate the population and sample standard deviation in Excel.
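As a way to practice the first key point, the Empirical Rule intervals for the heights example (µ = 65.5 in., σ = 2.5 in.) can be reproduced with a short Python sketch. The function names `empirical_rule_intervals` and `feet_and_inches` are hypothetical helpers introduced here for illustration; the lecture itself does not use Python.

```python
def empirical_rule_intervals(mean, sd):
    """Return the intervals within 1, 2, and 3 s.d.s of the mean."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


def feet_and_inches(inches):
    """Format a height in inches as feet and inches, e.g. 63 -> 5'3\"."""
    feet = int(inches // 12)
    return f"{feet}'{inches - 12 * feet:g}\""


# Rough shares of the data from the Empirical Rule (symmetric, unimodal data).
shares = {1: "about 2/3", 2: "about 95%", 3: "almost all"}

for k, (low, high) in empirical_rule_intervals(65.5, 2.5).items():
    print(f"{shares[k]} of heights: {low:g} in. to {high:g} in. "
          f"({feet_and_inches(low)} to {feet_and_inches(high)})")
```

The intervals are only meaningful under the rule's assumption of roughly symmetric, unimodal data; the code does not check that assumption.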