
STANDARD DEVIATION (SD)

A dataset or sample is reasonably described by three things: where the data points are centered (central tendency), how much spread or dispersion there is among the data points, and its frequency distribution (i.e., the shape of its histogram). This information allows interpretations and further calculations to be made from the data. Where the dataset approximates a normal (bell-shaped) distribution, the mean is the best measure of central tendency, although the median is often a better measure for other distributions. If the distribution is approximately normal, the standard deviation indicates the dispersion of the data.

It might be expected that the estimate of dispersion would be based upon the average of the deviations of each data point from the mean, ignoring whether the deviations were positive or negative. However, some valuable statistical procedures (e.g., multiple linear regression, ANOVA) rely on the square of the deviations rather than the absolute deviations. Therefore, the most commonly reported measures of dispersion (the variance and the standard deviation) are also based on the square of the deviations.

The variance is calculated by first finding the deviation of each score ($X$) from the mean ($M$), $X - M$, then squaring each deviation, $(X - M)^2$, and then adding these squared deviations together to obtain the sum of squares ($SS$):

$$SS = \sum (X - M)^2$$

The variance of the sample is the average of the squared deviations, obtained by dividing $SS$ by the number of scores ($N$) in the sample:

$$\text{Variance} = \frac{SS}{N}$$

This measure of variance is very useful in many statistical calculations but, because the deviations are squared, it is out of scale with the original data. The problem of scale is addressed by taking the square root of the variance to give the standard deviation, which compensates for the squaring of the deviations:

$$\text{Standard Deviation} = \sqrt{\frac{SS}{N}}$$
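The calculation described above (deviations, sum of squares, division by N, square root) can be followed step by step in a short Python sketch. The function names and the eight example scores are hypothetical, chosen only so that the arithmetic comes out in round numbers.

```python
import math


def sum_of_squares(scores):
    """SS: the sum of squared deviations of each score X from the mean M."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)


def variance_n(scores):
    """Variance as SS divided by the number of scores N."""
    return sum_of_squares(scores) / len(scores)


def sd_n(scores):
    """Standard deviation as the square root of SS / N."""
    return math.sqrt(variance_n(scores))


# Hypothetical scores with mean M = 5
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(sum_of_squares(scores))  # 32.0
print(variance_n(scores))      # 4.0
print(sd_n(scores))            # 2.0
```

The variance of 4.0 is in squared units of the original scores, whereas the standard deviation of 2.0 is back on the same scale as the scores themselves.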
When working with a sample of data, the formulae above tend to slightly underestimate the population standard deviation (or variance). The correction for this underestimate is to divide by $N - 1$, rather than $N$, yielding slightly higher values. Therefore, the formulae that are usually used to calculate these statistics are:

$$\text{Standard Deviation} = \sqrt{\frac{SS}{N - 1}}$$

and

$$\text{Variance} = \frac{SS}{N - 1}$$

The standard deviation should always be reported for reasonably normally distributed data because it provides a good idea of the data's variability. Figure 1 illustrates the percentage of scores expected to occur within each standard deviation. For example, about 68% of the scores fall within one SD of the mean, about 95% within two SDs, and about 99.7% within three SDs. Data points more than three standard deviations from the mean are highly unlikely if the data are normally distributed, which is why these data points are often scrutinized and may be removed as outliers in a sample.

Figure 1. A normal frequency distribution with standard deviations (SDs) noted

The standard deviation also provides the means of calculating further useful statistics, including standardized (z) scores, standardized effect sizes such as Cohen's d, and, in combination with the sample size, the standard error of the mean and confidence intervals.

Catherine O. Fritz and Peter E. Morris

CROSS-REFERENCES
Descriptive statistics
Distributions
Effect size
Histograms
Normal Distribution
Variance
Z scores

FURTHER READING
Most good statistical textbooks, at any level, will address this topic well.

Aron, A., Coups, E., & Aron, E. N. (2014). Statistics for the behavioral and social sciences: A brief course (5th ed.). Upper Saddle River, NJ: Pearson.

Howell, D. C. (2010). Fundamental statistics for the behavioral sciences (7th ed.). Wadsworth.

Kranzler, J. H. (2010). Statistics for the terrified (5th ed.). Upper Saddle River, NJ: Pearson.
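As an illustration of the N − 1 correction and of the proportion of a normal distribution lying within a given number of SDs of the mean, both discussed in this entry, the following Python sketch may help. The function names and example scores are hypothetical; the coverage calculation assumes an exactly normal distribution and uses the standard library's error function, `math.erf`.

```python
import math


def sample_variance(scores):
    """Variance with the N - 1 correction: SS / (N - 1)."""
    n = len(scores)
    m = sum(scores) / n
    ss = sum((x - m) ** 2 for x in scores)
    return ss / (n - 1)


def sample_sd(scores):
    """Standard deviation with the N - 1 correction."""
    return math.sqrt(sample_variance(scores))


def normal_coverage(k):
    """Proportion of an exactly normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


# Hypothetical scores: SS = 32 and N = 8, so SS / N = 4.0
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(round(sample_variance(scores), 3))  # 4.571 (32 / 7, higher than 4.0)
print(round(sample_sd(scores), 3))        # 2.138 (higher than 2.0)

for k in (1, 2, 3):
    print(k, round(100 * normal_coverage(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```

With only eight scores the correction raises the standard deviation from 2.0 to about 2.14; the difference shrinks as N grows.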
