STANDARD DEVIATION (SD)
A dataset or sample is reasonably described by where the data points are centered
(central tendency), how much spread or dispersion there is among the data points, and its
frequency distribution (i.e., the shape of its histogram). This information allows
interpretations and further calculations to be made from the data. Where the dataset
approximates a normal (bell shaped) distribution, the mean is the best measure of central
tendency, although medians are better central tendency measures for other distributions. If
the distribution is approximately normal then the standard deviation indicates the dispersion
of the data.
It might be expected that the estimate of dispersion would be based upon the average
of the deviations of each data point from the mean, ignoring whether the deviations were
positive or negative. However, some valuable statistical procedures (e.g., multiple linear
regression, ANOVA) rely on the square of the deviations rather than the absolute deviations.
Therefore, the most commonly reported measures of dispersion – the variance and the
standard deviation – are also based on the square of the deviations.
The variance is calculated by first finding the deviation of each score (X) from the
mean (M), $X - M$, then squaring each deviation, $(X - M)^2$, and then adding these squared
deviations together to obtain the sum of squares (SS):
$SS = \sum (X - M)^2$
The variance of the sample is the average of this SS, obtained by dividing the SS value by
the number of scores (N) in the sample. Thus, the variance is given by $SS \div N$. This measure
of variance is very useful in many statistical calculations, but, because of the squaring of the
deviations, it is out of scale with the original data. The problem of scale is addressed by
taking the square root of the variance to give the standard deviation, thereby compensating for
the squaring of the deviations in the calculation of the variance. Thus, the standard deviation is
$\sqrt{SS \div N}$.
When working with a sample of data, the formulae above tend to slightly
underestimate the population standard deviation (or variance). The correction for this
underestimate is to divide by N – 1, rather than N, yielding slightly higher values. Therefore,
the formulae that are usually used to calculate these statistics are:
π‘†π‘‘π‘Žπ‘›π‘‘π‘Žπ‘Ÿπ‘‘ π·π‘’π‘£π‘–π‘Žπ‘‘π‘–π‘œπ‘› = βˆšπ‘†π‘† ÷ (𝑁 βˆ’ 1) and π‘‰π‘Žπ‘Ÿπ‘–π‘Žπ‘›π‘π‘’ = 𝑆𝑆 ÷ (𝑁 βˆ’ 1)
The standard deviation should always be reported for reasonably normally distributed
data because it provides a good idea of the data's variability. Figure 1 illustrates the
percentage of scores expected to occur within each standard deviation. For example, about 68% of
the scores fall within one SD of the mean, about 95% within two SDs, and about 99.7% within three
SDs. Data points more than three standard deviations from the mean are highly unlikely if the
data are normally distributed, which is why these data points are often scrutinized and
removed as outliers in a sample.
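The share of a normal distribution lying within k standard deviations of the mean can be computed from the error function, as erf(k / √2). The short sketch below uses `math.erf` from the Python standard library; the function name `proportion_within` is an illustrative choice.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

Under a normal distribution, then, only about 0.27% of scores are expected to fall more than three SDs from the mean.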
Figure 1. A normal frequency distribution with standard deviations (SDs) noted
The standard deviation also provides the means of calculating further useful statistics,
including standardized (z) scores, standardized effect sizes such as Cohen's d, and, in
combination with the sample size, the standard error of the mean and confidence intervals.
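As a rough sketch of how these statistics build on the standard deviation, the Python below computes z scores, the standard error of the mean, an approximate 95% confidence interval, and Cohen's d. Several choices here are assumptions rather than part of the original entry: the function names, the example data, the use of the N − 1 standard deviation, the pooled-SD form of Cohen's d (one common definition among several), and the normal-approximation multiplier of 1.96 (a t critical value is usually preferred for small samples).

```python
import math
import statistics

def z_scores(scores):
    """Standardize each score: (X - M) / SD, using the N - 1 SD."""
    m = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(x - m) / sd for x in scores]

def standard_error(scores):
    """Standard error of the mean: SD / sqrt(N)."""
    return statistics.stdev(scores) / math.sqrt(len(scores))

def approx_ci95(scores):
    """Approximate 95% CI for the mean (assumes a normal approximation, 1.96 * SE)."""
    m = statistics.mean(scores)
    half_width = 1.96 * standard_error(scores)
    return (m - half_width, m + half_width)

def cohens_d(group_a, group_b):
    """Cohen's d with a pooled SD (assumed definition; others exist)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * statistics.variance(group_a)
        + (nb - 1) * statistics.variance(group_b)
    ) / (na + nb - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical data for illustration only.
group_a = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
group_b = [x + 1.0 for x in group_a]  # same spread, mean shifted up by 1

print(z_scores(group_a))
print(standard_error(group_a))
print(approx_ci95(group_a))
print(cohens_d(group_a, group_b))
```

Because the two hypothetical groups have identical spread and differ in mean by exactly 1, Cohen's d here equals −1 divided by the standard deviation of either group.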
Catherine O. Fritz and Peter E. Morris
CROSS-REFERENCES
Descriptive statistics
Distributions
Effect size
Histograms
Normal Distribution
Variance
Z scores
FURTHER READING
Most good statistical textbooks, at any level, will address this topic well.
Aron, A., Coups, E., & Aron, E. N. (2014). Statistics for the behavioral and social sciences:
A brief course (5th ed.). Upper Saddle River, NJ: Pearson.
Howell, D. C. (2010). Fundamental statistics for the behavioral sciences (7th ed.).
Wadsworth.
Kranzler, J. H. (2010). Statistics for the terrified (5th ed.). Upper Saddle River, NJ: Pearson.