DESCRIPTIVE STATISTICS
Frequency distribution
Stem-and-leaf Graph
Mean, Median, Mode.
Range, Variance, Standard Deviation
Skewness, Kurtosis
Contents of the course
Descriptive statistics
- Graph, table, mean and standard deviation
Inferential statistics
- Probability and distribution
- Hypothesis test
- Analysis of variance
- Correlation and regression analysis
- Other special topics
Types of Graphs
Graphical Summaries
- Frequency distribution
- Histogram
- Stem and Leaf plot
- Boxplot
Numerical Summaries
- Location: mean, median, mode
- Spread: range, variance, standard deviation
- Shape: skewness, kurtosis
Frequency Distribution

A frequency distribution shows the number of
observations falling into each of several ranges of
values. Frequency distributions are portrayed as
frequency tables, histograms, or polygons.
Frequency distributions can show either the actual
number of observations falling in each range or the
percentage of observations. In the latter instance,
the distribution is called a relative frequency
distribution.
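As a rough illustration of the definition above, the following Python sketch builds a frequency table and the matching relative frequency distribution. The scores, the bin width of 10, and the helper names `frequency_table` and `relative_frequency` are assumptions made for this example; they do not come from the slides.

```python
from collections import Counter

# Hypothetical observations (not from the slides), e.g. exam scores.
scores = [52, 58, 61, 64, 67, 70, 71, 75, 78, 83, 85, 91]


def frequency_table(values, width):
    """Count how many values fall into each range [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


def relative_frequency(table):
    """Convert counts into percentages of the total number of observations."""
    total = sum(table.values())
    return {start: 100 * count / total for start, count in table.items()}


freq = frequency_table(scores, width=10)
rel = relative_frequency(freq)

for start, count in freq.items():
    print(f"{start}-{start + 9}: {count} observations, {rel[start]:.1f}%")
```

The first table holds actual counts; the second holds the same information as percentages, which is what makes it a relative frequency distribution.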
Frequency Distribution

A frequency distribution can also be drawn as a smooth
curve that connects the points. Like any graph, it is
plotted on two axes. The horizontal axis (or x axis),
running from left to right, shows the different possible
values of some variable (a phenomenon whose observations
vary from trial to trial). The vertical axis (or y axis),
running from bottom to top, measures frequency, that is,
how many times a particular value occurs.
Normal distributions take the form of a symmetric bell-shaped curve. The standard
normal distribution is the one with a mean of 0 and a standard deviation of 1.
Frequency Distribution
Common shapes (figure panels): normal, J-shaped, positively skewed,
negatively skewed, bimodal.
Stem-and-leaf Graph
A stem-and-leaf graph or stemplot comes from the field of
exploratory data analysis. This type of graph is a good
choice if the data set is small. You use the data to
create the graph by dividing each observation of data
into a stem and a leaf.
The leaf consists of one digit and the stem consists of the
remaining digits.
Stem-and-leaf Graph
For example, 35 has stem 3 and leaf 5. The number
354 has stem 35 and leaf 4.
Stem-and-leaf Graph



To construct the graph, write the stems in a
column and the leaves in a second column in
increasing order.
Example: Scores for a pre-calculus exam that
counted 100 points were (from smallest to
largest) as follows:
33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69,
69, 72, 73, 74, 78, 80, 83, 88, 88, 88, 90, 92, 94,
94, 94, 94, 96, 100
Stem-and-leaf Graph
The stemplot would look like this:
stem  leaf
3     3
4     299
5     355
6     1378899
7     2348
8     03888
9     0244446
10    0
Stem-and-leaf Graph
To understand the stemplot, look at the second
row. You see 4 299. This represents the 42 and
the two 49s. The plot itself shows the shape and
distribution of the data. It shows that most
scores fell in the 60s, 70s, 80s, and 90s. More
than half of the students (17 of 31) received a
score of 70 or better. A little less than half
(13 of 31) received a score of 80 or better.
About one-fourth of the students (8 of 31)
received a score of 90 or better.
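The construction described above can be sketched in a few lines of Python. The scores are the exam scores from the example; the function names `stemplot` and `share_at_least` are illustrative choices, and the sketch assumes non-negative whole-number scores so that the last digit can serve as the leaf.

```python
from collections import defaultdict

# Exam scores from the pre-calculus example.
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69,
          69, 72, 73, 74, 78, 80, 83, 88, 88, 88, 90, 92, 94,
          94, 94, 94, 96, 100]


def stemplot(values):
    """Map each stem (all digits but the last) to its leaves (last digits)."""
    rows = defaultdict(str)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 35 -> stem 3, leaf 5
        rows[stem] += str(leaf)
    return dict(rows)


def share_at_least(values, threshold):
    """Fraction of values greater than or equal to the threshold."""
    return sum(v >= threshold for v in values) / len(values)


plot = stemplot(scores)
for stem, leaves in plot.items():
    print(f"{stem:<5} {leaves}")

for cutoff in (70, 80, 90):
    print(f"{cutoff} or better: {share_at_least(scores, cutoff):.0%}")
```

Because the scores are sorted before they are split, the leaves in each row come out in increasing order, as the construction rule requires.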
Stem-and-Leaf Display: GPAs
Stem-and-leaf of GPAs, N = 50
Leaf Unit = 0.10
3     2  222
8     2  55555
15    2  6667777
(14)  2  88888999999999
21    3  000001
15    3  22233
10    3  444555
4     3  67
2     3  8
1     4  0
Notes on the display:
- N = 50 values are shown in the display.
- Leaf unit = 0.10, i.e. the stem unit = 1.
- 0.2 was set as the increment between the lines.
- The first column is a running count of values from the
  nearer end of the data; the count in parentheses is the
  number of values in the row that contains the median.
- Smallest value: a stem of 2 and a leaf of 2 = 2.2 GPA,
  with 3 students.
- Largest value: a stem of 4 and a leaf of 0 = 4.0 GPA,
  with 1 student.
- Median ≈ 2.9 GPA
Measures of Central Tendency (N = number of scores)

Mode
How it’s determined: The most frequently occurring score is identified.
Data for which it’s appropriate:
• Data on nominal, ordinal, interval, and ratio scales
• Multimodal distributions (two or more modes may be identified when a
  distribution has multiple peaks)

Median
How it’s determined: The scores are arranged in order from smallest to
largest, and the middle score (when N is an odd number) or the midpoint
between the two middle scores (when N is an even number) is identified.
Data for which it’s appropriate:
• Data on ordinal, interval, and ratio scales
• Data that are highly skewed

Arithmetic mean
How it’s determined: All the scores are added together, and their sum is
divided by the total number (N) of scores.
Data for which it’s appropriate:
• Data on interval and ratio scales
• Data that fall in a normal distribution

Geometric mean
How it’s determined: All the scores are multiplied together, and the Nth
root of their product is computed.
Data for which it’s appropriate:
• Data on ratio scales
• Data that fall in an ogive curve (e.g., growth data)
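A minimal sketch of the four measures of central tendency, using Python's standard `statistics` module. The score lists are hypothetical and chosen only so the results are easy to check by hand; the geometric mean assumes all scores are positive.

```python
import statistics

# Hypothetical positive scores (not from the slides).
scores = [2, 4, 4, 8, 16]

modes = statistics.multimode(scores)          # every most frequent score
median = statistics.median(scores)            # middle score after sorting
mean = statistics.mean(scores)                # sum of scores divided by N
geo_mean = statistics.geometric_mean(scores)  # Nth root of the product

# With an even number of scores, the median is the midpoint
# between the two middle scores.
even_median = statistics.median([1, 3, 5, 7])

print("mode(s):", modes)
print("median:", median)
print("arithmetic mean:", mean)
print("geometric mean:", round(geo_mean, 3))
print("median of [1, 3, 5, 7]:", even_median)
```

`statistics.multimode` returns a list, so a multimodal distribution shows up as more than one value rather than an arbitrary single mode.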
Measures of Variability (N = number of scores)

Range
How it’s determined: The difference between the highest and lowest
scores in the distribution.
Data for which it’s appropriate:
• Data on ordinal, interval, and ratio scales*

Interquartile range
How it’s determined: The difference between the 25th and 75th
percentiles.
Data for which it’s appropriate:
• Data on ordinal, interval, and ratio scales
• Especially useful for highly skewed data

Standard deviation
How it’s determined: $s = \sqrt{\dfrac{\sum (X - M)^2}{N}}$, where X is a
score and M is the mean of the scores.
Data for which it’s appropriate:
• Data on interval and ratio scales
• Most appropriate for normally distributed data

Variance
How it’s determined: $s^2 = \dfrac{\sum (X - M)^2}{N}$
Data for which it’s appropriate:
• Data on interval and ratio scales
• Most appropriate for normally distributed data
• Especially useful in inferential statistical procedures (e.g.,
  analysis of variance)

Note: The standard deviation is a measure of the average distance
between each of a set of data points and their mean value; it is equal
to the square root of the mean of the squares of the deviations from
the mean value.
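The measures of variability in this table can be computed as in the sketch below. The scores are hypothetical, and variance and standard deviation divide by N, as the formulas in the table do. The quartiles come from `statistics.quantiles` with its default method, which is one of several percentile conventions, so other tools may report a slightly different interquartile range.

```python
import math
import statistics

# Hypothetical scores (not from the slides).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Interquartile range: 75th percentile minus 25th percentile.
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Variance: average of the squared deviations from the mean (divide by N).
variance = sum((x - mean) ** 2 for x in scores) / n

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print("range:", score_range)
print("interquartile range:", iqr)
print("variance:", variance)
print("standard deviation:", std_dev)
# Cross-check against the library's population standard deviation.
print("statistics.pstdev:", statistics.pstdev(scores))
```

Many textbooks and packages divide by N - 1 when estimating from a sample (`statistics.stdev` does this); the N divisor is used here only to stay consistent with the table.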
Standard Deviation
The standard deviation (SD) is a measure of spread,
or dispersion, in your data. The larger the SD, the
more dispersion there is in your data. The smaller
the SD, the less dispersion exists in your data.
The standard deviation is a measure of variability
around a mean score. In statistical terms, the
standard deviation is the square root of a measure
called the variance, which is the average of the
squared deviation scores (each score minus the
mean) in the sample for a particular item.
Standard Deviation
A larger standard deviation (shown in light pink) indicates more scatter
-- less precision -- in the results. A smaller standard deviation (shown
in light blue) indicates less scatter. Both sets of results have the same
mean (the green line).
Standard Deviation
To illustrate the standard deviation and the type of insight
it provides, the following table presents scores for two
students, Bill and Tom, over their last ten college exams.
Standard Deviation
Both students ended up with the same average exam
score, a mean of 80.0. Note that the standard
deviation around Bill’s mean is 10.1, while the
standard deviation around Tom’s mean is only 2.6.
Looking at the scores for the two students, we can
see that Tom is much more consistent than Bill.
Tom’s scores range from a low of 76 to a high of 84
(a range of only 8 points), whereas Bill’s scores
range from a low of 66 to a high of 96 (a range of
30 points).
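The original score table is not reproduced in this transcript, so the sketch below uses two made-up score lists to show the same effect: equal means, very different standard deviations. The lists `steady` and `erratic` are hypothetical and are not Bill's and Tom's actual scores, so the resulting standard deviations are not expected to match 10.1 and 2.6.

```python
import statistics

# Hypothetical exam scores, NOT the original table for Bill and Tom.
steady = [76, 78, 79, 80, 80, 80, 80, 81, 82, 84]
erratic = [66, 68, 72, 75, 80, 80, 85, 88, 90, 96]

for name, scores in (("steady", steady), ("erratic", erratic)):
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # population SD (divides by N)
    spread = max(scores) - min(scores)
    print(f"{name}: mean={mean}, SD={sd:.2f}, range={spread}")
```

Both lists average 80, so the mean alone cannot tell the two students apart; the standard deviation and the range can.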
Standard Deviation
If there is no spread or dispersion in your data,
then the SD is zero. While a zero SD would be
unlikely in a large sample, this is something that
could happen in a small sample of physicians
when rating administrator communication, for
example. If each physician gave the same rating
on an item, then that would mean there is no
spread or dispersion in the data at all. Thus, the
SD would be zero. On the other hand, if the
physician responses were dispersed evenly
across the rating scale, the SD would be larger.
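A short check of the two cases just described, using hypothetical ratings on an assumed 1 to 5 scale:

```python
import statistics

# Hypothetical ratings on a 1-5 scale (not from the slides).
identical = [4, 4, 4, 4, 4]   # every physician gives the same rating
spread_out = [1, 2, 3, 4, 5]  # ratings dispersed evenly across the scale

sd_identical = statistics.pstdev(identical)
sd_spread_out = statistics.pstdev(spread_out)

print("SD when all ratings agree:", sd_identical)
print("SD when ratings are spread out:", round(sd_spread_out, 3))
```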
Standard Deviation
Here is something else to remember. If the data
are normally distributed or shaped in a “bell
curve,” approximately 68% of the scores will fall
between one SD above the mean and one SD
below the mean. Furthermore, approximately 95%
of all scores will fall between 2 SDs above and
below the mean. Finally, approximately 99.7% of
scores will fall between 3 SDs above and below
the mean.
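These percentages can be checked numerically. For a normally distributed variable, the share of values within k standard deviations of the mean equals erf(k / √2), which the sketch below evaluates with Python's `math.erf`. The function name `share_within` is an illustrative choice.

```python
import math


def share_within(k):
    """Share of a normal distribution lying within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.2%}")
```

The exact values are close to, but not exactly, 68%, 95%, and 99.7%, which is why the rule is stated as an approximation.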
Skewness
The degree of departure from symmetry of a
distribution.
A positively skewed distribution has a "tail" which
is pulled in the positive direction.
A negatively skewed distribution has a "tail"
which is pulled in the negative direction.
Skewness
Skewness is defined as asymmetry in the
distribution of the sample data values.
Values on one side of the distribution tend
to be further from the 'middle' than values
on the other side.
For skewed data, the usual measures of
location will give different values. For
example, mode < median < mean would
indicate positive (or right) skewness.
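The ordering of the location measures can be seen on a small example. The sketch below uses hypothetical data and one common moment-based skewness coefficient (the third central moment divided by the cube of the standard deviation); the slides do not say which formula they have in mind, so treat this choice as an assumption.

```python
import statistics


def skewness(values):
    """Moment-based skewness: third central moment over SD cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed data (not from the slides).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 7, 11]
# Mirroring the data flips the tail to the left.
left_skewed = [-x for x in right_skewed]

mode = statistics.mode(right_skewed)
median = statistics.median(right_skewed)
mean = statistics.mean(right_skewed)

print("mode, median, mean:", mode, median, mean)
print("skewness (right-skewed data):", round(skewness(right_skewed), 3))
print("skewness (mirrored data):", round(skewness(left_skewed), 3))
```

For this data set the mode is below the median, which is below the mean, and the coefficient is positive; the mirrored data give the same coefficient with the opposite sign.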
Skewness
Positive (or right) skewness is more
common than negative (or left) skewness.
If there is evidence of skewness in the data,
we can apply a transformation, for example,
taking logarithms of positively skewed data.
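To illustrate the effect of a log transformation, the sketch below uses hypothetical data that double at each step. Such data are strongly right-skewed, but their logarithms are evenly spaced and therefore symmetric. Base 2 is used only to keep the arithmetic exact; any base gives the same shape. The skewness helper is the moment-based coefficient, which is an assumption rather than a formula given in the slides.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over SD cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical positive, right-skewed data (not from the slides).
raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]
logged = [math.log2(x) for x in raw]

print("skewness before log:", round(skewness(raw), 3))
print("skewness after log:", round(skewness(logged), 3))
```

Real data rarely become perfectly symmetric after a log transformation; this example is constructed so that they do. The transformation only applies to positive values.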
Measures of Shape: Skewness
[Figures: a left-skewed distribution and a right-skewed distribution]
Skew is the tilt (or lack of it) in a distribution. The more common type is right
skew, where the tail points to the right. Less common is left skew, where the tail
points left.
Kurtosis
The degree of peakedness of a distribution.
A normal distribution is a mesokurtic
distribution.
A pure leptokurtic distribution has a higher
peak than the normal distribution and has
heavier tails.
A pure platykurtic distribution has a lower
peak than a normal distribution and lighter
tails.
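One common way to put a number on these categories is excess kurtosis: the fourth central moment divided by the squared variance, minus 3, so that a normal distribution scores 0. The slides do not give a formula, so this definition and the two hypothetical data sets below are assumptions for illustration.

```python
def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (normal = 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3


# Hypothetical data (not from the slides).
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]            # evenly spread, light tails
peaked = [-10, -1, 0, 0, 0, 0, 0, 1, 10]      # sharp centre, heavy tails

print("flat data:", round(excess_kurtosis(flat), 3))
print("peaked, heavy-tailed data:", round(excess_kurtosis(peaked), 3))
```

A negative result corresponds to a platykurtic shape, a positive result to a leptokurtic shape, and a value near zero to a mesokurtic shape.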
Measures of Shape: Kurtosis
[Figures: platykurtic and leptokurtic distributions]