Statistics in Applied Science and Technology
Chapter 4 Summarizing Data
July, 2000
Guang Jin
Key Concepts in This Chapter
• Mean
• Median
• Mode
• Range
• Standard Deviation
• Variance
• Coefficient of Variation
Measures of Central Tendency
• Central tendency: the tendency of a set of data to center around certain values.
• The three most common measures of central tendency are the mean, the median, and the mode.
The Mean
• The arithmetic mean (or simply, the mean) is computed by summing all the observations in the sample and dividing the sum by the number of observations.
• Symbolically, the mean is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where $x_1$ is the first and $x_i$ is the $i$th in a series of observations, and $n$ is the total number of observations.
The Mean (Continued)
• The arithmetic mean may be considered the balance point, or fulcrum, of a distribution.
• The arithmetic mean is the point that balances the positive and negative deviations from the fulcrum.
• The mean is affected by the value of every observation in the distribution and may be distorted when extreme values exist.
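The balance-point and outlier properties can be checked numerically. This is a minimal sketch using Python's standard `statistics` module; the data values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from statistics import mean

# Hypothetical sample of observations (illustrative values only)
data = [3, 4, 5, 6, 7]
m = mean(data)  # (3 + 4 + 5 + 6 + 7) / 5 = 25 / 5 = 5

# The mean is the balance point: positive and negative deviations cancel out
deviations = [x - m for x in data]
print(m, sum(deviations))

# One extreme value pulls the mean far from the bulk of the data
with_outlier = data + [100]
print(mean(with_outlier))  # 125 / 6, about 20.83
```

Adding a single value of 100 moves the mean from 5 to about 20.8, even though five of the six observations lie between 3 and 7.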
The Median
• The median is defined as the middle value when the observations are ordered.
• The median is the value above which there are as many observations as below.
• For an even number of observations, the median is the average of the two middlemost values.
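The median rule, including the even-count case, can be written out directly. The helper name `median_value` is hypothetical; Python's own `statistics.median` follows the same rule.

```python
def median_value(values):
    """Middle value of the ordered observations; for an even count,
    the average of the two middlemost values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two middle values

print(median_value([7, 1, 3]))     # ordered 1, 3, 7 -> 3
print(median_value([7, 1, 3, 5]))  # ordered 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```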
The Mode
• The mode is the observation that occurs most frequently.
• The mode can be read from a graph as the value on the horizontal axis that corresponds to the peak of the distribution.
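Counting frequencies is enough to find the mode. This sketch uses `collections.Counter`; the helper name `modes` is hypothetical. Because a data set can have more than one most-frequent value, it returns every tied value rather than just one.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most frequently (ties are possible)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes([2, 3, 3, 5, 7, 3, 5]))  # 3 occurs three times -> [3]
print(modes([1, 1, 2, 2, 4]))        # 1 and 2 each occur twice -> [1, 2]
```

Python 3.8 and later also provide `statistics.multimode`, which behaves the same way.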
Which Average Should You Use
for Quantitative Data?
• When a distribution of observations is normal or not too skewed, the mode, the median, and the mean have the same or similar values, and any of them can be used to describe central tendency.
• When a distribution is skewed, the mean and the median can differ appreciably, so both should be reported.
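A small comparison shows why both averages are worth reporting for skewed data. The two samples below are hypothetical: one is symmetric, the other has a long right tail created by a single large value.

```python
from statistics import mean, median

symmetric = [24, 26, 28, 30, 32, 34, 36]          # hypothetical, symmetric around 30
skewed = [20, 22, 25, 27, 30, 31, 35, 40, 250]    # hypothetical, right-skewed

print(mean(symmetric), median(symmetric))  # both 30
print(mean(skewed), median(skewed))        # mean 480 / 9, about 53.3; median 30
```

For the symmetric sample the two averages coincide; for the skewed sample the mean is pulled toward the tail while the median stays with the bulk of the data.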
Measures of central tendency for
Qualitative Data
• The mode can always be used with qualitative data.
• The median can be used whenever the qualitative data are ordinal.
• The mean is not appropriate for qualitative data.
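The distinction between nominal and ordinal data can be sketched as follows. The blood-type and rating data and the category scale are hypothetical; the ordinal median is found by sorting responses by their position on the scale.

```python
from collections import Counter

# Hypothetical nominal data: no natural order, so only the mode applies
blood_types = ["A", "O", "B", "O", "AB", "O", "A"]
mode_type = Counter(blood_types).most_common(1)[0][0]  # "O" occurs most often

# Hypothetical ordinal data: categories have a natural order
scale = ["poor", "fair", "good", "excellent"]
ratings = ["good", "fair", "excellent", "good", "poor", "good", "fair"]
ranked = sorted(ratings, key=scale.index)  # order responses by their rank on the scale
# Odd number of responses, so there is a single middle category;
# with an even count the median may fall between two categories.
median_rating = ranked[len(ranked) // 2]

print(mode_type, median_rating)
```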
Measures of Variation
• Measures of variation (or variability) indicate whether observations tend to be quite similar (homogeneous) or vary considerably (heterogeneous).
• The three most common measures of variation are the range, the standard deviation, and the variance.
Range
• The range is defined as the difference in value between the highest (maximum) and lowest (minimum) observation:
$$\text{Range} = X_{\max} - X_{\min}$$
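The range formula translates directly into code. The function name `data_range` is hypothetical (chosen to avoid shadowing Python's built-in `range`), and the sample values are illustrative.

```python
def data_range(values):
    """Range = X_max - X_min."""
    return max(values) - min(values)

print(data_range([12, 7, 3, 18, 9]))  # 18 - 3 = 15
```

Because only the two extreme observations enter the calculation, a single unusual value can change the range considerably.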
Standard Deviation and Variance
• By far the most widely used measure of variation is the standard deviation, represented by the symbol s.
• The standard deviation is the square root of the variance (represented by the symbol $s^2$) of the observations.
• The larger the standard deviation and variance, the more heterogeneous the distribution.
Variance
• The variance ($s^2$) is computed by squaring each deviation from the mean, adding them up, and dividing their sum by one less than n, the sample size:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
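The variance steps (deviate, square, sum, divide by n - 1) can be followed one at a time. This is a sketch with a hypothetical helper `sample_variance` and illustrative data; the result is checked against the standard library's `statistics.variance`, which also divides by n - 1.

```python
import statistics

def sample_variance(values):
    """s^2 = sum((x_i - xbar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(values) / n
    squared_deviations = [(x - xbar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 40 / 8 = 5
# squared deviations: 9, 1, 1, 1, 0, 0, 4, 16 -> sum 32; 32 / (8 - 1) = 32 / 7
print(sample_variance(data), statistics.variance(data))
```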
Standard Deviation
• The standard deviation (s, sometimes represented by SD) is computed by taking the square root of the variance:
$$s = \sqrt{s^2}$$
• The standard deviation is expressed in the same units as the raw data.
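Taking the square root of the variance returns the measure to the original units. In this sketch the data are hypothetical measurements assumed to be in millimetres, so the variance is in mm² and the standard deviation is back in mm.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements in mm

s2 = statistics.variance(data)  # sample variance, in mm^2
s = math.sqrt(s2)               # standard deviation, back in mm
print(round(s, 4), round(statistics.stdev(data), 4))
```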
Important Generalizations
• For most frequency distributions, a majority (often as many as 68%) of all observations are within one standard deviation on either side of the mean.
• For most frequency distributions, a small minority (often as few as 5%) of all observations deviate more than two standard deviations on either side of the mean.
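These generalizations can be checked by simulation. The sketch below assumes normally distributed data (generated with `random.gauss` and a fixed seed, so the values are synthetic); for a normal distribution roughly 68% of observations fall within one standard deviation and roughly 5% fall beyond two. Other distribution shapes will give somewhat different proportions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
data = [random.gauss(50, 10) for _ in range(100_000)]  # synthetic normal sample

m = statistics.mean(data)
s = statistics.stdev(data)

# Proportion of observations within 1 SD of the mean, and more than 2 SD away
within_1sd = sum(abs(x - m) <= s for x in data) / len(data)
beyond_2sd = sum(abs(x - m) > 2 * s for x in data) / len(data)
print(f"within 1 SD: {within_1sd:.3f}, beyond 2 SD: {beyond_2sd:.3f}")
```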
Variability for Qualitative Data
• For qualitative data that cannot be ordered (nominal data), measures of variability do not exist.
• For qualitative data that can be ordered (ordinal data), it is appropriate to describe variability by identifying the extreme observations.
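For ordinal data, identifying the extreme observations amounts to finding the lowest and highest categories actually observed. The response scale and answers below are hypothetical.

```python
# Hypothetical ordinal scale, listed from lowest to highest
scale = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
responses = ["agree", "neutral", "agree", "strongly agree", "disagree", "agree"]

ranks = [scale.index(r) for r in responses]  # position of each response on the scale
lowest = scale[min(ranks)]
highest = scale[max(ranks)]
print(f"responses range from '{lowest}' to '{highest}'")
```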
Coefficient of Variation
• The coefficient of variation (represented by CV) is defined as the ratio of the standard deviation to the absolute value of the mean, expressed as a percentage:
$$CV = \frac{s}{|\bar{x}|} \times 100\%$$
• CV expresses the size of the standard deviation relative to the mean, so it can be used to compare the relative variation of even unrelated quantities.
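Because CV is unit-free, it allows a comparison between quantities measured in different units. In this sketch the helper `coefficient_of_variation` is hypothetical, it uses the sample standard deviation, and the height and weight values are illustrative only.

```python
import statistics

def coefficient_of_variation(values):
    """CV = s / |xbar| * 100%, using the sample standard deviation."""
    xbar = statistics.mean(values)
    if xbar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / abs(xbar) * 100

# Hypothetical, unrelated quantities measured in different units
heights_cm = [160, 165, 170, 175, 180]  # mean 170, s = sqrt(62.5)
weights_kg = [55, 60, 70, 80, 85]       # mean 70, s = sqrt(162.5)
print(round(coefficient_of_variation(heights_cm), 2),
      round(coefficient_of_variation(weights_kg), 2))
```

Here the weights vary considerably more, relative to their mean, than the heights do, even though the two standard deviations are in different units.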
Equations for Population and Sample
Means and Standard Deviation
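• Mean
   Population: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
   Sample: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• Variance
   Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
   Sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
• Standard deviation
   Population: $\sigma = \sqrt{\sigma^2}$
   Sample: $s = \sqrt{s^2}$
(N is the number of observations in the population; n is the number of observations in the sample.)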