Introductory Mathematics
& Statistics
Chapter 13
Measures of Variation
Copyright © 2010 McGraw-Hill Australia Pty Ltd
PowerPoint slides to accompany Croucher, Introductory Mathematics and Statistics, 5e
Learning Objectives
• Calculate common measures of variation
(including the range, interquartile range, mean
deviation and standard deviation) from
grouped and ungrouped data
• Calculate and interpret the coefficient of
variation
13.1 Introduction
• A measure of central tendency in itself is not sufficient to
describe a set of data adequately
• A measure of dispersion (or spread) of the data is usually
required
• This measure gives an indication of the internal variation of
the data—that is, the extent to which data items vary from
one another or from a central point
• Some reasons for requiring a measure of dispersion of a set
of data:
– As an indication of the reliability of the average value
– To assist in controlling unwanted variation
13.2 The range
• The simplest measure of dispersion is the range
• It is the difference between the largest and smallest values in
a set of data
Range = largest observation – smallest observation
• Examples of uses of range include
– Temperature fluctuations on a given day
– Movement of share prices
– Acceptable range of systolic and diastolic blood pressures
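As a minimal sketch, the range formula above can be computed in Python. The temperature readings are made-up illustrative values, not data from the text, and `data_range` is a hypothetical helper name.

```python
# Hypothetical temperature readings (degrees C) over one day; illustrative only.
temperatures = [14.2, 16.8, 21.5, 24.1, 19.3, 15.0]


def data_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)


print(round(data_range(temperatures), 1))
```

Only the two extreme readings enter the calculation, which is why the range says nothing about how the remaining values are spread.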
13.2 The range (cont…)
• Range is considered primitive as it considers only the
extreme values, which may not be useful indicators of the
bulk of the population
• Extreme values, called outliers, may often result from errors
of measurement
• Outliers are defined as values that are inconsistent with the
rest of the data
• Although the range is the quickest and easiest measure of
dispersion to calculate, it should be interpreted with some
caution
13.3 The interquartile range (midspread)
• Measures the range of the middle 50% of the values only
• Is defined as the difference between the upper and lower
quartiles
Interquartile range = upper quartile – lower quartile
= Q3 – Q1
• May be calculated from grouped frequency distributions that
contain open-ended class intervals
• It is usually only used with a large number of observations
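A short Python sketch of Q3 – Q1 is given here, assuming the standard library's `statistics.quantiles` (Python 3.8+) as the quartile rule. Quartile conventions differ between textbooks and software, so the values may not match a hand calculation that uses another rule. The data are hypothetical.

```python
import statistics

# Hypothetical observations; illustrative values only.
data = [12, 15, 17, 19, 22, 24, 27, 30, 33, 36, 40]

# With n=4, statistics.quantiles returns the three cut points Q1, Q2 (median), Q3.
# The default method is "exclusive"; other quartile conventions can give
# slightly different values.
q1, q2, q3 = statistics.quantiles(data, n=4)
interquartile_range = q3 - q1

print(q1, q3, interquartile_range)
```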
13.4 The mean deviation
• The mean deviation takes into account the actual value of
each observation
• It measures the ‘average’ distance of each observation away
from the mean of the data
• It gives an equal weight to each observation
• It is generally more sensitive than the range or interquartile
range, since a change in any value will affect it
13.4 The mean deviation (cont…)
• The residual measures the actual deviation (or distance) of each
observation from the mean
• A set of x values has a mean of $\bar{x}$
• The residual of a particular x-value is:
$$\text{Residual} = x - \bar{x}$$
Example
If the mean for a set of data is 3.22, find the residual for an
observation of 4.38
Solution
The residual of 4.38 is 4.38 – 3.22 = 1.16
Note: Residuals can be negative. A negative residual shows that the
observation is below the mean
13.4 The mean deviation (cont…)
• The mean deviation is defined as the mean of these absolute
deviations:
$$\text{Mean deviation} = \frac{\sum |x - \bar{x}|}{n}$$
• To calculate the mean deviation
Step 1: Calculate the mean of the data
Step 2: Subtract the mean from each observation and record
the resulting differences
Step 3: Write down the absolute value of each of the differences
found in Step 2 (ignore their signs)
Step 4: Calculate the mean of the absolute values of the
differences found in step 3
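The four steps above can be sketched as a small Python function. The function name and the sample data are illustrative assumptions rather than part of the text.

```python
def mean_deviation(observations):
    # Step 1: calculate the mean of the data.
    mean = sum(observations) / len(observations)
    # Step 2: subtract the mean from each observation.
    differences = [x - mean for x in observations]
    # Step 3: take the absolute value of each difference (ignore the signs).
    absolute_differences = [abs(d) for d in differences]
    # Step 4: calculate the mean of the absolute values.
    return sum(absolute_differences) / len(absolute_differences)


# Hypothetical data: the mean is 5, so the absolute differences are 3, 1, 1, 3.
print(mean_deviation([2, 4, 6, 8]))  # 2.0
```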
13.4 The mean deviation (cont…)
Example
The batting scores of a cricketer were recorded over 10
completed innings to date.
His scores were: 32, 27, 38, 25, 20, 32, 34, 28, 40, 29
Calculate the mean deviation of the cricketer's scores
Solution
Step 1
$$\bar{x} = \frac{32 + 27 + \cdots + 29}{10} = 30.5$$
The cricketer's average number of runs is 30.5
13.4 The mean deviation (cont…)
Steps 2 and 3 are completed in the table
Score    Deviation from mean          Absolute value of deviation
32       +1.5                         1.5
27       -3.5                         3.5
⋮        ⋮                            ⋮
29       -1.5                         1.5
Total    $\sum (x - \bar{x}) = 0$     $\sum |x - \bar{x}| = 47.0$
Step 4
$$\text{Mean deviation} = \frac{\sum |x - \bar{x}|}{n} = \frac{47.0}{10} = 4.7$$
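The worked example can be checked with a short Python sketch that prints every row of the deviation table, including the rows the slide abbreviates. The scores are those given in the example; the variable names are illustrative.

```python
scores = [32, 27, 38, 25, 20, 32, 34, 28, 40, 29]

mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]
absolute_deviations = [abs(d) for d in deviations]

# One row per innings: score, signed deviation, absolute deviation.
print(f"{'Score':>5} {'Deviation':>10} {'Absolute':>9}")
for x, d, a in zip(scores, deviations, absolute_deviations):
    print(f"{x:>5} {d:>+10.1f} {a:>9.1f}")

sum_of_deviations = sum(deviations)
sum_of_absolute_deviations = sum(absolute_deviations)
mean_dev = sum_of_absolute_deviations / len(scores)
print(sum_of_deviations, sum_of_absolute_deviations, mean_dev)
```

The signed deviations always sum to zero, which is why the absolute values are needed before averaging.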
13.4 The mean deviation (cont…)
• Calculation of the mean deviation from a
frequency distribution
– If the data are in the form of a frequency distribution, the
mean deviation can be calculated using:
$$\text{Mean deviation} = \frac{\sum f\,|x - \bar{x}|}{\sum f}$$
Where f = the frequency of an observation x
$\sum f$ = the sum of the frequencies = n
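A minimal Python sketch of the frequency-weighted formula follows. The frequency table is hypothetical, and the function name is an illustrative choice.

```python
def mean_deviation_from_frequencies(frequency_table):
    """frequency_table maps each observation x to its frequency f."""
    n = sum(frequency_table.values())  # sum of f
    mean = sum(f * x for x, f in frequency_table.items()) / n
    weighted_abs = sum(f * abs(x - mean) for x, f in frequency_table.items())
    return weighted_abs / n


# Hypothetical frequency distribution: observation -> frequency.
table = {1: 2, 2: 3, 3: 5}
print(round(mean_deviation_from_frequencies(table), 2))
```

Weighting each absolute deviation by its frequency gives the same result as listing every observation individually and averaging.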
13.5 The standard deviation
• The most commonly used measure of dispersion is the
standard deviation
• It takes into account every observation and measures the
‘average deviation’ of observations from the mean
• It works with squares of residuals, not absolute values,
so it is easier to use in further calculations
• The values of the mean deviation and standard deviation
should be reasonably close, since they are both measuring
the variation of the observations from their mean
13.5 The standard deviation (cont…)
• Population standard deviation
– Uses squares of the residuals, which will eliminate the
effect of the signs, since squares of numbers cannot be
negative
Step 1: find the sum of the squares of the residuals
Step 2: find their mean.
Step 3: take the square root of this mean.
$$\text{Standard deviation} = \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$$
Where N = the size of the population
The square of the population standard deviation is called
the variance: Variance = $\sigma^2$
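The three steps can be sketched in Python as below, with a cross-check against the standard library's `statistics.pstdev`. The five-value population is hypothetical.

```python
import math
import statistics


def population_standard_deviation(values):
    n = len(values)
    mean = sum(values) / n
    # Step 1: find the sum of the squares of the residuals.
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    # Step 2: find their mean (this is the variance).
    variance = sum_of_squares / n
    # Step 3: take the square root of this mean.
    return math.sqrt(variance)


# Hypothetical population of five values; illustrative only.
population = [2, 4, 4, 6, 9]
sigma = population_standard_deviation(population)
print(sigma, sigma ** 2)
print(math.isclose(sigma, statistics.pstdev(population)))
```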
13.5 The standard deviation (cont…)
• Sample standard deviation
– It is rare to calculate the value of σ since populations are
usually very large
– It is far more likely that the sample standard deviation
(denoted by s) will be needed.
$$\text{Sample standard deviation} = s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
– Where: n is the number of observations in the sample
13.5 The standard deviation (cont…)
• A note on the use of (n − 1) in formulae
– If the value of n is large, it will only make a slight
difference to the answer whether you divide by n or
(n − 1)
– To calculate the value of s from a sample, the calculator
button will usually be indicated by one of σn−1 or xσn−1 or
sx or s written either on it or near it
– To calculate the value of σ from a population, the
calculator key will usually be indicated by one of σn or xσn
or σx or σ written either on it or near it
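To illustrate how little the choice of divisor matters for large n: for the same data, dividing by (n − 1) instead of n multiplies the result by $\sqrt{n/(n-1)}$. The sketch below evaluates that factor for a few sample sizes; the sizes chosen and the function name are arbitrary.

```python
import math


def divisor_ratio(n):
    """Ratio of the (n - 1)-divisor result to the n-divisor result."""
    return math.sqrt(n / (n - 1))


for n in (5, 50, 5000):
    print(n, round(divisor_ratio(n), 4))
```

The factor is always greater than 1 and approaches 1 as n grows, so the two results differ noticeably only for small samples.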
13.5 The standard deviation (cont…)
• Important points about the standard deviation
– The standard deviation cannot be negative
– The standard deviation of a set of data is zero if, and only
if, the observations are of equal value
– The standard deviation can never exceed the range of
the data
– The more scattered the data, the greater the standard
deviation
– The square of the standard deviation is called the
variance
13.5 The standard deviation (cont…)
• Calculation of the sample standard deviation
Step 1: Calculate the mean $\bar{x}$
Step 2: For each x-value, find the value of the residual $x - \bar{x}$
Step 3: Square the residuals: $(x - \bar{x})^2$
Step 4: Calculate the sum of the squares of the residuals
Step 5: Divide the sum found in step 4 by (n – 1)
Step 6: Take the square root of the quantity found in step 5:
this is the sample standard deviation
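The six steps can be sketched in Python as follows, using the cricketer's batting scores from the mean deviation example in section 13.4. The final line cross-checks the result against the standard library's `statistics.stdev`, which also divides by (n − 1).

```python
import math
import statistics

scores = [32, 27, 38, 25, 20, 32, 34, 28, 40, 29]

# Step 1: calculate the mean.
mean = sum(scores) / len(scores)
# Step 2: find the residual of each x-value.
residuals = [x - mean for x in scores]
# Step 3: square the residuals.
squared_residuals = [r ** 2 for r in residuals]
# Step 4: sum the squares of the residuals.
sum_of_squares = sum(squared_residuals)
# Step 5: divide the sum by (n - 1).
sample_variance = sum_of_squares / (len(scores) - 1)
# Step 6: take the square root; this is the sample standard deviation.
s = math.sqrt(sample_variance)

print(sum_of_squares, round(s, 2))
print(math.isclose(s, statistics.stdev(scores)))
```

For these scores the sample standard deviation comes out at about 6.0, compared with the mean deviation of 4.7 found earlier.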
13.5 The standard deviation (cont…)
• Calculation of the standard deviation from a
frequency distribution
– If the data are in the form of a frequency distribution, such as:
No. units (n)    Frequency (f)
1                85
2                192
3                123
Total            400
– Calculate the standard deviation using:
$$s = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f - 1}}$$
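As a hedged sketch, the formula can be applied to the frequency table on this slide (frequencies 85, 192 and 123 for 1, 2 and 3 units). It assumes that the number of units plays the role of the observation x in the formula; the variable names are illustrative.

```python
import math

# Frequency distribution from the table: number of units -> frequency.
# Assumption: the number of units is the observation x in the formula.
frequencies = {1: 85, 2: 192, 3: 123}

total_f = sum(frequencies.values())  # sum of f
mean = sum(f * x for x, f in frequencies.items()) / total_f
sum_f_sq = sum(f * (x - mean) ** 2 for x, f in frequencies.items())
s = math.sqrt(sum_f_sq / (total_f - 1))

print(total_f, mean, round(s, 4))
```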
13.5 The standard deviation (cont…)
• Calculation of the standard deviation from a
grouped frequency distribution
– When calculating s from a grouped frequency distribution,
we should assume that the observations in each class
interval are concentrated at the midpoint of the interval
$$s = \sqrt{\frac{\sum f (m - \bar{x})^2}{\sum f - 1}}$$
– Where
$\bar{x}$ = the estimated mean of the sample
m = the midpoint of the class interval
f = the frequency of the class interval
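A minimal Python sketch of the grouped calculation follows. The class intervals and frequencies are hypothetical, and each class is represented by its midpoint as described above.

```python
import math

# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]

# Represent every observation in a class by the class midpoint m.
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]

total_f = sum(freqs)
estimated_mean = sum(f * m for f, m in zip(freqs, midpoints)) / total_f
sum_f_sq = sum(f * (m - estimated_mean) ** 2 for f, m in zip(freqs, midpoints))
s = math.sqrt(sum_f_sq / (total_f - 1))

print(estimated_mean, round(s, 2))
```

Because the true values within each class are unknown, both the mean and s obtained this way are estimates.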
13.6 The coefficient of variation
• This is a measure of relative variability used to:
– measure changes that have occurred in a population over
time
– compare variability of two populations that are expressed in
different units of measurement
• It is expressed as a percentage rather than in terms of
the units of the particular data
13.6 The coefficient of variation (cont…)
• The formula for the coefficient of variation (V) is:
$$V = \frac{s}{\bar{x}} \times 100\%$$
Where
$\bar{x}$ = the mean of the sample
s = the standard deviation of the sample
13.6 The coefficient of variation (cont…)
Example
Calculate the coefficient of variation for the price of 400 g cans
of pet food, given that the mean is 81 cents and s = 6.77 cents.
Interpret the results.
Solution
$$V = \frac{s}{\bar{x}} \times 100\% = \frac{6.77}{81} \times 100\% = 8.36\%$$
This means that the standard deviation of the price of a 400 g
can of pet food is 8.36% of the mean price.
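The same calculation as a short Python sketch, using the figures from the example; the function name is an illustrative choice.

```python
def coefficient_of_variation(s, mean):
    """V = (s / mean) * 100, expressed as a percentage."""
    return s / mean * 100


v = coefficient_of_variation(6.77, 81)
print(f"{v:.2f}%")
```

Because s and the mean are in the same units, the units cancel, which is what allows data measured in different units to be compared.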
13.7 Remarks
• Among the more important characteristics of the
standard deviation are:
– It is the most frequently used measure of dispersion, and
because of its mathematical properties it has widespread
use in problems involving statistical inference
– If the mean cannot be calculated, neither can the standard
deviation
– Its value is affected by the value of every observation in the
data
– If the data have a number of extreme values, the value of
the standard deviation may be distorted so as not to be a
good ‘representative’ measure of dispersion
Summary
• Among the more important characteristics of the standard deviation
are:
– It is the most frequently used measure of dispersion, and
because of its mathematical properties it has widespread use in
problems involving statistical inference.
– If the mean cannot be calculated, neither can the standard
deviation.
– Its value is affected by the value of every observation in the
data.
– If the data have a number of extreme values, the value of the
standard deviation may be distorted so as not to be a good
‘representative’ measure of dispersion.