Understandable Statistics
Seventh Edition
By Brase and Brase
Prepared by: Lynn Smith
Gloucester County College
Chapter Three
Averages and Variation
Copyright (C) 2002 Houghton Mifflin Company. All rights reserved .
Measures of Central Tendency
• Mode
• Median
• Mean
The Mode
the value or property that occurs
most frequently in the data
Find the mode:
6, 7, 2, 3, 4, 6, 2, 6
The mode is 6.
Find the mode:
6, 7, 2, 3, 4, 5, 9, 8
There is no mode for this data.
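The two mode examples can be checked with a short Python sketch. The function name `find_mode` is made up for illustration, and returning `None` for "no mode" is an assumption that mirrors the slides' convention; a data set with several equally frequent values would need extra handling.

```python
from collections import Counter

def find_mode(values):
    """Return the most frequent value, or None when no value repeats."""
    counts = Counter(values)
    value, count = counts.most_common(1)[0]
    # Slide convention (assumed here): if every value occurs only once,
    # the data set has no mode.
    return value if count > 1 else None

mode_a = find_mode([6, 7, 2, 3, 4, 6, 2, 6])
mode_b = find_mode([6, 7, 2, 3, 4, 5, 9, 8])
print(mode_a, mode_b)
```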
The Median
the central value of an ordered
distribution
To find the median of raw data:
• Order the data from smallest to largest.
• For an odd number of data values, the
median is the middle value.
• For an even number of data values, the
median is found by dividing the sum of
the two middle values by two.
Find the median:
Data:
5, 2, 7, 1, 4, 3, 2
Rearrange: 1, 2, 2, 3, 4, 5, 7
The median is 3.
Find the median:
Data:
31, 57, 12, 22, 43, 50
Rearrange: 12, 22, 31, 43, 50, 57
The median is the average of the middle two values:
$\frac{31 + 43}{2} = 37$
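A minimal Python sketch of the median procedure, covering both the odd and the even case. The name `find_median` is illustrative only.

```python
def find_median(values):
    """Median of raw data: order the values, then take the middle."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: the middle value is the median.
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

median_odd = find_median([5, 2, 7, 1, 4, 3, 2])
median_even = find_median([31, 57, 12, 22, 43, 50])
print(median_odd, median_even)
```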
The Mean
The mean of a collection of data is found by:
• summing all the entries
• dividing by the number of entries
$\text{mean} = \frac{\text{sum of all entries}}{\text{number of entries}}$
Find the mean:
6, 7, 2, 3, 4, 5, 2, 8
$\text{mean} = \frac{6 + 7 + 2 + 3 + 4 + 5 + 2 + 8}{8} = \frac{37}{8} = 4.625 \approx 4.6$
Sigma Notation
• The symbol $\Sigma$ means “sum the following.”
• $\Sigma$ is the Greek letter (capital) sigma.
Notations for mean
Sample mean: $\bar{x}$ (“x bar”)
Population mean: $\mu$ (Greek letter mu)
Number of entries
in a set of data
• If the data represents a sample, the
number of entries = n.
• If the data represents an entire
population, the number of entries = N.
Sample mean
$\bar{x} = \frac{\Sigma x}{n}$
Population mean
$\mu = \frac{\Sigma x}{N}$
Resistant Measure
a measure that is not influenced by
extremely high or low data values
Which is less resistant?
• Mean
• Median
The mean is less resistant. It can be made arbitrarily large by increasing the size of one value.
Trimmed Mean
a measure of center that is more
resistant than the mean but is still
sensitive to specific data values
To calculate a (5 or 10%)
trimmed mean
• Order the data from smallest to largest.
• Delete the bottom 5 or 10% of the data.
• Delete the same percent from the top of
the data.
• Compute the mean of the remaining 90 or 80% of the data.
Compute a 10% trimmed mean:
15, 17, 18, 20, 20, 25, 30, 32, 36, 60
• Delete the top and bottom 10%
• New data list:
17, 18, 20, 20, 25, 30, 32, 36
• 10% trimmed mean = $\frac{\Sigma x}{n} = \frac{198}{8} \approx 24.8$
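The trimming steps can be sketched in Python as follows. The name `trimmed_mean` is illustrative, and rounding the number of trimmed values down when the percentage does not give a whole number is an assumption; the slides do not say how to handle that case.

```python
def trimmed_mean(values, percent):
    """Mean after deleting `percent`% of the data from each end."""
    ordered = sorted(values)
    # Number of values to delete from each end
    # (assumption: round down to a whole number of values).
    k = (len(ordered) * percent) // 100
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

result = trimmed_mean([15, 17, 18, 20, 20, 25, 30, 32, 36, 60], 10)
print(result)  # 24.75, which the slide rounds to 24.8
```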
Measures of Variation
• Range
• Standard Deviation
• Variance
The Range
the difference between the largest
and smallest values of a
distribution
Find the range:
10, 13, 17, 17, 18
The range = largest minus smallest
= 18 minus 10 = 8
The standard deviation
a measure of the average variation
of the data entries from the mean
Standard deviation of a sample
$s = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n - 1}}$
$\bar{x}$ = mean of the sample
n = sample size
To calculate standard
deviation of a sample
• Calculate the mean of the sample.
• Find the difference between each entry (x) and the mean. These differences will add up to zero.
• Square the deviations from the mean.
• Sum the squares of the deviations from the mean.
• Divide the sum by (n − 1) to get the variance.
• Take the square root of the variance to get the standard deviation.
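The six steps map directly onto a few lines of Python. This is a sketch with illustrative names, run on the small sample 30, 26, 22.

```python
import math

def sample_variance_and_sd(values):
    """Follow the slide steps: mean, deviations, squares, sum, divide, root."""
    n = len(values)
    mean = sum(values) / n                      # step 1: sample mean
    deviations = [x - mean for x in values]     # step 2: these sum to zero
    squared = [d ** 2 for d in deviations]      # step 3: square each deviation
    total = sum(squared)                        # step 4: sum of squares
    variance = total / (n - 1)                  # step 5: divide by n - 1
    return variance, math.sqrt(variance)        # step 6: square root

variance, sd = sample_variance_and_sd([30, 26, 22])
print(variance, sd)
```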
The Variance
the square of the standard
deviation
Variance of a Sample
$s^2 = \frac{\Sigma (x - \bar{x})^2}{n - 1}$
Find the standard deviation and
variance
x      $x - \bar{x}$    $(x - \bar{x})^2$
30     4                16
26     0                0
22     −4               16
Sums: $\Sigma x = 78$, $\Sigma (x - \bar{x}) = 0$, $\Sigma (x - \bar{x})^2 = 32$
mean = 26
The variance
$s^2 = \frac{\Sigma (x - \bar{x})^2}{n - 1} = \frac{32}{2} = 16$
The standard deviation
$s = \sqrt{16} = 4$
Find the mean, the
standard deviation and
variance
mean = 5
x      $x - \bar{x}$    $(x - \bar{x})^2$
4      −1               1
5      0                0
5      0                0
7      2                4
4      −1               1
Sums: $\Sigma x = 25$, $\Sigma (x - \bar{x})^2 = 6$
The mean, the standard
deviation and variance
Mean = 5
Variance = $\frac{6}{4} = 1.5$
Standard deviation = $\sqrt{1.5} \approx 1.22$
Computation formula for
sample standard
deviation:
$s = \sqrt{\frac{SS_x}{n - 1}}$
where $SS_x = \Sigma x^2 - \frac{(\Sigma x)^2}{n}$
To find $\Sigma x^2$:
Square the x values, then add.
To find $(\Sigma x)^2$:
Sum the x values, then square.
Use the computing formulas to
find s and s2
x      $x^2$
4      16
5      25
5      25
7      49
4      16
Sums: $\Sigma x = 25$, $\Sigma x^2 = 131$
n = 5
$(\Sigma x)^2 = 25^2 = 625$
$SS_x = 131 - 625/5 = 6$
$s^2 = 6/(5 - 1) = 1.5$
$s \approx 1.22$
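The computing formulas can be checked with a short Python sketch. The function and variable names are illustrative; the key point is the difference between summing the squares and squaring the sum.

```python
import math

def sample_sd_computation(values):
    """Sample variance and standard deviation via SS_x."""
    n = len(values)
    sum_x = sum(values)                          # sum the x values
    sum_x_squared = sum(x * x for x in values)   # square the x values, then add
    ss_x = sum_x_squared - sum_x ** 2 / n        # SS_x = sum(x^2) - (sum x)^2 / n
    variance = ss_x / (n - 1)
    return ss_x, variance, math.sqrt(variance)

ss_x, variance, sd = sample_sd_computation([4, 5, 5, 7, 4])
print(ss_x, variance, round(sd, 2))
```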
Population Mean and Standard
Deviation
population mean: $\mu = \frac{\Sigma x}{N}$
population standard deviation: $\sigma = \sqrt{\frac{\Sigma (x - \mu)^2}{N}}$
where N = number of data values in the population
COEFFICIENT OF
VARIATION:
a measurement of the relative
variability (or consistency) of data
$CV = \frac{s}{\bar{x}} \cdot 100$ or $CV = \frac{\sigma}{\mu} \cdot 100$
CV is used to
compare variability or
consistency
A sample of newborn infants had a mean weight of
6.2 pounds with a standard deviation of 1 pound.
A sample of three-month-old children had a mean
weight of 10.5 pounds with a standard deviation of
1.5 pounds.
Which (newborns or 3-month-olds) are more
variable in weight?
To compare variability,
compare Coefficient of Variation
For newborns: CV = 16%. Higher CV: more variable.
For 3-month-olds: CV = 14%. Lower CV: more consistent.
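The two CV values come from dividing each standard deviation by its mean and multiplying by 100. A minimal Python sketch, with illustrative names:

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: (standard deviation / mean) * 100."""
    return sd / mean * 100

cv_newborns = coefficient_of_variation(1.0, 6.2)       # about 16.1
cv_three_months = coefficient_of_variation(1.5, 10.5)  # about 14.3
print(round(cv_newborns), round(cv_three_months))
```

The three-month-olds have the larger standard deviation in pounds, but relative to their mean weight the newborns vary more.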
Use Coefficient of Variation
To compare two groups of data,
to answer:
Which is more consistent?
Which is more variable?
CHEBYSHEV'S THEOREM
For any set of data and for any number k,
greater than one, the proportion of the
data that lies within k standard
deviations of the mean is at least:
$1 - \frac{1}{k^2}$
CHEBYSHEV'S THEOREM for k = 2
According to Chebyshev’s Theorem, at
least what fraction of the data falls
within “k” (k = 2) standard deviations of
the mean?
At least $1 - \frac{1}{2^2} = \frac{3}{4} = 75\%$
of the data falls within 2 standard deviations of
the mean.
CHEBYSHEV'S THEOREM for k = 3
According to Chebyshev’s Theorem, at
least what fraction of the data falls
within “k” (k = 3) standard deviations of
the mean?
At least $1 - \frac{1}{3^2} = \frac{8}{9} \approx 88.9\%$
of the data falls within 3 standard deviations of
the mean.
CHEBYSHEV'S THEOREM for k = 4
According to Chebyshev’s Theorem, at
least what fraction of the data falls
within “k” (k = 4) standard deviations of
the mean?
At least $1 - \frac{1}{4^2} = \frac{15}{16} \approx 93.8\%$
of the data falls within 4 standard deviations of
the mean.
Using Chebyshev’s Theorem
A mathematics class completes an examination
and it is found that the class mean is 77 and the
standard deviation is 6.
According to Chebyshev's Theorem, between
what two values would at least 75% of the
grades be?
Mean = 77
Standard deviation = 6
At least 75% of the grades would be in the
interval:
$\bar{x} - 2s$ to $\bar{x} + 2s$
77 – 2(6) to 77 + 2(6)
77 – 12 to 77 + 12
65 to 89
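Both the Chebyshev bound and the resulting interval for the exam grades can be reproduced with a small Python sketch. The function names are illustrative only.

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def chebyshev_interval(mean, sd, k):
    """Interval from mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd

bound = chebyshev_bound(2)                  # 0.75, so at least 75%
low, high = chebyshev_interval(77, 6, 2)    # exam example: k = 2
print(bound, low, high)
```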
Mean and Standard Deviation of
Grouped Data
• Make a frequency table
• Compute the midpoint (x) for each class.
• Count the number of entries in each class
(f).
• Sum the f values to find n, the total
number of entries in the distribution.
• Treat each entry of a class as if it falls at
the class midpoint.
Sample Mean for a Frequency
Distribution
$\bar{x} = \frac{\Sigma x f}{n}$
x = class midpoint
Sample Standard Deviation for
a Frequency Distribution
$s = \sqrt{\frac{\Sigma (x - \bar{x})^2 f}{n - 1}}$
Computation Formula for
Standard Deviation for a
Frequency Distribution
$s = \sqrt{\frac{SS_x}{n - 1}}$
where $SS_x = \Sigma x^2 f - \frac{(\Sigma x f)^2}{n}$
Calculation of the mean of
grouped data
Ages       f     x     xf
30 - 34    4     32    128
35 - 39    5     37    185
40 - 44    2     42    84
45 - 49    9     47    423
$\Sigma f = 20$, $\Sigma xf = 820$
Mean of Grouped Data
$\bar{x} = \frac{\Sigma x f}{n} = \frac{\Sigma x f}{\Sigma f} = \frac{820}{20} = 41.0$
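The grouped-mean calculation for the age data can be sketched in Python. The class limits and frequencies are taken from the table; the names `classes` and `grouped_mean` are illustrative.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: treat every entry as its class midpoint."""
    n = sum(frequencies)
    total = sum(x * f for x, f in zip(midpoints, frequencies))
    return total / n

classes = [(30, 34), (35, 39), (40, 44), (45, 49)]
frequencies = [4, 5, 2, 9]
# Class midpoint = (lower limit + upper limit) / 2
midpoints = [(low + high) / 2 for low, high in classes]

mean_age = grouped_mean(midpoints, frequencies)
print(midpoints, mean_age)
```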
Calculation of the standard
deviation of grouped data
Ages       f     x     x – mean    (x – mean)2    (x – mean)2 f
30 - 34    4     32    –9          81             324
35 - 39    5     37    –4          16             80
40 - 44    2     42    1           1              2
45 - 49    9     47    6           36             324
$\Sigma f = 20$, Mean = 41, $\Sigma (x - \text{mean})^2 f = 730$
Calculation of the standard
deviation of grouped data
$\Sigma (x - \bar{x})^2 f = 730$
$\Sigma f = n = 20$
$s = \sqrt{\frac{\Sigma (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{730}{20 - 1}} = \sqrt{38.42} \approx 6.20$
Computation Formula for
Standard Deviation
x      f     xf     $x^2 f$
32     4     128    4096
37     5     185    6845
42     2     84     3528
47     9     423    19881
$\Sigma f = 20$, $\Sigma xf = 820$, $\Sigma x^2 f = 34350$
Computation Formula for
Standard Deviation for a
Frequency Distribution
$SS_x = \Sigma x^2 f - \frac{(\Sigma x f)^2}{n} = 34350 - \frac{820^2}{20} = 730$
$s = \sqrt{\frac{SS_x}{n - 1}} = \sqrt{\frac{730}{20 - 1}} \approx 6.20$
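As a check that the defining formula and the computation formula agree for grouped data, here is a Python sketch on the age data. The names are illustrative; both sums of squares should come out to 730.

```python
import math

def grouped_sd(midpoints, frequencies):
    """Sample standard deviation of grouped data, computed two ways."""
    n = sum(frequencies)
    pairs = list(zip(midpoints, frequencies))
    sum_xf = sum(x * f for x, f in pairs)
    mean = sum_xf / n
    # Defining formula: sum of (x - mean)^2 * f
    ss_defining = sum((x - mean) ** 2 * f for x, f in pairs)
    # Computation formula: sum(x^2 f) - (sum xf)^2 / n
    sum_x2f = sum(x * x * f for x, f in pairs)
    ss_computing = sum_x2f - sum_xf ** 2 / n
    return ss_defining, ss_computing, math.sqrt(ss_computing / (n - 1))

ss_def, ss_comp, s = grouped_sd([32, 37, 42, 47], [4, 5, 2, 9])
print(ss_def, ss_comp, round(s, 2))
```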
Weighted Average
Average calculated where some of
the numbers are assigned more
importance or weight
Weighted Average
$\text{Weighted Average} = \frac{\Sigma x w}{\Sigma w}$
where w = the weight of the data value x.
Compute the Weighted Average:
• Midterm grade = 92
• Term Paper grade = 80
• Final exam grade = 88
• Midterm weight = 25%
• Term paper weight = 25%
• Final exam weight = 50%
Compute the Weighted Average:
               x     w       xw
• Midterm      92    .25     23
• Term Paper   80    .25     20
• Final exam   88    .50     44
Sums                 1.00    87
Weighted Average = $\frac{\Sigma x w}{\Sigma w} = \frac{87}{1.00} = 87$
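The weighted average of the three grades can be reproduced with a short Python sketch. The function name is illustrative; dividing by the sum of the weights means the weights do not have to add up to 1.

```python
def weighted_average(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

grades = [92, 80, 88]           # midterm, term paper, final exam
weights = [0.25, 0.25, 0.50]
grade = weighted_average(grades, weights)
print(grade)
```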
Percentiles
For any whole number P (between 1 and
99), the Pth percentile of a distribution is
a value such that P% of the data fall at or
below it.
The percent falling above the Pth percentile
will be (100 – P)%.
Percentiles
[Figure: the data shown from lowest value to highest value; 40% of the data fall at or below $P_{40}$ and 60% fall above it.]
Quartiles
• Percentiles that divide the data into
fourths
• Q1 = 25th percentile
• Q2 = the median
• Q3 = 75th percentile
Quartiles
[Figure: the data shown from lowest value to highest value, divided into fourths by Q1, the median (= Q2), and Q3.]
Interquartile range = IQR = Q3 – Q1
Computing Quartiles
• Order the data from smallest to largest.
• Find the median, the second quartile.
• Find the median of the data falling below
Q2. This is the first quartile.
• Find the median of the data falling above
Q2. This is the third quartile.
Find the quartiles:
12  15  16  16  17  18  22  22
23  24  25  30  32  33  33  34
41  45  51
The data has been ordered.
The median is 24.
Find the quartiles:
12  15  16  16  17  18  22  22
23  24  25  30  32  33  33  34
41  45  51
For the data below the median, the median is 17.
17 is the first quartile.
Find the quartiles:
12  15  16  16  17  18  22  22
23  24  25  30  32  33  33  34
41  45  51
For the data above the median, the median is 33.
33 is the third quartile.
Find the interquartile range:
12  15  16  16  17  18  22  22
23  24  25  30  32  33  33  34
41  45  51
IQR = Q3 – Q1 = 33 – 17 = 16
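The quartile procedure used in these slides (the median of the values below Q2, and the median of the values above Q2) can be sketched in Python. The function names are illustrative. Other textbooks and software use slightly different quartile conventions, so results from a statistics package may not match this method exactly.

```python
def median_of_ordered(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 using the median-of-halves method from the slides."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # values below the median position
    upper = ordered[half + n % 2:]   # values above the median position
    return (median_of_ordered(lower),
            median_of_ordered(ordered),
            median_of_ordered(upper))

data = [12, 15, 16, 16, 17, 18, 22, 22, 23, 24,
        25, 30, 32, 33, 33, 34, 41, 45, 51]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```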
Five-Number Summary of Data
• Lowest value
• First quartile
• Median
• Third quartile
• Highest value
Box-and-Whisker Plot
a graphical presentation of the five-number summary of data
Making a
Box-and-Whisker Plot
• Draw a vertical scale including the lowest
and highest values.
• To the right of the scale, draw a box from
Q1 to Q3.
• Draw a solid line through the box at the
median.
• Draw lines (whiskers) from Q1 to the
lowest and from Q3 to the highest values.
Construct a
Box-and-Whisker Plot:
12  15  16  16  17  18  22  22
23  24  25  30  32  33  33  34
41  45  51
Lowest = 12
Q1 = 17
median = 24
Q3 = 33
Highest = 51
Box-and-Whisker Plot
[Figure: box-and-whisker plot on a vertical scale from 10 to 60. The box spans Q1 to Q3 with a line at the median; whiskers extend to the lowest and highest values.]
Lowest = 12
Q1 = 17
median = 24
Q3 = 33
Highest = 51