Lecture 4.3
Instruction: Measures of Central Tendency
This lecture discusses three statistics of numerical data sets called measures of central
tendency.
A measure of central tendency is a statistic that assigns a numerical
value as representative of an entire data set.
One measure of central tendency is the arithmetic mean. The symbol x-bar, $\bar{x}$, denotes
the arithmetic mean of a sample set. The arithmetic mean, defined below, can be thought of as
the average.
For a given numerical set of data $S = \{x_1, x_2, \ldots, x_n\}$ with n elements, the
arithmetic mean of the set is given by the formula:
\[ \bar{x} = \frac{\sum x}{n}. \]
The arithmetic mean has three significant characteristics. First, changing the value of any score,
or adding to the data set a new score not equal to the mean, will change the mean of the data set.
Second, if some constant value c is added to each value in the data set, the mean changes to
$\bar{x} + c$. Third, if each value in the data set is multiplied by some constant value c, the
mean changes to $c \cdot \bar{x}$.
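The second and third characteristics can be checked numerically. The Python sketch below is only an illustration: the helper name `arithmetic_mean`, the three scores, and the constant `c = 3` are assumptions chosen for the example and do not come from the lecture.

```python
def arithmetic_mean(scores):
    """Return the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


scores = [2, 4, 9]   # illustrative data (assumed), mean = 5
c = 3                # illustrative constant (assumed)

mean = arithmetic_mean(scores)
shifted_mean = arithmetic_mean([x + c for x in scores])   # c added to every score
scaled_mean = arithmetic_mean([c * x for x in scores])    # every score multiplied by c

print(mean, shifted_mean, scaled_mean)  # 5.0 8.0 15.0
```

Adding 3 to every score raises the mean from 5 to 8, and multiplying every score by 3 raises it from 5 to 15.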
A second measure of central tendency is the median. The median, defined below, is the
midpoint of the distribution of the data set.
For data set S arranged in ascending order, the median is the value that divides the
data set exactly in half, so that 50% of the data are equal to or less than the
median. If $(n + 1)/2$ is an integer, it equals the position of the median. If $(n + 1)/2$
is not an integer, the median is the midpoint between the score in
the $n/2$ position and the score in the $(n + 2)/2$ position.
If n ( S ) is odd for a sample S of non-rounded data arranged in ascending order, the median is the
middle number in S. If n ( S ) is even for a sample S of non-rounded data arranged in ascending
order, the median is the mean of the two middle numbers.
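The position rule can be sketched in a few lines of Python. The function name `median` and the two small data sets are assumptions made for illustration; the logic follows the odd and even cases described above.

```python
def median(scores):
    """Median via the position rule (n + 1) / 2 applied to the sorted data."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # (n + 1) / 2 is an integer, so that position holds the median.
        position = (n + 1) // 2
        return ordered[position - 1]   # positions are 1-based, lists are 0-based
    # (n + 1) / 2 is not an integer: average the scores in
    # positions n / 2 and (n + 2) / 2.
    lower = ordered[n // 2 - 1]
    upper = ordered[(n + 2) // 2 - 1]
    return (lower + upper) / 2


print(median([7, 1, 4]))       # 4
print(median([7, 1, 4, 10]))   # 5.5
```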
The third measure of central tendency is the mode. The mode, defined below, is the most
common number in a numerical data set.
For data set S with some frequency $f_k$ greater than any other frequency $f_j$, the
mode is the value with the greatest frequency.
According to the definition above, a numerical data set has no mode when the frequencies of all
its data values are equal. If, however, at least one frequency is greater than another, the data
set has a mode, and the mode equals the data value (or values) with the greatest frequency. Data
sets with multiple modes are said to be multimodal. Data sets with two modes are said to be bimodal.
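The convention above (no mode when every frequency is equal, possibly several modes otherwise) might be coded as follows. This is a sketch under that convention; the function name `modes` and the sample lists are assumptions for illustration.

```python
from collections import Counter


def modes(scores):
    """Return the sorted list of modes, or [] when all frequencies are equal."""
    frequencies = Counter(scores)
    if not frequencies:
        return []                      # empty data set: nothing to report
    highest = max(frequencies.values())
    lowest = min(frequencies.values())
    if highest == lowest:
        return []                      # every value is equally common: no mode
    return sorted(value for value, f in frequencies.items() if f == highest)


print(modes([1, 2, 3]))          # []      (no mode)
print(modes([1, 2, 2, 3]))       # [2]
print(modes([1, 1, 2, 2, 3]))    # [1, 2]  (bimodal)
```

Note that Python's built-in `statistics.multimode` follows a different convention: when all frequencies are equal it returns every value rather than an empty list.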
Consider a sample V = {6, 5, 2, 12, 1, 3, 2, 4, 0, 4, 13, 6, 6, 7, 1, 6} . To find the three
measures of central tendency, we must find the arithmetic mean, the median, and the mode.
Arranging the data set in ascending order will help identify frequencies and the median:
V = {0, 1, 1, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 12, 13}.
The data point 6 appears in the data set the most (has the greatest frequency), so the mode equals
6. The arithmetic mean equals the ratio of the sum of the data points to the number of data
points as computed below.
\[ \bar{x} = \frac{0 + 1 + 1 + 2 + 2 + 3 + 4 + 4 + 5 + 6 + 6 + 6 + 6 + 7 + 12 + 13}{16} = \frac{78}{16} = 4.875 \]
Since n(V) is even, the median equals the mean of the two middle numbers as computed below.
\[ \text{median} = \frac{4 + 5}{2} = \frac{9}{2} = 4.5 \]
In summary, for the given data set V, we have the three measures of central tendency:
mean = 4.875, median = 4.5, & mode = 6.
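The three results for V can be cross-checked with Python's standard `statistics` module. This is offered only as a verification sketch; the variable name `V` mirrors the sample above.

```python
import statistics

V = [6, 5, 2, 12, 1, 3, 2, 4, 0, 4, 13, 6, 6, 7, 1, 6]

print(sorted(V))               # [0, 1, 1, 2, 2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 12, 13]
print(statistics.mean(V))      # 4.875
print(statistics.median(V))    # 4.5
print(statistics.mode(V))      # 6
```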
Consider a larger set of data S displayed by the frequency distribution below.
x   22  23  24  25  26  27  28  29  30  31  32  33
f    5   3   7   1   1   2   4  10   4   1   1   1
Since the frequency distribution organizes the data set, finding the three measures of central
tendency is not much more difficult for S than it was for V, even though n(S) > n(V).
Note that $n(S) = \sum f = 5 + 3 + 7 + 1 + 1 + 2 + 4 + 10 + 4 + 1 + 1 + 1 = 40$. To find the
mode, select the data value with the greatest frequency, which is 29. To find the median, start by
calculating its position: (40 + 1)/2 = 20.5. Since the position of the median is 20.5, the median
equals the average of the 20th and 21st values in the data set arranged in ascending order. The
values 22 through 27 account for the first 19 positions and the four 28s fill positions 20 through
23, so the median is (28 + 28)/2 = 28. To find the arithmetic mean, calculate the ratio of the sum
of the data points to the number of data points as below.
\[ \bar{x} = \frac{\sum f \cdot x}{\sum f} = \frac{5 \cdot 22 + 3 \cdot 23 + 7 \cdot 24 + 25 + 26 + 2 \cdot 27 + 4 \cdot 28 + 10 \cdot 29 + 4 \cdot 30 + 31 + 32 + 33}{40} = \frac{1070}{40} = 26.75 \]
In summary, for the given data set S, we have the three measures of central tendency:
mean = 26.75, median = 28, & mode = 29.
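The same three statistics can be computed directly from the frequency distribution without writing out all 40 scores. The sketch below is one possible approach; the names `values`, `frequencies`, and `value_at` are assumptions introduced for the example, and the code relies on the values being listed in ascending order.

```python
# Frequency distribution for S: each data value x paired with its frequency f.
values = [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]
frequencies = [5, 3, 7, 1, 1, 2, 4, 10, 4, 1, 1, 1]

n = sum(frequencies)                                         # n(S) = sum of f
mean = sum(f * x for x, f in zip(values, frequencies)) / n   # sum(f * x) / sum(f)

# Mode: the data value with the greatest frequency.
mode = max(zip(values, frequencies), key=lambda pair: pair[1])[0]


def value_at(position):
    """Return the score at a 1-based position in the ordered data."""
    cumulative = 0
    for x, f in zip(values, frequencies):
        cumulative += f
        if position <= cumulative:
            return x
    raise IndexError("position exceeds n(S)")


# n is even here, so average the scores in positions n / 2 and (n + 2) / 2.
median = (value_at(n // 2) + value_at((n + 2) // 2)) / 2

print(n, mean, median, mode)   # 40 26.75 28.0 29
```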
Application Exercise 4.3
Problems
Suppose NASA studies the effects of micro-gravity on the immune system. As part of this study,
NASA collects thirty blood samples from astronauts after six consecutive weeks in orbit and
records the number of white cells in thousands per cubic millimeter below.
3.6  5.9  6.3  5.1  5.0  7.2  5.2  9.3  8.1  7.1
9.9  9.2  5.9  9.9  5.7  7.9  9.9  8.4  6.0  8.5
6.7  7.9  7.7  4.4  8.0  4.7  6.9  7.8  9.1  4.9
#1
Calculate the mean number of white cells in thousands per cubic millimeter.
#2
Identify the median number of white cells in thousands per cubic millimeter.
#3
Identify the mode of the sample.
#4
Assume every measurement given above is actually eight times greater than the given
amount. What would the new mean be?
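Answers
#1 7.073
#2 7.15 (the median lies at position 15.5, midway between the 15th and 16th ordered values, 7.1 and 7.2)
#3 9.9
#4 56.587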
Lecture 4.4
Contemporary Mathematics
Instruction: Measures of Dispersion
This lecture discusses three statistics of numerical data sets called measures of
dispersion. Consider the two samples below each with the same mean and median.
A = {47, 50, 53}
B = {0, 50, 100}
For both sets, $\bar{x} = 50$. For sample A, the mean is a good estimate for any score found in the set,
but the mean is not a good estimate for any score found in sample B. The scores in sample B are
spread farther apart than those in sample A. Sample B is said to have greater variability.
Statistics that measure the magnitude of variability are called measures of dispersion.
A measure of dispersion is a statistic that assigns a numerical value
to describe the variability of a data set. Variability refers to the
spread of a data set. A measure of dispersion measures how spread
out or how widely dispersed a set of data is.
One particular measure of dispersion is the range. The range, defined below, is the
distance between the largest and smallest values in a sample.
The range is the difference of the largest and smallest values in a sample.
The range of set A above equals six because 53 − 47 = 6. The range of set B above equals 100
because 100 − 0 = 100.
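As a quick check, the range of each sample can be computed in Python. The helper name `sample_range` is an assumption for this sketch (chosen to avoid shadowing Python's built-in `range`).

```python
def sample_range(scores):
    """Largest value minus smallest value."""
    return max(scores) - min(scores)


A = [47, 50, 53]
B = [0, 50, 100]

print(sample_range(A))   # 6
print(sample_range(B))   # 100
```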
A second measure of dispersion is the sample variance. To discuss variance, we must
first discuss a deviation and the squares of deviations.
Deviation equals distance from the mean. A deviation score equals $x - \bar{x}$.
According to the definition above, the deviations of scores below the mean are negative, and the
deviations of scores above the mean are positive. The table below shows the deviations for set
A.
x     $x - \bar{x}$
47    −3
50     0
53     3
Scores below the mean have negative deviations. Scores above the mean have positive
deviations. Scores equal to the mean have zero deviations. While deviations can be positive or
negative depending on the position of the respective score, the squares of deviations,
$(x - \bar{x})^2$, are never negative.
To calculate sample variance, we must calculate the deviation of each score in the sample as
above as well as the square of each deviation as below.
x     $x - \bar{x}$    $(x - \bar{x})^2$
47    −3               9
50     0               0
53     3               9
The population variance equals the mean of the squares of the deviations.
The sample variance is an estimate of the population variance given by the formula below.
The sample variance, denoted var, equals the ratio:
\[ \text{var} = \frac{\sum (x - \bar{x})^2}{n - 1}. \]
The sample variance for sample A = {47, 50, 53} is calculated below.
\[ \text{var} = \frac{9 + 0 + 9}{3 - 1} = \frac{18}{2} = 9 \]
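The calculation for sample A can be reproduced step by step in Python. The function name `sample_variance` is an assumption for this sketch; the standard library's `statistics.variance`, which also divides by n − 1, is used only as a cross-check.

```python
import statistics


def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("sample variance needs at least two scores")
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return sum(squared_deviations) / (n - 1)


A = [47, 50, 53]

print(sample_variance(A))                             # 9.0
print(sample_variance(A) == statistics.variance(A))   # True
```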
The third measure of dispersion is the standard deviation, which equals the square root of the
variance.
The standard deviation, denoted s, is a distance from the mean that equals
the square root of the variance:
\[ s = \sqrt{\text{var}} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}. \]
The standard deviation measures the typical or standard distance of scores
in the sample from the mean.
According to the definition above, widely dispersed data sets have large standard deviations.
Indeed, the larger the sample's standard deviation, the more widely dispersed are the elements in
the sample. The standard deviation of sample A = {47, 50, 53} is given here: $s = \sqrt{9} = 3$.
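To see the link between spread and standard deviation, the sketch below computes s for both sample A and the more widely dispersed sample B = {0, 50, 100} from the start of the lecture. The function name `sample_std` is an assumption for illustration.

```python
import math


def sample_std(scores):
    """Square root of the sample variance (denominator n - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return math.sqrt(variance)


A = [47, 50, 53]
B = [0, 50, 100]

print(sample_std(A))   # 3.0
print(sample_std(B))   # 50.0
```

Both samples have the same mean, but the standard deviation of B is far larger, matching the observation that B has greater variability.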
The standard deviation has two key characteristics. First, adding a constant to each score
in a sample will not change the standard deviation. Thus, if A* = {46, 49, 52} (each score of A
decreased by 1), then s = 3. Second, multiplying each score by a positive constant causes the
standard deviation to be multiplied by the same constant. Thus, if instead A* = {94, 100, 106}
(each score of A doubled), then s = 6.
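Both characteristics can be confirmed with the standard library's `statistics.stdev`, which computes the sample standard deviation. The variable names `shifted` and `doubled` are assumptions introduced for this sketch.

```python
import statistics

A = [47, 50, 53]
shifted = [x - 1 for x in A]    # {46, 49, 52}: a constant added to each score
doubled = [2 * x for x in A]    # {94, 100, 106}: each score multiplied by 2

print(statistics.stdev(A))        # 3.0
print(statistics.stdev(shifted))  # 3.0
print(statistics.stdev(doubled))  # 6.0
```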
Application Exercise 4.4
Problems
Suppose NASA studies the effects of micro-gravity on the immune system. As part of this study,
NASA collects thirty blood samples from astronauts after six consecutive weeks in orbit and
records the number of white cells in thousands per cubic millimeter below.
3.6  5.9  6.3  5.1  5.0  7.2  5.2  9.3  8.1  7.1
9.9  9.2  5.9  9.9  5.7  7.9  9.9  8.4  6.0  8.5
6.7  7.9  7.7  4.4  8.0  4.7  6.9  7.8  9.1  4.9
#1
Calculate the range of the sample.
#2
Calculate the variance of the sample.
#3
Calculate the standard deviation of the sample.
#4
Assume every measurement in the sample is actually four times greater than the given
amount. What would the new standard deviation be?
Answers
#1 6.3
#2 var ≈ 3.22
#3 s ≈ 1.79
#4 s ≈ 7.18
Assignment 4.4
Problems
#1
Which statistic equals the difference between the smallest and largest values in a sample?
#2
Which statistic measures the typical or standard distance of scores in the sample from the
mean?
#3
Find the range and standard deviation for the sample below. Round answers to the nearest
hundredth.
{206.3, 210.4, 209.3, 211.1, 210.8, 213.5, 212.6, 210.5, 211.0, 214.2}
#4
Find the standard deviation for the data set displayed by the frequency distribution below.
Round answers to the nearest hundredth.
Value   Frequency
9       3
7       4
5       7
3       5
1       2
#5
Consider the sample $S = \{x_1, x_2, x_3\}$ whose standard deviation equals 5. What is the
standard deviation of the data set composed of the three values
$\{6 \cdot x_1, 6 \cdot x_2, 6 \cdot x_3\}$?