Measures of Spread: Standard Deviation
So far in our study of numerical measures used to describe data sets, we have focused on the mean and the
median. These measures of center tell us the most typical value of the data set. If we were asked to make
a prediction about a member of a data set, we would use a measure of center to predict that value. However,
measures of center do not give us the complete picture.
Mean Absolute Deviation (MAD), Standard Deviation, Range, and Interquartile Range are measures of spread.
They tell us how much values typically vary from the center.
Consider the following test scores:
Student   Test 1   Test 2   Test 3   Test 4
Johnny    65       82       93       100
Will      82       86       88       84
Anna      80       99       73       88
Who is the best student? How do you know?
Thinking about the Situation
Discuss the following with your partner or group. Write your answers on your own paper. Be
prepared to share your answers with the class.
1) What is the mean test score for each student? 85 points
2) Based on the mean, who is the best student? Answers will vary; since they all have the same mean, it is
hard to use the mean to answer this question.
3) If asked to select one student, who would you pick as the best student? Explain. Some may look at
Johnny’s increasing trend over time. Some may look at Will’s consistency. Some may note Anna’s one
low grade. Be sure that arguments for each student being the “best” come out, even if you have to be
the one who makes them. The point is, just using the mean to describe each student is not enough. I
think that we can all agree that they are not “equal” in their test performance. We need more
information than just the typical test score. One thing to look at is how consistent each student is, and
measures of spread will give us that information.
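The shared mean can also be checked with a few lines of code. This is a minimal Python sketch, assuming the scores from the table above are stored in a dictionary; the names `scores` and `means` are our own and not part of the lesson.

```python
from statistics import mean

# Test scores from the table above (variable names are illustrative).
scores = {
    "Johnny": [65, 82, 93, 100],
    "Will": [82, 86, 88, 84],
    "Anna": [80, 99, 73, 88],
}

# Mean test score for each student: sum of the scores divided by the count.
means = {student: mean(tests) for student, tests in scores.items()}

for student, m in means.items():
    print(f"{student}: mean = {m}")
```

All three students come out with the same mean, which is why the mean alone cannot separate them.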
Investigation 1: Deviation from the Mean
Discuss the following with your partner or group. Write your answers on your own paper. Be
prepared to share your answers with the class.
Usually we calculate the mean, or average, test score to describe how a student is doing. Johnny, Will, and
Anna all have the same average. However, these three students do not seem to be “equal” in their test
performance. We need more information than just the typical test score to describe how they are doing. One
thing we can look at is how consistent each student is with their test performance. Does each student tend to
do about the same on each test, or does it vary a lot from test to test? Measures of spread will give us that
information. In statistics, deviation is the amount that a single data value differs from the mean.
4) Complete the table below by finding the deviation from the mean for each test score for each student.

Johnny
Test      Score (𝑥)   Mean (𝑥̅)   Deviation from the Mean (𝑥 − 𝑥̅)
Test 1    65          85         -20
Test 2    82          85         -3
Test 3    93          85         8
Test 4    100         85         15
Sum of Deviations                0

Will
Test      Score (𝑥)   Mean (𝑥̅)   Deviation from the Mean (𝑥 − 𝑥̅)
Test 1    82          85         -3
Test 2    86          85         1
Test 3    88          85         3
Test 4    84          85         -1
Sum of Deviations                0

Anna
Test      Score (𝑥)   Mean (𝑥̅)   Deviation from the Mean (𝑥 − 𝑥̅)
Test 1    80          85         -5
Test 2    99          85         14
Test 3    73          85         -12
Test 4    88          85         3
Sum of Deviations                0

5) How does the sum of the deviations from the mean relate to the mean being the balance point for a set
of data? The mean is the balance point, so the distances between the mean and each point balance in
either direction.
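The balance-point idea can be sketched in code. The Python below is an illustration only; the helper name `deviations` is our own. It computes each signed deviation for Johnny's scores and adds them up.

```python
def deviations(values):
    """Return each value's signed deviation from the mean of the list."""
    m = sum(values) / len(values)
    return [x - m for x in values]

johnny = [65, 82, 93, 100]
devs = deviations(johnny)

print(devs)       # signed deviations from Johnny's mean of 85
print(sum(devs))  # negative and positive deviations cancel
```

With these whole-number scores the sum is exactly zero. For data with messy decimals, floating-point rounding may leave a tiny nonzero remainder, so a comparison such as `abs(sum(devs)) < 1e-9` is safer than testing for exact zero.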
Investigation 2: Mean Absolute Deviation
Discuss the following with your partner or group. Write your answers on your own paper. Be
prepared to share your answers with the class.
One way to measure consistency is to find the average deviation from the mean. In other words, how far do
most values in a data set fall from the mean? One way to answer this question would be to find the average
deviation, or distance, that the data values fall from the mean. So we would add up the deviations to find the
total deviation and then divide by the number of data values to find the mean deviation. However, the fact that
the deviations from the mean always add up to zero is a problem. No matter what we divide zero by, we always
get zero! When talking about spread, a value of zero indicates that there is no spread, or variability. One way
to fix this problem is to look at only the distances from the mean, and not their directions as indicated by the
sign of the deviation (positive or negative). We can take the absolute value of each deviation and then find the
average of those distances.
6) Complete the table below by filling in the deviation from the mean for each test score for each student
that you calculated in Investigation 1. Then find the absolute value of each deviation.

Johnny
Test      Deviation from the Mean (𝑥 − 𝑥̅)   Absolute Deviation from the Mean |𝑥 − 𝑥̅|
Test 1    -20                                20
Test 2    -3                                 3
Test 3    8                                  8
Test 4    15                                 15
Sum                                          46
Mean of Absolute Deviation                   46/4 = 11.5

Will
Test      Deviation from the Mean (𝑥 − 𝑥̅)   Absolute Deviation from the Mean |𝑥 − 𝑥̅|
Test 1    -3                                 3
Test 2    1                                  1
Test 3    3                                  3
Test 4    -1                                 1
Sum                                          8
Mean of Absolute Deviation                   8/4 = 2

Anna
Test      Deviation from the Mean (𝑥 − 𝑥̅)   Absolute Deviation from the Mean |𝑥 − 𝑥̅|
Test 1    -5                                 5
Test 2    14                                 14
Test 3    -12                                12
Test 4    3                                  3
Sum                                          34
Mean of Absolute Deviation                   34/4 = 8.5
The average of the absolute deviations from the mean is called the Mean Absolute Deviation.
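The same calculation can be written as a short function. This is a sketch in Python, assuming the three students' scores from the start of the lesson; the function name `mean_absolute_deviation` and the dictionary names are illustrative.

```python
def mean_absolute_deviation(values):
    """Average distance of the values from their mean, ignoring direction."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

scores = {
    "Johnny": [65, 82, 93, 100],
    "Will": [82, 86, 88, 84],
    "Anna": [80, 99, 73, 88],
}

# MAD for each student, matching the "Mean of Absolute Deviation" rows.
mads = {student: mean_absolute_deviation(tests) for student, tests in scores.items()}

for student, mad in mads.items():
    print(f"{student}: MAD = {mad}")
```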
7) What does the Mean Absolute Deviation (MAD) tell you about each student? Is there one student who
seems to be more consistent than the others? The MAD tells us the average deviation from the mean for
each student, telling us how much their test scores typically vary from their average test score. Will’s
MAD is the lowest, indicating that his test scores are the least spread out.
8) Interpret the MAD for Johnny in context. Johnny’s test scores typically fall within 11-12 points of his
average score of 85. So Johnny’s scores typically vary anywhere from 74 to 96.
Investigation 3: Calculating the Standard Deviation
Below is the formula for calculating the standard deviation. It looks pretty complicated, doesn’t it? Let’s break
it down step by step so we can see how it is finding the “average deviation from the mean.”
\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}} \]
Let’s start with Johnny’s data.
Step 1: In the table below, record the data values (test scores) in the second column labeled “Value.” The x is
used to denote a value from the data set. The first value is written for you.
Step 2: Find the mean of the test scores (we did this in the “Thinking About the Situation”) and record at the
bottom of the second column next to the symbol 𝜇. Mu (pronounced “mew”) is the lowercase Greek letter that
later became our letter “m.”
𝜇 is another symbol that we use for mean (in addition to 𝑥̅ ).
Step 3: In the third column, find the deviation from the mean for each test score by taking each test score and
subtracting the mean. The first difference has been done for you.
Step 4: Add the values in the third column to find the sum of the deviations from the mean. If you have done
everything correctly so far, the sum should be zero. The capital Greek letter ∑, called sigma, is a symbol that is
used to indicate the sum.
Step 5: Square each deviation to make it positive and record these values in the last column of the table. The
first value is done for you.
Step 6: Find the sum of the squared deviations by adding up the values in the fourth column and putting the
sum at the bottom of the column. This is the sum of the squared deviations from the mean.
Step 7: Find the average of the squared deviations from the mean by dividing the sum of column four by the
number of data values, n, (the number of test scores).
Step 8: “Un-do” the squaring by taking the square root. Now you have found the standard deviation! The
symbol for standard deviation is the lower-case letter sigma, 𝜎.
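Steps 1 through 8 can be followed line by line in code. Below is a minimal Python sketch using Johnny's scores; the variable names are our own, and the division is by n, as in the steps above.

```python
import math

values = [65, 82, 93, 100]                # Step 1: the data values x
mu = sum(values) / len(values)            # Step 2: the mean
devs = [x - mu for x in values]           # Step 3: deviation of each value from the mean
sum_devs = sum(devs)                      # Step 4: sum of the deviations (should be zero)
squared = [d ** 2 for d in devs]          # Step 5: square each deviation
sum_squared = sum(squared)                # Step 6: sum of the squared deviations
avg_squared = sum_squared / len(values)   # Step 7: average of the squared deviations
sigma = math.sqrt(avg_squared)            # Step 8: undo the squaring

print(mu, sum_devs, sum_squared, avg_squared, round(sigma, 1))
```

Each intermediate variable corresponds to one column or summary row of the table, so the printed values can be compared against the table as it is filled in.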
Johnny’s Data
Test   Value (𝑥)   Deviations from the Mean       Squared Deviations from the Mean
                   Value – Mean (𝑥 − 𝜇)           (Value – Mean)² (𝑥 − 𝜇)²
1      65          65-85 = -20                    (-20)² = 400
2      82          82-85 = -3                     (-3)² = 9
3      93          93-85 = 8                      (8)² = 64
4      100         100-85 = 15                    (15)² = 225
       Mean 𝜇 = 85   Sum ∑(𝑥 − 𝜇) = 0             Sum ∑(𝑥 − 𝜇)² = 698
Average of Squared Deviations: ∑(𝑥 − 𝜇)² / 𝑛 = 698 / 4 = 174.5
Square Root of Average of Squared Deviations: √(∑(𝑥 − 𝜇)² / 𝑛) = √174.5 ≈ 13.2
Now repeat the process with Will and Anna’s data.
Will’s Data
Test   Value (𝑥)   Deviations from the Mean       Squared Deviations from the Mean
                   Value – Mean (𝑥 − 𝜇)           (Value – Mean)² (𝑥 − 𝜇)²
1      82          -3                             9
2      86          1                              1
3      88          3                              9
4      84          -1                             1
       Mean 𝜇 = 85   Sum ∑(𝑥 − 𝜇) = 0             Sum ∑(𝑥 − 𝜇)² = 20
Average of Squared Deviations: ∑(𝑥 − 𝜇)² / 𝑛 = 20 / 4 = 5
Square Root of Average of Squared Deviations: √(∑(𝑥 − 𝜇)² / 𝑛) = √5 ≈ 2.2
Anna’s Data
Test   Value (𝑥)   Deviations from the Mean       Squared Deviations from the Mean
                   Value – Mean (𝑥 − 𝜇)           (Value – Mean)² (𝑥 − 𝜇)²
1      80          -5                             25
2      99          14                             196
3      73          -12                            144
4      88          3                              9
       Mean 𝜇 = 85   Sum ∑(𝑥 − 𝜇) = 0             Sum ∑(𝑥 − 𝜇)² = 374
Average of Squared Deviations: ∑(𝑥 − 𝜇)² / 𝑛 = 374 / 4 = 93.5
Square Root of Average of Squared Deviations: √(∑(𝑥 − 𝜇)² / 𝑛) = √93.5 ≈ 9.7
Discussion Questions:
9) Translate into words: ∑(𝑥 − 𝜇)². The sum of the squared deviations from the mean. Or from inside
parentheses out, subtract the mean from each value, square these differences and add them up.
10) Interpret Anna’s standard deviation in context. Anna’s test grades typically vary by about ten points
from her average score of 85. So her test scores typically vary from about 75 to 95.
11) Who is the best student? How do you know? Answers will vary. Will is the most consistent as his scores
vary by about 2.2 points from the mean of 85 points.
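The three hand-calculated standard deviations can be cross-checked with Python's standard library. This sketch assumes the scores from the start of the lesson and uses `statistics.pstdev`, which divides by n just as the tables do; the dictionary names and the `most_consistent` variable are illustrative.

```python
from statistics import pstdev

scores = {
    "Johnny": [65, 82, 93, 100],
    "Will": [82, 86, 88, 84],
    "Anna": [80, 99, 73, 88],
}

# Population standard deviation (divide by n) for each student.
sigmas = {student: pstdev(tests) for student, tests in scores.items()}

for student, s in sigmas.items():
    print(f"{student}: standard deviation is about {round(s, 1)}")

# The smallest standard deviation marks the most consistent student.
most_consistent = min(sigmas, key=sigmas.get)
print("Most consistent:", most_consistent)
```

A small standard deviation only says that a student is consistent. Whether consistency makes someone the "best" student is still a judgment call.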
Graphing Calculator – Finding standard deviation
Refer to the Mean vs. Median notes for more details and screenshots for finding 1-Variable Statistics.
Step 1: Enter the following data into L1 in the calculator.
28, 48, 53, 25, 38, 23, 49, 32
Step 2: Press STAT
Step 3: Press the right arrow button so that CALC is highlighted and the CALC menu appears.
(1: 1-VAR STATS should already be highlighted)
Step 4: Press ENTER (or 1). 1-VAR STATS will appear on the Home screen.
Step 5: Enter the list name: press 2ND and then 1. L1 should appear on the screen.
Step 6: Press ENTER. 1-VAR STATS for L1 will appear. The down arrow at the bottom of the screen tells you
there is more information below.

Standard Deviation
o Notation: Sx (Sample)
o In this example, the sample standard deviation is 11.759…
o Notation: σx (Population)
o We will be using Sx (sample standard deviation) for
standard deviation. This is used more frequently because in
order to use σx (population standard deviation) we need to
have the data for the entire population, and that is not often
the case. The sample standard deviation is the larger of the two values.
***Sx is the sample standard deviation (the sum of squared deviations is divided by n-1). σx is the
population standard deviation (the sum of squared deviations is divided by n).
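The difference between Sx and σx can be seen by computing both for the calculator data. This is a minimal Python sketch, assuming the eight values entered into L1 above; `statistics.stdev` divides by n-1 and `statistics.pstdev` divides by n. The variable names are our own and only echo the calculator's labels.

```python
from statistics import pstdev, stdev

# The data entered into L1 on the calculator.
data = [28, 48, 53, 25, 38, 23, 49, 32]

sx = stdev(data)        # sample standard deviation (divides by n - 1)
sigma_x = pstdev(data)  # population standard deviation (divides by n)

print("Sx      =", sx)
print("sigma x =", sigma_x)
print("Sx is larger:", sx > sigma_x)
```

Dividing by the smaller number n-1 makes Sx slightly larger than σx for the same data, which is the relationship noted above.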