Psych 5500/6500
Standard Deviations, Standard Scores, and
Areas Under the Normal Curve
Fall, 2008
1
Standard Deviation
The standard deviation is the square root of
the variance. It is a measure of variability,
which means the greater the standard
deviation of a group of scores, the more
the scores differed from each other.
2
Example
Sample A: Y = 6, 7, 8, 9    S = 1.12
Sample B: Y = 5, 9, 15, 18    S = 5.07
Sample B has more variability among its
scores, and this is reflected by it having a
larger standard deviation.
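The two values of S can be checked with a short Python sketch. It assumes the slide divides the sum of squared deviations by N (not N - 1), since that convention is the one that reproduces 1.12 and 5.07; the variable names are only illustrative.

```python
from statistics import pstdev

# Samples from the example slide.
sample_a = [6, 7, 8, 9]
sample_b = [5, 9, 15, 18]

# pstdev divides the sum of squared deviations by N (not N - 1).
# Assumption: this is the convention behind the slide's values.
s_a = pstdev(sample_a)
s_b = pstdev(sample_b)

print(f"Sample A: S = {s_a:.2f}")
print(f"Sample B: S = {s_b:.2f}")
```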
3
Caveat
You still need to look at the data, however, to
see if the standard deviation is a good
measure of variability (remember it can be
affected by one score far from the mean).
• Sample A: Y = 5, 7, 9, 11, 13    S = 2.83
• Sample B: Y = 9, 9, 9, 9, 20    S = 4.4
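A small sketch of the caveat, again assuming S is computed with N in the denominator. It recomputes S for Sample B with the one extreme score (20) left out, to show how much of the 4.4 comes from that single score.

```python
from statistics import pstdev

sample_a = [5, 7, 9, 11, 13]
sample_b = [9, 9, 9, 9, 20]

s_a = pstdev(sample_a)
s_b = pstdev(sample_b)

# Drop the single extreme score (20) to see how much it drives S.
s_b_without_extreme = pstdev([y for y in sample_b if y != 20])

print(f"Sample A: S = {s_a:.2f}")
print(f"Sample B: S = {s_b:.2f}")
print(f"Sample B without the 20: S = {s_b_without_extreme:.2f}")
```

Without the 20, every remaining score in Sample B is identical, so the spread that S reports comes entirely from one score.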
4
Evaluating the Size of the
Standard Deviation
So far we have looked at comparing the
standard deviations of two samples to see
which sample had greater variability. What
does knowing the variance and standard
deviation of one sample tell us?
5
Example
We have sampled some people from a
population and measured the weight of each
person in pounds.
S² = 400
S = 20
What does that tell us? Is that a large or small
amount of variability?
6
Variance and Standard Deviation Are Affected
by Scale
If we had measured the same people in ounces,
rather than pounds, then S²=102,400 and S=320
(compared to S² = 400 and S = 20 when measured
in pounds)
Even though the people’s weights didn’t change, the
use of a different scale greatly affected the
variance and standard deviation.
Conclusion: simply knowing the standard deviation
of a sample doesn’t tell us whether the scores
differed a lot without knowing about the
measurement scale being used.
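The effect of scale can be sketched in Python. The four weights below are hypothetical, chosen only so that S = 20 in pounds as on the slide; they are not data from the lecture. Converting to ounces multiplies every score by 16, so S is multiplied by 16 and S² by 16² = 256.

```python
from statistics import pstdev, pvariance

OUNCES_PER_POUND = 16

# Hypothetical weights in pounds, picked so that S = 20 and S^2 = 400.
weights_lb = [130, 170, 130, 170]
weights_oz = [w * OUNCES_PER_POUND for w in weights_lb]

var_lb, s_lb = pvariance(weights_lb), pstdev(weights_lb)
var_oz, s_oz = pvariance(weights_oz), pstdev(weights_oz)

print(f"pounds: S^2 = {var_lb}, S = {s_lb}")
print(f"ounces: S^2 = {var_oz}, S = {s_oz}")
```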
7
Is Variability ‘Bad’?
The variability of your population is whatever it
is. Theoretically there is nothing ‘bad’ about
a variable having a large amount of
variability. Pragmatically, however, the larger
the variability the easier it is to get a
nonrepresentative sample, and thus the
harder it is to make firm conclusions about
the population from which you sampled.
This will be seen to influence the statistical
analyses we perform.
8
What the Standard Deviation Can Tell
You About Your Data
Under some circumstances, knowing the value
of the standard deviation can give you quite
useful and specific knowledge about your
data.
This occurs when your data are ‘normally
distributed’. A formal definition of ‘normally
distributed’ is given in the next slide; a less
exact definition is to say that a graph of
normally distributed data gives us a bell-shaped curve.
9
The Normal Curve
$$f(Y) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(Y-\mu)^2 / (2\sigma^2)}$$
Note that, apart from the score Y itself, all of the
elements in the formula for computing f(Y) are
constants except for ‘σ’ and ‘μ’. Thus knowing the
values of those two elements completely determines the
curve.
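The height of the normal curve at any score can be computed directly from μ and σ. The sketch below uses the standard normal density, f(Y) = 1/(σ√(2π)) · e^(-(Y-μ)²/(2σ²)); the function name `normal_density` is only illustrative.

```python
import math

def normal_density(y, mu, sigma):
    """Height of the normal curve at score y, given mean mu and std dev sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((y - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# The curve is highest at the mean and symmetric around it.
print(normal_density(80, mu=80, sigma=5))
print(normal_density(75, mu=80, sigma=5), normal_density(85, mu=80, sigma=5))
```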
10
Normal Curves
11
Why Focus on the Normal Curve?
Many statistical procedures are based upon
the assumption that our data will be
normally distributed. We will be examining
the reasonableness of this assumption, the
consequences of it not being true, and
what to do if it isn’t true, as the semester
progresses.
12
Assuming ‘Normality’
1. There is an abundance of empirical data
indicating that the distribution of scores is
often approximately normal.
2. It is possible to transform nonnormal data
into a nearly normal distribution.
3. The Central Limit Theorem leads us to
expect normal distributions in many cases.
13
Central Limit Theorem
If the scores of interest are the result of the
sum (or mean) of several independent
nonnormal measures, then the distribution of
the scores will approximate the normal
distribution. The greater the number of
measures that go into that score, the closer
the distribution of the score will be to
normal.
14
Population of Outcomes of
Rolling One Six-Sided Die
Each outcome (1 – 6) has an equal chance of occurring. This
rectangular shaped distribution is not ‘normally distributed’.
15
Population of Sums of Rolling
Two Six-Sided Dice
Note that the distribution is starting to look more normal.
16
Population of Means of Rolling
Two Six-Sided Dice
Of course, this has the same shape as the distribution of sums.
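The dice populations on these slides can be enumerated exactly. This is a minimal sketch that lists all 36 equally likely outcomes of two fair six-sided dice and tallies the sums; the text histogram is just a rough stand-in for the slide graphs.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# All 36 equally likely (die 1, die 2) outcomes, tallied by their sum.
sum_counts = Counter(a + b for a, b in product(faces, repeat=2))

for total in sorted(sum_counts):
    probability = Fraction(sum_counts[total], 36)
    bar = "#" * sum_counts[total]
    # The mean of two dice is just the sum divided by 2, so the shape is identical.
    print(f"sum {total:2d}  mean {total / 2:3.1f}  p = {str(probability):5s} {bar}")
```

One die gives a flat distribution, while the sum of two already peaks in the middle (at 7) and tapers toward both ends.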
17
Standard Deviations and the
Normal Curve
For all normal curves:
(34.1 x 2) = 68.2% of the scores fall within 1 std dev of the mean.
(34.1+13.6) x 2 = 95.4% within 2 std devs of the mean.
(34.1+13.6+2.1) x 2 = 99.6% within 3 std devs of the mean.*
* We would get about 99.7% if we hadn’t rounded the percentages to one decimal place.
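These percentages can be checked without the rounding. The sketch below relies on a standard identity for the normal curve: the proportion of scores within k standard deviations of the mean is erf(k/√2). The function name `percent_within` is illustrative.

```python
import math

def percent_within(k):
    """Percent of a normal distribution within k standard deviations of the mean."""
    return 100.0 * math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} std dev(s): {percent_within(k):.2f}%")
```

The unrounded results differ slightly in the last digit from the figures obtained by doubling percentages that were already rounded to one decimal place.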
18
You sample from a population that is normally distributed and that
has a mean of 80 and a standard deviation of 5. What does that tell
you?
Approximately 68% of the scores fall between 75 and 85 (80 ± 5).
Approximately 95% of the scores fall between 70 and 90 (80 ± 10).
Over 99% of the scores fall between 65 and 95 (80 ± 15).
19
When the data are normally
distributed
Approximately 68% of the scores fall within one
standard deviation of the mean.
Approximately 95% fall within two standard deviations
of the mean.
Over 99% fall within three standard deviations of the
mean.
The further the data are from being ‘normal’ the less
accurate these percentages are, but they often give
you at least some idea of the spread of the scores.
20
‘Eyeballing’ Standard Deviations
Now for something that will impress your
friends and make you the life of the party.
You should be able to look at a curve and
estimate its standard deviation.
21
A population is given below. To estimate its
standard deviation, divide up the horizontal
axis into six equally wide areas (three on
each side of the mean).
22
It looks like the standard deviation must be
around 18 (any guess between 15 and 25
would not be too bad).
23
Segue to Standard Scores
Two normal curves. Note that when the distribution widens out (i.e. has
greater variance), the standard deviations spread out to match. In
both cases, for example, 34.1% of the scores fall between the mean and
one standard deviation above the mean.
24
Look at a score of 140 on the top curve: it is more than 2 standard
deviations above the mean, and very few scores fall above it. Now
look at a score of 140 on the bottom curve: it is a little more than 1
standard deviation above the mean, and while still impressive it is
not as unusually high (compared to the other scores) as in the top curve.
25
Standard Scores
Standard Scores: tell us how many standard
deviations above or below the mean a
particular ‘raw’ score falls. We will use ‘z’ to
stand for a standard score. During the
semester we will be looking at a variety of
apparently different formulas, but they all
have the same basic idea:
standard score = (some score - mean of those scores) / (standard deviation of those scores)
26
z scores
Comparing a score to other scores in the sample:
z = (Y - Ȳ)/S
Comparing a score to other scores in the population:
z = (Y - μ)/σ
27
Example 1
Say you want to find the standard score for a raw score of 90 in the
distribution above, which has a mean of μ=70 and a standard deviation
of σ=18. First estimate z by looking at the curve: a score of 90 is a
little more than 1 standard deviation above the mean. Now compute it:
z = (Y - μ)/σ so z = (90 - 70)/18 = 1.11
28
Example 2
Say you want to find the standard score for a raw score of 35 in the
distribution above, which has a mean of μ=70 and a standard deviation
of σ=18. First estimate z by looking at the curve: a score of 35 is not
quite 2 standard deviations below the mean. Now compute it:
z = (Y - μ)/σ so z = (35 - 70)/18 = -1.94
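Both examples follow the same recipe, which can be sketched as a small Python helper. The name `z_score` is illustrative; the arithmetic is just z = (Y - μ)/σ.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations that a raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd

# Examples 1 and 2: population with mean 70 and standard deviation 18.
z_example_1 = z_score(90, mean=70, sd=18)
z_example_2 = z_score(35, mean=70, sd=18)

print(f"Y = 90 -> z = {z_example_1:.2f}")
print(f"Y = 35 -> z = {z_example_2:.2f}")
```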
29
Standard Scores (cont.)
Standard scores are useful in that they tell us how a
score compares to other scores in its group. Look
again at the normal curve: a z score of –1 (one
standard deviation below the mean) is fairly low
compared to the other scores, but a z score of –3
would be extremely low.
30
Standard scores are ‘standard’ in that they allow us to
compare scores from different groups (i.e. compare
apples and oranges).
Say Timmy comes home with a ‘raw’ score of Y=120
on a math test, and a ‘raw’ score of Y=60 on an
English test. In which test did he do better? There
is not enough information.
But if we knew that his standard score in the math test
was z=0.8 we would know that his score was above
the mean but that quite a few students did better.
And if we knew that his standard score on the
English test was z=2.5 we would know that he did
REALLY well on that test. So now we know he did
better in the English test than in the math test.
31
Finding Areas Under the Normal
Curve
To find what proportion of scores fall in certain
areas of the normal curve you can use either
the Normal Distribution Table provided in the
Critical Values Tables area of the Course
Materials page or the Normal Distribution
Tool in the Oakley Stat Tools link on the
Course Materials page. The tool is easier
but the table is what will be available in an
exam.
32
Example 1
Example 1: What proportion and percent of the scores fall
between the mean (z=0) and z=1.33 (i.e. 0 ≤ z ≤ 1.33)? Look
up z=1.33 in the table, then go over to Column A to get your
answer: .4082, or 40.82%.
33
Example 2
Example 2: What proportion and percent of the scores fall at or above z=1.33
(i.e. 1.33 ≤ z)?
Note: you could also have computed this by taking .5000 - .4082 = .0918, or 9.18% (see
previous slide).
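The table lookups in Examples 1 and 2 can be cross-checked in Python. This sketch uses `statistics.NormalDist` from the standard library (Python 3.8+) in place of the printed table; the helper names are illustrative and are not part of the course tools.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_mean_to_z(z):
    """Proportion of scores between the mean (z = 0) and z."""
    return standard_normal.cdf(abs(z)) - 0.5

def area_beyond_z(z):
    """Proportion of scores in the tail at or beyond z."""
    return 1.0 - standard_normal.cdf(abs(z))

print(f"0 <= z <= 1.33: {area_mean_to_z(1.33):.4f}")
print(f"1.33 <= z:      {area_beyond_z(1.33):.4f}")
```

The two areas add up to .5000, which is why subtracting the first from .5000 gives the second.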
34
Negative z values
Example 3: The table doesn't bother to give negative
values of z. Because the curve is symmetrical, the area
(z ≤ -1.33) is the same as on the other side of the
graph (1.33 ≤ z). In other words, ignore the negative
sign and just look up the value of z in the table.
35
Example 4
Example 4: Now that we are at this point, I will ask what
proportion and what percent of the curve falls in the range -2.27 ≤ z ≤
2.27 (i.e. falls within 2.27 standard deviations of the mean in
either direction).
Answer: .4884 + .4884 = .9768 or 97.68%
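The symmetry argument and the answer to Example 4 can be checked the same way. This is a sketch using `statistics.NormalDist` rather than the course table, so it works with negative z values directly.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Symmetry: the area at or below z = -1.33 equals the area at or above z = 1.33.
lower_tail = standard_normal.cdf(-1.33)
upper_tail = 1.0 - standard_normal.cdf(1.33)

# Example 4: area within 2.27 standard deviations of the mean, in either direction.
central_area = standard_normal.cdf(2.27) - standard_normal.cdf(-2.27)

print(f"z <= -1.33: {lower_tail:.4f}   1.33 <= z: {upper_tail:.4f}")
print(f"-2.27 <= z <= 2.27: {central_area:.4f}")
```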
36
Example 5
Example 5: Now we will move to the following type of
question: You sample from a population that is normally
distributed, has a mean of 50 and a standard deviation of
16. Question: what proportion and percent of the scores
will fall within ±8 of the mean (i.e. 42 ≤ Y ≤ 58)?
Step 1: draw the curve, then shade in the area in question.
37
Example 5 (cont.)
Step 2: the only way to proceed is to change scores
into z scores so that you can use the table.
Find the standard score for Y=58:
z = (Y - μ)/σ = (58 - 50)/16 = 0.50
Find the standard score for Y=42:
z = (Y - μ)/σ = (42 - 50)/16 = -0.50
38
Example 5 (cont.)
Now that we have changed the question from
42 ≤ Y ≤ 58 to –0.5 ≤ z ≤ 0.5 we can answer
the question.
Answer: .1915 + .1915 = .3830 or 38.30%
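Example 5 can be reproduced from start to finish in Python. The sketch converts the raw scores to z scores as in Step 2, then computes the area with `statistics.NormalDist` instead of the table. Expect a difference in the fourth decimal place: the table rounds each half to .1915 before adding, while the unrounded area is about .3829.

```python
from statistics import NormalDist

mu, sigma = 50, 16
low, high = 42, 58

# Step 2: convert the raw scores to z scores.
z_low = (low - mu) / sigma
z_high = (high - mu) / sigma

# Area between the two scores, computed directly on the raw-score scale.
population = NormalDist(mu=mu, sigma=sigma)
proportion = population.cdf(high) - population.cdf(low)

print(f"z for Y = {low}: {z_low:.2f}   z for Y = {high}: {z_high:.2f}")
print(f"proportion within 8 of the mean: {proportion:.4f} ({proportion:.2%})")
```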
39