Chapter 4: z-scores and Probability
**This chapter corresponds to chapter 8 (“Are Your Curves Normal?”) of your book.
What it is: z-scores (also called “standard scores”) are raw scores that have been adjusted for
the mean and standard deviation of the distribution from which the raw scores came. z-scores
are expressed in standard deviation units and represent the number of standard deviations
above or below the mean that a given raw score is (e.g., a z-score of 1.0 is one standard
deviation above the mean). We can also use our knowledge of the normal curve to assign a
probability to the occurrence of any given z-score. Because z-scores use a standardized metric
(i.e., standard deviation units), you can directly compare the magnitude and probability of z-scores from different distributions of scores, even if those distributions have radically different
means and standard deviations.
When to use it: You should use z-scores when you have continuous data that you wish to
express in a standardized metric (i.e., standard deviation units) and/or when you wish to assign
a probability of occurrence to a given score (e.g., what is the probability of receiving a score one
standard deviation or more above the mean?). z-scores are also especially useful for comparing
the magnitude and/or probability of two raw scores drawn from distributions with different means
and/or standard deviations.
Questions asked by z-scores: How many standard deviation units above/below the mean is a
given raw score? What is the probability of a given raw score occurring? What are the relative
probabilities and/or magnitudes of two raw scores drawn from distributions with different means
and/or standard deviations?
Examples of research questions that would use z-scores:
o If the average SAT-Math score is 500 with a standard deviation of 100, what is the
  probability of receiving a score higher than 600?
o If one person scores 25 on the Stressful Life Events Inventory (which has a mean of 20
  and standard deviation of 5) and another person scores 110 on the Major Stressors
  Questionnaire (which has a mean of 100 and standard deviation of 15), which person
  has the higher score, making that person more likely to have acute anxiety attacks?
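As a rough illustration, both example questions can be worked outside SPSS with Python's standard-library statistics.NormalDist. This is only a sketch: it assumes the scores are normally distributed, and the function names z_score and prob_above are hypothetical helpers, not part of the chapter or of SPSS.

```python
from statistics import NormalDist


def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd


def prob_above(raw, mean, sd):
    """Probability of a score higher than `raw`, assuming a normal distribution."""
    return 1 - NormalDist(mean, sd).cdf(raw)


# Question 1: SAT-Math with mean 500 and SD 100; chance of scoring above 600
p_sat = prob_above(600, 500, 100)

# Question 2: put the two stress scores on the same standardized metric
z_slei = z_score(25, 20, 5)      # Stressful Life Events Inventory
z_msq = z_score(110, 100, 15)    # Major Stressors Questionnaire

print(round(p_sat, 4))                      # 0.1587
print(round(z_slei, 2), round(z_msq, 2))    # 1.0 0.67
```

Under these assumptions, the score of 25 sits further above its mean (z = 1.0) than the score of 110 does (z of about 0.67), even though 110 is the larger raw number.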
Using SPSS to Calculate z-scores: (dataset: Chapter 4 Example 1.sav)
Albert attends a very competitive prep school where all 40 students at this school take both the
SAT and ACT and then their scores are posted for everyone to see. Albert’s best subject is
English, so he is especially interested in how well he did on the SAT-Verbal and ACT-English
sections. When the scores are posted, he sees that he scored a 28 on the ACT-English and a
610 on the SAT-Verbal. Because the two tests use a different metric, Albert is curious about
how a score of 28 on the ACT compares to a score of 610 on the SAT. Albert also wonders how
well he did on each test in comparison to his classmates (i.e., what was the probability of any of
his classmates scoring higher than him on each test?).
Selection of the appropriate statistic(s)
Z-scores are the appropriate statistic because SAT and ACT scores represent continuous data,
Albert is interested in comparing the magnitude of his scores on two tests that use different
metrics (i.e., have very different means and standard deviations), and Albert wishes to assign a
probability to obtaining a higher score than his on each test.
Computation of the statistic(s)
We will use SPSS to calculate z-scores for us. Open the dataset “Chapter 4 Example 1.sav”.
Take a moment to familiarize yourself with the data. Note how data for this type of analysis
should be entered.
1) Each participant (i.e., student at the prep school) has one row in the data.
2) One column is used to indicate each participant’s identification number, which is just a
number that is assigned to each participant in the study (this variable is “ID” in the
present example).
3) The second column indicates each participant’s score on the first variable (ACT-English
Scores) for which we wish to calculate z-scores.
4) The third column indicates each participant’s score on the second variable (SAT-Verbal
Score) for which we wish to calculate z-scores.
The data should look something like this in SPSS (note that Albert’s scores of 28 and 610 are
listed at the top; they don’t have to be listed at the top, but for the sake of this example it makes
it easier to keep track of Albert’s scores):
If you switch to variable view, you should see that the two variables (“acte” and “satv”) have
labels indicating that they represent “ACT-English Scores” and “SAT-Verbal Scores”,
respectively. If you did not like those labels, you could change the labels to whatever you want.
To calculate z-scores in SPSS, click on the “Analyze” drop-down menu, highlight “Descriptive
Statistics”, and then click “Descriptives…”, as pictured below.
The following pop-up window will appear:
Note that the variables are listed in the pop-up window by their labels, with their variable names
in parentheses (e.g. “ACT-English Scores [acte]”).
Highlight the variable(s) for which you wish to calculate z-scores (“ACT-English Scores [acte]”
and “SAT-Verbal Scores [satv]” in this example) and then click on the arrow to make the
variable(s) appear in the Variable(s): window, as pictured below.
Now check the box next to “Save standardized values as variables”. This is telling SPSS to
calculate z-scores for every raw score for each of the variables in the “Variable(s):” window, and
to save these z-scores as new variables in the “data view” spreadsheet. Your screen should
look like this:
Click “OK” and navigate to “data view” to find your new columns of z-scores. The “data view”
screen should now look like this:
SPSS has created two new variables (“Zacte” and “Zsatv”) that are the z-scores for “acte” and
“satv”, respectively. For instance, you can see that Albert’s z-score for the ACT is 1.14 and his
z-score for the SAT is 1.09 (rounded). Thus, Albert scored 1.14 standard deviation units above
the mean on the ACT and 1.09 standard deviations above the mean on the SAT. Therefore,
Albert scored very similarly on the SAT and ACT, although his score on the ACT is perhaps
slightly higher compared to the mean score.
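For readers who want to see what the "Save standardized values as variables" option is doing, the following Python sketch standardizes a column of scores by hand. It assumes the sample standard deviation (n - 1 in the denominator), which to my understanding is the standard deviation SPSS reports in its Descriptives table. The function name standardize is hypothetical, and the five scores are made up for illustration; they are not the values in "Chapter 4 Example 1.sav".

```python
from statistics import mean, stdev


def standardize(scores):
    """Return one z-score per raw score, using the sample SD (n - 1 denominator)."""
    m = mean(scores)
    s = stdev(scores)
    return [(x - m) / s for x in scores]


# Hypothetical ACT-English scores for illustration only (NOT the chapter dataset)
acte = [28, 21, 15, 24, 17]
zacte = standardize(acte)

# Print each raw score next to its z-score, like the new column in "data view"
for raw, z in zip(acte, zacte):
    print(raw, round(z, 2))
```

A standardized column always has a mean of 0 and a standard deviation of 1, which is a quick way to check the result.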
Transforming z-scores into Raw Scores, and Vice-versa
Along with the z-scores listed in the “data view”, SPSS has also generated some descriptive
statistics for the raw scores of “acte” and “satv” in the output window. If you navigate to the
output window you’ll see the following table:
Descriptive Statistics
                     N    Minimum   Maximum   Mean       Std. Deviation
ACT-English Scores   40   5.00      33.00     21.1000    6.05022
SAT-Verbal Scores    40   260.00    750.00    501.6250   99.73290
Valid N (listwise)   40
The table reports the sample size (n = 40), the minimum and maximum raw scores, and the raw score mean and standard deviation for each variable. The raw score means and standard deviations are of the most interest. Notice that the mean
raw score for the ACT-English was 21.1 (SD = 6.05) and the mean raw score for the SAT-Verbal was 501.63 (SD = 99.73). z-scores represent the number of standard deviation units
above/below the mean a given raw score is. You can use this information to transform z-scores
back to raw scores, and vice-versa. For instance, because the raw score mean and standard
deviation of the ACT were 21.1 and 6.05, respectively, it makes sense that Albert’s score of 28
(which is about 7 points higher than the mean) would be a bit more than one standard deviation
above the mean, corresponding to a z-score a little higher than 1.0. This is indeed the case, as
Albert’s ACT-English z-score is 1.14. Similarly, knowing that Albert’s z-score is slightly above
1.0, and knowing that the raw score mean and standard deviation are 21.1 and 6.05, one would
expect that Albert’s raw score on the ACT was a little higher than 27.
A more concrete and objective way to do this is using the following formula for transforming a z-score back into a raw score:
$$x = z(s) + \bar{x}$$
$$x = 1.14(6.05) + 21.1$$
$$x = 6.90 + 21.1$$
$$x = 28$$
Similarly, the following formula is used for transforming a raw score into a z-score:
$$z = \frac{x - \bar{x}}{s}$$
$$z = \frac{28 - 21.1}{6.05}$$
$$z = \frac{6.9}{6.05}$$
$$z = 1.14$$
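The two conversions can also be checked with a few lines of Python. This is a minimal sketch using the ACT-English mean and standard deviation reported in the Descriptive Statistics table (rounded to 21.1 and 6.05, as in the text); the function names to_z and to_raw are hypothetical.

```python
def to_z(x, mean, sd):
    """Raw score -> z-score: z = (x - mean) / sd."""
    return (x - mean) / sd


def to_raw(z, mean, sd):
    """z-score -> raw score: x = z * sd + mean."""
    return z * sd + mean


# ACT-English mean and SD from the Descriptive Statistics table (rounded)
ACT_MEAN, ACT_SD = 21.1, 6.05

z = to_z(28, ACT_MEAN, ACT_SD)       # Albert's raw score of 28
x = to_raw(1.14, ACT_MEAN, ACT_SD)   # Albert's z-score of 1.14

print(round(z, 2))   # 1.14
print(round(x, 1))   # 28.0
```

The round trip does not return exactly 28 before rounding because 1.14 is itself a rounded z-score.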
Assigning a Probability to Scores
Albert also wished to know the probability that his classmates would score higher than him on
each test. For each z-score, there is an associated probability of achieving a higher z-score, and
these probabilities are listed in Table B.1 (Areas Under the Normal Curve) on pages 329-331 of
Salkind (2008). For instance, given the properties of the normal curve, what is the probability
that a student would score higher than a 28 on the ACT? A score of 28 corresponds to a z-score
of 1.14, so we go to Table B.1 to find the percentage of z-scores under the normal curve that fall
above 1.14.
We first find the z-score 1.14 in Table B.1, and note the “area between the mean and the z-score”. 37.29% of scores fall between the mean (which is always zero in a distribution of z-scores) and 1.14. Because 50% of all scores always fall below the mean (zero) in a normal
curve, this means that 87.29% (50% + 37.29% = 87.29%) of all scores fall below a z-score of
1.14. This means that only 12.71% (100% - 87.29% = 12.71%) of scores fall above a z-score of
1.14. Therefore, the probability of one of Albert’s classmates scoring higher than him on the
ACT-English is only .1271 (or 12.71%). The same process can be used to determine the
probability that one of Albert’s classmates will score higher than him on the SAT-Verbal (this
probability is 13.79%).
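The table lookup described above can be reproduced, as a sketch, with Python's standard-library statistics.NormalDist, which computes areas under the standard normal curve directly. The helper names area_mean_to_z and prob_above are hypothetical; the z-scores are Albert's rounded values from the text.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve: mean 0, SD 1


def area_mean_to_z(z):
    """Area between the mean (0) and z, as listed in a normal-curve table."""
    return abs(std_normal.cdf(z) - 0.5)


def prob_above(z):
    """Proportion of the normal curve that falls above z."""
    return 1 - std_normal.cdf(z)


for label, z in [("ACT-English", 1.14), ("SAT-Verbal", 1.09)]:
    print(label, round(area_mean_to_z(z), 4), round(prob_above(z), 4))
# ACT-English 0.3729 0.1271
# SAT-Verbal 0.3621 0.1379
```

These values match the 12.71% and 13.79% obtained from Table B.1.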
Interpretation of the Findings
Although the two tests use a different metric, z-scores tell us that Albert scored approximately
equally well on both the SAT (z = 1.09) and the ACT (z = 1.14). These scores are both well
above average (i.e., more than a standard deviation unit above the mean scores). Thus, the
probability that one of his classmates will score higher than Albert is not very high (12.71% and
13.79% for the ACT and SAT, respectively).
Practice Problem #1 for SPSS (answer in Appendix)
Following are the scores for 10 persons on the Stressful Life Events Inventory (SLEI) and the
scores for 10 other persons on the Major Stressors Questionnaire (MSQ). For both surveys,
higher scores mean that the person has experienced more stressors recently, suggesting a
greater risk for stress-related symptoms such as acute anxiety attacks.
Participant   SLEI      Participant   MSQ
1             25        11            110
2             20        12            90
3             14        13            83
4             32        14            115
5             29        15            105
6             26        16            100
7             17        17            77
8             12        18            83
9             28        19            118
10            26        20            120
A. Calculate z-scores for each of the SLEI and MSQ raw scores above.
B. Participants 1 and 11 above received SLEI and MSQ scores of 25 and 110, respectively.
Which person’s score is higher compared to the mean SLEI and MSQ scores?
C. What is the probability that someone will score higher than a 25 on the SLEI? What is the
probability that someone will score higher than a 110 on the MSQ? What is the probability that
someone will score lower than those scores on each test?
D. What is the probability that someone will score higher on the SLEI than Participant 7 (who
scored a 17)? What is the probability that someone will score lower than Participant 7?
E. What do you conclude about whether Participant 1 or 11 is at the greatest risk for acute
anxiety attacks?
Practice Problem #2 for Hand Calculation (answer in Appendix)
Below are the number of books owned by five different persons. The standard deviation of
number of books owned is 32.43.
Participant ID   Books Owned   Z
1                33
2                95
3                12
4                53
5                72

$$z = \frac{x - \bar{x}}{s}$$
$$x = z(s) + \bar{x}$$
A. Calculate z-scores for number of books owned for each of the five persons above.
B. Based on the mean and standard deviation of number of books owned for the five persons
above, what is the number of books a person would own if they had a z-score of 1.0? What
about a z-score of 2.32? How about a z-score of -1.53?
C. What is the probability that someone would own more books than participant 1? Answer the
same question for participants 2, 3, 4, and 5.
D. Based on properties of the normal curve, what percentage of persons own more than 12
books, but less than 95 books?
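Questions about the percentage of scores falling between two values come down to the area under the normal curve between two z-scores. The sketch below shows the general technique with Python's statistics.NormalDist, using illustrative z-scores of -1.0 and 1.0 rather than the values from this problem; the function name prob_between is hypothetical.

```python
from statistics import NormalDist


def prob_between(z_low, z_high):
    """Area under the standard normal curve between two z-scores."""
    std_normal = NormalDist()  # mean 0, SD 1
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


# Illustrative z-scores only (not the answer to the practice problem):
# about 68% of scores fall within one standard deviation of the mean.
print(round(prob_between(-1.0, 1.0), 4))   # 0.6827
```

With a printed table, the same area is found by adding the "area between the mean and the z-score" for each side when the two z-scores fall on opposite sides of the mean.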
Practice Problem #3 for Hand Calculation and SPSS (answer in Appendix)
Calculate the z-scores for the following two sets of scores. Relative to the mean in each set, did
participant 1 score higher in the first set or the second set? What is the probability that someone
will score higher than participant 1 on the first set? The second set?
Participant ID   Set 1   Set 2
1                10      1000
2                12      1100
3                19      1257
4                22      1555
5                4       872
6                18      1288
7                27      999
8                22      1442
9                12      1200
10               18      1301