Normal distributions in SPSS
Bro. David E. Brown, BYU–Idaho Department of Mathematics
February 2, 2012
1 Calculating probabilities and percents from measurements: The CDF.NORMAL command
1. Go to the Variable View and create a variable by typing a name for it under the Name heading.
2. Make any desired adjustments to your variable’s properties. Put 2, 3, or 4 in the Decimals box, to tell
SPSS to round your probability or percentile to 2, 3, or 4 decimal places, respectively.[1]
3. Return to the Data View.
4. Put any number at all in the first row of your variable and press the Enter key.
5. In the Transform menu, select Compute Variable.... The Compute Variable dialog will appear.
6. Put the name of your variable in the Target Variable: box.
7. You have options. Please DRAW THE PICTURE, as we do in class, to help you sort out the options
and understand them:
(a) If you need a percentile or left-tailed probability:
i. Click CDF & Noncentral CDF in the Function group box.
ii. In the Functions and Special Variables box, double-click Cdf.Normal. The expression
CDF.NORMAL(?,?,?) will appear in the Numeric Expression box.
iii. Replace the first ? with the given measurement. Replace the second ? with the mean
of your normal distribution and the third ? with the standard deviation of your normal
distribution. IF YOU’RE WORKING WITH Z-SCORES, that is, with the standard
normal distribution, use 0 for the mean and 1 for the standard deviation.
Example: Say you need the probability of getting a measurement less than 97.5, when the
mean is µ = 98.6 and the standard deviation is σ = 0.62. You’ll have CDF.NORMAL(97.5,
98.6, 0.62), in the Numeric Expression box, and the probability SPSS gives you—which
is left-tailed—is 0.0380.
iv. Go to Step 8, below.
(b) If you need a right-tailed probability or the “top ‘so many’ percent,” you need to subtract the
corresponding left-tailed probability from 1. Here’s how:
i. Put the number 1 in the Numeric Expression box, and type a hyphen to tell SPSS to subtract
(or click the subtraction button in the keypad on the screen).
ii. Click CDF & Noncentral CDF in the Function group box.
iii. In the Functions and Special Variables box, double-click Cdf.Normal. The expression
CDF.NORMAL(?,?,?) will appear in the Numeric Expression box.
[1] It’s traditional among nerds to use 4 decimal places. This may vary from discipline to discipline.
iv. Replace the first ? with the given measurement. Replace the second ? with the mean
of your normal distribution and the third ? with the standard deviation of your normal
distribution. IF YOU’RE WORKING WITH Z-SCORES, that is, with the standard
normal distribution, use 0 for the mean and 1 for the standard deviation.
Example: Say you need the probability of getting a measurement greater than 112, when
the mean is µ = 100 and the standard deviation is σ = 15. You’ll have 1-CDF.NORMAL(112,
100, 15) in the Numeric Expression box, and the probability SPSS gives you—which is
right-tailed—is 0.2119.
v. Go to Step 8, below.
(c) If you need a two-tailed probability:
i. Determine the measurement or z-score that delineates the left tail and use Step 7a to get the
area of that tail (which is the left-tailed probability).
ii. Determine the measurement or z-score that delineates the right tail and use Step 7b to get
the area of the right tail (which is the right-tailed probability).
iii. Add your left-tailed probability to the right-tailed probability you have just gotten. You’re
done.
Example: Say you need the probability of getting a z-score either less than −2.00 or greater
than 2.00. Then the mean is µ = 0 and the standard deviation is σ = 1, because z-scores
obey the standard normal distribution. Your left tail is delineated by −2.00 and your right
tail by 2.00. You can[2] use CDF.NORMAL(-2.00, 0, 1) to get a left-tailed area of 0.0228,
then use 1-CDF.NORMAL(2.00, 0, 1) to get a right-tailed area of 0.0228, then add to get
0.0228 + 0.0228 = 0.0456 as your two-tailed area.
(d) If you need a probability or percentage for values trapped between two numbers, you’ll have to
(1) calculate the left-tailed probability corresponding to the higher number, (2) calculate the
left-tailed probability corresponding to the lower number, and (3) subtract the second from the first. Example: Say you need
the probability that a z-score will be between −0.5 and 1.37. You’ll need to take the left-tailed
area corresponding to z = 1.37 and subtract from it the left-tailed area corresponding to z = −0.5.
Here’s how:
i. Click CDF & Noncentral CDF in the Function group box.
ii. In the Functions and Special Variables box, double-click Cdf.Normal. The expression
CDF.NORMAL(?,?,?) will appear in the Numeric Expression box.
iii. Replace the first ? with the highest given measurement (for example, the 1.37). Replace
the second ? with the mean of your distribution and the third ? with the standard deviation
of your distribution. IF YOU’RE WORKING WITH Z-SCORES, that is, with the
standard normal distribution, use 0 for the mean and 1 for the standard deviation.
iv. Type a hyphen to the right of the CDF.NORMAL() expression you already have to tell SPSS
to subtract. (You can click the subtraction button in the keypad on the screen, instead of
typing a hyphen.)
v. In the Functions and Special Variables box, double-click Cdf.Normal. A second copy of
the expression CDF.NORMAL(?,?,?) will appear in the Numeric Expression box.
vi. Replace the first ? in this second copy with the lowest given measurement (−0.5, in our
example). Replace the second ? with the mean of your normal distribution and the third
? with the standard deviation of your normal distribution. IF YOU’RE WORKING
WITH Z-SCORES, that is, with the standard normal distribution, use 0 for the mean and
1 for the standard deviation.
At this point, you should have CDF.NORMAL(#,#,#) - CDF.NORMAL(#, #, #) in the Numeric
Expression box, except instead of #’s, you’ll have numbers.
Example: If you need the area between z = −0.5 and z = 1.37, you’ll have
CDF.NORMAL(1.37, 0, 1)-CDF.NORMAL(-0.5, 0, 1) in the Numeric Expression box.
[2] You can, but you don’t have to. In this example, the two tails are symmetric, so you could use SPSS to find the left-tailed
area and double it, instead.
vii. Go to Step 8, below.
8. Click OK. The Change existing variable? dialog will appear.[3]
9. Click OK. The Output window may or may not appear. Either way, go to the Data View. The probability
or percentile you seek is in the first row of your variable’s column. It is expressed as a decimal number, so
if you want a percentage, be sure to convert correctly. For example, if SPSS gives you 0.1902,
that is 19.02%.
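
The four cases in Step 7 can be cross-checked outside SPSS. The following is a minimal sketch in Python, not part of the SPSS procedure. It assumes Python 3.8 or later, whose standard library provides statistics.NormalDist. The helper name cdf_normal is made up here to mirror the argument order of CDF.NORMAL; it is not an SPSS function.

```python
from statistics import NormalDist


def cdf_normal(x, mean, sd):
    """Left-tailed area below x, in the same argument order as SPSS's CDF.NORMAL.

    Hypothetical helper for illustration only; it is not an SPSS function.
    """
    return NormalDist(mu=mean, sigma=sd).cdf(x)


# 7(a) left tail: P(X < 97.5) when mu = 98.6 and sigma = 0.62
left = cdf_normal(97.5, 98.6, 0.62)

# 7(b) right tail: P(X > 112) when mu = 100 and sigma = 15
right = 1 - cdf_normal(112, 100, 15)

# 7(c) two tails: P(Z < -2.00) + P(Z > 2.00)
two_tailed = cdf_normal(-2.00, 0, 1) + (1 - cdf_normal(2.00, 0, 1))

# 7(d) between two values: P(-0.5 < Z < 1.37)
between = cdf_normal(1.37, 0, 1) - cdf_normal(-0.5, 0, 1)

# Print each result as a 4-place decimal and as a percentage
for label, p in [("left", left), ("right", right),
                 ("two-tailed", two_tailed), ("between", between)]:
    print(f"{label}: {p:.4f} = {p:.2%}")
```

One thing to watch for when comparing: this sketch adds the unrounded tails in case 7(c), which gives about 0.0455. Adding the two tails after rounding each to 0.0228 gives 0.0456, so a difference in the last decimal place is a rounding effect, not a mistake.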
2 Getting measurements or percentiles from probabilities: The IDF.NORMAL command
1. Go to the Variable View and create a variable by typing a name for it under the Name heading.
2. Make any desired adjustments to your variable’s properties. For Decimals, use what makes sense.
Examples: (1) If your measurements are counts, they have to be whole numbers. So put 0 in the
Decimals box. (2) If your measurements are dollar amounts, you could put 2 in the Decimals box, to
round to the nearest penny. Or, if you prefer, you could put 0 in the Decimals box, to round to the
nearest dollar.
3. Return to the Data View.
4. Put any number at all in the first row of your variable.
5. In the Transform menu, select Compute Variable.... The Compute Variable dialog will appear.
6. Put the name of your variable in the Target Variable: box.
7. You have options. Please DRAW THE PICTURE, as we do in class, to help you sort out the options
and understand them:
(a) If you need a percentile or have a left-tailed probability:
i. Click Inverse DF in the Function group box.
ii. In the Functions and Special Variables box, double-click Idf.Normal. The expression
IDF.NORMAL(?,?,?) will appear in the Numeric Expression box.
iii. Replace the first ? with the given percentile or left-tail probability, expressed as a decimal.
Replace the second ? with the mean of your normal distribution and the third ? with the
standard deviation of your normal distribution. IF YOU’RE WORKING WITH Z-SCORES,
that is, with the standard normal distribution, use 0 for the mean and 1 for the
standard deviation.
Example: Say you need the 35th percentile, and the mean is µ = 100 and the standard
deviation is σ = 15. You’ll have IDF.NORMAL(0.35, 100, 15) in the Numeric Expression
box.
iv. Go to Step 8, below. In our example, SPSS will tell you that x = 94.2 is the 35th percentile.
(b) If you have a right-tailed probability or a “top ‘so many’ percent,” you’ll have to subtract it from 1
to convert it to a left-tailed probability or percentile and then calculate the measurement. Here’s
how:
i. Click Inverse DF in the Function group box.
ii. In the Functions and Special Variables box, double-click Idf.Normal. The expression
IDF.NORMAL(?,?,?) will appear in the Numeric Expression box.
[3] More than one student has told me that the Change existing variable? dialog does not appear. So we take a look at it
together, and every single time, the Change existing variable? dialog appears. Maybe this means we all need to pay closer
attention to what we’re doing and to how the computer responds.
iii. Replace the first ? with “1−” followed by the given right-tailed area.[4] Replace the second
? with the mean of your normal distribution and the third ? with the standard deviation of
your normal distribution. IF YOU’RE WORKING WITH Z-SCORES, that is, with
the standard normal distribution, use 0 for the mean and 1 for the standard deviation.
Example: Say you need the measurement that delineates the top 5% of measurements when
µ = 98.2 and σ = 0.62. You’ll have IDF.NORMAL(1-0.05, 98.2, 0.62) in the Numeric
Expression box.
iv. Go to Step 8, below. In our example, SPSS will tell you that x = 99.2 is the measurement
that delineates the top 5% of measurements.
(c) If you have a two-tailed probability:
i. Determine how much area is in the left tail and use Step 7a to get the measurement
corresponding to your left-tailed area.
ii. Determine how much area is in the right tail and use Step 7b to get the measurement
corresponding to your right-tailed area. You’re done.
Example: Say you need the z-scores that delineate the lowest 1% and the highest 1%.
The area in the left tail is 0.01, and so is the area in the right tail. For the left tail,
you can use IDF.NORMAL(0.01, 0, 1), and get z = −2.33. For the right tail, you can[5] use
IDF.NORMAL(1-0.01, 0, 1), and get z = 2.33.
(d) If you have a probability or percentage that’s trapped between two unknown values, you’ll have
to first calculate the left-tailed area (or probability or percentile) corresponding to its lower
boundary, second find the z-score or measurement corresponding to this left-tailed area, and
then repeat for the upper boundary of your percentage or probability. Here’s how:
i. Calculate the left-tailed area corresponding to the lower boundary of your probability or
percentage. Example: Suppose you need the range of measurements that correspond to the
middle 90% of some normal distribution. That means 10% will be outside the desired range.
Since it’s the “middle” 90% you need, the left-tail area must be the same as the right-tail
area. So divide the remaining 10% in half, to get 5% in each tail. So, the lower boundary of
your range is 5% (or, 0.05), because the left tail has an area of 5%.
ii. Click Inverse DF in the Function group box.
iii. In the Functions and Special Variables box, double-click Idf.Normal. The expression
IDF.NORMAL(?,?,?) will appear in the Numeric Expression box.
iv. Replace the first ? with the left-tail area of interest (0.05, in our example). Replace the
second ? with the mean of your normal distribution and the third ? with the standard
deviation of your normal distribution. IF YOU’RE WORKING WITH Z-SCORES,
that is, with the standard normal distribution, use 0 for the mean and 1 for the standard
deviation.
For our example, suppose the mean is µ = 100 and the standard deviation is σ = 15. Then
you’ll have IDF.NORMAL(0.05, 100, 15) in the Numeric Expression box.
v. Do Steps 8 and 9, below, and come back to this point to get the measurement corresponding
to the upper boundary. In the current example, SPSS will give us x = 75.33 as the
measurement that delineates the left 5% tail.
vi. Calculate the left-tailed area corresponding to the upper boundary of the given probability or
percentage. Example, continued: We saw above that taking out the middle 90% of a normal
distribution leaves a tail of area 5% on the left. So the left tail area corresponding to the
upper boundary is “left tail area” + “middle 90%” = 5% + 90% = 95%, or 0.95.
[4] Why does the “1−” go inside the parentheses for IDF.NORMAL but outside the parentheses for CDF.NORMAL? Because on the
one hand, CDF.NORMAL gives you an area; computing 1−CDF.NORMAL is subtracting areas (the overall area of 1 minus the area that
CDF.NORMAL gives you). On the other hand, IDF.NORMAL does not give you an area. In fact, the first ? is supposed to be a left-tailed area.
You can’t just replace it with a right-tailed area. You subtract your right-tailed area from 1 inside IDF.NORMAL to make sure
IDF.NORMAL has a left-tailed area to work with.
[5] You can, but you don’t have to. You could use the fact that the distribution is symmetric to conclude that the right 1%
tail is delineated by z = 2.33.
vii. Go back to the Compute Variable... dialog in the Transform menu. The IDF.NORMAL
expression should already be in the Numeric Expression box, from when you
went through this procedure for the lower boundary. (If not, do Steps 7(d)ii and 7(d)iii and
continue with Step 7(d)viii.)
viii. Put in the IDF.NORMAL command the left-tailed area for the upper boundary (0.95, in our
example). Make sure the next number inside IDF.NORMAL is the mean and the third number
the standard deviation of your normal distribution. IF YOU’RE WORKING WITH
Z-SCORES, that is, with the standard normal distribution, use 0 for the mean and 1 for
the standard deviation.
So we’ll have IDF.NORMAL(0.95, 100, 15) in the Numeric Expression box.
ix. Do Steps 8 and 9, below.
In our example, SPSS will give us x = 124.67 as the measurement that delineates the top 5%
of the measurements.
8. Click OK. The Change existing variable? dialog appears.
9. Click OK. The Output window may or may not appear. Either way, the measurement value you seek is
in the first row of your variable’s column. It is expressed in the same units of measure as other values
of your variable. Example: If your data are measurements of time, in years, then the value SPSS has
just given you is also time, in years.
Note: If you are finding the values between which a given probability is trapped, and you have only
found the measurement corresponding to one of the boundaries, go back to Step 7(d)vi and continue
from there.
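
The IDF.NORMAL examples in Step 7 can be cross-checked the same way. This is a minimal sketch in Python, not part of the SPSS procedure. It assumes Python 3.8 or later, whose standard library provides statistics.NormalDist and its inv_cdf method. The helper name idf_normal is made up here to mirror the argument order of IDF.NORMAL; it is not an SPSS function. Note that the first argument is always a left-tailed area, so the 35th percentile is entered as 0.35 and a right-tailed area is subtracted from 1 before it goes in.

```python
from statistics import NormalDist


def idf_normal(p, mean, sd):
    """Measurement whose left-tailed area is p, in the argument order of SPSS's IDF.NORMAL.

    Hypothetical helper for illustration only; it is not an SPSS function.
    """
    return NormalDist(mu=mean, sigma=sd).inv_cdf(p)


# 7(a): 35th percentile when mu = 100 and sigma = 15
p35 = idf_normal(0.35, 100, 15)

# 7(b): cutoff for the top 5% when mu = 98.2 and sigma = 0.62
top5 = idf_normal(1 - 0.05, 98.2, 0.62)

# 7(c): z-scores that cut off the lowest 1% and the highest 1%
z_low = idf_normal(0.01, 0, 1)
z_high = idf_normal(1 - 0.01, 0, 1)

# 7(d): boundaries of the middle 90% when mu = 100 and sigma = 15
tail = (1 - 0.90) / 2                      # area left over in each tail
lower = idf_normal(tail, 100, 15)          # left-tailed area 0.05
upper = idf_normal(tail + 0.90, 100, 15)   # left-tailed area 0.95

print(f"35th percentile: {p35:.1f}")
print(f"top 5% cutoff:   {top5:.1f}")
print(f"1% tails:        z = {z_low:.2f} and z = {z_high:.2f}")
print(f"middle 90%:      {lower:.2f} to {upper:.2f}")
```

Because the normal distribution is symmetric, lower and upper sit the same distance from the mean, which is a quick sanity check on any pair of boundaries you get for a "middle" percentage.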
As always, if you have questions, please ask them!