Probability and the Normal Curve, continued
Statistics for Political Science
Levin and Fox Chapter 5
Part II
Let’s take another look at standard deviation
Standard deviation is a measure of variation, and this variability is reflected
in the sigma values (σ) in our distribution.
Our mean (µ) establishes a standardized “zero,” and our sigma values (σ) indicate the distance (or variation) of a score from µ.
Note how the mean equals zero. Also, see how one standard deviation away from the mean is represented by µ + 1σ or µ − 1σ (depending on the direction of the deviation).
Area Under the Curve
Normal Curve:
Under the normal curve, measures of standard deviation (or sigma units) correspond to specific percentages of the total area:
µ to µ + 1σ: 34.13%, and µ − 1σ to µ: 34.13%, so µ ± 1σ = 68.26%
µ to µ + 2σ: 47.72%, and µ − 2σ to µ: 47.72%, so µ ± 2σ = 95.44%
µ to µ + 3σ: 49.87%, and µ − 3σ to µ: 49.87%, so µ ± 3σ = 99.74%
Thus, the area under the normal curve between the mean and the point 1σ above it always includes 34.13% of the total cases, and the area from 1σ below to 1σ above the mean includes 68.26% of cases.
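These tabled percentages can be checked numerically. The sketch below is a minimal illustration, assuming Python's standard `math.erf`: the area between the mean and a point z standard deviations away equals ½·erf(z/√2). That identity is a general property of the normal curve, not a method taken from the textbook, and the function name `area_mean_to_z` is hypothetical.

```python
import math

def area_mean_to_z(z):
    """Proportion of a normal curve lying between the mean and z sigma units away."""
    # Standard-normal identity: P(0 <= Z <= |z|) = erf(|z| / sqrt(2)) / 2
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

for k in (1, 2, 3):
    one_side = area_mean_to_z(k) * 100
    print(f"mu to mu+{k}sigma: {one_side:.2f}%   mu-{k}sigma to mu+{k}sigma: {2 * one_side:.2f}%")
```

The one-sided values print as 34.13%, 47.72%, and 49.87%. The two-sided values print as 68.27%, 95.45%, and 99.73% because the code doubles the unrounded area, while the slide doubles the already rounded one-sided percentages.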
The Area Under the Curve
[Figure: normal curve shading the area from the mean to +2σ (47.72%) and from the mean to +3σ (49.87%)]
Clarifying the Standard Deviation
IQ and Gender:
Research suggests that both men and women have a mean IQ of 100, but
that they differ in terms of variability around the mean.
Men:
Specifically, the male distribution has a larger percentage of extreme scores, representing a small number of very bright and very dull individuals in the tails (and thus a larger range).
Women:
The distribution of women, by contrast, has a larger percentage of scores located closer to the average.
Clarifying the Standard Deviation
IQ and Gender: Measures of Variability
Here are the numbers:
Men:
Mean = 100
σ = 15
Women:
Mean = 100
σ = 10
Clarifying the Standard Deviation: Men
Men: Mean = 100, σ = 15
+1σ = 115, +2σ = 130, +3σ = 145
Since 3σ = 3 × 15 = 45, the range from IQ 55 (100 − 45) to IQ 145 (100 + 45) contains 99.74% of cases.
Clarifying the Standard Deviation: Women
Women: Mean = 100, σ = 10
+1σ = 110, +2σ = 120, +3σ = 130
Since 3σ = 3 × 10 = 30, the range from IQ 70 (100 − 30) to IQ 130 (100 + 30) contains 99.74% of cases.
Clarifying the Standard Deviation: Men
Men: Mean = 100, σ = 15
68.26% of cases fall within 1σ (15 points) of the mean, between IQ 85 and IQ 115.
Clarifying the Standard Deviation: Women
Women: Mean = 100, σ = 10
68.26% of cases fall within 1σ (10 points) of the mean, between IQ 90 and IQ 110.
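The cutoffs on these slides are just the mean plus or minus a whole number of standard deviations. A small sketch, using only the means and standard deviations given above (the helper name `sigma_range` is hypothetical), reproduces every range for both groups:

```python
def sigma_range(mean, sd, k):
    """Return the (low, high) scores lying k standard deviations below and above the mean."""
    return mean - k * sd, mean + k * sd

groups = {"Men": (100, 15), "Women": (100, 10)}
coverage = {1: "68.26%", 2: "95.44%", 3: "99.74%"}  # normal-curve areas from the slides

for name, (mean, sd) in groups.items():
    for k, pct in coverage.items():
        low, high = sigma_range(mean, sd, k)
        print(f"{name}: {pct} of scores between IQ {low:g} and {high:g}")
```

The smaller σ for women squeezes the same percentages into narrower intervals, which is the point of the comparison.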
Standard Deviation: Using Table A
So far, when analyzing the normal distribution, we have looked at distances from the mean that are exact multiples of the standard deviation (+1σ, +2σ, +3σ or −1σ, −2σ, −3σ). How do we determine the percentage of cases under the normal curve that falls at a distance in between, for example somewhere between +1σ and +2σ?
Example: 1.40σ
What percentage of scores falls between the mean (µ) and 1.40σ above it? Since 1.40σ is greater than 1σ but less than 2σ, we know the area includes more than 34.13% but less than 47.72% of cases.
Standard Deviation: Using Table A
[Figure: normal curve marking 1.0σ (34.13% from the mean), 1.4σ (?%), and 2.0σ (47.72%)]
Standard Deviation: Using Table A
To determine the exact percentage between the mean (µ) and 1.40σ, we need to consult Table A in Appendix B. Table A shows percentages of the area under the normal curve in three columns:
Column A: The sigma distances, labeled z, in the left-hand column
Column B: The percentage of the area under the normal curve between the mean and the various sigma distances from the mean
Column C: The percentage of the area at or beyond various scores toward either tail of the distribution
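Standard Deviation: Using Table A
Using Table A:
(A) z    (B) area between µ and z    (C) area beyond z
1.40     41.92%                      8.08%
So 41.92% of cases fall between the mean and 1.4σ, which lies between the 34.13% at 1σ and the 47.72% at 2σ.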
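Columns B and C of Table A can be approximated directly, which is a useful check on a table lookup. This is a sketch, assuming the standard identity that the area between the mean and z is ½·erf(|z|/√2) and that column C is the remaining half-curve, 0.5 minus column B; `table_a_row` is a hypothetical name, not something from the textbook.

```python
import math

def table_a_row(z):
    """Approximate Table A columns B and C, in percent, for a given z.

    B: area between the mean and z.
    C: area beyond z in that tail (the rest of the half-curve).
    """
    between = 0.5 * math.erf(abs(z) / math.sqrt(2))
    beyond = 0.5 - between
    return round(between * 100, 2), round(beyond * 100, 2)

print(table_a_row(1.40))  # (41.92, 8.08)
```

Columns B and C always sum to 50% because each covers part of one half of the curve.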
Z Score Computed by Formula
We obtain the z score by finding the deviation (X − µ), which gives the distance of the raw score from the mean, and then dividing this deviation by the standard deviation:
$$z = \frac{X - \mu}{\sigma}$$
where
X = raw score
µ = mean of a distribution
σ = standard deviation of a distribution
z = standard score
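The formula translates directly into code. A minimal sketch; the function name `z_score` is hypothetical, and the check reuses the male IQ distribution from earlier (mean 100, σ = 15), where an IQ of 130 sits at +2σ.

```python
def z_score(x, mu, sigma):
    """Standard score: how many standard deviations the raw score x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# IQ of 130 in the male distribution (mean 100, sigma 15)
print(z_score(130, 100, 15))  # 2.0
```

A negative result means the raw score lies below the mean; for example, an IQ of 85 gives z = −1.0 in the same distribution.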
Z Scores
The z score indicates the direction and degree to which any given raw score deviates from the mean of a distribution, on a scale of sigma units.
$$z = \frac{X - \mu}{\sigma}$$
Z Scores
So why do we use z scores?
Z scores allow us to translate any raw score, regardless of its unit of measure, into sigma units (standard deviations within a probability distribution), which gives us a standardized way to evaluate how far raw scores vary from the mean.
BUT, the sigma distance is specific to a particular distribution. It changes from one distribution to another.
For this reason, we must know the standard deviation of a distribution before we can translate any particular raw score into units of standard deviation.
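To see why the sigma distance depends on the distribution, take the same raw IQ score in the two IQ distributions from earlier (men: mean 100, σ = 15; women: mean 100, σ = 10). This is a sketch; `z_score` is a hypothetical helper.

```python
def z_score(x, mu, sigma):
    """Standard score of raw score x in a distribution with mean mu and standard deviation sigma."""
    return (x - mu) / sigma

iq = 115
z_men = z_score(iq, 100, 15)    # 15 points above the mean, sigma 15
z_women = z_score(iq, 100, 10)  # 15 points above the mean, sigma 10
print(z_men, z_women)  # 1.0 1.5
```

The raw deviation is 15 points in both cases, but it amounts to 1σ among men and 1.5σ among women.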
Z Scores
Let’s Practice!
Suppose we are studying the distribution of hours per month that federal
employees volunteer for partisan interest groups.
The mean is 4 hours and the standard deviation is 1.21 hours. We want to know how far 7 volunteer hours is from the mean.
The z score allows us to translate any raw score (X) into sigma units (a measure of standard deviation within a probability distribution).
Z Scores
Let’s look at the data that we have:
Z=?
µ = 4 hours
σ = 1.21 hours
X = 7 hours
NOTE: The raw score that we want to translate into a standardized score is
7 hours.
$$z = \frac{X - \mu}{\sigma} = \frac{7 - 4}{1.21} = \frac{3}{1.21} \approx 2.48$$
Clarifying the Standard Deviation: Volunteerism
Mean = 4 hours, σ = 1.21 hours
µ ± 1σ: 68.26% of cases fall between 2.79 and 5.21 hours
µ ± 2σ: 95.44% of cases fall between 1.58 and 6.42 hours
µ ± 3σ: 99.74% of cases fall between .37 and 7.63 hours
A raw score of 7 hours falls between +2σ (6.42 hours) and +3σ (7.63 hours), so the percentage of cases between the mean and 7 hours is more than 47.72% but less than 49.87%.
Table A:
(A) z    (B) area between µ and z    (C) area beyond z
2.48     49.34%                      .66%
σ = 1.21; z = 2.48, so 49.34% of cases fall between 4 hrs and 7 hrs.
What did we do: We took a raw score (7 hours) and turned it into a sigma score (z = 2.48) in order to determine the percentage of employees expected to volunteer between the mean of 4 hours and 7 hours.
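The volunteer example can be reproduced end to end in a few lines. This sketch computes z for 7 hours, rounds it to two decimals the way Table A is indexed, and uses the identity ½·erf(z/√2) as a stand-in for column B of Table A (an assumption for illustration, not the book's procedure).

```python
import math

mu, sigma, x = 4.0, 1.21, 7.0  # volunteer hours per month

z = (x - mu) / sigma            # 3 / 1.21 = 2.479...
z_table = round(z, 2)           # Table A rows go in steps of .01
area = 0.5 * math.erf(z_table / math.sqrt(2))  # area between the mean and z
print(f"z = {z:.4f} -> {z_table}; area between 4 and 7 hours = {area * 100:.2f}%")
```

Because 7 hours lies between +2σ and +3σ, the resulting area should land between 47.72% and 49.87%, which is a quick sanity check on the lookup.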
Z Scores
Another Example: Cashiers’ Pay
Suppose we are studying the distribution of pay for cashiers at a fast-food
restaurant.
The mean is $10 and the standard deviation is $1.50. We want to know how far $12 is from the mean.
Z Scores
Let’s look at the data that we have:
Z=?
µ = $10
σ = $1.50
X = $12
NOTE: The raw score that we want to translate into a standardized score is $12.
$$z = \frac{X - \mu}{\sigma} = \frac{12 - 10}{1.5} \approx 1.33$$
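[Figure: σ = 1.5; 34.13% of cases fall between $10 and $11.50 (+1σ), and ?% fall between $10 and $12]
Table A:
(A) z    (B) area between µ and z    (C) area beyond z
1.33     40.82%                      9.18%
σ = 1.5; z = 1.33, so 40.82% of cases fall between $10 and $12.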
What did we do: We took a raw score ($12) and turned it into a sigma score (z = 1.33) in order to determine the percentage of cashiers expected to earn between the mean pay of $10 and $12.
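The cashier example follows the same two steps. A sketch, again assuming ½·erf(z/√2) as a numerical stand-in for Table A column B; `area_mean_to_z` is a hypothetical helper.

```python
import math

def area_mean_to_z(z):
    """Area under the normal curve between the mean and z (as a proportion)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

mu, sigma, x = 10.0, 1.5, 12.0       # cashier pay, in dollars
z = round((x - mu) / sigma, 2)        # 2 / 1.5 = 1.333..., looked up as 1.33
between = area_mean_to_z(z)           # column B
beyond = 0.5 - between                # column C
print(z, round(between * 100, 2), round(beyond * 100, 2))  # 1.33 40.82 9.18
```

Using the unrounded z of 1.333… instead gives about 40.88%; the small gap comes only from rounding z to match Table A's two-decimal rows.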
Probability and the Normal Curve
We have covered z scores and areas under the normal curve, so let's discuss finding probability under the normal curve.
The normal curve can be used in conjunction with z scores and Table A to determine the probability of obtaining any raw score in a distribution.
Remember, the normal curve is a probability distribution in which the total area under the curve equals 100% probability.
Probability and the Normal Curve
The central area around the mean is where the scores occur most frequently.
The extreme portions toward the tails are where the very high and very low scores are located.
So, in probability terms, probability decreases as we travel along the baseline away from the mean in either direction.
To say that 68.26% of the total frequency under the normal curve falls between −1σ and +1σ from the mean is to say that the probability is approximately 68 in 100 that any given raw score will fall in this interval.
[Figure: normal curve with 68.26% of cases, or 68 in 100, between −1σ and +1σ]
Probability and the Normal Curve
Example: Campaign Phone-Bank
We are asked to calculate the z score for the number of calls campaign volunteers made in a 3-hour shift.
The mean number of calls is 21, with a standard deviation of 1.45 calls. What is the probability that a volunteer will complete 25 or more calls during the 3-hour shift?
Let's apply the z-score formula.
Example: Phone Banking
z = ?
µ = 21 calls
σ = 1.45 calls
X = 25 calls
Goal: Turn the raw score (25 calls) into sigma units (z) in order to determine the likely percentage of volunteers who make between 21 and 25 calls, or more than 25 calls.
Remember our equation, and plug in our values and scores:
$$z = \frac{X - \mu}{\sigma} = \frac{25 - 21}{1.45} = \frac{4}{1.45} \approx 2.76$$
We have our z score. From it, we know that a raw score of 25 calls is located 2.76σ above the mean.
Probability and the Normal Curve
Our next step is to use Table A to find the percent of the total frequency under the curve falling between the z score and the mean. So,
1. Let's find our z score (2.76) in Column A.
2. Column B tells us that 49.71% of all volunteers should complete between 21 and 25 calls in 3 hours.
3. Moving the decimal point two places to the left turns 49.71% into a probability of .4971, or about 50 in 100 (rounding up).
4. That is, P = .4971 that a volunteer will complete between 21 and 25 phone calls.
Table A:
(A) z    (B) area between µ and z    (C) area beyond z
2.76     49.71%                      .29%
P of calls between 21 and 25: P = .4971, about 50 in 100 (a 50% chance).
By symmetry, 17 calls (4 calls below the mean) has z = −2.76, so:
P of calls between 17 and 25: P = .4971 + .4971 = .9942, about 99 in 100.
P of more than 25 calls: P = .0029, about .3 in 100 (a .3% chance).
P of fewer than 17 or more than 25 calls: P = .0029 + .0029 = .0058, about .6 in 100 (a .6% chance).
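All of the phone-bank probabilities come from one z score and the symmetry of the curve. A sketch, assuming ½·erf(z/√2) as a numerical stand-in for Table A column B and rounding z to two decimals as Table A requires; the names are hypothetical. Like the slides, it treats call counts as a continuous measure.

```python
import math

def area_mean_to_z(z):
    """Area under the normal curve between the mean and z (as a proportion)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

mu, sigma, x = 21, 1.45, 25
z = round((x - mu) / sigma, 2)   # 4 / 1.45 = 2.7586..., rounds to 2.76
between = area_mean_to_z(z)      # P(21 to 25 calls)
upper_tail = 0.5 - between       # P(more than 25 calls)

print(f"z = {z}")
print(f"P(21 to 25)     = {between:.4f}")
print(f"P(17 to 25)     = {2 * between:.4f}")
print(f"P(more than 25) = {upper_tail:.4f}")
print(f"P(<17 or >25)   = {2 * upper_tail:.4f}")
```

Doubling works because 17 and 25 calls sit the same distance (2.76σ) below and above the mean.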
Review of Probability
Probability refers to the relative likelihood of occurrence of a particular
outcome or event.
The probability associated with an event is the number of times that event
can occur relative to the total number of times any event can occur.
We use a capital P to indicate probability.
Probability varies from 0 to 1.0, although percentages rather than decimals may be used to express levels of probability.
The Probability Spectrum
A zero probability indicates that something is impossible.
Probabilities near zero (like .05 or .10) imply very unlikely occurrences.
A probability of 1.0 constitutes certainty.
High probabilities like .90, .95, or .99 signify very probable or likely outcomes.
Equation for Calculating Probability
$$P(\text{outcome or event}) = \frac{\text{number of times the outcome or event can occur}}{\text{total number of times any outcome or event can occur}}$$
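The probability equation can be written as a small function that also enforces the 0 to 1.0 range. This is a sketch; the function name and the die-roll example are hypothetical illustrations, not taken from the slides, and Python's `fractions.Fraction` keeps the result exact.

```python
from fractions import Fraction

def probability(times_event_can_occur, total_times_any_event_can_occur):
    """P = (times the outcome or event can occur) / (total times any outcome or event can occur)."""
    if total_times_any_event_can_occur <= 0:
        raise ValueError("total must be positive")
    if not 0 <= times_event_can_occur <= total_times_any_event_can_occur:
        raise ValueError("event count must be between 0 and the total")
    return Fraction(times_event_can_occur, total_times_any_event_can_occur)

# Illustrative only: rolling a 6 with one fair six-sided die
p_six = probability(1, 6)
print(p_six, float(p_six))
```

The two checks mirror the probability spectrum: an impossible event gives 0 and a certain event gives 1.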
Extra: Z Scores
How do we determine the percent of cases for distances lying between any
two score values?
Example: A raw score lies 1.55σ above the mean.
– Obviously our score falls between 1σ and 2σ.
– So we know that this distance would include more than 34.13% but less than 47.72% of the total area under the normal curve.
Extra: Z Scores
To determine the exact percentage in this interval, we must use Table A in
Appendix B!
Column A: The sigma distances are labeled z in the left-hand column
Column B: The percentage of the area under the normal curve between the
mean and the various sigma distances from the mean
Column C: The percentage of the area at or beyond various scores toward
either tail of the distribution.
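For the 1.55σ example, the exact percentage can be approximated the same way as a Table A lookup. A sketch, assuming ½·erf(z/√2) for column B and 0.5 minus that for column C (`table_a_row` is a hypothetical name); the result should agree with the 1.55 row of a standard normal table to two decimals.

```python
import math

def table_a_row(z):
    """Approximate Table A columns B (mean to z) and C (beyond z), in percent."""
    between = 0.5 * math.erf(abs(z) / math.sqrt(2))
    return round(between * 100, 2), round((0.5 - between) * 100, 2)

b, c = table_a_row(1.55)
print(f"Between the mean and 1.55 sigma: {b}%   Beyond 1.55 sigma: {c}%")
```

As expected, the column B value falls between the 34.13% at 1σ and the 47.72% at 2σ.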