Probability and the Normal Curve, continued
Statistics for Political Science
Levin and Fox, Chapter 5, Part II
Let’s take another look at standard deviation

Standard deviation is a measure of variation, and this variability is reflected in the sigma values (σ) of our distribution. The mean (µ) serves as a standardized "zero," and the sigma values (σ) indicate how far a score lies from µ (that is, how much it varies from the mean).
Note that the mean sits at zero on this scale. A point one standard deviation away from the mean is written µ + 1σ or µ − 1σ, depending on the direction of the deviation.
Area Under the Curve

Normal curve: Under the normal curve, distances measured in standard deviations (sigma units) correspond to specific percentages of the total area:
Between µ and µ + 1σ: 34.13%; between µ − 1σ and µ: 34.13%; together: 68.26%
Between µ and µ + 2σ: 47.72%; between µ − 2σ and µ: 47.72%; together: 95.44%
Between µ and µ + 3σ: 49.87%; between µ − 3σ and µ: 49.87%; together: 99.74%

Thus, the area under the normal curve between the mean and the point 1σ above (or below) it always includes 34.13% of the total cases, and the area within 1σ above and below the mean includes 68.26% of the cases.
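These percentages can be reproduced from the standard normal distribution. A minimal sketch in Python, assuming only the standard library's `math.erf`; the helper name `area_mean_to_z` is our own, not the textbook's:

```python
import math

def area_mean_to_z(z: float) -> float:
    """Proportion of the normal curve between the mean and z sigma units (z >= 0)."""
    # Phi(z) - 0.5, expressed with the error function: 0.5 * erf(z / sqrt(2))
    return 0.5 * math.erf(z / math.sqrt(2))

for k in (1, 2, 3):
    one_side = 100 * area_mean_to_z(k)
    print(f"mean to +{k} sigma: {one_side:.2f}%   within +/-{k} sigma: {2 * one_side:.2f}%")
```

Doubling the rounded one-sided values gives 68.26%, 95.44%, and 99.74%; doubling the unrounded areas gives about 68.27%, 95.45%, and 99.73%, so the small differences come from rounding only.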
The Area Under the Curve
(Figure: normal curve showing 47.72% of the area between the mean and +2σ, and 49.87% between the mean and +3σ.)
Clarifying the Standard Deviation

IQ and gender: Research suggests that men and women both have a mean IQ of 100 but differ in their variability around the mean.

Men: The male distribution has a larger percentage of extreme scores, representing a small number of very bright and very dull individuals in the tails (and thus a larger range).

Women: The distribution for women, by contrast, has a larger percentage of scores located close to the average.
Clarifying the Standard Deviation
IQ and Gender: Measures of Variability

Here are the numbers:
Men: Mean = 100, σ = 15
Women: Mean = 100, σ = 10
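Clarifying the Standard Deviation: Men
Men: Mean = 100, σ = 15
Three standard deviations: 3 × 15 = 45 IQ points, so 99.74% of men score between IQ 55 and IQ 145.
Points marked on the curve: IQ 100 (mean), +1σ = 115, +2σ = 130, +3σ = 145.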
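Clarifying the Standard Deviation: Women
Women: Mean = 100, σ = 10
Three standard deviations: 3 × 10 = 30 IQ points, so 99.74% of women score between IQ 70 and IQ 130.
Points marked on the curve: IQ 100 (mean), +1σ = 110, +2σ = 120, +3σ = 130.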
Clarifying the Standard Deviation: Men
Men: Mean = 100, σ = 15
68.26% of men score within 1σ (15 points) of the mean, between IQ 85 and IQ 115.
Clarifying the Standard Deviation: Women
Women: Mean = 100, σ = 10
68.26% of women score within 1σ (10 points) of the mean, between IQ 90 and IQ 110.
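The IQ intervals on these slides are the mean plus or minus a whole number of standard deviations. A small sketch (the function name `sigma_interval` is hypothetical, chosen for illustration) using the means and σ values given for men and women:

```python
def sigma_interval(mean: float, sd: float, k: int) -> tuple[float, float]:
    """Return the raw-score interval from mean - k*sd to mean + k*sd."""
    return (mean - k * sd, mean + k * sd)

# Means and standard deviations as given on the slides
groups = {"Men": (100, 15), "Women": (100, 10)}

for name, (mean, sd) in groups.items():
    for k, pct in ((1, "68.26%"), (3, "99.74%")):
        low, high = sigma_interval(mean, sd, k)
        print(f"{name}: {pct} of IQ scores between {low} and {high}")
```

The wider σ for men stretches every interval: the middle 99.74% spans 90 IQ points for men but only 60 for women.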
Standard Deviation: Using Table A

So far, when analyzing the normal distribution, we have looked at distances from the mean that are exact multiples of the standard deviation (+1σ, +2σ, +3σ or −1σ, −2σ, −3σ). How do we determine the percentage of cases under the normal curve for a distance that lies between these multiples, for example between +1σ and +2σ?

Example: 1.40σ
What percentage of scores fall between the mean (µ) and a point 1.40σ above it? Since 1.40σ is greater than 1σ but less than 2σ, we know the area includes more than 34.13% but less than 47.72% of cases.
Standard Deviation: Using Table A
(Figure: the unknown area, ?%, between the mean and 1.4σ lies between the 34.13% marked at 1.0σ and the 47.72% marked at 2.0σ.)
Standard Deviation: Using Table A

To determine the exact percentage between the mean (µ) and 1.40σ, we need to consult Table A in Appendix B. Table A shows percentages of the area under the normal curve:
Column A: The sigma distances are labeled z in the left-hand column.
Column B: The percentage of the area under the normal curve between the mean and each sigma distance from the mean.
Column C: The percentage of the area at or beyond each score toward either tail of the distribution.
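Standard Deviation: Using Table A

Using Table A:
(A) z     (B) Area between µ and z     (C) Area beyond z
1.40      41.92                        8.08

So 41.92% of cases fall between the mean and 1.4σ, which indeed lies between 34.13% (at 1σ) and 47.72% (at 2σ).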
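Table A is a printed version of the standard normal cumulative distribution, so its Columns B and C can be approximated in code. This sketch assumes Python's `math.erf` and a hypothetical helper `table_a_row`; it returns both columns, in percent, rounded to two decimals as the table prints them:

```python
import math

def table_a_row(z: float) -> tuple[float, float]:
    """Approximate Table A Columns B and C (in percent) for z >= 0."""
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # cumulative area to the left of z
    column_b = 100 * (phi - 0.5)                  # area between the mean and z
    column_c = 100 * (1 - phi)                    # area at or beyond z in that tail
    return round(column_b, 2), round(column_c, 2)

print(table_a_row(1.40))  # (41.92, 8.08)
```

Columns B and C always sum to 50 for a given z, because each half of the curve holds 50% of the area.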
Z Score Computed by Formula

We obtain the z score by finding the deviation (X − µ), which gives the distance of the raw score from the mean, and then dividing this deviation by the standard deviation:

$$ z = \frac{X - \mu}{\sigma} $$

where
X = raw score
µ = mean of a distribution
σ = standard deviation of a distribution
z = standard score
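The formula translates directly into code. A minimal sketch (the function name `z_score` is ours) that also guards against a non-positive σ, checked against the men's IQ figures (µ = 100, σ = 15):

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Standard score: distance of raw score x from the mean, in sigma units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

print(z_score(115, 100, 15))  # 1.0
print(z_score(85, 100, 15))   # -1.0
```

The sign of the result carries the direction of the deviation: +1.0 for an IQ of 115 and −1.0 for an IQ of 85.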
Z Scores

Z scores: The z score indicates the direction (its sign) and the degree (its size) to which any given raw score deviates from the mean of a distribution, on a scale of sigma units.

$$ z = \frac{X - \mu}{\sigma} $$
Z Scores

So why do we use z scores? Z scores allow us to translate any raw score, regardless of its unit of measure, into sigma units (standard deviations within a probability distribution). This gives us a standardized way to evaluate how raw scores vary from the mean.

BUT, the sigma distance is specific to a particular distribution; it changes from one distribution to another. For this reason, we must know the standard deviation of a distribution before we can translate any particular raw score into units of standard deviation.
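To see why σ must be known, compare the same raw score in two distributions with the same mean but different spreads. This sketch reuses the IQ parameters given earlier for men (σ = 15) and women (σ = 10); `z_score` is a hypothetical helper:

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Standard score: (x - mean) / sd."""
    return (x - mean) / sd

iq = 130
z_men = z_score(iq, 100, 15)    # men: sigma = 15
z_women = z_score(iq, 100, 10)  # women: sigma = 10
print(z_men, z_women)           # 2.0 3.0
```

An IQ of 130 is only 2σ above the mean for men but 3σ above it for women, so the same raw score sits much further into the tail of the women's distribution.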
Z Scores

Let’s practice! Suppose we are studying the distribution of hours per month that federal employees volunteer for partisan interest groups. The mean is 4 hours and the standard deviation is 1.21 hours. We want to know how far 7 volunteer hours is from the mean.

The z score allows us to translate any raw score (X) into sigma units (standard deviations within a probability distribution).
Z Scores

Let’s look at the data that we have:
z = ?
µ = 4 hours
σ = 1.21 hours
X = 7 hours

NOTE: The raw score that we want to translate into a standardized score is 7 hours.
$$ z = \frac{X - \mu}{\sigma} = \frac{7 - 4}{1.21} = \frac{3}{1.21} \approx 2.48 $$

Volunteerism: Mean = 4 hours, σ = 1.21 hours
(Figures: normal curve of volunteer hours marked at 0.37, 1.58, 2.79, 4, 5.21, 6.42, and 7.63 hours, that is, µ ± 1σ, ± 2σ, and ± 3σ. 68.26% of cases fall between 2.79 and 5.21 hours, 95.44% between 1.58 and 6.42 hours, and 99.74% between 0.37 and 7.63 hours. The raw score of 7 hours lies between +2σ (6.42) and +3σ (7.63), so the area between 4 and 7 hours is more than 47.72% but less than 49.87%.)

Table A:
(A) z     (B) Area between µ and z     (C) Area beyond z
2.48      49.34                        0.66

With σ = 1.21 hours, z = 2.48, so 49.34% of cases fall between 4 hours and 7 hours.

What did we do: We took a raw score (7 hours) and turned it into a sigma score (z = 2.48) in order to determine the percentage of employees likely to volunteer between the mean of 4 hours and 7 hours.
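A short sketch that recomputes the volunteer example from the stated mean (4 hours), standard deviation (1.21 hours), and raw score (7 hours). It assumes Python's `math.erf` as a stand-in for a printed Table A and rounds z to two decimals, as Table A does; the helper names are hypothetical:

```python
import math

def z_score(x: float, mean: float, sd: float) -> float:
    """Standard score: (x - mean) / sd."""
    return (x - mean) / sd

def area_mean_to_z(z: float) -> float:
    """Proportion of the curve between the mean and |z| (Table A Column B)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

z = z_score(7, 4, 1.21)        # 3 / 1.21 = 2.479...
z_table = round(z, 2)          # Table A lists z to two decimals
pct = 100 * area_mean_to_z(z_table)
print(f"z = {z_table}, area between 4 and 7 hours = {pct:.2f}%")
```

Because 7 hours lies between +2σ and +3σ, the computed area necessarily falls between 47.72% and 49.87%, which is a quick sanity check on any z you compute by hand.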
Z Scores

Another example: Cashiers’ pay. Suppose we are studying the distribution of pay for cashiers at a fast-food restaurant. The mean is $10 and the standard deviation is $1.50. We want to know how far $12 is from the mean.
Z Scores

Let’s look at the data that we have:
z = ?
µ = $10
σ = $1.50
X = $12

NOTE: The raw score that we want to translate into a standardized score is $12.

$$ z = \frac{X - \mu}{\sigma} = \frac{12 - 10}{1.5} \approx 1.33 $$
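(Figure: with σ = $1.50, 34.13% of cashiers earn between $10 and $11.50, that is, within +1σ; the unknown ?% lies between $10 and $12.)

Table A:
(A) z     (B) Area between µ and z     (C) Area beyond z
1.33      40.82                        9.18

With σ = $1.50, z = 1.33, so 40.82% of cashiers earn between $10 and $12.

What did we do: We took a raw score ($12) and turned it into a sigma score (z = 1.33) in order to determine the percentage of cashiers likely to earn between the mean of $10 and $12.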
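The cashier lookup can be sketched the same way, starting from µ = $10, σ = $1.50, and X = $12. This assumes Python's `math.erf` in place of Table A, with z rounded to two decimals before the lookup; `phi` is a hypothetical helper name:

```python
import math

def phi(z: float) -> float:
    """Cumulative standard normal probability P(Z <= z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd, x = 10.0, 1.5, 12.0            # dollars, as given for cashiers' pay
z_table = round((x - mean) / sd, 2)      # 2 / 1.5 = 1.333..., Table A precision -> 1.33
column_b = 100 * (phi(z_table) - 0.5)    # % earning between $10 and $12
column_c = 100 * (1 - phi(z_table))      # % earning more than $12
print(f"z = {z_table}: B = {column_b:.2f}%, C = {column_c:.2f}%")
```

Rounding z to 1.33 matches the Table A lookup; using the unrounded z = 1.333... gives a slightly larger area, about 40.9%.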
Probability and the Normal Curve

We have covered finding probability and z scores, so let’s discuss finding probability under the normal curve. The normal curve can be used in conjunction with z scores and Table A to determine the probability of obtaining a raw score within any given range of a distribution. Remember, the normal curve is a probability distribution in which the total area under the curve equals 100% probability.
Probability and the Normal Curve

The central area around the mean is where scores occur most frequently. The extreme portions toward the ends (the tails) are where the extremely high and low scores are located. So, in probability terms, probability decreases as we travel along the baseline away from the mean in either direction.

To say that 68.26% of the total frequency under the normal curve falls between −1σ and +1σ from the mean is to say that the probability is approximately 68 in 100 that any given raw score will fall in this interval.
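The "68 in 100" reading can be checked by simulation. This sketch assumes normally distributed scores with the women's IQ parameters from the slides (µ = 100, σ = 10), draws 100,000 values with the standard library's `random.Random.gauss`, and counts the share within ±1σ; the seed and sample size are arbitrary choices:

```python
import random

rng = random.Random(42)   # fixed seed so the run is repeatable
mean, sd = 100, 10        # women's IQ parameters from the slides
n = 100_000

draws = [rng.gauss(mean, sd) for _ in range(n)]
within = sum(mean - sd <= x <= mean + sd for x in draws)  # count scores in [90, 110]
share = within / n
print(f"share within +/-1 sigma: {share:.4f}")
```

With this many draws the share lands close to .68; smaller samples will wander further from it.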
Clarifying the Standard Deviation: Women
(Figure: the women’s IQ curve, with 68.26%, or about 68 in 100, of cases within ±1σ of the mean.)
Probability and the Normal Curve

Example: Campaign phone bank. We are asked to calculate the z score for the number of calls campaign volunteers made in a 3-hour shift. The mean number of calls is 21, with a standard deviation of 1.45 calls. What is the probability that a volunteer will complete 25 or more calls during the 3-hour period? Let’s apply the z-score formula.
Example: Phone Banking

z = ?
µ = 21 calls
σ = 1.45 calls
X = 25 calls

Goal: Turn the raw score (25 calls) into sigma units (z) in order to determine the likely percentage of volunteers who make between 21 and 25 calls, or more than 25 calls.

Remember our equation, then plug in our values and scores:

$$ z = \frac{X - \mu}{\sigma} = \frac{25 - 21}{1.45} \approx 2.76 $$

We have our z score. From it, we know that a raw score of 25 is located 2.76σ above the mean.

Probability and the Normal Curve

Our next step is to use Table A to find the percentage of the total frequency under the curve falling between the z score and the mean. So:
1. Find our z score (2.76) in Column A.
2. Column B tells us that 49.71% of all volunteers should complete between 21 and 25 calls in 3 hours.
3. Moving the decimal two places to the left gives a probability of .4971, or about 50 in 100 (rounding up).
4. That is, P = .4971 that a volunteer will complete between 21 and 25 phone calls.

Table A:
(A) z     (B) Area between µ and z     (C) Area beyond z
2.76      49.71                        0.29

(Figures: normal curve of calls with σ = 1.45, marked at 17, 21, and 25 calls, that is, z = −2.76, the mean, and z = +2.76.)
P of calls between 21 and 25: P = .4971, about 50 in 100 (roughly a 50% chance)
P of calls between 17 and 25: P = .9942, about 99 in 100
P of fewer than 17 or more than 25 calls: P = .0058, about .6 in 100 (a .6% chance)
P of more than 25 calls: P = .0029, about .3 in 100 (a .3% chance)
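The phone-bank probabilities can be sketched in code, starting from µ = 21 calls, σ = 1.45 calls, and X = 25 calls. This assumes Python's `math.erf` as a stand-in for Table A, with z rounded to two decimals; the variable names are illustrative:

```python
import math

def phi(z: float) -> float:
    """Cumulative standard normal probability P(Z <= z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd, x = 21, 1.45, 25
z = round((x - mean) / sd, 2)   # 4 / 1.45 = 2.7586..., Table A precision -> 2.76

p_21_to_25 = phi(z) - 0.5       # between the mean and 25 calls
p_above_25 = 1 - phi(z)         # more than 25 calls
p_17_to_25 = 2 * p_21_to_25     # symmetric band from 17 to 25 calls
p_outside = 2 * p_above_25      # fewer than 17 or more than 25 calls

print(z, round(p_21_to_25, 4), round(p_above_25, 4),
      round(p_17_to_25, 4), round(p_outside, 4))
# 2.76 0.4971 0.0029 0.9942 0.0058
```

The two-band results are simple doublings because the normal curve is symmetric around the mean, so the area from 17 to 21 calls equals the area from 21 to 25.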
Review of Probability

Probability refers to the relative likelihood of occurrence of a particular outcome or event. The probability associated with an event is the number of times that event can occur relative to the total number of times any event can occur. We use a capital P to indicate probability. Probability varies from 0 to 1.0, although percentages rather than decimals may be used to express levels of probability.
The Probability Spectrum

A probability of zero indicates that something is impossible. Probabilities near zero (like .05 or .10) imply very unlikely occurrences. A probability of 1.0 constitutes certainty. High probabilities like .90, .95, or .99 signify very probable or likely outcomes.
Equation for Calculating Probability

$$ P = \frac{\text{number of times the outcome or event can occur}}{\text{total number of times any outcome or event can occur}} $$
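The ratio definition of probability can be written with exact fractions. The die-roll numbers below are our own illustration, not from the slides, and `probability` is a hypothetical helper built on the standard library's `fractions.Fraction`:

```python
from fractions import Fraction

def probability(favorable: int, total: int) -> Fraction:
    """P = (times the outcome can occur) / (total times any outcome can occur)."""
    if total <= 0 or not 0 <= favorable <= total:
        raise ValueError("need total > 0 and 0 <= favorable <= total")
    return Fraction(favorable, total)

p_four = probability(1, 6)   # rolling a 4 on a fair six-sided die
p_even = probability(3, 6)   # rolling a 2, 4, or 6
print(p_four, p_even)        # 1/6 1/2
```

Any result stays between 0 and 1, matching the probability spectrum: 0 for an outcome that can never occur, 1 for an outcome that always occurs.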
Extra: Z Scores

How do we determine the percentage of cases for distances lying between any two score values?

Example: A raw score lies 1.55σ above the mean.
–  Our score falls between 1σ and 2σ.
–  So this distance must include more than 34.13% but less than 47.72% of the total area under the normal curve.
Extra: Z Scores

To determine the exact percentage in this interval, we must use Table A in Appendix B!
Column A: The sigma distances are labeled z in the left-hand column.
Column B: The percentage of the area under the normal curve between the mean and each sigma distance from the mean.
Column C: The percentage of the area at or beyond each score toward either tail of the distribution.
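Column B answers the 1.55σ example directly, and the difference between two Column B entries gives the area between any two points on the same side of the mean. A sketch assuming Python's `math.erf` as a stand-in for Table A (helper names are hypothetical):

```python
import math

def area_mean_to_z(z: float) -> float:
    """Table A Column B as a proportion: area between the mean and z (z >= 0)."""
    return 0.5 * math.erf(z / math.sqrt(2))

def area_between(z1: float, z2: float) -> float:
    """Area between two points on the same side of the mean (0 <= z1 <= z2)."""
    return area_mean_to_z(z2) - area_mean_to_z(z1)

print(f"{100 * area_mean_to_z(1.55):.2f}%")      # mean to 1.55 sigma: 43.94%
print(f"{100 * area_between(1.0, 1.55):.2f}%")   # 1 sigma to 1.55 sigma: 9.81%
```

The area from the mean to 1.55σ, about 43.94%, falls inside the range bounded by the 1σ and 2σ areas, and the slice between 1σ and 1.55σ is about 9.81%. For two points on opposite sides of the mean, add their Column B areas instead of subtracting.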