Chapter 7: The Normal Distribution

Definition: The random variable X is said to have a normal distribution with mean µ and variance σ² if it has the density function

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[(x-\mu)/\sigma\right]^2}, \quad -\infty < x < \infty.$

In our text, the shorthand notation X ~ N(µ, σ²) is used to indicate the normal distribution with mean µ and variance σ².

Properties of the Normal Distribution:
1. $\int_{-\infty}^{\infty} f(x)\,dx = 1$. (To prove this we need to make a clever move and find a means to rewrite this as a double integral. After that we need to make a change of variables and integrate in polar coordinates.)
2. f(x) ≥ 0 for all x.
3. The distribution is symmetric about the mean: f(µ + x) = f(µ − x).
4. The maximum value of f occurs at x = µ.
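
The double-integral argument mentioned for Property 1 can be sketched as follows. This is the standard derivation written out under the definition of f given earlier; the symbols I, u, v, r and θ are introduced here only for illustration and are not notation from the text.

```latex
\begin{align*}
I &= \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,
     e^{-\frac{1}{2}\left[(x-\mu)/\sigma\right]^2}\,dx
   = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2}\,du
   \qquad \text{(substituting } u = (x-\mu)/\sigma\text{)}, \\
I^2 &= \frac{1}{2\pi} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
     e^{-(u^2+v^2)/2}\,du\,dv
   = \frac{1}{2\pi} \int_{0}^{2\pi}\int_{0}^{\infty} e^{-r^2/2}\, r\,dr\,d\theta \\
  &= \frac{1}{2\pi}\cdot 2\pi \cdot \Big[-e^{-r^2/2}\Big]_{0}^{\infty} = 1 .
\end{align*}
```

Since the integrand is positive, I > 0, and therefore I = 1.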

Standard Normal Distribution: this is a special case of the normal distribution with mean 0 and standard deviation 1. (The variance is also 1.)
Notation: X ~ N(0,1)

Relation between Standard Normal Distribution and Normal Distribution
Suppose that X ~ N(µ, σ²). Then the random variable $Z = \frac{X-\mu}{\sigma}$ is normally distributed with mean 0 and standard deviation 1, i.e. Z ~ N(0,1).
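
As a numerical illustration of the standardization, the sketch below checks that P(X ≤ x) agrees with Φ(z) for z = (x − µ)/σ. It assumes Python 3.8 or later and uses `statistics.NormalDist` from the standard library (the notes themselves use a TI calculator); the values µ = 10, σ = 2 and x = 13 are arbitrary choices for the example.

```python
from statistics import NormalDist

# Illustrative parameters (not from the notes): X ~ N(mu, sigma^2)
mu, sigma = 10.0, 2.0
x = 13.0

# Standardize: Z = (X - mu) / sigma
z = (x - mu) / sigma

# P(X <= x) computed directly, and via the standard normal CDF at z
p_direct = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist(0.0, 1.0).cdf(z)

print(z, p_direct, p_standard)
```

The two probabilities should coincide up to floating-point rounding, which is why a single N(0,1) table is enough.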

The CDF table for N(0,1) is in the back of the book (page 601). To use the table you need to compute Z values first, shade the area in question, and then possibly use symmetry to find related areas (e.g. values for Z < 0 are not in the table).

Symmetry Notions for the Standard Normal Distribution
Because the standard normal density is symmetric about 0, Φ(−z) = 1 − Φ(z).
Note: The table for the CDF function $\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\,du$ is on page 601.

Note: The CDF function for normal distributions is available on the TI-83 and TI-84 calculators. We will use this to solve various probability problems.

WARNING: In many statistics texts and on the TI calculators, the syntax will be N(µ, σ), with the parameters being the mean and standard deviation.

TI Syntax (2nd VARS key, DIST menu):
normalcdf(lowerbound, upperbound, mean, standard deviation)

Notes: If you want a lowerbound of −∞, you may use the value −10^99, and if you want an upperbound of +∞, you may use 10^99. If you do not specify the mean and standard deviation, the defaults are 0 and 1, respectively.
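
For readers without a TI calculator, a rough Python analogue of the calculator command can be sketched as below. The helper name `normalcdf` and its argument order are chosen here to mirror the TI syntax; it is not a library function. The sketch assumes Python 3.8+ for `statistics.NormalDist`.

```python
from math import inf
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Hypothetical helper: P(lower <= X <= upper) for X ~ N(mean, sd^2).

    Argument order mirrors the TI command; mean and sd default to 0 and 1.
    """
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Left half of the standard normal curve
half = normalcdf(-inf, 0)
# Area within one standard deviation of the mean
within_one_sd = normalcdf(-1, 1)
# TI-style stand-ins for -infinity and +infinity also work
whole = normalcdf(-1e99, 1e99)

print(half, within_one_sd, whole)
```

In Python the bounds can be `-inf` and `inf` directly, so the ±10^99 convention is only needed on the calculator.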

Examples:
Example 1. (See 7.1 in the book for table use.) The breaking strength of a fabric in Newtons is denoted X, and is distributed X ~ N(µ, σ²), where µ = 800 and σ² = 144. Find the probability that the strength of the fabric is at least 772 Newtons.
***Use σ = 12.
TI: normalcdf(772, 10^99, 800, 12) = .99018
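
The same upper-tail probability can be cross-checked in Python. This is a minimal sketch assuming Python 3.8+ and `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Breaking strength X ~ N(800, 144), so the standard deviation is sqrt(144) = 12
strength = NormalDist(mu=800, sigma=12)

# P(X >= 772) is the complement of the CDF at 772
p_at_least_772 = 1 - strength.cdf(772)

print(p_at_least_772)
```

Taking the complement of the CDF plays the role of the 10^99 upper bound in the calculator command.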

Example 2. (See 7.3 in the book for table use.) The diameter of a thread on a fitting is normally distributed with mean 0.4008 cm and a standard deviation of 0.0004 cm. The design specifications are 0.4000 ± 0.0010 cm. Find the probability that specifications are met.
We want our random variable X to satisfy 0.3990 ≤ X ≤ 0.4010. So, we use the command
TI: normalcdf(.399, .401, .4008, .0004).
We find the final answer is .691459.
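
A two-sided interval probability like this one is a difference of two CDF values. The following sketch (assuming Python 3.8+ and `statistics.NormalDist`; names are illustrative) reproduces the calculation.

```python
from statistics import NormalDist

# Thread diameter in cm: mean 0.4008, standard deviation 0.0004
diameter = NormalDist(mu=0.4008, sigma=0.0004)

# Specification limits 0.4000 +/- 0.0010 cm
lower = 0.4000 - 0.0010
upper = 0.4000 + 0.0010

# P(lower <= X <= upper) = F(upper) - F(lower)
p_in_spec = diameter.cdf(upper) - diameter.cdf(lower)

print(p_in_spec)
```

Almost all of the rejected probability lies above the upper limit, because the process mean sits well above the nominal 0.4000 cm.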

Problems in "reverse": the probability is predetermined and the cutoff value is requested.
Goal: Find the value x so that P(X ≤ x) = p, where the value p is given. This is also referred to as the 100p-th percentile (e.g. p = .9 gives the 90th percentile).

Strategy (this works on both TI-83 and TI-84):
1. Find the associated Z cutoff for the cdf of the standard normal distribution. Use the command invNorm(p).
2. Solve the equation $Z = \frac{X-\mu}{\sigma}$ for X.

Example 3: Suppose that IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. Find the 90th percentile IQ score.
We seek the score x so that P(X ≤ x) = .9. The associated Z-score is invNorm(.9) = 1.282.
Solve $1.282 = \frac{X-\mu}{\sigma}$ for X. Here we have $1.282 = \frac{X-100}{15}$, or X = 1.282*15 + 100 = 119.23.
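
The two-step percentile strategy can be sketched in Python, where `NormalDist.inv_cdf` plays the role of invNorm. This assumes Python 3.8+; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 100, 15   # IQ scores: mean 100, standard deviation 15
p = 0.9               # target cumulative probability (90th percentile)

# Step 1: Z cutoff from the standard normal distribution (like invNorm(p))
z = NormalDist().inv_cdf(p)

# Step 2: solve Z = (X - mu) / sigma for X
x = mu + sigma * z

# Cross-check: invert the N(100, 15^2) CDF in one step
x_direct = NormalDist(mu, sigma).inv_cdf(p)

print(z, x, x_direct)
```

Because z is not rounded to 1.282 here, the result can differ from the hand calculation in the second decimal place.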

Reproductive Property of the Normal Distribution: Suppose that we have n independent, normal random variables X1, X2, …, Xn, each normally distributed with Xi ~ N(µi, σi²). Then the random variable Y = X1 + X2 + … + Xn is normally distributed with mean

$E(Y) = \mu_Y = \sum_{i=1}^{n} \mu_i$

and variance

$V(Y) = \sigma_Y^2 = \sum_{i=1}^{n} \sigma_i^2.$

Remark: You must find the variance of the new distribution and use it to get the new standard deviation; standard deviations do not add. The proof relies on moment generating functions, which we discuss later.

Problems with multiple normal distributions:
Example 4. A test is taken in three parts. For the first part the mean is 50 and the standard deviation is 10, for the second part the mean is 70 and the standard deviation is 15, and for the final part the mean is 130 and the standard deviation is 18. Assume that scores on each part of the exam are normally distributed. Find the probability that a student scores more than 270 points on the three parts of the test altogether.
Consider Y = X1 + X2 + X3 as the sum of the three test parts (taking the parts to be independent, which the reproductive property requires).
Use mean = 50 + 70 + 130 = 250.
Use variance = 10^2 + 15^2 + 18^2 = 649; therefore the standard deviation of the total exam score is $\sqrt{649}$.
Answer: normalcdf(270, 10^99, 250, $\sqrt{649}$) = .2162
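
The bookkeeping for a sum of independent normals (add the means, add the variances, then take a square root) can be sketched as follows. The code assumes Python 3.8+ and independence of the three test parts; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Mean and standard deviation of each test part
means = [50, 70, 130]
sds = [10, 15, 18]

# Means add; variances (not standard deviations) add
total_mean = sum(means)
total_var = sum(s ** 2 for s in sds)

# Total score Y ~ N(total_mean, total_var)
total = NormalDist(total_mean, sqrt(total_var))

# P(Y > 270)
p_over_270 = 1 - total.cdf(270)

print(total_mean, total_var, p_over_270)
```

Adding the standard deviations directly (10 + 15 + 18 = 43) would overstate the spread, since √649 is about 25.5.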

Example 5. (See Example 7-5 in the text for table use.) An assembly is built in three parts. For the first part the mean length is 12 cm and the variance is 0.02, for the second part the mean is 24 cm and the variance is 0.03, and for the final part the mean is 18 cm and the variance is 0.04. Assume that the lengths of the parts are independently and normally distributed. Find the probability that the total length of the assembly lies between 53.8 and 54.2 cm.
Consider Y = X1 + X2 + X3 as the sum of the three assembly parts.
Use mean = 12 + 24 + 18 = 54 cm.
Use variance = .02 + .03 + .04 = .09, thus standard deviation = $\sqrt{.09}$ = .3.
Answer: normalcdf(53.8, 54.2, 54, .3) = .495
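
In Python, `statistics.NormalDist` objects can be added with `+`, which treats them as independent and combines means and variances in the same way as the reproductive property. The sketch below assumes Python 3.8+; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Each part is given by its mean and VARIANCE, so take square roots for sigma
part1 = NormalDist(12, sqrt(0.02))
part2 = NormalDist(24, sqrt(0.03))
part3 = NormalDist(18, sqrt(0.04))

# Adding NormalDist objects assumes independence: means add, variances add
assembly = part1 + part2 + part3

# P(53.8 <= Y <= 54.2)
p_between = assembly.cdf(54.2) - assembly.cdf(53.8)

print(assembly.mean, assembly.stdev, p_between)
```

The resulting `assembly.stdev` should come out at about 0.3, matching the hand calculation.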

Central Limit Theorem: If X1, X2, …, Xn is a sequence of n independent random variables with E(Xi) = µi and V(Xi) = σi², and Y = X1 + X2 + … + Xn, then under certain general conditions the random variable

$Z_n = \frac{Y - \sum_{i=1}^{n} \mu_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}$

has an approximate N(0,1) distribution as n → ∞.

Special case: Let all of the random variables have the same distribution, that is, E(Xi) = µ and V(Xi) = σ² for each Xi. Let Y = X1 + X2 + … + Xn. Then

$Z_n = \frac{Y - n\mu}{\sigma\sqrt{n}}$

has an approximate N(0,1) distribution as n → ∞.

In practice, what does n → ∞ mean? Here are some numeric guidelines from an industrial rule of thumb, depending on how well behaved the distribution of the Xi is: well behaved (use this approximation for n ≥ 4), reasonably behaved (n ≥ 12), ill behaved (n ≥ 100).

Practical shortcut for Special Case: Y = X1 + X2 + … + Xn is approximately normal with mean nµ and standard deviation $\sigma\sqrt{n}$.

Example 6: Suppose the truncated portion of a number has a uniform distribution on the interval [0,1]. (A number will be truncated by taking only the integer portion of that value. For example, we truncate 3.487 to 3 and consider .487 to be the truncated portion. This is also given by the function x − ⌊x⌋, where ⌊x⌋ denotes the floor or greatest integer function.) The truncated portion of 20 numbers is calculated. Estimate the probability that the total truncated amount is less than 9.1.

Solution: Consider this to be a sum of twenty independent, identically distributed uniform random variables X1, X2, …, X20 with a = 0 and b = 1. Therefore each one has
Mean = $\frac{a+b}{2} = \frac{1}{2}$
Variance = $\frac{(b-a)^2}{12} = \frac{1}{12}$, thus the standard deviation is $\sigma = \sqrt{1/12}$.
From the special case, Y = X1 + X2 + … + X20 is approximately normal with mean 20*1/2 = 10 and standard deviation $\sqrt{20/12}$.
Our problem requests P(Y < 9.1). We can use the TI-83 calculator for this and compute
normalcdf(−10^99, 9.1, 10, $\sqrt{20/12}$) ≈ .243.
To use the table in our book:
1. Compute the associated Z-value, $Z = \frac{9.1 - 10}{\sqrt{20/12}} \approx -0.70$.
2. Draw the associated area for the CDF function. Shade the region for Z < −0.70.
3. Use the symmetry of the curve and the table values to compute the correct probability.
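
To see how well the Central Limit Theorem approximation does in this example, one can compare it against a simulation. The sketch below is an illustration only: it assumes Python 3.8+, and the seed and the number of trials are arbitrary choices, not values from the notes.

```python
import random
from math import sqrt
from statistics import NormalDist

n = 20                     # number of truncated portions being summed
mu, var = 0.5, 1 / 12      # mean and variance of a Uniform[0, 1] variable

# CLT shortcut: Y is approximately N(n*mu, n*var)
approx = NormalDist(n * mu, sqrt(n * var)).cdf(9.1)

# Monte Carlo check: draw many sums of 20 uniforms and count those below 9.1
random.seed(0)             # arbitrary seed so the run is repeatable
trials = 20_000
hits = sum(
    sum(random.random() for _ in range(n)) < 9.1
    for _ in range(trials)
)
simulated = hits / trials

print(approx, simulated)
```

The simulated fraction fluctuates from run to run, but with this many trials it should land close to the normal estimate.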

The Normal Approximation to the Binomial Distribution
Recall that we can view a binomial random variable Y as the sum of n independent Bernoulli random variables: let Y = X1 + X2 + … + Xn. Then each Xi has mean p and variance pq, where q = 1 − p. The mean of Y is np and the variance is npq. Thus the standard deviation is $\sqrt{npq}$.
If we apply the Central Limit Theorem to Y, we see that a binomial random variable Y with mean np and variance npq is roughly normally distributed with mean np and standard deviation $\sqrt{npq}$.

Example 7: (See Example 7.9 in the book for table use.) In sampling from a production process that makes items of which 20% are defective, a random sample of 100 items is selected. The number of defectives in the sample is denoted by X. Estimate P(X < 15) using the Normal Distribution.
Here n = 100, p = 0.2, and q = 0.8. Thus np = 20, and $\sigma = \sqrt{npq} = \sqrt{100(.2)(.8)} = 4$.
The estimate is normalcdf(−10^99, 15, 20, 4) = 0.1056.
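
The uncorrected normal estimate can be reproduced in Python as a sketch (assuming Python 3.8+ and `statistics.NormalDist`; variable names are illustrative).

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.2        # sample size and defect probability
q = 1 - p

mean = n * p           # np
sd = sqrt(n * p * q)   # sqrt(npq)

# Normal estimate of P(X < 15), with no continuity correction
estimate = NormalDist(mean, sd).cdf(15)

print(mean, sd, estimate)
```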

To get more precise estimates we may use half-interval corrections, also called corrections for continuity. These help account for the fact that we are using a continuous distribution to estimate a discrete distribution. For example, in the binomial distribution there is a positive probability associated with the event X = 15, whereas the event X = 15 has zero probability under a continuous distribution.

Half-Interval Continuity Corrections on TI-83
Binomial distribution X with mean λ = np and s.d. σ = $\sqrt{npq}$

Quantity desired from binomial   Continuity correction             Associated TI-83 command
P(X = x)                         P(x−0.5 ≤ X ≤ x+0.5)              normalcdf(x−0.5, x+0.5, λ, σ)
P(X ≤ x)                         P(X ≤ x+0.5)                      normalcdf(−10^99, x+0.5, λ, σ)
P(X < x) = P(X ≤ x−1)            P(X ≤ x−1+0.5) = P(X ≤ x−0.5)     normalcdf(−10^99, x−0.5, λ, σ)
P(X ≥ x)                         P(X ≥ x−0.5)                      normalcdf(x−0.5, 10^99, λ, σ)
P(X > x) = P(X ≥ x+1)            P(X ≥ x+1−0.5) = P(X ≥ x+0.5)     normalcdf(x+0.5, 10^99, λ, σ)
P(a ≤ X ≤ b)                     P(a−0.5 ≤ X ≤ b+0.5)              normalcdf(a−0.5, b+0.5, λ, σ)

For example, to compute the probability that X = 15, we would instead compute P(14.5 < X < 15.5). This would then allow us to use the normal approximation. The continuity adjustments either add a half interval or delete a half interval, depending on whether or not equality is included (e.g. < versus ≤). If equality is included in the original requested binomial probability, just check that this value is in your adjusted interval. And if the equality case is not included, just check that this value is not included in your adjusted interval.
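
The half-interval corrections can be collected into small helper functions. This is a sketch, assuming Python 3.8+; the function names are hypothetical, and the parameters n = 50, p = 0.4 and the cutoff 18 are arbitrary values chosen only to exercise the helpers.

```python
from math import sqrt
from statistics import NormalDist

def binomial_as_normal(n, p):
    """Normal distribution with the binomial's mean np and s.d. sqrt(npq)."""
    return NormalDist(n * p, sqrt(n * p * (1 - p)))

def approx_eq(dist, x):
    """P(X = x), approximated by P(x - 0.5 <= X <= x + 0.5)."""
    return dist.cdf(x + 0.5) - dist.cdf(x - 0.5)

def approx_le(dist, x):
    """P(X <= x), approximated by P(X <= x + 0.5)."""
    return dist.cdf(x + 0.5)

def approx_lt(dist, x):
    """P(X < x), approximated by P(X <= x - 0.5)."""
    return dist.cdf(x - 0.5)

def approx_ge(dist, x):
    """P(X >= x), approximated by P(X >= x - 0.5)."""
    return 1 - dist.cdf(x - 0.5)

def approx_gt(dist, x):
    """P(X > x), approximated by P(X >= x + 0.5)."""
    return 1 - dist.cdf(x + 0.5)

# Illustrative use with arbitrary parameters
d = binomial_as_normal(50, 0.4)
print(approx_lt(d, 18), approx_eq(d, 18), approx_le(d, 18))
print(approx_ge(d, 18), approx_gt(d, 18))
```

A useful sanity check is that the pieces fit together as they do for the binomial itself: the estimates for P(X < x) and P(X = x) add up to the estimate for P(X ≤ x).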

Example 8: Example 7 revisited and expanded. (See Example 7.10 in the book for table use.) In sampling from a production process that makes items of which 20% are defective, a random sample of 100 items is selected. The number of defectives in the sample is denoted by X.
Here n = 100, p = 0.2, and q = 0.8. Thus np = 20, and $\sigma = \sqrt{npq} = \sqrt{100(.2)(.8)} = 4$.
a) Using the continuity corrections, estimate P(X < 15) using the Normal Distribution.
We compute P(X < 14.5) for the normal distribution. We calculate normalcdf(−10^99, 14.5, 20, 4) = .08456.
b) Estimate P(X = 15).
From the continuity correction rules, we approximate P(X = 15) by P(14.5 < X < 15.5). We calculate normalcdf(14.5, 15.5, 20, 4) = 0.0457.
c) Estimate P(X ≤ 15).
By the continuity half-interval corrections this is approximated by P(X < 15.5) for the normal distribution. This value is normalcdf(−10^99, 15.5, 20, 4) = 0.13029.
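
To judge how close these corrected estimates are, they can be compared with exact binomial probabilities. The sketch below assumes Python 3.8+ (for `math.comb` and `statistics.NormalDist`); the helper `binom_pmf` is a hypothetical name introduced for this illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.2
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))   # N(20, 4^2)

def binom_pmf(k):
    """Exact binomial probability P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Exact binomial values
exact_lt_15 = sum(binom_pmf(k) for k in range(15))   # P(X <= 14)
exact_eq_15 = binom_pmf(15)
exact_le_15 = exact_lt_15 + exact_eq_15

# Normal estimates with half-interval corrections, parts a), b), c)
approx_lt_15 = approx.cdf(14.5)
approx_eq_15 = approx.cdf(15.5) - approx.cdf(14.5)
approx_le_15 = approx.cdf(15.5)

for label, exact, est in [
    ("P(X < 15)", exact_lt_15, approx_lt_15),
    ("P(X = 15)", exact_eq_15, approx_eq_15),
    ("P(X <= 15)", exact_le_15, approx_le_15),
]:
    print(f"{label}: exact {exact:.5f}, normal estimate {est:.5f}")
```

Each corrected estimate should agree with the exact binomial value to within about one percentage point.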