Unit 3 Review solutions
1. a. The sample space consists of the following pairs: AB, AC, AD, AE, AF, AG, BC, BD, BE, BF, BG, CD, CE, CF, CG, DE, DF, DG, EF, EG, FG (21 equally likely outcomes).
b. The outcomes for which both are female are AD, AG, DG. This probability is 3/21, which is about .143.
c. The outcomes for which the average age exceeds 50 are BF, BG, CF, CG, DF, DG, EF, EG, FG. This probability is 9/21, which is about .429.
d. The only such outcome is FG, so the probability is 1/21, which is about .048.
e. If the company really selected the two employees by random chance, there would be less than a 5% chance of choosing the two oldest employees. This is certainly possible, but this probability is small enough to cast considerable doubt on the claim that the selection of the two oldest employees was by random chance. Therefore, there is at least moderate evidence to support the allegation of age discrimination.
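The counts in problem 1 can be checked by listing the pairs directly. This is only a minimal sketch: the set of female employees {A, D, G} is an assumption inferred from the outcomes listed in part b, and the ages needed for part c are not given here, so that part is not recomputed.

```python
from itertools import combinations

employees = "ABCDEFG"  # labels used in the solution
# Assumption: A, D and G are the female employees (inferred from part b).
females = {"A", "D", "G"}

# Every unordered pair of two distinct employees.
pairs = ["".join(p) for p in combinations(employees, 2)]
both_female = [p for p in pairs if set(p) <= females]

n_outcomes = len(pairs)
p_both_female = len(both_female) / n_outcomes  # part b
p_two_oldest = 1 / n_outcomes                  # part d: only the pair FG

print(n_outcomes, both_female)
print(round(p_both_female, 3), round(p_two_oldest, 3))
```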
2. a. The z-score is (70 – 75)/5 = –1.00. The area to the left of –1.00 under the standard normal curve is .1587. (Sketch not reproduced.)
b. To reduce this probability to .05 requires a z-score of –1.645. (This is from your table. Looking up the area of .05 in the table shows that the z-score lies between –1.64 and –1.65, so we estimate it to be in the middle, at –1.645. You could have used –1.64 or –1.65.) If you let k denote the new drying time cutoff value, you need (k – 75)/5 = –1.645, so k = 75 – 1.645(5) = 66.775 minutes. (Sketch not reproduced.)
c. With a smaller standard deviation, the error probability in part a would be much smaller. The normal curve would be less spread out (taller and skinnier), so there would be less area to the left of 70. With a smaller standard deviation, the revised cutoff value in part b would be larger. The normal curve would be less spread out (taller and skinnier), so the point at which 5% of the area is to the left would be closer to the mean and would therefore be a larger cutoff value.
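The table lookups in problem 2 can be reproduced with Python's standard library. This is a sketch under the solution's own values (mean 75, standard deviation 5); the smaller standard deviation of 3 minutes used for part c is a hypothetical value chosen only for illustration.

```python
from statistics import NormalDist

drying = NormalDist(mu=75, sigma=5)  # mean and SD used in the solution

p_below_70 = drying.cdf(70)          # part a: area to the left of 70
cutoff_5pct = drying.inv_cdf(0.05)   # part b: value with 5% of the area below it

# Part c, with a hypothetical smaller SD of 3 minutes (not from the source).
narrower = NormalDist(mu=75, sigma=3)
p_below_70_narrow = narrower.cdf(70)
cutoff_5pct_narrow = narrower.inv_cdf(0.05)

print(round(p_below_70, 4), round(cutoff_5pct, 3))
print(round(p_below_70_narrow, 4), round(cutoff_5pct_narrow, 3))
```

The exact cutoff differs slightly from 66.775 in the third decimal place because the table value –1.645 is itself rounded.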
3. a. The 80% value is a parameter because it is a number that describes the entire population (all incoming email messages at this college).
b. The symbol used to represent the .8 would be π.
c. The mean of the sampling distribution of sample proportions will equal the population proportion that are spam, π, which is specified as equaling .8. The standard deviation of the sampling distribution can be calculated as $\sqrt{\pi(1-\pi)/n} = \sqrt{.8(1-.8)/200} \approx .028$.
d. The z-score would be $z = (.5 - .8)/.028 \approx -10.71$, so yes, it would be incredibly surprising if the sample turned out to have 50% or less spam, as this would essentially never happen when the population consists of 80% spam. That is, the CLT predicts that almost all of the sample proportions would fall within 3 standard deviations above or below the mean of the sample proportions. A sample of 200 messages that had only 50% spam, given that the population proportion is 80%, would be more than 10 standard deviations below the mean, and that would not be expected to ever happen.
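The arithmetic in parts c and d of problem 3 can be checked in a few lines. This sketch keeps the standard deviation unrounded, so its z-score comes out near –10.61 rather than the –10.71 obtained by rounding the standard deviation to .028 first; the conclusion is the same either way. The variable names are illustrative only.

```python
from math import sqrt

pi_spam = 0.8  # population proportion of spam
n = 200        # sample size

# Standard deviation of the sampling distribution of the sample proportion.
sd = sqrt(pi_spam * (1 - pi_spam) / n)

# z-score for a sample proportion of .5.
z = (0.5 - pi_spam) / sd

print(round(sd, 4), round(z, 2))
```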
4. a. According to the Central Limit Theorem, the sample proportion who die in a sample of 371 patients will vary according to a normal distribution, with mean equal to the population proportion of deaths (.20) and with standard deviation equal to $\sqrt{.2(1-.2)/371} \approx .0208$. (Sketch not reproduced.)
b. The z-score is (.213 – .20)/.0208, or 0.63. The area to the left of 0.63 under the standard normal curve is .7357, so the probability is 1 – .7357, or .2643, that the sample proportion of deaths would be .213 or higher.
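The same calculation for problem 4 can be done without a table. This is a sketch using the values in the solution; because it does not round the z-score to 0.63 before looking up the area, the probability it gives is slightly larger than .2643 (about .266).

```python
from math import sqrt
from statistics import NormalDist

pi_death = 0.20  # population proportion of deaths
n = 371          # number of patients in the sample

sd = sqrt(pi_death * (1 - pi_death) / n)
z = (0.213 - pi_death) / sd

# Area to the right of z under the standard normal curve.
p_at_least = 1 - NormalDist().cdf(z)

print(round(sd, 4), round(z, 2), round(p_at_least, 3))
```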
5. You cannot draw a reasonable conclusion about whether this is a fair coin without knowing the sample size. If the 75% heads was based on a sample of only four flips, then there's no reason to suspect that the coin is not fair. But if the 75% heads was based on a large number of flips, that would suggest that the probability of heads is close to .75 and so the coin is not fair.
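To see how much the sample size matters in problem 5, one can compute how often a fair coin gives at least 75% heads. This is a minimal sketch: the helper `prob_at_least` is a hypothetical name, and the "large" sample of 100 flips is an assumed value, since the solution does not specify one.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_four_flips = prob_at_least(3, 4)        # at least 3 heads in 4 flips
p_hundred_flips = prob_at_least(75, 100)  # at least 75 heads in 100 flips (assumed n)

print(p_four_flips, p_hundred_flips)
```

With four flips a fair coin produces 75% or more heads nearly a third of the time, while with 100 flips the same proportion is extraordinarily rare.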
6. No, the distribution of house prices would still be skewed to the right. This is one sample of 1000 homes, so it will have a distribution that is approximately the same as the population distribution (prices of all homes for sale). The Central Limit Theorem tells us that the sampling distribution of the sample mean house price would be very close to normal, but that result does not apply to the distribution of individual house prices. That is, the CLT refers to repeatedly taking samples and calculating the sample mean. The distribution of those sample means will be approximately normal, and the approximation improves as the sample size increases.
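The distinction in problem 6 can be illustrated with a small simulation. This is only a sketch under assumptions: real house prices are not available here, so an exponential distribution stands in as a hypothetical right-skewed population, and the number of repeated samples (500) is arbitrary.

```python
import random
from statistics import pstdev

def skewness(values):
    """Population skewness: the average cubed z-score."""
    m = sum(values) / len(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(0)  # fixed seed so the run is repeatable

def draw_sample(size=1000):
    # Hypothetical right-skewed "prices": exponential with mean 1.
    return [rng.expovariate(1.0) for _ in range(size)]

# One sample of 1000 individual values keeps the population's skew.
one_sample = draw_sample()
skew_prices = skewness(one_sample)

# Means of many samples of 1000 are close to symmetric.
sample_means = [sum(s) / len(s) for s in (draw_sample() for _ in range(500))]
skew_means = skewness(sample_means)

print(round(skew_prices, 2), round(skew_means, 2))
```

The first number stays well above zero (strong right skew), while the second is close to zero, which is the pattern the solution describes.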
7. a. The value .45 is a parameter, and you would use the symbol π to represent it.
b. Sampling variability refers to the variability in a statistic, in this case the variability in sample proportions of orange candies. A parameter pertains to the entire population and so has a fixed value; it does not vary.
c. You are more likely to get between 35% and 55% orange candies if you take a random sample of 400 candies than if you take a random sample of 40 candies. The Central Limit Theorem tells you that the sampling distribution of the sample proportion of orange candies will be centered around .45 and the spread of the distribution will decrease as the sample size increases. So, results using a larger sample size will be more likely to be grouped around 45% than will results using a smaller sample size.
d. You are more likely to get more than 60% orange candies if you take a random sample of 40 candies than if you take a random sample of 400 candies because .60 is an extreme value, far from the mean of .45. The larger the sample size, the smaller the sampling variability. If 45% of all Reese's Pieces are colored orange, it would be very difficult to find a sample of 400 candies in which 60% or more were orange. This result, however, is much more likely to happen in a small sample of only 40 candies.
e. The observational units are the samples of 40 candies, and the variable is the proportion of orange candy in each sample.
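The comparisons in parts c and d of problem 7 can be made concrete with the normal approximation from the Central Limit Theorem. This is a sketch, not part of the original solution: `sampling_dist` is a hypothetical helper, and for n = 40 the normal approximation is only rough.

```python
from math import sqrt
from statistics import NormalDist

pi_orange = 0.45  # population proportion of orange candies

def sampling_dist(n):
    """Normal approximation to the sampling distribution of the sample proportion."""
    return NormalDist(mu=pi_orange, sigma=sqrt(pi_orange * (1 - pi_orange) / n))

results = {}
for n in (40, 400):
    dist = sampling_dist(n)
    between = dist.cdf(0.55) - dist.cdf(0.35)  # part c: between 35% and 55%
    above_60 = 1 - dist.cdf(0.60)              # part d: more than 60%
    results[n] = (between, above_60)
    print(n, round(between, 4), above_60)
```

The larger sample makes the middle interval more likely and the extreme result far less likely, matching the reasoning in both parts.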
8. a. No, it would not be reasonable to model the duration of cell phone calls with a normal distribution because the distribution could not be symmetric with these mean and standard deviation values. Two standard deviations below the mean would indicate negative lengths of cell phone calls, whereas two standard deviations above the mean would indicate calls lasting 4.5 minutes, which is a very reasonable (not very extreme) cell phone call length. It seems more plausible to use a distribution that is skewed to the right to model cell phone call lengths.
b. Yes, it would be reasonable to use the Central Limit Theorem to describe the distribution of the sample mean call duration because the sample size used is large (> 30).
c. The CLT says the sampling distribution of the sample mean call duration will be approximately normal, with mean 1.7 minutes and standard deviation $1.4/\sqrt{60} \approx .181$ minutes.
d. (Sketch of the sampling distribution not reproduced.)
e. If the sample size were 160 calls rather than 60 calls, the curve would be much narrower with less horizontal spread. The area would be much more concentrated around the center (1.7) because the standard deviation of the curve would be smaller.
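The numbers behind parts a, c and e of problem 8 follow from the mean (1.7 minutes) and standard deviation (1.4 minutes) used in the solution. A minimal sketch, with illustrative variable names:

```python
from math import sqrt

mean_call, sd_call = 1.7, 1.4  # minutes, as used in the solution

# Part a: two standard deviations either side of the mean.
low = mean_call - 2 * sd_call
high = mean_call + 2 * sd_call

# Parts c and e: standard deviation of the sample mean for n = 60 and n = 160.
se_60 = sd_call / sqrt(60)
se_160 = sd_call / sqrt(160)

print(round(low, 1), round(high, 1))
print(round(se_60, 3), round(se_160, 3))
```

The lower bound is negative, which is impossible for a call length, and the standard deviation of the sample mean shrinks as the sample size grows from 60 to 160.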
9. Saying that the probability of winning one dollar is .474 means that if you play this game a very large number of times, the long-run proportion of spins for which you win one dollar will be very close to .474. In other words, you will win one dollar on very close to 47.4% of the spins if you play a very large number of times.
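The long-run interpretation in problem 9 can be illustrated by simulating spins. This is a sketch under assumptions: each spin is modeled as an independent win with probability .474, and the function name and spin counts are chosen only for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
p_win = 0.474           # probability of winning one dollar, from the problem

def win_proportion(n_spins):
    """Proportion of simulated spins that win, for a given number of spins."""
    wins = sum(rng.random() < p_win for _ in range(n_spins))
    return wins / n_spins

# Short runs can wander far from .474; long runs settle close to it.
for n_spins in (10, 1_000, 100_000):
    print(n_spins, win_proportion(n_spins))

long_run = win_proportion(200_000)
```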