Unit 7: Confidence Intervals for a Population Mean (σ known)

Review Binomial
1. Use the binomial distribution when each response has only one of two
possible outcomes (i.e. success/failure). The probability of success is
called p and is the same for each observation.
2. There is a fixed number n of observations, and each observation is
independent.
Formula:
$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad \text{where} \quad \binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$
$$\mu = np, \qquad \sigma = \sqrt{np(1 - p)}$$
Example: Twenty percent of
American households own three or
more motor vehicles. You choose 12
households at random.
a. What is the probability that none
of the chosen households owns
three or more vehicles? What is
the probability that at least one
household owns three or more
vehicles?
b. What is the probability that
between one and three
(inclusive) of the chosen
households own three or more
vehicles?
c. What are the mean and
standard deviation of the
number of households in your
sample that own three or more
vehicles?
Solution:
X~B(n,p); X~B(12, 0.20)
a. i. P(X = 0)
$$P(X = 0) = \binom{n}{x} p^x (1 - p)^{n - x} = \binom{12}{0} (0.2)^0 (1 - 0.20)^{12 - 0} = 1(1)(0.0687) = 0.0687$$
ii. P(X ≥ 1) = 1 - P(X = 0) = 1 - 0.0687 = 0.9313
b. P(1 ≤ X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3)
$$= \binom{12}{1} (0.2)^1 (1 - 0.2)^{12 - 1} + \binom{12}{2} (0.2)^2 (1 - 0.2)^{12 - 2} + \binom{12}{3} (0.2)^3 (1 - 0.2)^{12 - 3}$$
$$= 12(0.2)(0.0859) + 66(0.04)(0.1074) + 220(0.008)(0.1342)$$
$$= 0.2062 + 0.2835 + 0.2362 \approx 0.726$$
c.
$$\mu = np = 12(0.2) = 2.4$$
$$\sigma = \sqrt{np(1 - p)} = \sqrt{2.4(1 - 0.2)} = 1.386$$
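The household example can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `binom_pmf` is an illustrative choice, not something from the course material.

```python
from math import comb, sqrt


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ B(n, p), straight from the binomial formula."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 12, 0.20  # 12 households, 20% own three or more vehicles

p_none = binom_pmf(0, n, p)                                  # part a(i)
p_at_least_one = 1 - p_none                                  # part a(ii)
p_one_to_three = sum(binom_pmf(x, n, p) for x in (1, 2, 3))  # part b
mean = n * p                                                 # part c
sd = sqrt(n * p * (1 - p))                                   # part c

print(f"P(X = 0)       = {p_none:.4f}")
print(f"P(X >= 1)      = {p_at_least_one:.4f}")
print(f"P(1 <= X <= 3) = {p_one_to_three:.4f}")
print(f"mean = {mean:.1f}, sd = {sd:.3f}")
```

Computing each term at full precision and rounding only at the end avoids small discrepancies in the fourth decimal place.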
Module IV: Introduction to Statistical Inference
Unit 7: Confidence Interval for a Population Mean
Statistical Inference:
• provides methods for drawing conclusions about a population from sample data
• tells us how much we can trust the conclusion
• requires data produced through a random sample or a randomized experiment.
• The Law of Large Numbers tells us that the sample mean x̄ from a large SRS
will be close to the unknown population mean μ. This is why we use x̄ to
estimate the mean of the population: we expect it to be close to μ.
How would the sample mean x̄ vary if we took many samples of size n from
the same population?
• The central limit theorem (CLT) says that the mean x̄ will have a
distribution close to normal with mean μ and standard deviation
$$\frac{\sigma}{\sqrt{n}}$$
Therefore, if we know σ (let's assume we do) we can find the standard
deviation of x̄. For example, with σ = 4 and n = 9:
$$\frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{9}} = 1.3333$$
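A small simulation can make the σ/√n result concrete. This is only a sketch under stated assumptions: the population is taken to be normal, and the "true" mean `MU = 50.0` is a made-up value for the simulation (σ = 4 and n = 9 are the values used in this unit).

```python
import random
from math import sqrt
from statistics import fmean, stdev

random.seed(0)  # fixed seed so the run is repeatable

MU = 50.0        # assumed population mean (hypothetical, for simulation only)
SIGMA = 4.0      # known population standard deviation
N = 9            # size of each sample
NUM_SAMPLES = 20_000

# Draw many samples of size N and record the mean of each one.
sample_means = [
    fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(NUM_SAMPLES)
]

theoretical_sd = SIGMA / sqrt(N)     # sigma / sqrt(n)
observed_sd = stdev(sample_means)    # spread of the simulated sample means

print(f"theoretical sd of x-bar: {theoretical_sd:.4f}")
print(f"observed sd of x-bar:    {observed_sd:.4f}")
```

The two printed values should agree closely, and the agreement tightens as `NUM_SAMPLES` grows.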
Statistical Confidence
Recall the 68-95-99.7 rule. It says that in 95% of all samples, the mean x̄
of the sample will be within 2 standard deviations of the population mean
μ. So the mean x̄ will be within
$$2 \frac{\sigma}{\sqrt{n}}$$
of μ. If x̄ is within 2σ/√n of the unknown μ, then this means that μ is
within 2σ/√n of x̄, in 95% of all samples.
So, in 95% of all samples, the unknown μ lies between
$$\bar{x} - 2 \frac{\sigma}{\sqrt{n}} \quad \text{and} \quad \bar{x} + 2 \frac{\sigma}{\sqrt{n}}$$
This interval is known as a Confidence Interval (CI) for μ. It is a 95% CI
because it contains the unknown mean μ in 95% of all possible samples.
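The phrase "in 95% of all possible samples" can be illustrated by simulation. The sketch below assumes a normal population with a made-up true mean (`MU = 50.0`, hypothetical) and counts how often the interval x̄ ± 2σ/√n contains it.

```python
import random
from math import sqrt
from statistics import fmean

random.seed(1)  # fixed seed so the run is repeatable

MU = 50.0        # assumed true mean (hypothetical; unknown in practice)
SIGMA = 4.0      # known population standard deviation
N = 9            # sample size
NUM_SAMPLES = 10_000

margin = 2 * SIGMA / sqrt(N)  # the "2 standard deviations" margin

hits = 0
for _ in range(NUM_SAMPLES):
    xbar = fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    # Does this sample's interval capture the true mean?
    if xbar - margin <= MU <= xbar + margin:
        hits += 1

coverage = hits / NUM_SAMPLES
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The fraction comes out slightly above 0.95 because the multiplier 2 is a rounded version of the exact 95% value 1.96.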
A level C Confidence Interval for a parameter has 2 parts:
1. An interval calculated from the data, usually of the form:
estimate ± margin of error
where the estimate (x̄ in this case) is our guess of the value of the
unknown parameter. The margin of error, here 2σ/√n, shows how accurate we
believe our guess to be, based on the variability of the estimate.
2. A confidence level, C, which gives the probability that the interval
will capture the true parameter value in repeated samples. You choose C;
usually it is 0.90, 0.95, or 0.99.
Confidence Intervals for the Mean μ
If we know σ, we can standardize x̄ to get the one-sample z statistic:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
z has the N(0, 1) distribution, since x̄ is normally distributed. To find a
level C confidence interval, mark the central area C under the standard
normal curve.
Let z* be the point on the standard normal distribution such that the area
between -z* and z* is C. Then
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$
is a level C confidence interval for μ.
The value z* is called the critical value.
Let's try another example: find the critical value z* for a 98% confidence
interval. z* = 2.326

Conf. Level    Tail Area    z*
90%            0.05         1.645
95%            0.025        1.96
99%            0.005        2.576
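Critical values like these can also be computed rather than looked up. A minimal sketch using `statistics.NormalDist` from the Python standard library follows; the function name `critical_value` is an illustrative choice.

```python
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Return z* such that the central area under N(0, 1) equals `confidence`."""
    tail_area = (1 - confidence) / 2
    # z* cuts off `tail_area` in the upper tail of the standard normal curve.
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.98, 0.99):
    tail = (1 - level) / 2
    print(f"{level:.0%}  tail area = {tail:.3f}  z* = {critical_value(level):.3f}")
```

The tail area is half of 1 - C because the leftover probability is split equally between the two tails.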
Confidence Interval for a Population Mean
Draw an SRS of size n from a population having unknown mean μ and known
standard deviation σ. A level C confidence interval for μ is
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$
The interval is exact when the population distribution is normal and is
approximately correct for large n in other cases.
Example: x̄ = 48, σ = 4, n = 9
95% CI:
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 48 \pm (1.96) \frac{4}{\sqrt{9}} = 48 \pm 2.61$$
(45.4, 50.6)
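The same calculation can be wrapped in a small function. This is a sketch, not a definitive implementation; `z_interval` is a hypothetical helper name, and it assumes σ is known and the data come from an SRS.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, confidence: float = 0.95):
    """Level-C confidence interval for a mean when sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    margin = z_star * sigma / sqrt(n)                        # margin of error
    return xbar - margin, xbar + margin


# Values from the example: x-bar = 48, sigma = 4, n = 9, 95% confidence.
low, high = z_interval(48, 4, 9, confidence=0.95)
print(f"({low:.1f}, {high:.1f})")
```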
Confidence Interval Behaviour
Let's take a closer look at the margin of error
$$z^* \frac{\sigma}{\sqrt{n}}$$
It is composed of 3 parts: z*, σ, and √n.
What happens as each of these changes?
• As z* gets smaller (a lower confidence level), the margin of error gets
smaller and the CI narrower.
• As σ decreases, the margin of error gets smaller and the CI narrower.
With smaller variation, it is easier to pin down μ.
• As n increases, the margin of error gets smaller (for a fixed confidence
level).
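The three effects can be seen by changing one ingredient at a time. In the sketch below the baseline uses z* = 1.96, σ = 4, n = 9 from the earlier example; the alternative values σ = 2 and n = 36 are illustrative assumptions.

```python
from math import sqrt


def margin_of_error(z_star: float, sigma: float, n: int) -> float:
    """Margin of error z* * sigma / sqrt(n) for a z confidence interval."""
    return z_star * sigma / sqrt(n)


base = margin_of_error(1.96, 4, 9)               # 95% confidence, sigma = 4, n = 9
lower_confidence = margin_of_error(1.645, 4, 9)  # 90% confidence: smaller z*
smaller_sigma = margin_of_error(1.96, 2, 9)      # half the standard deviation
larger_n = margin_of_error(1.96, 4, 36)          # four times the sample size

print(f"baseline:         {base:.3f}")
print(f"lower confidence: {lower_confidence:.3f}")
print(f"smaller sigma:    {smaller_sigma:.3f}")
print(f"larger n:         {larger_n:.3f}")
```

Because n enters through a square root, quadrupling the sample size only halves the margin of error, while halving σ has the same effect directly.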
Example 6.6: A test for the level of potassium in the blood is not
perfectly precise. Moreover, the actual level of potassium in a person's
blood varies slightly from day to day. Suppose that repeated measurements
for the same person on different days vary normally with σ = 0.2.
a. Julie's potassium level is measured once. The result is x = 3.2. Give a
90% confidence interval for her mean potassium level.
b. If three measurements were taken on different days and the mean result
is x̄ = 3.2, what is a 90% CI for Julie's mean blood potassium level?
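a. n = 1, x̄ = 3.2, z* = 1.645
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 3.2 \pm (1.645) \frac{0.2}{\sqrt{1}} = 3.2 \pm 0.33$$
= (2.87, 3.53)
b. n = 3, x̄ = 3.2, z* = 1.645
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 3.2 \pm (1.645) \frac{0.2}{\sqrt{3}} = 3.2 \pm 0.19$$
= (3.01, 3.39)
We are 90% confident that the interval captures Julie's mean potassium
level: in repeated sampling, intervals computed this way capture the
population mean 90% of the time.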
Choosing a Sample Size
Since the sample size has an effect on the width of the CI, it should be
considered carefully before any sampling is done.
The margin of error is
$$m = z^* \frac{\sigma}{\sqrt{n}}$$
To get the sample size corresponding to the desired m, substitute values
for m, z* and σ (known) and solve for n:
$$n = \left( \frac{z^* \sigma}{m} \right)^2$$
NOTE: ALWAYS ROUND UP
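For reference, the algebra behind the sample-size formula is short. Starting from the margin of error and isolating n:

```latex
m = z^* \frac{\sigma}{\sqrt{n}}
\;\Longrightarrow\;
\sqrt{n} = \frac{z^* \sigma}{m}
\;\Longrightarrow\;
n = \left( \frac{z^* \sigma}{m} \right)^2
```

Rounding n up rather than to the nearest integer guarantees that the achieved margin of error is no larger than the target m.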
Example 6.10: To assess the accuracy of a lab scale, a standard weight
known to weigh 10 grams is weighed repeatedly. The scale readings are
normally distributed with unknown mean. The standard deviation of the
scale readings is known to be 0.0002 g.
a. The weight is weighed five times. The mean result is 10.0023 g. Give a
98% CI for the mean of repeated measurements of the weight.
b. How many measurements must be averaged to get a margin of error of
±0.0001 with 98% confidence?
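a. n = 5, x̄ = 10.0023, 98% confidence, so z* = 2.326
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 10.0023 \pm (2.326) \frac{0.0002}{\sqrt{5}} = 10.0023 \pm 0.00021$$
= (10.0021, 10.0025)
b. m = 0.0001
$$n = \left( \frac{z^* \sigma}{m} \right)^2 = \left( \frac{2.326 \times 0.0002}{0.0001} \right)^2 = 21.64$$
Rounding up, n = 22 measurements are needed.
NOTE: SAMPLE SIZE DETERMINES MARGIN OF ERROR. POPULATION SIZE DOES NOT
INFLUENCE THE SAMPLE SIZE WE NEED.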
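Both parts of the scale example can be reproduced in a few lines. This is a minimal sketch that takes z* = 2.326 as given for 98% confidence; the variable names are illustrative.

```python
from math import ceil, sqrt

z_star = 2.326   # critical value for 98% confidence
sigma = 0.0002   # known standard deviation of the scale readings (grams)

# Part a: 98% CI from five weighings with mean 10.0023 g.
xbar, n = 10.0023, 5
margin = z_star * sigma / sqrt(n)
low, high = xbar - margin, xbar + margin
print(f"98% CI: ({low:.4f}, {high:.4f})")

# Part b: measurements needed for a margin of error of 0.0001 g.
target_margin = 0.0001
n_exact = (z_star * sigma / target_margin) ** 2
n_required = ceil(n_exact)  # always round up to the next whole measurement
print(f"n = {n_exact:.2f}, so use {n_required} measurements")
```

Using `ceil` rather than `round` reflects the rule that a sample size is always rounded up, even when the fractional part is small.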
Cautions about the CI formula:
• The data must be an SRS from the population.
• Do not use the formula with any sampling design other than an SRS.
• Outliers can have a large effect on the CI. Check for outliers before
beginning any analysis, and try to correct or remove them before
proceeding.
• If n is small and the population is not normal, the true confidence
level will differ from the stated C.
• A 95% confidence level does not mean that there is a 95% chance that μ
is contained in the specific interval calculated. It means that the method
captures μ in 95% of a long series of samples.
• We double the margin of error when we reduce the sample size to one
fourth of the original.
• If your margin of error is too large you can reduce it by:
o Using a lower level of confidence
o Increasing the sample size
o Reducing the standard deviation
• The square root in the formula implies that we must multiply the number
of observations by 4 in order to cut the margin of error in half.
• The size of the population (as long as the population is much larger
than the sample) does not influence the sample size we need.