Empirical Rule for X
Consider a sample of size n from a population with mean  and standard deviation
. Suppose X is normal ( or approximately normal), with  =  and  = /n
X
X
(This would be the case if the population is normal or if the sample size is large).
Find the probability that X will be within (a) 2 of  (b) 3
of  .
X
X
(a)
P( X will be within 2 of  )
X
=
(88)
(b) P( X will be within 3 of  )
X
=
In general the statement “X will be within k of  “ means that X lies between
X
-k
X
and
 +k 
X
If X is normal ( or approximately normal), then
P( X will be within k of  ) = P(-k < Z <k)
X
(89)
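The probability $P(-k < Z < k)$ can be checked numerically. The following is a minimal sketch using Python's standard library; the function name `prob_within_k` is an illustrative choice, not part of the original notes.

```python
from statistics import NormalDist

def prob_within_k(k: float) -> float:
    """Return P(-k < Z < k) for a standard normal variable Z."""
    z = NormalDist()  # mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

# Probability that the sample mean falls within k standard errors of mu.
for k in (1, 2, 3):
    print(f"k = {k}: P(-k < Z < k) = {prob_within_k(k):.4f}")
```

The values for k = 2 and k = 3 correspond to parts (a) and (b) above.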
Z Confidence Interval
Suppose we are given the following:
Normal Population: Scores on a standardized test.
Population Mean: $\mu$ (unknown)
Population S.D.: $\sigma = 1.5$
To estimate $\mu$ we will take a simple random sample (SRS) of size $n = 25$ and use $\bar{X}$ as our estimator. Recall that since the population is normal, $\bar{X}$ is normally distributed with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 1.5/5 = 0.3$.
We would like to be able to express this estimate in the form $\bar{X} \pm E$ or $(\bar{X} - E, \bar{X} + E)$. Here $E$ is some error which determines the accuracy of our estimate. Let's take $E = 2\sigma_{\bar{X}}$ for now.
Thus we have the interval $(\bar{X} - 0.6, \bar{X} + 0.6)$.
For any given sample this interval may or may not contain the true mean $\mu$. It would be useful to know the probability that this interval covers $\mu$.
If the interval covers the true mean $\mu$, then $\mu$ is somewhere in the interval above, so that $\bar{X}$ is in fact within $2\sigma_{\bar{X}}$ ($= 0.6$) of $\mu$.
Thus $P[(\bar{X} - 2\sigma_{\bar{X}}, \bar{X} + 2\sigma_{\bar{X}}) \text{ covers } \mu]$
= $P(\bar{X} \text{ is within } 2\sigma_{\bar{X}} \text{ of } \mu)$
= ________
= ________
To make the probability above a nice number, 0.95, we should replace 2 by 1.96. Thus we can say
"For 95% of all samples of size $n = 25$, the interval $(\bar{X} - 1.96\sigma_{\bar{X}}, \bar{X} + 1.96\sigma_{\bar{X}})$ will cover the true value of $\mu$."
Or,
"For 95% of all samples of size $n = 25$, $\bar{X}$ will be within $1.96\sigma_{\bar{X}}$ of the true population mean $\mu$."
The 95% value is called the LEVEL OF CONFIDENCE. This tells us the probability the interval will cover $\mu$.
The value $1.96\sigma_{\bar{X}} = 0.588$ is called the margin of error. This tells us how accurate $\bar{X}$ is (i.e. how close $\bar{X}$ will be to $\mu$ for 95% of all samples).
The interval $(\bar{X} - 1.96\sigma_{\bar{X}}, \bar{X} + 1.96\sigma_{\bar{X}})$ is called a 95% Z-CONFIDENCE INTERVAL.
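As a rough sketch, the margin of error and the 95% Z-confidence interval for this setting ($\sigma = 1.5$, $n = 25$) could be computed as follows. The function name `z_interval` and the sample mean 10.459 are illustrative assumptions, not values fixed by the discussion above.

```python
import math
from statistics import NormalDist

def z_interval(xbar: float, sigma: float, n: int, conf: float = 0.95):
    """Return (lower, upper, margin) of a Z-confidence interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 when conf = 0.95
    margin = z * sigma / math.sqrt(n)             # z times the standard error
    return xbar - margin, xbar + margin, margin

# Hypothetical sample mean, chosen only for illustration.
lower, upper, margin = z_interval(xbar=10.459, sigma=1.5, n=25)
print(f"margin of error = {margin:.3f}")
print(f"95% Z-interval  = ({lower:.3f}, {upper:.3f})")
```

The margin depends only on $\sigma$, $n$ and the confidence level, so every sample of size 25 gives an interval of the same width.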
The simulation below will illustrate how confidence intervals work.
MTB > random 25 c1-c40;
SUBC> norm 10 1.5.
MTB > zint 95 1.5 c1-c40.
[The first two command lines select 40 random samples, each of size $n = 25$, from a normal distribution with $\mu = 10$ and $\sigma = 1.5$. The third command line forms the 95% Z-CONFIDENCE INTERVAL for each sample.]
Confidence Intervals (The assumed sigma = 1.5)

Variable    N     Mean   StDev  SE Mean        95.0% CI
C1         25   10.459   1.661    0.300  ( 9.871, 11.047)
C2         25    9.826   1.486    0.300  ( 9.238, 10.414)
C3         25   10.388   1.600    0.300  ( 9.800, 10.976)
C4         25    9.741   1.297    0.300  ( 9.153, 10.329)
C5         25   10.441   1.766    0.300  ( 9.853, 11.029)
C6         25   10.331   1.637    0.300  ( 9.743, 10.919)
C7         25    8.941   1.264    0.300  ( 8.353,  9.529)
C8         25   10.205   1.627    0.300  ( 9.617, 10.793)
C9         25   10.163   1.560    0.300  ( 9.575, 10.751)
C10        25   10.009   1.619    0.300  ( 9.421, 10.597)
C11        25   10.455   1.787    0.300  ( 9.867, 11.043)
C12        25   10.365   1.220    0.300  ( 9.777, 10.953)
C13        25   10.626   1.475    0.300  (10.038, 11.214)
C14        25   10.090   1.677    0.300  ( 9.502, 10.678)
C15        25   10.339   1.103    0.300  ( 9.751, 10.927)
C16        25   10.208   1.480    0.300  ( 9.620, 10.796)
C17        25   10.356   1.508    0.300  ( 9.768, 10.944)
C18        25    9.943   1.388    0.300  ( 9.355, 10.531)
C19        25   10.015   1.318    0.300  ( 9.427, 10.603)
C20        25    9.924   1.473    0.300  ( 9.336, 10.512)
C21        25   10.037   1.271    0.300  ( 9.449, 10.625)
C22        25    9.490   1.345    0.300  ( 8.902, 10.078)
C23        25    9.972   1.484    0.300  ( 9.384, 10.560)
C24        25   10.330   1.644    0.300  ( 9.742, 10.918)
C25        25    9.635   1.609    0.300  ( 9.047, 10.223)
C26        25    9.292   1.558    0.300  ( 8.704,  9.880)
C27        25   10.053   1.072    0.300  ( 9.465, 10.641)
C28        25    9.484   1.726    0.300  ( 8.896, 10.072)
C29        25   10.666   1.402    0.300  (10.078, 11.254)
C30        25    9.896   1.640    0.300  ( 9.308, 10.484)
C31        25    9.942   1.583    0.300  ( 9.354, 10.530)
C32        25   10.100   1.657    0.300  ( 9.512, 10.688)
C33        25    9.483   1.496    0.300  ( 8.895, 10.071)
C34        25    9.691   1.623    0.300  ( 9.103, 10.279)
C35        25   10.390   1.369    0.300  ( 9.802, 10.978)
C36        25   10.569   1.178    0.300  ( 9.981, 11.157)
C37        25    9.813   1.326    0.300  ( 9.225, 10.401)
C38        25    9.905   1.489    0.300  ( 9.317, 10.493)
C39        25   10.442   1.405    0.300  ( 9.854, 11.030)
C40        25    9.945   1.919    0.300  ( 9.357, 10.533)
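For readers without Minitab, the same kind of simulation can be sketched in Python. This is an illustrative analogue of the session above, not a translation of Minitab's internals; the function name `simulate_z_intervals` and the seed are arbitrary assumptions, and a different seed gives different samples.

```python
import random
from statistics import NormalDist, fmean

def simulate_z_intervals(num_samples=40, n=25, mu=10.0, sigma=1.5,
                         conf=0.95, seed=None):
    """Draw num_samples normal samples of size n; return a Z-interval for each."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sigma / n ** 0.5  # same margin for every sample (sigma known)
    intervals = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        intervals.append((xbar - margin, xbar + margin))
    return intervals

intervals = simulate_z_intervals(seed=1)  # seed chosen arbitrarily
covered = sum(lo < 10.0 < hi for lo, hi in intervals)
print(f"{covered} of {len(intervals)} intervals cover mu = 10")
```

Rerunning with different seeds shows that the number of covering intervals varies from run to run around 95% of 40, that is, around 38.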
QUESTIONS
1. (a) In theory, how many of the above intervals would you expect to cover the true population mean $\mu$ ($= 10$)?
(b) In fact how many actually do?
(c) If this simulation were repeated would you always find that exactly 36 of the 40 intervals contain $\mu$? Explain.
2. Suppose you selected 40 samples of size $n = 25$ from a real population (where typically the population mean and standard deviation are unknown).
(a) Could you form a 95% Z-confidence interval for each sample? Explain.
(b) If you knew $\sigma$ and formed forty 95% Z-confidence intervals, how many of the intervals would you expect to cover the population mean $\mu$? Could you tell which? Explain.
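Note: (i) A $100(1-\alpha)\%$ Z-confidence interval for $\mu$ is given by
$$\bar{X} \pm z_{\alpha/2}\,\sigma_{\bar{X}}, \quad \text{where } \sigma_{\bar{X}} = \sigma/\sqrt{n}.$$
(ii) For a 95% Z-confidence interval, $\alpha = 0.05$. Hence the 95% Z-confidence interval for $\mu$ is
$$\bar{X} \pm 1.96\,\sigma_{\bar{X}}, \quad \text{where } \sigma_{\bar{X}} = \sigma/\sqrt{n}.$$
(iii) The 99% Z-confidence interval for $\mu$ is
$$\bar{X} \pm 2.5758\,\sigma_{\bar{X}}, \quad \text{where } \sigma_{\bar{X}} = \sigma/\sqrt{n}.$$
(iv) The 90% Z-confidence interval for $\mu$ is
$$\bar{X} \pm 1.6449\,\sigma_{\bar{X}}, \quad \text{where } \sigma_{\bar{X}} = \sigma/\sqrt{n}.$$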
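The multipliers 1.6449, 1.96 and 2.5758 are the values $z_{\alpha/2}$ that leave area $\alpha/2$ in the upper tail of the standard normal curve. A small sketch of how they might be recovered in Python (the helper name `z_critical` is an assumption for illustration):

```python
from statistics import NormalDist

def z_critical(conf_level: float) -> float:
    """Return z_{alpha/2} for a confidence level of 100(1 - alpha)%."""
    alpha = 1 - conf_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.4f}")
```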
The t-distribution
The t-distribution depends on a single parameter, called its degrees of freedom (df). If sampling is done from a normal distribution whose mean is $\mu$ and standard deviation is $\sigma$, then
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
follows the standard normal distribution. Since $\sigma$ is mostly unknown in practice, we replace it by its estimate $s$. The random variable
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
follows the t-distribution with $n-1$ degrees of freedom.
Sketch of t-distribution: In comparison with the standard normal distribution, the t-distribution has more area in the tails, while the standard normal distribution has more area in the middle.
The t-curve approaches the Z-curve if df is large.
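The claim that the t-curve approaches the Z-curve can be illustrated by comparing critical values. Python's standard library has no t-distribution, so the sketch below integrates the t density numerically (Simpson's rule) and inverts it by bisection. This is one possible approach under those assumptions, with hypothetical helper names; a statistics package or a t-table would normally be used instead.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(conf_level: float, df: int) -> float:
    """Find t with P(-t < T < t) = conf_level by bisection on [0, 100]."""
    target = 1 - (1 - conf_level) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for df in (5, 10, 30, 100, 1000):
    print(f"df = {df:4d}: t = {t_critical(0.95, df):.3f}   (z = {z:.3f})")
```

The printed t values are larger than z for every df and shrink toward it as df grows, which reflects the heavier tails of the t-distribution.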
T-Interval: Confidence Interval for the Mean $\mu$ of a Normal Population ($\sigma$ unknown)
If a random sample $X_1, X_2, \ldots, X_n$ is chosen from a normal distribution, then a $100(1-\alpha)\%$ confidence interval for $\mu$ is
$$\bar{X} \pm t_{\alpha/2}\,SE$$
where:
df for $t$ is $n-1$,
$SE = s/\sqrt{n}$ = standard error of $\bar{X}$ (the estimated s.d. of $\bar{X}$),
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$,
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$,
$s = \sqrt{s^2}$.
Margin of Error: $E = t_{\alpha/2}\,SE = t_{\alpha/2}\,s/\sqrt{n}$
Level of Confidence (Reliability): $100(1-\alpha)\%$
Notes: 1. For all $n$, $t_{\alpha/2} > z_{\alpha/2}$.
2. For df $= \infty$, $t_{\alpha/2} = z_{\alpha/2}$; these are the entries at the bottom of the t-table.
3. For large $n$ ($n > 30$), the normality assumption may be ignored because of the Central Limit Theorem.
4. The estimate of $\mu$, $\bar{X}$, is the midpoint of the CI, and the margin of error is one half the width of the CI.
[Figure: number line showing the interval from L to U with $\bar{X}$ at its midpoint.]
Thus, $\bar{X} = (L+U)/2$ and $E = (U-L)/2$.
Example: In a health study the birth weights of a random sample of 100 newborns from mothers with a low socioeconomic status in a large US city were recorded. The sample yielded a mean of 3.21 kg with a standard deviation of 0.71 kg.
(a) Find a 90% confidence interval for the true mean birth weight of newborns from mothers with a low socioeconomic status.
(b) Interpret the confidence interval.
Solution: Here we wish to estimate
$\mu$ = mean birth weight of all newborns from mothers with a low socioeconomic status in this US city.
Given:
$n = 100$
$\bar{x} = 3.21$ [estimate of $\mu$]
$s = 0.71$ [estimate of $\sigma$]
Since $n > 30$, it is not necessary that the population be normal (due to the CLT).
For a 90% CI, $t_{\alpha/2}$ = ________ = ________, df $= n - 1 = 99$
$\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$
= ________
= ________
or, ________
(b) Interpretation: $\bar{x}$ = ________ estimates the true population mean $\mu$ with margin of error $E$ = ________ and level of confidence (reliability) ________.
The level of confidence gives the proportion of intervals found this way that would cover $\mu$.
Note: The interpretation of a confidence interval as given in the example above is the popular interpretation often heard on television or reported in newspapers. A mathematically precise interpretation of the confidence interval for this example would be "Prior to sampling there was a 0.90 probability that the confidence interval to be formed would contain the true population mean $\mu$."
Example: For the data in the example above, find a 95% confidence interval for the true mean birth weight of newborns from mothers with a low socioeconomic status.
Solution:
Recall, $n = 100$, $\bar{x} = 3.21$, $s = 0.71$.
For a 95% CI, $t_{\alpha/2}$ = ________ = ________, df $= n - 1 = 99$
$\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$
= ________
= ________
or, ________
Interpretation:
$\bar{x}$ = ________ estimates the true population mean $\mu$ with margin of error $E$ = ________ and level of confidence (reliability) ________.
Example: For the data in the example above, find a 99% confidence interval for the true mean birth weight of newborns from mothers with a low socioeconomic status.
Solution:
Recall, $n = 100$, $\bar{x} = 3.21$, $s = 0.71$.
For a 99% CI, $t_{\alpha/2}$ = ________ = ________, df $= n - 1 = 99$
$\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$
= ________
= ________
or, ________
Interpretation:
$\bar{x}$ = ________ estimates the true population mean $\mu$ with margin of error $E$ = ________ and level of confidence (reliability) ________.
Question: Considering these three examples, if the level of confidence is increased and all other things remain the same, the width of the confidence interval will ________.
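One way to check the three birth-weight intervals side by side is sketched below. The critical values 1.660, 1.984 and 2.626 are assumptions taken from a standard t-table for df = 99 (rounded to three decimals), and the helper name `t_interval` is illustrative.

```python
import math

n, xbar, s = 100, 3.21, 0.71
se = s / math.sqrt(n)  # estimated standard error of the sample mean

# Assumed t-table critical values t_{alpha/2} for df = 99, rounded to 3 decimals.
t_crit = {0.90: 1.660, 0.95: 1.984, 0.99: 2.626}

def t_interval(xbar: float, se: float, t: float):
    """Return (lower, upper, margin) for the interval xbar +/- t * se."""
    margin = t * se
    return xbar - margin, xbar + margin, margin

results = {level: t_interval(xbar, se, t) for level, t in t_crit.items()}
for level, (lower, upper, margin) in results.items():
    print(f"{level:.0%} CI: ({lower:.3f}, {upper:.3f}), "
          f"E = {margin:.3f}, width = {upper - lower:.3f}")
```

Comparing the printed widths across the three confidence levels bears directly on the question above.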
Example: A study was conducted to determine the effect of acid rain on the lake water in an industrial region of the country. The data below gives the pH levels from a random sample of 10 lakes from this region. (It was assumed that the sample came from a normal distribution.) Minitab was used to find a 95% confidence interval for the mean pH level for all lakes in this region.
C1: 6.6 7.1 7.3 6.7 6.8 6.2 6.5 5.9 6.9 6.3
MTB > tint 95 c1
One-Sample T: C1
Variable    N     Mean   StDev  SE Mean        95.0% CI
C1         10    6.630   0.424    0.134  ( 6.326,  6.934)
From the Minitab output answer the following:
(a) What is the 95% confidence interval for $\mu$?
(b) What is the estimate of $\mu$ and the estimated standard deviation of this estimate?
(c) What is the margin of error $E$ and level of confidence (reliability) for the estimate of $\mu$?
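The Minitab output can be reproduced approximately from the raw data. In this sketch the critical value 2.262 is an assumption taken from a t-table (df = 9, 95% confidence), and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

ph = [6.6, 7.1, 7.3, 6.7, 6.8, 6.2, 6.5, 5.9, 6.9, 6.3]

n = len(ph)
xbar = mean(ph)          # point estimate of mu
s = stdev(ph)            # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)    # estimated standard deviation of xbar

t_crit = 2.262           # assumed t-table value for df = 9, 95% confidence
margin = t_crit * se
lower, upper = xbar - margin, xbar + margin

print(f"mean = {xbar:.3f}, s = {s:.3f}, SE = {se:.3f}")
print(f"95% CI = ({lower:.3f}, {upper:.3f}), E = {margin:.3f}")

# The estimate and margin can also be recovered from the reported endpoints.
L, U = 6.326, 6.934
print(f"midpoint = {(L + U) / 2:.3f}, half-width = {(U - L) / 2:.3f}")
```

To three decimals the computed mean, standard deviation, standard error and interval agree with the Minitab output above.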
Sample Size Determination for Estimating $\mu$
Problem: Suppose you wish to estimate a population mean $\mu$ with a specified margin of error $E$ and level of confidence. What sample size should be used?
Solution:
We know that $E = t_{\alpha/2}\,s/\sqrt{n}$.
Now we solve this equation for $n$.
$$E^2 = \frac{t_{\alpha/2}^2\,s^2}{n} \;\Rightarrow\; nE^2 = t_{\alpha/2}^2\,s^2 \;\Rightarrow\; n = \frac{t_{\alpha/2}^2\,s^2}{E^2} = \left[\frac{t_{\alpha/2}\,s}{E}\right]^2$$
Of course, since we have not sampled yet, we do not have values for $s$ or $t_{\alpha/2}$. In practice $t_{\alpha/2}$ is replaced by $z_{\alpha/2}$ and $s$ is replaced by a prior estimate of $\sigma$.
Thus $n \approx [z_{\alpha/2}\,\sigma/E]^2$, rounded up to the next whole number.
Example: How large a sample would be required to estimate the mean pH level for all lakes in the industrial region to within 0.1 with level of confidence 95%? Assume that the prior estimate for $\sigma$ is 0.424.
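The sample-size rule $n \approx [z_{\alpha/2}\,\sigma/E]^2$ can be sketched in a few lines of Python. The function name `sample_size` is an illustrative assumption; the inputs are those of the pH example.

```python
import math
from statistics import NormalDist

def sample_size(margin: float, sigma: float, conf: float = 0.95) -> int:
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sigma / margin) ** 2)  # round up to a whole number

n = sample_size(margin=0.1, sigma=0.424, conf=0.95)
print(f"required sample size: n = {n}")
```

Because the result is rounded up, the achieved margin of error is slightly smaller than the requested 0.1, provided the prior estimate of $\sigma$ is accurate.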