Mathematics 244
Lab 1: t-CONFIDENCE INTERVALS and LARGE-SAMPLE CIs
Part I: Normal distribution, $\sigma$ unknown
If X1, X2, ..., Xn are a random sample from a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$, then the interval
$$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
is a $(1-\alpha)100\%$ confidence interval for $\mu$.
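If $\sigma^2$ is unknown, it is estimated by
$$S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}.$$
In that case,
$$T = \frac{\bar{X} - \mu_{\bar{X}}}{\hat{\sigma}_{\bar{X}}}$$
has a t distribution with $\nu = n-1$ degrees of freedom, where $\mu_{\bar{X}} = \mu$ and $\hat{\sigma}^2_{\bar{X}} = S^2/n$.
Thus,
$$P\left(-t_{\alpha/2} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{\alpha/2}\right) = P\left(-t_{\alpha/2}\frac{S}{\sqrt{n}} < \bar{X} - \mu < t_{\alpha/2}\frac{S}{\sqrt{n}}\right) = 1 - \alpha,$$
and this is the same as saying that with a probability of $1-\alpha$, $\bar{X}$ is within $t_{\alpha/2}\,S/\sqrt{n}$ of $\mu$, so that we have: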
Definition: Let X1, X2, ..., Xn be a random sample from a normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$. Then the interval
$$\left(\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}},\; \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}\right)$$
is a $(1-\alpha)100\%$ t-confidence interval for $\mu$.
Here, $E = t_{\alpha/2}\,S/\sqrt{n}$ is the margin of error and $L = 2\,t_{\alpha/2}\,S/\sqrt{n}$ is the length of the confidence interval.
Note that, unlike the z-interval case with known $\sigma^2$, E and L are not constant from one sample to another, even though n is kept the same.
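The definition above can be sketched in a few lines of Python as a cross-check on hand calculations. This is only an illustration, not part of the MINITAB workflow: the function name `t_interval`, the seed, and the hard-coded critical value (the tabulated $t_{0.025}$ for 24 degrees of freedom) are assumptions made for the example.

```python
import math
import random
import statistics

# Assumption: tabulated upper 2.5% point of the t distribution with 24 degrees
# of freedom (n = 25, 95% confidence), hard-coded to avoid extra libraries.
T_CRIT_24_DF = 2.0639


def t_interval(sample, t_crit):
    """Return (lower, upper, margin E, length L) of the t-interval."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation, n - 1 divisor
    margin = t_crit * s / math.sqrt(n)    # E = t_{alpha/2} * S / sqrt(n)
    return xbar - margin, xbar + margin, margin, 2 * margin


rng = random.Random(244)  # arbitrary seed so the run is repeatable
for label in ("sample A", "sample B"):
    sample = [rng.gauss(100, 8) for _ in range(25)]
    lower, upper, margin, length = t_interval(sample, T_CRIT_24_DF)
    print(f"{label}: ({lower:.2f}, {upper:.2f})  E = {margin:.2f}  L = {length:.2f}")
```

The two printed samples have the same n but different S, so their E and L differ, which is the point of the note above.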
In the following exercise, we investigate the performance of such intervals by taking repeated samples from a normal
distribution.
i) Generate 40 column-wise random samples of size n = 25 from a normal distribution with mean 100 and standard deviation 8.
MTB > random 25 c1-c40;
SUBC> normal 100 8.
Now we will pretend that we do not know that the samples came from a population with $\mu = 100$ and $\sigma = 8$. Since our objective is to estimate $\mu$, we will do so using t-confidence intervals.
ii) Use MINITAB to generate 95% t-confidence intervals for $\mu$. The TINTERVAL command is similar to the ZINTERVAL command, except that it does not require (or take) a value for $\sigma$. The TINTERVAL command (which may be abbreviated to tint) takes two arguments: the level of confidence desired (0 to 100, default 95) and the column(s) of data containing the sample(s).
MTB > tint 95 c1-c40
Copy your commands and results here. Mark (by hand or by typing a “*”) those confidence intervals which do NOT cover the true mean $\mu = 100$.
iii) (a) What proportion of the CIs actually contain the true mean $\mu = 100$?
(b) Theoretically, what proportion of the CIs would you expect to contain $\mu$?
(a)
(b)
iv) Do all your CIs have the same length? Give a reason for your answer.
v) Suppose you used 40 samples of sample size n = 64 instead of n = 25 and again formed forty 95% t-CIs. (Do not actually calculate these.)
(a) Would you expect “more”, “fewer”, or “the same number” of intervals to contain $\mu$? (b) Explain.
(c) “On average”, how would the length of the forty CIs compare to those you obtained in part ii)? (d) Explain.
(a)
(b)
(c)
(d)
vi) Suppose you calculated 95% t-CIs from 40 samples of real data. (a) How many would you expect to contain the population mean $\mu$? (b) Could you tell which? (c) Explain.
(a)
(b)
(c)
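The sampling experiment in steps i) and ii) can also be sketched outside MINITAB. The Python version below is an illustration under stated assumptions: `random.gauss` stands in for the MINITAB `random`/`normal` commands, the seed is arbitrary, the function name `simulate_t_intervals` is made up for the example, and the critical value is the tabulated $t_{0.025}$ for 24 degrees of freedom rather than something MINITAB reports.

```python
import math
import random
import statistics

T_CRIT_24_DF = 2.0639  # assumption: table value of t_{0.025} with 24 df


def simulate_t_intervals(num_samples=40, n=25, mu=100.0, sigma=8.0, seed=1):
    """Return a list of (lower, upper, covers_mu) for repeated normal samples."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        margin = T_CRIT_24_DF * statistics.stdev(sample) / math.sqrt(n)
        lower, upper = xbar - margin, xbar + margin
        results.append((lower, upper, lower <= mu <= upper))
    return results


intervals = simulate_t_intervals()
for lower, upper, covers in intervals:
    flag = "" if covers else " *"   # "*" marks intervals that miss mu = 100
    print(f"({lower:7.2f}, {upper:7.2f}){flag}")
covered = sum(covers for _, _, covers in intervals)
print(f"{covered} of {len(intervals)} intervals cover the true mean")
```

Changing `seed` gives a different set of forty intervals, which is a quick way to see how much the count of covering intervals moves from run to run.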
Part II: Large-sample confidence intervals
In constructing a CI for a parameter $\theta$, we often can find an estimator $\hat{\theta}$ which has the following property: (Note that $\hat{\theta}$ is a function of the sample X1, X2, ..., Xn.) For n large, $\hat{\theta}$ is approximately normally distributed with mean $\theta$ and variance $\sigma^2_{\hat{\theta}}$.
Then in such a case we can say that for n large,
$$P\left(-z_{\alpha/2} < \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} < z_{\alpha/2}\right) \approx 1 - \alpha.$$
This, then, can be the basis of a CI with a coverage probability of approximately $1-\alpha$ and of the form: Estimator $\pm\, z_{\alpha/2}$ (SE), that is,
$$\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}.$$
Often it happens that we cannot get our hands on the SE = $\sigma_{\hat{\theta}}$, but must rely on the [estimated SE] = $\hat{\sigma}_{\hat{\theta}}$. Under fairly general conditions that deal with the convergence of $\hat{\sigma}_{\hat{\theta}}$ to $\sigma_{\hat{\theta}}$ as the sample size increases, it is often the case that
$$\frac{\hat{\theta} - \theta}{\hat{\sigma}_{\hat{\theta}}}$$
also has approximately a standard normal distribution, so that the interval $\hat{\theta} \pm z_{\alpha/2}\,\hat{\sigma}_{\hat{\theta}}$ still has an approximate coverage probability of $1-\alpha$.
(Be careful not to confuse approximate large-sample confidence intervals with z-confidence intervals, which have exact coverage probabilities under the conditions of sampling from a normal distribution with known standard deviation $\sigma$.)
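The general recipe "Estimator $\pm\, z_{\alpha/2}$ (estimated SE)" can be written as a small helper. This is a minimal sketch, not a MINITAB feature: the function name `large_sample_ci` and its parameters are hypothetical, and `statistics.NormalDist` from the Python standard library supplies the normal quantile.

```python
import statistics


def large_sample_ci(estimate, estimated_se, confidence=0.90):
    """Approximate CI: estimate +/- z_{alpha/2} * (estimated SE)."""
    alpha = 1.0 - confidence
    # Upper alpha/2 point of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * estimated_se, estimate + z * estimated_se


# Illustrative numbers only: an estimate of 1.1 with estimated SE 0.15.
lower, upper = large_sample_ci(1.1, 0.15, confidence=0.90)
print(f"approximate 90% CI: ({lower:.3f}, {upper:.3f})")
```

Any estimator with an approximately normal sampling distribution and a consistent SE estimate can be passed in; the coverage is only approximate, as the caution above says.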
How well do such large-sample CIs do? In the following, we investigate the performance of such intervals in the estimation of the parameter $\lambda$ in the Poisson distribution.
Consider a random sample X1, X2, ..., Xn taken from a Poisson distribution with parameter $\lambda$. Recall that the probability mass function of the Poisson distribution is
$$p(x; \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad \lambda > 0, \; x = 0, 1, 2, \ldots,$$
and that $E(X) = V(X) = \lambda$. We are interested in finding a confidence interval for $\lambda$.
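By the Central Limit Theorem,
$$\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \lambda}{\sqrt{\lambda}/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{X} - \lambda)}{\sqrt{\lambda}} \approx N(0, 1);$$
that is, for large sample size n, $\bar{X}$ has approximately a normal distribution with mean $\mu_{\bar{X}} = \lambda$ and variance $\sigma^2_{\bar{X}} = \lambda/n$. Thus in this situation, $\theta = \lambda$, $\hat{\theta} = \bar{X}$, and $\sigma_{\hat{\theta}} = \sqrt{\lambda/n}$.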
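Because we don't know $\sigma_{\hat{\theta}} = \sqrt{\lambda/n}$, we must estimate it. Since $\bar{X}$ is an unbiased estimator of $\lambda$, one choice for an estimator of $\sigma_{\bar{X}}$ could be $\hat{\sigma}_{\hat{\theta}} = \hat{\sigma}_{\bar{X}} = \sqrt{\bar{X}/n}$. Thus, an approximate $(1-\alpha)100\%$ large-sample confidence interval for $\lambda$ would be
$$\left(\bar{X} - z_{\alpha/2}\sqrt{\frac{\bar{X}}{n}},\; \bar{X} + z_{\alpha/2}\sqrt{\frac{\bar{X}}{n}}\right).$$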
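As a worked illustration of this interval, the sketch below computes $\bar{X} \pm z_{\alpha/2}\sqrt{\bar{X}/n}$ for one sample. The function name `poisson_ci_from_mean` and the list `counts` are made up for the example; the data are not from any real experiment.

```python
import math
import statistics


def poisson_ci_from_mean(sample, confidence=0.90):
    """Large-sample CI for a Poisson mean, using X-bar to estimate the variance."""
    n = len(sample)
    xbar = statistics.mean(sample)
    z = statistics.NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * math.sqrt(xbar / n)   # z_{alpha/2} * sqrt(X-bar / n)
    return xbar - half_width, xbar + half_width


# Hypothetical counts, n = 50, invented purely to exercise the formula.
counts = [0, 2, 1, 1, 0, 3, 1, 0, 2, 1] * 5
lower, upper = poisson_ci_from_mean(counts, confidence=0.90)
print(f"x-bar = {statistics.mean(counts):.2f}, 90% CI = ({lower:.3f}, {upper:.3f})")
```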

Let's see how such an approximate large-sample CI performs in a simulation of a Poisson distribution with parameter $\lambda = 1$.
i) Generate 20 row-wise samples of size n = 50 from a Poisson distribution with $E(X) = \lambda = 1$.
MTB > random 20 c1-c50;
SUBC> poisson 1.
ii) Construct 20 large-sample CIs for $\lambda$ using an approximate coverage probability of 90%, that is, using $z_{0.05} = 1.6449$.
MTB > rmean c1-c50 c51             [= $\bar{X}$]
MTB > let c52 = sqrt(c51 / 50)     [= $\sqrt{\bar{X}/n}$]
MTB > let c53 = c51 - 1.6449*c52   [lower confidence limit]
MTB > let c54 = c51 + 1.6449*c52   [upper confidence limit]
Name columns C51-C54 so as to identify their contents and print them. Copy your commands and results to the box. Mark (by hand or by typing a “*”) those confidence intervals which do not cover the true mean $\lambda = 1$.
iii) What proportion of the confidence intervals do cover the true mean?
Since $S^2$ is also an unbiased estimator for $\lambda$, a second choice for an estimator of $\sigma_{\bar{X}}$ could be $\hat{\sigma}_{\bar{X}} = \sqrt{S^2/n}$, so a large-sample CI for $\lambda$ would be
$$\bar{X} \pm z_{\alpha/2}\frac{S}{\sqrt{n}}.$$
iv) Since n > 30, $t_{\alpha/2} \approx z_{\alpha/2}$, and we may use a t-interval $\bar{X} \pm t_{\alpha/2}\,S/\sqrt{n}$ to approximate the confidence intervals. Thus, we can use the TINTERVAL command in MINITAB. To do so, you first will have to transpose the rows and columns of your sample.
MTB > copy c1-c50 m1       [copies the columns to a matrix]
MTB > transpose m1 m2      [transposes the matrix]
MTB > erase c1-c50         [erases the original 50 columns]
MTB > copy m2 c1-c20       [copies the transposed matrix into columns C1-C20]
MTB > tint 90 c1-c20
Copy your commands and results here. Mark (by hand or by typing a “*”) those confidence intervals which do not cover the true mean $\lambda = 1$.
v) What proportion of the confidence intervals do cover the true mean?
vi) In general, are the confidence intervals constructed using $S^2$ narrower, wider, or the same width as the confidence intervals constructed using $\bar{X}$?
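The Part II experiment can be mirrored in Python to compare the two interval types side by side. This is a sketch under stated assumptions, not a replacement for the MINITAB session: the standard library has no Poisson generator, so `poisson_draw` uses Knuth's multiplication method; both intervals use $z_{0.05} = 1.6449$ for simplicity, whereas TINTERVAL uses the slightly larger t critical value with 49 degrees of freedom; the function names and seed are made up.

```python
import math
import random
import statistics

Z_05 = 1.6449  # z_{0.05}, the value quoted in step ii


def poisson_draw(rng, lam):
    """One Poisson(lam) variate via Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def compare_intervals(num_samples=20, n=50, lam=1.0, seed=2):
    """Return (x-bar, half-width using X-bar, half-width using S) per sample."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_samples):
        sample = [poisson_draw(rng, lam) for _ in range(n)]
        xbar = statistics.mean(sample)
        half_mean = Z_05 * math.sqrt(xbar / n)                    # variance estimated by X-bar
        half_s = Z_05 * statistics.stdev(sample) / math.sqrt(n)   # variance estimated by S^2
        rows.append((xbar, half_mean, half_s))
    return rows


rows = compare_intervals()
cover_mean = sum(abs(x - 1.0) <= h for x, h, _ in rows)
cover_s = sum(abs(x - 1.0) <= h for x, _, h in rows)
wider_s = sum(hs > hm for _, hm, hs in rows)
print(f"X-bar based intervals covering lambda = 1: {cover_mean} of {len(rows)}")
print(f"S^2 based intervals covering lambda = 1:   {cover_s} of {len(rows)}")
print(f"samples where the S^2 based interval is wider: {wider_s} of {len(rows)}")
```

Because each row already holds one sample, no transpose step is needed here; the transposition in step iv is only a consequence of how TINTERVAL expects its data.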
Part III: Point estimator for $\lambda$ in the Poisson distribution
In the last part, we used two different types of CIs to estimate the mean $\mu = \lambda$ of the Poisson distribution. Since the variance of the Poisson is also $\lambda$, in the previous part of the lab we used $\bar{X}$ to estimate $\sigma^2 = \lambda$ in one case and $S^2$ in the other.
So the question is: In the Poisson case, if we want to estimate the variance $\sigma^2 = \lambda$, is it better to use $\bar{X}$ or $S^2$? Let's explore this question by examining the variance of the two estimators.
i) Generate 1000 random samples of size 5 from a Poisson distribution with $\lambda = 4$.
MTB > random 1000 c1-c5;
SUBC> poisson 4.
ii) Compute the values of $\bar{X}$ and $S^2$ into columns C6 and C8, respectively.
MTB > rmean c1-c5 c6      [= $\bar{X}$]
MTB > rstdev c1-c5 c7     [= $S$]
MTB > let c8 = c7**2      [= $S^2$]
iii) Calculate the variances of $\bar{X}$ and $S^2$ and store them in columns C9 and C10, respectively, as follows.
MTB > stats c6 c8;
SUBC> variance c9 c10.
estimated $V(\bar{X})$ =
estimated $V(S^2)$ =
iv) What is the value of the theoretical variance of $\bar{X}$ in this situation?
v) (a) Which estimator, $\bar{X}$ or $S^2$, appears to be better for estimating $\lambda$? (b) Why?
(a)
(b)
vi) Explain why the confidence intervals in Part II constructed using $S^2$ were wider, in general, than the confidence intervals constructed using $\bar{X}$.
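Steps i) to iii) of Part III can be reproduced in Python as a check on the MINITAB output. The sketch below is illustrative only: `poisson_draw` (Knuth's multiplication method) stands in for MINITAB's Poisson generator, and the function name `estimator_variances` and the seed are assumptions made for the example.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """One Poisson(lam) variate via Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def estimator_variances(num_samples=1000, n=5, lam=4.0, seed=3):
    """Return (sample variance of the X-bar values, sample variance of the S^2 values)."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(num_samples):
        sample = [poisson_draw(rng, lam) for _ in range(n)]
        means.append(float(statistics.mean(sample)))         # one X-bar per sample
        variances.append(float(statistics.variance(sample))) # one S^2 per sample
    return statistics.variance(means), statistics.variance(variances)


var_xbar, var_s2 = estimator_variances()
print(f"estimated V(X-bar) = {var_xbar:.3f}")
print(f"estimated V(S^2)   = {var_s2:.3f}")
```

The two printed values play the role of columns C9 and C10; they will differ slightly from the MINITAB results because the random samples differ.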