Confidence Intervals and Hypothesis Testing (c.f. book Chapters 8 & 9)
1. Confidence Intervals
On p.273 in Section 8-1 the authors begin the development of the concept of a confidence interval (CI) in relation to the unknown mean, $\mu_X$, for $X \sim N(\mu_X, \sigma_X)$. The development relies on the use of iid data collection variables $\{X_k\}_{k=1}^{n}$.

It is noted that the natural estimator of $\mu_X$ is $\hat{\mu}_X = \overline{X}$, and that $\hat{\mu}_X = \overline{X} \sim N(\mu_{\overline{X}}, \sigma_{\overline{X}}) = N(\mu_X, \sigma_X/\sqrt{n})$. We can summarize this formally as

Step 1: For an unknown parameter, $\theta$, identify an appropriate estimator, $\hat{\theta}$, and its pdf (or a pdf related to it).

The authors then proceed to note that $Z = \dfrac{\overline{X} - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0,1)$. [c.f. (8-1)] This may be summarized as

Step 2: Identify a 'standard' random variable, $Q = Q(\theta, \hat{\theta})$, that is a mathematical function of both $\theta$ and $\hat{\theta}$.
The authors then offer the definition of a (2-sided) CI estimate as a numerical interval of the form $l \le \mu_X \le u$ (which here will be denoted as $[l, u]$), where the endpoints are computed from the sample data. They then go on to point out that for repeated sample sets, different values of these endpoints are likely to result. This leads to the definition of the associated random endpoint interval $[L, U]$. Having this, the authors are in a position to define a $100(1-\alpha)\%$ (2-sided) CI for $\mu_X$ via

$\Pr[L \le \mu_X \le U] = 1 - \alpha$. [c.f. (8-2)]

The resulting random endpoint interval $[L, U]$ is defined to be the CI for $\mu_X$. The computed numerical interval $[l, u]$ is simply one estimate of the estimator interval $[L, U]$.
As I find the above development to be unappealing, I will now offer the following alternative development. Write:



$\Pr[x_1 \le \hat{\mu}_X \le x_2] = 1 - \alpha$. [Also, note that we require that $\Pr[\hat{\mu}_X < x_1] = \alpha/2$ and $\Pr[\hat{\mu}_X > x_2] = \alpha/2$.] Then, using the concept of equivalent events, rewrite this probability as

$\Pr[Q(x_1) \le Q(\hat{\mu}_X) \le Q(x_2)] = 1 - \alpha$.    (1.1)

Now, because here $Q(\hat{\mu}_X) = Z$, we can identify $-z_{\alpha/2} = Q(x_1)$ and $z_{\alpha/2} = Q(x_2)$, where the authors have defined $z_{\alpha/2} = F_Z^{-1}(1 - \alpha/2) = -F_Z^{-1}(\alpha/2)$. [Note that many other books use the definition $z_{\alpha/2} = F_Z^{-1}(\alpha/2)$.] Hence, (1.1) becomes

Pr[ z / 2  Z  Q( X ,  X )  z / 2 ]  1  

where Z  Q(  X ,  X ) 
(1.2)

X  X
. It must be emphazed that at this point we know z / 2 . Now, once again, using the
X / n
concept of equivalent events we can express (1.2) as:

Pr[ g ( z / 2 ,  X )   X

 g ( z / 2 ,  X )]  1   .

  X
Here, the function g (*) is simply the mathematical steps to solve for Z  X
for  X . Specifically,
X / n
(1.3)
2

 X   X  Z  X / n . Applying this sequence of operations to the entire inequality in (1.2) gives

$\Pr[C_1 \le \mu_X \le C_2] = 1 - \alpha$    (1.3)

where $[C_1, C_2] = [\hat{\mu}_X - z_2\,\sigma_X/\sqrt{n}\;,\; \hat{\mu}_X - z_1\,\sigma_X/\sqrt{n}]$, with $z_1 = -z_{\alpha/2}$ and $z_2 = z_{\alpha/2}$. We can summarize this in the following two steps:

Step 3: From Step 2, compute $q_1 = F_Q^{-1}(\alpha/2)$ and $q_2 = F_Q^{-1}(1 - \alpha/2)$.

Step 4: Use the concept of equivalent events to manipulate (1.1) into the form $\Pr[C_1 \le \theta \le C_2] = 1 - \alpha$. The resulting random endpoint interval $[C_1, C_2]$ is called the $100(1-\alpha)\%$ (2-sided) CI for $\theta$.

The above steps are the same for obtaining the CI for any unknown parameter, $\theta$.
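The four steps can be collected into a few lines of code. The following Python sketch (Matlab is used elsewhere in these notes; Python is used here only for illustration, and the function name is mine) carries out Steps 3 and 4 for the known-$\sigma$ normal case:

```python
from statistics import NormalDist

def z_confidence_interval(mu_hat, sigma, n, alpha=0.05):
    """Two-sided CI for mu_X when sigma_X is known (Steps 3 and 4)."""
    # Step 3: q2 = F_Q^{-1}(1 - alpha/2); by symmetry of Z, q1 = -q2
    z2 = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: equivalent events give [C1, C2] = mu_hat -/+ z2*sigma/sqrt(n)
    half_width = z2 * sigma / n ** 0.5
    return mu_hat - half_width, mu_hat + half_width

# the numbers of Example 1.1 below: mu_hat = 64.46, sigma = 1, n = 10
lo, hi = z_confidence_interval(64.46, 1.0, 10)   # about (63.84, 65.08)
```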
Example 1.1 Confidence Intervals (Book Example 8-1, p.274)

Here, we repeat Example 8-1 on p.274 of the Book. The data is contained in the Matlab code 'ex8_1.m'. The data-based probability model for X = "the act of measuring the impact energy" is shown in Figure 1. Estimates of $\mu_X$ and $\sigma_X$ are also given in this figure: $\hat{\mu}_X = 64.46$ and $\hat{\sigma}_X = 0.22706$.

Figure 1. Data-based pmf for X.

We now repeat the development of a $100(1-\alpha)\%$ 2-sided confidence interval (CI) for the unknown mean $\mu_X$.

Step 1: Identify all information about the estimator $\hat{\mu}_X$:

We are told that, here, it is known that $\sigma_X = 1$. We are also to assume that the data collection variables $\{X_k\}_{k=1}^{10}$ are iid and normally distributed. It then follows that $\hat{\mu}_X$ is normally distributed with mean $\mu_{\hat{\mu}_X} = \mu_X$ and with standard deviation $\sigma_{\hat{\mu}_X} = \sigma_X/\sqrt{n} = 1/\sqrt{10} = 0.3162$.

Step 2: Write $\Pr[x_1 \le \hat{\mu}_X \le x_2]$. Then, using the concept of equivalent events, re-write this in terms of an appropriate standard test statistic:

In this case, since $\hat{\mu}_X \sim N(\mu_X, \sigma_X/\sqrt{n})$, it follows that $Z = \dfrac{\hat{\mu}_X - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0,1)$. Hence,

$\Pr[x_1 \le \hat{\mu}_X \le x_2] = \Pr\!\left[\dfrac{x_1 - \mu_X}{\sigma_X/\sqrt{n}} \le \dfrac{\hat{\mu}_X - \mu_X}{\sigma_X/\sqrt{n}} \le \dfrac{x_2 - \mu_X}{\sigma_X/\sqrt{n}}\right] = \Pr[z_1 \le Z \le z_2]$.

Step 3: Use the given value of α to find the numerical values of the endpoints associated with the standardized statistic: In this case we have $\Pr[z_1 \le Z \le z_2] = 1 - \alpha$. To find $z_1$ use $\Pr[Z \le z_1] = \alpha/2 = .025$. Then z1 = norminv(.025,0,1) = -1.96. To find $z_2$ use $\Pr[Z \le z_2] = 1 - \alpha/2 = .975$. Then z2 = norminv(.975,0,1) = 1.96.

Step 4: Use the concept of equivalent events to manipulate the event to one wherein the unknown $\mu_X$ is in the middle of the inequality. The endpoints of this event are, by definition, the CI for $\mu_X$.
In this case, we have:

$z_1 \le \dfrac{\hat{\mu}_X - \mu_X}{\sigma_X/\sqrt{n}} \le z_2 \;\Leftrightarrow\; z_1\,\sigma_X/\sqrt{n} \le \hat{\mu}_X - \mu_X \le z_2\,\sigma_X/\sqrt{n}$

$\Leftrightarrow\; -\hat{\mu}_X + z_1\,\sigma_X/\sqrt{n} \le -\mu_X \le -\hat{\mu}_X + z_2\,\sigma_X/\sqrt{n} \;\Leftrightarrow\; \hat{\mu}_X - z_2\,\sigma_X/\sqrt{n} \le \mu_X \le \hat{\mu}_X - z_1\,\sigma_X/\sqrt{n}$

Hence, the CI is: $[C_1, C_2] = [\hat{\mu}_X - z_2\,\sigma_X/\sqrt{n}\;,\; \hat{\mu}_X - z_1\,\sigma_X/\sqrt{n}]$.
Now, given that in this case, per the authors' notation, $z_2 = z_{\alpha/2}$, along with the fact that $z_1 = -z_2$, the above CI can be written as:

$[C_1, C_2] = [\hat{\mu}_X - z_{\alpha/2}\,\sigma_X/\sqrt{n}\;,\; \hat{\mu}_X + z_{\alpha/2}\,\sigma_X/\sqrt{n}]$.    (1)

The expression (1) is nearly identical to that given in Example 8-1 on p.274. For convenience, that expression is:

$[c_1, c_2] = [\overline{x} - z_{\alpha/2}\,\sigma_X/\sqrt{n}\;,\; \overline{x} + z_{\alpha/2}\,\sigma_X/\sqrt{n}]$.    (2)
At this point, it is appropriate to examine the differences between (1) and (2). It should be clear that, as an estimate of $\mu_X$, we have $\hat{\mu}_X = \overline{x}$. Because (2) is an interval with numeric endpoints, I have denoted it as $[c_1, c_2]$. However, the CI described by (1) uses $[C_1, C_2]$ to emphasize the fact that these endpoints are random variables. Specifically, both depend on the estimator $\hat{\mu}_X = \overline{X}$. For example:

Recall that it was assumed that $\hat{\mu}_X$ is normally distributed with mean $\mu_{\hat{\mu}_X} = \mu_X$ and with standard deviation $\sigma_{\hat{\mu}_X} = \sigma_X/\sqrt{n} = 1/\sqrt{10} = 0.3162$ (see Step 1). It follows that $C_1 = \hat{\mu}_X - z_{\alpha/2}\,\sigma_X/\sqrt{n}$ is also normally distributed with

$\mu_{C_1} = E(C_1) = E(\hat{\mu}_X - z_{\alpha/2}\,\sigma_X/\sqrt{n}) = E(\hat{\mu}_X) - z_{\alpha/2}\,\sigma_X/\sqrt{n} = \mu_X - z_{\alpha/2}\,\sigma_X/\sqrt{n}$,    (3a)

and with

$\sigma_{C_1}^2 = \mathrm{Var}(C_1) = \mathrm{Var}(\hat{\mu}_X - z_{\alpha/2}\,\sigma_X/\sqrt{n}) = \mathrm{Var}(\hat{\mu}_X) = \sigma_X^2/n$.    (3b)
It is interesting to note that in this example the CI width $C_2 - C_1 = 2 z_{\alpha/2}\,\sigma_X/\sqrt{n}$ is not a random variable. Hence, the right endpoint is simply the left endpoint plus the constant given by this difference. So in this example there is really only one random endpoint. This will generally not be the case.
Before leaving this example, we will address the reasonableness of the assumption that $\{X_k\}_{k=1}^{10}$ are iid and normally distributed. While there is no reason to question the iid assumption, the normal assumption is clearly flawed. From Figure 1, we see that the sample space includes measurements that are 0.1 J apart. The measurement apparatus has limited resolving power. Figure 1 supports the assumption that, in fact, $\{X_k\}_{k=1}^{10}$ correspond to a discrete distribution.

Question 1: How would you proceed to assign a discrete pdf to X?

Answer 1: Clearly, this is difficult due to the fact that only n = 10 measurements were used. This is likely the reason for the fact that zero measurements occurred at $x = 64.4$. The extent of the sample space, $S_X$, is also difficult to ascertain. Is it really the case that $\Pr[X < 64.0] = \Pr[X > 64.9] = 0$? It is possible. But more likely the limited range is due to the small number of measurements. Hence, my answer to this question would be: we simply do not have enough data to proceed.
Example 1.2 This is Example 8-2 on p.276. Recall from above that $C_2 - C_1 = 2 z_{\alpha/2}\,\sigma_X/\sqrt{n}$ is a non-random quantity. The fact that it depends on the sample size, n, is valuable for determining the value of n to be used. Recall that the CI is given by the endpoints of the event $\hat{\mu}_X - z_2\,\sigma_X/\sqrt{n} \le \mu_X \le \hat{\mu}_X - z_1\,\sigma_X/\sqrt{n}$. This event is equivalent to the event $-z_{\alpha/2}\,\sigma_X/\sqrt{n} \le \mu_X - \hat{\mu}_X \le z_{\alpha/2}\,\sigma_X/\sqrt{n}$, which, in turn, can be expressed as: $|\mu_X - \hat{\mu}_X| \le z_{\alpha/2}\,\sigma_X/\sqrt{n}$. Now, suppose that we desire the smallest sample size such that we are $100(1-\alpha)\%$ confident that the 'error' $|\mu_X - \hat{\mu}_X|$ will be no more than, say, ε. To find this n-value, set $\varepsilon = z_{\alpha/2}\,\sigma_X/\sqrt{n}$. Then $n = (z_{\alpha/2}\,\sigma_X/\varepsilon)^2$. Of course, as this will almost surely not be an integer, we must round up to the nearest integer.
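As a quick sketch of this sample-size rule (Python for illustration; the function name and the numbers in the usage line are mine, not the book's):

```python
import math
from statistics import NormalDist

def sample_size_for_error(sigma, eps, alpha=0.05):
    """Smallest n such that the 100(1-alpha)% CI half-width
    z_{a/2}*sigma/sqrt(n) is at most eps: n = ceil((z_{a/2}*sigma/eps)**2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / eps) ** 2)

# e.g. sigma = 1 (as in Example 1.1) and a desired error of eps = 0.5
n_needed = sample_size_for_error(1.0, 0.5)   # (1.96/0.5)^2 = 15.4 -> 16
```

Halving ε quadruples the required n, which is the practical cost of the $\sqrt{n}$ in the CI width.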
Example 1.3
From http://hyperphysics.phy-astr.gsu.edu/hbase/biology/actpot.html#c1
"The action potential sequence is essential for neural communication. The simplest action in response to thought requires many such action potentials for its communication and performance." The figure at the right describes the temporal sequence associated with an action potential in the human brain. After a stimulus is applied, the potential rapidly changes from $-70\ \mathrm{mV}$ to $+30\ \mathrm{mV}$. For any given stimulus, let X denote the act of measuring the magnitude of the change from the rest potential to the peak potential. To investigate the distributional properties, tests on n cells in a localized region of a given brain will be conducted.

Figure 1.1 Action potential sequencing.

Let $\{X_k\}_{k=1}^{n}$ denote the associated data collection variables. For simplicity, assume that they are iid with $X \sim N(\mu_X, \sigma_X)$. Now, were we to have prior knowledge of $\sigma_X$, then to obtain a CI for $\mu_X$ we would use the test statistic $Z = \dfrac{\hat{\mu}_X - \mu_X}{\sigma_X/\sqrt{n}}$. Here, we will assume that we do not have such knowledge. And so, we will use the next best thing; namely, the estimator $\hat{\sigma}_X^2 = \dfrac{1}{n-1}\sum_{k=1}^{n}(X_k - \hat{\mu}_X)^2$. Our test statistic is then $T_{n-1} = \dfrac{\hat{\mu}_X - \mu_X}{\hat{\sigma}_X/\sqrt{n}}$, which is called the t-statistic with $n-1$ degrees of freedom. Recall that, because of our above assumptions, $\hat{\mu}_X \sim N(\mu_X, \sigma_X/\sqrt{n})$. While Z is a linear function of only the single random variable $\hat{\mu}_X$, $T_{n-1}$ is a nonlinear function of the two random variables $(\hat{\mu}_X, \hat{\sigma}_X)$. Hence, it is no longer a normal random variable. Even so, the t-statistic is a standard random variable. And so, (1.2) in Step 2 above is replaced by
$\Pr[t_1 \le T_{n-1} \le t_2] = 1 - \alpha$.    (1.4)

In exactly the same manner as above, we arrive at

$[C_1, C_2] = [\hat{\mu}_X - t_2\,\hat{\sigma}_X/\sqrt{n}\;,\; \hat{\mu}_X - t_1\,\hat{\sigma}_X/\sqrt{n}]$.    (1.5)

Now suppose that we have obtained the following numerical results for sample size n = 20: $\hat{\mu}_X = 97.32$, $\hat{\sigma}_X = 4.69$.

(i) Compute the 95% 2-sided CI estimate for $\mu_X$: From Matlab we have t1 = tinv(.025,19) = -2.0930 and t2 = tinv(.975,19) = 2.0930. Hence,

$[C_1, C_2] = [97.32 - 2.093(4.69)/\sqrt{20}\;,\; 97.32 + 2.093(4.69)/\sqrt{20}] = [97.32 - 2.195\;,\; 97.32 + 2.195] = [95.125\;,\; 99.515]$
(ii) Compute the 95% 1-sided upper CI estimate for $\mu_X$: To address this type of CI, because we want an upper bound on $\mu_X$, we will rewrite (1.5) as:

$[-\infty, C_2] = [-\infty\;,\; \hat{\mu}_X - t_1\,\hat{\sigma}_X/\sqrt{n}]$.    (1.6)

Clearly, (1.6) is associated with setting $t_2 = \infty$. And so, (1.4) becomes:

$\Pr[t_1 \le T_{n-1}] = 1 - \alpha$, or equivalently, $\Pr[T_{n-1} < t_1] = \alpha$.    (1.7)

From Matlab we have t1 = tinv(.05,19) = -1.7291. In this case

$C_2 = 97.32 - (-1.7291)(4.69)/\sqrt{20} = 99.133$.    □
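The two Matlab tinv calls and the interval arithmetic above can be reproduced as follows (Python with scipy used for illustration; scipy.stats.t.ppf plays the role of tinv):

```python
from math import sqrt
from scipy.stats import t

def t_confidence_interval(mu_hat, sig_hat, n, alpha=0.05):
    """Two-sided CI for mu_X when sigma_X is estimated: the interval (1.5)."""
    t2 = t.ppf(1 - alpha / 2, df=n - 1)   # tinv(1-alpha/2, n-1); t1 = -t2
    hw = t2 * sig_hat / sqrt(n)
    return mu_hat - hw, mu_hat + hw

# part (i): n = 20, mu_hat = 97.32, sig_hat = 4.69
lo, hi = t_confidence_interval(97.32, 4.69, 20)        # about (95.125, 99.515)
# part (ii): one-sided upper bound uses t1 = tinv(.05, 19)
upper = 97.32 - t.ppf(0.05, df=19) * 4.69 / sqrt(20)   # about 99.133
```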
While the above example may have seemed simple, that is only because we were handed the appropriate test statistic. The average student should be able to reason that if $\hat{\mu}_X$ is normal, then so is Z. To show that $T_{n-1}$ has a t distribution is not easy! Hence, one should either be given that information, or have enough long-term memory to recall it. Out in industry it is doubtful that it would be given. For this reason, we now provide graphs of some $T_n$ pdfs in comparison to a $Z \sim N(0,1)$ pdf. The figures below show that as the sample size n increases, the 'fatter' tails of the $T_n$ pdf become 'leaner'. Consequently, for a given α, the magnitude of the number $t_1(n)$ will become smaller, and $t_1(n)$ will approach $z_1$ as $n \to \infty$.
Figure 1.2 Graphs of normal (black) and $T_n$ (red) pdfs and associated $t_1(n)$ values [α = .05] for n = 10, 20, 30, 40, 50.

From the graph on the right of Figure 1.2, one might reasonably conclude that for sample sizes greater than n = 50 one can get by with using the normal pdf, as the difference between $t_1(50) = -1.676$ and $z_1 = -1.645$, which is 0.031 in magnitude, is minimal. However, suppose one were to choose α = .005. The graph in Figure 1.3 at the right reveals that such an approximation would be very inaccurate.

Figure 1.3 $t_1(n)$ values [α = .005] for n = 10, 20, 30, 40, 50.
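The shrinking gap between $t_1(n)$ and $z_1$, and its dependence on α, can be checked directly (Python/scipy for illustration):

```python
from scipy.stats import norm, t

# alpha = .05: the t quantile approaches the normal quantile as df grows
z1 = norm.ppf(0.05)                               # about -1.645
gap = {n: t.ppf(0.05, df=n) - z1 for n in (10, 50, 200)}
# |gap| shrinks monotonically; at n = 50 it is already only about 0.031

# alpha = .005: the same comparison at n = 50 is noticeably worse
z1b = norm.ppf(0.005)                             # about -2.576
gap50b = t.ppf(0.005, df=50) - z1b                # about -0.102
```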
The General Steps for Computing a Confidence Interval
In Section 8-1-4 on p.277 the authors do delineate the general steps to be carried out. They are repeated here, but in the language that we have developed.
The Problem: To obtain a $100(1-\alpha)\%$ 2-sided CI for an unknown parameter, $\theta$, associated with a random variable X.
Step 1: Obtain a collection, $\{X_k\}_{k=1}^{n}$, of iid associated data collection variables.
Step 2: Find a 'standard' statistic, $g(\{X_k\}_{k=1}^{n}; \theta) \triangleq G$, that depends on both $\theta$ and $\{X_k\}_{k=1}^{n}$.
Step 3: Formulate the probability $\Pr[g_L \le G \le g_U] = 1 - \alpha$.
Step 4: Use a standard table for G to recover the numerical values for $(g_L, g_U)$.
Step 5: Use algebra to reformulate the event $[g_L \le G \le g_U]$ into an equivalent event of the form $[L \le \theta \le U]$. The expressions for L and U, which will clearly include $\{X_k\}_{k=1}^{n}$, give the CI $[L \le \theta \le U]$.
Remark 1. Typically, the most difficult step is Step 2, for the very reason that, unless one is well-versed in a wide array of standard random variables, one is hard-pressed to know where to turn. It is for this reason that I prefer my version of Step 2, which states that one should first identify a reasonable estimator, $\hat{\theta}$ (that will clearly involve $\{X_k\}_{k=1}^{n}$), and try to identify its pdf. For this reason, I have included the following summary of results along these lines.
Some Handy Standard Test Statistics

Results when µ is of Primary Concern- Suppose that $X \sim f_X(x; \mu_X, \sigma_X^2)$, and that $\{X_k\}_{k=1}^{n}$ are iid data collection variables that are to be used to investigate the unknown parameter $\mu_X$. For such an investigation, we will use $\hat{\mu}_X = \overline{X}$.

Case 1: $X \sim N(x; \mu_X, \sigma_X^2)$

$Z = (\hat{\mu}_X - \mu_X)/(\sigma_X/\sqrt{n}) \sim N(0,1)$ for any n, when $\sigma_X^2$ is known.

$T_{n-1} = \dfrac{\hat{\mu}_X - \mu_X}{\hat{\sigma}_X/\sqrt{n}} \sim t_{n-1}$ for any n, when $\sigma_X^2$ is estimated by $\hat{\sigma}_X^2 = \dfrac{1}{n-1}\sum_{k=1}^{n}(X_k - \hat{\mu}_X)^2$.
Case 2: $X \sim f_X(x; \mu_X, \sigma_X^2)$ [i.e. an arbitrary pdf]

For n sufficiently large (i.e. the CLT is a good approximation):

$Z = (\hat{\mu}_X - \mu_X)/(\sigma_X/\sqrt{n}) \sim N(0,1)$ when $\sigma_X^2$ is known.

(Theorem 2.5): $Z = \dfrac{\hat{\mu}_X - \mu_X}{\hat{\sigma}_X/\sqrt{n}} \sim N(0,1)$ when $\sigma_X^2$ is estimated by $\hat{\sigma}_X^2 = \dfrac{1}{n-1}\sum_{k=1}^{n}(X_k - \hat{\mu}_X)^2$.
Remark 2. The authors include a separate section for the CI in relation to the parameter p associated with the true proportion of a population. In terms of random variables, p is simply the parameter associated with $X \sim \mathrm{Bernoulli}(p)$. In view of the fact that $p = \mu_X$, a CI for the proportion is the same as the CI for the mean. Now, recall that the estimator of $p = \mu_X$ is $\hat{p} = \hat{\mu}_X = \overline{X}$. Hence, if the sample size, n, is sufficiently large, then we can invoke the CLT and proceed as in Case 2 above (with $\sigma_X$ unknown).
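A sketch of the resulting large-sample proportion CI (Python for illustration; the plug-in estimate $\hat{\sigma}_X = \sqrt{\hat{p}(1-\hat{p})}$ follows from $\sigma_X^2 = p(1-p)$ for a Bernoulli variable, and the counts in the usage line are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, alpha=0.05):
    """Large-sample (CLT) two-sided CI for p = mu_X, X ~ Bernoulli(p)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    hw = z * sqrt(p_hat * (1 - p_hat) / n)   # sigma_X estimated via p_hat
    return p_hat - hw, p_hat + hw

lo, hi = proportion_ci(40, 100)   # about (0.304, 0.496)
```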
Results when σ² is of Primary Concern- Suppose that $X \sim f_X(x; \mu_X, \sigma_X^2)$, and that $\{X_k\}_{k=1}^{n}$ are iid data collection variables that are to be used to investigate the unknown parameter $\sigma_X^2$. For such an investigation, we will use $\hat{\mu}_X = \overline{X}$, when appropriate.

Case 1: $X \sim N(x; \mu_X, \sigma_X^2)$

$n\hat{\sigma}_X^2/\sigma_X^2 \sim \chi_n^2$ for any n, when $\hat{\sigma}_X^2 = \dfrac{1}{n}\sum_{k=1}^{n}(X_k - \mu_X)^2$ [i.e. $\mu_X$ is known].

$\dfrac{(n-1)\hat{\sigma}_X^2}{\sigma_X^2} \sim \chi_{n-1}^2$ for any n, when $\hat{\sigma}_X^2 = \dfrac{1}{n-1}\sum_{k=1}^{n}(X_k - \hat{\mu}_X)^2$ [i.e. $\mu_X$ is not known].

$F = \dfrac{\hat{\sigma}_{X_1}^2/\sigma_{X_1}^2}{\hat{\sigma}_{X_2}^2/\sigma_{X_2}^2} \sim f_{n_1, n_2}$ when $\hat{\sigma}_{X_j}^2 = \dfrac{1}{n_j}\sum_{k=1}^{n_j}(X_k^{(j)} - \mu_{X_j})^2$; j = 1, 2 [i.e. $\mu_{X_1}$ and $\mu_{X_2}$ are known].

$F = \dfrac{\hat{\sigma}_{X_1}^2/\sigma_{X_1}^2}{\hat{\sigma}_{X_2}^2/\sigma_{X_2}^2} \sim f_{n_1-1,\, n_2-1}$ when $\hat{\sigma}_{X_j}^2 = \dfrac{1}{n_j - 1}\sum_{k=1}^{n_j}(X_k^{(j)} - \hat{\mu}_{X_j})^2$; j = 1, 2 [i.e. $\mu_{X_1}$ and $\mu_{X_2}$ are not known].

Case 2: $X \sim f_X(x; \mu_X, \sigma_X^2)$

For n sufficiently large (i.e. the CLT is a good approximation):

$Z = \dfrac{\hat{\sigma}_X^2 - \sigma_X^2}{\sigma_X^2\sqrt{2/n}} \sim N(0,1)$.
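The second statistic in Case 1, $(n-1)\hat{\sigma}_X^2/\sigma_X^2 \sim \chi_{n-1}^2$, yields a CI for $\sigma_X^2$ in exactly the manner of Steps 3 and 4. A sketch (Python/scipy for illustration; the numbers in the usage line are hypothetical):

```python
from scipy.stats import chi2

def variance_ci(s2, n, alpha=0.05):
    """Two-sided CI for sigma_X^2 from (n-1)*s2/sigma^2 ~ chi2(n-1),
    for the mu-unknown case."""
    q1 = chi2.ppf(alpha / 2, df=n - 1)       # lower chi-square quantile
    q2 = chi2.ppf(1 - alpha / 2, df=n - 1)   # upper chi-square quantile
    # large chi-square values correspond to SMALL sigma^2, so q2 gives C1
    return (n - 1) * s2 / q2, (n - 1) * s2 / q1

lo, hi = variance_ci(16.0, 20)   # hypothetical s2 = 16, n = 20
```

Note the interval is not symmetric about s2, a direct consequence of the skewness of the chi-square pdf.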
Hypothesis Testing
1. Simple Hypothesis Testing
Example 1. Thin-Film Oxygen Uptake (ASTM D-4742)
Ref.
http://oilspecialist.com/motor-oil-tests/comparative-motor-oil-testing.html
http://www.off-road-outdoors.com/Oil-breakdown.html
The Thin-Film Oxygen Uptake Test evaluates the oxidation stability of lubricating oils. A mixture of the test oil and
chemistries found in gasoline engine operation (oxidized/nitrated fuel, soluble metals and distilled water) are placed in a
test vessel, which is pressurized with oxygen and placed in a heated bath. Anti-oxidant breakdown is evident when the
oxygen pressure in the vessel rapidly decreases. At this point, the induction time (break point) of the oil is recorded. As
shown in the graph, AMSOIL Synthetic 10W-30 Motor Oil had the highest induction time of all the tested oils. In fact, it
didn't reach its break point in over 500 minutes of testing.
Figure 1. Sample mean of thin film oxidation uptake time for 11 motor oils.
The graph in Figure 1 reveals a large spread in the suggested mean uptake time for the various oils. Even though the
sample size for each oil is not given in the reference, it is clear that AMSOIL ATM and Mobil One Super Synthetic have
true means that are significantly greater than the others. In this example, we will address some specific comparisons.
Let X=the act of recording the induction time of an oil, and let its mean and standard deviation be denoted as µ and σ,
respectively. We will assume that for each oil, a total of n=10 samples were taken.
(a) Here, we are concerned with the Valvoline (#1) and Valvoline Synthetic (#2) oils. The sample means for these were recorded to be $\hat{\mu}_1 = \overline{x}_1 = 219$ min. and $\hat{\mu}_2 = \overline{x}_2 = 211$ min. We will assume that the sample standard deviations were $\hat{\sigma}_1 = s_1 = 3.7$ min. and $\hat{\sigma}_2 = s_2 = 3.2$ min., respectively. Define the parameter $\theta = \mu_1 - \mu_2$. We are interested in conducting the following hypothesis test:

$H_0: \theta = 0$ versus $H_1: \theta > 0$.

A natural test statistic is $\hat{\theta} = \hat{\mu}_1 - \hat{\mu}_2$. The first order of business is to understand as much as we can about this random variable. To this end, we can express it as:

$\hat{\theta} = \hat{\mu}_1 - \hat{\mu}_2 = \overline{X}_1 - \overline{X}_2$
(A1): We will assume that $\{X_{1k}\}_{k=1}^{10}$ have common mean, $\mu_1$, and that $\{X_{2k}\}_{k=1}^{10}$ have common mean, $\mu_2$. It follows that $E(\hat{\theta}) = \theta = \mu_1 - \mu_2$. And so, $\hat{\theta}$ is unbiased for $\theta$.

(A2): We will assume that $\{X_{1k}\}_{k=1}^{10}$ are mutually independent and have common standard deviation, $\sigma_1$, and that $\{X_{2k}\}_{k=1}^{10}$ are mutually independent and have common standard deviation, $\sigma_2$. It follows that

$\sigma_{\hat{\theta}}^2 = \mathrm{Var}(\hat{\theta}) = \mathrm{Var}(\hat{\mu}_1 - \hat{\mu}_2) = \mathrm{Var}(\hat{\mu}_1) + \mathrm{Var}(\hat{\mu}_2) = (\sigma_1^2/10) + (\sigma_2^2/10) = (\sigma_1^2 + \sigma_2^2)/10$.
(A3): We will assume that all the data collection random variables are normally distributed. It then follows that the standardized random variable

$\dfrac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} \triangleq Z \sim N(0,1)$.

However, since we do not know the true value of $\sigma_{\hat{\theta}}$, we need to use the standardized statistic

$\dfrac{\hat{\theta} - \theta}{\hat{\sigma}_{\hat{\theta}}} \triangleq T \sim t[2(n-1) = 18]$,

where

$\hat{\sigma}_{\hat{\theta}}^2 = (\hat{\sigma}_1^2 + \hat{\sigma}_2^2)/10 = (3.7^2 + 3.2^2)/10 = 2.393$, or $\hat{\sigma}_{\hat{\theta}} = 1.547$.
Suppose that we specify a false alarm probability α = 0.05. Then $\Pr[T_{18} > t_{th}] = .05$ gives $t_{th} = 1.73$. Our estimate is $\hat{\theta} = 219 - 211 = 8$ min. Hence, assuming $H_0$ is true, our standardized test statistic has the value

$t = \dfrac{8 - 0}{1.547} = 5.17$.

Hence, we will reject $H_0$. In fact, the p-value for the test is $1 - \Pr[T_{18} < 5.17] \cong 3 \times 10^{-5}$.

(b) Repeat (a) for oils Castrol SYNTEC (#1) and Valvoline (#2). Assume that $\hat{\sigma}_{CS} = 3.3$.

In this case, we have

$\hat{\sigma}_{\hat{\theta}}^2 = (\hat{\sigma}_{CS}^2 + \hat{\sigma}_2^2)/10 = (3.3^2 + 3.7^2)/10 = 2.458$, or $\hat{\sigma}_{\hat{\theta}} = 1.568$, and $\hat{\theta} = 221 - 219 = 2$ min. Hence,

$t = \dfrac{2 - 0}{1.568} = 1.276$.

So, we will accept $H_0$. The p-value of this test is $1 - \Pr[T_{18} < 1.276] = 1 - 0.891 = 0.109$. And so, we would still accept $H_0$, even for α = 0.1. □
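Parts (a) and (b) can be reproduced as follows (Python/scipy for illustration; scipy.stats.t plays the role of the t-table, and the function name is mine):

```python
from math import sqrt
from scipy.stats import t

def two_sample_t(m1, s1, m2, s2, n, alpha=0.05):
    """One-sided test of H0: mu1 - mu2 = 0 vs H1: mu1 - mu2 > 0, using this
    example's statistic (m1 - m2)/sqrt((s1^2 + s2^2)/n) with 2(n-1) df."""
    df = 2 * (n - 1)
    t_val = (m1 - m2) / sqrt((s1**2 + s2**2) / n)
    t_th = t.ppf(1 - alpha, df=df)       # reject H0 when t_val > t_th
    p_val = 1 - t.cdf(t_val, df=df)
    return t_val, t_th, p_val

ta, th, pa = two_sample_t(219, 3.7, 211, 3.2, 10)   # part (a): t = 5.17, reject
tb, _, pb = two_sample_t(221, 3.3, 219, 3.7, 10)    # part (b): t = 1.28, accept
```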
Example 2.
http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
A person (the subject) is tested for clairvoyance. He is shown the reverse of a randomly chosen playing card 25 times and
asked which suit it belongs to. The number of hits, or correct answers, is called X. As we try to find evidence of his
clairvoyance, for the time being the null hypothesis is that the person is not clairvoyant. The alternative is, of course: the
person is (more or less) clairvoyant. If the null hypothesis is valid, the only thing the test person can do is guess. For every
card, the probability (relative frequency) of guessing correctly is 1/4. If the alternative is valid, the test subject will predict
the suit correctly with probability greater than 1/4. We will call the probability of guessing correctly p. The hypotheses,
then, are:

null hypothesis
(just guessing)

alternative hypothesis
(true clairvoyant).
and
When the test subject correctly predicts all 25 cards, we will consider him clairvoyant, and reject the null hypothesis. Thus also with 24 or 23 hits. With only 5 or 6 hits, on the other hand, there is no cause to consider him so. But what about 12 hits, or 17 hits? What is the critical number, c, of hits, at which point we consider the subject to be clairvoyant? How do we determine the critical value c? It is obvious that with the choice c = 25 (i.e. we only accept clairvoyance when all cards are predicted correctly) we're more critical than with c = 10. In the first case almost no test subjects will be recognized to be clairvoyant; in the second case, some number more will pass the test. In practice, one decides how critical one will be. That is, one decides how often one accepts an error of the first kind - a false positive, or Type I error. With c = 25 the probability of such an error is:

$\Pr[X \ge 25 \mid p = 1/4] = (1/4)^{25} \approx 10^{-15}$,

and hence, very small. The probability of a false positive is the probability of randomly guessing correctly all 25 times.

Being less critical, with c = 10, gives:

$\Pr[X \ge 10 \mid p = 1/4] \approx 0.07$.

Thus, c = 10 yields a much greater probability of false positive.
IN-CLASS QUESTION #1: How was this number arrived at?
ANSWER: ____________________________________________________________________________
Before the test is actually performed, the desired probability of a Type I error is determined. Typically, values in the range of 1% to 5% are selected. Depending on this desired Type I error rate, the critical value c is calculated. For example, if we select an error rate of 1%, c is calculated thus:

$\Pr[X \ge c \mid p = 1/4] \le 0.01$.

From all the numbers c with this property, we choose the smallest, in order to minimize the probability of a Type II error, a false negative. For the above example, we select: c = 13.
IN-CLASS QUESTION #2: How was this number arrived at?
ANSWER: ___________________________________________________________________________
A Little "Twist" in the Study- But what if the subject did not guess any cards at all? Having zero correct answers is clearly an oddity, too. The probability of guessing incorrectly once is equal to p' = (1 - p) = 3/4. Using the same approach, we can calculate that the probability of randomly calling all 25 cards wrong is:

$(3/4)^{25} \approx 0.00075$.

This is highly unlikely (less than 1 in a 1000 chance). While the subject can't guess the cards correctly, dismissing H0 in favour of H1 would be an error. In fact, the result would suggest a trait on the subject's part of avoiding calling the correct card. A test of this could be formulated. For a selected 1% error rate the subject would have to answer correctly at least twice, for us to believe that card calling is based purely on guessing.
IN-CLASS QUESTION #3: How was this number arrived at?
ANSWER: $H_0: p = 3/4$ versus $H_1: p > 3/4$. Our test statistic is $\hat{p} = \frac{1}{25}\sum_{k=1}^{25} X_k$, where $\{X_k\}_{k=1}^{25} \sim$ iid $\mathrm{Ber}(p)$ with $p = \Pr[X = 1] =$ the probability of a wrong answer. We will announce $H_1$ if $\hat{p} > p_{th}$. Our false alarm probability is

$\alpha = \Pr[\hat{p} > p_{th}] = \Pr[B > 25\,p_{th}] \triangleq \Pr[B > b_{th}]$,

where $B \sim \mathrm{Binomial}(n = 25,\ p = 3/4)$. Since $1 - \alpha = .99 = \Pr[B \le b_{th}]$, we obtain $b_{th}$ via the Matlab command binoinv(0.99, 25, .75) = 23; that is, if the subject gets 24 or more incorrect answers, we will assume he/she IS clairvoyant, and is messing with the tester's brain. □
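All three in-class numbers come from one binomial tail computation. A sketch (Python for illustration; math.comb gives exact probabilities, so no normal approximation is involved):

```python
from math import comb

def binom_tail(c, n=25, p=0.25):
    """Pr[X >= c] for X ~ Binomial(n, p): the false-alarm probability of the
    rule 'announce clairvoyance when the number of hits is at least c'."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

p_c25 = binom_tail(25)      # (1/4)**25, about 1e-15
p_c10 = binom_tail(10)      # about 0.07 (Question #1)
c_star = min(c for c in range(26) if binom_tail(c) <= 0.01)   # Question #2
p_none = 0.75**25           # all 25 cards wrong, about 0.00075 (the 'twist')
```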
Example 3. It is desired to know whether pine boards from three different manufacturers have the same mean bending strength, µ. Let $\{X^{(m)}\}_{m=1}^{3}$ denote the bending strengths (psi) of the three brands. The following sample means and standard deviations were obtained:

$\overline{x}_1 = 1001;\ s_1 = 4.0,\qquad \overline{x}_2 = 1003;\ s_2 = 4.1,\qquad \overline{x}_3 = 1005;\ s_3 = 3.9$.

Each group used n = 20 samples.

In order to test whether or not $\mu_1 = \mu_2$, one can define the parameter $\theta = \mu_1 - \mu_2$. Then the hypothesis test is

$H_0: \theta = 0$ versus $H_1: \theta \ne 0$.

The test statistic is $\hat{\theta} = \hat{\mu}_1 - \hat{\mu}_2 = \overline{X}^{(1)} - \overline{X}^{(2)}$. It has mean $E(\hat{\theta}) = \mu_1 - \mu_2$, which, under $H_0$, equals zero. It has variance $\mathrm{Var}(\hat{\theta}) = \mathrm{Var}(\overline{X}^{(1)} - \overline{X}^{(2)}) = \mathrm{Var}(\overline{X}^{(1)}) + \mathrm{Var}(\overline{X}^{(2)}) = \sigma_1^2/n + \sigma_2^2/n$. In view of the test results, we will assume that $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 \cong 4^2 = 16$.
(a) (10 pts) For false alarm probability α = 0.1, determine which hypothesis you will announce.

Solution: Since it is assumed that the variances are known, the standardized test statistic is: $(\hat{\theta} - \theta)/\sqrt{2\sigma^2/n} = Z \sim N(0,1)$. Since norminv(.05,0,1) = z1 = -1.645, the acceptance region is [-1.645, 1.645]. Assuming that $H_0$ is true, the value of this test statistic is $z = (1001 - 1003)/\sqrt{2(16)/20} = -1.58$. Hence, we should announce that $H_0$ is true at false alarm level 0.1.



(b) (5 pts) Repeat (a) but in relation to $\theta = \mu_2 - \mu_3$ and $\hat{\theta} = \overline{X}^{(2)} - \overline{X}^{(3)}$.

Solution: This statistic has exactly the same form as in (a), and even has the same z-value. Hence, announce $H_0$.
(c) (5 pts) Repeat (a), but in relation to $\theta = \mu_1 - \mu_3$ and $\hat{\theta} = \overline{X}^{(1)} - \overline{X}^{(3)}$.

Solution: Again, this has the same form. But now, $z = (1001 - 1005)/\sqrt{2(16)/20} = -3.16$. And so we reject $H_0$.

(d) (5 pts) You should have observed a contradiction in parts (a)-(c); that is: you think $\mu_1 = \mu_2$ and $\mu_2 = \mu_3$, but $\mu_1 \ne \mu_3$. The solution to this is to use ANOVA to test the hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$ against the alternative, which is simply that the null is not true. Use ANOVA to decide whether you will announce this null hypothesis.
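For part (d), the F statistic of the next section can be computed directly from the summary statistics (Python/scipy for illustration; MSE is set to 16 per the example's assumption that all three variances equal 4² = 16):

```python
from scipy.stats import f

means, n, m = [1001.0, 1003.0, 1005.0], 20, 3   # sample means, group size, groups
grand = sum(means) / m                          # grand mean = 1003
ms_tr = n * sum((x - grand) ** 2 for x in means) / (m - 1)   # MS(Tr) = 20*8/2 = 80
mse = 16.0                                      # assumed common variance
F = ms_tr / mse                                 # 80/16 = 5.0
f_crit = f.ppf(1 - 0.1, m - 1, m * (n - 1))     # alpha = 0.1 threshold, F(2, 57)
reject = F > f_crit                             # True: announce H1
```

So ANOVA resolves the contradiction: taken jointly, the three means are not plausibly equal.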
Analysis of Variance (ANOVA)

Consider the random vector $\mathbf{X} = [X_1, \ldots, X_m]^{tr} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. Express the mean as

$\boldsymbol{\mu} = \mu\mathbf{1} + \boldsymbol{\tau} = \mu\mathbf{1} + [\tau_1, \ldots, \tau_m]^{tr}$.    (1)

Definition 1. In (1), the constant μ is referred to as the grand mean, and the elements of $\boldsymbol{\tau}$ are referred to as the treatment effects.

PROBLEM 1. Test $H_0: \boldsymbol{\tau} = \mathbf{0}$ versus $H_1: \boldsymbol{\tau} \ne \mathbf{0}$.

GOAL: The goal of ANOVA is to decide whether or not all of the means in (1) are equal.

The decision rule for this hypothesis test is not easy to construct, unless we make a number of simplifying assumptions (that may or may not, in fact, be true). The first is:

Assumption (A1): $\boldsymbol{\Sigma} = \sigma^2\mathbf{I}$.

It follows that we can express $\mathbf{X}$ as:

$\mathbf{X} = \boldsymbol{\mu} + \boldsymbol{\epsilon}$, where $E(\boldsymbol{\epsilon}) = \mathbf{0}$ and $\mathrm{Cov}(\boldsymbol{\epsilon}) = \sigma^2\mathbf{I}$.    (2)
(2).
The method known as ANOVA (Analysis Of VAriance) is a standard method for solving PROBLEM 1. The reason for choosing this method is simple. Consider the following assumption:

Assumption (A2): $\{\mathbf{X}_k\}_{k=1}^{n} \sim$ iid $f_{\mathbf{X}}(\mathbf{x})$.

Definition 2. For $\mathbf{x} \in \mathbb{R}^m$, $\|\mathbf{x}\|^2 = \mathbf{x}^{tr}\mathbf{x} = \sum_{k=1}^{m} x_k^2$.
THEOREM 1:

$E\|\mathbf{X} - \mu\mathbf{1}\|^2 = E\|\boldsymbol{\epsilon}\|^2 + \|\boldsymbol{\tau}\|^2$    (3)

Proof: $E\|\mathbf{X} - \mu\mathbf{1}\|^2 = E\|\mathbf{X} - (\mu\mathbf{1} + \boldsymbol{\tau}) + \boldsymbol{\tau}\|^2 = E\|\boldsymbol{\epsilon} + \boldsymbol{\tau}\|^2 = E[(\boldsymbol{\epsilon} + \boldsymbol{\tau})^{tr}(\boldsymbol{\epsilon} + \boldsymbol{\tau})] = E\|\boldsymbol{\epsilon}\|^2 + 2E(\boldsymbol{\epsilon})^{tr}\boldsymbol{\tau} + \|\boldsymbol{\tau}\|^2 = E\|\boldsymbol{\epsilon}\|^2 + \|\boldsymbol{\tau}\|^2$. ▄
Definition 3. The term on the left side of (3) is called the Total Mean Squared Error (TMSE). The leftmost term on the
right side of (3) is called the Total Error Mean Squared Error (TEMSE). And the rightmost term on the right side of (3)
is called the Total Treatment Squared Error (TTSE).
It is now a simple task to obtain the form of Theorem 15-1 (p.489). Consider the collection of random variables $\{\mathbf{X}_j\}_{j=1}^{n} \sim$ iid $f_{\mathbf{X}}(\mathbf{x})$. Use this collection, and replace expected values by their approximants, namely averages. In the equation $E\|\mathbf{X} - \mu\mathbf{1}\|^2 = E\|\boldsymbol{\epsilon}\|^2 + \|\boldsymbol{\tau}\|^2$ we obtain the following approximations:

$\hat{\mu} = \overline{x}$;  $\hat{\tau}_i = \overline{x}_i - \overline{x}$;  hence, for fixed i: $\hat{\epsilon}_{ij} = x_{ij} - \overline{x}_i$. Thus,

$\dfrac{1}{n}\sum_{j=1}^{n}(\mathbf{x}_j - \overline{x}\mathbf{1})^{tr}(\mathbf{x}_j - \overline{x}\mathbf{1}) = \dfrac{1}{n}\sum_{j=1}^{n}\hat{\boldsymbol{\epsilon}}_j^{\,tr}\hat{\boldsymbol{\epsilon}}_j + \hat{\boldsymbol{\tau}}^{tr}\hat{\boldsymbol{\tau}}$, or
THEOREM 2:

$\sum_{i=1}^{m}\sum_{j=1}^{n}(x_{ij} - \overline{x})^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}(x_{ij} - \overline{x}_i)^2 + n\sum_{i=1}^{m}(\overline{x}_i - \overline{x})^2$.    (4)
Remark. Replacing quantities by their moment estimators does not guarantee the equality in (4). This equality holds in
this particular instance, as is proven in the book.
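The identity (4) is exact, not approximate, and is easy to verify numerically (Python for illustration; the data are arbitrary random draws):

```python
import random

random.seed(1)
m, n = 4, 6                                     # m treatments, n observations each
x = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]

row_mean = [sum(row) / n for row in x]
grand = sum(sum(row) for row in x) / (m * n)

sst = sum((v - grand) ** 2 for row in x for v in row)
sse = sum((x[i][j] - row_mean[i]) ** 2 for i in range(m) for j in range(n))
sstr = n * sum((rm - grand) ** 2 for rm in row_mean)
# SST = SSE + SS(Tr) holds to machine precision for any data
```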
The reason for presenting THEOREM 1 was to offer twofold motivation. First, it offered clear insight into the variability
associated with the TMSE. Second, it provided the motivation to use moment estimators to arrive at THEOREM 2. In this
way, we are guided to use the test statistic on the left side of (4) in relation to PROBLEM 1. Before we address the
distribution of this statistic [of course, this requires x’s to be replaced by X’s in (4)], it is appropriate to take advantage of
the simplicity of (3) in order to gain some insight into the behavior of this hypothesis testing approach.
Properties of (3) in Relation to PROBLEM 1:

(P1) Assuming $H_0$ is true, then the TTSE is zero, and (3) is an identity.

(P2) Assuming $H_0$ is false, then the TMSE must be greater than if it were true. This observation leads immediately to the form of the decision rule: If the test statistic exceeds a given threshold, we announce $H_1$.

(P3) Since the TTSE is the squared norm of $\boldsymbol{\tau}$, there is no way to incorporate any prior information about this parameter into the test. For example, the following two extremely different $\boldsymbol{\tau}$ structures would yield similar test results: $\boldsymbol{\tau} = [1, 1, 1, 1]^{tr}$; $\boldsymbol{\tau} = [2, 0, 0, 0]^{tr}$.

The insight (P3) provided by (3) would suggest that, if one did in fact have some prior knowledge of how the components of $\boldsymbol{\tau}$ were distributed, then this knowledge could be used to assign a prior pdf for it. This is, in a sense, the essence of Bayesian Estimation Theory.
Development of the Most Appropriate Test Statistic for PROBLEM 1

We alluded to a test statistic in (P2) above. Here, we develop the standard test statistic associated with PROBLEM 1. To this end, we express (4) in its random variable form:

$\sum_{i=1}^{m}\sum_{j=1}^{n}(X_{ij} - \overline{X}_{\bullet\bullet})^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}(X_{ij} - \overline{X}_{i\bullet})^2 + n\sum_{i=1}^{m}(\overline{X}_{i\bullet} - \overline{X}_{\bullet\bullet})^2$

SST = SSE + SS(Tr)    (5)
where
SST = Sum of Squares- Total
SSE = Sum of Squares- Error
SS(Tr) = Sum of Squares- Treatment
Assumption: $H_0$ is true.

Note that for each i, j we have $X_{ij} \sim N(\mu, \sigma^2)$, hence $(X_{ij} - \mu)^2/\sigma^2 \sim \chi_1^2$. Also, $\overline{X}_{i\bullet} \sim N(\mu, \sigma^2/n)$, hence $(\overline{X}_{i\bullet} - \mu)^2/(\sigma^2/n) \sim \chi_1^2$. Thus, we obtain the following:

SST: $\dfrac{1}{\sigma^2}\sum_{i=1}^{m}\sum_{j=1}^{n}(X_{ij} - \mu)^2 \sim \chi_{mn}^2$

SSE: $\dfrac{1}{\sigma^2}\sum_{i=1}^{m}\sum_{j=1}^{n}(X_{ij} - \mu_i)^2 \sim \chi_{mn}^2$

SS(Tr): $\dfrac{n}{\sigma^2}\sum_{i=1}^{m}(\overline{X}_{i\bullet} - \mu)^2 \sim \chi_m^2$

Replacing each unknown mean by its estimator costs one degree of freedom per mean, and so for the statistics in (5):

$\dfrac{1}{\sigma^2}\,SST \sim \chi_{mn-1}^2$    (6a)

$\dfrac{1}{\sigma^2}\,SSE \sim \chi_{m(n-1)}^2$    (6b)

$\dfrac{1}{\sigma^2}\,SS(Tr) \sim \chi_{m-1}^2$    (6c)
(6c)
Remark. Recall that, if random variables $U \sim \chi_u^2$ and $V \sim \chi_v^2$ are independent, then $U + V \sim \chi_{u+v}^2$. Even though the distribution (6a) happens to correspond to the distribution of (6b) + (6c), this may not imply that (6b) and (6c) are independent, since the above relation is not if and only if.
There are a variety of decision rules that can be constructed using the statistics in (6). The most common derived test statistic is based on the following result:

Result 1. If $U \sim \chi_u^2$ and $V \sim \chi_v^2$ are independent, then $\dfrac{U/u}{V/v} \sim F(u, v)$.

From this result, we obtain the most commonly used test statistic related to PROBLEM 1:

$F = \dfrac{SS(Tr)/(m-1)}{SSE/(m(n-1))} = \dfrac{MS(Tr)}{MSE} \sim F[m-1,\ m(n-1)]$.    (7)

We now repeat PROBLEM 1, and provide the solution to it:

PROBLEM 1. Consider the random vector $\mathbf{X} = [X_1, \ldots, X_m]^{tr} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. Express the mean as $\boldsymbol{\mu} = \mu\mathbf{1} + \boldsymbol{\tau} = \mu\mathbf{1} + [\tau_1, \ldots, \tau_m]^{tr}$. Then to conduct the test of

$H_0: \boldsymbol{\tau} = \mathbf{0}$ versus $H_1: \boldsymbol{\tau} \ne \mathbf{0}$

we use (7) as our test statistic, along with the corresponding

Decision Rule: If $F > f_{m-1,\,m(n-1)}(1-\alpha)$ we announce $H_1$ with false alarm probability α. ▄
APPENDIX: Application of the above CI results to (X, Y)

For a 2-D random variable, (X, Y), suppose that X and Y are independent. Then we can define the random variable W = X + Y. Regardless of the independence assumption, we have $\mu_W = \mu_X + \mu_Y$. In view of that assumption, we have $\sigma_W^2 = \sigma_X^2 + \sigma_Y^2$. Hence, we are now in the setting of a 1-D random variable, W, and so many of the above results apply. Even if we drop the independence assumption, we can still apply them. The only difference is that in this situation $\sigma_W^2 = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY}$. Hence, we need to realize that a larger value of n may be needed to achieve an acceptable level of uncertainty in relation to, for example, $\hat{\mu}_W = \overline{W}$. Specifically, if we assume the data collection variables $\{W_k\}_{k=1}^{n}$ are iid, then

$\sigma_{\hat{\mu}_W}^2 = \sigma_W^2/n = (\sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY})/n$.
The perhaps more interesting situation is where we are concerned with either $\sigma_{XY}$ or $\rho_{XY} = \sigma_{XY}/(\sigma_X\sigma_Y)$. Of these two parameters, $\sigma_{XY}$ is the more tractable one to address using the above results. To see this, consider the estimator:

$\hat{\sigma}_{XY} = \dfrac{1}{n}\sum_{k=1}^{n}(X_k - \mu_X)(Y_k - \mu_Y)$.
Define the random variable $W = (X - \mu_X)(Y - \mu_Y)$. Then we have

$\hat{\sigma}_{XY} = \dfrac{1}{n}\sum_{k=1}^{n}(X_k - \mu_X)(Y_k - \mu_Y) = \dfrac{1}{n}\sum_{k=1}^{n} W_k = \overline{W} = \hat{\mu}_W$.

And so, we are now in the setting where we are concerned with $\mu_W = \sigma_{XY}$. In relation to $\hat{\mu}_W = \hat{\sigma}_{XY}$, we have:

$E(\hat{\mu}_W) = E(\hat{\sigma}_{XY}) = E\!\left[\dfrac{1}{n}\sum_{k=1}^{n}(X_k - \mu_X)(Y_k - \mu_Y)\right] = \dfrac{1}{n}\sum_{k=1}^{n} E[(X_k - \mu_X)(Y_k - \mu_Y)] = \dfrac{1}{n}\sum_{k=1}^{n}\sigma_{XY} = \sigma_{XY} = \mu_W$.

We also have $\mathrm{Var}(\overline{W}) = \mathrm{Var}(W)/n$. The trick now is to obtain an expression for $\mathrm{Var}(W)$. To this end, we begin with:

$\mathrm{Var}(W) = E[(W - \mu_W)^2] = E(W^2) - \mu_W^2 = E[\{(X - \mu_X)(Y - \mu_Y)\}^2] - \sigma_{XY}^2$.

This can be written as:

$\mathrm{Var}(W) = \sigma_X^2\sigma_Y^2\,E\!\left[\left(\dfrac{X - \mu_X}{\sigma_X}\right)^2\!\left(\dfrac{Y - \mu_Y}{\sigma_Y}\right)^2\right] - \sigma_{XY}^2 = \sigma_X^2\sigma_Y^2\{E[(Z_1 Z_2)^2] - \mu_{Z_1 Z_2}^2\} = \sigma_X^2\sigma_Y^2\,\mathrm{Var}(Z_1 Z_2)$.

And so, we are led to ask the

Question: Suppose that $Z_1 \sim N(0,1)$ and $Z_2 \sim N(0,1)$ have $E(Z_1 Z_2) = \rho$. Then what is the variance of $W = Z_1 Z_2$?
Answer: As noted above, the mean of W is $E(Z_1 Z_2) = \rho$. Before getting too involved in the mathematics, let's run some simulations. The following plot is for $\rho = 0.5$:

Figure 3.1 Plot of the simulation-based estimate of $f_W(w)$ for $\rho = 0.5$. The simulations also resulted in $\hat{\mu}_W = 0.4767$ and $\hat{\sigma}_W^2 = 1.1716$.

The Matlab code is given below.
% PROGRAM name: z1z2.m
% This code uses simulations to investigate the pdf of W = Z1*Z2
% where Z1 & Z2 are unit normal r.v.s with cov= r;
nsim = 10000;
r=0.5;
mu = [0 0]; Sigma = [1 r; r 1];
Z = mvnrnd(mu, Sigma, nsim);
plot(Z(:,1),Z(:,2),'.');
pause
W = Z(:,1).*Z(:,2);
Wmax = max(W); Wmin=min(W);
db = (Wmax - Wmin)/50;
bctr = Wmin + db/2:db:Wmax-db/2;
fw = hist(W,bctr);
fw = (nsim*db)^-1 * fw;
bar(bctr,fw)
title('Simulation-based pdf for W with r = 0.5')
Conclusion: The simulations revealed a number of interesting things. For one, the mean was $\hat{\mu}_W = 0.4767$. One would have thought that for 10,000 simulations it would have been closer to the true mean 0.5. Also, and not unrelated to this, is the fact that $f_W(w)$ has very long tails. Based on this simple simulation-based analysis, one might be thankful that the mathematical pursuit was not readily undertaken, as it would appear that it will be a formidable undertaking!