Download SDC 3 Appendix: Background information and formulas for the p

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Foundations of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Confidence interval wikipedia , lookup

Degrees of freedom (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

German tank problem wikipedia , lookup

Gibbs sampling wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
SDC 3 Appendix: Background information and formulas for the p-values and
confidence interval for the problem of comparing two means.
Brief summary: The purpose of this technical appendix is to present the detailed formulas
used to compute the p-values and the confidence interval for the problem of comparing two
means. The derivation of the formulas relating pb and ph to p-values and the derivation of
the confidence interval are also presented..
The model we assume is that Y11 …Y1n1 are independent and normally distributed with mean
1 and variance  12 and, independently, Y21 , Y2n are independent and normally distributed
2
2
with mean  2 and variance  2 . The problem is to make inferences about the difference in
means 2  1 .
The sample mean of sample i  1 2 is Y i  ni1  ji1 Yij (presented in L62 and L93), and the
n
sample variance of sample i  1 2 is si2  (ni  1) 1  ji1 (Yij  Y i )2 , (the sample standard
n
deviation si is presented in L63 and L94). The standard error is
SE Y 2  Y 1  s12  n1  s22  n2 (presented in L64 and L95) and the approximate degrees of


freedom is   SE Y 2  Y 1   s12  n1    n1  1   s22  n2    n2  1 (presented in L65 and
4
2
2
L96). Under the assumed model, the sampling distribution of
Y
2
 Y 1   2  1   SE Y 2  Y 1 is approximated by the Student t distribution with 
degrees of freedom. In the equal variance case (  12   22 ), the sampling distribution of
Y
2
 Y 1   2  1   SE Y 2  Y 1 where
SE Y 2  Y 1 
 n 1 s   n 1 s    n  n
2
1
1
2
2
2
1
2
 2 1  n1  1  n2 , is exactly the Student t
distribution with   n1  n2  2 degrees of freedom.
Let G be the distribution function of the Student t distribution with  degrees of freedom
and let T denote a random variable with this distribution. The Student t distribution is
symmetric so T and T have the same distribution. The one-sided p-value for testing the
null hypothesis that 2  1   against the alternative that 2  1   is

Pr T   Y2  Y1    / SE  Y2  Y1  2  1  


 Pr   T     Y  Y  / SE  Y  Y  

 G    Y  Y  / SE  Y  Y  



 Pr T      Y2  Y1  / SE  Y2  Y1  2  1   



2

2
1
1
2
2
1
2
 1   

1
 ph
given by [2]. Thus ph can be interpreted as the one-sided p-value for testing the null
hypothesis that 2  1   against the alternative that 2  1   (i.e. the probability
under the null hypothesis 2  1   that we observe a t-ratio greater than or equal to the
observed value of the t-ratio Y 2  Y 1    / SE Y 2  Y 1 ). Similarly, pb can be interpreted as
the one-sided p-value for testing the null hypothesis that 2  1   against the alternative
that 2  1   . The derivation is

Pr Tv   Y2  Y1    / SE  Y2  Y1  2  1  


 Pr   T     Y  Y  / SE  Y  Y  

 1  G    Y  Y  / SE  Y  Y  



 Pr Tv      Y2  Y1  / SE  Y2  Y1  2  1   


v
v
 pb .
2
2
1
1
2
2
1
1
2
 1   

The confidence interval [1] for 2  1 is obtained as
1    Pr  t  Y 2  Y 1   2  1   SE Y 2  Y 1  t 
 Pr Y 2  Y 1  t SE Y 2  Y 1  2  1  Y 2  Y 1  t SE Y 2  Y 1  .
The critical value t  G 1 1   / 2 , where G 1 is the inverse of the distribution function
of the Student t distribution.