Statistics
The difference of two means
April 21, 2008

Outline

Populations $X, Y$ $\Longrightarrow$ samples $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$
Parameter $\mu_X - \mu_Y$ $\Longleftarrow$ statistic $\bar X - \bar Y$

1. Assumptions: $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ are independent random samples from populations that have normal distributions with unknown means $\mu_X$, $\mu_Y$ and unknown variances.
   (a) As in the last section, we treat data from a randomized comparative experiment with two treatments (the X's and Y's being the responses under the two treatments) the same as samples from two independent populations.
   (b) Unlike the last section, when we sample from a single population and then use a categorical variable to classify each observation as an X or a Y, we will also analyze the data as samples from two independent populations.

2. The key fact is this: under these assumptions,
$$\frac{\bar X - \bar Y - (\mu_X - \mu_Y)}{\sqrt{\dfrac{\sigma_X^2}{m} + \dfrac{\sigma_Y^2}{n}}} \sim \mathrm{Norm}(0, 1).$$
This follows because means add, so $E(\bar X - \bar Y) = \mu_X - \mu_Y$; variances of independent random variables add, so $\mathrm{Var}(\bar X - \bar Y) = \sigma_X^2/m + \sigma_Y^2/n$; and a sum (or difference) of independent normal random variables is normal.

3. We replace $\sigma_X$ and $\sigma_Y$ by the corresponding sample standard deviations $S_X$ and $S_Y$. The resulting random variable
$$\frac{\bar X - \bar Y - (\mu_X - \mu_Y)}{\sqrt{\dfrac{S_X^2}{m} + \dfrac{S_Y^2}{n}}}$$
has approximately a t-distribution with $\nu$ degrees of freedom, where
$$\nu = \frac{\left(\dfrac{S_X^2}{m} + \dfrac{S_Y^2}{n}\right)^2}{\dfrac{(S_X^2/m)^2}{m-1} + \dfrac{(S_Y^2/n)^2}{n-1}}.$$

4. Insert long story here about "old-fashioned" practice and the Behrens-Fisher problem.

5. Confidence interval for $\mu_X - \mu_Y$ (level $1 - \alpha$):
$$\bar x - \bar y \pm t^* \sqrt{\frac{s_X^2}{m} + \frac{s_Y^2}{n}}, \qquad t^* = t_{\alpha/2,\nu}.$$

6. Robustness.

Homework (due Thursday, April 24, 2008)

1. Read Section 7.2.
2. Do problems 7.4, 7.5, and 7.6.

Useful R

    > iris2=subset(iris,Species!='virginica')
    > t.test(Sepal.Length~Species,data=iris2)

            Welch Two Sample t-test

    data:  Sepal.Length by Species
    t = -10.521, df = 86.538, p-value < 2.2e-16
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -1.1057074 -0.7542926
    sample estimates:
        mean in group setosa mean in group versicolor
                       5.006                    5.936
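The key fact in item 2 can be checked numerically. The following is a minimal sketch in base R, not part of the original notes: the sample sizes and population parameters (`m`, `n`, `mu_x`, `mu_y`, `sigma_x`, `sigma_y`) are arbitrary, hypothetical values chosen only for illustration. It repeatedly draws two independent normal samples, standardizes $\bar X - \bar Y$ using the true $\sigma_X$ and $\sigma_Y$, and checks that the results behave like draws from Norm(0, 1).

```r
# Simulation sketch of item 2: with known sigmas, the standardized
# difference of sample means should follow Norm(0, 1).
set.seed(2008)

# Hypothetical population parameters and sample sizes (illustration only)
m <- 12; n <- 20
mu_x <- 5.0; mu_y <- 5.9
sigma_x <- 0.35; sigma_y <- 0.5

reps <- 10000
z <- replicate(reps, {
  x <- rnorm(m, mean = mu_x, sd = sigma_x)
  y <- rnorm(n, mean = mu_y, sd = sigma_y)
  # Standardize using the *population* standard deviations
  (mean(x) - mean(y) - (mu_x - mu_y)) / sqrt(sigma_x^2 / m + sigma_y^2 / n)
})

# Sample mean and standard deviation of z should be close to 0 and 1
mean_z <- mean(z)
sd_z <- sd(z)
print(c(mean = mean_z, sd = sd_z))

# About 5% of the |z| values should exceed the Norm(0, 1) 97.5% quantile
tail_frac <- mean(abs(z) > qnorm(0.975))
print(tail_frac)
```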
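The statistic and the degrees of freedom $\nu$ from item 3 can be reproduced by hand for the iris example in Useful R. This sketch uses only base R (`subset`, `var`, `pt`, `t.test`); the variable names `x`, `y`, `vx`, `vy`, `t_stat`, `nu` are illustrative choices, not from the notes. Under $H_0: \mu_X - \mu_Y = 0$, the hand-computed values should agree with the `t = -10.521, df = 86.538` reported by `t.test()`, which runs the Welch test by default (`var.equal = FALSE`).

```r
# Welch statistic (item 3) and its degrees of freedom computed by hand
# for the iris example, then compared with R's built-in t.test().
iris2 <- subset(iris, Species != "virginica")
x <- iris2$Sepal.Length[iris2$Species == "setosa"]      # X sample
y <- iris2$Sepal.Length[iris2$Species == "versicolor"]  # Y sample
m <- length(x); n <- length(y)

vx <- var(x) / m   # S_X^2 / m
vy <- var(y) / n   # S_Y^2 / n

# Test statistic for H0: mu_X - mu_Y = 0
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite degrees of freedom
nu <- (vx + vy)^2 / (vx^2 / (m - 1) + vy^2 / (n - 1))

# Two-sided p-value from the t-distribution with nu degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = nu)

builtin <- t.test(Sepal.Length ~ Species, data = iris2)
print(c(t = t_stat, df = nu, p = p_value))
print(c(t = unname(builtin$statistic), df = unname(builtin$parameter)))
```

Because `t.test()` orders the groups by factor level, the difference it tests is setosa minus versicolor, which is why the statistic is negative.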
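The confidence interval in item 5 can likewise be computed directly for the iris example in Useful R. In this base-R sketch (variable names illustrative), $t^* = t_{\alpha/2,\nu}$ is taken to be the upper $\alpha/2$ quantile of the t-distribution with the Welch degrees of freedom, obtained with `qt(1 - alpha / 2, df = nu)`. With `alpha = 0.05` it should reproduce the 95 percent interval printed by `t.test()`.

```r
# Item 5 by hand: level (1 - alpha) confidence interval for mu_X - mu_Y
iris2 <- subset(iris, Species != "virginica")
x <- iris2$Sepal.Length[iris2$Species == "setosa"]
y <- iris2$Sepal.Length[iris2$Species == "versicolor"]
m <- length(x); n <- length(y)

alpha <- 0.05
se <- sqrt(var(x) / m + var(y) / n)   # estimated standard error of xbar - ybar
# Welch-Satterthwaite degrees of freedom (se^4 = (S_X^2/m + S_Y^2/n)^2)
nu <- se^4 / ((var(x) / m)^2 / (m - 1) + (var(y) / n)^2 / (n - 1))
t_star <- qt(1 - alpha / 2, df = nu)  # upper alpha/2 critical value
ci <- (mean(x) - mean(y)) + c(-1, 1) * t_star * se
print(ci)
```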