Statistics
The difference of two means
April 21, 2008
Outline
Populations X, Y        =⇒    Samples X1, ..., Xm, Y1, ..., Yn
      ⇑                                  ⇓
Parameter µX − µY       ⇐=    Statistic X̄ − Ȳ
1. Assumptions: X1, ..., Xm and Y1, ..., Yn are independent random samples from populations that have a normal
distribution with unknown means µX, µY and unknown variances.
(a) As in the last section, we also consider the case in which the X’s and Y’s result from a randomized
comparative experiment with two treatments, and we treat it the same as sampling from two independent populations.
(b) Unlike the last section, when we sample from one population and then use a categorical variable
to classify observations as X’s or Y’s, we will also analyze the data as sampling from two independent
populations.
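Case (b) can be illustrated with R's built-in iris data, where one measurement column is divided into X's and Y's by a categorical variable. This is only a sketch: the choice of Sepal.Length and of the two species is an assumption made for illustration, and the names groups, x and y are hypothetical.

```r
# One sample of flowers, with Species as the categorical variable.
# Keep two of the three species so that only two groups remain.
iris2 <- subset(iris, Species != "virginica")
iris2$Species <- droplevels(iris2$Species)   # drop the now-empty "virginica" level

# Split the single measurement column into the X's and the Y's.
groups <- split(iris2$Sepal.Length, iris2$Species)
x <- groups$setosa        # X1, ..., Xm
y <- groups$versicolor    # Y1, ..., Yn

c(m = length(x), n = length(y))
```

From here on, x and y are treated as if they were samples from two independent populations.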
2. The key fact is this: under these assumptions,
$$\frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\dfrac{\sigma_X^2}{m} + \dfrac{\sigma_Y^2}{n}}} \sim \mathrm{Norm}(0, 1).$$
This follows from three facts: means add, variances of independent random variables add (so X̄ − Ȳ has mean
µX − µY and variance σX²/m + σY²/n), and the sum of independent normal random variables is normal.
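A quick simulation can make the key fact plausible. The sketch below draws many pairs of normal samples, standardizes X̄ − Ȳ with the known σ's, and checks that the result has mean near 0 and standard deviation near 1. All numerical values (sample sizes, means, standard deviations, seed, number of replications) are illustrative assumptions, not values from the lecture.

```r
set.seed(1)                         # arbitrary seed, for reproducibility only
m <- 10; n <- 15                    # assumed sample sizes
mu_x <- 5; mu_y <- 3                # assumed population means
sigma_x <- 2; sigma_y <- 1          # assumed (known) population standard deviations

z <- replicate(10000, {
  x <- rnorm(m, mean = mu_x, sd = sigma_x)
  y <- rnorm(n, mean = mu_y, sd = sigma_y)
  # Standardize the difference of sample means using the true variances.
  (mean(x) - mean(y) - (mu_x - mu_y)) / sqrt(sigma_x^2 / m + sigma_y^2 / n)
})

c(mean = mean(z), sd = sd(z))       # should be close to 0 and 1
```

Calling qqnorm(z) afterwards gives a visual check of normality.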
3. We replace σX and σY by the corresponding sample standard deviations to get that this random variable
$$\frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\dfrac{S_X^2}{m} + \dfrac{S_Y^2}{n}}}$$
has approximately a t-distribution with ν degrees of freedom where ν is
$$\nu = \frac{\left(\dfrac{S_X^2}{m} + \dfrac{S_Y^2}{n}\right)^2}{\dfrac{(S_X^2/m)^2}{m-1} + \dfrac{(S_Y^2/n)^2}{n-1}}$$
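The formula for ν is easier to trust once it has been computed by hand. The sketch below evaluates the t statistic (under the hypothesis µX − µY = 0) and ν for two species in R's built-in iris data, then compares them with what t.test reports. The variable names (vx, vy, t_stat, nu) are hypothetical, and the data set is simply a convenient choice.

```r
x <- iris$Sepal.Length[iris$Species == "setosa"]
y <- iris$Sepal.Length[iris$Species == "versicolor"]
m <- length(x); n <- length(y)

vx <- var(x) / m                    # S_X^2 / m
vy <- var(y) / n                    # S_Y^2 / n

# t statistic with mu_X - mu_Y taken to be 0.
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Approximate degrees of freedom from the formula for nu.
nu <- (vx + vy)^2 / (vx^2 / (m - 1) + vy^2 / (n - 1))

tt <- t.test(x, y)                  # R's default is the unequal-variance (Welch) test
rbind(by_hand = c(t = t_stat, df = nu),
      t.test  = c(tt$statistic, tt$parameter))
```

Note that ν is generally not an integer; R uses the fractional value directly.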
4. Insert long story here about “old-fashioned” practice and the Behrens-Fisher problem.
5. Confidence intervals for µX − µY:
$$\bar{x} - \bar{y} \pm t^* \sqrt{\frac{s_X^2}{m} + \frac{s_Y^2}{n}}, \qquad t^* = t_{\alpha/2,\nu}$$
6. Robustness.
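Robustness can be explored by simulation: draw samples from populations that are clearly not normal, form the 95% interval each time, and count how often it contains the true difference of means. The sketch below uses exponential populations; the distributions, sample sizes, seed and number of replications are all assumptions chosen for illustration, not part of the lecture.

```r
set.seed(2008)                      # arbitrary seed, for reproducibility only
m <- 30; n <- 30                    # assumed sample sizes
mu_x <- 1; mu_y <- 2                # means of the two exponential populations
reps <- 2000

covered <- replicate(reps, {
  x <- rexp(m, rate = 1 / mu_x)     # skewed, non-normal population
  y <- rexp(n, rate = 1 / mu_y)
  ci <- t.test(x, y)$conf.int       # 95% interval for mu_X - mu_Y
  ci[1] <= mu_x - mu_y && mu_x - mu_y <= ci[2]
})

coverage <- mean(covered)           # proportion of intervals containing mu_X - mu_Y
coverage
```

If the procedure is robust, the observed coverage should not fall far below the nominal 0.95. Rerunning with smaller m and n shows how the approximation degrades.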
Homework - due Thursday, April 24, 2008
1. Read Section 7.2.
2. Do problems 7.4,5,6.
Useful R
> iris2 = subset(iris, Species != 'virginica')
> t.test(Sepal.Length ~ Species, data = iris2)
Welch Two Sample t-test
data: Sepal.Length by Species
t = -10.521, df = 86.538, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.1057074 -0.7542926
sample estimates:
    mean in group setosa mean in group versicolor
                   5.006                    5.936
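The confidence interval in this output can be rebuilt from its ingredients, which connects the R output to the interval formula. The sketch below takes ν from the t.test result and computes everything else from per-group summaries. The names sp, xbar, s2, size, se, t_star and ci are hypothetical.

```r
iris2 <- subset(iris, Species != "virginica")
tt <- t.test(Sepal.Length ~ Species, data = iris2)

# Per-group summaries; droplevels() removes the empty "virginica" group.
sp   <- droplevels(iris2$Species)
xbar <- tapply(iris2$Sepal.Length, sp, mean)     # sample means
s2   <- tapply(iris2$Sepal.Length, sp, var)      # sample variances
size <- tapply(iris2$Sepal.Length, sp, length)   # m and n

se     <- sqrt(sum(s2 / size))                   # sqrt(s_X^2/m + s_Y^2/n)
t_star <- qt(0.975, df = tt$parameter)           # t* for a 95% interval
ci <- unname(xbar["setosa"] - xbar["versicolor"]) + c(-1, 1) * t_star * se
ci
```

The two endpoints should agree with the interval printed by t.test.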