BIOINF 2118
Statistical Testing Part 3: F distribution
Two sample test, unknown UNEQUAL variances.
This is a difficult problem, called the Behrens-Fisher problem. There are lots of classical-statistics
solutions, but none of them are simple or exact. That’s because, unlike when the variances are the
same, there is no “pivotal quantity” that can eliminate both unknown variances at once.
(In the R function t.test(), there is a parameter var.equal; if set to FALSE (the default), one of those
solutions, Welch's approximation, is computed instead of the regular t-test.)
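A minimal sketch of the var.equal switch, on simulated data. The sample sizes, means, standard deviations and seed are made up for illustration and do not come from the notes.

```r
# Hypothetical data: two normal samples with clearly unequal variances.
set.seed(2118)
x <- rnorm(10, mean = 0, sd = 1)
y <- rnorm(25, mean = 0, sd = 4)

# Welch's approximation (var.equal = FALSE, the default): the degrees of
# freedom are estimated from the data and are usually not an integer.
welch <- t.test(x, y, var.equal = FALSE)

# Classical pooled-variance t-test: degrees of freedom are n_x + n_y - 2.
pooled <- t.test(x, y, var.equal = TRUE)

print(welch$parameter)   # approximate (Welch) degrees of freedom
print(pooled$parameter)  # 10 + 25 - 2 = 33
```

The two calls differ only in how the standard error and degrees of freedom are computed; the point estimates of the two means are the same.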
The F distribution
How do you know whether the 2 variances are the same?? The F test!!!
If $U_1 \sim \chi^2_m$ and $U_2 \sim \chi^2_n$ are independent, then
$V = \frac{U_1/m}{U_2/n} \sim F_{m,n}$.
We say “V follows the F distribution on degrees of freedom m and n.”
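The definition (a ratio of two independent chi-square variables, each divided by its own degrees of freedom) can be checked by simulation. This is only a sketch: the degrees of freedom, the number of draws and the seed are arbitrary choices, and the names u1, u2 and v are introduced here for illustration.

```r
set.seed(1)
m <- 5                       # numerator degrees of freedom (illustrative)
n <- 20                      # denominator degrees of freedom (illustrative)
n_sim <- 100000

u1 <- rchisq(n_sim, df = m)  # independent chi-square draws
u2 <- rchisq(n_sim, df = n)
v <- (u1 / m) / (u2 / n)     # ratio of the scaled chi-squares

# Compare simulated quantiles with the theoretical F quantiles from qf().
probs <- c(0.5, 0.9, 0.95)
sim_q <- quantile(v, probs)
theo_q <- qf(probs, m, n)
print(rbind(simulated = sim_q, theoretical = theo_q))
```

The two rows should agree to within simulation error.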
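The F distribution is used to test whether two population variances are equal, given two independent
sample variances. Suppose we have independent normal samples of X’s and Y’s,
$X_1, \dots, X_{n_X} \sim N(\mu_X, \sigma^2)$, $Y_1, \dots, Y_{n_Y} \sim N(\mu_Y, \sigma^2)$, with THE SAME variance. Let
$S_X^2 = \sum_i (X_i - \bar{X})^2$, $S_Y^2 = \sum_j (Y_j - \bar{Y})^2$, and $V = \frac{S_X^2/(n_X - 1)}{S_Y^2/(n_Y - 1)}$.
Then
$V \sim F_{n_X - 1,\, n_Y - 1}$.
This is true, of course, only if the two variances are the same, $\sigma_X^2 = \sigma_Y^2$, so that they cancel.
So you can test
H0: the two variances are the same
by comparing V to the quantiles in the F tables.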
Example: Six observations (X), with sum of the squared deviations from the mean equalling 30.
Another 21 observations (Y), with sum of squared deviations = 40. Could the 2 samples come from
distributions with the same variance or not?
$V = \frac{30/5}{40/20} = 3$, and in R, we get
P-value = 1 - pf(3,5,20) = 0.03520134, a fairly small tail probability.
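The arithmetic of this example can be reproduced from the summary statistics alone. The variable names below are introduced for illustration; the numbers are those of the example.

```r
ss_x <- 30; n_x <- 6     # sum of squared deviations and sample size for X
ss_y <- 40; n_y <- 21    # the same for Y

# Ratio of the two sample variances.
v <- (ss_x / (n_x - 1)) / (ss_y / (n_y - 1))

# Upper-tail probability under F with (n_x - 1, n_y - 1) degrees of freedom;
# lower.tail = FALSE is equivalent to 1 - pf(...).
p_value <- pf(v, n_x - 1, n_y - 1, lower.tail = FALSE)

print(v)
print(p_value)
```

This is a one-sided tail probability, as in the example. With raw data rather than summary statistics, base R's var.test(x, y, alternative = "greater") carries out the corresponding one-sided comparison.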
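In general,
$V \big/ (\sigma_X^2/\sigma_Y^2) \sim F_{d_1,d_2}$, where $d_1$ and $d_2$ are the numerator and denominator degrees of freedom, so V is a good estimate of
$\sigma_X^2/\sigma_Y^2$. Note that, although
$E(F_{d_1,d_2}) = d_2/(d_2 - 2)$ is not equal to one (because $E(1/\chi^2_n) = 1/(n - 2)$),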
it is the ratio of two independent variables both with mean one. So “extreme” values are far from one.
A real moment estimator, then, would be
$V (d_2 - 2)/d_2 = 3 \times 18/20 = 2.7$.
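A quick simulation can illustrate why the correction factor is needed: the mean of an F variable with denominator degrees of freedom d2 is d2/(d2 - 2), not one. This is a sketch; the number of draws and the seed are arbitrary, and the degrees of freedom are those of the example.

```r
set.seed(42)
d1 <- 5
d2 <- 20

# Draw from F(d1, d2) and compare the sample mean with d2 / (d2 - 2).
draws <- rf(200000, d1, d2)
sim_mean <- mean(draws)
theo_mean <- d2 / (d2 - 2)
print(c(simulated = sim_mean, theoretical = theo_mean))

# Moment-corrected estimate of the variance ratio for the example, V = 3.
v <- 3
v_adjusted <- v * (d2 - 2) / d2
print(v_adjusted)
```

Dividing V by the mean of the F distribution gives an estimate whose expectation equals the true variance ratio.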
Confidence intervals. It’s easy to generate a confidence interval for
$\sigma_X^2/\sigma_Y^2$.
See F_distribution.Rmd to get the confidence interval.
In the example, it is
(1.106648, 13.674394).
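The interval formula did not survive in these notes, and F_distribution.Rmd is not reproduced here. The following is a sketch under the assumption that the interval comes from inverting the pivot V / (variance ratio) ~ F with equal tail areas. A 90% level reproduces the endpoints quoted above; that level is inferred from the numbers, not stated in the notes.

```r
v <- 3          # observed variance ratio from the example
d1 <- 5         # numerator degrees of freedom
d2 <- 20        # denominator degrees of freedom
level <- 0.90   # assumed confidence level (inferred, see lead-in)
alpha <- 1 - level

# P(qf(alpha/2) < V / ratio < qf(1 - alpha/2)) = level, solved for the ratio.
ci <- c(lower = v / qf(1 - alpha / 2, d1, d2),
        upper = v / qf(alpha / 2, d1, d2))
print(ci)
```

Note that the upper quantile of F gives the lower endpoint and vice versa, because the unknown ratio sits in the denominator of the pivot.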
========
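# Degrees of freedom: numerator, denominator
dfF <- c(20, 5)
# Plot the curve on the interval [0, 2]
curve(df(x, dfF[1], dfF[2]), from=0, to=2, type="l",
main=paste("df = ", dfF[1], ",", dfF[2]), ylab="F density")
# Vertical lines at 1 and at the mean, d2/(d2-2)
abline(v=c(1, dfF[2]/(dfF[2]-2)))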
Trick question: In this code, what is “df()”?
See F_distribution.Rmd for 2nd plot.

Future topic: The F distribution is very important in analysis of variance & regression.