Download N13-testing, part 3

BIOINF 2118 Statistical Testing Part 3: F distribution Page 1 of 2 Two sample test, unknown UNEQUAL variances. This is a difficult problem, called the Behrens-Fisher problem. There are lots of classical-statistics solutions, but none of them are simple or exact. That’s because, unlike when the variances are the same, there is no “pivotal quantity” that can eliminate both unknown variances at once. In the R function t.test( ), there is a parameter var.equal-- if set to FALSE, one of those solutions is computed instead of the regular t-test.) The F distribution How do you know whether the 2 variances are the same?? The F test!!! If . Then , We say “V follows the F distribution on degrees of freedom m and n.” The F distribution is used to test whether two population variances are equal, given two independent sample variances. Suppose independent normal samples of X’s and Y’s, X ~ N(m X ,s 2 ), Y ~ N(mY ,s 2 ) , with THE SAME variance. Let . Then . This is true, of course, only if the two variances are the same, So you can test H0: the two variances are the same by comparing V to the quantiles in the F tables. , so that they cancel. Example: Six observations (X), with sum of the squared deviations from the mean equalling 30. Another 21 observations (Y), with sum of squared deviations = 40. Could the 2 samples come from distributions with the same variance or not? 30 / 5 V= = 3, and in R, we get 40 / 20 P-value = 1 - pf(3,5,20) = 0.03520134, a fairly small tail probability. In general, , so V is a good estimate of s X2 / s Y2 . Note that, although E(Fd1,d 2 ) = d2 / (d2 - 2) is not equal to one, (because E(1/ cn2 ) = 1/ (n - 2) ), BIOINF 2118 Statistical Testing Part 3: F distribution Page 2 of 2 it is the ratio of two independent variables both with mean one. So “extreme” values are far from one. A real moment estimator, then, would be V(d2 - 2) / d2 = 3/20*18 = 2.7. Confidence intervals. It’s easy to generate a confidence interval for s X2 / s Y2 . See F_distribution.Rmd. to get the confidence interval w In the example, (1.106648, 13.674394). ======== dfF = c(20,5) curve(from=0, to=2, df(x, dfF[1], dfF[2]), type="l", main=paste("df = ", dfF[1], ",", dfF[2]), ylab="F density") abline(v=c(1,dfF[2]/(dfF[2]-2))) Trick question: In this code, what is “df()”? See F_distribution.Rmd for 2nd plot.  Future topic: The F distribution is very important in analysis of variance & regression.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download N13-testing, part 3