BIOINF 2118 Statistical Testing Part 2

TOPICS

One-sample tests (or paired samples)
  Variance known: z-test
  Variance unknown: t-test
Two-sample tests
  Variance known: z-test
  Variance unknown, same variance: t-test
  Variance unknown, different variances: Welch test
  Testing H0: “variances are equal”: F-test

One-sample tests

One-sample tests are tests of LOCATION. “Is this sample from a distribution that is “centered” at zero?” “Center” = mean? Or “center” = median? Here, it’s “mean”.

A test can be parametric or nonparametric. Today, it’s parametric. Parametric tests are derived from some assumption, together with a criterion for the “best” test.

(A) Example: the ONE-SAMPLE z-test.

(A.1): PLANNING A STUDY

Suppose $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ with $\sigma^2$ known, and $\mu$ unknown. The “null hypothesis” is H0: $\mu = \mu_0$ (for the “centered at zero” question, $\mu_0 = 0$). If the alternative hypothesis is HA: $\mu > \mu_0$, then the “best test” against this “one-sided hypothesis” is the upper tail:

RR = “rejection region” = $\{\bar{X} > \mu_0 + z_\alpha\,\sigma/\sqrt{n}\}$,

where $z_\alpha$ denotes the upper-$\alpha$ point of the standard normal, $\Pr(Z > z_\alpha) = \alpha$. This has a Type I error of $\alpha$:

$$\Pr\left(\bar{X} > \mu_0 + z_\alpha\,\sigma/\sqrt{n} \;\middle|\; \mu = \mu_0\right) = \Pr(Z > z_\alpha) = \alpha.$$

Notice that $\bar{X}$ is sufficient. You don’t need the entire dataset. $\bar{X}$ is a “sufficient statistic”.

For a specific alternative, HA: $\mu = \mu_1 > \mu_0$, the probability of rejecting H0 is

$$\Pr\left(\bar{X} > \mu_0 + z_\alpha\,\sigma/\sqrt{n} \;\middle|\; \mu = \mu_1\right) = 1 - \Phi\left(z_\alpha - \frac{(\mu_1 - \mu_0)\sqrt{n}}{\sigma}\right) = 1 - \beta,$$

where $\beta$ is the Type II error.

Determining the sample size needed

Suppose you’ve picked the acceptable Type I and II errors, $\alpha$, $\beta$, and the alternative $\mu_1$. Then you can get the sample size needed. From the end of the last equation, $z_\alpha - (\mu_1 - \mu_0)\sqrt{n}/\sigma = -z_\beta$, so

$$n = \left(\frac{(z_\alpha + z_\beta)\,\sigma}{\mu_1 - \mu_0}\right)^2.$$

So n has to quadruple if you double $\sigma$, double $z_\alpha + z_\beta$, or halve $\mu_1 - \mu_0$.

Specific example of one-sample test: See R code handout, “normal and t test one-sample.Rmd”.

(A.2) Data analysis, z-test

Suppose you observe $\bar{X} = 6.0$. This is bigger than the critical value. So “Reject H0”.

The P-value is the probability of observing a result as extreme or more extreme, given H0.

$$\text{P-value} = \Pr(\bar{X} \ge 6.0 \mid \mu = \mu_0) = 1 - \Phi\left(\frac{(6.0 - \mu_0)\sqrt{n}}{\sigma}\right).$$

See R code handout.

(B) Example: the ONE-SAMPLE t-test.

If the variance is NOT known, you can estimate it. But the uncertainty needs to be taken into consideration.
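Before the variance has to be estimated, the z-test steps of (A.1) and (A.2) can be sketched in Python using only the standard library. This is a minimal sketch, not the course’s R handout. The function names are made up for illustration, and the formulas are the standard upper-tail ones: cutoff $\mu_0 + z_\alpha\sigma/\sqrt{n}$ and $n = ((z_\alpha + z_\beta)\sigma/(\mu_1 - \mu_0))^2$. The planning values ($\mu_0 = 0$, $\mu_1 = 5$, $\sigma = 10$, $\alpha = 0.05$, $\beta = 0.20$) are hypothetical; only the observed mean 6.0 comes from the notes.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def critical_value(mu0, sigma, n, alpha):
    """Upper-tail cutoff c with Pr(Xbar > c | mu = mu0) = alpha."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return mu0 + z_alpha * sigma / sqrt(n)


def power(mu0, mu1, sigma, n, alpha):
    """Pr(reject H0 | mu = mu1) = 1 - beta for the one-sided z-test."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_alpha - (mu1 - mu0) * sqrt(n) / sigma)


def sample_size(mu0, mu1, sigma, alpha, beta):
    """Smallest whole n with Type I error alpha and Type II error <= beta."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(1 - beta)
    return ceil(((z_alpha + z_beta) * sigma / (mu1 - mu0)) ** 2)


def p_value(xbar, mu0, sigma, n):
    """One-sided P-value: Pr(Xbar >= observed xbar | H0)."""
    return 1 - STD_NORMAL.cdf((xbar - mu0) * sqrt(n) / sigma)


# Hypothetical planning values (assumptions, NOT taken from the handout).
mu0, mu1, sigma, alpha, beta = 0.0, 5.0, 10.0, 0.05, 0.20

n = sample_size(mu0, mu1, sigma, alpha, beta)
print("n =", n)
print("critical value =", critical_value(mu0, sigma, n, alpha))
print("power at mu1 =", power(mu0, mu1, sigma, n, alpha))
print("P-value for xbar = 6.0:", p_value(6.0, mu0, sigma, n))
```

Because `sample_size` rounds up to a whole number, doubling `sigma` gives roughly, not exactly, four times the sample size.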
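Remember that a t variate is a ratio between a standard normal Z and the square root of a chi-square divided by its degrees of freedom $\nu$:

$$t_\nu = \frac{Z}{\sqrt{\chi^2_\nu/\nu}}.$$

(If the Z and $\chi^2_\nu$ are independent.) The “divided by $\nu$” ensures that the denominator converges (to 1 in this standardized form, which is the same as the estimated standard deviation converging to the true standard deviation), and therefore the t converges to the normal.

In our case, with sample variance $s^2$ and $\nu = n - 1$,

$$Z = \frac{(\bar{X} - \mu_0)\sqrt{n}}{\sigma}, \qquad \chi^2_{n-1} = \frac{(n-1)\,s^2}{\sigma^2}.$$

Notice that the $\sigma$ cancels out:

$$T = \frac{Z}{\sqrt{\chi^2_{n-1}/(n-1)}} = \frac{(\bar{X} - \mu_0)\sqrt{n}}{s}.$$

This will be the test statistic. See R code handout for detail.

(B.1): PLANNING A STUDY

RR = “rejection region” = $\{T > t_{n-1,\alpha}\}$, where $t_{n-1,\alpha}$ denotes the upper-$\alpha$ point of the t distribution with $n-1$ degrees of freedom. This has a Type I error of $\alpha$; see discussion of the normal test above.

The power is a little more complicated. It involves a “non-central t distribution”, which is not just a t-distribution shifted over. The “noncentrality parameter” is $\delta = (\mu_1 - \mu_0)\sqrt{n}/\sigma$. So the power will depend on the alternative mean $\mu_1$, but also on what is now an UNknown variance.

(B.2) Data analysis

See handout. Note the difference in P-values if you can’t assume that you know the variance.

Two-sample tests

H0 is that the two samples have the same mean.

When the samples are MATCHED, just subtract the matched pairs to get a single sample. Proceed as above. Remember, the variance of the difference is the SUM of the variances (when the two members of a pair are independent). So for the normal test, if both members have the same standard deviation sigma, where you see sigma, replace it with sigma times the square root of 2.

When UNmatched, the samples may have different sizes, say m and n. Write $\bar{X}$ for the mean of the sample of size m and $\bar{Y}$ for the mean of the sample of size n.

(C) Example: the TWO-SAMPLE z-test.

If the (common) variance $\sigma^2$ is known, then

$$Z = \frac{\bar{X} - \bar{Y}}{\sigma\sqrt{1/m + 1/n}}$$

will be standard normal under the null hypothesis.

(D) Example: the TWO-SAMPLE t-test.

If the variance is UNknown, but the SAME, then you POOL the estimates on the two samples:

$$T = \frac{\bar{X} - \bar{Y}}{s_p\sqrt{1/m + 1/n}},$$

where

$$s_p^2 = \frac{(m-1)\,s_X^2 + (n-1)\,s_Y^2}{m + n - 2},$$

and $s_X^2$, $s_Y^2$ are the two sample variances. Under the null hypothesis, T has a t distribution with $m + n - 2$ degrees of freedom.

(E) Example: the TWO-SAMPLE t-test, variances not known to be the same.

We’ll visit this in “N13-testing, part 3 - F test.doc” and “F_distribution.R”.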
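The t statistics above (one-sample, matched pairs, and pooled two-sample) can be sketched in Python with the standard library alone. This is an illustrative sketch rather than the course’s R handout: the function names and the data are made up, and it stops at the statistic and its degrees of freedom because the standard library has no t distribution. Turning T into a P-value needs a t distribution function, as in the R handout.

```python
from math import sqrt
from statistics import mean, variance  # variance() divides by n - 1


def one_sample_t(x, mu0=0.0):
    """Return (T, df) with T = (xbar - mu0) * sqrt(n) / s and df = n - 1."""
    n = len(x)
    s = sqrt(variance(x))
    return (mean(x) - mu0) * sqrt(n) / s, n - 1


def paired_t(x, y):
    """Matched samples: subtract the pairs, then test the differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return one_sample_t(diffs, mu0=0.0)


def pooled_two_sample_t(x, y):
    """Return (T, df) for unmatched samples assumed to share one variance."""
    m, n = len(x), len(y)
    # Pooled variance: weight each sample variance by its degrees of freedom.
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / m + 1 / n))
    return t, m + n - 2


# Made-up data, for illustration only.
x = [5.1, 6.3, 4.8, 7.0, 5.9]
y = [4.2, 5.0, 3.9, 5.5, 4.4, 4.7]

print("one-sample (mu0 = 4):", one_sample_t(x, mu0=4.0))
print("paired (first 5 of y):", paired_t(x, y[:5]))
print("pooled two-sample:", pooled_two_sample_t(x, y))
```

Note that `paired_t` estimates the variance of the differences directly from the data, so the “sigma times the square root of 2” adjustment is only needed for the known-variance normal test.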