CSSS 508: Intro to R
3/08/06
Lab 9: Testing
This lab is just a summary of several common tests used in statistics.
Reading material can be found in Dalgaard: Ch. 4, 7, and parts of 6.
T-Test: (one sample, two sample, paired)
T-tests assume that the data come from a normal distribution. The reference t-distribution
depends on the sample size (the length of the vector(s) of data you’re analyzing).
A smaller sample is tested against a t-distribution with heavier tails.
That is, when we have a small sample, we are more likely to see an average that is
extreme or not representative than when we have a larger sample.
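To see the heavier tails numerically, here is a minimal sketch that compares two-sided 5% critical values of the t-distribution across sample sizes with the normal critical value. The sample sizes are arbitrary illustrative choices.

```r
# Two-sided 5% critical values for a one-sample t-test (df = n - 1)
n <- c(5, 10, 30, 100, 1000)      # illustrative sample sizes
t.crit <- qt(0.975, df = n - 1)   # t critical values
z.crit <- qnorm(0.975)            # normal critical value for comparison
round(rbind(n = n, t.crit = t.crit), 3)
z.crit
```

The critical value shrinks toward the normal value as n grows, so a small sample needs a more extreme average to reach significance.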
One sample test:
We are testing whether the population mean, mu, is equal to some null hypothesis
value. For example: Ho: mu = 0. The null hypothesis, or previously/currently held belief,
is that the mean of the population is zero. We have collected some data, hopefully a
representative sample of our population, and we’re going to test whether or not we have
evidence that zero is incorrect.
There are three possible alternative hypotheses:
Ha: mu < 0 ; Ha: mu > 0 ; Ha : mu != 0
The first two are one-sided alternatives; the third is two-sided.
Let’s test some small samples:
x<-rnorm(10,0,1)
t.test(x)
The default mu (population mean) is zero; the default alternative hypothesis is two-sided.
The default type 1 error (how often we’re allowing ourselves to wrongly reject a true
null hypothesis) is 0.05. The option conf.level = 1 - error (default = 0.95) sets the
level of the reported confidence interval; the p-value is compared against the error.
This t-test looked at the sample mean of x and found the probability, if the true mean is
zero, that we would see a sample mean at least as extreme as the one we observed. If this
probability (the p-value) is small, our sample mean would be very unusual under the null
hypothesis. Since we did see it, we conclude that the null hypothesis is wrong and that
the true mean is not zero.
x2<-rnorm(10,1,1)
t.test(x2)
x3<-rnorm(10,2,1)
t.test(x3)
x4<-rnorm(10,3,1)
t.test(x4)
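As a check on what t.test reports, the statistic and two-sided p-value can be computed by hand. This is a sketch only; the seed is an arbitrary choice made so the numbers are reproducible.

```r
set.seed(508)                       # arbitrary seed, for reproducibility only
x <- rnorm(10, 0, 1)
n <- length(x)
# t statistic: distance of the sample mean from mu = 0 in standard-error units
t.stat <- (mean(x) - 0) / (sd(x) / sqrt(n))
# two-sided p-value from the t-distribution with n - 1 degrees of freedom
p.val <- 2 * pt(-abs(t.stat), df = n - 1)
res <- t.test(x)
c(by.hand = t.stat, t.test = unname(res$statistic))
c(by.hand = p.val, t.test = res$p.value)
```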
Rebecca Nugent, Department of Statistics, U. of Washington
The two-sided alternative allows for error on both sides; the error is split evenly
between the two tails. The one-sided tests put all the error in one tail.
The samples above were drawn from distributions whose means move gradually further
above the null value of zero.
t.test(x,alternative="greater")
t.test(x2,alternative="greater")
t.test(x3,alternative="greater")
t.test(x4,alternative="greater")
What happens if I change the alternative to “less”?
t.test(x,alternative="less")
t.test(x2,alternative="less")
t.test(x3,alternative="less")
t.test(x4,alternative="less")
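The relationship between the three alternatives can be seen by pulling out the p-values. In this sketch (the seed is an arbitrary choice), the two one-sided p-values sum to one and the two-sided p-value is twice the smaller of them.

```r
set.seed(508)                        # arbitrary seed, for reproducibility only
x3 <- rnorm(10, 2, 1)
p.two     <- t.test(x3)$p.value
p.greater <- t.test(x3, alternative = "greater")$p.value
p.less    <- t.test(x3, alternative = "less")$p.value
c(two.sided = p.two, greater = p.greater, less = p.less)
p.greater + p.less                   # the two one-sided p-values sum to 1
2 * min(p.greater, p.less)           # equals the two-sided p-value
```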
Many t-tests you have seen in other procedures have tested the null hypothesis that a
parameter equals zero (linear regression coefficients, etc).
Don’t forget to specify your mu if you would like to test another value.
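A short sketch of setting mu, assuming a sample simulated with true mean 3 (the seed is arbitrary). Changing mu changes the test statistic and p-value, while the confidence interval for the mean stays the same.

```r
set.seed(508)                    # arbitrary seed, for reproducibility only
x4 <- rnorm(10, 3, 1)
res0 <- t.test(x4)               # tests Ho: mu = 0 (the default)
res3 <- t.test(x4, mu = 3)       # tests Ho: mu = 3
c(mu0 = res0$p.value, mu3 = res3$p.value)
res0$conf.int                    # interval for the mean
res3$conf.int                    # same interval; only the hypothesis changed
```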
Two sample t-tests:
A common hypothesis used to test for group differences is to test equality of their means.
You might have two samples from two different groups (ex: control, treatment).
The null hypothesis is then Ho: mu1 = mu2.
The alternatives are Ha: mu1 > mu2, Ha: mu1 < mu2, Ha: mu1 != mu2.
g1<-rnorm(10,0,1)
g2<-rnorm(10,0.1,1)
g3<-rnorm(10,0.4,1)
g4<-rnorm(10,-0.4,1)
t.test(g1,g2)
t.test(g1,g3)
t.test(g1,g4,alternative="greater")
t.test(g4,g1,alternative="greater")
Order matters. On the last two tests, the alternative changes from g1 > g4 to g4 > g1.
The t-test results include a confidence interval for the difference in means.
If this interval contains zero, the two-sided t-test is not significant at the matching level.
One choice on the two-sample test is whether to treat the variances of the two
samples as equal. That is, are you assuming that the two groups have the same variance?
If so, set var.equal = TRUE and a pooled (weighted) variance is used in the test.
If not, leave var.equal = FALSE (the default); each group’s variance is estimated
separately and the Welch approximation to the degrees of freedom is used.
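To make the var.equal choice concrete, here is a sketch that computes the pooled-variance statistic by hand and compares degrees of freedom with the default Welch test. The seed and the names s2.pooled and t.pooled are illustrative assumptions.

```r
set.seed(508)                         # arbitrary seed, for reproducibility only
g1 <- rnorm(10, 0, 1)
g3 <- rnorm(10, 0.4, 1)
n1 <- length(g1)
n2 <- length(g3)
# pooled variance: the two sample variances weighted by their degrees of freedom
s2.pooled <- ((n1 - 1) * var(g1) + (n2 - 1) * var(g3)) / (n1 + n2 - 2)
t.pooled  <- (mean(g1) - mean(g3)) / sqrt(s2.pooled * (1 / n1 + 1 / n2))
res.pooled <- t.test(g1, g3, var.equal = TRUE)
res.welch  <- t.test(g1, g3)          # default: var.equal = FALSE
c(by.hand = t.pooled, t.test = unname(res.pooled$statistic))
c(pooled.df = unname(res.pooled$parameter), welch.df = unname(res.welch$parameter))
```

The pooled test always uses n1 + n2 - 2 degrees of freedom; the Welch degrees of freedom are never larger than that.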
A real-data example:
Energy expenditure data on 22 women characterized as either obese or lean.
We’ll download it from the class website: energy.dat
energy<-read.table("http://www.stat.washington.edu/rnugent/teaching/csss508/energy.dat")
We can split our group into the two subgroups using the split command.
energy.sub<-split(energy$expend,energy$stature)
t.test(energy.sub$obese,energy.sub$lean)
Paired t-test:
Often our two samples are two sets of measurements on the same subjects
(ex: before/after). In this case, we don’t actually have two populations to compare.
We have two samples from the same population that we want to test for differences.
Have the measurements on the subjects changed? (ex: Did they lose weight with an
intervention program?). This question gets converted into a one sample t-test analysis
where we analyze the differences between the two sets of measurements. If there is no
change, we would expect the average difference to be zero.
We use the t-test with the option paired=TRUE.
x<-rnorm(20,0,1)
y<-rnorm(20,0,1)
z<-rnorm(20,3,1)
t.test(x,y,paired=TRUE)
t.test(x,z,paired=TRUE)
Note: the vectors must be of the same length (otherwise we won’t have pairs).
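The conversion to a one sample analysis can be checked directly: in this sketch (arbitrary seed), the paired test and a one sample t-test on the differences give the same statistic and p-value.

```r
set.seed(508)                          # arbitrary seed, for reproducibility only
x <- rnorm(20, 0, 1)
z <- rnorm(20, 3, 1)
res.paired <- t.test(x, z, paired = TRUE)
res.diff   <- t.test(x - z)            # one sample test on the differences
c(paired = unname(res.paired$statistic), diff = unname(res.diff$statistic))
c(paired = res.paired$p.value, diff = res.diff$p.value)
```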
Real-life data: pre- and post-menstrual energy intake in 11 women: intake.dat
intake<-read.table("http://www.stat.washington.edu/rnugent/teaching/csss508/intake.dat")
attach(intake)
post-pre
t.test(pre,post,paired=TRUE)
Again, if we have hypotheses about if the differences are positive or negative, we can use
one of the one-sided alternative options (greater, less).
Pairwise T-Tests for All Groups
Often we have more than two groups. We can test for differences between two groups at
a time, but we start to run into multiple comparison problems. Performing many tests
increases the probability that at least one will look significant by chance alone.
Unadjusted p-values tend to look more significant than they should.
pairwise.t.test computes tests for all possible two-group comparisons.
Recall the low birthwt data; race was a factor with three levels (white, black, other).
Let’s test for differences in age and weight among the three groups.
library(MASS)
attach(birthwt)
race<-factor(race,labels=c("white","black","other"))
pairwise.t.test(age,race)
pairwise.t.test(lwt,race)
There are two common adjustment methods. Bonferroni divides the significance level
(often 0.05) by the number of tests to determine test significance; equivalently, the
p-values reported have been multiplied by the number of tests, so compare them to the
original significance level. Holm adjusts as it goes: with n tests, the smallest p-value
is multiplied by n, the next smallest by n-1, and so on. Holm is the default.
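The two adjustments can be reproduced with p.adjust. The sketch below uses three hypothetical raw p-values (not taken from the birthwt data) that are already sorted from smallest to largest, so the by-hand Holm values line up with the p.adjust output.

```r
p <- c(0.010, 0.020, 0.030)            # hypothetical raw p-values, already sorted
n <- length(p)
# Bonferroni: multiply every p-value by the number of tests (capped at 1)
bonf.by.hand <- pmin(1, p * n)
# Holm: multiply the smallest by n, the next by n - 1, ...; keep them non-decreasing
holm.by.hand <- pmin(1, cummax((n - seq_len(n) + 1) * p))
bonf <- p.adjust(p, method = "bonferroni")   # 0.03 0.06 0.09
holm <- p.adjust(p, method = "holm")         # 0.03 0.04 0.04
rbind(raw = p, bonferroni = bonf, holm = holm)
```

Holm never gives a larger adjusted p-value than Bonferroni, which is one reason it is a sensible default.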
Rank Test: (one sample, two sample)
The t-tests require the assumption that the data originally come from a normal
distribution. If you are not willing to make that assumption, you can use a (nonparametric) rank test. These tests usually replace the data with ordered ranks.
Wilcoxon signed rank test (one sample):
We still have a null hypothesis: Ho: mu = a. The mu from the null hypothesis is
subtracted from the data, giving us a vector of differences. The absolute values of
these differences are ranked 1 through n. We then calculate the sum of the ranks for
the positive differences or the negative differences. If the hypothesized value is
correct, we would expect the differences to be pretty evenly split as positive/negative.
The sum of the positive ranks should be close to the sum of the negative ranks. The
distribution of this sum is known; the Wilcoxon test finds how likely our sum would be
if the null hypothesis is true.
mu<-0
x<-rnorm(10,0,1)
d<-x-mu
# rank the absolute differences (rank(), not order(), returns the ranks)
r<-rank(abs(d))
sum(r[d<0])
sum(r[d>0])
wilcox.test(x)
Note: the results do not include a confidence interval or parameter estimates. Recall, this
is a distribution-free test; no model assumed.
This test also has the option to set an alternative or a mu.
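As a check on the signed rank calculation, this sketch ranks the absolute differences with rank() and compares the sum of the positive ranks with the V statistic that wilcox.test reports. The seed and the names d, r, v.plus and v.minus are illustrative assumptions.

```r
set.seed(508)                     # arbitrary seed, for reproducibility only
mu <- 0
x <- rnorm(10, 0, 1)
d <- x - mu                       # differences from the hypothesized value
r <- rank(abs(d))                 # ranks of the absolute differences
v.plus  <- sum(r[d > 0])          # sum of ranks for positive differences
v.minus <- sum(r[d < 0])          # sum of ranks for negative differences
res <- wilcox.test(x, mu = mu)
c(by.hand = v.plus, wilcox.test = unname(res$statistic))
v.plus + v.minus                  # always n(n+1)/2, here 55
# the alternative and mu options work as in t.test
wilcox.test(x, mu = 1, alternative = "less")
```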
Wilcoxon two sample rank test:
There is also a rank test for comparing two groups. Here the combined data are
replaced with their ranks. The sum of the ranks for one group is calculated.
If the groups are similarly distributed, their rank sums should be close.
x<-rnorm(20,0,1)
y<-rnorm(20,0,1)
z<-rnorm(20,3,1)
# rank the combined data; the first 20 ranks belong to x
r<-rank(c(x,y))
sum(r[1:20])
sum(r[21:40])
wilcox.test(x,y)
r<-rank(c(x,z))
sum(r[1:20])
sum(r[21:40])
wilcox.test(x,z)
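The W statistic printed by wilcox.test is the rank sum of the first group minus its smallest possible value, n(n+1)/2. A minimal sketch of that relationship, with an arbitrary seed and illustrative variable names:

```r
set.seed(508)                          # arbitrary seed, for reproducibility only
x <- rnorm(20, 0, 1)
z <- rnorm(20, 3, 1)
n.x <- length(x)
r <- rank(c(x, z))                     # ranks of the combined data
rank.sum.x <- sum(r[1:n.x])            # rank sum for the first group
w.by.hand <- rank.sum.x - n.x * (n.x + 1) / 2
res <- wilcox.test(x, z)
c(by.hand = w.by.hand, wilcox.test = unname(res$statistic))
```

A W near zero means nearly every value in the first group ranks below the second group.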
Back to our energy data...
# use the actual group size rather than assuming 11 subjects per group
n.obese<-length(energy.sub$obese)
r<-rank(c(energy.sub$obese,energy.sub$lean))
sum(r[1:n.obese])
sum(r[-(1:n.obese)])
wilcox.test(energy.sub$obese,energy.sub$lean)
Paired Wilcoxon Test:
The paired Wilcoxon test is the non-parametric analog of the paired t-test. It is just the
Wilcoxon signed rank test on the differences between the two samples. Note again that
the vectors must be of the same length.
x<-rnorm(20,0,1)
y<-rnorm(20,0,1)
z<-rnorm(20,3,1)
# rank the absolute paired differences
d<-x-y
r<-rank(abs(d))
sum(r[d<0])
sum(r[d>0])
wilcox.test(x,y,paired=TRUE)
d<-x-z
r<-rank(abs(d))
sum(r[d<0])
sum(r[d>0])
wilcox.test(x,z,paired=TRUE)
Looking again at our pre/post energy intake data:
wilcox.test(pre,post,paired=TRUE)
Testing Equality of Variances:
In the two sample t-test, you had the option to assume that the variances of your two
groups were the same. You may want to test if this is true.
F test:
If two variances are equal, their ratio is one. For normally distributed data, the ratio
of the sample variances has an F distribution. The F test assumes a null hypothesis of
the ratio of variances = a number you set (default = 1). It finds the probability of
seeing a ratio at least as extreme as the one you observed. If this probability is small
(for example, below 0.05), you reject the null hypothesis that the variances of the two
groups are equal.
x<-rnorm(30,0,1)
y<-rnorm(30,0,1)
z<-rnorm(30,0,5)
var(x)
var(y)
var(z)
var.test(x,y)
var.test(x,z)
We again have the same alternative options.
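The F statistic and its two-sided p-value can be reproduced by hand. This is a sketch under an arbitrary seed; the names f.stat, df1, df2 and p.lower are illustrative.

```r
set.seed(508)                          # arbitrary seed, for reproducibility only
x <- rnorm(30, 0, 1)
z <- rnorm(30, 0, 5)
f.stat <- var(x) / var(z)              # ratio of sample variances
df1 <- length(x) - 1                   # numerator degrees of freedom
df2 <- length(z) - 1                   # denominator degrees of freedom
p.lower <- pf(f.stat, df1, df2)        # probability of a ratio this small or smaller
p.val <- 2 * min(p.lower, 1 - p.lower) # two-sided p-value
res <- var.test(x, z)
c(by.hand = f.stat, var.test = unname(res$statistic))
c(by.hand = p.val, var.test = res$p.value)
```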
Let’s look at the two sample t-test on the energy expenditure data.
First, are the variances of the obese group and the lean group the same?
var.test(energy.sub$obese,energy.sub$lean)
We did not reject the null hypothesis that the variances are the same.
We use the t-test with var.equal = TRUE.
t.test(energy.sub$obese, energy.sub$lean, var.equal=TRUE)
Testing Tabular Data:
Single Proportions:
We have seen how to test a hypothesis about the population mean. Sometimes we want
to test a hypothesis about a proportion of successes. That is, we have n trials with x
successes and n-x failures. The proportion of successes is x/n. For example, we ask 200
people whether or not they will vote for an initiative. Then we can test if the proportion
of the population who would vote yes is high enough to enact the initiative.
Ho: p = po
Ha: p > po; Ha: p < po; Ha: p != po
n<-100
x<-23
prop.test(x,n,p=0.20)
prop.test(x,n,p=0.10)
prop.test(x,n,p=0.50)
The test uses a normal approximation. The defaults are a two-sided alternative and p = 0.50.
Look at binom.test as well.
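For the first example above, the prop.test statistic can be reproduced by hand, and binom.test gives the exact counterpart. This sketch assumes the default continuity correction of prop.test; the names p0, yates and chi.sq are illustrative.

```r
n <- 100
x <- 23
p0 <- 0.20
# continuity-corrected chi-square statistic for a single proportion
yates  <- min(0.5, abs(x - n * p0))
chi.sq <- (abs(x - n * p0) - yates)^2 / (n * p0 * (1 - p0))
p.val  <- pchisq(chi.sq, df = 1, lower.tail = FALSE)
res <- prop.test(x, n, p = p0)
c(by.hand = chi.sq, prop.test = unname(res$statistic))
c(by.hand = p.val, prop.test = res$p.value)
# exact test based on the binomial distribution, no normal approximation
res.exact <- binom.test(x, n, p = p0)
res.exact$p.value
```

With small n, or with p0 close to 0 or 1, the exact test is usually the safer choice.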
Two Proportions:
We can also test the equality of two proportions (the success probabilities of two groups).
For example, do two neighborhoods in Seattle vote similarly for an initiative?
Ho: p1 = p2
Ha: p1 > p2; Ha: p1 < p2; Ha: p1 != p2
We create a vector of numbers of successes and a vector of the numbers of trials.
We ask 135 people in Greenlake a question; 37 say Yes.
We ask 147 people in Ravenna the same question; 47 people say Yes.
suc.vec<-c(37,47)
n.vec<-c(135,147)
prop.test(suc.vec,n.vec)
Default is two-sided alternative.
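The two-proportion test is the same as a chi-square test on the 2 by 2 table of Yes and No counts. A sketch using the Greenlake and Ravenna numbers above; the table layout and the name tab are illustrative choices.

```r
suc.vec <- c(37, 47)
n.vec   <- c(135, 147)
# 2 x 2 table of Yes / No counts by neighborhood
tab <- cbind(yes = suc.vec, no = n.vec - suc.vec)
rownames(tab) <- c("Greenlake", "Ravenna")
tab
res.prop <- prop.test(suc.vec, n.vec)
res.chi  <- chisq.test(tab)            # both apply a continuity correction by default
c(prop.test = unname(res.prop$statistic), chisq.test = unname(res.chi$statistic))
c(prop.test = res.prop$p.value, chisq.test = res.chi$p.value)
```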
k proportions:
We can extend this to k proportions.
For example, asking the same question to different numbers of people
in 5 Seattle neighborhoods.
The null hypothesis is that all proportions are equal.
The alternative is that they are not equal (at least one is different).
suc.vec<-c(37,47,25,63,42)
n.vec<-c(135,147,120,200,180)
prop.test(suc.vec,n.vec)
We can also test for an increasing or decreasing trend in proportions.
suc.vec<-c(15,48,55,81)
n.vec<-c(130,210,150,180)
prop.trend.test(suc.vec,n.vec)
The third argument is the score argument (default: 1, 2, 3, …, k),
the score given to each of the groups.
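A short sketch of the score argument. Passing 1:4 explicitly reproduces the default; the unevenly spaced scores in the last call are purely hypothetical, standing in for groups that are not equally spaced (for example, doses).

```r
suc.vec <- c(15, 48, 55, 81)
n.vec   <- c(130, 210, 150, 180)
res.default  <- prop.trend.test(suc.vec, n.vec)
res.explicit <- prop.trend.test(suc.vec, n.vec, score = 1:4)   # same as the default
# hypothetical uneven spacing between the four groups
res.spaced   <- prop.trend.test(suc.vec, n.vec, score = c(1, 2, 4, 8))
c(default = res.default$p.value, explicit = res.explicit$p.value,
  spaced = res.spaced$p.value)
```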
r by c tables:
For categorical tables with two or more categories for both questions, we can use a
chi-square test. We ask a group of people two categorical questions and are interested
to see if there is a relationship between the two questions.
The null hypothesis is that the two variables are independent.
The alternative hypothesis is that the variables are dependent on each other.
A real-life example: Caffeine Consumption and Marital Status: caffmarital.dat
caff.marital<-read.table("http://www.stat.washington.edu/rnugent/teaching/csss508/caffmarital.dat")
res<-chisq.test(caff.marital)
res
names(res)
The chisq.test finds a p-value using the chi-square distribution. If you would like
an exact p-value, you might want to think about fisher.test, but it can be very
computationally intensive.
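To show what chisq.test computes, here is a sketch on a small hypothetical table of counts (not the caffmarital data; the row and column labels are made up). Expected counts under independence are row total times column total divided by the grand total.

```r
# hypothetical 3 x 3 table of counts, for illustration only
tab <- matrix(c(12,  8, 4,
                 9, 10, 6,
                 3,  5, 8), nrow = 3, byrow = TRUE,
              dimnames = list(status = c("A", "B", "C"),
                              intake = c("low", "medium", "high")))
res <- chisq.test(tab)
# expected counts if the two variables were independent
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
chi.sq <- sum((tab - expected)^2 / expected)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
p.val <- pchisq(chi.sq, df, lower.tail = FALSE)
c(by.hand = chi.sq, chisq.test = unname(res$statistic))
c(by.hand = p.val, chisq.test = res$p.value)
# exact alternative; quick on a table this small
res.exact <- fisher.test(tab)
res.exact$p.value
```

The element res$expected holds the same expected counts, which is one of the names listed by names(res).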
Please take a closer look at Chapter 6 for a more in-depth look at analysis of variance.