
Worksheet 3: One- and two-sample tests

3.1 One-sample t test

The t test assumes that the data come from a Normal (aka Gaussian) distribution. It is also motivated by the CENTRAL LIMIT THEOREM (see lecture 3): even when the data are not exactly Normal, the sample mean is approximately Normal for large enough samples.

As in Worksheet 1, we use some pretend data relating to the energy intake of mice. To begin, load the data from the file Intake.txt (the text version of the Excel spreadsheet Intake) and assign it to daily.intake. read.table() returns a data frame, so we take its first column to get a numeric vector, which is what mean(), sd() and t.test() expect:

> daily.intake <- read.table("Intake.txt", header=TRUE)[[1]]

We can look at some summary statistics:

> mean(daily.intake)
> sd(daily.intake)
> summary(daily.intake)

Suppose we wish to test whether the mice's intake was significantly different from the value 7725. We could use a t test:

> t.test(daily.intake, mu=7725)

You get the following output:

        One Sample t-test

data:  daily.intake
t = -2.8208, df = 10, p-value = 0.01814
alternative hypothesis: true mean is not equal to 7725
95 percent confidence interval:
 5986.348 7520.925
sample estimates:
mean of x
 6753.636

The output is interpreted as follows.

One Sample t-test / data: daily.intake
This tells us the test being performed and the data used.

t = -2.8208, df = 10, p-value = 0.01814
This gives the value of the t statistic (-2.8208), which is the figure before it is converted into a probability; the degrees of freedom, df = 10, because we have 11 data points (df = n - 1); and finally the p-value, which is the probability, if the null hypothesis were true, of obtaining a t statistic at least as extreme as the one observed. The t statistic for t.test(daily.intake, mu=7725) is calculated as follows (see lecture 3):

> (mean(daily.intake)-7725)/(sd(daily.intake)/sqrt(11))

Now, back to the output.

alternative hypothesis: true mean is not equal to 7725
This states that we are performing a two-sided test.
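To check the hand calculation of the t statistic against t.test(), here is a minimal sketch in R. It assumes (our assumption, not stated in the worksheet) that Intake.txt holds the 11 values typed in below; they reproduce the sample mean of 6753.636 shown in the output. The two-sided p-value is twice the tail area of the t distribution with 10 df beyond |t|, computed with pt().

```r
# ASSUMPTION: these are the 11 values in Intake.txt (their mean is 6753.636)
daily.intake <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770)

mu0 <- 7725                    # hypothesised population mean
n   <- length(daily.intake)    # 11 observations
df  <- n - 1                   # degrees of freedom = 10

# t statistic: (sample mean - mu0) / standard error of the mean
se     <- sd(daily.intake) / sqrt(n)
t.stat <- (mean(daily.intake) - mu0) / se

# Two-sided p-value: probability under H0 of a t at least this far from 0,
# in EITHER direction
p.two <- 2 * pt(-abs(t.stat), df)

# Compare with R's built-in test
res <- t.test(daily.intake, mu = mu0)
print(c(manual.t = t.stat, builtin.t = unname(res$statistic)))
print(c(manual.p = p.two, builtin.p = res$p.value))
```

The manual and built-in values should agree; pt(-abs(t), df) is the lower-tail area, doubled because the test is two-sided.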
That is, the p-value counts differences from 7725 AT LEAST AS BIG AS the observed one, mean(daily.intake)-7725, in either direction (above or below 7725).

95 percent confidence interval:
 5986.348 7520.925
sample estimates:
mean of x
 6753.636

The 95 percent confidence interval is the set of values of the population mean, mu, for which the test gives a p-value GREATER than 0.05; outside this region the null hypothesis has a p-value less than 0.05. Equivalently, if we repeated the experiment many times, 95% of the intervals constructed in this way would contain the true population mean. To see the link with the p-value, take the upper limit of the confidence interval, in our case 7520.925, and test it:

> t.test(daily.intake, mu=7520.925)

As we expect, the p-value is 0.05 (up to the rounding of the printed limit). Just inside the confidence region,

> t.test(daily.intake, mu=7520)

we see a p-value greater than 0.05.

Now suppose we were interested in testing whether the actual, unknown (and never known) population mean, mu, was less than 7725. This is an example of a one-sided test:

> t.test(daily.intake, mu=7725, alternative="less")

        One Sample t-test

data:  daily.intake
t = -2.8208, df = 10, p-value = 0.009069
alternative hypothesis: true mean is less than 7725
95 percent confidence interval:
     -Inf 7377.781
sample estimates:
mean of x
 6753.636

Note that the "alternative hypothesis" has changed, as has the p-value, which is now smaller: it is half the two-sided value, because the observed t lies in the direction of the alternative. The confidence interval now extends from -Infinity on the left to 7377.781 on the right.

What do you think happens when you perform a two-sided t test with the population mean, mu, set to the sample mean? THINK ABOUT WHAT ANSWER YOU EXPECT BEFORE YOU DO IT!!

> t.test(daily.intake, mu=mean(daily.intake))

What is the p-value? Is this what you expected? Note: the confidence interval is unaltered by the change in the tested mu value.

Now do a one-sided test, say,

> t.test(daily.intake, mu=mean(daily.intake), alternative="greater")

What is the p-value? Is this what you expected?
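The link between the confidence interval, the p-value and the one-sided test can be checked directly. A minimal sketch, assuming (as our own assumption) that the 11 values below are those in Intake.txt: the two-sided 95% interval is the mean plus or minus qt(0.975, 10) times the standard error, the one-sided upper bound uses qt(0.95, 10), and testing mu at an interval endpoint gives p = 0.05.

```r
# ASSUMPTION: these are the 11 values in Intake.txt (their mean is 6753.636)
daily.intake <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770)
n    <- length(daily.intake)
xbar <- mean(daily.intake)
se   <- sd(daily.intake) / sqrt(n)

# Two-sided 95% CI: mean +/- t(0.975, n - 1) * SE
ci <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se
print(ci)

# Testing mu exactly at the upper limit gives a p-value of 0.05
p.at.limit <- t.test(daily.intake, mu = ci[2])$p.value

# One-sided ("less") test: the observed t is negative, i.e. in the direction
# of the alternative, so its p-value is half the two-sided one
p.two  <- t.test(daily.intake, mu = 7725)$p.value
p.less <- t.test(daily.intake, mu = 7725, alternative = "less")$p.value

# One-sided 95% upper bound: mean + t(0.95, n - 1) * SE
upper.less <- xbar + qt(0.95, df = n - 1) * se

print(c(p.at.limit = p.at.limit, p.two = p.two,
        p.less = p.less, upper.less = upper.less))
```

The one-sided bound uses the 0.95 quantile rather than 0.975 because all 5% of the error probability is placed in one tail.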
As an alternative to the t test you can use the nonparametric Wilcoxon signed-rank test:

> ?wilcox.test

This ranks the absolute differences |x_i - mu| and compares the ranks of the positive differences with those of the negative ones. If mu were at the centre of the data, the positive and negative ranks would tend to cancel.

3.2 Two-sample t test

The two-sample t test is used when we have collected samples from two populations (conditions) and wish to test for a difference in means. Let's load some data:

> library(ISwR)
> data(energy)
> attach(energy)
> energy
> t.test(expend~stature, var.equal=TRUE)

        Two Sample t-test

data:  expend by stature
t = -3.9456, df = 20, p-value = 0.000799
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.411451 -1.051796
sample estimates:
 mean in group lean mean in group obese
           8.066154           10.297778

This has the same structure as the one-sample output. The formula expend~stature in t.test() states that we wish to test for differences in the variable expend between the groups defined by the variable stature (lean/obese). The option var.equal=TRUE assumes that the two groups share a common variance, which is estimated by pooling both samples, giving df = n1 + n2 - 2 = 20.

To allow for unequal variances we use the Welch approximation (the default in R):

> t.test(expend~stature, var.equal=FALSE)

        Welch Two Sample t-test

data:  expend by stature
t = -3.8555, df = 15.919, p-value = 0.001411
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.459167 -1.004081
sample estimates:
 mean in group lean mean in group obese
           8.066154           10.297778

3.3 Two-sample Wilcoxon test

For a distribution-free (nonparametric) test, use the two-sample Wilcoxon (rank-sum) test, which compares the ranks of the data in the two groups. This test makes no assumption about the shape of the distribution of the underlying data.
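The pooled and Welch statistics can be reproduced from the group means and variances. In this sketch we type in the expend values of the ISwR energy data split by stature, so it runs without the package; this is an assumption, so check them against energy if ISwR is installed (they reproduce the group means 8.066154 and 10.297778 shown above). The Welch degrees of freedom come from the Welch-Satterthwaite formula.

```r
# ASSUMPTION: expend values of the ISwR 'energy' data, split by stature
lean  <- c(7.53, 7.48, 8.08, 8.09, 10.15, 8.40, 10.88, 6.13, 7.90, 7.05, 7.48, 7.58, 8.11)
obese <- c(9.21, 11.51, 12.79, 11.85, 9.97, 8.79, 9.69, 9.68, 9.19)

n1 <- length(lean);  n2 <- length(obese)
v1 <- var(lean);     v2 <- var(obese)
mean.diff <- mean(lean) - mean(obese)

# Pooled (equal-variance) test: one common variance, df = n1 + n2 - 2
sp2      <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t.pooled <- mean.diff / sqrt(sp2 * (1 / n1 + 1 / n2))

# Welch test: separate variances, Welch-Satterthwaite degrees of freedom
s1 <- v1 / n1;  s2 <- v2 / n2
t.welch  <- mean.diff / sqrt(s1 + s2)
df.welch <- (s1 + s2)^2 / (s1^2 / (n1 - 1) + s2^2 / (n2 - 1))

# Same data as a data frame, so we can use the formula interface
energy.df <- data.frame(expend  = c(lean, obese),
                        stature = factor(rep(c("lean", "obese"), c(n1, n2))))
pooled <- t.test(expend ~ stature, data = energy.df, var.equal = TRUE)
welch  <- t.test(expend ~ stature, data = energy.df)   # var.equal = FALSE by default

print(c(t.pooled = t.pooled, t.welch = t.welch, df.welch = df.welch))
```

Passing data = energy.df instead of calling attach() keeps the variables local to the test call, which avoids masking other objects with the same names.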
That is, unlike the t test, it does not assume that the data are Normally distributed.

> wilcox.test(expend~stature)

        Wilcoxon rank sum test with continuity correction

data:  expend by stature
W = 12, p-value = 0.002122
alternative hypothesis: true mu is not equal to 0

Warning message:
Cannot compute exact p-value with ties in: wilcox.test.default(x

Note that when there are ties (identical values) in the data set, R cannot compute the exact p-value and falls back on a Normal approximation, so the Wilcoxon test becomes an approximate test.
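Both Wilcoxon statistics are simple functions of ranks. The sketch below recomputes the rank-sum statistic W for expend by stature and the signed-rank statistic V for the one-sample test of daily.intake against 7725 from section 3.1. The data are typed in by hand, on the assumption that they match the ISwR energy data and Intake.txt. Tied values get average ranks, which is what rank() does by default.

```r
# ASSUMPTION: expend values of the ISwR 'energy' data, split by stature
lean  <- c(7.53, 7.48, 8.08, 8.09, 10.15, 8.40, 10.88, 6.13, 7.90, 7.05, 7.48, 7.58, 8.11)
obese <- c(9.21, 11.51, 12.79, 11.85, 9.97, 8.79, 9.69, 9.68, 9.19)

# Rank-sum W: rank all values together, sum the ranks of the first group
# and subtract the smallest sum it could have, n1 * (n1 + 1) / 2
r  <- rank(c(lean, obese))
n1 <- length(lean)
W.manual <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Same number counted as pairs: (lean, obese) pairs with lean > obese,
# each tied pair counting 1/2
W.pairs <- sum(outer(lean, obese, ">")) + 0.5 * sum(outer(lean, obese, "=="))

# The repeated value 7.48 is a tie, which is why R reports an approximate p-value
has.ties  <- any(duplicated(c(lean, obese)))
W.builtin <- unname(suppressWarnings(wilcox.test(lean, obese))$statistic)

# ASSUMPTION: these are the 11 values in Intake.txt (their mean is 6753.636)
daily.intake <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770)

# Signed-rank V: rank |x_i - mu| and sum the ranks of the positive differences
d <- daily.intake - 7725
V.manual  <- sum(rank(abs(d))[d > 0])
V.builtin <- unname(suppressWarnings(wilcox.test(daily.intake, mu = 7725))$statistic)

print(c(W.manual = W.manual, W.pairs = W.pairs, W.builtin = W.builtin))
print(c(V.manual = V.manual, V.builtin = V.builtin))
```

suppressWarnings() is used only to silence the "cannot compute exact p-value with ties" message; the statistics themselves are unaffected by it.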
