Worksheet 3
One- and two-sample tests
3.1 One sample t test
The t test is based on the assumption that the data come from a Normal (aka Gaussian) distribution
It is also motivated by the CENTRAL LIMIT THEOREM (see lecture 3): for reasonably large samples the sample mean is approximately Normal even when the data are not
As in Worksheet 1, we use some pretend data relating to the energy intake of mice
To begin, load in the data from Intake.txt (a text version of the Excel spreadsheet Intake)
Assign the data to a numeric vector called daily.intake
> daily.intake <- read.table("Intake.txt", header=TRUE)[[1]]   # [[1]] keeps the first column (assumed to hold the intakes) as a numeric vector
We can look at some summary statistics
> mean(daily.intake)
> sd(daily.intake)
> summary(daily.intake)
Suppose we wish to test whether the mice's intake was significantly different from the value of 7725. We could use a t-test
> t.test(daily.intake, mu=7725)
You get the following output
One Sample t-test
data: daily.intake
t = -2.8208, df = 10, p-value = 0.01814
alternative hypothesis: true mean is not equal to 7725
95 percent confidence interval:
5986.348 7520.925
sample estimates:
mean of x
6753.636
The interpretation of the above is as follows
One Sample t-test
data: daily.intake
This tells us the test being performed and the data used
t = -2.8208, df = 10, p-value = 0.01814
This tells us the value of the t-statistic (-2.8208), which is the figure before it is converted into a probability; the degrees of freedom, df = 10, since we have 11 data points (df = n - 1); and finally the p-value, which is the probability, if the null hypothesis (true mean = 7725) were true, of getting a t-statistic at least as extreme as the one observed
The t-statistic for t.test(daily.intake, mu=7725) is calculated as follows
> (mean(daily.intake)-7725)/(sd(daily.intake)/sqrt(11))
See lecture 3
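As a check on the formula, the sketch below computes the t-statistic, the degrees of freedom and the two-sided p-value by hand with pt(), and compares them with t.test(). It assumes daily.intake holds the 11 values written out in the code: we have not seen Intake.txt itself, but these values reproduce the mean (6753.636) and t-statistic (-2.8208) in the output above.

```r
# ASSUMPTION: these 11 values stand in for Intake.txt; they reproduce the output above
daily.intake <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770)
mu0 <- 7725                        # hypothesised population mean

n  <- length(daily.intake)         # 11 data points
se <- sd(daily.intake) / sqrt(n)   # standard error of the mean
t_stat <- (mean(daily.intake) - mu0) / se
df <- n - 1                        # degrees of freedom

# Two-sided p-value: probability, under H0, of a t-statistic at least as far from 0
p_two <- 2 * pt(-abs(t_stat), df)

# Compare with the built-in test
tt <- t.test(daily.intake, mu = mu0)
print(c(t = t_stat, df = df, p = p_two))
print(c(t = unname(tt$statistic), p = tt$p.value))
```

The p-value line is the definition given above written in R: 2 * pt(-abs(t), df) adds the two tail areas of the t distribution beyond the observed statistic.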
Now, back to the output
alternative hypothesis: true mean is not equal to 7725
This states that we are performing a two-sided test. That is, the p-value counts differences from 7725 in EITHER direction that are AT LEAST AS BIG AS the observed difference, |mean(daily.intake)-7725|
95 percent confidence interval:
5986.348 7520.925
sample estimates:
mean of x
6753.636
The 95 percent confidence interval contains exactly those values of the population mean, mu, that would give a p-value GREATER than 0.05 if we tested them. It is constructed so that, over repeated samples, 95% of such intervals contain the true population mean. Conversely, for any mu outside this region the test gives a p-value less than 0.05. To see this, we can take the upper limit of the confidence interval, in our case (above) 7520.925, and test it
> t.test(daily.intake, mu=7520.925)
We see that, as we expect, the p-value is 0.05 (up to rounding of the limit), and for a value just inside the confidence region
> t.test(daily.intake, mu=7520)
we see a p-value greater than 0.05
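The confidence interval can also be computed directly as the sample mean plus or minus a t-quantile times the standard error. This minimal sketch again assumes the 11 values below are the contents of Intake.txt; it uses qt(0.975, df) for the two-sided 95% interval and then confirms that testing at the upper limit gives a p-value of 0.05.

```r
# ASSUMPTION: these 11 values stand in for Intake.txt
daily.intake <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770)
n  <- length(daily.intake)
se <- sd(daily.intake) / sqrt(n)

t_crit <- qt(0.975, df = n - 1)    # 2.5% in each tail for a two-sided 95% interval
ci <- mean(daily.intake) + c(-1, 1) * t_crit * se
print(ci)                          # compare with 5986.348 7520.925

# Testing exactly at the upper limit puts t at -t_crit, so p = 0.05
p_at_limit <- t.test(daily.intake, mu = ci[2])$p.value
print(p_at_limit)
```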
Now suppose we are interested in testing whether the unknown (and never known) population mean, mu, is less than 7725
This is an example of a one-sided test
> t.test(daily.intake, mu=7725,alternative=c("less"))
One Sample t-test
data: daily.intake
t = -2.8208, df = 10, p-value = 0.009069
alternative hypothesis: true mean is less than 7725
95 percent confidence interval:
-Inf 7377.781
sample estimates:
mean of x
6753.636
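Note that the "alternative hypothesis" has changed, as has the p-value: it is now half the two-sided value (0.009069 rather than 0.01814), because only a sample mean BELOW 7725 counts as evidence against the null
Also, the confidence interval now extends from -Infinity (-Inf) on the left to 7377.781 on the right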
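A sketch of where the one-sided numbers come from, under the same assumption about the data: for alternative="less" the p-value is the lower-tail probability pt(t, df), and the upper bound of the one-sided interval uses the 95% quantile qt(0.95, df) instead of the 97.5% one.

```r
# ASSUMPTION: these 11 values stand in for Intake.txt
daily.intake <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770)
n  <- length(daily.intake)
se <- sd(daily.intake) / sqrt(n)
t_stat <- (mean(daily.intake) - 7725) / se

# alternative = "less": only small (negative) values of t count against H0
p_less <- pt(t_stat, df = n - 1)
p_two  <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided, for comparison

# One-sided 95% upper bound: all 5% goes in a single tail
upper <- mean(daily.intake) + qt(0.95, df = n - 1) * se
print(c(p_less = p_less, half_two_sided = p_two / 2, upper = upper))
```

The one-sided p-value equals half the two-sided one here only because t is negative, i.e. the data point in the direction of the alternative.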
What do you think happens when you perform a two-sided t-test with the population mean, mu, set to the sample mean?
THINK ABOUT WHAT ANSWER YOU EXPECT BEFORE YOU DO IT!!
> t.test(daily.intake, mu=mean(daily.intake))
What is the p-value? Is this what you expected?
Note: the confidence interval is unaltered by changing the tested mu value
Now do a one-sided test, say,
> t.test(daily.intake, mu=mean(daily.intake),alternative=c("greater"))
What is the p-value? Is this what you expected?
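As an alternative to the t-test you can use the nonparametric Wilcoxon signed-rank test
> ? wilcox.test
This ranks the sizes |x_i - mu| of the differences and compares the ranks of the positive and negative values of (x_i - mu)
If mu were at the centre of the data (for symmetric data, the mean), the ranks of the positive and negative differences would tend to cancel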
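To see what the signed-rank test actually computes, the sketch below builds its V statistic by hand: drop zero differences, rank the absolute differences |x_i - mu| (ties get average ranks), and add up the ranks of the positive differences. It again assumes the 11 values below for daily.intake; the two equal values of 7515 give tied ranks, so wilcox.test() warns and uses a Normal approximation, and the sketch silences that warning with suppressWarnings().

```r
# ASSUMPTION: these 11 values stand in for Intake.txt
daily.intake <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770)

d <- daily.intake - 7725          # differences x_i - mu
d <- d[d != 0]                    # zero differences are dropped
r <- rank(abs(d))                 # rank the sizes; ties get average ranks
V <- sum(r[d > 0])                # sum of the ranks of the positive differences

# Largest possible V; if mu were at the centre we would expect V near half of this
V_max <- length(d) * (length(d) + 1) / 2
print(c(V = V, V_max = V_max))

# The built-in test reports the same V (ties trigger a warning, suppressed here)
wt <- suppressWarnings(wilcox.test(daily.intake, mu = 7725))
print(wt$statistic)
```

Here V is far below half of V_max, because most intakes lie below 7725.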
3.2 Two sample t test
The two-sample t-test is when we have collected samples from two populations
(conditions) and we wish to test for a difference in means
Let's load some data
> library(ISwR)
> data(energy)
> attach(energy)
> energy
> t.test(expend~stature, var.equal=T)
Two Sample t-test
data: expend by stature
t = -3.9456, df = 20, p-value = 0.000799
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.411451 -1.051796
sample estimates:
mean in group lean mean in group obese
          8.066154           10.297778
which has the same structure as the one-sample output
Note that the formula "expend~stature" in t.test() states that we wish to test for differences in the variable "expend" between the groups defined by the variable "stature" (lean/obese)
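The pooled-variance t-statistic behind var.equal=T can be written out in a few lines. pooled_t() is a hypothetical helper written for illustration, and the example uses made-up random data rather than the energy data so that it runs without the ISwR package. Its df of n1 + n2 - 2 explains the df = 20 above (22 subjects in total).

```r
# Hypothetical helper: pooled-variance two-sample t-test (as with var.equal = TRUE)
pooled_t <- function(x, y) {
  n1 <- length(x); n2 <- length(y)
  # pooled variance: the two sample variances weighted by their degrees of freedom
  sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
  t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
  df <- n1 + n2 - 2
  c(t = t_stat, df = df, p = 2 * pt(-abs(t_stat), df))
}

# Made-up example data (NOT the energy data)
set.seed(1)
x <- rnorm(13, mean = 8, sd = 1)
y <- rnorm(9, mean = 10, sd = 1.5)
res <- pooled_t(x, y)
print(res)
print(t.test(x, y, var.equal = TRUE)[c("statistic", "parameter", "p.value")])
```

With the energy data attached as above, pooled_t(expend[stature == "lean"], expend[stature == "obese"]) should reproduce t = -3.9456 and df = 20.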
To allow for unequal variances we use the Welch approximation (this is the default, var.equal=FALSE)
> t.test(expend~stature, var.equal=F)
Welch Two Sample t-test
data: expend by stature
t = -3.8555, df = 15.919, p-value = 0.001411
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.459167 -1.004081
sample estimates:
mean in group lean mean in group obese
          8.066154           10.297778
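Similarly, the fractional df = 15.919 in the Welch output comes from the Welch-Satterthwaite formula for the degrees of freedom. welch_t() below is a hypothetical helper, again run on made-up data rather than the energy data, and checked against t.test(), whose default is var.equal=FALSE.

```r
# Hypothetical helper: Welch t-statistic with Welch-Satterthwaite degrees of freedom
welch_t <- function(x, y) {
  v1 <- var(x) / length(x)   # squared standard error of mean(x)
  v2 <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (length(x) - 1) + v2^2 / (length(y) - 1))
  c(t = t_stat, df = df, p = 2 * pt(-abs(t_stat), df))
}

# Made-up example data with different spreads (NOT the energy data)
set.seed(3)
x <- rnorm(13, mean = 8, sd = 1)
y <- rnorm(9, mean = 10, sd = 2)
res <- welch_t(x, y)
print(res)   # df is usually not a whole number
```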
3.3 Two sample Wilcoxon test
For a distribution-free (nonparametric) test, use the two-sample Wilcoxon (rank sum) test, which is based on comparing the ranks of the data
This test makes no assumption about the shape of the distribution of the underlying data. That is, unlike the t-test, it does not assume that the underlying population is Normal
> wilcox.test(expend~stature)
Wilcoxon rank sum test with continuity correction
data: expend by stature
W = 12, p-value = 0.002122
alternative hypothesis: true mu is not equal to 0
Warning message:
Cannot compute exact p-value with ties in: wilcox.test.default(x
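Note that when there are ties (identical values) in the data set, R cannot compute the exact p-value for the Wilcoxon test; it uses a Normal approximation (with continuity correction) instead, so the test is approximate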
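The W statistic can be reproduced from ranks of the pooled data: rank both samples together, add up the ranks of the first sample, and subtract the smallest value that sum could take, n1(n1 + 1)/2. Equivalently, W counts the pairs (x_i, y_j) with x_i > y_j, with tied pairs counting one half. rank_sum_W() is a hypothetical helper and the data are made up; rounding to one decimal place makes ties possible, which would trigger the same approximation warning as above.

```r
# Hypothetical helper: two-sample Wilcoxon rank-sum statistic W
rank_sum_W <- function(x, y) {
  n1 <- length(x)
  r <- rank(c(x, y))                         # rank the pooled data; ties get average ranks
  sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2    # rank sum of x minus its minimum possible value
}

# Made-up example data, rounded so that ties can occur (NOT the energy data)
set.seed(4)
x <- round(rnorm(10, mean = 8), 1)
y <- round(rnorm(8, mean = 10), 1)
W <- rank_sum_W(x, y)

# Equivalent count: pairs with x_i > y_j, plus one half for each tied pair
W_pairs <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))

# Any ties trigger a warning about the exact p-value; it is suppressed here
wt <- suppressWarnings(wilcox.test(x, y))
print(c(W = W, W_pairs = W_pairs, builtin = unname(wt$statistic)))
```

With the energy data attached, rank_sum_W(expend[stature == "lean"], expend[stature == "obese"]) should give the W = 12 reported above; a small W means the lean group mostly has the lower ranks.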