Chapter 7
L7_S1 Introduction
We have already examined the use of the z distribution and z-scores in cases where you have a sample and you wish to determine whether this sample could have come from a population with a known mean.
You will recall that the z-test allows us to convert a sample mean to a value from the standard normal distribution, which, together with a selected alpha level, allows us to determine the probability that the sample mean we observed could have been obtained by chance alone.
L7_S2 T-test
However, we will not always know the population standard deviation. In those cases, we use a similar test, the t-test, which allows us to evaluate our data against a hypothesis about the population mean even when the population standard deviation must be estimated from the sample. In this chapter, we will discuss how to use the t-test and the t distribution.
We use the sample standard deviation to calculate an estimated standard error, which we enter into the equation here. Our sample mean is converted into a t-score, which we can then use for hypothesis testing. Our p-value is the probability of obtaining a t-score at least as extreme as the one we observed, under the null distribution.
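For reference, the equation being described is the standard one-sample t-statistic (written here in conventional notation, which may differ slightly from the symbols on the slide):

t = \frac{\bar{X} - \mu_0}{s_{\bar{X}}}, \qquad s_{\bar{X}} = \frac{s}{\sqrt{n}}, \qquad df = n - 1

where \bar{X} is the sample mean, \mu_0 is the hypothesized population mean, s is the sample standard deviation, and n is the sample size.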
When using the t-test, we make the assumption that the data sample is a random sample from a
normally distributed population.
The t-test is a robust test – in other words, it can withstand moderate deviations from this assumption without any important adverse effect, particularly with large sample sizes (n > 25).
L7_S3 T-distribution
When applying the t-test, we are using s to approximate the population standard deviation; as a result, the sampling distribution of the test statistic will be a t-distribution, and not a normal distribution.
L7_S4 T-distribution cont.
With a very large sample, the estimated standard error will be very near the true standard error, and thus t will be almost exactly the same as z.
Unlike the standard normal (z) distribution, t is a family of curves. As n gets bigger, the t distribution becomes more nearly normal.
For smaller n, the t distribution is more spread out than the standard normal, with a lower peak and heavier (fatter) tails.
We use “degrees of freedom” to identify which t curve to use. For a basic one-sample t-test, df = n - 1.
There are too many different curves to put each one in a table, and so we use Table B.3 which shows
just the critical values for one tail at various degrees of freedom and various levels of alpha.
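If you want to cross-check the printed table, the critical values can also be obtained from SciPy's t distribution. The short Python sketch below is only a cross-check; it also shows how the one-tailed critical t approaches the corresponding z value as the degrees of freedom grow:

# Cross-check of one-tailed critical values against the standard normal.
from scipy import stats

alpha = 0.05
for df in (5, 10, 25, 100, 1000):
    t_crit = stats.t.ppf(1 - alpha, df)      # one-tailed critical t for this df
    print(f"df = {df:>4}: t_crit = {t_crit:.3f}")

z_crit = stats.norm.ppf(1 - alpha)           # one-tailed critical z
print(f"standard normal: z_crit = {z_crit:.3f}")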
L7_S5 Critical t-value
You can use Table B.3 in much the same way as you used Table B.2. Use the degrees of freedom relevant to your data in the left column to identify the correct row, and use the top row to select your alpha level. You can now compare your calculated t-value to the critical t-value found in the table. You will then decide whether to reject the null hypothesis in favour of your alternate hypothesis, or whether you will fail to reject the null hypothesis. If the absolute value of our calculated t-value is less than the critical t-value, then we fail to reject the null hypothesis that the means of the populations are equal.
L7_S6 Practice 1
In this and the next two slides, you can test yourself quickly by seeing if you can find the same values as those given here, using Table B.3 in your textbook by Zar.
Work through this example. Do you get the same values as those shown in green?
L7_S7 Practice 2
What about here? Do you get the same values as those shown in green?
L7_S8 Practice 3
And here? Do you get the same values?
L7_S9 Two-sample testing
Up to now in this chapter, we have been discussing a one-sample type of test. In a one-sample test we determine the probability that the sample mean is a random sample mean from a population with a mean equal to a constant value. However, we often collect two samples of observations, and want to use the data to infer whether the two populations from which the samples came are the same. We refer to this type of test as a two-sample test, and we will discuss how to use the t-distribution, in particular, to do a two-sample test. We will look at comparing two means to see if there is a difference.
L7_S10 Calculating a test statistic
If there is no difference between the two population means, then the mean of one population minus the mean of the other should be equal to zero, and this is our null hypothesis. The alternate hypothesis is that the means are indeed different. We will therefore test, at the chosen alpha level, whether the two sample means could have come from populations with the same mean and are therefore not different.
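Written out in standard notation (the slide's symbols may differ), the hypotheses are:

H_0: \mu_1 - \mu_2 = 0 \qquad H_A: \mu_1 - \mu_2 \neq 0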
L7_S11 T-statistic: general formulae
As with a one-sample t-test, we begin by calculating a t-statistic. Remember that the general form is the difference between the two statistics, divided by the standard error of that difference, which we will estimate using a pooled variance. We will assume that both samples come from populations whose variances are equal. We can test this assumption using the variance ratio test, which we will cover at a later stage.
L7_S12 Two-sample t-statistic
To calculate the t-statistic (or t-test value) of a two-sample test, apply the formula given here.
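In standard notation (the slide's symbols may differ), the two-sample t-statistic is

t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}

which, under the null hypothesis \mu_1 - \mu_2 = 0, reduces to t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}.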
L7_S13 Two-sample t-statistic
Note that here you are using a standard error term in your formula that is different to that of the one-sample formula for calculating the t-statistic.
L7_S14 T-statistic: std error
In order to calculate the standard error term for a two-sample t-statistic calculation, use the formula
given here.
L7_S15 T-statistic: std error
Each of the two sample means represents its own population mean, but in each case there is sampling error. The amount of error associated with each sample mean can be measured by computing the standard errors.
To calculate the total amount of error involved in using two sample means to approximate two population means, we will find the error from each sample separately, and then add the two errors together.
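In standard notation, this "add the two errors" estimate of the standard error is (the slide's symbols may differ):

s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}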
L7_S16 T-statistic: std error
But this formula only works when n1 = n2. When the two samples are of different sizes, this formula is biased.
This comes from the fact that the formula treats the two sample variances equally. But we know that the
statistics obtained from large samples are better estimates, so we need to give larger samples more
weight in our estimated standard error.
L7_S17 Adjusted formulae – std error
We are going to change the formula slightly so that we use the pooled sample variance instead of the
individual sample variances.
This pooled variance is going to be a weighted estimate of the variance derived from the two samples.
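In standard notation, the pooled variance and the standard error built from it are (symbols may differ from the slide):

s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}, \qquad s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}, \qquad df = n_1 + n_2 - 2

Since (n - 1)s^2 is the sum of squares for a sample, this is the same as writing s_p^2 = (SS_1 + SS_2)/(df_1 + df_2).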
L7_S18 Review - one-sample t-test
To review: when we are applying a one-sample t-test, we go through the steps listed here. We use the sample mean and sample standard deviation we calculated to obtain the estimated standard error and the t-statistic. Together with the relevant degrees of freedom and Table B.3, we see whether the t-statistic is more extreme than the critical t-value, and we decide whether to reject, or fail to reject, the null hypothesis that the population mean is equal to the specified constant.
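A minimal Python sketch of these steps is given below, assuming the raw observations are in a list (the variable names and the numbers are illustrative only, not from the slides); SciPy's built-in ttest_1samp gives the same t-statistic together with a p-value:

import math
from scipy import stats

def one_sample_t(data, mu0):
    """One-sample t-statistic and degrees of freedom, computed by hand."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample standard deviation
    se = s / math.sqrt(n)                                        # estimated standard error
    return (mean - mu0) / se, n - 1

data = [4.2, 3.9, 5.1, 4.8, 4.4]   # hypothetical observations, for illustration only
mu0 = 4.0                          # hypothetical population constant
t_stat, df = one_sample_t(data, mu0)
print(t_stat, df)
print(stats.ttest_1samp(data, popmean=mu0))   # cross-check with SciPy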
L7_S19 Review - two-sample t-test
When using the two-sample t-test, we are comparing the sample means from two different samples. We
use the difference between the two means, and the pooled variance in order to calculate a standard
error. Once we have calculated these values, and the relevant degrees of freedom, we follow the same
procedure as with a one-sample t-test.
L7_S20 Example
Let’s look at an example where the verbal skills of 8-year-old boys are compared to those of 8-year-old girls. A sample of ten children from each population is randomly selected and assessed, and the statistics listed here are observed.
L7_S21 Example – mean difference
First of all, we obtain the difference between the sample means.
L7_S22 Example – pooled variance
Next, we compute the pooled variance.
L7_S23 Example – std error
We use that value to calculate the standard error.
L7_S24 Example – t-statistic, df
Once we have those values, we are able to calculate a t-statistic, which we use, together with our calculated degrees of freedom and Table B.3, to see whether our t-statistic is larger than, or less than, the selected critical t-value.
L7_S25 Example – conclusion
We have chosen an alpha level of 0.01 in this example, and using Table B.3 we see that our calculated t-value is more extreme (its absolute value is greater than the critical t-value in the table), and so we reject the null hypothesis that there is no difference between girls and boys, in favour of the alternative hypothesis that there is indeed a difference in verbal skills.
L7_S26 Summary of equations
We’ll end this chapter with a summary of all the equations you might need when applying the t-test.