Chapter 7

L7_S1 Introduction
We have already examined the use of the z distribution and Z scores in cases where you have a sample and you wish to determine whether that sample could have come from a population with a known mean. You will recall that this allows us to convert a sample mean to a value from the standard normal distribution, which, together with a selected alpha value, allows us to determine the probability that the sample mean we observed could have been obtained by chance alone.

L7_S2 T-test
However, we will not always know the population parameters; in particular, the population standard deviation is usually unknown. In those cases we use a similar test, the t-test, which allows us to evaluate our data against a hypothesized mean even when we do not know the population standard deviation. In this chapter, we will discuss how to use the t-test and the t distribution. We use the sample standard deviation, s, to calculate an estimated standard error, s/√n, which we enter into the equation here: t = (x̄ − μ) / (s/√n). Our sample mean is thus converted into a t-score, which we can use for hypothesis testing. Our p-value will be the probability of getting a t-score as extreme as the one we observed, under the null distribution. When using the t-test, we make the assumption that the data are a random sample from a normally distributed population. The t-test is a robust test – in other words, it can withstand moderate deviations from this assumption without any important adverse effect, particularly with large sample sizes (n > 25).

L7_S3 T-distribution
When applying the t-test, we are using s to approximate the population standard deviation, σ, and so the sampling distribution will be a t-distribution, not a normal distribution.

L7_S4 T-distribution cont.
With a very large sample, the estimated standard error will be very near the true standard error, and thus t will be almost exactly the same as Z. Unlike the standard normal (z) distribution, t is a family of curves. As n gets bigger, t becomes more normal.
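The one-sample t-score calculation just described can be sketched in a few lines of code. This is an illustrative sketch, not from the textbook: the data are hypothetical, and the helper name `one_sample_t` is my own.

```python
import math

def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test of H0: mu = mu0."""
    n = len(sample)
    mean = sum(sample) / n
    # sample variance uses n - 1 in the denominator (the unbiased estimator)
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    se = math.sqrt(var / n)        # estimated standard error of the mean
    t = (mean - mu0) / se          # t = (sample mean - mu0) / SE
    return t, n - 1                # df = n - 1 for a basic t-test

# hypothetical data: could this sample have come from a population with mean 5.0?
t, df = one_sample_t([4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 4.8, 5.2], mu0=5.0)
```

The resulting t and df are then compared against the tabled critical value, exactly as described above.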
For smaller n, the t distribution has heavier (fatter) tails and a lower peak than the normal curve. We use "degrees of freedom" to identify which t curve to use; for a basic t-test, df = n − 1. There are too many different curves to put each one in a table, and so we use Table B.3, which shows just the critical values for one tail at various degrees of freedom and various levels of alpha.

L7_S5 Critical t-value
You can use Table B.3 in a similar way to Table B.2. Use the degrees of freedom relevant to your data in the left column to identify the correct row, and use the top row to select your alpha level. You can now compare your calculated t-value to the critical t-value found in the table, and then decide whether to reject the null hypothesis in favour of your alternate hypothesis, or fail to reject the null hypothesis. If the absolute value of our calculated t-value is less than the critical t-value, then we fail to reject the null hypothesis that the means of the populations are equal.

L7_S6 Practice 1
In this and the next two slides, you can test yourself quickly by seeing if you can find the same values as those given here, using Table B.3 in your textbook by Zar. Work through this example. Do you get the same values as those shown in green?

L7_S7 Practice 2
What about here? Do you get the same values as those shown in green?

L7_S8 Practice 3
And here? Do you get the same values?

L7_S9 Two-sample testing
Up till now in this chapter we have been discussing a one-sample type of test. In a one-sample test we determine the probability that the sample mean is a random sample mean from a population with a mean equal to a constant value. However, we often collect two samples of observations, and want to use the data to infer whether the two populations from which the samples came are the same.
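The table-lookup step described under L7_S5 can also be sketched in code. Assuming SciPy is available, `scipy.stats.t.ppf` (the inverse of the t cumulative distribution) plays the role of Table B.3; the helper name `reject_h0` is my own.

```python
from scipy import stats  # SciPy assumed available; t.ppf stands in for Table B.3

# critical t for a two-tailed test at alpha = 0.05 with df = 10:
# each tail holds alpha/2, so we ask for the 1 - alpha/2 quantile
t_crit = stats.t.ppf(1 - 0.05 / 2, df=10)   # roughly 2.228

def reject_h0(t_calc, alpha, df, tails=2):
    """Decision rule: reject H0 when |t| exceeds the tabled critical value."""
    crit = stats.t.ppf(1 - alpha / tails, df)
    return abs(t_calc) > crit
```

For example, with df = 10 and alpha = 0.05 (two-tailed), a calculated t of 2.5 falls beyond the critical value, while a t of 1.0 does not.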
We refer to this type of test as a two-sample test, and we will discuss how to use the t-distribution, in particular, to do a two-sample test. We will look at comparing two means to see if there is a difference.

L7_S10 Calculating a test statistic
If there is no difference between the two population means, then the mean of one sample minus the mean of the other sample should be close to zero; our null hypothesis is that the difference between the population means is zero. The alternate hypothesis is that the means are indeed different. We will therefore test, at the chosen alpha level, the possibility that the two means could have come from the same population and are therefore not different.

L7_S11 T-statistic: general formulae
As with a one-sample t-test, we begin by calculating a t-statistic. Remember that the general formula is the difference between the two statistics, divided by the standard error of that difference. We will assume that both samples come from populations whose variances are equal. We can test this assumption using the variance ratio test, which we will cover at a later stage.

L7_S12 Two-sample t-statistic
To calculate the t-statistic (or t-test value) of a two-sample test, apply the formula given here: t = (x̄1 − x̄2) / SE(x̄1 − x̄2).

L7_S13 Two-sample t-statistic
Note that the standard error term in this formula is different from that of the one-sample formula for calculating the t-statistic.

L7_S14 T-statistic: std error
To calculate the standard error term for a two-sample t-statistic, use the formula given here: SE(x̄1 − x̄2) = √(s1²/n1 + s2²/n2).

L7_S15 T-statistic: std error
Each of the two sample means represents its own population mean, but in each case there is sampling error. The amount of error associated with each sample mean can be measured by computing its standard error. To calculate the total amount of error involved in using two sample means to approximate two population means, we find the error from each sample separately, and then add the two errors together.
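The add-the-errors-from-each-sample idea just described can be sketched directly: compute each sample's variance, divide by its n, sum, and take the square root. This is an illustrative sketch with hypothetical data; the helper name `two_sample_se` is my own.

```python
import math

def two_sample_se(x1, x2):
    """SE of (mean1 - mean2): add the squared standard error of each sample."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    v1 = sum((x - m1) ** 2 for x in x1) / (n1 - 1)  # sample variance, sample 1
    v2 = sum((x - m2) ** 2 for x in x2) / (n2 - 1)  # sample variance, sample 2
    return math.sqrt(v1 / n1 + v2 / n2)

# t-statistic for two hypothetical equal-sized samples
x1, x2 = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]
t = (sum(x1) / 3 - sum(x2) / 3) / two_sample_se(x1, x2)
```

As the next slide explains, this simple form is only appropriate when the two samples are the same size.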
L7_S16 T-statistic: std error
But this formula only works when n1 = n2. When the two samples are of different sizes, the formula is biased, because it treats the two sample variances equally. We know that statistics obtained from larger samples are better estimates, so we need to give larger samples more weight in our estimated standard error.

L7_S17 Adjusted formulae – std error
We are going to change the formula slightly so that we use the pooled sample variance instead of the individual sample variances. This pooled variance is a weighted estimate of the variance derived from the two samples.

L7_S18 Review – one-sample t-test
To review: when applying a one-sample t-test, we go through the steps listed here. We use the sample mean we calculated in order to compute the standard error and the t-statistic. Together with the relevant degrees of freedom and Table B.3, we see whether the t-statistic is more than, or less than, the critical t-value, and we decide whether to reject, or fail to reject, the null hypothesis that the population mean is equal to a specified constant.

L7_S19 Review – two-sample t-test
When using the two-sample t-test, we are comparing the sample means from two different samples. We use the difference between the two means, and the pooled variance, to calculate a standard error. Once we have calculated these values, and the relevant degrees of freedom, we follow the same procedure as with a one-sample t-test.

L7_S20 Example
Let's look at an example where the verbal skills of 8-yr-old boys are compared to those of 8-yr-old girls. A sample of ten children from each population is randomly selected and assessed, and the statistics listed here are observed.

L7_S21 Example – mean difference
First of all, we obtain the difference between the sample means.
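The weighting idea behind the pooled variance can be made concrete with a small numerical sketch. The variances and sample sizes here are hypothetical, chosen only to show that the pooled estimate is pulled toward the larger sample.

```python
import math

# hypothetical sample variances and sizes (illustrative only)
s2_1, n1 = 4.0, 5      # small sample
s2_2, n2 = 9.0, 25     # larger sample: a better estimate, deserves more weight

unweighted = (s2_1 + s2_2) / 2   # treats both samples equally; biased if n1 != n2

# pooled variance: weight each sample variance by its degrees of freedom
sp2 = ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)

# the pooled standard error then uses sp2 in place of both individual variances
se = math.sqrt(sp2 / n1 + sp2 / n2)
```

Because n2 − 1 = 24 outweighs n1 − 1 = 4, the pooled variance lands much closer to the larger sample's variance than the unweighted average does.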
L7_S22 Example – pooled variance
Next we compute the pooled variance.

L7_S23 Example – std error
We use that value to calculate the standard error.

L7_S24 Example – t-statistic, df
Once we have those values, we are able to calculate a t-statistic, which we use, together with our calculated degrees of freedom and Table B.3, to see whether our t-statistic is larger or smaller than the selected critical t-value.

L7_S25 Example – conclusion
We have chosen an alpha level of 0.01 in this example, and using Table B.3 we see that our calculated t-value is more extreme (greater than the critical value in the table), and so we reject the null hypothesis that there is no difference between girls and boys, in favour of the alternative hypothesis that there is indeed a difference.

L7_S26 Summary of equations
We'll end off this chapter with a summary of all the equations you might need when applying the t-test.
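To close, the whole two-sample procedure can be run end to end in code. Assuming SciPy is available, `scipy.stats.ttest_ind` with `equal_var=True` uses the pooled-variance formula from this chapter. The scores below are hypothetical, invented for illustration; they are not the textbook example's data, which appear on the slide.

```python
from scipy import stats

# two hypothetical groups of ten (NOT the slide's boys/girls data)
group_a = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
group_b = [10, 12, 9, 11, 13, 10, 12, 11, 10, 12]

# equal_var=True selects the pooled-variance (Student's) two-sample t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)
df = len(group_a) + len(group_b) - 2   # degrees of freedom: n1 + n2 - 2 = 18

# decision at alpha = 0.01, mirroring the example's procedure
if p_value < 0.01:
    print("reject H0: the population means differ")
```

Comparing the p-value to alpha is equivalent to comparing the t-statistic to the tabled critical value, so the conclusion matches the Table B.3 approach.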