Mathematical and Computer Sciences, Heriot-Watt University
Topic 5 Hypothesis Tests

Contents

5.1 Introduction to Tests of Hypothesis
    5.1.1 Type 1 and 2 Errors
    5.1.2 One-tailed and two-tailed tests
    5.1.3 Different Significance Levels
5.2 Single mean - large samples
5.3 Single proportion - large samples
5.4 Difference of two means - large samples
5.5 Difference of two proportions - large samples
5.6 Small Samples
    5.6.1 Single mean
    5.6.2 Confidence Intervals with Small Samples
    5.6.3 Difference of 2 Means from Small Samples
    5.6.4 Paired t test
5.7 The Chi-Squared Distribution
    5.7.1 Checking for Association - Hair and Eye Colour
    5.7.2 Limitations of Chi-squared test
    5.7.3 Goodness of Fit Tests
5.8 Coursework 1
5.9 Summary and assessment
Learning Objectives

- identify situations in experimentation where a hypothesis test will produce a useful result
- appreciate the ideas of null and alternative hypotheses
- use the standardised Normal distribution in hypothesis tests involving large samples
- use Student's t distribution in hypothesis tests involving small samples
- explain Type 1 and Type 2 Errors
- use the formulae for standard error and test statistic in the cases of:
  a) single mean - large samples
  b) single proportion - large samples
  c) difference between two means - large samples
  d) difference between two proportions - large samples
  e) single mean - small samples
  f) difference between two means - small samples
- decide when to use a one-tailed or a two-tailed test
- appreciate the concept of degrees of freedom
- calculate a confidence interval for a population mean based on a sample mean from a small sample
- use a paired t test

© HERIOT-WATT UNIVERSITY 2003

5.1 Introduction to Tests of Hypothesis

In the last Topic it was seen that a sample could be used to infer a confidence interval for the mean of the population it was taken from. A very useful fact is that this method can be turned on its head: instead of being used to estimate a property of the population, a sample can help decide whether it is likely that a population has a particular mean value (or proportion). In this chapter most of the worked examples will start off by suggesting a hypothesis (an assumption) and effectively either accepting it or deciding that it is false.

Imagine that a company states in its sales pitch that a particular model of its mobile phones lasts for 150 hours before it needs to be recharged. If you were thinking of buying one, you might like some evidence that this assertion is true. One way of doing this is to take a sample of phones, make a number of measurements and then calculate the mean number of hours between charging.
It would be impossible to do this for every phone produced since the population is so large, so the best that can be done is to calculate a sample mean. Suppose that a sample of 40 was taken and this produced a mean value of 147.4 hours. Does this mean that the manufacturer's claim has been disproved? Clearly 147.4 is less than 150, so it looks as if the manufacturer is over-estimating the time between charging. However, it must be appreciated that this was just one sample; it was shown in the last topic that another sample might give a very different result (for example, it could give a value of 152.3 hours, in which case the phones are doing better than the manufacturer's claim!).

The method of hypothesis testing starts by making an assertion about the population, usually an assumption that the mean is equal to a stated result. In this case it is hypothesised that the population mean, μ, for the mobile phones is 150 hours. The Central Limit Theorem will next be used, and to do this a value for the standard deviation is required. Assume that in this case the population standard deviation is 12 hours. From the last topic, 95% of all sample means lie between

μ ± 1.96 σ/√n

The term σ/√n (which is the standard deviation of the sample means) is often called the Standard Error (S.E.). In this case it is equal to 12/√40 = 1.90, so the upper and lower bounds calculate as 150 ± 1.96 × 1.90, i.e. 150 ± 3.72. So 95% of the sample means lie between 146.28 and 153.72.

All the calculations so far have been based on the population; it is only now that the sample mean value needs to be used. Recall that this was calculated as 147.4 hours. This is within the range of values that 95% of sample means are expected to fall in, so it has not been possible to disprove the hypothesis that the mean is 150. A sample mean this far from 150 would occur by chance more often than 5% of the time if the population mean really were 150. This is known as a significance test with level 0.05. There is no evidence to dispute the manufacturer's claim at the 5% level.
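The calculation above can be checked with a short sketch using only the Python standard library (NormalDist supplies the 1.96 figure in place of tables):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 150, 12, 40          # hypothesised mean, assumed s.d., sample size

se = sigma / sqrt(n)                # Standard Error of the sample mean
z = NormalDist().inv_cdf(0.975)     # two-tailed 5% cut-off, approximately 1.96
lower, upper = mu - z * se, mu + z * se

sample_mean = 147.4
print(round(se, 2))                      # 1.9
print(round(lower, 2), round(upper, 2))  # 146.28 153.72
print(lower < sample_mean < upper)       # True, so H0 is not rejected
```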
The supposition that the population mean is equal to 150 can be written as

H0: μ = 150

This is called the Null Hypothesis. To decide whether or not this assertion is true, it is necessary to have a comparison with an alternative hypothesis (so that one or the other will be true). This is written as

H1: μ ≠ 150

It is usual to then draw a Normal distribution curve and shade in the appropriate significance level (here 5%); the whole calculation can then be expressed more briefly in a diagram. Since the sample mean, 147.4, is not in the shaded region, H0 is accepted. There is no evidence at the 0.05 level of significance that the population mean is not 150.

5.1.1 Type 1 and 2 Errors

Since probabilities are used in hypothesis tests, there is always the chance of an error in the conclusion being made. In the mobile phone example it is only being said that the sample mean value is consistent with a population mean of 150 hours, with 95% confidence. If the population mean is, in fact, not 150 hours but the hypothesis test resulted in accepting H0, it is said that a Type 2 Error has occurred. Conversely, if H0 is actually true but the sample mean resulted in it being rejected, it is said that a Type 1 Error has been made. This can be summarised in the table below.

                         State of Nature
Decision        H0 is true          H0 is false
Accept H0       correct decision    Type 2 Error
Reject H0       Type 1 Error        correct decision

5.1.2 One-tailed and two-tailed tests

In the mobile phone example, recall that the diagram of the normal distribution curve had a 5% area shaded and this was split between both "tails". This will always be the case when the alternative hypothesis has a "not equal to" sign, and the test is called a two-tailed test (for obvious reasons!).
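The interpretation of the significance level as a Type 1 Error rate can be checked by simulation. This sketch repeatedly draws samples from a population where H0 really is true (μ = 150) and counts how often the two-tailed 5% test wrongly rejects it; the trial count of 2000 is an arbitrary choice:

```python
import random

random.seed(1)
mu, sigma, n = 150, 12, 40
se = sigma / (n ** 0.5)
trials, rejections = 2000, 0

for _ in range(trials):
    # Draw a sample from a population in which H0 actually holds
    sample_mean = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    # Two-tailed test at the 5% level
    if abs(sample_mean - mu) > 1.96 * se:
        rejections += 1

rate = rejections / trials
print(rate)   # close to 0.05: H0 is wrongly rejected about 5% of the time
```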
In some hypothesis tests, the alternative hypothesis is given as "μ is less than" or "μ is greater than" some value. In cases like this, only one side of the normal distribution curve is shaded and, not surprisingly, the test is called a one-tailed test.

The example will now be re-worked as a one-tailed test. A competing mobile phone manufacturer wishes to prove that the time between charging for his rival's phone is less than 150 hours. The hypotheses (plural of hypothesis) now become

H0: μ = 150
H1: μ < 150

The normal distribution curve in this case has only one side shaded. To calculate the "cut-off" point, this time it is not 1.96 that is used but 1.64. (From tables, the value 1.64 gives an area under the normal distribution of approximately 0.05, whereas 1.96 gave 0.025.) This means that the lower bound is

150 - 1.64 × 12/√40 = 146.89

Using the same sample value as before, since the sample mean, 147.4, is not in the shaded region, again the null hypothesis is accepted. There is no evidence at the 0.05 level of significance that the population mean is less than 150.

Notice that for one-tailed tests with "<" in the alternative hypothesis it is the left-hand side of the Normal distribution curve that is shaded, whilst if it is ">" in the alternative hypothesis the right-hand side is shaded.

5.1.3 Different Significance Levels

The significance level of 5% (or 0.05 as a decimal) has been used in the mobile phone example. This is a very common value to use but it is not the only one that can be employed. It implies that there is a 5% chance of making a Type 1 Error. If it is necessary for the margin of error to be smaller (in medical matters, say), then this can be reduced to 1% or even 0.1% (or, indeed, any other value). Changing the significance level will have an effect on the "cut-off" point.
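A minimal sketch of how the cut-off value depends on the chosen significance level and on whether the test is one- or two-tailed (the helper name cutoff is ours; only the standard library is used):

```python
from statistics import NormalDist

def cutoff(alpha, two_tailed=True):
    """z cut-off for significance level alpha; half of alpha goes in each
    tail for a two-tailed test, all of it in one tail otherwise."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

print(round(cutoff(0.05), 2))                    # 1.96 (two-tailed 5%)
print(round(cutoff(0.05, two_tailed=False), 2))  # 1.64 (one-tailed 5%)
print(round(cutoff(0.01), 2))                    # 2.58 (two-tailed 1%)
```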
For example, in the mobile phone example, for a two-tailed test and a significance level of 1% the upper and lower bounds would be calculated as 150 ± 2.58 × S.E., i.e. from 145.10 to 154.90.

The lower the significance level, the more difficult it is to prove the alternative hypothesis (which is often what you hope to do). If an alternative hypothesis is proved at the 5% level it is said to be significant; a result at the 1% level is termed highly significant, whilst one at the 0.1% level is deemed very highly significant.

To help in calculations at different significance levels, the appropriate z values for the general result P(Z > z) = α, where α is the significance level, are given in the table below.

α     z        α     z        α     z        α     z
.50  0.0000   .050  1.6449   .020  2.0537   .010  2.3236
.45  0.1257   .048  1.6646   .019  2.0749   .009  2.3656
.40  0.2533   .046  1.6849   .018  2.0969   .008  2.4089
.35  0.3853   .044  1.7060   .017  2.1201   .007  2.4573
.30  0.5244   .042  1.7279   .016  2.1444   .006  2.5121
.25  0.6745   .040  1.7507   .015  2.1701   .005  2.5758
.20  0.8416   .038  1.7744   .014  2.1973   .004  2.6521
.15  1.0364   .036  1.7991   .013  2.2262   .003  2.7478
.10  1.2816   .034  1.8250   .012  2.2571   .002  2.8782
.05  1.6449   .032  1.8522   .011  2.2904   .001  3.0902
              .030  1.8808
              .029  1.8957
              .028  1.9110
              .027  1.9268
              .026  1.9431
              .025  1.9600
              .024  1.9774
              .023  1.9954
              .022  2.0141
              .021  2.0335

α       z        α        z
.050   1.6449   .025    1.9600
.010   2.3263   .005    2.5758
.001   3.0902   .0005   3.2905
.0001  3.7190   .00005  3.8906
.00001 4.2649   .000005 4.4172

5.2 Single mean - large samples

Hypothesis tests can be carried out on many different types of experimental data, but the method of implementation is always the same.
The main points to note are that the analysis should always begin by stating the null and alternative hypotheses; an appropriate measure of standard error should then be calculated; and finally the sample value should be plotted on the appropriate distribution curve. Depending on where it lies, the null or alternative hypothesis will be accepted. Comparisons with the Normal distribution curve are only valid if the sample size is greater than 30; when this is the case the sample is categorised as large. Small samples will be considered later.

The formula for the Standard Error in problems involving one large sample comes straight from the Central Limit Theorem given earlier:

S.E. = σ/√n

Examples

1. The time between server failures in an organisation is recorded for a sample of 32 failures and the mean value calculates as 992 hours. The organisation works on the assumption that the mean time between server failures is 1000 hours with a standard deviation of 20. Is it justified to use this figure of 1000 hours? Use a significance level of 0.05.

H0: μ = 1000
H1: μ ≠ 1000

S.E. = σ/√n = 20/√32 = 3.536

This is a two-tailed test with 2.5% shaded on each side of the Normal distribution, so the cut-off points are given by 1000 ± 1.96 × 3.536, i.e. 993.070 and 1006.930. This is shown on the diagram below, together with the sample mean of 992.

Since the sample mean is in the shaded area, the null hypothesis is rejected and so the alternative hypothesis accepted. This means that there is evidence at the 5% level that the population mean is not 1000, so the organisation might like to review their specification for the server which, in fact, is performing better than they indicate.

It is often the case when performing hypothesis tests that a test statistic is calculated from the sample value and this is compared with the standardised normal curve.
This is doing exactly the same thing that was shown in Chapter 2 when converting Normal distributions into a form that could be compared with the tables. In this case, the test statistic is

z = (x̄ - μ) / (σ/√n)

This gives z = (992 - 1000)/3.536 = -2.26, which is now compared with the standardised Normal curve. It is clearly seen that the test statistic falls in the shaded area, so H1 is accepted as before. The two previous diagrams show that the two methods are identical but simply involve considering different scales.

2. It is suspected that in a particular experiment the method used gives an underestimate of the boiling point of a liquid. 50 determinations of the boiling point of water were made in an experiment in which the standard deviation was known to be 0.9 degrees C. The mean value is calculated to be 99.6 degrees C. The correct boiling point of water is 100 degrees C. Use a significance level of 0.01.

Since it would be desirable to prove that the population mean is less than 100, it is sensible to use a one-tailed test with alternative hypothesis μ < 100. So the hypotheses become

H0: μ = 100
H1: μ < 100

The standard error is 0.9/√50 = 0.127

Test statistic: z = (99.6 - 100)/0.127 = -3.15

The standardised normal distribution curve gives a z value of -2.33 for an area of 0.01. The diagram, with the test statistic marked in, shows that the test statistic is in the shaded region, so H1 is accepted. There is evidence at the 1% level that the population mean is less than 100. It is logical to conclude, then, that the method of the experiment is underestimating the boiling point.

In the above examples the standard deviation of the population was known. Often this will not be the case, but as long as the samples are large (n > 30) it is acceptable to estimate this value by using the sample values (as was done in Topic 3 with the confidence intervals).
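Both worked examples reduce to the same one-line computation, sketched here (the helper name z_statistic is ours):

```python
from math import sqrt

def z_statistic(sample_mean, mu, sigma, n):
    """Test statistic z = (x_bar - mu) / (sigma / sqrt(n)) for a single
    large-sample mean."""
    return (sample_mean - mu) / (sigma / sqrt(n))

# Example 1: server failures (two-tailed)
z1 = z_statistic(992, 1000, 20, 32)
print(round(z1, 2))   # -2.26

# Example 2: boiling point (one-tailed); the text's -3.15 comes from
# rounding the standard error to 0.127 before dividing
z2 = z_statistic(99.6, 100, 0.9, 50)
print(round(z2, 2))   # -3.14
```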
Hypothesis testing

Q1: A particular questionnaire is designed so that it can be completed in 2 minutes. Over a number of days a researcher measures the time taken by everyone who fills in the form. The results are given in the table below. Take a random sample and carry out a hypothesis test to check whether the 2-minute expected completion time is valid. Times are given in minutes.

2.44 2.71 2.46 2.53 1.76 2.52 2.60 2.26 1.87 1.80
2.20 2.48 2.14 2.71 1.89 2.97 2.32 1.46 2.16 2.16
1.49 2.99 2.95 2.33 2.54 1.37 2.01 2.04 2.21 1.93
2.39 1.92 2.19 2.12 2.62 0.91 1.25 1.53 2.19 2.29
2.59 2.59 1.67 2.12 1.86 2.99 1.79 1.78 2.08 2.49
2.63 3.21 2.31 2.08 2.05 2.87 1.84 1.87 2.24 1.26
2.20 1.92 2.38 1.73 1.03 2.22 2.03 1.98 2.40 2.34
2.66 1.73 1.52 2.41 2.49 2.58 2.10 1.72 1.73 2.12
2.16 1.60 1.90 2.04 1.77 2.02 2.19 1.49 1.94 2.29
2.12 2.23 1.88 1.75 1.81 2.41 2.21 2.03 2.96 2.12
1.73 1.99 2.19 1.06 1.77 1.66 2.54 1.69 2.26 1.13
1.81 2.80 2.22 1.95 1.50 1.83 2.38 1.97 1.99 2.22
1.83 2.25 2.56 2.35 2.67 2.57 2.32 2.58 2.23 2.12
2.29 1.97 1.50 2.06 2.23 1.95 2.07 2.42 2.39 1.99
1.51 1.91 1.44 2.59 2.58 1.88 2.57 2.19 2.04 2.04

5.3 Single proportion - large samples

It was shown in Topic 3 that sample proportions also follow the theory of the Central Limit Theorem. The standard deviation of the proportions (which will now be referred to as the Standard Error) was given by the formula

S.E. = √(π(1 - π)/n)

where π is the population proportion and n is the sample size (again considered to be greater than 30). Hypothesis tests can be carried out in much the same way as before.

Examples

1. A survey of the first beverage that residents of the UK take when they waken up in the morning has shown that 17% have a cup of tea. It is thought that this figure might be higher in the county of Yorkshire, so a random sample of 550 Yorkshire residents is questioned and out of that number 115 said they had tea first thing.
Test the idea that the tea figure is higher in Yorkshire.

The population proportion is thought to be 17% (or 0.17 as a decimal), so this is the figure that must be used in the hypotheses (as in the "mean" case, where it was always the population mean that was mentioned in the null hypothesis). Since it is hoped that it can be proved that the Yorkshire figure is higher than average, the alternative hypothesis must have the form π > 0.17. The hypotheses are therefore:

H0: π = 0.17
H1: π > 0.17

Now calculate the Standard Error (S.E.). In this case,

S.E. = √(0.17 × 0.83/550) = 0.016

The test statistic here is comparable to the one for means:

z = (P - π)/S.E., where P is the sample proportion, in this case 115/550 = 0.209

So z = (0.209 - 0.17)/0.016 = 2.44

The standardised Normal curve can be drawn as before, with the value of 2.33 being used as the cut-off point for 1% (or 0.01).

Since the test statistic for the sample proportion is in the shaded area, there is enough evidence to reject the null hypothesis and accept the alternative one. In other words, Yorkshire folk drink more tea than the national average (using a significance level of 1%). Note that it is harder to prove a fact using a significance level of 1% than it is for 5%, so it can be said that this is a highly significant result.

2. In an ESP test, a subject has to identify which of five shapes appears on a card. In a test consisting of 100 cards, would you be fairly convinced that a subject does better than just guessing if he gets 30 correct? Test at the 1% and 0.1% significance levels.

If he just guesses, the proportion of times he would get the answer right is 1/5 = 0.2. So it is hoped to prove that the sample corresponds to a population proportion greater than 0.2. The hypotheses are therefore:

H0: π = 0.2
H1: π > 0.2

Now calculate the Standard Error (S.E.)
S.E. = √(π(1 - π)/n) = √(0.2 × 0.8/100) = 0.04

The test statistic is z = (P - π)/S.E., where P = 30/100 = 0.3.

So z = (0.3 - 0.2)/0.04 = 2.5

The standardised Normal curve with appropriately shaded regions shows that the test statistic is in the shaded region for the 1% significance test, so accept H1 here: there is evidence at the 1% level that the subject displays powers of ESP. However, at the 0.1% level of significance the test statistic is not in the shaded region, so the null hypothesis has to be accepted in this case. This shows that there is highly significant evidence of the subject displaying ESP, but not a very highly significant result.

Note that in both examples nπ and n(1 - π) are greater than 5, a property that is required for the Central Limit Theorem to be valid.

5.4 Difference of two means - large samples

So far in this chapter the hypothesis tests have been used to compare one sample mean or proportion with a known value. However, it is very often the case that comparisons are required between two samples in order to decide which is the better of the two for a certain purpose. For example, if a new piece of software is introduced into an office and workers think that their job is now taking longer on the new system, it would be useful to have a statistical test to check out their claims.

The Central Limit Theorem provides useful information about the distribution of sample means, and it can be extended to also give information about the distribution of the difference of two sample means. In fact, it can be proved that under the null hypothesis of equal population means this distribution is Normally distributed with mean 0. This is a very useful and interesting result and it highlights once again why the Normal distribution is so important in statistics! The same rules apply as for the single mean case: the original populations do not have to be
Normally distributed as long as the sample size is greater than 30.

The standard deviation of the difference of two sample means, referred to again in this section as the Standard Error (S.E.), is given by the formula:

S.E. = √(σ1²/n1 + σ2²/n2)

where the subscripts 1 and 2 refer to population 1 and population 2 respectively. As in previous examples for large samples, if the population standard deviations are unknown it is fine to use the sample standard deviations (usually referred to as s1 and s2).

The hypothesis tests usually start off by assuming that there is no difference between the population means (μ1 - μ2 = 0) and either confirming this or proving the assumption to be wrong.

Example

The response times of two hard drives are measured and the summary values are given below (times are measured in seconds).

Disk 1: n1 = 35, s1 = 5, sample mean x̄1
Disk 2: n2 = 38, s2 = 4, sample mean x̄2

Is there a significant difference between response times?

Start off by assuming that there is no difference between the populations that the two samples come from:

H0: μ1 - μ2 = 0

There is no need to check whether one disk is better or worse than the other, so a two-tailed test is a reasonable thing to use. Therefore the alternative hypothesis is:

H1: μ1 - μ2 ≠ 0

The method of the test follows the same pattern as the previous examples in this chapter. The next step is to calculate the standard error and use it in the test statistic:

S.E. = √(5²/35 + 4²/38) = 1.066

Since now it is the difference of means that is being considered, the test statistic takes the form:

z = ((x̄1 - x̄2) - (μ1 - μ2))/S.E.

Since it is being assumed in the null hypothesis that μ1 = μ2, the second bracketed term in the numerator is equal to zero. Thus z = (x̄1 - x̄2)/S.E.

Now make a sketch of the standardised Normal distribution curve and choose a significance level of 0.05.
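The arithmetic can be sketched as below. The sample sizes and standard deviations are the ones from the table; the sample means 15 and 18 are purely illustrative stand-ins (not values from the example), chosen to show a statistic that lands in the rejection region:

```python
from math import sqrt

def two_mean_z(x1, s1, n1, x2, s2, n2):
    """z = (x_bar1 - x_bar2) / sqrt(s1^2/n1 + s2^2/n2), under H0: mu1 = mu2."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    return (x1 - x2) / se, se

# n and s from the disk example; means 15 and 18 are illustrative only
z, se = two_mean_z(15, 5, 35, 18, 4, 38)
print(round(se, 3))   # 1.066
print(round(z, 2))    # -2.82, beyond the two-tailed 5% cut-off of 1.96
```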
It can be seen that the test statistic is in the shaded region, so the null hypothesis is rejected and the alternative hypothesis accepted. It has therefore been shown that there is a significant difference between the response times.

It would, of course, have been possible to carry out a one-tailed test if required in the example. The hypotheses would change to:

H0: μ1 - μ2 ≤ 0
H1: μ1 - μ2 > 0

5.5 Difference of two proportions - large samples

In the same way as the theory relating to one sample mean was extended to the comparison of two sample means, exactly the same thing can be done for sample proportions. The Central Limit Theorem provides useful information about the distribution of sample proportions, and it can be extended to also give information about the distribution of the difference of two sample proportions. In fact, it can be proved that under the null hypothesis of equal population proportions this distribution is Normally distributed with mean 0. This is a very useful and interesting result and it highlights once again why the Normal distribution is so important in statistics!

The same rules apply as for the single proportion case: the original populations do not have to be Normally distributed as long as the sample size is greater than 30. Also it is required that nπ and n(1 - π) are greater than 5 for each population.

The standard deviation of the difference of two sample proportions, referred to again in this section as the Standard Error (S.E.), is given by the formula:

S.E. = √(π1(1 - π1)/n1 + π2(1 - π2)/n2)

where the subscripts 1 and 2 refer to population 1 and population 2 respectively.

Now, usually the population proportions are unknown and the null hypothesis will be assuming in any case that they are the same. For these reasons, a pooled value of the sample proportions is used in the formula instead of π1 and π2. This is referred to as p̂.
Thus the formula for the standard error becomes:

S.E. = √(p̂(1 - p̂)(1/n1 + 1/n2))

The hypothesis tests usually start off by assuming that there is no difference between the population proportions (π1 - π2 = 0) and either confirming this or proving the assumption to be wrong.

Example

It is desired to investigate the proportion of people who attend church regularly in Scotland and in England, so two random samples are taken and the results are given below.

                          Scotland   England
Attend regularly             47         31
Do not attend regularly     136        106
Total                       183        137

Is there any evidence that more people in Scotland attend church than in England?

This is a problem dealing with two proportions, so the method of solution is to use the formulae for the difference of two proportions. Since it is desired to prove that the Scottish proportion is higher than the English proportion, a one-tailed test has to be used. If Scotland is referred to with subscript "1" and England with subscript "2", the alternative hypothesis will have to be of the form H1: π1 - π2 > 0. So the null hypothesis will be H0: π1 - π2 ≤ 0. The problem is solved using exactly the same procedures as all the previous ones.

1. Hypotheses

H0: π1 - π2 ≤ 0
H1: π1 - π2 > 0

2. Calculation of Standard Error

First calculate p̂. This is calculated as

p̂ = (47 + 31)/(183 + 137) = 78/320 = 0.244

Now,

S.E. = √(0.244 × 0.756 × (1/183 + 1/137)) = 0.0485

3. Calculate test statistic

z = ((P1 - P2) - (π1 - π2))/S.E.

In the null hypothesis we have π1 - π2 ≤ 0, so take the extreme case that π1 - π2 = 0. Now, P1 = 47/183 = 0.257 and P2 = 31/137 = 0.226. This gives

z = (0.257 - 0.226)/0.0485 = 0.639

4. Compare the test statistic with the standardised Normal distribution curve (use a 5% significance level).

5. Offer a conclusion.
Since the test statistic is not in the shaded area, the null hypothesis is accepted. There is no evidence, at the 5% level, that a higher proportion of the Scottish population attends church than of the English population.

Notice that in this example np̂ and n(1 - p̂) are greater than 5 for both sample sizes.

5.6 Small Samples

In the large sample (n > 30) problems discussed earlier in this Topic it was acceptable to estimate the population standard deviation by using the sample standard deviation. With small samples, where more chance variation must be allowed for, there is more uncertainty in estimating this value and hence also the standard error. Some modification of the procedure of using the test statistic is needed, and the technique to use is the t test. Its foundations were laid by W.S. Gosset (1876-1937), who wrote under the pseudonym "Student", so that it is sometimes known as Student's t test.

The procedure does not differ greatly from the one used for large samples, and in the one-sample case the test statistic looks very like the one used earlier, namely

t = (x̄ - μ)/(s/√n)

This t value is no longer compared with the standardised Normal distribution curve. In fact, if the underlying distribution is Normal then this random variable is said to follow a Student t distribution with parameter ν = n - 1. This is similar to the Normal distribution in the sense that it is a symmetrical "bell-shaped" curve, but it is slightly flatter and hence wider (the total area under it, of course, still equals 1). Note, though, that as n gets larger, the curve becomes indistinguishable from the Normal distribution. Unlike the Normal distribution, however, where the same values for "cut-off" points were used whatever the sample size (e.g. 1.96 for an area of 0.025), this is not the case in the t distribution. These values change depending on what the sample size is.
They can be obtained from statistical tables (or computer packages) and are categorised in terms of a quantity called the degrees of freedom (ν).

To grasp the concept of degrees of freedom, imagine you have been asked to select 5 numbers whose mean is 30 - the sum of these numbers will therefore be 150. If the first four numbers selected were 25, 26, 29 and 33, there is no choice for the fifth one other than 37. In other words there are only 4 degrees of freedom. In general, if you have n numbers and the mean is specified then you have n - 1 degrees of freedom.

An example of the t distribution curve with 10 degrees of freedom (ν = 10) is drawn below with a shaded area of 2.5% in each tail.

Part of the t tables is shown below; for example, the entry for an area of 0.05 and ν = 6 degrees of freedom is 1.943.

ν        0.10    0.05    0.025    0.01    0.005    0.001    0.0005
ν = 1   3.078   6.314   12.706   31.821  63.657   318.31   636.62
ν = 2   1.886   2.920    4.303    6.965   9.925    22.326   31.598
ν = 3   1.638   2.353    3.182    4.541   5.841    10.213   12.924
ν = 4   1.533   2.132    2.776    3.747   4.604     7.173    8.610
ν = 5   1.476   2.015    2.571    3.365   4.032     5.893    6.869
ν = 6   1.440   1.943    2.447    3.143   3.707     5.208    5.959
ν = 7   1.415   1.895    2.365    2.998   3.499     4.785    5.408
ν = 8   1.397   1.860    2.306    2.896   3.355     4.501    5.041
ν = 9   1.383   1.833    2.262    2.821   3.250     4.297    4.781
ν = 10  1.372   1.812    2.228    2.764   3.169     4.144    4.587

Notice that the tabulated numbers are all positive, so if the shading is on the left-hand side of the curve, a negative sign is placed in front of the appropriate number.

The following diagram shows how the t-distribution changes as ν (and hence the sample size) increases. To summarise, the properties of the t-distribution are:

1. The t-distribution is "bell-shaped" and symmetric.
2.
The t-distribution is actually a family of curves, each determined by a parameter called the degrees of freedom (ν), with ν = n - 1.
3. The total area under a t-curve is 1.
4. The mean, median and mode of the t-distribution are equal to zero.
5. As the degrees of freedom increase, the t-distribution approaches the standard normal z-distribution.

5.6.1 Single mean

The method of the t test is best illustrated by an example.

Example

A paint manufacturer claims that on average one litre of paint will cover 14 square metres. A firm buying the paint suspects that this is an exaggeration, so they take a random sample of 12 litres and measure the area covered by each. The data are:

13.6 13.9 13.2 14.5 12.6 12.6 13.2 13.8 13.4 12.4 14.3 13.2

The population standard deviation is unknown so it has to be estimated from the sample. Since the sample size is less than 30 the z test statistic cannot be used, but the t test can be employed (as long as the original data follow a Normal distribution).

It is a straightforward process to show that the sample mean x̄ = 13.39 and the sample standard deviation s = 0.665.

Now the hypotheses are set up as before. Since it is suspected that the area of paint coverage is less than 14 square metres, a one-tailed test is used. The alternative hypothesis should therefore be of the form μ < 14. The hypotheses are summarised as:

H0: μ = 14
H1: μ < 14

Now the standard error has to be calculated. In the case of small samples with a single mean the formula is simply

S.E. = s/√n

In this case, S.E. = 0.665/√12 = 0.192

The t statistic is calculated by the formula

t = (x̄ - μ)/(s/√n)

So t = (13.39 - 14)/0.192 ≈ -3.17

Now the t distribution curve is drawn with a 0.05 significance area shaded. Note that the "cut-off" value is obtained from tables using 11 degrees of freedom (ν = 11) and reading down the appropriate column. Since the test statistic is in the shaded region, the null hypothesis is rejected and the alternative accepted.
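The sample statistics and the t value can be reproduced with the standard library (statistics.stdev uses the n - 1 divisor appropriate for a sample):

```python
from math import sqrt
from statistics import mean, stdev

data = [13.6, 13.9, 13.2, 14.5, 12.6, 12.6,
        13.2, 13.8, 13.4, 12.4, 14.3, 13.2]

x_bar = mean(data)
s = stdev(data)             # sample standard deviation (n - 1 divisor)
se = s / sqrt(len(data))
t = (x_bar - 14) / se

print(round(x_bar, 2))   # 13.39
print(round(s, 3))       # 0.665
print(round(t, 2))       # -3.17, beyond the nu = 11, 5% one-tailed cut-off of -1.796
```

The cut-off -1.796 quoted in the comment is the standard tabulated value for ν = 11 and a one-tailed area of 0.05.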
There is evidence at the 5% level that the paint coverage is less than 14 square metres and so the manufacturer is exaggerating.

5.6.2 Confidence Intervals with Small Samples

In this section a short diversion from hypothesis tests is taken to fill in the gap of estimating a population mean from a small sample. As is the case with large samples, a point estimate of the population mean is given by the sample mean. However, in the small sample case the confidence interval depends on the sample size as well as the mean. The interval is given by

x̄ - t(α,ν) s/√n ≤ μ ≤ x̄ + t(α,ν) s/√n

where x̄ is the sample mean, s is the sample standard deviation, n is the sample size and t(α,ν) is available from tables.

Example The lengths in cm of a random sample of 7 components taken from the output of a manufacturing process are:

3.1 3.4 3.4 3.3 3.2 3.3 3.0

Give a 95% confidence interval for the population mean.

By calculation, x̄ = 3.243 and s = 0.151.

A 95% confidence interval results in a shaded area of 2.5% in each tail of the t curve. From tables then, using a significance level of 0.025 and a value of ν = 6, the value 2.447 is obtained. Substituting the values in the appropriate formula gives:

x̄ - t(α,ν) s/√n = 3.243 - 2.447 × 0.151/√7 = 3.103
x̄ + t(α,ν) s/√n = 3.243 + 2.447 × 0.151/√7 = 3.383

In other words, it can be deduced that the population mean will lie between 3.103 and 3.383 with 95% confidence.

5.6.3 Difference of 2 Means from Small Samples

The sampling distribution of the difference between two means of small samples follows a t distribution with mean μ1 - μ2 and standard error given by the rather complicated looking formula:

SE = sp √(1/n1 + 1/n2)

with the pooled standard deviation sp estimated from the two sample standard deviations as:

sp = √[ ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2) ]

The subscripts 1 and 2 refer to sample and population 1 and 2 respectively.
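As a sketch of how the pooled formula above translates into code (plain Python; the function name and the sample values are invented purely for illustration - in practice a statistical package would do this for you):

```python
import math

def pooled_t(sample1, sample2):
    """Two-sample t statistic with pooled variance, for small samples.

    Returns (t, degrees_of_freedom).  Assumes both samples come from
    Normal populations with a common variance, as in the text.
    """
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    ss1 = sum((x - m1) ** 2 for x in sample1)    # equals (n1 - 1) * s1^2
    ss2 = sum((x - m2) ** 2 for x in sample2)    # equals (n2 - 1) * s2^2
    sp = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))  # pooled standard deviation
    se = sp * math.sqrt(1 / n1 + 1 / n2)         # standard error of the difference
    return (m1 - m2) / se, n1 + n2 - 2

# Hypothetical illustration with two small samples
t_stat, df = pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

The returned statistic would then be compared with the t tables using the returned degrees of freedom, exactly as in the hand calculations that follow.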
The format of the hypothesis test follows exactly that of the large sample case, but clearly uses a different formula for the standard error, and the comparisons are made with t distribution curves rather than the standardised Normal. The degrees of freedom (ν) for problems of this type are calculated as ν = n1 + n2 - 2.

Example A survey was carried out to investigate the number of hours worked per week by people in various countries and two specific countries were highlighted, Japan and Russia. It had always been believed previously that Russians worked the highest number of hours per week, but the data do not seem to support this. The survey produced a sample size, sample mean and sample standard deviation for each country, denoted n1, x̄1, s1 for Russia and n2, x̄2, s2 for Japan. Is there a significant difference between the number of hours worked per week by Russians and the Japanese? Test at the 5% level.

1. Set up the hypotheses:
H0 : μ1 - μ2 = 0
H1 : μ1 - μ2 ≠ 0
(Subscript 1 refers to Russia and 2 to Japan.)

2. Calculate the standard error:
sp = √[ ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2) ]
SE = sp √(1/n1 + 1/n2)

3. Use the standard error in the test statistic. In the case of the difference of two means for small samples this is given by
t = ((x̄1 - x̄2) - (μ1 - μ2))/SE
and by the null hypothesis, μ1 - μ2 = 0, so
t = (x̄1 - x̄2)/SE

4. Compare with the t distribution curve with 27 degrees of freedom (recall that ν is calculated as n1 + n2 - 2). The 5% two-tailed "cut-off" points are ±2.052.

5. Make a conclusion: since the test statistic is not in the shaded region the null hypothesis is accepted. There is no evidence of a significant difference, at the 5% level, between the number of hours worked per week by Japanese and Russian people.
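The "cut-off" values used in step 4 normally come straight from t tables or a computer package, but they can also be reproduced numerically. A rough pure-Python sketch (numerical integration plus bisection - not how tables are actually produced, just a way of seeing where the numbers come from):

```python
import math

def t_pdf(x, v):
    # Density of Student's t distribution with v degrees of freedom
    c = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return c * (1.0 + x * x / v) ** (-(v + 1) / 2)

def upper_tail(t, v, steps=2000):
    # Area under the t curve to the right of t.  The substitution
    # x = sqrt(v) * tan(theta) maps the infinite tail onto a finite
    # interval, which composite Simpson's rule then handles well.
    a = math.atan(t / math.sqrt(v))
    b = math.pi / 2 - 1e-9            # stop just short of the singularity
    h = (b - a) / steps
    def g(theta):
        x = math.sqrt(v) * math.tan(theta)
        return t_pdf(x, v) * math.sqrt(v) / math.cos(theta) ** 2
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def t_critical(alpha, v):
    # Bisect for the cut-off point whose upper-tail area equals alpha
    lo, hi = 0.0, 700.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(mid, v) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

crit = t_critical(0.025, 27)   # close to the tabulated value 2.052
```

For a two-tailed 5% test, each tail holds 2.5%, hence the call with alpha = 0.025.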
To summarise, it has not been proved that Russians still work the longest hours per week (as had been previously thought), but it has been shown that although the Japanese figures initially seemed higher, there is, in fact, no significant difference between the Russians and the Japanese in terms of the number of hours worked per week.

Play length

Q2: A music producer is interested in estimating whether there is a difference in the average play length of Country and Pop CD singles. Random samples are taken from each category and the results are shown here.

Country (duration in minutes): 3.80 3.30 3.43 3.30 3.03 4.18 3.18 3.83 3.22 3.38
Pop (duration in minutes):     3.88 4.13 4.11 3.98 3.98 3.93 3.92 3.98 4.67

Assuming that the duration times of both types of music come from Normal distributions, carry out a hypothesis test to investigate for a significant difference in duration.

5.6.4 Paired t test

In the last section, hypotheses were tested about the difference in two population means when the samples were independent. A method is presented here to analyse situations when this is not the case - for example, if some quantity was measured before and after a specific treatment, clearly one set of results would depend on the other. The test statistic in situations like this is given by a much simpler formula than that of 5.6.3, namely

t = (d̄ - D)/(sd/√n), with ν = n - 1

Note that:
n = number of pairs (by default, both sample sizes must be the same)
d = difference within a pair
D = mean population difference
sd = standard deviation of the sample differences (estimating the population value)
d̄ = mean sample difference

Example Five keyboard operators were asked to perform the same task on two types of machine. Test if there is any significant difference in the time taken to do the task. Test at the 5% level. Times are in minutes.
Operator   Machine A   Machine B
1          9.6         7.2
2          8.4         7.1
3          7.7         6.8
4          10.1        9.2
5          8.3         7.1

The null hypothesis assumes there is no difference in the population means, so D = 0. Therefore:

H0 : D = 0
H1 : D ≠ 0

Now calculate the differences, d. These are 2.4, 1.3, 0.9, 0.9 and 1.2 (note that they are all positive here, but it would be perfectly feasible to have both negative and positive results). Now the mean and standard deviation of d are calculated in the usual way:

d̄ = 1.34 and sd = 0.619

The test statistic is given by

t = (d̄ - D)/(sd/√n) = (1.34 - 0)/(0.619/√5) = 4.84

The t distribution curve with ν = 5 - 1 = 4 and a significance level of 0.05 (0.025 each side) is given below. The "cut-off" points are ±2.776.

Since the test statistic is in the shaded region the null hypothesis is rejected and the alternative accepted. There is evidence at the 5% level of a difference in times taken to perform the task on both machines.

Note that if it was desired to prove that machine B takes longer to do a task, a one-tailed test can be employed in the usual way. With d = A - B, the hypotheses would then become:

H0 : D ≥ 0
H1 : D < 0

5.7 The Chi-Squared Distribution

So far the distributions discussed in the examples have all had graphs with a very similar shape. Apart from having different points where they cut the axes, both the standardised Normal and the t distributions have bell-shaped, symmetric curves as shown below. However, do not be misled into thinking that every statistical distribution looks like this. The first non-parametric test now considered deals with comparison with the chi-squared distribution, which has a graph shaped like the one below. (Chi is pronounced "kye" and is the Greek letter χ.) The chi-squared distribution, like the t distribution, depends on the degrees of freedom and so is actually a family of curves. Some examples are given below.
Note that the curve is NOT symmetrical. There are two main uses of the chi-squared distribution. The first is to test whether there is a significant association between two variables (like hair colour and a person's sex) and the second is what is called a "goodness of fit" test - a check as to whether observed data follow a particular expected distribution.

5.7.1 Checking for Association - Hair and Eye Colour

The contingency table below was obtained from an experiment designed to examine whether there is a relationship between hair and eye colour in humans. Rows give hair colour and columns give eye colour.

        Blue   Grey   Hazel   Brown   Total
Blond   60     20     10      10      100
Brown   40     50     50      160     300
Black   60     20     10      10      100
Red     40     10     30      20      100
Total   200    100    100     200     600

The first thing to do when analysing problems of this type is none other than the old familiar process of setting up hypotheses. In testing for association there is only one possibility for what they should be, so there is no need to worry about whether a one-tailed or two-tailed test is required. The general form of the hypothesis test is:

H0 : The two criteria of classification are independent
H1 : The two criteria of classification are not independent

In this particular case, then, the hypotheses will be

H0 : There is no relationship between hair and eye colour
H1 : There is a relationship between hair and eye colour

There is no concept of standard error in non-parametric tests, but it is still necessary to calculate a test statistic. In examples checking for association, this test statistic will follow the chi-squared distribution with an appropriate number of degrees of freedom. In order to calculate its value, the contingency table has to be redrawn with expected values in each cell.
These expected values are calculated by assuming that the two classifications are independent, so that probabilities can be multiplied using the equation

p(A and B) = p(A) × p(B)

There are 16 numbers to be calculated here, so only two will be carried out in full.

Blue Eyes / Blond Hair
p(blue) = 200/600 = 1/3
p(blond) = 100/600 = 1/6
p(blue and blond) = 1/3 × 1/6 = 1/18

Out of 600 people, then, it would be expected that 1/18 of them would have blue eyes and blond hair. This calculates as 33.3 to one decimal place (expected values are often not whole numbers and should be given to an appropriate degree of accuracy in problems).

Grey Eyes / Brown Hair
p(grey) = 100/600 = 1/6
p(brown) = 300/600 = 1/2
p(grey and brown) = 1/6 × 1/2 = 1/12

Out of 600 people, then, it would be expected that 1/12 of them would have grey eyes and brown hair. This calculates as 50.

The contingency table can be redrawn now to show expected values.

        Blue   Grey   Hazel   Brown   Total
Blond   33.3   16.7   16.7    33.3    100
Brown   100    50     50      100     300
Black   33.3   16.7   16.7    33.3    100
Red     33.4   16.6   16.6    33.4    100
Total   200    100    100     200     600

Notice that the results in the "Red" row were rounded so that each column total was the same for both the expected results and the original observed values. This should always be done (and, in fact, saves some calculations of probabilities, since all that is then required at the last stage is a subtraction).

Now the test statistic needs to be defined, as it is this that follows the chi-squared distribution. The easiest way to do this is to let O represent each original "observed" value and E represent each "expected" value in turn and then calculate:

Test statistic = Σ (O - E)²/E

This is often referred to as χ². The degrees of freedom, ν, for contingency table problems are calculated by (number of rows - 1) × (number of columns - 1). Note that the "Total" row and column are not counted.
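The whole calculation - expected values via row total × column total ÷ grand total (equivalent to the probability argument above), then the sum of (O - E)²/E - is easy to automate. A sketch in plain Python using the hair and eye colour counts; note that keeping the expected values exact, rather than rounding them to one decimal place as in the hand calculation, gives a statistic of about 174, which as will be seen is far beyond every cut-off point either way:

```python
def chi_squared(observed):
    """Chi-squared statistic and degrees of freedom for a contingency table.

    observed: list of rows of counts (totals excluded).  Each expected
    cell is row total * column total / grand total.
    """
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    grand = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_tot[i] * col_tot[j] / grand
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(col_tot) - 1)
    return stat, df

# Rows: blond, brown, black, red hair; columns: blue, grey, hazel, brown eyes
hair_eye = [[60, 20, 10, 10],
            [40, 50, 50, 160],
            [60, 20, 10, 10],
            [40, 10, 30, 20]]
stat, df = chi_squared(hair_eye)   # stat is 174.0 to one decimal, df is 9
```

The same function works for any contingency table, whatever its dimensions.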
So in this case, ν = (4 - 1) × (4 - 1) = 9.

Also in this problem, then, the test statistic is given by

χ² = (60 - 33.3)²/33.3 + (40 - 100)²/100 + ... + (20 - 33.4)²/33.4 = 174.2

Just as there are statistical tables for the standardised Normal and t distributions, so there are tables for the chi-squared distribution. Part of a set of tables is shown here. Since the curve is not symmetrical, separate "cut-off" points need to be given for the left and right hand sides.

        0.99      0.975     0.95     0.90    0.50   0.20    0.10    0.05    0.025   0.02    0.01    0.005   0.001
ν = 1   0.000157  0.000982  0.00393  0.0158  0.455  1.642   2.706   3.841   5.024   5.412   6.635   7.879   10.827
ν = 2   0.0201    0.0506    0.103    0.211   1.386  3.219   4.605   5.991   7.378   7.824   9.210   10.597  13.815
ν = 3   0.115     0.216     0.352    0.584   2.366  4.642   6.251   7.815   9.348   9.837   11.345  12.838  16.268
ν = 4   0.297     0.484     0.711    1.064   3.357  5.989   7.779   9.488   11.143  11.668  13.277  14.860  18.465
ν = 5   0.554     0.831     1.145    1.610   4.351  7.289   9.236   11.070  12.832  13.388  15.086  16.750  20.517
ν = 6   0.872     1.237     1.635    2.204   5.348  8.558   10.645  12.592  14.449  15.033  16.812  18.548  22.457
ν = 7   1.239     1.690     2.167    2.833   6.346  9.803   12.017  14.067  16.013  16.622  18.475  20.278  24.322
ν = 8   1.646     2.180     2.733    3.490   7.344  11.030  13.362  15.507  17.535  18.168  20.090  21.955  26.125
ν = 9   2.088     2.700     3.325    4.168   8.343  12.242  14.684  16.919  19.023  19.679  21.666  23.589  27.877
ν = 10  2.558     3.247     3.940    4.865   9.342  13.442  15.987  18.307  20.483  21.161  23.209  25.188  29.588

These tables are taken from Murdoch and Barnes, Statistical Tables.

The tables reveal that for 9 degrees of freedom, the "cut-off" points for 5%, 1% and 0.1% are 16.919, 21.666 and 27.877 respectively. A diagram is now shown with the area shaded appropriate to a significance level of 0.001.

Since the test statistic is in the shaded region, the null hypothesis is rejected. There is evidence at the 0.1% level, therefore, that there is an association between hair and eye colour.
In other words there is a very highly significant relationship between hair and eye colour.

5.7.2 Limitations of the Chi-squared test

Chi-squared is a mathematical distribution and has been used so far without any proof given as to why it is useful in measuring whether there is an association between criteria. The mathematical details are not required in this course so are not provided here (although at the end of this topic the distribution will be revisited and put in a different context which may shed some light on how it comes about). However, account must be taken of some limitations so that it can be used validly for statistical tests.

The first problem occurs if there is only one degree of freedom. This happens more often than you might think, since if the contingency table has only 2 rows and 2 columns, the degrees of freedom will be (2 - 1) × (2 - 1) = 1. In cases like this, a Yates' continuity correction must be made. This also occurs in other areas of probability where discrete distributions are being approximated by continuous ones. Basically, 0.5 is subtracted from each calculated value of "O - E", ignoring the sign (plus or minus). In other words, an "O - E" value of +5 becomes +4.5, and an "O - E" value of -5 becomes -4.5. That number is then squared and divided by E. In terms of a formula, the test statistic is now given by:

χ² = Σ (|O - E| - 0.5)²/E

The second limitation in the use of the chi-squared distribution, again to satisfy the underlying mathematical assumptions, is that the expected values should be relatively large. The following simple rules are applied:

1. No expected category should be less than 1 (it does not matter what the observed values are)

2. AND no more than one-fifth of expected categories should be less than 5.
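For a 2 × 2 table the Yates-corrected statistic can be sketched as follows (plain Python; the function name and the observed and expected counts are invented purely for illustration):

```python
def yates_chi_squared(observed, expected):
    """Chi-squared statistic with Yates' continuity correction.

    Intended for 2 x 2 contingency tables (one degree of freedom):
    0.5 is subtracted from each |O - E| before squaring.
    """
    return sum((abs(o - e) - 0.5) ** 2 / e
               for row_o, row_e in zip(observed, expected)
               for o, e in zip(row_o, row_e))

# Hypothetical 2 x 2 table: every cell is 2 away from its expected value,
# so each term is (2 - 0.5)^2 / 10 = 0.225 and the statistic is 0.9
stat = yates_chi_squared([[12, 8], [8, 12]],
                         [[10, 10], [10, 10]])
```

The result would then be compared with the chi-squared tables for one degree of freedom in the usual way.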
If data do not meet these criteria then either larger samples have to be taken, or the data for the smaller "expected" categories can be combined until their combined expected value is 5 or more. This should only be done, however, if the combinations are sensible.

Example The example from Topic 4, where differences between two sample proportions were considered, will now be re-worked using a chi-squared test instead of the method used previously of calculating a z value and comparing it with the standardised Normal distribution. The problem examined church attendance in two countries, Scotland and England, and asked if there was a significant difference between the church visiting patterns of the Scots and the English. These were the results:

          Attend regularly   Do not attend regularly   Total
Scotland  47                 136                       183
England   31                 106                       137
Total     78                 242                       320

The hypothesis test is given as:

H0 : There is no relationship between church attendance and country
H1 : There is a relationship between church attendance and country

A table of expected values is now calculated in the same way as before, assuming the null hypothesis to be true. (This can be done very quickly by noticing that, in fact, only one probability calculation is required - the others are obtained by subtractions.) These expected values are listed below.

          Attend regularly   Do not attend regularly   Total
Scotland  44.6               138.4                     183
England   33.4               103.6                     137
Total     78                 242                       320

Since there is only one degree of freedom, Yates' correction is applied:

χ² = Σ (|O - E| - 0.5)²/E
   = (|47 - 44.6| - 0.5)²/44.6 + (|136 - 138.4| - 0.5)²/138.4 + (|31 - 33.4| - 0.5)²/33.4 + (|106 - 103.6| - 0.5)²/103.6
   = 0.081 + 0.026 + 0.108 + 0.035
   = 0.25

Now compare with a chi-squared curve with one degree of freedom. The "cut-off" point for 5% is 3.841.

Since the test statistic is not in the shaded region, the null hypothesis is accepted.
There is no evidence, at the 5% level, of a relationship between regular church attendance and whether people live in Scotland or England. This supports the conclusion reached in the previous chapter.

Note: No worked examples have been given in this section which show the limitations of the test when small expected values are calculated, but the reader should be aware of these limitations and address them appropriately if they are encountered in calculations.

Charter airlines

A consumer association has done some research on customer views on the reliability of charter airlines. The results are tabulated below:

Airline       Good   Average   Poor
High Life     50     40        30
Sky Coaxing   40     50        80
Up and Away   35     55        50

Carry out an appropriate hypothesis test to determine if there is an association between the airline and reliability.

5.7.3 Goodness of Fit Tests

The chi-squared test can also be used in other situations where observed and expected values are being compared. The test statistic will again be:

χ² = Σ (O - E)²/E

The degrees of freedom, ν, will depend on the particular problem, but in general

ν = (number of classes) - (number of parameters estimated) - 1

Examples

1. Unfair die

A gambler suspects the die being used for a game is loaded and producing unfair results. A survey of 120 throws gave the following results:

Throw      1    2    3    4    5    6
Frequency  17   16   19   23   22   23

These are clearly the observed values. The expected values are quite simply 20 for each throw if the die is fair. The hypothesis test takes the form:

H0 : The expected distribution is true (in this case, "the die is fair")
H1 : The expected distribution is false (in this case, "the die is loaded")

The test will be carried out using a significance level of 0.01. The following table shows how the calculations are carried out.
O    E    (O - E)   (O - E)²   (O - E)²/E
17   20   -3        9          0.45
16   20   -4        16         0.80
19   20   -1        1          0.05
23   20   3         9          0.45
22   20   2         4          0.20
23   20   3         9          0.45
                    Total:     2.40

So χ² = 2.40. No parameters have been estimated in this problem, so ν = 6 - 1 = 5. Tables give a χ² value of 15.086 (1% level). The diagram is as follows:

Since the test statistic is not in the shaded region the null hypothesis must be accepted. There is no evidence that the die is loaded.

2. Company feelings

In this example a test will be carried out to verify whether a particular distribution is Normally distributed. An attitude survey is taken by employees to see how they feel about their company. Answers from a questionnaire could potentially produce scores from 0 to 50 and the actual results are shown below:

Class intervals   Frequency (f)
10 - under 15     11
15 - under 20     14
20 - under 25     24
25 - under 30     28
30 - under 35     13
35 - under 40     10
                  Σf = 100

Test at the 5% level whether this is a Normal distribution.

Solution There are two parameters to be estimated here, the mean and the standard deviation. Using the usual formulae (and approximating each class interval by its mid-point) these are calculated as

x̄ = [(12.5 × 11) + (17.5 × 14) + (22.5 × 24) + (27.5 × 28) + (32.5 × 13) + (37.5 × 10)] / 100 = 24.9

Similarly, the standard deviation, s, is calculated as 7.194.

Now various probabilities using the Normal curve and statistical tables must be calculated. As an example, consider 30 - under 35. The area to the right of 35 is given by a z value of

z = (35 - 24.9)/7.194 = 1.40

Tables give the area to the right of 1.40 as 0.0808. The area to the right of 30 is given by a z value of

z = (30 - 24.9)/7.194 = 0.71

Tables give the area to the right of 0.71 as 0.2389. This means the probability of obtaining a score between 30 and 35 is 0.2389 - 0.0808 = 0.1581.
Multiplying this by 100 gives the expected number in this category, namely 15.81. The other expected values are calculated in the same way and the results are as follows:

Class intervals   Expected frequency
under 10          1.92
10 - under 15     6.46
15 - under 20     16.45
20 - under 25     25.57
25 - under 30     25.71
30 - under 35     15.81
35 - under 40     6.29
40 and over       1.79

Notice that these add to 100 (that is why the extra categories at the start and end had to be added). Now, though, since these "extra" categories give values less than 5 (one of the limitations of the chi-squared test) it makes sense to combine them with the adjacent categories. The revised table is as follows:

Class intervals   Expected frequency
10 - under 15     8.38
15 - under 20     16.45
20 - under 25     25.57
25 - under 30     25.71
30 - under 35     15.81
35 - under 40     8.08

Compare this now with the observed values. All that remains is to set up the hypotheses and calculate the test statistic (notice that the most time-consuming part of this problem was the mundane calculation!).

H0 : The data follow a Normal distribution with mean 24.9 and standard deviation 7.194
H1 : The data do not follow a Normal distribution with mean 24.9 and standard deviation 7.194

χ² = Σ (O - E)²/E = (11 - 8.38)²/8.38 + (14 - 16.45)²/16.45 + ... + (10 - 8.08)²/8.08 = 2.44

The degrees of freedom, ν, are given by

ν = (number of classes) - (number of parameters estimated) - 1

so ν = 6 - 2 (mean and standard deviation) - 1 = 3.

The critical value from chi-squared tables is 7.815, as shown on the graph below. Since the test statistic is not in the shaded region, the null hypothesis cannot be rejected. The data are consistent, at the 5% level, with a Normal distribution with mean 24.9 and standard deviation 7.194.

5.8 Coursework 1

This is the first of two coursework exercises; the second is at the end of Topic 10. This work should be submitted to your tutor at a date to be notified.
For this exercise it is expected that you will have access to an appropriate computer package (such as Microsoft Excel or Minitab) to help analyse the data. You are not required to perform calculations manually.

Task 1

An insurance company wishes to investigate if there is a difference between the claims received by their Aberdeen and Dumfries offices. One week of the year is randomly selected and all the claims to each office during that week are recorded. The results are given in Table 5.1 and Table 5.2.

Table 5.1: Aberdeen Claims
339 297 392 345 342 335 335 201 284 268 259 222 332 353 342 160 447 191 292 412
349 223 186 205 350 267 197 280 134 292 119 1293 374 378 320 270 220 283 171 219
268 363 323 105 272 307 408 241 292 285 456 403 349 59 281 135 247 246 344 221
270 278 328 1381 334 277 400 173 198 253 160 371 364 245 382 476 351 256 349 318
198 398 196 191 224 310 171 1249 383 206 299 365 190 420 208 188 290 418 224 301
361 344 275 394 363 231

Table 5.2: Dumfries Claims
193 164 486 331 319 208 506 371 128 445 400 445 372 51 374 256 174 265 275 257
355 230 79 325 319 189 422 313 307 224 168 560 451 1303 420 255 51 370 60 333
201 408 300 234 334 247 458 385 137 273 343 413

a) Summarise each data set. Obtain the mean, median, standard deviation and interquartile range. Produce relevant graphs that will show any patterns in the data.

b) Use a hypothesis test to investigate if there is a significant difference between the claims received by the two offices. Set out your hypotheses clearly and show all your working.

c) Produce 95% confidence intervals that will give an estimate of the average amount of all the claims received by each office.

Task 2

A warehouse ships out 500 cartons of strawberries one day, each of which contains 20 strawberries. It is desired to analyse the distribution of rotten strawberries.
Tests are carried out and the results are shown in the following frequency table (Table 5.3).

Table 5.3:
Number of rotten strawberries   0   1   2   3   4    5    6   7   8   9   10  11  12  more than 12
Observed counts                 3   10  34  63  100  100  82  56  34  13  2   0   2   1

Perform an appropriate test, showing all the details, to check whether or not the distribution of rotten strawberries can be represented by a Binomial distribution with n = 20. (You will need to calculate p.)

5.9 Summary and assessment

At this stage you should be able to:

• identify situations in experimentation where a hypothesis test will produce a useful result
• appreciate the ideas of null and alternative hypotheses
• use the standardised Normal distribution in hypothesis tests involving large samples
• use Student's t distribution in hypothesis tests involving small samples
• explain Type 1 and Type 2 Errors
• use the formulae for standard error and test statistic in the cases of
  a) single mean - large samples
  b) single proportion - large samples
  c) difference between two means - large samples
  d) difference between two proportions - large samples
  e) single mean - small samples
  f) difference between two means - small samples
• decide when to use a one-tailed or two-tailed test
• appreciate the concept of degrees of freedom
• calculate a confidence interval for a population mean based on a sample mean from a small sample
• use a paired t test

Answers to questions and activities

5 Hypothesis Tests

Hypothesis testing (page 9)

Q1: Using a systematic sampling technique of taking every fifth number, a sample is obtained.

2.59 2.16 1.51 2.59 1.60 1.91 1.67 1.90 1.44 2.04 2.59 2.12 1.86 2.58 1.77 2.99 2.02 1.88 1.79 2.19 2.57 1.78 1.49 2.19 2.08 1.94 2.04 2.49 2.29 2.04

Now using the statistical functions on a calculator, or using Excel, Minitab or another statistical package, the mean and standard deviation for the sample are found.
x̄ = 2.07 and s = 0.384

From the sample results it looks as if the time taken may be more than 2 minutes, so set up a hypothesis test in the form:

H0 : μ = 2.000
H1 : μ > 2.000

The sample standard deviation can be used as an estimate for the population value since this is a large sample. Now the standard error can be calculated as

SE = s/√n = 0.384/√30 = 0.070

The test statistic is

z = (x̄ - μ)/SE = (2.07 - 2.000)/0.070 = 1.00

Comparing with the standardised Normal distribution curve, and using a significance level of 5%, the test statistic is not in the shaded region.

Therefore the null hypothesis is accepted. The claim that the questionnaire takes 2 minutes to complete is valid using a significance level of 5%.

Note: Your numbers will be different if you took a different sample.

Play length (page 23)

Q2:

1. Set up the hypotheses:
H0 : μ1 - μ2 = 0
H1 : μ1 - μ2 ≠ 0
(Subscript 1 refers to Country and 2 to Pop.)

2. Calculate the sample means and sample standard deviations:
n1 = 10, x̄1 = 3.465, s1 = 0.357
n2 = 9, x̄2 = 4.064, s2 = 0.242

3. Calculate the standard error:
sp = √[ ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2) ] = √[ (9 × 0.357² + 8 × 0.242²) / 17 ] = 0.308
SE = sp √(1/n1 + 1/n2) = 0.308 × √(1/10 + 1/9) = 0.142

4. Use the standard error in the test statistic. In the case of the difference of two means for small samples this is given by
t = ((x̄1 - x̄2) - (μ1 - μ2))/SE
and by the null hypothesis, μ1 - μ2 = 0, so
t = (3.465 - 4.064)/0.142 = -4.23

5. Compare with the t distribution curve with 17 degrees of freedom (recall that ν is calculated as n1 + n2 - 2). Use a significance level of 0.05 (0.025 each side); the "cut-off" points are ±2.110.

6. Make a conclusion: since the test statistic is in the shaded region, reject the null hypothesis and accept the alternative one.
There is evidence at the 5% level that the duration times of Country and Pop CD singles are different.

Charter airlines (page 32)

Step 1: Add totals to the table and calculate probabilities.

Airline       Good   Average   Poor   Total
High Life     50     40        30     120
Sky Coaxing   40     50        80     170
Up and Away   35     55        50     140
Total         125    145       160    430

p(H) = 120/430 = 0.279; p(S) = 0.395; p(U) = 0.326
p(G) = 125/430 = 0.291; p(A) = 0.337; p(P) = 0.372

Step 2: Hypothesis test and expected values (EV).

H0 : There is no relationship between airline and reliability
H1 : There is a relationship between airline and reliability

p(H and G) = 0.279 × 0.291 = 0.081, so EV(H and G) = 0.081 × 430 = 34.9
EV(H and A) = 0.279 × 0.337 × 430 = 40.4
By subtraction, EV(H and P) = 120 - 34.9 - 40.4 = 44.7
EV(S and G) = 0.395 × 0.291 × 430 = 49.3
EV(S and A) = 0.395 × 0.337 × 430 = 57.2
EV(S and P) = 170 - 49.3 - 57.2 = 63.5
The last row can be done by subtractions to make up the column totals.

Expected values:

Airline       Good   Average   Poor   Total
High Life     34.9   40.4      44.7   120
Sky Coaxing   49.3   57.2      63.5   170
Up and Away   40.8   47.4      51.8   140
Total         125    145       160    430

Step 3: Calculate the test statistic.

Test statistic = Σ (O - E)²/E

The degrees of freedom, ν, for contingency table problems are calculated by (number of rows - 1) × (number of columns - 1), so in this case ν = (3 - 1) × (3 - 1) = 4.

The test statistic is then given by:

χ² = (50 - 34.9)²/34.9 + (40 - 40.4)²/40.4 + ...... + (50 - 51.8)²/51.8 = 20.425

Step 4: Compare with chi-squared tables. With 4 degrees of freedom the "cut-off" point for a 5% significance test is 9.488. Since the test statistic is in the shaded region (see graph below) there is evidence, at the 0.05 level, of a relationship between airline and reliability.
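As a final check, the charter airlines statistic can be reproduced in a few lines of plain Python. With exact (unrounded) expected values the statistic comes out near 20.7 rather than the 20.4 obtained from one-decimal expected values; either way it is comfortably beyond the 5% cut-off of 9.488 for 4 degrees of freedom, so the conclusion is unchanged:

```python
# Observed counts: rows High Life, Sky Coaxing, Up and Away;
# columns Good, Average, Poor
observed = [[50, 40, 30],
            [40, 50, 80],
            [35, 55, 50]]

row_tot = [sum(r) for r in observed]         # 120, 170, 140
col_tot = [sum(c) for c in zip(*observed)]   # 125, 145, 160
grand = sum(row_tot)                         # 430

# Expected cell = row total * column total / grand total
stat = sum((observed[i][j] - row_tot[i] * col_tot[j] / grand) ** 2
           / (row_tot[i] * col_tot[j] / grand)
           for i in range(3) for j in range(3))
```

Small discrepancies of this kind between hand and machine calculations are normal and come entirely from rounding the expected values before summing.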