Stats Practical 3 2006
GY2170 Statistics Practical 3

STATISTICS PRACTICAL 3
The Normal Distribution, Student’s t Test & Confidence Levels

1. If the population of shell length-to-width ratios of a species of bivalve is normally distributed with a mean of 1.65 and a standard deviation of 0.05, what is the probability that any one shell picked at random has a length-to-width ratio: (i) less than 1.65; (ii) greater than 1.75; (iii) within two standard deviations of the mean; and (iv) between 1.58 and 1.69?

2. Assess whether a correlation coefficient value r = 0.9308 (N = 18) represents a statistically significant relationship between two variables at the 99% confidence level (i.e. $\alpha = 0.01$) by testing your value against the hypotheses: Ho (null hypothesis) $\rho = 0$ and Ha (alternate hypothesis) $\rho > 0$ (N.B. your lecture handout will help you with this question).

3. This question follows on from Q1 of the Statistics 1 practical. Maximum pebble lengths were also measured at a second point-bar locality 5 km upstream of the first. A sample of 31 pebbles yielded the following data (lengths in cm):

     9.7   12.6   14.8
    10.6   12.6   14.9
    10.9   12.8   15.1
    11.0   12.9   15.6
    11.5   12.9   16.2
    11.8   13.0   17.4
    11.9   13.1   17.9
    12.2   13.1   18.2
    12.2   13.4   19.6
    12.3   13.8
    12.4   14.3

Your task is to establish whether or not the pebble lengths at this locality are significantly larger than those at the first locality, i.e. whether or not the two samples are likely to have been taken from different populations. If the means of the two samples are $\bar{x}_1$ and $\bar{x}_2$, the null hypothesis (Ho) may be stated thus: “$\bar{x}_2$ is not significantly larger than $\bar{x}_1$”. The alternative hypothesis (Ha) is: “$\bar{x}_2$ is significantly larger than $\bar{x}_1$”. Student’s t test may be applied to this problem.
When comparing two sample means, the test statistic, t, must be calculated using the expression:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}}, \qquad S = \sqrt{\frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}}$$

where $N_1$ and $N_2$ are the sizes of samples 1 and 2, respectively, $\bar{x}_1$ and $\bar{x}_2$ are their means, $s_1$ and $s_2$ are their standard deviations, and the number of degrees of freedom is given by $\nu = N_1 + N_2 - 2$.

(a) Calculate $\bar{x}_2$ and $s_2$ ($\bar{x}_1$ = 10.51 cm; $s_1$ = 1.91 cm).
(b) Supposing you wish to be 99% confident in your final decision, fix $\alpha$, the size of the critical region.
(c) Decide whether you need to use the one-tailed or two-tailed test.
(d) Refer to the table listing critical values of t as a function of $\nu$ and $\alpha$, and define the critical value of t in this case.
(e) Calculate the t-statistic as given by the formula above.
(f) Do you accept Ho or Ha? State your conclusion (hint: ignore the sign of the t value that you calculate; a negative sign just means that $\bar{x}_1$ is smaller than $\bar{x}_2$).

4. Volcanologists have measured the hydrogen content (in % of total number of atoms) of samples of gases collected from the 1970 and 1971 Mount Etna volcanic eruptions. Values are given in the following table:

    1970 Hydrogen content (%)    1971 Hydrogen content (%)
    35.8                         42.0
    45.5                         57.0
    35.5                         42.0
    32.0                         54.5
    50.0                         35.0
    39.0                         52.5
    37.0                         43.5
    47.0                         48.0
    38.5                         45.0
    36.0                         44.6
    40.5                         48.5
    35.5                         63.0
    45.5                         55.0
    37.0                         40.0
    36.0                         37.5
    53.0                         53.7

(a) Calculate a mean hydrogen value for the 1970 eruption and use Student’s t-distribution to find the 95% confidence limits for the true value.
(b) Use the Student's t-test for comparing means to determine whether there is a difference in the hydrogen content of the gases between the two eruptions at the 99% confidence level.

Statistics Practical 3: Answers

1. This question is best answered through the use of z-values and the standard normal curve:

(i) The probability is 0.5 (i.e. a 50% chance) that a shell will have an L:W ratio < 1.65:

$$z = \frac{1.65 - 1.65}{0.05} = 0$$

i.e. 50% of the normal curve lies below this value.

(ii) $$z = \frac{1.75 - 1.65}{0.05} = 2$$

i.e. twice the standard deviation. 97.73% of the normal curve is under z = 2, so the probability of picking a shell at random with L:W > 1.75 is 1 - 0.9773 = 0.0227.

(iii) The probability that L:W is within 2 standard deviations of the mean = 0.9545.

(iv) Probability of the L:W ratio being between 1.58 and 1.69:

$$z = \frac{1.58 - 1.65}{0.05} = -1.4, \qquad z = \frac{1.69 - 1.65}{0.05} = 0.8$$

Cumulative probability at z = +0.8 = 0.7881
Cumulative probability at z = -1.4 = 0.0808
Pr = 0.7881 - 0.0808 = 0.7073

2. The correlation coefficient is r = 0.9308, with N = 18. The test statistic is Student's t, given as:

$$t = r\sqrt{\frac{N - 2}{1 - r^2}}$$

t = 10.19 with N - 2 = 16 degrees of freedom.

The level of significance $\alpha$ = 0.01 and, from the table, the critical value of t with $\nu$ = 16 is 2.583. We are dealing with a one-tailed test, so we reject Ho if t > critical value of t. Since this is true, we accept Ha and say that there is less than a 1 in 100 chance of such an extreme correlation coefficient coming from a population with coefficient $\rho$ = 0.

3. (a) For the additional pebble data $\bar{x}_2$ = 13.57 cm and $s_2$ = 2.32 cm (31 observations).

(b) The critical region $\alpha$ = (100 - 99)/100 = 1%.

(c) We are testing to see if $\bar{x}_2$ is significantly larger than $\bar{x}_1$, so a one-tailed test is used. (If we were testing to see if $\bar{x}_2$ is significantly different from $\bar{x}_1$ then we would use a two-tailed test.)

(d) The number of degrees of freedom, $\nu = N_1 + N_2 - 2$ = 51 + 31 - 2 = 80. Reading from the table of t values for 80 degrees of freedom and $\alpha$ = 0.01, the critical value of t is 2.37.

(e) $$S = \sqrt{\frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}} = \sqrt{\frac{(51 - 1)\,1.89^2 + (31 - 1)\,2.32^2}{51 + 31 - 2}} = \sqrt{\frac{340.077}{80}} = 2.062$$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}} = \frac{10.51 - 13.57}{2.062\sqrt{\dfrac{1}{51} + \dfrac{1}{31}}} = \frac{-3.06}{0.4696} = -6.52$$

(This working uses $s_1$ = 1.89 cm, whereas the question quotes $s_1$ = 1.91 cm; with $s_1$ = 1.91 cm the result is t ≈ -6.48, which does not change the conclusion.)

(f) The calculated value of t is negative, but all this means is that $\bar{x}_1$ is less than $\bar{x}_2$ (which we know already); when comparing t values we use the absolute value (i.e. 6.52).
Therefore the calculated t value is greater than the critical value, thus Ho is rejected and Ha is accepted. We can state with 99% confidence that $\bar{x}_2$ is greater than $\bar{x}_1$ and that the pebbles from the second locality are significantly larger, on balance, than those at the first locality.

4. (a) For the 1970 sample of gas:

$$\bar{x}_1 = 40.24, \qquad s_1 = 6.07, \qquad N_1 = 16$$

The interval containing the true hydrogen content $\mu$ is given by:

$$\Pr\left(\bar{x} - t_{\alpha/2}\,SE(\bar{x}) \le \mu \le \bar{x} + t_{\alpha/2}\,SE(\bar{x})\right) = 1 - \alpha$$

The significance level $\alpha$ = 5% for a two-tailed distribution and the number of degrees of freedom, $\nu$ = N - 1 = 15. From the table the critical value of t = 2.131.

The standard error of the mean, $SE(\bar{x})$, is:

$$SE(\bar{x}) = \frac{s}{\sqrt{N}} = \frac{6.07}{\sqrt{16}} = 1.518$$

The 95% confidence interval for the true hydrogen content is therefore:

$$40.24 \pm 2.131 \times 1.518 = 40.24 \pm 3.23\%$$

(b) For the 1971 eruption:

$$\bar{x}_2 = 47.61, \qquad s_2 = 7.76, \qquad N_2 = 16$$

For comparing means the test statistic is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}}, \qquad S = \sqrt{\frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}}$$

Substituting values gives:

$$S = \sqrt{\frac{(16 - 1)\,6.07^2 + (16 - 1)\,7.76^2}{16 + 16 - 2}} = 6.97$$

$$t = \frac{40.24 - 47.61}{6.97\sqrt{\dfrac{1}{16} + \dfrac{1}{16}}} = -2.99$$

The calculated value of t is negative, but all this means is that $\bar{x}_1$ is less than $\bar{x}_2$ (which we know already); when comparing t values we use the absolute value (i.e. 2.99). The critical value of t for $\alpha$ = 1%, $\nu = N_1 + N_2 - 2$ = 30 and a two-tailed distribution is 2.75. Therefore the calculated t value is greater than the critical value, thus Ho is rejected and Ha is accepted. We can state with 99% confidence that $\bar{x}_2$ is greater than $\bar{x}_1$ and that the amount of hydrogen from the 1971 eruption was significantly higher, on balance, than that released during 1970.
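
The hand calculations in answers 1, 2 and 4 can be cross-checked numerically. The following is a minimal Python sketch using only the standard library, not part of the original practical. The function names `pooled_t` and `correlation_t` are hypothetical helpers introduced here for illustration, and the critical value 2.131 is simply copied from the t table quoted in answer 4(a) rather than computed. Because the script works from unrounded intermediate values, its results may differ from the hand-worked figures in the last decimal place.

```python
import math
from statistics import NormalDist, mean, stdev


def pooled_t(x1, s1, n1, x2, s2, n2):
    """Pooled two-sample t statistic; returns (S, t, degrees of freedom)."""
    dof = n1 + n2 - 2
    # Pooled standard deviation S from the two sample standard deviations.
    S = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / dof)
    t = (x1 - x2) / (S * math.sqrt(1 / n1 + 1 / n2))
    return S, t, dof


def correlation_t(r, n):
    """t statistic for testing a correlation coefficient r against rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r**2))


# Hydrogen contents (%) from the table in question 4.
h1970 = [35.8, 45.5, 35.5, 32.0, 50.0, 39.0, 37.0, 47.0,
         38.5, 36.0, 40.5, 35.5, 45.5, 37.0, 36.0, 53.0]
h1971 = [42.0, 57.0, 42.0, 54.5, 35.0, 52.5, 43.5, 48.0,
         45.0, 44.6, 48.5, 63.0, 55.0, 40.0, 37.5, 53.7]

# Question 1: shell L:W ratios, normal with mean 1.65 and s.d. 0.05.
shells = NormalDist(mu=1.65, sigma=0.05)
p_above_175 = 1 - shells.cdf(1.75)                 # part (ii)
p_between = shells.cdf(1.69) - shells.cdf(1.58)    # part (iv)

# Question 2: significance of r = 0.9308 with N = 18.
t_corr = correlation_t(0.9308, 18)

# Question 4(a): 95% confidence limits for the 1970 mean.
x1, s1 = mean(h1970), stdev(h1970)   # stdev uses the N - 1 divisor
x2, s2 = mean(h1971), stdev(h1971)
T_CRIT_95_DF15 = 2.131               # assumption: taken from the table in the text
half_width = T_CRIT_95_DF15 * s1 / math.sqrt(len(h1970))

# Question 4(b): pooled t test comparing the two eruptions.
S, t, dof = pooled_t(x1, s1, len(h1970), x2, s2, len(h1971))

print(f"Q1(ii) P(L:W > 1.75)        = {p_above_175:.4f}")
print(f"Q1(iv) P(1.58 < L:W < 1.69) = {p_between:.4f}")
print(f"Q2     t for r              = {t_corr:.2f}")
print(f"Q4(a)  mean, s, half-width  = {x1:.2f}, {s1:.2f}, {half_width:.2f}")
print(f"Q4(b)  S, t, dof            = {S:.2f}, {t:.2f}, {dof}")
```

The comparison against the tabulated critical value is still done by hand: |t| from the last line is compared with 2.75 (two-tailed, 1%, 30 degrees of freedom), as in answer 4(b).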