Answers to Practice Exam #2

Case 1 - We need a hypothesis test for this case, since we are trying to establish whether women athletes have a higher mean red blood cell count than the population of young women in general, whose mean is 4.5 million. Since $\sigma$ is not known, we use the t test to calculate the probability. The sample mean is $\bar{x} = 4.9$ and the sample standard deviation is $s = 0.42$.

$H_0: \mu = 4.5$ million, $H_a: \mu > 4.5$ million.

$t = \dfrac{4.9 - 4.5}{0.42/\sqrt{12}} = 3.30$

So we want to know the probability of observing a sample mean of 4.9 or higher for twelve women athletes if the true population mean red blood cell count is 4.5 million. The t value of 3.30 answers this question: $P(\bar{x} > 4.9) = P(t > 3.30)$ with 11 degrees of freedom. Using Excel I get $P(t > 3.30) = 0.0035$. This is less than 1%, so it provides strong evidence against the null hypothesis. If the true mean blood cell count of women athletes were 4.5 million, we would observe a sample mean as high as 4.9 from averaging twelve women less than 1% of the time. So we can say that this result is significant at the 1% level, for example. Using the table of values I see that 0.0025 < P-value < 0.005.

Case 2 - In this case we also use a hypothesis test; however, since $\sigma$ is known, we use a z score to calculate the probability.

$H_0: \mu = 100$, $H_a: \mu < 100$.

$z = \dfrac{93 - 100}{15/\sqrt{40}} = -2.95$

If the true mean IQ is 100, what is the probability of observing a sample average of 93 or lower for 40 inmates? $P(\bar{x} < 93) = P(z < -2.95) \approx 0.0016$ (p-value) using Excel. This means we have less than a 1% chance of observing such a low sample mean (an average IQ of 93) if the true mean is 100. So our result is significant at the 1% level ($\alpha = 1\%$).

Case 3 - In this case we need a hypothesis test that compares two population means. Since the population standard deviations are not known, we use the t statistic rather than the z statistic.

$H_0: \mu_w = \mu_{nw}$, $H_a: \mu_w < \mu_{nw}$.
Let $\mu_1$ equal the mean sentence for white convicted criminals and let $\mu_2$ equal the mean sentence for nonwhite convicted criminals. The pooled variance is

$s_p^2 = \dfrac{(30 - 1)26 + (38 - 1)30}{30 + 38 - 2} \approx 28$

and the test statistic is

$t = \dfrac{34 - 32}{\sqrt{28\left(\dfrac{1}{30} + \dfrac{1}{38}\right)}} = 1.55$

So if there is no difference between the means of the two populations, the probability of observing a difference as large as 2 years between the two sample means is given by $P(t > 1.55) = 0.0684$. My p-value is 0.0684.

Case 4 - The 95% confidence interval is

$(1400 - 1200) \pm 1.976\sqrt{\dfrac{80^2}{200} + \dfrac{120^2}{150}}$

using 149 degrees of freedom. I used Excel and the TINV command: =TINV(0.05,149) = 1.976 = t*. My confidence interval is (178, 222). So my procedure has a 95% chance of capturing the true mean difference between the average lifetimes of the two bulb brands. Since the whole interval lies above zero, it would be safe to say that this study shows that brand A bulbs last longer on average than brand B bulbs.

Case 5 - We are dealing with proportions here and we want to estimate the true proportion, so a confidence interval is in order. If I use the large-sample C.I. formula I get

$0.55 \pm 1.96\sqrt{\dfrac{0.55(1 - 0.55)}{100}}$

which yields (0.452, 0.648). If I use the Wilson estimate $\tilde{p} = \dfrac{55 + 2}{100 + 4} = \dfrac{57}{104}$ to create the confidence interval I get

$\dfrac{57}{104} \pm 1.96\sqrt{\dfrac{\frac{57}{104}\left(1 - \frac{57}{104}\right)}{104}}$

which yields the interval (0.452, 0.644), nearly the same as above. There are two reasons for this: the sample size of 100 (104 after the adjustment) is considered large, but the factor that plays the bigger role is that our sample proportion is near 0.5. The Wilson estimate was created to address small-sample problems and sample proportions near 0 or 1. Some argue, however, that you need a large sample anyway to create a confidence interval for proportions using a z statistic.

Case 6 - In this case we are comparing the means of two populations. The standard deviations are not known, so we need to use a t statistic to calculate the probability.

$\bar{x}_{op} = 150.2$, $s_{op} = 10.1$, $\bar{x}_{np} = 158.2$, $s_{np} = 9.2$

$H_0: \mu_{np} = \mu_{op}$, $H_a: \mu_{np} > \mu_{op}$.

$t = \dfrac{158.2 - 150.2}{\sqrt{\dfrac{10.1^2}{10} + \dfrac{9.2^2}{10}}} = 1.85$

The degrees of freedom is 9 (one less than the 10 observations in each sample).
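The Case 6 statistic can be checked from the summary statistics alone. The following is a minimal sketch, not the method used in the original answers (which relied on Excel): the function names `two_sample_t_unpooled` and `t_upper_tail` are made up for illustration, the degrees of freedom follow the conservative "smaller sample size minus one" rule used above, and the tail probability is approximated by Simpson's rule on the Student t density so that only the Python standard library is needed.

```python
import math


def two_sample_t_unpooled(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic for mean1 - mean2.

    Returns (t, df), where df is the conservative min(n1, n2) - 1.
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se, min(n1, n2) - 1


def t_upper_tail(t, df, intervals=2000):
    """Approximate P(T > t) for t >= 0 with T ~ Student t(df).

    Integrates the t density over [0, t] with Simpson's rule
    (intervals must be even) and subtracts the result from 0.5.
    """
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / intervals
    total = density(0.0) + density(t)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * density(i * h)
    return 0.5 - total * h / 3


# Summary statistics from Case 6: new process (np) versus old process (op).
t_stat, df = two_sample_t_unpooled(158.2, 9.2, 10, 150.2, 10.1, 10)
p_value = t_upper_tail(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, one-sided p = {p_value:.4f}")
```

Because the code keeps the unrounded t statistic, the printed tail probability can differ in the last digit from a value computed at t = 1.85 exactly.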
Now calculate $P(t > 1.85) = 0.0487$ (p-value found using Excel: =TDIST(1.85, 9, 1)). Thus we would observe such a difference or larger about 4.87% of the time even if the mean compression test results for the two processes were equal. Thus at a 10% significance level our result is significant, but at 1% this difference is not significant.

Case 7 - In this case we apply a treatment to the same subjects and test them before and after to compare the results. Since we can match the before and after measurements to a particular subject, we can determine the change due to the treatment on that subject. So a matched pairs t test is the appropriate test of hypothesis for this situation.

Subject   Pretreatment   Post-treatment   Difference
1         3.4            4.5              -1.1
2         3.6            3.4               0.2
3         4.8            6.5              -1.7
4         3.4            3.7              -0.3
5         4.8            7.4              -2.6
6         5.8            6.0               0.2
7         4.2            8.4              -4.2
8         5.7            8.5              -2.8
9         4.1            7.5              -3.4
10        4.3            4.3               0

Thus our population of interest is the differences between the before and after treatment measurements.

$H_0: \mu_D = 0$, $H_a: \mu_D < 0$.

Notice that my alternative says that the mean is less than zero. This is due to the way I subtracted the numbers (pretreatment minus post-treatment).

$\bar{x}_D = -1.57$, $s_D = 1.61$

$t = \dfrac{-1.57 - 0}{1.61/\sqrt{10}} = -3.08$ with 9 degrees of freedom.

So now we calculate $P(t < -3.08) = 0.0066$ (p-value). Clearly a difference this large is not very common if we assume that the pretreatment and post-treatment means are equal. So our result is clearly significant at the 1% significance level.

Case 8 - Our proportions give the proportion of people in each group who were divorced after 5 years.

$\hat{p}_c = \dfrac{145}{500} = 0.29$, $\hat{p}_{nc} = \dfrac{110}{500} = 0.22$

The pooled estimate is

$\hat{p} = \dfrac{145 + 110}{500 + 500} = \dfrac{255}{1000} = 0.255$

$H_0: p_c = p_{nc}$, $H_a: p_c \neq p_{nc}$.

$z = \dfrac{0.29 - 0.22}{\sqrt{0.255(0.745)\left(\dfrac{1}{500} + \dfrac{1}{500}\right)}} = 2.54$

$P(Z > 2.54) = 0.0055$. Because the alternative is two-sided, the P-value is 2(0.0055) = 0.011.

Case 9 - Since all we care to do is estimate the mean diameter, a confidence interval is in order. Since the population standard deviation is not known, I will use a t statistic.
The degrees of freedom is 199, and the associated t value for 95% confidence is 1.972, which is very close to the associated z value of 1.96. If I round to the nearest thousandth of a unit there is no difference between the answers.

$0.824 \pm 1.972\left(\dfrac{0.042}{\sqrt{200}}\right)$, which yields (0.818, 0.830).

Case 10 -

Veteran   TCDD in plasma   TCDD in fat tissue   Difference
1         2.5              4.9                  -2.4
2         3.1              5.9                  -2.8
3         1.8              4.2                  -2.4
4         6.0              10                   -4.0
5         6.9              7.0                  -0.1
6         7.2              7.7                  -0.5
7         2.5              2.3                   0.2
8         4.1              2.5                   1.6
9         4.7              4.4                   0.3
10        3.3              2.9                   0.4
11        1.6              1.4                   0.2
12        1.8              1.1                   0.7
13        3.5              6.9                  -3.4
14        1.8              4.2                  -2.4

$\bar{x}_D = -1.0$, $s_D = 1.8$

$H_0: \mu_D = 0$, $H_a: \mu_D \neq 0$.

$P(\bar{x}_D < -1.0) = P\left(t < \dfrac{-1.0 - 0}{1.8/\sqrt{14}}\right) = P(t < -2.08) \approx 0.029$ with 13 degrees of freedom (one less than the 14 differences), using Excel: =TDIST(2.08, 13, 1). Because the alternative is two-sided, the P-value is $2(0.029) \approx 0.058$. From the table of values, $0.025 < P(t < -2.08) < 0.05$, so 0.05 < P-value < 0.1.

Case 11 - Suppose the distribution of the differences is severely skewed (it is not, by the way), so that a matched pairs t test is not appropriate. We should then employ the sign test.

Car   MPG without additive   MPG with additive   Difference   Sign
1     35.8                   36.2                 0.4         +
2     37.7                   39.8                 2.1         +
3     39.4                   40.1                 0.7         +
4     36.8                   39.3                 2.5         +
5     36.6                   36.6                 0
6     42.1                   40.9                 0.8         +
7     38.4                   38.6                 0.2         +
8     37.3                   39.1                 1.8         +
9     45.2                   42.1                -3.1         -
10    24.1                   29.5                 5.4         +

Car number five showed no change, so it is not counted; thus I really have just nine observations. Of those, eight showed improved gas mileage and one did not. How likely is this to happen if the fuel additive did nothing to improve gas mileage?

$H_0$: The median mpg is the same with and without the additive.
$H_a$: The median mpg with the additive is larger than without.

Binomial probability distribution, n = 9, p = 0.5:

# of plus signs   Probability
0                 0.001953
1                 0.017578
2                 0.070313
3                 0.164063
4                 0.246094
5                 0.246094
6                 0.164063
7                 0.070313
8                 0.017578
9                 0.001953

If the additive made no change, I would expect 0.5(9) = 4.5 plus signs. The table above gives the probability of observing x plus signs. Since I observed eight plus signs, the question is: what is the likelihood of observing that many plus signs or more if we were expecting 4.5? Let X count the number of plus signs.
$P(X \geq 8) = 0.017578 + 0.001953 = 0.019531$. The sum is my p-value, so my p-value is just under 2%. This shows that the result is not very likely to occur just by chance but, as always, it is up to the reader to decide whether this result is rare enough to call it statistically significant.
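The sign test p-value above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the helper name `sign_test_upper_p` is hypothetical, and the input is the Difference column exactly as recorded in the Case 11 table, with zero differences dropped before counting.

```python
from math import comb


def sign_test_upper_p(differences):
    """One-sided sign test p-value, P(X >= observed number of plus signs).

    X ~ Binomial(n, 0.5), where n counts the nonzero differences;
    ties (zero differences) are dropped before counting.
    """
    plus = sum(1 for d in differences if d > 0)
    minus = sum(1 for d in differences if d < 0)
    n = plus + minus
    return sum(comb(n, k) for k in range(plus, n + 1)) / 2 ** n


# Difference column as recorded in the Case 11 table (with additive minus without).
recorded_differences = [0.4, 2.1, 0.7, 2.5, 0.0, 0.8, 0.2, 1.8, -3.1, 5.4]

p_value = sign_test_upper_p(recorded_differences)
print(f"one-sided sign test p-value = {p_value:.6f}")
```

Summing the exact binomial terms avoids the small rounding error that comes from adding the six-decimal table entries by hand.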