Statistics MINITAB - Lab 12

1. Comparing Two Population Means - Large Samples

When dealing with large samples we can use the Central Limit Theorem to test hypotheses about the difference between two population means.

Summary From Lecture Notes

1. The mean of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.

2. If the two samples are independent, the standard deviation of the sampling distribution is

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where $\sigma_1^2$ and $\sigma_2^2$ are the variances of the two populations (estimated by the sample variances $s_1^2$ and $s_2^2$), and $n_1$ and $n_2$ are the sample sizes of the two groups respectively. $\sigma_{\bar{x}_1 - \bar{x}_2}$ is also known as the standard error of the statistic $\bar{x}_1 - \bar{x}_2$.

3. By the Central Limit Theorem, the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal for large samples (i.e. where $n_1$ and $n_2$ are both > 30).

4. A large-sample confidence interval for $\mu_1 - \mu_2$ may be calculated using the following formula:

$$(\bar{x}_1 - \bar{x}_2) - z_{crit}\,\sigma_{\bar{x}_1 - \bar{x}_2} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_{crit}\,\sigma_{\bar{x}_1 - \bar{x}_2}$$

that is,

$$(\bar{x}_1 - \bar{x}_2) \pm z_{crit}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx (\bar{x}_1 - \bar{x}_2) \pm z_{crit}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

On the onlineclass site you will find a data set called IQ.mtw. This data set contains two variables: IQ Group1 holds the IQ scores of children in their first year of school who had not attended any pre-school, and IQ Group2 holds the IQ scores of children in their first year of school who had attended pre-school for 1 year. An educational psychologist wished to investigate whether the data provide any evidence that children who attend pre-school display higher IQ scores.

Here are the familiar steps in hypothesis testing, amended for comparing two population means using independent large samples.

Step 1. Choose the population characteristic of interest.
The difference between the population means, $\mu_1 - \mu_2$.

Step 2. Choose the significance level.
$\alpha$ = .10 (the 10% level).

Step 3. State the null hypothesis.
H0: $\mu_1 - \mu_2 = D_0$, where the hypothesised difference is $D_0 = 0$.

Step 4. State the alternative hypothesis.
Ha: $\mu_1 - \mu_2 < 0$.

Step 5. Choose a test statistic.
In this case the test statistic chosen is

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \approx \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

Step 6. Choose a rejection region.
Since the alternative hypothesis specifies only differences less than 0, this is a one-tailed test, and the rejection region lies in the lower tail of the standard normal distribution. First find the z critical value that has 10% of the standard normal distribution to its right (and therefore 90% to its left) with the following command, then take the negative of the answer, since this is a lower-tailed test.

MTB > INVCDF .90;
SUBC> NORMAL 0 1.

What is the answer? _____________
Verify this in the Cambridge tables.
Reject H0 if z is ______________________

Step 7. Calculate the test statistic.
Fill in the following equation:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \approx \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

= ______________________________ = ______________

Step 8. State the conclusion in the context of the question.
Reject / Fail to reject H0 at $\alpha$ = _______, that ______________________________________________________
______________________________________________________

Now calculate a two-sided 90% confidence interval for the difference between the two population means:

90% Confidence Interval = (____________________ to _____________________)

2. Comparing Two Population Means - Small Samples

We can also compare two population means for small samples. As in the one-sample case, we can no longer assume that the sample standard deviation s gives a good estimate of the population standard deviation $\sigma$, so the test statistic is no longer approximately normally distributed by the Central Limit Theorem. However, a two-sample t-test for independent samples is available in these cases, provided both populations are approximately normal.

Summary From Lecture Notes

1. If the two samples are random and independently sampled, the mean of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.

2. If the two samples are random and independently sampled, and both populations have the same variance, then a pooled estimate of the variance may be obtained from the following weighted average of the sample variances:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

3. The test statistic

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

is distributed as t with $n_1 + n_2 - 2$ degrees of freedom.

4. A $(1-\alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t_{crit}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $t_{crit}$ is the appropriate quantile (the value with $\alpha/2$ of the distribution to its right) of the Student t distribution with $n_1 + n_2 - 2$ degrees of freedom.

The same educational psychologist who tested for the effects of pre-school education on IQ scores above implemented a new technique of teaching reading to 'slow learners'. The new technique was used with 8 children from a class group, and the remaining 12 members of the class were taught by the standard method. After 6 months a standardised reading test was given to all the students, and the results below were obtained. The psychologist wanted to report two things: the result of a two-sample independent t-test on the data to detect any difference between the two teaching methods (with $\alpha$ = .10), and a two-sided 95% confidence interval for the mean difference between the two teaching methods.

Reading Test Scores for Slow Learners
New Method    Standard Method
80 77 79 86 80 66 62 73 79 79 70 72 81 76 68 68 73 75 76 66

Since a standardised reading test is administered, it is reasonable to assume that the reading scores on the test are normally distributed with equal variances.

Enter these data into two columns in MINITAB and name the columns appropriately. Conduct the test using MINITAB's two-sample t-test for independent samples. Go to STAT > BASIC STATISTICS > 2 SAMPLE T…
1. Specify the columns where the data are stored.
2. Click on Assume equal variances.
3. Click on Options.
4. Specify the appropriate confidence level, i.e. $(1-\alpha)100\%$.
5. Specify the hypothesised difference $D_0$ under the null hypothesis.
6. Select the appropriate alternative hypothesis.

Fill in the following and state your conclusion fully.
1. What is the statistic of interest? _________________________________
2. What is the significance level? ______________________
3. State H0: _________________________________________________________
4. State HA: _________________________________________________________
5. What is the appropriate test statistic formula?
6. What is the rejection region? (Use Minitab or Cambridge tables) _____________________________________________________________
7. What is the result of calculating the test statistic? ____________________
What is the pooled estimate of variance? ____________________
8. What is the conclusion from your hypothesis test?
______________________________________________________________
______________________________________________________________
9. What is the 95% confidence interval for the mean difference between the two teaching methods?
From (_______________ to _______________).
10. There is an apparent contradiction between the result of the hypothesis test and the range of the confidence interval. What is the apparent contradiction, and why is it only apparent? That is, explain why it is not a real contradiction in statistical terms.
___________________________________________________________________
___________________________________________________________________

Assignment: Open the dataset called Data_lab12.mtw on onlineclasses and answer the following two questions.

(a) There are two types of trees in a forest. A random sample of each type is taken and their heights (in metres) are recorded (in column 1 and column 2). Is there evidence at the 99% confidence level that tree 1 is taller than tree 2?
[Hint: Note the sample sizes by generating the summary statistics. There is no built-in function in Minitab for this test, so the test statistic must be worked out by hand.]
Answer:
H0:
HA:
$\alpha$:
Test Statistic:
Critical Value:
P-Value:
Conclusion:

(b) In the same forest there are two types of flowers, a and b. A random sample of each is taken and their heights (in centimetres) are recorded (in column 3, with column 4 indicating which type each flower is).
Is there any evidence of a difference between the heights of the two flowers?
[Hint: To familiarise yourself with the data, first display the summary statistics using the flower type variable as the 'By variable'.]
Answer:
H0:
HA:
$\alpha$:
Test Statistic:
Critical Value:
P-Value:
Conclusion:

REVISION SUMMARY
After this lab you should be able to:
- perform a hypothesis test for two large independent samples
- perform a hypothesis test for two small independent samples
- determine when to use normal distribution tables and when to use t distribution tables when dealing with two-sample tests

END
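Because the large-sample test has no built-in Minitab function and must be worked out by hand from summary statistics (means, standard deviations and sample sizes), a short script can be a useful check on the arithmetic. The following is a minimal Python sketch using only the standard library. The names `two_sample_z` and `z_interval` are hypothetical helpers, not Minitab features, and the sketch assumes both samples are large enough (> 30) for the sample standard deviations to stand in for $\sigma_1$ and $\sigma_2$.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, d0=0.0):
    """Large-sample z statistic for H0: mu1 - mu2 = d0.

    The sample standard deviations stand in for sigma1 and sigma2,
    which is reasonable only when n1 and n2 are both large (> 30).
    """
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # standard error of xbar1 - xbar2
    return (mean1 - mean2 - d0) / se

def z_interval(mean1, sd1, n1, mean2, sd2, n2, confidence=0.90):
    """Two-sided large-sample confidence interval for mu1 - mu2."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    diff = mean1 - mean2
    return diff - z_crit * se, diff + z_crit * se

# The same quantity as MTB > INVCDF .90; SUBC> NORMAL 0 1.
z_90 = NormalDist().inv_cdf(0.90)
print(round(z_90, 4))  # prints 1.2816; a lower-tailed test at alpha = .10 rejects H0 if z < -1.2816
```

For a lower-tailed alternative the p-value is `NormalDist().cdf(z)`; for an upper-tailed alternative, such as "tree 1 is taller than tree 2", it is `1 - NormalDist().cdf(z)`, and the 1% critical value is `NormalDist().inv_cdf(0.99)` (about 2.326).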
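The pooled-variance formulas for small samples can be sketched in Python in the same spirit, as a way of checking Minitab's 2 SAMPLE T output by hand. This is a minimal illustration assuming independent random samples from normal populations with equal variances; `pooled_t` and `pooled_interval` are hypothetical names. The Python standard library has no Student t quantile function, so the sketch takes $t_{crit}$ as an input to be read from Minitab (INVCDF with a T subcommand) or the Cambridge tables.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x1, x2, d0=0.0):
    """Pooled-variance two-sample t statistic for H0: mu1 - mu2 = d0.

    Assumes independent random samples from normal populations with
    equal variances. Returns (t, degrees of freedom, pooled variance).
    """
    n1, n2 = len(x1), len(x2)
    # Weighted average of the sample variances (statistics.variance divides by n - 1)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(x1) - mean(x2) - d0) / se
    return t, n1 + n2 - 2, sp2

def pooled_interval(x1, x2, t_crit):
    """(1 - alpha)100% interval for mu1 - mu2.

    t_crit must be the upper alpha/2 quantile of t with n1 + n2 - 2
    degrees of freedom, looked up separately.
    """
    _, _, sp2 = pooled_t(x1, x2)
    half_width = t_crit * sqrt(sp2 * (1 / len(x1) + 1 / len(x2)))
    diff = mean(x1) - mean(x2)
    return diff - half_width, diff + half_width
```

The pooled variance returned here should correspond to the square of the pooled standard deviation that Minitab reports when Assume equal variances is ticked.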
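In the flower question the data are stacked: one column holds every height and a second column holds the group label. Summarising with a 'By variable' splits the first column according to the second before computing statistics for each group. A minimal Python sketch of that idea, assuming the two columns have been exported as equal-length lists (the function name `summarise_by_group` is hypothetical):

```python
from collections import defaultdict
from statistics import mean, stdev

def summarise_by_group(values, groups):
    """Split a stacked column by a label column and summarise each group
    (sample size, mean and sample standard deviation)."""
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    by_group = defaultdict(list)
    for value, label in zip(values, groups):
        by_group[label].append(value)  # collect each observation under its label
    return {
        label: {"n": len(xs), "mean": mean(xs), "sd": stdev(xs)}
        for label, xs in by_group.items()
    }
```

Each group needs at least two observations, since `statistics.stdev` raises `StatisticsError` otherwise. The per-group sample sizes are what decide whether the normal (z) or t approach applies.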