MAT 214 Lab 10: Hypothesis Testing: Large and Small Samples and Assessing Normality

In this lab we will practice writing up hypothesis tests for the mean. As we have discussed in class, our hypothesis testing procedure has several steps.

1. State the null and alternative hypotheses in words and symbols.
2. Choose a level of significance, α, the test to use, and the corresponding critical value(s).
3. State the test statistic that is to be used.
4. State the decision rule. Calculate the rejection region. (optional)
5. State the assumptions that were used.
6. Calculate the test statistic and p-value.
7. State the decision and write your conclusion in terms of the problem.
8. Discuss the type of error possible and its probability.

Hypothesis Testing – Review

H0: μ = μ0 (for some particular value μ0)
Ha: μ < μ0 (One-tailed test to the left)
Choose α and calculate the appropriate statistic. This is a one-tailed test, so reject H0 if the statistic lies in the left-tail rejection region (containing area α). A statistic in this region is negative.

H0: μ = μ0 (for some particular value μ0)
Ha: μ > μ0 (One-tailed test to the right)
Choose α and calculate the appropriate statistic. This is a one-tailed test, so reject H0 if the statistic lies in the right-tail rejection region (containing area α). A statistic in this region is positive.

H0: μ = μ0 (for some particular value μ0)
Ha: μ ≠ μ0 (Two-tailed test, either to the left or to the right)
Choose α and calculate the appropriate statistic. This is a two-tailed test, so reject H0 if the statistic lies in the left-tail rejection region (containing area α/2) OR in the right-tail rejection region (containing area α/2). The statistic could be negative or positive.

When we calculate the statistic, we can also determine the p-value: the total area in the appropriate tail or tails (depending upon the hypotheses) determined by the calculated value of the statistic.

Quick-and-dirty decision-making: If the p-value is less than α, we reject H0.
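The three cases reviewed above differ only in which tail area is taken as the p-value. The following is a minimal Python sketch of that idea for a large-sample z statistic, not part of the Minitab workflow in this lab. The function name z_test_from_summary and the summary numbers in the demonstration are hypothetical and chosen only for illustration; it assumes the sample is large enough that s can stand in for σ.

```python
from math import sqrt
from statistics import NormalDist


def z_test_from_summary(xbar, s, n, mu0, alternative):
    """Return (z, p_value) for H0: mu = mu0 from summary statistics.

    alternative is "less", "greater", or "two-sided".
    Assumes a large sample, so s is used in place of sigma.
    """
    z = (xbar - mu0) / (s / sqrt(n))
    phi = NormalDist().cdf  # standard normal cumulative distribution
    if alternative == "less":
        p_value = phi(z)                  # area in the left tail
    elif alternative == "greater":
        p_value = 1 - phi(z)              # area in the right tail
    elif alternative == "two-sided":
        p_value = 2 * (1 - phi(abs(z)))   # area in both tails
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value


# Hypothetical summary values, NOT taken from any lab data file.
z, p = z_test_from_summary(xbar=43.1, s=3.0, n=36, mu0=42.0, alternative="greater")
for alpha in (0.01, 0.05, 0.10):
    # The p-value does not depend on alpha; only the comparison changes.
    print(f"alpha={alpha}: z={z:.3f}, p={p:.4f}, p < alpha is {p < alpha}")
```

Because the p-value is computed once and only the comparison with α changes, the same output can be reused for every significance level.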
If the p-value is not less than α, we cannot reject H0. We will use this approach throughout this lab.

I. This time you will learn how MINITAB can help a statistician with testing hypotheses for the mean using a large sample. As usual, any information or graphs you create should be copied to your write-up. The data file 1STGRADE.MTW can be found in the MINITAB data folder. While running hypothesis tests, please follow the steps from the handout.

1) Hypothesis Testing – Using Minitab to aid in the calculations, i.e., to compute the descriptive statistics.
a) The U.S. government claims that the mean height of a first grader is 42 inches. A researcher feels that the actual mean height of first grade students exceeds 42 inches. Set up a hypothesis test for this experiment and use the descriptive statistics of Height in the 1STGRADE.MTW file as the results of the experiment. Do the test at the α = 0.05 level.
b) Would the conclusions have changed if we had used α = 0.01? What about α = 0.1? Make sure you state the new rejection regions.

2) Hypothesis Testing – Using Minitab to do the entire test.
a) We will redo the above test using Minitab to calculate the z-value as well as the p-value of the test.
b) Consider the above hypothesis test for the height (1a).
i) Do the following.
(1) Stat → Basic Statistics → One-Sample t. Even though we are using the normal distribution here, Minitab will not use s for σ unless we use the t-test. We will discuss in class this week what the t-distribution is and how it relates to the z-distribution. In the Samples in columns box select the Hgt variable.
(2) In the test mean box type in 42, which is μ0 → Options.
(3) Make sure the confidence level is set to 95.0, which means that α = 0.05.
(4) Change the Alternative to greater than → OK → OK.
ii) What are the T and P values for the test?
iii) Thinking of the T as the Z value, what is your conclusion?
iv) Change the significance level to 0.01 and 0.1. What are the T and P values for each test?
Thinking of the T as the Z value, what is your conclusion? What have you noticed about the p-value when you change the significance level? Recall from class that when the p-value < α we reject the null hypothesis. For each of the three values of α used in each of the two tests, compare it with the p-value to determine your conclusion.
c) Next use Minitab to test the hypotheses for the following situation: The U.S. government claims that the mean weight of a first grader is 43 pounds. A researcher feels that the actual mean weight of first grade students exceeds 43 pounds. Set up a hypothesis test for this experiment using Weight. Do the test at the α = 0.05 level.
d) As the final answer to the hypothesis tests about the mean height and weight of first graders, state the conclusion and the type of error associated with it.

II. Next we will be considering small samples. Using the t-test for small samples requires that the data come from a normal population. Before checking our data set for normality, let's see how some of the tests for normality work and how the tests detect normal and non-normal data. We want to get a sense of what these tests produce when applied to several distributions. We will look at the normal, Cauchy, and exponential distributions. Use a new worksheet and name columns C1, C2, and C3 as N(0,12), C(0,12), and Exp(12), respectively.

A. Generate the Random Samples:
1. Put a random sample of size 25 from N(0,12) in column N(0,12):
   Click Calc
   Click Random Data
   Click Normal
   Generate 25 rows of data
   Store the data in N(0,12)
   Let the mean equal 0
   Let the standard deviation equal 12
   Click OK
2. Put a random sample of size 25 from Cauchy(0,12) in column C(0,12):
   Click Calc
   Click Random Data
   Click Cauchy
   Generate 25 rows of data
   Store the data in C(0,12)
   Let the location equal 0
   Let the scale equal 12
   Click OK
3. Put a random sample of size 25 from Exp(12) in column Exp(12):
   Click Calc
   Click Random Data
   Click Exponential
   Generate 25 rows of data
   Store the data in Exp(12)
   Let the mean equal 12
   Click OK

B. Check each column for normality. We will do this by producing a probability plot and the Anderson-Darling Test for each column. A probability plot is a scatter plot of the data on "normal" graph paper together with a fitted straight line. If you are sampling from a normal population, the scatter plot should look like a straight line. The Anderson-Darling Test tests the null hypothesis "the data is normal" against the alternative hypothesis "the data is not normal". A small p-value would mean the test suggests the data is not normal.
1. Producing the plot and the Anderson-Darling Test:
   Click Stat
   Click Basic Stat
   Click Normality Test
   Double Click the column name, so that it appears in the variable box
   Select the Anderson-Darling Test and Click OK
Discuss both the probability plot and the Anderson-Darling Test for each column. Conclude which of the generated samples comes from a normal population.

III. Go to the help menu and look up the data file BANKING.MTW. Copy and paste the information from the help system into your Word document. Also, load the BANKING.MTW file into Minitab.

A. Complete a hypothesis test for the mean of the 1991 year. Test the mean being less than 450 against the mean being equal to 450. Use the 0.05 level.
1. Using the sign test
   Click Stat
   Click Nonparametrics
   Click 1-sample sign
   Double Click 1991 (so that it appears in the variable box)
   Click Test Median, type 450 in the box
   Click the Alternative box, and select "less than"
   Click OK
2. Using the Wilcoxon test
   Click Stat
   Click Nonparametrics
   Click 1-sample Wilcoxon
   Double Click 1991 (so that it appears in the variable box)
   Click Test Median, type 450 in the box
   Click the Alternative box, select "less than"
   Click OK
3. Using the t-test
   Click Stat
   Click Basic Stat
   Click 1-sample t-test
   Double Click 1991 (so that it appears in the variable box)
   Click Test Mean, type 450 in the box
   Click Options
   Click the Alternative box, select "less than"
   Click OK
For each test decide whether H0 would be rejected at the 0.05 level of significance. How do the p-values for each test compare? Do all of the above tests lead to the same conclusion?

B. Repeat part A 1-3 for the mean of the 1990 year. Test the mean being greater than 350 against the mean being equal to 350. Use the 0.05 level.

IV. Check the 1990 and the 1991 data sets for normality. Was the t-test used in Part III appropriate for these data sets? Explain.

V. To use the (non-parametric) Wilcoxon Test we need to know that the data comes from a symmetric continuous distribution. Discuss how you might assess symmetry and then try to address whether or not the Wilcoxon test used in Part III was appropriate for the 1990 and 1991 data sets. (Hint: Minitab can make a histogram and there is also a symmetry plot, if you click on Stat and then Quality Tools.)
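To see what the 1-sample sign test in Part III is doing behind the menus, here is a minimal Python sketch under stated assumptions. It counts observations below and above the hypothesized median, drops ties, and uses the Binomial(n, 0.5) distribution for an exact p-value. The function name sign_test and the sample values are hypothetical; the values are not the BANKING.MTW data, and Minitab may use an approximation for large samples that this sketch does not reproduce.

```python
from math import comb


def sign_test(sample, median0, alternative="less"):
    """Exact one-sample sign test of H0: median = median0.

    Returns (below, above, p_value). Observations equal to median0
    are dropped, so n = below + above.
    """
    below = sum(1 for x in sample if x < median0)
    above = sum(1 for x in sample if x > median0)
    n = below + above

    def binom_cdf(k):
        # P(X <= k) for X ~ Binomial(n, 0.5)
        return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

    if alternative == "less":
        # Few values above median0 support Ha: median < median0.
        p_value = binom_cdf(above)
    elif alternative == "greater":
        # Few values below median0 support Ha: median > median0.
        p_value = binom_cdf(below)
    elif alternative == "two-sided":
        p_value = min(1.0, 2 * binom_cdf(min(below, above)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return below, above, p_value


# Hypothetical sample for illustration only (NOT the BANKING.MTW data).
demo = [410, 420, 430, 440, 445, 448, 449, 450, 460, 470]
below, above, p = sign_test(demo, 450, alternative="less")
print(f"below={below}, above={above}, p-value={p:.4f}, reject at 0.05: {p < 0.05}")
```

The sign test uses only the direction of each observation relative to 450, which is why it needs neither normality nor symmetry, at the cost of ignoring how far each value lies from 450.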