Download Inference for Means - Columbia Statistics

Inference for Means

We can use the STATA commands ci and ttest to construct confidence intervals and perform hypothesis tests for problems dealing with population means.

A. T-tests and confidence intervals

The Cars data set gives the price in dollars and the weight in pounds for a number of 1991 model four-door sedans listed in a particular auto guide. American-made cars are coded with a "0" and foreign brands are coded with a "1". The car data can be accessed by typing

    use http://www.stat.columbia.edu/~martin/W1111/Data/Cars

in the STATA command window.

To construct a level C confidence interval for the variable var, use the command

    ci var, level(C)

For example, to get a 90% CI for the average price among all cars, use the command

    ci price, level(90)

From the resulting output we see that a 90% CI for the average price is (15143.16, 18266.87).

To construct either a one- or two-sample t-test, use the command ttest. For example, suppose the null hypothesis of our test is that the mean price of all four-door sedans is equal to $18,000 and the alternative hypothesis is that the mean is less than $18,000. To investigate this claim we need to use a one-sample t-test. This can be done using the command

    ttest price = 18000

The output lists summary statistics for the variable price as well as the results of three different tests of significance, one for each possible alternative hypothesis. For our test, we need to look under the column that reads Ha: mean < 18000, as this corresponds to the alternative hypothesis we stated above. The P-value of this particular test is equal to 0.0858.

Suppose instead we want to determine whether there is a significant difference between the mean price of foreign and domestic four-door sedans. To investigate this claim we need to conduct a two-sample t-test to compare the mean price of the foreign and domestic cars.
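Before turning to the two-sample case, here is a minimal sketch of how the one-sample P-value above could be read back from STATA's stored results instead of from the printed table. It assumes the Cars data load from the URL given in the text; the display labels are illustrative only. The recomputed value should agree with the 0.0858 reported above.

```stata
* Load the Cars data from the URL given in the text
use http://www.stat.columbia.edu/~martin/W1111/Data/Cars, clear

* Rerun the one-sample test of H0: mean price = 18000
ttest price = 18000

* ttest leaves its results in r(); list them to see what is available
return list

* Lower-tail P-value for Ha: mean < 18000, read directly and recomputed
* from the t statistic; ttail(df, t) returns the upper-tail probability
display "stored P-value:     " r(p_l)
display "recomputed P-value: " 1 - ttail(r(df_t), r(t))
```

The stored results r(p) and r(p_u) hold the two-sided and upper-tail P-values, matching the other two columns of the output.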
To perform a two-sample test, use the command

    ttest var, by(type)

Here var is the response variable of interest (e.g., car price) and type is a categorical variable that splits the data into samples from separate populations (e.g., foreign/domestic). By default STATA assumes that the variances in both populations are equal. Typically, we do not want to make this assumption. To override this default and allow the variances to be unequal, we must include the option unequal at the end of the command, i.e.

    ttest var, by(type) unequal

To perform a two-sample t-test to compare the mean price of the foreign and domestic cars, use the command

    ttest price, by(cartype) unequal

Again, the output gives summary statistics as well as the results of three different tests of significance, one for each of the possible alternative hypotheses. For example, if our alternative hypothesis is that there is a difference between the means, the corresponding P-value is equal to 0.3174.

B. Immediate commands

As an alternative to the commands ttest and ci, we can use the immediate commands ttesti and cii for tests of significance and confidence intervals. An immediate command obtains its data not from the data stored in memory but from numbers typed as arguments. Immediate commands, in effect, turn STATA into a glorified hand calculator. There are instances where you may not have the data, but what you do know about the data is adequate to perform the statistical test.

For example, suppose we want a 90% confidence interval for $\mu$ and we do not have access to the data, but we know that $n = 100$, $\bar{y} = 50$ and $s = 8$. To construct the confidence interval, use the command

    cii 100 50 8, level(90)

From the resulting output, a 90% confidence interval for $\mu$ is (48.67, 51.33).

To test $H_0: \mu = \mu_0$ using a one-sample t-test, use the command

    ttesti n ybar s mu0

where n is the sample size, ybar is the sample mean, s is the sample standard deviation and mu0 is the hypothesized population mean $\mu_0$.

Ex. Estimate the mean height of all Columbia students. The population of students has mean $\mu$ and standard deviation $\sigma$, both unknown. We take a sample of 12 students and obtain $\bar{y} = 66.30$ and $s = 4.35$. To test

    $H_0: \mu = 68$
    $H_a: \mu \neq 68$

use the command

    ttesti 12 66.3 4.35 68

Since the alternative hypothesis is two-sided, the p-value is 0.2030.

To test $H_0: \mu_1 = \mu_2$ using a two-sample t-test, we can use the command

    ttesti n1 ybar1 s1 n2 ybar2 s2, unequal

where n1 and n2 are the sample sizes, ybar1 and ybar2 are the sample means and s1 and s2 are the sample standard deviations of the two samples.

Ex. Testing the effect of a new medication on pulse rate: 60 subjects are randomly divided into two groups of 30. One group is given the new medicine and the other a placebo.

    Group           Sample size   Sample mean   Sample standard deviation
    1 - Medicine    30            65.2          7.8
    2 - Placebo     30            70.3          8.4

Does the medicine reduce pulse rate? To test $H_0: \mu_1 - \mu_2 = 0$ against $H_a: \mu_1 - \mu_2 < 0$, use the command

    ttesti 30 65.2 7.8 30 70.3 8.4, unequal

According to the output, the p-value for this one-sided alternative is 0.0090.

HOMEWORK:

Q1. Answer the following questions about the Cars data set described above.

1. Read the Cars data by typing

       use http://www.stat.columbia.edu/~martin/W1111/Data/Cars

   in the STATA command window.
2. Construct a 95% CI for the average weight of all four-door sedans.
3. Is there significant evidence that the mean weight of all four-door sedans is below 3,100 pounds?
   (a) State the appropriate null and alternative hypotheses.
   (b) What is the P-value of the test?
   (c) Are the results significant at the 5% level?
4. Is there a significant difference in weight between foreign and domestic cars?
   (a) State the appropriate null and alternative hypotheses.
   (b) What is the P-value of the test?
   (c) Is there a significant difference between the weights at the 5% level?

Q2. Do problem 23.32 from the textbook. Solve the problem using STATA and the ttesti command. Make sure to hand in your log file and answers to any questions in the text.

Hand in your log file together with the answers to the questions above.
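Since the homework asks for a log file, the following is a minimal sketch of how a session might be recorded. The file name hw_means.log is hypothetical, and the commands inside the log are simply the worked examples from the text, not the homework answers. In recent STATA releases the documented form of the confidence-interval command is ci means price, level(90); the form below follows the text.

```stata
* Open a plain-text log; hw_means.log is a hypothetical file name
log using hw_means.log, text replace

* Commands and their output are recorded while the log is open
use http://www.stat.columbia.edu/~martin/W1111/Data/Cars, clear
ci price, level(90)
ttest price = 18000
ttesti 12 66.3 4.35 68

* Close the log before handing the file in
log close
```

The text option writes a plain-text file that can be opened in any editor, and replace overwrites an earlier log of the same name.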
