
Inference for Means
We can use the STATA commands ci and ttest to construct confidence intervals
and perform hypothesis tests for problems dealing with population means.
A. T-tests and confidence intervals
The Cars data set gives the price in dollars and the weight in pounds for a
number of 1991 model four-door sedans listed in a particular auto guide.
American made cars are coded with a "0" and foreign brands are coded with a
"1". The car data can be accessed by typing:
use http://www.stat.columbia.edu/~martin/W1111/Data/Cars
in the STATA command window.
To construct a level C confidence interval for the variable var, use the command
ci var, level(C)
For example, to get a 90% CI for the average price among all cars use the
command
ci price, level(90)
This gives the following output:
From the output we see that a 90% CI for the average price is (15143.16,
18266.87).
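The interval can also be rebuilt by hand from the results that ci leaves behind, which shows where the two limits come from. The following is a minimal sketch, assuming the Cars data are already loaded; the scalar names n_obs, ybar, se_y and tstar are chosen here for illustration and are not part of the handout. In Stata 14.1 and later the documented form of the command is ci means price, level(90), so the older form used here may need version control.

```stata
* Sketch only: assumes the Cars data are already loaded with the use command above.
* In Stata 14.1 and later the documented form is: ci means price, level(90)
ci price, level(90)

* Copy the stored results into scalars (names chosen here for illustration)
scalar n_obs = r(N)
scalar ybar  = r(mean)
scalar se_y  = r(se)

* An upper-tail area of 0.05 leaves 90% in the middle of the t distribution
scalar tstar = invttail(n_obs - 1, 0.05)

display "lower limit = " ybar - tstar * se_y
display "upper limit = " ybar + tstar * se_y
```

The two displayed limits should agree with the interval reported by ci. Typing return list immediately after ci shows every stored result.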
To perform either a one-sample or a two-sample t-test, use the command ttest. For
example, suppose the null hypothesis of our test is that the mean price of all four-door
sedans is equal to $18,000 and the alternative hypothesis is that the mean
is less than $18,000. To investigate this claim we need to use a one-sample t-test.
This can be done using the command
ttest price = 18000
This gives the following output:
The output lists summary statistics for the variable price as well as the results of
three different tests of significance that correspond to each possible alternative
hypothesis.
For our test, we need to look under the column that reads Ha: mean<18000, as
this corresponds to the alternative hypothesis we stated above. The P-value of
this particular test is equal to 0.0858.
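All three P-values in the output come from a single t statistic. As a reminder, here are the standard one-sample formulas; the symbols $\bar{y}$, $s$, and $n$ (sample mean, sample standard deviation, and sample size of price) are notation assumed here, not labels from the Stata output:

```latex
t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}, \qquad \mu_0 = 18000, \qquad \text{df} = n - 1

P_{\text{lower}} = P(T \le t), \qquad
P_{\text{upper}} = P(T \ge t) = 1 - P_{\text{lower}}, \qquad
P_{\text{two-sided}} = 2\,\min(P_{\text{lower}}, P_{\text{upper}})
```

With the lower-tail P-value of 0.0858 reported above, these relations imply an upper-tail P-value of about 0.9142 and a two-sided P-value of about 0.1716, up to rounding in the displayed output.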
Suppose instead we want to determine whether there is a significant difference
between the mean price of foreign and domestic four-door sedans. To investigate
this claim we need to conduct a two-sample t-test to compare the mean price of
the foreign and domestic cars.
To perform a two-sample test use the command
ttest var, by(type)
Here var is the response variable of interest (e.g., car price) and type is a
categorical variable that splits the data into samples from separate populations
(e.g., foreign/domestic).
By default STATA assumes that the variances in both populations are equal.
Typically, we do not want to make this assumption. To override this default, and
allow the variances to be unequal, we must include the option unequal at the end
of the command, i.e.
ttest var, by(type) unequal
To perform a two-sample t-test to compare the mean price of the foreign and
domestic cars, use the command:
ttest price, by(cartype) unequal
which gives the following output:
Again, this command gives summary statistics as well as the results of three
different tests of significance, one for each of the possible alternative
hypotheses. For example, if our alternative hypothesis is that there is a
difference between the means, the corresponding P-value is equal to 0.3174.
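For reference, the unequal option estimates the two variances separately instead of pooling them, and uses Satterthwaite's approximation for the degrees of freedom. This is a sketch of the standard formulas, where the subscripts 1 and 2 label the two groups defined by cartype (notation assumed here):

```latex
t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad
\text{df} \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```

The resulting degrees of freedom are usually not a whole number, which is why the value reported with the unequal option can be fractional.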
B. Immediate commands
As an alternative to the commands ci and ttest, we can use the immediate
commands cii and ttesti for confidence intervals and tests of significance. An
immediate command is a command that obtains data not from the data stored in
memory but from numbers typed as arguments. Immediate commands, in effect,
turn STATA into a glorified hand-calculator.
There are instances where you may not have the data, but you know something
about the data and what you do know is adequate to perform the statistical test.
For example, suppose we want a 90% confidence interval for $\mu$ and we do not
have access to the data but we know that
$n = 100$, $\bar{y} = 50$, and $s = 8$.
To construct the confidence interval use the command:
cii 100 50 8, level(90)
This gives the following output:
A 90% confidence interval for $\mu$ is (48.67, 51.33).
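Because only three numbers are involved, this interval is easy to check by hand with display and the t quantile function invttail. This is a small sketch of that check; in Stata 14.1 and later the documented form of the immediate command is cii means 100 50 8, level(90).

```stata
* Hand check of: cii 100 50 8, level(90)
* n = 100, ybar = 50, s = 8, so df = n - 1 = 99
* An upper-tail area of 0.05 corresponds to a 90% interval
display "t*    = " invttail(99, 0.05)
display "lower = " 50 - invttail(99, 0.05) * 8 / sqrt(100)
display "upper = " 50 + invttail(99, 0.05) * 8 / sqrt(100)
```

The displayed limits should agree with the interval above after rounding to two decimals.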
To test $H_0: \mu = \mu_0$ using a one-sample t-test, use the command:
ttesti n ybar s mu0
where n is the sample size, ybar is the sample mean, s is the sample standard
deviation and mu0 is the hypothesized population mean $\mu_0$.
Ex. Estimate the mean height of all Columbia students. The population of
students has mean $\mu$ and standard deviation $\sigma$, both unknown.
We take a sample of 12 students and obtain $\bar{y} = 66.30$ and $s = 4.35$.
To test:
$H_0: \mu = 68$ and $H_a: \mu \neq 68$
use the command
ttesti 12 66.3 4.35 68
This gives the following output:
Since the alternative hypothesis is two-sided, the p-value is 0.2030.
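The test statistic behind this P-value can be worked out directly from the three summary numbers. A sketch of the arithmetic, with intermediate values rounded:

```latex
t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}
  = \frac{66.30 - 68}{4.35/\sqrt{12}}
  \approx \frac{-1.70}{1.256}
  \approx -1.35, \qquad \text{df} = 12 - 1 = 11

\text{P-value} = 2\,P(T_{11} \ge |t|) \approx 0.2030
```

Here $T_{11}$ denotes a t random variable with 11 degrees of freedom; the factor of 2 reflects the two-sided alternative.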
To test $H_0: \mu_1 = \mu_2$ using a two-sample t-test we can use the command:
ttesti n1 ybar1 s1 n2 ybar2 s2, unequal
where n1 and n2 are the sample sizes, ybar1 and ybar2 are the sample means
and s1 and s2 are the sample standard deviations of each of the two samples.
Ex. Testing the effect of a new medication on pulse rate - 60 subjects are
randomly divided into two groups of 30. One group is given the new medicine
and the other a placebo.
Group          Sample size   Sample mean   Sample standard deviation
1 – Medicine   30            65.2          7.8
2 – Placebo    30            70.3          8.4
Does the medicine reduce pulse rate?
To test
$H_0: \mu_1 - \mu_2 = 0$ and $H_a: \mu_1 - \mu_2 < 0$
use the command:
ttesti 30 65.2 7.8 30 70.3 8.4, unequal
which gives the following output:
According to the output, the p-value is 0.0090.
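The pulse-rate test can be reproduced step by step from the summary statistics, which makes the roles of the standard error and the approximate degrees of freedom explicit. This is a minimal sketch, assuming Satterthwaite's approximation (the one the unequal option uses); the scalar names v1, v2, se_diff, t_stat and df_satt are illustrative only.

```stata
* Hand check of: ttesti 30 65.2 7.8 30 70.3 8.4, unequal
* Group 1 = medicine, group 2 = placebo

* Estimated variance of each sample mean, s^2 / n
scalar v1 = 7.8^2 / 30
scalar v2 = 8.4^2 / 30

* Standard error of the difference and the t statistic
scalar se_diff = sqrt(v1 + v2)
scalar t_stat  = (65.2 - 70.3) / se_diff

* Satterthwaite's approximate degrees of freedom
scalar df_satt = (v1 + v2)^2 / (v1^2 / 29 + v2^2 / 29)

display "t  = " t_stat
display "df = " df_satt

* Lower-tail P-value for Ha: mu1 - mu2 < 0, using symmetry of the t distribution
display "P(T < t) = " ttail(df_satt, -t_stat)
```

The t statistic comes out near -2.44 with roughly 57.7 degrees of freedom, and the lower-tail probability should agree with the p-value quoted above.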
HOMEWORK:
Q1. Answer the following questions about the Cars data set described above.
1. Read the Cars data by typing:
use http://www.stat.columbia.edu/~martin/W1111/Data/Cars
in the STATA command window.
2. Construct a 95% CI for the average weight of all four-door sedans.
3. Is there significant evidence that the mean weight of all four-door sedans
is below 3,100 pounds?
(a) State the appropriate null and alternative hypotheses.
(b) What is the P-value of the test?
(c) Are the results significant at the 5% level?
4. Is there a significant difference in weight between foreign and domestic
cars?
(a) State the appropriate null and alternative hypotheses.
(b) What is the P-value of the test?
(c) Is there a significant difference between the weights at the 5% level?
Q2. Do problem 23.32 from the textbook.
Solve the problem using STATA and the ttesti command. Make sure to hand in
your log file and answers to any questions in the text.
Hand in your log file together with the answers to the questions above.
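One way to produce the log file is to open a log before running any commands and close it at the end. This is a minimal sketch of such a session; the file name hw_means.log is hypothetical, and the text option is assumed so that the log can be opened in any editor.

```stata
* Sketch of a homework session; hw_means.log is a hypothetical file name.
log using hw_means.log, text replace

use http://www.stat.columbia.edu/~martin/W1111/Data/Cars
describe

* ... your ci, ttest, and ttesti commands go here ...

log close
```

Everything typed between log using and log close, together with its output, is written to the file.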