One-sample Hypothesis Tests in R

Example. (Student's t-test) According to www.fueleconomy.gov, the Scion tC averages 23 mpg in city driving. A consumer group test drives 12 different tCs and measures the fuel economy when one full tank of fuel is consumed driving in varying city conditions. The following fuel economies (in mpg) were observed:

    23.1 23.7 22.7 22.0 21.6 23.6 22.0 21.1 24.1 23.2 23.4 21.9

Based on this data, determine whether or not the fuel economy of the Scion tC differs significantly from the government's estimate using α = .05.

Solution.

    > fuel = c(23.1,23.7,22.7,22,21.6,23.6,22,21.1,24.1,23.2,23.4,21.9)
    > t.test(fuel, alternative='two.sided', mu=23, conf.level=.95)

The first argument is a vector of data to be used; alternative can be 'two.sided', 'less', or 'greater', depending on whether a two-tailed, left-tailed, or right-tailed alternative hypothesis is used (more on that in a moment); mu is the value of the mean under the null hypothesis; and conf.level is the confidence level of the test (1 - α). The output is below.

            One Sample t-test

    data:  fuel
    t = -1.0867, df = 11, p-value = 0.3004
    alternative hypothesis: true mean is not equal to 23
    95 percent confidence interval:
     22.09238 23.30762
    sample estimates:
    mean of x
         22.7

Since the p-value is not less than α, we do not reject the null, and we conclude that the fuel economy of the tC is not significantly different from 23 mpg (at least on the basis of this data).

On the other hand, maybe the consumer group is only interested in the case where the fuel economy is significantly less than what is being claimed. In this case a left-tailed test might be appropriate, and the R command would be

    > t.test(fuel, alternative='less', mu=23, conf.level=.95)

            One Sample t-test

    data:  fuel
    t = -1.0867, df = 11, p-value = 0.1502
    alternative hypothesis: true mean is less than 23
    95 percent confidence interval:
         -Inf 23.19578
    sample estimates:
    mean of x
         22.7

Again, the p-value is not less than α, so we would not reject the null. We would not conclude the fuel economy of the tC is less than what is claimed.

Example. In an elementary stats course, you see many problems which are artificially simplified. Using the above example, numSummary(fuel) (a function from the RcmdrMisc package) gives a sample mean of 22.7 mpg with a sample standard deviation of 0.9563187 mpg. So, suppose the problem was posed as follows.

According to www.fueleconomy.gov, the Scion tC averages 23 mpg in city driving. A consumer group test drives 12 different tCs and measures the fuel economy when one full tank of fuel is consumed driving in varying city conditions. The mean mpg observed was 22.7 with a sample standard deviation of 0.956 mpg. Based on this data, determine whether or not the fuel economy of the Scion tC differs significantly from the government's estimate using α = .05.

Solution. This time, we use the fact that the test statistic $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$, where $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $\mu_0$ the mean under the null hypothesis, follows a t distribution with $n - 1$
degrees of freedom. We find the rejection region and determine whether or not our observed t falls within it.

    > # Compute t-statistic
    > t = (22.7-23)/(.956/12^.5)
    > (22.7-23)/(.956/12^.5)
    [1] -1.087061
    > # Find rejection region by determining the middle 95% of a suitable
    > # t-distribution
    > qt(c(.025,.975), df=11, lower.tail=TRUE)
    [1] -2.200985  2.200985

Just as qnorm does for the normal distribution, qt returns the indicated percentiles of a t-distribution; df is the number of degrees of freedom. We would reject the null hypothesis if we see a value of t either greater than 2.20 or less than -2.20. Since the observed t (-1.087) does not fall in the rejection region, we do not reject the null, and obtain the same conclusion as before.

Example. (The z-test) The z-test is rarely used in practice, since the 'true' population standard deviation is seldom known. However, the z-test can be done in R.

From experience, a professor believes that the final course grades in a given course follow a normal distribution with mean 70 and population standard deviation 12.5. One semester, the mean final grade in such a class of 40 students is 74. Test the claim that this result differs significantly from 70, using a 10% significance level.

Here, compute the test statistic $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$, where $\sigma$ is the population standard deviation, and use the fact that z follows a normal distribution with
mean 0 and standard deviation 1.

    > # Compute z test statistic
    > (74-70)/(12.5/40^.5)
    [1] 2.023858
    > # Find the rejection region
    > qnorm(c(.05,.95), mean=0, sd=1, lower.tail=TRUE)
    [1] -1.644854  1.644854

We reject the null if we observe a z greater than 1.64 or less than -1.64. Since z = 2.02 falls in the rejection region, we reject the null hypothesis and conclude there was a significant difference between this particular class and the expected outcome.

Example. (Testing Proportions) To test whether a coin is fair, it is tossed 50 times and 31 heads are observed. Determine whether the coin is fair on the basis of this data, using α = .05.

The R command would be:

    > prop.test(31, 50, p=.5, alternative='two.sided', conf.level=.95, correct=FALSE)

The first argument is the number of observed successes, followed by the number of trials, the proportion under the null hypothesis (p), the alternative hypothesis of choice, and the confidence level. correct=FALSE is a technical matter; it means Yates' continuity correction is not used. All you need to know is that the correction, if applied, increases the p-value of the test and thus makes it harder to reject a null hypothesis using small samples of data.

Output:

            1-sample proportions test without continuity correction

    data:  31 out of 50, null probability 0.5
    X-squared = 2.88, df = 1, p-value = 0.08969
    alternative hypothesis: true p is not equal to 0.5
    95 percent confidence interval:
     0.4815045 0.7413721
    sample estimates:
       p
    0.62

Since the p-value is not less than α, do not reject the null. We would not conclude the coin is unfair at the 5% level (we would, however, conclude that the proportion of heads is not 50% at the 10% significance level!).
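The X-squared value reported by prop.test without the continuity correction can be reproduced by hand, which may help show what the test is doing. The sketch below assumes the usual large-sample z statistic for a proportion, $z = (\hat{p} - p_0)/\sqrt{p_0(1 - p_0)/n}$, whose square is the uncorrected X-squared. The variable names (x, n, p0, p_hat and so on) are illustrative and do not come from the handout.

```r
# Coin example: 31 heads in 50 tosses, null proportion 0.5
x  <- 31
n  <- 50
p0 <- 0.5

p_hat <- x / n                                   # sample proportion
z     <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # large-sample z statistic
x_sq  <- z^2                                     # uncorrected X-squared
p_val <- 2 * pnorm(-abs(z))                      # two-sided p-value

# Compare with prop.test without continuity correction
res <- prop.test(x, n, p = p0, alternative = "two.sided", correct = FALSE)
print(c(by_hand = x_sq, prop_test = unname(res$statistic)))
print(c(by_hand = p_val, prop_test = res$p.value))
```

Each printed pair should agree, matching the X-squared = 2.88 and p-value = 0.08969 reported by prop.test for this data.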
If Yates' correction is used, the output is:

    > prop.test(31,50,p=.5,alternative='two.sided',conf.level=.95,correct=TRUE)

            1-sample proportions test with continuity correction

    data:  31 out of 50, null probability 0.5
    X-squared = 2.42, df = 1, p-value = 0.1198
    alternative hypothesis: true p is not equal to 0.5
    95 percent confidence interval:
     0.4716328 0.7500196
    sample estimates:
       p
    0.62

Note the slight increase in the p-value with the correction applied.

Note: In the various hypothesis tests in R, if 'alternative' is not specified, the default is a two-tailed (two-sided) test.
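Both remarks can be checked directly in R. The sketch below assumes the standard form of Yates' correction for a single proportion, in which 0.5 is subtracted from |x - n*p0| before squaring, and compares the hand-computed value with prop.test. It then confirms that leaving out alternative gives the two-sided test. Variable names are illustrative, not from the handout.

```r
x  <- 31
n  <- 50
p0 <- 0.5

# X-squared with Yates' continuity correction (assumed form:
# subtract 0.5 from the absolute deviation before squaring)
x_sq_yates <- (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))
p_yates    <- pchisq(x_sq_yates, df = 1, lower.tail = FALSE)

with_corr <- prop.test(x, n, p = p0, alternative = "two.sided", correct = TRUE)
print(c(by_hand = x_sq_yates, prop_test = unname(with_corr$statistic)))
print(c(by_hand = p_yates, prop_test = with_corr$p.value))

# Omitting 'alternative' falls back to the two-sided test
default_alt <- prop.test(x, n, p = p0, correct = TRUE)
print(default_alt$alternative)
```

The hand-computed statistic works out to 5.5^2 / 12.5 = 2.42, the value shown in the corrected output.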