You construct a 95% confidence interval for the mean time taken to process a new insurance policy. The interval is (11, 12) days. Which of the following statements is correct?
1. Only 5% of all policies take less than 11 or more than 12 days to process.
2. Only 5% of all policies take between 11 and 12 days to process.
3. The probability is 0.95 that all policies take between 11 and 12 days to process.
4. About 95 out of every 100 such intervals constructed from random samples of the same size will contain the population mean processing time.

Which of the following statements about confidence intervals (CIs) is incorrect?
1. If we keep the sample size fixed, the CI gets wider as we increase the confidence level.
2. A CI for a mean always contains the sample mean.
3. If we keep the confidence level fixed, the CI gets narrower as we increase the sample size.
4. If the population standard deviation increases, the confidence interval decreases in width.
5. If the confidence intervals for two means do not overlap, there is evidence that the two population means are different.

A Canadian railway company claims that its trains block crossings for 8 minutes per train on average. To examine this claim, the blocking times of 10 randomly selected trains were recorded. The average was 9.05 minutes. The population standard deviation for the times is known to be 0.9 minutes.
1. Test the company's claim at a 5% significance level.
2. Provide the 95% confidence interval.

The average growth of a certain variety of pine tree is 10.1 inches in three years. A biologist claims that a new variety will have a greater three-year growth. A simple random sample (SRS) of 25 trees of the new variety has an average three-year growth of 10.8 inches. Assume that the population standard deviation of the three-year growth of the new variety is 2.1 inches.
1. Write the appropriate hypotheses.
2. Test the hypotheses.
3. What is your conclusion?

Which of the following statements is correct?
1. The p-value measures the probability that the hypothesis is true.
2. The larger the p-value, the stronger the evidence against the null hypothesis.
3. A large p-value indicates that the data is consistent with the alternative hypothesis.
4. An extremely small p-value indicates that the actual data differs markedly from what would be expected if the null hypothesis were true.

Statistics 111 - Lecture 12
Introduction to Inference
More Hypothesis Testing

Tests and Intervals
• There is a close connection between confidence intervals and two-sided hypothesis tests
• A 100·C % confidence interval contains likely values for a population parameter, like the population mean $\mu$
• The interval is centered around the sample mean $\bar{X}$
• The width of the interval is a multiple of $s/\sqrt{n}$
• An $\alpha$-level hypothesis test rejects the null hypothesis that $\mu = \mu_0$ if the test statistic $T$ has a p-value less than $\alpha$, where
$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$

Example: NYC blackout baby boom
• Births per day from two weeks in August 1966
$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{433.6 - 430}{39.4/\sqrt{14}} = 0.3418$

p-value for NYC dataset
• $T$ follows a t distribution with 13 df: the tail below $T = -0.342$ has probability 0.3689, and the tail above $T = 0.342$ has probability 0.3689
• Since our alternative hypothesis was two-sided, our p-value is the sum of both tail probabilities
• The p-value is 0.73796

Example: NYC blackout baby boom
• Births per day from two weeks in August 1966
• The difference between our sample mean and the population mean $\mu_0 = 430$ had a p-value of 0.7379, so we did not reject the null hypothesis at an $\alpha$-level of 0.05
• We could have also calculated a $100\cdot(1-\alpha)\% = 95\%$ confidence interval:
$\left(\bar{X} - t_{\alpha/2}(n-1)\,\frac{s}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}(n-1)\,\frac{s}{\sqrt{n}}\right) = \left(433.6 - 2.16\cdot\frac{39.4}{\sqrt{14}},\ 433.6 + 2.16\cdot\frac{39.4}{\sqrt{14}}\right) = (410.855,\ 456.345)$
• What is the meaning of this 95% confidence interval?

Another Example: Calcium in the Diet
• Calcium is the most abundant element in the body, and one of the most important. The recommended daily allowance (RDA) for adults is 850 mg/day
• Random sample of 18 people below the poverty level
• Does the data support the claim that people below the poverty level have a different calcium intake from the RDA?
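The NYC blackout calculations above can be reproduced numerically. The following is a minimal sketch using only the Python standard library; the helper names `t_pdf` and `t_upper_tail` are hypothetical, the tail probability is approximated with Simpson's rule rather than read from a table, and the critical value 2.16 is taken from the slide.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > |t|) as 0.5 minus the integral of the density on [0, |t|].

    Uses Simpson's rule; steps must be even.
    """
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics from the NYC example
n, xbar, s, mu0 = 14, 433.6, 39.4, 430.0
se = s / math.sqrt(n)                    # standard error s / sqrt(n)
t_stat = (xbar - mu0) / se               # test statistic T
p_value = 2 * t_upper_tail(t_stat, n - 1)  # two-sided: both tails

# 95% interval with the slide's critical value for 13 df
t_crit = 2.16
ci = (xbar - t_crit * se, xbar + t_crit * se)

print(f"T = {t_stat:.4f}, two-sided p = {p_value:.4f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The printed values should land close to the slide figures (T near 0.342, a two-sided p-value near 0.738, and an interval near (410.855, 456.345)). Because 430 lies inside the interval, the interval and the test point to the same decision.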
Hypothesis Test for Calcium
• Let $\mu$ be the mean calcium intake for people below the poverty line
• The null hypothesis is that calcium intake for people below the poverty line is not different from the RDA: $\mu = \mu_0 = 850$ mg/day
• Two-sided alternative hypothesis: $\mu \neq 850$ mg/day
• To calculate the test statistic, we use the sample standard deviation $s = 188$ mg:
$T = \frac{747.4 - 850}{188/\sqrt{18}} = -2.315$
• We need the p-value: if $\mu_0 = 850$, what is the probability that we get a sample mean as extreme as (or more extreme than) 747.4?

p-value for Calcium
• We have a two-sided alternative, so the p-value includes the tail probabilities on both sides (t distribution with 17 df, since we use the sample standard deviation): probability 0.017 below $T = -2.315$ and probability 0.017 above $T = 2.315$
• Looking up the probability in the table, we see that the two-sided p-value is 0.017 + 0.017 = 0.034
• Since the p-value is less than 0.05, we can reject the null hypothesis
• Conclusion: people below the poverty line have significantly (at an $\alpha = 0.05$ level) different calcium intake than the RDA

Confidence Interval for Calcium
• Alternatively, we calculate a confidence interval for the calcium intake of people below the poverty line
• Use confidence level $100\cdot C = 100\cdot(1-\alpha) = 95\%$
• A 95% confidence level means the critical value is $T^* = 2.109$:
$\left(747.4 - 2.109\cdot\frac{188}{\sqrt{18}},\ 747.4 + 2.109\cdot\frac{188}{\sqrt{18}}\right) = (653.509,\ 840.890)$
• Since our hypothesized value $\mu_0 = 850$ mg is not in the 95% confidence interval, we can reject that hypothesis right away!

Tests and Intervals
• If our confidence level $C$ is equal to $1 - \alpha$, where $\alpha$ is the significance level of the hypothesis test, then we have the following connection between tests and intervals: a two-sided hypothesis test rejects the null hypothesis ($\mu = \mu_0$) if our hypothesized value $\mu_0$ falls outside the confidence interval for $\mu$
• So, if we have already calculated a confidence interval for $\mu$, then we can test any hypothesized value $\mu_0$ just by checking whether or not $\mu_0$ is in the interval!
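The connection between tests and intervals can be checked directly on the calcium numbers. The sketch below is an illustration, not the lecture's own code: the function names are hypothetical, the test is written in its critical-value form (reject when $|T| > T^*$) rather than through a p-value, and the hypothesized values other than 850 are made up for the comparison.

```python
import math

def t_interval(xbar, s, n, t_star):
    """Interval xbar +/- t_star * s / sqrt(n)."""
    half_width = t_star * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def two_sided_rejects(xbar, s, n, mu0, t_star):
    """Critical-value form of the two-sided test: reject when |T| > t_star."""
    t_stat = (xbar - mu0) / (s / math.sqrt(n))
    return abs(t_stat) > t_star

# Summary statistics and critical value from the calcium slides
xbar, s, n, t_star = 747.4, 188.0, 18, 2.109
low, high = t_interval(xbar, s, n, t_star)
print(f"95% CI = ({low:.3f}, {high:.3f})")

# 850 is the RDA; the other hypothesized values are illustrative only
results = {}
for mu0 in (850.0, 800.0, 700.0, 650.0):
    rejected = two_sided_rejects(xbar, s, n, mu0, t_star)
    outside = not (low <= mu0 <= high)
    results[mu0] = (rejected, outside)
    print(f"mu0 = {mu0:.0f}: test rejects = {rejected}, outside CI = {outside}")
```

For every hypothesized value the two columns agree, which is the duality stated above. With the rounded summary statistics the endpoints come out near (653.9, 840.9), slightly different from the interval on the slide, which was presumably computed from the unrounded data.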
Cautions about Hypothesis Tests
• Statistical significance does not necessarily mean real significance
  • If the sample size is large, even very small differences can have a low p-value
• Lack of significance does not necessarily mean that the null hypothesis is true
  • If the sample size is small, there could be a real difference, but we are not able to detect it
• Many assumptions went into our hypothesis tests
  • Presence of outliers, low sample sizes, etc. make our assumptions less realistic
  • We will try to address some of these problems next class

Small Samples
• We have used the sample standard deviation and the t distribution to correct our assumption of a known population SD
• However, even t distribution intervals and tests are not as accurate if the data is skewed or has influential outliers
• Rough guidelines from your textbook:
  • Large samples (n > 40): the t distribution can be used even for strongly skewed data or with outliers
  • Intermediate samples (n > 15): the t distribution can be used except for strongly skewed data or in the presence of outliers
  • Small samples (n < 15): the t distribution can only be used if the data does not have skewness or outliers
• What can we do for small samples of skewed data?

Techniques for Small Samples
• One option: use a log transformation on the data
  • Taking the logarithm of the data can often make it look more symmetric
• Another option: non-parametric tests like the sign test
  • Not required for this course, but mentioned in the textbook if you're interested
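Both small-sample techniques can be illustrated in a few lines. This is a rough sketch under stated assumptions: the sample is invented for illustration, the asymmetry measure (mean minus median, divided by the standard deviation) is just one simple choice, and `sign_test_p_value` is a hypothetical helper implementing the usual exact two-sided sign test for a hypothesized median.

```python
import math
import statistics

def asymmetry(values):
    """Crude skewness indicator: (mean - median) / standard deviation."""
    return (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)

def sign_test_p_value(values, median0):
    """Exact two-sided sign test of H0: population median equals median0.

    Under H0 the number of observations above median0 is Binomial(n, 1/2),
    after dropping observations equal to median0.
    """
    diffs = [v - median0 for v in values if v != median0]
    n = len(diffs)
    k = sum(1 for d in diffs if d > 0)
    lower = sum(math.comb(n, i) for i in range(0, k + 1)) / 2 ** n   # P(X <= k)
    upper = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n   # P(X >= k)
    return min(1.0, 2 * min(lower, upper))

# Hypothetical right-skewed small sample (not from the lecture)
sample = [1.2, 1.5, 1.9, 2.3, 2.8, 3.1, 4.0, 5.5, 9.7, 24.0]
logged = [math.log(v) for v in sample]

raw_asym = asymmetry(sample)
log_asym = asymmetry(logged)
p_sign = sign_test_p_value(sample, median0=2.0)

print(f"asymmetry raw = {raw_asym:.3f}, after log = {log_asym:.3f}")
print(f"sign test p-value for median 2.0 = {p_sign:.5f}")
```

For this made-up sample the asymmetry measure shrinks after taking logs, and the sign test uses only whether each observation falls above or below the hypothesized median, so the single large value has no extra influence on the result.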