Answers Practice Test, Section 3
Fall 2016

1. Regression

Year   Per capita consumption   Divorce rate   Predicted       Residual
       of margarine             in Maine       divorce rate
2000   8.2                      5              5.02            -.017
2001   7                        4.7            4.73            -.035
2002   6.5                      4.6            4.62            -.017
2003   5.3                      4.4            4.34            .065
2004   5.2                      4.3            4.31            -.012
2005   4                        4.1            4.03            .070
2006   4.6                      4.2            4.17            .029
2007   4.5                      4.2            4.15            .053
2008   4.2                      4.2            4.08            .123
2009   3.7                      3.7            3.96            -.260

Regress the Divorce rate in Maine (Y) on Per capita consumption of margarine (X) (data is real).

a) Calculate the predicted Y for all the observations and enter in the chart above.
b) Calculate the residual for all the observations and enter in the chart above.
c) Write out the regression line:
   Divorce = 3.09 + .235 MargC
d) Find and interpret the r & r²:
   r² = .917. 91.7% of the variance of Divorces in Maine can be explained by the variance in consumption of margarine.
   r = .957. There is a strong positive correlation between Divorces in Maine and consumption of margarine.
e) Find and interpret b:
   .235. For every additional unit (pound?) consumed of margarine, the divorce rate in Maine will rise by .235.
f) Find and interpret the Y intercept:
   3.09. If people consume no margarine, then the Divorce rate in Maine will be 3.09.
g) Find and interpret the X intercept:
   -13.16. People would have to consume -13.16 units (pounds?) of margarine so that there would be no divorces in Maine.
h) If in 2015, people ate 2.6 pounds of margarine, what is the predicted divorce rate?
   3.70 divorce rate.

Conduct a hypothesis test to see if the consumption of margarine has any impact on the Divorce rate in Maine. Use α=.02 significance level. Note: degrees of freedom = n - 2

i) Using the critical value method, what is/are the critical value(s), and which distribution is being used?
   First, note: H0: β = 0, H1: β ≠ 0. Two-tailed test. We don’t know standard deviation, so it’s a t-test.
   Critical t-values: -2.90 & 2.90
j) Using the critical value method, what is the result (statistically) of the hypothesis test and why?
   The test statistic (from running the regression) is 9.38. Since the test statistic > critical value (that is, 9.38 > 2.90), we reject the null hypothesis and accept the alternative hypothesis.
k) Draw a diagram to represent the previous test.
   [Diagram: t-distribution centered at 0 with rejection regions in both tails, below the critical value -2.90 and above the critical value 2.90; the test statistic 9.38 lies in the right rejection region.]
l) Use the p-value method to conduct a hypothesis test.
   p = .0000137. Since p < α (that is, .00001 < .02), we reject the null and accept the alternative.
m) State in English your results.
   There is sufficient, statistically significant evidence to reject the hypothesis that the consumption of margarine has no effect on the divorce rate in Maine, and we accept the alternative, that the consumption of margarine does affect the divorce rate in Maine.

2. A random sample of 300 college students was taken. The students were asked if they had watched the TV show “Angry Housewives of Aptos”. 113 of them said yes, they had.

a. What is the point estimate for the population proportion of college students who have seen the show?
   37.7%
b. Construct a 95% confidence interval for the population proportion.
   (32.2%, 43.2%)
c. What is the margin of error?
   5.5%
d. How many students would need to be sampled to have a 1% margin of error while maintaining the 95% confidence level?
   9023 (note: I used the p-hat estimate, .377, from above to estimate this)

3. (10 points) Central Limit Theorem

a. What does the Central Limit Theorem say, and why is it so important?
   The CLT says that no matter how a population is distributed, the sample mean (which is a random variable because it is the result of a random sample) will be approximately normally distributed, with a mean equal to the population mean and a standard deviation equal to the population standard deviation divided by the square root of the sample size. The approximation holds when the sample size is large enough.
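   The CLT statement above can be checked with a small simulation. The sketch below is an illustration only, not part of the original answer key: the exponential population, the sample size of 40, the 2000 repetitions, and the seed are all assumptions chosen for the demonstration. It draws repeated samples from a right-skewed population with mean 1 and standard deviation 1, then compares the spread of the sample means with σ/√n.

```python
import random
import statistics

random.seed(2016)  # fixed seed so the run is repeatable (arbitrary choice)

SAMPLE_SIZE = 40    # assumed; above the usual "greater than 30" rule of thumb
NUM_SAMPLES = 2000  # assumed number of repeated samples

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, sd 1, right-skewed)."""
    draws = [random.expovariate(1.0) for _ in range(n)]
    return statistics.fmean(draws)

sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
predicted_sd = 1.0 / SAMPLE_SIZE ** 0.5  # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {predicted_sd:.3f})")
```

   Even though the population is heavily skewed, the sample means should cluster around the population mean with a spread close to σ/√n.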
   The common assumption is that a sample size greater than 30 is large enough (or at least 10 in each category if it’s a proportion). It is important because even if we know nothing about the population, we can draw probabilistic and valid conclusions about the results of statistical samples if the sample is large enough.
b. The height of a maple tree is distributed normally with a mean of 31 meters and a standard deviation of 4 meters. What is the probability of a tree being taller than 36 meters? Represent this graphically.
   =normalcdf(36, 1E99, 31, 4) = 10.6%
   [Diagram: normal curve with the mean, 31, and the cutoff, 36, marked.]
c. A group of 20 trees is selected at random. What is the probability that the average height of these trees is more than 33 meters? Represent this graphically.
   =normalcdf(33, 1E99, 31, 4/√20) = 1.3%
   [Diagram: normal curve with the mean, 31, and the cutoff, 33, marked.]
d. A group of 20 trees is selected at random. What is the probability that the average height of these trees is between 30 and 32 meters? Represent this graphically.
   =normalcdf(30, 32, 31, 4/√20) = 73.6%
   [Diagram: normal curve with 30, 31 (the mean), and 32 marked.]
e. Calculate the Standard Error of the Mean, assuming the sample size is still 20 (from the previous two questions).
   SE of Mean = σ/√n = 4/√20 = .8944

4. A random sample of 65 households is taken, and they are asked how much they spend on vacations and travel. The sample mean is $1,780. The population has a standard deviation of σ = $450.

a. Construct a 95% confidence interval for the population mean.
   (1671, 1889)
b. What assumption is made to create this confidence interval?
   Since the sample size is large enough (greater than 30), the Central Limit Theorem can be used, so no matter how the population is distributed, x-bar will be distributed close to normal.
c. The population data is known to be heavily skewed to the right (the very rich spend a lot). Does this invalidate your results?
   No. Since the sample size is large enough, no matter how the population is distributed (even heavily skewed), we can still assume that x-bar is distributed approximately normally.
d. What is the margin of error?
   109
e. How many households would need to be surveyed for the margin of error to be $50 (at the same confidence level)?
   312

5. A sample of beers bought at a particular bar was measured for volume. The beers should be 16 ounces. Use Data Set A to represent the sample.

Data Set A: 15.7, 15.8, 16.0, 15.5, 15.7, 15.9, 16.3, 15.1, 15.4, 15.9, 16.0, 15.9, 16.1, 15.8, 15.5, 15.4, 15.8, 15.7, 16.2, 15.6

a. Construct a 98% confidence interval for the population mean.
   (15.60, 15.93)
b. What assumption is made to create this confidence interval?
   Since the sample size is 30 or less (small), the population must be distributed approximately normally to be able to construct a confidence interval with a given level of confidence.
c. What is the margin of error?
   .167
d. What does it mean that you are 98% confident in that interval?
   If we were to repeat the experiment many times, each time collecting 20 observations of the volume of beer pours, we would expect 98% of those experiments to produce a confidence interval that includes (or captures) the true population parameter, that is, the actual true mean of the pours.

6. Use Data Set A above. A sample of beers bought at a particular bar was measured for volume. Test whether the average beer was less than 16 ounces. Use α=.05 significance level.

a. State the null and alternative hypothesis.
   H0: µ = 16 ounces
   H1: µ < 16 ounces
b. Using the critical value method, what is/are the critical value(s), and which distribution is being used?
   Note: α = .01 (stated in class), rather than the .05 printed above. Also, we MUST assume that the data is normally distributed since the sample size is 30 or less. It is a t-test, one-tail (left), 19 degrees of freedom.
   Critical value = -2.54
c. Using the critical value method, what is the result (statistically) of the hypothesis test and why?
   Test stat = -3.57. Since the test stat < critical value (that is, -3.57 < -2.54), we reject the null and accept the alternative hypothesis.
d. Draw a diagram to represent the previous test.
   [Diagram: t-distribution centered at 0 with the rejection region in the left tail, below the critical value -2.54; the test statistic -3.57 lies in the rejection region.]
e. Using the P-value method, what is the result (statistically) of the hypothesis test and why?
   p = .001. Since p < α (that is, .001 < .01), we reject the null and accept the alternative hypothesis.
f. State, in English, the result of the hypothesis test.
   There is sufficient evidence to conclude that the average beer poured is not 16 ounces, and we accept that the average beer poured is less than 16 ounces.

7. 150 randomly selected voters were surveyed. 81 of the voters said they would vote “yes” on Proposition O and/or P. Use α=.01 significance level. Conduct a hypothesis test to see if a majority of voters will pass the propositions.

a. What conditions must hold to make valid conclusions, and are these conditions met?
   We must have at least 10 people in each category (10 who would vote yes and 10 who would vote no). This condition is met (81 yes, 69 no). We also must have at least 20 times as many people in the population as in the sample. This means the population must be at least 3000 (150 * 20).
b. State the null and alternative hypothesis.
   H0: p = .5 (50%)
   H1: p > .5 (50%)
c. Using the critical value method, what is/are the critical value(s), and which distribution is being used?
   Proportions use the normal distribution (z values). One-tail (right) test. z = 2.33
d. Using the critical value method, what is the result (statistically) of the hypothesis test and why?
   The test statistic = .980. We fail to reject the null hypothesis since the test stat is not more extreme than the critical value (that is, .98 < 2.33).
e. Draw a diagram to represent the previous test.
   [Diagram: standard normal curve centered at 0 with the rejection region in the right tail, above the critical value 2.33; the test statistic .980 lies outside the rejection region.]
f. Using the P-value method, what is the result (statistically) of the hypothesis test and why?
   We fail to reject the null because the p-value is not less than the level of significance, that is, p is not < α. p-value = .164, which is not < .01.
g. State, in English, the result of the hypothesis test.
   There is insufficient evidence to conclude that a majority of voters are in favor of Propositions O & P.

8. Use Data Set A above. A sample of beers bought at a particular bar was measured for volume. Test whether the standard deviation of the pours is greater than .2 ounces.

a. State the null and alternative hypothesis.
   H0: σ = .2
   H1: σ > .2
b. Using the critical value method, what is/are the critical value(s), and which distribution is being used?
   Note: by mistake, the level of significance was not specified, and it needs to be. So I will assume α = .01. The distribution being used is χ². This is a one-tail (right) test. Since n = 20, the degrees of freedom are 19. The critical value is 36.19.
c. Using the critical value method, what is the result (statistically) of the hypothesis test and why?
   You need to calculate s (the sample standard deviation). Use 1 Var Calc on the calculator. It will find that s = .294. Now use the formula for the test statistic: χ² = (n - 1)s²/σ². So now calculate 19 * .294² / .2² = 41.14. Note: you should not use a rounded s estimate (the .294); use the actual value stored in the calculator. Since the test statistic (41.14) is more extreme ( > ) than the critical value 36.19, we reject the null and accept the alternative.
d. Draw a diagram to represent the previous test.
   [Diagram: χ² distribution starting at 0 with the rejection region in the right tail, above the critical value 36.19; the test statistic 41.14 lies in the rejection region.]
e. State, in English, the result of the hypothesis test.
   There is sufficient evidence to reject the hypothesis that the standard deviation of the pour of beers is .2 ounces, and we accept the alternative that the standard deviation is greater than .2 ounces.

9. What kinds of errors can be made when doing hypothesis testing, and how do we control those errors?

   There are two types of errors. A Type I error is when you reject the null hypothesis when it IS true. A Type II error is when you fail to reject the null hypothesis when it is not true.
   In statistics, you explicitly choose the probability of a Type I error. This is called the “level of significance” and is represented by “α”. This means that in doing a hypothesis test, the probability of rejecting a null hypothesis that is true is α. For a given sample size, if you reduce one type of error, you’ll increase the size of the other. For example, if you make α smaller, you make the probability of a Type II error larger.

10. What are the different probability distributions used in hypothesis testing and under what conditions are each used?

   There are three probability distributions.

   There’s the normal distribution, where you use z-scores. When z-scores are used, this is the “standardized” normal distribution, which has a mean µ = 0 and standard deviation σ = 1. The normal distribution is a bell-shaped curve. You use this when testing a mean and the population standard deviation is known. To use this distribution, the data must be distributed normally or the sample size must be greater than 30 so you can use the CLT. The normal distribution is also used for proportions because when you make an assumption about the mean, p, you also form an assumption about the standard deviation. The assumption for proportions, since the data is binomial, is that the sample size must be large enough for the sample proportion to be distributed approximately normally. This happens when there are at least 10 observations in each category. Also, the size of the population must be at least 20 times the size of the sample.

   There’s the t-distribution. This is used when you are testing a mean, but the population standard deviation is not known. The t-distribution is also a bell-shaped curve with a mean of 0, though it’s a bit heavier in the tails than the normal distribution, so its standard deviation is somewhat larger than 1 and approaches 1 as the degrees of freedom grow. When using the t-distribution, you need to specify the degrees of freedom. The degrees of freedom will equal n-1 when testing a mean. When testing the β of a regression, degrees of freedom will equal n-2. To use the t-distribution, the data must be distributed normally or the sample size must be large enough (>30) to use the CLT.

   There’s the χ² distribution. This distribution is used to test whether a standard deviation is equal to some quantity. The distribution starts at 0 and is skewed right. You must know the degrees of freedom, which are n-1 for tests of a standard deviation. To use this distribution, the data must be distributed normally. The CLT doesn’t apply for this distribution. The CLT is about the distribution of x̄ if the sample size is large enough, not about σ. So the CLT can’t be used when testing σ.
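
   The χ² test of a standard deviation described above (worked by calculator in question 8) can be sketched in a few lines. This is a minimal illustration using only Python's standard library, not the method used in the answer key; the function name chi_square_statistic is hypothetical, and the critical value 36.19 is copied from the answer to question 8 rather than computed, since the standard library has no χ² quantile function.

```python
import statistics

# Data Set A from the practice test: measured beer volumes in ounces.
data_set_a = [15.7, 15.8, 16.0, 15.5, 15.7, 15.9, 16.3, 15.1, 15.4, 15.9,
              16.0, 15.9, 16.1, 15.8, 15.5, 15.4, 15.8, 15.7, 16.2, 15.6]

def chi_square_statistic(sample, sigma0):
    """Return (n - 1) * s^2 / sigma0^2 for a test of H0: sigma = sigma0."""
    n = len(sample)
    s_squared = statistics.variance(sample)  # sample variance (divides by n - 1)
    return (n - 1) * s_squared / sigma0 ** 2

# Critical value quoted in question 8 (alpha = .01, 19 degrees of freedom).
CRITICAL_VALUE = 36.19

s = statistics.stdev(data_set_a)
chi2 = chi_square_statistic(data_set_a, sigma0=0.2)
reject_null = chi2 > CRITICAL_VALUE  # right-tailed test

print(f"s = {s:.3f}")
print(f"chi-square = {chi2:.2f}")
print("reject H0" if reject_null else "fail to reject H0")
```

   Because the statistic is computed from the unrounded sample variance, it matches the 41.14 reported in question 8 rather than the slightly smaller value obtained from the rounded s = .294.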