Statistics 302 Midterm 2
Larget, Spring 2014
Solutions

1. True/False Problems. 3 points each, 15 points total. Write very brief explanations.

(a) Circle either True or False (and explain/correct if False): The random variable X is the number of heads in 10 independent coin tosses with head probability 0.4. The random variable Y is the number of heads in 10 independent coin tosses with head probability 0.6. These random variables are independent of each other. Suppose that Z = X + Y: then Z is a binomial random variable with n = 20 and p = 0.5.

Solution: False. Success probability is not the same for all trials.

(b) Circle either True or False (and explain/correct if False): In a hypothesis test, the p-value is 0.03. Therefore, there is a 3 percent chance that the null hypothesis is true.

Solution: False. The p-value is the probability of observing a test statistic at least as extreme as from the original data given that the null hypothesis is true. It is not the probability of the null hypothesis.

(c) Circle either True or False (and explain/correct if False): One legitimate way to construct a confidence interval for a parameter is to include in the interval all values where the loglikelihood at that value is above a given threshold selected to be a certain distance below the maximum.

Solution: True.

(d) Circle either True or False (and explain/correct if False): If the p-value from a hypothesis test is not statistically significant, then the null hypothesis is true.

Solution: False. Failure to reject the null hypothesis is not strong evidence that the null hypothesis is true.

(e) Circle either True or False (and explain/correct if False): An appropriate 95% confidence interval for a population proportion with sample size n = 26 uses the 0.975 quantile from a t distribution with 25 degrees of freedom as the multiple of the standard error for the margin of error.

Solution: False.
Inference about proportions never uses t distributions, which arise from independent estimates of population means and standard deviations.

2. 5 parts, 5 points each, 25 points total.

If genetic theory A is true, then offspring plants from a given cross have white flowers with probability 0.5 and yellow flowers with probability 0.5. If this theory is not true, then offspring plants have white flowers with probability 0.75 and yellow flowers with probability 0.25. There is a 50 percent chance that genetic theory A is true. In the actual cross, 7 offspring plants have white flowers and 3 have yellow flowers. For each problem, show how you calculate the probability and give a numerical answer, rounded to 4 decimal places.

(a) If genetic theory A is true and there are 10 offspring plants, what is the probability that 7 will have white flowers?

Solution: Let X = number of plants with white flowers from 10 offspring plants. If A is true, then X ∼ Binomial(n = 10, p = 0.5).

$$P(X = 7 \mid A) = \binom{10}{7}(0.5)^7(0.5)^3 = 120(0.5)^{10} \approx 0.1172$$

(b) If genetic theory A is not true and there are 10 offspring plants, what is the probability that 7 will have white flowers?

Solution: If A is not true, then X ∼ Binomial(n = 10, p = 0.75).

$$P(X = 7 \mid \text{not } A) = \binom{10}{7}(0.75)^7(0.25)^3 = 120(0.75)^7(0.25)^3 \approx 0.2503$$

(c) What is the unconditional probability that 7 of 10 offspring plants will have white flowers?

Solution: This is a weighted average of the two conditional probabilities, or the sum of the probabilities of the two paths through the tree.

$$(0.5)(0.1172) + (0.5)(0.2503) \approx 0.1837$$

(d) Given the observed data, what is the probability that genetic theory A is true?

Solution: This is Bayes' rule or the result of a probability tree calculation.

$$P(A \mid X = 7) = \frac{P(A \text{ and } X = 7)}{P(X = 7)} \approx \frac{(0.5)(0.1172)}{0.1837} \approx 0.3189$$

(e) Given the observed data, what is the probability that there will be exactly 7 offspring plants with white flowers among an additional 10 offspring plants?
Solution: This one is tricky, but follows an example from lecture. Let Y be the number of plants with white flowers in the next ten offspring. We want to know P(Y = 7 | X = 7). Just like earlier, the answer is a weighted average of the binomial probabilities of observing 7 successes in 10 trials; however, the weights are not P(A) = 0.5 and P(not A) = 1 − 0.5, but, rather, are P(A | X = 7) = 0.3189 and P(not A | X = 7) = 1 − 0.3189. So, the answer is

$$(0.3189)(0.1172) + (1 - 0.3189)(0.2503) \approx 0.2079$$

Notice that the proportion of plants with white flowers among the first ten plants was 0.7, which is closer to 0.75 than to the 0.5 predicted by theory A. This is consistent with the observation that the likelihood if A were true, 0.1172, is smaller than the likelihood if A were not true, 0.2503. Hence, after seeing 7 plants with white flowers, we no longer think A and not A are equally likely, but have shifted our opinion toward the not A explanation. So, when we compute the probability of 7 plants with white flowers from the next batch of ten plants, we count the not A contribution to the calculation more heavily (1 − 0.3189 = 0.6811) than we did before (0.5).

3. 7 points each for (a), (c), (e), 3 points each for (b), (d), (f), 30 points total.

In a random sample of n = 210 students from a given university, each of whom lives with one roommate, the following data was collected.

    Student brought videogame     No      Yes     No      Yes
    Roommate brought videogame    No      No      Yes     Yes
    Sample size                   88      44      38      40
    Mean GPA                      3.128   3.039   2.932   2.754
    Std. Dev. GPA                 0.590   0.689   0.699   0.639

The following table has quantiles from t distributions with various degrees of freedom.
    ##   df   0.9  0.95  0.96  0.97 0.975  0.98  0.99 0.995 0.9995
    ##   39 1.304 1.685 1.798 1.937 2.023 2.125 2.426 2.708  3.558
    ##   43 1.302 1.681 1.793 1.932 2.017 2.118 2.416 2.695  3.532
    ##   82 1.292 1.664 1.773 1.907 1.989 2.087 2.373 2.637  3.413
    ##   83 1.292 1.663 1.772 1.907 1.989 2.087 2.372 2.636  3.412

(a) Consider the population of students where both the student and roommate brought a video game to school. Construct a 95% confidence interval for the mean population GPA among such students. Show how you compute the margin of error.

Solution:

$$2.754 \pm (2.023)\frac{0.639}{\sqrt{40}} \quad \text{or} \quad 2.75 \pm 0.20$$

The multiplier is from a t distribution with 39 degrees of freedom.

(b) Write an interpretation of the previous confidence interval in context.

Solution: We are 95% confident that the population mean GPA at this university among students where the student and his or her roommate both brought a video game to school is between 2.55 and 2.95.

(c) Construct a 95% confidence interval for the population proportion of students at this university that bring a video game to school. Show how you compute the margin of error. (Ignore roommate data for this problem.)

Solution: The data shows that 84 (40 + 44) of the 210 students brought a video game to school, so p̂ = 84/210 = 0.4. A 95% confidence interval for the population proportion is

$$0.4 \pm 1.96\sqrt{\frac{(0.4)(0.6)}{210}} \quad \text{or} \quad 0.400 \pm 0.066$$

(d) Write an interpretation of the previous confidence interval in context.

Solution: We are 95% confident that the proportion of students at the school that bring a video game to school is between 0.334 and 0.466.

(e) Consider the population of students that did bring a video game to school. Test the hypothesis that the population mean GPA of such students is the same whether or not the roommate also brought a video game versus the alternative that the population mean GPA is lower when the roommate also brought a video game to school.
Include: (1) a test statistic; (2) a sketch of how to compute the p-value, specifying a shaded area, number(s) on the axis, and name of distribution; and (3) a numerical range for the p-value based on quantiles above.

Solution: Let $\mu_1$ be the population mean GPA of students where the student and roommate both brought a video game to school and let $\mu_2$ be the population mean GPA of students where the student brought a video game to school and the roommate did not. The hypotheses are $H_0: \mu_1 = \mu_2$ versus $H_A: \mu_1 < \mu_2$. The test statistic is

$$t = \frac{2.754 - 3.039}{\sqrt{\frac{(0.639)^2}{40} + \frac{(0.689)^2}{44}}} \approx -1.97$$

The p-value is the area to the left of −1.97 under a t distribution with 39 degrees of freedom (using the safe and simple book method: the smaller sample size minus one, 40 − 1 = 39). You should draw a sketch for this. Based on quantiles above, 0.025 < p-value < 0.03, as 1.937 < 1.97 < 2.023.

(f) Interpret the results of the hypothesis test in context.

Solution: There is fairly strong evidence that the mean GPA of students who bring video games to school is lower when their roommates also have video games than when the roommates do not (p < 0.03, independent sample t-test).

4. 6 parts, 5 points each, 30 points total.

Data was collected on a sample of 273 female subjects from a population of individuals who have undergone a particular elective surgical procedure with a given result. (We exclude the 42 male patients on whom data was also available.) We compare the number of calories consumed per day between older women (aged 55 and older) versus younger women (aged less than 55 years). Here is a plot of the data (x values jittered to avoid overplotting).
[Figure: jittered dot plot of Calories per day (vertical axis, ticks at 1000, 2000, 3000, 4000) by Age Groups (Younger than 55, 55 and Older).]

There are 187 women younger than 55 who consume an average of 1824.6 calories per day with a standard deviation of 653.2 and there are 86 women aged 55 and older who consume an average of 1560.5 calories per day with a standard deviation of 499.1. Here is the output from t.test().

    younger = with(d, Calories[AgeGroup == "Younger than 55"])
    older = with(d, Calories[AgeGroup == "55 and Older"])
    t.test(younger, older, alternative = "greater")

    ##
    ##  Welch Two Sample t-test
    ##
    ## data:  younger and older
    ## t = 3.671, df = 211.7, p-value = 0.000153
    ## alternative hypothesis: true difference in means is greater than 0
    ## 95 percent confidence interval:
    ##  145.3   Inf
    ## sample estimates:
    ## mean of x mean of y
    ##      1825      1560

(a) Define parameters and state null and alternative hypotheses for the test.

Solution: Let $\mu_1$ be the population mean daily calories among women younger than 55 in the population of interest and $\mu_2$ be the same for women 55 and older. The hypotheses are $H_0: \mu_1 = \mu_2$ versus $H_A: \mu_1 > \mu_2$.

(b) Compute the point estimate and standard error for the difference in sample means.

Solution: The point estimate is 1824.6 − 1560.5 = 264.1 and the standard error is

$$SE = \sqrt{\frac{(653.2)^2}{187} + \frac{(499.1)^2}{86}} \approx 72$$

(c) The computer output provides a test statistic. Show how it was calculated.

Solution:

$$t = 3.671 \approx \frac{264.1}{72}$$

(d) How many degrees of freedom would we have used had we used the simpler method from the textbook?

Solution: Simple rule is 86 − 1 = 85.
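The calculations in parts (b) through (d) can be reproduced from the summary statistics alone. The following Python sketch is not part of the original solutions: the function name `welch_summary` is made up for illustration, and the degrees of freedom use the Welch-Satterthwaite approximation, which is assumed here to be what the df = 211.7 in the output reflects. Because it starts from rounded summary statistics rather than the raw data, its results should match the R output only approximately.

```python
import math

# Summary statistics quoted in Problem 4.
n1, mean1, sd1 = 187, 1824.6, 653.2   # women younger than 55
n2, mean2, sd2 = 86, 1560.5, 499.1    # women 55 and older


def welch_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Return (estimate, standard error, t statistic, Welch df).

    Hypothetical helper for illustration; works from summary statistics only.
    """
    v1 = sd1 ** 2 / n1          # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2          # estimated variance of the second sample mean
    estimate = mean1 - mean2    # point estimate of mu1 - mu2
    se = math.sqrt(v1 + v2)     # standard error of the difference
    t_stat = estimate / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return estimate, se, t_stat, df


estimate, se, t_stat, welch_df = welch_summary(n1, mean1, sd1, n2, mean2, sd2)
simple_df = min(n1, n2) - 1     # simple rule: smaller sample size minus one

print(f"estimate = {estimate:.1f}, SE = {se:.2f}, t = {t_stat:.2f}")
print(f"Welch df = {welch_df:.1f}, simple-rule df = {simple_df}")
```

With these inputs the standard error comes out near 72 and the Welch degrees of freedom near 212, compared with 85 from the simple rule. Fewer degrees of freedom give heavier tails, so the simple rule is the more conservative of the two.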
(e) List at least two assumptions that underlie the statistical method used for this test. Are you concerned about the validity of the test? Briefly explain.

Solution: Multiple acceptable answers: mine are normally distributed populations and random samples. The plot of the data shows no extreme outliers or skewness, so I am confident that the method is valid in that respect. However, not many details are given about the sampling method, so I have questions about the validity of the inference due to potential bias in the sampling process.

(f) Summarize the results of the hypothesis test in context.

Solution: There is very strong evidence that women in this population who are younger than 55 years of age consume more calories per day than do women who are 55 years old and older (p < 0.0002, independent sample t-test).
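As a numerical check on Problem 2, the chain of calculations there (two binomial likelihoods, the prior-weighted average, Bayes' rule, and the posterior-weighted prediction) can be sketched in a few lines of Python. This is an illustration rather than part of the original solutions; the helper `binom_pmf` and the variable names are made up for this sketch. Note that the code carries full precision between steps, so part (e) comes out as about 0.2078, while the hand calculation, which reuses intermediates rounded to four decimals, gives 0.2079.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p). Hypothetical helper for illustration."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


prior_a = 0.5            # P(A), as stated in the problem
p_white_a = 0.5          # P(white flower) if theory A is true
p_white_not_a = 0.75     # P(white flower) if theory A is not true
n, k = 10, 7             # 7 white-flowered plants among 10 offspring

like_a = binom_pmf(k, n, p_white_a)             # part (a)
like_not_a = binom_pmf(k, n, p_white_not_a)     # part (b)

# Part (c): weighted average of the likelihoods using the prior weights.
marginal = prior_a * like_a + (1 - prior_a) * like_not_a

# Part (d): Bayes' rule.
posterior_a = prior_a * like_a / marginal

# Part (e): same likelihoods, reweighted by the posterior instead of the prior.
predictive = posterior_a * like_a + (1 - posterior_a) * like_not_a

for label, value in [("(a)", like_a), ("(b)", like_not_a), ("(c)", marginal),
                     ("(d)", posterior_a), ("(e)", predictive)]:
    print(f"{label} {value:.4f}")
```

The only difference between parts (c) and (e) in the code is which weights multiply the two likelihoods, which mirrors the explanation given in the solution to 2(e).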