Statistics 302 Midterm 2
Larget, Spring 2014
Solutions
1. True/False Problems. 3 points each, 15 points total. Write very brief explanations.
(a) Circle either True or False (and explain/correct if False):
The random variable X is the number of heads in 10 independent coin tosses
with head probability 0.4. The random variable Y is the number of heads
in 10 independent coin tosses with head probability 0.6. These random
variables are independent of each other. Suppose that Z = X + Y : then Z
is a binomial random variable with n = 20 and p = 0.5.
Solution: False. The success probability is not the same for all 20 trials (0.4 for the tosses counted by X, 0.6 for those counted by Y), so Z is not binomial.
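A quick numerical check of this answer, as a Python sketch using only the standard library (the exam's own code is R). It builds the exact distribution of Z = X + Y by convolving the two binomial pmfs and compares its mean and variance with those of a Binomial(20, 0.5). The helper names `binom_pmf` and `mean_var` are illustrative, not from the course materials.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mean_var(pmf):
    """Mean and variance of a distribution on 0, 1, ..., len(pmf) - 1."""
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, var


# X ~ Binomial(10, 0.4) and Y ~ Binomial(10, 0.6), independent.
px = [binom_pmf(k, 10, 0.4) for k in range(11)]
py = [binom_pmf(k, 10, 0.6) for k in range(11)]

# Exact pmf of Z = X + Y: add P(X = i) P(Y = j) into slot i + j.
pz = [0.0] * 21
for i, qx in enumerate(px):
    for j, qy in enumerate(py):
        pz[i + j] += qx * qy

# The candidate distribution from the statement: Binomial(20, 0.5).
pb = [binom_pmf(k, 20, 0.5) for k in range(21)]

for name, pmf in [("X + Y", pz), ("Binomial(20, 0.5)", pb)]:
    m, v = mean_var(pmf)
    print(f"{name}: mean {m:.4f}, variance {v:.4f}")
# X + Y: mean 10.0000, variance 4.8000
# Binomial(20, 0.5): mean 10.0000, variance 5.0000
```

Both distributions have mean 10, but the variance of X + Y is 10(0.4)(0.6) + 10(0.6)(0.4) = 4.8 rather than 20(0.5)(0.5) = 5, so they cannot be the same distribution.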
(b) Circle either True or False (and explain/correct if False):
In a hypothesis test, the p-value is 0.03. Therefore, there is a 3 percent
chance that the null hypothesis is true.
Solution: False. The p-value is the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one computed from the original data. It is not the probability that the null hypothesis is true.
(c) Circle either True or False (and explain/correct if False):
One legitimate way to construct a confidence interval for a parameter is to
include in the interval all values where the loglikelihood at that value is above
a given threshold selected to be a certain distance below the maximum.
Solution: True.
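As an illustration (not part of the original solution), here is a Python sketch of such a likelihood-based interval for a binomial success probability, using the 7 white-flowered plants out of 10 from Problem 2 as example data. The cutoff of 1.92 log-likelihood units below the maximum is an assumption: it is a common choice (half of 3.841, the 0.95 quantile of a chi-square distribution with 1 degree of freedom) that gives roughly 95% coverage. A fine grid search stands in for root-finding.

```python
from math import log


def loglik(p, x, n):
    """Binomial log-likelihood of p (the constant binomial coefficient is dropped)."""
    return x * log(p) + (n - x) * log(1 - p)


x, n = 7, 10
p_hat = x / n                         # maximum likelihood estimate
cutoff = loglik(p_hat, x, n) - 1.92   # assumed threshold below the maximum

# Keep every grid value whose log-likelihood is at or above the cutoff.
# The binomial log-likelihood is concave, so these values form one interval.
grid = [i / 10000 for i in range(1, 10000)]
inside = [p for p in grid if loglik(p, x, n) >= cutoff]
lower, upper = min(inside), max(inside)
print(f"MLE {p_hat:.2f}, likelihood interval ({lower:.3f}, {upper:.3f})")
```

Dropping the binomial coefficient is harmless here because only differences in log-likelihood matter.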
(d) Circle either True or False (and explain/correct if False):
If the p-value from a hypothesis test is not statistically significant, then the
null hypothesis is true.
Solution: False. Failure to reject the null hypothesis is not strong evidence that the null
hypothesis is true.
(e) Circle either True or False (and explain/correct if False):
An appropriate 95% confidence interval for a population proportion with
sample size n = 26 uses the 0.975 quantile from a t distribution with 25
degrees of freedom as the multiple of the standard error for the margin of
error.
Solution: False. Inference about proportions uses the standard normal distribution (multiplier 1.96 for 95% confidence), not a t distribution. The t distribution arises from independent estimates of a population mean and standard deviation.
2. 5 parts, 5 points each, 25 points total. If genetic theory A is true, then offspring
plants from a given cross have white flowers with probability 0.5 and yellow flowers with
probability 0.5. If this theory is not true, then offspring plants have white flowers with
probability 0.75 and yellow flowers with probability 0.25. There is a 50 percent chance
that genetic theory A is true. In the actual cross, 7 offspring plants have white flowers
and 3 have yellow flowers. For each problem, show how you calculate the probability
and give a numerical answer, rounded to 4 decimal places.
(a) If genetic theory A is true and there are 10 offspring plants, what is the probability
that 7 will have white flowers?
Solution: Let X = number of plants with white flowers from 10 offspring plants. If
A is true, then X ∼ Binomial(n = 10, p = 0.5).
$$P(X = 7 \mid A) = \binom{10}{7}(0.5)^7(0.5)^3 = 120(0.5)^{10} \approx 0.1172$$
(b) If genetic theory A is not true and there are 10 offspring plants, what is the
probability that 7 will have white flowers?
Solution: If A is not true, then X ∼ Binomial(n = 10, p = 0.75).
$$P(X = 7 \mid \text{not } A) = \binom{10}{7}(0.75)^7(0.25)^3 = 120(0.75)^7(0.25)^3 \approx 0.2503$$
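Parts (a) and (b) can be checked numerically with a short Python sketch using only the standard library; `binom_pmf` is an illustrative helper name.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_given_A = binom_pmf(7, 10, 0.5)       # part (a): theory A true
p_given_not_A = binom_pmf(7, 10, 0.75)  # part (b): theory A not true
print(round(p_given_A, 4), round(p_given_not_A, 4))  # 0.1172 0.2503
```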
(c) What is the unconditional probability that 7 of 10 offspring plants will have white
flowers?
Solution: This is a weighted average of the two conditional probabilities, or the sum
of the probabilities of the two paths through the tree.
$$P(X = 7) = (0.5)(0.1172) + (0.5)(0.2503) \approx 0.1837$$
(d) Given the observed data, what is the probability that genetic theory A is true?
Solution: This is Bayes’ rule or the result of a probability tree calculation.
$$P(A \mid X = 7) = \frac{P(A \text{ and } X = 7)}{P(X = 7)} \approx \frac{(0.5)(0.1172)}{0.1837} \approx 0.3189$$
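The probability-tree calculation in parts (c) and (d) can be sketched the same way in Python; the variable names are illustrative. Full precision is carried throughout, and the results agree with the rounded values above.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


prior_A = 0.5
like_A = binom_pmf(7, 10, 0.5)       # P(X = 7 | A)
like_not_A = binom_pmf(7, 10, 0.75)  # P(X = 7 | not A)

# (c) Law of total probability: sum over the two branches of the tree.
marginal = prior_A * like_A + (1 - prior_A) * like_not_A

# (d) Bayes' rule: the A branch's share of that total.
posterior_A = prior_A * like_A / marginal

print(round(marginal, 4), round(posterior_A, 4))  # 0.1837 0.3189
```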
(e) Given the observed data, what is the probability that there will be exactly 7
offspring plants with white flowers among an additional 10 offspring plants?
Solution: This one is tricky, but follows an example from lecture. Let Y be the number of plants with white flowers in the next ten offspring. We want to know P(Y = 7 | X = 7). Just like earlier, the answer is a weighted average of the binomial probabilities of observing 7 successes in 10 trials; however, the weights are not P(A) = 0.5 and P(not A) = 1 − 0.5 but, rather, P(A | X = 7) = 0.3189 and P(not A | X = 7) = 1 − 0.3189. So, the answer is
$$P(Y = 7 \mid X = 7) \approx (0.3189)(0.1172) + (1 - 0.3189)(0.2503) \approx 0.2079$$
Notice that the proportion of plants with white flowers among the first ten plants was 0.7, which is closer to the 0.75 predicted if A is not true than to the 0.5 predicted by theory A. This is consistent with the observation that the likelihood if A were true, 0.1172, is smaller than the likelihood if A were not true, 0.2503. Hence, after seeing 7 plants with white flowers, we no longer think A and not A are equally likely; our belief has shifted toward the not A explanation. So, when we compute the probability of 7 plants with white flowers in the next batch of ten plants, we weight the not A contribution more heavily (1 − 0.3189 = 0.6811) than we did before (0.5).
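A Python sketch of the posterior predictive calculation in part (e), recomputing the earlier quantities so it runs on its own. Carried at full precision, the answer is about 0.2078; the 0.2079 above comes from plugging in the rounded intermediate values 0.3189, 0.1172, and 0.2503.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


prior_A = 0.5
like_A = binom_pmf(7, 10, 0.5)       # P(7 white of 10 | A)
like_not_A = binom_pmf(7, 10, 0.75)  # P(7 white of 10 | not A)

# Posterior weights after observing X = 7.
marginal = prior_A * like_A + (1 - prior_A) * like_not_A
post_A = prior_A * like_A / marginal

# Given A (or not A), the new batch Y has the same binomial probabilities
# as X, so only the weights change from the priors to the posteriors.
predictive = post_A * like_A + (1 - post_A) * like_not_A
print(round(predictive, 4))  # 0.2078
```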
3. 7 points each for (a), (c), (e), 3 points each for (b), (d), (f), 30 points total.
In a random sample of n = 210 students from a given university, each of whom lives
with one other roommate, the following data was collected.
Student brought   Roommate brought   Sample   Mean    Std. Dev.
videogame         videogame          size     GPA     GPA
No                No                 88       3.128   0.590
Yes               No                 44       3.039   0.689
No                Yes                38       2.932   0.699
Yes               Yes                40       2.754   0.639
The following table has quantiles from t distributions with various degrees of freedom (rows: degrees of freedom; columns: cumulative probability).
##   df   0.9  0.95  0.96  0.97 0.975  0.98  0.99 0.995 0.9995
##   39 1.304 1.685 1.798 1.937 2.023 2.125 2.426 2.708  3.558
##   43 1.302 1.681 1.793 1.932 2.017 2.118 2.416 2.695  3.532
##   82 1.292 1.664 1.773 1.907 1.989 2.087 2.373 2.637  3.413
##   83 1.292 1.663 1.772 1.907 1.989 2.087 2.372 2.636  3.412
(a) Consider the population of students where both the student and roommate brought
a video game to school. Construct a 95% confidence interval for the mean population GPA among such students. Show how you compute the margin of error.
Solution:
$$2.754 \pm (2.023)\frac{0.639}{\sqrt{40}} \quad \text{or} \quad 2.75 \pm 0.20$$
The multiplier is from a t distribution with 39 degrees of freedom.
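The margin-of-error arithmetic can be reproduced with a Python sketch. The multiplier 2.023 is copied from the df = 39 row of the quantile table rather than computed, since the Python standard library has no t distribution.

```python
from math import sqrt

mean, sd, n = 2.754, 0.639, 40  # both student and roommate brought a game
t_star = 2.023                  # 0.975 quantile of t with n - 1 = 39 df (table)

se = sd / sqrt(n)               # standard error of the sample mean
moe = t_star * se               # margin of error
print(f"{mean:.3f} ± {moe:.3f}  ->  ({mean - moe:.3f}, {mean + moe:.3f})")
# 2.754 ± 0.204  ->  (2.550, 2.958)
```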
(b) Write an interpretation of the previous confidence interval in context.
Solution: We are 95% confident that the population mean GPA at this university
among students where the student and his or her roommate both brought a video
game to school is between 2.55 and 2.95.
(c) Construct a 95% confidence interval for the population proportion of students at
this university that bring a video game to school. Show how you compute the
margin of error. (Ignore roommate data for this problem.)
Solution: The data shows that 84 (40 + 44) of the 210 students brought a video
game to school, so p̂ = 84/210 = 0.4. A 95% confidence interval for the population
proportion is
$$0.4 \pm 1.96\sqrt{\frac{(0.4)(0.6)}{210}} \quad \text{or} \quad 0.400 \pm 0.066$$
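The same interval as a Python sketch. The multiplier comes from `statistics.NormalDist().inv_cdf(0.975)`, the standard normal 0.975 quantile (about 1.96); a normal rather than a t multiplier is exactly the point of Problem 1(e).

```python
from math import sqrt
from statistics import NormalDist

x, n = 84, 210                        # students who brought a game, sample size
p_hat = x / n                         # sample proportion, 0.4
z_star = NormalDist().inv_cdf(0.975)  # standard normal 0.975 quantile
se = sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error of p_hat
moe = z_star * se
print(f"{p_hat:.3f} ± {moe:.3f}  ->  ({p_hat - moe:.3f}, {p_hat + moe:.3f})")
# 0.400 ± 0.066  ->  (0.334, 0.466)
```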
(d) Write an interpretation of the previous confidence interval in context.
Solution: We are 95% confident that the proportion of students at the school that
bring a video game to school is between 0.334 and 0.466.
(e) Consider the population of students that did bring a video game to school. Test the
hypothesis that the population mean GPA of such students is the same whether
or not the roommate also brought a video game versus the alternative that the
population mean GPA is lower when the roommate also brought a video game to
school. Include: (1) a test statistic; (2) a sketch of how to compute the p-value,
specifying a shaded area, number(s) on the axis, and name of distribution; and
(3) a numerical range for the p-value based on quantiles above.
Solution: Let µ1 be the population mean GPA of students where the student and roommate both brought a video game to school, and let µ2 be the population mean GPA of students who brought a video game to school but whose roommate did not. The hypotheses are H0 : µ1 = µ2 versus HA : µ1 < µ2. The test statistic is
$$t = \frac{2.754 - 3.039}{\sqrt{\frac{(0.639)^2}{40} + \frac{(0.689)^2}{44}}} \approx -1.97$$
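The p-value is the area to the left of −1.97 under a t distribution with 39 degrees of freedom (using the safe and simple book method: the smaller sample size minus one). You should draw a sketch for this. By symmetry, this area equals the area to the right of 1.97. Based on the quantiles above,
0.025 < p-value < 0.03
since 1.937 < 1.97 < 2.023 (the 0.97 and 0.975 quantiles for 39 degrees of freedom).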
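A Python sketch of the test statistic and of the table lookup that brackets the p-value. The quantiles are copied from the df = 39 row of the table; the lookup assumes |t| falls strictly inside the tabulated range, which holds here.

```python
from math import sqrt

# Group 1: student and roommate both brought a game; group 2: student only.
m1, s1, n1 = 2.754, 0.639, 40
m2, s2, n2 = 3.039, 0.689, 44

se = sqrt(s1**2 / n1 + s2**2 / n2)  # unpooled standard error
t = (m1 - m2) / se
df = min(n1, n2) - 1                # simple textbook rule: 39

# Cumulative probability -> quantile, t distribution with 39 df (from the table).
q39 = {0.9: 1.304, 0.95: 1.685, 0.96: 1.798, 0.97: 1.937, 0.975: 2.023,
       0.98: 2.125, 0.99: 2.426, 0.995: 2.708, 0.9995: 3.558}

# Left-tail p-value P(T <= t) equals the right-tail area beyond |t| by symmetry.
below = max(q for q, v in q39.items() if v <= abs(t))
above = min(q for q, v in q39.items() if v > abs(t))
print(f"t = {t:.2f}, df = {df}: {1 - above:.3f} < p-value < {1 - below:.3f}")
# t = -1.97, df = 39: 0.025 < p-value < 0.030
```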
(f) Interpret the results of the hypothesis test in context.
Solution: There is fairly strong evidence that the mean GPA of students who bring
video games to school is lower when their roommates also have video games than
when the roommates do not. (p < 0.03, independent sample t-test.)
4. 6 parts, 5 points each, 30 points total. Data was collected on a sample of 273
female subjects from a population of individuals who have undergone a particular
elective surgical procedure with a given result. (We exclude the 42 male patients on
whom data was also available.) We compare the number of calories consumed per day
between older women (aged 55 and older) versus younger women (aged less than 55
years). Here is a plot of the data (x values jittered to avoid overplotting).
[Figure: jittered strip plot of calories per day (vertical axis, about 1000 to 4000) for the two age groups, Younger than 55 and 55 and Older (horizontal axis: Age Groups).]
There are 187 women younger than 55 who consume an average of 1824.6 calories per
day with a standard deviation of 653.2 and there are 86 women aged 55 and older who
consume an average of 1560.5 calories per day with a standard deviation of 499.1.
Here is the output from t.test().
younger = with(d, Calories[AgeGroup == "Younger than 55"])
older = with(d, Calories[AgeGroup == "55 and Older"])
t.test(younger, older, alternative = "greater")
## 
##  Welch Two Sample t-test
## 
## data:  younger and older
## t = 3.671, df = 211.7, p-value = 0.000153
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  145.3   Inf
## sample estimates:
## mean of x mean of y 
##      1825      1560
(a) Define parameters and state null and alternative hypotheses for the test.
Solution: Let µ1 be the population mean daily calories among women younger than
55 in the population of interest and µ2 be the same for women 55 and older. The
hypotheses are H0 : µ1 = µ2 versus HA : µ1 > µ2 .
(b) Compute the point estimate and standard error for the difference in sample means.
Solution: The point estimate is 1824.6 − 1560.5 = 264.1 and the standard error is
$$SE = \sqrt{\frac{(653.2)^2}{187} + \frac{(499.1)^2}{86}} \approx 72$$
(c) The computer output provides a test statistic. Show how it was calculated.
Solution:
$$t = 3.671 \approx \frac{264.1}{72}$$
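Parts (b) and (c) as a Python sketch computed from the rounded summary statistics. The small gap from the reported t = 3.671 probably reflects `t.test()` working with the unrounded data.

```python
from math import sqrt

m1, s1, n1 = 1824.6, 653.2, 187  # younger than 55
m2, s2, n2 = 1560.5, 499.1, 86   # 55 and older

estimate = m1 - m2                  # point estimate of mu1 - mu2
se = sqrt(s1**2 / n1 + s2**2 / n2)  # unpooled (Welch) standard error
t = estimate / se
print(f"estimate = {estimate:.1f}, SE = {se:.1f}, t = {t:.2f}")
# estimate = 264.1, SE = 72.0, t = 3.67
```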
(d) How many degrees of freedom would we have used had we used the simpler method
from the textbook?
Solution: The simple rule uses the smaller sample size minus one: 86 − 1 = 85.
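For comparison, a Python sketch of the Welch-Satterthwaite approximation, the formula behind the non-integer df that `t.test()` reports, alongside the simple rule. With the rounded standard deviations it gives about 212, close to the 211.7 in the output.

```python
s1, n1 = 653.2, 187  # younger than 55
s2, n2 = 499.1, 86   # 55 and older

v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors of each mean
welch_df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
simple_df = min(n1, n2) - 1      # textbook rule
print(f"Welch df = {welch_df:.1f}, simple rule df = {simple_df}")
```

The Welch value always lies between min(n1 − 1, n2 − 1) and n1 + n2 − 2, so the simple rule never uses more degrees of freedom than Welch's method and is the more conservative choice.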
(e) List at least two assumptions that underlie the statistical method used for this
test. Are you concerned about the validity of the test? Briefly explain.
Solution: Multiple answers are acceptable; mine are normally distributed populations and random samples. The plot of the data shows no extreme outliers or strong skewness, so I am not concerned about the normality assumption. However, few details are given about the sampling method, so I have questions about the validity of the inference due to potential bias in the sampling process.
(f) Summarize the results of the hypothesis test in context.
Solution: There is very strong evidence that women in this population who are younger
than 55 years of age consume more calories per day than do women who are 55 years
old and older (p < 0.0002, independent sample t test).