Stat 2470, Practice Exam #3, Spring 2014
1. A study comparing different types of batteries showed that the average lifetimes of Duracell
Alkaline AA batteries and Eveready Energizer Alkaline AA batteries were given as 4.5 hours and
4.2 hours, respectively. Suppose these are the population average lifetimes.
a. Let $\bar{X}$ be the sample average lifetime of 150 Duracell batteries and $\bar{Y}$ be the sample average lifetime of 150 Eveready batteries. What is the mean value of $\bar{X} - \bar{Y}$ (i.e., where is the distribution of $\bar{X} - \bar{Y}$ centered)? How does your answer depend on the specified sample sizes?
b. Suppose the population standard deviations of lifetime are 1.8 hours for Duracell batteries and 2.0 hours for Eveready batteries. With the sample sizes given in part (a), what is the variance of the statistic $\bar{X} - \bar{Y}$, and what is its standard deviation?
c. For the sample sizes given in part (a), what is the approximate distribution curve of $\bar{X} - \bar{Y}$ (include a measurement scale on the horizontal axis)? Would the shape of the curve necessarily be the same for sample sizes of 10 batteries of each type? Explain.
2. Suppose $\mu_1$ and $\mu_2$ are true mean stopping distances at 50 mph for cars of a certain type equipped with two different types of braking systems. The following statistics are given: m = 6,
Calculate a 95% CI for the difference between true average stopping distance for cars equipped with system 1 and cars equipped with system 2. Does the interval suggest that precise information about the value of this difference is available?
3. A study includes the accompanying data on compression strength (lb) for a sample of 12-oz
aluminum cans filled with strawberry drink and another sample filled with cola. Does the data
suggest that the extra carbonation of cola results in a higher average compression strength?
Base your answer on a P-value. What assumptions are necessary for your analysis?
Beverage           Sample Size   Sample Mean   Sample St. Dev.
Strawberry drink   15            546           21
Cola               15            560           15
4. Consider the accompanying data on breaking load (kg/25 mm width) for various fabrics in both an unabraded condition (U) and an abraded condition (A). Use the paired t test at significance level .01 to test $H_0: \mu_D = 0$ versus $H_a: \mu_D > 0$.
Fabric   U      A
1        25.6   26.5
2        48.8   52.5
3        49.8   46.5
4        43.2   36.5
5        38.7   34.5
6        55.0   20.0
7        36.4   28.5
8        51.5   46.0
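The paired t statistic for the fabric table can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, the differences are taken as U minus A, and the critical value still has to come from a t table because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Breaking loads from the problem 4 table (fabrics 1 through 8).
unabraded = [25.6, 48.8, 49.8, 43.2, 38.7, 55.0, 36.4, 51.5]
abraded = [26.5, 52.5, 46.5, 36.5, 34.5, 20.0, 28.5, 46.0]

# Paired analysis works on the within-fabric differences (assumed U - A).
diffs = [u - a for u, a in zip(unabraded, abraded)]
n = len(diffs)

d_bar = mean(diffs)   # sample mean of the differences
s_d = stdev(diffs)    # sample standard deviation (n - 1 divisor)
t_stat = d_bar / (s_d / sqrt(n))

print(f"mean difference = {d_bar:.2f}")
print(f"sd of differences = {s_d:.3f}")
print(f"t = {t_stat:.2f} on {n - 1} df")
```

The statistic is then compared with the upper-tail t critical value for 7 degrees of freedom at level .01.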
5. Obtain or compute the following quantities using the table of “critical values for F distribution”
available in your text
a.
b.
c.
d.
e. The 95th percentile of the F distribution with
f. The 5th percentile of the F distribution with
g.
h.
6. In a one-way ANOVA problem involving four populations or treatments, the null hypothesis of
interest is
7. In a single-factor ANOVA problem involving five populations or treatments, which of the following statements are true about the alternative hypothesis?
a. All five population means are equal.
b. All five population means are different.
c. At least two of the population means are different.
d. At least three of the population means are different.
e. At most, two of the population means are equal.
8. In a single-factor ANOVA problem involving five populations or treatments with a random sample of four observations from each one, it is found that SSTr = 16.1408 and SSE = 37.3801. Then the value of the test statistic is
a. 1.619
b. 2.316
c. 0.432
d. 1.522
e. 4.248
9. The distribution of the test statistic in single-factor ANOVA is the
a. binomial distribution
b. normal distribution
c. t distribution
d. F distribution
e. None of the above answers are correct.
10. In a single-factor ANOVA problem involving five populations or treatments with a random sample of nine observations from each one, suppose that $H_0$ is rejected at .05 level. Which of the following values are correct for the appropriate Q critical value needed to perform Tukey's procedure?
a. 4.76
b. 3.79
c. 4.04
d. 3.85
e. 4.80
11. In a single-factor ANOVA problem involving five populations or treatments with a random sample of four observations from each one, it is found that SSTr = 16.1408 and SSE = 37.3801. Then the value of the test statistic is
a. 1.619
b. 2.316
c. 0.432
d. 1.522
e. 4.248
12. Consider the accompanying data on plant growth after the application of different types of growth
hormone.
Hormone   Observations
1         15   19    9   16
2         23   15   22   19
3         20   17   22   19
4          9   13   20   12
5          8   13   17   10
a. Perform an F test at level
b. What happens when Tukey’s procedure is applied?
13. An experiment is conducted to investigate how the behavior of mozzarella cheese varies with temperature. Consider the accompanying data on x = temperature and y = elongation (%) at failure of the cheese.
x    59    63    67    72    74    78    83
y   118   182   247   208   197   160   132
a. Construct a scatter plot in which the axes intersect at (0,0). Mark 0, 20, 40, 60, 80, and 100 on the horizontal axis and 0, 50, 100, 150, 200, and 250 on the vertical axis.
b. Construct a scatter plot in which the axes intersect at (55,100). Does this plot seem preferable to the one in part (a)? Explain your reasoning.
c. What do the plots of parts (a) and (b) suggest about the nature of the relationship between the two variables?
14. The accompanying data was read from a graph that appeared in a recent study. The independent variable x is deposition rate and the dependent variable y is steel weight loss (g/m²).
x     14    18    40    43    45    112
y    280   350   470   500   560   1200
a. Construct a scatter plot. Does the simple linear regression model appear to be reasonable in this situation?
b. Calculate the equation of the estimated regression line.
c. What percentage of observed variation in steel weight loss can be attributed to the model relationship in combination with variation in deposition rate?
d. Because the largest x value in the sample greatly exceeds the others, this observation may have been very influential in determining the equation of the estimated line. Delete this observation and recalculate the equation. Does the new equation appear to differ substantially from the original one (you might consider predicted values)?
15. A study reports the results of a regression analysis based on n = 15 observations in which x = filter application temperature (°C) and y = % efficiency of BOD removal. Calculated quantities include
a. Test at level .01 the hypothesis $H_0: \beta_1 = 1$, which states that the expected increase in % BOD removal is 1 when filter application temperature increases by 1°C, against the alternative
b. Compute a 99% CI for $\beta_1$, the expected increase in % BOD removal for a 1°C increase in filter application temperature.
16. Infestation of crops by insects has long been of great concern to farmers and agricultural scientists.
A study reports data on x = age of a cotton plant (days) and y = % damaged squares. Consider the
accompanying n = 12 observations:
x    9   12   12   15   18   18   21   21   27   30   30   33
y   11   12   23   30   29   52   41   65   60   72   84   93
a. Why is the relationship between x and y not deterministic?
b. Does a scatter plot suggest that the simple linear regression model will describe the relationship between the two variables?
c. The summary statistics are
Determine the equation of the least squares line.
d. Predict the percentage of damaged squares when the age is 20 days by giving an interval of plausible values.
17. Wear resistance of certain nuclear reactor components made of Zircaloy-2 is partly determined by properties of the oxide layer. The following data appears in a study that proposed a new nondestructive testing method to monitor thickness of the layer. The variables are x = oxide-layer thickness and y = eddy-current response (arbitrary units).
x      0      7     17    114    133    142    190    218    237    285
y   20.3   19.8   19.5   15.9   15.1   14.7   11.9   11.5    8.3    6.6
The equation of the least squares line is $\hat{y} = 20.6 - .047x$. Calculate and plot the residuals against x and then comment on the appropriateness of the simple linear regression model.
18. Suppose that the expected value of thermal conductivity y is a linear function of
where x is lamellar thickness.
x    240    410    460    490    520    590    745   8300
y   12.0   14.7   14.7   15.2   15.2   15.6   16.0   18.1
a. Estimate the parameters of the regression function and the regression function itself.
b. Predict the value of thermal conductivity when lamellar thickness is 500 angstroms.
19. Let y = sales at a fast food outlet (1000's of $), x1 = number of competing outlets within a 1-mile radius, x2 = the population within a 1-mile radius (1000's of people), and x3 be an indicator variable that equals 1 if the outlet has a drive-up window and 0 otherwise. Suppose that the true regression model is
a. What is the mean value of sales when the number of competing outlets is 2, there are 8000 people within a 1-mile radius, and the outlet has a drive-up window?
b. What is the mean value of sales for an outlet without a drive-up window that has three competing outlets and 5000 people within a 1-mile radius?
c. Interpret
20. A multiple regression model with four independent variables to study accuracy in reading liquid crystal displays was used. The variables were
y = error percentage for subjects reading a four-digit liquid crystal display
$x_1$ = level of backlight (ranging from 0 to 122 )
$x_2$ = character subtense (ranging from )
$x_3$ = viewing angle (ranging from )
$x_4$ = level of ambient light (ranging from 20 to 1500 lux)
The model fit to data was
The resulting estimated coefficients were
a. Calculate an estimate of expected error percentage when
b. Estimate the mean error percentage associated with a backlight level of 20, character subtense of .5, viewing angle of 10, and ambient light level of 30.
c. What is the estimated expected change in error percentage when the level of ambient light is increased by 1 unit while all other variables are fixed at the values given in part (a)? Answer for a 100-unit increase in ambient light level.
d. Explain why the answers in part (c) do not depend on the fixed values of $x_1$, $x_2$, and $x_3$. Under what conditions would there be such a dependence?
e. The estimated model was based on n = 30 observations, with SST = 39.2 and SSE = 20.0. Calculate and interpret the coefficient of multiple determination, and then carry out the model utility test using
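ANS:
1.
a. $E(\bar{X} - \bar{Y}) = \mu_1 - \mu_2 = 4.5 - 4.2 = 0.3$ hours, irrespective of sample sizes.
b. $V(\bar{X} - \bar{Y}) = \dfrac{(1.8)^2}{150} + \dfrac{(2.0)^2}{150} \approx .0483$, and the standard deviation of $\bar{X} - \bar{Y}$ is $\sqrt{.0483} \approx .2197$.
c. A normal curve with mean and standard deviation as given in parts "a" and "b" (because the CLT implies that both $\bar{X}$ and $\bar{Y}$ have approximately normal distributions, so $\bar{X} - \bar{Y}$ does also). The shape is not necessarily that of a normal curve when m = n = 10, because the CLT cannot be invoked. So if the two lifetime population distributions are not normal, the distribution of $\bar{X} - \bar{Y}$ will typically be quite complicated.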
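The arithmetic behind parts (a) and (b) can be sketched in Python. This is only an illustration under the values stated in problem 1 (population means 4.5 and 4.2 hours, standard deviations 1.8 and 2.0 hours, 150 batteries per sample); the variable names are not from the exam.

```python
from math import sqrt

# Population values stated in problem 1 (hours).
mu_duracell, mu_eveready = 4.5, 4.2
sd_duracell, sd_eveready = 1.8, 2.0
m = n = 150  # sample sizes for the two brands

# The mean of a difference of sample means is the difference of population means.
mean_diff = mu_duracell - mu_eveready

# For independent samples the variances of the two sample means add.
var_diff = sd_duracell**2 / m + sd_eveready**2 / n
sd_diff = sqrt(var_diff)

print(f"E(Xbar - Ybar)  = {mean_diff:.3f}")
print(f"V(Xbar - Ybar)  = {var_diff:.4f}")
print(f"SD(Xbar - Ybar) = {sd_diff:.4f}")
```

Changing `m` and `n` alters the variance and standard deviation but leaves the mean untouched, which is the point of part (a).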
2. We want a 95% confidence interval for $\mu_1 - \mu_2$,
so the 95% interval is
Because the interval is so wide, it does not appear that precise information is available.
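3. Let $\mu_1$ = the true average compression strength for strawberry drink and let $\mu_2$ = the true average compression strength for cola. A lower tailed test is appropriate. We test $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 < 0$.
The test statistic is $t = \dfrac{546 - 560}{\sqrt{21^2/15 + 15^2/15}} = \dfrac{-14}{\sqrt{44.4}} \approx -2.10$.
We use degrees of freedom $\nu = \dfrac{(44.4)^2}{(29.4)^2/14 + (15)^2/14} \approx 25.3$, so use 25.
The P-value is approximately .023. This P-value indicates strong support for the alternative hypothesis. The data do suggest that the extra carbonation of cola results in a higher compression strength.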
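A short Python sketch of the two-sample t statistic and its approximate degrees of freedom, computed from the summary statistics in problem 3. The helper name `welch_from_summary` is hypothetical, and the P-value itself is left to a t table because the standard library provides no t distribution.

```python
from math import floor, sqrt

def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic and truncated approximate df from summary data."""
    se1_sq = sd1**2 / n1   # squared standard error, sample 1
    se2_sq = sd2**2 / n2   # squared standard error, sample 2
    t = (mean1 - mean2) / sqrt(se1_sq + se2_sq)
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1)
    )
    return t, floor(df)    # df is rounded down to an integer

# Strawberry drink first, cola second, as in the problem 3 table.
t_stat, df = welch_from_summary(546, 21, 15, 560, 15, 15)
print(f"t = {t_stat:.2f}, df = {df}")
```

A negative statistic here points in the direction of the alternative, since the cola mean is the one subtracted.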
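4. Parameter of interest: $\mu_D$ = true average difference of breaking load for fabric in unabraded or abraded condition. We test $H_0: \mu_D = 0$ versus $H_a: \mu_D > 0$.
The value of the test statistic is: $t = \dfrac{\bar{d}}{s_D/\sqrt{n}} = \dfrac{7.25}{11.8628/\sqrt{8}} \approx 1.73$.
The rejection region is: $t \ge t_{.01,7} = 2.998$.
Since t is not $\ge$ 2.998, we fail to reject $H_0$. The data do not indicate a difference in breaking load for the two fabric load conditions.
5.
a.
b.
c.
d.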
e.
f.
g.
h.
Since
=.95-.01=.94.
6. $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
7. C
8. A
9. D
10. C
11. A
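The choice in problems 8 and 11 follows from dividing each sum of squares by its degrees of freedom and taking the ratio. A minimal Python sketch, with a hypothetical helper name `anova_f`:

```python
def anova_f(sstr, sse, num_groups, total_obs):
    """F ratio for single-factor ANOVA from the two sums of squares."""
    mstr = sstr / (num_groups - 1)        # mean square for treatments
    mse = sse / (total_obs - num_groups)  # mean square for error
    return mstr / mse

# Five treatments with four observations each gives 20 observations in total.
f_stat = anova_f(sstr=16.1408, sse=37.3801, num_groups=5, total_obs=20)
print(f"f = {f_stat:.3f}")
```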
12. a. Let $\mu_i$ = true average growth when hormone #i is applied. $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$ will be rejected in favor of $H_a$: at least two of the $\mu_i$ differ, if f is at least the F critical value with 4 and 15 df at the chosen level.
Source       df   SS      MS        f
Treatments    4   200.3   50.075    3.49
Error        15   215.5   14.3667
Total        19   415.8
Because f = 3.49 exceeds that critical value, $H_0$ is rejected. There appears to be a difference in the average growth with the application of the different growth hormones.
b. The sample means are, in increasing order, 12.00, 13.50, 14.75, 19.50, and 19.75. The most extreme difference is 19.75 - 12.00 = 7.75, which doesn't exceed 8.28, so no differences are judged significant. Tukey's method and the F test are at odds.
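The ANOVA table for problem 12 can be rebuilt from the raw data with a short Python sketch. The grouping below is an assumption about how the flattened data table reads: the k-th value in each run of five is assigned to hormone k, which reproduces the sample means listed in part (b). The threshold 8.28 is taken from the answer above, not computed here.

```python
# Growth observations by hormone (assumed grouping, see lead-in).
growth = {
    1: [15, 19, 9, 16],
    2: [23, 15, 22, 19],
    3: [20, 17, 22, 19],
    4: [9, 13, 20, 12],
    5: [8, 13, 17, 10],
}

all_obs = [y for obs in growth.values() for y in obs]
grand_mean = sum(all_obs) / len(all_obs)

# Treatment sum of squares: group size times squared deviation of group mean.
sstr = sum(
    len(obs) * (sum(obs) / len(obs) - grand_mean) ** 2
    for obs in growth.values()
)
sst = sum((y - grand_mean) ** 2 for y in all_obs)  # total sum of squares
sse = sst - sstr                                   # error sum of squares

df_treatments = len(growth) - 1
df_error = len(all_obs) - len(growth)
f_stat = (sstr / df_treatments) / (sse / df_error)

# Largest gap between ordered sample means, compared with the Tukey width.
means = sorted(sum(obs) / len(obs) for obs in growth.values())
largest_gap = means[-1] - means[0]

print(f"SSTr = {sstr:.1f}, SSE = {sse:.1f}, SST = {sst:.1f}")
print(f"f = {f_stat:.2f} on ({df_treatments}, {df_error}) df")
print(f"largest difference of means = {largest_gap:.2f} (compare with 8.28)")
```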
13. c. A parabola appears to provide a good fit to both graphs.
14. a. According to the scatter plot of the data, a simple linear regression model does appear to be plausible.
b. The regression equation is y = 138 + 9.31x.
c. The desired value is the coefficient of determination,
d. The new equation is y* = 190 + 7.55x*. This new equation appears to differ substantially. If we were to predict a value of y for x = 50, the value would be 567.9, whereas using the original data, the predicted value for x = 50 would be 603.5.
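The two fitted lines in problem 14 and their predictions at x = 50 can be reproduced with a small least squares routine. This is a sketch in plain Python; the function name `least_squares` is illustrative, and the data are the six (x, y) pairs from the problem.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

x = [14, 18, 40, 43, 45, 112]
y = [280, 350, 470, 500, 560, 1200]

# Fit with all six points, then again without the point at x = 112.
b0_all, b1_all = least_squares(x, y)
b0_cut, b1_cut = least_squares(x[:-1], y[:-1])

pred_all = b0_all + b1_all * 50
pred_cut = b0_cut + b1_cut * 50

print(f"all data:     y = {b0_all:.0f} + {b1_all:.2f}x, prediction at 50: {pred_all:.1f}")
print(f"without 112:  y = {b0_cut:.0f} + {b1_cut:.2f}x, prediction at 50: {pred_cut:.1f}")
```

Comparing the two predictions at a common x is one way to judge how much the single large-x observation moves the line.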
15.
a. We reject
is rejected in favor of
b.
= (1.08,2.32)
16. a.
b.
Based on a scatterplot of the data, a simple linear regression model does seem a reasonable way
to describe the relationship between the two variables.
c.
d.
17.
The (x, residual) pairs for the plot are (0, -.335), (7, -.508), (17, -.341), (114, .592), (133, .679), (142, .700),
(190, .142), (218, 1.051), (237, -1.262), and (285, -.719). The plot shows substantial evidence of
curvature.
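The residuals listed for problem 17 come from the unrounded least squares coefficients rather than the rounded line 20.6 - .047x. A minimal Python sketch that refits the line and prints the (x, residual) pairs; the function name is illustrative.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

thickness = [0, 7, 17, 114, 133, 142, 190, 218, 237, 285]
response = [20.3, 19.8, 19.5, 15.9, 15.1, 14.7, 11.9, 11.5, 8.3, 6.6]

b0, b1 = least_squares(thickness, response)

# Residual = observed y minus fitted y at the same x.
residuals = [y - (b0 + b1 * x) for x, y in zip(thickness, response)]

for x, e in zip(thickness, residuals):
    print(f"({x}, {e:+.3f})")
```

Plotting these pairs shows negative residuals at both ends of the x range and mostly positive ones in the middle, which is the curved pattern the answer refers to.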
18. a.
The suggested model is
The summary quantities are
and the estimated regression function is
b.
19. a. For x1 = 2, x2 = 8 (remember the units of x2 are in 1000's) and x3 = 1 (since the outlet has a drive-up window) the average sales are
b. For x1 = 3, x2 = 5, and x3 = 0 the average sales are
c. When the number of competing outlets and the population within a 1-mile radius remain fixed, the sales will increase by $15,400 when an outlet has a drive-up window.
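20.
b.
c.
d. There are no interaction predictors in the model. There would be dependence if interaction predictors involving the ambient light level had been included.
e. $R^2 = 1 - \dfrac{SSE}{SST} = 1 - \dfrac{20.0}{39.2} \approx .490$. For testing $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ against $H_a$: at least one among $\beta_1, \ldots, \beta_4$ is not zero, the test statistic is $F = \dfrac{R^2/4}{(1 - R^2)/25} \approx 6.0$. $H_0$ will be rejected if f is at least the F critical value with 4 and 25 df at the chosen level. Because the computed f exceeds that critical value, $H_0$ is rejected and the model is judged useful (this even though the value of $R^2$ is not all that impressive).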
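The coefficient of multiple determination and the model utility F statistic in part (e) depend only on SST, SSE, the number of observations, and the number of predictors. A minimal Python sketch using the values from problem 20 (SST = 39.2, SSE = 20.0, n = 30, four predictors); the helper name `model_utility` is hypothetical.

```python
def model_utility(sst, sse, n, k):
    """Return (R^2, F) for a regression with k predictors and n observations."""
    r2 = 1 - sse / sst
    f = (r2 / k) / ((1 - r2) / (n - k - 1))
    return r2, f

r2, f_stat = model_utility(sst=39.2, sse=20.0, n=30, k=4)
print(f"R^2 = {r2:.3f}")
print(f"f = {f_stat:.1f} on (4, 25) df")
```

The statistic is then compared with the F critical value for 4 and 25 degrees of freedom at whatever level the test specifies.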