Princeton University
Department of Operations Research
and Financial Engineering
ORF 245 – Fundamentals of Engineering Statistics
Final Exam
May 22, 2008
7:30pm-10:30pm
PLEASE DO NOT TURN THIS PAGE AND START THE EXAM
UNTIL YOU ARE TOLD TO DO SO.
Instructions: This exam is open book and open notes. Calculators are
allowed, but not computers or the use of statistical software packages. Write
all your work in the space provided after each question. There are questions
on both sides of each page. Explain as thoroughly and as clearly as possible
all your steps in answering each question. Full or partial credit can only be
granted if intermediate steps are clearly indicated.
Name: _______________________________________________________
Pledge: I pledge my honor that I have not violated the honor code during
this examination.
Signature: ____________________________________________________
 1: (12) _______     6: (15) _______    11: (12) _______
 2: (06) _______     7: (20) _______    12: (10) _______
 3: (10) _______     8: (10) _______    13: (10) _______
 4: (05) _______     9: (20) _______    14: (12) _______
 5: (05) _______    10: (08) _______    15: (20) _______
Total: (175) ___________
Descriptive Statistics:
1) Let $\bar{x}_n$ and $s_n^2$ denote the sample mean and variance for the sample $x_1, \ldots, x_n$,
and let $\bar{x}_{n+1}$ and $s_{n+1}^2$ denote these quantities when an additional observation
$x_{n+1}$ is added to the sample.
a) (4 pts.) Show how $\bar{x}_{n+1}$ can be computed from $\bar{x}_n$ and $x_{n+1}$.
$$\bar{x}_{n+1} = \frac{1}{n+1}\sum_{i=1}^{n+1} x_i = \frac{1}{n+1}\left(\sum_{i=1}^{n} x_i + x_{n+1}\right) = \frac{1}{n+1}\left(n\bar{x}_n + x_{n+1}\right)$$
b) (8 pts.) Show that
$$n s_{n+1}^2 = (n-1) s_n^2 + \frac{n}{n+1}\left(x_{n+1} - \bar{x}_n\right)^2$$
so that $s_{n+1}^2$ can be computed from $x_{n+1}$, $\bar{x}_n$, and $s_n^2$.
Item dropped – do not grade.
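Although part (b) was not graded, both update formulas in this problem can be checked numerically. The Python sketch below is not part of the original solution key: the sample values and the helper name `update_mean_var` are made up for illustration. It compares the updated mean and variance with a direct recomputation on the enlarged sample.

```python
from statistics import mean, variance

def update_mean_var(n, mean_n, var_n, x_new):
    """Return (mean, variance) of n + 1 observations from the n-observation summary."""
    mean_next = (n * mean_n + x_new) / (n + 1)
    # n * s_{n+1}^2 = (n - 1) * s_n^2 + n / (n + 1) * (x_{n+1} - mean_n)^2
    var_next = ((n - 1) * var_n + n / (n + 1) * (x_new - mean_n) ** 2) / n
    return mean_next, var_next

data = [2.0, 4.0, 4.0, 5.0, 7.0]   # hypothetical sample, not from the exam
x_new = 9.0                        # hypothetical additional observation

updated = update_mean_var(len(data), mean(data), variance(data), x_new)
direct = (mean(data + [x_new]), variance(data + [x_new]))
print(updated, direct)
```

The two tuples should agree up to floating-point rounding, which is what the identity in part (b) asserts.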
2) Consider the following histogram that shows the time in months that articles
submitted to a certain scientific journal in 2002 took to be reviewed for publication.
a) (3 pts.) Which class interval contains the median review time?
Reading the approximate areas under the histogram (probability/cumulative probability):
0-1: 0.27/0.27; 1-2: 0.1/0.37; 2-3: 0.105/0.475;
3-4: 0.11/0.585; 4-5: 0.07/0.655; 5-6: 0.08/0.735;
6-7: 0.075/0.810; 7-8: 0.125/0.935; 8-9: 0.065/1.0
The median review time falls in the 3 to 4 month category.
b) (3 pts.) Which class interval contains the third quartile of the review times?
The third quartile of the review times falls in the 6 to 7 month category.
(An answer of 5 to 6 months is also acceptable, given the uncertainty in the reading of
areas in the histogram; but 7 to 8 months is not acceptable.)
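The class lookup used in this solution can be written as a short Python sketch. It is an illustration only: the areas are the approximate readings listed in the solution, and the helper name `class_containing` is hypothetical.

```python
# Approximate class areas read off the histogram in the solution above.
classes = ["0-1", "1-2", "2-3", "3-4", "4-5", "5-6", "6-7", "7-8", "8-9"]
areas = [0.27, 0.10, 0.105, 0.11, 0.07, 0.08, 0.075, 0.125, 0.065]

def class_containing(q):
    """Return the first class whose cumulative area reaches the quantile level q."""
    cumulative = 0.0
    for label, area in zip(classes, areas):
        cumulative += area
        if cumulative >= q:
            return label
    return classes[-1]

median_class = class_containing(0.50)
third_quartile_class = class_containing(0.75)
print(median_class, third_quartile_class)
```

Because the cumulative area after the 5-6 class (0.735) is close to 0.75, a small change in the readings moves the third quartile into that class, which is why the key accepts both answers.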
Probability:
3) Items are inspected for flaws by two quality inspectors. If a flaw is present, it will be
detected by the first inspector with probability 0.9, and by the second inspector with
probability 0.7. Assume that the inspectors function independently.
a) (4 pts.) If an item has a flaw, what is the probability that it will be found by at
least one of the inspectors?
Let $I_i$ be the event that inspector $i$ finds a flaw, $i \in \{1, 2\}$.
$\Pr(I_1 \mid \text{flaw}) = 0.9$, $\Pr(I_1^c \mid \text{flaw}) = 0.1$, $\Pr(I_2 \mid \text{flaw}) = 0.7$, $\Pr(I_2^c \mid \text{flaw}) = 0.3$
$$\Pr(\text{flaw found by at least one inspector}) = \Pr(I_1 \cup I_2 \mid \text{flaw})$$
$$= \Pr(I_1 \mid \text{flaw}) + \Pr(I_2 \mid \text{flaw}) - \Pr(I_1 \cap I_2 \mid \text{flaw}) = 0.9 + 0.7 - 0.9 \times 0.7 = 0.97$$
b) (6 pts.) Assume that both inspectors inspect every item and that if an item has no
flaw, then neither inspector will detect a flaw. Assume also that the probability
that an item has a flaw is 0.10. If an item is passed by both inspectors, what is the
probability that it actually has a flaw?
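$\Pr(I_1 \mid \text{no flaw}) = 0$, $\Pr(I_1^c \mid \text{no flaw}) = 1$, $\Pr(I_2 \mid \text{no flaw}) = 0$, $\Pr(I_2^c \mid \text{no flaw}) = 1$
$$\Pr(\text{an item passed by both inspectors is actually flawed}) = \Pr(\text{flaw} \mid I_1^c \cap I_2^c) = \frac{\Pr(I_1^c \cap I_2^c \cap \text{flaw})}{\Pr(I_1^c \cap I_2^c)}$$
By the law of total probability and the independence of the inspectors:
$$\Pr(I_1^c \cap I_2^c) = \Pr(I_1^c \cap I_2^c \mid \text{flaw}) \times \Pr(\text{flaw}) + \Pr(I_1^c \cap I_2^c \mid \text{no flaw}) \times \Pr(\text{no flaw})$$
$$= \Pr(I_1^c \mid \text{flaw}) \times \Pr(I_2^c \mid \text{flaw}) \times \Pr(\text{flaw}) + \Pr(I_1^c \mid \text{no flaw}) \times \Pr(I_2^c \mid \text{no flaw}) \times \Pr(\text{no flaw})$$
$$= 0.1 \times 0.3 \times 0.1 + 1 \times 1 \times 0.9 = 0.903$$
$$\Pr(I_1^c \cap I_2^c \cap \text{flaw}) = \Pr(I_1^c \mid \text{flaw}) \times \Pr(I_2^c \mid \text{flaw}) \times \Pr(\text{flaw}) = 0.1 \times 0.3 \times 0.1 = 0.003$$
Finally:
$$\Pr(\text{flaw} \mid I_1^c \cap I_2^c) = \frac{0.003}{0.903} = 0.003322$$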
Alternative solution: diagram tree combined with conditional probability.
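As a numerical cross-check of both parts, the following Python sketch (not part of the original key; variable names are illustrative) computes part (a) through the complement, which is equivalent to the inclusion-exclusion formula used above, and part (b) through Bayes' rule.

```python
p_flaw = 0.10
p_detect = [0.9, 0.7]   # detection probabilities of inspectors 1 and 2, given a flaw

# Probability that both inspectors miss a flawed item (independence assumed).
p_miss_both = 1.0
for p in p_detect:
    p_miss_both *= 1.0 - p

p_at_least_one = 1.0 - p_miss_both                       # part (a)

# Part (b): an unflawed item is always passed, so Pr(pass | no flaw) = 1.
p_pass = p_miss_both * p_flaw + 1.0 * (1.0 - p_flaw)     # law of total probability
p_flaw_given_pass = p_miss_both * p_flaw / p_pass        # Bayes' rule
print(p_at_least_one, p_flaw_given_pass)
```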
4) (5 pts.) An urn contains 3 red balls and 7 black balls. Players A and B withdraw balls
from the urn consecutively until a red ball is selected. Namely, A draws the first ball,
then B draws the second one, then A again, and so on, until the first one of them
draws a red ball. If there is no replacement of the drawn balls, find the probability that
A selects the red ball.
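Pr(A selects red ball) = Pr(red on 1st draw) + Pr(first red on 3rd draw) +
Pr(first red on 5th draw) + Pr(first red on 7th draw)
$$= \frac{3}{10} + \frac{7}{10} \times \frac{6}{9} \times \frac{3}{8} + \frac{7}{10} \times \frac{6}{9} \times \frac{5}{8} \times \frac{4}{7} \times \frac{3}{6} + \frac{7}{10} \times \frac{6}{9} \times \frac{5}{8} \times \frac{4}{7} \times \frac{3}{6} \times \frac{2}{5} \times \frac{3}{4}$$
$$= \frac{3}{10} + \frac{7}{40} + \frac{1}{12} + \frac{1}{40} = \frac{7}{12} = 0.5833 \text{ or } 58.33\%$$
Alternative solution:
$$= \frac{3}{10} + \frac{\binom{7}{2}}{\binom{10}{2}} \times \frac{3}{8} + \frac{\binom{7}{4}}{\binom{10}{4}} \times \frac{3}{6} + \frac{\binom{7}{6}}{\binom{10}{6}} \times \frac{3}{4} = 0.5833$$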
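The sum over A's turns can be reproduced with exact rational arithmetic. The Python sketch below is an illustration, not part of the original key; the helper name `prob_first_red_on_draw` is hypothetical.

```python
from fractions import Fraction

def prob_first_red_on_draw(k, red=3, black=7):
    """Probability that the first red ball appears on draw k (no replacement)."""
    total = red + black
    prob = Fraction(1)
    for j in range(k - 1):                          # the first k - 1 draws are black
        prob *= Fraction(black - j, total - j)
    return prob * Fraction(red, total - (k - 1))    # draw k is red

# Player A draws on the odd-numbered turns; with 7 black balls the first red
# ball appears no later than draw 8, so A's last possible turn is draw 7.
p_a = sum(prob_first_red_on_draw(k) for k in range(1, 9, 2))
print(p_a, float(p_a))
```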
Random Variables:
5) (5 pts.) Two types of coins are produced at a factory: a fair coin and a biased one
that comes up heads 55 percent of the time. We have a coin from this factory but do
not know whether it is a fair coin or a biased one. In order to ascertain which type of
coin we have, we will perform the following statistical test: we will toss the coin 1000
times. If the coin lands on heads 525 or more times, then we will conclude that it is a
biased coin, whereas, if it lands heads less than 525 times, then we will conclude that
it is the fair coin. If the coin is actually fair, what is the probability that we will reach
a false conclusion? [Hint: use the Normal approximation with continuity correction.]
Let $X$ be the number of heads in 1000 tosses of a fair coin.
Then $X \sim \text{Bin}(1000, 0.5) \Rightarrow X \approx N(500, 250)$
$$\Pr(\text{test yields false conclusion}) = \Pr(X \ge 525) = \Pr\left(Z \ge \frac{525 - 0.5 - 500}{\sqrt{250}}\right) = 1 - \Phi(1.5495) = 1 - 0.9394 = 0.0606 \text{ or } 6.06\%$$
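The normal approximation with continuity correction can be evaluated without tables. The Python sketch below is an illustration outside the original key; it expresses $\Phi$ through the error function from the standard `math` module.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p, cutoff = 1000, 0.5, 525
mu = n * p
sigma = sqrt(n * p * (1 - p))

# Continuity correction: the event X >= 525 is approximated by X >= 524.5.
z = (cutoff - 0.5 - mu) / sigma
p_false_conclusion = 1.0 - std_normal_cdf(z)
print(z, p_false_conclusion)
```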
6) (15 pts.) A bus travels between two cities A and B, which are 100 miles apart. If the
bus has a breakdown, the distance from the breakdown to city A has a uniform
distribution over (0, 100). There is a bus service station in city A, in B, and in the
center of the route between A and B. It is suggested that it would be more efficient to
have the three stations located 25, 50, and 75 miles, respectively, from A. Do you
agree? Why? [Hint: compare the expected distance that the bus would have to be
towed, from the breakdown point to the nearest service station.]
Let $X$ be the distance from A to where the bus breaks down: $X \sim \text{Unif}(0, 100)$
Let $Y$ be the distance from the breakdown point to the nearest service station in case 1
(stations at 0, 50, and 100 miles from A). Then
$$Y = \begin{cases} X & \text{if } 0 \le X \le 25 \\ 50 - X & \text{if } 25 < X \le 50 \\ X - 50 & \text{if } 50 < X \le 75 \\ 100 - X & \text{if } 75 < X \le 100 \end{cases}$$
is uniformly distributed in each of these intervals.
$$EY = E[X \mid 0 \le X \le 25] \times \Pr(0 \le X \le 25) + E[50 - X \mid 25 < X \le 50] \times \Pr(25 < X \le 50)$$
$$+ E[X - 50 \mid 50 < X \le 75] \times \Pr(50 < X \le 75) + E[100 - X \mid 75 < X \le 100] \times \Pr(75 < X \le 100)$$
$$= 12.5 \times 0.25 + (50 - 37.5) \times 0.25 + (62.5 - 50) \times 0.25 + (100 - 87.5) \times 0.25 \Rightarrow EY = 12.5$$
Now let $Z$ be the distance from the breakdown point to the nearest service station in case 2
(stations at 25, 50, and 75 miles from A). Then
$$Z = \begin{cases} 25 - X & \text{if } 0 \le X \le 25 \\ X - 25 & \text{if } 25 < X \le 37.5 \\ 50 - X & \text{if } 37.5 < X \le 50 \\ X - 50 & \text{if } 50 < X \le 62.5 \\ 75 - X & \text{if } 62.5 < X \le 75 \\ X - 75 & \text{if } 75 < X \le 100 \end{cases}$$
is uniformly distributed in each of these intervals.
$$EZ = E[25 - X \mid 0 \le X \le 25] \times \Pr(0 \le X \le 25) + E[X - 25 \mid 25 < X \le 37.5] \times \Pr(25 < X \le 37.5)$$
$$+ E[50 - X \mid 37.5 < X \le 50] \times \Pr(37.5 < X \le 50) + E[X - 50 \mid 50 < X \le 62.5] \times \Pr(50 < X \le 62.5)$$
$$+ E[75 - X \mid 62.5 < X \le 75] \times \Pr(62.5 < X \le 75) + E[X - 75 \mid 75 < X \le 100] \times \Pr(75 < X \le 100)$$
$$= (25 - 12.5) \times 0.25 + (31.25 - 25) \times 0.125 + (50 - 43.75) \times 0.125 + (56.25 - 50) \times 0.125$$
$$+ (75 - 68.75) \times 0.125 + (87.5 - 75) \times 0.25 \Rightarrow EZ = 9.375$$
As $EZ < EY$, having service stations at 25, 50 and 75 miles IS more efficient.
Alternate solutions: computing the expected values as integrals rather than
conditional expectations; or graphing the distances and computing the areas under the
graphs (but, in this case, the areas have to be proportional to the values above).
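The integral version of the solution can also be approximated numerically. The Python sketch below is an illustration, not part of the original key: `expected_tow_distance` is a hypothetical helper that applies a midpoint rule to the distance from a uniform breakdown point to the nearest station.

```python
def expected_tow_distance(stations, length=100.0, steps=100_000):
    """Midpoint-rule estimate of E[distance to nearest station] for X ~ Unif(0, length)."""
    width = length / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * width                      # midpoint of the k-th subinterval
        total += min(abs(x - s) for s in stations)
    return total / steps                           # average over equally likely cells

e_current = expected_tow_distance([0.0, 50.0, 100.0])    # stations in A, the centre, and B
e_proposed = expected_tow_distance([25.0, 50.0, 75.0])   # suggested locations
print(e_current, e_proposed)
```

The two estimates should be close to 12.5 and 9.375 miles, matching $EY$ and $EZ$ above.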
Joint Probability Distributions:
7) Choose a number X at random from the set of numbers {1,2,3,4,5} . Now choose a
number at random from the subset no larger than X , that is, from {1,..., X } . Call this
second number Y .
a) (10 pts.) Find the joint probability mass function of X and Y .
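Joint p.m.f. $p_{X,Y}(x, y)$, with marginals in the last row and column:

            X = 1    X = 2    X = 3    X = 4    X = 5    p_Y(y)
Y = 1       1/5      1/10     1/15     1/20     1/25     137/300
Y = 2       0        1/10     1/15     1/20     1/25     77/300
Y = 3       0        0        1/15     1/20     1/25     47/300
Y = 4       0        0        0        1/20     1/25     9/100
Y = 5       0        0        0        0        1/25     1/25
p_X(x)      1/5      1/5      1/5      1/5      1/5      1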
b) (7 pts.) Find the expected value and the variance of Y .
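$$EY = 1 \times \frac{137}{300} + 2 \times \frac{77}{300} + 3 \times \frac{47}{300} + 4 \times \frac{9}{100} + 5 \times \frac{1}{25} \Rightarrow EY = 2$$
$$Var(Y) = (1 - 2)^2 \times \frac{137}{300} + (2 - 2)^2 \times \frac{77}{300} + (3 - 2)^2 \times \frac{47}{300} + (4 - 2)^2 \times \frac{9}{100} + (5 - 2)^2 \times \frac{1}{25} = \frac{400}{300} \Rightarrow Var(Y) = 1.333$$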
c) (3 pts.) Are X and Y independent? Explain.
Note that
$$p_{X,Y}(5, 5) = \frac{1}{25} \ne p_X(5) \times p_Y(5) = \frac{1}{5} \times \frac{1}{25} = \frac{1}{125}$$
Since there is at least one pair of values ( x, y ) for which
p X ,Y ( x, y ) ≠ p X ( x) × pY ( y ), then X and Y are NOT independent.
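All three parts of this problem can be verified with exact fractions. The Python sketch below is an illustration, not part of the original key; it builds the joint p.m.f. from the sampling description and derives the marginals, the moments of $Y$, and the independence check.

```python
from fractions import Fraction

values = range(1, 6)

# X is uniform on {1,...,5}; given X = x, Y is uniform on {1,...,x}.
joint = {(x, y): Fraction(1, 5) * Fraction(1, x)
         for x in values for y in range(1, x + 1)}

p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in values}
p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in values}

mean_y = sum(y * p for y, p in p_y.items())
var_y = sum((y - mean_y) ** 2 * p for y, p in p_y.items())

# Independence requires p(x, y) = p_X(x) * p_Y(y) for every pair (x, y).
independent = all(joint.get((x, y), Fraction(0)) == p_x[x] * p_y[y]
                  for x in values for y in values)
print(p_y, mean_y, var_y, independent)
```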
Statistical Estimation:
8) (10 pts.) Maximum likelihood estimates possess the property of functional
invariance, which means that if $\hat{\theta}$ is the MLE of $\theta$, and $h(\theta)$ is any function of $\theta$,
then $h(\hat{\theta})$ is the MLE of $h(\theta)$. Given a random sample $X_1, \ldots, X_n$ from a geometric
distribution with parameter $p$, find the MLE of the odds ratio $p/(1 - p)$.
Let $X_1, X_2, \ldots, X_n$ be a random sample from a $\text{Geom}(p)$ distribution.
Then: $p_X(x) = (1 - p)^x p$, for $x = 0, 1, 2, \ldots$
The joint p.m.f. of $X_1, \ldots, X_n$ is given by:
$$p_{X_1, \ldots, X_n}(x_1, \ldots, x_n; p) = (1 - p)^{x_1} p \times (1 - p)^{x_2} p \times \cdots \times (1 - p)^{x_n} p = (1 - p)^{\sum_i x_i} p^n$$
The log-likelihood function is thus:
$$\ln[p_{X_1, \ldots, X_n}(x_1, \ldots, x_n; p)] = \left(\sum_i x_i\right) \ln(1 - p) + n \ln p$$
The MLE for the parameter $p$ is obtained by differentiating the
log-likelihood function with respect to $p$ and setting the derivative to zero:
$$\frac{d \ln[p_{X_1, \ldots, X_n}(x_1, \ldots, x_n; p)]}{dp} = 0 \Rightarrow -\frac{\sum_i x_i}{1 - \hat{p}} + \frac{n}{\hat{p}} = 0 \Rightarrow \frac{\hat{p}}{1 - \hat{p}} = \frac{n}{\sum_i x_i} \quad \text{or} \quad \frac{\hat{p}}{1 - \hat{p}} = \frac{1}{\bar{x}}$$
By functional invariance, $1/\bar{x}$ is the MLE of the odds ratio $p/(1 - p)$.
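The closed-form result can be compared with a brute-force maximisation of the log-likelihood. The Python sketch below is an illustration, not part of the original key: the sample `xs` is hypothetical, and the grid search is only a crude independent check of the maximiser.

```python
from math import log

def log_likelihood(p, xs):
    """Log-likelihood of Geom(p) with pmf (1 - p)^x * p for x = 0, 1, 2, ..."""
    return sum(xs) * log(1.0 - p) + len(xs) * log(p)

xs = [0, 3, 1, 4, 2, 0, 5, 1]            # hypothetical sample, not from the exam
x_bar = sum(xs) / len(xs)

p_hat = len(xs) / (len(xs) + sum(xs))    # root of -sum(x)/(1 - p) + n/p = 0
odds_hat = p_hat / (1.0 - p_hat)         # MLE of p/(1 - p) by functional invariance

# Crude grid search over (0, 1) as an independent check of the closed form.
grid = [k / 10_000 for k in range(1, 10_000)]
p_grid = max(grid, key=lambda p: log_likelihood(p, xs))
print(p_hat, p_grid, odds_hat, 1.0 / x_bar)
```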
Confidence Intervals:
9) Let $X$ represent the number of events that are observed to occur in $n$ units of time or
space, and assume that $X \sim \text{Poisson}(n\lambda)$, where $\lambda$ is the mean number of events that
occur in one unit of time or space. Assume that $X$ is large, so that $X \approx N(n\lambda, n\lambda)$. A
suitable estimator of $\lambda$ is given by $\hat{\lambda} = X/n$, with standard error $SE(\hat{\lambda}) = \sqrt{\lambda/n}$.
a) (4 pts.) Assuming that $X$ is large, what is the distribution of $\hat{\lambda}$? (Name the
distribution and tell the values of its parameters.)
$$E(\hat{\lambda}) = \frac{1}{n} EX = \lambda, \qquad Var(\hat{\lambda}) = \frac{1}{n^2} Var(X) = \frac{\lambda}{n} \quad \Rightarrow \quad \hat{\lambda} \approx N\left(\lambda, \frac{\lambda}{n}\right)$$
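b) (4 pts.) Use the distribution found in the previous item and the fact that
$SE(\hat{\lambda}) \approx \sqrt{\hat{\lambda}/n}$ to derive an expression for the $100(1 - \alpha)\%$ confidence interval for $\lambda$.
Given that $(\hat{\lambda} - \lambda)/\sqrt{\hat{\lambda}/n} \approx N(0, 1)$, the $100(1 - \alpha)\%$ CI for $\lambda$ is given by:
$$\left(\hat{\lambda} - z_{\alpha/2} \sqrt{\hat{\lambda}/n},\ \hat{\lambda} + z_{\alpha/2} \sqrt{\hat{\lambda}/n}\right)$$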
c) (4 pts.) A 5 mL sample of a certain suspension is found to contain 300 particles.
The mean number of particles per mL in the suspension is ____60___, give or
take ___3.464__.
$\hat{\lambda} = 300/5 = 60$ and $SE(\hat{\lambda}) \approx \sqrt{60/5} = \sqrt{12} = 3.464$
d) (4 pts.) After 4 minutes, a geologist counted 256 particles emitted from a certain
radioactive rock. Find a 95% confidence interval for the rate of emissions in units
of particles per minute.
$\hat{\lambda} = 256/4 = 64$ and $SE(\hat{\lambda}) \approx \sqrt{64/4} = 4$, and $z_{0.025} = 1.96$
Thus the 95% CI for $\lambda$ is: $(64 - 1.96 \times 4,\ 64 + 1.96 \times 4) = (56.16, 71.84)$
e) (4 pts.) For how many minutes should particles be counted so that the 95%
confidence interval specifies the rate to within ±1 particle per minute?
We want $z_{0.025} \sqrt{\hat{\lambda}/n} = 1 \Rightarrow n = \hat{\lambda} z_{0.025}^2 \Rightarrow n = 64 \times 1.96^2 = 245.9$
Particles should be counted for 246 minutes.
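Parts (c) through (e) all apply the same interval formula, so they can be checked together. The Python sketch below is an illustration, not part of the original key; `poisson_rate_ci` is a hypothetical helper for the normal-approximation interval derived in part (b).

```python
from math import ceil, sqrt

def poisson_rate_ci(count, n, z=1.96):
    """Normal-approximation CI for a Poisson rate: (estimate, SE, lower, upper)."""
    rate = count / n
    se = sqrt(rate / n)
    return rate, se, rate - z * se, rate + z * se

rate_c, se_c, _, _ = poisson_rate_ci(300, 5)          # part (c): 300 particles in 5 mL
rate_d, se_d, lo_d, hi_d = poisson_rate_ci(256, 4)    # part (d): 256 particles in 4 min

# Part (e): smallest whole n with z * sqrt(rate / n) <= 1, using the part (d) estimate.
n_required = ceil(rate_d * 1.96 ** 2)
print(rate_c, se_c, (lo_d, hi_d), n_required)
```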
10) A sample of seven concrete blocks had their compressive strength measured in MPa.
The results were 1367.6, 1411.5, 1318.7, 1193.6, 1406.2, 1425.7, and 1572.4. Ten
thousand bootstrap samples were generated from these data, and the bootstrap sample
means were arranged in order. Refer to the smallest mean as Y1 , the second smallest
as Y2 , and so on, with the largest being Y10000 . Assume that Y50 = 1283.4 , Y51 = 1283.4 ,
Y100 = 1291.5 , Y101 = 1291.5 , Y250 = 1305.5 , Y251 = 1305.5 , Y500 = 1318.5 , Y501 = 1318.5 ,
Y9500 = 1449.7 , Y9501 = 1449.7 , Y9750 = 1462.1 , Y9751 = 1462.1 , Y9900 = 1476.2 , Y9901 = 1476.2 ,
Y9950 = 1483.8 , and Y9951 = 1483.8 .
a) (4 pts.) Compute the 95% bootstrap confidence interval for the mean
compressive strength.
$$\text{95\% CI for the mean} = \left(\frac{Y_{250} + Y_{251}}{2},\ \frac{Y_{9750} + Y_{9751}}{2}\right) = (1305.5, 1462.1)$$
b) (4 pts.) Was this a parametric or a nonparametric bootstrap procedure? Explain.
Nonparametric: the ten thousand samples were generated through random sampling,
with replacement, from the given sample, without any information on the distribution
of the population underlying the sample.
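A nonparametric bootstrap of this kind can be sketched in a few lines of Python. This is an illustration, not the procedure that generated the exam's numbers: the seed is arbitrary, so the resulting interval will only be close to, not identical to, the one quoted above.

```python
import random

strengths = [1367.6, 1411.5, 1318.7, 1193.6, 1406.2, 1425.7, 1572.4]
n_boot = 10_000
rng = random.Random(245)    # arbitrary seed, chosen only for reproducibility

# Each bootstrap sample draws 7 values with replacement from the observed data.
boot_means = sorted(
    sum(rng.choices(strengths, k=len(strengths))) / len(strengths)
    for _ in range(n_boot)
)

# Same convention as the solution: average the 250th/251st and 9750th/9751st
# ordered means (1-based), which are indices 249/250 and 9749/9750 here.
lower = (boot_means[249] + boot_means[250]) / 2
upper = (boot_means[9749] + boot_means[9750]) / 2
print(lower, upper)
```

No distributional form is assumed anywhere in the resampling step, which is what makes the procedure nonparametric.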
Tests of Hypothesis:
11) An article by Abdel-Aty et al. in the Journal of Transportation Engineering presents
a tabulation of types of car crashes by the age of the driver over a three-year period in
Florida. Here is the table:
Age of drivers    Total # of accidents    # of accidents in driveways
15-24 years       82,486                  4,243
25-64 years       219,170                 10,701
a) (4 pts.) The difference between the proportions of driveway accidents for drivers
aged 15-24 and drivers aged 25-64 is __0.261__%, give or take __0.0896__%.
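$$\hat{p}_{15} = \frac{4243}{82486} = 0.05144, \qquad \hat{p}_{25} = \frac{10701}{219170} = 0.04883, \qquad \hat{p}_{15} - \hat{p}_{25} = 0.00261 \text{ or } 0.261\%$$
$$SE(\hat{p}_{15} - \hat{p}_{25}) = \sqrt{\frac{\hat{p}_{15}(1 - \hat{p}_{15})}{n_{15}} + \frac{\hat{p}_{25}(1 - \hat{p}_{25})}{n_{25}}} = \sqrt{\frac{0.05144 \times 0.9486}{82486} + \frac{0.04883 \times 0.9512}{219170}} = 0.0008963 \text{ or } 0.0896\%$$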
b) (4 pts.) Can you conclude that driveway accidents among 15-24 year-olds in FL
are indeed likely to be proportionately higher than driveway accidents among
25-64 year-old Floridians? State the hypotheses clearly and answer this question
using the P-value.
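$H_0: p_{15} - p_{25} \le 0$
$H_1: p_{15} - p_{25} > 0$
$$z = \frac{\hat{p}_{15} - \hat{p}_{25} - (p_{15} - p_{25})}{SE(\hat{p}_{pool})}, \quad \text{where } \hat{p}_{pool} = \frac{4243 + 10701}{82486 + 219170} = 0.04954$$
$$SE(\hat{p}_{pool}) = \sqrt{\hat{p}_{pool}(1 - \hat{p}_{pool})\left(\frac{1}{n_{15}} + \frac{1}{n_{25}}\right)} = 0.0008864$$
Under $H_0$ (taking $p_{15} - p_{25} = 0$):
$$z = \frac{0.00261}{0.0008864} = 2.94, \text{ thus P-value} = \Pr(Z \ge 2.94) = 0.0016 \text{ or } 0.16\%$$
$\Rightarrow$ reject $H_0$ at significance level 1%
Thus: younger Floridians do have a higher rate of driveway accidents than older ones.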
c) (4 pts.) Assuming that young drivers in Florida do present a higher proportion of
driveway accidents than older drivers, does this mean that younger Floridian
drivers should be required to take a special course on how to drive on driveways,
but not older drivers? Explain.
Though statistically speaking younger Floridians do have a higher rate of driveway
accidents than older ones, practically speaking the difference is too small (0.261%)
to justify differentiated driving training for the two groups. This is thus a typical case
in which statistical significance does not translate into practical significance.
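The computations in parts (a) and (b) can be reproduced with the Python sketch below. It is an illustration, not part of the original key; the variable names are made up, and $\Phi$ is evaluated through the error function instead of a table, so the last digits may differ slightly from the rounded values above.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

x_young, n_young = 4243, 82486      # drivers aged 15-24
x_old, n_old = 10701, 219170        # drivers aged 25-64

p_young, p_old = x_young / n_young, x_old / n_old
diff = p_young - p_old

# Unpooled standard error, used for the "give or take" in part (a).
se_diff = sqrt(p_young * (1 - p_young) / n_young + p_old * (1 - p_old) / n_old)

# Pooled standard error, used for the test statistic under H0 in part (b).
p_pool = (x_young + x_old) / (n_young + n_old)
se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_young + 1 / n_old))
z = diff / se_pool
p_value = 1.0 - std_normal_cdf(z)   # one-sided, upper tail
print(diff, se_diff, z, p_value)
```

The difference of about 0.26 percentage points is many times smaller than either proportion, which is the point made in part (c) about practical significance.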
12) An engineer claims that a new type of hard disk for laptops lasts longer than the old
type. Independent random samples of 75 of each of the two types are chosen, and the
sample means and standard deviations of their lifetimes are computed:
New: $\bar{X}_1 = 4387$ h, $s_1 = 252$ h
Old: $\bar{X}_2 = 4260$ h, $s_2 = 231$ h
a) (4 pts.) Can you conclude that the mean lifetime of new hard disks is greater
than that of the old hard disks? State the hypotheses clearly and answer this
question at the 1% significance level.
Item dropped – do not grade.
b) (4 pts.) If the new hard disks indeed have a mean lifetime 40 h longer than the
old ones, what is the probability ($\beta$) that the test performed in the previous item
will result in a Type II error (that is, failing to reject $H_0$)?
Item dropped – do not grade.
c) (2 pts.) Recompute the probability of a Type II error for the case of the new hard
disks having a mean lifetime 80 h longer than the old ones.
Item dropped – do not grade.
Correlation and Linear Regression:
13) A chemical engineer is studying the effect of temperature and stirring rate on the
yield of a certain product. The process is run 16 times, at the settings indicated in the
following table. The units for yield are percent of a theoretical maximum.
The matrix of sample correlation coefficients among the variables in question is as
follows:
a) (5 pts.) Based on the analysis of sample correlation above, would you try and fit
a multiple linear regression model in which the yield is the response variable and
temperature and stirring rates are the covariates? Explain.
No, it is not advisable to fit a model where both covariates are used, because there is a
high level of linear correlation between temperature and stirring rate (0.9064). This is
known as multicollinearity, and it will confound the least squares estimation of the
linear regression coefficients.
b) (5 pts.) Find the 95% confidence interval for the coefficient of correlation
between the stirring rate and the yield. What assumptions did you make in order
to compute this confidence interval?
Assuming that stirring rate and yield come from a bivariate normal distribution, then,
by the Fisher transformation:
$$V = \frac{1}{2} \ln \frac{1 + r}{1 - r} \sim N\left(\frac{1}{2} \ln \frac{1 + \rho}{1 - \rho},\ \frac{1}{n - 3}\right)$$
and a 95% CI for $\rho$ will be given by
$$\left(\frac{e^{2c_1} - 1}{e^{2c_1} + 1},\ \frac{e^{2c_2} - 1}{e^{2c_2} + 1}\right), \quad \text{where } c_1 = v - z_{0.025}/\sqrt{n - 3} \text{ and } c_2 = v + z_{0.025}/\sqrt{n - 3}$$
Thus: for $v = \frac{1}{2} \ln \frac{1 + 0.7513}{1 - 0.7513} = 0.9759$ and $z_{0.025} = 1.96$ $\Rightarrow$ $c_1 = 0.9759 - 1.96/\sqrt{13} = 0.4323$ and
$c_2 = 0.9759 + 1.96/\sqrt{13} = 1.519$
And finally, the 95% CI for $\rho$ is: (0.407, 0.909).
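The Fisher-transformation interval can be computed directly with the hyperbolic functions in Python's `math` module. The sketch below is an illustration, not part of the original key; `fisher_ci` is a hypothetical helper name.

```python
from math import atanh, sqrt, tanh

def fisher_ci(r, n, z=1.96):
    """Approximate CI for a correlation coefficient via the Fisher transformation."""
    v = atanh(r)                       # same as 0.5 * ln((1 + r) / (1 - r))
    half_width = z / sqrt(n - 3)
    # tanh(c) equals (exp(2c) - 1) / (exp(2c) + 1), the back-transformation above.
    return tanh(v - half_width), tanh(v + half_width)

lower, upper = fisher_ci(r=0.7513, n=16)
print(lower, upper)
```

The interval is not symmetric about $r = 0.7513$ because the back-transformation is nonlinear.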
14) The chemical engineer from the previous question has decided to calibrate a simple
linear regression model with the yield as the response variable ( Y ) and stirring rate as
the covariate ( X ). The results of the calibration obtained through Excel are:
a) (2 pts.) What proportion of the observed variation in yield can be attributed to
the simple linear regression relationship between yield and stirring rate?
$r^2 = 0.7513^2 = 0.564 \Rightarrow 56.4\%$
b) (5 pts.) Can you say that an increase of 10 rpm in the stirring rate will produce
an increase in yield of at least 2%? State the hypotheses clearly and answer this
question at the 5% significance level.
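An increase of 2% in yield per 10 rpm corresponds to a slope of $\beta_{10} = 2/10 = 0.2$.
$H_0: \beta_1 \le 0.2$
$H_1: \beta_1 > 0.2$
$$t_{n-2} = \frac{\hat{\beta}_1 - \beta_{10}}{SE_{\hat{\beta}_1}} = \frac{0.3119 - 0.2}{0.07322} = 1.528$$
P-value $= \Pr(T_{14} \ge 1.528) = 0.0744$ or 7.4% $\Rightarrow$ cannot reject $H_0$ at 5%
Thus, we cannot say that an increase of 10 rpm will increase yield by at least 2%.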
c) (5 pts.) Construct the 95% confidence interval for the prediction of the yield
percentage that corresponds to a stirring rate of 55 rpm. In order to compute this
interval, you may need the following additional information:
Given $x^* = 55$ and $\hat{y} = 61.5563 + 0.3119x$, then $\hat{y}^* = 61.5563 + 0.3119 \times 55 = 78.71$
The 95% CI for $y^*$ is: $\hat{y}^* \pm t_{0.025,14} \times SE_{pred}(\hat{y}^* \mid x^*)$, where
$$SE_{pred}(\hat{y}^* \mid x^*) = \hat{\sigma} \left[1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}\right]^{1/2}$$
Computing:
$$\hat{\sigma} = \sqrt{\frac{S_{yy}(1 - r^2)}{n - 2}} = \sqrt{\frac{234.5 \times (1 - 0.564)}{14}} = 2.70$$
$$SE_{pred}(\hat{y}^* \mid x^* = 55) = 2.70 \left[1 + \frac{1}{16} + \frac{(55 - 45)^2}{1360}\right]^{1/2} = 2.88$$
Finally:
$\hat{y}^* - t_{0.025,14} \times SE_{pred}(\hat{y}^* \mid x^*) = 78.71 - 2.144 \times 2.88 = 72.5$
$\hat{y}^* + t_{0.025,14} \times SE_{pred}(\hat{y}^* \mid x^*) = 78.71 + 2.144 \times 2.88 = 84.9$
Thus, the 95% CI for $(y^* \mid x^* = 55)$ is: (72.5, 84.9)
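The prediction interval can be recomputed from the summary quantities used in the solution. The Python sketch below is an illustration, not part of the original key: the values of $\bar{x}$, $S_{xx}$, and $S_{yy}$ are the ones appearing in the computation above, and the critical value is taken from a t table (2.145, which the solution rounds to 2.144).

```python
from math import sqrt

n = 16
b0, b1 = 61.5563, 0.3119              # fitted intercept and slope
x_bar, s_xx, s_yy = 45.0, 1360.0, 234.5
r = 0.7513                            # sample correlation between stirring rate and yield
t_crit = 2.145                        # t_{0.025, 14} from a t table (assumed value)

x_star = 55.0
y_hat = b0 + b1 * x_star
sigma_hat = sqrt(s_yy * (1 - r ** 2) / (n - 2))
se_pred = sigma_hat * sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / s_xx)

lower = y_hat - t_crit * se_pred
upper = y_hat + t_crit * se_pred
print(y_hat, sigma_hat, se_pred, (lower, upper))
```

The leading 1 inside the square root accounts for the variability of a single new observation; dropping it would give the narrower interval for the mean response at $x^* = 55$.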
Multiple Linear Regression:
15) A study was made in which data was obtained to relate $y$ = specific surface area
($\text{cm}^3/\text{g}$) to $x_1$ = % NaOH used as a pretreatment chemical and $x_2$ = treatment time
(min) for a batch of pulp. The following R output resulted from a request to fit the
model $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$.
a) (6 pts.) Fill in the blanks in the tables above by computing the following values:
the coefficients of determination (regular and adjusted), the regression sum of
squares, the mean sums of squares (regression and residuals), and the value of the
F statistic. Show your computations.
Item dropped – do not grade.
b) (2 pts.) What proportion of observed variation in specific surface area can be
explained by the model relationship?
Item dropped – do not grade.
c) (4 pts.) Does the chosen model appear to specify a useful relationship between
the response and the covariates? Explain.
Item dropped – do not grade.
d) (4 pts.) Provided that % NaOH remains in the model, would you suggest that the
covariate treatment time be eliminated? Explain.
Item dropped – do not grade.
e) (4 pts.) Calculate a 95% confidence interval for the expected change in specific
surface area associated with an increase of 1 % in NaOH when treatment time is
held fixed.
Item dropped – do not grade.