CLABE Statistics
Homework assignment - Problem sheet 2
1. River Hill Hospital is interested in determining the effectiveness of a new drug for reducing the
time required for complete recovery from knee surgery. Complete recovery is measured by a series
of strength tests that compare the treated knee with the untreated knee. The drug was given
in varying amounts to 18 patients over a 6-month period. For each patient the number of drug
units, X, and the days for complete recovery, Y, are given in the table below.

x    5  21  14  11   9   4   7  21  17  14   9   7   9  21  13  14   9   4
y   53  65  48  66  46  56  53  57  49  66  54  56  53  52  49  56  59  56

(a) Plot the data points in the panel below;
(b) compute the covariance,
(c) compute the correlation coefficient,
(d) briefly discuss the relationship between the number of drug units and the recovery time.
What dosage might we recommend based on this initial analysis?

[Blank plotting panel: drug units (5 to 20) on the horizontal axis, time (days) (45 to 70) on the vertical axis.]
SOLUTION
(a) The scatterplot of the data is the following
[Scatterplot: time (days) against drug units; figure not reproduced.]
(b) x̄ = 11.61, ȳ = 55.22 and sXY = 4.268;
(c) sX = 5.65, sY = 5.89, rXY = 0.128.
(d) The scatterplot of the data is elliptical and therefore the correlation coefficient can be used
to measure the strength of the association between the two variables. Both visual
inspection of the scatterplot and the value of the correlation coefficient suggest that the
amount of drug is only weakly associated with the recovery time and, furthermore, that a larger
amount of drug seems to correspond to a (slightly) longer recovery time. Hence, according
to this initial analysis, a low dosage should be suggested.
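
The figures quoted in parts (b) and (c) can be checked with a short script. The sketch below is only an illustration: it assumes the sample versions of the covariance and standard deviation (divisor n − 1), which is the convention that reproduces the quoted values, and the helper name sample_cov is introduced here and is not part of the assignment.

```python
from math import sqrt

# Data copied from the table in Problem 1.
x = [5, 21, 14, 11, 9, 4, 7, 21, 17, 14, 9, 7, 9, 21, 13, 14, 9, 4]
y = [53, 65, 48, 66, 46, 56, 53, 57, 49, 66, 54, 56, 53, 52, 49, 56, 59, 56]


def sample_cov(u, v):
    """Sample covariance of u and v, using the divisor n - 1 (assumed convention)."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (n - 1)


x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
s_xy = sample_cov(x, y)
s_x = sqrt(sample_cov(x, x))  # the covariance of a variable with itself is its variance
s_y = sqrt(sample_cov(y, y))
r_xy = s_xy / (s_x * s_y)

print(f"x_bar = {x_bar:.2f}, y_bar = {y_bar:.2f}, s_xy = {s_xy:.3f}")
print(f"s_x = {s_x:.2f}, s_y = {s_y:.2f}, r_xy = {r_xy:.3f}")
```

The small positive value of r_xy is what supports the "weak, slightly increasing" reading given in part (d).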
2. Beijing Books offers discounted books online. They are priced at either $3, $5, or $10. The
owner wants to know whether the price has any relationship with the number of days it takes for
a customer to decide on a purchase. The following data show the price (X) and the number of
days the book was on sale before it was sold (Y) for 15 books.

x    3   5  10   3   5  10   3   5  10   3   5  10   3   5  10
y    7   5   2   9   6   5   6   6   1  10   7   4   5   6   4

(a) Describe the data numerically with their covariance and correlation;
(b) compute the parameters b0 and b1 of the regression line where Y is the response variable;
(c) plot the data points and the regression line in the panel below;
(d) discuss the relationship between price and speed of sale.

[Blank plotting panel: price (2 to 10) on the horizontal axis, time (days) (0 to 10) on the vertical axis.]
SOLUTION
(a) x̄ = 6, sX = 3.05, ȳ = 5.53, sY = 2.33, sXY = −5.5, rXY = −0.78;
(b) b0 = 9.09, b1 = −0.59
(c) the scatterplot of the data is given below
[Scatterplot: time (days) against price; figure not reproduced.]
(d) The data show a negative linear association between price and time. The number of days
books were on sale is shorter for more expensive books.
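
As a rough check of part (b), the regression coefficients can be recomputed directly from the data. This is a minimal sketch in plain Python; the function name least_squares is an illustrative choice, and it uses the usual formulas b1 = sXY / sX² and b0 = ȳ − b1 x̄.

```python
# Data copied from the table in Problem 2: price (x) and days on sale (y).
price = [3, 5, 10] * 5
days = [7, 5, 2, 9, 6, 5, 6, 6, 1, 10, 7, 4, 5, 6, 4]


def least_squares(x, y):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sum_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sum_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum_xy / sum_xx  # same as s_XY / s_X^2: the n - 1 divisors cancel
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(price, days)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")

# Fitted days on sale at each of the three price points.
for p in (3, 5, 10):
    print(f"price ${p}: fitted days = {b0 + b1 * p:.2f}")
```

The negative slope matches the conclusion in part (d): within this sample, higher-priced books were on sale for fewer days.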
3. The table below gives the number of candies (X) and net weight (in grams, Y) for a sample of
30 bags of M&M's. The advertised net weight is 47.9 grams. The summary statistics for these
data are: x̄ = 57.1, ȳ = 49.215, sX = 2.383, sY = 1.522 and rXY = 0.794.

 x     y        x     y        x     y
58   49.79     58   50.43     60   51.53
59   48.98     58   49.80     59   50.97
59   50.40     55   46.94     56   50.01
57   49.16     50   47.98     56   48.28
55   47.61     56   48.49     55   48.74
58   49.80     55   48.33     56   46.72
58   50.23     58   48.72     57   47.67
61   51.68     58   49.69     54   47.70
58   48.45     57   48.95     58   49.40
53   46.22     60   51.71     61   52.06

Consider the number of candies as the predictor variable and net weight as the response variable.
(a) Compute the covariance.
(b) Compute the least squares regression line.
(c) Draw in the panel below the scatterplot and the regression line together.
(d) Predict the net weight of a bag of M&M's with 56 candies.

[Blank plotting panel: number of candies (50 to 62) on the horizontal axis, weight (grams) (46 to 52) on the vertical axis.]
SOLUTION
(a) sXY = 0.794 × 2.383 × 1.522 = 2.88;
(b) b1 = 2.88 / 2.383² = 0.507, b0 = 20.28;
(c) the scatterplot with the regression line is given below
[Scatterplot with regression line: weight (grams) against number of candies; figure not reproduced.]
(d) The prediction, based on the least squares regression line, of the net weight of a bag of
M&M's with 56 candies is 20.28 + 0.507 × 56 = 48.65 grams.
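
The chain of calculations in parts (a), (b) and (d) uses only the summary statistics, so it can be sketched in a few lines. The script below is an illustration, not the author's own computation: it keeps intermediate values unrounded, so its intercept may differ in the second decimal place from the figure quoted above. The function name predict is introduced here for convenience.

```python
# Summary statistics quoted in the statement of Problem 3.
x_bar, y_bar = 57.1, 49.215
s_x, s_y, r_xy = 2.383, 1.522, 0.794

# (a) Covariance recovered from the correlation: s_XY = r_XY * s_X * s_Y.
s_xy = r_xy * s_x * s_y

# (b) Least squares slope and intercept.
b1 = s_xy / s_x ** 2
b0 = y_bar - b1 * x_bar


def predict(n_candies):
    """Predicted net weight (grams) for a bag with n_candies candies."""
    return b0 + b1 * n_candies


print(f"s_xy = {s_xy:.2f}")
print(f"b1 = {b1:.3f}, b0 = {b0:.2f}")
# (d) Prediction for a bag with 56 candies.
print(f"predicted weight for 56 candies = {predict(56):.2f} g")
```

Because 56 lies well inside the observed range of candy counts, this prediction is an interpolation rather than an extrapolation.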
4. Let $y = b_0 + b_1 x$ be the least squares regression line for the data set $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
Show that
(a) $r_{XY} \times \frac{s_Y}{s_X} = b_1$;
(b) the regression line passes through the point $(\bar{x}, \bar{y})$;
(c) $\sum_{i=1}^{n} (y_i - \hat{y}_i) = 0$ where $\hat{y}_i = b_0 + b_1 x_i$.
SOLUTION
(a)
$$
r_{XY} \times \frac{s_Y}{s_X} = \frac{s_{XY}}{s_X s_Y} \times \frac{s_Y}{s_X} = \frac{s_{XY}}{s_X^2} = b_1
$$
(b) If y = b0 + b1 x, then for x = x̄ and recalling that b0 = ȳ − b1 x̄ one has
b0 + b1 x̄ = (ȳ − b1 x̄) + b1 x̄ = ȳ.
(c) Since $b_0 = \bar{y} - b_1 \bar{x}$, then
$$
\begin{aligned}
\sum_{i=1}^{n} (y_i - \hat{y}_i) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) \\
&= \sum_{i=1}^{n} (y_i - \bar{y} + b_1 \bar{x} - b_1 x_i) \\
&= \sum_{i=1}^{n} \{(y_i - \bar{y}) - b_1 (x_i - \bar{x})\} \\
&= \sum_{i=1}^{n} (y_i - \bar{y}) - b_1 \sum_{i=1}^{n} (x_i - \bar{x}) \\
&= 0
\end{aligned}
$$
because both $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ and $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.

5. The regression line for a sample of 100 people relating X = years of education and Y = annual
income (in euros) is ŷ = −6000 + 3000x and the correlation coefficient is rXY = 0.5.
(a) Find the predicted annual income for a person with 5 years of education.
(b) Interpret the slope, b1.
(c) Suppose that Y is treated as the explanatory variable and X is treated as the response
variable. Will the correlation coefficient change in value? Explain.
SOLUTION
(a) For x = 5 one has ŷ = −6000 + 3000 × 5 = 9000 euros;
(b) b1 = 3000 can be interpreted as the average increase in annual income, in euros, corresponding to an
additional year of education;
(c) the correlation coefficient is a symmetric measure of association. This is also clear from the
formula for the computation of the correlation coefficient where the two variables are clearly
on an equal footing. Hence, the value of the correlation coefficient will not change if X is
treated as the response variable and Y as the explanatory variable.
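
The algebraic facts from Problem 4 and the symmetry argument in 5(c) can also be checked numerically. The sketch below is only an illustration on the Problem 2 data (any data set with non-constant x and y would do); the helper names cov and corr are introduced here, and the comparisons use a floating-point tolerance rather than exact equality.

```python
from math import isclose, sqrt

# Problem 2 data, reused purely as a convenient numerical example.
x = [3, 5, 10] * 5
y = [7, 5, 2, 9, 6, 5, 6, 6, 1, 10, 7, 4, 5, 6, 4]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n


def cov(u, v):
    """Sample covariance (divisor n - 1)."""
    u_bar, v_bar = sum(u) / len(u), sum(v) / len(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (len(u) - 1)


def corr(u, v):
    """Correlation coefficient: covariance divided by both standard deviations."""
    return cov(u, v) / sqrt(cov(u, u) * cov(v, v))


b1 = cov(x, y) / cov(x, x)
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

checks = {
    "4(a) r_XY * s_Y / s_X equals b1": isclose(corr(x, y) * sqrt(cov(y, y) / cov(x, x)), b1),
    "4(b) line passes through (x_bar, y_bar)": isclose(b0 + b1 * x_bar, y_bar),
    "4(c) residuals sum to zero": isclose(sum(residuals), 0.0, abs_tol=1e-9),
    "5(c) correlation is symmetric": isclose(corr(x, y), corr(y, x)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")

# 5(a): predicted income from the stated line y_hat = -6000 + 3000 x at x = 5.
predicted_income = -6000 + 3000 * 5
print(f"predicted income for 5 years of education: {predicted_income} euros")
```

A numerical check of this kind does not replace the proofs, but it is a quick way to catch an algebra slip before writing them out.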