Question) A certain brand of lightbulb is expected to last 1000 hours and a classroom has 5 of these
lightbulbs. If all of these lightbulbs are operating at the moment, compute the probability that at least
one of them fails within the next 1000 hours.
X: lifetime of a lightbulb ~ Expo(λ = 1/1000)
$P(X \le 1000) = 1 - e^{-1000/1000} = 0.632$
Y: number of lightbulbs that fail within the next 1000 hours ~ Binom(n = 5, p = 0.632)
$P(Y \ge 1) = 1 - P(Y = 0) = 1 - (1 - 0.632)^5 = 0.993$
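The arithmetic above can be checked with a short Python sketch. The function names are illustrative only, and the calculation assumes, as the binomial model does, that the five bulbs fail independently.

```python
import math

def exponential_cdf(t, rate):
    """P(X <= t) for an exponential lifetime with the given rate (lambda)."""
    return 1.0 - math.exp(-rate * t)

def prob_at_least_one(n, p):
    """P(Y >= 1) for Y ~ Binomial(n, p), via the complement P(Y = 0)."""
    return 1.0 - (1.0 - p) ** n

p_fail = exponential_cdf(1000, 1 / 1000)  # one bulb fails within 1000 hours
p_any = prob_at_least_one(5, p_fail)      # at least one of the 5 bulbs fails

print(round(p_fail, 3), round(p_any, 3))  # prints 0.632 0.993
```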
Question) You are given the following data set, where Y is the response variable and X1 and X2 are
the explanatory variables.
The correlation coefficient between Y and X1 is 0.152, the correlation
coefficient between Y and X2 is 0.932, and the correlation coefficient
between X1 and X2 is 0.363.
The standard deviation of X1 is 2.52, standard deviation of X2 is 6.00 and
the standard deviation of Y is 22.23.
The mean of X1 is 8.3, the mean of X2 is 17.15 and the mean of Y is 44.4.
The F statistic value for the regression of Y against X1 and X2 is 84.62.
(a) Compute correlation coefficient between Y and the estimated values of Y
$SST = (22.23)^2 \times 19 = SSR + SSE$
$F = 84.62 = MSR / MSE = (SSR/2) / (SSE/17)$ → solve together with the equation above for SSR and SSE
$SSR / SST = R^2$ → Multiple R = R
(b) Compute coefficient of determination of the regression model
(c) Compute standard error of the estimates of the regression model
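One way to carry out the calculation for parts (a) to (c) is sketched below. It assumes n = 20 observations, which is inferred from the 19 and 17 degrees of freedom used in the solution and is not stated directly in the question; the variable names are illustrative.

```python
import math

n = 20          # assumption: implied by the 19 and 17 degrees of freedom
k = 2           # explanatory variables X1 and X2
s_y = 22.23     # standard deviation of Y
f_stat = 84.62  # overall F statistic

sst = s_y ** 2 * (n - 1)          # SST = s_Y^2 (n - 1)
ratio = f_stat * k / (n - k - 1)  # SSR / SSE, from F = (SSR/k) / (SSE/(n-k-1))
sse = sst / (1 + ratio)           # because SST = SSR + SSE
ssr = sst - sse

r_squared = ssr / sst                     # (b) coefficient of determination
multiple_r = math.sqrt(r_squared)         # (a) correlation between Y and fitted Y
std_error = math.sqrt(sse / (n - k - 1))  # (c) standard error of the estimate

print(round(multiple_r, 3), round(r_squared, 3), round(std_error, 2))
```

Under these assumptions the sketch gives a multiple R of about 0.953, an R² of about 0.909 and a standard error of the estimate of about 7.10.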
Question) You are given a data set (which is partially shown below), where you attempt to explain Y
as a linear function of X1, X2, X3 and X4.
The Excel output for the related analysis is given below.
a) Write down the regression equation.
Ŷ = 24.11 + 17.64 X1 − 53.62 X2 + 6.31 X3 + 0.378 X4
b) Interpret the coefficient of X1.
If the value of X1 increases by 1 while the other variables are held constant, we expect the value of Y to increase by 17.64 on average.
c) What is the coefficient of determination for this model? Interpret.
85.67% of the total variation in the response is explained by the regression model.
d) Conduct an overall significance test. Clearly write down the null and alternative hypotheses
and test it at the 0.05 level of significance.
H0: β1= β2= β3= β4=0
H1: at least one coefficient is different from 0
Since the p-value of the overall F test, 4.51E-81, is less than 0.05, we reject H0.
e) Identify which variables significantly affect the value of the response variable at a significance
level of 0.05. Write down one of the null and alternative hypotheses to demonstrate.
f) What does the [16.286, 19.006] interval mean in the output (check the X1 row)? Interpret.
It is the 95% confidence interval for the population coefficient of X1: we are 95% confident that β1 lies between these limits.
g) Briefly explain how you would go about eliminating the insignificant variables.
Question) Assume that you are given a cross-sectional data set (which is partially shown below)
where Y is the response variable and the X is the explanatory variable. The scatter plot of Y versus X
is also given.
[Figure: Scatterplot of Y vs X]
The Excel output for the simple linear regression run is:
[Figure: residual plots (response is Y): Residuals Versus the Fitted Values (Residual vs Fitted Value), Normal Probability Plot of the Residuals (Percent vs Residual), Histogram of the Residuals (Frequency vs Residual)]
Comment on the validity of this regression model.
What would you do to improve the model? Explain.
Assumptions:
1) Linearity – checked from scatter or residual plot
2) Constant variance of residuals – violated since residual variance increases → take the LN of response and rerun regression
3) Normality of residuals – checked from normal probability plot
4) Independence of residuals – no problem since cross-sectional data
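The remedy in point 2 can be illustrated with a small sketch. The data here are hypothetical, not the exam data: they are generated so that ln(Y) is linear in X with constant spread, which makes the spread of Y itself grow with X. The helper names `fit_line` and `spread_ratio` are made up for this illustration, and the spread measure (mean absolute residual in the upper half of X divided by that in the lower half) is just one simple diagnostic.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for ys regressed on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def spread_ratio(xs, ys):
    """Mean |residual| in the upper half of X divided by that in the lower half."""
    b0, b1 = fit_line(xs, ys)
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    half = len(xs) // 2
    low = sum(abs(e) for e in resid[:half]) / half
    high = sum(abs(e) for e in resid[half:]) / (len(xs) - half)
    return high / low

# Hypothetical data (xs sorted ascending): multiplicative noise alternating
# between factors 1.25 and 0.8, so the spread of Y grows with X.
xs = list(range(1, 101))
ys = [math.exp(3 + 0.03 * x) * (1.25 if i % 2 == 0 else 0.8)
      for i, x in enumerate(xs)]

raw_ratio = spread_ratio(xs, ys)                         # regression of Y on X
log_ratio = spread_ratio(xs, [math.log(y) for y in ys])  # regression of ln(Y) on X
print(raw_ratio, log_ratio)
```

A ratio well above 1 for the raw response and close to 1 for the log response is the pattern that motivates the transformation. With real data the residual plots should be re-examined after rerunning the regression.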
Bayes’ Theorem
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)}$$
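As a quick illustration of the formula, the sketch below evaluates it for one set of made-up probabilities. The function name and the numbers are hypothetical and serve only to show how the terms fit together.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) from the prior P(A) and the two conditional probabilities of B."""
    numerator = p_b_given_a * prior_a
    evidence = numerator + p_b_given_not_a * (1.0 - prior_a)
    return numerator / evidence

# Hypothetical inputs: P(A) = 0.01, P(B | A) = 0.95, P(B | A^c) = 0.10
posterior = bayes_posterior(0.01, 0.95, 0.10)
print(round(posterior, 4))  # prints 0.0876
```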
sample mean
$$\bar{X} = \frac{\sum X}{n}$$
sample standard deviation
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
sample correlation coefficient
$$r = \frac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{n \sum X^2 - (\sum X)^2}\,\sqrt{n \sum Y^2 - (\sum Y)^2}}$$
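The three sample statistics above translate directly into code. The sketch below follows the formulas term by term; the function names and the five-point data set are hypothetical.

```python
import math

def sample_mean(xs):
    """Sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_std(xs):
    """Square root of the sum of squared deviations divided by n - 1."""
    x_bar = sample_mean(xs)
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1))

def sample_corr(xs, ys):
    """Correlation coefficient using the computational (sums) form."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    )

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
print(sample_mean(xs), round(sample_std(xs), 4), round(sample_corr(xs, ys), 4))
```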
binomial probability distribution
$$P(X = k) = \frac{n!}{k!\,(n - k)!}\, p^k (1 - p)^{n - k} \quad \text{and} \quad E(X) = np$$
n: number of trials, k: number of successes
poisson probability distribution
$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!} \quad \text{and} \quad E(X) = \lambda$$
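The binomial and Poisson probability functions can be written as two short functions. This is a minimal sketch using Python's standard `math` module; the function names are illustrative and the Poisson inputs are made up.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for n trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# P(Y = 0) from the lightbulb question: n = 5, p = 0.632
print(round(binomial_pmf(0, 5, 0.632), 4))

# Hypothetical Poisson example: mean of 2 events, probability of exactly 3
print(round(poisson_pmf(3, 2.0), 4))
```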
exponential distribution
$$P(X \le k) = 1 - e^{-\lambda k}$$
λ: mean number of events per unit time
least squares method (simple linear regression)
$$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
$$b_0 = \bar{Y} - b_1 \bar{X}$$
standard error of the estimate
$$s_{y,x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} \quad \text{(simple linear regression)}$$
$$s_{Y,Xs} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - k - 1}} \quad \text{(multiple linear regression)}$$
standard error of the forecast (simple linear regression)
$$s_f = s_{y,x} \sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}$$
standard error of the regression coefficient (simple linear regression)
$$s_{b_1} = \frac{s_{y,x}}{\sqrt{\sum (X - \bar{X})^2}}$$
t statistic for hypothesis testing:
$$t = \frac{b_1}{s_{b_1}}$$
prediction interval (simple linear regression)
$$\hat{Y} \pm t\, s_f$$
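The simple linear regression formulas above chain together as sketched below. The function names, the dictionary layout and the five-point data set are hypothetical. The critical t value is passed in by hand because Python's standard library has no t distribution: 3.182 is the two-sided 95% table value for n − 2 = 3 degrees of freedom.

```python
import math

def simple_regression(xs, ys):
    """Least-squares fit plus the standard errors from the formula sheet."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))  # standard error of the estimate
    s_b1 = s_yx / math.sqrt(sxx)     # standard error of the slope
    return {"b0": b0, "b1": b1, "s_yx": s_yx, "s_b1": s_b1,
            "t": b1 / s_b1, "n": n, "x_bar": x_bar, "sxx": sxx}

def prediction_interval(fit, x_new, t_crit):
    """Y-hat plus or minus t times the standard error of the forecast."""
    s_f = fit["s_yx"] * math.sqrt(
        1 + 1 / fit["n"] + (x_new - fit["x_bar"]) ** 2 / fit["sxx"]
    )
    y_hat = fit["b0"] + fit["b1"] * x_new
    return y_hat - t_crit * s_f, y_hat + t_crit * s_f

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
fit = simple_regression(xs, ys)
low, high = prediction_interval(fit, 6, 3.182)  # t table value, 3 d.f., 95%
print(fit["b0"], fit["b1"], fit["t"], (low, high))
```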