Question) A certain brand of lightbulb is expected to last 1000 hours, and a classroom has 5 of these lightbulbs. If all of these lightbulbs are operating at the moment, compute the probability that at least one of them fails within the next 1000 hours.

X: lifetime of a lightbulb ~ Expo(λ = 1/1000). Because the exponential distribution is memoryless, the time a bulb has already been in service does not matter.
$P(X \le 1000) = 1 - e^{-1000/1000} = 0.632$
Y: number of lightbulbs that fail within the next 1000 hours ~ Binom(n = 5, p = 0.632), assuming the bulbs fail independently.
$P(Y \ge 1) = 1 - P(Y = 0) = 1 - (1 - 0.632)^5 = 0.993$

Question) You are given the following data set, where Y is the response variable and X1 and X2 are the explanatory variables. The correlation coefficient between Y and X1 is 0.152, the correlation coefficient between Y and X2 is 0.932, and the correlation coefficient between X1 and X2 is 0.363. The standard deviation of X1 is 2.52, the standard deviation of X2 is 6.00, and the standard deviation of Y is 22.23. The mean of X1 is 8.3, the mean of X2 is 17.15, and the mean of Y is 44.4. The F statistic for the regression of Y against X1 and X2 is 84.62.

(a) Compute the correlation coefficient between Y and the estimated values of Y.
$SST = (22.23)^2 \times 19 = SSR + SSE$
$F = 84.62 = MSR / MSE = (SSR/2) / (SSE/17)$
→ Solve the two equations above for SSR and SSE.
$SSR / SST = R^2$ → Multiple R = R
(b) Compute the coefficient of determination of the regression model.
(c) Compute the standard error of the estimate of the regression model.

Question) You are given a data set (which is partially shown below), where you attempt to explain Y as a linear function of X1, X2, X3 and X4. The Excel output for the related analysis is given below.

a) Write down the regression equation.
$\hat{Y} = 24.11 + 17.64 X_1 - 53.62 X_2 + 6.31 X_3 + 0.378 X_4$
b) Interpret the coefficient of X1.
If the value of X1 increases by 1 while the other variables are held constant, we expect the value of Y to increase by 17.64 on average.
c) What is the coefficient of determination for this model? Interpret.
85.67% of the total variation in the response is explained by the regression model.
d) Conduct an overall significance test.
Clearly write down the null and alternative hypotheses and test them at the 0.05 level of significance.
H0: β1 = β2 = β3 = β4 = 0
H1: at least one coefficient is different from 0
Since the p-value of the F test, 4.51E-81, is less than 0.05, we reject H0.
e) Identify which variables significantly affect the value of the response variable at a significance level of 0.05. Write down one of the null and alternative hypotheses to demonstrate.
f) What does the [16.286, 19.006] interval mean in the output (check the X1 row)? Interpret.
It is the 95% confidence interval for the population coefficient of X1: we are 95% confident that β1 lies between these limits.
g) Briefly explain how you would go about eliminating the insignificant variables.

Question) Assume that you are given a cross-sectional data set (which is partially shown below) where Y is the response variable and X is the explanatory variable. The scatter plot of Y versus X is also given.

[Figure: Scatterplot of Y vs X]

The Excel output for the simple linear regression run is:

[Figures (response is Y): Residuals Versus the Fitted Values; Normal Probability Plot of the Residuals; Histogram of the Residuals]

Comment on the validity of this regression model. What would you do to improve the model? Explain.

Assumptions:
1) Linearity: checked from the scatter plot or the residual plot.
2) Constant variance of residuals: violated, since the residual variance increases → take the LN of the response and rerun the regression.
3) Normality of residuals: checked from the normal probability plot.
4) Independence of residuals: no problem, since the data are cross-sectional.

Bayes' Theorem
$P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)}$

sample mean
$\bar{X} = \dfrac{\sum X}{n}$

sample standard deviation
$s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$

sample correlation coefficient
$r = \dfrac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{n \sum X^2 - (\sum X)^2}\,\sqrt{n \sum Y^2 - (\sum Y)^2}}$

binomial probability distribution
$P(X = k) = \dfrac{n!}{k!\,(n - k)!}\, p^k (1 - p)^{n - k}$ and $E(X) = np$
n: number of trials, k: number of successes

Poisson probability distribution
$P(X = k) = \dfrac{e^{-\lambda} \lambda^k}{k!}$ and $E(X) = \lambda$

exponential distribution
$P(X \le k) = 1 - e^{-\lambda k}$
λ: mean number of events per unit time

least squares method (simple linear regression)
$b_1 = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$
$b_0 = \bar{Y} - b_1 \bar{X}$

standard error of the estimate
$s_{y,x} = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n - 2}}$ (simple linear regression)
$s_{Y,Xs} = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n - k - 1}}$ (multiple linear regression)

standard error of the forecast (simple linear regression)
$s_f = s_{y,x} \sqrt{1 + \dfrac{1}{n} + \dfrac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}$

standard error of the regression coefficient (simple linear regression)
$s_{b_1} = s_{y,x} \Big/ \sqrt{\sum (X - \bar{X})^2}$

t statistic for hypothesis testing
$t = \dfrac{b_1}{s_{b_1}}$

prediction interval (simple linear regression)
$\hat{Y} \pm t\, s_f$
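
The arithmetic in the lightbulb question and in the regression question with F = 84.62 can be checked with a short script. What follows is a minimal sketch in Python, not part of the original solutions. The function names are hypothetical, the bulbs are assumed to fail independently, and n = 20 is an assumption inferred from the degrees of freedom (19 and 17) used in the solution rather than a value stated in the question.

```python
import math

# Lightbulb question: exponential lifetime, five bulbs assumed independent.
MEAN_LIFE_HOURS = 1000.0   # expected lifetime stated in the question
HORIZON_HOURS = 1000.0     # "within the next 1000 hours"
N_BULBS = 5


def prob_fail_within(t, mean_life):
    """P(X <= t) for an exponential lifetime with rate 1 / mean_life."""
    return 1.0 - math.exp(-t / mean_life)


def prob_at_least_one(p, n):
    """P(Y >= 1) for Y ~ Binomial(n, p), via the complement P(Y = 0)."""
    return 1.0 - (1.0 - p) ** n


p_fail = prob_fail_within(HORIZON_HOURS, MEAN_LIFE_HOURS)
p_any = prob_at_least_one(p_fail, N_BULBS)


def regression_summary(s_y, n, k, f_stat):
    """Hypothetical helper: solve SST = SSR + SSE together with
    F = (SSR / k) / (SSE / (n - k - 1)) for the fit statistics."""
    sst = s_y ** 2 * (n - 1)                  # total sum of squares
    ssr_over_sse = f_stat * k / (n - k - 1)   # rearranged F ratio
    sse = sst / (1.0 + ssr_over_sse)
    ssr = sst - sse
    r_squared = ssr / sst
    return {
        "SST": sst,
        "SSR": ssr,
        "SSE": sse,
        "r_squared": r_squared,                     # part (b)
        "multiple_r": math.sqrt(r_squared),         # part (a)
        "std_error": math.sqrt(sse / (n - k - 1)),  # part (c)
    }


# n = 20 is an assumption inferred from the 19 and 17 degrees of freedom.
summary = regression_summary(s_y=22.23, n=20, k=2, f_stat=84.62)

print(f"P(X <= 1000) = {p_fail:.3f}")
print(f"P(Y >= 1)    = {p_any:.3f}")
for name, value in summary.items():
    print(f"{name:>10} = {value:.4f}")
```

The first two printed values should agree with the 0.632 and 0.993 obtained by hand. The remaining lines give the quantities asked for in parts (a) to (c) of the regression question, under the assumptions stated above.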