Lecture 16
Univariate Linear Regression Problem
• Model: $Y = \beta_0 + \beta_1 X + e$
• Test: $H_0: \beta_1 = 0$. Alternative: $H_1: \beta_1 > 0$.
• The distribution of Y is normal under both null and alternative.
• Under the null, $\mathrm{var}(Y) = \sigma_0^2$.
• Under the alternative, $\beta_1 > 0$ and $\mathrm{var}(Y) = \sigma_1^2$.

Step 1: Choose the test statistic and specify its null distribution
• Use the conditions of the null to find:
$$\hat{b}_1 \sim N\left(0,\ \frac{\sigma_0^2}{\sum_{i=1}^{n}(x_i - \bar{x}_n)^2}\right).$$

Bringing sample size into regression design
• The sample size n is hidden in the regression results. That is, let:
$$\sum_{i=1}^{n}(x_i - \bar{x}_n)^2 = n\sigma_X^2.$$

Step 2: Define the critical value
• For the univariate linear regression test:
$$CV = 0 + |z_\alpha|\frac{\sigma_0}{\sqrt{n\sigma_X^2}} = 0 + |z_\alpha|\frac{(\sigma_0/\sigma_X)}{\sqrt{n}}.$$

Step 3: Define the Rejection Rule
• Each test is a right-sided test, and so the rule is to reject when the test statistic is greater than the critical value.

Step 4: Specify the Distribution of Test Statistic under Alternative
• Use the conditions of the alternative to find:
$$\hat{b}_1 \sim N\left(E_1,\ \frac{\sigma_1^2}{n\sigma_X^2}\right).$$

Step 5: Define a Type II Error
• For the univariate linear regression test, a Type II error occurs when:
$$\hat{b}_1 < CV = 0 + |z_\alpha|\frac{(\sigma_0/\sigma_X)}{\sqrt{n}}.$$

Step 6: Find β
• For a univariate linear regression test:
$$\beta = \Pr\nolimits_1\left\{\frac{\hat{b}_1 - E(\hat{b}_1)}{\sigma(\hat{b}_1)} < \frac{0 + |z_\alpha|\frac{(\sigma_0/\sigma_X)}{\sqrt{n}} - E_1}{\frac{(\sigma_1/\sigma_X)}{\sqrt{n}}}\right\}.$$

Basic Insight
• Notice that all three problems have the same basic structure.
• That is, if you understand the solution of the one sample test, then you can derive the answer to the other problems.

Step 7: Phrase requirement on β
• For example, we seek to "choose n so that β=0.01."
• That is, "choose n so that $\Pr_1\{\text{Accept } H_0\} = \beta = 0.01$."
• In terms of the test statistic, we seek to choose n so that:
$$\Pr\nolimits_1\left\{\frac{\hat{b}_1 - E(\hat{b}_1)}{\sigma(\hat{b}_1)} < \frac{0 + |z_\alpha|\frac{(\sigma_0/\sigma_X)}{\sqrt{n}} - E_1}{\frac{(\sigma_1/\sigma_X)}{\sqrt{n}}}\right\} = \beta.$$
• Notice the parallel phrasing:
$$\Pr\{Z < -|z_\beta|\} = \beta.$$
• That is, choose n so that (note that $E_0 = 0$):
$$\frac{E_0 + |z_\alpha|\frac{(\sigma_0/\sigma_X)}{\sqrt{n}} - E_1}{\frac{(\sigma_1/\sigma_X)}{\sqrt{n}}} = -|z_\beta|.$$
• That is, choose n so that (after algebraic clearing out):
$$(E_1 - E_0)\sqrt{n} = |z_\alpha|\frac{\sigma_0}{\sigma_X} + |z_\beta|\frac{\sigma_1}{\sigma_X}.$$

Step 8: State the conclusion
• The result for a left-sided test has to be worked through but is similar. You must remember to keep all entries positive. This is reasonable if both α and β are constrained to be less than or equal to 0.5. The restriction is not a hardship in practice.

Univariate Linear Regression
• Note that the $\sigma_0$ factor is changed to $\sigma_0/\sigma_X$.
• There is a similar adjustment for the alternative standard deviation.

Example Problem Group
• Two hundred values of an independent variable $x_i$ are chosen so that $\sum(x_i - \bar{x})^2$ is equal to 400,000. For each setting of $x_i$, the random variable $Y_i = \beta_0 + \beta_1 x_i + \sigma Z_i$ is observed. Here $\beta_0$ and $\beta_1$ are fixed but unknown parameters, σ=400, and the $Z_i$ are independent standard normal random variables.
• The null hypothesis to be tested is $H_0: \beta_1 = 0$, α=0.01, and the alternative is $H_1: \beta_1 < 0$. The random variable $B_1$ is the OLS estimate of $\beta_1$.

Example Question 1
• When $H_0$ is true, what is the standard deviation of $B_1$, the OLS estimate of the slope?
• $\mathrm{Var}(B_1) = \sigma^2/\sum(x_i - \bar{x})^2 = 400^2/400{,}000 = 0.4$.
• $\mathrm{sd}(B_1) = 0.632$.

Example Question 2
• What is the probability of a Type II error in the test specified in the common section using $B_1$, the OLS estimator of the slope, as test statistic when $\beta_1 = -4$, α=0.01, σ=400, and $\sum(x_i - \bar{x})^2$ is equal to 400,000?

Solution to Question 2
• The critical value is $0 - 2.326(0.632) = -1.47$.
• A Type II error occurs when $B_1 > -1.47$.
• Under the alternative, $B_1$ is normal with expected value -4 and standard deviation (error) 0.632.
• $\Pr\{B_1 > -1.47\} = \Pr\{Z > (-1.47 - (-4))/0.632\} = \Pr\{Z > 4.00\} = 0.000032$.
• The answer is 0.000032.

Example Question 3
• How many observations n are necessary so that the probability of a Type II error in the test specified in the common section when $\beta_1 = -4$, α=0.01, σ=400, and $\sum(x_i - \bar{x}_n)^2$ is equal to 2,000n?

Outline of Solution to Problem 3
• For the $\sigma_0$ term, use $(400^2/2000)^{0.5} = 8.94$.
• Use the same value for the $\sigma_1$ term.
• Use $|z_{0.01}| = 2.326$.
• Use $|E_1 - E_0| = |-4 - 0| = 4$.
• Square root of sample size is 10.39.
• Sample size is 109 or more.

Chapter 21: Residual Analysis
• If the assumptions in regression are violated:
– Residuals are one way of checking the model: $R_i = Y_i - (\text{fitted value at } x_i)$.

Checking the Assumptions
– Check for normality (test of normality, histogram, Q-Q plots).
– Check whether the variance is the same for all values of the independent variable (plot residuals against predicted values).
– Check independence (plot residuals against a sequence variable).
– Check for linearity (plot the dependent variable against the independent variable).

Residual Plots
• Plot residuals against the independent variable.
– The plot should be flat, indicating the same variance.
– There should be no fanning-out pattern.
– Check for influential observations.
• Plot residuals against the predicted variable.
– For univariate regression this is the same as the above plot. There should be no pattern.

What to do if there is a problem?
• Can look for transformations of either the independent or the dependent variable, or both.
• Using a computer this is easy: use the compute option from the menu bar.

Influential Points
• An easier way to look for points that have a large impact on the slope is to plot the change in slope against an arbitrary case sequence number.

Example
• Data set in the web page
• Aim: predict final exam score from midterm score
• Dependent variable: final exam score
• Independent variable: midterm score
• Model, check assumptions, predict

[Figure: scatterplot of final examination score against score on first exam]

Output
• Model: $Y = b_0 + b_1 X + e$
• $R^2 = 0.508$
• F statistic=60.91, Significance=0.0
• $b_1$=1.391117, t statistic=7.805, Significance=0.0
• $b_0$=238.95, t statistic=8.329, Significance=0.0

[Figure: residuals plotted against predicted value]

[Figure: histogram of residuals; Std. Dev = 66.68, Mean = 0.0, N = 61]

[Figure: Normal Q-Q Plot of Residual; expected normal value against observed value]

Next Class
• Multiple Regression!
• Check web site for your data file
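
The three example questions from the sample-size part of this lecture can be checked numerically. The following is a minimal Python sketch, not part of the original slides. It assumes the left-sided test of the Example Problem Group, and for Question 3 it assumes a target Type II error probability of 0.01; the question does not state the target, but 0.01 is the value used in Step 7 and it reproduces the outlined answer. The function names are hypothetical and chosen only for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def slope_sd(sigma, sxx):
    """Standard deviation of the OLS slope: sigma / sqrt(sum (x_i - xbar)^2)."""
    return sigma / sqrt(sxx)


def type2_error_left(beta1_alt, alpha, sigma, sxx):
    """Pr{accept H0} for the left-sided test of H0: beta1 = 0 when beta1 = beta1_alt."""
    sd = slope_sd(sigma, sxx)
    cv = 0.0 - abs(Z.inv_cdf(alpha)) * sd  # left-sided critical value
    # H0 is accepted (Type II error) when the slope estimate exceeds cv.
    return 1.0 - NormalDist(beta1_alt, sd).cdf(cv)


def required_n(beta1_alt, alpha, beta, sigma, sxx_per_obs):
    """Smallest n with (E1 - E0) sqrt(n) >= (|z_alpha| + |z_beta|) sigma / sigma_X.

    Assumes the same sigma under null and alternative, E0 = 0,
    and sum (x_i - xbar)^2 = sxx_per_obs * n.
    """
    ratio = sigma / sqrt(sxx_per_obs)  # sigma_0/sigma_X = sigma_1/sigma_X
    root_n = (abs(Z.inv_cdf(alpha)) + abs(Z.inv_cdf(beta))) * ratio / abs(beta1_alt)
    return ceil(root_n ** 2)


if __name__ == "__main__":
    print(slope_sd(400, 400_000))                      # Question 1
    print(type2_error_left(-4, 0.01, 400, 400_000))    # Question 2
    print(required_n(-4, 0.01, 0.01, 400, 2_000))      # Question 3 (beta = 0.01 assumed)
```

Run as a script, this should agree with the slide values (0.632, about 0.000032, and 109) up to the rounding used on the slides.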